Slashdot Mirror


Comparison of Bayesian POP3 Spam Filters

kreide writes "Spam e-mail has become an ever-increasing problem, and these days it is next to impossible to use e-mail without receiving it in large amounts. Although various techniques exist to combat the problem, spammers seemed to be winning the war - until a new, powerful weapon appeared on the scene: Bayesian filters, our last, best hope for spam-free inboxes. In this review I compare POP3-based Bayesian spam filters." We did an Ask Slashdot on this a few weeks ago.

17 of 326 comments

  1. Bayesian filters are useful, but... by fr0z · · Score: 5, Funny

    I still believe that we should have a hunting season for spammers, just like we do for ducks...

    --
    Never underestimate the predictability of human stupidity...
    1. Re:Bayesian filters are useful, but... by dtfinch · · Score: 5, Insightful

      You know, computer crimes are considered terrorism under the USA PATRIOT Act. Until that silly law gets repealed, let's hunt down those terrorists for their, umm, denial of service attacks against innocent email users, bandwidth theft, failure to provide real opt-out links, sending email advertisements with fake return addresses, presenting obscene material to minors, etc...

  2. You just don't get it by frovingslosh · · Score: 5, Insightful
    None of these spam filters will have any effect on spam at all if they are just installed on the systems of people who hate spam and would never buy from a spammer anyway. Hell, they might even have the opposite effect; I will never buy something if I get spam for it. But if I personally filter my spam and don't even see subject lines, I might end up buying the product without knowing they also marketed it by spam.

    Spam is effective because it reaches millions of people who are not installing these filters on their systems. Until ISPs start applying these filters to all mail by default, the spam filters will have no effect at all: exactly the same number of marks will be reached and will respond, whether or not the people who know better than to respond to spam filter their e-mail!

    --
    I'm an American. I love this country and the freedoms that we used to have.
    1. Re:You just don't get it by Plug · · Score: 5, Insightful

      Realistically, I don't give a damn how much spam _you_ get, I care that _I_ don't get any.

      You cannot automatically filter spam. Bayesian filtering works because it works on your own personal items only, and you have a method of manually removing false positives. There is nothing worse than the possibility that an ISP will filter out a real email in their spam system. That simple fact makes server side spam filtering impossible for most situations. You can filter spam into /dev/null (unacceptable), you can filter into a spam box (How many POP users would that rule out, who only have one POP box?), or you can keep it bundled in email with a flag, and expect people to update their clients, in which case you have the exact scenario you have now - the client has to do something themselves.

      Until Hotmail et al start offering Bayesian filtering with a separate 'spam' mailbox, consider server side filtering worthless.

      I am smart and don't get any spam. A lot of people I see in my line of work aren't. These people are going to get something like Outclass (an Outlook plugin for POPFile), and then they are going to see the problem go away, and they're not going to lose any email in the process.

      I'd rather use SpamBayes, but the Outlook plugin has an annoying bug that renders autocompleting addresses in Outlook useless.

    2. Re:You just don't get it by Plug · · Score: 5, Funny

      How many people you know that email you 12 gifs/jpegs in one message with LARGE red text. ????

      Lots of them. They're called 'girls' and Slashdot should encourage communication with them wherever possible.

  3. Spamprobe by 1029 · · Score: 5, Informative

    The article didn't mention SpamProbe. It is what I use, and it has worked quite well for the past month or so that I've been using it. Perhaps this is just because the author didn't test this spam filter yet, but I like it quite a lot with my current mutt/procmail setup. Take this for what it's worth.

    --
    - I love animals. I try to eat at least one a day.
  4. Missing the point? by aquishix · · Score: 5, Insightful

    As someone who acquired a B.S. in mathematics several days ago, I understand how these filters work. They are an excellent way to fight spam compared with the older methods.

    However, I think that ultimately this sort of thing misses the point. Spam needs to be fought in the courts, not on the battlefield. I'm afraid that the success of these filters will cause spam NOT to become illegal, and thus lead to a world where we have a constant trickle of spam, albeit in small amounts.

    I think we all agree that we want spam to be gone entirely, as is evidenced by the first post being labeled as "troll" ;)

    --
    - I am a viral sig. Please copy me and help me spread. [strain #2] Thank you
  5. Filters do not stop spam... by Tehrasha · · Score: 5, Insightful
    ...they only prevent you from seeing it.

    Your server and its hard drives still end up being a storage bin for it, and the spammers will continue to send as long as your machine allows it to be received. Always remember that spam differs from postal junk mail in that the -receiver- pays for it. Unsolicited postage-due mail.

    Spam must be -blocked- and the ISPs that allow/encourage its continued spread must be re-educated, or be put out of business. Only when spam becomes costly to send will it diminish.

    The currently proposed laws on the subject focus on content rather than consent. They don't mind if you get spammed with hundreds of ads, provided what is being advertised isn't fraudulent. They overlook the fact that the claim that you 'opted in' to the spam is itself the lie and the fraud.

    --Teh

  6. I changed my mind. Simpler is better. by Peter+Cooper · · Score: 5, Interesting

    I have long been an advocate of Bayesian or keyword based spam filters, but have recently been forced to change my outlook, and to argue that MULTIPLE SIMULTANEOUS solutions are the answer.

    I encountered a very simple but unique anti-spam system which works entirely on the sender's address. Simply, you create a small database with the domains/addresses you want to whitelist. Then, a program screens your mail, and if the sender is not in your whitelist, it sends an e-mail BACK to the sender with a simple URL (or even an actual link for HTML e-mail clients) which they follow to confirm that they REALLY want to send the e-mail to its destination. When this is done, they are added to the whitelist. Therefore, mails from forged remote addresses are no longer a problem, and neither are mails from trusted sources. And, better than SPEWS or similar blacklists, the sender gets a SECOND CHANCE to send their mail to you.
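
    A minimal sketch of the whitelist-plus-confirmation idea described above, in Python. Everything named here is hypothetical: the class, the `confirm_base` URL, and the in-memory `pending` store are illustrative assumptions, not the commercial product mentioned in this comment, and no mail is actually sent.

```python
import secrets


class ChallengeResponseFilter:
    """Hold mail from unknown senders until they confirm via a one-time token.

    Hypothetical sketch: the whitelist holds full addresses or bare domains,
    and the confirmation URL is a placeholder.
    """

    def __init__(self, whitelist, confirm_base="https://example.invalid/confirm/"):
        self.whitelist = {entry.lower() for entry in whitelist}
        self.confirm_base = confirm_base
        self.pending = {}  # token -> (sender, held message)

    def _is_whitelisted(self, sender):
        sender = sender.lower()
        domain = sender.rsplit("@", 1)[-1]
        return sender in self.whitelist or domain in self.whitelist

    def receive(self, sender, message):
        """Return ("deliver", message) or ("challenge", confirmation_url)."""
        if self._is_whitelisted(sender):
            return ("deliver", message)
        token = secrets.token_urlsafe(16)
        self.pending[token] = (sender.lower(), message)
        return ("challenge", self.confirm_base + token)

    def confirm(self, token):
        """Sender followed the URL: whitelist them and release the held mail."""
        entry = self.pending.pop(token, None)
        if entry is None:
            return None  # unknown or already used token
        sender, message = entry
        self.whitelist.add(sender)
        return message
```

    A forged sender never follows the URL, so its message simply stays in `pending`; a real sender confirms once and is delivered directly from then on.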

    There's a commercial solution using this system right now, although the URL escapes me.

    Of course, one could encounter problems when ordering online, say. Droids at Amazon will not be clicking your links to make sure your order receipt got through. One could argue that you'd put things like Amazon.com in the whitelist, but what if someone used amazon.com as a spoofed e-mail domain/address? Ay, there's the rub. But if this system were tied in with a Bayesian system, it'd be pretty unbeatable. What's more, the Bayesian system would have extra data for negative matches, in the form of e-mails that were never 'approved', and positive data in the form of those that were.

    So, I'd be more interested in producing a homebrew system that used MULTIPLE weaker systems than one supposed 'sure fire' method, as I feel no one method is perfect, whereas multiple systems can approach this nirvana.

    1. Re:I changed my mind. Simpler is better. by scj · · Score: 5, Interesting
      I had thought of something similar for fighting spam. Here's how I'd handle each email:
      1. If the email is from someone in my whitelist, allow the mail to go through and feed it as 'ham' to the Bayesian filter.
      2. If the sender is not in my whitelist, run the email through spam filtering software (SpamAssassin works well) to determine if it is likely to be spam.
      3. If it seems like spam, then use a challenge-response system (like TMDA) to find out if a human sent the email.
      4. If the mail doesn't seem like spam, just deliver it. If I get 3 non-spammy messages from the same person (separated by a day or more) then add them to my whitelist automatically.
      5. If someone responds to the TMDA challenge, put them in the whitelist and deliver the original email.
      6. If no one responds to the TMDA challenge after a week, feed the mail as 'spam' to the Bayesian filter.
      In addition, I'd use a system like Sneakemail to generate random email addresses to give out to businesses I want to do business with and use to sign up to mailing lists. These email addresses would be added to my whitelist so they could send me mail without going through the challenge-response system. If they start spamming me, I put the random email I gave them on my blacklist.

      This system has the following benefits:
      • Business mail I want (like receipts and newsletters from companies I do business with) always gets through, since the Sneakemail-type address is whitelisted. This solves the problem of businesses not responding to TMDA challenges.
      • My real email address is protected from businesses who are likely to sell it and from people farming addresses from mailing lists.
      • Personal email that the spam filter sees as non-spam gets delivered without bothering the sender with a challenge-response system.
      • Personal email that does seem spammy to the filter still has a second chance to make it through via the challenge-response system. This should reduce false positives to include only spammy emails from people who don't respond to the challenge.
      • The Bayesian filter is automatically trained based on mails from people in my whitelist and mails from people who never respond to the challenge-response.
      You would still get spam with this system (personal email that your filter thinks is non-spam), but hopefully your false-positive rate would be zero. Also, you don't annoy other people much by only sending challenge-response messages to spam-like emails. Finally, this would be easy for end users to use. They don't have to train the spam filter, since it should train itself. The only complicated part would be generating and using the random emails that you give to businesses and mailing lists.
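
      The six steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: `looks_spammy` stands in for whatever content filter is used (SpamAssassin is not actually called), days are plain integers, "separated by a day or more" is approximated as "seen on three distinct days", and all names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class MailState:
    whitelist: set = field(default_factory=set)
    ham_days: dict = field(default_factory=dict)  # sender -> days with non-spammy mail
    pending: dict = field(default_factory=dict)   # sender -> (held message, day challenged)
    training: list = field(default_factory=list)  # (label, message) pairs for the Bayesian filter


def handle(state, sender, message, looks_spammy, today):
    """Steps 1-4: decide what to do with one incoming message."""
    if sender in state.whitelist:
        state.training.append(("ham", message))  # step 1
        return "deliver"
    if looks_spammy(message):                    # steps 2-3
        # Simplification: a later held message replaces an earlier one.
        state.pending[sender] = (message, today)
        return "challenge"
    days = state.ham_days.setdefault(sender, set())  # step 4
    days.add(today)
    if len(days) >= 3:
        state.whitelist.add(sender)
    return "deliver"


def challenge_answered(state, sender):
    """Step 5: a human responded, so whitelist them and release the mail."""
    message, _ = state.pending.pop(sender)
    state.whitelist.add(sender)
    return message


def expire_challenges(state, today, max_age_days=7):
    """Step 6: unanswered challenges become spam training data."""
    for sender, (message, day) in list(state.pending.items()):
        if today - day >= max_age_days:
            state.training.append(("spam", message))
            del state.pending[sender]
```

      The `training` list is what makes the filter self-training: ham comes only from whitelisted senders and spam only from challenges nobody answered.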
  7. Re:great by devnulljapan · · Score: 5, Insightful
    Just remember though, we would never have television without commercials. Sometimes advertising is necessary.

    NEVER?....Try the BBC?
    No ads, quality programming, small fee.

  8. You really just don't get it by frovingslosh · · Score: 5, Insightful
    Realistically, I don't give a damn how much spam _you_ get, I care that _I_ don't get any.

    But you still do get spam. Exactly as much, if not more, because you use Bayesian filtering. Spam still wastes your bandwidth to download that spam before it can be filtered. Spam still wastes any inbox size limits your ISP might impose. Spam cuts into any quota a forwarding service might now or in the future impose on your account, or it could take you to a higher charge level if you pay for a forwarding service. It costs your ISP money, costs that one way or another are eventually paid by you. Even the processing power for that Bayesian filtering costs you CPU cycles, while having no negative effect on the spammers whatsoever.

    While you might not think you care how much spam I get, you might care if dozens, hundreds or thousands of other users at your work also get tons of spam, particularly when all of that spam significantly cuts into your bandwidth. And you will care when overload from spam on your mail server is so bad that it causes failures, effectively causing a D.O.S. situation.

    And as long as geeks happily play with their little Bayesian filters, they stop seeing spam and so stop complaining to the providers that are letting spam get through. They stop doing other things that might make spammers' lives difficult. Heck, I fully expect some spam haters with an attitude like yours to say within earshot of a congressman or Senator something like "Oh, I never get any Spam. Spam can be filtered easily and nothing should be done about it". The spammers should love Bayesian filtering, it takes the pressure off them while allowing them to reach exactly the same number of marks with a mailing.

    --
    I'm an American. I love this country and the freedoms that we used to have.
    1. Re:You really just don't get it by schon · · Score: 5, Interesting

      spammers should love Bayesian filtering, it takes the pressure off them while allowing them to reach exactly the same number of marks with a mailing.

      I'm afraid you've made the cardinal mistake of thinking that spammers follow logic.

      First question: Why do people install filters on their mailboxes?

      Answer: To stop spam.

      Now, take a look at any interview with any spammer.. you'll note that when they're asked, the spammer will say "I don't send it to people who don't want it."

      They'll also say "we're always coming up with ways to bypass filters."

      Now, you'd think that with the two statements, that one of them is false - however (besides the fact that spammers lie), any sociologist will tell you that the spammer actually believes he's telling the truth in each of these statements..

      How he justifies it in his mind is that he believes that even though someone has installed a spam filter, that this person only wants to filter spam from other spammers - that his spam is somehow "special".

      Spammers are sociopaths, and like all sociopaths, they believe the rules do not apply to them.

      If spammers weren't sociopaths, and were capable of applied logic, then they'd realize that any filter (not just Bayesian) would benefit them.. but then, if they weren't sociopaths, they wouldn't be spammers in the first place.

  9. Why not stop the sellers? by Anonymous Coward · · Score: 5, Insightful

    I know this is slightly off topic, but can someone answer me a reasonably simple question that's been bugging me for a while?

    Why, instead of hunting down the spammers, do we not hunt down the people who are selling and advertising their junk via the spammers?

    The spammers purposely make themselves difficult to find, but it must be easier to track down a company that is collecting money and sending out products? Why not make the use of spammers' services illegal, and fine and punish those doing so?

    I think I'm correct in saying (and please tell me if I'm wrong) that here in the UK a similar situation is people "fly-posting". In these cases, if advertising posters are put somewhere illegal or unwanted, it is not the person who put the poster up that is fined, but the club, record label, or whoever is being advertised that takes the rap.

    Just my 0.02p

  10. Re:"Bayesian" by file-exists-p · · Score: 5, Informative

    As far as I know, many of those filters are based on a decision rule of the form

    P(mail is spam | words X, Y, Z, ... are in it) > 1-epsilon

    The computation is then done using Bayes' rule (P(A|B)=P(B|A)*P(A)/P(B)) under certain independence assumptions which make it tractable.

    So this is actually Bayesian filtering ...
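
    A rough sketch of that decision rule in Python, under the naive assumption that words occur independently given the class. The whitespace tokenizer, the add-one smoothing, and the default `epsilon` are assumptions made for illustration, not taken from any filter named in this thread.

```python
import math
from collections import Counter


class NaiveBayesSpamFilter:
    def __init__(self):
        # Number of messages of each class containing each word.
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.msg_counts = {"spam": 0, "ham": 0}

    def train(self, text, label):
        self.msg_counts[label] += 1
        self.word_counts[label].update(set(text.lower().split()))

    def spam_probability(self, text):
        """P(spam | words present), via Bayes' rule in log-odds form."""
        n_spam, n_ham = self.msg_counts["spam"], self.msg_counts["ham"]
        # Prior odds P(spam)/P(ham), with add-one smoothing.
        log_odds = math.log((n_spam + 1) / (n_ham + 1))
        for word in set(text.lower().split()):
            # Smoothed likelihoods P(word | spam) and P(word | ham).
            p_word_spam = (self.word_counts["spam"][word] + 1) / (n_spam + 2)
            p_word_ham = (self.word_counts["ham"][word] + 1) / (n_ham + 2)
            # Independence assumption: per-word likelihood ratios multiply.
            log_odds += math.log(p_word_spam / p_word_ham)
        # Convert log-odds to a probability without overflowing exp().
        if log_odds >= 0:
            return 1 / (1 + math.exp(-log_odds))
        odds = math.exp(log_odds)
        return odds / (1 + odds)

    def is_spam(self, text, epsilon=0.1):
        # The rule above: P(spam | words) > 1 - epsilon.
        return self.spam_probability(text) > 1 - epsilon
```

    Working in log-odds means P(B) never has to be computed, since it cancels between the spam and ham hypotheses. A smaller `epsilon` trades missed spam for fewer false positives.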

    My favorite filter is SpamOracle.

  11. Re:hmm, if you really are so clever by Anonymous Coward · · Score: 5, Interesting
    Very good.

    Speaking from experience, I know for a fact that many of the harvesting programs (written in Perl, running on Linux, written by geeks) are very robust at deciphering most email obfuscation methods. You all sit and shake your fists, and the spamware writers are laughing their asses off.

    You have the easy answer: don't obfuscate your email, don't even bother putting it on your posts.

  12. It's virtually impossible to not get spam? by setien · · Score: 5, Informative

    No it's not.
    I get spam at the rate of 1 spam mail per 6 months or so. Or maybe even less. I can't remember getting a single spam email on my actual email address for about a year.

    If you have an account on a crapless domain (i.e. not hotmail.com, msn.com, aol.com and the like),
    it all comes down to this very simple rule:
    Do not, under any circumstance, have your email address posted where it is publicly accessible ANYWHERE on the web.
    It WILL get trawled. And then it will be spammed relentlessly.

    If you have an existing address you don't want to give up, or an address at hotmail.com or a similar place, dump it.
    Then exercise a bit of common sense about where you use your actual address.

    I have a domain which catches email to unknown addresses and puts it in my regular mailbox.
    Whenever I have to give an email address to some place on the web, I use *domain-i-am-currently-visiting*@mydomain.com. So if I am visiting foobar.com, I would put in foobar.com@mydomain.com.
    I have been doing this for years. It enables me to see what was the source of the leak when I get spam on one of the addresses.
    It has taught me one thing: I have never, ever, ever, in all my years of online shopping, forum posting etc, come across a single website that has ignored its own privacy statement. Ever. Even the slightly sketchy sites (like divx subtitle sites) don't leak addresses.
    I was surprised to realize this.
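
    The per-site address trick is easy to automate. A small sketch, assuming the catch-all domain is `mydomain.com` as in the example above; the function names are hypothetical.

```python
from collections import Counter


def address_for(site, my_domain="mydomain.com"):
    """Address to hand out to a given site, e.g. foobar.com@mydomain.com."""
    return f"{site.lower()}@{my_domain}"


def leak_source(recipient, my_domain="mydomain.com"):
    """Site a spammed address was given to, or None if it is not one of ours."""
    local, _, domain = recipient.lower().rpartition("@")
    if domain != my_domain or not local:
        return None
    return local


def tally_leaks(spam_recipients, my_domain="mydomain.com"):
    """Count spam per originating site from the recipient addresses of spam."""
    counts = Counter()
    for recipient in spam_recipients:
        source = leak_source(recipient, my_domain)
        if source is not None:
            counts[source] += 1
    return counts
```

    Feeding the recipient address of each spam message into `tally_leaks` shows at a glance which site, if any, passed the address on.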

    The only addresses I ever get spam on are the ones I know to be publicly displayed on the web.

    So it's that easy to avoid spam.

    --
    Give me liberty or give me kill -s 9