Slashdot Mirror


Comparison of Bayesian POP3 Spam Filters

kreide writes "Spam e-mail has become an ever increasing problem, and these days it is next to impossible to use e-mail without receiving it in large amounts. Although various techniques exist to combat the problem, spammers seemed to be winning the war - until a new, powerful weapon appeared on the scene: Bayesian filters, our last, best hope for spam-free inboxes. In this review I compare POP3-based Bayesian spam filters." We did an Ask Slashdot on this a few weeks ago.

35 of 326 comments

  1. Re:great by Eric+Ass+Raymond · · Score: 3, Insightful
    Spam is not advertising.

    It's harassment.

  2. Nitpick... by 1029 · · Score: 2, Insightful

    I just sure as hell hope he meant "latest, best hope", because anyone who thinks bayesian is the LAST best hope doesn't understand CS technology at all. And such a person sure as all hell shouldn't be given an audience on /.

    --
    - I love animals. I try to eat at least one a day.
  3. Re:great by mirko · · Score: 3, Insightful

    I think spam is overhyped: it is not convenient to get some, but with properly adjusted filters, very few of these will land anywhere other than in your trash can.

    Personally, I get around 100 of these a day, but only 3 get in my inbox instead of one of my specific mail directories, this is not *that* disturbing.

    I just wish these spams were better targeted: getting some penis-enlargement, ultra-fast-diet, university-diploma or cheap-herbal alternative to viagra is somehow repetitive and boring.

    --
    Trolling using another account since 2005.
  4. You just don't get it by frovingslosh · · Score: 5, Insightful
    None of these spam filters will have any effect on spam at all if they are just installed on the systems of people who hate spam and would never buy from a spammer anyway. Hell, they might even have the opposite effect; I will never buy something if I get spam for it. But if I personally filter my spam and don't even see subject lines, I might end up buying the product without knowing they also marketed it by spam.

    Spam is effective because it reaches millions of people who are not installing these filters on their systems. Until ISPs start applying these filters to all spam by default, the spam filters will have no effect at all: exactly the same number of marks will be reached and respond, no matter if the people who know better than to respond to spam go ahead and filter their e-mail or not!

    --
    I'm an American. I love this country and the freedoms that we used to have.
    1. Re:You just don't get it by Plug · · Score: 5, Insightful

      Realistically, I don't give a damn how much spam _you_ get, I care that _I_ don't get any.

      You cannot automatically filter spam. Bayesian filtering works because it works on your own personal items only, and you have a method of manually removing false positives. There is nothing worse than the possibility that an ISP will filter out a real email in their spam system. That simple fact makes server side spam filtering impossible for most situations. You can filter spam into /dev/null (unacceptable), you can filter into a spam box (How many POP users would that rule out, who only have one POP box?), or you can keep it bundled in email with a flag, and expect people to update their clients, in which case you have the exact scenario you have now - the client has to do something themselves.

      Until Hotmail et al. start offering Bayesian filtering with a separate 'spam' mailbox, consider server side filtering worthless.

      I am smart and don't get any spam. A lot of people I see in my line of work, aren't. These people are going to get something like Outclass (an Outlook plugin for POPfile), and then they are going to see the problem go away, and they're not going to lose any email in the process.

      I'd rather use SpamBayes, but the Outlook plugin has an annoying bug that renders autocompleting addresses in Outlook useless.

    2. Re:You just don't get it by hankwang · · Score: 2, Insightful
      >None of these spam filters will have any effect on spam at all if they are just installed on the systems of people who hate spam and would never buy from a spammer anyway.

      Still, there are plenty of people who hate spam but don't know how to handle it. At our department, quite a few people receive over 30 spams per day and hate it, but no one has installed a spam filter better than the subject/sender filter built into their (Windows) mail clients. One has stopped reading e-mail from his university account and asks us to send mail to a private address, because he isn't allowed to change his email address.

      I mentioned Bayesian filters, but it turns out that not every computer user enjoys downloading and trying out five different programs to see whether they filter effectively and work together with their existing mail software. On top of that is the fear of false positives. (I am one of the few Linux users and on top of that I don't receive so much spam that I should worry, so I can't advise them.)

  5. Re:Bayesian filters are useful, but... by frovingslosh · · Score: 4, Insightful
    I still believe that we should have a hunting season for spammers, just like we do for ducks...

    No, it should be longer, if not all year long.

    --
    I'm an American. I love this country and the freedoms that we used to have.
  6. Missing the point? by aquishix · · Score: 5, Insightful

    As someone who acquired a B.S. in mathematics several days ago, I understand how these filters work. They are an excellent way to fight spam compared to the older methods.

    However, I think that ultimately this sort of thing misses the point. Spam needs to be fought in the courts, not in the battlefield. I'm afraid that the success of these filters will cause spam NOT to become illegal, and thus lead to a world where we have a constant trickle of spam, albeit in small amounts.

    I think we all agree that we want spam to be gone entirely, as is evidenced by the first post being labeled as "troll" ;)

    --
    - I am a viral sig. Please copy me and help me spread. [strain #2] Thank you
    1. Re:Missing the point? by schon · · Score: 2, Insightful

      >SPAM absolutely does not need to be fought in the courts when the markets can work this out on their own (as we see w/ these filters)

      Yes, it absolutely does - just like any other sociopathic behaviour. We need clearly defined rules of what is and is not acceptable. Perhaps you haven't noticed, but "the market" is not working anything out - spam is getting worse, not better, and things such as filters make it worse, by hiding the problem (hint: even though your filters hide your spam from you, you're still paying for it.)

      >In the end we'll have better technology for sorting and filtering emails

      This is the fundamental flaw in your reasoning - you can't solve a social problem with technology.

      >Legislation would only be valid in the country in which the legislation was enacted so spammers could simply move their operations to a SPAM friendly country.

      This argument is fundamentally flawed. "Moving operations" won't do anything - they could still be prosecuted if they stay in the country... and so the question becomes: how many spammers would physically move to another country - permanently - just so they could spam? No, it's more likely they'd just go back to whatever scam they had before they began spamming.

      >Also, what constitutes spam?

      The definition of spam is "Unsolicited bulk email". That's pretty simple.

      >What if I only send 10,000 emails out?

      Then it's bulk. If it's unsolicited, then it's spam.

      >What if I change the email each time I send it so it's unique to you?

      Is it unsolicited bulk email? If so, then it's spam.

      >What if I'm not selling anything?

      So? IF IT'S BULK, UNSOLICITED EMAIL THEN IT'S SPAM

      >What if someone compromised my system and sent all the emails from my PC?

      Then you're not the one spamming, are you?

      >Why shouldn't ISPs be liable too...

      If the ISPs are condoning the spam, then they probably should be liable. If that's the case, then there will be a paper trail.

      >why are they letting people send those SPAMs... let's sue them too... somebody get a rope!!

      If you feel you can't win an argument except by inciting a (hysterical) straw man, then you've already lost.

      Spam is a social problem - it doesn't matter what technologies you come up with, spammers will find a way around them. We need to start applying social remedies to the spam problem.

  7. Fighting spam requires drastic measures by Anonymous Coward · · Score: 1, Insightful

    Fighting spam as an individual will never work, no matter how great the filter algorithms you develop. Hell, even the blacklists won't work until the ISPs are forced, by guerilla action if necessary, to crack down on spammers, and hard.

  8. Re:great by Goldberg's+Pants · · Score: 3, Insightful

    But that's still 3 pieces of shit you have to deal with. Sure, it's a simple click to delete, but the fact is WE SHOULD NOT FUCKING HAVE TO.

    Some wanker spammer got my email address and within two days my spam volume went from zero (seriously) to 30+ a day. All for the same fucking thing. These shits should be legal to hunt and kill.

    In response to the original troll, it's a bogus analogy. We PAY for our internet access. We get bombarded with ads on damn near every site... The revenue generated from these scumbags does NOT go towards funding your internet access, or the production of new content. It goes to their wallets. Ergo, you're an idiot.

    Side note: "Last, best hope"... I can't be alone in expecting "for peace" to come after that.

  9. Filters do not stop spam... by Tehrasha · · Score: 5, Insightful
    ...they only prevent you from seeing it.

    Your server and its hard drives still end up being a storage bin for it, and the spammers will continue to send as long as your machine allows it to be received. Always remember that spam differs from postal junk mail, in that the -receiver- pays for it. Unsolicited postage due mail.

    Spam must be -blocked- and the ISPs that allow/encourage its continued spread must be re-educated, or be put out of business. Only when spam becomes costly to send will it diminish.

    The currently proposed laws on the subject focus on content rather than consent. They don't mind if you get spammed with hundreds of ads, provided what is being advertised isn't fraudulent. They overlook the fact that the claim of you having 'opted in' for the spam is in itself the lie and fraud.

    --Teh

    1. Re:Filters do not stop spam... by Tehrasha · · Score: 2, Insightful

      If you think that your ISP does not incur cost by having to deal with the traffic load and disk storage caused by spam, you are the one in need of a reality check. And if you think that your DSL/Cable traffic is free, then gimmie some of the stuff you're smoking, it must be good.

  10. Re:Bayesian filters are useful, but... by dtfinch · · Score: 5, Insightful

    You know, computer crimes are considered terrorism under the USA PATRIOT Act. Until that silly law gets repealed, let's hunt down those terrorists for their, umm, denial of service attacks against innocent email users, bandwidth theft, failure to provide real opt-out links, sending email advertisements with fake return addresses, presenting obscene material to minors, etc...

  11. Re:great by mirko · · Score: 3, Insightful

    I have more than enough things to worry about, including my shopping list, my housekeeping tasks, my garden... to lose time and nerves over that bit of junk: when I get an unexpected commercial in my snail-mailbox, this *is* annoying as, here in Switzerland, we pay for each garbage bag we throw away.
    So, spam is junk, indeed, but I dispose of it almost instantaneously.

    I won't make spamfighting my Holy War...
    I have more interesting and valuable things to deal with IRL and I am naturally optimistic.

    Let the spammers waste their time sending their hectobytes of off-topic (mostly American-centric) mail to my ever-improving filter.

    --
    Trolling using another account since 2005.
  12. Re:great by devnulljapan · · Score: 5, Insightful
    >Just remember though, we would never have television without commercials. Sometimes advertising is necessary.

    NEVER?....Try the BBC?
    No ads, quality programming, small fee.

  13. Spam is not the same as commercial by Eric+Ass+Raymond · · Score: 4, Insightful
    >Please, go right on ahead and point out why spam is not the same as a commercial.

    I'd be happy to.

    I don't know about you but for me e-mail is an important part of my work - not something comparable to watching cable TV.

    Spam clogs my mailbox and I have lost several important e-mails from clients when deleting the spam which, by the way, is often disguised as legitimate non-commercial mail and comes with forged headers. In addition to pushing fraudulent products, these facts make spam a completely different beast from the cable TV and its legitimate, controlled ads which eat up only my free time - not my emails or work efficiency.

  14. "Bayesian" by RDPIII · · Score: 4, Insightful

    I don't mean to troll, but I hope it's not too late to put an end to the unfortunate term "Bayesian spam filtering". This is perhaps the worst abuse of the adjective "Bayesian" I've seen, because nothing crucially depends on the application of Bayes' Theorem and/or on the use of Bayesian methods (informative priors, model selection, etc.). Why not simply call it "data driven spam classification" (as opposed to "rule based") or "empirical spam filtering"?

    If the spam disaster had struck fifteen years ago, we'd all be talking about "neural spam filtering" (using artificial neural networks, ANNs) and basking in the warm fuzzy feeling imparted by the term "neural". But ANNs and Bayesian classifiers have the same interface: both are trained on labeled data and can be used to classify unlabeled data. The implementation details are not of primary importance, and if you think they are, I'd encourage you to look into large margin classifiers instead of Naive Bayes or ANNs.
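
    To make the shared interface concrete, here is a minimal sketch of a word-count Naive Bayes classifier: it is trained on labeled messages and then classifies unlabeled ones. The class name, the "spam"/"ham" labels, the whitespace tokenizer and the add-one smoothing are all assumptions chosen for illustration; this is not the method of any filter in the review.

```python
import math
from collections import Counter


class NaiveBayesFilter:
    """Toy Naive Bayes text classifier (illustrative sketch only)."""

    LABELS = ("spam", "ham")

    def __init__(self):
        self.word_counts = {label: Counter() for label in self.LABELS}
        self.message_counts = {label: 0 for label in self.LABELS}

    def train(self, text, label):
        """Record the words of one labeled message."""
        self.word_counts[label].update(text.lower().split())
        self.message_counts[label] += 1

    def classify(self, text):
        """Return the label with the highest log-probability score."""
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_messages = sum(self.message_counts.values())
        scores = {}
        for label in self.LABELS:
            # Smoothed log prior, so an untrained label never yields log(0).
            score = math.log((self.message_counts[label] + 1) / (total_messages + 2))
            total_words = sum(self.word_counts[label].values())
            # Add-one smoothing; the extra +1 reserves room for unseen words.
            denominator = total_words + len(vocab) + 1
            for word in text.lower().split():
                score += math.log((self.word_counts[label][word] + 1) / denominator)
            scores[label] = score
        return max(scores, key=scores.get)


spam_filter = NaiveBayesFilter()
spam_filter.train("cheap viagra diploma now", "spam")
spam_filter.train("cheap diet pills now", "spam")
spam_filter.train("meeting agenda for monday", "ham")
spam_filter.train("lunch on monday", "ham")
verdict = spam_filter.classify("cheap diploma")
```

    Any other trainable classifier could sit behind the same train/classify pair, which is the point being made about the interface.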

    --
    Marklar: marklar
  15. You really just don't get it by frovingslosh · · Score: 5, Insightful
    >Realistically, I don't give a damn how much spam _you_ get, I care that _I_ don't get any.

    But you still do get spam. Exactly as much, if not more, because you use Bayesian filtering. Spam still wastes your bandwidth to download that spam before it can be filtered. Spam still eats into any inbox size limit your ISP might impose. Spam cuts into any quota a forwarding service might now or in the future impose on your account, or it could take you to a higher charge level if you pay for a forwarding service. It costs your ISP money, costs that one way or another are eventually paid by you. Even the processing power for that Bayesian filtering costs you CPU cycles, while having no negative effect on the spammers whatsoever.

    While you might not think you care how much spam I get, you might care if dozens, hundreds or thousands of other users at your work also get tons of spam, particularly when all of that spam significantly cuts into your bandwidth. And you will care when overload from spam on your mail server is so bad that it causes failures, effectively causing a D.O.S. situation.

    And as long as geeks happily play with their little Bayesian filters, they stop seeing spam and so stop complaining to the providers that are letting spam get through. They stop doing other things that might make spammers' lives difficult. Heck, I fully expect some spam haters with an attitude like yours to say within earshot of a congressman or Senator something like "Oh, I never get any Spam. Spam can be filtered easily and nothing should be done about it". The spammers should love Bayesian filtering, it takes the pressure off them while allowing them to reach exactly the same number of marks with a mailing.

    --
    I'm an American. I love this country and the freedoms that we used to have.
  16. wtf by timerider · · Score: 2, Insightful

    When will 'the net community' finally get it?
    filtering is no solution as long as there's no way to stop the spammers!

    Or would you say that ignoring the corpses in the gutters would be a solution to the problem of violence on the streets?

    bye
    [L]

  17. Re:great by impluvian · · Score: 4, Insightful

    I think there's a very simple distinction that can be made between spam and television advertising, and it has to do with the amount of control that your service provider exercises over the advertising content.

    When you watch cable TV, you know that for an hour of content, you are going to see up to 12 minutes of advertising. The advertising is controlled by the cable company, and no-one can advertise on the channel without going through that 'filter'.

    Spam, on the other hand, is not restricted. If I receive 100 e-mails a day, anywhere from 0 to 100 of them could be spam. None of those spams are sanctioned (or controlled) by my service-provider, and they were not part of the package I signed up for.

  18. Authentication of senders by flakac · · Score: 2, Insightful

    Sorry, but filters are not the final answer. Even when the filters can "learn", the user still has to expend a certain amount of effort to "teach" the software. And quite frankly, spammers (or the people who write automated spamming software) just need to study the filters and learn to get around them. And worse, you can never be sure that the filter is not deleting email that you actually want, unless you set it to never delete suspect mail, allowing you to examine and delete it manually. But at this point, you've gained absolutely nothing -- simply setting your email client to put all email that's from addresses not in your address book, or that doesn't contain your exact address in the "To:" line will achieve exactly the same effect.

    The only thing that can truly save email is to switch to a service that requires authentication of senders.
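
    The address-book rule described above (set aside mail whose sender is not in your address book, or whose "To:" line lacks your exact address) can be sketched roughly as follows. The function name and its parameters are hypothetical, and the addresses in the usage lines are placeholders; only Python's standard email module is used.

```python
from email import message_from_string
from email.utils import getaddresses, parseaddr


def is_suspect(raw_message, address_book, my_address):
    """Return True if the message should be set aside for manual review."""
    msg = message_from_string(raw_message)
    known_senders = {addr.lower() for addr in address_book}
    # parseaddr returns (display name, address); only the address matters here.
    sender = parseaddr(msg.get("From", ""))[1].lower()
    recipients = {
        addr.lower() for _, addr in getaddresses(msg.get_all("To", []))
    }
    from_known_sender = sender in known_senders
    addressed_to_me = my_address.lower() in recipients
    return not from_known_sender or not addressed_to_me


friendly = "From: Alice <alice@example.com>\nTo: me@example.org\nSubject: hi\n\nLunch?"
stranger = "From: deals@example.net\nTo: me@example.org\nSubject: offer\n\nBuy now"
book = {"alice@example.com"}
```

    Here `is_suspect(friendly, book, "me@example.org")` is False, while the message from the unknown sender is flagged.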

  19. Why not stop the sellers? by Anonymous Coward · · Score: 5, Insightful

    I know this is slightly off topic, but can someone answer me a reasonably simple question that's been bugging me for a while?

    Why, instead of hunting down the spammers, do we not hunt down the people who are selling and advertising their junk via the spammers?

    The spammers purposely make themselves difficult to find, but it must be easier to track down a company that is collecting money and sending out products? Why not make using spammers' services illegal, and fine and punish those doing so?

    I think I'm correct in saying, and please tell me if I'm wrong, that here in the UK a similar situation is people "fly-posting". In these cases, if advertising posters are put somewhere illegal or unwanted, it is not the person who put the poster up that is fined, but the club, record label, or whoever is being advertised that takes the rap.

    Just my 0.02p

  20. Re:great by Goldberg's+Pants · · Score: 4, Insightful

    You probably ARE a scumbag spammer.

    For people who have to pay for their online time (England for example), these scumbags are essentially stealing money from people. Filtering only works once you've downloaded the mail. You still have to download their worthless drivel. Sure, it may be pennies a week in costs for a user, but you tally that up over a year or two of dealing with these idiots, and you've got a sizeable chunk of change. Certainly enough for a nice pizza.

    Let's not forget the TIME these shits waste as well. All this work invested in stopping spam. Who knows what cool stuff may have come from the minds who instead are working on ways of dealing with the email cancer.

    As I said, these scumbags should be legal to hunt and kill.

  21. Re:great by ntmuffin · · Score: 2, Insightful

    Yup, I was also thinking "for peace" ;) Long live B5

    But on the other side ... I've had the same problem you had - going from 0 to 25-30 a day, sometimes even more. I don't think we'll ever be able to stop the spammers, but I think that some of the blame has to be put on those people offering free mail services like Hotmail, Yahoo.com (and .ca) and AOL. 95% of my spam originates from accounts on their domains, and when I try to send bounce messages with Mailwasher, the accounts used to spam me don't exist anymore ... so if these mail services had made a system that couldn't be used to create accounts automatically with a script, we might see a little less spam out on the net, as I doubt that the spammers would bother spending lots of time creating accounts themselves ...

    I like the thought of an all-year hunting license for spammers though ;)

  22. Everyone? by Jon+Peterson · · Score: 2, Insightful

    "Support both Windows and Linux " ...
    "The first requirement is because I wanted the results to be applicable to everyone"

    My how the definition of everyone has changed. So it's bad luck Mac, Solaris, *BSD, HP-UX, VMS users...

    --
    ----- .sig: file not found
  23. Why filtering isn't the solution by nuwayser · · Score: 4, Insightful

    An analysis of filtering methods against spam is kind of like a comparison of bullet-proof vests in that there's no incentive to stop someone from pointing a gun at you and firing it. In the past, spammers have been grossly affected by more sweeping changes, and I'm afraid filtering methods are only creating the mindset of, "Give up, use this software, it will do the deleting for you." It takes the attitude of, "just delete the stuff" and makes it automatic; sure it's convenient for a time, but in a year you're still going to get spam and your ISP will likely have fewer resources to deal with the complaints.

    I'm saying, why not focus instead on technology which puts a bigger dent in spammers' ability to operate, like how to secure against proxy hijacking.

    --
    "The cup... the drop... it's a YES!"
  24. Re:great by Afty0r · · Score: 2, Insightful

    Actually, the rapid growth of endorsements, product placements, "documentaries" about products etc. means that you're really seeing far more than just 12 minutes of advertising, the only restriction is that you're limited to 12 minutes of OBVIOUS advertising.

  25. Re:Bayesian filters are useful, but... by PhxBlue · · Score: 2, Insightful

    >You know, computer crimes are considered terrorism under the USA PATRIOT Act. Until that silly law gets repealed, let's hunt down those terrorists for their, umm, denial of service ...

    An immoral law is no less immoral just because you can find a practical use for it. If you don't like the PATRIOT Act, don't support it, period.

    --
    !#@%*)anks for hanging up the phone, dear.
  26. Re:I changed my mind. Simpler is better. by PhilHibbs · · Score: 2, Insightful
    >...it sends an e-mail BACK to the sender with a simple URL...
    And, not being on their whitelist, their email filter sends you an email back with a simple URL...
  27. Re:great by lone_marauder · · Score: 2, Insightful

    OK, I'll bite on this troll just because it's still at zero, and the moderators need a reason to finish it off, placing it firmly in -1 hell where it belongs.

    In the days before user-paid television service, it is true that advertising was the business impetus to put up huge powerful TV transmitters and undertake the other investments necessary to support land-based TV broadcasting. You are correct, therefore, in pointing out that TV content from 1977 derives from the business need to advertise.

    But to suggest that the meager investments in bandwidth and hardware the average spammer makes are somehow otherwise useful to the world is absurd. When one considers that most of the infrastructure costs of spam are borne by the recipient rather than the sender, the idea of spammers contributing to the public good is asinine.

    --
    who are those slashdot people? they swept over like Mongol-Tartars.
  28. Blame the idiots that respond to SPAM. by momus_radar · · Score: 3, Insightful

    This method of combating SPAM is amazing to me. Admittedly I'm a little behind the geek times, so my interest in this method was piqued when Apple released Mail.app. But I still use Mac OS 9 and am in no rush to run X yet, so I'm glad to see there are alternatives that I can use.

    I think the only reasonable way to rid the world of SPAM is to get the foolish folk who respond to it to stop. The reason there is so much of it now is that it seems to work; there are people who actually respond to it. If these people stopped responding to it the use of SPAM would most likely diminish.

    Sending SPAM costs money. No sense spending that money if no profit is made.

  29. Re:Spamprobe by HermanAB · · Score: 3, Insightful

    Yes, SpamProbe is the best one I tested and I tested most of them. The reason being that it not only counts single words, but also word pairs. It is about 99.5% accurate for me and never gives false positives. My wife uses it in her law office, where I run it on the server - one database for everybody. It works like a charm and doesn't get tripped up with matrimonial fighting mail, which can resemble sleaze mail in many respects...
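
    As a rough illustration of counting word pairs alongside single words, the sketch below produces both kinds of token from a message. The function name and the whitespace splitting are assumptions made for the example; it is not SpamProbe's actual tokenizer.

```python
def tokens_with_pairs(text):
    """Return single-word tokens followed by adjacent word-pair tokens."""
    words = text.lower().split()
    # zip(words, words[1:]) walks each word together with its successor.
    pairs = [f"{first} {second}" for first, second in zip(words, words[1:])]
    return words + pairs


example_tokens = tokens_with_pairs("Buy cheap pills")
```

    Pair tokens let a filter score a phrase differently from the words that make it up, at the cost of a larger token database.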

    --
    Oh well, what the hell...
  30. Re:It's virtually impossible to not get spam? by Aidtopia · · Score: 2, Insightful

    There's one more ingredient to your recipe: get lucky.

    It doesn't help when the spammers use a dictionary attack against your domain (aaron@domain.com, abigail@domain.com, adam@domain.com, ...). I guess your domain has never caught the attention of such spammers. Lucky you. They troll my domain on a regular basis.

    Some of the published experiments that try to track the harvesters have found that short names near the beginning of the alphabet (like mine) are far more likely to get tons of spam. Another problem is needing to support addresses like "webmaster".

  31. Re:A new *law* is required by felis_panthera · · Score: 4, Insightful

    Out of that 2.2 million people, somewhere near 700,000 are in jail for possession, use or distribution of marijuana. A law that was originally used to control migrant Mexican workers has bogged down the American legal system to the breaking point. Imagine, 700,000 new cells open for child molesters, rapists, spammers, and SCO executives.

    Wouldn't it be grand?

    PS: Sorry about the OT, but things like this need to be said whenever the opportunity presents itself.

    --

    The chains are broken
    Loki is free
    Ragnarok is at hand...