Slashdot Mirror


More on Bayesian Spam Filtering

michaeld writes "The "Bayesian" techniques for spam filtering recently publicized in Paul Graham's essay A Plan for Spam don't actually seem to have anything Bayesian about them, according to Gary Robinson (an expert on collaborative filtering). They are based on a non-Bayesian probabilistic approach. The approach works well enough, because technology frequently doesn't have to be 100% perfect to do something that really needs to be done. The problem interested Robinson, and he posted his thoughts on fixing the problems in Graham's approach, including adding an actual Bayesian element to the calculations."
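For readers who want the arithmetic behind the summary, here is a minimal Python sketch of two formulas as they are commonly described from the two essays: Graham's rule for combining per-token spam probabilities into one score, and Robinson's adjustment that pulls the probability of a rarely seen token toward a prior. The function names and the default prior `x = 0.5` and prior strength `s = 1.0` are illustrative assumptions, not code taken from either essay.

```python
from math import prod


def graham_combine(probs):
    """Combine per-token spam probabilities as prod(p) / (prod(p) + prod(1 - p)).

    Graham applies this to a small set of the most "interesting" tokens and
    clamps each probability to [0.01, 0.99], which also avoids 0/0 here.
    With many tokens the products can underflow to 0.0.
    """
    spam = prod(probs)
    ham = prod(1.0 - p for p in probs)
    return spam / (spam + ham)


def robinson_adjust(p_w, n, s=1.0, x=0.5):
    """Shrink a token's raw spam probability p_w toward a prior x.

    n is the number of messages the token has appeared in; s is the weight
    given to the prior (both defaults here are assumptions).
    With n == 0 the result is exactly x; as n grows it approaches p_w.
    """
    return (s * x + n * p_w) / (s + n)


print(graham_combine([0.99, 0.99, 0.2]))
print(robinson_adjust(1.0, 1))
```

The first call shows how two strongly spammy tokens outweigh one mildly innocent one (the result is close to 1). The second shows the point of the adjustment: a token seen in only one message, which happened to be spam, gets 0.75 instead of a fully confident 1.0.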

4 of 251 comments

  1. How about Machiavellian Spam Filtering by Anonymous Coward · · Score: 1, Funny

    kill 'em. might = right

  2. poor Hotmail users are still in the cold... by saskboy · · Score: 4, Funny

    I have some tricks for Hotmail users who cannot benefit from the technique above:
    Filter any message without the @ in the address.
    Filter Britney, Boobs, Penis, Inches, WIN, ___ ..... and your own email address userid.
    Now you only have about 40 spams a day to deal with instead of 100.
    Also uncheck the option that lists your information in the MSN directory.

    Enjoy :-)
    John

    --
    Saskboy's blog is good. 9 out of 10 dentists agree.
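The Hotmail tricks above amount to a crude hand-written filter. A minimal Python sketch, assuming the keyword rules are matched case-insensitively against the subject line and the sender check only looks for an @ sign (Hotmail's actual filter options may have worked differently). `looks_like_spam` and `KEYWORDS` are hypothetical names, and the keyword list covers only the words the comment actually names, since the rest of its list was elided.

```python
# Hypothetical keyword list: only the words the comment actually names.
KEYWORDS = ("britney", "boobs", "penis", "inches", "win")


def looks_like_spam(sender, subject, my_userid):
    """Apply the comment's three rules to one message (assumed fields)."""
    if "@" not in sender:
        return True  # rule 1: sender address has no @ at all
    text = subject.lower()
    if any(word in text for word in KEYWORDS):
        return True  # rule 2: a listed word appears anywhere in the subject
    return my_userid.lower() in text  # rule 3: subject mentions your own userid


print(looks_like_spam("friend@example.com", "Lunch on Friday?", "john"))
print(looks_like_spam("friend@example.com", "New Windows drivers", "john"))
```

Plain substring matching is what makes rules like these blunt: the second message is flagged only because "Windows" contains "win".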
  3. Let's see by sam_handelman · · Score: 5, Funny

    P (This is spam) = P (This is Spam | It will enlarge my penis) * P (It will enlarge my penis)

    Now, given that I have prior knowledge that:
    P (It will enlarge my penis)

    is very low,

    and given that, having never encountered anything which enlarges my penis in any permanent way, I have no knowledge of
    P (This is Spam | It will enlarge my penis)

    and we have the product of one probability which I know is low, and another of which I have no posterior knowledge, so we conclude that P (It is Spam) is also low, and that I must have requested more information on their new penile enlargement technique.

    So, that message goes into the keepers.

    Meanwhile,

    P (It is Spam) = P (It is Spam | Frank is getting married) * P (Frank is getting married)

    So, I know Frank is getting married, since he sent me this e-mail I'm considering filtering as Spam, and whether or not it is spam is pretty much independent of whether or not Frank is getting married, so.... it's Spam. Away it goes.

    P.S. I've deliberately made a hash of this for a joke. The actual rule is:

    P (A & B) = P (A | B) * P (B)

    --
    The good and new comes from no quarter where it is looked for, and is always something different from what is expected.
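The product rule in the P.S. above is what turns the joke around. A minimal Python sketch, using made-up likelihoods rather than real corpus counts: the product rule gives the joint probabilities, and dividing by their sum (Bayes' rule, with P(word) expanded over spam and non-spam mail) gives the quantity a filter actually wants, P(spam | word). The function names and numbers are illustrative assumptions.

```python
def joint(p_a_given_b, p_b):
    """Product rule from the P.S.: P(A & B) = P(A | B) * P(B)."""
    return p_a_given_b * p_b


def posterior_spam(p_word_given_spam, p_word_given_ham, p_spam):
    """Bayes' rule: P(spam | word) = P(word & spam) / P(word).

    P(word) is expanded as P(word & spam) + P(word & ham), each term
    computed with the product rule.
    """
    word_and_spam = joint(p_word_given_spam, p_spam)
    word_and_ham = joint(p_word_given_ham, 1.0 - p_spam)
    return word_and_spam / (word_and_spam + word_and_ham)


# Made-up numbers, not measurements: 20% of spam mentions enlargement,
# 0.1% of legitimate mail does, and half of all incoming mail is spam.
print(posterior_spam(0.20, 0.001, 0.5))
```

With these hypothetical inputs the result is about 0.995: the prior is only 0.5, but the word is 200 times more likely in spam than in legitimate mail. The joke's first equation actually computes the joint probability P(spam & word) and mislabels it as P(spam).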
  4. Brain exploded by operagost · · Score: 2, Funny
    Note to statisticians: the product of the probabilities is monotonic with the Fisher inverse chi-square combined probability technique from meta-analysis. The null hypothesis is that the probabilities are independent and uniformly distributed.
    Ouch! My brain is hurting, Doc!
    --

    Gamingmuseum.com: Give your 3D accelerator a rest.
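The "note to statisticians" refers to Fisher's method for combining probabilities. A minimal Python sketch, assuming the inputs are independent probabilities strictly between 0 and 1: under the stated null hypothesis, -2 times the sum of their logs follows a chi-square distribution with 2n degrees of freedom. Because that statistic is just -2 ln of the product, ranking messages by the product and ranking them by Fisher's combined probability agree for a fixed n, which is the monotonicity the note mentions. The survival function uses the closed-form series that holds for even degrees of freedom; the helper names are hypothetical.

```python
from math import exp, log


def chi2_survival_even_df(x, df):
    """P(X >= x) for a chi-square variable X with an even number of degrees of freedom.

    For df = 2k this equals exp(-m) * sum_{i=0}^{k-1} m**i / i!, with m = x / 2.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    m = x / 2.0
    term = exp(-m)  # the i = 0 term
    total = term
    for i in range(1, df // 2):
        term *= m / i  # turns m**(i-1)/(i-1)! into m**i/i!
        total += term
    return min(total, 1.0)  # guard against rounding slightly above 1


def fisher_combine(probs):
    """Fisher's method: combine independent probabilities into one."""
    statistic = -2.0 * sum(log(p) for p in probs)
    return chi2_survival_even_df(statistic, 2 * len(probs))


print(fisher_combine([0.5, 0.5]))
print(fisher_combine([0.01] * 5))
```

Two middling probabilities of 0.5 combine to about 0.597, while five low probabilities reinforce each other into a combined value far below any single one of them.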