Slashdot Mirror


Paul Graham on Fighting Spam

Ramakrishnan M writes "Paul Graham, the Lisp guru, is back with a great technique to fight spam. It is based on a trust metric, and he claims only 5 out of 1000 spams leaked through the system, with 0 false positives. Worth looking at."

2 of 675 comments

  1. Major geek bias there... by Kaa · · Score: 5, Funny

    From the article:

    Based on my corpus, "sex" indicates a .97 probability of the containing email being a spam, whereas "sexy" indicates .99 probability. And Bayes' Rule, equally unambiguous, says that an email containing both words would, in the (unlikely) absence of any other evidence, have a 99.97% chance of being a spam.

    Hmm.... take an average adult geek and yes, an email mentioning sex or sexy can go to /dev/null immediately without so much as a second glance... :-)

    On the other hand, if you run the statistics on the email of an average horny teenager, the probabilities might come out a bit different.

    --

    Kaa
    Kaa's Law: In any sufficiently large group of people most are idiots.
  2. False positives... by dillon_rinker · · Score: 5, Funny

    From the article:

    In the spam filtering business, false positives are your biggest worry...Based on my corpus, "sex" indicates a .97 probability of the containing email being a spam, whereas "sexy" indicates .99 probability...an email containing both words would have a 99.97% chance of being a spam.

    False positives could be a HUGE problem in this case...imagine the agony if you missed this email from your wife: "I'm feeling REALLY sexy today - meet me at the motel off 12th street at noon for some lunch-hour sex!"
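
For anyone wondering where the 99.97% in both quotes comes from, here is a minimal sketch of the arithmetic. It assumes the usual naive Bayes combination: the words are treated as independent evidence, and spam and non-spam are equally likely beforehand. The function name combine is made up for illustration and is not taken from the article.

```python
def combine(probs):
    """Combine per-word spam probabilities with Bayes' rule.

    Assumes the words are independent evidence and that spam and
    non-spam are equally likely a priori (illustrative assumptions).
    """
    spam = 1.0  # product of P(spam | word) over all words
    ham = 1.0   # product of P(not spam | word) over all words
    for p in probs:
        spam *= p
        ham *= 1.0 - p
    return spam / (spam + ham)


# "sex" at .97 and "sexy" at .99, the two figures quoted above
combined = combine([0.97, 0.99])
print(f"{combined:.2%}")  # prints 99.97%
```

Under the same assumptions, a single word that strongly indicates legitimate mail pulls the result back down: with a hypothetical third probability of .01 added, combine([0.97, 0.99, 0.01]) works out to roughly .97. Both comments hinge on this: the per-word numbers depend entirely on whose mail the corpus was built from.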