Paul Graham on Fighting Spam
Ramakrishnan M writes "Paul Graham, the Lisp guru, is back with a great technique to fight spam. It is based on a trust metric, and he claims only 5 out of 1000 spams leaked through this system, with 0 false positives. Worth looking at."
From the article:
Based on my corpus, "sex" indicates a .97 probability of the containing email being a spam, whereas "sexy" indicates .99 probability. And Bayes' Rule, equally unambiguous, says that an email containing both words would, in the (unlikely) absence of any other evidence, have a 99.97% chance of being a spam.
Hmm.... take an average adult geek and yes, an email mentioning sex or sexy can go to /dev/null immediately without as much as a second glance... :-)
On the other hand, if you run the statistics on the email of an average horny teenager, the probabilities might come out a bit different.
Kaa
Kaa's Law: In any sufficiently large group of people most are idiots.
From the article:
In the spam filtering business, false positives are your biggest worry...Based on my corpus, "sex" indicates a .97 probability of the containing email being a spam, whereas "sexy" indicates .99 probability...an email containing both words would have a 99.97% chance of being a spam.
False positives could be a HUGE problem in this case...imagine the agony if you missed this email from your wife: "I'm feeling REALLY sexy today - meet me at the motel off 12th street at noon for some lunch-hour sex!"
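For anyone wondering where the 99.97% in the quoted passage comes from, here is a rough sketch of the arithmetic. It assumes the usual naive Bayes combination: equal prior odds of spam and non-spam, and each word treated as independent evidence. The function name `combine` is made up for illustration and is not taken from the article's code.

```python
def combine(probs):
    """Combine per-word spam probabilities, assuming equal priors
    and independent words (the naive Bayes simplification)."""
    spam = 1.0  # product of P(spam | word) over all words
    ham = 1.0   # product of P(not spam | word) over all words
    for p in probs:
        spam *= p
        ham *= 1.0 - p
    return spam / (spam + ham)


if __name__ == "__main__":
    # "sex" at .97 and "sexy" at .99, the two figures quoted from the article
    combined = combine([0.97, 0.99])
    print(f"{combined:.2%}")
```

With .97 and .99 this works out to 0.9603 / (0.9603 + 0.0003), or roughly 99.97%. A legitimate message only escapes if it also contains words with very low spam probabilities to pull the product back down, which is why the per-user corpus matters so much for false positives.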