Plan for Spam, Version 2
bugbear writes "I just posted a new version of the Plan for Spam Bayesian filtering algorithm. The big change is to mark tokens by context. The new version decreases spams missed by 50%, to 2.5 per 1000, even though spam has gotten harder to filter since the summer. I also talk about how spam will evolve, and what to do about it."
And the conflict rages on. The better the filters we use, the sneakier the spam artists get. Now we're developing self-modifying algorithms to detect and kill spam, and I'm sure the spammers are developing self-modifying algorithms to craft filter-tricking spam.
How long before the back-and-forth of spam filters and spam crafters becomes self-aware? It's got to happen. Eventually the spam filters will become a skeptical consciousness that *feels* its way through spam and spots the phonies, and the spam crafters will become a persuasive consciousness that tries to think and write as a close friend or relative.
...
Without spam, how else would I be able to sit home every day and make $1,000 a week watching TV while playing with my 12 inch penis?
Reply or e-mail; don't vaguely moderate. Ex-O'Reilly/MIT employee, now a full-time Google employee.
Could Bayesian filtering be applied to filter off-topic posts as well?
Good people do not need laws to tell them to act responsibly, while bad people will find a way around the laws. -Plato
>Based on my corpus, "sex" indicates a .97 probability of the containing email being a spam...
Spoken like a true geek.
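For anyone wondering where a number like ".97 for sex" might come from, here is a rough sketch of a per-token spam probability computed from corpus counts. It is a simplified illustration, not necessarily the exact formula in the article: the function name, the doubling of the good count, the .01/.99 clamps, the neutral 0.5 fallback, and every count in the example are assumptions made up for this sketch.

```python
def token_spam_probability(bad_count, good_count, n_bad, n_good):
    """Estimate the chance that a mail containing this token is spam.

    bad_count / good_count: times the token appears in the spam / non-spam corpus.
    n_bad / n_good: number of messages in each corpus.
    All weighting choices below are assumptions of this sketch.
    """
    # Relative frequency of the token in each corpus, capped at 1.
    b = min(1.0, bad_count / n_bad)
    # Good occurrences are counted double to bias against false positives
    # (an assumption of this sketch).
    g = min(1.0, 2 * good_count / n_good)
    if b + g == 0:
        # Token never seen in either corpus: treat as neutral (assumption).
        return 0.5
    p = b / (g + b)
    # Clamp so that no single token is ever treated as absolute proof.
    return max(0.01, min(0.99, p))


# Hypothetical counts: token seen 300 times in 1000 spams, 5 times in 1000 good mails.
print(round(token_spam_probability(300, 5, 1000, 1000), 2))
```

With those made-up counts the estimate comes out at about .97. A real filter would then combine the probabilities of the most interesting tokens in a message rather than judging on one word alone.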