Plan for Spam, Version 2
bugbear writes "I just posted a new version of the Plan for Spam Bayesian filtering algorithm. The big change is to mark tokens by context. The new version decreases spams missed by 50%, to 2.5 per 1000, even though spam has gotten harder to filter since the summer. I also talk about how spam will evolve, and what to do about it."
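For anyone wondering what "mark tokens by context" could look like in practice, here is a rough sketch, not the article's actual code. It assumes the marking is a simple prefix (the header name plus `*`) on tokens found in a few header lines, so that "free" in the Subject line is counted separately from "free" in the body. The names `MARKED_HEADERS`, `TOKEN_RE`, and `tokenize` are made up for illustration, and the choice of headers and token characters is an assumption.

```python
import re

# Hypothetical set of headers whose tokens get a context prefix (an assumption).
MARKED_HEADERS = {"subject", "from", "to", "return-path"}

# Assumed token characters: letters, digits, and a few punctuation marks.
TOKEN_RE = re.compile(r"[A-Za-z0-9$'!-]+")


def tokenize(raw_message):
    """Split a raw message into tokens, prefixing header tokens with their context."""
    # Headers and body are separated by the first blank line.
    head, _, body = raw_message.partition("\n\n")
    tokens = []
    for line in head.splitlines():
        name, sep, value = line.partition(":")
        if not sep:
            continue  # this sketch ignores continuation and malformed lines
        if name.strip().lower() in MARKED_HEADERS:
            prefix = name.strip().title()
            tokens.extend(f"{prefix}*{tok}" for tok in TOKEN_RE.findall(value))
        # Tokens from other headers are simply dropped in this sketch.
    # Body tokens are left unmarked.
    tokens.extend(TOKEN_RE.findall(body))
    return tokens


print(tokenize("Subject: free\nX-Mailer: foo\n\nfree lunch"))
```

The point of the prefix is that the filter can then learn a separate spam probability for each context, instead of lumping every occurrence of a word together.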
First of all, let's realize that email is communication, and communication is data transmission. Spam is noise. This immediately brings to mind Claude Shannon's work on information and entropy. He made it very clear that noise can be reduced to a level that is O(log n) that of the information transmitted. This means that as we have more and more email out there, we are going to get more and more noise, unless we change something.
Let's go back to the definition of information. Basically, it's stuff that nobody knows about. If it is surprising to you, it is information (in non-technical language). That suggests that perhaps the information content (and therefore spam) could be reduced if, instead of secretively emailing our friends individually, we CC'd them on all our missives. This would make the amount of information lower (since people would be less surprised by our further revelations, having seen the foregoing matter) and therefore spam might even be eliminated.
Great! Where do I find those options in telnet?
Or would you care to specify what mail client your sage advice applies to?
If you were blocking sigs, you wouldn't have to read this.