Slashdot Mirror


Seeking Prior Art on Markov-Based SPAM Filters?

Theovon asks: "One of today's hot topics seems to be SPAM filtering. I have wanted very much to make my own contribution to this, but I have been thwarted by a patent. Probably before Paul Graham began working on his Bayesian SPAM filter, I began work on a Markov-model based filter. Things were going well until I posted to the usenet about it, and got this Google Groups response. This usenet post describes a Mitsubishi patent issued in 2000, US Patent #6,112,021. One of the key aspects of my design was that I would train the Markov model with both positive and negative examples. This patent is spot-on what I'm doing, because it deals specifically with the idea of using negative examples in Markov models to filter, among other things, 'inappropriate web content.' Well, the patent looks like a good one, assuming they really developed this idea. I mean, of course, I would think it's a good idea; I came up with it too, and it works very well. (Then again, I also thought it was 'obvious')..."

"Not too long ago, I was discussing it with a co-worker who has a degree in Electrical Engineering. She had taken an AI class in college which mentioned the idea of negative examples for Markov models, and this was well before the year 2000. The bottom line is that I think I have a great idea that could potentially add to our collective arsenal against the ever-growing SPAM problem. I would very much like to work on it and publish it under GPL, but before I can do this, I have to protect my self against the patent and the large pocketbook of Mitsubishi.

I'm not asking for legal advice. I have already consulted an attorney, and it was suggested that I should remove the SourceForge project, which I have done. I also attempted to contact the EFF (no luck so far). I'm asking for those of you out there who are familiar with this sort of thing to help me to find verifiable prior art that dates from before the 2000 patent. I would very much like to share with the world my ideas and the code I have written, but this is standing in our way."

1 of 36 comments (clear)

  1. Markov Models, Negative Reward, Negative Examples by hawkstone · · Score: 5, Informative

    I'm not sure I fully understand what you're attempting and what prior art you are looking for, but let me try anyway.

    The google groups message says the patent covers explicity "markov model discriminator using negative examples". Markov decision problems are often those associated with a set of states and a policy which transitions between them. For spam filtering, I would guess there are three states: unknown, spam, non-spam, and the policy needs only determine if the transition should go between the unknown and and either the spam or non-spam state.

    The optimal policy in an MDP is static, and thus it would say "the optimal policy is to mark an email spam". This sounds useless. A Partially Observable MDP (POMDP) may follow more with your plans since it follows percepts and probability distributions over the current set of beliefs.

    As for training for an optimal policy for MDPs, one algorithm ("Value Iteration") makes use of a reward function. This reward function is explicitly allowed to be negative, and probably should be for some decisions or else the problem is trivial. And "positive" or "negative" examples is just a boolean simplification of multiple categories. Why not have "pr0n" "ads" "personal" "work" and a bunch of other categories?

    In this case, positive and negative are meaningless, but all (both) are *always* neeeded to define the optimal policy.

    In other words, I'm not sure I understand enough of what you are attempting to help find prior art, but from what I can tell "negative" examples are inherently part of defining the optimal policy in an MDP. For a reference, Russel & Norvig, Artificial Intelligence, A Modern Approach (1995) has a ton of information about MDPs and the training of them.