Slashdot Mirror


Seeking Prior Art on Markov-Based SPAM Filters?

Theovon asks: "One of today's hot topics seems to be SPAM filtering. I have wanted very much to make my own contribution to this, but I have been thwarted by a patent. Probably before Paul Graham began working on his Bayesian SPAM filter, I began work on a Markov-model based filter. Things were going well until I posted to Usenet about it, and got this Google Groups response. This Usenet post describes a Mitsubishi patent issued in 2000, US Patent #6,112,021. One of the key aspects of my design was that I would train the Markov model with both positive and negative examples. This patent covers exactly what I'm doing, because it deals specifically with the idea of using negative examples in Markov models to filter, among other things, 'inappropriate web content.' Well, the patent looks like a good one, assuming they really developed this idea. I mean, of course, I would think it's a good idea; I came up with it too, and it works very well. (Then again, I also thought it was 'obvious')..."

"Not too long ago, I was discussing it with a co-worker who has a degree in Electrical Engineering. She had taken an AI class in college which mentioned the idea of negative examples for Markov models, and this was well before the year 2000. The bottom line is that I think I have a great idea that could potentially add to our collective arsenal against the ever-growing SPAM problem. I would very much like to work on it and publish it under GPL, but before I can do this, I have to protect my self against the patent and the large pocketbook of Mitsubishi.

I'm not asking for legal advice. I have already consulted an attorney, and it was suggested that I should remove the SourceForge project, which I have done. I also attempted to contact the EFF (no luck so far). I'm asking for those of you out there who are familiar with this sort of thing to help me to find verifiable prior art that dates from before the 2000 patent. I would very much like to share with the world my ideas and the code I have written, but this is standing in our way."
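
For readers unfamiliar with the general idea, here is a minimal sketch of what "training a Markov model with both positive and negative examples" could look like. It is not the submitter's code (that project was taken down) and it is not the method claimed in the patent; it is just one common textbook construction, assumed here for illustration: one first-order word chain is trained on spam, another on legitimate mail, and a message is scored by the difference of its log-likelihoods under the two. The names MarkovChain and TwoModelFilter, the "<s>" start token, and the add-one smoothing are all hypothetical choices.

```python
import math
from collections import defaultdict


class MarkovChain:
    """First-order word-level Markov chain with add-one smoothing (illustrative)."""

    def __init__(self):
        # transitions[prev][cur] = number of times cur followed prev
        self.transitions = defaultdict(lambda: defaultdict(int))
        # totals[prev] = number of transitions observed out of prev
        self.totals = defaultdict(int)

    def train(self, text):
        words = ["<s>"] + text.lower().split()  # "<s>" marks message start
        for prev, cur in zip(words, words[1:]):
            self.transitions[prev][cur] += 1
            self.totals[prev] += 1

    def log_prob(self, text, vocab_size):
        """Smoothed log-probability of the word sequence under this chain."""
        words = ["<s>"] + text.lower().split()
        total = 0.0
        for prev, cur in zip(words, words[1:]):
            count = self.transitions.get(prev, {}).get(cur, 0)
            seen = self.totals.get(prev, 0)
            total += math.log((count + 1) / (seen + vocab_size))
        return total


class TwoModelFilter:
    """Scores text by comparing a spam chain against a non-spam chain."""

    def __init__(self):
        self.spam = MarkovChain()  # trained on positive (spam) examples
        self.ham = MarkovChain()   # trained on negative (non-spam) examples
        self.vocab = set()

    def train(self, text, is_spam):
        self.vocab.update(text.lower().split())
        (self.spam if is_spam else self.ham).train(text)

    def score(self, text):
        # Positive score: the text looks more like the spam examples.
        # Negative score: it looks more like the non-spam examples.
        v = len(self.vocab) + 1  # +1 leaves room for unseen words
        return self.spam.log_prob(text, v) - self.ham.log_prob(text, v)


f = TwoModelFilter()
f.train("buy cheap pills now", is_spam=True)
f.train("cheap pills online buy now", is_spam=True)
f.train("meeting notes attached for review", is_spam=False)
f.train("see you at the meeting tomorrow", is_spam=False)
print(f.score("buy cheap pills") > 0)         # leans spam
print(f.score("meeting notes tomorrow") < 0)  # leans non-spam
```

A real filter would need far more training data, a tuned decision threshold rather than zero, and probably longer contexts than a single preceding word.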

2 of 36 comments

  1. Andrei by Trusty+Penfold · Score: 3, Interesting


    Andrei Markov himself used his models to filter text. Not e-mails obviously, but poetry.

    He would both write and solicit poetry, and he devised what are now known as Markov models to assess poetry for aesthetic quality. Sounds like a silly idea now, but this was in the Victorian era when science and maths were blossoming. It was believed that everything could be measured and quantified; even philosophical qualities such as 'goodness' of poetry.

    So,

    step 1 : first search for rhyme and meter.
    step 2 : then search for spamlike characteristics.

  2. Re:Markov Models, Negative Reward, Negative Exampl by crisco · Score: 3, Interesting
    One Bayesian email filter bills itself not just as a spam filter but as a general email classification system. You can easily define many categories and train it accordingly.

    I don't know enough about MDPs (Markov decision processes) to know whether this would work, but it may move the scope of the solution far enough away from the patent to allow the submitter to continue his work.

    --

    Bleh!