Hm, I've been thinking along those lines myself.
I wouldn't expect the result to be really useful, but a GA might find some interesting trends.
Problem is, we're talking semantics, which is a real pain to handle programmatically.
The same sentence can be expressed (and misspelled) in a plethora of different ways.
The GA would probably have to rely on some predefined framework for handling sentences
(any ideas from you GA hackers?).
Some obvious criteria for spotting spam are, of course:
* messages with ALL CAPITAL SUBJECTS
* messages stating "This message is not spam"
* messages containing "earn $"
* etc
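
To make the criteria above concrete, here is a rough Python sketch of how they might be turned into yes/no checks. The function names and the exact matching rules (case-insensitive substring, optional whitespace before the "$") are my own assumptions, not anything settled in this thread, and they make no attempt to handle misspellings or rephrasings.

```python
import re


def subject_is_all_caps(subject: str) -> bool:
    """True if the subject has letters and every one of them is upper case."""
    letters = [c for c in subject if c.isalpha()]
    return bool(letters) and all(c.isupper() for c in letters)


def claims_not_spam(body: str) -> bool:
    """True if the body literally states that it is not spam."""
    return "this message is not spam" in body.lower()


def mentions_earning_money(body: str) -> bool:
    """True if the body contains 'earn $' (any case, optional spaces)."""
    return re.search(r"earn\s*\$", body, re.IGNORECASE) is not None


def spam_features(subject: str, body: str) -> tuple:
    """Bundle the three checks into one feature tuple for a message."""
    return (
        subject_is_all_caps(subject),
        claims_not_spam(body),
        mentions_earning_money(body),
    )
```

A tuple like this is the kind of input a GA could work with: it never sees the raw text, only which criteria fired.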
There are other criteria, such as a message having no sender or containing forged headers,
but for starters the GA could concentrate on the subject and the body text.
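
As a starting point for discussion, here is a minimal sketch of what such a GA might look like if each message is first reduced to a few 0/1 features taken from the subject and body. Everything in it is an assumption for illustration: the sample data is made up, and the weighted-sum classifier, the fixed threshold, and the selection and mutation settings are arbitrary choices, not a tested design.

```python
import random

# Made-up toy data: (feature vector, is_spam). Each feature is 1 if some
# subject/body criterion fired for that message, else 0.
SAMPLES = [
    ((1, 1, 1), True),
    ((1, 0, 1), True),
    ((0, 1, 0), True),
    ((1, 0, 0), False),
    ((0, 0, 0), False),
    ((0, 0, 0), False),
]


def classify(weights, features, threshold=1.0):
    """Flag a message as spam if its weighted feature sum reaches the threshold."""
    return sum(w * f for w, f in zip(weights, features)) >= threshold


def fitness(weights):
    """Fraction of the toy samples that these weights classify correctly."""
    correct = sum(classify(weights, f) == label for f, label in SAMPLES)
    return correct / len(SAMPLES)


def evolve(generations=30, pop_size=20, seed=0):
    """Evolve a weight vector; the better half of each generation survives."""
    rng = random.Random(seed)
    n = len(SAMPLES[0][0])
    population = [[rng.uniform(0, 2) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        survivors = population[: pop_size // 2]
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            cut = rng.randrange(1, n)            # one-point crossover
            child = a[:cut] + b[cut:]
            i = rng.randrange(n)
            child[i] += rng.gauss(0, 0.3)        # mutate a single weight
            children.append(child)
        population = survivors + children
    return max(population, key=fitness)


if __name__ == "__main__":
    best = evolve()
    print(best, fitness(best))
```

The evolved weights would at least show which criteria the GA ends up leaning on, which is about the level of "interesting trends" one could hope for from something this simple.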