Spam Detection Using an Artificial Immune System

Posted by ryuzaki0 on Monday July 10, 2006 @09:04AM from the lymp0cty3z-narf-poit!-claire-said-the-laundry-wheel dept.

rangeva writes "As anti-spam solutions evolve to limit junk email, the senders quickly adapt to make sure their messages are seen. An interesting article describes the application of an artificial immune system model to effectively protect email users from unwanted messages. In particular, it tests a spam immune system against the publicly available SpamAssassin corpus of spam and non-spam. It does so by classifying email messages with the detectors produced by the immune system. The resulting system classifies the messages with accuracy similar to that of other spam filters, but it does so with fewer detectors."

13 of 114 comments

Finally by nizo · 2006-07-10 09:09 · Score: 4, Funny

So now we can look forward to a spam filtering solution that actively searches for spammers and kills them?

--
I Am My Own Worst Enemy
Great.... by (pvb)charon · 2006-07-10 09:17 · Score: 4, Funny

Ever heard of hay fever? Allergies? Think, people, think! charon
Re:The difference? by DragonWriter · 2006-07-10 09:17 · Score: 4, Insightful

What separates this from a Bayesian filter?
If nothing else, it has new, improved buzzwords. "Artificial immune system" is so much more evocative than "Bayesian filter".
Fancy by roman_mir · 2006-07-10 09:23 · Score: 4, Insightful

It looks fancy but when you get down to it, all it means is that there are a number of heuristics that are combined into filters (this happens by user training.) The filters are 'weighted' and filters that are not used often enough are 'culled' (killed off.) I don't think this will be significantly better than any other Bayesian-type spam systems.
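
To make the weighting and culling described above concrete, here is a minimal sketch. It is not the article's algorithm: the `Detector` class, the regex patterns, the `step` size and the `max_idle` cutoff are all hypothetical choices made for illustration.

```python
import re
from dataclasses import dataclass


@dataclass
class Detector:
    pattern: str          # hypothetical: a regex standing in for one heuristic
    weight: float = 1.0
    idle_rounds: int = 0  # training messages seen since this detector last matched


def score(message, detectors):
    """Sum the weights of every detector whose pattern matches the message."""
    return sum(d.weight for d in detectors
               if re.search(d.pattern, message, re.IGNORECASE))


def train(message, is_spam, detectors, step=1.0):
    """Reward detectors that fire on spam, penalise those that fire on non-spam."""
    for d in detectors:
        if re.search(d.pattern, message, re.IGNORECASE):
            d.weight += step if is_spam else -step
            d.idle_rounds = 0
        else:
            d.idle_rounds += 1


def cull(detectors, max_idle=3):
    """Drop detectors that have gone unused too long or have lost all weight."""
    return [d for d in detectors if d.idle_rounds < max_idle and d.weight > 0]
```

A message would then be flagged when `score(message, detectors)` exceeds some threshold, and `cull` would be run periodically so the detector set stays small.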

--
You can't handle the truth.
Not much by jfengel · 2006-07-10 09:28 · Score: 5, Informative

Ultimately, very little. At core, they're probably identical techniques, and if I were reviewing this as a scientific paper I'd ding them for not answering exactly that question. There are such strong parallels between the two (train them on known data, add up probabilities, cut stuff on a threshold) that I strongly suspect that they're identical.

There are useful things to be gained from a change of metaphor. For example, one difference between this and most Bayesian spam filter implementations is that this explicitly incorporates a decay function. That could be useful, if a word that used to be common in spam no longer is (e.g. if I actually decided to buy a Rolex, it's no longer a strong spam indicator, whereas right now any email mentioning "Rolex" is 99.9999% certain to be spam).

You could easily modify a Bayesian filter to have time-decaying weights, but if the change in metaphor leads somebody to come up with a good insight, then perhaps this is useful. Mathematically, though, the equations look very similar.
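
As a rough sketch of what time-decaying weights could look like in a Bayesian-style filter: the class below keeps per-token spam and non-spam counts and shrinks them exponentially with age. The class name, the 30-day half-life and the add-one smoothing are assumptions made for illustration, not anything taken from the article, and timestamps are assumed to be non-decreasing day numbers.

```python
import math
from collections import defaultdict


class DecayingTokenStats:
    """Per-token spam/ham counts that fade exponentially with age."""

    def __init__(self, half_life_days=30.0):
        # half_life_days is an illustrative value, not one from the article.
        self.decay_rate = math.log(2) / half_life_days
        self.counts = defaultdict(lambda: {"spam": 0.0, "ham": 0.0, "t": 0.0})

    def _age(self, entry, now):
        # Shrink both counts by exp(-rate * days elapsed since the last update).
        factor = math.exp(-self.decay_rate * (now - entry["t"]))
        entry["spam"] *= factor
        entry["ham"] *= factor
        entry["t"] = now

    def train(self, token, is_spam, now):
        entry = self.counts[token]
        self._age(entry, now)
        entry["spam" if is_spam else "ham"] += 1.0

    def spam_probability(self, token, now):
        entry = self.counts[token]
        self._age(entry, now)
        # Add-one smoothing: unseen or fully decayed tokens drift toward 0.5.
        return (entry["spam"] + 1.0) / (entry["spam"] + entry["ham"] + 2.0)
```

With this scheme a token such as "rolex" that was seen only in old spam slides back toward a neutral score over time, and a single recent legitimate message is enough to tip it the other way.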
1. Re:Not much by adrianbaugh · 2006-07-10 09:43 · Score: 4, Interesting
  
  Perhaps a neat way to extend this idea would be to have the filter scan your outgoing mail, too; not to search for spam as such, but to look for changes in behaviour. Then, supposing you emailed sales@igottagetmearolex.com enquiring the price of a Rolex, the filter could modify the spam and ham probabilities of rolex. I suppose it would have to be clever enough to ignore emails sent to abuse@ addresses reporting spam and attaching the spam message, among other things I can't be bothered to think of now, but it's an idea that comes more readily from the immune system metaphor than the pure probability metaphor.
  
  --
  "'I pass the test,' she said. 'I will diminish, and go into the West, and remain Galadriel.'"
  - JRR Tolkien.
I gave up by Scratch-O-Matic · 2006-07-10 09:30 · Score: 4, Interesting

I recently gave up on tweaking filters for myself and a few dozen people whose accounts I administer. I wrote a little script that asks for confirmation from the sender...if the sender confirms, they are added to a whitelist and will go straight through after that. I can also add addresses manually to the whitelist, and will soon be able to have wildcard (domain-wide) approved addresses. I've gotten exactly two spam in 6 weeks...both were confirmed by either a person or an autoresponder. Five years ago I never would have wanted such a blunt system...nowadays it's just the ticket.

--

Evil is the money of root.
1. Re:I gave up by babaloo · 2006-07-10 10:09 · Score: 5, Interesting
  
  I understand your frustration but I was the victim of a Joe Job attack and systems like you describe just add to the pain of the victim. I feel that these types of responses are just as unwelcome as spam and I report them as such. Have you had any issues like this?
2. Re:I gave up by CFrankBernard · 2006-07-10 10:28 · Score: 4, Interesting
  
  I recommend joining the SPAM-L mailing list of 900+ email admins and ask for opinions on "challenge response" (C/R) spam fighting systems. Sending a confirmation message to the alleged/purported sending address *is* spam when it is spoofed/forged (quite common). The only way to ensure sending info back to the connecting email server is to do so /during/ the SMTP conversation.
3. Re:I gave up by rudedog · 2006-07-10 12:01 · Score: 4, Insightful
  
  So it appears that you decided that the responsibility for fighting your spam should be moved onto the backs of everybody else on the Internet? Spam almost always comes from a forged sender. By doing this, you're just sending tons of spam to the forgery victims. Please do us and you a favor and google "challenge response harmful", and then turn off your C/R system.
Re:The utility of newer systems by crotherm · 2006-07-10 09:36 · Score: 4, Insightful

I have to admit, I don't see the need for these recent whiz-bang horseless carriages. Sure, they might be ingenious, but on a practical level, they don't do anything more than a fine team of horses. yada yada

But seriously, your attitude is one that would stop all progress. This new method does the job more efficiently.

From TFA, The lightweight nature of this solution -- requiring significantly smaller number of detectors when compared to SpamAssassin -- will doubtlessly prove attractive to those looking to implement a server-based solution where processing overhead may well be an issue. A server-based solution would be a one-size-fits-all mold since the filter is not personalized and does not learn for each particular user, but the reduced processing and storage time makes such a solution attractive.

That sounds like a good reason for this research.

--
"Those who make peaceful revolution impossible, make violent revolution inevitable" - JFK
Abysmal results by gvc · 2006-07-10 09:40 · Score: 4, Interesting

More specifically, it correctly classifies 84% of spam and 98% of non-spam.

The authors used the SpamAssassin corpus. Holden shows that, on the SpamAssassin corpus, Bogofilter correctly classifies 90.3% of spam and 99.88% of non-spam. See http://sam.holden.id.au/writings/spam2/

This approach is nowhere near state of the art.
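
The gap is easier to see as error rates than as accuracy figures. The short calculation below uses only the four percentages quoted above; the function and variable names are made up for illustration.

```python
def error_rates(spam_caught_pct, ham_passed_pct):
    """Convert per-class accuracy into (missed spam %, false positive %)."""
    return round(100 - spam_caught_pct, 2), round(100 - ham_passed_pct, 2)


# Figures quoted in this comment: immune-system filter vs. Bogofilter (per Holden).
ais_missed, ais_false_pos = error_rates(84, 98)
bogo_missed, bogo_false_pos = error_rates(90.3, 99.88)

# How many times worse the immune-system filter is on each kind of error.
missed_ratio = ais_missed / bogo_missed
false_pos_ratio = ais_false_pos / bogo_false_pos

print(f"missed spam: {ais_missed}% vs {bogo_missed}% ({missed_ratio:.1f}x)")
print(f"false positives: {ais_false_pos}% vs {bogo_false_pos}% ({false_pos_ratio:.1f}x)")
```

On these numbers the immune-system filter lets through roughly 1.6 times as much spam and misclassifies legitimate mail roughly 17 times as often.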
no more biological metaphors.... by illuminatedwax · 2006-07-10 09:53 · Score: 5, Insightful

I'm seriously sick of people abusing biological methodologies. People seem very attracted to ideas simply because they are grounded in "how nature works" and ignore the mathematical benefits or weaknesses. Now this idea pretty much just sounds like statistical rules based on a corpus - pretty much how every successful solution out there now works. This solution simply prunes rules that aren't being used, but there are better ways to get a smaller spam detection database. Have you seen the stuff the CRM114 people are doing? This is nothing new.

Read your Russell and Norvig, people. Airplane research didn't get off the ground (ugh) until we stopped trying to mimic birds and started studying the physical principles of flight.

--
Did you ever notice that *nix doesn't even cover Linux?