Spam Detection Using an Artificial Immune System

Posted by ryuzaki0 on Monday July 10, 2006 @09:04AM from the lymp0cty3z-narf-poit!-claire-said-the-laundry-wheel dept.

rangeva writes "As anti-spam solutions evolve to limit junk email, the senders quickly adapt to make sure their messages are seen. An interesting article describes the application of an artificial immune system model to effectively protect email users from unwanted messages. In particular, it tests a spam immune system against the publicly available SpamAssassin corpus of spam and non-spam. It does so by classifying email messages with the detectors produced by the immune system. The resulting system classifies the messages with accuracy similar to that of other spam filters, but it does so with fewer detectors."

27 of 114 comments

The utility of newer systems by CRCulver · 2006-07-10 09:08 · Score: 3, Informative

I have to admit, I don't see the need for these recent whizbang additions to the spam-fighting repertoire. Sure, they might be ingenious, but on a practical level they don't do anything more than a properly-configured SpamAssassin system. I used to get a lot of spam coming through a default installation of SpamAssassin, but after spending some time with O'Reilly's book (the free docs may already be up to this level of reader-friendliness, it's been a couple of years) and tweaking my installation, I get spam once in a blue moon. There's just no need for anything more.
1. Re:The utility of newer systems by crotherm · 2006-07-10 09:36 · Score: 4, Insightful
  
  I have to admit, I don't see the need for these recent wizbang horseless carriages. Sure, they might be ingenious, but on a practical level, they don't do anything more than a fine team of horses. yada yada
  
  But seriously, your attitude is one that would stop all progress. This new method does the job more efficiently.
  
  From TFA, The lightweight nature of this solution -- requiring significantly smaller number of detectors when compared to SpamAssassin -- will doubtlessly prove attractive to those looking to implement a server-based solution where processing overhead may well be an issue. A server-based solution would be a one-size-fits-all mold since the filter is not personalized and does not learn for each particular user, but the reduced processing and storage time makes such a solution attractive.
  
  That sounds like a good reason for this research.
  
  --
  "Those who make peaceful revolution impossible, make violent revolution inevitable" - JFK
2. Re:The utility of newer systems by a_n_d_e_r_s · 2006-07-10 10:09 · Score: 2, Interesting
  
  Good spammers run their spam through SpamAssassin to make sure it gets a 0 score and gets through. Most sysadmins use the standard settings, and thus the spam gets through.
  
  Not very smart to send spam that gets caught by SpamAssassin.
  
  --
  Just saying it like it are.
Finally by nizo · 2006-07-10 09:09 · Score: 4, Funny

So now we can look forward to a spam filtering solution that actively searches for spammers and kills them?

--
I Am My Own Worst Enemy
The difference? by MoeMoe · 2006-07-10 09:14 · Score: 2, Insightful

Not that I'm arguing that it's the same, rather I'd like to know:

What separates this from a Bayesian filter?

--
Business \Busi"ness\, n.;
A scam in which all people involved perceive as beneficial...
1. Re:The difference? by DragonWriter · 2006-07-10 09:17 · Score: 4, Insightful
  
  What separates this from a Bayesian filter?
  If nothing else, it has new, improved buzzwords. "Artificial immune system" is so much more evocative than "Bayesian filter".
Great.... by (pvb)charon · 2006-07-10 09:17 · Score: 4, Funny

Ever heard of hay fever? Allergies? Think, people, think! charon
1. Re:Great.... by Dannon · 2006-07-10 09:51 · Score: 3, Funny
  
  Thanks, now I have the mental image of a spam filter with sinus problems. Ewwww...
  
  --
  Good judgment comes from experience.
  Experience comes from bad judgment.
Fancy by roman_mir · 2006-07-10 09:23 · Score: 4, Insightful

It looks fancy, but when you get down to it, all it means is that there are a number of heuristics that are combined into filters (this happens through user training). The filters are 'weighted', and filters that are not used often enough are 'culled' (killed off). I don't think this will be significantly better than any other Bayesian-type spam system.
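To make the weighting and culling concrete, here is a minimal sketch of that loop. Everything in it is an assumption for illustration: the `Detector` class, the regex patterns, and the `step`, `decay` and `min_weight` values are hypothetical and are not taken from the article.

```python
import re
from dataclasses import dataclass


@dataclass
class Detector:
    pattern: str         # regex the detector looks for (hypothetical examples below)
    weight: float = 1.0  # how much a match counts toward the spam score


def score(message, detectors):
    """Sum the weights of every detector that matches the message."""
    return sum(d.weight for d in detectors
               if re.search(d.pattern, message, re.IGNORECASE))


def train(message, is_spam, detectors, step=1.0):
    """User training: reward matches on spam, penalise matches on legitimate mail."""
    for d in detectors:
        if re.search(d.pattern, message, re.IGNORECASE):
            d.weight += step if is_spam else -step


def cull(detectors, decay=0.9, min_weight=0.5):
    """Age every detector and drop the ones whose weight has fallen too low."""
    survivors = []
    for d in detectors:
        d.weight *= decay
        if d.weight >= min_weight:
            survivors.append(d)
    return survivors


detectors = [Detector("viagra"), Detector("mortgage"), Detector("meeting")]
train("Cheap viagra here", True, detectors)
train("Team meeting at noon", False, detectors)
detectors = cull(detectors)
```

After the two training messages and one culling pass, the "meeting" detector is gone and the other two survive with reduced weights, which is the "fewer detectors" effect the summary describes.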

--
You can't handle the truth.
Not much by jfengel · 2006-07-10 09:28 · Score: 5, Informative

Ultimately, very little. At core, they're probably identical techniques, and if I were reviewing this as a scientific paper I'd ding them for not answering exactly that question. There are such strong parallels between the two (train them on known data, add up probabilities, cut stuff on a threshold) that I strongly suspect that they're identical.

There are useful things to be gained from a change of metaphor. For example, one difference between this and most Bayesian spam filter implementations is that this explicitly incorporates a decay function. That could be useful, if a word that used to be common in spam no longer is (e.g. if I actually decided to buy a Rolex, it's no longer a strong spam indicator, whereas right now any email mentioning "Rolex" is 99.9999% certain to be spam).

You could easily modify a Bayesian filter to have time-decaying weights, but if the change in metaphor leads somebody to come up with a good insight, then perhaps this is useful. Mathematically, though, the equations look very similar.
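As a rough illustration of what time-decaying weights could look like, here is a sketch of per-token counts that fade exponentially. It is only a per-token estimate, not a full Bayesian classifier, and the class name, the 30-day half-life and the use of days as the time unit are all assumptions made for the example.

```python
import math
from collections import defaultdict


class DecayingTokenStats:
    """Per-token spam/ham counts that fade exponentially with age."""

    def __init__(self, half_life_days=30.0):
        # half_life_days is an assumed tuning knob, not a value from the article.
        self.decay_rate = math.log(2) / half_life_days
        self.spam = defaultdict(float)
        self.ham = defaultdict(float)
        self.last_seen = {}

    def _age(self, token, now):
        # Shrink both counts by the time elapsed since the token was last touched.
        elapsed = now - self.last_seen.get(token, now)
        factor = math.exp(-self.decay_rate * elapsed)
        self.spam[token] *= factor
        self.ham[token] *= factor
        self.last_seen[token] = now

    def train(self, token, is_spam, now):
        self._age(token, now)
        if is_spam:
            self.spam[token] += 1.0
        else:
            self.ham[token] += 1.0

    def spam_probability(self, token, now):
        self._age(token, now)
        s, h = self.spam[token], self.ham[token]
        # Laplace smoothing: an unseen or fully decayed token scores 0.5.
        return (s + 1.0) / (s + h + 2.0)


stats = DecayingTokenStats()
for _ in range(10):
    stats.train("rolex", is_spam=True, now=0.0)   # day 0: ten spam sightings
fresh = stats.spam_probability("rolex", now=0.0)
stale = stats.spam_probability("rolex", now=300.0)  # ten half-lives later
```

With these assumed numbers, "rolex" starts out as a strong spam indicator and drifts back toward neutral once it stops appearing in spam, so a single legitimate Rolex email later on is enough to tip it toward ham.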
1. Re:Not much by adrianbaugh · 2006-07-10 09:43 · Score: 4, Interesting
  
  Perhaps a neat way to extend this idea would be to have the filter scan your outgoing mail, too; not to search for spam as such, but to look for changes in behaviour. Then, supposing you emailed sales@igottagetmearolex.com enquiring the price of a Rolex, the filter could modify the spam and ham probabilities of rolex. I suppose it would have to be clever enough to ignore emails sent to abuse@ addresses reporting spam and attaching the spam message, among other things I can't be bothered to think of now, but it's an idea that comes more readily from the immune system metaphor than the pure probability metaphor.
  
  --
  "'I pass the test,' she said. 'I will diminish, and go into the West, and remain Galadriel.'"
  - JRR Tolkien.
Real spam solution by Dryanta · 2006-07-10 09:30 · Score: 3, Interesting

Spam and content filtering will always be a struggle for anybody who actually utilizes email. Simply adding more logic will not solve the problem. Reporting spammers to every RBL you can think of, and alerting forums and newsgroups to abusive IP blocks, on the other hand, is already working quite nicely.
I gave up by Scratch-O-Matic · 2006-07-10 09:30 · Score: 4, Interesting

I recently gave up on tweaking filters for myself and a few dozen people whose accounts I administer. I wrote a little script that asks for confirmation from the sender...if the sender confirms, they are added to a whitelist and will go straight through after that. I can also add addresses manually to the whitelist, and will soon be able to have wildcard (domain-wide) approved addresses. I've gotten exactly two spam messages in 6 weeks...both were confirmed by either a person or an autoresponder. Five years ago I never would have wanted such a blunt system...nowadays it's just the ticket.
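The whitelist lookup with domain-wide wildcards might look something like the sketch below. It covers only the matching step, not the confirmation message; the function name and the example addresses are hypothetical, not the poster's actual script.

```python
from fnmatch import fnmatchcase


def is_whitelisted(sender, whitelist):
    """Return True if sender matches an exact address or a wildcard pattern."""
    address = sender.strip().lower()
    # Lower-case both sides so matching does not depend on how the address was typed.
    return any(fnmatchcase(address, pattern.lower()) for pattern in whitelist)


# Placeholder entries: one exact address and one domain-wide wildcard.
whitelist = ["alice@example.org", "*@example.com"]
```

Because the pattern has to match the whole address, `*@example.com` approves `bob@example.com` but not an address that merely contains that string.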

--

Evil is the money of root.
1. Re:I gave up by babaloo · 2006-07-10 10:09 · Score: 5, Interesting
  
  I understand your frustration, but I was the victim of a Joe Job attack, and systems like the one you describe just add to the pain of the victim. I feel that these types of responses are just as unwelcome as spam and I report them as such. Have you had any issues like this?
2. Re:I gave up by CFrankBernard · 2006-07-10 10:28 · Score: 4, Interesting
  
  I recommend joining the SPAM-L mailing list of 900+ email admins and ask for opinions on "challenge response" (C/R) spam fighting systems. Sending a confirmation message to the alleged/purported sending address *is* spam when it is spoofed/forged (quite common). The only way to ensure sending info back to the connecting email server is to do so /during/ the SMTP conversation.
3. Re:I gave up by rudedog · 2006-07-10 12:01 · Score: 4, Insightful
  
  So it appears that you decided that the responsibility for fighting your spam should be moved onto the backs of everybody else on the Internet? Spam almost always comes from a forged sender. By doing this, you're just sending tons of spam to the forgery victims. Please do us and yourself a favor and google "challenge response harmful", and then turn off your C/R system.
More of the same; not a solution by mrheckman · 2006-07-10 09:33 · Score: 3, Interesting

The "immune system" solution is just another way to detect spam, but it is unlikely to be much more successful than existing methods. As someone else pointed out, SpamAssassin is pretty good already. So what if this new type of filter eventually improves the spam filtering accuracy from 98% to 99%? A more highly-polished rock is still a rock.

The real problem is the sending of spam itself, and that problem arises from an inability to correctly attribute the spam to the spammers. If we can do that, we can block it, or at least better convict the spammers who violate the law. Things that solve this problem, like Yahoo!'s "DomainKeys", are the future of anti-spam, not more highly-polished rocks.
1. Re:More of the same; not a solution by Mean+Variance · 2006-07-10 09:56 · Score: 2, Interesting
  
  Things that solve this problem, like Yahoo!'s "DomainKeys", are the future of anti-spam, not more highly-polished rocks.
  Domain Keys, at least to this point, is utter crap in my experience. I get these small floods of spam into my Yahoo! mailbox. What most of them have in common is they are certified by Domain Keys. A couple months ago, I was getting the exact same spam every day for some mortgage coming from different addresses. All were DK certified.
  For what it's worth, I do send off those specific emails to the abuse alias at Yahoo! Their canned emails state that they have dealt with the problem according to their TOS.
  I don't know where the flaw lies, but it's there in Domain Keys.
Useless -- solves a non-problem (performance) by CurtMonash · 2006-07-10 09:37 · Score: 2, Interesting

I have two major objections to this idea, and to the article that presents it.

1. The ONLY problem this solves is performance -- i.e., processing throughput. And that's not what's wrong with anti-spam systems today. They live and die on the precision/accuracy tradeoff, or maybe on UI.

2. The authors seem to assume that Bayesian systems work really, really well. While technically most or all current spam-filtering products are Bayesian in some sense, that still speaks of considerable naivete about real-world spam.

--
To err is human. To forgive is good system design.
The easiest way to eliminate most spam ..... by travisco_nabisco · 2006-07-10 09:38 · Score: 2, Insightful

I just had a thought about spelling while reading about the spam filters. So I went and looked in my spam folder and found that every piece of spam has many, many words that are not in a dictionary, i.e., not spelled correctly.

Why not run a script that filters messages based on spelling? If there are more than 'xx' words that do not exist in the dictionary you choose to use, then the message gets sent to the spam folder. This would catch the odd e-mail from friends who don't know how to spell or what a spell checker is, but then when you clean out your spam folder you should notice it.
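A script along those lines could be as small as the sketch below. The tokenising rule, the function names and the `max_unknown` default (standing in for 'xx') are assumptions; a real word list would be loaded from whatever dictionary file you choose.

```python
import re

# Words may contain digits so that obfuscations like "ch3ap" stay in one piece.
WORD_RE = re.compile(r"[a-z0-9]+(?:'[a-z0-9]+)*")


def unknown_word_count(text, dictionary):
    """Count tokens that are not in the dictionary, ignoring pure numbers."""
    words = WORD_RE.findall(text.lower())
    return sum(1 for w in words if not w.isdigit() and w not in dictionary)


def looks_like_spam(text, dictionary, max_unknown=5):
    """Flag the message when it has more than max_unknown unrecognised words."""
    return unknown_word_count(text, dictionary) > max_unknown


# Tiny stand-in dictionary; in practice this would be a full word list.
dictionary = {"buy", "cheap", "watches", "now", "hello", "see", "you", "at", "lunch"}
```

Names, URLs and jargon would all count as misspellings here, so the threshold would need tuning against real mail before trusting it.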
1. Re:The easiest way to eliminate most spam ..... by dhasenan · 2006-07-10 11:15 · Score: 2, Insightful
  
  Do you actually WANT to interview a job applicant who can't spell 20 words in a 150-word email?
Modelling Nature by A+Dafa+Disciple · 2006-07-10 09:39 · Score: 3, Interesting

Your post advocates a

(x) technical ( ) legislative ( ) market-based ( ) vigilante

approach to fighting spam. Your idea will not work. Here is why it won't work. (One or more of the following may apply to your particular idea, and it may have other flaws which used to vary from state to state before a bad federal law was passed.)

( ) Spammers can easily use it to harvest email addresses
( ) Mailing lists and other legitimate email uses would be affected
( ) No one will be able to find the guy or collect the money
( ) It is defenseless against brute force attacks
( ) It will stop spam for two weeks and then we'll be stuck with it
(x) An enormous amount of spam will initially go undetected before your idea is effective
( ) Users of email will not put up with it
( ) Microsoft will not put up with it
( ) The police will not put up with it
(x) Your idea proposes a solution that only large corporations could deploy
( ) Requires too much cooperation from spammers
( ) Requires immediate total cooperation from everybody at once
( ) Many email users cannot afford to lose business or alienate potential employers
( ) Spammers don't care about invalid addresses in their lists
( ) Anyone could anonymously destroy anyone else's career or business

Specifically, your plan fails to account for

( ) Laws expressly prohibiting it
( ) Lack of centrally controlling authority for email
( ) Open relays in foreign countries
( ) Ease of searching tiny alphanumeric address space of all email addresses
( ) Asshats
( ) Jurisdictional problems
( ) Unpopularity of weird new taxes
( ) Public reluctance to accept weird new forms of money
( ) Huge existing software investment in SMTP
( ) Susceptibility of protocols other than SMTP to attack
( ) Willingness of users to install OS patches received by email
( ) Armies of worm riddled broadband-connected Windows boxes
( ) Eternal arms race involved in all filtering approaches
( ) Extreme profitability of spam
( ) Joe jobs and/or identity theft
( ) Technically illiterate politicians
( ) Extreme stupidity on the part of people who do business with spammers
( ) Dishonesty on the part of spammers themselves
( ) Bandwidth costs that are unaffected by client filtering
(x) The large amount of resources needed for implementation of your idea that small companies don't have
( ) Outlook

and the following philosophical objections may also apply:

( ) Ideas similar to yours are easy to come up with, yet none have ever been shown practical
( ) Any scheme based on opt-out is unacceptable
( ) SMTP headers should not be the subject of legislation
( ) Blacklists suck
( ) Whitelists suck
( ) We should be able to talk about Viagra without being censored
(x) Your solution is nothing more than a conceptual remanifestation of a solution that already exists
( ) Countermeasures should not involve wire fraud or credit card fraud
( ) Countermeasures should not involve sabotage of public networks
( ) Countermeasures must work if phased in gradually
( ) Sending email should be free
( ) Why should we have to trust you and your servers?
( ) Incompatibility with open source or open source licenses
( ) Feel-good measures do nothing to solve the problem
( ) Temporary/one-time email addresses are cumbersome
( ) I don't want the government reading my email
( ) Killing them that way is not slow and painful enough

Furthermore, this is what I think about you:

(x) I think it is a creative concept, but there is no need to reinvent the wheel.
( ) Sorry dude, but I don't think it would work.
( ) This is a stupid idea, and you're a stupid person for suggesting it.
( ) Nice try, assh0le! I'm going to find out where you live and burn your house down!

--

Falun Dafa is good!
SpamAssassin does "decay" them. by khasim · 2006-07-10 09:40 · Score: 2, Informative

Look up "bayes_expiry_max_db_size". If your database gets larger than the limit you set then the lesser used tokens are deleted.
Abysmal results by gvc · 2006-07-10 09:40 · Score: 4, Interesting

More specifically, it correctly classifies 84% of spam and 98% of non-spam.

The authors used the SpamAssassin corpus. Holden shows that, on the SpamAssassin corpus, Bogofilter correctly classifies 90.3% of spam and 99.88% of non-spam. See http://sam.holden.id.au/writings/spam2/

This approach is nowhere near state of the art.
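To see how far apart those figures are, it helps to flip them into error rates. The short sketch below uses only the four percentages quoted above; the function name is made up for the example.

```python
def error_rates(spam_caught_pct, ham_kept_pct):
    """Turn 'percent classified correctly' into the two kinds of error."""
    return {
        "missed_spam_pct": round(100.0 - spam_caught_pct, 2),
        "false_positive_pct": round(100.0 - ham_kept_pct, 2),
    }


# Figures as quoted in the comment above.
immune = error_rates(84.0, 98.0)
bogofilter = error_rates(90.3, 99.88)

for name, rates in (("immune system", immune), ("Bogofilter", bogofilter)):
    print(f"{name}: misses {rates['missed_spam_pct']}% of spam, "
          f"flags {rates['false_positive_pct']}% of legitimate mail")

ratio = immune["false_positive_pct"] / bogofilter["false_positive_pct"]
print(f"false-positive ratio: about {ratio:.1f}x")
```

The gap in missed spam (16% versus 9.7%) is noticeable, but the false-positive gap (2% versus 0.12% of legitimate mail) is the one that hurts, since those are real messages landing in the spam folder.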
no more biological metaphors.... by illuminatedwax · 2006-07-10 09:53 · Score: 5, Insightful

I'm seriously sick of people abusing biological methodologies. People seem very attracted to ideas simply because they are grounded in "how nature works" and ignore the mathematical benefits or weaknesses. Now this idea pretty much just sounds like statistical rules based on a corpus - pretty much how every successful solution out there now works. This solution simply prunes rules that aren't being used, but there are better ways to get a smaller spam detection database. Have you seen the stuff the CRM114 people are doing? This is nothing new.

Read your Russell and Norvig, people. Airplane research didn't get off the ground (ugh) until we stopped trying to mimic birds and started studying the physical principles of flight.

--
Did you ever notice that *nix doesn't even cover Linux?
Re:False positives still a problem by CRCulver · 2006-07-10 10:03 · Score: 2, Interesting

And Spammers will adapt to this technology as well, reducing its effectiveness.

One wonders what sort of people have so little moral fiber that they study spam-blockers and create new methods for getting around them. Really, it would be great if Slashdot could profile one of these twisted people and show just who does it, what country they are from, what kind of upbringing they had, etc. But maybe anyone is susceptible to the temptation. Recently, while making a comment on a blog, I was thinking about just how easy it would be to automatically circumvent its arithmetic-based anti-spam question ("What is 7 + 9?"). It was like being called over to the dark side.
Immune System Attacking Spammers by cyberscan · 2006-07-10 13:46 · Score: 3, Interesting

Here is a better idea: Blue Security was attacked and shut down because the Internet is septic. The germs (spammers) have taken over. The best way to win this is to take the profit out of spamming. This can be done in much the same way that the body's T cells alert the rest of the immune system on how to attack a pathogen. A cryptographically signed spammer complaint (attack) file should be distributed via a peer-to-peer network protocol. This file is sent amongst complaining programs that complain to a spammer's website each time a spam advertising said website is received.

Like an immune system, this network of spam attack programs will have T cells. The "T cells" will be a small group of people who draw up the complaint instruction file. Whenever the pathogen (spammer) releases enough toxins (spam) into the body (Internet), the T cells (people who write the complaint instruction file) alert the immune cells (spam complaint programs) to the presence of the pathogen and how to attack it (complain to the website advertised). The pathogen is overwhelmed by a quick immune response (high bandwidth usage resulting from many, many complaints).

When the cost of running a website surpasses the revenue earned from said website, the website is shut down. When the costs of spamming or advertising via spam exceed the income, spam stops. Blue Security was beginning to become successful. Too bad they bowed out.