Slashdot Mirror


Spam Detection Using an Artificial Immune System

rangeva writes "As anti-spam solutions evolve to limit junk email, senders quickly adapt to make sure their messages are seen. An interesting article describes the application of an artificial immune system model to effectively protect email users from unwanted messages. In particular, it tests a spam immune system against the publicly available SpamAssassin corpus of spam and non-spam, classifying email messages with the detectors produced by the immune system. The resulting system classifies messages with accuracy similar to that of other spam filters, but it does so with fewer detectors."
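The summary doesn't say how the detectors are represented or scored. As a rough sketch only, assume each detector is a regular expression that remembers how many training spam and non-spam messages it matched; a message's score is the sum of the weights of the detectors it matches, and it is called spam above a threshold. The names `Detector`, `score`, and `is_spam`, the weighting formula, and the 0.5 threshold are all hypothetical, not the article's method.

```python
import re
from dataclasses import dataclass


@dataclass
class Detector:
    """Hypothetical detector: a regex plus the training matches it recorded."""
    pattern: str
    spam_matches: int = 0  # training spam messages this detector matched
    ham_matches: int = 0   # training non-spam messages it matched

    def matches(self, text: str) -> bool:
        return re.search(self.pattern, text, re.IGNORECASE) is not None

    def weight(self) -> float:
        # Assumed weighting: +1.0 for a detector that only ever matched spam,
        # -1.0 for one that only matched non-spam, 0.0 if it never matched.
        total = self.spam_matches + self.ham_matches
        if total == 0:
            return 0.0
        return (self.spam_matches - self.ham_matches) / total


def score(detectors: list[Detector], text: str) -> float:
    """Sum the weights of every detector that matches the message."""
    return sum(d.weight() for d in detectors if d.matches(text))


def is_spam(detectors: list[Detector], text: str, threshold: float = 0.5) -> bool:
    return score(detectors, text) > threshold


detectors = [
    Detector(r"\brolex\b", spam_matches=95, ham_matches=5),
    Detector(r"\bviagra\b", spam_matches=50, ham_matches=0),
    Detector(r"\bmeeting\b", spam_matches=2, ham_matches=48),
]
print(is_spam(detectors, "Cheap Rolex watches and cheap Viagra"))  # True
print(is_spam(detectors, "Agenda for Tuesday's meeting"))          # False
```

Fewer detectors means fewer patterns to run against each message, which is the efficiency argument the summary makes.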

19 of 114 comments

  1. The utility of newer systems by CRCulver · · Score: 3, Informative

    I have to admit, I don't see the need for these recent whizbang additions to the spam-fighting repertoire. Sure, they might be ingenious, but on a practical level they don't do anything more than a properly configured SpamAssassin system. I used to get a lot of spam coming through a default installation of SpamAssassin, but after spending some time with O'Reilly's book (the free docs may already be up to this level of reader-friendliness; it's been a couple of years) and tweaking my installation, I get spam once in a blue moon. There's just no need for anything more.

    1. Re:The utility of newer systems by crotherm · · Score: 4, Insightful

      I have to admit, I don't see the need for these recent whizbang horseless carriages. Sure, they might be ingenious, but on a practical level, they don't do anything more than a fine team of horses. Yada yada.

      But seriously, your attitude is one that would stop all progress. This new method does the job more efficiently.

      From TFA: "The lightweight nature of this solution -- requiring a significantly smaller number of detectors when compared to SpamAssassin -- will doubtlessly prove attractive to those looking to implement a server-based solution where processing overhead may well be an issue. A server-based solution would be a one-size-fits-all mold since the filter is not personalized and does not learn for each particular user, but the reduced processing and storage time makes such a solution attractive."


      That sounds like a good reason for this research.

      --
      "Those who make peaceful revolution impossible, make violent revolution inevitable" - JFK
  2. Finally by nizo · · Score: 4, Funny

    So now we can look forward to a spam filtering solution that actively searches for spammers and kills them?

  3. Great.... by (pvb)charon · · Score: 4, Funny

    Ever heard of hay fever? Allergies? Think, people, think!

    --
    charon

    1. Re:Great.... by Dannon · · Score: 3, Funny

      Thanks, now I have the mental image of a spam filter with sinus problems. Ewwww...

      --
      Good judgment comes from experience.
      Experience comes from bad judgment.
  4. Re:The difference? by DragonWriter · · Score: 4, Insightful
    What separates this from a Bayesian filter?
    If nothing else, it has new, improved buzzwords. "Artificial immune system" is so much more evocative than "Bayesian filter".
  5. Fancy by roman_mir · · Score: 4, Insightful

    It looks fancy, but when you get down to it, all it means is that a number of heuristics are combined into filters (this happens through user training). The filters are 'weighted', and filters that are not used often enough are 'culled' (killed off). I don't think this will be significantly better than any other Bayesian-type spam system.
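    The weight-then-cull cycle described here might look something like the following minimal sketch. It assumes one detector per token, nudges a matching detector's weight toward +1 (spam) or -1 (non-spam) each time the user trains on a message, and culls any detector that has gone unmatched for more than `max_idle` messages. `DetectorPool`, `rate`, and `max_idle` are hypothetical names and parameters, not taken from the article.

```python
from dataclasses import dataclass


@dataclass
class Detector:
    token: str
    weight: float = 0.0  # drifts toward +1.0 (spam) or -1.0 (non-spam)
    last_used: int = 0   # training step at which this detector last matched


class DetectorPool:
    """Hypothetical pool of token detectors that are weighted, then culled."""

    def __init__(self, rate: float = 0.1, max_idle: int = 1000):
        self.rate = rate          # how far one training message moves a weight
        self.max_idle = max_idle  # unmatched steps tolerated before culling
        self.clock = 0            # number of messages trained on so far
        self.detectors: dict[str, Detector] = {}

    def train(self, tokens: set[str], is_spam: bool) -> None:
        self.clock += 1
        target = 1.0 if is_spam else -1.0
        for tok in tokens:
            d = self.detectors.setdefault(tok, Detector(tok))
            d.weight += self.rate * (target - d.weight)
            d.last_used = self.clock
        self.cull()

    def cull(self) -> None:
        # Kill off detectors that have not matched anything recently.
        self.detectors = {
            tok: d
            for tok, d in self.detectors.items()
            if self.clock - d.last_used <= self.max_idle
        }


pool = DetectorPool(max_idle=2)
pool.train({"rolex", "cheap"}, is_spam=True)
pool.train({"meeting"}, is_spam=False)
pool.train({"lunch"}, is_spam=False)
pool.train({"agenda"}, is_spam=False)
print(sorted(pool.detectors))  # ['agenda', 'lunch', 'meeting']
```

    With `max_idle=2`, "rolex" and "cheap" are dropped at the fourth message because nothing has matched them for three steps. None of this bookkeeping depends on the immune-system metaphor; the same pruning could be applied to a Bayesian token database.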

  6. Not much by jfengel · · Score: 5, Informative

    Ultimately, very little. At their core, they're probably identical techniques, and if I were reviewing this as a scientific paper I'd ding them for not answering exactly that question. There are such strong parallels between the two (train them on known data, add up probabilities, cut stuff on a threshold) that I strongly suspect they're identical.

    There are useful things to be gained from a change of metaphor. For example, one difference between this and most Bayesian spam filter implementations is that this explicitly incorporates a decay function. That could be useful if a word that used to be common in spam no longer is (e.g., if I actually decided to buy a Rolex, "Rolex" would no longer be a strong spam indicator, whereas right now any email mentioning "Rolex" is 99.9999% certain to be spam).

    You could easily modify a Bayesian filter to have time-decaying weights, but if the change in metaphor leads somebody to come up with a good insight, then perhaps this is useful. Mathematically, though, the equations look very similar.
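    As a rough illustration of that modification, here is a minimal naive Bayes token filter (over the set of tokens present in a message, with add-one smoothing) whose spam and non-spam counts are all multiplied by a decay factor before each training message, so old evidence fades. The class name `DecayingBayes`, the helper `history`, and the `decay` values are assumptions for illustration; real filters differ in how they tokenize and combine probabilities.

```python
import math
from collections import defaultdict


class DecayingBayes:
    """Naive Bayes over present tokens, with exponentially fading counts."""

    def __init__(self, decay: float = 0.99):
        self.decay = decay  # 1.0 means no decay (an ordinary Bayesian filter)
        self.counts = {True: defaultdict(float), False: defaultdict(float)}
        self.totals = {True: 0.0, False: 0.0}  # messages seen per class

    def train(self, tokens: set[str], is_spam: bool) -> None:
        # Age all existing evidence so stale spam words lose influence.
        for label in (True, False):
            table = self.counts[label]
            for tok in table:
                table[tok] *= self.decay
            self.totals[label] *= self.decay
        for tok in tokens:
            self.counts[is_spam][tok] += 1.0
        self.totals[is_spam] += 1.0

    def spam_probability(self, tokens: set[str]) -> float:
        # Prior log-odds, then one add-one-smoothed likelihood ratio per token.
        log_odds = math.log((self.totals[True] + 1) / (self.totals[False] + 1))
        for tok in tokens:
            p_spam = (self.counts[True].get(tok, 0.0) + 1) / (self.totals[True] + 2)
            p_ham = (self.counts[False].get(tok, 0.0) + 1) / (self.totals[False] + 2)
            log_odds += math.log(p_spam / p_ham)
        return 1 / (1 + math.exp(-log_odds))


def history(decay: float) -> float:
    """Lots of Rolex spam, then the user buys one and gets legit Rolex mail."""
    f = DecayingBayes(decay)
    for _ in range(20):
        f.train({"rolex", "cheap"}, is_spam=True)
    for _ in range(20):
        f.train({"rolex", "receipt"}, is_spam=False)
    return f.spam_probability({"rolex"})


print(round(history(1.0), 2))  # 0.5: old spam evidence still counts in full
print(history(0.9) < 0.5)      # True: "rolex" now leans toward non-spam
```

    Multiplying every count on each update is O(vocabulary) per message; that is fine for a sketch, though a real filter would more likely store timestamps and apply the decay lazily.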

    1. Re:Not much by adrianbaugh · · Score: 4, Interesting

      Perhaps a neat way to extend this idea would be to have the filter scan your outgoing mail, too: not to search for spam as such, but to look for changes in behaviour. Then, supposing you emailed sales@igottagetmearolex.com enquiring about the price of a Rolex, the filter could adjust the spam and ham probabilities of "rolex". I suppose it would have to be clever enough to ignore emails sent to abuse@ addresses reporting spam and attaching the spam message, among other things I can't be bothered to think of now. Still, it's an idea that comes more readily from the immune system metaphor than from the pure probability metaphor.
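      A minimal sketch of the outgoing-mail idea, assuming the filter keeps a simple per-token count of legitimate ("ham") evidence: words the user writes count as ham evidence, except in mail to abuse@ addresses, which typically quotes the spam being reported. The function name `learn_from_outgoing` and the single-entry skip list are hypothetical; as the comment says, a real version would need more exclusions than this.

```python
from collections import Counter

# Local parts whose mail usually quotes spam rather than the user's own
# interests. Only "abuse" comes from the comment; extend as needed.
SKIP_LOCAL_PARTS = {"abuse"}


def learn_from_outgoing(recipient: str, tokens: set[str], ham_counts: Counter) -> bool:
    """Count the words of an outgoing message as ham evidence.

    Returns True if the message was learned from, False if it was skipped.
    """
    local_part = recipient.split("@", 1)[0].strip().lower()
    if local_part in SKIP_LOCAL_PARTS:
        return False  # a spam report: don't let the quoted spam look legitimate
    ham_counts.update(tokens)
    return True


ham = Counter()
learn_from_outgoing("sales@igottagetmearolex.com", {"price", "rolex"}, ham)
learn_from_outgoing("abuse@example.net", {"rolex", "viagra"}, ham)
print(ham["rolex"], ham["viagra"])  # 1 0
```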

      --
      "'I pass the test,' she said. 'I will diminish, and go into the West, and remain Galadriel.'"
      - JRR Tolkien.
  7. Real spam solution by Dryanta · · Score: 3, Interesting

    Spam and content filtering will always be a struggle for anybody who actually uses email. Simply adding more logic will not solve the problem. Reporting spammers to every RBL you can think of, and alerting forums and newsgroups to abusive IP blocks, on the other hand, is already working quite nicely.

  8. I gave up by Scratch-O-Matic · · Score: 4, Interesting

    I recently gave up on tweaking filters for myself and a few dozen people whose accounts I administer. I wrote a little script that asks for confirmation from the sender...if the sender confirms, they are added to a whitelist and will go straight through after that. I can also add addresses manually to the whitelist, and will soon be able to have wildcard (domain-wide) approved addresses. I've gotten exactly two spam in 6 weeks...both were confirmed by either a person or an autoresponder. Five years ago I never would have wanted such a blunt system...nowadays it's just the ticket.
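    The whitelist half of this setup (not the confirmation challenge, which draws objections in the replies) can be sketched as a lookup that accepts an exact address or a domain-wide entry. The `*@domain` notation and the function name `is_whitelisted` are assumptions for illustration, and entries are assumed to be stored in lowercase. The sender address being checked is trivially forged, so a whitelist only helps as long as spammers don't happen to use a listed address.

```python
def is_whitelisted(sender: str, whitelist: set[str]) -> bool:
    """True if sender matches an exact entry or a "*@domain" wildcard entry.

    Addresses are compared case-insensitively for simplicity (strictly,
    only the domain part is case-insensitive).
    """
    sender = sender.strip().lower()
    if "@" not in sender:
        return False
    if sender in whitelist:
        return True
    domain = sender.rsplit("@", 1)[1]
    return f"*@{domain}" in whitelist


whitelist = {"alice@example.com", "*@example.org"}
print(is_whitelisted("Alice@Example.com", whitelist))  # True (exact entry)
print(is_whitelisted("bob@example.org", whitelist))    # True (domain-wide)
print(is_whitelisted("bob@evil.example", whitelist))   # False
```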

    --
    Evil is the money of root.
    1. Re:I gave up by babaloo · · Score: 5, Interesting

      I understand your frustration, but I was the victim of a Joe Job attack, and systems like the one you describe just add to the victim's pain. I feel that these types of responses are just as unwelcome as spam, and I report them as such. Have you had any issues like this?

    2. Re:I gave up by CFrankBernard · · Score: 4, Interesting

      I recommend joining the SPAM-L mailing list of 900+ email admins and asking for opinions on "challenge response" (C/R) spam-fighting systems. Sending a confirmation message to the alleged/purported sending address *is* spam when that address is spoofed/forged (quite common). The only way to be sure you are sending information back to the actual connecting mail server is to do so /during/ the SMTP conversation.

    3. Re:I gave up by rudedog · · Score: 4, Insightful

      So it appears that you decided that the responsibility for fighting your spam should be moved onto the backs of everybody else on the Internet? Spam almost always comes from a forged sender. By doing this, you're just sending tons of spam to the forgery victims. Please do us and yourself a favor: google "challenge response harmful", and then turn off your C/R system.

  9. More of the same; not a solution by mrheckman · · Score: 3, Interesting

    The "immune system" solution is just another way to detect spam, but it is unlikely to be much more successful than existing methods. As someone else pointed out, SpamAssassin is pretty good already. So what if this new type of filter eventually improves the spam filtering accuracy from 98% to 99%? A more highly polished rock is still a rock.

    The real problem is the sending of spam itself, and that problem arises from an inability to correctly attribute the spam to the spammers. If we can do that, we can block it, or at least more readily convict the spammers who violate the law. Things that solve this problem, like Yahoo!'s "DomainKeys", are the future of anti-spam, not more highly polished rocks.

  10. Modelling Nature by A Dafa Disciple · · Score: 3, Interesting

    Your post advocates a

    (x) technical ( ) legislative ( ) market-based ( ) vigilante

    approach to fighting spam. Your idea will not work. Here is why it won't work. (One or more of the following may apply to your particular idea, and it may have other flaws which used to vary from state to state before a bad federal law was passed.)

    ( ) Spammers can easily use it to harvest email addresses
    ( ) Mailing lists and other legitimate email uses would be affected
    ( ) No one will be able to find the guy or collect the money
    ( ) It is defenseless against brute force attacks
    ( ) It will stop spam for two weeks and then we'll be stuck with it
    (x) An enormous amount of spam will initially go undetected before your idea is effective
    ( ) Users of email will not put up with it
    ( ) Microsoft will not put up with it
    ( ) The police will not put up with it
    (x) Your idea proposes a solution that only large corporations could deploy
    ( ) Requires too much cooperation from spammers
    ( ) Requires immediate total cooperation from everybody at once
    ( ) Many email users cannot afford to lose business or alienate potential employers
    ( ) Spammers don't care about invalid addresses in their lists
    ( ) Anyone could anonymously destroy anyone else's career or business

    Specifically, your plan fails to account for

    ( ) Laws expressly prohibiting it
    ( ) Lack of centrally controlling authority for email
    ( ) Open relays in foreign countries
    ( ) Ease of searching tiny alphanumeric address space of all email addresses
    ( ) Asshats
    ( ) Jurisdictional problems
    ( ) Unpopularity of weird new taxes
    ( ) Public reluctance to accept weird new forms of money
    ( ) Huge existing software investment in SMTP
    ( ) Susceptibility of protocols other than SMTP to attack
    ( ) Willingness of users to install OS patches received by email
    ( ) Armies of worm riddled broadband-connected Windows boxes
    ( ) Eternal arms race involved in all filtering approaches
    ( ) Extreme profitability of spam
    ( ) Joe jobs and/or identity theft
    ( ) Technically illiterate politicians
    ( ) Extreme stupidity on the part of people who do business with spammers
    ( ) Dishonesty on the part of spammers themselves
    ( ) Bandwidth costs that are unaffected by client filtering
    (x) The large amount of resources needed for implementation of your idea that small companies don't have
    ( ) Outlook

    and the following philosophical objections may also apply:

    ( ) Ideas similar to yours are easy to come up with, yet none have ever been shown practical
    ( ) Any scheme based on opt-out is unacceptable
    ( ) SMTP headers should not be the subject of legislation
    ( ) Blacklists suck
    ( ) Whitelists suck
    ( ) We should be able to talk about Viagra without being censored
    (x) Your solution is nothing more than a conceptual remanifestation of a solution that already exists
    ( ) Countermeasures should not involve wire fraud or credit card fraud
    ( ) Countermeasures should not involve sabotage of public networks
    ( ) Countermeasures must work if phased in gradually
    ( ) Sending email should be free
    ( ) Why should we have to trust you and your servers?
    ( ) Incompatibility with open source or open source licenses
    ( ) Feel-good measures do nothing to solve the problem
    ( ) Temporary/one-time email addresses are cumbersome
    ( ) I don't want the government reading my email
    ( ) Killing them that way is not slow and painful enough

    Furthermore, this is what I think about you:

    (x) I think it is a creative concept, but there is no need to reinvent the wheel.
    ( ) Sorry dude, but I don't think it would work.
    ( ) This is a stupid idea, and you're a stupid person for suggesting it.
    ( ) Nice try, assh0le! I'm going to find out where you live and burn your house down!

  11. Abysmal results by gvc · · Score: 4, Interesting

    More specifically, it correctly classifies 84% of spam and 98% of non-spam.

    The authors used the SpamAssassin corpus. Holden shows that, on the SpamAssassin corpus, Bogofilter correctly classifies 90.3% of spam and 99.88% of non-spam. See http://sam.holden.id.au/writings/spam2/

    This approach is nowhere near state of the art.
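    To make the comparison concrete, the quoted per-class accuracies can be turned into error rates: the share of spam let through and the share of legitimate mail misfiled as spam. This simply reworks the numbers above and assumes both sets were measured comparably on the SpamAssassin corpus; the function name `error_rates` is illustrative.

```python
def error_rates(spam_caught: float, ham_kept: float) -> tuple[float, float]:
    """Convert per-class accuracy into (missed-spam rate, false-positive rate)."""
    return 1 - spam_caught, 1 - ham_kept


# Accuracy figures as quoted in the comment.
results = {
    "immune system (TFA)": (0.84, 0.98),
    "Bogofilter (Holden)": (0.903, 0.9988),
}

for name, (spam_caught, ham_kept) in results.items():
    missed, false_pos = error_rates(spam_caught, ham_kept)
    print(f"{name}: {missed:.1%} of spam missed, {false_pos:.2%} of ham misfiled")

# How many times more legitimate mail the immune system misfiles.
immune_fp = error_rates(*results["immune system (TFA)"])[1]
bogo_fp = error_rates(*results["Bogofilter (Holden)"])[1]
ratio = immune_fp / bogo_fp
print(f"immune system misfiles {ratio:.1f}x as much ham")
```

    Run as is, this reports 16.0% of spam missed and 2.00% of ham misfiled for the immune system, 9.7% and 0.12% for Bogofilter, and a false-positive ratio of about 16.7.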

  12. no more biological metaphors.... by illuminatedwax · · Score: 5, Insightful

    I'm seriously sick of people abusing biological methodologies. People seem very attracted to ideas simply because they are grounded in "how nature works", and they ignore the mathematical benefits or weaknesses. This idea pretty much just sounds like statistical rules based on a corpus, which is how nearly every successful solution out there works now. This solution simply prunes rules that aren't being used, but there are better ways to get a smaller spam detection database. Have you seen the stuff the CRM114 people are doing? This is nothing new.

    Read your Russell and Norvig, people. Airplane research didn't get off the ground (ugh) until we stopped trying to mimic birds and started studying the physical principles of flight.

    --
    Did you ever notice that *nix doesn't even cover Linux?
  13. Immune System Attacking Spammers by cyberscan · · Score: 3, Interesting

    Here is a better idea: Blue Security was attacked and shut down because the Internet is septic. The germs (spammers) have taken over. The best way to win this is to take the profit out of spamming. This can be done in much the same way that the body's T cells tell the rest of the immune system how to attack a pathogen. A cryptographically signed spammer complaint (attack) file should be distributed via a peer-to-peer network protocol. This file is passed among complaint programs, which complain to a spammer's website each time they receive a spam advertising that website.
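    The point of signing the complaint file is that clients should act only on instructions from the trusted group, so a spammer can't inject a forged file that aims the complaints at an innocent site. A minimal sketch of that check follows. It uses Python's standard-library HMAC as a stand-in: HMAC needs a shared secret, so anyone holding the key could also forge files, and a real deployment would want a public-key signature instead. The key, the JSON layout, and the function names are hypothetical.

```python
import hashlib
import hmac
import json
from typing import Optional

# Hypothetical shared key. With HMAC, every verifier could also forge files;
# a public-key signature scheme would avoid that.
KEY = b"demo-key-not-for-real-use"


def sign_instructions(instructions: dict, key: bytes = KEY) -> dict:
    """Wrap the instructions with an HMAC-SHA256 tag over their JSON form."""
    payload = json.dumps(instructions, sort_keys=True)
    tag = hmac.new(key, payload.encode(), hashlib.sha256).hexdigest()
    return {"payload": payload, "tag": tag}


def verify_instructions(envelope: dict, key: bytes = KEY) -> Optional[dict]:
    """Return the instructions if the tag checks out, otherwise None."""
    payload = envelope["payload"].encode()
    expected = hmac.new(key, payload, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, envelope["tag"]):
        return None  # forged or tampered with: ignore it
    return json.loads(payload)


envelope = sign_instructions({"version": 7, "targets": ["spamvertised.example"]})
print(verify_instructions(envelope) is not None)  # True

forged = dict(envelope, payload=envelope["payload"].replace("spamvertised", "innocent"))
print(verify_instructions(forged))                # None
```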

    Like an immune system, this network of spam attack programs will have T cells: a small group of people who draw up the complaint instruction file. Whenever the pathogen (spammer) releases enough toxins (spam) into the body (Internet), the T cells (people who write the complaint instruction file) alert the immune cells (spam complaint programs) to the presence of the pathogen and tell them how to attack it (by complaining to the advertised website). The pathogen is overwhelmed by a quick immune response (high bandwidth usage resulting from many, many complaints).

    When the cost of running a website surpasses the revenue earned from it, the website is shut down. When the costs of spamming or advertising via spam exceed the income, spam stops. Blue Security was beginning to become successful. Too bad they bowed out.