Slashdot Mirror


Spam Detection Using an Artificial Immune System

rangeva writes "As anti-spam solutions evolve to limit junk email, the senders quickly adapt to make sure their messages are seen. An interesting article describes the application of an artificial immune system model to effectively protect email users from unwanted messages. In particular, it tests a spam immune system against the publicly available SpamAssassin corpus of spam and non-spam. It does so by classifying email messages with the detectors produced by the immune system. The resulting system classifies the messages with accuracy similar to that of other spam filters, but it does so with fewer detectors."

9 of 114 comments

  1. The difference? by MoeMoe · · Score: 2, Insightful

    Not that I'm arguing that it's the same, rather I'd like to know:

    What separates this from a Bayesian filter?

    --
    Business \Busi"ness\, n.;
    A scam in which all people involved perceive as beneficial...
    1. Re:The difference? by DragonWriter · · Score: 4, Insightful
      What separates this from a Bayesian filter?
      If nothing else, it has new, improved buzzwords. "Artificial immune system" is so much more evocative than "Bayesian filter".
  2. Fancy by roman_mir · · Score: 4, Insightful

    It looks fancy, but when you get down to it, all it means is that there are a number of heuristics that are combined into filters (this happens through user training). The filters are 'weighted', and filters that are not used often enough are 'culled' (killed off). I don't think this will be significantly better than any other Bayesian-type spam system.
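
    To make the "weighted and culled" idea concrete, here is a rough sketch of how such a scheme could look. This is not the article's algorithm: the Detector class, the substring matching, the +1/-1 weight updates and the max_idle cutoff are all assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Detector:
    pattern: str          # substring the detector looks for (hypothetical form)
    weight: float = 1.0   # raised when the detector matches spam
    idle_rounds: int = 0  # training rounds since the detector last matched


def score(message: str, detectors: list[Detector]) -> float:
    """Sum the weights of all detectors whose pattern occurs in the message."""
    text = message.lower()
    return sum(d.weight for d in detectors if d.pattern in text)


def train(message: str, is_spam: bool, detectors: list[Detector]) -> None:
    """User training: reward detectors that match spam, penalise matches on ham."""
    text = message.lower()
    for d in detectors:
        if d.pattern in text:
            d.idle_rounds = 0
            if is_spam:
                d.weight += 1.0
            else:
                d.weight = max(0.0, d.weight - 1.0)
        else:
            d.idle_rounds += 1


def cull(detectors: list[Detector], max_idle: int = 3) -> list[Detector]:
    """Keep only detectors that matched within the last max_idle rounds."""
    return [d for d in detectors if d.idle_rounds <= max_idle]
```

    Calling cull() after each batch of training is what keeps the detector set small; whether that beats a Bayesian filter is exactly the open question.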

  3. Re:The utility of newer systems by crotherm · · Score: 4, Insightful

    I have to admit, I don't see the need for these recent wizbang horseless carriages. Sure, they might be ingenious, but on a practical level, they don't do anything more than a fine team of horses. yada yada

    But seriously, your attitude is one that would stop all progress. This new method does the job more efficiently.

    From TFA, The lightweight nature of this solution -- requiring significantly smaller number of detectors when compared to SpamAssassin -- will doubtlessly prove attractive to those looking to implement a server-based solution where processing overhead may well be an issue. A server-based solution would be a one-size-fits-all mold since the filter is not personalized and does not learn for each particular user, but the reduced processing and storage time makes such a solution attractive.


    That sounds like a good reason for this research.

    --
    "Those who make peaceful revolution impossible, make violent revolution inevitable" - JFK
  4. The easiest way to eliminate most spam ..... by travisco_nabisco · · Score: 2, Insightful

    I just had a thought about spelling while reading about the spam filters. So I went and looked in my spam folder and found that every piece of spam has many, many words that are not in a dictionary, i.e. not spelled correctly.

    Why not run a script that filters messages based on spelling? If there are more than 'xx' many words that do not exist in the dictionary you choose to use, then the message gets sent to the spam folder. This would catch the odd e-mail from friends who don't know how to spell or what a spell checker is, but then when you clean out your spam folder you should notice it.
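
    A minimal sketch of such a script, assuming the dictionary is just a set of lowercase words and that 'xx' is passed in as max_unknown (both names are made up here, as is the choice to ignore pure numbers):

```python
import re


def misspelled_count(message: str, dictionary: set[str]) -> int:
    """Count tokens that are not in the dictionary, ignoring pure numbers."""
    tokens = re.findall(r"[a-z0-9']+", message.lower())
    return sum(1 for t in tokens if not t.isdigit() and t not in dictionary)


def is_spam_by_spelling(message: str, dictionary: set[str], max_unknown: int = 5) -> bool:
    """Flag the message when more than max_unknown words are unknown."""
    return misspelled_count(message, dictionary) > max_unknown
```

    Obfuscated words like "v1agra" count as one unknown token each, so the threshold has to be tuned against how badly your friends spell.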

    1. Re:The easiest way to eliminate most spam ..... by dhasenan · · Score: 2, Insightful

      Do you actually WANT to interview a job applicant who can't spell 20 words in a 150-word email?

  5. no more biological metaphors.... by illuminatedwax · · Score: 5, Insightful

    I'm seriously sick of people abusing biological methodologies. People seem very attracted to ideas simply because they are grounded in "how nature works" and ignore the mathematical benefits or weaknesses. Now this idea pretty much just sounds like statistical rules based on a corpus - pretty much how every successful solution out there now works. This solution simply prunes rules that aren't being used, but there are better ways to get a smaller spam detection database. Have you seen the stuff the CRM114 people are doing? This is nothing new.

    Read your Russell and Norvig, people. Airplane research didn't get off the ground (ugh) until we stopped trying to mimic birds and started studying the physical principles of flight.

    --
    Did you ever notice that *nix doesn't even cover Linux?
  6. Re:I gave up by rudedog · · Score: 4, Insightful

    So it appears that you decided that the responsibility for fighting your spam should be moved onto the backs of everybody else on the Internet? Spam almost always comes from a forged sender. By doing this, you're just sending tons of spam to the forgery victims. Please do us and yourself a favor and google "challenge response harmful", and then turn off your C/R system.

  7. Are we still doing this? by Anonymous Coward · · Score: 1, Insightful

    Are we still on the message-filtering bandwagon? I know it was all the rage when we talked about it in 2000, but now it's 2006, and we've all had experience with it. Pattern-matching has been defeated, and it was an embarrassing defeat. This is usually a sign to those who proposed it that they should consider a career change. With the exception of those patterns that correspond to firewall rules blocking domains run by companies with names like "Megaultra Webcram Holdings, Inc", it's a dead issue.

    The real issue I have is with those researchers and businesses that continue to push this cyber snakeoil. It's getting to the point that e-mail is worthless, not because of the high volume of spam, but because easily confused pattern-matching blockers remove just enough messages to cause major problems for the rest of us. Here is why it's stupid, and should be stopped:

    * While contaminated pattern-matching filters don't always block wanted messages, they remove just enough messages to cause doubt and frustration with my users, and those on the other end of the loop. This leads to the network administrator (me) having to individually resolve each problem by sifting through the logs.

    * Because the matched messages are removed on the far end of the transaction, i.e. on the "client side", there's no indication of trouble, or even an error message (to the user or in the logs). Neither party understands where the message has gone, and this reinforces superstition. For years, I whined, teased and scolded to get the attention of the morons who were going gung-ho with client-end filtering for spam and viruses, but they just wouldn't listen.

    * ISPs and other service providers have deployed these infernal filters everywhere, making a huge mess which I cannot resolve. It is next to impossible to politely explain the problem is theirs, without having their attention tossed amid a sea of techie jargon. They usually come away with the message, "it is your fault, not ours". I'm fed up dealing with the hostile confrontations that result.

    I have a sneaking suspicion that the morons who thought spam/virus filtering based on pattern-matching the 'From' line was brilliant are the same idiots responsible for the current crop of "security" dud-ware. Do I sound hostile? I am, and these charlatans can go shove it. At this point, I think only the "homeopathic remedy" market has more frauds than the computer industry.

    I'm sorry, no matter how graceful the descriptions or the analogies, I will no longer accept content-based pattern-matching filters on e-mail. They have been proven horribly ineffective. Spam-filtering isn't rocket science, okay? First you block any SMTP traffic without a zone pointer, then block large chunks of addresses from underdeveloped countries based on message header sampling. From there, build up a list of UK, US, and Canadian spam-pushers based on their domain registrations. You'll eliminate most of it, and unless you communicate extensively with people in China, Bolivia, Russia or Brazil, you won't have to do much tuning.
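
    Those three steps can be sketched roughly as below. This reads "zone pointer" as a reverse-DNS (PTR) record, which is an assumption, and the function names, the injectable ptr_lookup parameter and the example networks and domains are all hypothetical. It is a sketch of the decision logic, not a working MTA policy.

```python
import ipaddress
import socket


def has_ptr(ip: str) -> bool:
    """True if a reverse-DNS (PTR) lookup succeeds for the address."""
    try:
        socket.gethostbyaddr(ip)
        return True
    except OSError:  # covers socket.herror and socket.gaierror
        return False


def should_block(ip, sender_domain, blocked_networks, blocked_domains,
                 ptr_lookup=has_ptr):
    """Apply the three checks in order: PTR record, address block, domain list."""
    addr = ipaddress.ip_address(ip)
    if not ptr_lookup(ip):
        return True   # step 1: no reverse DNS for the connecting host
    if any(addr in net for net in blocked_networks):
        return True   # step 2: address falls inside a blocked range
    return sender_domain.lower() in blocked_domains  # step 3: known spam-pusher


# Placeholder data using documentation-only ranges and example domains.
BLOCKED_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]
BLOCKED_DOMAINS = {"spam.example"}
```

    Passing ptr_lookup in as a parameter lets you try the logic without touching DNS, e.g. should_block("192.0.2.10", "ok.example", BLOCKED_NETWORKS, BLOCKED_DOMAINS, ptr_lookup=lambda ip: True).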

    This is all incredibly stupid anyway. The solution to the spam problem is not a technological one, or a political one. It's an economic problem. The powers that be chose - in their infinite wisdom - to allocate huge blocks of addresses to largely underdeveloped nations based on populace, instead of demand. Most of these people don't have a network device, and won't have one in the foreseeable future. The value of these addresses is so ridiculously deflated, that they're worth close to nothing. Spammers have massive chunks of address space, and can cycle through millions of IPs before all of them are at risk of being blocked. Want it to stop? Charge a reasonable rate to pass the traffic through your country's network backbone.