Spam Detection Using an Artificial Immune System

The utility of newer systems by CRCulver · 2006-07-10 09:08 · Score: 3, Informative

I have to admit, I don't see the need for these recent whizbang additions to the spam-fighting repertoire. Sure, they might be ingenious, but on a practical level they don't do anything more than a properly configured SpamAssassin system. I used to get a lot of spam through a default installation of SpamAssassin, but after spending some time with O'Reilly's book (the free docs may have reached this level of reader-friendliness by now; it's been a couple of years) and tweaking my installation, I get spam once in a blue moon. There's just no need for anything more.

Re:The utility of newer systems by crotherm · 2006-07-10 09:36 · Score: 4, Insightful

I have to admit, I don't see the need for these recent whizbang horseless carriages. Sure, they might be ingenious, but on a practical level, they don't do anything more than a fine team of horses. Yada yada.

But seriously, your attitude is one that would stop all progress. This new method does the job more efficiently.

From TFA: "The lightweight nature of this solution -- requiring a significantly smaller number of detectors when compared to SpamAssassin -- will doubtlessly prove attractive to those looking to implement a server-based solution where processing overhead may well be an issue. A server-based solution would be a one-size-fits-all mold since the filter is not personalized and does not learn for each particular user, but the reduced processing and storage time makes such a solution attractive."

That sounds like a good reason for this research.

--
"Those who make peaceful revolution impossible, make violent revolution inevitable" - JFK
Re:The utility of newer systems by TheOtherChimeraTwin · 2006-07-10 09:46 · Score: 1

Wow, I wish SpamAssassin worked that well for me. Mind you, it does get rid of most of the junk, but I still get a fair amount of spam that slips past SA. Every few days, I even get spam with a score of exactly 0!
Re:The utility of newer systems by a_n_d_e_r_s · 2006-07-10 10:09 · Score: 2, Interesting

Good spammers run their spam through SpamAssassin to make sure it scores 0 and therefore gets through. Most sysadmins use the standard settings, so the spam gets through.

It's not very smart to send spam that gets caught by SpamAssassin.

--
Just saying it like it are.
Re:The utility of newer systems by AigariusDebian · 2006-07-10 12:02 · Score: 1

The real precision of current good Bayesian filtering is close to the precision of a human filter: from 80 to 90 percent. There are the newest advances in natural language processing (word sequence processing) and in neural and functional text classification (support vector machines with nonlinear kernels) that can push spam classification precision up to 99 percent. That might not matter much for spam, BUT when you transfer the same knowledge to other areas of text classification, 99 percent binary classification precision turns into 80 percent precision when classifying into 5 categories.

There is a lot of research in this area - I am actually doing it now.
Re:The utility of newer systems by AigariusDebian · 2006-07-10 12:05 · Score: 1

But the research in the article is pretty lame indeed. I have seen expiring Bayesian classifiers before; the only thing I find interesting there is the use of word sequencing to reduce the feature vectors, but the paper is short on details of how sequence selection is automated, which is a major reason why that technique is so underused currently.
Re:The utility of newer systems by nixkuroi · 2006-07-10 17:34 · Score: 1

Yeah, it might work for today, but Spam is only going to get worse and there's a point where traditional models won't scale. If you can get something that does the same job in fewer cycles, that implies you can scale up higher using fewer resources. Also, the methodology they're talking about here is growing organically. This probably means that it'll evolve organically, making it better with each generation. Spam fighters can't stop innovating because the spammers aren't going to.
Re:The utility of newer systems by CarpetShark · 2006-07-11 01:11 · Score: 1

Hmm.. I agree with the need for research and progress, of course. However, I also agree with the parent poster (relative to your post) in the sense that, as far as fighting spam itself goes... if something isn't broke, it's silly to fix it. Most technological progress does cost something in terms of society and the happiness of simplicity, and sometimes that price isn't worth paying.

I guess for me the question is... "what KIND of efficiency are we talking about here? Simplicity for the CPU? Simplicity for developers? Or simplicity for users?"

Perhaps we'll find more insidious forms of spam in future that require this technology to efficiently fight it though, and then it will (assuming it lives up to the hype, which is quite an assumption) undoubtedly be better.
Re:The utility of newer systems by mrxak · 2006-07-11 01:54 · Score: 1

The real issue is efficiency at the server level. If your email server ran something like this, you'd have protection that's just as good, but it wouldn't bog down under the thousands of emails going through it.

Of course servers are getting faster all the time, but the whole point of computer science is to make things work more efficiently regardless of the actual hardware it runs on.

--
-mrxak
Onions Will Kill You
Re:The utility of newer systems by CarpetShark · 2006-07-13 22:15 · Score: 1

Yes, good point on servers. On the point of computer science though... hmm. I think the point of all science is understanding, which may lead to efficient use, abandonment, or almost any other course of action.

Finally by nizo · 2006-07-10 09:09 · Score: 4, Funny

So now we can look forward to a spam filtering solution that actively searches for spammers and kills them?

--
I Am My Own Worst Enemy

Re:Finally by modecx · 2006-07-10 09:30 · Score: 1

So now we can look forward to a spam filtering solution that actively searches for spammers and kills them?

Hooah! First one to hook this up with an MLRS gets a cookie!

--
Constitutional rights may be respected, repealed, or modified; but they must never be ignored.
Re:Finally by kesuki · 2006-07-10 10:23 · Score: 1

I know where most of them live. Just kidding. The problem isn't that we have spammers; the problem is that we have kids pretending to be spammers who just hack into legitimate spam networks to send out scams.

--
https://www.gnu.org/philosophy/free-sw.html
Re:Finally by cybercobra · 2006-07-10 11:50 · Score: 1

They've already got that. It's called the Robospamassassin!
http://mirror12.escomposlinux.org/comic/ecol-205-e.png

False positives still a problem by Hungry+Admin · 2006-07-10 09:11 · Score: 1, Redundant

I think this is a very useful new anti-spam tool, but as usual, it will have the possibility of false positives, which can be very damaging. And Spammers will adapt to this technology as well, reducing its effectiveness.

--
Be who you are and say what you feel, because the people who mind don't matter, and the people who matter don't mind.

Re:False positives still a problem by CRCulver · 2006-07-10 10:03 · Score: 2, Interesting

And Spammers will adapt to this technology as well, reducing its effectiveness.

One wonders what sort of people have so little moral fiber that they study spam-blockers and create new methods for getting around them. Really, it would be great if Slashdot could profile one of these twisted people and show just who does it, what country they are from, what kind of upbringing they had, etc. But maybe anyone is susceptible to the temptation. Recently, while making a comment on a blog, I was thinking about just how easy it would be to automatically circumvent its arithmetic-based anti-spam question ("What is 7 + 9?"). It was like being called over to the dark side.
Re:False positives still a problem by d1337 · 2006-07-10 10:19 · Score: 1

Really, it would be great if Slashdot could profile one of these twisted people and show just who does it, what country they are from, what kind of upbringing they had, etc.
You forgot...let's get their /. username also

--
sig d1337ed
Re:False positives still a problem by shawb · 2006-07-10 10:27 · Score: 1

Slashdot would NEVER post a story about the sorts of sick, twisted individuals that perpetrate such sleazy tactics for profit.

(N.B: Okay, yeah, there's a difference between spyware and spam... I'd think that spyware is the worse of the two evils, though.)

--
I'll never make that mistake again, reading the experts' opinions. - Feynman
Re:False positives still a problem by techno-vampire · 2006-07-10 10:45 · Score: 1

One wonders what sort of people have so little moral fiber that they study spam-blockers and create new methods for getting around it.

Simple: people who see the profit in it and don't care what people think of them. Who cares if there's a .001% reply rate when you send out tens of millions of spam messages per day? As long as there's a way to get money out of people with spam, there will be spam, and there will be people looking for ways to get around any filtering program or algorithm ever designed.
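To put that reply rate into numbers, here is the back-of-the-envelope arithmetic as a small Python sketch. The 10 million messages per day is only an illustrative stand-in for "tens of millions"; exact fractions are used so the percentage doesn't pick up floating-point noise.

```python
from fractions import Fraction


def expected_replies(messages_sent: int, reply_rate_percent: Fraction) -> Fraction:
    """Expected number of replies for a reply rate given in percent."""
    return messages_sent * reply_rate_percent / 100


# 0.001% reply rate on an illustrative 10 million messages per day
replies = expected_replies(10_000_000, Fraction(1, 1000))
print(f"{replies} replies per day")
```

That works out to 100 replies a day from a single day's run, which is why a rate that looks negligible still pays.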

--
Good, inexpensive web hosting
Re:False positives still a problem by stonecypher · 2006-07-10 12:49 · Score: 1

Er, I think it's just people who don't think spam is a big deal and are amused by the several million dollars a year of revenue it generates. You act like they're organ-leggers.

--
StoneCypher is Full of BS

The difference? by MoeMoe · 2006-07-10 09:14 · Score: 2, Insightful

Not that I'm arguing that it's the same, rather I'd like to know:

What separates this from a Bayesian filter?

--
Business \Busi"ness\, n.;
A scam in which all people involved perceive as beneficial...

Re:The difference? by DragonWriter · 2006-07-10 09:17 · Score: 4, Insightful

What separates this from a Bayesian filter?
If nothing else, it has new, improved buzzwords. "Artificial immune system" is so much more evocative than "Bayesian filter".

Great.... by (pvb)charon · 2006-07-10 09:17 · Score: 4, Funny

Ever heard of hay fever? Allergies? Think, people, think! charon

Re:Great.... by Dannon · 2006-07-10 09:51 · Score: 3, Funny

Thanks, now I have the mental image of a spam filter with sinus problems. Ewwww...

--
Good judgment comes from experience.
Experience comes from bad judgment.
Re:Great.... by megaditto · 2006-07-10 10:40 · Score: 1

Good point, what the authors are doing is probably trying to score some NSF funds or something.

Arthritis, AIDS, tuberculosis, leukemia, lupus, endometriosis, etc.: deadlier cousins of the immune-system failures you mentioned.

What they should be modelling the next-gen spam filters on are intracellular defense mechanisms: RNAi, si/shRNA, nuclear translocation tags, etc. That's what blacklists, Sender ID, etc. are copying anyway.

--
Obama likes poor people so much, he wants to make more of them.
Re:Great.... by pyrote · 2006-07-10 18:06 · Score: 1

Ever heard of hay fever? Allergies?

Greeeat I can see it now...
Doctor: Do you have any allergies to medication?
You: No, but my computer has developed an allergy to Viagra and Cialis, and is also sensitive to weight loss pills. Not to mention the keyboard seems to have grown several inches in length.

--
THE WORLD IS GOING TO END!!!! eventually.
Re:Great.... by (pvb)charon · 2006-07-10 20:00 · Score: 1

And you don't want to know what your mails look like after they've gone through that.
charon

Fancy by roman_mir · 2006-07-10 09:23 · Score: 4, Insightful

It looks fancy, but when you get down to it, all it means is that a number of heuristics are combined into filters (through user training). The filters are 'weighted', and filters that are not used often enough are 'culled' (killed off). I don't think this will be significantly better than any other Bayesian-type spam system.
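A toy Python model of that reading (weighted detectors that fire on a message, with unused ones culled) might look like this. It is not the paper's algorithm: the regex detectors, their weights and the `min_hits` threshold are all made up for illustration.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass
class Detector:
    pattern: re.Pattern
    weight: float = 1.0
    hits: int = 0  # how many messages this detector has fired on


def score(message: str, detectors: list[Detector]) -> float:
    """Sum the weights of every detector that fires, recording each hit."""
    total = 0.0
    for d in detectors:
        if d.pattern.search(message):
            d.hits += 1
            total += d.weight
    return total


def cull(detectors: list[Detector], min_hits: int) -> list[Detector]:
    """Kill off detectors that have not fired at least min_hits times."""
    return [d for d in detectors if d.hits >= min_hits]


detectors = [
    Detector(re.compile(r"rolex", re.IGNORECASE), weight=2.0),
    Detector(re.compile(r"mortgage", re.IGNORECASE), weight=1.5),
    Detector(re.compile(r"zzzz"), weight=0.5),
]
for msg in ["Cheap ROLEX watches", "Refinance your mortgage", "Rolex replicas"]:
    score(msg, detectors)

survivors = cull(detectors, min_hits=1)
print([d.pattern.pattern for d in survivors])
```

The never-used `zzzz` detector is the one that gets culled; the other two survive.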

--
You can't handle the truth.

Not much by jfengel · 2006-07-10 09:28 · Score: 5, Informative

Ultimately, very little. At core, they're probably identical techniques, and if I were reviewing this as a scientific paper I'd ding them for not answering exactly that question. There are such strong parallels between the two (train them on known data, add up probabilities, cut stuff on a threshold) that I strongly suspect that they're identical.

There are useful things to be gained from a change of metaphor. For example, one difference between this and most Bayesian spam filter implementations is that this explicitly incorporates a decay function. That could be useful if a word that used to be common in spam no longer is (e.g. if I actually decided to buy a Rolex, it would no longer be a strong spam indicator, whereas right now any email mentioning "Rolex" is 99.9999% certain to be spam).

You could easily modify a Bayesian filter to have time-decaying weights, but if the change in metaphor leads somebody to come up with a good insight, then perhaps this is useful. Mathematically, though, the equations look very similar.
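Here is a minimal Python sketch of such a time-decaying Bayesian-style token table, assuming an exponential decay with a configurable half-life (the 30-day half-life and the simple spam/(spam+ham) ratio are assumptions, and it skips the corpus-size normalisation and smoothing real filters use):

```python
import math
from collections import defaultdict


class DecayingTokenCounts:
    """Spam/ham token counts whose influence fades exponentially with age."""

    def __init__(self, half_life_days: float) -> None:
        self.decay_rate = math.log(2) / half_life_days
        self.spam = defaultdict(float)
        self.ham = defaultdict(float)
        self.last_update = 0.0  # time of the last decay step, in days

    def _decay_to(self, now: float) -> None:
        """Multiply every stored count by exp(-rate * elapsed days)."""
        factor = math.exp(-self.decay_rate * (now - self.last_update))
        for table in (self.spam, self.ham):
            for token in table:
                table[token] *= factor
        self.last_update = now

    def train(self, tokens, is_spam: bool, now: float) -> None:
        """Age the existing evidence, then count each distinct token once."""
        self._decay_to(now)
        table = self.spam if is_spam else self.ham
        for token in set(tokens):
            table[token] += 1.0

    def spam_probability(self, token: str, now: float) -> float:
        """Share of the (decayed) evidence for this token that came from spam."""
        self._decay_to(now)
        s = self.spam.get(token, 0.0)
        h = self.ham.get(token, 0.0)
        return 0.5 if s + h == 0 else s / (s + h)


counts = DecayingTokenCounts(half_life_days=30)
counts.train(["rolex", "cheap"], is_spam=True, now=0)
counts.train(["rolex", "invoice"], is_spam=False, now=60)
print(round(counts.spam_probability("rolex", now=60), 3))
```

Two half-lives after the spam sighting, that evidence counts for a quarter of its original weight, so "rolex" drops to about 0.2 instead of the 0.5 an undecayed table would give.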

Re:Not much by adrianbaugh · 2006-07-10 09:43 · Score: 4, Interesting

Perhaps a neat way to extend this idea would be to have the filter scan your outgoing mail, too; not to search for spam as such, but to look for changes in behaviour. Then, supposing you emailed sales@igottagetmearolex.com enquiring the price of a Rolex, the filter could modify the spam and ham probabilities of rolex. I suppose it would have to be clever enough to ignore emails sent to abuse@ addresses reporting spam and attaching the spam message, among other things I can't be bothered to think of now, but it's an idea that comes more readily from the immune system metaphor than the pure probability metaphor.
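A rough Python sketch of that outgoing-mail idea: words in messages the user sends are counted as ham evidence, while abuse-report style recipients are skipped. The list of ignored local parts is a hypothetical placeholder, and a real filter would need far more exceptions than this.

```python
import re
from collections import Counter

# Hypothetical list of recipients to skip: abuse reports quote spam verbatim,
# so learning from them would teach the filter that spam words are ham.
IGNORED_LOCAL_PARTS = {"abuse", "postmaster", "spam"}


def learn_from_outgoing(recipient: str, body: str, ham_counts: Counter) -> bool:
    """Count the words of an outgoing message as ham evidence.

    Returns False, learning nothing, for abuse-report style recipients.
    """
    local_part = recipient.split("@", 1)[0].strip().lower()
    if local_part in IGNORED_LOCAL_PARTS:
        return False
    ham_counts.update(set(re.findall(r"[a-z0-9']+", body.lower())))
    return True


ham = Counter()
learned = learn_from_outgoing("sales@igottagetmearolex.com", "What is the price of a Rolex?", ham)
skipped = learn_from_outgoing("abuse@example.net", "Forwarding this Rolex spam", ham)
print(learned, skipped, ham["rolex"])
```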

--
"'I pass the test,' she said. 'I will diminish, and go into the West, and remain Galadriel.'"
- JRR Tolkien.
Re:Not much by Anonymous Coward · 2006-07-10 12:17 · Score: 0

Take a look at the Markovian filters, CRM114, at crm114.sourceforge.net. It's a more interesting approach and highly trainable. Unfortunately, it's not well bundled and has a lot of rough edges, but it seems to be faster to run and more effective than SpamAssassin.
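The Markovian approach works on phrases rather than single words. As a rough Python sketch of that idea (loosely modelled on CRM114's sparse word-pair features, not CRM114's own code; the 5-word window and the feature format are assumptions), a message can be turned into word-pair features that a classifier counts like ordinary tokens:

```python
import re


def sparse_word_pairs(text: str, window: int = 5) -> list[str]:
    """Pair each word with each of the next (window - 1) words, noting the gap."""
    words = re.findall(r"\w+", text.lower())
    features = []
    for i, first in enumerate(words):
        for distance in range(1, window):
            if i + distance < len(words):
                skipped = distance - 1  # words skipped between the pair
                features.append(f"{first} <skip {skipped}> {words[i + distance]}")
    return features


for feature in sparse_word_pairs("Buy cheap Rolex now"):
    print(feature)
```

Recording the gap lets "buy ... rolex" count as evidence even when a spammer wedges filler words in between.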
Re:Not much by stonecypher · 2006-07-10 12:51 · Score: 1

The reason you're not reviewing scientific papers is that you guess at techniques being identical and pooh-pooh them based on those guesses. A Bayesian filter is a very specific mathematical technique. This isn't actually very similar at all, other than being used toward the same end.

Perhaps in the future you could know something about two algorithms before declaring them identical. Just a thought.

--
StoneCypher is Full of BS
Re:Not much by Anonymous Coward · 2006-07-10 18:57 · Score: 1, Informative

Ah, commenting on things we don't know as if we were experts.... just like telling the doctor that a cold and the flu must be the same because they have similar symptoms....

Here, let me clarify the differences for you. The primary difference is in the nature of the tokens used to classify a message. The Bayesian system has words/tokens that are either predefined by a human or taken from messages verbatim. The artificial immune system has tokens that are randomly and automatically generated by the system using some method. In this case, the method is to build a regular expression from a pre-defined list of stubs. Random generation of genes (tokens) is one of the basic tenets of AIS. This random generation has some positives (generating antibodies for unseen pathogens) and negatives (generating antibodies that attack self). In comparison, a Bayesian classifier's tokens are all pre-existing: either from human input or from the message. The AIS is a more dynamic and adaptive approach because it can contain tokens that match unseen values.
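A minimal Python sketch of that kind of random token generation, assuming candidate detectors are built by joining randomly chosen regex stubs. The stub list and the "up to 20 arbitrary characters between stubs" glue are hypothetical; the paper's actual stubs and construction rules aren't reproduced here.

```python
import random
import re

# Hypothetical stub library; the paper's real stub list is not reproduced here.
STUBS = ["v[i1!]agra", "r[o0]lex", "mortgage", "free", "click here", "[0-9]+%"]


def random_detector(rng: random.Random, max_stubs: int = 2) -> re.Pattern:
    """Build one candidate detector by joining randomly chosen stubs.

    Up to 20 arbitrary characters are allowed between consecutive stubs.
    """
    chosen = rng.sample(STUBS, rng.randint(1, max_stubs))
    return re.compile(r".{0,20}".join(chosen), re.IGNORECASE)


rng = random.Random(42)  # fixed seed so the run is repeatable
candidates = [random_detector(rng) for _ in range(5)]
for detector in candidates:
    print(detector.pattern)
```

Because the combinations are random, a generated detector can match phrasing that never appeared verbatim in the training mail, which is the "antibodies for unseen pathogens" property described above.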

Secondly, you're confusing the intended use of the collected statistics. Both have a similar method for collecting statistics (count how many spam and non-spam messages the token matches), but for entirely different purposes. For a Bayesian classifier, the statistics are used to assign a probability that the word indicates a spam message. The overall spam weight for the message is then computed by combining the probabilities of the individual words (and some filters also use the position of the word in the message to adjust the probability). For AIS, the statistics are used to measure the worthiness of the detector. If a detector is detecting too many normal messages, it is not a good detector (think autoimmune diseases in humans) and should be given very little credibility. An AIS usually includes a vital stage called negative selection, where detectors that react to "self" (non-spam messages) are eliminated. This work does not seem to include a full negative selection algorithm, where the bad detector is thrown out and replaced with a new one; instead it does something more akin to Bayesian classifiers, i.e. it combines probabilities and updates the statistics based on live data (classic AIS has static detectors once they leave the training phase). They also seem to rely on chance to create memory cells rather than on an affinity maturation process using genetic algorithms or other techniques. But the training phase is still intended to judge the worthiness of the candidate detectors in their AIS. They just don't follow through completely on this AIS technique.
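The negative selection stage described above can be sketched in a few lines of Python. The ham samples and candidate patterns are invented for illustration, and rejected detectors are simply dropped rather than regenerated as a fuller AIS would do.

```python
import re


def negative_selection(candidates: list[str], self_samples: list[str]) -> list[str]:
    """Keep only candidate detectors that match none of the 'self' (ham) samples.

    In a fuller AIS each rejected detector would be replaced by a freshly
    generated one; here the rejects are simply dropped.
    """
    survivors = []
    for pattern in candidates:
        regex = re.compile(pattern, re.IGNORECASE)
        if not any(regex.search(msg) for msg in self_samples):
            survivors.append(pattern)
    return survivors


ham = [
    "Minutes from Tuesday's meeting",
    "Your free trial of the compiler ends soon",
]
candidates = [r"r[o0]lex", r"free", r"v[i1!]agra", r"meeting"]
print(negative_selection(candidates, ham))
```

"free" and "meeting" both react to legitimate mail (the autoimmune case), so only the two spam-specific detectors survive.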

I would say this work is a very simplified AIS, as it does not include all of the hallmarks of other AIS. As such, it does bear some superficial resemblance to a Bayesian classifier, particularly the modified classifier algorithms used in the spam tools. But it would not take much modification to include more AIS features in their system to make it more distinct from (and hopefully more accurate than) Bayesian classifiers. It's unclear whether the authors skipped the more advanced features out of a desire for low computational costs. It is, however, distinct from classifiers in token generation at the very least, and could be converted to a more traditional AIS with a little work.

Real spam solution by Dryanta · 2006-07-10 09:30 · Score: 3, Interesting

Spam and content filtering will always be a struggle for anybody who actually utilizes email. Simply adding more logic will not solve the problem. Reporting spammers to every RBL you can think of, and alerting forums and newsgroups to abusive IP blocks, on the other hand, is already working quite nicely.

Re:Real spam solution by techno-vampire · 2006-07-10 10:51 · Score: 1

Reporting spammers to every rbl list you can think of...

Sure, for those of us with the time, knowledge and inclination to do it. Expecting Aunt Minnie to do it is unreasonable. All she cares about is keeping spam out of her inbox, and if running something like this, or SpamAssassin, at the server gets rid of most of it, isn't that all she can reasonably ask for?

--
Good, inexpensive web hosting
Re:Real spam solution by tdelaney · 2006-07-10 11:17 · Score: 1

token       spamprob  #ham  #spam
'utilizes'  0.992422  1     6140
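For anyone wondering how counts like those turn into a spam probability, here is a simplified Graham-style ratio in Python. The formula used by the poster's filter and the sizes of its spam and ham corpora are unknown, so the equal corpus sizes below are made-up placeholders and the result won't reproduce 0.992422 exactly.

```python
def token_spam_probability(spam_hits: int, ham_hits: int,
                           n_spam_msgs: int, n_ham_msgs: int) -> float:
    """Estimate how strongly one token indicates spam.

    Compares how often the token shows up per spam message with how often
    it shows up per ham message (a simplified, Graham-style ratio).
    """
    spam_freq = spam_hits / n_spam_msgs
    ham_freq = ham_hits / n_ham_msgs
    if spam_freq + ham_freq == 0:
        return 0.5  # never seen: no evidence either way
    return spam_freq / (spam_freq + ham_freq)


# Counts from the dump above; the corpus sizes are made-up placeholders.
p = token_spam_probability(spam_hits=6140, ham_hits=1, n_spam_msgs=50_000, n_ham_msgs=50_000)
print(round(p, 5))
```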

I gave up by Scratch-O-Matic · 2006-07-10 09:30 · Score: 4, Interesting

I recently gave up on tweaking filters for myself and a few dozen people whose accounts I administer. I wrote a little script that asks for confirmation from the sender...if the sender confirms, they are added to a whitelist and will go straight through after that. I can also add addresses manually to the whitelist, and will soon be able to have wildcard (domain-wide) approved addresses. I've gotten exactly two spam in 6 weeks...both were confirmed by either a person or an autoresponder. Five years ago I never would have wanted such a blunt system...nowadays it's just the ticket.
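The whitelist half of that setup, including the planned wildcard (domain-wide) entries, could be matched along these lines in Python. The "*@domain" entry format is an assumption, entries are assumed to be stored lowercase, and (as the replies point out) sender addresses are easily forged, so a whitelist keyed on the From address can be spoofed.

```python
def is_whitelisted(sender: str, whitelist: set[str]) -> bool:
    """True if the sender's address, or its whole domain, has been approved.

    Entries are lowercase full addresses ("alice@example.org") or
    domain-wide wildcards ("*@example.com").
    """
    sender = sender.strip().lower()
    if sender in whitelist:
        return True
    _, at, domain = sender.rpartition("@")
    return bool(at) and f"*@{domain}" in whitelist


approved = {"alice@example.org", "*@example.com"}
print(is_whitelisted("Alice@Example.org", approved))
print(is_whitelisted("bob@example.com", approved))
print(is_whitelisted("bob@example.net", approved))
```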

--

Evil is the money of root.

Re:I gave up by babaloo · 2006-07-10 10:09 · Score: 5, Interesting

I understand your frustration but I was the victim of a Joe Job attack and systems like you describe just add to the pain of the victim. I feel that these types of responses are just as unwelcome as spam and I report them as such. Have you had any issues like this?
Re:I gave up by CFrankBernard · 2006-07-10 10:28 · Score: 4, Interesting

I recommend joining the SPAM-L mailing list of 900+ email admins and asking for opinions on "challenge response" (C/R) spam-fighting systems. Sending a confirmation message to the alleged/purported sending address *is* spam when that address is spoofed/forged (quite common). The only way to be sure you're sending information back to the connecting email server is to do so /during/ the SMTP conversation.
Re:I gave up by rudedog · 2006-07-10 12:01 · Score: 4, Insightful

So it appears that you decided that the responsibility for fighting your spam should be moved onto the backs of everybody else on the Internet? Spam almost always comes from a forged sender. By doing this, you're just sending tons of spam to the forgery victims. Please do us and you a favor and google "challenge response harmful", and then turn off your C/R system.

More of the same; not a solution by mrheckman · 2006-07-10 09:33 · Score: 3, Interesting

The "immune system" solution is just another way to detect spam, but it is unlikely to be much more successful than existing methods. As someone else pointed out, SpamAssassin is pretty good already. So what if this new type of filter eventually improves the spam filtering accuracy from 98% to 99%? A more highly polished rock is still a rock.

The real problem is the sending of spam itself, and that problem arises from an inability to correctly attribute spam to the spammers. If we can do that, we can block it, or at least more easily convict the spammers who break the law. Things that solve this problem, like Yahoo!'s "DomainKeys", are the future of anti-spam, not more highly-polished rocks.

Re:More of the same; not a solution by Mean+Variance · 2006-07-10 09:56 · Score: 2, Interesting

Things that solve this problem, like Yahoo!'s "DomainKeys", are the future of anti-spam, not more highly-polished rocks.
DomainKeys, at least so far, is utter crap in my experience. I get small floods of spam into my Yahoo! mailbox, and what most of them have in common is that they are certified by DomainKeys. A couple of months ago, I was getting the exact same mortgage spam every day from different addresses. All were DK-certified.
For what it's worth, I do send off those specific emails to the abuse alias at Yahoo! Their canned emails state that they have dealt with the problem according to their TOS.
I don't know where the flaw lies, but it's there in DomainKeys.
Re:More of the same; not a solution by mrbobjoe · 2006-07-10 10:54 · Score: 1

So what if this new type of filter eventually improves the spam filtering accuracy from 98% to 99%?
Halving the number of errors? Sounds like a good deal.
Re:More of the same; not a solution by mrheckman · 2006-07-10 11:04 · Score: 1

Halving the number of errors is good, but that wouldn't stop my problems with spam. My chief objection to spam now is that there are still too many false positives -- things that show up in the spam box that should not -- so I still have to look through all of the hundreds of spam messages that arrive every day to find the few that are misclassified. Even cutting the number of false positives in half won't solve that problem. If, however, we could eliminate most of the spam, then I would have many fewer false positives and many fewer real spam messages to have to look through to find the false positives (I don't think we will ever eliminate false positives. Some people just label as "spam" anything they don't like. In systems that depend on feedback from users, such as Yahoo mail, that means that one person's valuable message is another's spam, which results in messages from one source sometimes being labeled as spam and other times not.)

Also consider that the net bandwidth is flooded with spam, which slows things down for everyone. Improving the filtering at your mailbox doesn't help with this either.
Re:More of the same; not a solution by Antique+Geekmeister · 2006-07-10 12:47 · Score: 1

In fact, such keys are currently strong signs that the ad is, in fact, spam. They're far too easy to buy or steal from other people's machines, often by installing spam zombie software on the machines of unsuspecting and innocent people.

fgdfg by Anonymous Coward · 2006-07-10 09:34 · Score: 0

oh snapz terminator coming soon D:

Obligatory HIV & AIDS reference by ElliotLee · 2006-07-10 09:35 · Score: 1

Now your spam filter can catch AIDS too. But don't ask how.

Re:Obligatory HIV & AIDS reference by Anonymous Coward · 2006-07-10 09:52 · Score: 0

Sometimes you just need the honesty and security of a whore.
Re:Obligatory HIV & AIDS reference by Anonymous Coward · 2006-07-10 10:43 · Score: 0

Sometimes you just need the honesty and security of a whore.
Well, my spam filter has achieved the honesty and security of a politician. God willing, one day it will reach that of a lawyer. And some future generation may indeed be able to achieve the fabled honesty and security level of a whore, but I won't guarantee it. :-P
Re:Obligatory HIV & AIDS reference by stonecypher · 2006-07-10 14:16 · Score: 1

One supposes it's from the porn and penis enlarging cream, though one is led to wonder whether smacking the monkey for an iPod is a disease vector.

--
StoneCypher is Full of BS

I'm waiting... by darkrowan · 2006-07-10 09:35 · Score: 1

I'm waiting for the day when we see our first email 'virus'. Something not unlike what happens with real viruses. Then we'd need antibodies similar to this.

--
AccountKiller

Re:I'm waiting... by stonecypher · 2006-07-10 12:55 · Score: 1

Yeah, hi. 1992 is on the phone. They said you need to shut off the portal, because their power bill is stratospheric. (Either that, or this is the subtlest Keanu Reeves joke 3v4r.)

--
StoneCypher is Full of BS
Re:I'm waiting... by not-admin · 2006-07-10 13:17 · Score: 1

You won't have to wait long, nay, you won't have to wait at all.

There have been e-mail "virii" around for a long time, one of the most famous being the Bill Gates Quick Cash. Don't think that all viruses require an attachment.

From their paper, nothing. by khasim · 2006-07-10 09:35 · Score: 1

They claim to be as accurate as a Bayesian process, but with fewer check items.

But from their paper, it seems that they're "tuning" their check items to the corpus of spam that they're testing against.

So of course they will use fewer check items. There are a finite number of characteristics of that corpus.

I did not see where they were using their system in a Real World environment (I may have missed it; the article was pretty painful to read). Now, if they can do as well as a fully tuned SpamAssassin system (comparable true positives, true negatives, false positives and false negatives) in a Real World environment, with fewer check items, then they MAY be on to something.
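A fair comparison means running both filters over the same real-world mail (not the corpus they were tuned on) and tallying the four outcomes listed above. A small Python sketch of that bookkeeping; the sample results are invented.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int = 0  # spam correctly flagged
    tn: int = 0  # ham correctly delivered
    fp: int = 0  # ham wrongly flagged as spam
    fn: int = 0  # spam that slipped through

    def add(self, predicted_spam: bool, actually_spam: bool) -> None:
        """Record one classified message in the matching cell."""
        if predicted_spam and actually_spam:
            self.tp += 1
        elif predicted_spam:
            self.fp += 1
        elif actually_spam:
            self.fn += 1
        else:
            self.tn += 1

    def precision(self) -> float:
        """Of everything flagged as spam, how much really was spam."""
        flagged = self.tp + self.fp
        return self.tp / flagged if flagged else 0.0

    def recall(self) -> float:
        """Of all the real spam, how much was caught."""
        spam = self.tp + self.fn
        return self.tp / spam if spam else 0.0


# (predicted_spam, actually_spam) pairs from a hypothetical test run
results = [(True, True), (True, False), (False, True), (False, False), (True, True)]
c = Confusion()
for predicted, actual in results:
    c.add(predicted, actual)
print(c, round(c.precision(), 3), round(c.recall(), 3))
```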

Useless -- solves a non-problem (performance) by CurtMonash · 2006-07-10 09:37 · Score: 2, Interesting

I have two major objections to this idea, and to the article that presents it.

1. The ONLY problem this solves is performance -- i.e., processing throughput. And that's not what's wrong with anti-spam systems today. They live and die on the precision/accuracy tradeoff, or maybe on UI.

2. The authors seem to assume that Bayesian systems work really, really well. While technically most or all current spam-filtering products are Bayesian in some sense, that still speaks of considerable naivete about real-world spam.

--
To err is human. To forgive is good system design.

The easiest way to eliminate most spam ..... by travisco_nabisco · 2006-07-10 09:38 · Score: 2, Insightful

I just had a thought about spelling while reading about spam filters, so I went and looked in my spam folder and found that every piece of spam has many, many words that are not in a dictionary, i.e. not spelled correctly.

Why not run a script that filters messages based on spelling? If there are more than 'xx' words that do not exist in the dictionary you choose to use, the message gets sent to the spam folder. This would also catch the odd email from friends who don't know how to spell or what a spell checker is, but you should notice those when you clean out your spam folder.
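A minimal Python sketch of that spelling check. The tiny dictionary and the threshold of 3 are placeholders; in practice the dictionary might be loaded from a system word list, and as the replies note, multiple languages would need multiple dictionaries.

```python
import re


def too_many_unknown_words(body: str, dictionary: set[str], max_unknown: int) -> bool:
    """True if more than max_unknown words of the message are not in the dictionary."""
    words = re.findall(r"\w+", body.lower())
    unknown = sum(1 for word in words if word not in dictionary)
    return unknown > max_unknown


# A toy dictionary; a real one might be loaded from a system word list.
dictionary = {"meet", "for", "lunch", "on", "friday", "savings"}
print(too_many_unknown_words("Meet for lunch on Friday?", dictionary, max_unknown=3))
print(too_many_unknown_words("B1G s4vings on v1agra n0w", dictionary, max_unknown=3))
```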

Re:The easiest way to eliminate most spam ..... by Cisko+Kid · 2006-07-10 10:03 · Score: 0

Because I suck at spelling and many people I know suck at spelling. Hoked on fonix werked fer me.

Seriously, the spammers will adapt no matter what anti-spam tactics you use.

--
I may not have gone where I intended to go, but I think I have ended up where I needed to be.- Douglas Adams
Re:The easiest way to eliminate most spam ..... by Senzei · 2006-07-10 10:11 · Score: 1

Generally, techniques like that are not used because false positives are much more disastrous than false negatives. Accidentally allowing a couple of spam messages to creep into the regular mail is not that big a deal; deleting a reply asking you in for a job interview because it was miscategorized is. Most spam detection systems have to walk a fine line between doing their job and not hosing somebody's mail. That said, the systems could be set up so that misspellings add weight toward the decision to categorize a message as spam.

--
Slashdot: Where anecdotes and generalizations can be freely substituted for facts, logic, or intelligence
Re:The easiest way to eliminate most spam ..... by CFrankBernard · 2006-07-10 10:34 · Score: 1

To avoid false positives, I recommend using a regex generator for spamvertized variations of common spam terms.
See http://public.kvalley.com/regex/regex.asp
For example, to allow viagra but detect most of its spamvertized variations:
(?!viagra)(([v])|(\\\W{0,2}\/))[i1l\|\\\/!îíìï:;](([a@àáâãäå^æ])|(\/\W{0,2}\\))[gqp96][r](([a@àáâãäå^æ])|(\/\W{0,2}\\))
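One way to sanity-check that pattern is to run it through Python's `re` module. Here it is copied as a raw string with the stray spaces removed; compiling it with `re.IGNORECASE` is an assumption, since the post doesn't say which flags the generator expects.

```python
import re

# The pattern from the post; the leading (?!viagra) lets the plain word through,
# while \\ matches a literal backslash so "\/" can stand in for "v".
SPAMVERTIZED_VIAGRA = re.compile(
    r"(?!viagra)(([v])|(\\\W{0,2}\/))[i1l\|\\\/!îíìï:;]"
    r"(([a@àáâãäå^æ])|(\/\W{0,2}\\))[gqp96][r]"
    r"(([a@àáâãäå^æ])|(\/\W{0,2}\\))",
    re.IGNORECASE,
)

for sample in ["viagra", "v1agra", "vi@gra", "\\/iagra", "VIAGRA"]:
    print(f"{sample!r}: {'flagged' if SPAMVERTIZED_VIAGRA.search(sample) else 'allowed'}")
```

The plain spelling is allowed in either case, while the obfuscated variants are flagged.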
Re:The easiest way to eliminate most spam ..... by cyber-dragon.net · 2006-07-10 11:14 · Score: 1

An interesting idea... but you would need to allow for multiple dictionaries. I commonly get email in English, American, French and Japanese every day. And before anyone flames me, I -do- make a distinction between English and American. They are spelled and pronounced differently, so when discussing dictionaries they ARE different.

As another responder pointed out... perhaps this could be used in some form of "weight" calculation. I would think counting special characters and standalone single characters (barring I and a) would carry just as much "weight", however.
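A minimal Python sketch of that weighting idea, turning special characters and stray single-letter "words" into a score that would be added to a filter's other evidence rather than used alone. Which punctuation counts as "normal" and both weights are arbitrary placeholders.

```python
import re


def suspicious_feature_score(body: str,
                             special_weight: float = 0.5,
                             single_char_weight: float = 1.0) -> float:
    """Weighted count of special characters and stray single-character words.

    Ordinary punctuation (. , ! ? ' " -) is not counted, and the legitimate
    one-letter words "I" and "a" are ignored.
    """
    specials = len(re.findall(r"[^\w\s.,!?'\"-]", body))
    singles = [w for w in re.findall(r"\b\w\b", body) if w.lower() not in {"i", "a"}]
    return special_weight * specials + single_char_weight * len(singles)


print(suspicious_feature_score("Lunch at noon?"))
print(suspicious_feature_score("V i a g r a $$$"))
```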

The proposed system I liked better was requiring mail servers to be "registered", with every incoming email checked against its claimed registration and the IP it came from. Thus any email sent via a bot from a DSL line is automatically thrown out. If it is "legit" spam, you have a record in the header of who sent it and can track them down.
Re:The easiest way to eliminate most spam ..... by dhasenan · 2006-07-10 11:15 · Score: 2, Insightful

Do you actually WANT to interview a job applicant who can't spell 20 words in a 150-word email?
Re:The easiest way to eliminate most spam ..... by Senzei · 2006-07-11 04:21 · Score: 1

I was actually talking about getting a callback on a resume for an interview, but the point may still hold there as well.

--
Slashdot: Where anecdotes and generalizations can be freely substituted for facts, logic, or intelligence
Re:The easiest way to eliminate most spam ..... by iamcf13 · 2006-07-12 13:32 · Score: 1

Seriously, the spammers will adapt no matter what anti-spam tactics you use.

They 'cannot' beat the filtering I use now...

Not long ago, I added a form of RBL support to a personal copy of my homebrew Windows email client freebie, and the results were 'amazing'....

Essentially NO spam gets through now!

Recently, one got through so I spent a few minutes to take care of it.

The only drawback to using an RBL is that it can be inaccurate if an innocent party starts using a blacklisted IP. But in the real world, due to laziness, inertia, and corporate indifference, that is quite unlikely.... :P
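For the curious, RBL support in a mail client usually boils down to a DNS lookup: reverse the sender IP's octets, append the blocklist's zone, and see whether the name resolves. A Python sketch of that convention, where `dnsbl.example.org` is a placeholder zone (real lists have their own zones and usage policies):

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str) -> str:
    """DNS name for looking up an IPv4 address in a DNS blocklist zone.

    The address's octets are reversed and the zone is appended, so
    192.0.2.99 checked against dnsbl.example.org becomes
    99.2.0.192.dnsbl.example.org.
    """
    addr = ipaddress.IPv4Address(ip)  # raises ValueError for malformed input
    return ".".join(reversed(str(addr).split("."))) + "." + zone


def is_listed(ip: str, zone: str) -> bool:
    """A listed address resolves (usually to 127.0.0.x); otherwise lookup fails."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        return False


print(dnsbl_query_name("192.0.2.99", "dnsbl.example.org"))
```

`is_listed` does a live DNS query, so it is left uncalled here; the name-building step is the part worth checking offline.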

The 2nd half of my approach uses a few rules that simply take away the ASCII characters a spammer is likely to use in their message. I patently refuse such email at iamcf13@hotpop.com so they get deleted immediately.

All the spammers are doing is wasting the small amount of time and computing resources it takes me to get my email with this 'updated' program. But usually I am doing something else of importance at the time so presumably no time is wasted at all...

My approach is 'transparent' with the current email system and could be useful -- nowadays there is talk of replacing the current, spammed out system for something else -- a likely far remote possibility....

Food for thought.... :)

P.S. Shout/mod me down if you want, but you have to admit Bayesian spam filtering is just not working anymore. Challenge/response is cumbersome, considered 'bad manners' by some, and can generate more unwanted email messages. How about giving a different approach such as mine a try?....

Modelling Nature by A+Dafa+Disciple · 2006-07-10 09:39 · Score: 3, Interesting

Your post advocates a

(x) technical ( ) legislative ( ) market-based ( ) vigilante

approach to fighting spam. Your idea will not work. Here is why it won't work. (One or more of the following may apply to your particular idea, and it may have other flaws which used to vary from state to state before a bad federal law was passed.)

( ) Spammers can easily use it to harvest email addresses
( ) Mailing lists and other legitimate email uses would be affected
( ) No one will be able to find the guy or collect the money
( ) It is defenseless against brute force attacks
( ) It will stop spam for two weeks and then we'll be stuck with it
(x) An enormous amount of spam will initially go undetected before your idea is effective
( ) Users of email will not put up with it
( ) Microsoft will not put up with it
( ) The police will not put up with it
(x) Your idea proposes a solution that only large corporations could deploy
( ) Requires too much cooperation from spammers
( ) Requires immediate total cooperation from everybody at once
( ) Many email users cannot afford to lose business or alienate potential employers
( ) Spammers don't care about invalid addresses in their lists
( ) Anyone could anonymously destroy anyone else's career or business

Specifically, your plan fails to account for

( ) Laws expressly prohibiting it
( ) Lack of centrally controlling authority for email
( ) Open relays in foreign countries
( ) Ease of searching tiny alphanumeric address space of all email addresses
( ) Asshats
( ) Jurisdictional problems
( ) Unpopularity of weird new taxes
( ) Public reluctance to accept weird new forms of money
( ) Huge existing software investment in SMTP
( ) Susceptibility of protocols other than SMTP to attack
( ) Willingness of users to install OS patches received by email
( ) Armies of worm riddled broadband-connected Windows boxes
( ) Eternal arms race involved in all filtering approaches
( ) Extreme profitability of spam
( ) Joe jobs and/or identity theft
( ) Technically illiterate politicians
( ) Extreme stupidity on the part of people who do business with spammers
( ) Dishonesty on the part of spammers themselves
( ) Bandwidth costs that are unaffected by client filtering
(x) The large amount of resources needed for implementation of your idea that small companies don't have
( ) Outlook

and the following philosophical objections may also apply:

( ) Ideas similar to yours are easy to come up with, yet none have ever been shown practical
( ) Any scheme based on opt-out is unacceptable
( ) SMTP headers should not be the subject of legislation
( ) Blacklists suck
( ) Whitelists suck
( ) We should be able to talk about Viagra without being censored
(x) Your solution is nothing more than a conceptual remanifestation of a solution that already exists
( ) Countermeasures should not involve wire fraud or credit card fraud
( ) Countermeasures should not involve sabotage of public networks
( ) Countermeasures must work if phased in gradually
( ) Sending email should be free
( ) Why should we have to trust you and your servers?
( ) Incompatibility with open source or open source licenses
( ) Feel-good measures do nothing to solve the problem
( ) Temporary/one-time email addresses are cumbersome
( ) I don't want the government reading my email
( ) Killing them that way is not slow and painful enough

Furthermore, this is what I think about you:

(x) I think it is a creative concept, but there is no need to reinvent the wheel.
( ) Sorry dude, but I don't think it would work.
( ) This is a stupid idea, and you're a stupid person for suggesting it.
( ) Nice try, assh0le! I'm going to find out where you live and burn your house down!

--

Falun Dafa is good!

Re:Modelling Nature by Anonymous Coward · 2006-07-10 10:12 · Score: 0

In what way is the parent post attempting to troll? (i.e. elicit discussion to take the conversation offtopic or engage in flamewars, etc.) Get your facts straight, you moderating morons!

BTW, so you don't screw this up, this post should be modded (-1 Offtopic).
Re:Modelling Nature by Anonymous Coward · 2006-07-10 10:39 · Score: 0

The post is technically on topic, so offtopic would be dead wrong.

This post does, however, fit into the standard definition of crapflooding which is considered by most people to be a form of trolling.

I would have also accepted a moderation of Redundant, as you get a couple of these every single time spam is mentioned in a slashdot article.

An attempt to "engage in a flamewar" would be classified more directly as "Flamebait" which is, in itself, a form of trolling.
Re:Modelling Nature by mrheckman · 2006-07-10 10:54 · Score: 1

Furthermore, this is what I think about you:

(x) Brilliant!
( ) I think it is a creative concept, but there is no need to reinvent the wheel.
( ) Sorry dude, but I don't think it would work.
( ) This is a stupid idea, and you're a stupid person for suggesting it.
( ) Nice try, assh0le! I'm going to find out where you live and burn your house down!
Re:Modelling Nature by Anonymous Coward · 2006-07-10 11:42 · Score: 0

> (x) Brilliant!

Paula, is that you?
Re:Modelling Nature by Anonymous Coward · 2006-07-10 11:47 · Score: 0

The first AC's reply didn't say that the parent post was offtopic; it said that it, itself, was offtopic. A misunderstanding on your part, I think (not that it matters). I also think the original post in question isn't redundant, because the scope of redundancy should be limited to the article it was posted in reply to. No one had yet posted one of those types of messages for this article.

Still addressing the symptom, not the root by Lead+Butthead · 2006-07-10 09:40 · Score: 1

Inflict heavy fine on people buying spamvertised products and execute spammers. Only then can spam be stopped for good.

--
ELOI, ELOI, LAMA SABACHTHANI!?

SpamAssassin does "decay" them. by khasim · 2006-07-10 09:40 · Score: 2, Informative

Look up "bayes_expiry_max_db_size". If your database gets larger than the limit you set then the lesser used tokens are deleted.
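A simplified illustration of size-capped token expiry, assuming the store evicts the least recently used tokens once it grows past a cap. SpamAssassin's actual expiry logic differs in detail; the class and parameter names here are hypothetical and only mirror the idea behind `bayes_expiry_max_db_size`.

```python
from collections import OrderedDict

class TokenDB:
    """Hypothetical Bayes token store: token -> [spam_count, ham_count].

    Once the store grows past max_size, the least recently used tokens are
    evicted until only prune_to remain.
    """

    def __init__(self, max_size: int, prune_to: int):
        if not 0 < prune_to <= max_size:
            raise ValueError("need 0 < prune_to <= max_size")
        self.max_size = max_size
        self.prune_to = prune_to
        self.tokens = OrderedDict()  # oldest use first, newest use last

    def learn(self, words, is_spam: bool) -> None:
        for w in dict.fromkeys(words):  # de-duplicate, keep order
            counts = self.tokens.setdefault(w, [0, 0])
            counts[0 if is_spam else 1] += 1
            self.tokens.move_to_end(w)  # mark as most recently used
        if len(self.tokens) > self.max_size:
            while len(self.tokens) > self.prune_to:
                self.tokens.popitem(last=False)  # drop least recently used

    def lookup(self, word):
        counts = self.tokens.get(word)
        if counts is not None:
            self.tokens.move_to_end(word)  # a lookup counts as a use
        return counts

db = TokenDB(max_size=4, prune_to=3)
db.learn(["cheap", "pills"], is_spam=True)
db.learn(["lunch", "friday"], is_spam=False)
db.lookup("cheap")                    # "cheap" is now the freshest token
db.learn(["meeting"], is_spam=False)  # 5 tokens > 4, so prune back to 3
print(list(db.tokens))                # → ['friday', 'cheap', 'meeting']
```

Pruning to a level below the cap, rather than exactly to it, avoids running an expiry pass on every new token once the database is full.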

Abysmal results by gvc · 2006-07-10 09:40 · Score: 4, Interesting

More specifically, it correctly classifies 84% of spam and 98% of non-spam.

The authors used the SpamAssassin corpus. Holden shows that, on the SpamAssassin corpus, Bogofilter correctly classifies 90.3% of spam and 99.88% of non-spam. See http://sam.holden.id.au/writings/spam2/
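To put the two sets of figures side by side as error rates: the numbers below are only the percentages quoted above, and the script is plain arithmetic on them, not a re-run of either experiment.

```python
# Percentages quoted above (correctly classified spam / non-spam).
results = {
    "AIS (TFA)": {"spam": 84.0, "ham": 98.0},
    "Bogofilter (Holden)": {"spam": 90.3, "ham": 99.88},
}

for name, r in results.items():
    miss = 100 - r["spam"]       # spam that slips through
    false_pos = 100 - r["ham"]   # legitimate mail flagged as spam
    print(f"{name}: misses {miss:.1f}% of spam, flags {false_pos:.2f}% of ham")

# How many times more legitimate mail the AIS approach misclassifies.
ratio = (100 - 98.0) / (100 - 99.88)
print(f"false-positive ratio: {ratio:.1f}x")
```

Put per 1,000 legitimate messages, the quoted rates amount to about 20 lost versus about 1.2, which is the gap that matters most to users.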

This approach is nowhere near state of the art.

Death is too good for them by hellfire · 2006-07-10 09:43 · Score: 1

Any good programmer worth their salt would have programmed this to cut out their tongue, cut off their fingers one by one, slice off their eyelids and force them to watch "Bio-Dome" 5 times in succession.

I want those fuckers to live painfully, dammit, just like the rest of us do when we have too much spam.

--

"All great wisdom is contained in .signature files"

Re:Death is too good for them by RsG · 2006-07-10 09:48 · Score: 1

Dammit, that goes too far! You're a cruel human being. I wouldn't subject a dog to that level of torture, much less a human.

In the name of human rights, they should not be forced to watch Bio-Dome any more than twice! :-P

--
Erotic is when you use a feather. Exotic is when you use the whole chicken.

Clone by Anonymous Coward · 2006-07-10 09:46 · Score: 0

Sounds like a genetically modified clone of Bayes :-)

Sounds cool, but... by fm6 · 2006-07-10 09:47 · Score: 1

Has anybody stopped to think that the human immune system is a little less than perfect? It doesn't stop all diseases, not by a long shot. And sometimes it creates illness, as anybody with hay fever (or multiple sclerosis) will testify.

TOANTFOITOWTBS by spun · 2006-07-10 09:49 · Score: 1

Take off and nuke them from orbit. It's the only way to be sure.

--
- None can love freedom heartily, but good men; the rest love not freedom, but license. -- John Milton

Re:TOANTFOITOWTBS by Antique+Geekmeister · 2006-07-10 12:55 · Score: 1

That's what finally knocked Cyberpromo off the air: not the lawsuits from other abused companies, not the out-of-court settlements they made with AOL and other victims of their spam, not the incensed public, but a bunch of irritated script kiddies who knocked down the router connection sold to them by Agis and kept it off the air.

Eventually the peasants will revolt.

no more biological metaphors.... by illuminatedwax · 2006-07-10 09:53 · Score: 5, Insightful

I'm seriously sick of people abusing biological methodologies. People seem very attracted to ideas simply because they are grounded in "how nature works" and ignore the mathematical benefits or weaknesses. Now this idea pretty much just sounds like statistical rules based on a corpus, which is pretty much how every successful solution out there now works. This solution simply prunes rules that aren't being used, but there are better ways to get a smaller spam detection database. Have you seen the stuff the CRM114 people are doing? This is nothing new.

Read your Russell and Norvig, people. Airplane research didn't get off the ground (ugh) until we stopped trying to mimic birds and study physical principles of flight.

--
Did you ever notice that *nix doesn't even cover Linux?

Re:no more biological metaphors.... by EMiniShark · 2006-07-10 11:13 · Score: 1

Read your Holland and Koza. Evolutionary computing (and others: neural nets, cellular automata, ...) has a wide array of successful applications. Dismissing this work just because it's biologically inspired is inappropriate and counter-productive to science.

And just so you know, the AIS community is absolutely not ignoring fundamental questions of complexity and mathematical weaknesses. I met one of the authors at ICARIS 05, and her presentation of this work was cautious, qualified, and thorough.

It's reasonable to argue that the spam filtering mechanics of this work aren't novel. But to attack the practice of biologically-inspired computing because of this paper is just overreacting.
Re:no more biological metaphors.... by illuminatedwax · 2006-07-10 12:54 · Score: 1

There are a million other biological ideas we could borrow, and other biological ideas we could borrow in radically different ways, but we don't because they don't work. Those ideas that do work may have been inspired by biological phenomena, but other than that they do little better than provide a good analogy. In this case, they aren't doing anything different and it is only considered interesting because they thought of a good analogy for it. Nothing works because it is based on biological phenomena. I realize that much of AI research is well grounded in mathematical theory (heck, my master's advisor does stuff with COLT and the like), but many students, Slashdotters, and a few researchers still have a romantic kind of view of "intelligence" and "living computers" or whatever. So basically my comment was directed more at Slashdot than the research group. Hell, I didn't RTFA, so they could be doing Serious Research.

The flip side to what you are saying of course is that accepting a work just because it is biologically inspired is also inappropriate and counter-productive to science.

--
Did you ever notice that *nix doesn't even cover Linux?
Re:no more biological metaphors.... by Illserve · 2006-07-10 13:14 · Score: 1

Normally I would agree with you. A great deal of crappy research gets hyped up because of an inappropriate analogy to biology... but this isn't one of them.

Stopping "spam" is almost exactly the problem that our immune system has to deal with. It has to go through reams of data (i.e. every cell in your body) and figure out what is junk and what isn't, and it does this by learning through exposure to positive and negative examples. It's not perfect either; sometimes it goes berserk, producing false positives (autoimmune disorders).

There's a great deal to be learned from our immune system for the sake of solving the spam problem. Don't be so quick to dismiss it.... this time.
Re:no more biological metaphors.... by illuminatedwax · 2006-07-10 15:53 · Score: 1

Excellent analogy, but that's all there is. It might be inspiring, but this time the idea wasn't originally inspired by biology. These methods of filtering spam have been around for a long time.

In any case, the basic idea is simple: use a corpus of examples separated into classes to create an algorithm to decide if a new example is in a certain category. There are a million AI techniques to do this. What differs in each case are the details of what each part means.
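The recipe in the post above (a labelled corpus in, a decision rule out) can be illustrated with a multinomial naive Bayes classifier, one of those many techniques and the family most statistical spam filters descend from. This is a toy sketch with add-one smoothing and a four-message training set; it is not the method from TFA.

```python
import math
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z0-9']+", text.lower())

class NaiveBayes:
    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = Counter()

    def train(self, text, label):
        self.word_counts[label].update(tokenize(text))
        self.doc_counts[label] += 1

    def classify(self, text):
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for label, counts in self.word_counts.items():
            # log P(label) + sum of log P(word | label), add-one smoothed
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(counts.values()) + len(vocab)
            for w in tokenize(text):
                score += math.log((counts[w] + 1) / denom)
            scores[label] = score
        return max(scores, key=scores.get)

nb = NaiveBayes()
nb.train("cheap pills buy now", "spam")
nb.train("win money now click here", "spam")
nb.train("lunch on friday?", "ham")
nb.train("draft of the report attached", "ham")
print(nb.classify("buy cheap pills"))
print(nb.classify("report for friday lunch"))
```

Working in log space avoids the numeric underflow that multiplying many small probabilities would cause on real-length messages.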

The immune system analogy is flawed in its details anyway. For example, in the human body, antibodies are created more like a genetic algorithm: there are a few families of them and they recombine randomly and float around the body. (Those harmful to the body are never let out.) Those that find a matching host invader protein are then made to reproduce. Should we implement a similar strategy for spam? Probably not, or at least my intuition says that this method does not work as well with spam as some of the very, very successful strategies that we have now, especially since most of these converge very quickly. A GA-style approach seems like it would take a long time.
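For anyone curious what such a GA-style filter would even look like, here is a toy version of the mechanism described above: candidate detectors are assembled at random from a "gene library" (here, pairs of words seen in training), any candidate that matches legitimate mail is discarded before release (negative selection, the "never let out" step), and detectors that match spam are cloned (clonal selection). All names and parameters are hypothetical; this illustrates the biology-inspired scheme, not TFA's algorithm.

```python
import random

def words(msg):
    return set(msg.lower().split())

def matches(detector, msg):
    """A detector is a pair of words; it fires if both occur in msg."""
    a, b = detector
    ws = words(msg)
    return a in ws and b in ws

def generate_detectors(ham, spam, n_candidates=500, seed=0):
    rng = random.Random(seed)
    # "Gene library": every word seen in training, sorted for reproducibility.
    library = sorted(set().union(*(words(m) for m in ham + spam)))
    population = {}  # detector -> number of clones
    for _ in range(n_candidates):
        det = tuple(sorted(rng.sample(library, 2)))  # random recombination
        # Negative selection: anything that reacts to "self" (ham) is never let out.
        if any(matches(det, m) for m in ham):
            continue
        # Clonal selection: each spam the detector binds to earns it a clone.
        hits = sum(matches(det, m) for m in spam)
        if hits:
            population[det] = population.get(det, 0) + hits
    return population

def is_spam(msg, population):
    return any(matches(det, msg) for det in population)

ham = ["lunch on friday", "see the report", "friday meeting moved"]
spam = ["cheap pills now", "buy cheap pills", "cheap watches now"]
pop = generate_detectors(ham, spam)
print(sorted(pop.items(), key=lambda kv: -kv[1])[:3])  # most-cloned detectors
print(is_spam("get cheap pills today", pop))
```

The slowness the post predicts is visible even here: a useful detector exists only if random recombination happens to produce it, whereas a token-counting filter picks up every spam word from a single pass over the corpus.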

--
Did you ever notice that *nix doesn't even cover Linux?
Re:no more biological metaphors.... by agquarx · 2006-07-14 02:10 · Score: 1

By nature of things and how our mind relates to them by symbolic computation it is natural for people to use meta languages (and higher levels of reflexion) based in the affluent nature of things that are. This comment was provoked by your silly (do not take it as personal offense, please) sentence with an inflammatory derivation mentioning the process of development of machines capable of controlled flight wytch are possessing a feature to execute that capability without the need to decrease their density and or mass (too much phenethylamines in my bloodstream...) as a condition for practical validity of their isness in context of the article commented. This has nothing to do with extremely important phenomena of immune systems, keeping alive (defined as being online (running) and able to communicate or kick the bucket and throw the sponge and actually hit someone with aforementioned objects) not only you and me, dear Reader (I love you too, Google), but every metastable system (oh, say the Atlantic Ocean as a living entity, generalizing that to Ea (Earth) itself; AI wouldn't be surprised if our Solar System had an immune system or as a hole in a whole, all of that wytch is (The Universe is a she, BTW ;-} - her name is Luka (Suzanne Vega told me so)). As a mad scientist I am convinced that the problem can be solved by applied memetics and enough smoke and mirrors collaborating in life-within (if you find yourself confused while trying to think that out of your enchanted mind, AI have something simple for that affliction - count (the operator on the left side can be skipped as most of Readers are rather unable to forget the correct and amusing number, go figure...) how many fingers you have on one hand and how many hands you need to create an applause for the most of the pictures of naked beauty of Luka taken with humble telescopes... My iPod is reminding us we are recursing and repeating ourselves by randomly stopping generation of { this } message using a song "International Dateline" by Ladytron from album "Witching Hour". It is on topic, read the lyrics if you do not believe Alien Intelligence). .A., Imperial Space Command.

--
I would like to meet you // In a timeless, placeless place // Somewhere out of cont

nice amazon refferer link by Anonymous Coward · 2006-07-10 09:55 · Score: 0

lol
an Amazon spammer talking about spam
if you want to paste links to help people try them, do it without sticking your stupid Amazon referrer code in there

Re:nice amazon refferer link by Anonymous Coward · 2006-07-10 10:22 · Score: 0

Hey Amazon troll... you realize that the link automatically puts in the referrer code when somebody who happens to be logged in searches for a title and then finds it? STFU, it's not people actually trying to get... whatever the fuck you think it is that Amazon gives them. People put the links in because it's a good website to find decent details on books/movies/music etc.

A new range of spam by Wierdy1024 · 2006-07-10 10:13 · Score: 1

Has anyone come across the newer spam ideas, where the spam message looks so much like a real message that I sometimes have to spend a good few minutes looking at it to see if it's genuine? They use your nickname, e.g. "Dear Bob", and end with the name of someone you know. They are usually about mundane things (e.g. "do you want to come to a party on Saturday?"), and the emails make good sense and have a suitable subject line. The only giveaway is that they all have a TinyURL link to the actual spam site, but how can I tell if a spammer is using TinyURL or if a friend of mine is using TinyURL? The annoying thing is each email has a unique TinyURL, so by clicking on the link they know it's an active address, and I made the mistake of clicking on the first one I got.
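One cheap signal for the pattern described above is whether a message's only links go through a URL shortener. The sketch assumes a hand-maintained, hypothetical list of shortener hostnames and does nothing more than extract link hosts. As the poster notes, a friend may use TinyURL too, so this can only ever be one weighted hint, never a verdict.

```python
import re
from urllib.parse import urlparse

# Hypothetical, hand-maintained list of redirector hosts.
SHORTENERS = {"tinyurl.com", "bit.ly"}

URL_RE = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)

def link_hosts(body):
    """Lower-cased hostnames of all http(s) links, minus a leading 'www.'."""
    hosts = []
    for url in URL_RE.findall(body):
        host = (urlparse(url).hostname or "").lower()
        if host.startswith("www."):
            host = host[4:]
        hosts.append(host)
    return hosts

def only_shortened_links(body):
    """True if the message has links and every one goes through a shortener."""
    hosts = link_hosts(body)
    return bool(hosts) and all(h in SHORTENERS for h in hosts)

msg = "Dear Bob, party on Saturday? Details: http://tinyurl.com/abc123 See you, Alice"
print(only_shortened_links(msg))
```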

One thing that concerns me is how certain fields are filled in, for example my nickname and a friend's name at the bottom. Also, it seems to sometimes use my geographic location (nearest city, presumably from IP location), e.g. "Meet tomorrow in London, UK." I suspect the fields are filled in by some spyware on the PC reading previous emails and analysing them; all these emails appear on my VMware spyware/virus test machine. It's also possible the fields could be filled in by a hack of someone else's mailbox (mail server or PC), because as soon as they've got a mailbox full of email (including headers), they can auto-analyse it to find out nicknames etc. fairly reliably with a decent amount of mail.

Re:A new range of spam by gvc · 2006-07-10 10:32 · Score: 1

No I haven't. Unless you think I can't tell what's below from correspondence from somebody I know.

--

Hello .

I think we had correspondence a long time ago if it was not you I am sorry.
If it was I could not answer you because my Mozilla mail manager was down for a
long time and I could not fix it only with my friend's help I got the emails
address out for me ..:)
I hope it was whom we were corresponded with you are still interested, as I am,
though I realize much time has passed since then...
I really don't know where to start ....
Maybe you could tell me a little about yourself since I lost our early letters,
your appearance,age , hobbies, and are you still in the search?
If it was you I wrote to and you are interested to get to know me better, I have
a profile at :
http://www.im-waiting-4you.net/

Don't really know what else to say for now I hope this is the right address

Let me know if you are interested, And I hope
you won't run when you see my picture :-)

talk to you soon.....

Galinka
Re:A new range of spam by Antique+Geekmeister · 2006-07-10 12:20 · Score: 1

Many emails like that do not actually contain an ad or commercial message: they're email address probes, being sent by the million to gather email addresses, and often with a webbug (a one-pixel GIF in a URL) to track exactly which email address's HTML-reading client received the message.
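A web bug of the kind described above can be spotted mechanically. A minimal sketch using only Python's standard `html.parser`, assuming the tracking image declares itself as 1x1 (or 0x0) through `width`/`height` attributes; web bugs sized through CSS, or relying on the image file's own dimensions, slip past this check.

```python
from html.parser import HTMLParser

class WebBugFinder(HTMLParser):
    """Collect the src of <img> tags that declare themselves 0 or 1 pixel in size."""

    def __init__(self):
        super().__init__()
        self.bugs = []

    def handle_starttag(self, tag, attrs):
        # Also reached for self-closing <img ... /> via the default
        # handle_startendtag, which calls handle_starttag.
        if tag != "img":
            return
        a = dict(attrs)
        width = (a.get("width") or "").strip()
        height = (a.get("height") or "").strip()
        if width in ("0", "1") and height in ("0", "1"):
            self.bugs.append(a.get("src"))

def find_web_bugs(html):
    finder = WebBugFinder()
    finder.feed(html)
    finder.close()
    return finder.bugs

sample = ('<p>Hello .</p>'
          '<img src="http://tracker.example/p.gif?id=8f3a" width="1" height="1">'
          '<img src="logo.png" width="120" height="40">')
print(find_web_bugs(sample))
```

The per-recipient part of the trick is the query string (`id=8f3a` in the made-up example): each address gets a unique URL, so a single request to it confirms which address is live.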

Those valid email addresses are themselves highly saleable to spam companies, whether the company is even vaguely legitimate or not.

This article was published in 2004 by Anonymous Coward · 2006-07-10 10:29 · Score: 0

How is this even close to news?!

The first paragraph of TFA, even above the abstract:

"This article was published in Crossroads Magazine, November 2004 edition. It was supposed to be on their website, but since it no longer seems to be available, I have provided this copy for reference."

No wonder it's not even near the "state of the art"; maybe it was... back then.

/ AC

sounds like something he would say by bersl2 · 2006-07-10 10:34 · Score: 1

from the lymp0cty3z-narf-poit!-claire-said-the-laundry-wheel dept.

Pinky, if I could reach you I would hurt you.

"News" from 2004? by 44BSD · 2006-07-10 11:28 · Score: 1

Come on, guys.

^^ Mod Parent Up! by InakaBoyJoe · 2006-07-10 11:39 · Score: 1

Mod parent up. That was an awesome post.

And kind of ironic that the author slipped in some unsolicited politically motivated PR on the Falun Gong as part of his/her message.

Re:^^ Mod Parent Up! by Anonymous Coward · 2006-07-10 12:18 · Score: 0

That's not ironic.

In any event, if the poster practices Falun Dafa/Gong then that is their business, and if they would like to tell everyone that it isn't an evil cult as the Chinese authorities propagandize it to be and that it, instead, is a beneficial spiritual practice, then that is that poster's prerogative. Unlike China, the United States is a free country and posting that in a forum isn't going to result in the poster getting killed or tortured.

Besides, that quip wasn't part of the message, that was part of the poster's signature.
Re:^^ Mod Parent Up! by Anonymous Coward · 2006-07-10 12:53 · Score: 0

It's a pretty old canned joke. Google it.

Are we still doing this? by Anonymous Coward · 2006-07-10 12:21 · Score: 1, Insightful

Are we still on the message-filtering bandwagon? I know it was all the rage when we talked about it in 2000, but now it's 2006, and we've all had experience with it. Pattern-matching has been defeated, and it was an embarrassing defeat. This is usually a sign to those who proposed it that they should consider a career change. With the exception of those patterns that correspond to firewall rules blocking domains run by companies with names like "Megaultra Webcram Holdings, Inc", it's a dead issue.

The real issue I have is with those researchers and businesses that continue to push this cyber snake oil. It's getting to the point that e-mail is worthless, not because of the high volume of spam, but because easily confused pattern-matching blockers remove just enough messages to cause major problems for the rest of us. Here is why it's stupid, and should be stopped:

* While contaminated pattern-matching filters don't always block wanted messages, they remove just enough messages to cause doubt and frustration for my users, and for those on the other end of the loop. This leads to the network administrator (me) having to resolve each problem individually by sifting through the logs.

* Because the matched-messages are removed on the far end of the transaction, i.e. on the "client side", there's no indication of trouble, or even an error message (to the user or in the logs). Neither party understands where the message has gone, and this reinforces superstition. For years, I whined, teased and scolded to get the attention of the morons who were going gung-ho with client-end filtering for spam and viruses, but they just wouldn't listen.

* ISPs and other service providers have deployed these infernal filters everywhere, making a huge mess which I cannot resolve. It is next to impossible to politely explain the problem is theirs, without having their attention tossed amid a sea of techie jargon. They usually come away with the message, "it is your fault, not ours". I'm fed up dealing with the hostile confrontations that result.

I have a sneaking suspicion that the same morons who thought spam/virus filtering based on pattern-matching the 'From' line was brilliant are the same idiots responsible for the current crop of "security" dud-ware. Do I sound hostile? I am, and these charlatans can go shove it. At this point, I think only the "homeopathic remedy" market has more frauds than the computer industry.

I'm sorry, no matter how graceful the descriptions or the analogies, I will no longer accept content-based pattern-matching filters on e-mail. They have been proven horribly ineffective. Spam-filtering isn't rocket science, okay? First you block any SMTP traffic without a zone pointer, then block large chunks of addresses from underdeveloped countries based on message header sampling. From there, build up a list of UK, US, and Canadian spam-pushers based on their domain registrations. You'll eliminate most of it, and unless you communicate extensively with people in China, Bolivia, Russia or Brazil, you won't have to do much tuning.
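Reading "zone pointer" as a reverse-DNS (PTR) record (an interpretation; the post doesn't spell it out), the first step above amounts to refusing SMTP clients whose IP has no PTR name, or whose PTR name doesn't resolve back to the same IP (forward-confirmed reverse DNS). A sketch using the standard library resolver calls, with both lookups injectable so it can be tried offline. Expect it to reject some legitimate but badly configured servers as well.

```python
import socket

def fcrdns_ok(ip, reverse=socket.gethostbyaddr, forward=socket.gethostbyname_ex):
    """True if ip has a PTR name that resolves back to the same ip."""
    try:
        hostname, _aliases, _addrs = reverse(ip)
        _name, _aliases, forward_ips = forward(hostname)
    except OSError:  # socket.herror / socket.gaierror: missing record
        return False
    return ip in forward_ips

# Offline demo with stand-in lookup tables (documentation-range addresses).
PTR = {"192.0.2.10": "mail.example.com", "198.51.100.7": "mail.example.com"}
A = {"mail.example.com": ["192.0.2.10"]}

def fake_reverse(ip):
    if ip not in PTR:
        raise socket.herror("no PTR record")
    return PTR[ip], [], [ip]

def fake_forward(name):
    if name not in A:
        raise socket.gaierror("no A record")
    return name, [], A[name]

print(fcrdns_ok("192.0.2.10", fake_reverse, fake_forward))   # PTR matches
print(fcrdns_ok("198.51.100.7", fake_reverse, fake_forward)) # PTR points elsewhere
print(fcrdns_ok("203.0.113.5", fake_reverse, fake_forward))  # no PTR at all
```

Doing this at connection time is what gives the bandwidth and storage savings described in the follow-up post: the message body is never accepted.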

This is all incredibly stupid anyway. The solution to the spam problem is not a technological one, or a political one. It's an economic problem. The powers that be chose - in their infinite wisdom - to allocate huge blocks of addresses to largely underdeveloped nations based on populace, instead of demand. Most of these people don't have a network device, and won't have one in the foreseeable future. The value of these addresses is so ridiculously deflated, that they're worth close to nothing. Spammers have massive chunks of address space, and can cycle through millions of IPs before all of them are at risk of being blocked. Want it to stop? Charge a reasonable rate to pass the traffic through your country's network backbone.

Re:Are we still doing this? by illuminatedwax · 2006-07-10 16:01 · Score: 1

My Gmail account has an error rate of about 2 in 1000, or a 99.8% success rate. My Thunderbird email has a similar success rate. Speak for yourself, buddy, statistical filtering works.

--
Did you ever notice that *nix doesn't even cover Linux?
Re:Are we still doing this? by Anonymous Coward · 2006-07-10 16:39 · Score: 0

Actually, I'm speaking for myself and a couple hundred users... and a couple thousand recipients. I don't know about GMail, but the top of my sh*tlist is populated by Hotmail, Yahoo Mail, Thunderbird and Outlook Express. Either Hotmail or T-Bird is the worst, I can't decide which. It only took a little while for a properly "trained" T-Bird filter to get contaminated, and junk 17 legitimate (and highly important) messages inside of a workweek.

Connection blocking has given me much greater gains. It doesn't need special functionality from an e-mail client, it doesn't require user configuration, and it doesn't require "training" individual client installations every time the nature of the spam changes. By rejecting connections outright, it uses almost no network bandwidth, and no storage space on the server or the client machines.

Apparently, the type of spam you receive and your habits are not the same as those belonging to the typical shlub with a PC on his desk.
Re:Are we still doing this? by illuminatedwax · 2006-07-10 17:13 · Score: 1

OK, well, Hotmail blows. But that just means they are doing it wrong. (Plus, the conspiracy theorist in me says they don't want to filter spam.)

They aren't just doing pattern matching; it's more sophisticated than that. It is also adaptive. As Paul Graham said, you can defeat spammers this way because they rely on their message. Email clients can also use whitelisting techniques, among other things, to reduce or eliminate false positives. This can all be done behind the scenes, with user interaction limited to the initial training of spam and the discovery of false positives. We have the technology! No, as far as I can see, filtering hasn't been defeated, not nearly.

It works for my mother and all of her employees perfectly. A few questions: How are you training your Thunderbird install? What do you mean "contaminated?" And why the hell would you *delete* filtered spam immediately? The idea is to save that spam for a while (30 days is good) in a "Trash" or "Recycle Bin"(patent pending) just in case one gets through. Someone notifies you that you aren't responding, you dig it up, classify it, and your filter gets better. But if you spend long enough with a spam filter, filtering it correctly, you will generally not get false negatives.

There is a price to smart filtering: you have to spend time with it to train it correctly. If you train it wrong, you've got a huge problem on your hands. I've said it elsewhere in the comments: look at CRM114 to see how good this kind of filtering has become, and how quickly you don't have to worry about it. But I personally have never lost an important, critical email with Thunderbird or Gmail. Neither have I heard a single complaint about spam from any Gmail user.

I do however, agree with you that ISPs should not be filtering your spam for you. That just gets annoying. But rejecting spam from IP addresses is an idea that can only go so far: like you said, spammers have huge swaths of IP addresses, sometimes ones that are used by legit emailers. IPv6 is coming, which means even more IP addresses for you to block. Personally, I think client-side filtering is quickly becoming the superior spam solution - look at that "SPAM solution checklist" that someone else posted.

Personally, it's been a long time since I worried about spam.

--
Did you ever notice that *nix doesn't even cover Linux?
Re:Are we still doing this? by Anonymous Coward · 2006-07-11 05:36 · Score: 0

It works for my mother and all of her employees perfectly. A few questions: How are you training your Thunderbird install? What do you mean "contaminated?"
When the filter analyzes the content based on the statistic frequency of certain words and phrases, it can be contaminated by training it on junk mail containing large lists of words meant to throw it off. Sometimes they are random, sometimes they contain collections of terms specific to industry. There's also the issue of random insertion of garbage characters into words. There are probably a hundred-thousand different variants of 'Viagra' that the filter couldn't possibly recognize. I couldn't even think of a regular expression to get them all. We've got v1agra, Via.jGra, Vi,aGra0, etc.
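For the obfuscated spellings listed above, one regular expression is indeed hopeless, but a normalise-then-match pass covers many of them. The sketch lowercases the token, maps a few look-alike digits and symbols to letters (the mapping is an assumption), strips everything that is not a letter, and then accepts the token if the target word appears in order with at most a couple of extra letters inserted.

```python
import re

# Assumed look-alike substitutions; extend as new variants show up.
LOOKALIKES = str.maketrans({"1": "i", "!": "i", "|": "i", "0": "o",
                            "3": "e", "4": "a", "@": "a", "5": "s", "$": "s"})

def normalize(token):
    """Lowercase, undo look-alike substitutions, drop non-letters."""
    return re.sub(r"[^a-z]", "", token.lower().translate(LOOKALIKES))

def is_subsequence(target, s):
    it = iter(s)
    return all(ch in it for ch in target)  # each 'in' consumes the iterator

def looks_like(token, target, max_extra=2):
    t = normalize(token)
    return (len(target) <= len(t) <= len(target) + max_extra
            and is_subsequence(target, t))

for t in ["v1agra", "Via.jGra", "Vi,aGra0", "vagrant"]:
    print(t, looks_like(t, "viagra"))
```

The `max_extra` bound is what stops long, unrelated words that merely contain the target's letters in order from matching.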

And why the hell would you *delete* filtered spam immediately? The idea is to save that spam for a while (30 days is good) in a "Trash" or "Recycle Bin"(patent pending) just in case one gets through.
Egads, I don't delete them! But they are buried in a huge list of junk.

Someone notifies you that you aren't responding, you dig it up, classify it, and your filter gets better. But if you spend long enough with a spam filter, filtering it correctly, you will generally not get false negatives.
The assumption here is that I can look over everyone's shoulder to make sure they're doing it right. I can't. The most efficient method would be to take a large sample, train the filter on that, and distribute the settings to clients. That's quite an undertaking.

The notification part doesn't really work that way. If there is no response, the sender will likely interpret it as the recipient ignoring them. Deadlines often mean that my users do not have the luxury of playing e-mail tennis with unreliable delivery mechanisms. I'm glad it works correctly for you, that's at least one less frustrated user, but I need e-mail that works like the Post Office (your local Post Office quality may vary).
Re:Are we still doing this? by illuminatedwax · 2006-07-11 10:37 · Score: 1

When the filter analyzes content based on the statistical frequency of certain words and phrases, it can be contaminated by training it on junk mail containing large lists of words meant to throw it off. Sometimes the words are random; sometimes they are collections of terms specific to an industry. There's also the issue of garbage characters randomly inserted into words. There are probably a hundred thousand different variants of 'Viagra' that the filter couldn't possibly recognize. I couldn't even think of a regular expression to get them all. We've got v1agra, Via.jGra, Vi,aGra0, etc.

This just shows that you might be confused about how this is done. There are words that spammers have to use, and the way modern spam filters calculate "scores" would pick those words out and skip over the injected random garbage, because the garbage appears with roughly equal probability in normal conversation. Read Paul Graham's "A Plan for Spam"; he does a good job of explaining the basics behind filtering and addresses this very issue. Also realize, when you read the essay, that anti-spam techniques have gotten even better since.
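The scoring idea can be sketched roughly as follows. This is a simplified sketch loosely modeled on the approach in Graham's essay: doubled ham counts, a minimum-count cutoff, a 0.4 default for unseen tokens, and combining only the 15 most extreme token probabilities. The tokenizer, the tiny training corpora, and the function names are hypothetical.

```python
import re
from collections import Counter
from math import prod


def tokenize(text):
    # Hypothetical tokenizer: lowercase runs of letters, digits, $, ' and -.
    return re.findall(r"[a-z0-9$'-]+", text.lower())


def token_probs(spam_msgs, ham_msgs, min_count=5):
    """Per-token spam probability, loosely following 'A Plan for Spam'."""
    bad = Counter(t for m in spam_msgs for t in tokenize(m))
    good = Counter(t for m in ham_msgs for t in tokenize(m))
    nbad, ngood = len(spam_msgs), len(ham_msgs)  # both corpora must be non-empty
    probs = {}
    for tok in set(bad) | set(good):
        g, b = 2 * good[tok], bad[tok]  # ham counts doubled to bias against false positives
        if g + b < min_count:           # too rare to judge
            continue
        pb, pg = min(1.0, b / nbad), min(1.0, g / ngood)
        probs[tok] = max(0.01, min(0.99, pb / (pg + pb)))
    return probs


def spam_score(text, probs, n_interesting=15, unknown=0.4):
    ps = [probs.get(t, unknown) for t in set(tokenize(text))]
    # Only the tokens farthest from the neutral 0.5 are kept, so filler words
    # that are as common in ham as in spam (p near 0.5) have no say.
    ps = sorted(ps, key=lambda p: abs(p - 0.5), reverse=True)[:n_interesting]
    s, h = prod(ps), prod(1 - p for p in ps)
    return s / (s + h)


spam = ["buy viagra now"] * 5
ham = ["meeting notes attached now"] * 5
probs = token_probs(spam, ham)
print(probs["now"])  # 0.5: appears in every message of both corpora
print(spam_score("buy viagra " + " ".join(f"junk{i}" for i in range(40)), probs))
```

In the toy corpora, `now` appears in every message and gets a neutral 0.5. Padding the spam with 40 never-seen junk tokens still leaves the score above 0.9, because only a fixed number of the most extreme tokens are combined.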

It also sounded like you were deleting them since you were talking about them being sent to a netherworld from whence they could never return.

More than once I have heard "oh, maybe my spam filters got it." This is a concept which I am convinced will become common knowledge. Spam training is not difficult either; it is no more difficult than other complex computer concepts that are vital to everyday use.

--
Did you ever notice that *nix doesn't even cover Linux?

Augment this "immune system" with some by ScrewMaster · 2006-07-10 12:36 · Score: 1

.45 caliber penicillin, applied directly to the spammer's kneecaps.

--
The higher the technology, the sharper that two-edged sword.

junk science by m874t232 · 2006-07-10 13:05 · Score: 1

The idea of applying immune system models to spam and computer virus detection is old. Nobody has so far demonstrated that it is any better than a sound statistical approach, and this paper fails to do so as well. It's junk science.

Immune System Attacking Spammers by cyberscan · 2006-07-10 13:46 · Score: 3, Interesting

Here is a better idea: Blue Security was attacked and shut down because the Internet is septic. The germs (spammers) have taken over. The best way to win this is to take the profit out of spamming. This can be done in a way similar to how the body's T cells alert the rest of the immune system to how to attack a pathogen. A cryptographically signed spammer complaint (attack) file should be distributed via a peer-to-peer network protocol. This file is passed among complaint programs, which complain to a spammer's website each time they receive a spam advertising that website.

Like an immune system, this network of spam attack programs will have T cells: a small group of people who draw up the complaint instruction file. Whenever the pathogen (spammer) releases enough toxins (spam) into the body (Internet), the T cells (people who write the complaint instruction file) alert the immune cells (spam complaint programs) to the presence of the pathogen and tell them how to attack it (complain to the advertised website). The pathogen is overwhelmed by a quick immune response (high bandwidth usage resulting from many, many complaints).

When the cost of running a website surpasses the revenue earned from it, the website is shut down. When the cost of spamming or advertising via spam exceeds the income, spam stops. Blue Security was beginning to become successful. Too bad they bowed out.

So, I was thinking by ratboy666 · 2006-07-10 15:33 · Score: 1

How about a REAL IMMUNE SYSTEM anti-spam filter? I had a dream...

Here's how it works. I catch me a SPAMMER and have it tested. IFF it is allergic to a common item (ragweed, peanuts, shellfish, etc.), I keep it in the sub-basement. Otherwise, I release it back into the wild and catch me another.

Once a SPAMMER is acquired, I put it in a chair and provide food and water. SPAMMER is given a computer and internet access, and is also attached to an allergen device that delivers the substance SPAMMER is allergic to, in controllable quantities.

SPAMMER is given control of the COMPUTER INCOMING SPAM FILTER, and allowed to freely hack on the internet.

If SPAM is delivered, and identified by my userbase, the ALLERGEN DEVICE is activated, releasing a quantity of the ALLERGEN. If a period of time (settable) goes by WITHOUT identified SPAM, the ALLERGEN DEVICE is disabled, with a random delay in the system.

If the SPAMMER is able to capture two additional SPAMMERs, it is removed from service.

Ratboy

--
Just another "Cubible(sic) Joe" 2 17 3061

Give Them What They Want..... by IHC+Navistar · 2006-07-10 16:41 · Score: 0

Someone should set up an organization where a panel reviews submitted spam emails, and when an email is identified as spam, a program is activated that sends massive quantities of replies, essentially a DoS, to the spammer's computer. After getting bombarded with thousands of requests (that is what they wanted, right?) the hosting server will eventually crash and shut down. How can they complain when you gave them what they wanted? ----- Sig Sauer

--
Knowing Google's lust for data collection, the Soviet Union is still alive and well inside the psyche of Sergey Brin....

Greylisting? by Anonymous Coward · 2006-07-10 17:29 · Score: 0

This is a general question: how does a well-configured spam filter compare to simple greylisting?

Haven't been able to find any nice graphs that show a direct comparison.

You CANNOT stop spam by swordgeek · 2006-07-10 22:09 · Score: 1

First of all, you can't stop spam. Filtering will always be an imperfect arms race--we build a better filter, the spammers come up with a better way of circumventing it. It's a never-ending battle.

Secondly, you can't end spam. Too many companies rely on its existence for their business model to work.

The only way to stop spam is to stop the spammers from SENDING the stuff. However, if this happened, you would see a huge number of companies suffer and possibly go bankrupt. Sure, the organised crime groups behind it would suffer, but I'm thinking of the moderately legitimate companies: Symantec, Tumbleweed, Borderware, and the like make their money from spam and viruses. They cannot afford for these threats to go away! (Well, perhaps Symantec could survive now that they own Veritas.) Now consider the amount of network gear that has been sold and the bandwidth that is being consumed because of spam, and you realise that even the big gear vendors like Cisco and Nortel have a major stake in spam sticking around.

Zero tolerance of spam might have worked if we had started worldwide in 1996, but that chance is long gone. Furthermore, legislation won't work as long as there are 'safe haven' countries out there that will host spammers' gear.

The only potentially usable answer is true vigilantism--if spammers start consistently showing up dead, we might be able to reduce spam. Failing that, we can give up on email as a useful medium.

In other words, short of serial murders, the spammers have won.

--

"People who do stupid things with hazardous materials often die." -- Jim Davidson on alt.folklore.urban

a real solution by rs232 · 2006-07-10 22:16 · Score: 1

A real solution would be end-to-end authentication and encryption. I wonder why none of the supreme innovators have thought of this yet. But then again, the NoSuchAgency wouldn't be able to monitor our inboxes, and product vendors wouldn't be able to spam them.

--
davecb5620@gmail.com

I get (practically) no spam.... by DaveGuy · 2006-07-11 04:40 · Score: 1

I have two major filtering layers (perimeter & inbox). If the recipient is not known, it's spam, and gets temp-failed. If the sender is not known, it is likely spam, and can only send 1 message per second, or get temp-failed (otherwise, I allow several messages per second). I allow only 2 recipients per envelope (temp-fail overage). Whatever makes it through my perimeter filters gets to the second major layer (inbox). At this layer, if the sender is known, the message stays in the inbox; otherwise, it goes into a "new-contacts" folder. This inbox layer, of course, is fully at the discretion of the individual owner. The inbox owner can scan through this folder for legitimates or spam, report the spam to me (for specific blacklisting), or reply to the legitimates (and/or add them to their addressbook -- making them "known").
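The perimeter and inbox rules described above can be sketched as a few envelope-stage checks. This is a minimal sketch under stated assumptions, not DaveGuy's actual configuration. The `Perimeter` class, the `route` function, the `accept`/`tempfail` strings, and tracking the one-message-per-second limit per sender address are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field

ACCEPT, TEMPFAIL = "accept", "tempfail"  # a real MTA would answer 250 / a 4xx code


@dataclass
class Perimeter:
    """Hypothetical envelope-stage checks modeled on the rules described above."""
    known_recipients: set
    known_senders: set
    max_rcpts: int = 2              # recipients accepted per envelope
    unknown_interval: float = 1.0   # seconds between messages from one unknown sender
    last_accepted: dict = field(default_factory=dict)

    def check_mail_from(self, sender: str, now: float) -> str:
        if sender in self.known_senders:
            return ACCEPT  # known senders are not throttled in this sketch
        last = self.last_accepted.get(sender)
        if last is not None and now - last < self.unknown_interval:
            return TEMPFAIL  # a legitimate MTA will queue and retry later
        self.last_accepted[sender] = now
        return ACCEPT

    def check_rcpt(self, rcpt: str, already_accepted: int) -> str:
        if rcpt not in self.known_recipients:
            return TEMPFAIL  # unknown recipient: treated as spam, temp-failed
        if already_accepted >= self.max_rcpts:
            return TEMPFAIL  # overage: the sender retries these in a new envelope
        return ACCEPT


def route(sender: str, addressbook: set) -> str:
    """Inbox layer: known senders stay in the inbox; everyone else is set aside."""
    return "INBOX" if sender in addressbook else "new-contacts"
```

Returning a temporary failure rather than a rejection is the point of the design. A legitimate MTA queues the message and tries again, while a bot that moves on simply loses it.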

Spammers tend to use botnets, and botnets tend to go elsewhere when presented with a temp-fail. Legitimate MTAs keep trying automatically until the message is relayed, or times-out. Spammers tend to have lots of bad addresses; legitimates tend to have very few. Spammers tend to send to more than 2 recipients per envelope. For my environment, legitimates tend to send to only 1 or 2 recipients at a time, but even when they send to more, they keep going (yes, this causes me some extra work for the extra data portions that must be virus scanned) until they're done.

To "know the sender", I evaluate my outgoing mail logs for recipients my customers send to. This is NOT challenge-response. If I don't know you, and you're legitimate, your mail will come through on the first try -- it may just take a bit longer than if I know you already.
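Deriving the "known sender" list from outgoing mail might look like the sketch below. The one-line-per-delivery log format and the `known_senders_from_log` function are hypothetical. Real MTA logs are formatted differently and often record the sender and the recipient on separate lines.

```python
import re

# Hypothetical log format (an assumption, not any specific MTA's):
#   OUT from=<alice@example.com> to=<partner@example.org>
OUTGOING = re.compile(r"^OUT from=<([^>]*)> to=<([^>]+)>")


def known_senders_from_log(lines, local_domains):
    """Every address a local user has sent mail to becomes a 'known' sender."""
    known = set()
    for line in lines:
        m = OUTGOING.match(line)
        if not m:
            continue  # not an outgoing-delivery line
        sender, rcpt = m.group(1).lower(), m.group(2).lower()
        # Only mail sent by our own customers counts.
        if sender.rsplit("@", 1)[-1] in local_domains:
            known.add(rcpt)
    return known
```

The outside sender never has to do anything to become "known", which is what separates this approach from challenge-response. Only the local user's own outgoing mail matters.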

Of course, the perimeter layer also does various other filtering (heuristics, content, virus) that may result in the message being quarantined as spam.

Slashdot Mirror

114 comments