Armoring Spam Against Anti-Spam Filters


Posted by timothy on Wednesday February 4, 2004 @03:18AM from the take-two-viagra-and-call-nigeria-in-the-a.m. dept.

moggyf points to a BBC article about how spam can be successfully tweaked to slip past current filtering methods, excerpting "To find out how to beat the filters Mr Graham-Cumming sent himself the same message 10,000 times but to each one added a fixed number of random words. When a message got through he trained an 'evil' filter that helped to tune the perfect collection of additional words." iluvspam adds "It's an interview with POPFile author John Graham-Cumming that summarizes his talk at the recent MIT Spam Conference. You can still listen to the technical details here (choose the Afternoon 1 session, he starts about 75 minutes in)."

16 of 511 comments

Tch tch... by supersam · 2004-02-04 03:22 · Score: 5, Insightful

Didn't they know something as simple as...

"Make it idiot-proof, and someone will make a better idiot"
Re:Hmmm... by somethinghollow · 2004-02-04 03:25 · Score: 5, Insightful

Like many other academic studies, such as skinning humans alive to see how long they can live, I think this one should only be placed into the right hands.

It's a pisser that spammers now have another tool to circumvent filters; on the other hand, the people who write the filters know exactly what a spammer would do to make "better" spam.

The question is: who will implement first?
my spam filter by SkArcher · 2004-02-04 03:26 · Score: 4, Insightful

if Message header = "type = text/html" then send to "Spam"

It works a treat :)

The other trick I have found useful is the CamelCase nature of my name - spammers tend to mail me either as skarcher or SKARCHER, and both trip filters on my mailbox.
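
The two rules above can be sketched in a few lines of Python using the standard `email` package. This is only an illustration of the idea, not the poster's actual filter: the mailbox name `SkArcher`, the `example.org` domain, and the function name `looks_like_spam` are assumptions made for the example.

```python
from email import message_from_string
from email.utils import getaddresses

# Hypothetical mixed-case mailbox name, as described in the comment.
MY_LOCAL_PART = "SkArcher"


def looks_like_spam(raw_message: str) -> bool:
    """Apply the two rules: HTML content, or wrong-case recipient name."""
    msg = message_from_string(raw_message)

    # Rule 1: any part of the message declared as text/html.
    if any(part.get_content_type() == "text/html" for part in msg.walk()):
        return True

    # Rule 2: addressed to the right name but with the wrong capitalisation.
    for _name, address in getaddresses(msg.get_all("To", [])):
        local = address.partition("@")[0]
        if local.lower() == MY_LOCAL_PART.lower() and local != MY_LOCAL_PART:
            return True

    return False
```

Rule 1 also rejects legitimate HTML mail, which is the trade-off the poster appears to accept.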

--

An infinite number of monkeys will eventually come up with the complete works of /.
Re:That's dedication... :( by andih8u · 2004-02-04 03:32 · Score: 3, Insightful

We need to get people to stop buying products advertised through spam

As you alluded to, it'd be easier to teach fish to fly. The internet essentially carries with it a stupid-user tax. Worms, virii, spam, et al are the by-products of stupidity, but as with most taxes, it's just something that you have to deal with.

--

slashdot, news for crazed liberal socialist zealots
how NOT to get SPAM 101 by musikit · 2004-02-04 03:35 · Score: 3, Insightful

1. don't sign up on any page that requires your email address to verify *cough*like this one *cough*

2. don't use free email services hotmail etc.
3. don't use AOL
4. don't let anyone have your address that forwards messages like "cute bunny pic" or "funny anti-geek joke" etc.
5. don't post your email anywhere.
6. don't sign up for majordomo lists.
Only if you're the author. by Eevee · 2004-02-04 03:36 · Score: 3, Insightful

In the article, it points out those words listed are good for getting past his filter. If you don't normally have mail that uses those words, then your filter will still catch it as spam.

Now, if you do deal with the Berkshire Marriott frequently, asking them for comments on your wireless setup, then yes you're up the creek.
Really don't understand it. by The+I+Shing · 2004-02-04 03:48 · Score: 4, Insightful

I've said this before, but I'll say it again. I really don't understand why all this even happens.

When I'm going through the webmail access to my spam-bait accounts (the ones that are listed on my websites that I don't bother retrieving with my POP email client anymore because of hundreds of spams a day to each), if I'm fooled into opening one up, most likely because of it having a subject header that might be someone legitimate, the moment I see that the message body says anything spammy I immediately click the Delete button. I imagine everyone else in the world is doing the same thing.

It's gotten to the point where the preoccupation of spamming is just to get past filters, the result of which is that the message is grumblingly deleted by the irritated recipient. Who out there is saying, "Oh, look, this message got past all my spam filters and contains a lot of jumbled, garbled nonsense text alongside a plug for herbal penis enlarging pills. This must be legitimate. Now, where's my credit card,"? Do the spammers think that we're all clones of Dilbert's pointy-haired manager?

Spamming is not only irritating, it's pointless. Who is paying these people to spam us? Are people actually buying penis enlarging pills and patches, herbal viagra, mortgage refinancing, credit repair kits, or any of that stuff? Enough to put millions of dollars a month into the hands of career spammers?

I'm hopelessly at sea in this matter.

--
You are in error. No-one is screaming. Thank you for your cooperation.
Re:Ok fuck it by swb · 2004-02-04 03:53 · Score: 4, Insightful

Another example of people assuming that EVERYBODY lives in the USA or is under US law...

The solicitation was made on a server located in the US. I don't doubt that Ashcroft would consider that US jurisdiction, regardless of the physical location of the poster.

There's a lot of guys in dog cages at Guantanamo Bay who've NEVER been to the US. I'm not so sure these days that when the US government is pissed off at you, where you are and where you did something matter a whole lot.
I don't see how this is necessarily a problem by PixelCat · 2004-02-04 03:58 · Score: 3, Insightful

What he's doing is a brute-force attempt to find words with--for himself--a high ham probability. I don't see how this is necessarily going to be an effective general-purpose technique. If you need to start bombarding people with thousands of messages to find the good words you're just going to drive more people into using filters--and this will almost certainly coerce ISPs into doing more filtering as well. Plus, you've got to deal with the issue of keeping data on all those users to find out which words are good for them. This would require you to tailor your spam to each individual user, which probably is going to increase the cost to the spammer (at least in terms of disk storage and time, anyway) and, as Graham-Cumming implemented it, is going to fail utterly for anyone who isn't viewing mail as HTML, anyway.
Re:infinite monkeys by letxa2000 · 2004-02-04 04:00 · Score: 5, Insightful

I'm not sure I understand why they think this is a problem with Bayesian filtering. Basically, they're saying that if a spammer sends you the same message thousands of times but inserts a few slightly different words each time, and if the thousands of messages get through the Bayesian filter to the user, and if the user doesn't disable HTML bugs on his email client, then we have a problem...?
First, if the spammer sends thousands of copies of the same message and just changes the "extra words" that he is testing, it will take very little time for Bayesian to adapt to the rest of the message. Suddenly, the rest of the message that previously contained non-spammy words will be considered very spammy and will overwhelm the "extra words" that each message contains. Each time the message is caught as spam, the probability that any future tests get through--regardless of the "extra words"--will be reduced even further.
Second, as the article said, it's a lot of work on the part of the spammer. They'd have to send out thousands of messages to each target to "sniff them out" and most of those wouldn't even be effective, since most of them would be caught by filters, and of those few that got through, very few would load the HTML bugs to identify themselves.
Finally, it assumes that those that are using Bayesian filters are filtering their email but leaving their security (inasmuch as HTML bugs) wide open. While there may be some people that use Bayesian and leave HTML bugs active, it has to be a small minority.
In short, it seems to me they've "found" a way to get around Bayesian that won't work, so to speak. I just don't see the problem.... ??
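
The first point, that repeated copies teach the filter about the unchanging part of the message, can be illustrated with a toy per-token model. This is a sketch under assumptions: the class `TokenStats`, the add-one smoothing, and the sample messages are invented for the example and do not correspond to any particular filter.

```python
from collections import Counter


class TokenStats:
    """Per-token spam probability from document counts (toy model)."""

    def __init__(self) -> None:
        self.spam_docs = 0
        self.ham_docs = 0
        self.spam_counts: Counter = Counter()
        self.ham_counts: Counter = Counter()

    def train(self, text: str, is_spam: bool) -> None:
        tokens = set(text.lower().split())
        if is_spam:
            self.spam_docs += 1
            self.spam_counts.update(tokens)
        else:
            self.ham_docs += 1
            self.ham_counts.update(tokens)

    def spam_probability(self, token: str) -> float:
        # Add-one smoothing so unseen tokens never divide by zero.
        spam_rate = (self.spam_counts[token] + 1) / (self.spam_docs + 2)
        ham_rate = (self.ham_counts[token] + 1) / (self.ham_docs + 2)
        return spam_rate / (spam_rate + ham_rate)


stats = TokenStats()
for _ in range(5):
    stats.train("meeting agenda wireless notes", is_spam=False)
    stats.train("lottery winner claim prize", is_spam=True)

before = stats.spam_probability("pills")

# The user reports 50 copies: same body, a different padding word each time.
for i in range(50):
    stats.train(f"cheap pills online padding{i}", is_spam=True)

after = stats.spam_probability("pills")
print(f"pills: {before:.2f} -> {after:.2f}")
```

The fixed body words climb toward "spammy" while each padding word is seen only once. Whether that outweighs well-chosen padding in a full message score depends on the filter and the words, which is what the replies go on to argue about.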
Re:infinite monkeys by Sique · 2004-02-04 04:12 · Score: 4, Insightful

Second, as the article said, it's a lot of work on the part of the spammer. They'd have to send out thousands of messages to each target to "sniff them out" and most of those wouldn't even be effective, since most of them would be caught by filters, and of those few that got through, very few would load the HTML bugs to identify themselves.

This is exactly the point. Most of the spam examples will die out because they have an ineffective collection of non-spam words. But a few will survive and you now can train your own Bayesian filter which collects the versions of spam that generated webbug hits. After a while some words will shine prominently in your Bayesian filter database for being very effective at slipping through Bayesian spam filters.

Basically you are fighting the dote with itself. And yes. You can automate the process. Just take your everyday spam (penis enlargement, unsecured credit, Nigerian business opportunities...), take a dictionary and then randomly mix dictionary words into your spam messages and send them out to your email database. Create a website to get the webbug hits and associate every spam message with a hash of the random dictionary words to identify successful sets of anti-spam words.
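
For filter authors who want to see what this feedback loop learns, the process can be simulated offline without sending any mail. Everything here is a stand-in: `DICTIONARY`, `TARGET_HAM_WORDS`, and the pass/fail rule in `target_filter_passes` are assumptions invented for the simulation, and a web-bug hit is modelled as a plain function call.

```python
import hashlib
import random
from collections import Counter

# Hypothetical word list and a hidden set of words the target treats as ham.
DICTIONARY = ["meeting", "wireless", "agenda", "invoice", "banana",
              "tractor", "violin", "harbor", "pencil", "lantern"]
TARGET_HAM_WORDS = {"meeting", "wireless"}


def target_filter_passes(padding: list[str]) -> bool:
    """Stand-in for the recipient's filter: passes if any padding word is hammy."""
    return bool(TARGET_HAM_WORDS.intersection(padding))


def variant_id(padding: list[str]) -> str:
    """Hash of the padding words, used to tell variants apart."""
    joined = " ".join(sorted(padding))
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()[:12]


def run_trial(rounds: int = 2000, words_per_message: int = 3,
              seed: int = 0) -> Counter:
    rng = random.Random(seed)
    sent: dict[str, list[str]] = {}
    for _ in range(rounds):
        padding = rng.sample(DICTIONARY, words_per_message)
        sent[variant_id(padding)] = padding

    # Count which words appear in variants that "reported back".
    hits: Counter = Counter()
    for padding in sent.values():
        if target_filter_passes(padding):
            hits.update(padding)
    return hits


print(run_trial().most_common(3))
```

In this toy setup the two hidden ham words rise to the top of the count, which is the effect the comment describes. A real filter's decision is far less clean, so the signal would be noisier.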

--
.sig: Sique *sigh*
Re:combat the flaw? how? by Winkhorst · 2004-02-04 04:14 · Score: 3, Insightful

The best solution I have found so far is to have your own domain and generate specific email addresses for specific types of communications. You keep your actual ISP email address totally secret and don't give it to anybody except your domain registrar. You then generate an address for your best friends and acquaintances you can trust and keep it separate from everything else so you don't have to change it but once every few years if that. You have a specific Shopping and Registration address you kill and replace after it becomes spammy. And you have an address for things like newsletters and email groups you can also change and reregister if they leak out to the spam boobs. There are all kinds of variations on this theme, but that's the basic gist of the matter: Secrecy and flexibility.

--
"Is this Winkhorst a nova criminal?" "No just a technical sergeant wanted for interrogation."
How NOT to get SPAM 201 - a more practical guide by djrogers · 2004-02-04 04:33 · Score: 4, Insightful
- 1) Register a domain (come on, they're cheap now)
- 2) Get an email address from your ISP or other provider (yahoo, fastmail.fm etc) that is complex and convoluted - no names or words
- 3) set up mail redirection with Zoneedit, redirection.net etc. with a catchall to your new mailbox.
- 4) Use a different email address every time you must sign up for anything (e.g. amazon.com@newdomain.com)
- 5) Filter on sent to headers at first sign of compromised id, or if the volume for a particular id gets too heavy and you're tired of client side filtering, set a specific redirection for it to sample@sample.com (do a whois on sample.com if you're curious).
- 6) Enjoy the same spam free mailbox I've had for 2 years...
Also helpful is to change your reply-to address every few months and give your friends different addresses based on how clueful they are
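
Steps 4 and 5 amount to routing on the recipient alias. A minimal sketch, assuming the catch-all domain from the example above and a hand-maintained set of burned aliases; the names `MY_DOMAIN`, `COMPROMISED`, and `classify_recipient` are made up for illustration:

```python
from email.utils import parseaddr

MY_DOMAIN = "newdomain.com"      # catch-all domain from the example above
COMPROMISED = {"amazon.com"}     # hypothetical: aliases that now attract spam


def classify_recipient(to_header: str) -> str:
    """Decide what to do with a message based on the alias it was sent to."""
    _name, address = parseaddr(to_header)
    local, _, domain = address.rpartition("@")
    if domain.lower() != MY_DOMAIN:
        return "not-mine"
    if local.lower() in COMPROMISED:
        return "reject"
    return "deliver"
```

Because each alias names the site it was given to, the `COMPROMISED` set also records who leaked or sold the address.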
--
Think outside the... Hey, where'd the friggin' box go?
Re:Here's a sneaky one... by pclminion · 2004-02-04 05:44 · Score: 3, Insightful

The stuff you're talking about is all fine, but it will fail because the spammers will evolve to defeat it.
I think you overestimate the intelligence of these creeps. The fact that spammers are using more and more of these garbage terms, randomizers, and other hacks to get around the filters actually encourages me -- it demonstrates that they really don't have the slightest clue how statistical content based filtering actually works. Currently, they are taking advantage of the extremely bad decision to assign a 0.4 score to unknown words. The spammers are exploiting a crack in the armor, which means the armor needs to be fixed.
A human can filter spam. A spammer can't weasel his way around human intelligence, so this sets an upper bound on how advanced the spammer techniques can get. All we have to do is get document classification up to the point of competitiveness with human performance, and the problem is solved. And research into these directions isn't wasted, because the motivation for the research is for actual important document organization tasks. The effect of stomping out spam will be a cool side effect.
If a spammer was ever actually intelligent enough to get around serious, well-constructed classifiers, I highly doubt he would be in the business of spamming. To suggest that spammers could intellectually compete with people who have spent years specializing in statistical language processing is a tad bit ridiculous.
At some point, to sell something, the spammer has to say something intelligible which is an advertisement. They can't hide this. Techniques which are foiled by bogus terms at the bottom of the email are broken. It's not a valid reason to believe that spammers are actually getting smart.
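
The complaint about scoring unknown words at 0.4 can be made concrete with a little arithmetic. The sketch below assumes a filter that multiplies together the probabilities of every token in the message (the naive Bayes product rule); filters that only look at the most extreme tokens would behave differently. The token probabilities in `SPAMMY_BODY` are invented for the example.

```python
from math import prod


def combined_spam_probability(token_probs: list[float]) -> float:
    """Naive Bayes combination of independent per-token spam probabilities."""
    spam = prod(token_probs)
    ham = prod(1.0 - p for p in token_probs)
    return spam / (spam + ham)


# Hypothetical probabilities for three clearly spammy words in the body.
SPAMMY_BODY = [0.99, 0.95, 0.90]


def score_with_padding(n_unknown: int, unknown_prob: float) -> float:
    """Score the body plus n_unknown never-seen words."""
    return combined_spam_probability(SPAMMY_BODY + [unknown_prob] * n_unknown)


for n in (0, 10, 30):
    print(n, round(score_with_padding(n, 0.4), 3),
          round(score_with_padding(n, 0.5), 3))
```

Each unknown word at 0.4 multiplies the spam odds by 0.4/0.6, so enough gibberish drags a clearly spammy message below the threshold. At a neutral 0.5 the padding has no effect at all.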
Y'all are going to hate this, but... by duck_prime · 2004-02-04 05:57 · Score: 3, Insightful

... The internet essentially carries with it a stupid-user tax. Worms, virii [sic, heh], spam, et al are the by-products of stupidity, but as with most taxes, it is just something that you have to deal with.
With respect to spam, let's take a step back. Obviously somebody out there is gleefully munching handfuls of Viagra and (ahem) "enhancement" pills to psych himself up to (ahem) r0x0r his wife until her weight-loss pills kick in.

It is silly to assume that all these people are just morons. After all, Viagra is proven to work, it is a legitimate product of sorts. The internet is there for hefty short limp (ahem ahem) non-digerati as well as for propeller heads, God bless 'em.

It seems to me that spam is the runaway bastard-child of something which actually is good and useful -- that is, targeted marketing to the willing. Don't throw out the baby with the bathwater. There is a huge legitimate market out there, just begging to be flee^wmarketed.

The anti-spam people are fighting against the Invisible Hand. Good luck.
Re:infinite monkeys by FireBreathingDog · 2004-02-04 06:13 · Score: 3, Insightful

It's much easier than that to defeat Bayesian filtering. Ever \/\/0|\|D3R why you're getting so much spam with obfuscated words? Or why you're getting so much spam where the text content is contained primarily in images rather than plaintext? Those things bypass Bayesian filters, that's why!
Bayesian filters rely on words. That means it is dependent upon word breaks and certain spellings. Well, spammers have been avoiding word breaks (either by removing spaces or introducing unnecessary ones) and obvious "spam words" by mangling the word or introducing "1337"-type spelling.
And Bayesian filters can't parse graphics, so a lot of spammers are careful to put words likely to trigger spam filters into graphics.
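
One common countermeasure to the obfuscation described above is to normalise text before tokenising it. The following is a rough sketch only: the substitution tables are small guesses rather than a complete mapping (for instance, `1` could stand for `i` or `l`), and the function name `normalize` is an assumption for the example.

```python
import re

# Multi-character substitutions, applied before the single-character table.
MULTI_CHAR = [(r"\/\/", "w"), (r"|\|", "n"), (r"\/", "v")]

# Single-character substitutions; a deliberately small, lossy guess.
SINGLE_CHAR = str.maketrans({"0": "o", "1": "i", "3": "e", "4": "a",
                             "5": "s", "7": "t", "@": "a", "$": "s"})

# Runs of single letters separated by one space or punctuation mark.
SPLIT_WORD = re.compile(r"\b(?:[a-z][ .\-_*]){2,}[a-z]\b")


def normalize(text: str) -> str:
    """Undo simple leet-style spellings and artificial word breaks."""
    result = text.lower()
    for pattern, replacement in MULTI_CHAR:
        result = result.replace(pattern, replacement)
    result = result.translate(SINGLE_CHAR)
    # Rejoin "v i a g r a" or "v.i.a.g.r.a" into a single token.
    return SPLIT_WORD.sub(lambda m: re.sub(r"[ .\-_*]", "", m.group()), result)


print(normalize(r"\/\/0|\|D3R"))
print(normalize("V 1 A G R A"))
```

Normalisation like this can also mangle legitimate text such as prices or part numbers, so a filter might feed both the raw and normalised tokens to the classifier rather than replace one with the other.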
BTW, this article explains why there will never be a filtering-based solution to solving spam until SMTP itself is made more secure.

--
Shame on Google.