Filter-foiling Gibberish Becoming A Spam Staple

Posted by timothy on Tuesday January 13, 2004 @02:16PM from the re:-claire-yum-donut-manhattan-regrets-cute dept.

hcg50a writes "Wired has a story about the random words which have recently been appearing in spam. Antispam experts agreed that this isn't a brand-new technique, but said the addition of potentially filter-foiling gibberish is rapidly becoming a common component of spam."

14 of 606 comments

I don't get it, really by theRhinoceros · 2004-01-13 14:20 · Score: 4, Insightful

"Most of the illegal-exploit spammers use hash busters and any other trick they can to get past filters, refusing to accept that people use spam filters because they really don't want spam," Linford added.

I really don't understand this part: going after people who are taking active measures against your enterprise because they aren't interested. Why bother to market to them at all? Is the rate of return worth all the ill will, DoS attacks and legislation?
1. Re:I don't get it, really by radicalskeptic · 2004-01-13 14:31 · Score: 5, Insightful
  
  One reason is that ISPs, corporate servers, or some other body might have implemented the filtering, and not the one reading the mail.
  
  --
  WARNING: If accidentally read, induce vomiting.
2. Re:I don't get it, really by Anonymous Coward · 2004-01-13 14:41 · Score: 5, Insightful
  
  The technique also makes obvious the lie of their "we're just innocent entrepreneurs trying to make a buck" defense. Innocent entrepreneurs don't go out of their way to try to hack their data into other people's computers, past programs that are every bit as clear a sign of intent as a "No Soliciting" sign on your door.
  
  On every spam thread on Slashdot, there's someone complaining that technical measures won't solve the problem, and another saying legal measures won't solve the problem. The answer is that you need both: technical measures to assure the identity of the sender -- both spammer and sponsor -- as well as legal measures to provide for punishment.
3. Re:I don't get it, really by Eosha · 2004-01-13 14:44 · Score: 5, Insightful
  
  Unfortunately, spammers are not in the business of selling things to consumers. They are in the business of selling advertising space to other companies. As long as they can convince unscrupulous business owners that advertising via spam is worthwhile, the spam will continue.
  
  --
  I have a girlfriend whose name doesn't end in .JPG
4. Re:I don't get it, really by rgmoore · 2004-01-13 16:01 · Score: 3, Insightful
  
  It's possible, if not likely, that some of the spamware authors are doing it for the challenge. Some of those guys are allegedly pretty good programmers, and I suspect that many of them are essentially hackers with no sense of morals. I could easily imagine somebody like that trying to figure out how to bypass spam filters just because it was a challenge, not because he actually expected any particular rewards for it. It's like trying to break into the computers in the Pentagon; it's stupid and illegal but a big enough challenge that some people with more brains than common sense will try it anyway.
  
  --
  There's no point in questioning authority if you aren't going to listen to the answers.
Why? by aePrime · 2004-01-13 14:20 · Score: 3, Insightful

I can see them doing this to overcome Bayesian filters, but why? AFAIK, Bayesian filters are not used much (if at all) on mail servers. These filters are run at home by geeks.

Granted, this may get them past the filters, but if somebody's gone through the effort of setting up a Bayesian filter, they're not going to buy your product even if you get into their inbox. It seems like a waste of everybody's effort, and I mean including the spammers.
Grammar Check and Spell Check... by LostCluster · 2004-01-13 14:29 · Score: 4, Insightful

The solution to randomness is to spell check and grammar check incoming e-mail, and treat violations as cause to add points to the score indicating that it's spam-like.

Sure, a few strange words might be a name that's not in the filter yet, but pure gibberish should be a red flag that either somebody's cat walked on the keyboard, or there's spam going on here. Heavy use of "non-spam" words can override to indicate it's good mail... but a poorly composed mail that doesn't use language seen in friendly mail is highly likely to be spam....
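
A rough sketch of the scoring idea above, in Python. The word list, the function name, and the one-point-per-unknown-word weight are all invented for illustration; a real filter would load a full dictionary and fold this into a larger score alongside the "non-spam" word credits.

```python
import re

# Hypothetical, tiny word list standing in for a real spelling dictionary.
KNOWN_WORDS = {"the", "meeting", "is", "at", "noon", "see", "you", "there", "buy", "now"}

def gibberish_points(text, known_words=KNOWN_WORDS, points_per_unknown=1.0):
    """Return (points, unknown_words) for one message body.

    Every word missing from the dictionary adds points to the spam score.
    """
    words = re.findall(r"[a-z']+", text.lower())
    unknown = [w for w in words if w not in known_words]
    return points_per_unknown * len(unknown), unknown

if __name__ == "__main__":
    print(gibberish_points("See you at noon"))
    print(gibberish_points("Buy now xdwexe qzptl"))
```

Note that this only penalizes strings that are not words at all; padding made of real dictionary words would pass untouched.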
Parent post is not offtopic (steganography) by phr1 · 2004-01-13 14:29 · Score: 4, Insightful

Whoever modded it that way is a moron.
Spam is a perfect carrier for steganographic data since it's broadcast to millions of people and nobody can fall under suspicion merely by receiving it. When the government wants to monitor people's communications to search for steganography but does nothing about spam, the purpose of the monitoring is probably not the stated one.
Re:Spamkiller doesn't care by fo0bar · 2004-01-13 14:30 · Score: 5, Insightful

My McAfee Spamkiller ignores the white noise, and simply nukes all the mail containing viagra, etc.
What good is that when somebody spams you for Gen3r@c v|agar@?
Re:gibberish... by Alyeska · 2004-01-13 14:37 · Score: 4, Insightful

Worse yet, they keep spamming. Someone keeps buying from spam.
Re:Should be easy to block by kalidasa · 2004-01-13 14:50 · Score: 4, Insightful

Most of them are using random word sequences; the random strings like xdwexe are not usually an important percentage of the overall text, no more than names might be. Besides, how large a corpus of "valid" words do you want to use? The OED weighs in at almost 0.5M words, and then there are another 0.5M uncatalogued scientific terms and neologisms, plus common misspellings, typos, jargon, and dialect orthography (like our color, meter, checker, jail, etc. for the Brits' colour, metre, chequer, gaol) ...
If you don't want to keep the entire corpus of "valid" words in your code, you're going to have to make some compromises. Maybe you'll want to exclude words like "thou," "hauberk," and "coney." Not so good if you're subscribing to an Early Modern Literature listserv.
So you're going to need some logic to determine whether or not a "valid" word that occurs in a message is meaningful. Here's how one rather well known discussion of Bayesian filtering deals with this issue (of unknown words); this is precisely the logic that spammers with random meaningful words are exploiting:
"One question that arises in practice is what probability to assign to a word you've never seen, i.e. one that doesn't occur in the hash table of word probabilities. I've found, again by trial and error, that .4 is a good number to use. If you've never seen a word before, it is probably fairly innocent; spam words tend to be all too familiar."

So, what if all the words are valid, but the sentences aren't? Grammar checkers involve a lot more logic than spellcheckers do, and are consequently a lot less accurate. Fact is, you can also fool a grammar checker filter: just pad with random quotations from novels, etc. instead of padding with random words or random misspelled strings.
So the Bayesian approach of identifying spam and ham words is a pretty effective one, given the limitations.
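
To make the padding trick concrete, here is a minimal Python sketch of the naive Bayesian combining rule, with the .4 default for never-seen words taken from the passage quoted above. The function name, the word_probs table, and its single entry are made up for illustration. It also scores every token in the message, which is a simplification; a filter that only looks at its most extreme tokens would react differently to padding.

```python
from math import prod

UNKNOWN_WORD_PROB = 0.4  # default for words missing from the probability table

def spam_probability(words, word_probs):
    """Combine per-word spam probabilities into one score for the message.

    combined = prod(p) / (prod(p) + prod(1 - p))
    """
    probs = [word_probs.get(w, UNKNOWN_WORD_PROB) for w in words]
    spam = prod(probs)
    ham = prod(1.0 - p for p in probs)
    return spam / (spam + ham)

if __name__ == "__main__":
    # Hypothetical table: one strongly spammy word, everything else unseen.
    word_probs = {"viagra": 0.99}
    bare = ["viagra"]
    padded = ["viagra"] + [f"filler{i}" for i in range(20)]
    print(spam_probability(bare, word_probs))
    print(spam_probability(padded, word_probs))
```

Under these assumptions each unseen word nudges the combined score toward ham, so enough padding drags an otherwise obvious spam below the 0.5 mark.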
Re:Simple Solution... by drooling-dog · 2004-01-13 14:54 · Score: 3, Insightful

I've been filtering subject lines with too much punctuation for some time now; it catches quite a bit.
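
One way a subject-line rule like this might look, as a Python sketch. The 0.3 threshold and both function names are assumptions chosen for the example, not the poster's actual rule.

```python
import string

def punctuation_ratio(subject):
    """Fraction of non-whitespace characters that are ASCII punctuation."""
    visible = [c for c in subject if not c.isspace()]
    if not visible:
        return 0.0
    return sum(c in string.punctuation for c in visible) / len(visible)

def looks_spammy(subject, threshold=0.3):
    """Flag subjects whose punctuation ratio exceeds the (arbitrary) threshold."""
    return punctuation_ratio(subject) > threshold

if __name__ == "__main__":
    for subject in ("Lunch on Friday?", "!!! F.R.E.E $$$ !!!"):
        print(subject, looks_spammy(subject))
```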
Re:Spamkiller doesn't care by rgmoore · 2004-01-13 15:50 · Score: 4, Insightful

I'm pretty sure that the big worry is about third party filtering. If I install a spam filter, that means that I don't want to see spam and am unlikely to buy something advertised therein. If my ISP installs a spam filter, it removes spam for everyone, including the idiots who might actually buy something from a spammer. Since my ISP theoretically might be using the same technology in their filter that I'm using in mine, it would still make sense for the spammer to work on defeating my filter.

--
There's no point in questioning authority if you aren't going to listen to the answers.
Re:The real problem will be deliberate poisoning by sidney · 2004-01-13 16:20 · Score: 3, Insightful

Nigerian scam spam is very different from most spam. It is a story that can be carefully written to use only commonly used words, assuming that the people who author them are able to go beyond their broken English all the way to statistically hammy, correctly spelled text.

But how would you sell more inches on your male member enhanced with V*@gra to make money fast watching celeb teenie nymphos doing it on the farm while only using ordinary non-spammy words?

There are only so many ways to get someone to click here to get all the hot action and a long boring story full of erudite euphemisms is not one of them.

It would be interesting to see if your method of disguising spam can work on a wider range of topics.