Filter-foiling Gibberish Becoming A Spam Staple

Posted by timothy on Tuesday January 13, 2004 @02:16PM from the re:-claire-yum-donut-manhattan-regrets-cute dept.

hcg50a writes "Wired has a story about the random words which have recently been appearing in spam. Antispam experts agreed that this isn't a brand-new technique, but said the addition of potentially filter-foiling gibberish is rapidly becoming a common component of spam."

10 of 606 comments

I don't get it, really by theRhinoceros · 2004-01-13 14:20 · Score: 4, Insightful

"Most of the illegal-exploit spammers use hash busters and any other trick they can to get past filters, refusing to accept that people use spam filters because they really don't want spam," Linford added.

I really don't understand this part: going after people who are taking active measures against your enterprise because they aren't interested. Why bother to market to them at all? Is the rate of return worth all the ill will, DoS attacks and legislation?
1. Re:I don't get it, really by radicalskeptic · 2004-01-13 14:31 · Score: 5, Insightful
  
  One reason is that an ISP, a corporate server, or some other body might have implemented the filtering, rather than the person reading the mail.
  
  --
  WARNING: If accidentally read, induce vomiting.
2. Re:I don't get it, really by Anonymous Coward · 2004-01-13 14:41 · Score: 5, Insightful
  
  The technique also makes obvious the lie of their "we're just innocent entrepreneurs trying to make a buck" defense. Innocent entrepreneurs don't go out of their way to try to hack their data into other people's computers, past programs that are every bit as clear a sign of intent as a "No Soliciting" sign on your door.
  
  On every spam thread on Slashdot, there's someone complaining that technical measures won't solve the problem, and another saying legal measures won't solve the problem. The answer is that you need both: technical measures to assure the identity of the sender -- both spammer and sponsor -- as well as legal measures to provide for punishment.
3. Re:I don't get it, really by Eosha · 2004-01-13 14:44 · Score: 5, Insightful
  
  Unfortunately, spammers are not in the business of selling things to consumers. They are in the business of selling advertising space to other companies. As long as they can convince unscrupulous business owners that advertising via spam is worthwhile, the spam will continue.
  
  --
  I have a girlfriend whose name doesn't end in .JPG
Grammar Check and Spell Check... by LostCluster · 2004-01-13 14:29 · Score: 4, Insightful

The solution to randomness is to spell check and grammar check incoming e-mail, and treat violations as cause to add points to the score indicating that it's spam-like.

Sure, a few strange words might be a name that's not in the filter yet, but pure gibberish should be a red flag that either somebody's cat walked on the keyboard, or there's spam going on here. Heavy use of "non-spam" words can override to indicate it's good mail... but a poorly composed mail that doesn't use language seen in friendly mail is highly likely to be spam....
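
A minimal sketch of the spell-check scoring idea, not any real filter's implementation. The word list, the function name `gibberish_points`, and the 5-point cap are all made up for illustration; a real filter would load a full dictionary and tune the weight against actual mail.

```python
import re

# Hypothetical tiny dictionary; a real filter would load a full word list.
KNOWN_WORDS = {
    "the", "meeting", "is", "at", "noon", "see", "you",
    "there", "please", "bring", "report",
}

def gibberish_points(text, known_words=KNOWN_WORDS, max_points=5.0):
    """Return spam-score points proportional to the share of unknown words."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    unknown = sum(1 for w in words if w not in known_words)
    # Scale linearly: 0 points if every word is known, max_points if none are.
    return max_points * unknown / len(words)
```

A message made entirely of dictionary words scores zero here, so this only penalizes misspelled strings, not valid words strung together at random.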
Parent post is not offtopic (steganography) by phr1 · 2004-01-13 14:29 · Score: 4, Insightful

Whoever modded it that way is a moron.
Spam is a perfect carrier for steganographic data since it's broadcast to millions of people and nobody can fall under suspicion merely by receiving it. When the government wants to monitor people's communications to search for steganography but doesn't do anything about spam, the purpose of the monitoring is probably not the stated one.
Re:Spamkiller doesn't care by fo0bar · 2004-01-13 14:30 · Score: 5, Insightful

"My McAfee Spamkiller ignores the white noise, and simply nukes all the mail containing viagra, etc."

What good is that when somebody spams you for Gen3r@c v|agar@?
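
To make the point concrete, here is a rough sketch of a plain keyword filter. This is not how Spamkiller actually works; the blocklist, the substitution table `LEET`, and the function `keyword_hit` are assumptions invented for the example.

```python
# Hypothetical blocklist and character substitutions, for illustration only.
BLOCKED = {"viagra"}
LEET = str.maketrans({"|": "i", "@": "a", "3": "e", "0": "o", "1": "l"})

def keyword_hit(text, normalize=False):
    """Return True if any blocked keyword appears in the text."""
    text = text.lower()
    if normalize:
        # Undo common character swaps before matching.
        text = text.translate(LEET)
    return any(word in text for word in BLOCKED)
```

The plain match misses the obfuscated subject line, and even with the substitutions undone it still misses, because "v|agar@" normalizes to "viagara", which is also misspelled.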
Re:gibberish... by Alyeska · 2004-01-13 14:37 · Score: 4, Insightful

Worse yet, they keep spamming. Someone keeps buying from spam.
Re:Should be easy to block by kalidasa · 2004-01-13 14:50 · Score: 4, Insightful

Most of them are using random word sequences; the random strings like xdwexe are not usually an important percentage of the overall text, no more than names might be. Besides, how large a corpus of "valid" words do you want to use? The OED weighs in at almost 0.5M words, and then there are another 0.5M uncatalogued scientific terms and neologisms, plus common misspellings, typos, jargon, and dialect orthography (like our color, meter, checker, jail etc. for the Brits' colour, metre, chequer, gaol) ...
If you don't want to keep the entire corpus of "valid" words in your code, you're going to have to make some compromises. Maybe you'll want to exclude words like "thou," "hauberk," and "coney." Not so good if you're subscribing to an Early Modern Literature listserv.
So you're going to need some logic to determine whether or not a "valid" word that occurs in a message is meaningful. Here's how one rather well known discussion of Bayesian filtering deals with this issue (of unknown words); this is precisely the logic that spammers who pad with random meaningful words are exploiting:

"One question that arises in practice is what probability to assign to a word you've never seen, i.e. one that doesn't occur in the hash table of word probabilities. I've found, again by trial and error, that .4 is a good number to use. If you've never seen a word before, it is probably fairly innocent; spam words tend to be all too familiar."

So, what if all the words are valid, but the sentences aren't? Grammar checkers involve a lot more logic than spellcheckers do, and are consequently a lot less accurate. Fact is, you can also fool a grammar checker filter: just pad with random quotations from novels, etc. instead of padding with random words or random misspelled strings.
So the Bayesian approach of identifying spam and ham words is a pretty effective one, given the limitations.
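
A rough sketch of how the .4 default for unknown words plays out. The word table and the function `spam_probability` are hypothetical. The combining rule (product of the probabilities over that product plus the product of their complements, taken over the 15 tokens farthest from .5) is the usual naive-Bayes style combination as I understand it, not something spelled out in this thread.

```python
from math import prod

UNKNOWN_PROB = 0.4  # the value quoted above for never-seen words

# Hypothetical per-word spam probabilities; a real table is learned from mail.
WORD_PROBS = {"viagra": 0.99, "free": 0.90, "meeting": 0.05}

def spam_probability(words, word_probs=WORD_PROBS, top_n=15):
    """Combine per-word probabilities into one spam probability."""
    probs = [word_probs.get(w.lower(), UNKNOWN_PROB) for w in words]
    # Keep the top_n most "interesting" values, i.e. farthest from 0.5.
    probs.sort(key=lambda p: abs(p - 0.5), reverse=True)
    probs = probs[:top_n]
    spam = prod(probs)
    ham = prod(1 - p for p in probs)
    return spam / (spam + ham)
```

With this table, `spam_probability(["viagra"])` is higher than `spam_probability(["viagra", "xdwexe"])`: each never-seen word counts as slightly innocent and nudges the score down. The nudge is small per word, which is why padding with words the filter already treats as ham does far more damage than padding with gibberish.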
Re:Spamkiller doesn't care by rgmoore · 2004-01-13 15:50 · Score: 4, Insightful

I'm pretty sure that the big worry is about third party filtering. If I install a spam filter, that means that I don't want to see spam and am unlikely to buy something advertised therein. If my ISP installs a spam filter, it removes spam for everyone, including the idiots who might actually buy something from a spammer. Since my ISP theoretically might be using the same technology in their filter that I'm using in mine, it would still make sense for the spammer to work on defeating my filter.

--
There's no point in questioning authority if you aren't going to listen to the answers.