New Kind of Spam 'Un-Training' Filters?



Posted by ryuzaki0 on Wednesday August 9, 2006 @04:53AM from the battle-lines-being-drawn dept.

Zaphod2016 writes to tell us the Wall Street Journal is reporting that email in-boxes are under a new kind of spam attack. This new spam has confused many people due to its lack of advertising, viruses, or request for personal information. One popular theory is that these innocuous blocks of text, often drawn from popular literature, are being used to "un-train" spam filters to allow more malicious spam through in the future.


Vectorspaces by bigattichouse · 2006-08-09 04:59 · Score: 4, Interesting

As a hobby, I play around with ways to classify spam. Not much of a hobby, but I find the problem interesting.

Lately, I've also been trying to use my vectorspace engine to classify spam, so these sorts of messages might get in, but only because they fall into the general category of readable text...

I've also been thinking about building a GPL tool to provide "sound-based" classification sort of like a "one second orchestra" playing in harmony/disharmony based on the content.

Regardless of the engine I use, I still have to dig through my trash bin every few days to make sure nothing good slipped through.
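
For readers unfamiliar with the vector-space idea mentioned above, here is a minimal sketch, not the poster's engine. It assumes bag-of-words vectors and cosine similarity against one hypothetical reference vector per category; the names `vectorize`, `cosine`, `classify_by_similarity`, and the sample texts are invented for illustration.

```python
import math
from collections import Counter


def vectorize(text):
    """Turn text into a bag-of-words vector (word -> count)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify_by_similarity(text, references):
    """Return the label whose reference vector is closest to the text."""
    vec = vectorize(text)
    return max(references, key=lambda label: cosine(vec, references[label]))


# Hypothetical reference vectors, one per category.
references = {
    "spam": vectorize("cheap pills buy pills now"),
    "ham": vectorize("the meeting notes and the report"),
}

# A block of ordinary prose lands nearest the "readable text" category.
label = classify_by_similarity("the captain read the report", references)
print(label)
```

Under these assumptions a literary passage shares far more vocabulary with ordinary mail than with drug spam, which is consistent with the poster's remark that such messages "might get in".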

--
meh
The text comes from the Gutenberg Project by sotweed · 2006-08-09 05:00 · Score: 5, Interesting

I've been getting 3 or 4 of these a day for at least a month now. The text can
always be found in some file of an old book provided by the Gutenberg
Project, which is making non-copyright texts available through volunteer
effort.

I think the theory about using this stuff to untrain spam filters is very plausible.
But it's difficult to see how it will work. There's no common text among these
e-mails; in order to send effective spam, there'll have to be at least some text which
is the same across multiple mails, and that will tend to expose it.
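
One way the "un-training" theory could work, shown as a rough sketch rather than a model of any real filter: a toy per-token estimator with add-one smoothing, where `TinyBayes` and the training messages are hypothetical. It assumes the user clicks "this is spam" on a block of innocuous literary text.

```python
from collections import Counter


class TinyBayes:
    """Toy per-token spam estimator; not any real filter's implementation."""

    def __init__(self):
        self.spam_tokens = Counter()
        self.ham_tokens = Counter()
        self.spam_msgs = 0
        self.ham_msgs = 0

    def train(self, text, is_spam):
        # Count each distinct token once per message.
        tokens = set(text.lower().split())
        if is_spam:
            self.spam_tokens.update(tokens)
            self.spam_msgs += 1
        else:
            self.ham_tokens.update(tokens)
            self.ham_msgs += 1

    def token_spamminess(self, token):
        # Smoothed share of spam and ham messages containing the token.
        s = (self.spam_tokens[token] + 1) / (self.spam_msgs + 2)
        h = (self.ham_tokens[token] + 1) / (self.ham_msgs + 2)
        return s / (s + h)


f = TinyBayes()
f.train("cheap pills online now", is_spam=True)
f.train("meeting about the quarterly report", is_spam=False)
f.train("please review the report before the meeting", is_spam=False)

report_before = f.token_spamminess("report")
pills_before = f.token_spamminess("pills")

# The user marks a harmless literary passage as spam.
f.train("the report of the whale reached the captain", is_spam=True)

report_after = f.token_spamminess("report")
pills_after = f.token_spamminess("pills")
print(report_before, report_after, pills_before, pills_after)
```

In this toy model the ordinary word "report" becomes more spam-like while "pills" becomes slightly less so, which is the dilution the theory describes. Whether real filters shift enough to matter is a separate question.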
1. Re:The text comes from the Gutenberg Project by misleb · 2006-08-09 05:23 · Score: 3, Interesting
  
  There's no common text among these
  e-mails;
  
  I think that is the point. They want to either poison those words so you get more false positives or they want to push other REAL spam related words out of the "this is spam" dictionaries. Maybe both. If these messages had some common theme, they would all get blocked and would have no net effect. They need you to click "this is spam" to poison your filters.
  
  Question is, does it work? I don't know. Seems to be highly dependent on the nature of your spam filter. Maybe they are only targeting a specific, popular filtering system.
  
  To me it seems like an act of desperation. I think filters are finally catching up with spammers. It is getting more and more difficult to get spam through a halfway decent filter and there are a lot of decent filters out there.
  
  -matthew
  
  --
  "THERE IS NO JUSTICE, THERE IS ONLY ME." -Death
Re:Other way around? by pe1chl · 2006-08-09 05:01 · Score: 5, Interesting

At work our SpamAssassin Bayes filter has been heavily trained to treat English text as spam.
This is because English is not our local language, so almost no business communication is in English and most of the spam is.
This does sometimes cause false positives when English-language mail has other spam-like properties as well, and the added 3.5 points from the Bayes filter push it above the limit.

This again shows that you should not rely solely on a Bayes filter as a spam blocker.
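
The arithmetic behind that false positive can be sketched as follows. This is an illustration under assumptions, not SpamAssassin's code: the rule names and their scores are hypothetical, and the threshold of 5.0 is assumed (sites configure their own). Only the 3.5-point Bayes contribution comes from the post.

```python
REQUIRED_SCORE = 5.0  # assumed threshold; each site sets its own limit


def classify(rule_scores, required=REQUIRED_SCORE):
    """Sum the points from every rule that fired and compare to the limit."""
    total = sum(rule_scores.values())
    return total, total >= required


# Hypothetical rules that fired on a legitimate English-language mail.
english_mail = {"html_only": 1.1, "missing_date": 1.0}
total_without, spam_without = classify(english_mail)

# The same mail once the Bayes rule adds its 3.5 points.
with_bayes = dict(english_mail, bayes=3.5)
total_with, spam_with = classify(with_bayes)

print(total_without, spam_without)
print(total_with, spam_with)
```

With these made-up numbers the mail stays under the limit on its other properties alone and crosses it only when the Bayes points are added.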
My uninformed hunch: screwup... by nweaver · 2006-08-09 05:01 · Score: 5, Interesting

The text block spam is very common WITH images. I suspect that what happened is some lame spammer got a BIG botnet contract, sent out his spam, and forgot to include the image.

--
Test your net with Netalyzr
1. Re:My uninformed hunch: screwup... by xpurple · 2006-08-09 05:44 · Score: 3, Interesting
  
  I suspect that some of it may be more than that. You can encrypt messages into plain text. If you then send out your encrypted messages to a million people then who would ever know who the message was really for?
  
  --
  http://www.xpurple.com
Re:Other possibilities by Coventry · 2006-08-09 05:12 · Score: 4, Interesting

Just like the cryptic number sequence radio/voip 'stations', this could be a method of communication.

We see so much spam every day, everyone takes it for granted, and everyone runs 'filters'. If I wanted to secretly inform agents to begin operations, a select quote from a book sent as spam to hundreds of thousands of people would be perfect. Everyone ends up on spam lists, and receiving spam is a passive process, so it's even more anonymous than public web forums.

--
man is machine
Re:I buy the "broken spamware" angle by Richard_at_work · 2006-08-09 05:26 · Score: 4, Interesting

I don't think this is the case, as I've been getting these sorts of emails for at least 3 years (looking back at the spam archive I keep to train from): random blocks of legible text, blocks of pseudo-English (the words are correct but there's no effort at sentence structure), even jokes on their own. I got intrigued by this about 6 months ago and wrote a few scripts to see if it was just a broken spam client forgetting to add the payload, but your average 'with payload' spam doesn't seem to match these emails; there are practically no similar 'with payload' spams in my archive with these blocks of text.

I always wrote it off as Bayesian filter poisoning.
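
The poster's scripts are not shown, so the following is only a guess at the kind of check described: comparing a text-only message against archived 'with payload' spam using Jaccard similarity over three-word shingles. The function names and sample messages are hypothetical.

```python
def shingles(text, n=3):
    """Set of overlapping n-word sequences in the text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a, b):
    """Overlap between two texts' shingle sets, from 0.0 to 1.0."""
    sa, sb = shingles(a), shingles(b)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def best_match(text_only, payload_archive):
    """Highest similarity between one message and any archived spam."""
    return max((jaccard(text_only, p) for p in payload_archive), default=0.0)


text_only = "it was the best of times it was the worst of times"
archive = [
    "buy now it was the best of times it was the worst of times click here",
    "cheap pills online order today",
]
score = best_match(text_only, archive)
print(score)
```

A consistently low `best_match` across the archive would point away from the "forgot the payload" explanation, which matches what the poster reports finding.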
Not New by Tweekster · 2006-08-09 05:27 · Score: 3, Interesting

As long as I can remember, I have received spam that didn't have an advertisement, didn't have contact information at all, etc.

Some spammers spoof their emails so well you couldn't contact them if you were interested in their crap. Many times it is a bit of text with a "click here" (but nowhere to actually click), etc.

I think the spammers are just idiots. It is amazing most of them actually managed to get the software working and send an email, given how craptastic their messages are (not disguised, just junk).

--
The phrase "more better" is acceptable English. suck it grammar Nazis
Re:Other way around? by ericlondaits · 2006-08-09 05:30 · Score: 5, Interesting

I recommend that you subscribe to a couple of English-language mailing lists (or Yahoo Groups), which you can then easily filter into a mail subfolder of their own by Subject line or From address. That way you have good English non-spam mail going through your Bayes filter daily.

--
As a Slashdot discussion grows longer, the probability of an analogy involving cars approaches one.
Re:Other way around? by Skynyrd · 2006-08-09 05:40 · Score: 4, Interesting

My limited experience is that whatever filtering Hotmail uses has been allowing lots of Spam to slip through in the last few weeks.

Anyone else?
How's Yahoo & G-Mail been doing?

I use gmail, and although it's let one or two pieces of spam through in the last week, it's always been near 100%.

I get 50-100 emails a day on gmail.
Spam is dying by Animats · 2006-08-09 06:15 · Score: 5, Interesting

Spam as advertising is dead, killed by a combination of CAN-SPAM and spam filters. What remains is ordinary criminality.
CAN-SPAM killed spam as advertising, in a way that neither the Direct Marketing Association nor the anti-spam groups expected. CAN-SPAM has criminal penalties for forged headers, but doesn't restrict "legitimate e-mail marketing", which is what the DMA wanted. But with valid headers, spam filters can immediately discard spam. The result is that "legitimate e-mail marketing" attempts go directly to the bit bucket today. Notice how rarely you see a spam from any legitimate company any more. (This assumes you have reasonable filtering.)
With the legitimate businesses gone, spam became a branch of crime. To be a spammer today, you have to commit felonies. Which means a risk of doing jail time. The famous "Buffalo Spammer" went to jail in 2004, and gets out in 2011. Jeremy Jaynes was sentenced to nine years in prison; he's out on bail pending an appeal, but sooner or later he's going to do those nine years. There's a Registry of Known Spam Operators, and law enforcement reads that list. Most of the people on that list have had visits from law enforcement.
Spammers have tried moving offshore, but that's not working as well as it used to. Few countries want to be known as spam havens. Even in China, it's getting harder; spammers have had to move from the developed coast to more remote provinces, where Beijing has less presence. ("The mountains are high and the emperor is far away") Operating offshore draws the attention of the investigators who follow money-laundering, terrorism, and drug-dealing. There are people doing this, but the risks are high.
What's left is what you'd expect - wannabe crooks, as in any bad neighborhood. They're not very good at crime. They're not making much money. They're what cops call "regular customers". They're a problem, but not a major threat. Those are the ones sending out useless spam.
Re:A lot of my spam seems pointless by madopal · 2006-08-09 06:53 · Score: 3, Interesting

I'm not exactly sure, but I think the problem with this spam getting further and further away from being legible is caused by market forces. I think the spammers get paid for delivering spam, NOT for how many responses/click-throughs/sales they get. So, if they blast out an e-mail to you and don't get a bounce, that counts as a successful delivery. Thus, they don't really care what's in the body of the e-mail. They did their job, and they get paid for the delivery.

That's all I can figure, because if your average person is so stupid that they respond to spam, then they probably aren't smart enough to figure out what "Viggra" is.
Re:Other way around? by porcupine8 · 2006-08-09 07:07 · Score: 3, Interesting

Actually, you haven't noticed any legitimate emails from Yahoo getting tossed as spam, have you? (Just curious, I've emailed my dad three times in a row with no response, even though he's forwarded me stuff in between, and he's usually quick to respond, so I'm worried Hotmail is tagging emails from Yahoo addresses or something.)
I think I've confused Yahoo by applying for a mortgage. So I've been getting lots of legitimate mortgage and real estate-related emails, and it's been starting to let through a few related spams as well.
Other than that, I haven't been getting any more stray spam than usual. Maybe once a week I'll get one (that's not mortgage-related) that the filter misses.
Then there are the ones that go to email lists that I have filtered to other boxes besides Inbox... Since you can't pick when the spam filter works, it always works AFTER all your others, and so I get all of these. *sigh*

--
Warning: Apple/Nintendo fangirl. Likes her electronics cute & cuddly. May be rabid.
My new pet theory by dfinster · 2006-08-09 08:34 · Score: 4, Interesting

I've about become convinced that the Viagra and other drug spam must be funded by the drug companies themselves. Not because they want us to buy the drugs from the spammers, but just because the constant barrage of email adds up to advertising impressions.

Obviously the emails I get for this crap are so badly done, nobody would actually expect me to buy from them. If I was actually trying to make money selling bogus drugs through spam, wouldn't I work harder to make it look legit? The phishing guys don't seem to have too much trouble making good looking e-mail - so why are the bogus drug emails so childish?

Because they don't exist. It's just advertising impressions. They've managed to get the word Viagra and Cialis in front of me a few more times a day, really cheaply.
More Workable Solution by lord_sarpedon · 2006-08-09 08:43 · Score: 3, Interesting

Rather than send random garbage that, as others have said, bears no resemblance to the users' typical email, why not extract text from the domain's website? A large portion of spam goes to work addresses. Emails sent and received with these addresses often times contain the name of the company, major individuals, current products, industry jargon, etc. So google the second half of the address and insert blocks of text from the company website/related pages. It seems to me that such a method would be much more obvious and effective than using Project Gutenberg. Especially in the short term, the one which matters most in this case.

--
"Strangers have the best candy" -Me
Re:I just thought they were weird. by CohibaVancouver · 2006-08-09 09:27 · Score: 3, Interesting

One could say the same about stealing.
"A fool and his money are soon parted."
What's the difference between some guy selling a tonic via SPAM and a tonic at the state fair? At the end of the day, not much, just that the spammer reaches more people.
Who cares about the email body? by Spacejock · 2006-08-09 12:50 · Score: 4, Interesting

My client-side email app does filtering on the header only. It also applies a few tests to the sender name and email. (Reads each header off the server, checks it out, rates it spam, not spam, or unsure.)
I get phenomenal accuracy without looking at the body, and it's quicker too.
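
The poster does not say which header tests the app applies, so here is only a sketch of the general approach: reading headers alone and returning one of the three ratings mentioned (spam, not spam, unsure). The specific checks, the `KNOWN_SENDERS` set, and the cutoff of two are all hypothetical.

```python
from email.parser import HeaderParser
from email.utils import parseaddr

KNOWN_SENDERS = {"alice@example.com"}  # hypothetical whitelist


def rate_headers(raw_headers):
    """Rate a message as 'spam', 'not spam', or 'unsure' from headers only."""
    msg = HeaderParser().parsestr(raw_headers)
    _name, addr = parseaddr(msg.get("From", ""))

    if addr.lower() in KNOWN_SENDERS:
        return "not spam"

    # Hypothetical checks: each missing or malformed header adds suspicion.
    suspicion = 0
    if not msg.get("Message-ID"):
        suspicion += 1
    if not msg.get("Date"):
        suspicion += 1
    if not (msg.get("Subject") or "").strip():
        suspicion += 1
    if "@" not in addr:
        suspicion += 1

    return "spam" if suspicion >= 2 else "unsure"


known = "From: Alice <alice@example.com>\nSubject: lunch\n\n"
bare = "From: x\n\n"
complete = (
    "From: Bob <bob@example.org>\n"
    "Subject: hello\n"
    "Date: Wed, 09 Aug 2006 12:50:00 +0000\n"
    "Message-ID: <1@example.org>\n\n"
)
print(rate_headers(known), rate_headers(bare), rate_headers(complete))
```

Because only the header block is parsed, the body never needs to be downloaded, which fits the poster's point about speed.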

--
Hal Spacejock: Science Fiction with Nuts