New Kind of Spam 'Un-Training' Filters?

Posted by ryuzaki0 on Wednesday August 9, 2006 @04:53AM from the battle-lines-being-drawn dept.

Zaphod2016 writes to tell us the Wall Street Journal is reporting that email in-boxes are under a new kind of spam attack. This new spam has confused many people due to its lack of advertising, viruses, or request for personal information. One popular theory is that these innocuous blocks of text, often drawn from popular literature, are being used to "un-train" spam filters to allow more malicious spam through in the future.

11 of 454 comments

Vectorspaces by bigattichouse · 2006-08-09 04:59 · Score: 4, Interesting

As a hobby, I play around with ways to classify spam. Not much of a hobby, but I find the problem interesting.

Lately, I've also been trying to use my vectorspace engine to classify spam, so these sorts of things might get in, but only because they fall into the general category of readable text.
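The comment doesn't say how the vectorspace engine works. As a hedged illustration only, a minimal vector-space classifier turns each message into a term-frequency vector and assigns it to whichever class centroid is closer by cosine similarity. The function names and the toy training examples are hypothetical. Ordinary prose shares many common words with legitimate mail, which is roughly why book excerpts can drift toward the "readable text" side, as the commenter notes.

```python
import math
import re
from collections import Counter


def vectorize(text: str) -> Counter:
    """Bag-of-words term-frequency vector for a message."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroid(texts) -> Counter:
    """Add up the vectors of one class's example messages."""
    total = Counter()
    for text in texts:
        total.update(vectorize(text))
    return total


def classify(message: str, spam_examples, ham_examples) -> str:
    """Label a message by the nearer class centroid (ties go to 'ham')."""
    vec = vectorize(message)
    spam_sim = cosine(vec, centroid(spam_examples))
    ham_sim = cosine(vec, centroid(ham_examples))
    return "spam" if spam_sim > ham_sim else "ham"
```

With spam examples `["buy cheap pills now", "cheap pills online now"]` and ham examples `["meeting notes for the project", "the project schedule for next week"]`, `classify("cheap pills for you", ...)` returns `"spam"`.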

I've also been thinking about building a GPL tool to provide "sound-based" classification sort of like a "one second orchestra" playing in harmony/disharmony based on the content.

Regardless of the engine I use, I still have to dig through my trash bin every few days to make sure nothing good slipped through.

--
meh
The text comes from the Gutenberg Project by sotweed · 2006-08-09 05:00 · Score: 5, Interesting

I've been getting 3 or 4 of these a day for at least a month now. The text can always be found in some file of an old book provided by Project Gutenberg, which makes out-of-copyright texts available through volunteer effort.

I think the theory about using this stuff to untrain spam filters is very plausible, but it's difficult to see how it will work. There's no common text among these e-mails; to send effective spam, there will have to be at least some text that is the same across multiple mails, and that will tend to expose it.
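sotweed doesn't say how such shared text would be spotted. A hedged sketch of one standard near-duplicate technique, word shingling, shows the idea: split each mail into overlapping runs of k words and keep the runs that recur across several mails. Randomly chosen book excerpts rarely repeat, so whatever survives is a candidate payload. The function names, k=4, and the two-message threshold are illustrative assumptions.

```python
import re
from collections import Counter


def shingles(text: str, k: int = 4) -> set:
    """Set of k-word shingles (overlapping word runs) in one message."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def recurring_shingles(messages, k: int = 4, min_messages: int = 2) -> set:
    """Shingles that appear in at least `min_messages` distinct messages."""
    counts = Counter()
    for msg in messages:
        # shingles() returns a set, so each shingle counts once per message
        counts.update(shingles(msg, k))
    return {s for s, n in counts.items() if n >= min_messages}
```

Two mails that wrap the same short pitch in different book excerpts share only the shingles of the pitch, which is exactly the text a filter could then learn.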
Re:Other way around? by pe1chl · 2006-08-09 05:01 · Score: 5, Interesting

At work, our SpamAssassin Bayes filter has been heavily trained to treat English text as spam.
This is because English is not our local language, so almost no business communication is in English, while most of the spam is.
This indeed sometimes causes false positives when an English-language mail has other spam-like properties as well, and the 3.5 points added by the Bayes filter push it above the limit.

This again shows that you should not rely solely on a Bayes filter to block spam.
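The arithmetic pe1chl describes is additive: SpamAssassin sums the scores of every rule that fires and flags the mail once the total reaches `required_score`, which defaults to 5.0. A minimal sketch, assuming hypothetical rule names and scores for everything except the 3.5 Bayes points from the comment:

```python
REQUIRED_SCORE = 5.0  # SpamAssassin's default required_score


def total_score(rule_hits: dict) -> float:
    """Sum the points of every rule that fired on a message."""
    return sum(rule_hits.values())


def is_spam(rule_hits: dict, threshold: float = REQUIRED_SCORE) -> bool:
    """Flag the message once the summed score reaches the threshold."""
    return total_score(rule_hits) >= threshold


# Hypothetical rules for an English business mail that already looks
# slightly spam-like to the non-Bayes rules.
other_rules = {"hypothetical_rule_a": 1.5, "hypothetical_rule_b": 0.5}
with_bayes = {**other_rules, "bayes_english_is_spam": 3.5}
```

The non-Bayes rules alone give 2.0 and the mail passes; adding the 3.5 Bayes points brings it to 5.5, over the limit, which is the false positive described above.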
My uninformed hunch: screwup... by nweaver · 2006-08-09 05:01 · Score: 5, Interesting

Text-block spam is very common WITH images. I suspect that some lame spammer got a BIG botnet contract, sent out his spam, and forgot to include the image.

--
Test your net with Netalyzr
Re:Other possibilities by Coventry · 2006-08-09 05:12 · Score: 4, Interesting

Just like the cryptic number-sequence radio/VoIP 'stations', this could be a method of communication.

We see so much spam every day that everyone takes it for granted, and everyone runs filters. If I wanted to secretly tell agents to begin operations, a select quote from a book sent as spam to hundreds of thousands of people would be perfect. Everyone ends up on spam lists, and receiving spam is a passive process, so it's even more anonymous than public web forums.

--
man is machine
Re:I buy the "broken spamware" angle by Richard_at_work · 2006-08-09 05:26 · Score: 4, Interesting

I don't think this is the case, as I've been getting these sorts of emails for at least 3 years (looking back at the spam archive I keep to train from): random blocks of legible text, blocks of pseudo-English (the words are correct but there's no effort at sentence structure), even jokes on their own. I got intrigued by this about 6 months ago and wrote a few scripts to see whether it was just a broken spam client forgetting to add the payload. But the average 'with payload' spam doesn't seem to match these emails; there are practically no 'with payload' spams in my archive with similar blocks of text.

I always wrote it off as Bayesian filter poisoning.
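One form of the poisoning mentioned here is padding a payload with innocuous words. A minimal naive-Bayes sketch shows why that can work: every token's spam probability is multiplied in, so enough "hammy" words can outweigh a few strongly spammy ones. The token table, the neutral 0.5 for unseen words, and combining all tokens (rather than only the most significant ones) are illustrative assumptions, not how any particular filter is configured.

```python
import math

# Hypothetical per-token spam probabilities a trained filter might hold.
TOKEN_SPAM_PROB = {
    "viagra": 0.99, "cheap": 0.90, "pills": 0.95,  # learned from spam
    "whale": 0.20, "captain": 0.20, "sea": 0.20,   # learned from ordinary mail
    "ship": 0.20, "harpoon": 0.20,
}
UNKNOWN = 0.5  # assumption: unseen tokens are treated as neutral


def spam_probability(tokens) -> float:
    """Naive-Bayes combination of per-token probabilities, equal priors."""
    log_spam = sum(math.log(TOKEN_SPAM_PROB.get(t, UNKNOWN)) for t in tokens)
    log_ham = sum(math.log(1 - TOKEN_SPAM_PROB.get(t, UNKNOWN)) for t in tokens)
    # prod(p) / (prod(p) + prod(1 - p)), rearranged in log space
    return 1 / (1 + math.exp(log_ham - log_spam))


payload = ["cheap", "viagra", "pills"]
padding = ["whale", "captain", "sea", "ship", "harpoon"] * 2  # excerpt-style filler
```

With these numbers the three payload words alone score above 0.99, while the payload plus ten filler words drops below 0.1.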
Re:Other way around? by ericlondaits · 2006-08-09 05:30 · Score: 5, Interesting

I recommend that you subscribe to a couple of English-language mailing lists (or Yahoo Groups), which you can then easily filter into a mail subfolder of their own by Subject line or From address. That way you have good English non-spam mail going through your Bayes filter daily.
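The suggestion amounts to a header-based routing rule. A hedged sketch using Python's standard `email` package: each rule checks one header for a substring and names the target folder. The list address, Subject tag, sender domain, and folder names are hypothetical placeholders.

```python
from email import message_from_string

# Hypothetical rules: (header, substring to look for, target folder).
LIST_RULES = [
    ("List-Id", "english-chat.lists.example.org", "english-lists"),
    ("Subject", "[book-club]", "english-lists"),
    ("From", "@groups.example.com", "english-lists"),
]


def choose_folder(raw_message: str, default: str = "INBOX") -> str:
    """Return the folder a message should be filed into, by header rules."""
    msg = message_from_string(raw_message)
    for header, needle, folder in LIST_RULES:
        value = str(msg.get(header, ""))
        if needle.lower() in value.lower():
            return folder
    return default
```

The messages collected in that folder could then be fed to the Bayes filter as ham, for example with SpamAssassin's `sa-learn --ham`.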

--
As a Slashdot discussion grows longer, the probability of an analogy involving cars approaches one.
Re:Other way around? by Skynyrd · 2006-08-09 05:40 · Score: 4, Interesting

My limited experience is that whatever filtering Hotmail uses has been allowing lots of Spam to slip through in the last few weeks.

Anyone else?
How's Yahoo & G-Mail been doing?

I use gmail, and although it's let one or two pieces of spam through in the last week, it's always been near 100%.

I get 50-100 emails a day on gmail.
Spam is dying by Animats · 2006-08-09 06:15 · Score: 5, Interesting

Spam as advertising is dead, killed by a combination of CAN-SPAM and spam filters. What remains is ordinary criminality.

CAN-SPAM killed spam as advertising in a way that neither the Direct Marketing Association nor the anti-spam groups expected. CAN-SPAM has criminal penalties for forged headers, but doesn't restrict "legitimate e-mail marketing", which is what the DMA wanted. But with valid headers, spam filters can immediately discard spam. The result is that "legitimate e-mail marketing" attempts go directly to the bit bucket today. Notice how rarely you see spam from any legitimate company anymore. (This assumes you have reasonable filtering.)

With the legitimate businesses gone, spam became a branch of crime. To be a spammer today, you have to commit felonies, which means risking jail time. The famous "Buffalo Spammer" went to jail in 2004 and gets out in 2011. Jeremy Jaynes was sentenced to nine years in prison; he's out on bail pending an appeal, but sooner or later he's going to do those nine years. There's a Registry of Known Spam Operators, and law enforcement reads that list. Most of the people on that list have had visits from law enforcement.

Spammers have tried moving offshore, but that's not working as well as it used to. Few countries want to be known as spam havens. Even in China it's getting harder; spammers have had to move from the developed coast to more remote provinces, where Beijing has less presence ("The mountains are high and the emperor is far away."). Operating offshore draws the attention of the investigators who follow money laundering, terrorism, and drug dealing. There are people doing this, but the risks are high.

What's left is what you'd expect: wannabe crooks, as in any bad neighborhood. They're not very good at crime. They're not making much money. They're what cops call "regular customers". They're a problem, but not a major threat. Those are the ones sending out useless spam.
My new pet theory by dfinster · 2006-08-09 08:34 · Score: 4, Interesting

I've just about become convinced that the Viagra and other drug spam must be funded by the drug companies themselves. Not because they want us to buy the drugs from the spammers, but because the constant barrage of email adds up to advertising impressions.

Obviously the emails I get for this crap are so badly done that nobody would actually expect me to buy from them. If I were actually trying to make money selling bogus drugs through spam, wouldn't I work harder to make it look legit? The phishing guys don't seem to have much trouble making good-looking email, so why are the bogus drug emails so childish?

Because they don't exist. It's just advertising impressions. They've managed to get the words Viagra and Cialis in front of me a few more times a day, really cheaply.
Who cares about the email body? by Spacejock · 2006-08-09 12:50 · Score: 4, Interesting

My client-side email app does filtering on the header only. It also applies a few tests to the sender name and email. (Reads each header off the server, checks it out, rates it spam, not spam, or unsure.)
I get phenomenal accuracy without looking at the body, and it's quicker too.
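The commenter doesn't list the tests, so the rules below are hypothetical heuristics; the sketch only shows the shape of a header-only, three-way rater. Headers can be fetched without downloading the body, for example with the POP3 TOP command (Python's `poplib.POP3.top(which, 0)`), and parsed with `email.parser.HeaderParser`. `KNOWN_SENDERS` and the scoring thresholds are assumptions.

```python
import re
from email.parser import HeaderParser
from email.utils import parseaddr

# Hypothetical data: addresses the user already corresponds with.
KNOWN_SENDERS = {"editor@publisher.example", "friend@mail.example"}


def rate_headers(raw_headers: str) -> str:
    """Rate a message as 'spam', 'not spam' or 'unsure' from headers alone."""
    headers = HeaderParser().parsestr(raw_headers)
    name, address = parseaddr(str(headers.get("From", "")))
    address = address.lower()
    if address in KNOWN_SENDERS:
        return "not spam"

    score = 0
    if not name:
        score += 1  # bare address with no display name
    if re.search(r"\d{3,}", address.split("@")[0]):
        score += 1  # local part padded with a long run of digits
    subject = str(headers.get("Subject", ""))
    if subject.isupper():
        score += 1  # all-caps subject; an empty subject doesn't count
    reply_to = parseaddr(str(headers.get("Reply-To", "")))[1].lower()
    if reply_to and reply_to != address:
        score += 1  # replies diverted to a different address

    if score >= 2:
        return "spam"
    return "unsure" if score == 1 else "not spam"
```

Messages rated "unsure" would be the ones worth a manual look, matching the three-way split the comment describes.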

--
Hal Spacejock: Science Fiction with Nuts