New Kind of Spam 'Un-Training' Filters?

← Back to Stories (view on slashdot.org)

New Kind of Spam 'Un-Training' Filters?

Posted by ryuzaki0 on Wednesday August 9, 2006 @04:53AM from the battle-lines-being-drawn dept.

Zaphod2016 writes to tell us the Wall Street Journal is reporting that email in-boxes are under a new kind of spam attack. This new spam has confused many people due to its lack of advertising, viruses, or request for personal information. One popular theory is that these innocuous blocks of text, often drawn from popular literature, are being used to "un-train" spam filters to allow more malicious spam through in the future.

4 of 454 comments (clear)

Min score:

Reason:

Sort:

Un-training? Hardly. by pclminion · 2006-08-09 05:03 · Score: 5, Informative

Bayesian and other filters do not rely on "spammy" words alone -- they also rely on "unspammy" words, and spammers have no idea what those words are because each person receives different email.

A scenario, with made up (but plausible) numbers: Suppose you're a developer of a Linux driver for the Bozodrive 1000. The majority of your legitimate email comes from Linux driver development mailing lists. A full 50% of those emails contain the word "IRQ." 99% of the emails contain the word "driver," and 15% contain the word "Johannsen" which is in the signature of one of your friends. And precisely 0% of the emails containing any of these terms have ever been found to be spam.

Any decent spam filter will give a huge weight to the presence of these "unspammy" words, because of the extremely high probability of emails containing them to be non-spam. The presence of randomly selected confusion words in empty spams is not going to affect these frequency counts.

In order to defeat a filter by confusing it, the spammer must guess what the SPECIFIC non-spam words for that PARTICULAR email user are, and then produce bogus, spam messages containing those words in the appropriate frequencies. This will cause the classification counts for those words to become more equalized, and the value of those words in determining spammyness to be greatly reduced. However, this is an impossible task unless the spammer has access to the actual emails of the target.

Perhaps the intent of the empty spams is to confuse the filters, but whoever devised the method has no understanding of how these things actually work, whatsoever.
Re:Other way around? by TubeSteak · 2006-08-09 05:24 · Score: 5, Informative

My limited experience is that whatever filtering Hotmail uses has been allowing lots of Spam to slip through in the last few weeks.

Anyone else?
How's Yahoo & G-Mail been doing?

--
[Fuck Beta]
o0t!
Re:The text comes from the Gutenberg Project by Ed+Avis · 2006-08-09 06:02 · Score: 5, Informative

If the spammers are now sending round Gutenberg texts, this is entirely appropriate. Project Gutenberg caused probably the first ever spam, when Michael Hart launched the project by trying to mail everyone on ARPANET with the U.S. Declaration of Independence. (source)

--
-- Ed Avis ed@membled.com
Re:Other way around? by badasscat · 2006-08-09 06:13 · Score: 5, Informative

How's Yahoo & G-Mail been doing?

Here are actual samples of emails that Gmail and Yahoo have let through to my inbox over the past couple days. First, Gmail:

Wells, who has had a rather similar historyand who obviously owes something to Dickens as novelist. In some ways his outlook is verysimilar to Dickenss. No one who is really involved in the landscape ever sees thelandscape. To Chesterton the poor means small shopkeepers andservants. There is nothing psychologically false in this, either. No one who is really involved in the landscape ever sees thelandscape. It is easy to imagine what the young woman would have said to this inreal life. And given the FACT ofservitude, the feudal relationship is the only tolerable one. Theother point is that Dickenss early experiences have given him a horrorof proletarian roughness. They, and the men, always spoke of me as the younggentleman. It is one of the stockjokes of English literature, from Malvolio onwards. Buthe is remarkably free from the idiocy of regarding nations asindividuals. So were all the characteristic English novelists of thenineteenth century. The last thing anyone ever remembers about the books is theircentral story. Nevertheless hislist of most hated types is like enough to Wellss for the similarity tobe striking. A change of heart is in fact THE alibi of peoplewho do not wish to endanger the STATUS QUO. There is nothing psychologically false in this, either. Pickwick and the servant should be Sam Weller. It is noticeable thatDickens hardly writes of war, even to denounce it. Therewere no labour-saving devices, and there was huge inequality of wealth. In Dickenss novels anything in the nature of work happens off-stage. And, on the whole, his attacks on good society are ratherperfunctory. But byorigins and upbringing Thackeray happens to be somewhat nearer to theclass he is satirizing. Here perhaps Gissing is influenced by his own love of classical learning. In a rather different sense his attitude to life is extremely unphysical. It is usual to claim him as a popularwriter, a champion of the oppressed masses. Dickens would be quite incapable of this. Compare any lawsuit in Dickens with the lawsuit inORLEY FARM, for instance. I do consider the young ooman, sir, said Sam. Here the contrast between Dickens and, say, Trollopeis startling. It is true that not all his novelsare alike in this. He getshimself arrested in order to follow Mr. Progressis not an illusion, it happens, but it is slow and invariablydisappointing. If his palms are hard from work, they let him in; if his palms aresoft, out he goes. It is perhaps more significant that he shows noprejudice against Jews. At first sight this statement looks flatly untrueand it needs some qualification. A modern manservant would neverthink of doing either. There arepractically no friendly pictures of the landowning class, for instance. If one wants a modern equivalent,the nearest would be H.

Attached to the above was an image file that contained an obvious ad. So to Gmail, this apparently looks like a regular text email that happens to have an attached image.

(You can argue about how effective this is, since Gmail thumbnails all images, meaning you'd need to click a separate link to open it and read it.)

Now Yahoo, where I get approximately 1,000 messages to my bulk folder per day - this is the only one that's gotten through to my inbox in the last day:

FROM THE DESK OF Mrs Queen Adams
BANK OF AFRICA [BOA]
OUAGADOUGOU, BURKINA FASO.

DEAR FRIEND,

I AM HOPEFUL THAT THIS MAIL WILL REACH YOU IN GOOD CONDITION OF
HEALTH.I AM MRS QUEEN ADAMS A STAFF OF BANK OF AFRICA AND A BURKINABE RESIDENT
IN BURKINA FASO ALSO.IN THE BANK WHERE I WORK AS AN AUDITOR,I
DISCOVERED AN ABANDONED SUM OF MONEY AMOUNTING TO 15.2MILLION DOLLARS BELONGING
TO DR GEORGE BRUMLEY WHO UNFORTUNATELY DIED IN THE PLANE CRASH OF UNION
TRANSPORT AFRICAN FLIGHT BOEING 727 IN KENYA, EAST AFRICA ON SUNDAY