New Kind of Spam 'Un-Training' Filters?
Zaphod2016 writes to tell us the Wall Street Journal is reporting that email in-boxes are under a new kind of spam attack. This new spam has confused many people due to its lack of advertising, viruses, or request for personal information. One popular theory is that these innocuous blocks of text, often drawn from popular literature, are being used to "un-train" spam filters to allow more malicious spam through in the future.
Bayesian and other filters do not rely on "spammy" words alone -- they also rely on "unspammy" words, and spammers have no idea what those words are because each person receives different email.
A scenario, with made up (but plausible) numbers: Suppose you're a developer of a Linux driver for the Bozodrive 1000. The majority of your legitimate email comes from Linux driver development mailing lists. A full 50% of those emails contain the word "IRQ." 99% of the emails contain the word "driver," and 15% contain the word "Johannsen" which is in the signature of one of your friends. And precisely 0% of the emails containing any of these terms have ever been found to be spam.
Any decent spam filter will give a huge weight to the presence of these "unspammy" words, because of the extremely high probability of emails containing them to be non-spam. The presence of randomly selected confusion words in empty spams is not going to affect these frequency counts.
In order to defeat a filter by confusing it, the spammer must guess what the SPECIFIC non-spam words for that PARTICULAR email user are, and then produce bogus, spam messages containing those words in the appropriate frequencies. This will cause the classification counts for those words to become more equalized, and the value of those words in determining spammyness to be greatly reduced. However, this is an impossible task unless the spammer has access to the actual emails of the target.
Perhaps the intent of the empty spams is to confuse the filters, but whoever devised the method has no understanding of how these things actually work, whatsoever.
My limited experience is that whatever filtering Hotmail uses has been allowing lots of Spam to slip through in the last few weeks.
Anyone else?
How's Yahoo & G-Mail been doing?
[Fuck Beta]
o0t!
If the spammers are now sending round Gutenberg texts, this is entirely appropriate. Project Gutenberg caused probably the first ever spam, when Michael Hart launched the project by trying to mail everyone on ARPANET with the U.S. Declaration of Independence. (source)
-- Ed Avis ed@membled.com
Here are actual samples of emails that Gmail and Yahoo have let through to my inbox over the past couple days. First, Gmail:
Attached to the above was an image file that contained an obvious ad. So to Gmail, this apparently looks like a regular text email that happens to have an attached image.
(You can argue about how effective this is, since Gmail thumbnails all images, meaning you'd need to click a separate link to open it and read it.)
Now Yahoo, where I get approximately 1,000 messages to my bulk folder per day - this is the only one that's gotten through to my inbox in the last day: