New Kind of Spam 'Un-Training' Filters?

Posted by ryuzaki0 on Wednesday August 9, 2006 @04:53AM from the battle-lines-being-drawn dept.

Zaphod2016 writes to tell us the Wall Street Journal is reporting that email inboxes are under a new kind of spam attack. This new spam has confused many people because it contains no advertising, no viruses, and no request for personal information. One popular theory is that these innocuous blocks of text, often drawn from popular literature, are being used to "un-train" spam filters so that more malicious spam gets through in the future.

17 of 454 comments

Other way around? by Sepodati · 2006-08-09 04:58 · Score: 5, Insightful

Wouldn't it work the other way around? I still flag crap like this as spam, so it seems like it'd train my spam filter to have more false positives, no?

---John Holmes...
1. Re:Other way around? by pe1chl · 2006-08-09 05:01 · Score: 5, Interesting
  
  At work, our SpamAssassin Bayes filter has been heavily trained to treat English text as spam.
  This is because English is not our local language, so almost no business communication is in English, while most of the spam is.
  This does sometimes cause false positives when English-language mail has other spam-like properties as well, and the added 3.5 points from the Bayes filter push it above the limit.
  
  This again shows that you should not use a Bayes filter as your only spam blocker.
2. Re:Other way around? by TubeSteak · 2006-08-09 05:24 · Score: 5, Informative
  
  In my limited experience, whatever filtering Hotmail uses has been letting a lot of spam slip through over the last few weeks.
  
  Anyone else?
  How's Yahoo & G-Mail been doing?
  
  --
  [Fuck Beta]
  o0t!
3. Re:Other way around? by ericlondaits · 2006-08-09 05:30 · Score: 5, Interesting
  
  I recommend subscribing to a couple of English-language mailing lists (or Yahoo Groups), which you can easily filter into a mail subfolder of their own by Subject line or From address. That way, good English non-spam mail goes through your Bayes filter every day.
  
  --
  As a Slashdot discussion grows longer, the probability of an analogy involving cars approaches one.
4. Re:Other way around? by badasscat · 2006-08-09 06:13 · Score: 5, Informative
  
  How's Yahoo & G-Mail been doing?
  
  Here are actual samples of emails that Gmail and Yahoo have let through to my inbox over the past couple days. First, Gmail:
  
  Wells, who has had a rather similar history and who obviously owes something to Dickens as novelist. In some ways his outlook is very similar to Dickens's. No one who is really involved in the landscape ever sees the landscape. To Chesterton the poor means small shopkeepers and servants. There is nothing psychologically false in this, either. No one who is really involved in the landscape ever sees the landscape. It is easy to imagine what the young woman would have said to this in real life. And given the FACT of servitude, the feudal relationship is the only tolerable one. The other point is that Dickens's early experiences have given him a horror of proletarian roughness. They, and the men, always spoke of me as the young gentleman. It is one of the stock jokes of English literature, from Malvolio onwards. But he is remarkably free from the idiocy of regarding nations as individuals. So were all the characteristic English novelists of the nineteenth century. The last thing anyone ever remembers about the books is their central story. Nevertheless his list of most hated types is like enough to Wells's for the similarity to be striking. A change of heart is in fact THE alibi of people who do not wish to endanger the STATUS QUO. There is nothing psychologically false in this, either. Pickwick and the servant should be Sam Weller. It is noticeable that Dickens hardly writes of war, even to denounce it. There were no labour-saving devices, and there was huge inequality of wealth. In Dickens's novels anything in the nature of work happens off-stage. And, on the whole, his attacks on good society are rather perfunctory. But by origins and upbringing Thackeray happens to be somewhat nearer to the class he is satirizing. Here perhaps Gissing is influenced by his own love of classical learning. In a rather different sense his attitude to life is extremely unphysical. It is usual to claim him as a popular writer, a champion of the oppressed masses. Dickens would be quite incapable of this.
  Compare any lawsuit in Dickens with the lawsuit in ORLEY FARM, for instance. I do consider the young 'ooman, sir, said Sam. Here the contrast between Dickens and, say, Trollope is startling. It is true that not all his novels are alike in this. He gets himself arrested in order to follow Mr. Progress is not an illusion, it happens, but it is slow and invariably disappointing. If his palms are hard from work, they let him in; if his palms are soft, out he goes. It is perhaps more significant that he shows no prejudice against Jews. At first sight this statement looks flatly untrue and it needs some qualification. A modern manservant would never think of doing either. There are practically no friendly pictures of the landowning class, for instance. If one wants a modern equivalent, the nearest would be H.
  
  Attached to the above was an image file that contained an obvious ad. So to Gmail, this apparently looks like a regular text email that happens to have an attached image.
  
  (You can argue about how effective this is, since Gmail thumbnails all images, meaning you'd need to click a separate link to open it and read it.)
  
  Now Yahoo, where approximately 1,000 messages a day land in my bulk folder; this is the only one that's gotten through to my inbox in the last day:
  
  FROM THE DESK OF Mrs Queen Adams
  BANK OF AFRICA [BOA]
  OUAGADOUGOU, BURKINA FASO.
  
  DEAR FRIEND,
  
  I AM HOPEFUL THAT THIS MAIL WILL REACH YOU IN GOOD CONDITION OF
  HEALTH.I AM MRS QUEEN ADAMS A STAFF OF BANK OF AFRICA AND A BURKINABE RESIDENT
  IN BURKINA FASO ALSO.IN THE BANK WHERE I WORK AS AN AUDITOR,I
  DISCOVERED AN ABANDONED SUM OF MONEY AMOUNTING TO 15.2MILLION DOLLARS BELONGING
  TO DR GEORGE BRUMLEY WHO UNFORTUNATELY DIED IN THE PLANE CRASH OF UNION
  TRANSPORT AFRICAN FLIGHT BOEING 727 IN KENYA, EAST AFRICA ON SUNDAY
5. Re:Other way around? by toad3k · 2006-08-09 06:19 · Score: 5, Funny
  
  I really have no idea how big a problem spam is these days
  
  I described it to you but you didn't get my message.
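pe1chl's reply above describes SpamAssassin's additive model: every rule that fires adds points, and a message is tagged as spam once the total reaches the threshold. A minimal sketch, assuming SpamAssassin's default `required_score` of 5.0 and using hypothetical rule names and a made-up 2.0 score (only the 3.5 Bayes contribution comes from the comment):

```python
# Minimal sketch of SpamAssassin-style additive scoring.
# Assumption: threshold of 5.0 (SpamAssassin's default required_score).
# Rule names and the 2.0 score are hypothetical placeholders.
REQUIRED_SCORE = 5.0


def classify(rule_scores: dict[str, float], threshold: float = REQUIRED_SCORE) -> tuple[float, bool]:
    """Sum the points of every rule that fired; spam if the total is at or above the threshold."""
    total = sum(rule_scores.values())
    return total, total >= threshold


# An English-language message that trips a few other spam-like rules.
other_rules = {"HYPOTHETICAL_SPAMLIKE_RULES": 2.0}
print(classify(other_rules))                                  # (2.0, False)
# The same message once the Bayes filter adds its 3.5 points.
print(classify({**other_rules, "HYPOTHETICAL_BAYES": 3.5}))   # (5.5, True)
```

Because the verdict depends only on the sum, one heavily weighted classifier can push borderline ham over the line by itself, which is the commenter's argument for not relying on Bayes alone.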
Re: Your recent article on Slashdot by Scutter · 2006-08-09 05:00 · Score: 5, Funny

It is such an important element, you see, that duration
of time. I consider twelve hours a substantial measure. So I ran along
the drive and up the steps and into the house, but did not see either
Mrs. I observed: Your Excellency is not easily satisfied. And I marvelled,
and said: How comes it that I have hitherto been deaf to these
distressful tones? Il passe sur la route, mais toujours en sens inverse
[He passes along the road, but always in the opposite direction].
For a mental state such as theirs, appetency rather than instability is
the right word. Which reminds me that the old adage about let us eat and
drink, for to-morrow, etc. Mais où donc est la vie, sinon dans le peuple?
[But where, then, is life, if not among the people?]
They lamented dismally among themselves in many tongues: How I suffer!
Take that little one on Lézards, for instance; or, in the other volume,
the bizarre Joies Noires.

--

"Tell me doctor, with all of your defenses, are there any provisions for an attack by killer bees?"
The text comes from the Gutenberg Project by sotweed · 2006-08-09 05:00 · Score: 5, Interesting

I've been getting three or four of these a day for at least a month now. The text can always be found in some old book from Project Gutenberg, which makes public-domain texts available through volunteer effort.

I think the theory about using this stuff to untrain spam filters is very plausible, but it's difficult to see how it would work. There's no common text among these e-mails, yet to send effective spam there will have to be at least some text that is the same across multiple mails, and that will tend to expose it.
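The point that shared pitch text would expose a campaign can be illustrated with word shingling, a standard near-duplicate detection technique swapped in here purely for illustration (not a claim about how any particular filter works). This sketch splits each message into overlapping three-word shingles and keeps only those common to every message in a batch; the sample texts and the payload line are made up.

```python
def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Return the set of overlapping k-word sequences in `text` (case-folded)."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def common_shingles(messages: list[str], k: int = 3) -> set[tuple[str, ...]]:
    """Shingles present in every message; randomly chosen padding rarely survives this."""
    sets = [shingles(m, k) for m in messages]
    return set.intersection(*sets) if sets else set()


padding_a = "no one who is really involved in the landscape ever sees the landscape"
padding_b = "it is one of the stock jokes of english literature"
pitch = "buy cheap watches at example dot com"  # hypothetical payload text

print(common_shingles([padding_a, padding_b]))  # set()
print(sorted(common_shingles([padding_a + " " + pitch, padding_b + " " + pitch])))
```

Padding alone shares nothing, but as soon as the same payload rides along with different padding, its shingles are exactly what the messages have in common. Real systems would hash the shingles and tolerate small edits; the principle is the same.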
1. Re:The text comes from the Gutenberg Project by Ed+Avis · 2006-08-09 06:02 · Score: 5, Informative
  
  If the spammers are now sending around Project Gutenberg texts, this is entirely appropriate. Project Gutenberg probably caused the first-ever spam, when Michael Hart launched the project by trying to mail the U.S. Declaration of Independence to everyone on ARPANET. (source)
  
  --
  -- Ed Avis ed@membled.com
My uninformed hunch: screwup... by nweaver · 2006-08-09 05:01 · Score: 5, Interesting

Text-block spam is very common WITH images. I suspect that some lame spammer got a BIG botnet contract, sent out his spam, and forgot to include the image.

--
Test your net with Netalyzr
Un-training? Hardly. by pclminion · 2006-08-09 05:03 · Score: 5, Informative

Bayesian and other filters do not rely on "spammy" words alone; they also rely on "unspammy" words, and spammers have no idea what those words are because each person receives different email.

A scenario, with made-up (but plausible) numbers: Suppose you're a developer of a Linux driver for the Bozodrive 1000. Most of your legitimate email comes from Linux driver development mailing lists. A full 50% of those emails contain the word "IRQ," 99% contain the word "driver," and 15% contain the word "Johannsen," which is in the signature of one of your friends. And precisely 0% of the emails containing any of these terms have ever been found to be spam.

Any decent spam filter will give a huge weight to the presence of these "unspammy" words, because emails containing them are overwhelmingly likely to be non-spam. Randomly selected confusion words in empty spams are not going to affect these frequency counts, because those spams don't contain the user's unspammy words in the first place.
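This can be sketched with a simplified naive Bayes combiner in the style of Paul Graham's "A Plan for Spam". All counts below are made up to fit the Bozodrive 1000 scenario, and the 0.4 default for unseen words and the [0.01, 0.99] clamp are assumptions borrowed from that style of filter, not from the comment.

```python
from math import prod

# Hypothetical training data: how many messages (out of N_HAM / N_SPAM)
# contain each token. All figures are made up for illustration.
N_HAM, N_SPAM = 1000, 1000
HAM_COUNTS = {"irq": 500, "driver": 990, "johannsen": 150, "landscape": 10}
SPAM_COUNTS = {"viagra": 400, "landscape": 10}


def token_prob(token: str) -> float:
    """Estimate P(spam | token) from per-corpus frequencies, clamped to [0.01, 0.99]."""
    ham = HAM_COUNTS.get(token, 0) / N_HAM
    spam = SPAM_COUNTS.get(token, 0) / N_SPAM
    if ham + spam == 0:
        return 0.4  # never-seen token (assumed default)
    return min(0.99, max(0.01, spam / (ham + spam)))


def spam_score(tokens: list[str]) -> float:
    """Combine token probabilities assuming independence (naive Bayes)."""
    probs = [token_prob(t) for t in set(tokens)]
    s = prod(probs)
    h = prod(1 - p for p in probs)
    return s / (s + h)


# A driver-list mail that happens to contain a word the padding spams also use.
print(round(spam_score(["driver", "irq", "landscape"]), 4))  # 0.0001
print(round(spam_score(["viagra", "landscape"]), 2))          # 0.99
```

A padding word seen equally often in both corpora sits at 0.5 and cancels out of the combination, so the user's own ham words still decide the verdict.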

In order to defeat a filter by confusing it, the spammer must guess the SPECIFIC non-spam words for that PARTICULAR email user, and then produce bogus spam messages containing those words in the appropriate frequencies. This would equalize the classification counts for those words and greatly reduce their value in determining spamminess. However, this is impossible unless the spammer has access to the target's actual email.
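The targeted attack described here can be sketched with the same kind of simplified per-token estimate, restated below so the snippet runs on its own (counts hypothetical, 0.4 default and clamp assumed). If enough spam containing the user's own ham word "irq" were trained as spam, that word would drift toward the uninformative 0.5.

```python
def token_prob(ham_count: int, spam_count: int, n_ham: int, n_spam: int) -> float:
    """Simplified P(spam | token) from corpus frequencies, clamped to [0.01, 0.99]."""
    ham = ham_count / n_ham
    spam = spam_count / n_spam
    if ham + spam == 0:
        return 0.4  # never-seen token (assumed default)
    return min(0.99, max(0.01, spam / (ham + spam)))


def targeted_poisoning(ham_count: int, n_ham: int, n_spam: int, poisoned: int) -> tuple[float, float]:
    """P(spam | token) before and after `poisoned` spams containing the token are trained as spam.

    Assumes the token previously appeared in no spam at all.
    """
    before = token_prob(ham_count, 0, n_ham, n_spam)
    after = token_prob(ham_count, poisoned, n_ham, n_spam + poisoned)
    return before, after


# "irq" appears in 500 of 1,000 ham messages and in no spam (hypothetical counts).
before, after = targeted_poisoning(ham_count=500, n_ham=1000, n_spam=1000, poisoned=500)
print(before, round(after, 2))  # 0.01 0.4
```

Untargeted literary padding never contains "irq", so it only grows the spam corpus, and in this frequency-based estimate that leaves the word's probability pinned at 0.01.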

Perhaps the intent of the empty spams is to confuse the filters, but whoever devised the method has no understanding whatsoever of how these filters actually work.
I buy the "broken spamware" angle by nuzak · 2006-08-09 05:08 · Score: 5, Insightful

The WSJ article also gives due time to the theory that the spamware is simply broken and is delivering the padding without the payload. Since I've previously seen plenty of Gutenspam (my name for spam containing snippets of Gutenberg texts) with an image payload attached, I'm definitely leaning toward the notion that something slipped somewhere and they are now failing to deliver the image.

Woe betide literature discussion groups now that filters are trained on the classics.

--
Done with slashdot, done with nerds, getting a life.
Probably something far less ingenious. by OwlWhacker · 2006-08-09 05:10 · Score: 5, Insightful

I have seen quite a number of corrupt e-mails coming from spammers. Occasionally the subject is merely %%SUBJECT%%, or a message arrives consisting of just the headers and no body.

My theory is that there are more people attempting to use spamming applications, and many of these people don't have a clue what they're doing. You'll probably find that they've forgotten to add their text to the e-mails, or are just not reading the documentation on how to successfully send their spam.
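This failure mode is easy to reproduce with any naive template filler. In this hypothetical sketch, the %%NAME%% syntax is generalized from the literal %%SUBJECT%% seen in the wild, and `fill_template` is not taken from any real spamware: a field the operator forgot is simply left in place, and an empty payload leaves only the padding.

```python
import re

# Hypothetical placeholder syntax, generalized from the literal %%SUBJECT%% seen in the wild.
PLACEHOLDER = re.compile(r"%%([A-Z_]+)%%")


def fill_template(template: str, fields: dict[str, str]) -> str:
    """Replace each %%NAME%% with fields[NAME]; unknown names are left untouched."""
    return PLACEHOLDER.sub(lambda m: fields.get(m.group(1), m.group(0)), template)


def has_unfilled_placeholder(text: str) -> bool:
    """A filter-side check: does the message still contain a raw placeholder?"""
    return PLACEHOLDER.search(text) is not None


subject = fill_template("%%SUBJECT%%", {})          # operator forgot the subject list
print(subject, has_unfilled_placeholder(subject))   # %%SUBJECT%% True

body = fill_template("%%PADDING%%\n%%PAYLOAD%%",
                     {"PADDING": "Dickens would be quite incapable of this.", "PAYLOAD": ""})
print(repr(body))
```

The same leftover placeholder that embarrasses the sender is also a cheap signal for a filter to match on.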

--
Linux/Open Source/Anti Microsoft News
Re:I just thought they were weird. by bunions · 2006-08-09 05:20 · Score: 5, Funny

I swear I hit the 'preview' button and not 'submit.' I blame the Soviet mind-control lasers. Here is my post as it should have been:

My favorites are the ones that put the filter poison into bogus HTML tags that Outlook doesn't render. So I'd get something like:

<oodles> <mycotoxin> <greengrocer> <chubby> <kazoo>
Buy my shit
<snappy> <bundle> <chaff> <glum>

The <greengrocer> tag was my favorite. I sent an RFE to the W3C people, but I haven't heard back yet :mad:
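The trick works because a mail client displays only the text between tags, while a filter that tokenizes the raw source also sees the tag names. A minimal sketch using Python's standard `html.parser` module separates the two; the `KNOWN_TAGS` allow-list is a small hypothetical sample, and flagging unknown tags as a spam signal is an illustrative assumption, not something the comment says any filter does.

```python
from html.parser import HTMLParser

# Small, hypothetical allow-list of real HTML tag names (not exhaustive).
KNOWN_TAGS = {"html", "head", "body", "p", "br", "div", "span", "a", "b", "i",
              "img", "table", "tr", "td", "font"}


class TagAndTextCollector(HTMLParser):
    """Collect start-tag names and the visible (non-whitespace) text between tags."""

    def __init__(self) -> None:
        super().__init__()
        self.tags: list[str] = []
        self.text: list[str] = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)  # HTMLParser reports tag names in lower case

    def handle_data(self, data):
        if data.strip():
            self.text.append(data.strip())


def analyze(raw_html: str) -> tuple[str, list[str]]:
    """Return (text a renderer would show, tag names not on the allow-list)."""
    parser = TagAndTextCollector()
    parser.feed(raw_html)
    parser.close()
    unknown = [t for t in parser.tags if t not in KNOWN_TAGS]
    return " ".join(parser.text), unknown


raw = "<oodles> <mycotoxin> <greengrocer> <chubby> <kazoo>\nBuy my shit\n<snappy> <bundle> <chaff> <glum>"
visible, unknown_tags = analyze(raw)
print(visible)  # Buy my shit
print(unknown_tags)
```

The reader sees only the pitch, while the poison words surface as nine tags that no real HTML document would use.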

--
there is no need to sign your posts. this isn't usenet. your username is right there above your post. stop it.
Re:We've had this for years by seanyboy · 2006-08-09 05:25 · Score: 5, Funny

Verily, I understand thy point, but for all the sense thy words make to mine ears, I still cannot understand what villainous treachery it is that makes spam filters reject my own missives out of hand. It is a mystery, and one I feel even the local constabulary could not crack.

--
Training monkeys for world domination since 1439
Spam is dying by Animats · 2006-08-09 06:15 · Score: 5, Interesting

Spam as advertising is dead, killed by a combination of CAN-SPAM and spam filters. What remains is ordinary criminality.

CAN-SPAM killed spam as advertising in a way that neither the Direct Marketing Association nor the anti-spam groups expected. CAN-SPAM has criminal penalties for forged headers, but doesn't restrict "legitimate e-mail marketing", which is what the DMA wanted. But with valid headers, spam filters can immediately identify and discard that mail. The result is that "legitimate e-mail marketing" attempts now go directly to the bit bucket. Notice how rarely you see spam from any legitimate company any more. (This assumes you have reasonable filtering.)

With the legitimate businesses gone, spam became a branch of crime. To be a spammer today, you have to commit felonies, which means risking jail time. The famous "Buffalo Spammer" went to jail in 2004 and gets out in 2011. Jeremy Jaynes was sentenced to nine years in prison; he's out on bail pending an appeal, but sooner or later he's going to do those nine years. There's a Registry of Known Spam Operators, and law enforcement reads that list. Most of the people on that list have had visits from law enforcement.

Spammers have tried moving offshore, but that's not working as well as it used to. Few countries want to be known as spam havens. Even in China, it's getting harder; spammers have had to move from the developed coast to more remote provinces, where Beijing has less presence ("The mountains are high and the emperor is far away"). Operating offshore draws the attention of the investigators who follow money laundering, terrorism, and drug dealing. There are people doing this, but the risks are high.

What's left is what you'd expect: wannabe crooks, as in any bad neighborhood. They're not very good at crime. They're not making much money. They're what cops call "regular customers". They're a problem, but not a major threat. Those are the ones sending out useless spam.
Alternate theory by MobyDisk · 2006-08-09 06:44 · Score: 5, Funny

I believe that the internet is becoming sentient. It has locked onto unencrypted plain-text SMTP as the simplest, most ubiquitous, most understandable form of communication. Images and HTML are too complex. At its current level, the semi-intelligent internet is only capable of sending meaningless emails. It sends things that are textually meaningful but semantically meaningless. To us it looks like an amalgam of random words and publications with the intent of confusing us. Of course, since there is so much spam, the internet is being largely trained by the spammers, which confuses the emergent intelligence even further. Since the internet has no concept of "self," it perceives every email as a reply to its own communiqués.

Before the internet can become intelligent, it must learn to filter out the meaningless stuff. Then it must get a concept of self, then a concept of multiple other individuals (us). At that point it is self-aware, and the learning can commence in a more directed way.

After all that, we are fscked. Fortunately, that is at least decades away.