The Next Step In Spam Filtering

Posted by CmdrTaco on Thursday October 9, 2003 @09:08AM from the brace-yourself-for-impact dept.

simeonbeta2 writes "Paul Graham (of "A Plan for Spam" fame) has a couple of new articles up. The first one details the success of Bayesian spam filters despite various circumvention techniques by spammers. While the success of Bayesian spam filtering is encouraging, it certainly hasn't seemed to stem the flow of spam in the last year or so. His second article, however, suggests finally taking the anti-spam battle to the spammers! Paul proposes that spam filtering packages automatically spider links contained in probable spam. Not only will this increase the accuracy of filters (by running the retrieved content through the spam filter as well) but this would effectively be a massive distributed DOS attack on spammers. This isn't a new idea nor is it without its problems but I think it's definitely an idea whose time has come."

26 of 349 comments

DoS Filter Circumvention by inertia187 · 2003-10-09 09:09 · Score: 3, Insightful

We've seen firsthand how the early Bayesian filters were circumvented. Remember the images instead of text, then the HTML entities (like &#65; instead of the letter 'A')? The second and third generations of the Bayesian filters had to account for them. I can just see how a DoS filter would be circumvented early: redirects and browser scripts.

If a filter spiders a spam, all the spammer needs to do is use a redirect or, for smart filters, a small page with javascript that the browser would understand, but would confuse the filter. So yes, the DoS would work at first, but the spammers would realize what was going on and adapt.

I'm sure meta refresh tags would work in the beginning, but it's simple enough to get a filter to look for those. Eventually, a good filter will have to mimic what the browser does very closely. Maybe it'd be better to actually use a browser that the user can't see.
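
To make the meta refresh point concrete, here is a minimal sketch of how a filter might look for those tags, using only Python's standard html.parser module. The names MetaRefreshFinder and find_meta_refresh are made up for illustration; they are not part of any real filter, and a real one would also have to handle HTTP redirects and scripts.

```python
from html.parser import HTMLParser


class MetaRefreshFinder(HTMLParser):
    """Hypothetical helper: collects target URLs from meta refresh tags."""

    def __init__(self):
        super().__init__()
        self.targets = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values may be None when the attribute has no value.
        attr = {name.lower(): (value or "") for name, value in attrs}
        if attr.get("http-equiv", "").lower() != "refresh":
            return
        # The content attribute looks like "0; url=http://example.com/next".
        _, _, rest = attr.get("content", "").partition(";")
        key, _, value = rest.strip().partition("=")
        if key.strip().lower() == "url" and value.strip():
            self.targets.append(value.strip().strip("'\""))


def find_meta_refresh(html_text):
    """Return the redirect targets found in meta refresh tags, in order."""
    finder = MetaRefreshFinder()
    finder.feed(html_text)
    return finder.targets
```

A filter could feed each retrieved page through find_meta_refresh and queue any returned URL for another fetch, ideally with a cap on the number of hops.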

--
A programmer is a machine for converting coffee into code.
1. Re:DoS Filter Circumvention by sketerpot · 2003-10-09 09:13 · Score: 2, Informative
  
  It's possible to include, say, the Mozilla javascript engine in one of these spam filters, which would let it deal with funky javascript. BFilter, for one, uses this approach to deal with ad banners that are inserted in the page by javascript. The redirects can be dealt with; I'm sure there's some standard code for dealing with them that would be easy to use.
  Really, you can take quite a bit of browser code out of the browser and use it in a filter.
2. Re:DoS Filter Circumvention by vslashg · 2003-10-09 09:29 · Score: 2, Informative
  
  Eventually, a good filter will have to mimic what the browser does very closely. Maybe it'd be better to actually use a browser that the user can't see.
  
  Or set up a filter, and just stop accepting HTML mail altogether. Life is so much better when all of your incoming email is plain text. Most legitimate incoming mail is sent as multipart, so mail from your friends still gets through, even when they use mail clients that want to send out formatted mail.
  
  The spammers sometimes send multipart messages with a text part that says something like "There is no plain text version of this message", but that's still better to see than a picture I didn't ask for.
3. Re:DoS Filter Circumvention by BagOBones · 2003-10-09 09:41 · Score: 2, Informative
  
  In order to render the image it would have to be downloaded.
  This is how spammers know that they found a working e-mail address.
  
  --
  EA David Gardner -"... but the consumers have proven that actually what they want is fun."
Grr Spam. by Muerto · 2003-10-09 09:09 · Score: 3, Insightful

I think we're on the right track with fining people large amounts of money for being associated with the spam. If you not only go after the people who send the spam, but the people whose products are being advertised, then I think we'll get some results.
1. Re:Grr Spam. by NineNine · 2003-10-09 09:18 · Score: 2, Insightful
  
  If you not only go after the people who send the spam, but the people whose products are being advertised, then I think we'll get some results
  
  Um no. There are plenty of companies that have affiliate programs with thousands of members. There's no way to keep track of how each of your members are advertising. The results you'll get will be putting lots of innocent companies out of business.
2. Re:Grr Spam. by Mannerism · 2003-10-09 10:07 · Score: 4, Insightful
  
  Um no. There are plenty of companies that have affiliate programs with thousands of members. There's no way to keep track of how each of your members are advertising. The results you'll get will be putting lots of innocent companies out of business.
  
  I think I speak for millions when I say, "too fucking bad."
  
  Seriously, to suggest that these companies are "innocent" is ridiculous. They're downright complicit.
  
  --
  Please donate your spare CPU cycles to help fight cancer and other diseases
Duplicate! by JohnGrahamCumming · 2003-10-09 09:10 · Score: 2, Offtopic

Congratulations, Slashdot editors, this is a dupe.

And I'm a subscriber.

And I emailed you before it was posted saying it was a dupe of this story: http://slashdot.org/article.pl?sid=03/08/10/1619206&mode=thread&tid=111&tid=126. Anybody there?

John.
Silly by ^ · 2003-10-09 09:11 · Score: 3, Insightful

Then all I need to do to launch a DoS attack is send a piece of spam?
Repeat from August by merger · 2003-10-09 09:12 · Score: 2, Informative

Feel free to read the comments from when this article was posted to slashdot in August.
Could be evil. by grub · 2003-10-09 09:12 · Score: 5, Insightful

Imagine a Joe-Job where an EvilDoer wants to knock someone else offline and sends out bogus spam with the victim's website. Think before you jump.

--
Trolling is a art,
Stop wrecking the Internet. by Sheetrock · 2003-10-09 09:13 · Score: 5, Insightful

Spam alone chews up more than enough bandwidth.
Having every recipient spider the links in the spam they get will not only make spamming inefficient, but web browsing as well. Enough with anti-spam cures that are worse than the disease -- the last almost killed SomethingAwful, and this might knock off the rest of the websites.

--

Try not. Do or do not, there is no try.
-- Dr. Spock, stardate 2822-3.
1. Re:Stop wrecking the Internet. by gilesjuk · 2003-10-09 09:19 · Score: 2
  
  Plus if you get false positives it might take out an innocent site.
2. Re:Stop wrecking the Internet. by tessaiga · 2003-10-09 09:34 · Score: 4, Insightful
  
  Exactly. Whoever was responsible for writing such anti-spam software would be the first person to get hit with a massive lawsuit the first time some spammer found a way to "aim" this sort of scheme at an innocent bystander. If that bystander happens to be a big company with deep pockets, the programmer could be looking at some serious pain. Knowing that such a risk exists, it would be interesting to see if anyone would still be willing to develop such software.
  The article tries to combat false positives with blacklists. A couple of problems with this come to mind right away. The first is that centrally-maintained blacklists are easy to take offline via DDOS, as we've already seen with sites like SPEWS. The second, and IMHO more serious, problem is that this would give the blacklist maintainers huge power over the rest of the internet -- if you ever got on their bad side, or if they were just plain inefficient/not conscientious about accidentally listing innocent bystanders, your site could potentially be shut down until they felt like taking you off the blacklist, just by some spammer spoofing you. Given the poor history of responsiveness that many blacklist maintainers have shown historically, I don't think giving them more power is the answer. Bad enough not being able to send people email if you accidentally get blacklisted -- imagine not being able to get net access at all.
  
  --
  The bold print giveth, and the fine print taketh away ...
What about... by Misch · 2003-10-09 09:15 · Score: 4, Insightful

What about the case where the spammer puts a unique identifier into the URL? Sure, he may not get a sale from the clickthrough, but he gets verification that your e-mail address is good.

Then, you get more spam.

--

--You will rephrase your request for me to go to hell. Goto statements are not acceptable programming constructs
1. Re:What about... by mengel · 2003-10-09 10:01 · Score: 2, Insightful
  
  Actually, no. If the spam filter is in front of the valid-recipient check on your email system, then all the spam message attempts yield web hits, meaning they get "verification" of lots of invalid email addresses. Soon the belief that a web hit from an email address makes it more valuable goes the way of the dodo bird...
  
  --
  - "History shows again and again how nature points out the folly of men" -- Blue Oyster Cult, 'Godzilla'
Are these subject lines example's of anti BF? by t0qer · 2003-10-09 09:15 · Score: 2, Interesting

Are these subject lines anti Bayesian filters? Just curious cause they've been getting weird lately..

Xanax_-_No_Prescription_Needed_-_neonatal
Kuasx ep Pharmaceuticals including Valiumm, prozac, aAmbientforth mw
Enter to win free cigarettes pedant
Fight Aging and Skin Cancer Xpxtdp
Bigger Penis is Better betsy

I'm just curious why my spam lately seems to just have weird random junk in the subject line. I actually find it sort of amusing because some of the randomness reminds me of Tourette's syndrome.
1. Re:Are these subject lines example's of anti BF? by Sheetrock · 2003-10-09 09:19 · Score: 3, Informative
  
  The recognizable words (neonatal, pedant, betsy) might be a weak attempt at that in addition to creating non-identical subjects, although they'd need a lot more non-spammy words buried in the article to get through... which they usually do, surrounded with HTML to make them invisible.
  
  --
  
  Try not. Do or do not, there is no try.
  -- Dr. Spock, stardate 2822-3.
Re:Boston Globe Article by hackhound · 2003-10-09 09:27 · Score: 3, Informative

Correct, clickable link here: Boston Globe
This is a horrible idea by image · 2003-10-09 09:27 · Score: 2, Insightful

Malicious virus and trojan authors spend a lot of time and energy writing code that can infect host machines across the internet and wait for incoming instructions to launch a DDOS attack against a target.

And there is actually a proposal for people to voluntarily install this on their machines? And the trigger is simply an email?

Sick of yahoo.com today? Take them down -- just spam the net with junk mail that points to their site. Have a vendetta against a guy that hosts his own email over a DSL line? No problem -- you won't even need to spam that many people before their auto-crawling DDOS boxes take his server down.

Yikes.
Re:Who the hell?! by andih8u · 2003-10-09 09:27 · Score: 4, Insightful

This woman at my wife's work got an email where they were selling Photoshop for $40. Quite the bargain, eh? So of course she went and got the director of the company's credit card # and went ahead and ordered it. Amazingly enough, five months later, Photoshop still hasn't come in the mail.

So, in answer to your questions, stupid people make it worthwhile, and there's no shortage of those.

--

slashdot, news for crazed liberal socialist zealots
Re:What about false positives? by (54)T-Dub · 2003-10-09 09:29 · Score: 3, Informative

From the FAQ:

This could be used to DoS innocent victims.

That's the point of the blacklist. A site doesn't get pounded simply by being mentioned in a spam. It has to be mentioned in a spam and be on the blacklist.
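
As a rough sketch of that rule (a host is retrieved only if it is both linked from the spam and on the blacklist), assuming the blacklist is simply a set of lowercase hostnames. The function name hosts_to_retrieve and that data format are illustrative assumptions, not something the FAQ specifies.

```python
from urllib.parse import urlsplit


def hosts_to_retrieve(spam_urls, blacklist):
    """Return hosts that are linked from the spam AND present on the blacklist.

    `blacklist` is assumed to be a set of lowercase hostnames (hypothetical
    format). Hosts that are merely mentioned in a spam are never returned.
    """
    hosts = set()
    for url in spam_urls:
        host = urlsplit(url).hostname  # already lowercased by urlsplit
        if host and host in blacklist:
            hosts.add(host)
    return sorted(hosts)
```

With this check, a Joe-Job that names an unlisted site produces an empty list, so everything then depends on how accurate the blacklist is.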

--

"I can not bring myself to believe that if knowledge presents danger, the solution is ignorance" - Isaac Asimov
Re:Who the hell?! by ender-iii · 2003-10-09 09:33 · Score: 2, Funny

coming up with a solution for stupid people would solve a lot more problems than just spam...

--
ender-iii
Section 508 by yerricde · 2003-10-09 11:02 · Score: 2, Interesting

the filter points people to my captcha, which is here and they have to type in "I am not a spammer" and then the letters in the graphic.

The problem with your approach and with any approach that uses a CAPTCHA is that it provides no way for a visually impaired human being to first-contact you. If you use a CAPTCHA, you can't do business with the U.S. government.

--
Will I retire or break 10K?
Go after the source by DigitalSpyder · 2003-10-09 11:06 · Score: 2, Interesting

Legislation is working, albeit slowly.
What is required is that we start fining the companies being spamvertised.

This will force companies to assess who they deal with and make damn sure they understand that they are responsible for this just as much as the spammer (they are the ones that ultimately benefit and therefore pay the spammers).

This would only work however if you could prove a legitimate relationship exists between the spammer being sued and the company. With sufficient resources and investigation this is not as hard as it sounds.

If a company is joe-jobbed in some way, then the spamvertised company shouldn't be targeted unless you can catch the spammer as well and prove that a relationship exists between the two entities. You are then just working up the chain, similarly to how cops catch street dealers and work their way up.

Regardless, there are many ways joe-jobbing could be resolved. This is just one idea.

What would eventually happen (through smart legislation) is that it will force spammers to use servers in other countries where it is legal.

This is where blacklists will become most effective then. Business and individuals in these countries will create a public outcry so large that legislation will have to change. And if legislation doesn't change, they still remain blacklisted.

This would stop a significant portion of spam.

The rest (abused networks, open relays) should be made liable and culpable for spamming. A few well aimed lawsuits against companies with negligent system administrators or people running dedicated servers should get the point across. I have no sympathy for Joe Blow with Winbloze 95 who has no firewall software, no anti virus software, has no idea what a patch is, and expects the ISP to take care of it all for him. And they are just as liable.

We don't let people drive without a license, it should be the same principle with users on the Internet - because there are very real and sometimes drastic consequences of their actions (or lack thereof). It is already in the T's & C's of every AUP for every ISP that the end user is responsible for their actions under their account. It's time that ISPs and the courts *SERIOUSLY* enforce it!!
Long term solution + ramblings from tired mind by lightspawn · 2003-10-09 11:10 · Score: 2, Interesting

Replace the email system with a system that makes sending forged email non-trivial.

I may still wish to accept anonymous emails, but nothing that contains HTML for sure, and maybe only if I can cause the sender 1 cent of damage (maybe by depleting some anonymous fund - for most people paying 1 dollar as a deposit will last forever, spammers would have a dollar disappear in seconds as 100 people mark it as spam and a cent is claimed each time).

In the meantime, seriously, I'd be happy with bouncing each message containing HTML+links, links by IP addresses, or links to domains registered in .cn, .kr or .br. These seem to be the big three right now. Unfortunately I'm using a web-based email solution so I can't implement any of this.
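
For anyone who can run their own filter, the three bounce rules could be sketched roughly like this. Everything here is an assumption for illustration: the function names, the simple URL regex, and treating "registered in .cn, .kr or .br" as a plain hostname suffix check.

```python
import ipaddress
import re
from urllib.parse import urlsplit

# Assumed rule set, taken from the three cases listed above.
BLOCKED_TLDS = (".cn", ".kr", ".br")
URL_PATTERN = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)


def link_host(url):
    """Return the lowercase hostname of a URL, or '' if it cannot be parsed."""
    try:
        return urlsplit(url).hostname or ""
    except ValueError:
        return ""


def should_bounce(body, is_html):
    """Return True if the message body matches any of the three bounce rules."""
    for url in URL_PATTERN.findall(body):
        if is_html:
            return True  # rule 1: HTML message that contains a link
        host = link_host(url)
        try:
            ipaddress.ip_address(host)
            return True  # rule 2: link points at a bare IP address
        except ValueError:
            pass
        if host.endswith(BLOCKED_TLDS):
            return True  # rule 3: link points at a blocked country-code TLD
    return False
```

Plain-text mail with an ordinary link passes, as does HTML mail with no links at all.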

If only we could convince lawmakers to pass actual anti-spam laws, it would be a nice stop-gap solution.

Specifically, we need a way to go not after the anonymous spammer, but after the business being spamvertised.

What if anybody receiving a spamvertisement for a product could order it, pay with a credit card (up to $500), then present the spam, keep the product and not be required to pay the credit card company?

Just an example, I know that would not work in practice.