The Next Step In Spam Filtering

Posted by CmdrTaco on Thursday October 9, 2003 @09:08AM from the brace-yourself-for-impact dept.

simeonbeta2 writes "Paul Graham (of "A Plan for Spam" fame) has a couple of new articles up. The first one details the success of Bayesian spam filters despite various circumvention techniques by spammers. While the success of Bayesian spam filtering is encouraging, it certainly hasn't seemed to stem the flow of spam in the last year or so. His second article, however, suggests finally taking the anti-spam battle to the spammers! Paul proposes that spam filtering packages automatically spider links contained in probable spam. Not only will this increase the accuracy of filters (by running the retrieved content through the spam filter as well) but this would effectively be a massive distributed DOS attack on spammers. This isn't a new idea nor is it without its problems but I think it's definitely an idea whose time has come."

12 of 349 comments

DoS Filter Circumvention by inertia187 · 2003-10-09 09:09 · Score: 3, Insightful

We've seen first hand how the early Bayesian filters were circumvented. Remember the images instead of text, then the HTML Entities (like &#65; instead of the letter 'A')? The second and third generations of the Bayesian filters had to account for them. I can just see how a DoS filter would be circumvented early: redirects and browser scripts.
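
The entity trick described here, where a letter is written as a numeric character reference, can be undone before the text is tokenized. The following is a minimal sketch, not any particular filter's code; the function name `normalize_tokens` and the simple word-splitting rule are assumptions made for illustration.

```python
import html
import re

def normalize_tokens(body):
    """Decode HTML character references, then split into lowercase word tokens."""
    # html.unescape turns "&#65;" into "A", so obfuscated words
    # collapse back into their plain spelling.
    text = html.unescape(body)
    # Hypothetical tokenizer: runs of ASCII letters and digits only.
    return re.findall(r"[a-z0-9]+", text.lower())

if __name__ == "__main__":
    print(normalize_tokens("V&#73;AGR&#65; for &#102;ree"))
```

A real filter would need a richer tokenizer, but the point is only that decoding has to happen before the words are counted.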

If a filter spiders a spam, all the spammer needs to do is use a redirect or, for smart filters, a small page with javascript that the browser would understand, but would confuse the filter. So yes, the DoS would work at first, but the spammers would realize what was going on and adapt.

I'm sure meta refresh tags would work in the beginning, but it's simple enough to get a filter to look for those. Eventually, a good filter will have to mimic what the browser does very closely. Maybe it'd be better to actually use a browser that the user can't see.
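
Looking for meta refresh tags, as suggested above, might be sketched like this with Python's standard `html.parser` module. The class and function names are hypothetical, and the sketch only covers the meta refresh case; it does nothing about JavaScript redirects, which is exactly the gap a spammer would move to next.

```python
from html.parser import HTMLParser

class MetaRefreshFinder(HTMLParser):
    """Records the target of the first <meta http-equiv="refresh"> tag seen."""

    def __init__(self):
        super().__init__()
        self.target = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.target is not None:
            return
        attr = dict(attrs)
        if (attr.get("http-equiv") or "").lower() != "refresh":
            return
        # The content attribute looks like "0; url=http://example.com/next".
        content = attr.get("content") or ""
        _, _, rest = content.partition(";")
        rest = rest.strip()
        if rest.lower().startswith("url="):
            self.target = rest[4:].strip("'\" ")

def find_meta_refresh(page):
    """Return the redirect target of a page, or None if there is none."""
    finder = MetaRefreshFinder()
    finder.feed(page)
    return finder.target
```

A spidering filter could call `find_meta_refresh` on each fetched page and follow the returned URL, up to some fixed hop limit.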

--
A programmer is a machine for converting coffee into code.
Grr Spam. by Muerto · 2003-10-09 09:09 · Score: 3, Insightful

I think we're on the right track with fining people large amounts of money for being associated with the spam. If you not only go after the people who send the spam, but the people whose products are being advertised, then I think we'll get some results.
1. Re:Grr Spam. by Mannerism · 2003-10-09 10:07 · Score: 4, Insightful
  
  Um no. There are plenty of companies that have affiliate programs with thousands of members. There's no way to keep track of how each of your members are advertising. The results you'll get will be putting lots of innocent companies out of business.
  
  I think I speak for millions when I say, "too fucking bad."
  
  Seriously, to suggest that these companies are "innocent" is ridiculous. They're downright complicit.
  
  --
  Please donate your spare CPU cycles to help fight cancer and other diseases
Silly by ^ · 2003-10-09 09:11 · Score: 3, Insightful

Then all I need to do to launch a DoS attack is send a piece of spam?
Could be evil. by grub · 2003-10-09 09:12 · Score: 5, Insightful

Imagine a Joe-Job where an EvilDoer wants to knock someone else offline and sends out bogus spam with the victim's website. Think before you jump.

--
Trolling is a art,
Stop wrecking the Internet. by Sheetrock · 2003-10-09 09:13 · Score: 5, Insightful

Spam alone chews up more than enough bandwidth.
Having every recipient spider the links in the spam they get will not only make spamming inefficient, but web browsing as well. Enough with anti-spam cures that are worse than the disease -- the last almost killed SomethingAwful, and this might knock off the rest of the websites.

--

Try not. Do or do not, there is no try.
-- Dr. Spock, stardate 2822-3.
1. Re:Stop wrecking the Internet. by tessaiga · 2003-10-09 09:34 · Score: 4, Insightful
  
  Exactly. Whoever was responsible for writing such anti-spam software would be the first person to get hit with a massive lawsuit the first time some spammer found a way to "aim" this sort of scheme at an innocent bystander. If that bystander happens to be a big company with deep pockets, the programmer could be looking at some serious pain. Knowing that such a risk exists, it would be interesting to see if anyone would still be willing to develop such software.
  The article tries to combat false positives with blacklists. A couple of problems with this come to mind right away. The first is that centrally-maintained blacklists are easy to take offline via DDOS, as we've already seen with sites like SPEWS. The second, and IMHO more serious, problem is that this would give the blacklist maintainers huge power over the rest of the internet -- if you ever got on their bad side, or if they were just plain inefficient or careless about listing innocent bystanders, your site could potentially be shut down until they felt like taking you off the blacklist, just by some spammer spoofing you. Given the poor responsiveness that many blacklist maintainers have shown historically, I don't think giving them more power is the answer. Bad enough not being able to send people email if you accidentally get blacklisted -- imagine not being able to get net access at all.
  
  --
  The bold print giveth, and the fine print taketh away ...
What about... by Misch · 2003-10-09 09:15 · Score: 4, Insightful

What about the case where the spammer puts a unique identifier into the URL? Sure, he may not get a sale from the clickthrough, but he gets verification that your e-mail address is good.
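
One partial countermeasure a spidering filter might try, which is an assumption here and not something the article or this comment proposes, is to drop the query string and fragment before fetching, since that is the easiest place to hide a per-recipient token. The helper name `strip_tracking` is hypothetical, and a token placed in the path or hostname would still get through.

```python
from urllib.parse import urlsplit, urlunsplit

def strip_tracking(url):
    """Return the URL with its query string and fragment removed."""
    parts = urlsplit(url)
    # Keep scheme, host and path; discard anything after "?" or "#".
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))

if __name__ == "__main__":
    print(strip_tracking("http://spam.example/offer?id=a1b2c3"))
```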

Then, you get more spam.

--

--You will rephrase your request for me to go to hell. Goto statements are not acceptable programming constructs
Re:Are these subject lines example's of anti BF? by Sheetrock · 2003-10-09 09:19 · Score: 3, Informative

The recognizable words (neonatal, pedant, betsy) might be a weak attempt at that in addition to creating non-identical subjects, although they'd need a lot more non-spammy words buried in the article to get through... which they usually do, surrounded with HTML to make them invisible.

--

Try not. Do or do not, there is no try.
-- Dr. Spock, stardate 2822-3.
Re:Boston Globe Article by hackhound · 2003-10-09 09:27 · Score: 3, Informative

Correct, clickable link here: Boston Globe
Re:Who the hell?! by andih8u · 2003-10-09 09:27 · Score: 4, Insightful

This woman at my wife's work got an email where they were selling Photoshop for $40. Quite the bargain, eh? So of course she went and got the director of the company's credit card # and went ahead and ordered it. Amazingly enough, five months later, Photoshop still hasn't come in the mail.

So, in answer to your questions, stupid people make it worthwhile, and there's no shortage of those.

--

slashdot, news for crazed liberal socialist zealots
Re:What about false positives? by (54)T-Dub · 2003-10-09 09:29 · Score: 3, Informative

From the FAQ:

This could be used to DoS innocent victims.

That's the point of the blacklist. A site doesn't get pounded simply by being mentioned in a spam. It has to be mentioned in a spam and be on the blacklist.
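
The two-condition rule in that answer can be sketched as a single check. This is an illustration under stated assumptions, not the article's implementation: the function name `should_spider`, the precomputed `is_probable_spam` flag, and a blacklist stored as a set of lowercase hostnames are all hypothetical.

```python
from urllib.parse import urlsplit

def should_spider(url, is_probable_spam, blacklist):
    """Fetch a link only if the mail looks like spam AND the host is blacklisted."""
    host = (urlsplit(url).hostname or "").lower()
    return is_probable_spam and host in blacklist

if __name__ == "__main__":
    blacklist = {"spammer.example"}
    # A victim named in a forged spam is not on the list, so it is left alone.
    print(should_spider("http://victim.example/", True, blacklist))
    print(should_spider("http://spammer.example/buy", True, blacklist))
```

Under this rule a Joe-Job only succeeds if the victim's host also ends up on the blacklist, which is why the trustworthiness of the list matters so much.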

--

"I can not bring myself to believe that if knowledge presents danger, the solution is ignorance" - Isaac Asimov