Paul Graham: Filters that Fight Back

response to the lister's comment by ih8apple · 2003-08-10 05:11 · Score: 4, Informative

In response to the comment: "One danger is someone doing a DDoS by sending fake spam"

From the article notes: "[5] The best way to protect against abuse might be to have the central authority whitelist every site by default, and then, by whatever protocol, take certain sites off. Because you can look at the sites before taking them off the whitelist, there is little danger of people abusing this system to attack an innocent site."

--

Why do I h8 apple?

Following links validates your address by PeekabooCaribou · 2003-08-10 05:12 · Score: 5, Interesting

If I load an image or a link from spam, it's possible that a spammer could be validating my e-mail address for future sale, or perhaps increased spamming since he knows someone is actually reading the message. For example, http://server.foo/image.gif?id=ab0a98df12j3 could be unique to the spam that was sent to me. If any user-agent accesses that URL, the spammer knows that my e-mail is active and I'm reading his junk. I don't know if they actually do this in practice, but I'm loath to load HTML messages because of it.
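
A minimal sketch of how a mail client could spot this kind of tracking URL before loading anything. It only assumes that the "beacon" is an auto-loaded tag (IMG or IFRAME) whose URL carries a query string; the class and function names are made up for illustration, and real spam may hide the id elsewhere in the URL.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class RemoteResourceFinder(HTMLParser):
    """Collect URLs that an HTML mail client would fetch without a click."""

    # Assumption: only these tag/attribute pairs are loaded automatically.
    AUTO_LOADED = {("img", "src"), ("iframe", "src")}

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if (tag, name) in self.AUTO_LOADED and value:
                self.urls.append(value)


def possible_beacons(html_body):
    """Return auto-loaded URLs that carry a query string (a possible per-recipient id)."""
    finder = RemoteResourceFinder()
    finder.feed(html_body)
    return [url for url in finder.urls if urlsplit(url).query]


body = (
    '<p>Hi</p><img src="http://server.foo/image.gif?id=ab0a98df12j3">'
    '<img src="http://server.foo/logo.gif">'
)
print(possible_beacons(body))
```

A client could refuse to load anything this returns, which is roughly what "don't load remote images" settings do in a blunter way.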

--
"I'll say it again for the logic-impaired." -- Larry Wall.

Re:Following links validates your address by hankaholic · 2003-08-10 05:16 · Score: 5, Interesting

I've been thinking for a while about maybe having a Slashbox that displays images included in spam in a 1x1 pixel box.

Every load of Slashdot would hit spammers' servers.

--
Somebody get that guy an ambulance!

Re:Following links validates your address by koehn · 2003-08-10 05:27 · Score: 4, Interesting

Actually, the opposite would happen: since all links in all spams get hit, this technique would make putting UIDs into URLs worthless for the purpose of authenticating users.

Spammers would need another mechanism to attempt to authenticate who reads their messages. I like it.

What do you think about downloading IMG tags? It would hurt the server's bandwidth, but it would hurt my mail server's bandwidth, too. Maybe use one of the many open proxies out there instead, kill their bandwidth, maybe close the open proxy... ooh, that's evil! I really like it!

If there were a sig here, would you read it?

Re:Following links validates your address by LordKronos · 2003-08-10 07:29 · Score: 4, Insightful

That's not going to work. All you would do is needlessly DoS www.geocities.com without any particular spammer's site being identified. Geocities would have no way to identify which site is the spammer's, and their hourly bandwidth would never get used up, and thus would still be available for those who click on the links.

Also, consider that spammers could move the identifier to the other end of the url. Just have *.spammer.com or www.*.spammer.com resolve to the same site, and start putting the identifiers in the domain. They could even use random dictionary words as the identifiers to make it more difficult to pick out. The only way to combat that would be to have a system that compares the URLs from several spams and figures out which parts of the URLs changed per user.
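
A rough sketch of that last idea: line up the URLs from several copies of the same spam and report which parts differ. The part labels and function names are invented for illustration, and the sketch assumes the URLs have the same overall shape; it does not try to guess where the registered domain ends.

```python
from urllib.parse import urlsplit, parse_qsl


def url_parts(url):
    """Break a URL into labelled parts: host labels, path segments, query values."""
    split = urlsplit(url)
    parts = {}
    # Number host labels from the right so that "apple.spammer.com" and
    # "www.banjo.spammer.com" still line up on "spammer.com".
    for i, label in enumerate(reversed((split.hostname or "").split("."))):
        parts[("host", i)] = label
    for i, segment in enumerate(split.path.split("/")):
        parts[("path", i)] = segment
    for key, value in parse_qsl(split.query, keep_blank_values=True):
        parts[("query", key)] = value
    return parts


def varying_parts(urls):
    """Return the labelled parts whose value differs between the given URLs."""
    seen = {}
    for url in urls:
        for key, value in url_parts(url).items():
            seen.setdefault(key, set()).add(value)
    return sorted(key for key, values in seen.items() if len(values) > 1)


spam_urls = [
    "http://apple.spammer.com/offer?x=1",
    "http://banjo.spammer.com/offer?x=1",
]
print(varying_parts(spam_urls))
```

Whatever varies per recipient is the likely identifier, whether it sits in the query string or in the host name.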

Dangerous from a legal perspective by hardaker · 2003-08-10 05:15 · Score: 4, Insightful

What about phrases like "by clicking on this link you agree to let us call your house" (where the link contains a token for identification purposes)? Having a filter auto-follow links could be really dangerous then.

The interesting thing is how the courts would end up viewing auto-clicks vs manual clicks. I'd bet that if a user set up a filter then it would be effectively viewed as the user doing the clicking...

--
The next site to slashdot will be ready soon, but subscribers can beat the rush and start slashdotting it early!

We're going mobile! by Superfreaker · 2003-08-10 05:15 · Score: 4, Funny

/.ing moves from the web, right into your own mailbox! All the fun of crushing someone else's website without all of the work of clicking those tiresome links.

Note to self: Move web site off of modded GameBoy running apache.

horrid legal thought by BobTheLawyer · 2003-08-10 05:18 · Score: 4, Interesting

A deliberate denial of service attack is illegal whether the victim is an innocent website or an evil spammer. There is no internet equivalent of lawful self defence.

If a spammed website is brought down by a method such as this, it wouldn't altogether surprise me if they sued the maker of the software responsible. Matters would be complicated if, as they might, they deny responsibility for the original spam e-mail.

(This is the case in the UK, I'd guess the position will be similar in the US but IANAAL (I Am Not An American Lawyer))

On the other hand, the "scan the spamvertised website for its content" sounds a great technical approach.

Re:horrid legal thought by Todd+Knarr · 2003-08-10 05:24 · Score: 4, Insightful

Why would it be illegal? The spammer put the links in the e-mail, obviously intending people to follow them (especially if they make reference to something being available at the linked site in the rest of the text). If far too many people follow the links and the site is brought down, how is that any more unlawful than Slashdot linking to a site in a story and the sudden burst of traffic bringing that site down?

I think the idea's dangerous for another reason, though. As noted, a spammer could easily include links to sites he doesn't like and let the traffic spike take them down.

This is stupid! by MoogMan · 2003-08-10 05:18 · Score: 4, Interesting

Seems a bit retarded to at least double the bandwidth drain from spam. It's bad enough as it is. This is *not* a viable solution, unless the spammers happened to be one hop away...

Comparison of Bayesian spam filters by kreide33 · 2003-08-10 05:27 · Score: 5, Informative

I recently switched from a keyword-based spam filter to a bayesian filter. However, there exist several bayesian filter projects and the choice of which to use is not obvious. Therefore, I decided to do an actual test and write up my findings in a review so others can benefit as well. Read it and find out how to win the War on spam.

Re:Comparison of Bayesian spam filters by __past__ · 2003-08-10 05:38 · Score: 4, Insightful

I always wondered how Graham felt about the hundreds of Bayesian filters written after he published his article. After all it was supposed to be a killer feature of a webmail system he (together with others, of course) writes to demo his Arc language.
Then again, he's probably still insanely rich from the ViaWeb (a.k.a Yahoo! Store) deal, and doesn't really have to care about lost business advantage much. Becoming a millionaire to be able to concentrate on hacking seems to be a good career plan :-)

--
Programming can be fun again. Film at 11.

Filter web-pages through bayesian filters by flux · 2003-08-10 05:28 · Score: 5, Interesting

How about using the bayesian algorithms we have today and applying them to the referred web pages? I'm sure they would have plenty of good material for the filters to detect.. Plus this would probably be more effective with spam that effectively is only an url.
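
A toy sketch of what scoring a fetched page could look like. This is plain naive Bayes over word counts with add-one smoothing, not the exact formula from Graham's article, and the page text is assumed to have been downloaded and stripped of markup already; the class and function names are illustrative only.

```python
import math
import re
from collections import Counter


def tokens(text):
    """Lower-case the text and split it into word-like tokens."""
    return re.findall(r"[a-z0-9$'-]+", text.lower())


class TinyBayes:
    """Naive Bayes over word counts with add-one smoothing."""

    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.docs = {"spam": 0, "ham": 0}

    def train(self, label, text):
        self.counts[label].update(tokens(text))
        self.docs[label] += 1

    def spam_probability(self, text):
        # The +1 reserves room for words never seen in training.
        vocab = len(set(self.counts["spam"]) | set(self.counts["ham"])) + 1
        spam_total = sum(self.counts["spam"].values()) + vocab
        ham_total = sum(self.counts["ham"].values()) + vocab
        log_odds = math.log((self.docs["spam"] + 1) / (self.docs["ham"] + 1))
        for word in tokens(text):
            p_spam = (self.counts["spam"][word] + 1) / spam_total
            p_ham = (self.counts["ham"][word] + 1) / ham_total
            log_odds += math.log(p_spam / p_ham)
        # Clamp so that very long pages cannot overflow math.exp.
        log_odds = max(-50.0, min(50.0, log_odds))
        return 1 / (1 + math.exp(-log_odds))


classifier = TinyBayes()
classifier.train("spam", "cheap pills buy now cheap")
classifier.train("ham", "meeting notes for the project")
print(classifier.spam_probability("buy cheap pills"))
```

The same classifier could be trained on message bodies and applied to page text, or trained on pages directly; which works better is an open question here.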

Secondly, I don't call this any kind of DDoS, even though it might seem such to spammers (is slashdotting a DDoS?). If anyone sends me a mail with an url, chances are they _want_ me to check it out. If my system fetches the pages and stores them to a cache, I'm doing exactly what the sender wants. (Mailing lists may be a problem though.)

Thirdly, does it really hurt you to let spammers know that your address is valid? Chances are the address will receive spam nevertheless..

I'm 1337 by MoeMoe · 2003-08-10 05:31 · Score: 4, Funny

One danger is someone doing a DDoS by sending fake spam

I'm sorry but spoofs don't usually work too well on me... I'm 2 1337 to be fooled.

Seriously though, if you just take a little more time to look into the header contents of that "penis enlargement" ad, you might find a pretty new IP addy to "play with" *cough* BO2K *cough* or at least the real route that this spam took to get to you, just follow the yellow brick road back up to Mr. 12 extra inches and... well, you decide your own punishment for 'em ;)

Besides, it's not like you need that ad... do you?

--
Business \Busi"ness\, n.;
A scam in which all people involved perceive as beneficial...

Re:Do they really care? by Anonymous Coward · 2003-08-10 05:40 · Score: 5, Informative

You can have a domain/subdomain with no A records or MX records and they will keep trying. You can also have nothing but blackhole MXs - hosts that don't exist, but are on routable networks. I've had a domain since 1994, and it was in one of the above states for about 2-3 years.

Last month I put a real MX record in there and pointed it at a box that's running a mail server. Sure enough, the spam flows continuously. It's not just the "make up random shit and put @aol.com" idiots either - the big outfits with permanent networks and domains are mailing it too.

I've taught my mail server to quarantine any host that attempts to mail my long-dead domain, so having it go to a routable address is actually useful again. Every attempt they make ruins another open proxy or relay for every other spammer that may find it later.

You might consider using those "never valid/previous owner" accounts as spam traps. Anything coming to them now is obviously worthless, so why not make them suffer for trying?
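
A small sketch of the spam-trap logic described above, independent of any particular mail server. The class name, the trap domain and the addresses are all hypothetical; how the decision gets wired into an actual MTA is not shown.

```python
class TrapQuarantine:
    """Quarantine any sending host that tries to deliver to a trap domain or address."""

    def __init__(self, trap_domains, trap_addresses=()):
        self.trap_domains = {d.lower() for d in trap_domains}
        self.trap_addresses = {a.lower() for a in trap_addresses}
        self.quarantined = set()

    def is_trap(self, recipient):
        recipient = recipient.lower()
        domain = recipient.rpartition("@")[2]
        return recipient in self.trap_addresses or domain in self.trap_domains

    def accept(self, client_ip, recipient):
        """Return True if mail from client_ip to recipient should be accepted."""
        if self.is_trap(recipient):
            # One attempt at a dead address burns the sending host for good.
            self.quarantined.add(client_ip)
        return client_ip not in self.quarantined


# dead.example stands in for the long-dead domain; the IPs are documentation addresses.
quarantine = TrapQuarantine(["dead.example"])
print(quarantine.accept("192.0.2.1", "anyone@dead.example"))
```

Once a host has touched the trap, later mail from it to real addresses is refused as well, which is what makes each open proxy or relay useless after a single attempt.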

Thoughts on active countermeasures and relays... by atcroft · 2003-08-10 05:41 · Score: 5, Insightful

Just finished reading the section of the article that was headed as "Filters that fight back." I think that the biggest issues that keep such an approach from working are fundamental features of the e-mail infrastructure itself: 1) the lack of verification, and 2) the store-and-forward and replicative nature of email itself.

In other systems I am aware of in which active countermeasures may appear (such as firewalls, and tcpwrappers), the adversary can be established with reasonable certainty in most cases; however, because the From and Reply-To addresses can be (and often are) forged and most owners of relaying machines are unaware they are misconfigured, it seems doubtful countermeasures would work at that step. If one uses the URLs, as suggested in the article, it is not guaranteed that the "million" emails sent out will hit the next server along their path at a particular time, so it seems doubtful you can guarantee a massive traffic burst at once. Indeed, what may be seen instead is incremental bursts of traffic at the delivery retry intervals of various mailserver software.

Other questions also arise, such as: 1) how much additional load will a mailserver experience from hitting the links; 2) what additional security issues are introduced in doing so (what if, for instance, the code to do this results in a security vulnerability); 3) how can it be done in such a way that DDOS attacks against innocent victims can be avoided; and 4) how can you get enough people to both upgrade their systems and cooperate in a useful way to do this. Issues 1 and 2 are probably obvious questions to ask; issues 3 and 4, however, I believe suffer from the same weaknesses as some of the current BL schemes. Also, some localities have legal codes which prohibit the interruption of legitimate access to a system, and the server in this case definitely has a way to track back to you at that point, which potentially makes participants vulnerable to legal or civil actions.

While I admire Mr. Graham and his efforts in the spam-wars, and find it an intriguing idea, I do not think this approach will truly be successful until changes are made to the underpinning email system that may reduce some of the issues mentioned, but hopefully will themselves make an impact on the issue without being too onerous to prevent wide-spread adoption.

The people who PAY spammers would not by The+Monster · 2003-08-10 05:45 · Score: 5, Interesting

In the situation where the spammer gets paid by hit, the spammer would be rich overnight. But, then the customer might see something a little fishy, then start asking questions.

So you're saying that the long-term effect would be to destroy the spammers' business model?

Looking for a downside to this plan . . . still looking . . . Nope. I can't see one.

--

[100% ISO 646 Compliant]
SVM, ERGO MONSTRO.

DDoS with IFRAMEs by The+Famous+Brett+Wat · 2003-08-10 05:50 · Score: 4, Informative

The problems with spam-based DDoS are bad enough already. Many HTML mail readers honour IFRAME tags, so if you want to DDoS someone, then just combine a Joe Job (fake their identity, advertise their site) with an HTML mail that contains N IFRAMEs, each set to be one pixel high and refer to a large page on the victim's site. Anyone who reads the spam in an incautious HTML-capable mail client (of which there are still way too many) will subsequently attempt to fetch the specified page N times, unless you're lucky with intermediate caching proxies or the user hitting the stop button.

Such an attack on Nutters.org forced me to stop doing my own hosting on a DSL line, since it got utterly swamped and cost way too much in bandwidth. Amusingly, it has forced me into using a much cheaper and higher bandwidth service -- one where such attacks are no longer my problem. The rules of the game have changed for me, though: I no longer consider it viable to host a website on a low-bandwidth leaf node like a single DSL, even where normal usage would make it seem acceptable, since it makes you a sitting duck for this kind of attack. I still can't imagine why anyone would want to target Nutters.org; being small and unworthy of attack doesn't seem to be a good defense anymore.

--
proof, n. A demonstration that a conclusion is implied by certain premises and axioms.

Paul's good at this stuff, but this is no good... by wavecoder · 2003-08-10 05:52 · Score: 5, Insightful

The way I see it, these are the beefs people have:

Multiplies bandwidth exponentially, automatically. Big corporations, especially, would be hacked off by this, and it has the added downside of slowing whole sections of the net (imagine what happens when a college dorm gets hit and 800 little bots go check out the site 57 times...).
Accidental DDoS on good sites - yes, Victoria, spam can be spoofed VERY convincingly.
Accidental DDoS on good sites (2) - if you've ever maintained a mailing list of more than 20 people, you know that, eventually, some idiot complains he/she got spammed, even if they double-opted in. I've been accused of spamming when I was quoted 2/3 of the way into someone else's (double opt-in) message! I know great sites that are blacklisted, out of human stupidity, alone.
Accidental DDoS on good hosts - imagine the impact on any shared host, or even some virtual hosts, when one bad client mails 5 million spams - before they could react, they could be taken offline!
Bad programmers (gasp!) - yes, those exist, and some of these filters could really go haywire and start thrashing all sorts of sites.
Lawyers - IANAL, but I shudder to think what happens the first time Microsoft or Big Blue sues some programmer, because an abused copy of their software took them down for an hour! (What is the M$ site worth, per hour? Too much, for sure.) Granted, the suit should go the other way, but that's another topic.
Abuse of ISPs - you'd be amazed how many ISPs will pull the plug on paying accounts for even innocent behavior (like sending 1,000 messages on a DSL account in under an hour, even if it's a business and all the messages are unique). This could get a lot of folks kicked offline.

There are probably others... My thought is this - build a really good, Bayesian, SBPH filter like CRM114, and incorporate a "grab questionable sites" option for the "spams of the future," then filter that page as though it were spam. That'll get us all up into the 99.9% range (the noise), and spammers will eventually either (a) go out of business, or (b) only be able to get their messages to the few people that think they're worthwhile, anyway.

My $.02.

-Ed

--

Web Design & Software Development

Confirmed opt-in mailing lists. by SSpade · 2003-08-10 05:56 · Score: 4, Insightful

Has anyone considered what this will really do? It'll have next to no impact on spammers.

However, lots and lots of legitimate opt-in mailing lists are following best practices by requiring a closed-loop opt-in with a magic cookie to prevent forged signups.

How do they work? Well, usually you follow a URL containing a magic cookie in a challenge email to confirm you want to sign up for the mailing list. Oops.
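
To make the "Oops" concrete, here is a sketch of one common way such a confirmation link can be built and checked. The HMAC construction, the secret, the host name and the function names are assumptions for illustration, not a description of any particular list manager.

```python
import hashlib
import hmac
from urllib.parse import parse_qs, urlencode, urlsplit

SECRET = b"list-server-secret"  # hypothetical key known only to the list server


def confirmation_token(address):
    """Derive the magic cookie for a signup request from the address."""
    digest = hmac.new(SECRET, address.lower().encode(), hashlib.sha256)
    return digest.hexdigest()[:20]


def confirmation_url(address):
    # lists.example and the path are placeholders.
    query = urlencode({"addr": address, "token": confirmation_token(address)})
    return "http://lists.example/confirm?" + query


def handle_get(url):
    """What the list server does on any GET of the link: subscribe if the token matches."""
    params = parse_qs(urlsplit(url).query)
    address, token = params["addr"][0], params["token"][0]
    return hmac.compare_digest(confirmation_token(address), token)


# Someone forges a signup for victim@example.org; the challenge mail arrives,
# and a link-following filter fetches the URL on the victim's behalf.
challenge = confirmation_url("victim@example.org")
print(handle_get(challenge))
```

The server cannot tell a filter's fetch from a deliberate click, so the forged signup ends up confirmed and the closed loop no longer proves anything.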

(For added brokenness, combine this with the other flawed anti-spam fad-du-jour, challenge/response).

Sorry, bad idea by mikeswi · 2003-08-10 06:20 · Score: 5, Insightful

When my newsletter (confirmed Opt-in for the NANAE people who may be reading) goes out every Tuesday and 8,000 people open it, how am I supposed to deal with these filters DDoSing my site? For that matter, how do I deal with these filters attacking my site when some other newsletter links to it? What do I do when I piss off Ronnie Scelson and he links to every individual page on my site and spams 100,000,000 people with them?

Links are more likely to be found in legitimate email than in spam. We're going to whitelist every single existing domain on Earth, and then remove the bad ones? Do you have any idea how large that list would be and how long it would take to download it to compare with the domains found linked in an email?

Let's say this idea becomes used widely. It will be used as a weapon by the spammers themselves.

1.) Pay-per-click links sent in mass mailings. Spammer gets paid for every link clicked. I'm sure some of the advertisers will get wise, but there will be plenty who just sign the checks without looking deeper.

2.) Ronnie Scelson or Alan Ralsky get pissed at someone who owns a web site (SPEWS perhaps), and send the address to several hundred million people.

For the ISP sysadmins reading, you think it's bad when 20,000 spams land on your mail server? How are you going to like it when each of those 20,000 spams produce 3 or 4 (or 30 or 40) HTTP requests?
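
The arithmetic behind that question, as a throwaway sketch. It assumes every link in every spam is fetched exactly once and ignores retries, redirects and embedded images; the function name is made up.

```python
def http_requests(spam_count, links_per_spam):
    """HTTP requests generated if every link in every spam is fetched once."""
    return spam_count * links_per_spam


for links in (3, 4, 30, 40):
    print(f"{links} links per spam: {http_requests(20_000, links):,} requests")
```

That is 60,000 to 80,000 outbound requests at the low end and 600,000 to 800,000 at the high end, on top of handling the 20,000 messages themselves.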

Sorry, bad idea. I can't see how the idea of "attack filters" does anything but discredit the whole idea, especially after thousands of perfectly innocent web sites are knocked offline by the sort of malicious software being advocated, or when spammers inevitably abuse it.

--
Only on /.

This is spectacularly stupid. by edunbar93 · 2003-08-10 06:22 · Score: 4, Insightful

Any program that does something this dangerous automatically, even to people that deserve it, is a BAD idea.

This is the sort of thing that needs human supervision because bugs, user input, and solar flares may cause the program to act differently than you think it should. Any sysadmin who's made programs that would affect thousands of users automatically knows this. There will be a percentage - no matter how small - that the program will affect negatively, and that tiny percentage will be very, very pissed off.

You should be exceptionally careful about where you point your Massive Hose of Death because after all, to err is human, but to really fuck things up requires a recursive algorithm working at 2 billion cycles per second.

It's also occurred to me that you'd be hurting yourself just as badly bandwidth-wise anyway. We all complain about how much of our mail is spam, and how much bandwidth it wastes, but to DDOS them would waste hundreds of times more, not only for you but every provider that carries the traffic.

--
"No problem. I have the capacity to do infinite work so long as you don't mind that my quality approaches zero."-Dilbert

New Spamming Technique : Trickle Spam. by androse · 2003-08-10 06:34 · Score: 4, Informative

I'm all for the idea, and as a matter of fact, I suggested it a couple of months ago.

If individual spam victims start repetitively downloading the spammers website, this could bring the spammer to change the way he sends spam from the current big bang technique to a small continuous trickle technique. The spammer would send a single spam over several weeks, instead of a few hours. He would parallelize the process.

I see two possible counter-attacks to this :

content-based blacklisting (like Vipul's Razor, etc), i.e. a central database of links that are currently being used in spam.
high aggressivity from the victims : if everyone loads the URI 50, 100, or 300 times, then the "trickle method" would probably fail. You should of course change the HTTP User Agent string for each request, and randomize the timing to stop any filtering on the web server.

Feel the rage !

SETI@HOME ? by axxackall · 2003-08-10 07:32 · Score: 5, Interesting

I think that some sort of SETI approach can be used:

your filter recognizes the spam and gets URLs from it;
all such URLs are gathered in the central authority and statistically verified (how many filters have claimed the same site);
only the most often claimed sites are left in the list, while more rarely claimed sites are considered as claimed by mistake or by the anti-filter attack;
people willing to help to fight spam download the screensaver aka SETI@HOME, working at your CPU and net idle time;
the screensaver downloads the fresh list of sites to be fought back along with a centrally generated schedule;
the filter actually attacks back at the scheduled time points (if it's still the idle time for client PC), not massively from the individual PC (so it doesn't look suspicious for the individual client *AND* it doesn't create any peak bandwidth problem for the attacker);
the spammer's web site is /.ed;
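
The "statistically verified" step could be as simple as counting distinct reporters per host, as in this sketch. The threshold, the reporter ids and the function name are assumptions; a real central authority would also need some way to stop one party from posing as many reporters.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def widely_claimed_hosts(claims, min_reporters):
    """Keep hosts claimed by at least min_reporters distinct reporters.

    claims is an iterable of (reporter_id, url) pairs sent in by filters.
    """
    reporters = defaultdict(set)
    for reporter_id, url in claims:
        host = urlsplit(url).hostname
        if host:
            # A set means repeated claims from one reporter count only once.
            reporters[host].add(reporter_id)
    return sorted(h for h, who in reporters.items() if len(who) >= min_reporters)


claims = [
    ("filter-a", "http://spam.example/offer"),
    ("filter-b", "http://spam.example/offer?id=2"),
    ("filter-c", "http://spam.example/"),
    ("filter-a", "http://innocent.example/"),
    ("filter-a", "http://innocent.example/page2"),
]
print(widely_claimed_hosts(claims, min_reporters=3))
```

Here only spam.example survives, because innocent.example was claimed many times but by a single filter.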

All problems I see resolvable:

a schedule must be smart to avoid a local bandwidth problem, but still flood the spammer, but with many such screensavers even a smooth attack will be not very smooth when it's multiplied to millions;
a central authority can be a subject for a counter-attack as well (will it start cyber-wars?), but if the central authority is really decentralized (p2p, SETI, other techs) then it should not be a problem;
spammers may use some sort of logging, but what can they do with it?
to guard against someone organizing fake claims in order to /. an innocent site, statistics should help - only really massively claimed sites will be counted;

The main idea of the spam is to send email massively at a very low cost. So if the attack will be also very massive, it will increase their cost of operation and at least some of them will go out of business.

Any attempts of spammers to go through filters will not work, as you can manually submit the spam claim to (what is its name? NOSPAM@HOME?) the central authority. If the amount of such claims will be big enough, then the claimed sites will be included.

--

Less is more !

Re:And now by Zeinfeld · 2003-08-10 07:59 · Score: 4, Insightful

And now thanks to links posted to Slashdot, Paul Graham is being DDoS'd =)

Which illustrates the problems that you get when people who have little or no security experience try to do security.

The problem with hackback schemes of all types is that they always end up having unexpected effects. The basic problem is that when people design a hackback scheme they never consider what happens when someone sets out to abuse it. They assume that the only change to the environment is their hackback scheme.

A few months ago Paul thought Bayesian filtering was the one true solution. The only problem was that people who have spent years working on the techniques he described never achieved results anywhere close to the ones he claims.

Paul Graham's scheme is not as damaging as some others because the amplifier effect is limited. The message sender only gets five or ten messages created for each spam sent. But even that could make a profitable scheme for someone trying to get their site promoted in a 'most visited list'. If they have pay per view adverts they can rake in quite a few bucks - as much as a cent for every spam sent. Far from discouraging spam this scheme would create a new incentive.

BTW the guy who said 'there is no fake spam' is right depending on the definition you use. If you use the definition 'unwanted email sent indiscriminately' then he is pretty much right. If on the other hand you define spam as 'that which our filters decide is spam'... (I kid you not, folk do try to get that type of definition accepted). The exception would be satires like 'make penis fast'.

There are similar problems with the folks running blacklists, they think that they understand everything there is about spam but don't realize that the systems they set up can be and will be gamed. Every partisan political mailing list of every stripe that has a significant number of readers gets blacklisted from time to time as people sign up for the list in order to be able to report it as spamming.

Try to explain to either group that there is a problem and they get majorly defensive. You get accused of wanting to help the spammers, etc. etc. When people start getting defensive like that in response to fair questions you are in big trouble.

The way to deal with spam is to treat it as a security problem. We deal with security problems using access control - authentication and authorization. We need to start with robust authentication mechanisms that hold ISPs responsible for the messages sent from their domain. These need to be accompanied by robust authorization mechanisms that allow recipients to judge whether the sender is honest.

--
Looking for an Information Security student project suggestion?
Try http://dotcrimeManifesto.com/

Re:And now by Zeinfeld · 2003-08-10 10:45 · Score: 4, Insightful

>>The message sender only gets five or ten messages created for each spam sent.
Go back and read the article. It's about http requests, not sending mail.

Oh, I totally get the fact you are sending out http requests. The fact the message is HTTP rather than SMTP is not relevant as far as I am concerned. The original HTTP spec used the term messages for requests and responses. I really can't remember what we did in the RFC.

The amplifier effect is just the same, for each message in there could be five messages out. The main advantage to the spammer though is laundering the IP address so that their web site hits appear to come from 10,000 distinct views rather than the same view.

I don't know where you get this idea. I know plenty of filter hackers who get results so much better than me that I'm kind of embarrassed.

Getting that sort of result on their own mail is one thing, getting that result on a representative corpus of user emails is a very different matter.

Geek mail is much easier to spam filter than naive user's mail. They tend to be far more aggressive in the features they use. They are also the targets of the spammers, geeks being a minority. So the vocabulary chosen by spammers tends to be much closer.

My real concern is not whether a filter is 99.8 or 95% efficient at detecting spam, it's the false positive rate that is the problem. 1% false positives is a big problem, even 0.5% is a serious problem. The other big problem is the sheer cost of CPU cycles. Imagine a room the size of a football field filled with 100 equipment racks. Processing the legitimate mail only requires one of those racks, the rest are for dealing with spam. Each processing step adds cost. Bayesian filtering is only one part of the solution.
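
To put the false positive figures in perspective, a quick sketch. The 50 legitimate messages a day is an assumed volume, not a number from this thread, and the function name is made up.

```python
def misfiled_legitimate_mail(legit_per_day, false_positive_rate, days=365):
    """Expected number of legitimate messages wrongly filed as spam over `days`."""
    return legit_per_day * false_positive_rate * days


# 50 legitimate messages a day is an assumption for illustration.
for rate in (0.01, 0.005):
    lost = misfiled_legitimate_mail(50, rate)
    print(f"{rate:.1%} false positives: roughly {lost:.0f} lost per year")
```

At that volume even the 0.5% case means around 90 real messages a year landing in the spam folder, for a single user.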

I agree about going after the spammers, but litigation and law enforcement are far more likely to be effective than hackback.

What we need to do in addition is to change the mail protocols so that we can know that a message that purports to come from a particular source is authentic. At least 50% of the spam sent claims a false sender address. The tricks that spam senders use to hide from litigation are a very robust spamdicator that almost never gives a false positive.

--
Looking for an Information Security student project suggestion?
Try http://dotcrimeManifesto.com/
