Paul Graham: Filters that Fight Back
Mortimer.CA writes "Paul Graham is back with another article about combating spam. It's entitled Filters that Fight Back: 'One intriguing idea is to literally fight back: to make filters disable spammers' servers by automatically following all the links in each incoming email. We may be driven to this in order to achieve accurate filtering anyway. Why wait?' One danger is someone doing a DDoS by sending fake spam."
The interesting thing is how the courts would end up viewing auto-clicks vs manual clicks. I'd bet that if a user set up a filter then it would be effectively view as the user doing the clicking...
The next site to slashdot will be ready soon, but subscribers can beat the rush and start slashdotting it early!
Why would it be illegal? The spammer put the links in the e-mail, obviously intending people to follow them (especially if they make reference to something being available at the linked site in the rest of the text). If far too many people follow the links and the site is brought down, how is that any more unlawful than Slashdot linking to a site in a story and the sudden burst of traffic bringing that site down?
I think the idea's dangerous for another reason, though. As noted, a spammer could easily include links to sites he doesn't like and let the traffic spike take them down.
Then again, he's probably still insanely rich from the ViaWeb (a.k.a Yahoo! Store) deal, and doesn't really have to care about lost business advantage much. Becoming a millionaire to be able to concentrate on hacking seems to be a good career plan :-)
Programming can be fun again. Film at 11.
Just finished reading the section of the article that was headed as "Filters that fight back." I think that the biggest issues that keep such an approach from working are fundamental features of the e-mail infrastructure itself: 1) the lack of verification, and 2) the store-and-forward and replicative nature of email itself.
In other systems I am aware of in which active countermeasures may appear (such as firewalls, and tcpwrappers), the adversary can be established with reasonable certainty in most cases; however, because the From and Reply-To addresses can be (and often are) forged and most owners of relaying machines are unaware they are misconfigured, it seems doubtful countermeasures would work at that step. If one uses the URLs, as suggested in the article, it is not guaranteed that the "million" emails sent out will hit the next server along their path at a particular time, so it seems doubtful you can guarantee a massive traffic burst at once. Indeed, what may be seen instead is incremental bursts of traffic at the delivery retry intervals of various mailserver software.
Other questions also arise, such as: 1) how much additional load will a mailserver experience from hitting the links; 2) what additional security issues are introduced in doing so (what if, for instance, the code to do this results in a security vulnerability); 3) how can it be done in such a way that DDOS attacks against innocent victims can be avoided; and 4) how can you get enough people to both upgrade their systems and cooperate in a useful way to do this. Issues 1 and 2 are probably obvious questions to ask-issues 3 and 4, however, I believe suffer from the same weaknesses as some of the current BL schemes. Also, some localities have legal codes which prohibit the interruption of legitimate access to a system, and the server in this case definitely has a way to track back to you at that point, which potentially make participants vulnerable to legal or civil actions.
While I admire Mr. Graham and his efforts in the spam-wars, and find it an intriguing idea, I do not think this approach will truly be successful until changes are made to the underpinning email system that may reduce some of the issues mentioned, but hopefully will themselves make an impact on the issue without being too onerous to prevent wide-spread adoption.
- Multiplies bandwidth exponentially, automatically. Big corporations, especially, would be hacked off by this, and it has the added downside of slowing whole sections of the net (imagine what happens when a college dorm gets hit and 800 little bots go check out the site 57 times...).
- Accidental DDoS on good sites - yes, Victoria, spam can be spoofed VERY convincingly.
- Accidental DDoS on good sites (2) - if you've ever maintained a mailing list of more than 20 people, you know that, eventually, some idiot complains he/she got spammed, even if they double-opted in. I've been accused of spamming when I was quoted 2/3 of the way into someone else's (double opt-in) message! I know great sites that are blacklisted, out of human stupidity, alone.
- Accidental DDoS on good hosts - imagine the impact on any shared host, or even some virtual hosts, when one bad client mails 5 million spams - before they could react, they could be taken offline!
- Bad programmers (gasp!) - yes, those exist, and some of these filters could really go haywire and start thrashing all sorts of sites.
- Lawyers - IANAL, but I shudder to think what happens the first time Microsoft or Big Blue sues some programmer, because an abused copy of their software took them down for an hour! (What is the M$ site worth, per hour? Too much, for sure.) Granted, the suit should go the other way, but that's another topic.
- Abuse of ISPs - you'd be amazed how many ISPs will pull the plug on paying accounts for even innocent behavior (like sending 1,000 messages on a DSL account in under an hour, even if it's a business and all the messages are unique). This could get a lot of folks kicked offline.
There are probably others... My thought is this - build a really good, Bayesian, SBPH filter like CRM114, and incorporate a "grab questionable sites" option for the "spams of the future," then filter that page as though it were spam. That'll get us all up into the 99.9% range (the noise), and spammers will eventually either (a) go out of business, or (b) only be able to get their messages to the few people that think they're worthwhile, anyway.My $.02.
-Ed
Web Design & Software Development
Has anyone considered what this will really do? It'll have next to no impact on spammers.
However, lots and lots of legitimate opt-in mailing lists are following best practices by requiring a closed-loop opt-in with a magic cookie to prevent forged signups.
How do they work? Well, usually you follow a URL containing a magic cookie in a challenge email to confirm you want to sign up for the mailing list. Oops.
(For added brokenness, combine this with the other flawed anti-spam fad-du-jour, challenge/response).
When my newsletter (confirmed Opt-in for the NANAE people who may be reading) goes out every Tuesday and 8,000 people open it, how am I supposed to deal with these filters DDoSing my site? For that matter, how do I deal with these filters attacking my site when some other newsletter links to it? What do I do when I piss off Ronnie Scelson and he links to every individual page on my site and spams 100,000,000 people with them?
Links are more likely to be found in legitimate email than in spam. We're going to whitelist every single existing domain on Earth, and then remove the bad ones? Do you have any idea how large that list would be and how long it would take to download it to compare with the domains found linked in an email?
Let's say this idea becomes used widely. It will be used as a weapon by the spammers themselves.
1.) Pay-per-click links sent in mass mailings. Spammer gets paid for every link clicked. I'm sure some of the advertisers will get wise, but there will be plenty who just sign the checks without looking deeper.
2.) Ronnie Scelson or Alan Ralsky get pissed at someone who owns a web site (SPEWS perhaps), and send the address to several hundred million people.
For the ISP sysadmins reading, you think it's bad when 20,000 spams land on your mail server? How are you going to like it when each of those 20,000 spams produce 3 or 4 (or 30 or 40) HTTP requests?
Sorry, bad idea. I can't see how the idea of "attack filters" does anything but discredit the whole idea, especially after thousands of perfectly innocent web sites are knocked offline by the sort of malicious software being advocating, or when spammers inevitably abuse it.
Only on
Any program that does something this dangerous automatically, even to people that deserve it, is a BAD idea.
This is the sort of thing that needs human supervision because bugs, user input, and solar flares may cause the program to act differently than you think it should. Any sysadmin who's made programs that would affect thousands of users automatically knows this. There will be a percentage - no matter how small - that the program will affect negatively, and that tiny percentage will be very, very pissed off.
You should be exceptionally careful about where you point your Massive Hose of Death because after all, to err is human, but to really fuck things up requires a recursive algorithm working at 2 billion cycles per second.
It's also ocurred to me that you'd be hurting yourself just as bad bandwidth wise anyway. We all complain about how much of our mail is spam, and how much bandwidth it wastes, but to DDOS them would waste hundreds of times more, not only for you but every provider that carries the traffic.
"No problem. I have the capacity to do infinite work so long as you don't mind that my quality approaches zero."-Dilbert
That's not going to work. All you are going to do would be to needlessly DOS www.geocities.com without any particular spammers site being identified. Geocities would have no way to identify which site is the spammer's, and their hourly bandwidth would never get used up, and thus would still be available for those who click on the links.
Also, consider that spammers could move the identifier to the other end of the url. Just have *.spammer.com or www.*.spammer.com resolve to the same site, and start putting the identifiers in the domain. They could even use random dictionary words as the identifiers to make it more difficult to pick out. The only way to combat that would be to have a system that compares the URLs from several spams and figures out which parts of the URLs changed per user.
Which illustrates the problems that you get when people who have little or no security experience try to do security.
The problem with hackback schemes of all types is that they always end up having unexpected effects. The basic problem is that when people design a hackback scheme they never consider what happens when someone sets out to abuse it. They assume that the only change to the environment is their hackback scheme.
A few months ago Paul though Bayesean filtering was the one true solution. The only problem was that people who have spent years working on the techniques he described never achieved results anywhere close to the ones he claims.
Paul Graham's scheme is not as damaging as some others because the amplifier effect is limited. The message sender only gets five or ten messages created for each spam sent. But even that could make a profitable scheme for someone trying to get their site promoted in a 'most visited list'. If they have pay per view adverts they can rake in quite a few bucks - as much as a cent for every spam sent. Far from discouraging spam this scheme would create a new incentive.
BTW the guy who said 'there is no fake spam' is right depending on the definition you use. If you use the definition 'unwanted email sent indiscriminately' then he is pretty much right. If on the other hand you define spam as 'that which our filters decide is spam'... (I kid you not, folk do try to get that type of definition accepted). The exception would be satires like 'make penis fast'.
There are similar problems with the folks running blacklists, they think that they understand everything there is about spam but don't realize that the systems they set up can be and will be gamed. Every partisan political mailing list of every stripe that has a significant number of readers gets blacklisted from time to time as people sign up for the list in order to be able to report it as spamming.
Try to explain to either group that there is a problem and they get majorly defensive. You get accused of wanting to help the spammers, etc. etc. When people start getting defensive like that in response to fair questions you are in big trouble.
The way to deal with spam is to treat it as a security problem. We deal with security problems using access control - authentication and authorization. We need to start with robust authentication mechanisms that hold ISPs responsible for the messages sent from their domain. These need to be accompanied by robust authorization mechanisms that allow recipients to judge whether the sender is honest.
Looking for an Information Security student project suggestion?
Try http://dotcrimeManifesto.com/
Go back and read the article. It's about http requests, not sending mail.
Oh, I totally get the fact you are sending out http requests. The fact the message is HTTP rather than SMTP is not relevant as far as I am concerned. The original HTTP spec used the term messages for requests and responses. I really can't remember what we did in the RFC.
The amplifier effect is just the same, for each message in there could be five messages out. The main advantage to the spammer though is laundering the IP address so that their web site hits appear to come from 10,000 distinct views rather than the same view.
I don't know where you get this idea. I know plenty of filter hackers who get results so much better than me that I'm kind of embarrassed.
Getting that sort of result on their own mail is one thing, getting that result on a representative corpus of user emails is a very different matter.
Geek mail is much easier to spam filter than naive user's mail. They tend to be far more aggressive in the features they use. They are also the targets of the spammers, geeks being a minority. So the vocabulary chosen by spammers tends to be much closer.
My real concern is not whether a filter is 99.8 or 95% efficient at detecting spam, its the false positive rate that is the problem. 1% false positives is a big problem, even 0.5% is a serious problem. The other big problem is the sheer cost of CPU cycles. Imagine a room the size of a football field filled with 100 equipment racks. Processing the legitimate mail only requires one of those racks, the rest are for dealling with spam. Each processing step adds cost. Bayesian filtering is only one part of the solution.
I agree about going after the spammers, but litigation and law enforcement are far more likely to be effective than hackback.
What we need to do in addition is to change the mail protocols so that we can know that a message that purports to come from a particular source is authentic. At least 50% of the spam sent claims a false sender address. The tricks that spam senders use to hide from litigation are a very robust spamdicator that almost never gives a false positive.
Looking for an Information Security student project suggestion?
Try http://dotcrimeManifesto.com/