Paul Graham: Filters that Fight Back

Following links validates your address by PeekabooCaribou · 2003-08-10 05:12 · Score: 5, Interesting

If I load an image or a link from spam, it's possible that a spammer could be validating my e-mail address for future sale, or perhaps increased spamming since he knows someone is actually reading the message. For example, http://server.foo/image.gif?id=ab0a98df12j3 could be unique to the spam that was sent to me. If any user-agent accesses that URL, the spammer knows that my e-mail is active and I'm reading his junk. I don't know if they actually do this in practice, but I'm loath to load HTML messages because of it.

--
"I'll say it again for the logic-impaired." -- Larry Wall.

Re:Following links validates your address by hankaholic · 2003-08-10 05:16 · Score: 5, Interesting

I've been thinking for a while about maybe having a Slashbox that displays images included in spam in a 1x1 pixel box.

Every load of Slashdot would hit spammers' servers.

--
Somebody get that guy an ambulance!

Re:Following links validates your address by koehn · 2003-08-10 05:27 · Score: 4, Interesting

Actually, the opposite would happen: since all links in all spams get hit, this technique would make putting UIDs into URLs worthless for the purpose of authenticating users.

Spammers would need another mechanism to attempt to authenticate who reads their messages. I like it.

What do you think about downloading IMG tags? It would hurt the server's bandwidth, but it would hurt my mail server's bandwidth, too. Maybe use one of the many open proxies out there instead, kill their bandwidth, maybe close the open proxy... ooh, that's evil! I really like it!

--
If there were a sig here, would you read it?

horrid legal thought by BobTheLawyer · 2003-08-10 05:18 · Score: 4, Interesting

A deliberate denial of service attack is illegal whether the victim is an innocent website or an evil spammer. There is no internet equivalent of lawful self defence.

If a spammed website is brought down by a method such as this, it wouldn't altogether surprise me if they sued the maker of the software responsible. Matters would be complicated if, as they might, they deny responsibility for the original spam e-mail.

(This is the case in the UK, I'd guess the position will be similar in the US but IANAAL (I Am Not An American Lawyer))

On the other hand, the "scan the spamvertised website for its content" sounds a great technical approach.

This is stupid! by MoogMan · 2003-08-10 05:18 · Score: 4, Interesting

Seems a bit retarded to at least double the bandwidth drain from spam. It's bad enough as it is. This is *not* a viable solution, unless the spammers happened to be one hop away...

Needs Critical Mass, but how do you tame it? by globalar · 2003-08-10 05:26 · Score: 3, Interesting

"We should try to ensure that this is only done to suspected spams"

I am not sure that is 100% possible. In light of that reality, this might just punish any server, not necessarily one attached directly to the spammer. For example, if I wanted to shut down a site, couldn't I spam a million inboxes with that site's address?

I could see this solution, when mismanaged, merely creating lots of extra, meaningless traffic as well.

I am all for doing something to inconvenience spam, but it seems that the most effective solutions always come at a direct cost to everyone. For example, I have read about adding a small CPU penalty calculation for every email sent. This new solution isn't quite as distributed - it adds traffic to networks and places loads on servers, but it's still a penalty.

I guess the real challenge is finding a way to penalize the spammers and no one else. Good thoughts, and honestly if my client supported a "punish mode," I think I would be tempted to use it with the same careless sense I apply to delete.

Filter web-pages through bayesian filters by flux · 2003-08-10 05:28 · Score: 5, Interesting

How about using the bayesian algorithms we have today and applying them to the referred web pages? I'm sure they would have plenty of good material for the filters to detect. Plus this would probably be more effective with spam that effectively is only a URL.

Secondly, I don't call this any kind of DDoS, even though it might seem such to spammers (is slashdotting a DDoS?). If anyone sends me a mail with an url, chances are they _want_ me to check it out. If my system fetches the pages and stores them to a cache, I'm doing exactly what the sender wants. (Mailing lists may be a problem though.)

Thirdly, does it really hurt you to let spammers know that your address is valid? Chances are the address will receive spam regardless.

another approach by mwilliamson · 2003-08-10 05:29 · Score: 3, Interesting

I think this approach would be rather simple to implement:

1. Copyright my gnupg/pgp public key and write a EULA outlining its use. Here is where I'd explicitly disallow unsolicited advertisement.
2. Have procmail or some other filter direct all non-pgp mail to /dev/null
3. If someone successfully sends me encrypted email in violation of the EULA of my gnupg/pgp key, pursue legal action against them.
4. Enjoy my spamless mailspool

There are other fringe benefits...the overhead encrypting to a large number of keys would certainly slow a spammer's throughput down. Also, this would encourage the use of widespread secure email.

The people who PAY spammers would not by The+Monster · 2003-08-10 05:45 · Score: 5, Interesting

In the situation where the spammer gets paid by hit, the spammer would be rich overnight. But, then the customer might see something a little fishy, then start asking questions.

So you're saying that the long-term effect would be to destroy the spammers' business model?

Looking for a downside to this plan . . . still looking . . . Nope. I can't see one.

--

[100% ISO 646 Compliant]
SVM, ERGO MONSTRO.

Interesting side-effect by leetrum · 2003-08-10 05:46 · Score: 3, Interesting

An interesting side effect of this strategy would be that it would be harder to track commissions based on per-click (instead of per-sale) for the sites employing spammers, thus limiting their income to people who buy (which can generally be a better commission anyway, but not offered by all these seedy companies).

Another idea by skinfitz · 2003-08-10 05:59 · Score: 2, Interesting

Why not just have the filter reply to the sending address with its own randomly generated addy and auto drop those messages that use fake addresses that bounce? This could be done within seconds in most cases. The only issues here would be storage of the spam and how long you wait. It could be done by "keeping the spammer on the line" during the SMTP transfer, also causing the transmission of spam to be delayed.
Could it work?

Re:Dangerous from a legal perspective by hardaker · 2003-08-10 06:00 · Score: 2, Interesting

Yeah, but it's how slowly the law changes that should scare you.

Plus you know the law would be written like "A computer user must manually and actively activate a link for a legal binding to have an effect; All computers must enforce digital rights management"

which not only allows for click-through-licensing but ties on a second hidden agenda (pick your topic). Everyone will think the first sentence would do what they wanted and not care about the rest. Hmm... sounds like I'm kind of bitter about the current state of the legal system.

--
The next site to slashdot will be ready soon, but subscribers can beat the rush and start slashdotting it early!

collateral damage? Not really by swordgeek · 2003-08-10 06:00 · Score: 2, Interesting

I've seen a few posts about the possibility of collateral damage--deliberately targeting someone else's server as the target of an auto-DDOS. Someone also mentioned hijacking a server, and then bringing it down.

The thing is, it's no easier to do it with this proposed system than anything that's currently available. In this case you have to download (buy?!) a copy of spamming software, get a list, and then run a DDOS that's actually traceable back to you. Good plan? Not by my thinking.

Now the nice thing about this is that it will end up costing an inordinate amount of money for the spammer, take down their servers, and really piss off their ISP. (Watch the pink contracts disappear!) This is a fairly drastic measure that might actually get rid of many spammers for good.

Basically, it's either this or a crowbar to the head.

--

"People who do stupid things with hazardous materials often die." -- Jim Davidson on alt.folklore.urban

Re:response to the lister's comment by Anonymous Coward · 2003-08-10 06:14 · Score: 1, Interesting

Exactly. The beauty of this idea is that it's not really a coordinated attack. It's just a reasonable response to an e-mail. If they send the mail, you can certainly follow each link once. If you have a "central authority", then you have a conspiracy to attack these spammers. You're giving them someone to sue, or worse.

Avoid URL validation - lie to them by Tool+Man · 2003-08-10 07:16 · Score: 2, Interesting

I like the idea of whacking the spammers' bandwidth, but I'm not really keen on validating the email address the bastards have reached.

So, why not follow the links, but change the parameter values? It's all something which we'd do programmatically anyway, so subtle variations in the value portion would still incur the expense of processing the input, even if it fails. Keep the path component of the URL, and the parameter names used, so it gets as far as possible before blowing chunks.
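A minimal sketch of what that could look like, assuming Python's standard `urllib.parse` and a hypothetical helper name `scramble_query_values` (nothing here comes from the article or from any existing filter). It keeps the scheme, host, path, and parameter names, and swaps each value for random characters of the same length, so a tracking id no longer matches anything the spammer issued:

```python
import secrets
import string
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Characters used for the replacement values (an arbitrary choice).
ALPHABET = string.ascii_lowercase + string.digits


def scramble_query_values(url: str) -> str:
    """Return url with the same path and parameter names but random values."""
    parts = urlsplit(url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    scrambled = [
        # Same length as the original value; blank values get one character.
        (name, "".join(secrets.choice(ALPHABET) for _ in range(max(len(value), 1))))
        for name, value in pairs
    ]
    return urlunsplit(parts._replace(query=urlencode(scrambled)))


if __name__ == "__main__":
    print(scramble_query_values("http://server.foo/image.gif?id=ab0a98df12j3"))
```

Ids hidden in the path itself (for example /img/ab0a98df12j3.gif) would pass through untouched, so this only covers the query-string case described above.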

Re:response to the lister's comment by Anonymous Coward · 2003-08-10 07:20 · Score: 1, Interesting

This idea of "attacking" spammers has always intrigued me, but I've run across many innocent people whose email address was in the "replyto" field of the spam getting hammered by bounced emails. This is commonly referred to as a "joe job".

I'm only at liberty to say that a certain famous hacker is soon to release an awesome spammer tool that can certainly jam the spam back in the face of the spammer.

It's common knowledge that the first "received" line in a message is the REAL IP address the mail traveled through before it hit your mailbox.

In most cases, this is the SMTP server's IP address, so therefore it is possible to establish a connection to this server (using socket level connection protocol).

Although I'm told this feature won't be in the release version, pre-alpha testing has revealed some really cool things it's capable of.

First, it tries to connect through port 25 (SMTP), and sends a pre-composed message to an assumed account of "postmaster", then "root", then "hostmaster", then "abuse". Each of these suspected users is then checked to see if a mailbox exists for it. If so, a pretty nasty "cease and desist" letter would be sent to this box.

On some occasions, the spammer was actually stupid enough to reply to this message, and was rather freaked out we tracked them down. It gave me a wonderful opportunity to feed his stupid head with all sorts of bullshit about how we can track them down and to spread the word to all his other spammer friends about the existence of this tool (strike fear into the minds of these bozo spammers). This particular individual was just some bozo that responded to a spam about an amazing "work at home" offer, paid his $39 and got his spam kit.

In about 10 - 15% of the cases, this is the spammer's spam proxy server, most of which are in China or Brazil. They are easy to identify as most don't have a reverse DNS, or it is determined to be bogus.

Here is where it gets really interesting... This IP is then scanned for vulnerabilities, and in most cases, one is found. It then installs a nasty bit of code that halts the machine's ability to send spam. Of course this is highly illegal, but the way this program is written, anyone could write a simple script to do this.

Another interesting thing I've seen it do is issue a "honeypot" address. With a little bit of "sendmail" scripting, it's possible to allow a user like "fred@mydomain.com" to issue an email address like "fred8765@mydomain.com", from which the server would strip off the numbers.

In earlier experiments, we started to opt out of every spam we got, not caring if our email address might wind up in every "tom dick and harry" spammer's mailing list. First we would opt out using our normal address, then opt out again using the honeypot address.

The results were amazing.... In just 9 hours, we started getting spam into our honeypot addresses. A simple database lookup revealed that when we opted out of the "mortgage" spam, instead of opting us out, it just added us to their mailing list. BUSTED!
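The honeypot bookkeeping described above can be sketched roughly as follows. This is an illustration only, assuming Python, an in-memory dict standing in for the "simple database", and hypothetical helper names (`issue_honeypot`, `strip_tag`); the tool in this comment reportedly used sendmail scripting, which is not reproduced here:

```python
import re
import secrets


def issue_honeypot(address: str, issued: dict, note: str) -> str:
    """Create a tagged address like fred8765@mydomain.com and record why it was issued."""
    local, _, domain = address.rpartition("@")
    while True:
        tagged = f"{local}{secrets.randbelow(10000):04d}@{domain}"
        if tagged not in issued:  # avoid reusing a tag for two different notes
            issued[tagged] = note
            return tagged


def strip_tag(address: str) -> str:
    """Map a tagged address back to the real mailbox by dropping trailing digits."""
    local, _, domain = address.rpartition("@")
    return re.sub(r"\d+$", "", local) + "@" + domain


if __name__ == "__main__":
    issued = {}
    tagged = issue_honeypot("fred@mydomain.com", issued, "mortgage opt-out form")
    # When spam later arrives at `tagged`, the note says who leaked the address.
    print(tagged, "->", strip_tag(tagged), "|", issued[tagged])
```

A real mailbox name that already ends in digits would collide with this scheme, so a separator character would be a safer choice in practice.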

And amazingly enough, they had the audacity to include a "privacy policy" that promised they wouldn't sell or release our Email address..... LIARS!!!

Legal action is pending, and with "deposition before subpoena" we are able to get the ISPs to release all their logs, and we nailed them.

As soon as this amazing tool is released, spamming, the way we know it today, is going to cost the spammers a huge amount of money.

In just 2 weeks of use of this program, an amazing reduction of spam was realized, and in fact it even made a dent in the internet as a whole. Imagine if everyone could use this tool. Hopefully that day will soon arrive. Because of its detailed reporting ability, it is very selective in how it arranges reports, and ISPs are very pleased to be getting such detailed reports, and are much more likely to act on them because they are so rich in useful infor

SETI@HOME ? by axxackall · 2003-08-10 07:32 · Score: 5, Interesting

I think that some sort of SETI approach can be used:

your filter recognizes the spam and gets URLs from it;
all such URLs are gathered in the central authority and statistically verified (how many filters have claimed the same site);
only the most often claimed sites are left in the list, while more rarely claimed sites are considered as claimed by mistake or by the anti-filter attack;
people willing to help to fight spam download the screensaver aka SETI@HOME, working at your CPU and net idle time;
the screensaver downloads the fresh list of sites to be fought back along with a centrally generated schedule;
the filter actually attacks back at the scheduled time points (if it's still idle time for the client PC), not massively from the individual PC (so it doesn't look suspicious for the individual client *AND* it doesn't create any peak bandwidth problem for the attacker);
the spammer's web site is /.ed;

All problems I see resolvable:

a schedule must be smart to avoid a local bandwidth problem, but still flood the spammer; with many such screensavers even a smooth attack will not be very smooth when it's multiplied by millions;
a central authority can be a subject for a counter-attack as well (will it start cyber-wars?), but if the central authority is really decentralized (p2p, SETI, other techs) then it should not be a problem;
spammers may use some sort of logging, but what can they do with it?
to keep someone from organizing fake claims in order to /. an innocent site, statistics should help - only really massively claimed sites will be counted;

The main idea of the spam is to send email massively on a very low cost. So if the attack will be also very massive, it will increase their cost of operation and at least some of them will go out of business.

Any attempts by spammers to get through filters will not work, as you can manually submit the spam claim to (what is its name? NOSPAM@HOME?) the central authority. If the number of such claims is big enough, then the claimed sites will be included.
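The "statistically verified" step in the list above could be as simple as counting how many distinct filters claimed each site and keeping only those over a threshold. A rough sketch, assuming Python; the function name `sites_to_list`, the `(reporter_id, site)` pair format, and the threshold of 3 are all made up for illustration, and a real threshold would need to be far higher:

```python
from collections import defaultdict


def sites_to_list(claims, min_reporters=3):
    """Keep only sites claimed by at least min_reporters distinct filters.

    claims is an iterable of (reporter_id, site) pairs. Repeated claims from
    the same reporter count once, so a single party cannot inflate a site's
    score just by submitting the same claim over and over.
    """
    reporters_by_site = defaultdict(set)
    for reporter_id, site in claims:
        reporters_by_site[site.lower()].add(reporter_id)
    return sorted(
        site
        for site, reporters in reporters_by_site.items()
        if len(reporters) >= min_reporters
    )


if __name__ == "__main__":
    claims = [
        ("filter-a", "pills.example"), ("filter-b", "pills.example"),
        ("filter-c", "PILLS.example"), ("filter-a", "innocent.example"),
        ("filter-a", "innocent.example"), ("filter-a", "innocent.example"),
    ]
    print(sites_to_list(claims))
```

This only blunts fake claims from a few reporters; someone controlling many reporter ids could still push an innocent site over the threshold.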

--

Less is more !

Bad idea, but might be improved by Animats · 2003-08-10 07:37 · Score: 2, Interesting

The good idea there is to filter spam based on what it links to. SpamCop already does some of this, and reports the spamvertised site to its ISP or upstream provider. This is reasonably effective. It also identifies black-hat ISPs that host sites referenced in much spam.

Re:Hear! hear! by hankaholic · 2003-08-10 08:09 · Score: 2, Interesting

A 404 would cause load on their servers, but pulling actual images would rob their bandwidth as well.

--
Somebody get that guy an ambulance!

Fight fire... by adding fire? by quacking+duck · 2003-08-10 08:27 · Score: 3, Interesting

Given that so many people, even corporate execs, are stupid enough to order stuff from spammers, why not use this fact to our advantage?

Send out "white hat" spam, which for all intents and purposes looks like real (ie "black hat") spam. Except clicking on the link takes you to any number of webpages that basically say "are you so f***ing stupid you actually believe pills can make your penis/breasts/whatever larger?"

Adjust content to suit type of spam. Include disgusting images if the type of spam you're emulating is adult-oriented (pr0n, enlargements, etc), something else entirely if you're "selling" mortgages or similarly benign wares (ie no goatse.cx-type images if you're "selling" mortgages).

And to cap it off, if viewers are so enraged at what they see, the page will have a feedback link. The link will either be a known spammer's email so they receive their venting instead of their money, or link to yet another anti-spam site.

Geeks and filters will automatically block this stuff out, so there's no harm done to us, aside from having to filter out even more spam.

But with any luck, if enough of these anti-spam spams get sent out that people start associating spam messages with informative, insulting or disgusting websites, they'll learn to stop clicking on those damn links, stop buying their bullshit products, the spam model becomes unprofitable, and spam is reduced to a saner level or eliminated entirely.

Legal implications? No better and no worse than black hat spammers.

Comments?

Re:Comparison of Bayesian spam filters by asteinberg · 2003-08-10 09:06 · Score: 2, Interesting

I've always wondered how Paul Graham has managed to get so much hype built up about his work. The idea of using Bayesian filters to classify spam had been around about 5 years prior to his "A Plan For Spam" - check out, for example, this paper by Mehran Sahami (a very cool guy who works here at Stanford as well as at Google) from 1998: http://citeseer.nj.nec.com/sahami98bayesian.html (and if you search around on Citeseer you'll undoubtedly find many other papers on spam classifying from even earlier, though not all use Naive Bayes).

Mathematically, Graham's version of Naive Bayes is pretty weak - look at the original A Plan for Spam, he chooses all kinds of random numbers based purely on trial and error, rather than backing them up with mathematical reasoning:

I want to bias the probabilities slightly to avoid false positives, and by trial and error I've found that a good way to do it is to double all the numbers in good. This helps to distinguish between words that occasionally do occur in legitimate email and words that almost never do. I only consider words that occur more than five times in total (actually, because of the doubling, occurring three times in nonspam mail would be enough). And then there is the question of what probability to assign to words that occur in one corpus but not the other. Again by trial and error I chose .01 and .99. There may be room for tuning here, but as the corpus grows such tuning will happen automatically anyway.

That's just one paragraph, stuff like that is all over the paper. There are many more logical ways to bias the classifier away from false-positives, which I'm not sure if it's worth getting into. Having spent the summer implementing many different variations on spam filtering, I can say confidently that Graham's variation is definitely far from the best.
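For readers who have not seen the essay, the rule in the quoted paragraph can be sketched roughly like this. This is an illustrative Python reading of that paragraph, not Graham's own code (his is in Lisp); the function and parameter names are invented, and the ratio formula is recalled from A Plan for Spam, so check it against the essay before relying on it:

```python
from typing import Optional


def word_spam_probability(good_count: int, bad_count: int,
                          ngood: int, nbad: int) -> Optional[float]:
    """Per-word spam probability using the hand-tuned constants quoted above.

    good_count / bad_count: occurrences of the word in nonspam / spam mail.
    ngood / nbad: number of nonspam / spam messages in the corpora.
    Returns None for words seen too rarely to be considered.
    """
    if ngood <= 0 or nbad <= 0:
        raise ValueError("both corpora must be non-empty")
    g = 2 * good_count            # "double all the numbers in good"
    b = bad_count
    if g + b <= 5:                # only words occurring more than five times
        return None
    good_ratio = min(1.0, g / ngood)
    bad_ratio = min(1.0, b / nbad)
    p = bad_ratio / (good_ratio + bad_ratio)
    return max(0.01, min(0.99, p))  # clamp to the .01 and .99 limits


if __name__ == "__main__":
    print(word_spam_probability(good_count=0, bad_count=10, ngood=100, nbad=100))
    print(word_spam_probability(good_count=5, bad_count=10, ngood=100, nbad=100))
```

The doubling, the threshold of five, and the .01/.99 clamp are exactly the trial-and-error constants being criticized here.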

--
The first ever Ultimate Frisbee video game: here (now

Re:And now by Anonymous Coward · 2003-08-10 09:33 · Score: 1, Interesting

A few months ago Paul thought Bayesian filtering was the one true solution. The only problem was that people who have spent years working on the techniques he described never achieved results anywhere close to the ones he claims.

I don't know where you get this idea. I know plenty of filter hackers who get results so much better than me that I'm kind of embarrassed.

I still think Bayesian filtering works. (My current filtering rate is around 99.8%.) But that only stops me from seeing the spams. This is something to attach to it, to cause the spams to stop being sent. But the brain of the whole system is still a Bayesian filter.

The message sender only gets five or ten messages created for each spam sent.

Go back and read the article. It's about http requests, not sending mail.

P2P Analogy by prozac79 · 2003-08-10 10:30 · Score: 2, Interesting

Isn't this what some congressman is trying to get passed for P2P networks? He thinks that it is perfectly acceptable for copyright holders to hack P2P networks and bring down machines that are suspected of having illegally obtained copyrighted material. Now we propose this for spam and suddenly this is a good thing? I know, nobody likes spammers, but that can't be the foundation for allowing people to hack others' systems. If filters were allowed to strike back at spammers, that would give the RIAA and MPAA all the ammo they need to lobby for new laws that allow disabling people's service. As many people have said in other posts, it sets up a very slippery slope that will probably have consequences beyond what we initially envision, not just for email, but for anything that someone does over the internet that is "unwanted".

--
"Oh dear, she's stuck in an infinite loop and he's an idiot" -Prof. Farnsworth (Futurama)

Sounds a lot like an old idea... by jemfinch · 2003-08-10 10:52 · Score: 2, Interesting

Making spammers pay for each spam they send? Sounds a lot like Daniel Bernstein's Internet Mail 2000 recommendation, except that this idea has far more potential for abuse. As much as I like Paul Graham's innovative ideas, this one is definitely both late on the scene and inferior to IM2000.

Jeremy

--
Looking for a Python IRC bot?

RE: Filters that Fight Back by Tacoguy · 2003-08-10 11:27 · Score: 3, Interesting

Spam fighting, it seems to me, has 2 fronts. What to do when you get on the lists and how did you get there to begin with. Having made numerous web sites thru the years it has become clear to me that these spammers are largely harvesting addys thru mail-to links on web pages. A number of techniques can be utilized to prevent such activity. 2 of my favs are the use of ASCII characters in the actual addy and the use of Javascript to mask the addy. Once you are "in their hooks" there seems little you can do so it seems best to me to not get there in the first place. Best Jeff
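One way to read "ASCII characters in the actual addy" is writing every character of the address as an HTML numeric character reference, so the page source never contains the literal address but a browser still renders a working link. A small sketch under that assumption, in Python with hypothetical helper names (`entity_encode`, `obfuscated_mailto`):

```python
import html


def entity_encode(text: str) -> str:
    """Write every character as an HTML numeric character reference."""
    return "".join(f"&#{ord(ch)};" for ch in text)


def obfuscated_mailto(address: str, label: str = "email me") -> str:
    """Build a mail-to link whose source contains no literal address or '@'."""
    href = entity_encode("mailto:" + address)
    return f'<a href="{href}">{html.escape(label)}</a>'


if __name__ == "__main__":
    print(obfuscated_mailto("jeff@example.com"))
```

This only stops harvesters that scan for raw addresses without decoding entities; one that decodes the HTML first will still find it.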

Re:And now by mdinowitz · 2003-08-10 13:04 · Score: 2, Interesting

There's an additional issue here. What of mailing lists which go out to huge numbers of people and include such things as unsubscribe urls in the header or footer? My server is already overloaded with the lists I run as is. Having even 1% of those who get mail from it pinging it for content can run into tens of thousands of extra hits a day for no constructive reason.
In addition, if the advertising view scheme you mention goes into effect, it will drive advertising off the web even further than it is now.
The article is interesting, but....

--
Michael Dinowitz House of Fusion http://www.houseoffusion.com

Re:noooooooo by mikiN · 2003-08-10 13:25 · Score: 2, Interesting

... it would screw up stuff like mailing lists that have URLs to click to confirm you want to be on the list.

Simple problem, simple solution: mailing lists should use something like

Please <a href="mailto:listowner@some.domain?subject=confirm -#confirmationkey">confirm</a> your subscription.

Please don't let the 'clickability factor' of an http URL (1 click) versus a plain old mailto (2 or more clicks to send) get in the way of privacy protection. I suppose that when you have just subscribed to a mailing list you are interested in more than just the confirmation message, so you have some clicks to spare
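If a list manager generated such a link programmatically, the subject would need percent-encoding, since a raw space or "#" inside a mailto URL may be cut off or misread by the mail client. A small sketch assuming Python's standard `urllib.parse.quote`; the function name `confirmation_mailto` and the "confirm KEY" subject layout are illustrative, not taken from any particular list software:

```python
from urllib.parse import quote


def confirmation_mailto(list_owner: str, key: str) -> str:
    """Build a mailto URL whose subject carries the confirmation key."""
    # safe="" forces every reserved character, including '#', to be encoded.
    subject = quote(f"confirm {key}", safe="")
    return f"mailto:{list_owner}?subject={subject}"


if __name__ == "__main__":
    print(confirmation_mailto("listowner@some.domain", "abc#123"))
    # prints mailto:listowner@some.domain?subject=confirm%20abc%23123
```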

-
Never send a machine to do a human's job.

--
The Hacker's Guide To The Kernel: Don't panic()!

Re:And now by mdinowitz · 2003-08-10 13:30 · Score: 2, Interesting

And here's the evil that can come from this. A spam message with a link that says "by pressing this link, you signify that you wish to opt into our mailings". The spam filter automatically visits the link and boom, you've opted into God knows what.
I can think of a TON of things that would be good for. Or bad for, as the case may be.

--
Michael Dinowitz House of Fusion http://www.houseoffusion.com

Slashdot Mirror
