TarProxy Creates Tar Pit... For Spammers
agravaine writes "I ran across TarProxy, which, IMHO, is one of the cleverest spammer-handling ideas I've seen yet. The gist: Early detection of incoming spam [using the statistical techniques pioneered on the client side] could be used to create an artificial scarcity of bandwidth experienced only by spammers." This project hasn't gone very far yet, but essentially it slows SMTP responses to suspected spammers. If it performs as described, and is installed on enough of the net, it could make a real difference. 144 spam so far today. Anything would be an improvement. CT Yup, it's a dupe. There wasn't anything better to post at 9am on a Sunday, so you can just bitch about me instead ;)
In the spirit of repetition...
Easy to defeat, just use spamming software that dynamically increases its connection pool whenever it encounters a 'slow' SMTP recipient. Even if a large part of the net population were running this, the spammer could just spawn thousands of simultaneous (slowed down, yes) connections, and still maximize his bandwidth utilization. If it takes 2 minutes to send each message, it doesn't matter if he's sending 5000 messages at once!
I believe Linux, for example, allows up to 8192 open sockets, and I think this can be changed with a sysctl command, and most definitely could be with a few changes to kernel headers.
Sure, it would take a machine with decent memory, but that's not too hard to find.
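To put rough numbers on that argument: if every connection is slowed equally, throughput depends only on how many connections are open at once. A back-of-the-envelope sketch in Python (the function name is made up for illustration; the 2-minute and 5000-connection figures are the ones from the comment above):

```python
def messages_per_hour(concurrent_connections: int, seconds_per_message: float) -> float:
    """Steady-state throughput when every connection is tarpitted equally.

    Each connection completes one message every `seconds_per_message`,
    so total throughput grows linearly with the size of the pool.
    """
    if seconds_per_message <= 0:
        raise ValueError("seconds_per_message must be positive")
    return concurrent_connections * 3600.0 / seconds_per_message


# One slowed connection versus a pool of 5000 slowed connections.
print(messages_per_hour(1, 120))      # 30.0
print(messages_per_hour(5000, 120))   # 150000.0
```

This only models the sender's side; it says nothing about what the receiving servers do when thousands of connections arrive from one source.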
---
the pen is mightier than the sword, the sword is mightier than the court, the court is mightier than the pen.
And you would not need to roll this out on most of the net. If the large ISPs and webmail providers started doing this it would have a significant impact. Much of the spammer's distribution list consists of a few domains: Yahoo, Hotmail, AOL, etc. If the large providers implemented tarpits it could quickly damage the ready supply of open relays for spammers.
Not exactly the same thing as the article is about, but still related: My mailserver is properly secured and refuses to relay anything except legitimate mail (i.e. it will accept incoming mail for users on the domains it serves and it only relays mail to the outside world when it's from a predefined set of internal machines). There are plenty of spammers trying to convince my mailserver to send their spam to other people, but all get a nice "relaying denied" message and a couple of lines in my maillog.
I think it's a safe bet that all relaying attempts originating from outside my network are spammers. The information in the maillog about denied relaying attempts should give an accurate list of IP numbers used by spammers.
Doesn't this give some interesting opportunities?
Creating spamtrap daemons that listen on servers that aren't mailservers (so the fact that they behave like a real mailserver and listen on the same TCP port is just a coincidence). Those servers should be unlisted, not have any DNS records pointing at them as MX for any domains, etc.
The only way to find them should be by randomly scanning an IP range.
In that case the only people using them would be spammers trying to abuse random mailservers, and it would be pretty safe to have the fake mailserver pretend to accept the mail, wait a while, try to gobble up some of the spammer's resources, and finally dump the spam attempt to /dev/null while telling the spammer what he or she wants to hear: I have delivered your junk. The logs would prove useful, the spam is prevented. Happy happy, joy joy.
The biggest disadvantage would be that such a fake relaying server would probably trigger some of the open-relay scanners (although the clueful scanners would wait until a message is actually received). Hmmm, spammers could do the same, really probing a mailrelay before trying to use it...
Anyway, it would cost spammers more and more effort and probably annoy the hell out of them, which is a Good Thing.
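A minimal sketch of the fake-mailserver idea described above, in Python with asyncio. Everything here is an assumption for illustration: the 10-second stall, the made-up hostname in the banner, port 2525, and the very loose command handling. It accepts whatever it is given, stalls before every reply, logs the envelope commands, and throws the message body away.

```python
import asyncio

DELAY_SECONDS = 10.0  # assumed stall before each reply


async def handle_client(reader, writer, delay=DELAY_SECONDS, log=print):
    """Pretend to be an SMTP server: stall, accept everything, keep nothing."""
    peer = writer.get_extra_info("peername")

    async def reply(line):
        await asyncio.sleep(delay)               # waste the sender's time
        writer.write(line.encode("ascii") + b"\r\n")
        await writer.drain()

    await reply("220 mail.example.invalid ESMTP")
    in_data = False
    while True:
        raw = await reader.readline()
        if not raw:
            break                                # client hung up
        line = raw.decode("ascii", "replace").rstrip("\r\n")
        if in_data:
            # Body lines are read and dropped (the /dev/null part).
            if line == ".":
                in_data = False
                log(f"discarded message from {peer}")
                await reply("250 OK: queued")    # tell them what they want to hear
            continue
        verb = line.split(" ", 1)[0].upper()
        if verb == "DATA":
            in_data = True
            await reply("354 End data with <CR><LF>.<CR><LF>")
        elif verb == "QUIT":
            await reply("221 Bye")
            break
        else:
            log(f"{peer}: {line}")               # keep HELO/MAIL/RCPT for the logs
            await reply("250 OK")
    writer.close()
    try:
        await writer.wait_closed()
    except ConnectionError:
        pass                                     # peer already gone


async def main(host="0.0.0.0", port=2525):
    server = await asyncio.start_server(handle_client, host, port)
    async with server:
        await server.serve_forever()
```

Starting it is a matter of calling `asyncio.run(main())`. As noted above, something like this would probably look like an open relay to scanners that do not check whether a test message actually arrives.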
While this is a very cool idea and works if you run your *own* mail server, it doesn't do any good for those of us who grab our mail from ISPs and use POP (or some other protocol). It means we have to convince our ISP to use this product/concept, which in my case (cable company) is impossible since they are a bunch of twits anyway.
-- DuckWing
Hmm... This calls for some TCP geekhood and some strong math. I am way too hung over for math. Let's just talk about it in broad strokes first.
If I'm tuning this package, I can make these delays REAL big. I mean, email is one of those systems where a false positive resulting in even a... let's say an **8 hour** delay to a legitimate message would still be considered perfectly fine for most purposes. There's fuzzy logic in play here; I'm thinking not all delays will be equal. But what if you were just really harsh on suspected spam? Not such a loss IMO. Of course... I haven't considered that you will have increased reliability problems trying to hold a stream open for 8 hours, but remember, a legitimate mailserver will keep resending, and as we go bayesian on servers, perhaps we will learn to resend for a little longer as well? Or perhaps there is another protocol solution (i.e. letting the sender know they're being delayed for spam... so perhaps giving them the option to reformulate their message and resend?) Let's just press on. The precise amount of the delay may not necessarily be important.
If I'm sending 50 million messages (a modest spammer's run, if I'm well enough informed) and each one holds me on the line for 8 hours that means 400 million hours if run serially. At the 8192 concurrent thread barrier that's still almost 50,000 hours (roughly 5.5 years)... with mathematical convenience, to do this entire run in 8 hours you will require **50 million** concurrent threads? Or should I have just stayed in bed longer?
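For anyone else too hung over for math, the same arithmetic as a few lines of Python. The inputs are the figures from this comment and the 8192 ceiling quoted earlier in the thread; the variable names are just for illustration.

```python
MESSAGES = 50_000_000        # size of the spam run
DELAY_HOURS = 8              # tarpit delay per message
MAX_CONNECTIONS = 8192       # concurrent-connection ceiling quoted earlier
HOURS_PER_YEAR = 24 * 365

serial_hours = MESSAGES * DELAY_HOURS            # one message at a time
pooled_hours = serial_hours / MAX_CONNECTIONS    # 8192 messages in flight at once
pooled_years = pooled_hours / HOURS_PER_YEAR

# To finish in a single 8-hour window, every message must be in flight at once.
connections_for_one_window = serial_hours // DELAY_HOURS

print(serial_hours)                 # 400000000
print(round(pooled_hours))          # 48828
print(round(pooled_years, 1))       # 5.6
print(connections_for_one_window)   # 50000000
```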
Now it's looking like the exact length of the delays, and the exact number of concurrent threads is not actually something worth too much niggling debate. We just have to get familiar with the orders of magnitude we're dealing with.
Consider the protocol-to-data ratio of an SMTP transaction over TCP alone. How much is data and how much is just protocol overhead in a given mail transaction? We can figure this out down to the last bit, but I'm going to just throw out the hypothetical notion that when you have to initiate a new SMTP transaction for every message you send, the bandwidth overhead for doing this millions of times is not inconsiderable.
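A rough way to get a feel for that ratio, as a Python sketch. The dialogue below is an assumed minimal exchange with made-up addresses and reply texts, and it counts only the SMTP lines themselves. TCP/IP headers, the connection handshake, and acknowledgement packets are ignored, so the real overhead per message would be higher than this estimate.

```python
# Client and server lines of one minimal SMTP exchange (all values are made up).
CLIENT_LINES = [
    "HELO sender.example",
    "MAIL FROM:<a@sender.example>",
    "RCPT TO:<b@recipient.example>",
    "DATA",
    ".",
    "QUIT",
]
SERVER_LINES = [
    "220 recipient.example ESMTP",
    "250 OK",          # reply to HELO
    "250 OK",          # reply to MAIL FROM
    "250 OK",          # reply to RCPT TO
    "354 Go ahead",    # reply to DATA
    "250 OK",          # reply to end of data
    "221 Bye",         # reply to QUIT
]


def smtp_overhead_bytes() -> int:
    """Bytes of SMTP chatter per message, counting the CRLF on each line."""
    return sum(len(line) + 2 for line in CLIENT_LINES + SERVER_LINES)


def overhead_fraction(message_bytes: int) -> float:
    """Share of application-level bytes that is protocol rather than message."""
    overhead = smtp_overhead_bytes()
    return overhead / (overhead + message_bytes)


for size in (500, 5_000, 50_000):
    print(size, f"{overhead_fraction(size):.1%}")
```

The fraction shrinks as messages get larger, so the per-transaction cost matters most for short messages sent one per connection.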
And we have to think of the other end. Spammers may write themselves custom TCP/IP stacks, but receivers certainly will not. Consider AOL. AOL encompasses some significant percentage of your list of victims. What is AOL going to do with anywhere _near_ that many simultaneous connections... ***from just one spammer?*** Why, call the FBI, of course! It's a DOS attack!
I'll stop now. I wouldn't be surprised if there were other angles on this I haven't considered. But at first blush it doesn't seem nearly as easy to beat as you suggest.
Perversely I think the biggest danger in this technique is that it may become widespread and then force spammers to really confront Bayesian filtering head-on. Of course, just thinking aloud (and this probably is undoable for privacy reasons, but just to open a line of speculation) you can do some interesting things with these kinds of filters... retain lists of email addresses that you've received mail from (and/or replied to) more than once... they get a lower score (and a lower delay) than first-time senders... etc. etc. So it's not clear even with very well-designed spam (another cost increase for spammers!) that you could win against the filter.
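The sender-history idea can be sketched in a few lines of Python. None of this comes from TarProxy itself: the class name, the linear discount, the floor, and the 8-hour ceiling are all assumptions chosen to show the shape of the thing, with the delay scaled by the filter's spam probability and reduced for senders seen before.

```python
from collections import Counter


class SenderHistory:
    """Hypothetical delay policy: known correspondents wait less."""

    def __init__(self, max_delay_seconds=8 * 3600, discount_per_message=0.1, floor=0.2):
        self.max_delay_seconds = max_delay_seconds
        self.discount_per_message = discount_per_message
        self.floor = floor          # even old friends keep some delay if they look spammy
        self.seen = Counter()

    def record(self, sender: str) -> None:
        """Note that we accepted (or replied to) mail from this address."""
        self.seen[sender.lower()] += 1

    def delay_for(self, sender: str, spam_probability: float) -> float:
        """Delay in seconds for a message with the given filter score."""
        if not 0.0 <= spam_probability <= 1.0:
            raise ValueError("spam_probability must be between 0 and 1")
        count = self.seen[sender.lower()]
        trust = max(self.floor, 1.0 - self.discount_per_message * count)
        return self.max_delay_seconds * spam_probability * trust


history = SenderHistory()
first = history.delay_for("friend@example.org", 0.9)
for _ in range(3):
    history.record("friend@example.org")
later = history.delay_for("friend@example.org", 0.9)
print(first > later)   # True: repeat senders get a shorter delay
```

Keeping such a list is exactly the privacy problem mentioned above, so treat this as speculation in code form.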
Want to Know How to Cheat the GPL? Read On!
I suggested a similar mechanism to constipate TCP connections on the IETF e-mail list last summer. The basic idea is to add some new calls to the TCP API so that an application could peek at the incoming traffic without it being acknowledged at the TCP level. If the incoming stream were something bad, then the application could tell the TCP stack to go into a slow acknowledgement mode, thus capturing the spammer in slow-mode transfer.
For more, see http://www1.ietf.org/mail-archive/ietf/Current/msg17009.html
The difficulty is getting enough of these deployed so that spammers, and open relays, have a good chance of getting stuck.
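The slow-acknowledgement calls proposed above do not exist in the standard sockets API: with ordinary sockets, data is acknowledged once it reaches the kernel's receive buffer, and `MSG_PEEK` only lets the application look at data that has already been acknowledged. The nearest approximation with existing calls is to keep the receive buffer small and read from it slowly, so the advertised window stays nearly closed. A Python sketch of that substitute technique (the function name and default values are assumptions):

```python
import socket
import time


def slow_drain(sock, chunk=1, delay=1.0, rcvbuf=256):
    """Read from `sock` a few bytes at a time, pausing between reads.

    This is NOT the proposed slow-ACK mode. It only shrinks the receive
    buffer and drains it slowly so the sender runs out of window.
    The kernel may round `rcvbuf` up to its own minimum, and the setting
    is most effective when applied before the connection is established
    (for example on the listening socket).
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, rcvbuf)
    total = 0
    while True:
        data = sock.recv(chunk)
        if not data:
            return total            # peer closed the connection
        total += len(data)
        time.sleep(delay)           # the stall the sender has to sit through
```

A server would call this on the accepted socket once the filter decides the stream looks like spam, ideally from a worker that does not block legitimate traffic.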