TarProxy Creates Tar Pit... For Spammers
agravaine writes "I ran across TarProxy, which, IMHO, is one of the cleverest spammer-handling ideas I've seen yet. The gist: Early detection of incoming spam [using the statistical techniques pioneered on the client side] could be used to create an artificial scarcity of bandwidth experienced only by spammers." This project hasn't gone very far yet, but essentially it slows SMTP requests to suspected spammers. If this holds up in practice, and is installed on enough of the net, it could make a difference. 144 spam so far today. Anything would be an improvement. CT: Yup, it's a dupe. There wasn't anything better to post at 9am on a Sunday, so you can just bitch about me instead ;)
In the spirit of repetition...
Easy to defeat: just use spamming software that dynamically increases its connection pool whenever it encounters a 'slow' SMTP recipient. Even if a large part of the net population were running this, the spammer could just spawn thousands of simultaneous (slowed down, yes) connections, and still maximize his bandwidth utilization. If it takes 2 minutes to send each message, it doesn't matter if he's sending 5000 messages at once!
I believe Linux, for example, allows up to 8192 open sockets, and I think this can be changed with a sysctl command, and most definitely could be with a few changes to kernel headers.
Sure, it would take a machine with decent memory, but that's not too hard to find.
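To put a rough number on the connection-pool argument: with a fixed per-message delay, the steady-state send rate is just the number of open connections divided by the delay. A minimal sketch using the 5000 connections and 2 minutes from the post; the function name is made up for illustration, and it assumes every connection is always busy.

```python
def messages_per_second(connections: int, delay_seconds: float) -> float:
    """Steady-state send rate when every message is held for delay_seconds."""
    if delay_seconds <= 0:
        raise ValueError("delay_seconds must be positive")
    return connections / delay_seconds


# Figures from the post: 5000 simultaneous connections, 2 minutes per message.
rate = messages_per_second(5000, 2 * 60)
print(f"{rate:.1f} messages/second")  # about 41.7
```

So the tarpit only hurts if the delay grows faster than the spammer can add connections.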
---
the pen is mightier than the sword, the sword is mightier than the court, the court is mightier than the pen.
And you would not need to roll this out on most of the net. If the large ISPs and webmail providers started doing this it would have a significant impact. Much of the spammer's distribution list consists of a few domains: yahoo, hotmail, aol, etc. If the large providers implemented tarpits it could quickly damage the ready supply of open relays for spammers.
Hmm... This calls for some TCP geekhood and some strong math. I am way too hung over for math. Let's just talk about it in broad strokes first.
If I'm tuning this package, I can make these delays REAL big. I mean, email is one of those systems where a false positive resulting in even a... let's say an **8 hour** delay to a legitimate message would still be considered perfectly fine for most purposes. There's fuzzy logic in play here; I'm thinking not all delays will be equal. But what if you were just really harsh on suspected spam? Not such a loss IMO. Of course... I haven't considered that you will have increased reliability problems trying to hold a stream open for 8 hours, but remember, a legitimate mailserver will keep resending, and as we go bayesian on servers, perhaps we will learn to resend for a little longer as well? Or perhaps there is another protocol solution (i.e. letting the sender know they're being delayed for spam... so perhaps giving them the option to reformulate their message and resend?) Let's just press on. The precise amount of the delay may not necessarily be important.
If I'm sending 50 million messages (a modest spammer's run, if I'm well enough informed) and each one holds me on the line for 8 hours, that means 400 million hours if run serially. At the 8192 concurrent thread barrier that's still almost 50,000 hours (~5.6 years)... and, with mathematical convenience, to do this entire run in 8 hours you would need **50 million** concurrent threads. Or should I have just stayed in bed longer?
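For anyone who wants to check the arithmetic, here is a small sketch of it. The function names are mine, and it assumes every connection is tarpitted for the full delay and stays busy for the whole run.

```python
import math


def total_hours(messages: int, delay_hours: float, connections: int) -> float:
    """Wall-clock hours for the run if all connections are kept busy."""
    return messages * delay_hours / connections


def connections_needed(messages: int, delay_hours: float, deadline_hours: float) -> int:
    """Concurrent connections required to finish within deadline_hours."""
    return math.ceil(messages * delay_hours / deadline_hours)


MESSAGES = 50_000_000  # run size from the post
DELAY = 8              # hours each message is held

print(total_hours(MESSAGES, DELAY, 1))            # serial: 400 million hours
print(total_hours(MESSAGES, DELAY, 8192) / 8760)  # years at 8192 connections
print(connections_needed(MESSAGES, DELAY, 8))     # to finish in 8 hours
```

Swap in a shorter delay or a bigger socket limit and the orders of magnitude still look bad for the sender.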
Now it's looking like the exact length of the delays and the exact number of concurrent threads are not actually worth too much niggling debate. We just have to get familiar with the orders of magnitude we're dealing with.
Consider the protocol-to-data ratio of an SMTP transaction over TCP alone. How much is data and how much is just protocol overhead in a given mail transaction? We can figure this out down to the last bit, but I'm going to just throw out the hypothetical notion that when you have to initiate a new SMTP transaction for every message you send, the bandwidth overhead for doing this millions of times is not inconsiderable.
And we have to think of the other end. Spammers may write themselves custom TCP/IP stacks, but receivers certainly will not. Consider AOL. AOL encompasses some significant percentage of your list of victims. What is AOL going to do with anywhere _near_ that many simultaneous connections... ***from just one spammer?*** Why, call the FBI, of course! It's a DoS attack!
I'll stop now. I wouldn't be surprised if there were other angles on this I haven't considered. But at first blush it doesn't seem nearly as easy to beat as you suggest.
Perversely, I think the biggest danger in this technique is that it may become widespread and then force spammers to really confront Bayesian filtering head-on. Of course, just thinking aloud (and this probably isn't doable for privacy reasons, but just to open a line of speculation) you can do some interesting things with these kinds of filters... retain lists of email addresses that you've received mail from (and/or replied to) more than once... they get a lower score (and a lower delay) than first-time senders... etc. etc. So it's not clear even with very well-designed spam (another cost increase for spammers!) that you could win against the filter.
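One way the "not all delays will be equal" and known-sender ideas could fit together is sketched below. Everything here is an assumption for illustration: the function name, the linear scaling with spam probability, the halving per prior exchange, and the 8 hour ceiling borrowed from earlier in this comment. Nothing in it is taken from TarProxy itself.

```python
def tarpit_delay(spam_probability: float,
                 times_seen_before: int = 0,
                 max_delay_seconds: float = 8 * 3600) -> float:
    """Seconds to stall a sender, given a filter score between 0.0 and 1.0.

    Hypothetical policy: delay grows linearly with the spam probability
    and is halved for every earlier exchange with the same address.
    """
    if not 0.0 <= spam_probability <= 1.0:
        raise ValueError("spam_probability must be between 0 and 1")
    if times_seen_before < 0:
        raise ValueError("times_seen_before must not be negative")
    discount = 0.5 ** times_seen_before
    return max_delay_seconds * spam_probability * discount


print(tarpit_delay(0.99))                       # first-time sender, near-certain spam
print(tarpit_delay(0.99, times_seen_before=3))  # same score, known correspondent
```

A real deployment would have to decide how the sender list is stored, which is where the privacy worry comes in.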
Want to Know How to Cheat the GPL? Read On!
I suggested a similar mechanism to constipate TCP connections on the IETF e-mail list last summer. The basic idea is to add some new calls to the TCP API so that an application could peek at the incoming traffic without it being acknowledged at the TCP level. If the incoming stream were something bad, then the application could tell the TCP stack to go into a slow acknowledgement mode, thus capturing the spammer in slow-mode transfer.
For more, see http://www1.ietf.org/mail-archive/ietf/Current/msg17009.html
The difficulty is getting enough of these deployed so that spammers, and open relays, have a good chance of getting stuck.
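The slow-acknowledgement mode described above needs new TCP calls that, as far as I know, are not in the standard sockets API. A rough user-space approximation is possible today: `MSG_PEEK` lets an application look at buffered data without consuming it (the kernel has already ACKed it, so this is weaker than the proposal), and reading in tiny chunks with pauses lets the receive buffer fill so that ordinary TCP flow control throttles the sender. A minimal Python sketch; `peek_then_drain`, `looks_bad` and all the sizes are hypothetical, and how hard the sender actually gets squeezed depends on the kernel's receive buffer size.

```python
import socket
import time


def peek_then_drain(conn, looks_bad, chunk_size=16, pause_seconds=1.0,
                    peek_size=512):
    """Peek at the start of a stream, then read it fast or slow.

    looks_bad is a caller-supplied function taking the peeked bytes and
    returning True for suspected spam. Returns (was_slowed, bytes_read).
    """
    # MSG_PEEK returns buffered data but leaves it in the receive queue.
    preview = conn.recv(peek_size, socket.MSG_PEEK)
    slow = looks_bad(preview)
    total = 0
    while True:
        data = conn.recv(chunk_size if slow else 4096)
        if not data:  # peer closed the connection
            break
        total += len(data)
        if slow:
            # Stalling here is what backs up the sender's TCP window.
            time.sleep(pause_seconds)
    return slow, total
```

Something like `peek_then_drain(conn, lambda head: b"VIAGRA" in head)` would be the call site, with a real classifier in place of the lambda.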