Using Statistics to Cause Spammers Pain
mlamb writes "Statistical mail classifiers like PopFile save time on the part of their users, but don't do anything to actively combat spam. I just published an article that suggests a way to use classifier output against a spammer while they're connected to your SMTP server, and I'm launching a project called TarProxy to implement it."
This may be just a little off topic, but the thing is that I always have to go through all my mail by hand to make sure I didn't miss anything important anyways. No anti-spam software out there seems to save me this hassle... So to this day I haven't stuck with any. It doesn't look like this will be better.
I am a viral sig. Please help me spread.
Most mail servers will only forward mail from users of their own domain. If the mailserver is sending spam for one of their legitimate users, I feel no pity for them if their server slows down.
If they forward mail from anyone who sends them mail, then they are an open relay, and again, they deserve what they get for leaving an open relay up.
Two things here. First, this article wasn't about preventing spammers from using your SMTP server as a relay, but in slowing down the reception of mail at the end-point SMTP server. This will ripple up the chain to hurt the spammers by slowing down the relays they use. Second, it doesn't matter whether I get 10 spam emails or 10,000. One of the goals of TarProxy is to be ubiquitous. I may only receive 10 spammy emails, but my running instance of TarProxy will determine that those are of sufficient spamminess to throttle bandwidth to each of those connections. At the same time, you're doing the same on your SMTP server, and Joe over there is, and so is Susie, and so on. If everybody (defined as "a large number of smtp servers", and not necessarily "everybody") is running such a service, the spammers will be hurt. You're right that a single individual using this won't make much difference, but that didn't seem to be the goal of the article.
Nonsense. The spammer will just run the connections in parallel. The slower they get the more he'll run. He already does this to some extent. All this will accomplish is to tie up resources on YOUR mail server.
Moderating "-1, Disagree" is simple censorship. Have the guts to post your opinion.
In his article he actually does address this very question. He even gives, what I feel at least, is an interesting answer.
So, you don't run an open relay. You're not going to slow down the spammer directly, but you will slow down all the connections that come from that open relay to your mail server. For a particularly abused open relay, that could lead to such problems that the admin of that open relay will finally get a clue and look in to configuring their server properly.
Hence, a cascading effect that will eventually harm the spammers. Admins of open relays that get a clue will tighten their servers, thus depriving the spammers of one more relay they can abuse.
so exactly WHO are you hurting?
sure, the open relay deserves some pain. but you're naieve if you think that most spammers send from their OWN systems!
I have qmail running on my mail hub and I reject mail at the time of connect simply based on the receiver they're trying to send to. when they handshake (part of the HELO exchange) I detect the user they're trying to send to, and since I only have a handful of valid users, its easy to know if they're dictionarying me or not. once I know that, I immediately cut them off, AND add an ipfw (I run freebsd) rule to block all traffic from that IP to my port 25. not only do they NOT get to send any DATA to me, but they're for now on (until it ages out, automatically) forbidden from even connecting to my box. I know that's harsh but I can be that selective since its mostly just me on my mailhub.
but I don't think for a second that even tarpitting that source IP is punishing the spammer. they've most likely broken into (or found) an open relay and they're routing thru them. they don't even see the 'address not reachable' error due to my firewalling them.
--
"It is now safe to switch off your computer."
what if the spammer sends a message to a (good) SMTP server which haven't got the system, and the SMTP server in turn tries to deliver the "spammail" to the right SMTP server, won't that hurt the good SMTP server, who just tries to do it's job?
The situation you're describing is called relaying.
If you start with the assumption that spammers are evil, then the logical conclusion is that there is no such thing as a "good" SMTP server that would relay mail on a spammer's behalf. Servers that do are either in collusion with the spammer, or are mis-configured to allow anonymous relaying. A server that willingly acts in collusion with evil is, by definition, evil. The level of stupidity necessary to allow your sever to act as an open relay also, by definition, precludes being considered a "good" server.
So the short answer to your query is that it's a non-issue. A truly good server will, by definition, never relay spam!
Here's what I propose: setup a large number of bogus email accounts. Broadcast them everywhere, and let them be honey-pots for spam. The point is, since you NEVER use this account for anything but dropping in spammable places, anything you receive on it *must* be spam. As soon as you get a connection from a mail server to one of these addresses, you *know* it's an open relay, and you put it in your database -- automatically, with no interaction required.
Step 2: You also do a "fingerprint" on the spam you get in your honeypot (you know the routine - what's the length, average use of the word "dildo", etc) so that you can identify this particular spam "copy" by the message -- NOT the header. This allows you to automatically filter out spam messages. If the spammers want to adapt, they have to rewrite their copy. As long as your signature algorithm is fairly lose -- that is, not a true hash algorithm -- they should have to do a total rewrite if they don't want to be detected. You can then filter these at the relays. Thus, once again, you raise the cost for them to do their spam. Since you are filtering by actual known-spam content -- that is, you're doing this like they do virus signatures -- you should get virtually no false positives.
And, anybody whose friends who are emailing them about penis enlargement doesn't really deserve email anyway.
Anyway, there's step 1 and 2. To summarize:
"He who would learn astronomy, and other recondite arts, let him go elsewhere. " -- John Calvin, commenting on Genesis 1
If these tarpits were ubiquitous, they could completely change the economics of spam, creating a scarcity of bandwidth experienced only by spammers.
/dev/kqueue under FreeBSD, for example; and you can do the same, but with a bit more CPU wasted, using plain old select() on almost any Unix.
:(
Err, I don't think so. This just requires spammers to use more simultaneous connections to overcome the slowdown; it doesn't really increase their network requirements much, only their host CPU requirements. 20,000 simultaneous TCP connections from one process is quite possible with
I also don't understand the rationale behind processing the message incrementally. Why not just do your processing before sending back the final 2xx response to the DATA command? Most spam software does not hang up right after sending the final "\r\n.\r\n" from what I've heard from people who run tarpits.
How about this instead: when you are confident you are receiving spam, you stop reading from the socket entirely, and send perhaps 10MB of data back on the other side of the connection. (If the other endpoint isn't reading, and consequently you can only send one window worth of data, then do something to get your TCP stack to generate a lot of useless ACKs, or send your trash back one octet at a time and push between them, or something.) The intent being that sending spam to a large number of MTAs configured in this manner rapidly just becomes a way to DDOS *yourself*. Probably this is too disruptive for most sites to want to bother implementing, though
I don't know exactly what the profit margin for spammers is like, but I'm not convinced a small multiplier in network costs is going to matter. Anyway, a lot of these "countermeasures" are mostly going to hurt maintainers of open relays, but if that means they actually fix them, I suppose that is almost as good.
Java: the COBOL of the new millenium.
For example, if your machine only receives a small amount of email per day, why not throttle them to take 10-20 minutes of connect time overall? If you only get two emails per day (one real and one spam), getting them 10 minutes later probably won't bother you too much, but could cost the spammer or his relay-helpers a 5 minute duration on a connection.
I receive about a hundred emails per day from a number of sources, and adding six to sixty seconds of delay per email wouldn't cause me any grief. But if everyone throttled their email, it might cause someone using their '250 million Valid! Tested! Opt-In!' email lists to have to upgrade their machine to half a million connections to process it in an hour.
I don't see that differential throttling has any benefit over a contant throttling rate. For a big site, the differentiation between spam and not-spam would probably cost you any load advantage you earned in slowing the spam, and for a small system, the delay would not be noticable.
Of course, big senders like AOL, prodigy, and yahoo, might have to upgrade...