Using Statistics to Cause Spammers Pain
mlamb writes "Statistical mail classifiers like PopFile save time on the part of their users, but don't do anything to actively combat spam. I just published an article that suggests a way to use classifier output against a spammer while they're connected to your SMTP server, and I'm launching a project called TarProxy to implement it."
Exactly how it should be.
Perhaps public floggings and other corporal punishment as well.
However, I have to wonder if all spammers are really sane ... I just got an email about chicks who crave small penises and those who crave big penises, and then emails about penis enlargement and online Viagra purchases. It just seems weird that there is so much concern for my penis. Perhaps we should just imprison them on an island, as they might find tarring and feathering a bit kinky and enjoy it.
Ignore the "p2p is theft" trolls, they're just uninformed
Most mail servers will only forward mail from users of their own domain. If the mailserver is sending spam for one of their legitimate users, I feel no pity for them if their server slows down.
If they forward mail from anyone who sends them mail, then they are an open relay, and again, they deserve what they get for leaving an open relay up.
TarProxy is written in Java,
Well, that's one way to do it.
The hurt-back part of the project is not new. Theo de Raadt is working on just that, in connection with an IP number list (much faster, so suitable for busy servers):
Very simply, this hangs the full list of ~12,000 spam-sending IP/mask entries listed at www.spews.org off a pf(4) rdr-anchor (which is only entered for port 25). When connections from these spammers arrive they are redirected to a daemon which minimally fakes the SMTP protocol with very low overhead -- for multiple connections at the same time -- and then the message is left on the sender's queue by providing a 550 return code.
The theory here is that most spam still comes in via open relays, and the only way we are going to convince them to clean up their act is to waste _their_ disk space, their time, and their network bandwidth more than they waste ours. For those spammers who drop messages when they receive a 550, well, we have not wasted any further time or network bandwidth, and even in that situation I think some of them might remove an address if they receive a 550.
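For anyone curious what "minimally fakes the SMTP protocol" might look like, here is a rough sketch. It is not spamd's actual code: the function name, the `state` dict, the reply texts, and the point in the dialogue where the 550 is sent are all assumptions made for illustration.

```python
def fake_smtp_reply(line: str, state: dict) -> str:
    """Return the reply a minimal fake SMTP listener might give to one client line.

    `state` is a hypothetical per-connection dict; an empty string means
    "send nothing back for this line".
    """
    stripped = line.strip()
    if state.get("in_data"):
        # Inside the message body: stay silent until the lone "." terminator,
        # then refuse the message with a 550.
        if stripped == ".":
            state["in_data"] = False
            return "550 Message refused"
        return ""
    verb = stripped.split(" ", 1)[0].upper()
    if verb in ("HELO", "EHLO"):
        return "250 Hello"
    if verb in ("MAIL", "RCPT"):
        return "250 OK"
    if verb == "DATA":
        state["in_data"] = True
        return "354 End data with <CR><LF>.<CR><LF>"
    if verb == "QUIT":
        return "221 Bye"
    return "500 Command unrecognized"
```

A real daemon would wrap something like this in a loop serving many connections at once; the sketch only shows how little protocol is needed before the refusal goes out.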
I've been using bogofilter for a while now as a pass-through tagging mechanism. I filter on the client side based on the tag information. This sounds a lot like what you are doing.
The only thing close to a false positive I've gotten was having to dumpster dive into my spam folder to retrieve an amazon order confirmation.
Bayesian filtering really works, but you have to train the filter correctly and with as large a corpus as possible.
Two things here. First, this article wasn't about preventing spammers from using your SMTP server as a relay, but about slowing down the reception of mail at the end-point SMTP server. This will ripple up the chain to hurt the spammers by slowing down the relays they use. Second, it doesn't matter whether I get 10 spam emails or 10,000. One of the goals of TarProxy is to be ubiquitous. I may only receive 10 spammy emails, but my running instance of TarProxy will determine that those are of sufficient spamminess to throttle bandwidth to each of those connections. At the same time, you're doing the same on your SMTP server, and Joe over there is, and so is Susie, and so on. If everybody (defined as "a large number of SMTP servers", and not necessarily "everybody") is running such a service, the spammers will be hurt. You're right that a single individual using this won't make much difference, but that didn't seem to be the goal of the article.
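To make "sufficient spamminess to throttle bandwidth" concrete, here is one way a proxy could turn a classifier score into a per-reply delay. This is only a sketch and not TarProxy's actual logic: the function name, the 0.9 threshold, the 30-second ceiling, and the linear ramp are all invented for illustration.

```python
def throttle_delay(spam_probability: float,
                   threshold: float = 0.9,
                   max_delay: float = 30.0) -> float:
    """Map a classifier score in [0, 1] to seconds of delay before each reply.

    Below the (assumed) threshold the connection is not slowed at all;
    above it the delay ramps linearly up to max_delay at a score of 1.0.
    """
    if not 0.0 <= spam_probability <= 1.0:
        raise ValueError("spam_probability must be between 0 and 1")
    if spam_probability < threshold:
        return 0.0
    return max_delay * (spam_probability - threshold) / (1.0 - threshold)
```

A proxy would sleep for `throttle_delay(score)` seconds before relaying each server response, so mail that scores as legitimate sees no slowdown.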
This is the same thing as OpenBSD's spamd, which Theo de Raadt wrote specifically to cause spam relays pain. spamd uses some new features of pf and blacklists from Spews to create a tarpit for incoming messages from known spam relays. It was even discussed on Slashdot in this article. Also, Daniel Hartmeier, pf developer extraordinaire and all around good guy, wrote a little piece about annoying spammers using pf, spamd, and bmf.
In his article he actually does address this very question. He even gives what I feel, at least, is an interesting answer.
So, you don't run an open relay. You're not going to slow down the spammer directly, but you will slow down all the connections that come from that open relay to your mail server. For a particularly abused open relay, that could lead to such problems that the admin of that open relay will finally get a clue and look into configuring their server properly.
Hence, a cascading effect that will eventually harm the spammers. Admins of open relays that get a clue will tighten their servers, thus depriving the spammers of one more relay they can abuse.
What if the spammer sends a message to a (good) SMTP server which hasn't got the system, and that SMTP server in turn tries to deliver the "spammail" to the right SMTP server? Won't that hurt the good SMTP server, which is just trying to do its job?
The situation you're describing is called relaying.
If you start with the assumption that spammers are evil, then the logical conclusion is that there is no such thing as a "good" SMTP server that would relay mail on a spammer's behalf. Servers that do are either in collusion with the spammer, or are misconfigured to allow anonymous relaying. A server that willingly acts in collusion with evil is, by definition, evil. The level of stupidity necessary to allow your server to act as an open relay also, by definition, precludes being considered a "good" server.
So the short answer to your query is that it's a non-issue. A truly good server will, by definition, never relay spam!
If these tarpits were ubiquitous, they could completely change the economics of spam, creating a scarcity of bandwidth experienced only by spammers.
Err, I don't think so. This just requires spammers to use more simultaneous connections to overcome the slowdown; it doesn't really increase their network requirements much, only their host CPU requirements. 20,000 simultaneous TCP connections from one process is quite possible with kqueue under FreeBSD, for example; and you can do the same, but with a bit more CPU wasted, using plain old select() on almost any Unix.
:(
I also don't understand the rationale behind processing the message incrementally. Why not just do your processing before sending back the final 2xx response to the DATA command? Most spam software does not hang up right after sending the final "\r\n.\r\n" from what I've heard from people who run tarpits.
How about this instead: when you are confident you are receiving spam, you stop reading from the socket entirely, and send perhaps 10MB of data back on the other side of the connection. (If the other endpoint isn't reading, and consequently you can only send one window worth of data, then do something to get your TCP stack to generate a lot of useless ACKs, or send your trash back one octet at a time and push between them, or something.) The intent being that sending spam to a large number of MTAs configured in this manner rapidly just becomes a way to DDOS *yourself*. Probably this is too disruptive for most sites to want to bother implementing, though
I don't know exactly what the profit margin for spammers is like, but I'm not convinced a small multiplier in network costs is going to matter. Anyway, a lot of these "countermeasures" are mostly going to hurt maintainers of open relays, but if that means they actually fix them, I suppose that is almost as good.
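On the point about one process juggling many connections: the sketch below uses Python's standard `selectors` module, whose `DefaultSelector` picks the most efficient mechanism the platform offers and falls back to plain select(). It is a toy, with local socket pairs standing in for TCP connections; the function names and the pair count are made up.

```python
import selectors
import socket


def drain_ready(sel: selectors.BaseSelector, timeout: float = 0.1) -> int:
    """Read pending data from every ready socket; return total bytes read."""
    total = 0
    for key, _events in sel.select(timeout):
        total += len(key.fileobj.recv(4096))
    return total


def demo(pairs: int = 50) -> int:
    """Watch `pairs` sockets from a single thread and count the bytes received."""
    sel = selectors.DefaultSelector()
    socks = []
    for _ in range(pairs):
        a, b = socket.socketpair()   # stand-in for one remote connection
        b.setblocking(False)
        sel.register(b, selectors.EVENT_READ)
        socks.append((a, b))
    for a, _b in socks:
        a.sendall(b"x")              # every "peer" sends one byte
    total = drain_ready(sel)
    for a, b in socks:
        sel.unregister(b)
        a.close()
        b.close()
    sel.close()
    return total
```

The same loop shape works whether the process is a tarpit holding connections open or a sender working around one, which is the worry raised above.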
Java: the COBOL of the new millennium.
Great idea! Parse out the URLs, plug 'em into some boilerplate, and automatically submit it as a story to Slashdot! They'll never try THAT again!
Cantankerous old coot since 1957.
I have several domain names that appear on many of the "million address" CDs and other popular spam lists, but which no longer have any legitimate recipients/users.
We are also working on obtaining access to true "realtime" RBL lists of currently abused open relay servers. Assistance would be appreciated.
The core of "stations of the cross" is a custom DNS server. This server is authoritative for these oft-spammed domains, and each time a request is made for an MX record, it returns (with a short TTL) a unique randomly generated list of MXes, each address on the list being a known open relay.
So when a spammer or relay first goes to deliver a message, the system selects an open relay off the list of MXes and hands off the message to that host. Being an open relay, the host accepts the message for my domain, then goes to do a DNS lookup for the MX record. The relay receives a (different) list of other open relays...
Usually, you can get a message to traverse a dozen or more open relays (most sendmail systems default to a maximum "hop count" of 25), after which the message will bounce.
Since the only traffic my server has to deal with is DNS queries and responses, this is very low-overhead for me, but depending on the size of the spammail, very high overhead for the open relay servers.
I do not deploy Linux. Ever.
There are a few spammers who send direct from their own IPs. If you want to tarpit them, just tarpit the traffic from their IPs - you don't need to analyze anything.
For other spam, through open proxies or open relays, you are not hurting the spammer to tarpit. If the spammer is working through open proxies and if you got enough tarpits going then you could hurt them, but until there's enough tarpits there is still zero (0.000) percent pain to the spammer. Some open proxies are slow with one or two tarpits, the others are fast enough to keep the spammer's server fully busy. He only cares if he's running his server flat out. Delays at one or more open proxies mean little.
Right now I'm trapping spam on a relay spam honeypot. It comes to the honeypot from open proxies - there's nothing I can learn about the spammer by learning about the proxies. It comes (usually) as 99-recipient spam messages. This particular spammer uses embedded comments in his spam to evade Bayesian filters. Makes no difference to me - I see it is spam. I have no valid email to filter out - everything is spam. That's one of the beauties of a honeypot - the spammer does your filtering for you.
Somewhere over 20,000 recipients so far, since Wednesday. Here's a tiny sample, showing the URLs he advertises and the random comments he uses to defeat filters:
[a href="http://www.directmailorderbrides.com/?oc=239 0]"A ni[!--HVtu--]ce la[!--HVtu--]dy
[a href="http://www.flati.com/silagra/"]L[!--WPVizB-- ]im[!--WPVizB--]ited
(I replaced angle brackets with square brackets - you'll have to imagine them restored.)
I have no filter, no smarts of any kind. The honeypot is a mail server with the output queue stopped. I got the spammer to start sending spam by delivering to him three of his relay test messages - he'd sent so many I decided to see who he was, what spam I'd get if I did deliver.
I'm trying various ways to hurt the spammer but I've not yet delivered enough hurt - he's still operating. Other spammers have succumbed more readily - this guy is better at hiding himself.
Note, by the way, that he puts no comments in the URL - if you filter on those (or remove comments before filtering - that would be easy) the spam instantly is revealed. One guy simply rejects any email message with three repeated comments in a line (this spam is laced with the comments throughout, not just in the http lines.) The spammer's clever way of obscuring the spam is useful in identifying the spam - no points for Spammy.
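Both tricks mentioned here are easy to sketch. The code below assumes the comments are ordinary HTML comments (angle brackets restored) and reads "three repeated comments in a line" as the same comment appearing at least three times on one line; that reading, the function names, and the default limit are my assumptions, not a description of anyone's actual filter.

```python
import re
from collections import Counter

# Matches an HTML comment such as <!--WPVizB-->, non-greedy so that
# adjacent comments are matched one at a time.
COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)


def strip_comments(html: str) -> str:
    """Remove HTML comments so obfuscated words rejoin before filtering."""
    return COMMENT.sub("", html)


def has_repeated_comments(line: str, limit: int = 3) -> bool:
    """True if any single comment occurs at least `limit` times in the line."""
    counts = Counter(COMMENT.findall(line))
    return any(n >= limit for n in counts.values())
```

Run `strip_comments` on the body before handing it to the classifier, or use `has_repeated_comments` on each line as a cheap reject rule.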
Windows users with a permanent connection can step into running a relay spam honeypot very easily: they can run Jackpot: http://jackpot.uk.net/
There is at least one open proxy honeypot out there: Google in news.admin.net-abuse.email for it. These can be very wicked - create your own for even more fun. Or create your own open relay honeypot - see if you can make it even more wicked.
(Oversize reply packets from an open proxy honeypot might have a very interesting effect.)