Using Statistics to Cause Spammers Pain

← Back to Stories (view on slashdot.org)

Using Statistics to Cause Spammers Pain

Posted by michael on Friday February 28, 2003 @11:59AM from the slow-poison dept.

mlamb writes "Statistical mail classifiers like PopFile save time on the part of their users, but don't do anything to actively combat spam. I just published an article that suggests a way to use classifier output against a spammer while they're connected to your SMTP server, and I'm launching a project called TarProxy to implement it."

4 of 334 comments (clear)

Min score:

Reason:

Sort:

OpenBSD Spam Blocking Engine by Incadenza · 2003-02-28 12:12 · Score: 5, Interesting

The hurt-back part of the project is not new. Theo de Raadt is working on just that, in connection with an IP number list (much faster, so suitable for busy servers):

Very simply, this hangs the full list of ~12,000 spam-sending IP/mask entries listed at www.spews.org off a pf(4) rdr-anchor (which is only entered for port 25). When connections from these spammers arrive they are redirected to a daemon which minimally fakes the SMTP protocol with very low overhead -- for multiple connections at the same time -- and then the message is left on the sender's queue by providing a 550 return code.

The theory here is that most spam still comes in via open relays, and the only way we are going to convince them to clean up their act is to waste _their_ disk space, their time, and their network bandwidth more than they waste ours. For those spammers who drop messages when they received a 550, well, we have not wasted any further time or network bandwidth, and even in that situation I think some of the might remove an address if they receive a 550.
Re:Anti-Spam software by stinky+wizzleteats · 2003-02-28 12:12 · Score: 5, Interesting

I've been using bogofilter for a while now as a pass-through tagging mechanism. I filter on the client side based on the tag information. This sounds a lot like what you are doing.

The only thing close to a false positive I've gotten was having to dumpster dive into my spam folder to retrieve an amazon order confirmation.

Bayesian filtering really works, but you have to train the filter correctly and with as large a corpus as possible.
"Stations of the Cross" Relays attacking relays. by Nonesuch · 2003-02-28 13:39 · Score: 5, Interesting

We are working on a project called "Stations of the cross".
I have several domain names that appear on many of the "million address" CDs and other popular spam lists, but which longer any legitimate recipients/users.
We are also working on obtaining access to true "realtime" RBL lists of currently abused open relay servers. Assistance would be appreciated.
The core of "stations of the cross" is a custom DNS server. This server is authoritative for these oft-spammed domains, and each time a request is made for an MX record, it returns (with a short TTL) a unique randomly generated list of MXes, each address on the list being a known open relay.
So when a spammer or relay first goes to deliver a message, the system will select an open relay off the list of MXes, and hands off the message to that host. Being an open relay, the host accepts the message for my domain, then goes to do a DNS lookup for the MX record. The relay receives a (different) list of other open relays...
Usually, you can get a message to traverse a dozen or more open relays (most sendmail systems default to a maximum "hop count" of 25), after which the message will bounce.
Since the only traffic my server has to deal with is DNS queries and responses, this is very low-overhead for me, but depending on the size of the spammail, very high overhead for the open relay servers.

--

I do not deploy Linux. Ever.
Re:Nice idea by minas-beede · 2003-02-28 14:19 · Score: 5, Interesting

There's a few spammers who send direct from their own IPs. If you want to tarpit them just tarpit the traffic from their Ips - you don't need to analyze anything.

For other spam, through open proxies or open relays, you are not hurting the spammer to tarpit. If the spammer is working through open proxies and if you got enough tarpits going then you could hurt them, but until there's enough tarpits there is still zero (0.000) percent pain to the spammer. Some open proxes are slow with one or two tarpits, the others are fast enough to keep the spammer's server fully busy. He only cares if he's running his server flat out. Delays at one or more open proxies mean little.

Right now I'm trapping spam on a relay spam honeypot. It comes to the honeypot from open proxies - theer's nothig I can learn about the spammer by learning about the proxies. It comes (usually) as 99-recipient spam messages. This particular spammer uses imbedded comments in his spam to evade Bayesian filters. Makes no difference to me - I see it is spam. I have no valid email to filter out - everything is spam. That's one of the beauties o a honeypot - the spammer does yor filtering for you.

Somewhere over 20,000 recipients so far, since Wednesday. Here's a tiny sample, showing the URL's he advertises and the random comments he uses to defeat filters:

[a href="http://www.directmailorderbrides.com/?oc=239 0]"A ni[!--HVtu--]ce la[!--HVtu--]dy

[a href="http://www.flati.com/silagra/"]L[!--WPVizB-- ]im[!--WPVizB--]ited

(I replaced agle brackets with square brackets - tou'll have to imagine them restored.)

I have no filter, no smarts of any kind. The honeypot is a mail server with the output queue stopped. I got the spammer to start sendng spam by delivering to him three of his relay test messages - he'd sent so many I decided to see who he was, what spam I'd get if I did deliver.

I'm trying various ways to hurt the spammer but I've not yet delivered enough hurt - he's still operating. Other spammers have succumed more readily - this guy is better at hiding himself.

Note, by the way, that he puts no comments in the URL - if you filter on those (or remove comments before filtering - that would be easy) the spam instantly is revealed. One guy simply rejects any email message with three repeated comments in a line (this spam is laced with the comments throughout, not just in the http lines.) The spammer's clever way of obscuring the spam is useful in identifying the spam - no points for Spammy.

Windows users with a permanent connection can step into running a relay spam honeypot very easily: they can run Jackpot: http://jackpot.uk.net/

There is at least one open proxy honeypot out there: Google in news.admin.net-abuse.email for it. These can be very wicked - create your own for even more fun. Or create your own open relay honeypot - see if you can make it even more wicked.

(Oversize reply packets from an open proxy honeypot might have a very interesting efffect.)