Using Statistics to Cause Spammers Pain
mlamb writes "Statistical mail classifiers like PopFile save time on the part of their users, but don't do anything to actively combat spam. I just published an article that suggests a way to use classifier output against a spammer while they're connected to your SMTP server, and I'm launching a project called TarProxy to implement it."
But what if the spammer sends a message to a (good) SMTP server which haven't got the system, and the SMTP server in turn tries to deliver the "spammail" to the right SMTP server, won't that hurt the good SMTP server, who just tries to do it's job?
My <1000 UID is with a hot chick
This may be just a little off topic, but the thing is that I always have to go through all my mail by hand to make sure I didn't miss anything important anyways. No anti-spam software out there seems to save me this hassle... So to this day I haven't stuck with any. It doesn't look like this will be better.
I am a viral sig. Please help me spread.
Just one question... what if the spammer doesn't connect to your SMTP server to send billions of messages from it? What if the spammer (with half a brain, and some scripting ability), only sends a few emails through your SMTP server? Most SMTP servers are wide open still, and simply sending 10 emails on one server and moving on to another open server would be so low that statistical usage wouldn't show anything on the radar screen... or did I not understand what you are trying to do?
---
Programming is like sex... Make one mistake and support it the rest of your life.
That we need all these technicalities to try and fight spam... But this is just like people trying to fight piracy, there will always be a new way to get around security. Actually, what we needed was authenticated SMTP from the beginning...
Exactly how it should be.
Perhaps public floggings and other corperal punishment as well.
However I have to wonder if all spammers are really sane ... I just got an email about chicks who crave small penis's and those who crave big penis's and then emails about penis enlargement and viagra online purchases, it just seems weird that there is so much concern for my penis. Perhaps we should just imprison them on an island as they might find tar and feathering a bit kinky and enjoy it.
Ignore the "p2p is theft" trolls, they're just uninformed
But, but, but, why would they be connected and sending spam through your server? Unless you run an open relay. And you don't run an open relay, do you? Do you?!
There is no sig, there is only Zuul.
The simpler method is still SMTPAUTH. Now we just have to convince the world that this is a Good Thing.
This sig no verb.
TarProxy is written in Java,
Well, that's one way to do it.
The hurt-back part of the project is not new. Theo de Raadt is working on just that, in connection with an IP number list (much faster, so suitable for busy servers):
Very simply, this hangs the full list of ~12,000 spam-sending IP/mask entries listed at www.spews.org off a pf(4) rdr-anchor (which is only entered for port 25). When connections from these spammers arrive they are redirected to a daemon which minimally fakes the SMTP protocol with very low overhead -- for multiple connections at the same time -- and then the message is left on the sender's queue by providing a 550 return code.
The theory here is that most spam still comes in via open relays, and the only way we are going to convince them to clean up their act is to waste _their_ disk space, their time, and their network bandwidth more than they waste ours. For those spammers who drop messages when they received a 550, well, we have not wasted any further time or network bandwidth, and even in that situation I think some of the might remove an address if they receive a 550.
This is the same thing as OpenBSD's spamd, which Theo de Raadt wrote specifically to cause spam relays pain. spamd uses some new features of pf and blacklists from Spews to create a tarpit for incoming messages from known spam relays. It was even discussed on Slashdot in this article. Also, Daniel Hartmeier, pf developer extraordinaire and all around good guy, wrote a little piece about annoying spammers using pf, spamd, and bmf.
I was hoping to get first post, but my connection got throttled back to nothing....
Nonsense. The spammer will just run the connections in parallel. The slower they get the more he'll run. He already does this to some extent. All this will accomplish is to tie up resources on YOUR mail server.
Moderating "-1, Disagree" is simple censorship. Have the guts to post your opinion.
Instead, this is meant to be run on the incoming SMTP server, the one that receives the mail. It will only hurt the spammer if he's trying to send a bunch of spam to your domain, but every server running this can help.
"Evil company X is threatening to restrict our rights! Let's all get together to stop--OOOH! SHINEY!!!" -- AC
Daniel Hartmeier uses OpenBSD and packet filter to waste spammers time.
so exactly WHO are you hurting?
sure, the open relay deserves some pain. but you're naieve if you think that most spammers send from their OWN systems!
I have qmail running on my mail hub and I reject mail at the time of connect simply based on the receiver they're trying to send to. when they handshake (part of the HELO exchange) I detect the user they're trying to send to, and since I only have a handful of valid users, its easy to know if they're dictionarying me or not. once I know that, I immediately cut them off, AND add an ipfw (I run freebsd) rule to block all traffic from that IP to my port 25. not only do they NOT get to send any DATA to me, but they're for now on (until it ages out, automatically) forbidden from even connecting to my box. I know that's harsh but I can be that selective since its mostly just me on my mailhub.
but I don't think for a second that even tarpitting that source IP is punishing the spammer. they've most likely broken into (or found) an open relay and they're routing thru them. they don't even see the 'address not reachable' error due to my firewalling them.
--
"It is now safe to switch off your computer."
Here are some more spam tarpits:
TarProxy
ChuckMail
OpenBSD's spamd (tarball)
Google Search Results
Using a list of the spam-sending IP's and Bayesian methods, one could assign a high prior probability of a message being Spam. The affect would be to slow down the connection on less evidence if its from a suspect IP address, and to require more evidence if its from an IP address that you trust. Thus you preferentially slow-down suspect computers, and allow your friends to get away with more spam-like messages before tarring them.
Quack!Quack!.....QUACK!!
The easiest solution is to have no open relays. I know I know, it ain't gonna happen, but perhaps this could convince more of those relays to close their doors:
What we do is have a small app that plugs into eudora, outlook, evolution, kmail etc. Whenever you get a spam, you click a button, it scans the header, finds the smtp server that sent the spam and then sends them 1 email informing them of the fact that they are sending spam (of course you need a way of getting the sysadmin's email address).
If enough people did this then the bad relays would be swamped with emails informing them of the spam they've been relaying, and they might close their relay. And non-open relays that just allow spammers to spam might think about being less friendly to spammers.
What do people think, is it lame?
--- I used to moderate, then I read the -1 articles and decided having to filter through them was not worth it.
Here's what I propose: setup a large number of bogus email accounts. Broadcast them everywhere, and let them be honey-pots for spam. The point is, since you NEVER use this account for anything but dropping in spammable places, anything you receive on it *must* be spam. As soon as you get a connection from a mail server to one of these addresses, you *know* it's an open relay, and you put it in your database -- automatically, with no interaction required.
Step 2: You also do a "fingerprint" on the spam you get in your honeypot (you know the routine - what's the length, average use of the word "dildo", etc) so that you can identify this particular spam "copy" by the message -- NOT the header. This allows you to automatically filter out spam messages. If the spammers want to adapt, they have to rewrite their copy. As long as your signature algorithm is fairly lose -- that is, not a true hash algorithm -- they should have to do a total rewrite if they don't want to be detected. You can then filter these at the relays. Thus, once again, you raise the cost for them to do their spam. Since you are filtering by actual known-spam content -- that is, you're doing this like they do virus signatures -- you should get virtually no false positives.
And, anybody whose friends who are emailing them about penis enlargement doesn't really deserve email anyway.
Anyway, there's step 1 and 2. To summarize:
"He who would learn astronomy, and other recondite arts, let him go elsewhere. " -- John Calvin, commenting on Genesis 1
Check out http://www.martiansoftware.com/nailgun/
Never let a lack of data get in the way of a good rant.
Step 1: sysadmins band together in a DDOSOR alliance. Step 2a: Spammer uses open relay for spam campaign. Step 2b: Alliance member starts to receive spam. Step 2c: DDOSOR alliance is notified immediately and starts one-hour DDOS attack on open relay. Step 2d: open relay can't finish sending spam. Step 3: Profit!
If these tarpits were ubiquitous, they could completely change the economics of spam, creating a scarcity of bandwidth experienced only by spammers.
/dev/kqueue under FreeBSD, for example; and you can do the same, but with a bit more CPU wasted, using plain old select() on almost any Unix.
:(
Err, I don't think so. This just requires spammers to use more simultaneous connections to overcome the slowdown; it doesn't really increase their network requirements much, only their host CPU requirements. 20,000 simultaneous TCP connections from one process is quite possible with
I also don't understand the rationale behind processing the message incrementally. Why not just do your processing before sending back the final 2xx response to the DATA command? Most spam software does not hang up right after sending the final "\r\n.\r\n" from what I've heard from people who run tarpits.
How about this instead: when you are confident you are receiving spam, you stop reading from the socket entirely, and send perhaps 10MB of data back on the other side of the connection. (If the other endpoint isn't reading, and consequently you can only send one window worth of data, then do something to get your TCP stack to generate a lot of useless ACKs, or send your trash back one octet at a time and push between them, or something.) The intent being that sending spam to a large number of MTAs configured in this manner rapidly just becomes a way to DDOS *yourself*. Probably this is too disruptive for most sites to want to bother implementing, though
I don't know exactly what the profit margin for spammers is like, but I'm not convinced a small multiplier in network costs is going to matter. Anyway, a lot of these "countermeasures" are mostly going to hurt maintainers of open relays, but if that means they actually fix them, I suppose that is almost as good.
Java: the COBOL of the new millenium.
Great idea! Parse out the URLs, plug 'em into some boilerplate, and automatically submit it as a story to Slashdot! They'll never try THAT again!
Cantankerous old coot since 1957.
Easy to defeat, just use spamming software that dynamically increases it's connection pool whenever it encounters a 'slow' SMTP recipient. Even if a large part of the net population were running this, the spammer could just spawn thousands of simultanious (slowed down, yes) connections, and still maximize his bandwidth utilization. If it takes 2 minutes to send each message, it dosen't matter if he's sending 5000 messages at once!
I believe linux, for example, allows up to 8192 open sockets, and I think this can be changes with a sysctl command, and most definitely could be with a few changes to kernel headers.
Sure, it would take a machine with decent memory, but that's not too hard to find.
---
the pen is mightier than the sword, the sword is mightier than the court, the court is mightier than the pen.
Theo changed his mind about 550..
1 04 027378218501&w=4
It's now 450.. Hurts more..
http://marc.theaimsgroup.com/?l=openbsd-misc&m=
I'm thinking that using spamassassin along with qmail-qfilter and a small perl script to tie it together that envokes a sleep() loop for every spam-like message, that it could easily be used to do the same thing because spamassassin kicks back a score for the message's likehood of being spam...
cheers..
If you do an static word frequency list, spammers will pass around it (check in POPfile site for the latest spammers tricks), if is dinamic, then the users of your system must train it for a while (someone must tell that some message is spam or not, reading it). You must have another way to access your server for the training thing, and then another possible point of vulnerability.
And more than this, as it depend on the user, you should not use a common word frequency list, you should have one for each user, and check if the message is spam against destination word base.
At best, it will work for the users that care to train this server, for the other users that don't want to waste their time spam will be coming at the same speed as before. At worst, you'll be using a common list for all, and maybe slow down receptions of mailing lists or things like that, and people in your server could be unsubscribed from some of them.
Is a good idea, but there are some things that should be implemented with care, and should work only for the users that care about it, the others should not be slowed down because you can put obstacles in the reception of normal mail.
I have several domain names that appear on many of the "million address" CDs and other popular spam lists, but which longer any legitimate recipients/users.
We are also working on obtaining access to true "realtime" RBL lists of currently abused open relay servers. Assistance would be appreciated.
The core of "stations of the cross" is a custom DNS server. This server is authoritative for these oft-spammed domains, and each time a request is made for an MX record, it returns (with a short TTL) a unique randomly generated list of MXes, each address on the list being a known open relay.
So when a spammer or relay first goes to deliver a message, the system will select an open relay off the list of MXes, and hands off the message to that host. Being an open relay, the host accepts the message for my domain, then goes to do a DNS lookup for the MX record. The relay receives a (different) list of other open relays...
Usually, you can get a message to traverse a dozen or more open relays (most sendmail systems default to a maximum "hop count" of 25), after which the message will bounce.
Since the only traffic my server has to deal with is DNS queries and responses, this is very low-overhead for me, but depending on the size of the spammail, very high overhead for the open relay servers.
I do not deploy Linux. Ever.
- Free: It's no good unless it's everywhere... or at least in lots of places. TarProxy is Open Source Software released under a BSD-style license and available on SourceForge (see project page for details).
- Platform Independent: TarProxy is written in Java, so it runs on Linux, Windows, Solaris, OS X, and any other operating system with a Java Virtual Machine available.
contradict one another, and therefore directly suggest incipient failure. Any program you want widely deployed had better not depend on having some buggy JVM installed.(Arguably that is the reason that Freenet has been a practical failure. Every time I have tried to use it, it has got stuck in an infinite loop, or consumed all my swap space, or crashed. I blame buggy JVMs.)
If you want software to be widely and successfully deployed, it should (must!) resemble the software that already has been. Almost all such code (99%+) has been in C or in C++. Are there any Free Software programs written in Java successfully deployed outside of Java development shops? (Rhetorical question; the answer is "not enough to matter".)
If you want portability to Unixes, to w32, and to Macosix, you already get that with Gcc and autoconf.
If it's in Java, I certainly won't run it as a daemon.
A 550 error is a permanent reject. The spam source knows that the mail cannot be delivered so it quits. A 450 error tells the connecting smtp server that your server is temporarily unable to deliver the mail, but that it's not a fatal error and delivery should be retried. This is much more likely to keep the message in the spammer's mail queue.
Personally, the spam solution I like the best is to have procmail+formail or some other tool sitting on your mail server and making unknown senders go through a confirmation step. It doesn't work for everyone (for instance, people expecting email replies to résumés! NAGI...), but if it works for you it tends to work very well. It inconveniences everyone else, but hey, everyone else is not me. I can whitelist all the people I truly care about.
Either that or we should throw out SMTP, email RFCs, sendmail, etc. and build a spam-free system from the ground up. Yeah, right.
Washington, DC: It's like Hollywood for ugly people.
How about a manual method where one creates a ficticious bounce message from spam that has made it to the mailbox?
The idea is the following: spam gets through whatever filter you might have, but you still want to reject it, and given that some spammers MIGHT be trimming their lists based on bounces, you forge a bounce message from the spam.
Does anyone know if this is possible with, eg, RMail or VM (or something else) running under Emacs?
Put my fist through my alarm clock with its ding-dong death inside my ear. - The Blackjacks.
For example, if your machine only receives a small amount of email per day, why not throttle them to take 10-20 minutes of connect time overall? If you only get two emails per day (one real and one spam), getting them 10 minutes later probably won't bother you too much, but could cost the spammer or his relay-helpers a 5 minute duration on a connection.
I receive about a hundred emails per day from a number of sources, and adding six to sixty seconds of delay per email wouldn't cause me any grief. But if everyone throttled their email, it might cause someone using their '250 million Valid! Tested! Opt-In!' email lists to have to upgrade their machine to half a million connections to process it in an hour.
I don't see that differential throttling has any benefit over a contant throttling rate. For a big site, the differentiation between spam and not-spam would probably cost you any load advantage you earned in slowing the spam, and for a small system, the delay would not be noticable.
Of course, big senders like AOL, prodigy, and yahoo, might have to upgrade...
I'm not saying that the recipient server tar-pitting is a bad idea, but I think that there are more effective ways of raising the cost for spammers. Blacklisting the entire /24 of anything supporting spam would pressure providers to nuke spammers (or at least pass on costs to spammers).
Prime numbers are exactly what Alan Greenspan says they are -S. Minsky
Want to find open relays? Here's a nice simple way I implemented a couple of years ago, and ran for awhile. It's quite simple, and detects single stage relays rather quickly.
Write something that listens on port 25. When it receives a connection, connect back to the calling host on port 25. If the connection attempt succeeds, copy characters back and forth. Anything they send to you, you send to their port 25, and vice-versa.
If it's a true open relay, it will gladly accept the mail over and over again. I had a few mail servers looping THOUSANDS of times through me since they didn't check Received: headers. I also realize that it would be trivial to *ahem* "break" the Received: line such that it wouldn't increment the counter.
Granted, that sucks down bandwidth, so back to the point - proving that this is an open relay. What you do is stick a magic header in the message as it heads back to them. If you receive that header back from a host, it's something you've already looped, and they're an open relay.
Now you know they're an open relay, so you can add them to your MX lists. You can also then avoid letting them run through your looper, since it won't provide any more data.
The beauty of this plan is that you're only giving them what they pushed upon you first. If they leave you alone, you leave them alone. It's a nice implementation of a concept I wish more people would honor.
Would that work? Or would trying to keep track of 20,000 outgoing email's transmission rates simultaneousy cause more problems than its worth?
Read about a method to get SpamAssassin to execute at SMTP time in exim (I'm about to impliment this on my own mailserver) and read about teergrubing which is basically the same idea as a tarpit.
Unlike the original post, Marc seems to have a stable working version of this right now.
That said, this is probably the most realistic method of causing spammers pain that we have right now, short of changing the way mail works in a fundamental manner.
I'll definately be implimenting teergrubing/tarpitting. I might even impliment it on the multi-user hosting system that I helped to build. It probably wouldn't scale too well on a busy site though
I'm going back to splinter cell.
It's most fun to do the dirty work against the spammer. What he thinks is an open relay doesn't have to be one.
This one whacked Ralsky hard for several months - Ralsky never caught on: http://www.corpit.ru/cgi-bin/h0n5yp0t
You can do it, too:
http://jackpot.uk.net/
And please do.
In my experience the strongest indication of spam is near the very end of a message, where is says something like, "click here to unsubscribe." If you've already accepted that much of the message, isn't it possible that the spammer will only have to wait for the message acknowledgement before it disconnects? What the spammer may see is either a normal or a slow acknowledgement. Is that enough to make a difference?
Ever heared about a spam-relay?
With this method you'll only get (most of the times) the relaying host and STILL the spammer doesn't get is.
i'd say read the e-mail and use whois and CALL THE FUCKERS. mail the registrants, complain about yahoo addresses for administrative contacts. waste their personal time. I wouldn't give a shit if it would take ond day to spam a lot of people or just one hour, but if i had to answer the phone all day without making any money...
Privacy is terrorism.
This won't work the way the author wants. Once the receiving SMTP server sends the 354 after the client issues a DATA command, there is no opportunity for the server to slow things down until it produces the 250 response at the end of the message. That is, at the application level, all the server can do is slow down the WHOLE message. During the transfer, the only way to slow things down would be to mess around with the TCP layer. The transport layer lives in the kernel. That means kernel module. That means not very portable. That means no Java. That means an SMTP server (by its nature a security risk) futzing with the security of the operating system itself.
You can slow things down by waiting before you produce the 250, but that is not at all a new concept. Several people have referenced Sendmail milters for that purpose already.
I've been using qmail, qmail-scanner and SpamAssassin with a few very minor tweaks to deter spammers. Basically, qmail-scanner runs SpamAssassin, and if SA returns with a score above 15.0, instead of sending a "250 ok" to the spammer telling them the mail was accepted, I send back a "5.3.0 spam detected" -- this seems to have gotten me off a couple of spam lists where the spammers actually care enough to clean their lists.
... which I've done.
I made these tweaks because once the mail is sent and the spammer has disconnected, there really is no way of getting information back to them that you're rejecting their mail. So, you have to reject it at the time they've got the SMTP session established
TarPit seems like an exercise in overengineering with little proof that it'll do anything to hurt spammers -- they'll figure a way around the tarpit, somehow.
-- Dossy
Dossy's Blog
Shutting down for a week won't do it. I had a secondary EMail account I set up for a job search a couple of years ago. Once I got a job, I deactivated the account. That was back in late 2001. Two weeks ago I reactivated it because I needed to let some site I was registered on EMail me my password. I left the account active overnight, and the next morning it had half-a-dozen Spams. This was after being inactive and bouncing messages for more then a year.
I am NOT a man!
I am a free number!
I understand this guys theory of operation, but I am not convinced of it's value for the following reasons:
With all that aside, there may be some points in this that are valid. But I'm not certain that the usage of mail servers by spammers is going to be entirely effected by this technique.
Wouldn't it be easier to simply challenge each incoming IP address to test it for being an open relay and if so, REJECT?
I think that the postfix group has a similar concept for testing any incoming email address in the MAIL FROM tag to see if that address can in turn accept mail.
Note that while the Paul Graham rating of 0.999999999999999 is high, in practice I use Gary Robinson's calculations (more refined and use even infrequently occurring tokens - I get better, less extreme results). Gary Robinson's spam rating on this is: 0.61705129961986 That may seem relatively low, but is on a different scale and is firmly indicative of spam.
Unlike Paul Graham, I don't parse out (and ignore) HTML comments. I find all information is useful, and I find it just as effective (and simple) to treat the text as a straight byte stream.
Recycle PCs and build a wireless community network www.hillsborough.org.nz
The short of it is that there is no legal way to cause spammers pain.
I've been running a tarpit for the past six months. (Exim + SpamAssassin + SA-Exim) During that time, I've seen that roughly 5% of spammers will sit around for however long I feel like tarpitting them (my timeout is currently four days), while the rest of them are smart enough to disconnect from my tarpit when they see that I'm holding them open.
But the spammers are using open relays, and there's an infinite supply of open relays. If one of them gets bogged down, they'll just move on to another.
The especially interesting thing is that I've seen the amount of spam attempts on my server *triple* since I started tarpitting them, from 100/day last year to 300/day now! It's as if the spammers love to be tarpitted!
And I've found out there's absolutely no way to convince a spammer to remove me from his mailing list. Tarpit him, he doesn't care! Give him a 5xx error code, he doesn't care! Firewall his connection attempts, he doesn't care! It's easier for spammers to sell lists of five million addresses (4.99 million of which don't accept email) than it is to try to pay attention to error messages and failure states and weed out bad addresses. I've even seen spam addressed to the messageID's on Usenet news postings.