Slashdot Mirror


Using Statistics to Cause Spammers Pain

mlamb writes "Statistical mail classifiers like PopFile save time on the part of their users, but don't do anything to actively combat spam. I just published an article that suggests a way to use classifier output against a spammer while they're connected to your SMTP server, and I'm launching a project called TarProxy to implement it."

27 of 334 comments

  1. Anti-Spam software by Visaris · · Score: 4, Insightful

    This may be just a little off topic, but the thing is that I always have to go through all my mail by hand to make sure I didn't miss anything important anyways. No anti-spam software out there seems to save me this hassle... So to this day I haven't stuck with any. It doesn't look like this will be better.

    --

    I am a viral sig. Please help me spread.
    1. Re:Anti-Spam software by Dopeskills · · Score: 1, Insightful

      He mentions in the article that the software is set by default to receive all messages and simply slow down spammers' connections. The default settings would eliminate problems with false positives.

  2. Quite sad.. by Aliencow · · Score: 3, Insightful

    That we need all these technicalities to try and fight spam... But this is just like people trying to fight piracy, there will always be a new way to get around security. Actually, what we needed was authenticated SMTP from the beginning...

    1. Re:Quite sad.. by magickalhack · · Score: 2, Insightful

      *wry grin*
      Authenticated? Authenticated by whom? Who gets to determine who has the authority to send messages and who doesn't? I run my own mail server, therefore I, and anyone else I permit, can send mail through it. Are you suggesting that I shouldn't be allowed to run something as simple and utilitarian as a mail server?

      Now granted, adding authentication to SMTP in the beginning would have been nice, and useful, but it wouldn't have prevented, and it won't now solve, the spam problem.

      --
      This Sig Kills Fascists
  3. Re:Interesting idea by TheViciousOverWind · · Score: 3, Insightful

    That would still hurt the spammer a lot, since it would take waaay more time for him to send all the spam, instead of just doing it in one big bulk send.

    --
    My <1000 UID is with a hot chick
  4. Uh... by jdreed1024 · · Score: 3, Insightful
    I just published an article that suggests a way to use classifier output against a spammer while they're connected to your SMTP server,

    But, but, but, why would they be connected and sending spam through your server? Unless you run an open relay. And you don't run an open relay, do you? Do you?!

    --
    There is no sig, there is only Zuul.
    1. Re:Uh... by highcaffeine · · Score: 5, Insightful

      In his article he actually does address this very question. He even gives what I, at least, feel is an interesting answer.

      So, you don't run an open relay. You're not going to slow down the spammer directly, but you will slow down all the connections that come from that open relay to your mail server. For a particularly abused open relay, that could lead to such problems that the admin of that open relay will finally get a clue and look into configuring their server properly.

      Hence, a cascading effect that will eventually harm the spammers. Admins of open relays that get a clue will tighten their servers, thus depriving the spammers of one more relay they can abuse.

  5. Re:Nice idea by LowneWulf · · Score: 5, Insightful

    Most mail servers will only forward mail from users of their own domain. If the mailserver is sending spam for one of their legitimate users, I feel no pity for them if their server slows down.

    If they forward mail from anyone who sends them mail, then they are an open relay, and again, they deserve what they get for leaving an open relay up.

  6. Re:Interesting idea by Osty · · Score: 5, Insightful

    Two things here. First, this article wasn't about preventing spammers from using your SMTP server as a relay, but about slowing down the reception of mail at the end-point SMTP server. This will ripple up the chain to hurt the spammers by slowing down the relays they use. Second, it doesn't matter whether I get 10 spam emails or 10,000. One of the goals of TarProxy is to be ubiquitous. I may only receive 10 spammy emails, but my running instance of TarProxy will determine that those are of sufficient spamminess to throttle bandwidth to each of those connections. At the same time, you're doing the same on your SMTP server, and Joe over there is, and so is Susie, and so on. If everybody (defined as "a large number of SMTP servers", and not necessarily "everybody") is running such a service, the spammers will be hurt. You're right that a single individual using this won't make much difference, but that didn't seem to be the goal of the article.
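To make "throttle bandwidth to each of those connections" concrete, here is a minimal Python sketch of throttling driven by a running spam probability. Everything in it is an assumption rather than TarProxy's code: the names `delay_for` and `throttled_read`, the 0.9 threshold, the 5-second cap, and the `classify` callable standing in for a PopFile-style classifier. Pausing between reads is one plausible throttling mechanism: while the receiver is not reading, the TCP receive window fills up and ordinary flow control slows the sender.

```python
import io
import time
from typing import BinaryIO, Callable

# Hypothetical tuning knobs; the article does not specify values.
SPAM_THRESHOLD = 0.9     # running probability at which slowing starts
MAX_DELAY_SECONDS = 5.0  # longest pause before reading the next chunk
CHUNK_SIZE = 64          # bytes read per iteration


def delay_for(probability: float) -> float:
    """Map a running spam probability to a pause before the next read."""
    if probability < SPAM_THRESHOLD:
        return 0.0
    # Scale linearly from 0 s at the threshold to MAX_DELAY_SECONDS at 1.0.
    scale = (probability - SPAM_THRESHOLD) / (1.0 - SPAM_THRESHOLD)
    return min(scale, 1.0) * MAX_DELAY_SECONDS


def throttled_read(stream: BinaryIO,
                   classify: Callable[[bytes], float],
                   sleep: Callable[[float], None] = time.sleep) -> bytes:
    """Read a whole message, re-scoring it after every chunk.

    Nothing is rejected, so a false positive only costs time; a sender
    whose text looks spammy simply waits longer between reads.
    """
    received = b""
    while True:
        chunk = stream.read(CHUNK_SIZE)
        if not chunk:
            return received
        received += chunk
        sleep(delay_for(classify(received)))


if __name__ == "__main__":
    # Toy classifier: anything mentioning "pills" is scored as spam.
    toy = lambda data: 0.99 if b"pills" in data else 0.1
    pauses = []
    body = throttled_read(io.BytesIO(b"buy cheap pills " * 20), toy,
                          sleep=pauses.append)
    print(len(body), [round(p, 2) for p in pauses])
```

Injecting `sleep` keeps the demo instant; in a real proxy it would be the default `time.sleep` (or an asynchronous equivalent) applied per connection.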

  7. Parallel by Spazmania · · Score: 4, Insightful

    Nonsense. The spammer will just run the connections in parallel. The slower they get the more he'll run. He already does this to some extent. All this will accomplish is to tie up resources on YOUR mail server.

    --
    Moderating "-1, Disagree" is simple censorship. Have the guts to post your opinion.
    1. Re:Parallel by letxa2000 · · Score: 4, Insightful
      I more or less agree. I actually tried this approach about a year and a half ago. I modified my Sendmail server to analyze incoming mail during the DATA phase of the SMTP connection. While it was just a simple text filter rather than a cool Bayesian approach, the idea was the same: Cause pain to the spammer because even if I filter the spam before I see it, the spammer has already done his damage. The problem is you can't really do anything to slow him down once they're on the DATA phase and the data is coming in because there is no handshaking at that point. So all I did was have Sendmail close the connection as soon as it recognized something that was sure to be spam.
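The hang-up approach described above amounts to an incremental scan over the DATA stream as it arrives. This is a hypothetical Python illustration, not Sendmail code: `first_spam_hit` and the `SPAM_MARKERS` strings are made up, and the caller is assumed to close the socket when a hit is reported. The one subtle point is that a marker can straddle two network reads, so the scan keeps a short overlap from the previous chunk.

```python
from typing import Iterable, Optional, Tuple

# Hypothetical markers; the comment only says "a simple text filter".
SPAM_MARKERS: Tuple[bytes, ...] = (b"100% FREE", b"remove me from this list")


def first_spam_hit(chunks: Iterable[bytes],
                   markers: Tuple[bytes, ...] = SPAM_MARKERS) -> Optional[int]:
    """Scan DATA chunks as they arrive.

    Returns the byte offset in the message where a marker ends, or None
    if the message is clean. The caller would drop the connection as
    soon as a hit is reported.
    """
    keep = max(len(m) for m in markers) - 1  # overlap for split markers
    tail = b""
    consumed = 0  # bytes received before the current chunk
    for chunk in chunks:
        window = tail + chunk
        for marker in markers:
            pos = window.find(marker)
            if pos != -1:
                # The window starts len(tail) bytes before the current chunk.
                return consumed - len(tail) + pos + len(marker)
        consumed += len(chunk)
        tail = window[-keep:] if keep > 0 else b""
    return None
```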

      I gave up on this approach. While there was a satisfaction in looking at my message log and seeing all the spam I had hung up on, spammers would often just keep trying to deliver. Some of the worst software would try a second or two after I hung up on them so they literally pounded my system. It didn't cause any problems except for a little bit of bandwidth, but it certainly didn't seem to faze the spammer.

      The fact is, there's not much you can technically do to hurt the spammer. Even if everyone implements this there's no reason why spam software can't open up hundreds of tasks running in parallel and simply be patient when necessary. It could even make spam worse because spam software might evolve to where it DOES send spam out in parallel hundreds at a time by default [forgive me if this is already the case, I have no idea what capabilities spam software has].

      The fact is, the only way to make spam go away is to make the response rate go down. This approach gives you, as the admin, a certain satisfaction but it really won't reduce spam--it'll just make spam software more advanced. The only way to make the response rate go down is make sure the spam doesn't get to the user, and that's filtering. Feel free to implement this system, but once the thrill of sticking it to some spammers gets old you'll be back to where you were--with the filters doing the real work.

    2. Re:Parallel by WolfWithoutAClause · · Score: 3, Insightful
      All this will accomplish is to tie up resources on YOUR mail server.

      The spammer already IS tying up 30-50% of the resources on the mail server; if you throttle the bastards back they'll end up using less. Which would you prefer: a few hundred megs of spam on your hard disk, or a few kilobytes of spam that trickle in over a few days until they eventually kill the run? This way you save both bandwidth AND disk space.

      Either they use their own server, in which case they're easy to spot, or they use someone else's, in which case chances are it isn't engineered for lots of parallel connections.

      This scheme may actually work.

      --

      -WolfWithoutAClause

      "Gravity is only a theory, not a fact!"
  8. but its usually from an open relay... by TheGratefulNet · · Score: 4, Insightful

    so exactly WHO are you hurting?

    sure, the open relay deserves some pain. but you're naive if you think that most spammers send from their OWN systems!

    I have qmail running on my mail hub and I reject mail at connect time based simply on the recipient they're trying to send to. during the SMTP envelope exchange (the RCPT TO command that follows HELO) I detect the user they're trying to send to, and since I only have a handful of valid users, it's easy to know whether they're dictionary-attacking me. once I know that, I immediately cut them off, AND add an ipfw (I run FreeBSD) rule to block all traffic from that IP to my port 25. not only do they NOT get to send any DATA to me, but from now on (until the rule ages out, automatically) they're forbidden from even connecting to my box. I know that's harsh, but I can be that selective since it's mostly just me on my mail hub.
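The recipient check with an aging block list can be sketched as follows. It is a hypothetical Python illustration, not qmail or ipfw code: `RecipientGate`, the two user names, the one-day `BLOCK_SECONDS`, and the in-memory dictionary standing in for the ipfw rule are all assumptions. The clock is injectable so the aging-out behaviour can be tested without waiting.

```python
import time
from typing import Callable, Dict

# Hypothetical local users; the comment only says "a handful".
VALID_USERS = {"alice", "bob"}
BLOCK_SECONDS = 24 * 3600  # assumed aging period for a block


class RecipientGate:
    """Reject unknown RCPT TO addresses and remember the offending IP.

    The dict stands in for the firewall rule the commenter adds; entries
    expire after BLOCK_SECONDS.
    """

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self.clock = clock
        self.blocked_until: Dict[str, float] = {}

    def is_blocked(self, ip: str) -> bool:
        expiry = self.blocked_until.get(ip)
        if expiry is None:
            return False
        if self.clock() >= expiry:
            del self.blocked_until[ip]  # the block has aged out
            return False
        return True

    def rcpt_to(self, ip: str, address: str) -> str:
        """Return the SMTP reply for one RCPT TO address."""
        if self.is_blocked(ip):
            return "421 Service not available, closing transmission channel"
        local_part = address.strip("<>").split("@", 1)[0].lower()
        if local_part in VALID_USERS:
            return "250 OK"
        # Unknown user: assume a dictionary attack and block the sender.
        self.blocked_until[ip] = self.clock() + BLOCK_SECONDS
        return "550 No such user"
```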

    but I don't think for a second that even tarpitting that source IP is punishing the spammer. they've most likely broken into (or found) an open relay and they're routing thru them. they don't even see the 'address not reachable' error due to my firewalling them.

    --
    "It is now safe to switch off your computer."
  9. Increase prior probabilities of spams if suspectIP by EnlightenedDuck · · Score: 3, Insightful
    I mentioned this earlier in the discussion - I'm repeating myself because it also applies here...

    Using a list of spam-sending IPs and Bayesian methods, one could assign a high prior probability that a message from one of those IPs is spam. The effect would be to slow down the connection on less evidence if it's from a suspect IP address, and to require more evidence if it's from an IP address that you trust. Thus you preferentially slow down suspect computers, and allow your friends to get away with more spam-like messages before tarring them.
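The prior-shifting idea is Bayes' rule in odds form: posterior odds equal prior odds times the likelihood ratio reported by the content classifier. In this hypothetical Python sketch the IP lists, the three prior values, and the 0.9 tarpit threshold are all assumptions, and the classifier is assumed to report a likelihood ratio rather than a final probability. It shows the same content evidence tipping a message from an unknown IP over the threshold while a trusted IP stays well below it.

```python
from typing import Set

# Hypothetical priors; the comment gives no numbers.
PRIOR_SUSPECT = 0.95   # P(spam) before reading any content, listed IP
PRIOR_DEFAULT = 0.50   # unknown IP
PRIOR_TRUSTED = 0.05   # IP you trust, e.g. a friend's mail server
TARPIT_THRESHOLD = 0.9


def prior_for(ip: str, suspect: Set[str], trusted: Set[str]) -> float:
    if ip in trusted:
        return PRIOR_TRUSTED
    if ip in suspect:
        return PRIOR_SUSPECT
    return PRIOR_DEFAULT


def posterior(prior: float, likelihood_ratio: float) -> float:
    """Bayes' rule in odds form.

    likelihood_ratio = P(content | spam) / P(content | ham).
    """
    odds = prior / (1.0 - prior) * likelihood_ratio
    return odds / (1.0 + odds)


def should_tarpit(ip: str, likelihood_ratio: float,
                  suspect: Set[str], trusted: Set[str]) -> bool:
    return posterior(prior_for(ip, suspect, trusted),
                     likelihood_ratio) >= TARPIT_THRESHOLD
```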

    --
    Quack!Quack!.....QUACK!!
  10. Re:Nice idea by Snowgen · · Score: 5, Insightful

    what if the spammer sends a message to a (good) SMTP server which hasn't got the system, and that SMTP server in turn tries to deliver the "spammail" to the right SMTP server? won't that hurt the good SMTP server, which is just trying to do its job?

    The situation you're describing is called relaying.

    If you start with the assumption that spammers are evil, then the logical conclusion is that there is no such thing as a "good" SMTP server that would relay mail on a spammer's behalf. Servers that do are either in collusion with the spammer, or are misconfigured to allow anonymous relaying. A server that willingly acts in collusion with evil is, by definition, evil. The level of stupidity necessary to allow your server to act as an open relay also, by definition, precludes being considered a "good" server.

    So the short answer to your query is that it's a non-issue. A truly good server will, by definition, never relay spam!

  11. What the hey by Fished · · Score: 4, Insightful
    Okay, I think you've got what to do down - this is a great idea. The problem is, when to use it?

    Here's what I propose: setup a large number of bogus email accounts. Broadcast them everywhere, and let them be honey-pots for spam. The point is, since you NEVER use this account for anything but dropping in spammable places, anything you receive on it *must* be spam. As soon as you get a connection from a mail server to one of these addresses, you *know* it's an open relay, and you put it in your database -- automatically, with no interaction required.
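The honeypot trigger fits in a few lines. This is a hypothetical Python sketch: the trap addresses, the `flag_if_honeypot` name, and the in-memory set standing in for the commenter's "database" are all assumptions. The recorded IP is simply whichever host delivered the message to you.

```python
from typing import Iterable, Set

# Hypothetical trap addresses: published widely, never used for real mail.
HONEYPOT_ADDRESSES = {"trap1@example.org", "trap2@example.org"}


def flag_if_honeypot(client_ip: str, recipients: Iterable[str],
                     relay_db: Set[str]) -> bool:
    """Record client_ip in relay_db if any recipient is a trap address."""
    normalized = {r.strip("<>").lower() for r in recipients}
    if normalized & HONEYPOT_ADDRESSES:
        relay_db.add(client_ip)  # automatic, no human review
        return True
    return False
```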

    Step 2: You also do a "fingerprint" on the spam you get in your honeypot (you know the routine - what's the length, average use of the word "dildo", etc) so that you can identify this particular spam "copy" by the message -- NOT the header. This allows you to automatically filter out spam messages. If the spammers want to adapt, they have to rewrite their copy. As long as your signature algorithm is fairly loose -- that is, not a true hash algorithm -- they should have to do a total rewrite if they don't want to be detected. You can then filter these at the relays. Thus, once again, you raise the cost for them to do their spam. Since you are filtering by actual known-spam content -- that is, you're doing this like they do virus signatures -- you should get virtually no false positives.
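One way to build such a loose fingerprint is word shingling with Jaccard similarity, a standard near-duplicate technique swapped in here because the comment names no algorithm. In this hypothetical Python sketch the 3-word shingles and the 0.6 match threshold are assumptions; the point is that trivial edits to the copy (case, punctuation, one changed word) still match, while unrelated text does not.

```python
import re
from typing import FrozenSet, Iterable, Tuple

SHINGLE_WORDS = 3        # assumed shingle length
MATCH_THRESHOLD = 0.6    # assumed Jaccard cut-off for "same spam copy"

Shingles = FrozenSet[Tuple[str, ...]]


def fingerprint(body: str) -> Shingles:
    """Overlapping 3-word shingles, ignoring case and punctuation."""
    words = re.findall(r"[a-z0-9]+", body.lower())
    count = len(words) - SHINGLE_WORDS + 1  # <= 0 gives an empty set
    return frozenset(tuple(words[i:i + SHINGLE_WORDS]) for i in range(count))


def similarity(a: Shingles, b: Shingles) -> float:
    """Jaccard similarity of two shingle sets (0.0 if either is empty)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def matches_known_spam(body: str, known: Iterable[Shingles]) -> bool:
    fp = fingerprint(body)
    return any(similarity(fp, k) >= MATCH_THRESHOLD for k in known)
```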

    And anybody whose friends are emailing them about penis enlargement doesn't really deserve email anyway.

    Anyway, there's step 1 and 2. To summarize:

    1. Lag spammers.
    2. Filter spammers.
    3. ????
    4. Profit - and make sure to send me some.
    --
    "He who would learn astronomy, and other recondite arts, let him go elsewhere. " -- John Calvin, commenting on Genesis 1
  12. Re:Nice idea by Anonymous Coward · · Score: 2, Insightful
    I suppose a fake open relay that forwards nothing is definitely an idea that has merit. However, it still doesn't really hit back. Now, a fake open relay that is a tarpit, as per the article, would be pretty good.

    However, I guess the 'blank relay' is good as a time waster, because they THINK it's succeeding, whereas a tarpit open relay will eventually be ignored.

    As with anything, they will evolve to counter these kinds of subterfuge: mapping out known-good relays could be done programmatically, either by making sure a relay actually forwards or by checking that it doesn't hog bandwidth.

    So the only really good solution would be to close all open relays, and tarpit all valid SMTP receivers. Hardly likely to happen, though =( I guess we will just have to hope there is someday a common-denominator alternative to SMTP, and that it actually gets used!

  13. Re:Interesting idea by ATMAvatar · · Score: 3, Insightful
    As I understand the system, it is meant for those receiving spam, not those unwittingly relaying it. The basic idea is that the laggier the network, the longer it takes to send a message. So if your mailserver pretends to be laggy, it will take more time for a computer to send spam. Thus, less spam is sent. It has the added advantage that, since it accepts every message (though it takes longer if it thinks the message is spam), there is no cost to the user for false positives.

    Nope - you missed what the article was saying. The mailserver being used by the spammer would be slowed down.

    I propose that the running probability from the classifier be used to throttle the connection with the offending server. If an incoming message looks like spam [1], the connection could be slowed dramatically, consuming the spammer's resources and wasting their time [2].


    "Throttling" is when you send ICMP choke packets to a sender, which in turn tells the connection to stop sending so many packets. It's generally used to tell a sender that you cannot handle the number of messages it is sending.

    Now, what this article proposes is that mailservers use software that statistically analyzes messages and, based upon the likelihood of a message being spam, may send choke packets to the sender. You essentially spam the spammer with choke packets until the spammer's SMTP connection slows to a crawl.

    At this point, the spammer can either deal with sending *maybe* a small handful of emails at a time, or give up on spamming. For those businesses that make money off spamming, this would destroy their ability to make any decent money.
    --
    "They that can give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety."
  14. Re:Nice idea by freeweed · · Score: 2, Insightful

    An open relay is different than the formmail.cgi vulnerability. Ok, so they can result in the same thing, but when people talk about open relays they usually mean production SMTP servers which accept mail from anywhere, instead of verifying the source domain first.

    Matt's formmail script isn't really intended for use as a mail server, but on a webserver (ok, so I'm arguing semantics here :) to just fire off the odd email easily for the admin.

    As for your questions, the idea is *not* to set up false open relays per se, but to set up servers that tie up the 'upstream' mail server. Tarpitting is a pretty cool idea if you ask me - it hurts no one but the spammer, if implemented properly. As for blacklisting/whitelisting servers, sure, let the spammers. Note that if enough people tarpitted, eventually spam wouldn't get *anywhere* - spammers could spam each other all they want, but none of it would ever get delivered.

    Unfortunately the critical mass for this to really work is very, very large.

    --
    Endless arguments over trivial contradictions in books written by ignorant savages to explain thunder in the dark.
  15. Just another stage in the arms race by zatz · · Score: 5, Insightful

    If these tarpits were ubiquitous, they could completely change the economics of spam, creating a scarcity of bandwidth experienced only by spammers.

    Err, I don't think so. This just requires spammers to use more simultaneous connections to overcome the slowdown; it doesn't really increase their network requirements much, only their host CPU requirements. 20,000 simultaneous TCP connections from one process is quite possible with kqueue(2) under FreeBSD, for example; and you can do the same, but with a bit more CPU wasted, using plain old select() on almost any Unix.
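The claim that parallelism cancels a per-connection slowdown can be checked with a toy simulation. It uses Python's asyncio as a stand-in for a kqueue or select() event loop; `slow_delivery` only sleeps and does no networking, and the counts and delays are arbitrary.

```python
import asyncio
import time
from typing import Any, Callable, Coroutine


async def slow_delivery(delay: float) -> None:
    """Stand-in for one tarpitted SMTP session: it only waits."""
    await asyncio.sleep(delay)


async def sequential(n: int, delay: float) -> None:
    """One connection at a time: total time is roughly n * delay."""
    for _ in range(n):
        await slow_delivery(delay)


async def concurrent(n: int, delay: float) -> None:
    """All n sessions multiplexed on one event loop at once."""
    await asyncio.gather(*(slow_delivery(delay) for _ in range(n)))


def elapsed(run: Callable[[int, float], Coroutine[Any, Any, None]],
            n: int, delay: float) -> float:
    """Wall-clock seconds to deliver n messages with the given strategy."""
    start = time.perf_counter()
    asyncio.run(run(n, delay))
    return time.perf_counter() - start


if __name__ == "__main__":
    print(f"sequential: {elapsed(sequential, 10, 0.05):.2f} s")
    print(f"concurrent: {elapsed(concurrent, 10, 0.05):.2f} s")
```

In the sequential run the waits add up; in the concurrent run they overlap, so the total stays close to a single wait. Real sessions also cost memory and file descriptors, which this toy ignores.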

    I also don't understand the rationale behind processing the message incrementally. Why not just do your processing before sending back the final 2xx response to the DATA command? Most spam software does not hang up right after sending the final "\r\n.\r\n" from what I've heard from people who run tarpits.
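The alternative of classifying the complete message before sending the final reply to DATA might look like this. It is a hypothetical Python sketch: `read_data`, `final_reply`, the 0.9 threshold, and the 30-second stall are assumptions, and `spam_score` stands in for whatever classifier you run. The end-of-data marker (a line holding a single ".") and the removal of a sender-added leading dot follow SMTP's transparency rule (RFC 5321).

```python
import time
from typing import Callable, Iterable, List

SPAM_THRESHOLD = 0.9   # hypothetical cut-off
STALL_SECONDS = 30.0   # hypothetical stall before the final 2xx reply


def read_data(lines: Iterable[str]) -> str:
    """Collect DATA lines up to the lone '.' terminator (CRLF.CRLF)."""
    body: List[str] = []
    for line in lines:
        line = line.rstrip("\r\n")
        if line == ".":
            return "\r\n".join(body)
        if line.startswith("."):
            line = line[1:]  # undo the sender's dot-stuffing
        body.append(line)
    raise ConnectionError("client disconnected before end of DATA")


def final_reply(lines: Iterable[str],
                spam_score: Callable[[str], float],
                sleep: Callable[[float], None] = time.sleep) -> str:
    """Read the whole message, classify it, and stall before acknowledging."""
    body = read_data(lines)
    if spam_score(body) >= SPAM_THRESHOLD:
        sleep(STALL_SECONDS)
    return "250 OK"
```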

    How about this instead: when you are confident you are receiving spam, you stop reading from the socket entirely, and send perhaps 10MB of data back on the other side of the connection. (If the other endpoint isn't reading, and consequently you can only send one window worth of data, then do something to get your TCP stack to generate a lot of useless ACKs, or send your trash back one octet at a time and push between them, or something.) The intent being that sending spam to a large number of MTAs configured in this manner rapidly just becomes a way to DDOS *yourself*. Probably this is too disruptive for most sites to want to bother implementing, though :(

    I don't know exactly what the profit margin for spammers is like, but I'm not convinced a small multiplier in network costs is going to matter. Anyway, a lot of these "countermeasures" are mostly going to hurt maintainers of open relays, but if that means they actually fix them, I suppose that is almost as good.

    --

    Java: the COBOL of the new millenium.
  16. Re:This is too complicated by stetsds · · Score: 2, Insightful

    SMTP AUTH helps you avoid being an open relay.
    But if you want to receive any mail at all, you'll have to accept anonymous SMTP connections from any odd server out there. You just don't relay those mails.

  17. Re:OpenBSD Spam Blocking Engine by mindriot · · Score: 3, Insightful
    in connection with an IP number list (much faster, so suitable for busy servers)

    Another big advantage of going by IP numbers is simply this: I have an IMAP mail account at my university that I use, but I have some external Email addresses as well, which are configured to forward their mail to the university server. Now, if the university's server will add tar based on the message content, I suppose the external mail provider will not be too happy about being slowed down. I would suppose there are quite a number of users simply forwarding mails from one account to another. Maybe (depending on how many people actually use automatic forwarding capabilities) "innocent" servers could be slowed down due to forwarding mail to a "dynamic tarpit", and maybe there are some providers that would not be too happy about such stuff... on the other hand, tarpitting by IP lists seems a little more practical then. But I suppose only practice will show which works best.

  18. Re:Interesting idea by zackbar · · Score: 2, Insightful

    But the spammer could simply be running multiple threads sending spam.

    Sure, one thread is slowed down while it connects to that one server sending throttling packets, but the others won't.

    So while one thread is slowed down sending packets slowly to that server, additional threads will be created with the spare CPU.

    Even with 90% of the SMTP servers using a tarpit, it would just mean that the spammer's machine would run 10 times as many spam threads as it would otherwise.

    Perhaps I'm missing something. I hope so, because anything that hurts a spammer is good.

  19. Why use the statistics? Throttle it all! by drf5n · · Score: 4, Insightful
    Do the statistics on 'spamminness' really improve the system? Wouldn't it be easier to throttle all the email to a site-adjustable rate, and have the same effect on the spammers? The ease of implementation would increase the ubiquity, and it would increase the hardware/software requirements of those who mail massively.

    For example, if your machine only receives a small amount of email per day, why not throttle them to take 10-20 minutes of connect time overall? If you only get two emails per day (one real and one spam), getting them 10 minutes later probably won't bother you too much, but could cost the spammer or his relay-helpers a 5 minute duration on a connection.

    I receive about a hundred emails per day from a number of sources, and adding six to sixty seconds of delay per email wouldn't cause me any grief. But if everyone throttled their email, it might cause someone using their '250 million Valid! Tested! Opt-In!' email lists to have to upgrade their machine to half a million connections to process it in an hour.
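The connection count above can be checked with Little's law: the number of sessions open at once equals the message rate times how long each session is held. A 6-second delay works out to about 417,000 simultaneous connections, in line with the "half a million" estimate, and a 60-second delay to about 4.2 million. The function name is ours, and the model assumes the added delay is the only per-message cost, ignoring bandwidth.

```python
import math


def connections_needed(messages: int, window_seconds: float,
                       delay_per_message: float) -> int:
    """Concurrent sessions needed to push `messages` through in
    `window_seconds` when each session is held `delay_per_message` s.

    Little's law: concurrency = arrival rate * time in system.
    """
    return math.ceil(messages * delay_per_message / window_seconds)


if __name__ == "__main__":
    for delay in (6, 60):
        print(delay, connections_needed(250_000_000, 3600, delay))
```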

    I don't see that differential throttling has any benefit over a constant throttling rate. For a big site, the cost of differentiating between spam and non-spam would probably cancel any load advantage you earned by slowing the spam, and for a small system, the delay would not be noticeable.

    Of course, big senders like AOL, prodigy, and yahoo, might have to upgrade...

  20. Re:Nice idea by Zeinfeld · · Score: 2, Insightful
    Tarpitting is a pretty cool idea if you ask me - it hurts no one but the spammer, if implemented properly.

    As with all vigilante actions, it works pretty well if only the bad guys get a lynching.

    The problem with these teergrubing-type schemes is that they typically only hurt the innocent victims caught by accident. It is very unlikely that a bulk email sender program lacks code to detect slow connections and abort. Otherwise the bulk sender is going to fail at the slightest network problem.

    Bulk senders are in any case coded with multiple threads, either by using a threads package like pthreads or, in some cases, by simulating threading with a state machine for each connection. The teergrubing scheme described only causes pain if the bulk sender is single-threaded and blocks when connecting to a single slow server.

    Vigilante hacking frequently goes wrong. Coupling a vigilante hacking scheme to a heuristic detection scheme is pure stupidity.

    --
    Looking for an Information Security student project suggestion?
    Try http://dotcrimeManifesto.com/
  21. Re:Nice idea by Zeinfeld · · Score: 2, Insightful
    What do you mean by innocent victim? The spammers aren't and neither ....

    I mean that I simply don't believe that crappy heuristics are accurate enough to use to target attacks. Paul Graham's claim of zero false positives is simply not credible when you compare his claims against the prior experience of using naive (and not so naive) Bayesian filtering.

    So don't imagine for a second that this plan to hurt spammers is not going to backfire on people who neither send spam nor run misconfigured email relays. This plan is not going to hurt a single spam sender; competent bulk email software has always contained measures to abandon attempts to connect to slow hosts.

    If the fantasyland claims about the effectiveness of filtering technology were true, spam would be an easily solved problem. Unfortunately the MIT conference had only two decent research papers on applying Bayesian filtering to spam, despite the fact that these were the only solutions papers selected; Judge, Shein and Berkowitz were providing informal descriptions of the problem, not solutions. The first was the talk by the Microsoft research group guy who set out the way to measure the effectiveness of spam detection algorithms. The second was the talk by the MIT undergrad on his class project, which was the only one that presented actual comparative data. Unsurprisingly to those of us who have worked on Bayesian approaches to events data in the past, the results were considerably mixed. In the end it turned out that the most effective scheme was to use a least-squares fit rather than the Bayesian stuff, and the most effective technique turned out to be to look at the message headers rather than the content.

    And about bulk mailers aborting on slow connections, isn't that the point? Hasn't the throttling software just succeeded by stopping 1 or more spams?

    No, the point of teergrubing is to try to hurt the spam sender. As I showed, it does not hurt the spam sender at all. You can stop the spam by simply aborting the connection on the server side, so no, teergrubing does nothing to stop spam; the premise the scheme starts from is that you already have a mechanism that does that.

    Ultimately this proposal is simply another well-intentioned scheme by someone who simply can't see, or does not care, that their half-baked idea might backfire badly and create more problems than it solves. It is the same sort of thinking that is behind the idiots who run SPEWS. I was at a recent meeting of the top ISPs to discuss the spam problem; it turned out that every one of them had been listed on SPEWS. So 70%++ of the US Internet population has been blocked by SPEWS; how can people claim with a straight face that there is NO collateral damage? Oh yes, that's right, they don't answer anything; they are completely unaccountable. And yes, contrary to the lies put out on the SPEWS site, they do list for frivolous reasons; one of the things that can get you listed on SPEWS is simply complaining about them.

    --
    Looking for an Information Security student project suggestion?
    Try http://dotcrimeManifesto.com/
  22. No pain by Brian Kendig · · Score: 2, Insightful

    The short of it is that there is no legal way to cause spammers pain.

    I've been running a tarpit for the past six months. (Exim + SpamAssassin + SA-Exim) During that time, I've seen that roughly 5% of spammers will sit around for however long I feel like tarpitting them (my timeout is currently four days), while the rest of them are smart enough to disconnect from my tarpit when they see that I'm holding them open.

    But the spammers are using open relays, and there's an infinite supply of open relays. If one of them gets bogged down, they'll just move on to another.

    The especially interesting thing is that I've seen the amount of spam attempts on my server *triple* since I started tarpitting them, from 100/day last year to 300/day now! It's as if the spammers love to be tarpitted!

    And I've found out there's absolutely no way to convince a spammer to remove me from his mailing list. Tarpit him, he doesn't care! Give him a 5xx error code, he doesn't care! Firewall his connection attempts, he doesn't care! It's easier for spammers to sell lists of five million addresses (4.99 million of which don't accept email) than it is to try to pay attention to error messages and failure states and weed out bad addresses. I've even seen spam addressed to the Message-IDs on Usenet news postings.