Slashdot Mirror


Domain Based Spam Prevention?

aralin asks: "Recently I got this idea and wrote a little Perl script to extract all the second-level (third-level in the case of co.uk) domains from my last month's collection of spam (some 4000 messages). I ran that against a nameserver to find the ones with an NS record (valid domains) and made a list for my procmail filter. I get about 10 mails a day that escape SpamAssassin for various reasons, and since I began to check them against my list of domains I have caught half of these. The idea is that if they want to sell something, or put a working web bug in my email, they need to provide a valid URL with a valid domain. If we filter domains from a URL in confirmed spam, then it's almost certain any other email referencing such a domain is spam as well. What I wanted to ask Slashdot is whether you know about some software project that already uses this form of spam detection as an addition to rule matching and Bayes filters?"
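
A minimal sketch of the extraction step described in the question, in Python rather than the Perl the submitter used. The suffix set, the URL pattern, and all function names are assumptions made for illustration. The NS-record check is left out because it needs a DNS library outside the standard library.

```python
import re
from urllib.parse import urlsplit

# Assumption: a tiny hand-written set of two-label suffixes such as co.uk.
# A real filter would need a much fuller list.
TWO_LABEL_SUFFIXES = {"co.uk", "org.uk", "com.au"}

# Assumption: only http(s) URLs are of interest.
URL_RE = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)


def registrable_domain(host):
    """Return the second-level domain (third-level under e.g. co.uk), or None."""
    labels = host.lower().rstrip(".").split(".")
    if len(labels) < 2 or labels[-1].isdigit():
        return None  # single label, or a dotted IP address rather than a name
    keep = 3 if ".".join(labels[-2:]) in TWO_LABEL_SUFFIXES else 2
    if len(labels) < keep:
        return None
    return ".".join(labels[-keep:])


def domains_in_message(body):
    """Collect the registrable domains of every http(s) URL in a message body."""
    found = set()
    for url in URL_RE.findall(body):
        try:
            host = urlsplit(url).hostname
        except ValueError:
            continue  # malformed URL, skip it
        if host:
            domain = registrable_domain(host)
            if domain:
                found.add(domain)
    return found


def looks_like_listed_spam(body, spam_domains):
    """True if the message references any domain already seen in confirmed spam."""
    return bool(domains_in_message(body) & spam_domains)
```

The set built from confirmed spam would play the role of the procmail list: any later message whose URLs share a domain with it is flagged.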

9 of 42 comments

  1. IIRC SendMail allows this already by squiggleslash · · Score: 3, Informative
    There's a flag somewhere (which I can't find now I'm looking but I can see the rewrite rules in my sendmail.cf) for many of the Sendmail sendmail.cf configuration macros that will block on unresolvable domain names in the MAIL FROM: line (part of the email "envelope").

    I use it, but I'm not happy with it. There are several problems: domain names often are temporarily unresolvable for a variety of reasons (hey, you've been there, you've typed in a perfectly valid website address into your browser, got a message about the name not being resolvable, done it again immediately and it's worked? Right?); and it does encourage Joe Jobs - not necessarily against specific addresses, but against domains.

    Joe Jobs in turn are making ISPs adopt anti-spam practices that require emails with certain addresses only come from certain IPs, which in the absence of a standardized remote-access protocol for SMTP smarthosts makes it much more difficult for people to roam and increases the number of ways in which perfectly valid email may fail to be delivered.

    If I was an ISP, I definitely wouldn't go down this road. I'm wary of doing it anyway, and by-and-large I'm finding the emails that are blocked using this method are ones that would be blocked anyway, or are what appears to be valid emails with temporary DNS problems. It's something I don't intend to use for much longer, I just hate to have to reconfigure sendmail and see what breaks as a result.

    --
    You are not alone. This is not normal. None of this is normal.
    1. Re:IIRC SendMail allows this already by schon · · Score: 2, Informative

      macros that will block on unresolvable domain names in the MAIL FROM:

      There are two checks for this - one rejects (501) mail that comes from bogus domains (domains which do not exist) and one that sends a temporary failure message (451) for domains which are unresolvable.

      Such rules are necessary for proper operation of a mail server - the MAIL FROM: should always be a resolvable address (with the exception of empty sender) because that's where the bounces should go.

      domain names often are temporarily unresolvable for a variety of reasons

      Which is why such domains would get a temporary failure, so that the sending mail server can try again.

      There is a difference between "that domain does not exist", and "that domain does not resolve".

      it does encourage Joe Jobs - not necessarily against specific addresses, but against domains.

      If it's against a domain, then it's not a joe job. You can't joe a domain, only an address (or a group of addresses.) And it doesn't encourage it - spammers have forged their MAIL FROM: address since time immemorial.

      valid emails with temporary DNS problems

      Again, such mails would get a temporary failure, and would go through once the domain is able to be resolved - so there is really no reason NOT to use such rules.
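
      The distinction drawn above between "does not exist" (501) and "does not resolve" (451) can be sketched roughly as follows. This is not how Sendmail implements it. The function name and the injectable `resolve` parameter are made up for illustration, and an address lookup stands in for the MX lookup a real mail server would do, so a domain with only MX records would be misjudged here.

```python
import socket


def sender_domain_reply(domain, resolve=socket.getaddrinfo):
    """Map a DNS lookup of the MAIL FROM: domain to an SMTP reply code.

    Returns 250 if the name resolves, 501 if the resolver reports that the
    name does not exist, and 451 for any other lookup failure, which is
    assumed here to be temporary.
    """
    if not domain:
        return 250  # empty sender (used for bounces) is accepted
    try:
        resolve(domain, 25)
    except socket.gaierror as err:
        if err.errno == socket.EAI_NONAME:
            return 501  # permanent failure: no such domain
        return 451  # temporary failure: the sending server should retry
    return 250
```

      Some resolvers report temporary trouble with the same error as a missing name, so the split between the two codes is only as reliable as the resolver underneath.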

  2. genetic classification by Glog · · Score: 3, Informative

    Yep, a company called Cloudmark (http://www.cloudmark.com/products/authority/technology/) uses the DNS method you describe as one of its many rules to distinguish spam from regular mail. They call the approach Genetic Classification with the separate rules being called spamGenes. I don't know how much of a classifier (in the true AI sense) they have built but the idea sounds pretty nifty.

  3. Already been done by Zocalo · · Score: 2, Informative
    You can grab a config file for SpamAssassin here which has hundreds of spam domains listed, all in nicely optimised regular expressions. I did try this sometime back, but it rapidly became clear that this is very much an arms race. Using a new domain to act as a redirector for each spam run is a minimal overhead for a spammer - maybe they need a 0.0002% response rate instead of 0.0001% which is no big deal for the spammer.

    I suppose you could write some scripts to automatically add new domains and expire those beyond a certain age, but I don't see much point. I've been writing custom SpamAssassin rules for several months now, and for me at least the ones that give the best results by far are the general purpose ones. Sure, if you have a big spam run or something like MyDoom to deal with, then a specific rule can really help, but that seems very much an exception to the rule.
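
    For what it's worth, the add-and-expire bookkeeping mentioned above is small. A rough sketch, where the class name and the 60-day default are assumptions and not anything taken from SpamAssassin:

```python
import time

# Assumption: an entry expires 60 days after it was last seen in spam.
MAX_AGE_SECONDS = 60 * 24 * 3600


class ExpiringDomainList:
    """Spam-domain list where each entry records when it was last seen."""

    def __init__(self, max_age=MAX_AGE_SECONDS):
        self.max_age = max_age
        self.last_seen = {}  # domain -> timestamp of most recent sighting

    def add(self, domain, now=None):
        """Add a domain, or refresh its timestamp if already listed."""
        self.last_seen[domain.lower()] = time.time() if now is None else now

    def expire(self, now=None):
        """Drop entries older than max_age and return the dropped names."""
        now = time.time() if now is None else now
        stale = [d for d, t in self.last_seen.items() if now - t > self.max_age]
        for d in stale:
            del self.last_seen[d]
        return stale

    def __contains__(self, domain):
        return domain.lower() in self.last_seen
```

    Calling `expire()` from a daily job would keep the list from growing without bound, which addresses the gateway load but not the arms race itself.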

    The rules I have most success with are targeting the obfuscation attempts, which is great because if the spammer omits obfuscation then Bayes has a field day instead. Even if you don't use SpamAssassin, the Wiki is great for examples of this kind of rule that you can adapt to your own engine if need be. Best of all, this is the kind of stuff that will *always* work, rather than a rule that will at best have a shelf life of a couple of months before it starts to bog down your mail gateway for no benefit.

    --
    UNIX? They're not even circumcised! Savages!
  4. Bayes by Gadzinka · · Score: 2, Informative

    If I understand properly how bogofilter tokenizes email, it already collects those domains as spam words.

    --
    Bastard Operator From 193.219.28.162
  5. Lots of ways around this by cyways · · Score: 3, Informative

    Many of the spams I see these days use throwaway domains or IP addresses in their URLs, so blocking by domain name seems pretty ineffective. Moreover many of the "websites" to which these spams point are actually compromised machines with proxies that refer traffic to the real site. Given that such compromised machines now surely number in the tens or hundreds of thousands, it wouldn't take much effort to construct messages that use the IP address of a randomly selected proxy in each message's embedded URLs.

  6. Re:Easily Defeated by gowen · · Score: 2, Informative

    There is, as far as I can tell, only one sure way to detect and block spam, and that is the one thing that cannot be forged easily in email headers...

    The "Received: " header added by your server. Filtering on anything the spammer can control means an arms race; filtering on the IP address is the only consistent thing, whether the hosts are complicit with spammers (netvision.il, wideopenwest.net, chello.nl) or just too incompetent/lazy to act on reports of trojanned machines on their network (attbi / comcast.net -- 4,200 spams to me and rising -- this means you!)

    I block on IP address, and block 99% of the spams; the remainder get a polite note to abuse@ ; if they persist in spamming, they go on the blacklist.
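
    A rough sketch of that check, assuming the topmost "Received:" header is the one your own server added and that it records the connecting host as a bracketed IPv4 address. Both are assumptions about the local mail setup, and the function names are made up for illustration.

```python
import re
from email import message_from_string

# Assumption: the peer address appears as "[a.b.c.d]" in the header.
IPV4_IN_BRACKETS = re.compile(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]")


def connecting_ip(raw_message):
    """Return the IPv4 address in the topmost Received: header, or None."""
    msg = message_from_string(raw_message)
    received = msg.get_all("Received", [])
    if not received:
        return None
    # Headers are stored top to bottom, so index 0 is the most recent hop.
    match = IPV4_IN_BRACKETS.search(received[0])
    return match.group(1) if match else None


def is_blocked(raw_message, blocked_ips):
    """True if the connecting IP is on the local blocklist."""
    ip = connecting_ip(raw_message)
    return ip is not None and ip in blocked_ips
```

    Only the first header is consulted because every "Received:" line below it was supplied by the sender and can be forged.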

    --
    Athletic Scholarships to universities make as much sense as academic scholarships to sports teams.
  7. One big problem with this by macdaddy · · Score: 2, Informative
    is that you'll have to use whitelists for all the legit domains. For example, receiving spam from spammer.dyndns.com doesn't necessarily mean that dyndns.com is a spamming domain. You may very well list not-a-spammer.dyndns.com if you choose to block dyndns.com. Likewise for homeunix.org: spammer.homeunix.org isn't the same as not-a-spammer.homeunix.org. You'll have a large and ever-growing whitelist if you use this tactic.
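
    One way to picture the bookkeeping this implies: keep a set of parent domains that hand out subdomains to unrelated people, and blocklist one label deeper for those. A sketch under that assumption, where the set contents come from the examples above and the function names are made up:

```python
# Assumption: a hand-maintained set of domains whose subdomains belong to
# unrelated customers. This is the list that keeps growing.
SHARED_PARENTS = {"dyndns.com", "homeunix.org"}


def block_key(host):
    """Return the name to blocklist for a hostname seen in spam."""
    labels = host.lower().rstrip(".").split(".")
    parent = ".".join(labels[-2:])
    if parent in SHARED_PARENTS and len(labels) >= 3:
        return ".".join(labels[-3:])  # e.g. spammer.dyndns.com
    return parent


def is_listed(host, blocklist):
    """True if the hostname maps to a name already on the blocklist."""
    return block_key(host) in blocklist
```

    With this, listing spammer.dyndns.com leaves not-a-spammer.dyndns.com alone, but only for parents someone has remembered to add to the set.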

    IMHO a better method would be to use the WHOIS information for a given domain name to match it to other spamming domains. I used to maintain the largest list of Alan Ralsky's spamming domains. My list was enormous. Alan had a bad habit (good for us anti-spammers though) of using identical or very similar WHOIS information in each of his spamming domains. This was the case with probably 90% of his spamming domains. He frequently used the same nameservers as well. I think a crafty programmer could come up with a way to use a Bayesian filter to identify spam by the WHOIS records of the domains in a given message that's been marked as spam. This would be a worthwhile project to me. Best of luck.

  8. Re:Easily Defeated by forevermore · · Score: 2, Informative
    http://shopping.yahoo.com%01@%31%39%32%2e%31%36%38%2e%31%30%35%28%32%33:3333/porn4all.asp

    SpamAssassin already has rules to catch this kind of obfuscation. However, it wouldn't be hard to merely translate these things back into real IPs. After all, the author of this article has already said that he filters on the 2nd (3rd) level domain name, and in an instance like this, there IS no domain name - any good filter would skip over the stuff before the @ and after the :
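
    A minimal sketch of that translation: drop everything before the "@" and after the ":", then undo the percent-encoding. The function name is made up, IPv6 literals are ignored, and this is not how SpamAssassin does it.

```python
import re
from urllib.parse import unquote


def real_host(url):
    """Return the host part of a possibly obfuscated URL, decoded."""
    rest = url.split("://", 1)[-1]                       # drop the scheme
    authority = re.split(r"[/?#]", rest, maxsplit=1)[0]  # drop path and query
    hostport = authority.rsplit("@", 1)[-1]              # skip the decoy before @
    host = hostport.split(":", 1)[0]                     # skip the port after :
    return unquote(host).lower()                         # undo %xx encoding
```

    For example, `real_host("http://shopping.yahoo.com%01@%31%39%32%2e%30%2e%32%2e%37:3333/x.asp")` yields the plain address `192.0.2.7` (a documentation address chosen for this example), which can then be checked against an IP list instead of a domain list.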

    --
    Do you really need reason for beer? Wingman Brewers