Domain Based Spam Prevention?

Posted by Cliff on Wednesday January 28, 2004 @02:01AM from the more-tools-for-the-spamfighting-toolbox dept.

aralin asks: "Recently I had this idea and wrote a little Perl script to extract all the second-level domains (third-level in the case of co.uk) from my last month's collection of spam (some 4,000 messages). I ran them against a nameserver to find the ones with NS records (i.e. valid domains) and made a list for my procmail filter. About 10 spams a day get past SpamAssassin for various reasons, and since I began checking them against my list of domains I have caught half of them. The idea is that if spammers want to sell something, or put a working web bug in my email, they need to provide a valid URL with a valid domain. If we filter on the domains found in URLs in confirmed spam, then it's almost certain that any other email referencing such a domain is spam as well. What I wanted to ask Slashdot is whether you know of a software project that already uses this form of spam detection in addition to rule matching and Bayesian filters?"
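The extract-and-filter loop aralin describes can be sketched in Python. This is a minimal illustration under stated assumptions, not aralin's Perl script: `TWO_LEVEL_SUFFIXES` is a hypothetical, deliberately tiny stand-in for the full Public Suffix List, the helper names are made up, and because Python's standard library has no NS-record lookup, the DNS check is passed in as a caller-supplied function (`has_ns_record`), for example one built on a third-party resolver library.

```python
import ipaddress
import re
from typing import Callable, Iterable, Optional
from urllib.parse import urlsplit

# Hypothetical, deliberately tiny set of suffixes under which names are
# registered at the third level (e.g. example.co.uk). A real filter would
# use the full Public Suffix List instead.
TWO_LEVEL_SUFFIXES = {"co.uk", "org.uk", "ac.uk", "com.au"}

# Rough URL matcher for message bodies; stops at whitespace, quotes and brackets.
URL_RE = re.compile(r"https?://[^\s\"'<>()]+", re.IGNORECASE)


def registrable_domain(host: str) -> Optional[str]:
    """Reduce a hostname to its second-level domain (third-level under co.uk etc.)."""
    host = host.lower().rstrip(".")
    try:
        ipaddress.ip_address(host)
        return None  # bare IP address: there is no domain to list
    except ValueError:
        pass
    labels = host.split(".")
    keep = 3 if ".".join(labels[-2:]) in TWO_LEVEL_SUFFIXES else 2
    if len(labels) < keep:
        return None
    return ".".join(labels[-keep:])


def extract_domains(body: str) -> set:
    """Return the registrable domains of every http(s) URL in a message body."""
    domains = set()
    for url in URL_RE.findall(body):
        try:
            host = urlsplit(url).hostname
        except ValueError:  # e.g. a malformed IPv6 literal
            continue
        if host:
            domain = registrable_domain(host)
            if domain:
                domains.add(domain)
    return domains


def build_blocklist(spam_bodies: Iterable[str],
                    has_ns_record: Callable[[str], bool]) -> set:
    """Collect domains from confirmed spam, keeping only those that have NS records."""
    candidates = set()
    for body in spam_bodies:
        candidates |= extract_domains(body)
    return {d for d in candidates if has_ns_record(d)}


def references_blocked_domain(body: str, blocklist: set) -> bool:
    """True if the message links to any domain on the blocklist."""
    return bool(extract_domains(body) & blocklist)
```

Note that a URL whose host is a bare IP address yields no domain at all in this sketch, so such links never reach the blocklist.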

4 of 42 comments

Easily Defeated by Tom7 · 2004-01-28 02:09 · Score: 4, Insightful

Again, the arms-race problem: this might work for a while, but once spammers see a certain level of blocking, they can adjust their spam to circumvent it.

In this case they could start including hidden (web-bug-style) links to popular webmail sites such as Hotmail. If you start blocking all messages with links to Hotmail, you are probably going to miss some e-mail that you want!
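Tom7's scenario can be made concrete with a small Python sketch. It assumes a naive filter that blocklists every domain seen in confirmed spam; the crude two-label host extraction, the sample messages and the helper names are all hypothetical. A single decoy link in one spam is enough to make a legitimate message that mentions the same webmail site match the list.

```python
import re

# Crude host extractor for this illustration: takes the last two labels of
# each URL host and ignores co.uk-style suffixes.
HOST_RE = re.compile(r"https?://([A-Za-z0-9.-]+)", re.IGNORECASE)


def domains_in(body):
    """Return the (approximate) domains linked from a message body."""
    return {".".join(h.lower().strip(".").split(".")[-2:]) for h in HOST_RE.findall(body)}


def naive_blocklist(spam_bodies):
    """Block every domain that ever appears in confirmed spam."""
    blocked = set()
    for body in spam_bodies:
        blocked |= domains_in(body)
    return blocked


# Hypothetical spam that hides a decoy link to a popular site in a zero-size image.
spam = ['Cheap pills: http://pills.example.net/buy '
        '<img src="http://www.hotmail.com/x.gif" width=0 height=0>']
# Hypothetical legitimate mail that happens to mention the same site.
ham = "You can reach me via http://www.hotmail.com/ from now on."

blocked = naive_blocklist(spam)
false_positive = domains_in(ham) & blocked  # {'hotmail.com'}: the ham would be filtered
```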
IIRC SendMail allows this already by squiggleslash · 2004-01-28 02:23 · Score: 3, Informative

There's a flag somewhere in many of the Sendmail sendmail.cf configuration macros (I can't find it now that I'm looking, but I can see the rewrite rules in my sendmail.cf) that blocks mail with an unresolvable domain name on the MAIL FROM: line (part of the email "envelope").
I use it, but I'm not happy with it. There are several problems. Domain names are often temporarily unresolvable for a variety of reasons (you've been there: you've typed a perfectly valid website address into your browser, got a message saying the name couldn't be resolved, tried again immediately, and it worked, right?). It also encourages Joe Jobs (spam forged to look as if it came from an innocent party), because spammers then have to forge sender addresses at real, resolvable domains; the target is not necessarily specific addresses, but whole domains.
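The trade-off squiggleslash describes can be sketched in Python. This is not Sendmail's actual rule set; it is a hypothetical check, with a caller-supplied `lookup` function and an illustrative `Lookup` result type, that separates "this domain does not exist" from "the lookup failed right now". Under standard SMTP semantics a 4xx reply is a transient failure that the sending server should retry later, while a 5xx reply is permanent, so answering a DNS hiccup with a 4xx code delays a valid message rather than bouncing it. The exact reply codes and texts are illustrative.

```python
from enum import Enum


class Lookup(Enum):
    FOUND = "found"        # the domain has usable DNS records
    NXDOMAIN = "nxdomain"  # authoritative answer: no such domain
    TEMPFAIL = "tempfail"  # timeout or server failure: no answer right now


def check_envelope_sender(mail_from, lookup):
    """Return an SMTP reply for MAIL FROM based on whether the sender's domain resolves.

    `lookup` is a hypothetical caller-supplied function mapping a domain to a Lookup.
    """
    if mail_from in ("", "<>"):
        return "250 OK"  # null sender, used by bounces: nothing to check
    address = mail_from.strip("<>")
    _, sep, domain = address.rpartition("@")
    if not sep or not domain:
        return "553 Malformed sender address"
    result = lookup(domain.lower())
    if result is Lookup.FOUND:
        return "250 OK"
    if result is Lookup.TEMPFAIL:
        # Transient: the sending server keeps the message queued and retries.
        return "451 Sender domain does not resolve right now, try again later"
    return "553 Sender domain does not exist"
```

The cost of this design is that mail from a domain whose DNS stays broken sits in the sender's queue until it resolves or expires, rather than failing immediately.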
Joe Jobs, in turn, are pushing ISPs to adopt anti-spam practices that require mail from certain addresses to come only from certain IPs. In the absence of a standardized remote-access protocol for SMTP smarthosts, this makes it much harder for people to roam and increases the number of ways in which perfectly valid email can fail to be delivered.
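The "certain addresses only from certain IPs" rule squiggleslash mentions, which is the idea behind published-record schemes such as SPF, can be sketched as a table lookup in Python. The `AUTHORIZED_SENDERS` table, its addresses and the function name are hypothetical, and a real scheme would fetch the policy from DNS rather than hard-code it. The last call shows the roaming problem: a customer sending directly from someone else's network fails the check even though the message is legitimate.

```python
import ipaddress

# Hypothetical policy: networks allowed to send mail for each sender domain.
# Real schemes publish this in DNS; it is hard-coded here for illustration.
AUTHORIZED_SENDERS = {
    "example.net": [ipaddress.ip_network("192.0.2.0/24")],
}


def sender_ip_allowed(sender_domain, client_ip, policy=AUTHORIZED_SENDERS):
    """Return True if client_ip may send mail claiming to be from sender_domain."""
    networks = policy.get(sender_domain.lower())
    if networks is None:
        return True  # the domain states no policy, so nothing restricts it
    ip = ipaddress.ip_address(client_ip)
    return any(ip in net for net in networks)


# The ISP's own mail relay (smarthost) is inside the authorized network.
print(sender_ip_allowed("example.net", "192.0.2.25"))    # True
# The same customer roaming on another provider's network is rejected.
print(sender_ip_allowed("example.net", "198.51.100.7"))  # False
```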
If I were an ISP, I definitely wouldn't go down this road. I'm wary of doing it anyway, and by and large I'm finding that the emails blocked by this method are ones that would have been blocked anyway, or ones that appear to be valid emails with temporary DNS problems. I don't intend to keep using it for much longer; I just hate having to reconfigure sendmail and see what breaks as a result.

--
You are not alone. This is not normal. None of this is normal.
genetic classification by Glog · 2004-01-28 02:24 · Score: 3, Informative

Yep, a company called Cloudmark (http://www.cloudmark.com/products/authority/technology/) uses the DNS method you describe as one of its many rules for distinguishing spam from regular mail. They call the approach Genetic Classification, and the individual rules spamGenes. I don't know how much of a classifier (in the true AI sense) they have built, but the idea sounds pretty nifty.
Lots of ways around this by cyways · 2004-01-28 04:00 · Score: 3, Informative

Many of the spams I see these days use throwaway domains or IP addresses in their URLs, so blocking by domain name seems pretty ineffective. Moreover, many of the "websites" to which these spams point are actually compromised machines running proxies that refer traffic to the real site. Given that such compromised machines now surely number in the tens or hundreds of thousands, it wouldn't take much effort to construct messages whose embedded URLs use the IP address of a randomly selected proxy for each message.
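cyways's point can be illustrated by splitting the hosts of a message's URLs into names and IP-address literals, as in this Python sketch. The function name and sample URLs are hypothetical. Only hosts in the first group can ever match a domain blocklist; with a different compromised proxy address in each message, the second group gives such a list nothing to learn.

```python
import ipaddress
from urllib.parse import urlsplit


def partition_hosts(urls):
    """Split URL hosts into domain names and IP-address literals.

    Only dotted IPv4 and IPv6 literals are recognized as addresses; other
    obfuscated forms (e.g. a single decimal number) are treated as names here.
    """
    names, ip_literals = set(), set()
    for url in urls:
        try:
            host = urlsplit(url).hostname
        except ValueError:  # malformed URL, e.g. an unbalanced IPv6 bracket
            continue
        if not host:
            continue
        try:
            ipaddress.ip_address(host)
        except ValueError:
            names.add(host)
        else:
            ip_literals.add(host)
    return names, ip_literals
```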