Domain Based Spam Prevention?

Posted by Cliff on Wednesday January 28, 2004 @02:01AM from the more-tools-for-the-spamfighting-toolbox dept.

aralin asks: "Recently I got this idea and wrote a little Perl script to extract all the second-level (third-level in the case of co.uk) domains from my last month's collection of spam (some 4,000 messages). I ran those against a nameserver to find the ones with an NS record (valid domains) and made a list for my procmail filter. About 10 mails a day slip past SpamAssassin for various reasons, and since I began checking them against my list of domains, I have caught half of them. The idea is that if spammers want to sell something, or put a working web bug in my email, they need to provide a working URL with a valid domain. If we collect the domains from URLs in confirmed spam, then it's almost certain that any other email referencing such a domain is spam as well. What I wanted to ask Slashdot is whether you know of any software project that already uses this form of spam detection in addition to rule matching and Bayes filters?"
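The core of the approach in the question fits in a few lines. What follows is a minimal Python sketch, not aralin's Perl script. It pulls http(s) URLs out of a message and reduces each host to its second-level domain, or third-level for a small, hypothetical set of suffixes such as co.uk. It then checks new mail against the set harvested from confirmed spam. The NS-record step is left out because Python's standard library cannot query NS records, so a DNS library would be needed for it. A real filter would also use a complete public-suffix list instead of the hard-coded TWO_LEVEL_SUFFIXES.

```python
import re
from urllib.parse import urlsplit

# Hypothetical, deliberately tiny set of suffixes that need one extra label
# (the "co.uk" case). A real filter would use a full public-suffix list.
TWO_LEVEL_SUFFIXES = {"co.uk", "org.uk", "com.au"}

URL_RE = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)


def registered_domain(host: str) -> str:
    """Reduce a hostname to its second-level (or third-level) domain."""
    labels = host.lower().rstrip(".").split(".")
    if len(labels) >= 3 and ".".join(labels[-2:]) in TWO_LEVEL_SUFFIXES:
        return ".".join(labels[-3:])
    return ".".join(labels[-2:])


def domains_in(text: str) -> set[str]:
    """Collect the registered domains of every http(s) URL in a message body."""
    found = set()
    for url in URL_RE.findall(text):
        host = urlsplit(url).hostname
        if host:
            found.add(registered_domain(host))
    return found


def looks_like_known_spam(text: str, spam_domains: set[str]) -> bool:
    """True if the message references any domain harvested from confirmed spam."""
    return not domains_in(text).isdisjoint(spam_domains)
```

In use, you would run domains_in over a month of confirmed spam, union the results, and call looks_like_known_spam on each message that SpamAssassin let through.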

42 comments

Easily Defeated by Tom7 · 2004-01-28 02:09 · Score: 4, Insightful

Again the arms race problem: This might work for a while, but once the spammers see a certain level of blocking, they can adjust their spam to circumvent it.

In this case they could start including (hidden, web-bug style) links to popular webmail sites, like hotmail. If you start blocking all messages with links to hotmail, you are probably going to miss some e-mail that you want!
1. Re:Easily Defeated by Tor · 2004-01-28 05:23 · Score: 1
  
  Again the arms race problem: This might work for a while, but once the spammers see a certain level of blocking, they can adjust their spam to circumvent it.
  
  In this case they could start including (hidden, web-bug style) links to popular webmail sites, like hotmail. If you start blocking all messages with links to hotmail, you are probably going to miss some e-mail that you want!
  
  Also, the URLs in an e-mail often point to a cracked Windoze box that the spammer has turned into a WWW server (often, but not always, listening on a strange port number).
  
  For instance:
  
  http://shopping.yahoo.com%01@%31%39%32%2e%31%36%38%2e%31%30%35%28%32%33:3333/porn4all.asp
2. Re:Easily Defeated by gowen · 2004-01-28 05:25 · Score: 2, Informative
  
  There is, as far as I can tell, only one sure way to detect and block spam, and that is the one thing that cannot be forged easily in email headers...
  
  The "Received: " header added by your server. Filtering on anything the spammer can control means an arms race; filtering on the IP address is the only consistent thing, whether the hosts are complicit with spammers (netvision.il, wideopenwest.net, chello.nl) or just too incompetent/lazy to act on reports of trojanned machines on their network (attbi / comcast.net -- 4,200 spams to me and rising -- this means you!)
  
  I block on IP address and stop 99% of the spam; the remainder get a polite note to abuse@, and if they persist in spamming, they go on the blacklist.
  
  --
  Athletic Scholarships to universities make as much sense as academic scholarships to sports teams.
3. Re:Easily Defeated by forevermore · 2004-01-28 09:40 · Score: 2, Informative
  
  http://shopping.yahoo.com%01@%31%39%32%2e%31%36%38%2e%31%30%35%28%32%33:3333/porn4all.asp
  SpamAssassin already has rules to catch this kind of obfuscation. However, it wouldn't be hard to simply translate these things back into real IPs. After all, the author of this article has already said that he filters on the second-level (third-level) domain name, and in an instance like this there IS no domain name - any good filter would skip over the stuff before the @ and after the :.
  
  --
  Do you really need reason for beer? Wingman Brewers
4. Re:Easily Defeated by unitron · 2004-01-28 18:31 · Score: 1
  
  "If you start blocking all messages with links to hotmail, you are probably going to miss some e-mail that you want!"
  Oh yeah, the email I want to receive is most likely to come from a hotmail account (or yahoo or AOL), right, sure...
  The only solution is to replace email with something else based on "sender pays".
  
  --
  I see even classic Slashdot is now pretty much unusable on dial up anymore.
5. Re:Easily Defeated by Chess_the_cat · 2004-01-31 05:12 · Score: 1
  
  Oh yeah, the email I want to receive is most likely to come from a hotmail account (or yahoo or AOL), right, sure...
  Take it easy. He only said you'd probably miss some e-mail you'd want. He didn't say anything about 'most likely.' And what kind of elitist doesn't accept e-mail from Hotmail? Wow.
  
  --
  Support the First Amendment. Read at -1
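gowen's point is that only the Received header written by your own server is trustworthy. A minimal sketch of filtering on it follows. It assumes a single receiving server whose header is therefore the topmost one, and that the MTA writes the client address in square brackets. Header formats differ between sendmail, qmail, exim and others, so check your own first. BLOCKED_NETS is a hypothetical stand-in for whatever blocklist you maintain, and only IPv4 is handled.

```python
import ipaddress
import re
from email import message_from_string

# Hypothetical blocklist of networks; in practice your own growing list.
BLOCKED_NETS = [ipaddress.ip_network(n) for n in ("203.0.113.0/24", "198.51.100.0/25")]

# Assumes the client address appears in brackets, e.g.
# "from helo (rdns [203.0.113.7]) by mx.mydomain.example". IPv4 only.
BRACKETED_IPV4 = re.compile(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]")


def connecting_ip(raw_message: str):
    """Client IP from the Received header your own server added, or None."""
    received = message_from_string(raw_message).get_all("Received") or []
    if not received:
        return None
    # Each hop prepends its header, so the first one is your server's; the
    # ones below it are under the sender's control and may be forged.
    match = BRACKETED_IPV4.search(received[0])
    if match is None:
        return None
    try:
        return ipaddress.ip_address(match.group(1))
    except ValueError:  # e.g. "999.1.2.3"
        return None


def is_blocked(raw_message: str) -> bool:
    ip = connecting_ip(raw_message)
    return ip is not None and any(ip in net for net in BLOCKED_NETS)
```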
How 'bout a mail rule by Asprin · 2004-01-28 02:13 · Score: 2, Insightful

Isn't this just like adding a mail client filtering rule to trash all emails with "mydomain.com" in the body?

Now, having said that, I don't think any mail filter does this explicitly because of problems with legit web page links. All the spammer would need to do is redirect through a page on a hosting service like fortunecity.com or geocities.com.

...although now that I think about it, throwing fortunecity and geocities into your filter list may not be a bad idea either, since so little actually goes on there ;) and the interesting stuff is always over its bandwidth limit by the time I get the link. :(

--
"Lawyers are for sucks."
- Doug McKenzie
IIRC SendMail allows this already by squiggleslash · 2004-01-28 02:23 · Score: 3, Informative

There's a flag somewhere (which I can't find now that I'm looking, but I can see the rewrite rules in my sendmail.cf) in many of the Sendmail configuration macros used to build sendmail.cf that will block on unresolvable domain names in the MAIL FROM: line (part of the email "envelope").
I use it, but I'm not happy with it. There are several problems: domain names often are temporarily unresolvable for a variety of reasons (hey, you've been there, you've typed in a perfectly valid website address into your browser, got a message about the name not being resolvable, done it again immediately and it's worked? Right?); and it does encourage Joe Jobs - not necessarily against specific addresses, but against domains.
Joe Jobs in turn are making ISPs adopt anti-spam practices that require email from certain addresses to come only from certain IPs, which, in the absence of a standardized remote-access protocol for SMTP smarthosts, makes it much more difficult for people to roam and increases the number of ways in which perfectly valid email may fail to be delivered.
If I were an ISP, I definitely wouldn't go down this road. I'm wary of doing it anyway, and by and large I'm finding that the emails blocked by this method are ones that would have been blocked anyway, or what appear to be valid emails with temporary DNS problems. It's something I don't intend to use for much longer; I just hate having to reconfigure sendmail and see what breaks as a result.

--
You are not alone. This is not normal. None of this is normal.
1. Re:IIRC SendMail allows this already by schon · 2004-01-28 03:37 · Score: 2, Informative
  
  macros that will block on unresolvable domain names in the MAIL FROM:
  
  There are two checks for this - one rejects (501) mail that comes from bogus domains (domains which do not exist) and one that sends a temporary failure message (451) for domains which are unresolvable.
  
  Such rules are necessary for proper operation of a mail server - the MAIL FROM: should always be a resolvable address (with the exception of empty sender) because that's where the bounces should go.
  
  domain names often are temporarily unresolvable for a variety of reasons
  
  Which is why such domains would get a temporary failure, so that the sending mail server can try again.
  
  There is a difference between "that domain does not exist", and "that domain does not resolve".
  
  it does encourage Joe Jobs - not necessarily against specific addresses, but against domains.
  
  If it's against a domain, then it's not a joe job. You can't joe a domain, only an address (or a group of addresses.) And it doesn't encourage it - spammers have forged their MAIL FROM: address since time immemorial.
  
  valid emails with temporary DNS problems
  
  Again, such mails would get a temporary failure, and would go through once the domain is able to be resolved - so there is really no reason NOT to use such rules.
2. Re:IIRC SendMail allows this already by Anonymous Coward · 2004-01-28 17:50 · Score: 0
  
  in the absence of a standardized remote-access protocol for SMTP
  
  Where did you get this idea? SMTP AUTH has been around for a long time. Go read RFC 2554.
  
  Even Outlook and Outlook Express have supported SMTP AUTH for quite some time.
3. Re:IIRC SendMail allows this already by tck1000 · 2004-01-30 05:04 · Score: 1
  
  You could try using milter-sender, in addition to your SpamAssassin/Milter-Spamc.
  
  Milter-Sender attempts to connect to the MX host of record for the purported From address, and if that MX host does not accept mail for that account, your sendmail will not accept mail _from_ that account.
  
  It's tunable, so you can tell it to wait and try again later, to just pass through email whose MX hosts are unreachable, or to reject it outright.
  
  It's not a perfect solution for what you're looking for, because a spammer just needs to spam from a valid account and it will pass the milter-sender checks, but when you combine it with milter-spamc/SpamAssassin, your access db, and various other procmail-type tools, you end up with a fairly effective solution.
  
  -Tim
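This thread discusses two MAIL FROM checks. One rejects senders whose domain does not exist and temporarily fails those whose domain does not resolve right now. The other is milter-sender's callout to the sender's MX. Both can be sketched as below. This is a Python sketch under stated assumptions, not sendmail's rules or milter-sender's code. The 501/451 codes follow schon's comment, and real MTAs word their replies differently. lookup_domain only sees address records through getaddrinfo, because the standard library has no MX query; for the same reason, callout needs the MX host passed in. The policy names in callout_action are hypothetical, not milter-sender options.

```python
import smtplib
import socket

EXISTS, NO_SUCH_DOMAIN, TEMP_FAILURE = "exists", "nxdomain", "tempfail"


def lookup_domain(domain: str) -> str:
    """Rough stdlib stand-in for the MTA's DNS check (address records only, no MX)."""
    try:
        socket.getaddrinfo(domain, None)
    except socket.gaierror as err:
        # EAI_NONAME: the name does not exist; anything else may be temporary.
        return NO_SUCH_DOMAIN if err.errno == socket.EAI_NONAME else TEMP_FAILURE
    return EXISTS


def mail_from_reply(sender: str, lookup=lookup_domain) -> str:
    """SMTP reply to MAIL FROM, using the reply codes described in the thread."""
    address = sender.strip().strip("<>")
    if not address:
        return "250 OK"  # the empty sender (bounces) must always be accepted
    status = lookup(address.rpartition("@")[2])
    if status == NO_SUCH_DOMAIN:
        return "501 Sender domain does not exist"
    if status == TEMP_FAILURE:
        return "451 Sender domain does not resolve, try again later"
    return "250 OK"


def callout(mx_host: str, address: str, timeout: float = 10.0) -> str:
    """Callout probe: would the sender's MX accept mail for `address`?"""
    try:
        with smtplib.SMTP(mx_host, 25, timeout=timeout) as smtp:
            smtp.helo()
            smtp.mail("<>")  # null sender, so the probe itself cannot bounce
            code, _ = smtp.rcpt(address)
    except OSError:  # covers connection errors and smtplib.SMTPException
        return "tempfail"
    if 200 <= code < 300:
        return "accepted"
    return "rejected" if code >= 500 else "tempfail"


def callout_action(result: str, on_tempfail: str = "defer") -> str:
    """Tunable policy when the MX is unreachable or answers 4xx:
    'defer' (try again later), 'accept' (pass through) or 'reject'."""
    if result == "accepted":
        return "accept"
    if result == "rejected":
        return "reject"
    return on_tempfail
```

Separating the tempfail case is what addresses squiggleslash's worry. A domain with a transient DNS problem gets a 451, so the sending server retries instead of losing the mail.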
genetic classification by Glog · 2004-01-28 02:24 · Score: 3, Informative

Yep, a company called Cloudmark (http://www.cloudmark.com/products/authority/technology/) uses the DNS method you describe as one of its many rules to distinguish spam from regular mail. They call the approach Genetic Classification with the separate rules being called spamGenes. I don't know how much of a classifier (in the true AI sense) they have built but the idea sounds pretty nifty.
An Idea by FePe · 2004-01-28 02:30 · Score: 1

I don't know, but the following may be a bad idea: http://www.csc.liv.ac.uk/~ullrich/teaching/MScProjects/#spam-filter

--
"Until you do what you believe in, how do you know whether you believe in it or not?" -- Leo Tolstoy
Already been done by Zocalo · 2004-01-28 02:44 · Score: 2, Informative

You can grab a config file for SpamAssassin here which has hundreds of spam domains listed, all in nicely optimised regular expressions. I did try this sometime back, but it rapidly became clear that this is very much an arms race. Using a new domain to act as a redirector for each spam run is a minimal overhead for a spammer - maybe they need a 0.0002% response rate instead of 0.0001% which is no big deal for the spammer.
I suppose you could write some scripts to automatically add new domains and expire those beyond a certain age, but I don't see much point. I've been writing custom SpamAssassin rules for several months now, and for me at least the ones that give the best results by far are the general-purpose ones. Sure, if you have a big spam run or something like MyDoom to deal with, then a specific rule can really help, but that seems very much an exception to the rule.
The rules I have most success with are targeting the obfuscation attempts, which is great because if the spammer omits obfuscation then Bayes has a field day instead. Even if you don't use SpamAssassin, the Wiki is great for examples of this kind of rule that you can adapt to your own engine if need be. Best of all, this is the kind of stuff that will *always* work, rather than a rule that will at best have a shelf life of a couple of months before it starts to bog down your mail gateway for no benefit.

--
UNIX? They're not even circumcised! Savages!
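Zocalo mentions two ideas: scripts that add new spam domains and expire old ones, and a list compiled into optimised regular expressions. The sketch below combines them. It is a hypothetical design, not the SpamAssassin config file mentioned above. Each domain carries a last-seen timestamp and drops off after max_age seconds. The survivors are compiled into a single case-insensitive alternation that tolerates extra subdomains in front of a listed name but refuses partial matches.

```python
import re
import time
from typing import Optional


class ExpiringDomainList:
    """Spam-URL domains that drop off after max_age seconds without a new sighting."""

    def __init__(self, max_age: float):
        self.max_age = max_age
        self.last_seen: dict[str, float] = {}

    def add(self, domain: str, now: Optional[float] = None) -> None:
        """Record (or refresh) a domain seen in confirmed spam."""
        self.last_seen[domain.lower()] = time.time() if now is None else now

    def expire(self, now: Optional[float] = None) -> list[str]:
        """Drop stale domains and return the ones removed."""
        now = time.time() if now is None else now
        stale = sorted(d for d, t in self.last_seen.items() if now - t > self.max_age)
        for d in stale:
            del self.last_seen[d]
        return stale

    def pattern(self) -> "re.Pattern":
        """Compile every listed domain into one case-insensitive alternation.

        A match may have extra subdomains in front (www.spam.example) but must
        not be glued to other name characters (notspam.example) or followed
        by more labels (spam.example.org).
        """
        if not self.last_seen:
            return re.compile(r"(?!x)x")  # an empty list matches nothing
        alternatives = "|".join(re.escape(d) for d in sorted(self.last_seen))
        return re.compile(
            r"(?<![\w.-])(?:[\w-]+\.)*(?:" + alternatives + r")(?![\w-]|\.[\w-])",
            re.IGNORECASE,
        )
```

Refreshing the timestamp on every new sighting means a domain stays listed only while it is still being spamvertised. That limits the "bog down your mail gateway for no benefit" problem Zocalo describes.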
Would already get too many false positives for me by Pembers · 2004-01-28 02:51 · Score: 1

Most of us have probably seen spams pushing various pump-and-dump scams. Many of these are just plain text, bragging that such-and-such a stock is undervalued and will skyrocket in the next few {days|weeks|decades} when the company announces that the {RIAA|FBI|SCOX} have placed a $1 {m|b|tr}illion order for their new whizz-bang {frobnicator|KaZaA-killer|penguin trap}.

Usually, there's no URL, because if you were stupid enough to buy the shares, you'd buy them from someone else. Some of these spams, though, link to things like the company's stock chart on Yahoo! Finance. I get a lot of mail from people with Yahoo! mail accounts, and I'm also on several Yahoo! Groups mailing lists. Messages from either of those sources usually have a little advert for Yahoo! at the bottom. So, for me, at least, blocking messages that have "yahoo.com" in a URL somewhere would cause me to lose a lot of legitimate mail.

Perhaps I'm being thick, but if you're running some sort of Bayesian filter, would it not automatically flag mail containing the offending domain names as probably spam anyway?

--
Just another wannabe fantasy novelist...
Bayes by Gadzinka · 2004-01-28 02:56 · Score: 2, Informative

If I understand properly how bogofilter tokenizes email, it already collects those domains as spam words.

--
Bastard Operator From 193.219.28.162
1. Re:Bayes by aralin · 2004-01-28 08:55 · Score: 1
  
  The point is that a domain can be written many different ways: letters can be substituted with %D5 or other escapes, and after you demangle the domain in the email and compare it with the list, you get a better match.
  
  --
  If programs would be read like poetry, most programmers would be Vogons.
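Demangling before comparing, as aralin describes, can be sketched with the standard library. This is a minimal sketch, not any existing filter's code. It decodes HTML entities (such as &#46; for a dot) and then lets urlsplit discard the "user@" part and the ":port" suffix. That step is how the shopping.yahoo.com%01@ trick quoted earlier in the thread hides the real destination. Finally it percent-decodes what is left of the host. The sample URL in the test is modelled on that one, with the last separator written as %2e so it decodes to a clean dotted quad.

```python
import html
import ipaddress
from urllib.parse import unquote, urlsplit


def real_host(url: str) -> str:
    """Return the host a browser would actually contact for an obfuscated URL."""
    # 1. HTML entities first, since spam bodies are usually HTML.
    # 2. urlsplit().hostname drops "userinfo@" and ":port".
    # 3. Percent escapes in what remains are decoded.
    host = urlsplit(html.unescape(url).strip()).hostname or ""
    return unquote(host).lower().rstrip(".")


def is_ip_literal(host: str) -> bool:
    """True when the 'domain' is really a bare IP address (no domain to list)."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False
```

When is_ip_literal is true, there is no domain to compare against the list. That is forevermore's point, and such URLs are better handled by IP-based checks.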
Joe-Job by joostje · 2004-01-28 03:22 · Score: 2, Insightful

If we filter domains from a URL in confirmed spam, then it's almost certain any other email referencing such domain is spam as well.
OK, the first spammer that wants to irritate you can thus easily block anyone from ever hearing about your website (by running a "joe-job" with your website's URL in it).
Two thoughts by image · 2004-01-28 03:56 · Score: 1

One, wouldn't a normal Bayesian filter do this automatically? I.e., pick up that url in mail classified as spam and then weight it positively in the future?

Two, this doesn't help with the strangest category of spam -- email that doesn't refer to a particular product, include a valid reply-to or from address, or contain any valid urls. Those spam emails are the ones that just blow my mind. They suck up bandwidth, cost everyone money and resources, yet they contain only a few random words, none of which could ever lead to a sale. Around 15% of my spam falls into this mindless category.

I doubt they are just testing email addresses, because relying on bounces isn't effective. And if they don't even include an image for email clients to automatically load for tracking purposes, they seem to be just a total and complete waste (unlike most spam, which is just a waste).
1. Re:Two thoughts by kalidasa · 2004-01-28 04:19 · Score: 1
  
  The spammer is probably using them to soften up the bayesian filtering (maybe get you to classify so much as spam that you start getting too many false positives and have to wipe your bayesian db, as I have?) so the real spams get through.
2. Re:Two thoughts by jmason · 2004-01-28 06:19 · Score: 1
  
  'One, wouldn't a normal Bayesian filter do this automatically? I.e., pick up that url in mail classified as spam and then weight it positively in the future?'
  
  Yep, that's the case, in SpamAssassin 2.6x at least.
3. Re:Two thoughts by perlchild · 2004-01-28 16:51 · Score: 1
  
  What about polluting not the per-user Bayes database that Slashdot users are likely to have, but a per-enterprise appliance Bayes database (a small device shared by a bunch of people) or a per-webmail Bayes database shared by, say, Yahoo or Brightmail? Sending two emails, one of which increases the chances that the other gets through, would not be a waste in a spammer's eyes.
  Think of it as a one-two punch against your email box.
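A Bayesian filter that tokenizes URLs picks up spam domains on its own, as image and jmason note. The sketch below shows roughly how that works. It is a simplified, Graham-style estimate written for illustration, not SpamAssassin's or bogofilter's tokenizer. Each URL host yields "url:" tokens for itself and its parent domains down to two labels. A token's spamminess is its relative frequency in spam divided by the sum of its relative frequencies in spam and ham. Unseen tokens score a neutral 0.5.

```python
import re
from collections import Counter

# Host part of http(s) URLs; stops at a path, port, query, quote or space.
HOST_RE = re.compile(r"https?://([^/\s:?#\"'<>]+)", re.IGNORECASE)


def url_tokens(text: str) -> set[str]:
    """One 'url:' token per URL host and per parent domain down to two labels."""
    tokens = set()
    for host in HOST_RE.findall(text):
        labels = host.lower().split(".")
        # www.pills.example -> url:www.pills.example, url:pills.example
        for i in range(max(1, len(labels) - 1)):
            tokens.add("url:" + ".".join(labels[i:]))
    return tokens


class UrlTokenStats:
    """Per-token spam/ham message counts, in the spirit of a Bayesian filter."""

    def __init__(self) -> None:
        self.spam: Counter = Counter()
        self.ham: Counter = Counter()
        self.n_spam = 0
        self.n_ham = 0

    def train(self, text: str, is_spam: bool) -> None:
        tokens = url_tokens(text)
        if is_spam:
            self.spam.update(tokens)
            self.n_spam += 1
        else:
            self.ham.update(tokens)
            self.n_ham += 1

    def spamminess(self, token: str) -> float:
        """Spam share of the token's relative frequency; 0.5 if never seen."""
        s = self.spam[token] / max(self.n_spam, 1)
        h = self.ham[token] / max(self.n_ham, 1)
        return 0.5 if s + h == 0 else s / (s + h)
```

The same mechanism makes perlchild's concern concrete. Anyone who can feed a shared database many "spam" or "ham" messages can shift these ratios for everybody who uses it.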
Lots of ways around this by cyways · 2004-01-28 04:00 · Score: 3, Informative

Many of the spams I see these days use throwaway domains or IP addresses in their URLs, so blocking by domain name seems pretty ineffective. Moreover many of the "websites" to which these spams point are actually compromised machines with proxies that refer traffic to the real site. Given that such compromised machines now surely number in the tens or hundreds of thousands, it wouldn't take much effort to construct messages that use the IP address of a randomly selected proxy in each message's embedded URLs.
Wont work by skinfitz · 2004-01-28 05:00 · Score: 1

What happens when the spam simply contains a link to a legit site like Microsoft / RedHat / Apple / Network Associates / Norton etc? You are then going to block all messages that mention these sites? You are going to succeed in cutting yourself off from security mailing lists if nothing else.
One big problem with this by macdaddy · 2004-01-28 05:50 · Score: 2, Informative

is that you'll have to use whitelists for all the legit domains. For example, receiving spam from spammer.dyndns.com doesn't necessarily mean that dyndns.com is a spamming domain. You may very well end up listing not-a-spammer.dyndns.com if you choose to block dyndns.com. Likewise for homeunix.org: spammer.homeunix.org isn't the same as not-a-spammer.homeunix.org. You'll have a large and ever-growing whitelist if you use this tactic.
IMHO a better method would be to use the WHOIS information for a given domain name to match it to other spamming domains. I used to maintain the largest list of Alan Ralsky's spamming domains. My list was enormous. Alan had a bad habit (good for us anti-spammers though) of using identical or very similar WHOIS information in each of his spamming domains. This was the case with probably 90% of his spamming domains. He frequently used the same nameservers as well. I think a crafty programmer could come up with a way to use a Bayesian filter to identify spam by the WHOIS records of the domains in a given message that's been marked as spam. This would be a worthwhile project to me. Best of luck.
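One way to keep that whitelist short is sketched below. The idea is to list the few parent domains that hand out names to unrelated customers, and to block only the exact customer name under them instead of the parent. This is a hypothetical sketch; SHARED_PARENTS and block_key are illustrative names, and the co.uk-style third-level case is ignored for brevity.

```python
# Hypothetical set of parent domains that give out subdomains to unrelated
# customers (dynamic DNS, free hosting). Extend it as you find more.
SHARED_PARENTS = {"dyndns.com", "homeunix.org", "geocities.com"}


def block_key(host: str) -> str:
    """What to put on the blocklist when `host` appears in confirmed spam.

    Ordinary domains are listed at the second level; under a shared parent
    only the customer's own name (e.g. spammer.dyndns.com) is listed.
    """
    labels = host.lower().rstrip(".").split(".")
    parent = ".".join(labels[-2:])
    if parent in SHARED_PARENTS and len(labels) >= 3:
        return ".".join(labels[-3:])
    return parent


def is_listed(host: str, blocklist: set) -> bool:
    """Look a URL host up in the blocklist using the same key rule."""
    return block_key(host) in blocklist
```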
Filter out HTML by Zork+the+Almighty · 2004-01-28 05:51 · Score: 0

Do what I do and filter out any email which has HTML. I get enough email already. These people are the first to go.

--

In Soviet America the banks rob you!
1. Re:Filter out HTML by dk.r*nger · 2004-01-28 07:18 · Score: 1
  
  And while that might work for you, it won't for those of us who actually need to distinguish spam from ham.
  
  I, for one, am not going to miss a business opportunity (as in a job, not transferring money out of Nigeria) because the poor guy with the money and the standard Outlook setup sends me an HTML mail.
  
  I might also just stop reading email, y'know .. ?
  
  By the way - I send MIME-multipart mails with both the text and HTML version. And I reply above the quote. So shoot me.
2. Re:Filter out HTML by Anonymous Coward · 2004-01-29 06:39 · Score: 0
  
  bang.
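Zork's HTML filter can be made less blunt, which also addresses dk.r*nger's objection. The variation below flags only messages whose sole body is HTML. Mail sent as multipart/alternative with both a plain-text and an HTML part passes. This is a minimal sketch using Python's email package and is not taken from either poster's setup.

```python
from email import message_from_string


def is_html_only(raw_message: str) -> bool:
    """True if the message has an HTML body but no plain-text alternative."""
    msg = message_from_string(raw_message)
    # Leaf parts only; a part with no Content-Type defaults to text/plain.
    types = {part.get_content_type() for part in msg.walk() if not part.is_multipart()}
    return "text/html" in types and "text/plain" not in types
```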
Doing something similar by hords · 2004-01-28 06:30 · Score: 1

I modified qmail to capture a list of all domains into a database, and I can easily blacklist spammers' domains through a web interface I made. It has been pretty effective for me. I'm blocking about 100 emails a minute *after* four RBL blacklists. With a few other techniques on top, I am blocking about 83% of all email *before* SpamAssassin.
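hords does not show his qmail modification, and the comment does not say whether the recorded domains come from senders or from URLs. The storage side might look roughly like the following, using sqlite3 from Python's standard library. The table layout and function names are hypothetical. record_domain would be called from the mail path, and set_blacklisted is what a web interface button would call.

```python
import sqlite3


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS domains ("
        " name TEXT PRIMARY KEY,"
        " hits INTEGER NOT NULL DEFAULT 0,"
        " blacklisted INTEGER NOT NULL DEFAULT 0)"
    )
    return db


def record_domain(db: sqlite3.Connection, domain: str) -> None:
    """Count one more message mentioning this domain."""
    name = domain.lower()
    db.execute("INSERT OR IGNORE INTO domains(name) VALUES (?)", (name,))
    db.execute("UPDATE domains SET hits = hits + 1 WHERE name = ?", (name,))
    db.commit()


def set_blacklisted(db: sqlite3.Connection, domain: str, flag: bool = True) -> None:
    """Mark (or unmark) a domain as blacklisted."""
    db.execute("UPDATE domains SET blacklisted = ? WHERE name = ?",
               (int(flag), domain.lower()))
    db.commit()


def should_reject(db: sqlite3.Connection, domains) -> bool:
    """True if any domain referenced by the message has been blacklisted."""
    names = [d.lower() for d in domains]
    if not names:
        return False
    placeholders = ",".join("?" * len(names))
    row = db.execute(
        f"SELECT 1 FROM domains WHERE blacklisted = 1 AND name IN ({placeholders}) LIMIT 1",
        names,
    ).fetchone()
    return row is not None
```

Keeping hit counts alongside the flag makes it easy to sort the web view by the domains that show up most often.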
Another basic idea by Gudlyf · 2004-01-28 07:54 · Score: 1

As I sit with several domain names available at my disposal, I got to thinking that this may be the way one could combat spam: registering your own domain name. Let me explain.

So I have the domain "blah.com" and I want to register for an eBay account. Instead of simply giving "me@blah.com", I'd register "ebay@blah.com", which would just point to my inbox. Now I can easily filter mail appropriately as it comes through. Not only that, but I can tell which places gave my email address out to spamming companies and act accordingly. I can also give out addresses like "mike.hunt@blah.com" and "george.bush@blah.com" for individuals. If I don't want to hear from them anymore, *poof*, I delete the address.

An even better way to make sure your addresses can't be guessed is to give out nonsense addresses like "tfg57@blah.com" and just make a note of whom each address is assigned to. That way, once I've cut off george.bush, he can't simply email me at mike.hunt just because he knows I still get email from Mike.

I can't imagine this is a new idea, but I figured I'd post it anyway.

--
Trolls lurk everywhere. Mod them down.
1. Re:Another basic idea by cyways · 2004-01-28 12:51 · Score: 1
  
  You don't even need additional 2nd-level domains to do this, just add a 3rd-level domain for this purpose.
  For instance, suppose my normal address is me@mydomain.com, but when I give out my address on websites, I use something like amazon@replies.mydomain.com. In your DNS, just set up an MX record for the subdomain. If you use sendmail, it's easy to add a mailertable entry on the final delivery server like this:
  replies.mydomain.com local:myreplymailbox
  Make sure replies.mydomain.com also appears in /etc/sendmail.cw, or whatever you call your file of local host names.
2. Re:Another basic idea by Sandman1971 · 2004-01-28 14:57 · Score: 1
  
  I've been doing this for years. Your own domain + sendmail aliases is a wonderful, wonderful thing. You'd be surprised, however, at how little spam comes from registering on websites (out of the hundreds of addresses I've created over the years, I can count on one hand how many received spam. Note: these were from non-messageboard type sites).
  
  Most spam comes from having your address posted on some websites. Even newsgroups don't seem to be heavily crawled by spammers. I did a test last year: I posted to a few newsgroups with a honeypot email address and received no spam.
  
  --
  It's better to burn out than to fade away
3. Re:Another basic idea by Anonymous Coward · 2004-01-29 03:50 · Score: 0
  
  Some data points:
  
  CDW had a massive leak of their customer list last year. Some joint called mnjtech (just up the road from them in IL, as it turns out) did it.
  
  When CDNOW sold out to Spamazon, some twit took their customer list and started running address verifiers against it from cable modems in the northeast.
  
  Tagged addresses used with x10.com around 1998 have been getting mail attempts again recently.
  
  I used tagged addresses in all three cases, so it was trivial to turn them into spam traps. Try to mail them and your host gets blocked. Then I check the logs later, do some investigation and block the whole network/domain name/whatever later as appropriate.
  
  The moral of the story is: if you haven't gotten spammed to a tagged address yet, just wait. Sooner or later, some whore at one of those companies will sell you out, and then it will start.
4. Re:Another basic idea by cyt0plas · 2004-01-29 07:14 · Score: 1
  
  http://tmda.net
  
  It lets you do dated addresses that expire after a period of time. It also lets you generate cryptographically signed addresses through the web interface (you-keyword-kht9840w@youraddress.com), so they can't just make them up.
  
  It also allows you to do challenge response where people have to prove they aren't lying about their email address.
  
  --
  Contact Me (got tired of viruses emailing me).
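This thread covers two ideas: Gudlyf's per-correspondent addresses that can be deleted (*poof*), and TMDA-style addresses that are signed so they can't be made up and can be dated to expire. Both can be combined in a short sketch. This is not TMDA's actual address format or algorithm. The user-keyword-EXPIRY-SIG layout, the truncated HMAC-SHA256 signature and the SECRET value are all assumptions for illustration.

```python
import hashlib
import hmac
import time
from typing import Optional

SECRET = b"replace-with-a-long-random-secret"  # hypothetical per-user key


def _signature(tag: str) -> str:
    # HMAC-SHA256 truncated to 8 hex digits: short enough to type, hard to guess.
    return hmac.new(SECRET, tag.encode(), hashlib.sha256).hexdigest()[:8]


def make_address(user: str, domain: str, keyword: str, days_valid: int = 0,
                 now: Optional[float] = None) -> str:
    """Build user-keyword-EXPIRY-SIG@domain; EXPIRY 0 means 'never expires'."""
    if not keyword.isalnum():
        raise ValueError("keyword must be alphanumeric")
    now = time.time() if now is None else now
    expiry = int(now + days_valid * 86400) if days_valid else 0
    tag = f"{keyword}-{expiry}"
    return f"{user}-{tag}-{_signature(tag)}@{domain}"


def check_address(address: str, revoked=frozenset(),
                  now: Optional[float] = None) -> Optional[str]:
    """Return the keyword if the address is genuine, unexpired and not revoked."""
    now = time.time() if now is None else now
    local = address.partition("@")[0]
    try:
        _user, keyword, expiry, sig = local.rsplit("-", 3)
        expiry_ts = int(expiry)
    except ValueError:  # wrong shape: not one of our tagged addresses
        return None
    expected = _signature(f"{keyword}-{expiry}")
    if not hmac.compare_digest(sig.encode(), expected.encode()):
        return None  # made-up or altered address
    if expiry_ts and now > expiry_ts:
        return None  # dated address has run out
    if keyword in revoked:
        return None  # "*poof*": this correspondent has been cut off
    return keyword
```

The keyword that check_address returns also tells you who leaked the address, which is how the Anonymous Coward above traced the CDW and CDNOW leaks.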
http://marc.merlins.org/linux/exim/sa.html by lkcl · 2004-01-28 07:57 · Score: 1

The configuration for exim4 written for sa-exim goes a little further by default: it looks up not only the reverse DNS but also checks for an MX record. The only problem is that if the ISP's configuration is piss-poor broken, e.g. they pretend to be a host for which they themselves do not have a DNS record (yes, I have seen it happen), you will get a response sent to postmaster@sendersdomain ... and if the sender's domain doesn't _have_ a postmaster alias (most don't), the sender will get what is, to them, an unintelligible message about a postmaster not existing. I prefer that to receiving trash: 600 messages a day get rejected by my site, most of them from random compromised Windows hosts.
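The reverse-DNS half of this check is often done as "forward-confirmed" reverse DNS. The connecting IP must have a PTR name, and that name must resolve back to the same IP. A minimal Python sketch follows. It is not sa-exim's configuration. It handles IPv4 only, and the MX lookup is omitted because the standard library cannot query MX records. The resolver functions are parameters so that they can be swapped for stubs.

```python
import socket


def forward_confirmed_rdns(ip: str, reverse=socket.gethostbyaddr,
                           forward=socket.gethostbyname_ex) -> bool:
    """True if ip has a PTR name whose own A records lead back to ip.

    A missing PTR, a name that does not resolve, or a name that points
    somewhere else all count as failures. IPv4 only; no MX check.
    """
    try:
        name, _aliases, _ = reverse(ip)
        _, _, addresses = forward(name)
    except OSError:  # socket.herror / socket.gaierror: no PTR or no A record
        return False
    return ip in addresses
```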
Check URLs' IP addresses against some RBLs... by DocSnyder · 2004-01-28 10:06 · Score: 1

...to get the spamvertised ISP's hat color and adjust spam scores.
A while ago, I made a SpamAssassin patch which resolves any URL found within an email and tests the resulting IP addresses against blacklists which are otherwise used to block unwanted email. A lot of Chinese bulletproof servers' IP addresses are listed on the Spamhaus Block List (SBL) and/or SPEWS as well as on certain *.blackholes.us lists.
1. Re:Check URLs' IP addresses against some RBLs... by chriskenrick · 2004-01-28 12:11 · Score: 1
  
  And while we're on the subject of IP address based block lists, I'll add a mandatory plug for the Weighted Private Block List project.
  
  Check it out; it uses a different approach from any other block list I've seen thus far.
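DocSnyder's approach is to resolve each URL host and then ask DNS blacklists about the resulting IPs. That relies on the standard DNSBL query convention: reverse the IPv4 octets, append the list's zone, and treat a 127.0.0.x answer as "listed". Below is a minimal Python sketch of that convention, not his SpamAssassin patch. Zones are passed in; the SBL, for example, is queried under sbl.spamhaus.org, and each list's usage policy applies. The resolver functions are parameters so that they can be stubbed.

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str) -> str:
    """DNSBL convention: reverse the IPv4 octets and append the list's zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def listed_on(ip: str, zones, resolve=socket.gethostbyname) -> list:
    """Zones whose lookup for ip returns a 127.0.0.x answer (i.e. 'listed')."""
    hits = []
    for zone in zones:
        try:
            answer = resolve(dnsbl_query_name(ip, zone))
        except OSError:  # NXDOMAIN or lookup failure: treat as not listed
            continue
        if answer.startswith("127."):
            hits.append(zone)
    return hits


def url_host_listings(host: str, zones, forward=socket.gethostbyname_ex,
                      resolve=socket.gethostbyname) -> dict:
    """Resolve a spamvertised host, then check each of its IPv4 addresses."""
    try:
        addresses = forward(host)[2]
    except OSError:
        return {}
    return {ip: listed_on(ip, zones, resolve) for ip in addresses}
```

The result can feed a score adjustment rather than an outright block, which matches the "adjust spam scores" framing of the original comment.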
Obscured addresses by menders · 2004-01-28 21:30 · Score: 1

I don't know if that will work. It's too easy to obscure URLs.
follow email transport virus by Anonymous Coward · 2004-01-28 21:45 · Score: 0

Those new email worms are made by a group of insiders who have top-level access to top-level servers. These email worms will just follow emails to everyone back to their homes. There is no filter or rule that can block them.
TO SOLVE THIS PROBLEM OF WORMS WE MUST SECURE THE INSIDE SERVERS. NO SYSTEM ADMIN AT THE TOP LEVEL CAN BE TRUSTED. SORRY FOLKS. My suggestion is that EMAIL TRANSPORT must be redesigned with multi level of accountability features.

Honolulu
Handling first contact with legitimate clients? by tepples · 2004-01-29 16:30 · Score: 1

Oh yeah, the email I want to receive is most likely to come from a hotmail account (or yahoo or AOL), right, sure...

I'll assume that was sarcasm. What if one of your clients uses an account on Hotmail, Yahoo! mail, or AOL mail as his or her primary e-mail account? Or do you whitelist only clients who have approached you through a web form?
question by H0B0 · 2004-02-01 21:26 · Score: 1

All users of cable modems in North America are required to use a 24.xxx IP. So why aren't all email servers required to use a predetermined range of IPs, e.g. 25, 64, or one of the yet-to-be-assigned ranges? If all spammers were kept within one range, it would be far easier to stop spammers (they couldn't rent a fly-by-night IP in a different range, for example), and to catch them when they do spam. There are other benefits (reducing virii, worms, etc.) that would be produced if an email server range were to be established. H0B0