Slashdot Mirror


How to Get Rid of Referrer Spam?

wikinerd asks: "I have recently opened my own community website. Everything was fine until spammers found it, which happened quite quickly. As usual, they filled up my mailboxes, but SpamAssassin can take care of that when needed. Then they discovered my blog and my wikis and employed their bots to fill them up with spam comments. I solved this problem by moderating all comments. Now, however, they have employed another evil trick: referrer spam. They have inflated my webserver statistics by orders of magnitude by making their stupid websites show up on my referrer lists. Unfortunately, my webserver usage statistics are now full of viagra, poker, casino, porn, spyware, and pharmacy sites. I am afraid this is a problem I cannot solve with the knowledge and tools I have at the moment. So I came here to ask Slashdot readers: How can I fight referrer spam, and what tools are available in a GNU/Linux environment to ensure clean, spam-free usage statistics?"

14 of 56 comments

  1. Here's how for Apache by Anonymous Coward · · Score: 5, Informative

    I'll assume you're using Apache and have access to the .conf file, or know someone who does.

    First, you need to set up the log you'll use for statistics to exclude requests marked with a "badreferer" environment variable.

    CustomLog logs/access_log-www.example.com combined env=!badreferer

    The following requires Apache's mod_setenvif module. You can put these lines in the .conf file, or even in .htaccess so you can change them without a restart. If you don't have or want mod_setenvif, you can also use mod_rewrite (add E=badreferer:1 to the flags of your RewriteRule) to do the same thing.

    #Blacklist (adjust as you need)
    SetEnvIfNoCase Referer ".*(credit|hold-em|holdem|mortgage|money|cash|gb.com|4free|teen|pussy|discount|inkjet|fuck|hasfun|casino|gambling|poker|porn|sex|paris|nude|xxx|hilton|adminshop|devaddict|iaea|peng|just-deals|pisx|tecrep-inc|learnhow|phentermine|terashells|psxtreme|freakycheats).*" badreferer
    #Whitelist (optional)
    SetEnvIfNoCase Referer ".*(google|yahoo|alltheweb|search|excite|aol.com|lycos|msn|altavista|XXXX).*" !badreferer
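    For logs that were written before these rules existed, the same blacklist-then-whitelist logic can be applied after the fact. A minimal Python sketch, assuming Apache's "combined" log format; the shortened BLACKLIST and WHITELIST patterns and the function names are illustrative, not part of the config above:

```python
import re

# Illustrative, shortened patterns; substitute the full lists from SetEnvIfNoCase.
BLACKLIST = re.compile(r"(casino|poker|phentermine|holdem|mortgage)", re.IGNORECASE)
WHITELIST = re.compile(r"(google|yahoo|search|msn)", re.IGNORECASE)

# Apache "combined" format:
# host ident user [time] "request" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[[^\]]*\] "[^"]*" \d{3} \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def is_bad_referrer(referer: str) -> bool:
    """Blacklist sets the flag, whitelist clears it, like the two SetEnvIfNoCase lines."""
    return bool(BLACKLIST.search(referer)) and not WHITELIST.search(referer)


def clean_lines(lines):
    """Yield the log lines whose referrer is not flagged; unparseable lines pass through."""
    for line in lines:
        m = COMBINED.match(line)
        if m and is_bad_referrer(m.group("referer")):
            continue
        yield line
```

    Writing the output of clean_lines() to a new file and pointing webalizer or AWStats at it gives roughly what the env=!badreferer CustomLog does for new traffic.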

    Additionally, you can use the same environment variable to deny them access to your site:

    <Limit GET HEAD POST>
    Order Allow,Deny
    Allow from All
    Deny from env=badreferer
    </Limit>

    <LimitExcept GET HEAD POST>
    Order Deny,Allow
    Deny from All
    </LimitExcept>

    1. Re:Here's how for Apache by wowbagger · · Score: 4, Informative

      I'd take it one step further: log the IP addresses of the machines denied by the bad-referrer rule, and report them to their ISPs and to some of the open relay/trojan blacklists.

      You could even try configuring your software to use such blacklists to deny trojaned machines access completely.

      Additionally, if you wanted, you could then add those IP addresses to your firewall rules to drop the requests at the firewall.

      Lastly, you could teergrub them - set things up to...

      Respond...

      Very...

      Slowly...

      To...

      Their...

      Request...
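      Collecting the addresses for reports or firewall rules can be scripted. A minimal Python sketch, assuming the combined log format with HostnameLookups Off (so the first field is an IP address); the pattern, the function names, and the choice of iptables are illustrative. Many of these machines are likely compromised hosts, so review the list before reporting or blocking:

```python
import re
from collections import Counter

# Illustrative pattern; reuse the referrer blacklist you already block in Apache.
BAD_REFERRER = re.compile(r"(casino|poker|phentermine|viagra)", re.IGNORECASE)

COMBINED = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[[^\]]*\] "[^"]*" \d{3} \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def offending_ips(lines, min_hits=1):
    """Return (ip, hits) pairs for clients that sent a blacklisted referrer, most hits first."""
    hits = Counter()
    for line in lines:
        m = COMBINED.match(line)
        if m and BAD_REFERRER.search(m.group("referer")):
            hits[m.group("host")] += 1
    return [(ip, n) for ip, n in hits.most_common() if n >= min_hits]


def iptables_rules(pairs):
    """Turn (ip, hits) pairs into iptables commands dropping that client's port-80 traffic."""
    return ["iptables -A INPUT -s %s -p tcp --dport 80 -j DROP" % ip for ip, _ in pairs]
```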

  2. Did you google before posting this? by j-turkey · · Score: 3, Informative

    I hope I'm not being too rude, but seriously: I googled for referrer spam and, bam, the first result had some decent advice. That was just the first thing that came up; add the word "apache" to your query and you will get some very helpful results. Besides, this is Slashdot... not a trove of reliable information/advice. Just use Apache to start blocking the Mallorys. Also, if you're still posting any kind of statistics or referrers publicly, stop. Spammers wouldn't do this if Bloggers didn't publish that kind of abusable data.

    --

    -Turkey

    1. Re:Did you google before posting this? by bill_mcgonigle · · Score: 4, Informative

      Should you decide to move two centimeters towards rude, slashdot plays nicely with these links.

      --
      My God, it's Full of Source!
      OUTSIDE_IP=$(dig +short my.ip @outsideip.net)
    2. Re:Did you google before posting this? by Jerf · · Score: 3, Informative
      if you're still posting any kind of statistics or referrers publicly, stop. Spammers wouldn't do this if Bloggers didn't publish that kind of abusable data.

      They don't bother checking whether your site publishes their referrers publicly. I don't, and I get it anyway, of course. Also note that my site uses a fairly obscure weblogging platform (PyDS), that I've customized the templates until there's no recognizable signature of any platform on my site, and that I was still getting hammered.

      I've gone with an .htaccess solution. Here's what I'm currently using, updated just today, based on this:
      RewriteEngine On
      RewriteBase /
      RewriteCond %{HTTP_HOST} !^(www.)?jerf.org$ [NC]
      RewriteCond %{HTTP_REFERER} ^(.*)$ [NC]
      RewriteRule ^(.*)$ %1 [R=301,L]
      SetEnvIfNoCase Referer ".*(crescentarian|xanax|datashaping|psxtr|phente|terash|1stchoic|learnhowtoplay|1stchoice|pharmacy|profitbook|auction|cialis|stories-on|levitra|roulette|prozac|debt|discount|\.biz|alumni|cheat|loan|diet|tax\.|exams|krantas|atlanta|paramountseed|web4u|mcdortablar|reservedi|credit|canadianlabels|8gold|texas-hold|hold-em|holdem|fidelityfunding|condo|sportsparent|mortgage|spoodles|money|cash|hotel|houseofseven|stmaryonline|newtruths|popwow|oiline|flafeber|thatwhichis|tmsathai|pisoc|crepesuzette|mediavisor|commerce|easymoney|911|.vi|\.gb\.|gb\.com|4free|macsurfer|teen|pussy|discount|blogincome|lillystar|aizzo|webdevsquare|laser-eye|escal8|xopy|vixen1|linkerdome|youradulthosting|fick|inkjet-toner|fuck|ime.nu|perfume-cologne|italiancharmsbracelets|shoesdiscount|psnarones|hasfun|casino|gambling|poker|porn|sex|paris|gabriola|nude|xxx|hilton|pics|video|adminshop|devaddict|iaea|empathica|insuranceinfo|atelebanon|handy-sms|peng|just-deals|pisx|rimpim).*" BadReferrer
      order deny,allow
      deny from env=BadReferrer
      You'll get spaces in that of course thanks to Slashdot, so either filter them out, or grab it here. (That's a symlink to the real thing, so it includes a couple of things you don't need; if you understand Apache enough to use this, it should be obvious which that is.)

      Don't forget to update the first RewriteCond line to match your server name.

      Unfortunately, this has known false positives, though nothing too bad for me yet. But this approach won't scale; we'll either need something more sophisticated, or we'll need to make referrer spam less useful to spammers until they stop doing it. (The recent rel="nofollow" attribute is a good start, since referrer spam is Yet Another way to try to steal Google Juice.)
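      One way to cut down those false positives is to match the referrer's hostname against a list of known spam domains instead of searching the whole URL for keywords. A minimal Python sketch of that alternative (a different technique from the keyword regex above); the domains in BAD_DOMAINS are hypothetical placeholders:

```python
import re
from urllib.parse import urlsplit

# Keyword matching, as in the SetEnvIfNoCase line (shortened).
KEYWORDS = re.compile(r"(sex|paris|cash)", re.IGNORECASE)

# Hypothetical spam domains, collected from your own logs.
BAD_DOMAINS = {"casino-spam.example", "cheap-pills.example"}


def keyword_match(referer: str) -> bool:
    """Substring search anywhere in the URL: cheap, but flags innocent sites too."""
    return KEYWORDS.search(referer) is not None


def domain_match(referer: str) -> bool:
    """True only if the referrer's host is a listed domain or one of its subdomains."""
    host = (urlsplit(referer).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in BAD_DOMAINS)
```

      For example, http://www.essex.ac.uk/ trips the "sex" keyword but not the domain list. The trade-off is upkeep: spammers rotate domains faster than keywords, so a domain list needs regular additions from the logs.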
  3. PHP bayesian filter. by HansF · · Score: 3, Informative

    You could write a module that would check entries from your referrer log.
    The best way to check if it's spam would be with a bayesian filter.
    Sure, it will take some coding and some training of the filter, but this seems to me like the best option.
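    A minimal sketch of such a filter, written in Python rather than PHP to keep these examples in one language: a two-class naive Bayes classifier over word tokens taken from the referrer URL, with add-one smoothing. The class name, the labels, and the tiny training set one would feed it are illustrative; a real filter needs many hand-labelled referrers from your own logs.

```python
import math
import re
from collections import Counter


def tokens(referer):
    """Split a referrer URL into lowercase alphanumeric tokens (host labels, path words)."""
    return re.findall(r"[a-z0-9]+", referer.lower())


class ReferrerBayes:
    """Two-class multinomial naive Bayes ("spam" vs "ham") over referrer tokens."""

    LABELS = ("spam", "ham")

    def __init__(self):
        self.token_counts = {label: Counter() for label in self.LABELS}
        self.doc_counts = Counter()

    def train(self, referer, label):
        self.token_counts[label].update(tokens(referer))
        self.doc_counts[label] += 1

    def spam_probability(self, referer):
        vocab = set(self.token_counts["spam"]) | set(self.token_counts["ham"])
        total_docs = sum(self.doc_counts.values())
        log_score = {}
        for label in self.LABELS:
            counts = self.token_counts[label]
            # Smoothed prior, so a class with no training data still gets a finite score.
            score = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
            # Add-one smoothing; the extra +1 leaves room for tokens never seen in training.
            denom = sum(counts.values()) + len(vocab) + 1
            for tok in tokens(referer):
                score += math.log((counts[tok] + 1) / denom)
            log_score[label] = score
        # Logistic of the log-score difference, clamped so math.exp cannot overflow.
        diff = max(-700.0, min(700.0, log_score["ham"] - log_score["spam"]))
        return 1.0 / (1.0 + math.exp(diff))
```

    Referrers scoring above some threshold (0.9, say) could then be dropped from the statistics or queued for review.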

    --
    --> Insert Funny Sig Here
  4. Deny them access in the first place by IO+ERROR · · Score: 4, Informative
    Here's some handy Apache rules I've collected in my .htaccess file while fighting comment spammers:
    <IfModule mod_rewrite.c>
    RewriteEngine On
    # Many robots do not handle SGML or HTML correctly. These rules catch them and
    # punish them:
    RewriteRule &amp; - [NC,F,L]
    # Active exploits out in the wild
    RewriteCond %{HTTP_USER_AGENT} ^(LWP) [NC,OR]
    # Comment spammer software
    RewriteCond %{HTTP_USER_AGENT} ^(.*MSIE.*Win.9x.4.90|8484.Boston.Project|grub.crawler|Indy.Library|Java.1|MSIE.*Windows.XP) [NC,OR]
    # Miscellaneous suspicious software
    RewriteCond %{HTTP_USER_AGENT} ^(.*DTS.Agent|libwww-perl|POE-Component-Client|WISEbot|.*WISEnutbot) [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^(Mozilla...0)$ [NC]
    RewriteRule .* - [F,L]

    # Blank user agents, not a trackback
    # Needed because WP before 1.5-beta doesn't include a user-agent
    RewriteCond %{HTTP_USER_AGENT} ^(-?)$
    RewriteCond %{REQUEST_URI} !^(.*trackback) [OR]
    RewriteCond %{REQUEST_METHOD} !^POST$
    RewriteRule .* - [F,L]
    </IfModule>
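    The second rule group is easy to misread, because mod_rewrite ANDs conditions by default and [OR] joins only the conditions it links: the rule fires when the user agent is blank AND (the URI is not a trackback OR the method is not POST). A small Python model of that logic, useful for checking which requests would get a 403; it assumes the method test is meant to be "not exactly POST", and the function name is hypothetical:

```python
import re


def blank_agent_rule_fires(user_agent: str, request_uri: str, method: str) -> bool:
    """Model of: UA is empty or "-" AND (URI lacks "trackback" OR method is not POST)."""
    blank_agent = re.match(r"^(-?)$", user_agent) is not None
    is_trackback = re.match(r"^(.*trackback)", request_uri) is not None
    is_post = method == "POST"
    return blank_agent and (not is_trackback or not is_post)
```

    In other words, a request without a user agent only gets through when it is a POST to a trackback URL.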
    Also consider the SpamAssassin plugin for WordPress, which has also been ported to Movable Type.
    --
    How am I supposed to fit a pithy, relevant quote into 120 characters?
  5. If you use AWSTATS by hairtrigger · · Score: 2, Informative

    There is a patch you can apply, available here, that prevents referrer spam from showing up in AWStats reports.

  6. make it not worth their time by Anonymous Coward · · Score: 2, Informative

    http://www.google.com/googleblog/2005/01/preventing-comment-spam.html

    per googleblog:

    Q: How does a link change?
    A: Any link that a user can create on your site automatically gets a new "nofollow" attribute. So if a blog spammer previously added a comment like

    Visit my <a href="http://www.example.com/">discount pharmaceuticals</a> site.

    That comment would be transformed to

    Visit my <a href="http://www.example.com/" rel="nofollow">discount pharmaceuticals</a> site.

    --

    Just add this to all anonymous or unapproved links... and put a note on your page so spammers know not to bother.
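    Blog engines that support this do the rewriting for you, but a homegrown comment form can do it too. A minimal, regex-based Python sketch (not a full HTML sanitizer; the function name is hypothetical) that adds rel="nofollow" to every <a> tag, merging with an existing rel value:

```python
import re

# Opening <a ...> tags; "\b" keeps <abbr> and similar tags from matching.
ANCHOR_OPEN = re.compile(r"<a\b([^>]*)>", re.IGNORECASE)
# A quoted rel attribute, but not data-rel or similar.
REL_ATTR = re.compile(r"""(?<![\w-])rel\s*=\s*(["'])(.*?)\1""", re.IGNORECASE)


def add_nofollow(html: str) -> str:
    """Add rel="nofollow" to each opening <a> tag that does not already carry it."""

    def fix(match):
        attrs = match.group(1)
        rel = REL_ATTR.search(attrs)
        if rel is None:
            return '<a%s rel="nofollow">' % attrs
        values = rel.group(2).split()
        if "nofollow" in (v.lower() for v in values):
            return match.group(0)
        merged = 'rel="%s"' % " ".join(values + ["nofollow"])
        return "<a%s>" % (attrs[: rel.start()] + merged + attrs[rel.end():])

    return ANCHOR_OPEN.sub(fix, html)
```

    Applied to the googleblog example, add_nofollow() turns the first comment into the second one exactly.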

  7. don't let google see your referrer pages? by aberson · · Score: 2, Informative

    Comment spam can be easily stopped by requiring a password; you can even publish the password right on the website so humans see it and bots don't. I did it for Movable Type and it was pretty easy. As for referrer spam... it seems to me that the only way referrer spam pays off is if your log files are publicly visible and get indexed by Google (etc.), unless I misunderstand referrer spam. So why not just remove all links to your log files, add a robots.txt file, and maybe even password-protect the directory where your log files are stored? I would assume a referrer spambot wouldn't even bother targeting your page unless it knew your referrer logs were linked from your page...

  8. Protect your stats by Anonymous Coward · · Score: 1, Informative

    If you protect your stats with Apache (or other) authentication, robots can't find your stats via Google or other search engines, and the spammers will probably stop. I find that every time I unprotect the stats for openphoto.net, I get referrer-spammed to death.

    $0.02,

    _Michael.

  9. webalizer referrer work-a-round patch by chongo · · Score: 3, Informative
    We started seeing this type of spam back in June of 2004. In our case the referrer spam was attempting to get webalizer to create links in the "top N referrer" table back to their pron sites.

    Our initial attempt to solve this was to complain to the ISP of the referrer spammers. That did no good. The ISP was willing to listen, but not to act.

    We did manage to actually track down the jerks who were doing the referrer spam. They told us that they were attempting to create links back to their sites for better search engine placement.

    Our workaround was twofold. For various reasons we wanted to keep our webalizer stats externally accessible. So we asked bots (at least the ones that follow the rules) not to index our external stats, and we modified webalizer to not form links back to the referrers.

    We edited our robots.txt file to exclude legit bots from our stats:

    User-agent: *
    Disallow: /stats
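    Python's standard urllib.robotparser offers a quick check that those two lines do what is intended for crawlers that honor robots.txt (spambots generally don't). The example.com URLs are placeholders:

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /stats
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Anything under /stats is off limits to every compliant crawler...
stats_allowed = parser.can_fetch("Googlebot", "http://www.example.com/stats/usage_200501.html")
# ...while the rest of the site stays crawlable.
wiki_allowed = parser.can_fetch("Googlebot", "http://www.example.com/wiki/")
print(stats_allowed, wiki_allowed)  # False True
```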

    We also patched webalizer v2.01-10 to no longer form URLs to referrers. Now only a plain-text line without the leading http:// shows up in the table. The original referrer spammers gave up once they lost the links back to their sites.

    The bottom of the 0.basic.patch prevents webalizer from forming links back to referrers. See README-FIRST for details on this patch set.
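    The patch changes webalizer itself, but the idea is simple enough to show in a few lines. A Python sketch (not the patch's code; the function name is hypothetical) of rendering a referrer table cell as escaped plain text, with no <a> tag and no leading scheme:

```python
import html


def referrer_cell(referer: str) -> str:
    """Render one referrer as an HTML table cell containing plain text only."""
    text = referer
    for scheme in ("http://", "https://"):
        if text.lower().startswith(scheme):
            text = text[len(scheme):]
            break
    # Escaping keeps a crafted referrer from injecting its own markup or link.
    return "<td>%s</td>" % html.escape(text)
```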

    --
    chongo (was here) /\oo/\
  10. Put "rel=nofollow" in the referrer links by JoeD · · Score: 2, Informative

    My first suggestion would be to stop publishing the referrer links.

    But if you have to, then put rel="nofollow" in the link itself. This makes Google (and other search engines that support it) discard the link when calculating search rankings.

    Go here for more info.

  11. mod_security by Imabug · · Score: 2, Informative

    I installed mod_security on my server a few weeks ago with a few simple regexes to cover the more prolific referrer spammers recorded by awstats. Set the mod_security default action to deny,status:412. Then in httpd.conf I set the ErrorDocument for the 412 code to an empty file.

    Now when the referer spammer hits my site, they get denied and get nothing back. Bandwidth wasted serving up pages to referer spammers is cut to virtually nil. The spammers are still there banging away and a few still get by though. The list of referrers needs to be monitored so that new mod_security rules can be added as required. That's no different than using mod_rewrite to deny the referrer spammers though.
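    Part of that monitoring can be automated: list the most frequent referrer hosts that none of the current rules cover, then review them by hand before writing new mod_security rules. A minimal Python sketch, assuming the combined log format; KNOWN_BAD and OWN_HOSTS are hypothetical placeholders for your existing patterns and your own site names:

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# Hypothetical placeholders: patterns already blocked, and your own host names.
KNOWN_BAD = re.compile(r"(casino|poker|phentermine)", re.IGNORECASE)
OWN_HOSTS = {"example.org", "www.example.org"}

COMBINED = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[[^\]]*\] "[^"]*" \d{3} \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def unreviewed_referrer_hosts(lines, top=20):
    """Most frequent external referrer hosts not already matched by KNOWN_BAD."""
    counts = Counter()
    for line in lines:
        m = COMBINED.match(line)
        if not m:
            continue
        referer = m.group("referer")
        if referer == "-" or KNOWN_BAD.search(referer):
            continue
        host = urlsplit(referer).hostname
        if host and host not in OWN_HOSTS:
            counts[host] += 1
    return counts.most_common(top)
```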

    --
    "For I am a Bear of Very Little Brain, and Long Words Bother Me"