Slashdot Mirror


How to Get Rid of Referrer Spam?

wikinerd asks: "I have recently opened my own community website. Everything was fine until spammers found it, which happened quite quickly. As usual they filled up my mailboxes, but SpamAssassin can take care of that when it is needed. Then, they discovered my blog and my wikis and employed their bots to fill them up with spam comments. I solved this problem by moderating all comments. Now, however, they have employed another evil trick: referrer spam. They caused my webserver statistics to grow by orders of magnitude by making their stupid websites show up on my referrer lists. Unfortunately, my webserver usage statistics are now full of viagra, poker, casino, porn, spyware, and pharmacy sites. I am afraid that this is a problem I cannot solve with the knowledge and the tools I have at the moment. So, I came here to ask Slashdot readers: How can I fight referrer spam and what tools are available in a GNU/Linux environment to ensure clean and spam-free usage statistics?"

5 of 56 comments

  1. Here's how for Apache by Anonymous Coward · · Score: 5, Informative

    I'll assume you're using Apache and have access to the .conf, or someone that does.

    First, you need to set up the log you'll use for statistics to exclude requests marked with a "badreferer" environment variable.

    CustomLog logs/access_log-www.example.com combined env=!badreferer

    The following requires Apache's SetEnvIf module. You can put these lines in .conf, or even in .htaccess so you can change them without a restart. If you don't have/want SetEnvIf, you can also use mod_rewrite (E=badreferer:1 at the end of your RewriteRule) to do the same thing.

    #Blacklist (adjust as you need)
    SetEnvIfNoCase Referer ".*(credit|hold-em|holdem|mortgage|money|cash|gb.com|4free|teen|pussy|discount|inkjet|fuck|hasfun|casino|gambling|poker|porn|sex|paris|nude|xxx|hilton|adminshop|devaddict|iaea|peng|just-deals|pisx|tecrep-inc|learnhow|phentermine|terashells|psxtreme|freakycheats).*" badreferer
    #Whitelist (optional)
    SetEnvIfNoCase Referer ".*(google|yahoo|alltheweb|search|excite|aol.com|lycos|msn|altavista|XXXX).*" !badreferer
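
    The mod_rewrite route mentioned above could look roughly like the following. This is a minimal sketch, not a tested drop-in: the three-word pattern is only an illustrative subset of the blacklist, and it assumes mod_rewrite is loaded.

    ```apache
    <IfModule mod_rewrite.c>
    RewriteEngine On
    # Illustrative subset of the blacklist; extend it to match your own list.
    RewriteCond %{HTTP_REFERER} (casino|poker|phentermine) [NC]
    # "-" leaves the URL untouched; E= only sets the environment variable.
    RewriteRule .* - [E=badreferer:1]
    </IfModule>
    ```

    For keeping the log clean this should be enough. I believe that rewrite rules in .htaccess run too late for access-control directives to see the variable, so test before relying on this variant for blocking.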

    Additionally, you can use the same blocks to deny them access to your site:

    <Limit GET HEAD POST>
    Order Allow,Deny
    Allow from All
    Deny from env=badreferer
    </Limit>

    <LimitExcept GET HEAD POST>
    Order Deny,Allow
    Deny from All
    </LimitExcept>
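
    On Apache 2.4 and later, Order/Allow/Deny are deprecated in favour of the Require directive from mod_authz_core (the old forms only work through mod_access_compat). Assuming the badreferer variable is set by the SetEnvIf lines above, a roughly equivalent sketch would be:

    ```apache
    # Sketch for Apache 2.4+: allow everyone except requests flagged badreferer.
    <RequireAll>
        Require all granted
        Require not env badreferer
    </RequireAll>
    ```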

    1. Re:Here's how for Apache by wowbagger · · Score: 4, Informative

      I'd take it one step further - log the IP addresses of the machines denied by the bad referrer, and report them to their ISP, and to some of the open relay/trojan blacklists.
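
      Collecting those IP addresses should not need extra software. As a sketch, assuming the badreferer variable from the parent comment, a second CustomLog can record only the flagged requests. The log path and the "spamreferer" format name are made up for illustration:

      ```apache
      # Hypothetical format name and log path. CustomLog belongs in the server
      # or virtual host config, not in .htaccess.
      LogFormat "%h %t \"%{Referer}i\" \"%{User-Agent}i\"" spamreferer
      # Write an entry only when the badreferer variable is set for the request.
      CustomLog logs/referer_spam_log spamreferer env=badreferer
      ```

      Each line then starts with the client address (%h), which is the part you would report or feed to a firewall.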

      You could even try configuring your software to use such blacklists to deny trojaned machines access completely.

      Additionally, if you wanted, you could then add those IP addresses to your firewall rules to drop the requests at the firewall.

      Lastly, you could teergrube (tarpit) them - set things up to...

      Respond...

      Very...

      Slowly...

      To...

      Their...

      Request...

  2. Re:Did you google before posting this? by bill_mcgonigle · · Score: 4, Informative

    Should you decide to move two centimeters towards rude, slashdot plays nicely with these links.

    --
    My God, it's Full of Source!
    OUTSIDE_IP=$(dig +short my.ip @outsideip.net)
  3. Deny them access in the first place by IO+ERROR · · Score: 4, Informative
    Here's some handy Apache rules I've collected in my .htaccess file while fighting comment spammers:
    <IfModule mod_rewrite.c>
    RewriteEngine On
    # Many robots do not handle SGML or HTML correctly. These rules catch them and
    # punish them:
    RewriteRule &amp; - [NC,F,L]
    # Active exploits out in the wild
    RewriteCond %{HTTP_USER_AGENT} ^(LWP) [NC,OR]
    # Comment spammer software
    RewriteCond %{HTTP_USER_AGENT} ^(.*MSIE.*Win.9x.4.90|8484.Boston.Project|grub.crawler|Indy.Library|Java.1|MSIE.*Windows.XP) [NC,OR]
    # Miscellaneous suspicious software
    RewriteCond %{HTTP_USER_AGENT} ^(.*DTS.Agent|libwww-perl|POE-Component-Client|WISEbot|.*WISEnutbot) [NC,OR]
    # Last condition in the chain, so no OR flag here
    RewriteCond %{HTTP_USER_AGENT} ^(Mozilla...0)$ [NC]
    RewriteRule .* - [F,L]

    # Blank user agents, not a trackback
    # Needed because WP before 1.5-beta doesn't include a user-agent
    RewriteCond %{HTTP_USER_AGENT} ^(-?)$
    RewriteCond %{REQUEST_URI} !^(.*trackback) [OR]
    RewriteCond %{REQUEST_METHOD} !^POST$
    RewriteRule .* - [F,L]
    </IfModule>
    Also consider the SpamAssassin plugin for WordPress which has also been ported to MovableType.
    --
    How am I supposed to fit a pithy, relevant quote into 120 characters?
  4. Re:Did you google before posting this? by BoomerSooner · · Score: 4, Insightful

    For those of you out there who still cannot figure it out: Ask Slashdot is for the poster, but it can also provide relevant information to other people who hadn't thought about the problem in the same way. For example, I do not host any blogs at my company, but if I decided to, I would have this question and its answers as a good reference (in addition to googling).

    Googling isn't always the best option; people frequently contribute things to this blog that you cannot duplicate with a simple query on Google.

    And last but not least, you can always turn Ask Slashdot off in your preferences.

    So for the last fucking time: YES HE CAN GOOGLE IT BUT SHE DECIDED TO ASK SLASHDOT INSTEAD. Move on.