How to Get Rid of Referrer Spam?
wikinerd asks: "I have recently opened my own community website. Everything was fine until spammers found it, which happened quite quickly. As usual they filled up my mailboxes, but SpamAssassin can take care of that when it is needed. Then they discovered my blog and my wikis and employed their bots to fill them up with spam comments. I solved this problem by moderating all comments. Now, however, they have employed another evil trick: referrer spam. They made my webserver statistics grow by orders of magnitude by getting their stupid websites to show up on my referrer lists. Unfortunately my webserver usage statistics are now full of viagra, poker, casino, porn, spyware, and pharmacy sites. I am afraid this is a problem I cannot solve with the knowledge and tools I have at the moment. So I came here to ask Slashdot readers: How can I fight referrer spam, and what tools are available in a GNU/Linux environment to ensure clean, spam-free usage statistics?"
I'll assume you're using Apache and have access to the .conf, or know someone who does.
First, you need to set up the log you'll use for statistics to exclude requests marked with a "badreferer" environment variable (the CustomLog line has to go in the server or virtual-host config):
CustomLog logs/access_log-www.example.com combined env=!badreferer
The following requires Apache's SetEnvIf module (mod_setenvif). You can put these lines in your .conf, or even in .htaccess so you can change them without a restart. If you don't have/want SetEnvIf, you can also use mod_rewrite (add [E=badreferer:1] to the flags of your RewriteRule) to do the same thing. The whitelist line comes after the blacklist so it can clear the flag again for search engines.
#Blacklist (adjust as you need)
SetEnvIfNoCase Referer ".*(credit|hold-em|holdem|mortgage|money|cash|gb.com|4free|teen|pussy|discount|inkjet|fuck|hasfun|casino|gambling|poker|porn|sex|paris|nude|xxx|hilton|adminshop|devaddict|iaea|peng|just-deals|pisx|tecrep-inc|learnhow|phentermine|terashells|psxtreme|freakycheats).*" badreferer
#Whitelist (optional)
SetEnvIfNoCase Referer ".*(google|yahoo|alltheweb|search|excite|aol.com|lycos|msn|altavista|XXXX).*" !badreferer
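If your logs are already polluted, the same blacklist/whitelist logic can be applied offline to scrub an existing combined-format log before re-running your statistics. This is a minimal Python sketch, not part of Apache; it assumes the standard combined log format and uses only a shortened subset of the lists above, so paste in your full alternations.

```python
import re
from typing import Iterable, Iterator

# Shortened subset of the blacklist/whitelist above; replace with your full lists.
# re.search is unanchored, so Apache's ".*( ... ).*" wrapper is not needed here.
BLACKLIST = re.compile(r"(credit|holdem|mortgage|casino|gambling|poker|porn|phentermine)", re.IGNORECASE)
WHITELIST = re.compile(r"(google|yahoo|alltheweb|search|excite|aol\.com|lycos|msn|altavista)", re.IGNORECASE)

# Combined Log Format: host ident user [time] "request" status bytes "referer" "user-agent"
COMBINED = re.compile(r'^\S+ \S+ \S+ \[[^\]]*\] "[^"]*" \S+ \S+ "(?P<referer>[^"]*)"')


def is_bad_referer(referer: str) -> bool:
    """Blacklist sets the flag, then the whitelist clears it, like the two SetEnvIfNoCase lines."""
    bad = BLACKLIST.search(referer) is not None
    if WHITELIST.search(referer):
        bad = False
    return bad


def clean_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield log lines whose referer is not flagged; unparseable lines pass through unchanged."""
    for line in lines:
        m = COMBINED.match(line)
        if m and is_bad_referer(m.group("referer")):
            continue
        yield line
```

For example, writing `clean_lines(open("access_log"))` out to a new file gives your log analyzer a copy without the flagged hits.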
Additionally, you can use the same environment variable to deny them access to your site:
<Limit GET HEAD POST>
Order Allow,Deny
Allow from All
Deny from env=badreferer
</Limit>
<LimitExcept GET HEAD POST>
Order Deny,Allow
Deny from All
</LimitExcept>
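The two Order lines are easy to misread, so here is a small Python model of how they decide. It follows the documented semantics (Order Allow,Deny denies by default and a Deny match wins; Order Deny,Allow allows by default and an Allow match wins) and uses a hypothetical short referer pattern as a stand-in for the badreferer variable. It illustrates the decision logic only; Apache does not run anything like it.

```python
import re

# Hypothetical stand-in for however badreferer gets set (see the SetEnvIfNoCase lines).
BAD_REFERER = re.compile(r"(casino|poker|porn|phentermine)", re.IGNORECASE)
LIMITED_METHODS = {"GET", "HEAD", "POST"}


def order_allow_deny(matches_allow: bool, matches_deny: bool) -> bool:
    # Order Allow,Deny: denied by default; the client must match an Allow
    # and must not match any Deny.
    return matches_allow and not matches_deny


def order_deny_allow(matches_allow: bool, matches_deny: bool) -> bool:
    # Order Deny,Allow: allowed by default; a Deny match is refused
    # unless an Allow also matches.
    return matches_allow or not matches_deny


def is_allowed(method: str, referer: str) -> bool:
    if method in LIMITED_METHODS:
        # <Limit GET HEAD POST>: Allow from All, Deny from env=badreferer
        return order_allow_deny(True, BAD_REFERER.search(referer) is not None)
    # <LimitExcept GET HEAD POST>: Deny from All, no Allow line
    return order_deny_allow(False, True)
```

A refused request gets a 403 Forbidden, so spam referrers hitting the site with GET still cost you a log line, which is why the CustomLog filter is worth keeping alongside the deny rules.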
I hope I'm not being too rude, but seriously, I googled for referrer spam and bam... the first result had some decent advice. This was just the first thing that came up. Add the word "apache" to your query and you will get some very helpful results. Besides, this is Slashdot... not a trove of reliable information/advice. Just start using Apache to block the Mallorys. Also, if you're still posting any kind of statistics or referrers publicly, stop. Spammers wouldn't do this if bloggers didn't publish that kind of abusable data.
-Turkey
You could write a module that checks entries from your referrer log.
The best way to decide whether an entry is spam would be a Bayesian filter.
Sure, it will take some coding and training of the filter, but this seems to me like the best option.
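To make the Bayesian idea concrete, here is a minimal sketch of a multinomial naive Bayes classifier over referrer URLs, with Laplace smoothing. The tokenizer (lowercase alphanumeric runs) and the `ReferrerBayes` class are assumptions for illustration, not an existing tool; in practice you would train it on referrers you have labeled by hand from your own log.

```python
import math
import re
from collections import Counter


def tokens(url: str) -> list[str]:
    """Split a referrer URL into lowercase alphanumeric tokens (host parts, path words)."""
    return re.findall(r"[a-z0-9]+", url.lower())


class ReferrerBayes:
    """Hypothetical tiny naive Bayes classifier: labels are "spam" and "ham"."""

    def __init__(self) -> None:
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.docs = Counter()

    def train(self, url: str, label: str) -> None:
        self.counts[label].update(tokens(url))
        self.docs[label] += 1

    def spam_probability(self, url: str) -> float:
        toks = tokens(url)
        # Include the URL's own tokens so unseen words get a smoothed count.
        vocab = set(self.counts["spam"]) | set(self.counts["ham"]) | set(toks)
        total_docs = sum(self.docs.values())
        log_p = {}
        for label in ("spam", "ham"):
            words = self.counts[label]
            denom = sum(words.values()) + len(vocab)
            # log prior (smoothed) + sum of smoothed log likelihoods
            lp = math.log((self.docs[label] + 1) / (total_docs + 2))
            for tok in toks:
                lp += math.log((words[tok] + 1) / denom)
            log_p[label] = lp
        # Normalize in log space to avoid underflow on long URLs.
        top = max(log_p.values())
        e_spam = math.exp(log_p["spam"] - top)
        e_ham = math.exp(log_p["ham"] - top)
        return e_spam / (e_spam + e_ham)
```

Usage: call `train()` on a few dozen labeled referrers, then drop log entries whose `spam_probability()` exceeds a threshold you pick (0.9, say) before feeding the log to your statistics package.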
--> Insert Funny Sig Here
How am I supposed to fit a pithy, relevant quote into 120 characters?
Our initial attempt to solve this was to complain to the ISP of the referrer spammers. That did no good. The ISP was willing to listen, but not to act.
We did manage to actually track down the jerks who were doing the referrer spam. They told us that they were attempting to create links back to their sites for better search engine placement.
Our workaround was twofold. For various reasons we wanted to keep our webalizer stats externally accessible. So we asked bots (at least the ones that follow the rules) not to index our external stats, and we modified webalizer to not form links back to the referrers.
We edited our robots.txt file to exclude legit bots from our stats:
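A minimal sketch of such a rule, assuming the webalizer output lives under a hypothetical /usage/ path, is a two-line robots.txt. The Python below embeds those lines and checks them with the standard library's `urllib.robotparser`, so you can confirm a compliant crawler would skip the stats pages while still indexing the rest of the site. As noted above, this only helps against bots that honor robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: assumes the stats pages are served under /usage/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /usage/
"""


def crawlable(url: str, agent: str = "Googlebot") -> bool:
    """Ask the stdlib robots.txt parser whether a compliant bot may fetch url."""
    rp = RobotFileParser()
    rp.parse(ROBOTS_TXT.splitlines())
    return rp.can_fetch(agent, url)
```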
We also patched webalizer v2.01-10 to no longer form URLs to referrers. Now only a plain text line without the leading http:// shows up in the table. The original referrer spammers gave up when they lost the links back to their sites.
The bottom of the 0.basic.patch prevents webalizer from forming links back to referrers. See README-FIRST for details on this patch set.
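The idea behind that change can be shown in a few lines. This Python sketch is not the actual 0.basic.patch (which is a C patch against webalizer); `referrer_cell` is a hypothetical helper that renders a referrer table cell as escaped plain text with the scheme (http:// and the like) stripped and no `<a href>`, so a spammer's entry gives them no link at all.

```python
import html
import re

# Matches a leading URL scheme such as "http://" or "https://".
SCHEME = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*://")


def referrer_cell(referrer: str) -> str:
    """Render a referrer as plain, HTML-escaped text: no link, no leading scheme."""
    text = SCHEME.sub("", referrer)
    return f"<td>{html.escape(text)}</td>"
```

Escaping matters as well as dropping the link: a referrer string is attacker-controlled, and without `html.escape` a crafted value could inject its own markup into the report.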
chongo (was here)