How to Get Rid of Referrer Spam?

Posted by Cliff on Friday February 4, 2005 @03:55AM from the get-a-net-bouncer dept.

wikinerd asks: "I have recently opened my own community website. Everything was fine until spammers found it, which happened quite quickly. As usual they filled up my mailboxes, but SpamAssassin can take care of that when it is needed. Then, they discovered my blog and my wikis and employed their bots to fill them up with spam comments. I solved this problem by moderating all comments. Now, however, they have employed another evil trick: Referrer spam. They caused my webserver statistics to grow by orders of magnitude by making their stupid websites show up on my referrer lists. Unfortunately now my webserver usage statistics are full of viagra, poker, casino, porn, spyware, and pharmacy sites. I am afraid that this is a problem I cannot solve with the knowledge and the tools I have at the moment. So, I came here to ask Slashdot readers: How can I fight referrer spam and what tools are available in a GNU/Linux environment to ensure clean and spam-free usage statistics?"

56 comments

Here's how for Apache by Anonymous Coward · 2005-02-04 03:56 · Score: 5, Informative

I'll assume you're using Apache and have access to the .conf, or know someone who does. First, you need to set up the log you'll use for statistics to exclude requests marked with a "badreferer" environment variable:

    CustomLog logs/access_log-www.example.com combined env=!badreferer

The following requires Apache's SetEnvIf module. You can put these lines in the .conf, or even in .htaccess so you can change them without a restart. If you don't have/want SetEnvIf, you can also use mod_rewrite (E=badreferer:1 at the end of your RewriteRule) to do the same thing.

    # Blacklist (adjust as you need)
    SetEnvIfNoCase Referer ".*(credit|hold-em|holdem|mortgage|money|cash|gb.com|4free|teen|pussy|discount|inkjet|fuck|hasfun|casino|gambling|poker|porn|sex|paris|nude|xxx|hilton|adminshop|devaddict|iaea|peng|just-deals|pisx|tecrep-inc|learnhow|phentermine|terashells|psxtreme|freakycheats).*" badreferer
    # Whitelist (optional)
    SetEnvIfNoCase Referer ".*(google|yahoo|alltheweb|search|excite|aol.com|lycos|msn|altavista|XXXX).*" !badreferer

Additionally, you can use the same variable to deny them access to your site:

    <Limit GET HEAD POST>
    Order Allow,Deny
    Allow from All
    Deny from env=badreferer
    </Limit>
    <LimitExcept GET HEAD POST>
    Order Deny,Allow
    Deny from All
    </LimitExcept>
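
If you can't touch the Apache config, the same blacklist/whitelist idea can be applied to an existing log before feeding it to your stats tool. This is only a rough sketch: the pattern lists are shortened, hypothetical stand-ins for the ones in the post, and the regex assumes the standard "combined" log format.

```python
import re

# Hypothetical, shortened pattern lists; extend them as you would the Apache ones.
BLACKLIST = re.compile(r"casino|poker|porn|phentermine|holdem", re.IGNORECASE)
WHITELIST = re.compile(r"google|yahoo|altavista", re.IGNORECASE)

# A combined-format line ends with: "request" status bytes "referer" "user-agent"
COMBINED = re.compile(r'"[^"]*" \d{3} \S+ "(?P<referer>[^"]*)" "[^"]*"$')


def is_bad_referer(referer):
    """Mirror the SetEnvIf pair: the blacklist sets the flag, the whitelist clears it."""
    bad = bool(BLACKLIST.search(referer))
    if WHITELIST.search(referer):
        bad = False
    return bad


def clean_lines(lines):
    """Yield only the log lines whose referrer is not flagged."""
    for line in lines:
        match = COMBINED.search(line.rstrip("\n"))
        if match and is_bad_referer(match.group("referer")):
            continue
        yield line
```

Lines that don't parse are passed through untouched, so a format mismatch fails open rather than silently dropping traffic.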
1. Re:Here's how for Apache by wowbagger · 2005-02-04 04:25 · Score: 4, Informative
  
  I'd take it one step further - log the IP addresses of the machines denied by the bad referrer, and report them to their ISP, and to some of the open relay/trojan blacklists.
  
  You could even try configuring your software to use such blacklists to deny trojaned machines access completely.
  
  Additionally, if you wanted, you could then add those IP addresses to your firewall rules to drop the requests at the firewall.
  
  Lastly, you could teergrub them - set things up to...
  
  Respond...
  
  Very...
  
  Slowly...
  
  To...
  
  Their...
  
  Request...
  
  --
  www.eFax.com are spammers
2. Re:Here's how for Apache by Anonymous Coward · 2005-02-04 04:29 · Score: 0
  
  I've noticed a lot of it is done through ISP proxies. Block those, and you end up blocking a LARGE number of visitors.
3. Re:Here's how for Apache by wowbagger · 2005-02-04 04:36 · Score: 1
  
  I've noticed a lot of it is done through ISP proxies. Block those, and you end up blocking a LARGE number of visitors.
  
  Which is why you contact the ISP of the originating connection - to get them to clean up their act.
  
  And if they are unwilling or unable to do so - are you really losing (for the /. crowd - loosing) that much?
  
  And you can also look for the proxied-for headers, and use them to further refine your lists.
  
  --
  www.eFax.com are spammers
4. Re:Here's how for Apache by Anonymous Coward · 2005-02-04 04:42 · Score: 0
  
  Well, in PHP or Perl or whatever language, finding the proxied headers would be relatively easy, but I don't think even mod_rewrite has this capability.
Sample, please? by AtariAmarok · 2005-02-04 04:07 · Score: 1

Could Wikinerd or Cliff post an example of how these appear in Wikinerd's blog? I have a guestbook myself that gets filled with things that say "great site" from some dumb address like cara@aol.com, and then it is filled with a bunch of keyword HTML links to randomly-generated .info sites (5544f45.info, etc) that all go to one of those useless spammy search engines.

--
Don't blame Durga. I voted for Centauri.
1. Re:Sample, please? by Otter · 2005-02-04 04:20 · Score: 1
  
  That's comment spam. Referrer spam targets (I think) sites that have a viewable list of top referrers.
  Here's a series of posts dealing with the issue on LGF. (Note: I'm posting this link in the context of referrer spamming -- no political statement is intended, and no political arguing over it is desired.)
  
  --
  What I'm listening to now on Pandora...
Did you google before posting this? by j-turkey · 2005-02-04 04:08 · Score: 3, Informative

I hope I'm not being too rude, but seriously, I googled for referrer spam and bam...first result had some decent advice. This was just the first thing that came up. Add the word "apache" to your query and you will get some very helpful results. Besides, this is Slashdot...not a trove of reliable information/advice. Just start using Apache to start blocking the Mallorys. Also, if you're still posting any kind of statistics or referrers publicly, stop. Spammers wouldn't do this if Bloggers didn't publish that kind of abusable data.

--

-Turkey
1. Re:Did you google before posting this? by bill_mcgonigle · 2005-02-04 04:17 · Score: 4, Informative
  
  Should you decide to move two centimeters towards rude, slashdot plays nicely with these links.
  
  --
  My God, it's Full of Source!
  OUTSIDE_IP=$(dig +short my.ip @outsideip.net)
2. Re:Did you google before posting this? by BoomerSooner · 2005-02-04 04:49 · Score: 4, Insightful
  
  For those of you out there that still cannot figure it out. Ask slashdot is for the poster but also can provide relevant information to other people that didn't think of the problem in the same way. For example, I do not host any blogs at my company but if I decided to I would have this question and answer set as a good reference (in addition to googling).
  
  Googling info isn't always the best, frequently people contribute things to this blog that you cannot duplicate by a simple query on google.
  
  And last but not least you can always turn ask slashdot off in your preferences....
  
  So for the last fucking time: YES HE CAN GOOGLE IT BUT SHE DECIDED TO ASK SLASHDOT INSTEAD. Move on.
3. Re:Did you google before posting this? by j-turkey · 2005-02-04 05:20 · Score: 1
  
  So for the last fucking time: YES HE CAN GOOGLE IT BUT SHE DECIDED TO ASK SLASHDOT INSTEAD. Move on.
  
  Hey, be nice. Was I really impolite (kinda like you're being right now)? Did I, or did I not provide helpful information to the poster?
  
  Lighten up, Francis.
  
  --
  
  -Turkey
4. Re:Did you google before posting this? by Jerf · 2005-02-04 05:43 · Score: 3, Informative
  
  if you're still posting any kind of statistics or referrers publicly, stop. Spammers wouldn't do this if Bloggers didn't publish that kind of abusable data.
  
  They don't bother checking to see if your site publishes their referrers publicly. I don't and I have it anyhow, of course. Also note my site uses a fairly obscure weblogging platform (PyDS), and that I've also customized the templates until there's no recognizable signature of any platform on my site, and I was still getting hammered.
  
  I've gone with an .htaccess solution. Here's what I'm currently using, updated just today, based on this:
      RewriteEngine On
      RewriteBase /
      RewriteCond %{HTTP_HOST} !^(www.)?jerf.org$ [NC]
      RewriteCond %{HTTP_REFERER} ^(.*)$ [NC]
      RewriteRule ^(.*)$ %1 [R=301,L]
      SetEnvIfNoCase Referer ".*(crescentarian|xanax|datashaping|psxtr|phente|terash|1stchoic|learnhowtoplay|1stchoice|pharmacy|profitbook|auction|cialis|stories-on|levitra|roulette|prozac|debt|discount|\.biz|alumni|cheat|loan|diet|tax\.|exams|krantas|atlanta|paramountseed|web4u|mcdortablar|reservedi|credit|canadianlabels|8gold|texas-hold|hold-em|holdem|fidelityfunding|condo|sportsparent|mortgage|spoodles|money|cash|hotel|houseofseven|stmaryonline|newtruths|popwow|oiline|flafeber|thatwhichis|tmsathai|pisoc|crepesuzette|mediavisor|commerce|easymoney|911|.vi|\.gb\.|gb\.com|4free|macsurfer|teen|pussy|discount|blogincome|lillystar|aizzo|webdevsquare|laser-eye|escal8|xopy|vixen1|linkerdome|youradulthosting|fick|inkjet-toner|fuck|ime.nu|perfume-cologne|italiancharmsbracelets|shoesdiscount|psnarones|hasfun|casino|gambling|poker|porn|sex|paris|gabriola|nude|xxx|hilton|pics|video|adminshop|devaddict|iaea|empathica|insuranceinfo|atelebanon|handy-sms|peng|just-deals|pisx|rimpim).*" BadReferrer
      order deny,allow
      deny from env=BadReferrer
  Slashdot tends to insert spaces into long strings like that, so filter out any you find, or grab it here. (That's a symlink to the real thing, so it includes a couple of things you don't need; if you understand Apache enough to use this, it should be obvious which that is.)
  
  Don't forget to update the first RewriteCond line to match your server name.
  
  Unfortunately, this has known false positives, but nothing too bad for me yet. But this approach won't scale; we'll either need something more sophisticated, or to make it less useful for referrer spammers until they stop doing it. (The recent "nofollow" tag is a good start, since it's Yet Another way to try to steal Google Juice.)
5. Re:Did you google before posting this? by yog · 2005-02-04 05:47 · Score: 3, Insightful
  
  Yeah, I find ask slashdot useful too. When you filter out the "Why didn't you just google it, moron?" type comments and the "why would you want to do that anyway" trolls, you sometimes get some useful information and discussion regarding the various ways to solve the O.P.'s problem.
  
  I see Ask Slashdot not as a substitute for a simple keyword search but rather a supplemental verification process. I have found that keyword searches don't necessarily reveal best practices; you get unedited, unrefuted claims that you have to sift through. In a reasonably informed techie discussion forum like Slashdot (sometimes), you can get some interesting debate and comparisons on various approaches and methodologies.
  
  And, as you noted, it's a way to be exposed to problems which I don't currently have but might someday. Then when I encounter the problem, I hope a little fragment of memory in my aging brain will bubble to the surface to remind me that it's been discussed on Slashdot.
  
  For researching technical problems, the best thing is to combine Google, Slashdot, Usenet newsgroups, and specialty forums such as (in the O.P.'s case) webhostingtalk.com, spend a little time in each place and take notes. From amongst voluminous chaff generally there's a bit of wheat to be harvested. ;-)
  
  At the risk of belaboring the obvious, it should also be noted that the way to put useful information out there in the first place so that googlers can find it is precisely this sort of forum. Google is only your friend if there's something out there worth searching for.
  
  --
  it's = "it is"; its = possessive. E.g., it's flapping its wings.
6. Re:Did you google before posting this? by fm6 · 2005-02-04 05:50 · Score: 1
  
  I don't blame people for naively posting lame "I'm stuck" questions. I do blame editors for being too lazy to filter them out. And (not for the last time, alas): IT DOESN'T MAKE SENSE TO POST A QUESTION ON SLASHDOT UNLESS IT WILL LEAD TO AN INTERESTING DISCUSSION. A question that can be answered by a simple google is not very interesting.
7. Re:Did you google before posting this? by Anonymous Coward · 2005-02-04 05:55 · Score: 0
  
  See the first post in this discussion for how to whitelist certain referer words.
8. Re:Did you google before posting this? by Jerf · 2005-02-04 06:08 · Score: 1
  
  AC says: See the first post in this discussion for how to whitelist certain referer words.
  
  Thanks. I figured there was something easy to do but I didn't care to dig too deeply :-)
9. Re:Did you google before posting this? by dattaway · 2005-02-04 07:45 · Score: 1
  
  Often when I google for something, it's difficult to get any useful search results besides, "why don't you google for it." While I can appreciate finding information without involvement of a forum directly, searching for information sometimes turns into a recursive black hole.
  
  What I have seen here is a better compilation of information than I have seen yet. So I thank the person for asking.
PHP bayesian filter. by HansF · 2005-02-04 04:09 · Score: 3, Informative

You could write a module that would check entries from your referrer log.
The best way to check if it's spam would be with a bayesian filter.
Sure, it will take some coding / training the filter, but this seems to me like the best option.

--
--> Insert Funny Sig Here
1. Re:PHP bayesian filter. by Jerf · 2005-02-04 05:55 · Score: 1
  
  It seems extremely unlikely to me that a Bayesian filter could work. There isn't enough for it to get a hold of. Plus, too much of the referrer spam is entirely new sites, which can be made up arbitrarily.
  
  Bayesian filters are cool and all, but they aren't magic. If you don't understand them, then when you're wondering "why hasn't somebody tried using a Bayesian filter for this problem?", the answer is probably "because it isn't an appropriate solution". After they got popular for spam, there was a mini-renaissance where people applied them to everything... mostly with mixed or poor results.
  
  Bayes requires a lot of tokens in a relatively constrained domain to work; the set of English and English-like words used to spam things is fairly constrained vs. the number of times you'll receive them as spam. Each referrer spamming only gives you very limited info, and the range of domains is much wider... until you see "psxtreme.com" actually go by, you can't have a clue whether it is spam or not, and the fact that "ilovemycheese.com" is spam can't help you decide. This explanation is in non-technical English and is a deliberate simplification.
2. Re:PHP bayesian filter. by Anonymous Coward · 2005-02-04 06:00 · Score: 0
  
  What we probably need is a bunch of honeypots to pick up on automated spammers and a fast blacklist to stop them. It'd be best to use similar techniques to those used by Project Honeypot, where you hide the links from humans and only automated robots find them. Then, when enough of these pages get referer spam from the same ip address, you can blacklist that ip address.
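
  The counting step described here is easy to sketch. The following is only an illustration of the idea, not how Project Honeypot actually does it; the function name, the input shape and the threshold of 3 are all assumptions.

```python
from collections import defaultdict


def ips_to_blacklist(hits, threshold=3):
    """Pick out IPs that referrer-spammed enough distinct honeypot pages.

    hits: iterable of (ip, honeypot_url) pairs, one per spammy request seen.
    threshold: hypothetical cut-off; tune it to how many honeypots you run.
    """
    pages_by_ip = defaultdict(set)
    for ip, url in hits:
        pages_by_ip[ip].add(url)  # a set, so repeat hits on one page count once
    return {ip for ip, pages in pages_by_ip.items() if len(pages) >= threshold}
```

  Counting distinct pages rather than raw hits means one confused visitor reloading a single hidden page never reaches the threshold.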
3. Re:PHP bayesian filter. by HansF · 2005-02-04 06:34 · Score: 1
  
  Well, I must admit I haven't had any personal experience with this specific referrer-spam problem. But I've tried the PHP module and think it learns pretty fast. Maybe I should experiment with the referred script and some URLs a little later.
  Personally I think it's as good as any solution because it will be smarter and more adaptive than most word-filter ideas mentioned in this thread.
  Furthermore, you raised a valid point. A url is quite limited to filter. But maybe the script could get the referring page and filter that content too?
  
  --
  --> Insert Funny Sig Here
4. Re:PHP bayesian filter. by Jerf · 2005-02-04 08:36 · Score: 1
  
  That might work. You're getting into arms race mode, though; it'd be easy to lie to the server you just spammed with any nice page, even including a nice link to the site you just spammed, while being a spam page for everybody else. The text your Bayes filter receives no longer necessarily matches what is being sent out. That wouldn't be perfect, but that won't bother the spammers.
  
  Remember, in general, against an intelligent human attacker, only intelligent human vigilance can win. You can "what if, what if, what if" to your heart's content and it can even be fun, but you can't write a program to defeat a determined intelligent human all by itself.
  
  As for your first paragraph which boils down to "But I think it would work", I would encourage you to try it. Maybe I'm wrong. But I've done my time with Bayesian filters, and in my considered and dare I say expert opinion, you're not going to be able to Bayes filter incoming HTTP requests meaningfully. It's only an opinion, but it's an informed one, and it'll take a bit more than "I've tried the PHP module and it learns pretty fast" to change my mind.
You should.... by rudy_wayne · 2005-02-04 04:17 · Score: 2, Funny

Take off and nuke 'em from orbit.

Just to be sure.
1. Re:You should.... by xmas2003 · 2005-02-04 04:34 · Score: 1
  
  Just want to echo parent's comments - it's a losing battle ... and if you publish 'em, they will come.
  I have the web analysis program (Analog) generate privately with the referrers, but anything I put out does NOT show that. For those interested, I have a page about referrer log spamming.
  
  --
  Hulk SMASH Celiac Disease
2. Re:You should.... by thegrassyknowl · 2005-02-04 15:20 · Score: 1
  
  Reminds me of the part of HGTG when Ford and Arthur stumble upon the crashed ship full of telephone booth sanitisers and advertising agency workers, etc.
  
  Perhaps we should launch all these questionable people into orbit and crash them into the nearest star?
  
  --
  I drink to make other people interesting!
Deny them access in the first place by IO+ERROR · 2005-02-04 04:17 · Score: 4, Informative

Here's some handy Apache rules I've collected in my .htaccess file while fighting comment spammers:
    <IfModule mod_rewrite.c>
    RewriteEngine On
    # Many robots do not handle SGML or HTML correctly. These rules catch them and
    # punish them:
    RewriteRule & - [NC,F,L]
    # Active exploits out in the wild
    RewriteCond %{HTTP_USER_AGENT} ^(LWP) [NC,OR]
    # Comment spammer software
    RewriteCond %{HTTP_USER_AGENT} ^(.*MSIE.*Win.9x.4.90|8484.Boston.Project|grub.crawler|Indy.Library|Java.1|MSIE.*Windows.XP) [NC,OR]
    # Miscellaneous suspicious software
    RewriteCond %{HTTP_USER_AGENT} ^(.*DTS.Agent|libwww-perl|POE-Component-Client|WISEbot|.*WISEnutbot) [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^(Mozilla...0)$ [NC]
    RewriteRule .* - [F,L]
    # Blank user agents, not a trackback
    # Needed because WP before 1.5-beta doesn't include a user-agent
    RewriteCond %{HTTP_USER_AGENT} ^(-?)$
    RewriteCond %{REQUEST_URI} !^(.*trackback) [OR]
    RewriteCond %{REQUEST_METHOD} !^POST$
    RewriteRule .* - [F,L]
    </IfModule>
Also consider the SpamAssassin plugin for WordPress which has also been ported to MovableType.

--
How am I supposed to fit a pithy, relevant quote into 120 characters?
1. Re:Deny them access in the first place by IO+ERROR · 2005-02-04 04:21 · Score: 1
  
  And /. eats my post alive. Two corrections: "grub.crawler" and "WISEbot" should not have spaces in them.
  
  --
  How am I supposed to fit a pithy, relevant quote into 120 characters?
2. Re:Deny them access in the first place by Anonymous Coward · 2005-02-04 04:23 · Score: 0
  
  They'll still show up in his logs unless he uses an environment variable to not log them.
3. Re:Deny them access in the first place by IO+ERROR · 2005-02-04 05:53 · Score: 1
  
  True enough, but the analysis tool should be smart enough to ignore 403 errors when generating statistics anyway. Not logging referer spam is a good idea, but stopping them before they can start is the best way of dealing with comment spam.
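
  If your analysis tool isn't that smart, skipping the denied requests yourself is not much work. A minimal sketch, assuming combined-format log lines; the function name and the choice to skip only 403s are mine, not any particular tool's behaviour.

```python
import re
from collections import Counter

# Pulls the status code and referrer out of a combined-format log line.
LINE = re.compile(r'"[^"]*" (?P<status>\d{3}) \S+ "(?P<referer>[^"]*)"')


def top_referrers(lines, n=10):
    """Count referrers, ignoring requests the server answered with 403."""
    counts = Counter()
    for line in lines:
        match = LINE.search(line)
        if not match:
            continue  # not a combined-format line
        if match.group("status") == "403":
            continue  # denied request, so it should not show up in the stats
        referer = match.group("referer")
        if referer in ("", "-"):
            continue  # no referrer sent
        counts[referer] += 1
    return counts.most_common(n)
```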
  
  --
  How am I supposed to fit a pithy, relevant quote into 120 characters?
Solution for blog spam by Knights+who+say+'INT · 2005-02-04 04:24 · Score: 1

At least for WordPress. It's called Spam Karma. I'm lazy, Google for it.

If Spam Karma finds questionable words in comments -- it's configurable, and it comes with a good default list -- it sends users to a captcha. If they fail at the captcha -- and they're not on a strongbad keyword list like "viagra" and "vegas poker" -- the comments are sent for moderation.

Works great for me. Nope, the URL in my profile is not my blog anymore, it's on my own server, it's in Portuguese and I ain't gonna expose my server to a slashdotting.
Password by rehannan · 2005-02-04 04:35 · Score: 1, Interesting

I just password protect the directory with the server stats.
1. Re:Password by Profane+MuthaFucka · 2005-02-04 05:46 · Score: 1
  
  That really works. I was getting a pile of spam hits, but I put a password on the log stats directory and it's dropped off a bit.
  
  --
  Fascism trolls keeping me up every night. When I starts a preachin', he HITS ME WITH HIS REICH!
2. Re:Password by Anonymous Coward · 2005-02-05 00:25 · Score: 0
  
  That's an extremely important thing to do. Public server logs are the REASON for referrer spam. The spammers don't want to reach server admins. They want to place links to their sites to improve their search engine positions. If there were fewer public logs, referrer spam would become useless and die off after a couple of decades...
3. Re:Password by sglane81 · 2005-02-05 12:20 · Score: 1
  
  That's a great idea, but if you don't link to it in the first place, the search engines won't know it's there (I accept the fact that users might do this for you or you might have done it long ago and can't reverse it). I would also suggest a trick on the spam spiders like this:
  
  1. Set up your robots.txt to disallow a random directory.
  2. Put an index file in there that will add any ip that visits the page to your firewall blocklist.
  
  What happens is that ALL good spiders obey the robots.txt and the bad ones use that file for harvesting more stuff. So, the bad spider will follow it and your script will block it at the firewall.
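
  A rough sketch of the two halves of that trap. The directory name is made up, and the actual firewall step is left out because it depends entirely on your setup; this only shows that a robots.txt-respecting crawler would stay out, and how to collect the IPs that didn't.

```python
from urllib.robotparser import RobotFileParser

TRAP_DIR = "/trap-7f3a/"  # hypothetical directory name, pick your own

ROBOTS_TXT = f"User-agent: *\nDisallow: {TRAP_DIR}\n"


def polite_bot_may_fetch(path, agent="ExampleBot"):
    """Would a crawler that honours ROBOTS_TXT request this path?"""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, path)


def trap_visitors(requests):
    """requests: iterable of (ip, path) pairs. Returns the IPs that entered the trap."""
    return {ip for ip, path in requests if path.startswith(TRAP_DIR)}
```

  Whatever ends up in the returned set is what you would hand to your firewall script.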
  
  --
  This is the Internet. You can say "fuck" here. - AC
4. Re:Password by Profane+MuthaFucka · 2005-02-05 16:45 · Score: 1
  
  Absolutely a beautiful idea. I will do that. Thanks.
  
  --
  Fascism trolls keeping me up every night. When I starts a preachin', he HITS ME WITH HIS REICH!
If you use AWSTATS by hairtrigger · 2005-02-04 05:01 · Score: 2, Informative

There is a patch you can apply, available here that will prevent referer spam from showing up in reports.
Fight back, if you have the time by Anonymous Coward · 2005-02-04 05:07 · Score: 0

Obviously you should not be publishing referrers unless you have a way to filter them (see other comments), but since you *are* getting spammed, you could take a moment out to fight back a bit; e.g., you can run up the spammer's bandwidth charges.
make it not worth their time by Anonymous Coward · 2005-02-04 05:13 · Score: 2, Informative

http://www.google.com/googleblog/2005/01/preventing-comment-spam.html

Per googleblog:

Q: How does a link change?
A: Any link that a user can create on your site automatically gets a new "nofollow" attribute. So if a blog spammer previously added a comment like

    Visit my <a href="http://www.example.com/">discount pharmaceuticals</a> site.

That comment would be transformed to

    Visit my <a href="http://www.example.com/" rel="nofollow">discount pharmaceuticals</a> site.

Just add this for all anonymous or unapproved links, and make a note on your page so spammers know not to bother.
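
For blog software that doesn't do this yet, the transformation can be sketched in a few lines. This is a regex-based illustration, not what any blog engine actually ships, and the function name is made up. It leaves links that already carry a rel attribute alone, which a real implementation would probably want to override for untrusted input.

```python
import re

# Matches an opening <a ...> tag; \b keeps it from matching <abbr>, <address>, etc.
A_TAG = re.compile(r"<a\b([^>]*)>", re.IGNORECASE)


def add_nofollow(html):
    """Append rel="nofollow" to every <a> tag that has no rel attribute yet."""
    def fix(match):
        attrs = match.group(1)
        if re.search(r"\brel\s*=", attrs, re.IGNORECASE):
            return match.group(0)  # existing rel attribute: left untouched here
        return f'<a{attrs} rel="nofollow">'
    return A_TAG.sub(fix, html)
```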
It's nice to be little by tekiegreg · 2005-02-04 05:19 · Score: 1

At my own homepage (codesweep.com):

A) The code for it is homemade, would be a pain in the butt to re-tool a bot for little old me vs. all the livejournal, blogger, etc. sites out there...
B) I'm so insignificant out there with such low traffic the spammers probably wouldn't care anyways
C) If the spammers do start caring, I can code my blog around them to defeat them. So far it hasn't created a problem, but the stronger the problem the stronger my response will be...

--
...in bed
don't let google see your referrer pages? by aberson · 2005-02-04 05:45 · Score: 2, Informative

Comment spam can be easily stopped by requiring a password - you can even publish the password right on the website so humans see it and bots don't. I did it for Movable Type and it was pretty easy. As for referrer spam... it seems to me that the only way referrer spam is fruitful is if your log files are publicly visible and if they are parsed by Google (etc.), unless I don't understand referrer spam. So why not just remove all links to your logfiles, add a robots.txt file, and maybe even password protect where your logfiles are stored? I would assume that a referrer spambot wouldn't even try to target your page unless it knew your referrer logs were linked off your page...
1. Re:don't let google see your referrer pages? by Anonymous Coward · 2005-02-04 06:02 · Score: 0
  
  If you don't want Google to record the page, just add a noindex robots meta tag. If you don't want links on the page to be followed, add nofollow, or add the new rel="nofollow" to the links you don't want followed.
Whois? by jmitchel!jmitchel.co · 2005-02-04 06:55 · Score: 1

I've taken to filtering my e-mail with whois and by protocol deviations. I can see how I could be wrong, but I'm guessing that the same approach can be thrown at the refer spammers, that:
1> The headers their clients send are different than those of ordinary clients.
2> That the properties revealed by whois are different for refer spammer clients than for ordinary clients.
3> That the whois properties for the spam refer sites are different than those of legitimate sites.

I'll bet that ignoring input from/referring to China and Korea is a good start, and that the bogus sites will tend to cluster in identifiable networks.
Protect your stats by Anonymous Coward · 2005-02-04 07:06 · Score: 1, Informative

If you protect your stats with apache/whatever authentication then robots can't find your stats via google/whatever search engines, and they will probably stop spamming you. I find that every time I unprotect the stats for openphoto.net I get referer-spammed to death.

$0.02,

_Michael.
1. Re:Protect your stats by Kelson · 2005-02-04 08:26 · Score: 1
  
  Nah. My stats page has never been visible without a password, and I get referrer spam all the time. But frankly, I don't care, because it isn't doing them any good.
  
  It's the comment/trackback spam that bugs me, and like another poster said, Spam Karma (on Wordpress, anyway) seems to be working wonders. (This is after trying built-in moderation, three strikes, stopgap, and several other methods)
Captcha, captcha, captcha. by Stavr0 · 2005-02-04 07:13 · Score: 1

Captcha any referral that's not white-listed.
Captcha access to the referral log.
Nobody's said this yet?? by zcat_NZ · 2005-02-04 07:34 · Score: 1

I added /stats/ to my robots.txt.

The stats pages no longer show up on any search engines, so a) The spammers get no 'pagerank' from those links (which is what they do it for) and b) they can't find the stats pages.

I was getting shitloads of referer spam; within a week (as soon as google updated) it dropped to nothing. I've had no referer spam AT ALL since then.

Perhaps they'll start just crawling the entire web, but it appears that at the moment they do a google search to find pages that post their referer stats.

--
455fe10422ca29c4933f95052b792ab2
1. Re:Nobody's said this yet?? by Anonymous Coward · 2005-02-04 07:42 · Score: 0
  
  A friend of mine's blog (which I admin) has never had referers published anywhere and it still gets referer spam. I think some of them may just spam blogs regardless of what's there.
webalizer referrer work-a-round patch by chongo · 2005-02-04 08:02 · Score: 3, Informative

We started seeing this type of spam back in June of 2004. In our case the referrer spam was attempting to get webalizer to create links in the "top N referrer" table back to their pron sites.
Our initial attempt to solve this was to complain to the ISP of the referrer spammers. That did no good. The ISP was willing to listen, but not to act.
We did manage to actually track down the jerks who were doing the referrer spam. They told us that they were attempting to create links back to their sites for better search engine placement.
Our work-a-round was twofold. For various reasons we wanted to keep our webalizer stats externally accessible. So we requested bots (the ones that follow the rules at least) to not index our external stats and we modified webalizer to not form links back to the referrers.
We edited our robots.txt file to exclude legit bots from our stats:
    User-agent: *
    Disallow: /stats

We also patched webalizer v2.01-10 to no longer form URLs to referrers. Now only a plain text line without the leading http:// shows up in the table. The original referrer spammers gave up when they lost all of the links back to their sites.
The bottom of the 0.basic.patch prevents webalizer from forming links back to referrers. See README-FIRST for details on this patch set.
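
If you generate your own referrer table rather than using webalizer, the same effect (plain text, no scheme, no link) can be sketched like this. It is an illustration of the idea only, not the contents of the patch, and the function name is made up.

```python
import html
import re


def plain_referrer(url):
    """Render a referrer as escaped plain text with no scheme and no <a> tag."""
    # Drop a leading scheme such as http:// or https:// so the text is not a full URL.
    stripped = re.sub(r"^[a-z][a-z0-9+.-]*://", "", url, flags=re.IGNORECASE)
    # Escape it so nothing in the referrer string can inject markup into the page.
    return html.escape(stripped)
```

The escaping matters as much as the missing link: the referrer header is attacker-controlled text going straight into an HTML page.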

--
chongo (was here) /\oo/\
Put "rel=nofollow" in the referrer links by JoeD · 2005-02-04 08:32 · Score: 2, Informative

My first suggestion would be to stop publishing the referrer links.

But if you have to, then put "rel=nofollow" in the link itself. This makes Google (and other search engines) discard the link when calculating search rankings.

Go here for more info.
Use Google's own trick by MarkRose · 2005-02-04 10:25 · Score: 1

It was originally intended for comment spam, but just add the same rel="nofollow" to your referrer lists. Read about it. Granted, this won't prevent it, but if everyone starts doing this, this technique will become useless for spammers.

--
Be relentless!
mod_security by Imabug · 2005-02-04 12:51 · Score: 2, Informative

I installed mod_security on my server a few weeks ago with a few simple regexes to cover the more prolific referrer spammers recorded by awstats. Set the mod_security default action to deny,status:412. Then in httpd.conf I set the ErrorDocument for the 412 code to an empty file.

Now when the referer spammer hits my site, they get denied and get nothing back. Bandwidth wasted serving up pages to referer spammers is cut to virtually nil. The spammers are still there banging away and a few still get by though. The list of referrers needs to be monitored so that new mod_security rules can be added as required. That's no different than using mod_rewrite to deny the referrer spammers though.

--
"For I am a Bear of Very Little Brain, and Long Words Bother Me"
another solution by app13b0y · 2005-02-04 19:11 · Score: 1

I believe the problem with spam lies with the stupid lusers that actually click on the links and purchase stuff from them. Let's take a look at some of the latest spam...

Porn: anybody that wants good porn knows to look at p2p solutions (just look in the right spots, it's all there for free)
viagra, etc: if you don't know that it doesn't work, you're an idiot
free stuff: nothing in life is free
special service: there are always strings attached
correct your account information: if you get your identity "stolen" in a scam, you don't even belong using a computer in the first place. perhaps also get rid of your credit cards because they might be "stolen" when you write down your card number and pin and leave it at an internet cafe for a bunch of geeks, basically the same outcome.

Now that we've classified 75% of all spam, let's move on.

There are several ways to solve the problem in weblogs. The main one is to combine the AHBL from sosdg (a list of proxies, iirc) with logging the IPs of commenters. This way anybody who uses a known proxy won't be able to post, and you can ban IPs that post annoying comments anyway. This can help a lot.
- The next step is to reformat all links to include the rel="nofollow" attribute mentioned above.
- Use apache2 and linux for hosting your site (a tad offtopic); this will just keep you more secure in general (NO TROLLING WARS PLEASE!)
- Go after the source: help the sosdg (http://www.sosdg.org) by giving them some computer resources or whatever else they could use to track down open proxies, known spammers, etc. and help take them down!
The sosdg took some of the biggest spammers in Spain down by blocking them until their ISPs folded and got rid of the spammers. Surprisingly enough, the sosdg and their blacklists have spoiled the riches of many spammers, both by emails, comment stoppers, etc.
- Use one of those python scripts so that each time a comment is to be left, the person has to type in the numbers and letters shown in an image.

Probably the best method is to use a combination of all of these. I hope this helps
1. Re:another solution by Anonymous Coward · 2005-02-05 02:06 · Score: 0
  
  viagra, etc: if you don't know that it doesn't work, you're an idiot
  
  No my friend, you're the idiot. Viagra is medically licensed and does indeed work.
Get it right by Anonymous Coward · 2005-02-05 05:52 · Score: 0

If you're going to quote a source, quote it correctly. "I say we dust off and nuke the site from orbit. It's the only way to be sure." It's also nice to reference the original source, in this case Ripley, from the movie "Aliens" And the web source you used to verify it, in this case http://en.wikiquote.org/wiki/Aliens
Ask a turing question by cyberphotographer · 2005-02-05 22:01 · Score: 0

I wrote a php script that only offers an email address or allows form submission if the client can answer a simple question correctly. Seems to work well except once about a year ago when someone was still trying to enter 'Clinton' as the name of the president. Here's an example