Stopping SpamBots With Apache
primetyme writes: "Sick of email harvesting spam robots cruising your Apache-based site? Here's an in-depth article that shows one way you can configure a base Apache installation to keep those nasty bots off your site - and the spam out of your Inbox." Anything that helps annoy spammers is a good thing.
is to give your site a terrible color scheme, like purple & brown. ;-)
If you celebrate Xmas, befriend me (538
is to not install Apache at all. Instead, throw up a year- or two-old copy of Microsoft IIS and watch the viruses propagate. You won't have enough bandwidth or enough minutes of up-time to be able to serve pages with email addresses on them ;-)
First it was the hack to reboot systems asking for your default.ida file. Now it is code to trap and kill spiders...
What is an Apache admin to do? It is so configurable that there doesn't appear to be anything it can't do. What is next: using Apache to brew my morning coffee (well, there is the coffee pot cam; anyone know what web server it ran on?), write my website for me, solve world hunger?
WHY WHY WHY do people run IIS anyway? I would love to see what it would take to do this with IIS. Any takers?
Why bash the president? Let's fvck bin Laden with:
worthlessPOS@taliban.gov, ROOTofallevil@taliban.gov, some MAP address
Checking the user agent won't work for long. How hard will it be for the spammers to change the user agent to "Mozilla..."?
Using some client side Javascript would be harder for them to deal with (although if your browser can view it they will be able to also).
I guess graphics would be next...
Surfing slowly, in the Bandwidth Ghetto
I have been using this method for a long time. I don't know how new that article is, but I used it long ago to block not only all the spambots I could find, but all of the software for mirroring my web page as well.
Here is a longer list of common spam bots and mirror bots that I have been able to find:
SetEnvIfNoCase User-Agent "EmailSiphon" bad_bot
SetEnvIfNoCase User-Agent "EmailWolf" bad_bot
SetEnvIfNoCase User-Agent "CherryPickerSE" bad_bot
SetEnvIfNoCase User-Agent "CherryPickerElite" bad_bot
SetEnvIfNoCase User-Agent "Crescent" bad_bot
SetEnvIfNoCase User-Agent "EmailCollector" bad_bot
SetEnvIfNoCase User-Agent "MCspider" bad_bot
SetEnvIfNoCase User-Agent "bew" bad_bot
SetEnvIfNoCase User-Agent "Deweb" bad_bot
SetEnvIfNoCase User-Agent "FEZhead" bad_bot
SetEnvIfNoCase User-Agent "Fetcher" bad_bot
SetEnvIfNoCase User-Agent "Getleft" bad_bot
SetEnvIfNoCase User-Agent "GetURL" bad_bot
SetEnvIfNoCase User-Agent "HTTrack" bad_bot
SetEnvIfNoCase User-Agent "IBM_Planetwide" bad_bot
SetEnvIfNoCase User-Agent "KWebGet" bad_bot
SetEnvIfNoCase User-Agent "Monster" bad_bot
SetEnvIfNoCase User-Agent "Mirror" bad_bot
SetEnvIfNoCase User-Agent "NetCarta" bad_bot
SetEnvIfNoCase User-Agent "OpaL" bad_bot
SetEnvIfNoCase User-Agent "PackRat" bad_bot
SetEnvIfNoCase User-Agent "pavuk" bad_bot
SetEnvIfNoCase User-Agent "PushSite" bad_bot
SetEnvIfNoCase User-Agent "Rsync" bad_bot
SetEnvIfNoCase User-Agent "Shai" bad_bot
SetEnvIfNoCase User-Agent "Spegla" bad_bot
SetEnvIfNoCase User-Agent "SpiderBot" bad_bot
SetEnvIfNoCase User-Agent "SuperBot" bad_bot
SetEnvIfNoCase User-Agent "tarspider" bad_bot
SetEnvIfNoCase User-Agent "Templeton" bad_bot
SetEnvIfNoCase User-Agent "WebCopy" bad_bot
SetEnvIfNoCase User-Agent "WebFetcher" bad_bot
SetEnvIfNoCase User-Agent "WebMiner" bad_bot
SetEnvIfNoCase User-Agent "webvac" bad_bot
SetEnvIfNoCase User-Agent "webwalk" bad_bot
SetEnvIfNoCase User-Agent "w3mir" bad_bot
SetEnvIfNoCase User-Agent "XGET" bad_bot
SetEnvIfNoCase User-Agent "Wget" bad_bot
SetEnvIfNoCase User-Agent "WebReaper" bad_bot
SetEnvIfNoCase User-Agent "WUMPUS" bad_bot
SetEnvIfNoCase User-Agent "FAST-WebCrawler" bad_bot
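Note that the list on its own only tags matching requests with the bad_bot variable; something still has to refuse them. A minimal sketch of that step, assuming the Apache 1.3/2.0-era Order/Allow/Deny directives and a placeholder directory path that is not from the post:

```apache
# Placeholder path; substitute your own document root.
<Directory "/var/www/html">
    # Allow is evaluated first, then Deny, so a matching Deny wins.
    Order Allow,Deny
    Allow from all
    # Refuse any request that SetEnvIfNoCase tagged as bad_bot.
    Deny from env=bad_bot
</Directory>
```

On Apache 2.4 the same idea is normally expressed with Require directives (for example, Require not env bad_bot inside a RequireAll block) rather than Order/Allow/Deny.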
http://www.archive.org/details/ThePowerOfNightmares
On the other hand, there are ways to fight spambots; they just don't rely on trusting the user. Here's one way:
There are good ways to deal with spammers but this isn't one of them. It *might* work on a small scale, but it definitely won't work on a medium or large scale. It's about as useful as the Sendmail "MX/domain validation" trick that Eric Raymond and the rest of the Sendmail team thought would stop spammers dead in their tracks. (It didn't.) Instead he was "surprised by spam."
-CT
"Here are a couple of the User-Agents that fell for our trap that I pulled out of last months access_log for lists.evolt.org:
Wget/1.6"
Email spider, my ass! Wget is a damn useful HTTP downloader utility which is great for obtaining large files as it can resume interrupted transfers. It can also mirror web sites, which I assume is why it fell into the honeypot. Oh, and you can also change what it says it is on the command line.
And to add my 2 cents to the email problems, one other solution I've seen is to translate email addresses into an image and drop that onto the page. It's not a fantastic solution for those still using Lynx, and you can no longer just click to send mail to somebody, but at least it doesn't go the Javascript route and should be a sufficient technical hurdle to stop automated harvesters for a couple of years at least.
- Anonymous and happy.
I used the tip from the article and put
Disallow: /email-addresses/
in my robots.txt, then in my .htaccess:
ForceType application/x-httpd-php
and in email-addresses:
and chgrp'd .htaccess to the web user's group. This will provide me a list of unique IPs in my .htaccess.
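For anyone trying to picture the end result, here is a rough sketch of what the .htaccess might look like once the trap has caught a few visitors. The Files wrapper, the Order/Allow/Deny lines, and the two addresses (reserved documentation addresses) are all assumptions for illustration, since the post only mentions the ForceType line and a list of IPs. It also assumes the server's AllowOverride setting permits these directives.

```apache
# Assumption: the trap is a single extensionless file named
# "email-addresses", served through PHP via ForceType.
<Files "email-addresses">
    ForceType application/x-httpd-php
</Files>

# Assumption: the trap script appends one Deny line per caught address.
Order Allow,Deny
Allow from all
Deny from 192.0.2.10
Deny from 198.51.100.77
```

Making the file group-writable by the web server user is what would let a script append to it, which is also why this setup deserves some care on a shared host.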
I do selective agent blocking using mod_rewrite directives in .htaccess files. The article claims that mod_rewrite is difficult to learn, but I disagree, and its major advantage is especially visible in shared/virtual hosting environments. If Apache was compiled with mod_rewrite support, anyone on the system can create their own set of agent filters and place them in an .htaccess file. You don't need access to httpd.conf!
The syntax is simple:
#Send filesucking programs to hell
RewriteEngine on
RewriteCond %{HTTP_USER_AGENT} ^FlashGet.* [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Teleport.* [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^NetAnts.* [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^WebZIP.* [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Offline\ Explorer.* [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^GetRight.* [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Go!Zilla.* [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^wget.* [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^JetCar.* [NC]
RewriteRule ^.*$ /nofilesucking.php [L]
Seems effective enough for me, and it ain't tough to learn when you can find an example. Of course this does rely on the idea that filesucking programs (or email harvesting bots) identify themselves, but I think naysayers would be surprised at how many of them do just that.
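If you would rather refuse these clients outright than send them to a page, one variation is to answer with 403 Forbidden. A minimal sketch, assuming the same .htaccess context; the two agent names are just a shortened sample, and using [F] is an assumption for illustration, not the setup described in this comment:

```apache
RewriteEngine on
# Match the start of the User-Agent header, ignoring case.
RewriteCond %{HTTP_USER_AGENT} ^wget [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^JetCar [NC]
# "-" leaves the URL untouched; [F] returns 403 Forbidden.
RewriteRule .* - [F]
```

The last RewriteCond in the chain must not carry the OR flag, otherwise the conditions no longer group the way the list suggests.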
Shaun
Thanks to the War on Drugs, it's easier to buy meth than it is to buy cold medicine!
Long ago I heard of a CGI script by the name of WebPoison. It would generate a page of random text; the first set of text would be random words that all linked to differently parsed URLs right back to the same page. The second and much longer set of text was a long list of randomly generated bogus e-mail addresses. Because the recursive links were all different (and random) it would theoretically cause a spambot to continually follow a circular path and constantly retrieve hundreds of fake e-mail addresses (thus the name WebPoison -- it poisons their list).
There were some flaws. You'd need a webserver that let you run CGI scripts without necessarily having .cgi show up in the URL (to fool the spambots) and you'd have to have some mechanism to check that the random addresses did not use real domains. Might also use up your bandwidth as bots got stuck, but you could then use their IP to file a complaint against their ISP (and ban them from hitting your server in the future).
Sadly, I've not found any information on it recently. Perhaps someone could hack out a more efficient version of such to address potential problems and bugs.
STOP MISUSING APOSTROPHES, YOU MORONS!!!
This is incorrect. You want to use abuse@[127.0.0.1] as the address.
One big difference - MSN discriminated against valid browsers that were just people trying to view their website. The user agent IDs here (with a coupla exceptions - *cough* wget *cough*) are all things that are only ever used for spam purposes. There is a difference between blocking people because they don't use your software and blocking spam robots.