Stopping SpamBots With Apache
primetyme writes: "Sick of email harvesting spam robots cruising your Apache-based site? Here's an in-depth article that shows one way you can configure a base Apache installation to keep those nasty bots off your site - and the spam out of your Inbox." Anything that helps annoy spammers is a good thing.
I have been using this method for a long time. I don't know how new that article is, but I used it long ago to block not only all the spambots I could find, but also all of the software for mirroring my webpage.
Here is a longer list of common spam bots and mirror bots that I have been able to find:
SetEnvIfNoCase User-Agent "EmailSiphon" bad_bot
SetEnvIfNoCase User-Agent "EmailWolf" bad_bot
SetEnvIfNoCase User-Agent "CherryPickerSE" bad_bot
SetEnvIfNoCase User-Agent "CherryPickerElite" bad_bot
SetEnvIfNoCase User-Agent "Crescent" bad_bot
SetEnvIfNoCase User-Agent "EmailCollector" bad_bot
SetEnvIfNoCase User-Agent "MCspider" bad_bot
SetEnvIfNoCase User-Agent "bew" bad_bot
SetEnvIfNoCase User-Agent "Deweb" bad_bot
SetEnvIfNoCase User-Agent "FEZhead" bad_bot
SetEnvIfNoCase User-Agent "Fetcher" bad_bot
SetEnvIfNoCase User-Agent "Getleft" bad_bot
SetEnvIfNoCase User-Agent "GetURL" bad_bot
SetEnvIfNoCase User-Agent "HTTrack" bad_bot
SetEnvIfNoCase User-Agent "IBM_Planetwide" bad_bot
SetEnvIfNoCase User-Agent "KWebGet" bad_bot
SetEnvIfNoCase User-Agent "Monster" bad_bot
SetEnvIfNoCase User-Agent "Mirror" bad_bot
SetEnvIfNoCase User-Agent "NetCarta" bad_bot
SetEnvIfNoCase User-Agent "OpaL" bad_bot
SetEnvIfNoCase User-Agent "PackRat" bad_bot
SetEnvIfNoCase User-Agent "pavuk" bad_bot
SetEnvIfNoCase User-Agent "PushSite" bad_bot
SetEnvIfNoCase User-Agent "Rsync" bad_bot
SetEnvIfNoCase User-Agent "Shai" bad_bot
SetEnvIfNoCase User-Agent "Spegla" bad_bot
SetEnvIfNoCase User-Agent "SpiderBot" bad_bot
SetEnvIfNoCase User-Agent "SuperBot" bad_bot
SetEnvIfNoCase User-Agent "tarspider" bad_bot
SetEnvIfNoCase User-Agent "Templeton" bad_bot
SetEnvIfNoCase User-Agent "WebCopy" bad_bot
SetEnvIfNoCase User-Agent "WebFetcher" bad_bot
SetEnvIfNoCase User-Agent "WebMiner" bad_bot
SetEnvIfNoCase User-Agent "webvac" bad_bot
SetEnvIfNoCase User-Agent "webwalk" bad_bot
SetEnvIfNoCase User-Agent "w3mir" bad_bot
SetEnvIfNoCase User-Agent "XGET" bad_bot
SetEnvIfNoCase User-Agent "Wget" bad_bot
SetEnvIfNoCase User-Agent "WebReaper" bad_bot
SetEnvIfNoCase User-Agent "WUMPUS" bad_bot
SetEnvIfNoCase User-Agent "FAST-WebCrawler" bad_bot
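Note that the SetEnvIfNoCase lines above only tag a matching request with the bad_bot environment variable; nothing is refused until an access rule tests that variable. A minimal sketch of that last step, assuming the older Allow/Deny access control (mod_access in Apache 1.3/2.0) and a hypothetical document root of /path/to/docroot, might look like this:

```apache
# Hypothetical path; substitute your own DocumentRoot.
<Directory "/path/to/docroot">
    # Evaluate Allow rules first, then Deny; a matching Deny wins.
    Order Allow,Deny
    Allow from all
    # Refuse (403 Forbidden) any request tagged bad_bot by SetEnvIfNoCase.
    Deny from env=bad_bot
</Directory>

# On Apache 2.4 (mod_authz_core) the rough equivalent inside the
# same <Directory> block would be:
# <RequireAll>
#     Require all granted
#     Require not env bad_bot
# </RequireAll>
```

Keep in mind that each SetEnvIfNoCase pattern is an unanchored regular expression, so short or generic entries such as "bew", "Mirror" or "Monster" will match any User-Agent that merely contains that text. And since the User-Agent header is supplied by the client, this only stops bots that identify themselves honestly.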
On the other hand, there are ways to fight spambots; they just don't rely on trusting the user. Here's one way:
There are good ways to deal with spammers, but this isn't one of them. It *might* work on a small scale, but it definitely won't work on a medium or large scale. It's about as useful as the Sendmail "MX/domain validation" trick that Eric Raymond and the rest of the Sendmail team thought would stop spammers dead in their tracks. (It didn't.) Instead he was "surprised by spam."
-CT
"Here are a couple of the User-Agents that fell for our trap that I pulled out of last months access_log for lists.evolt.org:
Wget/1.6"
Email spider, my ass! Wget is a damn useful HTTP downloader utility which is great for obtaining large files as it can resume interrupted transfers. It can also mirror web sites, which I assume is why it fell into the honeypot. Oh, and you can also change what it says it is on the command line.
And to add my 2 cents to the email problems, one other solution I've seen is to render email addresses as an image and drop that onto the page. It's not a fantastic solution for those still using Lynx, and you can no longer just click to send mail to somebody, but at least it doesn't go the JavaScript route, and it should be enough of a technical hurdle to stop automated harvesters for a couple of years at least.
- Anonymous and happy.