Stopping SpamBots With Apache
primetyme writes: "Sick of email harvesting spam robots cruising your Apache based site? Here's an in depth article that shows one way you can configure a base Apache installation to keep those nasty bots off your site - and the spam out of your Inbox." Anything that helps annoy spammers is a good thing.
is to give your site a terrible color scheme, like purple & brown. ;-)
If you celebrate Xmas, befriend me (538
I'd say add at least the following email addresses to your webpage and strike back that way (somehow):
president@whitehouse.gov, abuse@127.0.0.1, some MAP address.
is to not install Apache at all. Instead, throw up a year-or-two-old copy of Microsoft IIS and watch the virii propagate. You won't have enough bandwidth or enough minutes of up-time to be able to serve pages with email addresses on them ;-)
If you celebrate Xmas, befriend me (538
First it was the hack to reboot systems asking for your default.ida file. Now it is code to trap and kill spiders...
What is an Apache admin to do? It is so configurable there doesn't appear to be anything it can't do. What is next: using Apache to brew my morning coffee (well, there is the coffee pot cam; anyone know what webserver it ran on?), write my website for me, solve world hunger???
WHY WHY WHY do people run IIS anyway, I would love to see what it would take to do this with IIS, any takers ?
Checking the user agent won't work for long - how hard will it be for the spammers to change the user agent to "Mozilla..."
Using some client side Javascript would be harder for them to deal with (although if your browser can view it they will be able to also).
I guess graphics would be next...
Surfing slowly, in the Bandwidth Ghetto
I have been using this method for a long time. I don't know how new that article is, but I used it long ago to block not only all the spambots I could find, but all of the software for mirroring my webpage as well.
Here is a longer list of common spam bots and mirror bots that I have been able to find:
SetEnvIfNoCase User-Agent "EmailSiphon" bad_bot
SetEnvIfNoCase User-Agent "EmailWolf" bad_bot
SetEnvIfNoCase User-Agent "CherryPickerSE" bad_bot
SetEnvIfNoCase User-Agent "CherryPickerElite" bad_bot
SetEnvIfNoCase User-Agent "Crescent" bad_bot
SetEnvIfNoCase User-Agent "EmailCollector" bad_bot
SetEnvIfNoCase User-Agent "MCspider" bad_bot
SetEnvIfNoCase User-Agent "bew" bad_bot
SetEnvIfNoCase User-Agent "Deweb" bad_bot
SetEnvIfNoCase User-Agent "FEZhead" bad_bot
SetEnvIfNoCase User-Agent "Fetcher" bad_bot
SetEnvIfNoCase User-Agent "Getleft" bad_bot
SetEnvIfNoCase User-Agent "GetURL" bad_bot
SetEnvIfNoCase User-Agent "HTTrack" bad_bot
SetEnvIfNoCase User-Agent "IBM_Planetwide" bad_bot
SetEnvIfNoCase User-Agent "KWebGet" bad_bot
SetEnvIfNoCase User-Agent "Monster" bad_bot
SetEnvIfNoCase User-Agent "Mirror" bad_bot
SetEnvIfNoCase User-Agent "NetCarta" bad_bot
SetEnvIfNoCase User-Agent "OpaL" bad_bot
SetEnvIfNoCase User-Agent "PackRat" bad_bot
SetEnvIfNoCase User-Agent "pavuk" bad_bot
SetEnvIfNoCase User-Agent "PushSite" bad_bot
SetEnvIfNoCase User-Agent "Rsync" bad_bot
SetEnvIfNoCase User-Agent "Shai" bad_bot
SetEnvIfNoCase User-Agent "Spegla" bad_bot
SetEnvIfNoCase User-Agent "SpiderBot" bad_bot
SetEnvIfNoCase User-Agent "SuperBot" bad_bot
SetEnvIfNoCase User-Agent "tarspider" bad_bot
SetEnvIfNoCase User-Agent "Templeton" bad_bot
SetEnvIfNoCase User-Agent "WebCopy" bad_bot
SetEnvIfNoCase User-Agent "WebFetcher" bad_bot
SetEnvIfNoCase User-Agent "WebMiner" bad_bot
SetEnvIfNoCase User-Agent "webvac" bad_bot
SetEnvIfNoCase User-Agent "webwalk" bad_bot
SetEnvIfNoCase User-Agent "w3mir" bad_bot
SetEnvIfNoCase User-Agent "XGET" bad_bot
SetEnvIfNoCase User-Agent "Wget" bad_bot
SetEnvIfNoCase User-Agent "WebReaper" bad_bot
SetEnvIfNoCase User-Agent "WUMPUS" bad_bot
SetEnvIfNoCase User-Agent "FAST-WebCrawler" bad_bot
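For anyone unsure what those lines actually match: SetEnvIfNoCase treats the quoted string as an unanchored, case-insensitive regular expression, so short entries like "bew" or "Monster" also tag any agent that merely contains them. Here is a rough Python sketch of that matching step, using a handful of the patterns above. The function name and the "JobMonster-Crawler" agent are made up for illustration, and this only mimics the tagging; in Apache the bad_bot variable still has to be referenced by an access rule before anything is blocked.

```python
import re

# A few of the patterns from the list above; the full list works the same way.
BAD_BOT_PATTERNS = ["EmailSiphon", "EmailWolf", "bew", "Monster", "Wget"]

# One case-insensitive regex per pattern, unanchored like SetEnvIfNoCase.
_COMPILED = [re.compile(p, re.IGNORECASE) for p in BAD_BOT_PATTERNS]


def is_bad_bot(user_agent):
    """Return True if any pattern occurs anywhere in the User-Agent string.

    Hypothetical helper, not part of Apache.
    """
    return any(rx.search(user_agent) for rx in _COMPILED)


if __name__ == "__main__":
    samples = [
        "Wget/1.6",
        "emailsiphon",
        "Mozilla/4.0 (compatible; MSIE 5.5)",
        "JobMonster-Crawler/1.0",  # made-up agent that only contains "Monster"
    ]
    for ua in samples:
        print(ua, "->", is_bad_bot(ua))
```

The last sample shows the catch: a substring match is generous, so a broad word in the list can lock out agents you never meant to block.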
http://www.archive.org/details/ThePowerOfNightmares
On the other hand, there are ways to fight spambots; they just don't rely on trusting the user. Here's one way:
There are good ways to deal with spammers but this isn't one of them. It *might* work on a small scale, but it definitely won't work on a medium or large scale. It's about as useful as the Sendmail "MX/domain validation" trick that Eric Raymond and the rest of the Sendmail team thought would stop spammers dead in their tracks. (It didn't.) Instead he was "surprised by spam."
-CT
I am just wondering where the hell these god-awful ugly color schemes for some of the sections come from. Shit-brown and purple don't mix.
"Here are a couple of the User-Agents that fell for our trap that I pulled out of last months access_log for lists.evolt.org:
Wget/1.6"
Email spider, my ass! Wget is a damn useful HTTP downloader utility which is great for obtaining large files as it can resume interrupted transfers. It can also mirror web sites, which I assume is why it fell into the honeypot. Oh, and you can also change what it says it is on the command line.
And to add my 2 cents to the email problems, one other solution I've seen is to translate email addresses into an image and drop that onto the page. It's not a fantastic solution for those still using Lynx, and you can no longer just click to send mail to somebody, but at least it doesn't go the Javascript route and should be a sufficient technical hurdle to stop automated harvesters for a couple of years at least.
- Anonymous and happy.
I found this article to lack in depth. Using an identifier which can be easily changed by the spammer is plain silly. How did this article get posted? I have written better haiku than this!! jeesus.
Thank ghod the article only mentioned wget 1.6 as a spambot, I'm running 1.5.3, which doesn't have the --evil-bastard or --potted-meat options.
Click here if you just like to click on shit.
Is it just me, or do the colors on this article just look like shit? Is the mixture of piss-yellow and dark purple actually pleasing to some geek's eyes? Please get rid of these ass-ugly color schemes, Taco!
Just because you don't know how to use anything but MSFT products doesn't mean you have to make pro-Linux/Apache comments as trolls.
I used the tip from the article and put
Disallow: /email-addresses/
in my robots.txt, then in my .htaccess:
ForceType application/x-httpd-php
and in email-addresses:
[...]
and chgrp'd .htaccess to the web user's group. This will provide me a list of unique IPs in my .htaccess.
I do selective agent blocking using mod_rewrite directives in .htaccess files. The article claims that mod_rewrite is difficult to learn, but I disagree, and its major advantage is especially visible in shared/virtual hosting environments. If Apache was compiled with mod_rewrite support, anyone on the system can create their own set of agent filters and place them in an .htaccess file. You don't need access to httpd.conf!
The syntax is simple,
#Send filesucking programs to hell
RewriteEngine on
RewriteCond %{HTTP_USER_AGENT} ^FlashGet.* [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Teleport.* [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^NetAnts.* [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^WebZIP.* [NC,OR]
#Escape the space in "Offline Explorer"; an unescaped space splits the pattern
RewriteCond %{HTTP_USER_AGENT} ^Offline\ Explorer.* [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^GetRight.* [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Go!Zilla.* [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^wget.* [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^JetCar.* [NC]
RewriteRule ^.*$ /nofilesucking.php [L]
Seems effective enough for me, and it ain't tough to learn when you can find an example. Of course this does rely on the idea that filesucking programs (or email harvesting bots) identify themselves, but I think naysayers would be surprised at how many of them do just that.
Shaun
Thanks to the War on Drugs, it's easier to buy meth than it is to buy cold medicine!
I've had a spambot-trap on my web site for over a year, and while I've had around 10,000 page views each day during that time, I've never gotten one single spam to the email addresses featured in the trapped space.
Or does this mean that the spam bots are sufficiently sophisticated that they recognize my trap for what it is? It's meant to be obvious to humans.
Long ago I heard of a CGI script by the name of WebPoison. It would generate a page of random text; the first set of text would be random words that all linked to differently parsed URLs right back to the same page. The second and much longer set of text was a long list of randomly generated bogus e-mail addresses. Because the recursive links were all different (and random) it would theoretically cause a spambot to continually follow a circular path and constantly retrieve hundreds of fake e-mail addresses (thus the name WebPoison -- it poisons their list).
There were some flaws. You'd need a webserver that let you run CGI scripts without necessarily having .cgi show up in the URL (to fool the spambots) and you'd have to have some mechanism to check that the random addresses did not use real domains. Might also use up your bandwidth as bots got stuck, but you could then use their IP to file a complaint against their ISP (and ban them from hitting your server in the future).
Sadly, I've not found any information on it recently. Perhaps someone could hack out a more efficient version of such to address potential problems and bugs.
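In case it helps anyone who wants to try, here is a rough Python sketch of the idea as described above. It is not the original WebPoison script; the function names and the /articles/ base path are made up for illustration. To sidestep the real-domain problem, every address uses the .invalid top-level domain, which RFC 2606 reserves so that it can never resolve.

```python
import random
import string


def random_word(rng, min_len=4, max_len=9):
    """Return a random lowercase 'word' of min_len..max_len letters."""
    length = rng.randint(min_len, max_len)
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))


def poison_page(rng, n_links=5, n_addresses=20, base="/articles/"):
    """Build one HTML page of self-referencing links and bogus addresses.

    `base` is a hypothetical path; the server is assumed to map every URL
    under it back to this same generator.
    """
    lines = ["<html><body>"]
    for _ in range(n_links):
        # Each link gets a different random path so a bot sees "new" pages.
        lines.append('<a href="%s%s/%s.html">%s</a>' % (
            base, random_word(rng), random_word(rng), random_word(rng)))
    for _ in range(n_addresses):
        # .invalid is reserved (RFC 2606), so no address can hit a real domain.
        addr = "%s@%s.invalid" % (random_word(rng), random_word(rng))
        lines.append('<a href="mailto:%s">%s</a><br>' % (addr, addr))
    lines.append("</body></html>")
    return "\n".join(lines)


if __name__ == "__main__":
    print(poison_page(random.Random()))
```

The bandwidth concern still applies: a bot stuck in the loop costs you a page per request, so keeping the pages small is probably wise.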
STOP MISUSING APOSTROPHES, YOU MORONS!!!
Does anyone know how to have the webserver return a constantly running stream of garbage?
I had heard of a guy taking chargen or /dev/random data and delivering it to some hacker that had hacked into his system (instead of what the hacker was trying to syphon off). This would keep the connection open and, if enough people implemented it, would seriously limit their throughput.
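One way to get an endless stream of garbage out of a web server, sketched in Python as a small WSGI application. This is only an illustration of the idea, not what the guy mentioned above ran; the function names and port 8000 are arbitrary choices. The response body is a generator that never finishes, so the server keeps sending random bytes until the client hangs up.

```python
import os


def garbage_chunks(chunk_size=1024):
    """Yield random bytes forever, one chunk at a time."""
    while True:
        yield os.urandom(chunk_size)


def garbage_app(environ, start_response):
    """WSGI app: send headers once, then stream garbage until the client leaves."""
    start_response("200 OK", [("Content-Type", "application/octet-stream")])
    return garbage_chunks()


if __name__ == "__main__":
    from wsgiref.simple_server import make_server
    # Port 8000 is an arbitrary choice for local testing.
    make_server("", 8000, garbage_app).serve_forever()
```

Note that the reference wsgiref server handles one request at a time, so a single stuck client ties it up completely, and every byte sent is your bandwidth too.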
Slashdot's rate-of-post filter: Preventing you from posting too many great ideas at once.
Most spambots don't ID themselves. A few do, but most don't, and those that do won't for long if this info gets acted on.
What does work is building a nice static list of email addresses and names. Link to another page and have it full of the same info. Do this on several virtual servers and make sure the web bots can find it.
You can also be nice to the real search engines and tell them not to visit your spam traps. Since robots.txt is often used by the spam bots, telling Google not to search that page works out well for both sides of the spider wars.
The next thing is to lock down your mail program once it detects any of the spam traps. There are several good ways of doing this, depending on how you pay for bandwidth. Two of the best options are to either play dead with the connection or return a "user mailbox is full". Both of these tie up resources on the spammer's end. The other choice is to reject 99.99% of the mail and hope they pull your domain out of their lists for being full of junk.
I run @abnormal.com, which tends to sort near the top, has lots of bogus addresses and has been running spam traps for years. Every day I get hit by spammers that have sorted addresses.
One thing to keep in mind is that most bots are run by people only selling lists, not the spammers. Because of that there is no direct link between the searching bots and the mail host that spams later.
I wonder if it's time to make an RBL-like thing that is just for poisoned addresses.
First of all, wget is not a spambot. It is a
non-interactive HTTP/FTP downloading utility
with tons of features. Don't let Stallman hear
you call wget a spammers' tool!
Second,
whatever_commands | sort -u is the right way.
uniq(1) cannot unique unsorted lists.
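To see why, here is a small Python imitation of the two commands (the helper names and the sample addresses are made up). uniq only collapses runs of adjacent identical lines, so a duplicate separated by some other line survives unless the input is sorted first.

```python
from itertools import groupby


def uniq(lines):
    """Mimic uniq(1): collapse only runs of adjacent identical lines."""
    return [key for key, _ in groupby(lines)]


def sort_u(lines):
    """Mimic sort -u: sort first, then collapse duplicates."""
    return uniq(sorted(lines))


if __name__ == "__main__":
    ips = ["10.0.0.2", "10.0.0.1", "10.0.0.2", "10.0.0.2"]  # made-up sample
    print(uniq(ips))    # 10.0.0.2 still appears twice
    print(sort_u(ips))  # each address appears once
```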
Yet here's an article that advocates doing exactly the same thing, except characterized as saving the world from spammers, and it's OK?
If you're running Apache, you could have your web site display e-mail addresses as graphics. You could have it match the same fonts your site is using, so it would look like normal text but a spider couldn't read it.
Sample PHP Script
"Prepare for the worst - hope for the best."