Slashdot Mirror


Stopping SpamBots With Apache

primetyme writes: "Sick of email harvesting spam robots cruising your Apache based site? Here's an in depth article that shows one way you can configure a base Apache installation to keep those nasty bots of your site - and the spam out of your Inbox." Anything that helps annoy spammers is a good thing.

2 of 55 comments (clear)

  1. Also useful for... by dpete4552 · · Score: 5, Informative

    I have been using this method for a long time, I don't know how new that article is, but I used it a long time ago to not only block all the spambots I could find, but all of the software for mirroring my webpage as well.

    Here is a longer list of common spam bots and mirror bots that I have been able to find:

    SetEnvIfNoCase User-Agent "EmailSiphon" bad_bot
    SetEnvIfNoCase User-Agent "EmailWolf" bad_bot
    SetEnvIfNoCase User-Agent "CherryPickerSE" bad_bot
    SetEnvIfNoCase User-Agent "CherryPickerElite" bad_bot
    SetEnvIfNoCase User-Agent "Crescent" bad_bot
    SetEnvIfNoCase User-Agent "EmailCollector" bad_bot
    SetEnvIfNoCase User-Agent "EmailSiphon" bad_bot
    SetEnvIfNoCase User-Agent "MCspider" bad_bot
    SetEnvIfNoCase User-Agent "bew" bad_bot
    SetEnvIfNoCase User-Agent "Deweb" bad_bot
    SetEnvIfNoCase User-Agent "FEZhead" bad_bot
    SetEnvIfNoCase User-Agent "Fetcher" bad_bot
    SetEnvIfNoCase User-Agent "Getleft" bad_bot
    SetEnvIfNoCase User-Agent "GetURL" bad_bot
    SetEnvIfNoCase User-Agent "HTTrack" bad_bot
    SetEnvIfNoCase User-Agent "IBM_Planetwide" bad_bot
    SetEnvIfNoCase User-Agent "KWebGet" bad_bot
    SetEnvIfNoCase User-Agent "Monster" bad_bot
    SetEnvIfNoCase User-Agent "Mirror" bad_bot
    SetEnvIfNoCase User-Agent "NetCarta" bad_bot
    SetEnvIfNoCase User-Agent "OpaL" bad_bot
    SetEnvIfNoCase User-Agent "PackRat" bad_bot
    SetEnvIfNoCase User-Agent "pavuk" bad_bot
    SetEnvIfNoCase User-Agent "PushSite" bad_bot
    SetEnvIfNoCase User-Agent "Rsync" bad_bot
    SetEnvIfNoCase User-Agent "Shai" bad_bot
    SetEnvIfNoCase User-Agent "Spegla" bad_bot
    SetEnvIfNoCase User-Agent "SpiderBot" bad_bot
    SetEnvIfNoCase User-Agent "SuperBot" bad_bot
    SetEnvIfNoCase User-Agent "tarspider" bad_bot
    SetEnvIfNoCase User-Agent "Templeton" bad_bot
    SetEnvIfNoCase User-Agent "WebCopy" bad_bot
    SetEnvIfNoCase User-Agent "WebFetcher" bad_bot
    SetEnvIfNoCase User-Agent "WebMiner" bad_bot
    SetEnvIfNoCase User-Agent "webvac" bad_bot
    SetEnvIfNoCase User-Agent "webwalk" bad_bot
    SetEnvIfNoCase User-Agent "w3mir" bad_bot
    SetEnvIfNoCase User-Agent "XGET" bad_bot
    SetEnvIfNoCase User-Agent "Wget" bad_bot
    SetEnvIfNoCase User-Agent "WebReaper" bad_bot
    SetEnvIfNoCase User-Agent "WUMPUS" bad_bot
    SetEnvIfNoCase User-Agent "FAST-WebCrawler" bad_bot

    --
    http://www.archive.org/details/ThePowerOfNightmares
  2. You can't win an arms race by CmdrTroll · · Score: 5, Insightful
    The premise behind this article is patently ridiculous. Spambots are voluntarily identifying themselves, and any spambot author with an ounce of common sense will simply change their user-agent string to the standard "Mozilla 4.0 (Microsoft Internet Explorer 5.5)" string that every Windows client uses. A well-designed spambot is indistinguishable from a valid user, or Google, or ht://dig.

    On the other hand, there are ways to fight spambots; they just don't rely on trusting the user. Here's one way:

    • Buy a domain.
    • Set up a cgi that generates a unique email address @ that domain for every visitor. Log the address used, the date/time of visit, the visitor's IP, and other characteristics (user-agent?) of the visitor.
    • Use the logged data to block the user when spam mail gets sent to one of the random accounts.
    • Use the logged data as evidence to present to the offender's ISP, to get their fast connection pulled.
    • Find a way to automate this on a large scale, then get a bunch of sysadmins together to sue and prosecute the spammer for abuse of resources.

    There are good ways to deal with spammers but this isn't one of them. It *might* work on a small scale and it definitely won't work on a medium or large scale. It's about as useful as the Sendmail "MX/domain validation" trick that Eric Raymond and the rest of the Sendmail team thought would stop spammers dead in its tracks. (It didn't.) Instead he was "surprised by spam."

    -CT