Slashdot Mirror


Stopping SpamBots With Apache

primetyme writes: "Sick of email-harvesting spam robots cruising your Apache-based site? Here's an in-depth article that shows one way you can configure a base Apache installation to keep those nasty bots off your site - and the spam out of your Inbox." Anything that helps annoy spammers is a good thing.

3 of 55 comments

  1. Also useful for... by dpete4552 · Score: 5, Informative

    I have been using this method for a long time. I don't know how new that article is, but I have long used it to block not only all the spambots I could find, but also all of the software for mirroring my site.

    Here is a longer list of common spam bots and mirror bots that I have been able to find:

    SetEnvIfNoCase User-Agent "EmailSiphon" bad_bot
    SetEnvIfNoCase User-Agent "EmailWolf" bad_bot
    SetEnvIfNoCase User-Agent "CherryPickerSE" bad_bot
    SetEnvIfNoCase User-Agent "CherryPickerElite" bad_bot
    SetEnvIfNoCase User-Agent "Crescent" bad_bot
    SetEnvIfNoCase User-Agent "EmailCollector" bad_bot
    SetEnvIfNoCase User-Agent "MCspider" bad_bot
    SetEnvIfNoCase User-Agent "bew" bad_bot
    SetEnvIfNoCase User-Agent "Deweb" bad_bot
    SetEnvIfNoCase User-Agent "FEZhead" bad_bot
    SetEnvIfNoCase User-Agent "Fetcher" bad_bot
    SetEnvIfNoCase User-Agent "Getleft" bad_bot
    SetEnvIfNoCase User-Agent "GetURL" bad_bot
    SetEnvIfNoCase User-Agent "HTTrack" bad_bot
    SetEnvIfNoCase User-Agent "IBM_Planetwide" bad_bot
    SetEnvIfNoCase User-Agent "KWebGet" bad_bot
    SetEnvIfNoCase User-Agent "Monster" bad_bot
    SetEnvIfNoCase User-Agent "Mirror" bad_bot
    SetEnvIfNoCase User-Agent "NetCarta" bad_bot
    SetEnvIfNoCase User-Agent "OpaL" bad_bot
    SetEnvIfNoCase User-Agent "PackRat" bad_bot
    SetEnvIfNoCase User-Agent "pavuk" bad_bot
    SetEnvIfNoCase User-Agent "PushSite" bad_bot
    SetEnvIfNoCase User-Agent "Rsync" bad_bot
    SetEnvIfNoCase User-Agent "Shai" bad_bot
    SetEnvIfNoCase User-Agent "Spegla" bad_bot
    SetEnvIfNoCase User-Agent "SpiderBot" bad_bot
    SetEnvIfNoCase User-Agent "SuperBot" bad_bot
    SetEnvIfNoCase User-Agent "tarspider" bad_bot
    SetEnvIfNoCase User-Agent "Templeton" bad_bot
    SetEnvIfNoCase User-Agent "WebCopy" bad_bot
    SetEnvIfNoCase User-Agent "WebFetcher" bad_bot
    SetEnvIfNoCase User-Agent "WebMiner" bad_bot
    SetEnvIfNoCase User-Agent "webvac" bad_bot
    SetEnvIfNoCase User-Agent "webwalk" bad_bot
    SetEnvIfNoCase User-Agent "w3mir" bad_bot
    SetEnvIfNoCase User-Agent "XGET" bad_bot
    SetEnvIfNoCase User-Agent "Wget" bad_bot
    SetEnvIfNoCase User-Agent "WebReaper" bad_bot
    SetEnvIfNoCase User-Agent "WUMPUS" bad_bot
    SetEnvIfNoCase User-Agent "FAST-WebCrawler" bad_bot
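
    Note that these SetEnvIfNoCase lines only tag a matching request with the bad_bot variable; they do not block anything by themselves. A minimal sketch of the access rule that would act on that tag, assuming the Order/Allow/Deny directives of Apache 1.3/2.0/2.2 and a hypothetical document root of /var/www/html (substitute your own):

    ```apache
    # Hypothetical path: replace with your own DocumentRoot.
    <Directory "/var/www/html">
        # Evaluate Allow rules first, then Deny rules;
        # a request that matches both ends up denied.
        Order Allow,Deny
        Allow from all
        # Refuse (403 Forbidden) any request tagged bad_bot above.
        Deny from env=bad_bot
    </Directory>
    ```

    On Apache 2.4, where Order/Allow/Deny are deprecated, the same effect should be available with "Require all granted" and "Require not env bad_bot" inside a <RequireAll> block. Also keep in mind that each pattern is an unanchored, case-insensitive regular expression, so short entries such as "Mirror", "Monster" or "bew" will tag any User-Agent that merely contains those letters.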

    --
    http://www.archive.org/details/ThePowerOfNightmares

  2. You can't win an arms race by CmdrTroll · Score: 5, Insightful

    The premise behind this article is patently ridiculous. It only works because spambots are voluntarily identifying themselves, and any spambot author with an ounce of common sense will simply change the user-agent string to an ordinary browser string such as "Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)", which countless Windows clients send. A well-designed spambot is indistinguishable from a valid user, or Google, or ht://dig.

    On the other hand, there are ways to fight spambots; they just don't rely on trusting the user. Here's one way:

    • Buy a domain.
    • Set up a CGI script that generates a unique email address @ that domain for every visitor. Log the address used, the date/time of visit, the visitor's IP, and other characteristics (user-agent?) of the visitor.
    • Use the logged data to block the user when spam mail gets sent to one of the random accounts.
    • Use the logged data as evidence to present to the offender's ISP, to get their fast connection pulled.
    • Find a way to automate this on a large scale, then get a bunch of sysadmins together to sue and prosecute the spammer for abuse of resources.

    There are good ways to deal with spammers, but user-agent filtering isn't one of them. It *might* work on a small scale, but it definitely won't work on a medium or large scale. It's about as useful as the Sendmail "MX/domain validation" trick that Eric Raymond and the rest of the Sendmail team thought would stop spammers dead in their tracks. (It didn't.) Instead he was "surprised by spam."

    -CT

  3. Wget is not a spider! by Anonymous Coward · Score: 4, Informative

    "Here are a couple of the User-Agents that fell for our trap that I pulled out of last months access_log for lists.evolt.org:

    Wget/1.6"

    Email spider, my ass! Wget is a damn useful HTTP downloader utility which is great for obtaining large files, as it can resume interrupted transfers. It can also mirror web sites, which I assume is why it fell into the honeypot. Oh, and you can also change the User-Agent it reports from the command line (the --user-agent option).

    And to add my 2 cents to the email problems, one other solution I've seen is to render email addresses as an image and drop that onto the page. It's not a fantastic solution for those still using Lynx, and you can no longer just click to send mail to somebody, but at least it doesn't go the JavaScript route, and it should be a sufficient technical hurdle to stop automated harvesters for a couple of years at least.

    - Anonymous and happy.