Slashdot Mirror


User: loraksus

loraksus's activity in the archive.

Stories
0
Comments
2,248

Comments · 2,248

  1. Sorta OT on Cheap KVM Over IP? · · Score: 1, Offtopic

    But IF I don't get modded to hell, does anyone want to make a recommendation on a cheap regular kvm switch?

  2. Re:M$ to Reveal Windows Source Code! on MS to Implement Some DoJ Settlement Terms Preemptively · · Score: 2

    Solitaire
    Minesweeper
    Notepad

    . . . Gasp.

  3. So... on Feds to Require Digital Receivers In All New TVs? · · Score: 2

    So why doesn't the industry just stop producing televisions and produce "Analog Video Signal Viewing Appliances" :)

  4. So, I heard this a while ago, seems to fit and all on WarTalking Arrest · · Score: 2

    Good joke, it's relevant, read on. . .

    A man is flying in a hot air balloon and realises he is lost. He reduces height and spots a man down below. He lowers the balloon further and shouts:

    "Excuse me, can you tell me where I am?"

    The man below says: "yes you're in a hot air balloon, hovering 30 feet above this field."

    "You must work in I.T." says the balloonist.

    "I do" replies the man. "How did you know."

    "Well" says the balloonist, "everything you have told me is technically correct, but it's no bloody use to anyone."

    The man below says "you must work in business."

    "I do" replies the balloonist, "but how did you know?"

    "Well", says the man, "you don't know where the hell you are, or where the hell you're going, but you expect me to be able to help. You're in the same position you were before we met, but now it's my fault."

  5. Re:Burn the observatory, so this never happens aga on WarTalking Arrest · · Score: 2

    I wouldn't be surprised, the "consultants" get most of their pay from whoever made the hardware

  6. Re:Once again....security through obscurity... on WarTalking Arrest · · Score: 2

    weep for the next one, at least we sorta know how to deal with what is going on around us. . .
    I of course, have to point out that this is Texas, not that my home state isn't full of redneck hicks either.

  7. Apologies to everyone on Spy Fly · · Score: 3, Informative

    Berkeley researchers are close to actually getting their models to fly, but according to a source within the university, there were still some bugs to work out.

    Actually, this is pretty damn cool, these things weigh less than 1/24th of a penny, have a wingspan of a quarter. The propulsion system on this thing is pretty interesting / amazing.

  8. Re:How low? on MPAA Requests Immunity to Commit Cyber-Crimes · · Score: 2

    That's right. Congressclowns spend an average of $450,000 a year to get "elected" to a position that pays $150,000 a year. Hm. . .

  9. Re:no more TV for me.... on MPAA vs. Television · · Score: 2

    umm. . . fuck, lay, go to bed with etc?

    ya, ya mod me to shit, whatever
    Waiting for the lameness filter etc etc

  10. Re:I want a sledge hammer on I Believe You Have My Stapler · · Score: 1

    sweet. I'd love the II and whatever your friend wants to have someone cart away.

  11. Re:Argh on Happy Birthday Code Red · · Score: 1

    Beaverton, Oregon.
    waits
    for the
    lameness
    filter
    etc...

  12. Re:Must start on diesel on Drive a Greasecar - DIY Biodiesel · · Score: 1

    nice sig. I too like to sound profound :)

  13. Re:Argh on Happy Birthday Code Red · · Score: 2

    Heh, you worked qwest dsl support too eh?
    I kept a record of cisco 67xs toasted - 78.
    u?

  14. Re:I want a sledge hammer on I Believe You Have My Stapler · · Score: 2

    heh, you wouldn't be in portland oregon by any chance?
    waits for fucking
    lameness
    filter.
    Hoo Rah

  15. Re:Disturbing on Windows 2000 - Nine Months to Live · · Score: 2

    I suppose the small issue of money might be part of the equation here.

  16. Really, on Cable Firms Limit Users' Freedoms · · Score: 2

    If you want to host a server, just pay setup + $3.95 a month to have it professionally hosted, christ, oc48 vs 15k shouldn't take any thought. Yes, I have a server based off my home dsl, but it's kiddie crap, a simple mail server and a web server. If my ISP, verizon, decided to charge, or restrict, I'd switch in an instant.
    That said, cable (and fone) companies are cheap bastards who piss away money on stuff like sending trucks out to scan for waps, but what do you expect from an arrogant monopoly.
    Which is why I can get a dsl line in canada 1.5/768 w/3 static ips for $40 canadian and pay $60 a month for Verizon dsl down here.

  17. Re:First Criminals on UK Parliament to ban DoS Attacks · · Score: 2

    what the fuck was he doing sending 14000+ emails?

  18. History on WorldCom CFO Accused of $3.6 Billion Fraud · · Score: 2

    I saw this article on the net a while back, quite appropriate. Basically a list of the biggest screwups and questionable business practices in the last year or so. Wonderful for getting figures of exec bonuses for those who drown the companies they work for.
    Alas, the /. lameness filter is bitching and moaning about the text file, so here is a link.

    http://www.business2.com/articles/mag/0,1640,38604,0.html
    It also has a nice commentary on Ballmer's monkey dance, with pictures.

  19. In other news on WorldCom CFO Accused of $3.6 Billion Fraud · · Score: 1, Offtopic

    The sun rose in the east today.

  20. Late, but kinda important on Home-Built vs. Store-Bought PCs · · Score: 2

    Especially if you ever plan to upgrade. . .
    Dell, on virtually every machine manufactured since '98, use non-standard pinouts on their atx power supplies and mobos.
    IF YOU SWITCH A DELL POWER SUPPLY WITH A "NORMAL" ATX PS, YOU WILL KILL BOTH THE BOARD AND THE PS.
    This "killing" is usually rather spectacular, dell power supply to an atx board = flames out the back of the ps.

    http://www.upgradingandrepairingpcs.com/articles/upgrade3_01_01.asp

    http://inquirerinside.com/19040209.htm

  21. Re:You think Verizon's bad?? on Baby Bells Victorious Over Sharing Rules · · Score: 2

    sure it is. You just don't activate the voice circuit on the line. Of course qwest will lie to you and tell you (and their technicians, who "might" not know better . . ) that phone service is necessary. Truth is, there is no need for the frequencies below 20kHz, since analog voice is only ~ 300 - 4000Hz. DSL runs much higher, and the frequencies depend on whether you use CAP/QAM or DMT or the bastardized Glite on the line (probably DMT if you had DSL installed in the last 2 years)
    As long as the copper is between you and the co, you can get dsl.
    http://www.orckit.com/fr_newsa.html?/how_does_ads_works.html

  22. Code Red / Nimda not a problem eh? on Viruses: More Hype than Danger? · · Score: 2

    I'm running apache on my webserver that gets almost no legitimate hits a day. I don't advertise it etc.
    My error.log file is 50 (Fifty) megs. Since January. 2002.
    Lots of entries look like this, with some variations. I also appreciate skript kiddies trying to run root.exe on my box.

    [Wed Apr 24 10:44:21 2002] [error] [client 4.35.125.66] File does not exist: *:/****/msadc/..%5c/..%5c/..%5c/..Á/..Á/..Á/winnt/system32/cmd.exe

    I'd say that the main problem is not that the virus actually does anything harmful, but that their box is broadcasting to random ip's "hack me" and that person's hdd is shared with full perms and that if a script kiddie wanted to delete all files on the lamer's machine, they probably could, theft of corporate info (i.e. if someone works at home) is also really easy.

  23. mirror on Stopping Spambots: A Spambot Trap · · Score: 0, Redundant

    looks like /. ate his website, not spambots :)

    The Problem: Spambots Ate My Website
    Spambot: (noun) - A software program that browses websites looking for email addresses, which it then "harvests" and collects into large lists. These lists are then either used directly for marketing purposes, or else sold, often in the form of CD-ROMs packed with millions of addresses. To add insult to injury, you may receive a spam email which is asking you to buy one of these lists yourself. Spambots (and spam) are a pestilence which needs to be stamped out wherever it is found.

    I have a website, http://www.crazyguyonabike.com, which has bicycle tour journals, message boards and guestbooks. I started noticing around the end of 2001 that the site was getting hit a lot by spambots. You can spot this sort of activity by looking for very rapid surfing, strange request patterns, and non-browser User-Agents.

    After looking at the server logs, I realized a couple of things: Firstly, the spambots came from many different IP addresses, so this precluded the simple option of adding the source IP to my firewall blocks list. Secondly, there seemed to be a common behavior between the bots - even if this was the first visit from a particular IP address (or even a particular network, so no chance of just being a different proxy) they would come straight into the middle of my website, at a specific page rather than the root. This means that the spambots obviously had some kind of database of pages, which had presumably been built up from previous visits, before I'd noticed the activity, and this database was being shared between a large number of different hosts, each of which was apparently running the same software.

    Another distinctive behavior was that the spambots would follow only those links which had certain keywords which would seem promising if you're looking for email addresses: "guestbook", "journal", "message", "post" and so on. On each of the pages in my site there were many other links in the navbars, but only links with these keywords were being followed. Also, robots.txt was never even being read, let alone followed. Moreover, the bot would come in, scan pages rapidly for maybe a few seconds, and then stop for a while. So it was obviously making at least some attempt to circumvent blocks based on frequency/quantity of requests.

    This was very annoying. For one thing, these things were picking off email addresses from my website (at that point, I was letting people who posted on my message boards decide for themselves whether they wanted their email addresses to be visible or not). But quite apart from that, it was taking up resources, and was just plain rude. I hate spam. I resent my webserver having to play host to people whose obvious goal is to cynically exploit the co-operative protocols of the internet to their own selfish, antisocial gain. So, I decided to do something about it.

    The first thing I did was to look at the User-Agent fields which were being used by the bots. There were a variety, including variations on the following:

    DSurf15a 01
    PSurf15a VA
    SSurf15a 11
    DBrowse 1.4b
    PBrowse 1.4b
    UJTBYFWGYA (and other strings of random capital letters)
    I searched the internet for references to these strings, but all I found was a slew of website statistics analysis logs. This meant that these particular spambots obviously got around. It was also discouraging, because there was no mention anywhere of what these things actually were. I was surprised that there seemed to be no discussion whatsoever of something that seemed to be pandemic. Then I found a couple of other websites with guestbooks that had actually been defiled by these spambots: (if you follow these links and you don't see a lot of empty messages left by the above user agents, then that means the webmaster of the site has finally found a way to stop it, so good for them...)

    http://www.virtualglasgow.com/guestbook.html
    http://www.donotenter.com/guestbook/gbook.html
    I reckon the spambots didn't really intend to leave empty messages. They just tend to want to follow links with the keyword 'post'. So if the guestbook posting form has no preview or confirmation page, then the spambot would leave a message simply by following this link! My guestbooks and message boards have a preview page, which is probably why I hadn't had any of this.

    Anyway, I started thinking about what kind of program this thing was. First of all, it comes from all kinds of different IP addresses. I couldn't quite believe that this many different IP addresses were all intentionally using the same software, of which I could find absolutely no mention anywhere on the Web. This made me think it might be some kind of virus/trojan/worm or whatever that silently installed itself on people's computers, and then used the CPU and bandwidth to surf the Web without the owner being aware of it. I thought that if this was the case, then it must be sending the results somewhere - and if we could find out where, then we could go about shutting the operation down. But I have had no luck at all in getting any help from the sysadmins at ISP's I have contacted. A typical exchange was the one with a guy at Cox internet, which was where a persistent offending IP address was sourced. He just couldn't be bothered, and eventually told me that spidering was not against the law, or their terms of service. I asked whether actions which were blatantly obviously geared toward the generation of spam were against their terms of use, but he never replied to that. I had no more luck anywhere else: Nobody had heard of this thing. I even sent an email to CERT, but no response. So, I turned instead to thinking about how I could erase these pests from my life as much as possible. This document is about my quest to stop spambots (not just this one, but ALL spambots) from abusing my website. Hopefully it will be useful to you.

    Overview of the Spambot Trap
    There are three main parts to the technique which I outline here:

    Banish visible email addresses from your websites altogether, or else obfuscate them so they can't be harvested. Examples of how to do this are given. This is your fail-safe, in case the spambots figure out a way around your other defences. Even if they manage to cruise your website on their very best behavior, they still should not be able to harvest email addresses!

    Block known spambots: Certain User-Agents are just known to be bad, so there's no reason to let them come on your site at all. True, spambots could in theory spoof the User-Agent, but the simple reality is that a lot of them don't. We use an enhanced version of the BlockAgent.pm module from the O'Reilly mod_perl book. This extension adds offending IP addresses to a MySQL (or other relational) database, which is picked up by the third part of our cunning system...

    Set a Spambot Trap, which blocks hosts based on behavior. We set a trap for spambots, which normal users with browsers and well-behaved spiders should not fall into. If the bot falls in the trap, then its IP address is quickly blocked from all further connections to the webserver.
    This works using a persistent, looping Perl script called badhosts_loop, which checks every few seconds for additions to a 'badhosts' database. This script then adds 'DENY' rules for each bad hosts to the ipchains firewall. Blocks have an expiry, which is initially set to one day. If a host falls in the trap again after the block expires, then that IP is blocked again - and the expiration time is doubled to 2 days. And so on. This algorithm ensures that the worst offenders get progressively more blocked, while one-time offenders don't stick around in our firewall rules eating up resources.
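
The doubling rule described above can be sketched in a few lines. This is a minimal illustration in Python, not the article's Perl badhosts_loop code (which is not shown here). The function name and the previous_expire_days parameter are hypothetical; only the one-day starting value and the doubling come from the text.

```python
from datetime import datetime, timedelta
from typing import Optional, Tuple

INITIAL_EXPIRE_DAYS = 1  # the article's starting block length


def next_block(now: datetime, previous_expire_days: Optional[int]) -> Tuple[int, datetime]:
    """Return (expire_days, expiry) for a host that just fell into the trap.

    previous_expire_days is None for a first offence; otherwise it is the
    length of the host's most recent block, which gets doubled.
    """
    if previous_expire_days is None:
        expire_days = INITIAL_EXPIRE_DAYS
    else:
        expire_days = previous_expire_days * 2
    return expire_days, now + timedelta(days=expire_days)
```

A host whose last block was 4 days would get 8 days on its next offence, so repeat offenders drift toward very long blocks while one-time offenders expire after a day.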

    There are various components to the Spambot Trap, including the badhosts_loop Perl script, the BlockAgent.pm module, ipchains config, MySQL database, httpd.conf, robots.txt, and your HTML files. These are all covered in the sections below.

    Banishing 'mailto:'
    The first and most urgent thing you need to do is to get email addresses off your website altogether. This means, unfortunately, banishing the venerable mailto: link. It's a real shame that perfectly good mechanisms should be removed because of abuse, but that's just the way the world is these days. You need to be defensive, and assume that the spammers will try to take advantage of your resources as much as possible.
    It's an arms race
    The important thing that you need to realize is that no matter what blocks we put in place, this game is an arms race. Eventually the spambot writers will develop smarter bots which circumvent our techniques. Therefore you want to have a failsafe, which will prevent email addresses from getting into the hands of the spambot even if all else fails. The only real way to do that is to completely remove all email address from your website.
    Contact forms
    You should replace the mailto: links with links to a special form where people can type their name, email address and message. A CGI can then deliver the email, and your email address never has to be disclosed. There are a number of different mailer scripts out there - just be careful to check for vulnerabilities which could allow malicious users to use the form to send email to third parties (i.e. spam, ironically enough) using your server. The formmail script is popular, but an earlier version had such a vulnerability (since fixed). The Embperl package has a simple MailFormTo command to send an email from a form.
    Since I have seen guestbooks out there which have been extensively defiled by spambots, I would add that you should have a preview screen on your contact forms. This will ensure that an email doesn't get fired off simply by a spambot following the 'post' or 'contact' link (which it will likely try to do).

    Alternatives to totally banishing mailto:
    There are alternatives to completely removing email addresses, but they all depend on the stupidity of the spambot, and so could be compromised by a new generation of pest. These include:

    Write out email addresses in a non-email format, e.g. instead of writing 'username@domain.com' you would write 'username at domain dot com', or something similar. It would only take some spambot with a little more intelligence to be able to scan these patterns and pick up "likely" addresses, so this strategy is a little risky. Any consistent method you choose to write out email addresses could in theory be analyzed and decoded by a savvy bot.

    Add stuff to the email address to make it invalid, but so that a human could easily know what to do to make it work. An example of this is writing 'username@_NO_SPAM_domain.com'. You need to remove the "_NO_SPAM_" part to make the email address valid. You can have some kind of explanation to make it clear what people have to do to use the address. Personally, I don't like this - you're depending on a level of sophistication on the part of your users which is risky. In my experience, there are a lot of very 'novice' level users out there, who only know how to click on a link. They don't know how to edit an email address. Heck, I've had people come to my site by typing the URL into Google, rather than the 'Location' box of their browser. Also, people don't read instructions.

    Make graphics images which contain the email address. Spambots usually don't download graphics, and even if they did, they probably couldn't decode the bits to get the text. However, they could do it in theory, since software for doing OCR (optical character recognition, getting text from scanned documents) has been around for a while. A downside to this approach is that the user has to manually copy down the email address, since it can't be cut'n'pasted. Also, you can't put a mailto: link on the image, otherwise you're back to square one. But you could put a link to a contact form, with an argument in the link telling your server internally what email address to use. For example, the link could say "contact.cgi?to=23", where '23' is some database key to the actual email address. But the downside here is that you still need to generate the image, which is a bit of a pain in the ass if you have a lot of them. You can do it automatically, if you're willing to put the work in and write the scripts. There are some very nice graphics generation packages out there on CPAN for Perl. (The original page showed an example of an email address rendered as an image here; the image is not preserved in this mirror.)
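
Separately from the image approach, the first two alternatives in this list are simple string rewrites. The sketch below is in Python rather than the article's Perl, and the function names are hypothetical. It only illustrates the two formats quoted in the text and does nothing to make them harder for a smarter bot to reverse.

```python
def spell_out(address: str) -> str:
    """Rewrite 'username@domain.com' as 'username at domain dot com'."""
    local, _, domain = address.partition("@")
    return f"{local} at {domain.replace('.', ' dot ')}"


def add_marker(address: str, marker: str = "_NO_SPAM_") -> str:
    """Insert a marker after the '@' so the address is invalid as written."""
    local, _, domain = address.partition("@")
    return f"{local}@{marker}{domain}"


def remove_marker(address: str, marker: str = "_NO_SPAM_") -> str:
    """What a human reader is expected to do by hand before sending mail."""
    return address.replace(marker, "")
```

Because both rewrites are this mechanical, a harvester could undo them just as easily, which is the risk the article points out.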

    MySQL
    Download badhosts MySQL database dump
    We need to set up a MySQL database, where we store records of the hosts which are to be blocked. This doesn't have to be MySQL, but I use it because it's extremely fast, and very appropriate for this kind of application. You need to create a new database, called 'badhosts'. You then create a table, again called 'badhosts', with the following structure:

    Field        Type                           Comment
    ip_address   varchar(20) not null, indexed  The IP address of the host to be blocked
    user_agent   varchar(255) not null          The HTTP User-Agent of the spambot, for reference
    expire_days  int unsigned not null          How many days this block lasts; doubled every time a new block has to be created for a particular IP address
    created      datetime not null              When this block was created
    expiry       datetime not null, indexed     When this block expires

    You could use the dump provided above to load directly into your database:

    shell> mysqladmin create badhosts
    shell> mysql badhosts < badhosts.dump
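
For readers without the dump file, the table structure listed above can be approximated as follows. This is a sketch under stated assumptions: it uses Python's built-in sqlite3 module as a stand-in for MySQL, the index names and helper functions are hypothetical, and "int unsigned" becomes a plain INTEGER because SQLite has no unsigned type.

```python
import sqlite3

# Columns follow the article's table; index names are made up for this sketch.
SCHEMA = """
CREATE TABLE badhosts (
    ip_address  VARCHAR(20)  NOT NULL,
    user_agent  VARCHAR(255) NOT NULL,
    expire_days INTEGER      NOT NULL,
    created     DATETIME     NOT NULL,
    expiry      DATETIME     NOT NULL
);
CREATE INDEX idx_badhosts_ip ON badhosts (ip_address);
CREATE INDEX idx_badhosts_expiry ON badhosts (expiry);
"""


def open_badhosts(path: str = ":memory:") -> sqlite3.Connection:
    """Create the badhosts table in a fresh SQLite database."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def active_blocks(conn: sqlite3.Connection, now: str) -> list:
    """Return blocks that have not expired; now is 'YYYY-MM-DD HH:MM:SS'."""
    return conn.execute(
        "SELECT ip_address, expire_days, user_agent FROM badhosts "
        "WHERE expiry > ? ORDER BY created",
        (now,),
    ).fetchall()
```

The two indexed columns are the ones searched: ip_address when checking for an earlier block, expiry when deciding which blocks are still live.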

    That's about it! The fields which are marked as 'indexed' are the only ones which need indexes, because they are searched on to see if a particular IP address has been previously blocked, and also to see which blocks should be removed because they've expired. If you have access privileges set on your MySQL databases, then you need to allow the Apache user (usually 'nobody') access. The other script that will require access is badhosts_loop, which runs as root.
    Next, we look at the script that populates this database.

    BlockAgent.pm
    Download BlockAgent.pm
    Download bad_agents.txt
    The BlockAgent.pm Apache/mod_perl module is taken from the excellent book "Writing Apache Modules with Perl and C" by Lincoln Stein & Doug MacEachern (O'Reilly). This script basically acts as an Apache authentication module which checks the HTTP User-Agent header against a list of known bad agents. If there's a match, then a 403 'Forbidden' code is returned. The script compiles and caches a list of subroutines for doing the matches, and automatically detects when the 'bad_agents.txt' file has changed. I have found that it has no noticeable impact on the performance of the webserver. This script is useful in the case where you know for certain that a certain User-Agent is bad; there's no point in letting it go anywhere on your site, so it's a good first line of defense. We'll cover how to add this module to your website a little later, along with the rest of the configuration settings in the section on httpd.conf.
    Of course, one of the first arguments you'll see with regard to this method of blocking spambots is that it's easy to circumvent, by simply passing in a User-Agent string which is identical to the major browsers out there. This is perfectly true, but don't ask me why the spambot writers haven't done this - maybe it's a question of pride or ego, they want to see their baby out there on record in Web server logs. I honestly don't know. The main point is that at present, the User-Agent header CAN be used very effectively to block most bad agents. But, I have added more features so that we can also block agents which look ok, but behave badly by going somewhere they shouldn't - the Spambot Trap. More on that soon.

    You'll notice that the bad_agents.txt file which I have supplied here is very comprehensive. A good strategy here is probably to save the full version somewhere (perhaps as bad_agents.txt.all), and just keep the ones you actually encounter in the bad_agents.txt file. Then you keep the list shorter, and more relevant to what actually hits you. For example, my bad_agents.txt file currently has the following lines in it, because these are the spambots that I see most frequently:

    ^[A-Z]+$
    ^.Browse\s
    ^.Eval
    ^EO Browse
    ^.Surf
    ^Microsoft.URL
    ^Mozilla\/3.0.+Indy Library
    ^Zeus.*Webster

    You'll notice from this that BlockAgent.pm is very flexible, being able to take full advantage of the excellent regular expression capabilities of Perl. This means you can capture a lot of different agents with just one line. For example, the very first line catches all the variations of the agent which passes in random strings of capital letters, e.g. FHASFJDDJKHG or UYTWHJVJ. The spambot obviously thinks it's being pretty smart by looking different each time, but by using an easily identifiable pattern, it shoots itself in the foot. Hah.
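
To see how the pattern list behaves, the lines quoted above can be run against a few User-Agent strings. This is a sketch in Python's re module rather than Perl, so it is not the BlockAgent.pm code itself. The patterns are copied from the list; the function name is hypothetical.

```python
import re

# Patterns copied from the bad_agents.txt excerpt quoted in the article.
BAD_AGENT_PATTERNS = [
    r"^[A-Z]+$",
    r"^.Browse\s",
    r"^.Eval",
    r"^EO Browse",
    r"^.Surf",
    r"^Microsoft.URL",
    r"^Mozilla\/3.0.+Indy Library",
    r"^Zeus.*Webster",
]
COMPILED = [re.compile(pattern) for pattern in BAD_AGENT_PATTERNS]


def is_bad_agent(user_agent: str) -> bool:
    """True if the User-Agent matches any pattern in the list."""
    return any(pattern.search(user_agent) for pattern in COMPILED)
```

With this list, "DSurf15a 01", "DBrowse 1.4b" and "UJTBYFWGYA" all match, while an ordinary browser string such as "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)" does not.
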
    The original version of the BlockAgent.pm script is well explained in the O'Reilly book, but I've added an extra hook that checks to see whether the client is accessing any of the spambot trap directories. If it is, then we add an entry to the MySQL database (you could use another relational database if you want, as long as it's accessible from Perl DBI).

    The first time an IP address is blocked, an expiry of one day is set. If the same host subsequently comes in and falls into the trap again, then the expiry time is doubled. And so on. This way, the block gets longer and longer, in proportion to how persistently the spambot revisits our website. Once the IP address is blocked, the spambot can't even connect to our web server, since we use 'Deny' in the ipchains rule. This means that no acknowledgement is given to any packets coming in from the badhost, and as far as they know, our server has just gone away. Hopefully, after this happens for long enough, our server will be taken off the spambot's "visit" list. Another nice little side-effect of this is that the spambot will probably have to wait for a while before giving up each connection attempt. Anything that makes them waste more time is ok by me!

    BlockAgent.pm notifies the badhosts_loop script that something has happened by touching a file called /tmp/badhosts.new. The badhosts_loop file checks this file every few seconds and if it has changed then it knows that a new record's been added to the database, and it needs to re-generate the blocks list.
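
The touch-and-poll handshake described above can be sketched like this. It is written in Python, not the article's Perl, and the function and class names are hypothetical. The only details taken from the text are that one side touches a flag file and the other checks whether it has changed.

```python
from pathlib import Path
from typing import Optional


def notify(flag_path: Path) -> None:
    """Writer side: update the flag file's modification time (creating it if needed)."""
    flag_path.touch()


class FlagWatcher:
    """Reader side: remember the last modification time seen and report changes."""

    def __init__(self, flag_path: Path) -> None:
        self.flag_path = flag_path
        self.last_seen = self._mtime_ns()

    def _mtime_ns(self) -> Optional[int]:
        try:
            return self.flag_path.stat().st_mtime_ns
        except FileNotFoundError:
            return None  # no notification has ever been sent

    def changed(self) -> bool:
        """True once per change; the caller would then rebuild the blocks list."""
        current = self._mtime_ns()
        if current != self.last_seen:
            self.last_seen = current
            return True
        return False
```

A polling loop would call changed() every few seconds and sleep in between. One caveat of this sketch is that two touches within the filesystem's timestamp resolution look like a single change, which is harmless here because the reader rebuilds the whole list anyway.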

    The BlockAgent.pm script is our alarm system. It's what tells us that something happened. In order to act on this information, we need to be able to add rules to the ipchains firewall. We'll cover this next.

    ipchains
    Download sample ipchains config file
    The ipchains module (see the ipchains HOWTO) is a very nice way of providing a good level of basic network security to your server. If you haven't already set it up (or its successor, iptables), then you really should. It's a very easy way to configure who can and cannot have access to your machine. A good resource for learning about this is "Building Linux and OpenBSD Firewalls", by Wes Sonnenreich and Tom Yates (Wiley). This is where I learned about ipchains, and it's on their excellent explanations and examples that I based my own config file. Another is "Linux Firewalls" by Ziegler (New Riders), which seems to have a more recent 2nd edition that covers iptables too.
    The example ipchains config file given here is complete, but the bit which is most important to us is that we create a chain called 'blocks'. This is our own custom chain, which we can then add rules to. The badhosts_loop script will flush this chain and build it back up whenever a spambot falls in your trap. Once the spambot's IP address is on the blocks list, that host cannot connect to your server at all.
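
The flush-and-rebuild step can be sketched as a function that produces the ipchains command lines without running them. This is a Python illustration under assumptions: the article's badhosts_loop source is not shown, so the exact commands it issues are a guess, built from the standard ipchains options -F (flush), -A (append), -s (source) and -j (target). The function name is hypothetical.

```python
import ipaddress
from typing import Iterable, List

CHAIN = "blocks"  # the custom chain name used in the article


def rebuild_commands(blocked_ips: Iterable[str]) -> List[List[str]]:
    """Return argument lists that flush the chain and add one DENY rule per IP.

    Each address is validated first, so a malformed database entry raises
    ValueError instead of ending up on a root command line.
    """
    commands = [["ipchains", "-F", CHAIN]]
    for ip in blocked_ips:
        ipaddress.ip_address(ip)  # raises ValueError on anything that is not an IP
        commands.append(["ipchains", "-A", CHAIN, "-s", ip, "-j", "DENY"])
    return commands
```

Returning argument lists rather than shell strings means a caller could hand each one to subprocess.run without involving a shell, which matters for a script that runs as root.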

    Remember to restart ipchains after you've changed the config file. Next, we'll look at the script that actually adds the firewall rules.

    badhosts_loop
    Download badhosts_loop script
    You run this script in the background, as root. It has to be run as root, because only root has the ability to add rules to the firewall. The script spends most of its time sleeping. It wakes up every five seconds or so and does a quick check on /tmp/badhosts.new. If this file has been changed since the last time it looked, then it goes and re-generates the firewall blocks list with all the current (non-expired) blocks. If nothing else happens, then the script will automatically do this at least once a day, to ensure that blocks really do expire even if there is no new activity.
    You should probably add the following line to your /etc/rc.local file (or equivalent), so that the script is automatically started up on reboot:

    /path/to/badhosts_loop --loop &

    This will start the script looping in the background. The script automatically checks to see if it is already running, by attempting to lock /var/lock/badhosts_loop.lock. If the file is already locked then the script will exit with an error message. If you want to just run the script once, without looping, then just omit the '--loop' option. This can be useful for testing.
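
The already-running check can be sketched with an advisory file lock. This is a Python illustration for Unix-like systems, not the article's Perl. The article only says the script attempts to lock the file, so the use of flock here is an assumption, and the function name is hypothetical.

```python
import fcntl


def acquire_lock(lock_path: str):
    """Return an open file holding an exclusive lock, or None if another holder exists.

    The lock lasts as long as the returned file object stays open, so the
    caller keeps it for the lifetime of the process.
    """
    handle = open(lock_path, "a")
    try:
        # LOCK_NB makes the call fail immediately instead of waiting.
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        handle.close()
        return None
    return handle
```

A second copy of the script would get None back and could exit with an error message, which matches the behavior described above. The lock is released automatically if the first process dies, so no stale-lock cleanup is needed.
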
    Logging is done to /var/log/badhosts_loop.log by default. Every time the script generates the blocks list, it writes a list of all the blocks to the log. This is a good place to monitor if you're interested in what hosts are being blocked. Here's an example of the log output:

    Thu Apr 11 16:09:07 2002:
    Flushing blocks chain:
    Generating blocks list:
    Adding 68.5.99.89 (8) 2002-04-04 14:08:11 to 2002-04-12 14:08:11 DSurf15a 01
    Adding 24.234.28.85 (8) 2002-04-07 10:43:42 to 2002-04-15 10:43:42 DBrowse 1.4b

    The log shows the IP address which is being added, then (in brackets) the number of days the block is effective for (doubling each time), then the start and end dates of this block, and finally the name of the User-Agent which committed the crime. This can be useful for quickly seeing whether you need to add a new one to the bad_agents.txt file.
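
If you want to post-process the log, the "Adding" lines shown above are regular enough to parse. The sketch below is in Python, and the BlockEntry type and function name are hypothetical. The line format is taken only from the two sample lines in this article, so other log variants may not match.

```python
import re
from typing import NamedTuple, Optional


class BlockEntry(NamedTuple):
    ip_address: str
    expire_days: int
    created: str
    expiry: str
    user_agent: str


TIMESTAMP = r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}"
# Adding <ip> (<days>) <created> to <expiry> <user agent>
LOG_LINE = re.compile(
    r"^Adding (\S+) \((\d+)\) (" + TIMESTAMP + r") to (" + TIMESTAMP + r") (.*)$"
)


def parse_block_line(line: str) -> Optional[BlockEntry]:
    """Parse one 'Adding ...' log line; return None for any other line."""
    match = LOG_LINE.match(line.strip())
    if match is None:
        return None
    ip, days, created, expiry, agent = match.groups()
    return BlockEntry(ip, int(days), created, expiry, agent)
```

Lines such as "Flushing blocks chain:" simply return None, so the function can be mapped over the whole file.
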
    This is a pretty stable script that should just sit there and chug quietly, not taking up much in the way of resources. Checking for a file being changed every five seconds is not a big deal in Unix, so you shouldn't even notice it.

    Now you have to create the trap itself - the spambot_trap directory.

    spambot_trap/ Directory
    Download gzipped tarball of sample spambot_trap directory
    View the sample directory
    You can create this directory anywhere on your server. We will create an alias in httpd.conf to access it. I put mine in /www/spambot_trap/. The point is, this doesn't have to be a real directory under your webserver directory root. If you use the Alias directive, then multiple websites can access the same spambot_trap directory, potentially through different aliases. You can use the sample tarball as a starting point; it has subdirectories and links which the spambots I have seen find irresistible. You should create your own image file for the unblock_email.gif file, to have a valid email address of your own.
    The spambot_trap and spambot_trap/guestbook/ directories are not used directly to spring the trap. This is because I wanted to have a warning level, a lead-in, where real users would be able to realize they are getting into dangerous waters and could then back out. You're going to be placing hard-to-click links on your web pages which lead into the real trap, and there's always a chance that a real user will accidentally click on one of these. So, some of the links will point into the warning level. I have made a GIF image which contains a warning text. Why an image? Mainly because spambots can't understand images, and I didn't want to give big clues like "WARNING!!! DO NOT ENTER" in plain text. So, the user sees the warning, the spambots don't. If the spambot proceeds into any of the subdirectories (email, contact, post, message), then the trap is sprung and the host is blocked.
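
The two-level layout (warning level versus the trap proper) amounts to a small path classifier. The sketch below is in Python, not the BlockAgent.pm hook itself. It assumes the trap is reached through the '/squirrel' alias used in the article's sample HTML, and that any of the four named subdirectories, at any depth under the alias, springs the trap. The function name is hypothetical.

```python
TRAP_ALIAS = "/squirrel"  # alias name taken from the article's sample HTML
TRAP_DIRS = {"email", "contact", "post", "message"}


def classify(path: str) -> str:
    """Return 'outside', 'warning', or 'trap' for a request path."""
    if path != TRAP_ALIAS and not path.startswith(TRAP_ALIAS + "/"):
        return "outside"
    # Path components below the alias, ignoring empty pieces from slashes.
    parts = [part for part in path[len(TRAP_ALIAS):].split("/") if part]
    if any(part in TRAP_DIRS for part in parts):
        return "trap"
    return "warning"
```

Under these assumptions, "/squirrel/guestbook/" is only the warning level, while "/squirrel/guestbook/post/" would get the requesting host added to the badhosts table.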

    You also need to try to stop good spiders (e.g. google) from falling into the spambot trap and being blocked. To do this, we utilize the robots.txt file.

    robots.txt
    Download sample robots.txt
    This should allow good robots (such as google) to surf your site without falling into the spambot trap. Most bad spambots don't even check the robots.txt file, so this is mainly for protection of the good bots.
    You'll see that we list a bunch of directories under '/squirrel'. This could be anything; you'll set an alias later in httpd.conf. In fact, you may even want this to be dynamically generated (see later, under Embperl), so that you can quickly change the name of the spambot trap directory if the spambots adapt and start avoiding it. At present, a static setup should work just fine, however.
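
    As a quick sanity check on the robots.txt rules, here is a minimal Python sketch (not part of the original setup). It assumes a robots.txt that simply disallows everything under '/squirrel/'; the sample file may list the subdirectories individually instead. The is_allowed helper is a hypothetical name; it uses the standard library's robots.txt parser to confirm that a rule-following robot would stay out of the trap while still reaching normal pages.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt content: 'squirrel' is the example alias from the text.
ROBOTS_TXT = """\
User-agent: *
Disallow: /squirrel/
"""

def is_allowed(robots_txt, agent, path):
    """Return True if a robot obeying robots_txt may fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)

# A well-behaved robot is kept out of the trap but can still crawl the site.
trap_ok = is_allowed(ROBOTS_TXT, "Googlebot", "/squirrel/guestbook/post/")
site_ok = is_allowed(ROBOTS_TXT, "Googlebot", "/index.html")
```

    If trap_ok ever comes back true after you rename the trap directory, the robots.txt and the alias have drifted apart and good robots are at risk of being blocked.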

    Next, we need to look at the bait - links within your HTML files which lead the spambot into the trap.

    Your HTML Files
    Download sample HTML code
    Download sample transparent 1 pixel image for hiding the trap
    Here's an example of HTML with links into the spambot trap:
    <HTML>

    <BODY BGCOLOR="beige">
    <A HREF="/squirrel/guestbook/message/"></A>
    <A HREF="/squirrel/guestbook/post/"><IMG SRC="/guestbook.gif" WIDTH=1 HEIGHT=1 BORDER=0></A>

    Body of the page here

    <TABLE WIDTH=100%>
    <TR>
    <TD ALIGN=RIGHT>
    <A HREF="/squirrel/guestbook/">
    <SMALL><FONT COLOR="beige">guestbook</FONT></SMALL>
    </A>
    </TD>
    </TR>
    </TABLE>

    </BODY>

    </HTML>

    Spambots tend to be stupid. You'd think they would check for empty links (which don't show up in a real browser), but they don't seem to. Sure, they may get smarter, but in the meantime you might as well pick the low hanging fruit. So, the very first thing in the body of your HTML should be an empty link which goes straight into the trap proper - not the warning level, but the actual trap itself. This is because there is no way for someone using a real browser to click on this link, and good spiders will ignore it anyway because it's in the robots.txt file.
    We also use a one pixel big transparent GIF (a favorite web bug technique) to anchor a link to the trap, just in case the spambot is smart enough to avoid empty links. If we put this as the very first thing in the body, then it'll be pretty hard for a real user to click on, since it's only one pixel in size. But a spambot will quite happily go there!

    Finally, there is an example of a non-graphic, text based link. This will be placed on the right side of the screen by the table, and the text will appear in the same color as the background (in this example, beige). The link does not go straight into the trap, but into the warning level, because with this one there is a bigger chance that real people could click on it accidentally. The link may be invisible, but it's still there, and someone could find it. So, they get to see a nice warning, and they should back off from there. But the spambot won't. By the way, we have the link going to /squirrel/guestbook/ rather than just /squirrel/ because some of the spambots seem to specifically follow links with certain keywords, e.g. 'guestbook', 'message', 'post', etc.

    You can sprinkle these links all around your HTML files. I put them in every single one, since I use Embperl templates which make that sort of thing very easy.

    Embperl
    Download sample dynamic robots.txt using Embperl
    Download sample dynamic HTML code using Embperl
    The point of this is to make it easier to change the spambot trap directory without having to edit a whole bunch of files. We pass an environment variable to Perl from httpd.conf (see below), which says what the trap directory is called. We then use this in Embperl to substitute into the HTML and robots.txt files at request time. Thus if we wanted to change the name of the trap from 'squirrel' to 'badger', then we only need to change httpd.conf, restart apache, and we're done. All the links in the HTML are dynamic, as is robots.txt (see the samples above).
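
    The same idea can be sketched outside Embperl. The following Python is only an illustration of the substitution, not the author's templates: it assumes the trap name arrives in an environment variable called SPAMBOT_TRAP_DIR (the name used in httpd.conf below), and the function names are hypothetical.

```python
import os

def trap_dir(default="squirrel"):
    """Read the trap directory name from the environment, as PerlSetEnv would supply it."""
    return os.environ.get("SPAMBOT_TRAP_DIR", default).strip("/")

def render_robots_txt(name):
    """Build a robots.txt that keeps well-behaved robots out of the trap."""
    return f"User-agent: *\nDisallow: /{name}/\n"

def render_trap_link(name, subpath="guestbook/message/"):
    """Build the empty bait link placed at the top of each page."""
    return f'<A HREF="/{name}/{subpath}"></A>'
```

    Because both the robots.txt and the bait links are generated from the same name, renaming the trap from 'squirrel' to 'badger' cannot leave one of them pointing at the old directory.
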
    Now, we bring it all together in the Apache configuration file.

    httpd.conf
    Download sample httpd.conf directives
    Download sample startup.pl script (used in httpd.conf)
    You need to have mod_perl installed before you can use BlockAgent.pm. You should take a look at the sample given above, and integrate these directives into your own virtual hosts. The most important lines are:
    Alias /squirrel /www/spambot_trap
    PerlSetEnv SPAMBOT_TRAP_DIR squirrel

    You should set the 'squirrel' name to whatever you'd like for your website; you'll then access the trap using a URL something like http://www.yourdomain.com/squirrel/guestbook/message. This will spring the trap. You also need to set up the BlockAgent.pm access handler:
    <Location />
    PerlAccessHandler Apache::BlockAgent
    PerlSetVar BlockAgentFile /www/conf/bad_agents.txt
    </Location>

    This ensures that all accesses to your website will go through BlockAgent.pm first. You should choose your own location for the bad_agents.txt file.
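
    To make the role of the access handler concrete, here is a rough Python sketch of the kind of check it performs. This is not the BlockAgent.pm code. It assumes that bad_agents.txt holds one regular expression per line, with blank lines and '#' comments ignored; the function names and the example agent strings are illustrative only.

```python
import re

def load_patterns(lines):
    """Compile one case-insensitive pattern per non-empty, non-comment line."""
    patterns = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        patterns.append(re.compile(line, re.IGNORECASE))
    return patterns

def is_blocked(user_agent, patterns):
    """Return True if the User-Agent header matches any known-bad pattern."""
    return any(p.search(user_agent) for p in patterns)

# Example: two known-bad agents listed in the (assumed) file format.
patterns = load_patterns(["# known harvesters", "", "^EmailSiphon", "ExtractorPro"])
```

    A request whose User-Agent matches would be refused before any content handler runs; everything else passes through to the rest of the site.
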
    Finally, you might want to install Embperl so that you can embed Perl into your HTML code (always executed on the server side, never seen on the client side):

    # Set EmbPerl handler for main directory
    <Directory "/www/vhosts/www.yourdomain.com/htdocs/">

    # Handle HTML files with Embperl
    <FilesMatch ".*\.html$">
    SetHandler perl-script
    PerlHandler HTML::Embperl
    Options ExecCGI
    </FilesMatch>

    # Handle robots.txt with Embperl
    <FilesMatch "^robots\.txt$">
    SetHandler perl-script
    PerlHandler HTML::Embperl
    Options ExecCGI
    </FilesMatch>

    </Directory>

    That about does it. You should now have the setup which will allow you to block spambots. You'll probably be interested in monitoring what happens...

    Monitoring
    Download sample script for monitoring web server logs
    This simple script just tails the badhosts_loop log. You'll have fun (I do) seeing what comes on your site and promptly falls into the trap, and then SPLAT. No more spambot. Heh heh heh.

    Conclusions
    This setup works pretty well for me at the moment. I've no doubt there are flaws in my design, but it seems stable and is "good enough" for the time being. If you can see any improvements then I'd love to hear about them. To finish up, here's a summary of the strengths and potential weaknesses of the Spambot Trap system.

    Strengths
    Does not rely exclusively on the HTTP User-Agent header, but at the same time allows us to block agents which we know to be bad.

    Does not rely on the spambot abusing the robots.txt file. Many spambots don't even load it. But the robots.txt file will protect "good" robots from falling into the spambot trap. So, for example, googlebot will be just fine.

    The blocks happen based on behavior, rather than trusting anything the spambot tells us about itself (e.g. User-Agent). Thus we don't rely on any prior knowledge of the spambots in order to block them; an entirely new one that we've never seen before will still fall in the trap and be duly blocked.

    Once a spambot is blocked, then it cannot connect to your server again at all for the duration of the block. If it tries to connect, it won't even get a 'connection refused' error, because the firewall rule just quietly drops all the packets from the bad hosts. The ipchains firewall is very effective, and more efficient at blocking hosts than anything you could put together with Apache. So, you save on server resources. If you're wondering whether the block lists might get large, I have found that with the constant expiring of one day blocks, the active block list has never been more than about 20 IP addresses at a time, out of a list (so far) of 100 distinct hosts.

    The blocks initially expire after one day. This means that one-off offenders are quickly removed from the firewall rules. On the other hand, repeat offenders get progressively longer and longer blocks (doubled each time). This means that the more abusive a host is, the more it will be blocked. It also means that if a bot is coming in from multiple IP addresses (through a proxy), then each of the individual IP addresses will probably not go on to be blocked for too long. Thus you won't be blocking everyone in AOL. On the other hand, if you continue to get hit from the same network, then it's obviously a source of trouble and should be blocked. If it's a major network like AOL, which you really don't want to block, then you need to take the IP addresses and times of the abuse, and send it to the sysadmin at the ISP concerned. There's really not a lot else you can do. I haven't seen this in reality, though. In my experience, the spambots come in from all sorts of different IP addresses, and the ones that are very persistent over time are mostly static IPs from DSL and small ranges of IPs from cable modems. These are the people with the always-on, high bandwidth capabilities which are needed for large scale email harvesting.
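
    The escalation rule is easy to state in code. This is a minimal Python sketch, assuming the first offense earns exactly one day and each repeat doubles the previous block; the actual script may compute it differently, and block_duration is a hypothetical name.

```python
from datetime import timedelta

def block_duration(offense_count, base=timedelta(days=1)):
    """Return the block length for the Nth offense: base, doubled for each repeat."""
    if offense_count < 1:
        raise ValueError("offense_count must be at least 1")
    return base * 2 ** (offense_count - 1)
```

    Under that assumption a one-off offender is out after a day, while a host caught five times is blocked for sixteen days on the fifth offense.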

    The system uses a relational database to manage the blocks, and so it is very scalable, and potentially you could share the database between multiple servers. If any one server gets a spambot, then the offending IP address can automatically also be blocked at all the other servers. Also, the fact that we don't delete expired blocks means that we can keep track of the history of the blocks, and perhaps perform analyses which would lead to more permanent ipchains blocks of entire subnets, if desired.

    Weaknesses
    It would be possible for the spambots to get wise, and start following the robots.txt file rules. Then the spambot could in theory surf your entire site (or at least the bits allowed by robots.txt) without falling into the trap. However this also means that you can control where the spambot goes, which is the whole point of robots.txt. If you want, you can allow google into one part of the site, but exclude all others. Still, you should remove all email addresses from your site as the fail-safe.

    It's possible that a spambot could come in through a proxy such as AOL, which means you'll be blocking multiple AOL IP addresses. This is not very nice, and I'm not sure what the solution is at the moment. All I can say is that it hasn't happened yet, and the worst offenders on my site all have static IPs. They seem to come in from cable and DSL connections mostly.

    I don't know how feasible this would be, but it may be possible to conduct a "denial of service" type attack on your webserver by making many requests to the spambot trap directory from different IP addresses. I think, however, that you actually need to have those IP addresses (rather than spoofing them) in order to set up a real TCP connection with the web server. I don't know how likely this is, but it comes more under the "attack" category than spambots. If someone tries this on your site, then it's definitely something that can be pursued with legal means. It's no longer just a petty annoyance, but rather a hostile action which must be chased down. Also, the motivation is totally different - the spammers don't want to do this kind of thing. They just want their email addresses. The DDOS attacks are notoriously difficult to track, but I think in the couple of years that have passed since the first ones brought down Amazon and Yahoo!, there has been some progress made. Anyhow, I just wanted to bring the idea into the light of day. If anyone has any clues about it then I'd be glad to know.

    Possible Future Enhancements
    Spot large numbers of blocks occurring on a particular subnet, and automatically consolidate blocks into a single one which blocks the entire subnet (e.g. 128.123.31.0/24).

    More interactive tools to allow removal of blocks

    Analysis tools which can tell us something about patterns of abuse from particular networks.
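
    The subnet consolidation idea could be prototyped along these lines. This is a Python sketch under stated assumptions, not an existing part of the system: the threshold of three blocked hosts per /24 is arbitrary, and subnets_to_consolidate is a hypothetical name.

```python
from collections import Counter
from ipaddress import ip_network

def subnets_to_consolidate(blocked_ips, threshold=3, prefix=24):
    """Return the subnets containing at least `threshold` blocked addresses."""
    counts = Counter(
        # strict=False masks off the host bits, e.g. 128.123.31.5 -> 128.123.31.0/24
        ip_network(f"{ip}/{prefix}", strict=False)
        for ip in blocked_ips
    )
    return sorted(str(net) for net, n in counts.items() if n >= threshold)

blocked = ["128.123.31.5", "128.123.31.77", "128.123.31.200", "10.0.0.1"]
candidates = subnets_to_consolidate(blocked)
```

    Each candidate could then replace its individual host rules with a single firewall rule, though a human should probably review the list before blocking a whole network.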

    If you can think of any more potential problems (or unrecognised strengths!) then I'd be happy to hear about them. I'd also like to hear any comments on this document.

  24. The only way to fight on Privacy Policies Heading Downhill · · Score: 2

    I'm not sure how many Yahoo people have started to receive phone spam, but even though I am on the Oregon no-call list, I've received 6 calls asking for the bullshit "name" on my Yahoo account. Coincidence? Nah.
    I've reported this to the abuse list, which probably means these companies will end up paying some sort of fine, which probably means that they will be reluctant to do business with Yahoo in the future. I say going after the demand is the best way to approach things, as Yahoo etc. can change the user agreement pretty much at will, as is shown. Is it dirty? Ya. Low down and fucking annoying? Yup. But you did agree to the terms, which include that they can change the terms at any time. Besides, the service is free, so as pissed as I am, I do have to acknowledge that they might as well make some money.

    Also, I find filtering anything with the word "unsubscribe" in it to trash works pretty well :)

    Long live banner ad filters.

  25. Mr. despair enters on GeekPAC · · Score: 1

    Not to sound negative, but I'm a pessimist by nature. $25 a year from the few geeks that contribute is not going to add up to anywhere near the total the "opposition" is able to generate. Moreover, we cannot invite politicians to spend some time at our "weekend retreats" (i.e. big, expensive house), nor take them out for $500 lunches, etc.

    Great idea, I just think that if you're going to set yourself up as a lobbying group, you'd better have a lot more money than what you are going to get from donations.