Slashdot Mirror


Unusual HTTP Requests For robots.txt?

Fooster asks: "I edit several (mostly) unrelated Web sites hosted on a Linux virtual hosting machine running Apache. Often in an idle moment between edits, I'll watch my logs with a 'tail -f access &'. Today, I started to get bursts of requests for robots.txt from several different major service provider IP blocks that were almost simultaneous. Some time later, I'd get another burst, with some of the requests coming from different IPs. All in all, I had over 100 times more requests for robots.txt today than ever before in one day. Unlike most search engine robots.txt requests, there was no info in the referrer field and a reverse DNS lookup did not lead me back to a search engine info provider. I found the requests to be coming from blocks owned by ISPs like Qwest, AT&T, BBN and others. A cursory examination of the literature revealed no reports of exploits based on robots.txt, so I decided to 'Ask Slashdot.' Have any other Webmasters noticed this? Am I just being paranoid? Take a look at the logs yourself, and let me know please."

5 of 17 comments

  1. Some days ago I suffered from the same by overlord · · Score: 3

    Some days ago I got the following logs:

    206.229.153.121 - - [19/Sep/2000:15:14:01 -0300] "GET /robots.txt" 200 37 "-" "-"
    206.64.105.121 - - [19/Sep/2000:15:14:01 -0300] "GET /robots.txt" 200 37 "-" "-"
    206.98.113.121 - - [19/Sep/2000:15:14:01 -0300] "GET /robots.txt" 200 37 "-" "-"
    208.47.242.121 - - [19/Sep/2000:15:14:01 -0300] "GET /robots.txt" 200 37 "-" "-"
    208.47.242.121 - - [19/Sep/2000:15:14:01 -0300] "GET /robots.txt" 200 37 "-" "-"
    12.27.166.121 - - [19/Sep/2000:15:14:01 -0300] "GET /robots.txt" 200 37 "-" "-"
    route.ocy.pnap.net - - [19/Sep/2000:15:14:05 -0300] "GET /robots.txt" 200 37 "-" "-"
    route.ocy.pnap.net - - [19/Sep/2000:15:14:05 -0300] "GET /robots.txt" 200 37 "-" "-"
    207.86.73.121 - - [19/Sep/2000:15:14:08 -0300] "GET /robots.txt" 200 37 "-" "-"
    4.20.90.121 - - [19/Sep/2000:15:14:17 -0300] "GET /robots.txt" 200 37 "-" "-"

    Seems to be pretty similar.
    Basically it was repeated every hour.

    A test for a DoS?

    Bye

    OverLord
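
    To put numbers on bursts like the one quoted above, here is a rough Python sketch (not something the poster ran). It parses Apache-style access log lines, keeps only the robots.txt requests, and groups them into bursts. The sample lines are copied from the log above; the function names and the 30-second window are assumptions picked for illustration.

```python
import re
from datetime import datetime

# Matches Apache common/combined log lines such as the ones quoted above.
LINE_RE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<when>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+)'
    r'(?: "(?P<referer>[^"]*)" "(?P<agent>[^"]*)")?'
)

# A few of the lines from the log quoted above.
SAMPLE = [
    '206.229.153.121 - - [19/Sep/2000:15:14:01 -0300] "GET /robots.txt" 200 37 "-" "-"',
    '208.47.242.121 - - [19/Sep/2000:15:14:01 -0300] "GET /robots.txt" 200 37 "-" "-"',
    'route.ocy.pnap.net - - [19/Sep/2000:15:14:05 -0300] "GET /robots.txt" 200 37 "-" "-"',
    '4.20.90.121 - - [19/Sep/2000:15:14:17 -0300] "GET /robots.txt" 200 37 "-" "-"',
]


def parse_line(line):
    """Return a dict of log fields, or None if the line does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    entry = m.groupdict()
    entry["when"] = datetime.strptime(entry["when"], "%d/%b/%Y:%H:%M:%S %z")
    return entry


def robots_requests(lines):
    """Yield parsed entries whose request path is /robots.txt."""
    for line in lines:
        entry = parse_line(line)
        if entry is None:
            continue
        parts = entry["request"].split()
        if len(parts) >= 2 and parts[1].split("?")[0] == "/robots.txt":
            yield entry


def find_bursts(entries, window_seconds=30):
    """Group entries into bursts: each request follows the previous one
    in the same burst by at most window_seconds."""
    bursts = []
    for entry in sorted(entries, key=lambda e: e["when"]):
        if bursts and (entry["when"] - bursts[-1][-1]["when"]).total_seconds() <= window_seconds:
            bursts[-1].append(entry)
        else:
            bursts.append([entry])
    return bursts


if __name__ == "__main__":
    for burst in find_bursts(robots_requests(SAMPLE)):
        hosts = {e["host"] for e in burst}
        print(burst[0]["when"], len(burst), "requests from", len(hosts), "hosts")
```

    Feeding it a whole access log (for example `open("access")` instead of `SAMPLE`) would show whether the bursts really come back on an hourly schedule.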

  2. IE "Make Available Offline" by whydna · · Score: 3

    I realize a large majority of the audience avoids products like MSIE... but I believe that that's the source of the problem...

    When a user bookmarks a page, they are given an option to "Make Available Offline" which, if selected, pops up some configuration dialog boxes (where they get to choose how many layers deep, etc). It essentially grabs all the code, graphics, etc. and saves it locally.

    Personally, I use this function when I don't know if the content is likely to be around for a while. As it is processing, it shows that it is grabbing all sorts of robots.txt files from all over the damned place (especially if it follows a number of links deep).

    It's not the brightest of MS's "wizards", so it probably keeps requesting the same one repeatedly when links lead to the same server. Try checking what the HTTP_USER_AGENT says for those robots.txt requests.

    If your logs can't tell you, make PHP process .txt files (in your Apache settings or via a .htaccess file) and run a little script in your robots.txt file that'll log the HTTP_USER_AGENT to a db or text file, etc.

    The HTTP_USER_AGENT /should/ be in blocks of the same type (more or less)

    -Andy
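
    The same idea can be sketched without PHP. The example below swaps in a small Python WSGI app (standard library only) that serves robots.txt and logs the User-Agent of whoever asked for it. The name `robots_app`, the placeholder file body, the log file name, and port 8000 are all assumptions for illustration, not anything from the poster's setup.

```python
import logging
from wsgiref.simple_server import make_server

# Placeholder robots.txt content (an assumption; substitute the real file).
ROBOTS_BODY = b"User-agent: *\nDisallow:\n"

log = logging.getLogger("robots_probe")


def robots_app(environ, start_response):
    """Serve /robots.txt and record who asked for it."""
    if environ.get("PATH_INFO") != "/robots.txt":
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"not found\n"]

    # WSGI exposes the User-Agent header as HTTP_USER_AGENT, the same
    # variable name CGI and PHP use.
    agent = environ.get("HTTP_USER_AGENT", "-")
    client = environ.get("REMOTE_ADDR", "-")
    log.info("robots.txt requested by %s agent=%r", client, agent)

    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(ROBOTS_BODY))),
    ])
    return [ROBOTS_BODY]


if __name__ == "__main__":
    logging.basicConfig(
        filename="robots_agents.log",  # hypothetical log file name
        level=logging.INFO,
        format="%(asctime)s %(message)s",
    )
    with make_server("", 8000, robots_app) as server:
        server.serve_forever()
```

    If Apache is already writing the combined log format, the User-Agent shows up as the last quoted field of each line, and a value of "-" there means the client sent none.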

  3. Most likely by Dast · · Score: 3

    it is looking for some insecure cgi type package (search bugtraq for the many possibilities) that puts something in robots.txt. Whatever it puts in there could be used to identify whether the package is installed on the server, letting the cracker know the box can be compromised.

    Better double check your security.

    --

    This sig is false.

  4. incident list by po_boy · · Score: 3

    I personally don't believe this is a security-related incident, but if you do, you may want to take this up on the incidents list at INCIDENTS (at) SECURITYFOCUS.COM. Head over to securityfocus.com and check out the list. It's like BUGTRAQ, but for reporting/discussing incidents.

    Hope it helps.

  5. Looks like IP Spoofing by scotpurl · · Score: 5

    I think someone's using you as a test case for some IP spoofing. Awful lot of .41 and .81 ending IP addresses in there, but from vastly different subnets. Looks too similar for me to believe it's coincidence. I think the exploit works like this: one box sends hundreds of spoofed requests, then another box (somewhere else) receives the response. Some responses go to legitimate boxes (which didn't ask for the info), some to unused IP space, and one to the actual box the exploiter wanted the results to go to. The exploiter is hoping you won't figure out which of the hundreds of requests actually went to a box you can trace back to them.

    Also, since your robots.txt file says what not to index, that's frequently the list of directories with tasty things that people would most like to hack into. Think about it. What's in your robots.txt file? Things that change too often to be listed in search engine results, or the sorts of things that you don't want out there.

    I think you're being probed. Make sure your backups are up to date, and that the box is secured. :-)
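
    The "same last octet, different subnets" pattern is easy to check for. A minimal Python sketch follows, assuming you have already pulled the client column out of the log. The addresses are the ones quoted in comment 1 (which all end in .121, not the .41 and .81 seen in the submitter's own logs), and `last_octet_counts` is a name made up for this example.

```python
from collections import Counter
import ipaddress

# Client column from the log lines quoted in comment 1.
HOSTS = [
    "206.229.153.121",
    "206.64.105.121",
    "206.98.113.121",
    "208.47.242.121",
    "12.27.166.121",
    "route.ocy.pnap.net",
    "207.86.73.121",
    "4.20.90.121",
]


def last_octet_counts(hosts):
    """Count how often each final octet appears among IPv4 clients."""
    counts = Counter()
    for host in hosts:
        try:
            addr = ipaddress.IPv4Address(host)
        except ValueError:
            continue  # already-resolved hostnames cannot be bucketed
        counts[int(addr) & 0xFF] += 1
    return counts


if __name__ == "__main__":
    for octet, n in last_octet_counts(HOSTS).most_common():
        print(f".{octet}: {n} address(es)")
```

    A tally dominated by one or two octets across unrelated networks supports the idea that the sources are coordinated, though on its own it cannot tell spoofing apart from, say, a set of probes that were each deployed at the same host number in their own subnet.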