Slashdot Mirror


Unusual HTTP Requests For robots.txt?

Fooster asks: "I edit several (mostly) unrelated Web sites hosted on a Linux virtual hosting machine running Apache. Often in an idle moment between edits, I'll watch my logs with a 'tail -f access &'. Today, I started to get almost simultaneous bursts of requests for robots.txt from several different major service provider IP blocks. Some time later, I'd get another burst, with some of the requests coming from different IPs. All in all, I had over 100 times more requests for robots.txt today than ever before in one day. Unlike most search engine robots.txt requests, there was no info in the referrer field, and a reverse DNS lookup did not lead me back to a search engine info provider. I found the requests to be coming from blocks owned by ISPs like Qwest, AT&T, BBN and others. A cursory examination of the literature revealed no reports of exploits based on robots.txt, so I decided to 'Ask Slashdot.' Have any other Webmasters noticed this? Am I just being paranoid? Take a look at the logs yourself, and let me know please."
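
One way to put numbers on bursts like these is to filter the access log for robots.txt requests that carry no referrer and count them per client address. The sketch below assumes Apache's combined log format; the name `robots_hits` and the sample lines are invented for illustration (the addresses come from documentation ranges, not from the poster's logs).

```python
import re
from collections import Counter

# Assumption: Apache "combined" format, i.e.
# host ident user [time] "request" status bytes "referer" "user-agent"
LINE_RE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def robots_hits(lines):
    """Count robots.txt requests with no referrer, per client host."""
    hits = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # skip lines that are not in the assumed format
        parts = m.group("request").split()
        # A request line looks like: GET /robots.txt HTTP/1.0
        if len(parts) < 2 or parts[1].split("?")[0] != "/robots.txt":
            continue
        if m.group("referer") in ("", "-"):
            hits[m.group("host")] += 1
    return hits

# Made-up sample lines, not real log data.
sample = [
    '192.0.2.41 - - [10/Oct/2000:13:55:36 -0700] "GET /robots.txt HTTP/1.0" 200 68 "-" "-"',
    '198.51.100.81 - - [10/Oct/2000:13:55:37 -0700] "GET /robots.txt HTTP/1.0" 200 68 "-" "-"',
    '192.0.2.41 - - [10/Oct/2000:13:55:40 -0700] "GET /robots.txt HTTP/1.0" 200 68 "-" "-"',
    '203.0.113.7 - - [10/Oct/2000:13:56:01 -0700] "GET /index.html HTTP/1.0" 200 2326 "http://example.com/" "Mozilla/4.0"',
]
print(robots_hits(sample).most_common())
```

Feeding it the lines of a real access log (for example `robots_hits(open("access"))`, using the log name from the question) would give a per-address count to compare against a normal day.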

1 of 17 comments

  1. Looks like IP Spoofing by scotpurl · Score: 5

    I think someone's using you as a test case for some IP spoofing. There are an awful lot of IP addresses ending in .41 and .81 in there, but from vastly different subnets. It looks too similar for me to believe it's coincidence. I think the exploit works like this: one box sends hundreds of spoofed requests, then another box (somewhere else) receives the response. Some responses go to legitimate boxes (which didn't ask for the info), some to unused IP space, and one to the actual box the exploiter wanted the results to go to. The exploiter is hoping you won't figure out which of the hundreds of requests actually went to a box you can trace back to them.
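
The .41 and .81 pattern can be checked rather than eyeballed by tallying the last octet of every address that asked for robots.txt. A minimal sketch: `last_octet_counts` is a hypothetical helper, and the addresses are placeholders from documentation ranges, not the ones in the posted logs.

```python
from collections import Counter
import ipaddress

def last_octet_counts(addresses):
    """Tally the final octet of each valid IPv4 address."""
    counts = Counter()
    for text in addresses:
        try:
            addr = ipaddress.IPv4Address(text)
        except ipaddress.AddressValueError:
            continue  # ignore hostnames and malformed entries
        # packed is the 4-byte form of the address; the last byte is the final octet
        counts[addr.packed[-1]] += 1
    return counts

# Placeholder addresses for illustration only.
ips = ["192.0.2.41", "198.51.100.41", "203.0.113.81", "192.0.2.7", "not-an-ip"]
print(last_octet_counts(ips).most_common(2))
```

A couple of values dominating a large sample would back up the suspicion; a handful of hits proves little either way.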

    Also, since your robots.txt file says what not to index, that's frequently the list of directories with tasty things that people would most like to hack into. Think about it. What's in your robots.txt file? Things that change too often to be listed in search engine results, or the sorts of things that you don't want out there.
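
To see what a robots.txt file gives away, list every path it asks crawlers to stay out of. A minimal sketch: `disallowed_paths` is a hypothetical helper and the file contents are an invented example, not anyone's real robots.txt.

```python
def disallowed_paths(robots_txt):
    """Return the paths named in Disallow lines, in order of appearance."""
    paths = []
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "disallow":
            value = value.strip()
            if value:  # an empty Disallow means nothing is off limits
                paths.append(value)
    return paths

# Invented example file.
example = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /admin/   # staff only
Disallow:
"""
print(disallowed_paths(example))
```

Anything on that list that actually matters needs access controls on the server itself, since robots.txt is only a request that well-behaved crawlers choose to honor.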

    I think you're being probed. Make sure your backups are up to date, and that the box is secured. :-)