White House Website Limits Iraq-Related Crawling
oscarcar writes "Dan Gillmor is reporting on the White House website's use of its robots.txt file to disable search engines from crawling certain material. Many excluded items in the robots.txt file involve mentions of Iraq, possibly to prevent people from finding changes to past statements and information when archived elsewhere."
I have to admit, when I first read the story I thought someone was being paranoid. But you really should RTF robots.txt file before you accuse the poster of being paranoid. The disallowed files are extraordinarily specific. I really can't come up with a plausible explanation beyond simoniker's.
It really doesn't look like it. It looks like someone screwed up, because none of those directories appear to exist at all. I mean really, what are the chances of /firstlady/photos/2003/01/iraq actually having at some time contained real data?
It looks like someone did a
find . -type d | perl -ne 'chomp; print "$_/iraq\n$_/text\n"' > robots.txt
I have no idea what the purpose would be, but it seems like a funny thing to do if you were trying to hide something.
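For anyone who wants to poke at the idea, here is a rough Python sketch of the same guess: walk a directory tree and emit an "iraq" and a "text" entry for every directory. It is only my reconstruction of the hypothesis, not how the White House file was actually produced. The function name build_robots, the temporary demo tree, and the "User-agent: *" / "Disallow:" lines are my own assumptions (the one-liner above prints bare paths); the standard-library urllib.robotparser is then used to check how a crawler would read the result.

```python
import os
import tempfile
from urllib.robotparser import RobotFileParser


def build_robots(root, suffixes=("iraq", "text")):
    """Return robots.txt text with one Disallow line per (directory, suffix) pair.

    Hypothetical helper: it mirrors the guess that every directory got an
    'iraq' and a 'text' entry, whether or not such a path really exists.
    """
    lines = ["User-agent: *"]  # assumption: the rules apply to all crawlers
    for dirpath, dirnames, _ in os.walk(root):
        dirnames.sort()  # visit subdirectories in a deterministic order
        rel = os.path.relpath(dirpath, root).replace(os.sep, "/")
        prefix = "" if rel == "." else "/" + rel
        for suffix in suffixes:
            lines.append(f"Disallow: {prefix}/{suffix}")
    return "\n".join(lines) + "\n"


# Build a tiny made-up tree so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "firstlady", "photos"))
    robots_txt = build_robots(root)

print(robots_txt)

# Feed the generated rules to the standard-library parser.
parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# Disallow rules are prefix matches, so only the generated paths are blocked.
print(parser.can_fetch("*", "/firstlady/photos/iraq"))
print(parser.can_fetch("*", "/firstlady/photos/index.html"))
```

Note that none of the disallowed paths has to exist for the file to be valid, which fits the observation that the listed directories don't seem to be real.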
By the way, who is going around looking at people's robots.txt files?
Engineering and the Ultimate
http://www.bway.net/~keith/whrobots/disdirs.html And yes, these files *are* relevant.
Melius mori in libertate quam vivere in servitute.