White House Website Limits Iraq-Related Crawling
oscarcar writes "Dan Gillmor is reporting on the White House website's use of its robots.txt file to disable search engines from crawling certain material. Many excluded items in the robots.txt file involve mentions of Iraq, possibly to prevent people from finding changes to past statements and information when archived elsewhere."
It looks like 99% of the stuff related to Iraq is filtered out in robots.txt.
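For anyone who wants to count the Iraq-related entries rather than eyeball them, here is a minimal sketch that pulls the Disallow lines out of a robots.txt and keeps the ones mentioning "iraq". The SAMPLE text is an assumption made up for illustration: three of its paths are quoted elsewhere in this thread, /cgi-bin is hypothetical, and none of it is a copy of the real file.

```python
def disallowed_paths(robots_txt):
    """Return every path named on a Disallow line, in file order."""
    paths = []
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


def iraq_entries(robots_txt):
    """Return the disallowed paths that mention 'iraq' (case-insensitive)."""
    return [p for p in disallowed_paths(robots_txt) if "iraq" in p.lower()]


# Hypothetical sample, NOT the real whitehouse.gov robots.txt.
SAMPLE = """\
User-agent: *
Disallow: /infocus/iraq
Disallow: /president/holiday/decorations/iraq
Disallow: /president/tee-ball-01/iraq
Disallow: /cgi-bin
"""

if __name__ == "__main__":
    hits = iraq_entries(SAMPLE)
    total = len(disallowed_paths(SAMPLE))
    print(f"{len(hits)} of {total} Disallow entries mention Iraq")
```

Feed it the text of the real file instead of SAMPLE to get an actual count.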
/infocus/iraq directory (which is disallowed in robots.txt)
But not a problem, on google.com I just specify the site by saying 'Iraq site:whitehouse.gov' and it had 14,000 hits... the first one is the root of
If you try actually *loading* the directories listed in the robots.txt, they don't exist. Not one. Not by going to their index.html or trying to find them through the site navigation. While they could still be accused of deleting them, many of the links are unlikely to have existed in the first place (http://www.whitehouse.gov/president/heartland-tour-gallery/iraq? /president/holiday/decorations/iraq? /president/tee-ball-01/iraq?). This may be just some IT grunt running a bad script on robots.txt.
Most of the pages in the robots.txt are actually 404s and don't exist anymore. It's that simple. It keeps the robots from constantly requesting content that doesn't exist anymore. A few are blocked because they are bandwidth-intensive videos and things, and some others are blocked for more mundane reasons, I assume.
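The check described above (request each listed path and see whether it comes back 404) can be scripted. This is a rough sketch, not a record of what anyone in the thread actually ran: the helper names http_status and missing_paths are made up here, and it assumes the server answers HEAD requests sensibly.

```python
import urllib.error
import urllib.request


def http_status(url, timeout=10):
    """Return the HTTP status code for a HEAD request to url."""
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urlopen raises on 4xx/5xx; the code is still what we want.
        return err.code


def missing_paths(base_url, paths, status_of=http_status):
    """Return the paths under base_url that answer with 404.

    status_of is injectable so the logic can be tried without a network.
    """
    base = base_url.rstrip("/")
    return [p for p in paths if status_of(base + p) == 404]


if __name__ == "__main__":
    # Paths quoted in the comment above; this makes real network requests.
    listed = [
        "/president/heartland-tour-gallery/iraq",
        "/president/holiday/decorations/iraq",
        "/president/tee-ball-01/iraq",
    ]
    for path in missing_paths("http://www.whitehouse.gov", listed):
        print("404:", path)
```

A 404 today only shows the page is not there now; it cannot tell you whether it ever was.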
Seems odd and pointless to me. I'd like a statement explaining it. A lot like the "Disallow: /hidden/passwd" kind of entries.
Looks like they removed a bunch of files where they were making claims that Saddam was behind 9/11. One could be led to suspect that now that Bush got his war he doesn't need that lie anymore, and wants to erase all history of it since it undermines his authority.
Peace, or Not?
The complaint is they've done it before - "combat operations are done" became "major combat operations are done" when the fighting didn't stop. You can check here.
Compare the screenshots of what used to be on the White House website vs. what's currently on the website.
Yes, I know, "how do we know this blogger didn't alter the screenshots?" You don't.
http://www.whitehouse.gov/infocus/iraq
Not any more.
Although the current Google cache lists
[snip 22 lines]
the current robots.txt leaps from
to
Conspiracy theory over...
Referring to a website critical of him (but correct in every detail)