Slashdot Mirror


White House Website Limits Iraq-Related Crawling

oscarcar writes "Dan Gillmor is reporting on the White House website's use of its robots.txt file to disable search engines from crawling certain material. Many excluded items in the robots.txt file involve mentions of Iraq, possibly to prevent people from finding changes to past statements and information when archived elsewhere."

9 of 837 comments

  1. Everything Iraq.... by c_oflynn · · Score: 4, Informative

    It looks like 99% of the stuff related to Iraq is filtered out in robots.txt.

    Not a problem, though: on google.com I just restricted the search with 'Iraq site:whitehouse.gov' and got 14,000 hits. The first one is the root of the /infocus/iraq directory (which is disallowed in robots.txt).

    1. Re:Everything Iraq.... by mrpuffypants · · Score: 3, Informative

      Well, yes, it would still be in Google's search results if Googlebot hasn't crawled the White House site since the change was made.

      Next time it crawls the site it won't read the forbidden directories and will delete them (if present) from the Google cache, essentially erasing any official Iraq history from Google (and other search engines).
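To make the crawling rule in this exchange concrete, here is a minimal sketch using Python's standard `urllib.robotparser`. The `ROBOTS_TXT` excerpt is an assumption built from two paths quoted in this thread, not the real file, and `is_crawlable` is a hypothetical helper name. Note that robots.txt only tells a compliant crawler what not to fetch; what a search engine does with pages it indexed earlier is its own policy.

```python
from urllib.robotparser import RobotFileParser

# Assumed excerpt: only the two Disallow paths below come from this thread.
ROBOTS_TXT = """\
User-agent: *
Disallow: /infocus/iraq
Disallow: /president/holiday/decorations/iraq
"""


def is_crawlable(robots_txt: str, url: str, agent: str = "*") -> bool:
    """Return True if a compliant crawler named `agent` may fetch `url`."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for url in (
        "http://www.whitehouse.gov/infocus/iraq/100days",
        "http://www.whitehouse.gov/infocus/judicialnominees/",
    ):
        print(url, is_crawlable(ROBOTS_TXT, url))
```

Matching is by path prefix, so everything under /infocus/iraq is covered by the single entry in this excerpt.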

  2. Not conspiracy, but I don't know what it *is* eith by Have+Blue · · Score: 4, Informative

    If you try actually *loading* the directories listed in the robots.txt, they don't exist. Not one. Not by going to their index.html or trying to find them through the site navigation. While they could still be accused of deleting them, many of the links are unlikely to have existed in the first place (http://www.whitehouse.gov/president/heartland-tour-gallery/iraq? /president/holiday/decorations/iraq? /president/tee-ball-01/iraq?). This may be just some IT grunt running a bad script on robots.txt.

  3. Most of them are blocked because they're 404's by steveit_is · · Score: 3, Informative

    Most of the pages in the robots.txt are actually 404s and don't exist anymore. It's that simple. It keeps the robots from constantly requesting content that no longer exists. A few are blocked because they are bandwidth-intensive videos and things, and some others are blocked for more mundane reasons, I assume.

  4. Wayback Machine by BLuP1 · · Score: 3, Informative

    The Wayback Machine does archive robots.txt, and it seems the White House updates this file about every week or so. The current update happened after April 13th, 2003, and it simply took all of those references that said ".../.../.../text" and added /iraq as well.

    Seems odd and pointless to me. I'd like a statement explaining it. It reads a lot like the "Disallow: /hidden/passwd" kind of entry.
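The "took every .../text entry and added /iraq" pattern described here is what a simple generator script would produce. The following is only a sketch of that hypothesis: `add_iraq_siblings` is a made-up name, the input paths are assumed for illustration, and nothing in this thread confirms how the real file was built.

```python
def add_iraq_siblings(disallowed):
    """For each path ending in /text, also list the same directory with /iraq.

    Hypothetical reconstruction of the pattern described in the thread.
    """
    result = set(disallowed)
    for path in disallowed:
        if path.endswith("/text"):
            result.add(path[: -len("/text")] + "/iraq")
    return sorted(result)


if __name__ == "__main__":
    # Assumed input: two /text entries of the kind the thread mentions.
    before = [
        "/infocus/internationaltrade/text",
        "/infocus/judicialnominees/text",
    ]
    for path in add_iraq_siblings(before):
        print("Disallow:", path)
```

A script like this would create /iraq entries for directories that never had such a page, which is consistent with the nonexistent paths noted in comment 2.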

  5. Missing Iraq and 9.11 files by jjn1056 · · Score: 5, Informative

    Looks like they removed a bunch of files where they were making claims that Saddam was behind 9/11. One could be led to suspect that now that Bush got his war he doesn't need that lie anymore, and wants to erase all history of it since it undermines his authority.

    --
    Peace, or Not?

  6. Re:Interesting allegation... by davebo · · Score: 4, Informative

    The complaint is they've done it before - "combat operations are done" became "major combat operations are done" when the fighting didn't stop. You can check here.

    Compare the screenshots of what used to be on the White House website with what's currently on the website.

    Yes, I know, "how do we know this blogger didn't alter the screenshots?" You don't.

  7. Someone's been busy by billybob2001 · · Score: 3, Informative

    For instance:
    http://www.whitehouse.gov/infocus/iraq/100days

    Not any more.

    Although the current Google cache lists

    /infocus/iraq
    /infocus/iraq/100days/iraq
    /infocus/iraq/100days/text
    [snip 22 lines]
    /infocus/iraq/photoessay/iraq
    /infocus/iraq/photoessay/text
    /infocus/iraq/text

    the current robots.txt leaps from
    /infocus/internationaltrade/text
    to
    /infocus/judicialnominees/iraq

    Conspiracy theory over...

    ...or is it?
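The comparison in this comment (cached listing versus current file) can be done mechanically. This is a minimal sketch: `disallow_paths` and `removed_entries` are hypothetical helpers, and the `OLD` and `NEW` texts are trimmed, assumed excerpts that reuse only paths quoted above.

```python
def disallow_paths(robots_txt):
    """Collect the values of all Disallow lines, ignoring comments and blanks."""
    paths = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


def removed_entries(old_txt, new_txt):
    """Paths disallowed in the old file but absent from the new one."""
    return sorted(set(disallow_paths(old_txt)) - set(disallow_paths(new_txt)))


# Assumed excerpts, not the full files.
OLD = """\
User-agent: *
Disallow: /infocus/internationaltrade/text
Disallow: /infocus/iraq
Disallow: /infocus/iraq/100days/iraq
Disallow: /infocus/iraq/100days/text
Disallow: /infocus/judicialnominees/iraq
"""
NEW = """\
User-agent: *
Disallow: /infocus/internationaltrade/text
Disallow: /infocus/judicialnominees/iraq
"""

if __name__ == "__main__":
    for path in removed_entries(OLD, NEW):
        print("dropped:", path)
```

Running the same comparison over two archived snapshots would show exactly which entries changed between them, without relying on eyeballing the sorted list.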

  8. "There ought to be limits to freedom" - G.W. Bush by JimmytheGeek · · Score: 3, Informative

    Referring to a website critical of him (but correct in every detail)