Slashdot Mirror


White House Website Limits Iraq-Related Crawling

oscarcar writes "Dan Gillmor is reporting on the White House website's use of its robots.txt file to disable search engines from crawling certain material. Many excluded items in the robots.txt file involve mentions of Iraq, possibly to prevent people from finding changes to past statements and information when archived elsewhere."

13 of 837 comments

  1. Cue somebody... by Dave2+Wickham · · Score: 4, Insightful

    Cue somebody to take a crawler (hell, even a bash script using wget) to specifically archive these pages. Hell, they could even use a user-agent which doesn't look like a bot.

    Of course, people would be less likely to trust random-Joe from the Internet than, say, The Wayback Machine, but I expect this is what will happen...

  2. Re:Oh please by phritz · · Score: 5, Insightful
    Congratulations to simoniker, poster of the most inanely paranoid comment I have ever read here on slashdot. And that's saying something.

    I have to admit, when I first read the story I thought someone was being paranoid. But you really should RTF robots.txt file before you accuse the poster of being paranoid. The disallowed files are extraordinarily specific. I really can't come up with a plausible explanation beyond simoniker's.

  3. Re:More American Cencorship by kableh · · Score: 4, Insightful

    Keep telling yourself that.

    And 70% of the people in this country STILL think that Saddam played some part in 9/11. What was your point again?

  4. Re:And your ... by SQL+Error · · Score: 4, Insightful

    Better explanation: Someone screwed up a search-and-replace in a major way. Many (most?) of those pages with "iraq" in them don't exist.

    It looks like someone blocked off parts of the site to web-crawlers; I don't know for sure why all those blah/bloo/iraq entries are in there but they sure as hell don't lead to anything.

    Censorship: 0
    Screwups: 100

  5. re: and your ... by ed.han · · Score: 4, Insightful

    what's that old saying? "never attribute to malice that which can be attributed to stupidity" or something like that?

    let's not get reactionary here, folks. it wouldn't make sense to do what's being alleged:

    1. every major journalist worth his/her salt would be all over it within hours. so it wouldn't succeed in obscuring information.

    2. it would create an incredible backlash as soon as detected. what purpose would this serve?

    ed

  6. Re:Drawing farfetched conclusions by johnnyb · · Score: 5, Insightful

    It really doesn't look like it. It looks like someone screwed up, because none of those directories appear to exist at all. I mean really, what are the chances of /firstlady/photos/2003/01/iraq actually having at some time contained real data?

    It looks like someone did a

    find . -type d | perl -ne 'chomp; print "${_}/iraq\n${_}/text\n"' > robots.txt

    I have no idea what the purpose would be, but it seems like a funny thing to do if you were trying to hide something.

    By the way, who is going around looking at people's robots.txt files?
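    The guess above can also be sketched in Python. This is only an illustration of the "someone appended /iraq and /text to every directory" theory, not a claim about how the real file was produced. The function name guess_disallow_lines and the sample directory list are hypothetical; the one path shown is the example quoted in the comment.

```python
def guess_disallow_lines(directories, suffixes=("iraq", "text")):
    """Build robots.txt lines by appending each suffix to each directory.

    Hypothetical helper: it mimics the find/perl one-liner, but also adds
    the "Disallow:" prefix that a robots.txt entry needs.
    """
    lines = ["User-agent: *"]
    for directory in directories:
        base = directory.rstrip("/")  # avoid a double slash for "/" or "dir/"
        for suffix in suffixes:
            lines.append(f"Disallow: {base}/{suffix}")
    return lines


if __name__ == "__main__":
    # Sample input only; the directory is the one mentioned in the comment.
    for line in guess_disallow_lines(["/firstlady/photos/2003/01"]):
        print(line)
```

    A file generated this way would list an /iraq entry under every directory whether or not anything ever lived there, which fits the observation that most of those paths lead nowhere.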

  7. Re:A CLASSIC QUOTE... by Selanit · · Score: 4, Insightful
    Nothing's hidden, it's all there, it's all searchable from the white house website, just not from search engines.
    Correction: it's all there, as far as we can tell. How can I be sure that the results returned by the whitehouse.gov search engine are full and complete when google and all the other search engines have been partially crippled? There's no way to verify the completeness of the results -- I just have to take their word for it. Just like I was asked to take their word about Hussein's weapons of mass destruction.

    Paranoia aside, I object to these restrictions as a matter of principle. They're making it more difficult to access publicly available information. It's not classified, and it never was. I, as a citizen of the U.S.A., have a right to know what my leaders have said and done.

    Let's assume the whitehouse.gov search engine is completely honest, and faithfully returns a complete listing of all materials on the site having to do with Iraq. If that's so, then there should be no reason to disable other search engines, since their results would just confirm the internal results.

    But the restrictions are in place, meaning that someone thought there was a good reason to do so. Restricting access makes it more difficult for people to research information pertaining to Iraq on the whitehouse.gov web site. Who are the people most likely to be doing that? Answer: journalists, activists, and concerned citizens. Obviously these restrictions aren't enough by themselves to dissuade a determined researcher; but it might slow them down. And it might actually stop a diffident researcher completely.

    I'm not even going to go into scenarios where the whitehouse.gov search engine is not trustworthy, because serving up "doctored" speeches or information is highly unlikely. There are too many other archives to compare against, and it would be a major scandal if the administration was found to be altering records on its website. They'd have to be really, really dumb to do that.

    The whole thing still leaves a bad taste in my mouth, though.
  8. Re: and your ... by AllUsernamesAreGone · · Score: 4, Insightful

    "1. every major journalist worth his/her salt would be all over it within hours."

    Don't be naive. How long do you think that any mainstream journalist who made a story of this would have a job for? The answer - not long. The US media in particular, although the UK is getting as bad, is little more than a relay system for government propaganda and real, detailed, complete examination of government behaviour, with equal air time to truly dissenting opinions (how many times has Chomsky been on CNN in the past 4 months?) is out of the question. What the government does is Good and Right and Should Not Be Questioned.

    Media by the elite, serving the elite.

  9. Re:A CLASSIC QUOTE... by fermion · · Score: 4, Insightful
    The rules for transparency go beyond merely 'not hiding' information. It is necessary to make information available from well-known locations in the most convenient form practical. This, for instance, is why we have a congressional record rather than just binders of unsorted documents in a basement of some public building.

    The other rule for transparency is that all material information be made available, kept, or destroyed in accordance with public regulation and individual policy. Individual policy must be consistent and decisions must be defensible based on policy.

    The fact that people do not understand these two aspects of transparency is what allows situations like Enron to develop. The latter is what caused the destruction of Arthur Andersen. They have done nothing wrong, but they did not follow their own policy on document destruction, which made them look like at best idiots and at worst criminals.

    We may compare this to other ventures to suggest policy. The NYT does not want google to cache articles because the NYT sells those articles after a certain time. Many other companies do not want deep linking because it reduces ad revenue. A fascist government may want to ensure all users enter their site from a top page to make sure all users must go through the daily propaganda. A library tries hard to not track patrons so that no one is afraid of using the library. The rationale of the White House is beyond me.

    The White House is not hiding documents. However, they are reducing the transparency of the government by limiting the avenues by which the public may access documents. Since the White House has stated many times that it believes in transparency, and in fact requires transparency when dealing with other governments, one can stipulate that transparency is the appropriate standard. So, until someone comes up with a policy that was developed and vetted through the normal processes used in the U.S., one has every reason to suspect nefarious motives.

    And, if I may modify a statement that conservatives like to make, if you do not like transparency, go move to Iraq.

    --
    "She's a scientist and a lesbian. She's not going to let it slide." Orphan Black
  10. Re:country is not at war by flossie · · Score: 4, Insightful

    An honourable country would not keep people imprisoned in Guantanamo Bay without either giving them PoW status or charging them with a specific offence and giving them the right to a fair trial, including free, unhindered and unmonitored access to legal counsel.

  11. Re:Other, arguably more reasonable explanations by EinarH · · Score: 5, Insightful
    Didn't think so, not a single one that I went to is a valid URL, and I highly doubt that they were valid to begin with.
    From
    http://www.bway.net/~keith/whrobots/disdirs.html
    Some of the directories that 404 truly are empty of files. For instance:
    http://www.whitehouse.gov/news/timeline/iraq

    doesn't have files.

    But at least some of the directories that 404 above do have files in them, just not an index file. For instance:

    http://www.whitehouse.gov/infocus/iraq/100days

    does not have an index page, so just entering that URL will give a 404.

    However, the directory has the following files in it:

    http://www.whitehouse.gov/infocus/iraq/100days/100days.pdf
    http://www.whitehouse.gov/infocus/iraq/100days/introduction.html
    http://www.whitehouse.gov/infocus/iraq/100days/part1.html
    http://www.whitehouse.gov/infocus/iraq/100days/part2.html
    http://www.whitehouse.gov/infocus/iraq/100days/part3.html
    http://www.whitehouse.gov/infocus/iraq/100days/part4.html
    http://www.whitehouse.gov/infocus/iraq/100days/part5.html
    http://www.whitehouse.gov/infocus/iraq/100days/part6.html
    http://www.whitehouse.gov/infocus/iraq/100days/part7.html
    http://www.whitehouse.gov/infocus/iraq/100days/part8.html
    http://www.whitehouse.gov/infocus/iraq/100days/part9.html
    http://www.whitehouse.gov/infocus/iraq/100days/part10.html

    All those files are excluded by the directory disallow entry in robots.txt
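
    The point about a directory entry covering every file beneath it can be checked with Python's standard urllib.robotparser. This is a minimal sketch: the robots.txt fragment below is a made-up two-line example built from the directory named above, not a copy of the real file, and is_allowed is a hypothetical helper name.

```python
from urllib.robotparser import RobotFileParser

# Assumed fragment for illustration; not the actual whitehouse.gov file.
ROBOTS_TXT = """\
User-agent: *
Disallow: /infocus/iraq/100days
"""


def is_allowed(path, robots_txt=ROBOTS_TXT, agent="ExampleBot"):
    """Return True if a crawler obeying robots_txt may fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    for path in ("/infocus/iraq/100days/part1.html", "/infocus/iraq/index.html"):
        print(path, is_allowed(path))
```

    Disallow rules match by path prefix, so a well-behaved crawler skips anything whose path starts with the listed directory, even if the directory URL itself returns a 404.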

    And, yes these files *are* relevant.
    --

    Melius mori in libertate quam vivere in servitute.

  12. Re:EXACTLY by davebo · · Score: 4, Insightful

    Nobody thinks Bush and Cheney are updating the website. Jeeze. But the folks that are running the website (and I would bet this extends down to the actual webmaster/tech guy) are political appointees who are there to make the president look good. That is their job. Their actions are all filtered through this political role.

    Let's present an alternate scenario - since you have no evidence for yours, I don't have to present any evidence for mine.

    It's May - Pres. makes his speech on the Carrier, the assumption by those-in-charge is that Chalabi's government will have control of the country within a couple of weeks and the US troops will be heading on home. The web folks (who want to make B & C look good) declare "combat's done! the troops are coming home! re-elect Bush!"

    A few months later, that rosy scenario hasn't quite panned out. The aircraft carrier speech is becoming a liability for Bush - people started counting the number of dead troops in Iraq since he gave the speech, and it keeps going up. The web folks (who want to make B & C look good) say to themselves "this is a potential embarrassment to the president - let's see how we can make it less embarrassing."

    And there you have it.

  13. Re:And your ... by Darby · · Score: 4, Insightful

    Well terrorists have been attacking us since we have been in Iraq till this point in time, but i guess that doesnt mean there is any link..... naaaah

    Native people fighting against an occupying force are known as freedom fighters, not terrorists.

    Try again sparky.