White House Website Limits Iraq-Related Crawling
oscarcar writes "Dan Gillmor is reporting on the White House website's use of its robots.txt file to disable search engines from crawling certain material. Many excluded items in the robots.txt file involve mentions of Iraq, possibly to prevent people from finding changes to past statements and information when archived elsewhere."
Cue somebody to take a crawler (hell, even a bash script using wget) to specifically archive these pages. Hell, they could even use a user-agent which doesn't look like a bot.
Of course, people would be less likely to trust random-Joe from the Internet than, say, The Wayback Machine, but I expect this is what will happen...
If this was some crazy government conspiracy and they were trying to hide the information, why would they put it on their website? There could be any number of reasons they have done this. Perhaps they were getting loads of hits from Google for Iraq-related things. But if anyone really wants the information, surely they can just visit the site.
--
On Slashdot I'm a lawyer.
American people should have some say in a situation like what went on in Iraq.
They do, it's called voting, not to mention public opinion polls, which were near 70% for the invasion when the US invaded.
Slashdot "libertarians": Small government for me, big government for those I disagree with. -1, I disagree with you
Nothing's hidden, it's all there, it's all searchable from the White House website, just not from search engines.
I have to admit, when I first read the story I thought someone was being paranoid. But you really should RTF robots.txt file before you accuse the poster of being paranoid. The disallowed files are extraordinarily specific. I really can't come up with a plausible explanation beyond simoniker's.
Winston's greatest pleasure in life was in his work. Most of it was a tedious routine, but included in it there were also jobs so difficult and intricate that you could lose yourself in them as in the depths of a mathematical problem -- delicate pieces of forgery in which you had nothing to guide you except your knowledge of the principles of Ingsoc and your estimate of what the Party wanted you to say. Winston was good at this kind of thing. On occasion he had even been entrusted with the rectification of the Times leading articles, which were written entirely in Newspeak. He unrolled the message that he had set aside earlier. It ran:
times 3.12.83 reporting bb dayorder doubleplusungood refs unpersons rewrite fullwise upsub antefiling
In Oldspeak (or standard English) this might be rendered:
The reporting of Big Brother's Order for the Day in the Times of December 3rd 1983 is extremely unsatisfactory and makes references to non-existent persons. Rewrite it in full and submit your draft to higher authority before filing.
Work is dumb and so is Jobless Jimmy. (http://www.joblessjimmy.com)
It could be something innocent but really, why would anyone want to keep search engines out of a publicly funded website? People have been accusing the poster of "baseless accusations" but the guy does have a point. I've seen a couple of GW's speeches and afterwards the transcripts of those speeches and noted that grammatical errors were corrected. While this is only a minor offence in editing history it does make you wonder what other opinions and information may have appeared and then later have been edited. Seriously, these are our government officials here, we deserve to have an unedited record of what they say and to hold them to it. A little bit of speculation on the reasons for excluding various terms is far from paranoia.
Chris
But rather than preventing the search of this information, why not mark it as such? In fact, I'll bet it's already dated per page.
I agree that this is yet another step in the misinformation campaign surrounding the current administration. The policies that we've heard flip through hoops like trained seals. There's just no logic to all the reversals of focus, the "misquotes" and the public snafus we've seen happen. This is just another one of them.
It is rather embarrassing to find another view of your own opinions in Google cache... lol...
You've been on Slashdot this long and you still haven't learned that nobody actually reads the dates on articles to see if they're current?
Nosirree, no legitimate webmaster would ever use robots.txt to gently guide visiting bots to the appropriate parts of the site and to keep them from trying to do silly things. The only possible use is to trample your rights while installing the new corporate-owned government.
Geez, people. Honestly.
Dewey, what part of this looks like authorities should be involved?
Keep telling yourself that.
And 70% of the people in this country STILL think that Saddam played some part in 9/11. What was your point again?
The majority of the American people did not vote for this administration. The American people, my friend, elected Al Gore. This administration was put in place by the Supreme Court. Has your brain been washed so quickly that you have already forgotten? Wake up, people: these guys don't give a shit about you or anyone you know unless they have a net worth greater than 10 million. Look at the facts: overall our economy is in the toilet, with the vast majority of citizens considerably worse off than they were 4 years ago. Of course, the extremely rich are doing just fine, getting extremely richer.
every time a republican dies a queer angel gets his wings
Implementing a system where, when people stumble onto out-of-date materials on your site, they get a notice saying "This material is out of date. Follow this link for a more current page." involves nontrivial programming changes, careful thought, an architecture for tracking which pages currently reflect the present on each issue, and a careful and continuous evaluation of your site for which materials no longer reflect the current state of things. It would be extremely useful and neat, but would also require, you know, actual work.
Implementing a system where out-of-date materials are in robots.txt, thus decreasing the possibility people will accidentally stumble onto them, requires an intern, a perl script, "find ./*" and 20 minutes. Which, as the story link notes, appears to have been exactly how this was done.
Irritable, left-wing and possibly humorous bumper stickers and t-shirts
Why should a government-authored site (which, under the Constitution, by definition is public domain text) be excluded from non-government electronic publishing sites?
By the way, show me where in that Robots.txt file there's a command that would block http://www.whitehouse.gov/holiday/2002/art/01.html from Google? If you're right, there should be a line
disallow /holiday/2002/art/ . I don't see one. So, yeah, it's explicitly Iraq-related stuff that they're trying to block. Either 1. they're afraid that sensitive information might end up on the site by accident and want to make sure that it isn't archived if it is - in which case, they've got a lot more serious problems than political connivance - or 2. the theory is correct, and they're trying to set up a memory hole. Given Karl Rove's history, which do YOU think it is?
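For anyone who wants to check a URL against a robots.txt file rather than eyeball it, here is a minimal sketch using Python's standard urllib.robotparser. The SAMPLE_ROBOTS text is a made-up excerpt in the style of the file being discussed, not a copy of the real one, and the helper name is_blocked and the agent name ExampleBot are likewise just for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical excerpt, written to resemble the file under discussion.
# These lines are an assumption for the demo, not the real whitehouse.gov file.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /cgi-bin
Disallow: /firstlady/recipes/text
Disallow: /firstlady/recipes/iraq
"""


def is_blocked(robots_text, url, agent="ExampleBot"):
    """Return True if a well-behaved crawler named `agent` must skip `url`."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return not parser.can_fetch(agent, url)


if __name__ == "__main__":
    urls = [
        "http://www.whitehouse.gov/holiday/2002/art/01.html",
        "http://www.whitehouse.gov/firstlady/recipes/",
        "http://www.whitehouse.gov/firstlady/recipes/text",
    ]
    for url in urls:
        status = "blocked" if is_blocked(SAMPLE_ROBOTS, url) else "allowed"
        print(status, url)
```

Matching is by path prefix, so a page is only blocked if some Disallow line is a prefix of its path. To test the real file you would feed in its actual text instead of the sample.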
I honestly think this is stuff that goes on beneath GWB's notice. I'm with Molly Ivins on him: he's not evil, mean, or stupid, just wrong.
Better explanation: Someone screwed up a search-and-replace in a major way. Many (most?) of those pages with "iraq" in them don't exist.
It looks like someone blocked off parts of the site to web-crawlers; I don't know for sure why all those blah/bloo/iraq entries are in there but they sure as hell don't lead to anything.
Censorship: 0
Screwups: 100
There hasn't been a real declared war since WWII. You can't "declare war on terrorists" and be done with it either, wars are supposed to be declared on countries when you go to fight them. It was what an honorable nation would do before hostilities.
what's that old saying? "never attribute to malice that which can be attributed to stupidity" or something like that?
let's not get reactionary here, folks. it wouldn't make sense to do what's being alleged:
1. every major journalist worth his/her salt would be all over it within hours. so it wouldn't succeed in obscuring information.
2. it would create an incredible backlash as soon as detected. what purpose would this serve?
ed
# robots.txt for http://www.ingsoc.gov/
User-agent: *
Disallow: /cgi-bin
Disallow: /search
Disallow: /query.html
Disallow: /help
Disallow: /appointments/eurasia
Disallow: /appointments/eastasia
Disallow: /ask/images/eurasia
Disallow: /ask/images/eastasia
Disallow: /deptofhomeland/analysis/eurasia
Disallow: /deptofhomeland/analysis/eastasia
Disallow: /deptofhomeland/eurasia
Disallow: /deptofhomeland/eastasia
Disallow: /economy/eurasia
Disallow: /economy/eastasia
Disallow: /goodbye/eurasia
Disallow: /goodbye/eastasia
Disallow: /government/handbook/eurasia
Disallow: /government/handbook/eastasia
Disallow: /government/images/eurasia
Disallow: /government/images/eastasia
Disallow: /government/eurasia
Disallow: /government/eastasia
And now, an offering for the lameness filter...
Oceania was at war with Eastasia: Oceania has always been at war with Eastasia. A large part of the political literature of five years was now completely obsolete. Reports and records of all kinds, newspapers, books, pamphlets, films, sound tracks, photographs- all had to be rectified at lightning speed. Although no directive was ever issued, it was known that the chiefs of the Department intended that within one week no reference to the war with Eurasia, or the alliance with Eastasia, should remain in existence anywhere. The work was overwhelming, all the more so because the processes that it involved could not be called by their true names. Everyone in the Records Department worked eighteen hours in the twenty-four, with two three-hour snatches of sleep. Mattresses were brought up from the cellars and pitched all over the corridors; meals consisted of sandwiches and Victory Coffee wheeled round on trolleys by attendants from the canteen. Each time that Winston broke off for one of his spells of sleep he tried to leave his desk clear of work, and each time that he crawled back sticky-eyed and aching, it was to find that another shower of paper cylinders had covered the desk like a snowdrift, half burying the speakwrite and overflowing onto the floor, so that the first job was always to stack them into a neat-enough pile to give him room to work. What was worst of all was that the work was by no means purely mechanical. Often it was enough merely to substitute one name for another, but any detailed report of events demanded care and imagination. Even the geographical knowledge that one needed in transferring the war from one part of the world to another was considerable.
This was written in 1948. Things have really progressed!
There could be 10 lines in that whole file designed to prevent pages being archived, and the rest are garbage thrown in for confusion/as bad-robot honeypots.
It really doesn't look like it. It looks like someone screwed up, because none of those directories appear to exist at all. I mean really, what are the chances of /firstlady/photos/2003/01/iraq actually having at some time contained real data?
It looks like someone did a
find . -type d | perl -e 'while(<>){chomp; print "${_}/iraq\n"; print "${_}/text\n";}' > robots.txt
I have no idea what the purpose would be, but it seems like a funny thing to do if you were trying to hide something.
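To make the guess above concrete, here is a rough Python sketch of the same idea: walk a directory tree and emit an "iraq" and a "text" entry for every directory, whether or not anything by that name exists there. This is only an illustration of the guess, not a claim about how the real file was produced. The function name candidate_entries and the "Disallow:" prefix are my own assumptions (the one-liner above prints bare paths).

```python
import os
import tempfile


def candidate_entries(root, suffixes=("iraq", "text")):
    """For every directory under `root`, emit one Disallow line per suffix.

    Nothing checks that <dir>/<suffix> exists, which is how a script like
    this could fill robots.txt with paths that lead nowhere.
    """
    entries = []
    for dirpath, dirnames, _filenames in os.walk(root):
        dirnames.sort()  # make the traversal order deterministic
        rel = os.path.relpath(dirpath, root)
        prefix = "" if rel == "." else "/" + rel.replace(os.sep, "/")
        for suffix in suffixes:
            entries.append(f"Disallow: {prefix}/{suffix}")
    return entries


if __name__ == "__main__":
    # Build a tiny throwaway tree instead of touching a real web root.
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "firstlady", "recipes"))
        for line in candidate_entries(root):
            print(line)
```

The throwaway tree has no "iraq" directory anywhere, yet every directory still gets an "iraq" line. That pattern would fit the nonexistent paths people are finding.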
By the way, who is going around looking at people's robots.txt files?
Engineering and the Ultimate
Paranoia aside, I object to these restrictions as a matter of principle. They're making it more difficult to access publicly available information. It's not classified, and it never was. I, as a citizen of the U.S.A., have a right to know what my leaders have said and done.
Let's assume the whitehouse.gov search engine is completely honest, and faithfully returns a complete listing of all materials on the site having to do with Iraq. If that's so, then there should be no reason to disable other search engines, since their results would just confirm the internal results.
But the restrictions are in place, meaning that someone thought there was a good reason to do so. Restricting access makes it more difficult for people to research information pertaining to Iraq on the whitehouse.gov web site. Who are the people most likely to be doing that? Answer: journalists, activists, and concerned citizens. Obviously these restrictions aren't enough by themselves to dissuade a determined researcher; but it might slow them down. And it might actually stop a diffident researcher completely.
I'm not even going to go into scenarios where the whitehouse.gov search engine is not trustworthy, because serving up "doctored" speeches or information is highly unlikely. There are too many other archives to compare against, and it would be a major scandal if the administration was found to be altering records on its website. They'd have to be really, really dumb to do that.
The whole thing still leaves a bad taste in my mouth, though.
In any case, I wonder how much of whitehouse.gov is actually disallowed?
Arghh... How can you people be so dumb? Why don't you actually look at the website and figure some of this stuff out? The URLs ending in "/text" are text versions of pages. They prefer search engines to dump people to the graphical versions. Here, I'll spell it out:
Graphical version (not found in robots.txt): http://www.whitehouse.gov/firstlady/recipes/
Text version (found in robots.txt): http://www.whitehouse.gov/firstlady/recipes/text
Nonexistent version (found in robots.txt): http://www.whitehouse.gov/firstlady/recipes/iraq
The "iraq" entries were probably added by mistake. Most likely a junior webmaster didn't understand the script that is (apparently) used to generate robots.txt for whitehouse.gov.
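One quick way to test the "script screwup" reading is to tally the last path segment of every Disallow entry and see whether "text" and "iraq" show up in matched pairs. The sketch below does that on a small made-up sample. SAMPLE and both function names are assumptions for illustration, and the real file would have to be pasted in to draw any conclusion.

```python
from collections import Counter

# Made-up sample in the style of the entries quoted above (an assumption,
# not the real file).
SAMPLE = """\
User-agent: *
Disallow: /cgi-bin
Disallow: /firstlady/recipes/text
Disallow: /firstlady/recipes/iraq
Disallow: /economy/text
Disallow: /economy/iraq
"""


def disallowed_paths(robots_text):
    """Collect the non-empty paths from Disallow lines, ignoring comments."""
    paths = []
    for line in robots_text.splitlines():
        line = line.split("#", 1)[0].strip()
        if line.lower().startswith("disallow:"):
            path = line.split(":", 1)[1].strip()
            if path:
                paths.append(path)
    return paths


def last_segment_counts(paths):
    """Count how often each final path segment (e.g. 'text', 'iraq') occurs."""
    return Counter(p.rstrip("/").rsplit("/", 1)[-1] for p in paths)


def paired_directories(paths, a="text", b="iraq"):
    """Return the directories that have both a .../a and a .../b entry."""
    parents_a = {p.rsplit("/", 1)[0] for p in paths if p.endswith("/" + a)}
    parents_b = {p.rsplit("/", 1)[0] for p in paths if p.endswith("/" + b)}
    return sorted(parents_a & parents_b)


if __name__ == "__main__":
    paths = disallowed_paths(SAMPLE)
    print(last_segment_counts(paths))
    print(paired_directories(paths))
```

If nearly every directory with a "text" entry also has an "iraq" entry, that looks like a generated list. It doesn't prove intent either way.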
The only not-completely-ridiculous conspiracy theory that I can think of is that someone wanted to discourage archiving of some pages, and decided to hide the fact by making it look like a script had screwed up. But I personally don't find that explanation plausible. Why not use meta tags? Most spiders simply do not respect robots.txt in this form. Pages like whitehouse.gov/iraq are still in Google's cache anyway.
So personally I'm 100% convinced that this is a simple screwup. And even if it's not a screwup, most of the accusations made by the paranoids around here make about as much sense as a Wookie deciding to live on Endor.
"1. every major journalist worth his/her salt would be all over it within hours."
Don't be naive. How long do you think that any mainstream journalist who made a story of this would have a job for? The answer - not long. The US media in particular, although the UK is getting as bad, is little more than a relay system for government propaganda and real, detailed, complete examination of government behaviour, with equal air time to truly dissenting opinions (how many times has Chomsky been on CNN in the past 4 months?) is out of the question. What the government does is Good and Right and Should Not Be Questioned.
Media by the elite, serving the elite.
The other rule for transparency is that all material information be made available, kept, or destroyed in accordance to public regulation and individual policy. Individual policy must be consistent and decisions must be defensible based on policy.
The fact that people do not understand these two aspects of transparency is what allows situations like Enron to develop. The latter is what caused the destruction of Arthur Andersen. They have done nothing wrong, but they did not follow their own policy on document destruction, which made them look like at best idiots and at worst criminals.
We may compare this to other ventures to suggest policy. The NYT does not want Google to cache articles because the NYT sells those articles after a certain time. Many other companies do not want deep linking because it reduces ad revenue. A fascist government may want to ensure all users enter their site from a top page to make sure all users must go through the daily propaganda. A library tries hard to not track patrons so that no one is afraid of using the library. The rationale of the White House is beyond me.
The White House is not hiding documents. However, they are reducing the transparency of the government by limiting the avenues by which the public may access documents. Since the White House has stated many times that it believes in transparency, and in fact requires transparency when dealing with other governments, one can stipulate that transparency is the appropriate standard. So, until someone comes up with a policy that was developed and vetted through the normal processes used in the U.S., one has every reason to suspect nefarious motives.
And, if I may modify a statement that conservatives like to make, if you do not like transparency, go move to Iraq.
"She's a scientist and a lesbian. She's not going to let it slide." Orphan Black
Yeah whatever --- those charming "Clinton body count" people said the same thing regarding their own little conspiracies.
Why don't you people get a grip on reality. There are a lot of problems with the media, but a basic inability to question government is not one of them.
"Simply stated, there is no doubt that Saddam Hussein now has weapons of mass destruction. There is no doubt that he is amassing them to use against our friends, against our allies, and against us. And there is no doubt that his aggressive regional ambitions will lead him into future confrontations with his neighbors -- confrontations that will involve both the weapons he has today, and the ones he will continue to develop with his oil wealth."
I can't possibly imagine why the Bush administration would want to keep these kinds of quotes out of search engines...
I guess one of those would be not finding it at all. That's what this robots.txt file will do as Google drops pages from its list.
If you check out the actual post that got the guy into trouble you'd see that he didn't post anonymously. He had his full name and email address out in the open; not too hard to track down. And it makes you think, what kind of idiot would seriously plot in public to kill someone and use their real fucking name? Either a naive intellectual with no actual intent or an idiot who's not capable of pulling it off in the first place. That's who.
Yeah, because it makes sense that that site is the only one that has his press releases. I am sure the MIB are making sure all the major networks and mirrors have removed their copies from archive as well. Seriously, grow up.
So, someone finds a problem with blocking search engine bots.
1) First, a lot of these docs involve Iraq. So, without real factual information, it's assumed they're trying to do something fishy regarding Iraq info
2) Using that assumption, the next assumption is that they're purposely trying to keep people from trying to find contradictory statements.
This could all be true, or it couldn't be. Either way, making two assumptions without any real facts is just pathetic yellow journalism.
Sorry, I'm with Al Franken on him. (though Ivins is great!)
"I think he's mean. I think we're all too ready to blame Karl Rove, or Dick Cheney, or Ari Fleischer, or Gale Norton, or Donald Rumsfeld, or John Ashcroft when this administration does something despicable. When South Carolinians get push polls saying John McCain fathered an illegitimate black child, you know Karl Rove had something to do with it. But it's really Bush. When our energy policy is set by cronies from the oil, coal, and automobile industries, you can shake your fist at Dick Cheney. But it's Bush. When Ari Fleischer feeds rumors that the Clinton people vandalized the White House, doing $200,000 worth of damage, but month later a GAO report say that ain't true, you can say that Ari Fleisher is a chimp. And he is. But it's Bush."
...
"And I'm through with him."
I want peace on earth and goodwill toward man.
We are the United States Government! We don't do that sort of thing.
Day by day and almost minute by minute the past was brought up to date. In this way every prediction made by the Party could be shown by documentary evidence to have been correct, nor was any item of news, or any expression of opinion, which conflicted with the needs of the moment, ever allowed to remain on record. All history was a palimpsest, scraped clean and reinscribed exactly as often as was necessary. In no case would it have been possible, once the deed was done, to prove that any falsification had taken place.
When you are sure of something, you probably are wrong (search for "Unskilled and Unaware of It").
http://www.bway.net/~keith/whrobots/disdirs.html And, yes these files *are* relevant.
Melius mori in libertate quam vivere in servitute.
Correct me if I am wrong, but the data is still there, right? Also, wasn't the purpose of robots.txt (for crawlers that honor it) to stop crawlers from incessantly crawling the page and sapping your bandwidth? I just don't feel that this is a big issue. If they made it not searchable from the main White House page, that's when I would have issues. They are just trying to save themselves bandwidth. Pages like these Iraq pages are probably updated often. They'd be getting crawled constantly.
Gorkman
I really do wonder what brings people to zealously defend actions like this. Sure, it could be a mix up, but a really ill conceived one. It's obvious that you don't have all the answers, just like others here.
My guess is that the poster feels that Slashdot posters are simply leaping to unjustified paranoid conclusions, and the depth of this faith (or so he pictures it) outrages him (or her).
The intensity of the poster's reaction is simply a reflection of his or her perception of Slashdot readers' zeal.
There are many possible explanations which do not involve conspiracy to hide information. For example, this could just be the work of some low-level IT guy who wanted to filter out one URL which happened to contain 'iraq' because the search-engine robots were burdensome to the webserver. I, for one, prefer to remain suspicious.
Not true. Some of them do exist, like this one: /climatechangefactsheet/text
"Only the small secrets need to be protected. The big ones are kept secret by public incredulity." - Marshall McLuhan
Nobody thinks Bush and Cheney are updating the website. Jeeze. But the folks that are running the website (and I would bet this extends down to the actual webmaster/tech guy) are political appointees who are there to make the president look good. That is their job. Their actions are all filtered through this political role.
Let's present an alternate scenario - since you have no evidence for yours, I don't have to present any evidence for mine.
It's May - Pres. makes his speech on the carrier; the assumption by those in charge is that Chalabi's government will have control of the country within a couple of weeks and the US troops will be heading on home. The web folks (who want to make B & C look good) declare "combat's done! the troops are coming home! re-elect Bush!"
A few months later, that rosy scenario hasn't quite panned out. The aircraft carrier speech is becoming a liability for Bush - people started counting the number of dead troops in Iraq since he gave the speech, and it keeps going up. The web folks (who want to make B & C look good) say to themselves "this is a potential embarrassment to the president - let's see how we can make it less embarrassing."
And there you have it.
I'm so sorry I expended my mod points earlier in the day. What a bunch of flamebait bullshit this line of crap is. "Dictatorship?" Get fucking real. Let me ask this in non-partisan terms:
If the fiasco that was the 2000 presidential election went in Gore's favor, would you care to label his administration a dictatorship?
Has martial law been declared?
Are SS agents en route to your residence right now to conduct a little Q&A over this post?
Snap the fuck out of it. While I completely disagree with this appraisal of the Bush administration, I can (barely) live with you posting it. Just don't expect such nonsense to go unanswered and undebunked by me.
There are a lot of problems with the media, but a basic inability to question government is not one of them.
That is, absolutely, the primary problem with the American media. Please pull your head out of your ass and inform yourself.
Well, terrorists have been attacking us since we have been in Iraq till this point in time, but I guess that doesn't mean there is any link..... naaaah
Native people fighting against an occupying force are known as freedom fighters, not terrorists.
Try again, sparky.
If you can't run video on the nightly news or CNN it has the same effect as, and is the equivalent of banning the media. The American public has a right to see those images and the media has a responsibility to show them. To do otherwise is irresponsible.
"They that can give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety" - BF
"This is why you and I do something else for a living. We know shit as it relates to politics. Say it with me. IANAP. I Am Not A Politician."
so us common folks should just stay out of the political game altogether? we shouldn't have opinions about politicians, and hell, let's all stop voting while we're at it. career politicians do a fine job governing this country, and if we question their wisdom, it's only out of a sort of working class ignorance, right?
"Life is great; without it, you'd be dead." -Harmony Korine
Or the robots.txt file was updated since the last time google crawled the web.
"We have got to make Stan understand the importance of voting, because he'll definitely vote for our guy." - South Park
The fact that there are "enemy combatants" that include US citizens, and the fact that there were 2000 Muslims arrested after 9/11 without being charged, and 16,000 deportations subsequently, does NOT bode well for us democratically.
At best, it's causing hatred in the Muslim world. At worst, it's provoking terrorism (combined with our unconditional support of Israel).