White House Website Limits Iraq-Related Crawling
oscarcar writes "Dan Gillmor is reporting on the White House website's use of its robots.txt file to disable search engines from crawling certain material. Many excluded items in the robots.txt file involve mentions of Iraq, possibly to prevent people from finding changes to past statements and information when archived elsewhere."
whitehouse.com doesn't have that problem.
sulli
RTFJ.
it's good to see the whitehouse embracing technology so much.
!(^((ri)|(mp))aa$)
Many excluded items in the robots.txt file involve mentions of Iraq, possibly to prevent people from finding changes to past statements and information when archived elsewhere.
Maybe, but I would think they might also be looking for "shady" spiders that ignore robots.txt. I wouldn't be surprised if there are a few honeypot pages in there too.
To ensure perfect aim, shoot first and call whatever you hit the target
Cue somebody to take a crawler (hell, even a bash script using wget) to specifically archive these pages. Hell, they could even use a user-agent which doesn't look like a bot.
Of course, people would be less likely to trust random-Joe from the Internet than, say, The Wayback Machine, but I expect this is what will happen...
Or you'll tear his tinfoil hat and then the black helicopters will be able to find him again.
Nugs
If this was some crazy government conspiracy and they were trying to hide the information, why would they put it on their website? There could be any number of reasons they have done this. Perhaps they were getting loads of hits from Google about Iraq-related things. But if anyone really wants the information, surely they can just visit the site.
--
On Slashdot I'm a lawyer.
Disallow: /president/spongebobsquarepants_archive
I didn't know gee-dub likes SpongeBob too! My nephew is gonna flip out when he hears this.
"My mother never saw the irony in calling me a son-of-a-bitch." - Jack Nicholson
Perhaps their goal is simply so that when people google or whatnot for information on the Bush Administration and Iraq, they will be likely to find the Bush Administration's current views on and actions in Iraq, rather than outdated material?
Completely ignoring for the moment the fact that these views and actions are really somewhat embarrassing for the Bush administration, this really makes sense from a practical viewpoint. Few things are as annoying as searching for something news-ish and finding primarily material from two years ago. And after all, if they ONLY were interested in people forgetting the old materials, they could have just removed those materials from the site totally. (Though perhaps they were aware removing the materials completely would cause mirrors, which would be fully searchable, to spring up.)
Irritable, left-wing and possibly humorous bumper stickers and t-shirts
If you're surprised by this, THAT's the news, not what the White House is doing with this information control. Click here for a list of the White House's policies with restricting FOI and other related requests since Sept 11th.
This isn't partisan politics, either. The Republican party has been trying to keep Bush from violating the Presidential Records Act.
Yes, yes, the country's at war. Makes you wonder why Bush doesn't want anybody to know about communications between Reagan and his advisors.
--------
Bleah! Heh heh heh... BLEAH BLEAH!!! Ha ha ha ha...
American people should have some say in a situation like went on in Iraq.
They do, it's called voting, not to mention public opinion polls, which were near 70% for the invasion when the US invaded.
Slashdot "libertarians": Small government for me, big government for those I disagree with. -1, I disagree with you
While anything is possible in politics, is it possible that the web admin is trying to limit the amount of traffic on the site? Is it possible that his analysis of the web logs shows a lot of traffic from robots looking for Iraq-related info?
If you persist in contemplating a world where whatever statements that the WH puts out, no matter how they might seem to contradict previous statements, are not totally true and correct, then a relocation expert from Guantanamo will be by in a few minutes. Just step away from the computer.
It looks like 99% of the stuff related to Iraq is filtered out in robots.txt.
But not a problem: on google.com I just specify the site by saying 'Iraq site:whitehouse.gov' and it had 14,000 hits... the first one is the root of the /infocus/iraq directory (which is disallowed in robots.txt).
Nothing's hidden, it's all there, it's all searchable from the white house website, just not from search engines.
I have to admit, when I first read the story I thought someone was being paranoid. But you really should RTF robots.txt file before you accuse the poster of being paranoid. The disallowed files are extraordinarily specific. I really can't come up with a plausible explanation beyond simoniker's.
Obviously, they're keeping people from accessing the top-secret teeball Iraq files! Besides:
check out these other frightening examples of censorship: Truly frightening.
Computer Go: Writing Software to Play the Ancient Game of Go
Consider the fact that GW Bush has banned media (hello?? freedom of the press? 1st Amendment??) coverage of returning killed soldiers. Why? Because seeing dead soldiers makes people realize that the war is real and people are dying.
The current administration is trying its damnedest to control information that it doesn't like.
welcome our White House Robot Overlords. It would be funnier if it weren't true.
- - - If the sun is a star, why can't I see it at night?
Winston's greatest pleasure in life was in his work. Most of it was a tedious routine, but included in it there were also jobs so difficult and intricate that you could lose yourself in them as in the depths of a mathematical problem -- delicate pieces of forgery in which you had nothing to guide you except your knowledge of the principles of Ingsoc and your estimate of what the Party wanted you to say. Winston was good at this kind of thing. On occasion he had even been entrusted with the rectification of the Times leading articles, which were written entirely in Newspeak. He unrolled the message that he had set aside earlier. It ran:
times 3.12.83 reporting bb dayorder doubleplusungood refs unpersons rewrite fullwise upsub antefiling
In Oldspeak (or standard English) this might be rendered:
The reporting of Big Brother's Order for the Day in the Times of December 3rd 1983 is extremely unsatisfactory and makes references to non-existent persons. Rewrite it in full and submit your draft to higher authority before filing.
Work is dumb and so is Jobless Jimmy. (http://www.joblessjimmy.com)
It could be something innocent, but really, why would anyone want to keep search engines out of a publicly funded website? People have been accusing the poster of "baseless accusations" but the guy does have a point. I've seen a couple of GW's speeches and afterwards the transcripts of those speeches and noted that grammatical errors were corrected. While this is only a minor offence in editing history, it does make you wonder what other opinions and information may have appeared and then later have been edited. Seriously, these are our government officials here; we deserve to have an unedited record of what they say and to hold them to it. A little bit of speculation on the reasons for excluding various terms is far from paranoia.
Chris
This gets modded up as Insightful? I mean, the White House is routinely editing their transcripts, and if bots like Google and Wayback can go and find that no, Bush said that we found weapons, not a weapons program, then there goes Bush's latest FUD... *thud*. Just because it's a tinfoil hat worthy theory doesn't mean it isn't true... most aren't, but therein lies the issue: most.
#define DRM chmod 000
Here's a minor example of something those two sites didn't catch: Remember Iraq's so-called "mobile biological weapons factories"? A month after the story broke that they were for weather balloons, the CIA moved their report's URL.
An intriguing fact about this whitehouse.gov/*/iraq thing is that they do in fact cover some of the important statements which are apparently not duplicated in the press release, conference, and briefing directories. Perhaps there was a "unique urgency" to cover up some poor choices of words?
If you think that's the most paranoid comment, you must not have read any RFID tag threads here yet.
You found it didn't you? It failed... congratulations, you have somehow circumvented the government's website security system, prepare for the wrath of the DMCA, backed by none other than Bush himself!
;)
Well either that, or it's simply preventing search engines from indexing honeypot type pages used for mis-information... Either or... but I like the first version... since it's more paranoid, and I have plenty of tinfoil ready to be shaped into hats...
---
Programming is like sex... Make one mistake and support it the rest of your life.
Goodness knows we can't have googlebots archiving all of those top-secret/confidential web pages at the whitehouse. I guess we'll just have to live with the top-secret info that has already been archived.
What's that? Oh, all of the real top-secret stuff is at the NSA website?
Never mind then.
Nope... didn't take me long to find something that was disallowed to be a valid URL:
Disallow: /infocus/iraq
http://www.whitehouse.gov/infocus/iraq is a valid URL.
If you try actually *loading* the directories listed in the robots.txt, they don't exist. Not one. Not by going to their index.html or trying to find them through the site navigation. While they could still be accused of deleting them, many of the links are unlikely to have existed in the first place (http://www.whitehouse.gov/president/heartland-tour-gallery/iraq? /president/holiday/decorations/iraq? /president/tee-ball-01/iraq?). This may be just some IT grunt running a bad script on robots.txt.
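The "try loading them" check can be scripted. What follows is only a sketch under assumptions: the helper names (`disallowed_paths`, `http_status`, `paths_returning_404`) are invented for illustration, and the status lookup is passed in as a parameter so the logic can be tried against a stub before pointing it at a live server.

```python
from urllib.error import HTTPError, URLError
from urllib.request import Request, urlopen


def disallowed_paths(robots_txt):
    """Return the path from every non-empty 'Disallow:' line."""
    paths = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        field, _, value = line.partition(":")
        if field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


def http_status(url):
    """Send a HEAD request; return the status code, or None if unreachable."""
    try:
        with urlopen(Request(url, method="HEAD"), timeout=10) as resp:
            return resp.status
    except HTTPError as err:
        return err.code
    except URLError:
        return None


def paths_returning_404(base_url, paths, status_of=http_status):
    """Return the subset of paths for which the lookup reports a 404."""
    base = base_url.rstrip("/")
    return [p for p in paths if status_of(base + p) == 404]
```

A 404 today only shows that nothing is served at that path now. It cannot tell a page that was deleted from one that never existed.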
I can't see this as a conspiracy .. it's just too silly.
Why on Earth wouldn't they just EDIT the bleedin' files? They wouldn't have to delete them or set up robots.txt, they would just change them to reflect the "message of the moment". They probably do that anyway, same as a lot of other sites.
Do they really think people would be blocked by robots.txt? Nobody's that dumb (yeah, they could be Windows MCSE droids, but c'mon).
I think they did it for some other reason like keeping traffic down.
Another possibility: a hacker got in there and did this because a) he only had write access to robots.txt for some reason or b) he wanted to play a subtle joke. But I doubt that too.
Anyway this is strange, but pointless, so I wouldn't bother with it unless you're a democrat looking for something else to whine about...
As of just a few minutes ago, these entries were seen added to the robots.txt file:
Disallow: /news/slashdot
Disallow: /news/tinfoilhat
Disallow: /allyouriraq/are/belongto/US
Come on. This is extremely paranoid and far-fetched, even for /. ... if they're so worried about people finding out their insidious plots, they'll just flip the switch on all their mind-controlling ...
... MUST DESTROY WEBLOGS ... TRUTH GETTING OUT ... DUBYA IS MY FRIEND ... MUST DESTROY SLASHDOT ... MUST DESTROY WEBLOGS ... MUST DESTROY SLASHDOT
topreacher@signature.slashdot.org 1% rm -rf sig
Most of the pages in the robots.txt are actually 404s and don't exist anymore. It's that simple. It keeps the robots from constantly requesting content that doesn't exist anymore. A few are blocked because they are bandwidth-intensive videos and things, and some others are blocked for more mundane reasons, I assume.
Nosirree, no legitimate webmaster would ever use robots.txt to gently guide visiting bots to the appropriate parts of the site and to keep them from trying to do silly things. The only possible use is to trample your rights while installing the new corporate-owned government.
Geez, people. Honestly.
Dewey, what part of this looks like authorities should be involved?
Seems odd and pointless to me. I'd like a statement explaining it. A lot like the "Disallow: /hidden/passwd" kind of entries.
Since when has the truth been needed for a mainstream news story?
Onward to the Aether Sphere!
Looks like someone just added IRAQ to all of the existing links. It's obviously some sort of search/replace/copy function. Go look for yourself, I found this one:
Disallow: /firstlady/recipes/iraq
Now, how many pages would this possibly block?
M@
Krispy Cream is people
Keep telling yourself that.
And 70% of the people in this country STILL think that Saddam played some part in 9/11. What was your point again?
Looks like they removed a bunch of files where they were making claims that Saddam was behind 9/11. One could be led to suspect that now that Bush got his war he doesn't need that lie anymore, and wants to erase all history of it since it undermines his authority.
Peace, or Not?
The majority of American People did not vote for this administration. The American People, my friend, elected Al Gore. This administration was put in place by the Supreme Court. Has your brain been washed so quickly you have already forgotten? Wake up, people: these guys don't give a shit about you or anyone you know unless they have a net worth greater than 10 million. Look at the facts: overall our economy is in the toilet, with the vast majority of citizens considerably worse off than they were 4 years ago. Of course, the extremely rich are doing just fine, getting extremely richer.
every time a republican dies a queer angel gets his wings
Why should a government-authored site (which, under the Constitution, by definition is public domain text) be excluded from non-government electronic publishing sites?
By the way, show me where in that robots.txt file there's a command that would block http://www.whitehouse.gov/holiday/2002/art/01.html from Google? If you're right, there should be a line "disallow /holiday/2002/art/". I don't see one. So, yeah, it's explicitly Iraq-related stuff that they're trying to block. Either 1. they're afraid that sensitive information might end up on the site by accident and want to make sure that it isn't archived if it is - in which case, they've got a lot more serious problems than political connivance - or 2. the theory is correct, and they're trying to set up a memory hole. Given Karl Rove's history, which do YOU think it is?
I honestly think this is stuff that goes on beneath GWB's notice. I'm with Molly Ivins on him: he's not evil, mean, or stupid, just wrong.
Better explanation: Someone screwed up a search-and-replace in a major way. Many (most?) of those pages with "iraq" in them don't exist.
It looks like someone blocked off parts of the site to web-crawlers; I don't know for sure why all those blah/bloo/iraq entries are in there but they sure as hell don't lead to anything.
Censorship: 0
Screwups: 100
It seems like every single directory has had the word "iraq" appended to the end. Do you think that this might have been a knee-jerk reaction by some admin who didn't really know what they were doing? I can't really imagine there are legitimate iraq dirs under easter and teeball directories.
It appears that this robots.txt file was probably auto-generated. It looks like someone used a script to crawl the site's entire directory structure, appending /iraq and /text to every directory. In the process they seem to have created a pretty complete map of the site's underlying directory structure -- not necessarily a good thing.
Having said that, I'm not even sure that this robots.txt file would work the way it's supposed to. Seems like these iraq references should all have a trailing slash or a .html if they're actual pages.
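On the trailing-slash question: under the original robots exclusion convention a Disallow value is matched as a plain prefix of the URL path, so an entry without a trailing slash still covers everything beneath it. Here is a small sketch using Python's standard-library parser. "ExampleBot" is a made-up user agent, and real crawlers may apply their own matching rules.

```python
from urllib.robotparser import RobotFileParser

# One rule, written without a trailing slash, as in the file under discussion.
rules = """\
User-agent: *
Disallow: /infocus/iraq
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())


def allowed(path):
    """True if a generic crawler may fetch the path under the rule above."""
    return parser.can_fetch("ExampleBot", "http://www.whitehouse.gov" + path)


for path in ("/infocus/iraq",
             "/infocus/iraq/",
             "/infocus/iraq/index.html",
             "/infocus/iraqi-people.html",
             "/infocus/"):
    print(path, "->", "allowed" if allowed(path) else "blocked")
```

With this parser only the last path comes back as allowed. The prefix rule blocks the directory, anything inside it, and also any sibling whose name merely starts with "iraq".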
Someone clearly doesn't want Google caching Whitehouse content on Iraq. The question is why? And how come they're so lame about it?
There hasn't been a real declared war since WWII. You can't "declare war on terrorists" and be done with it either, wars are supposed to be declared on countries when you go to fight them. It was what an honorable nation would do before hostilities.
I knew it!
- Grep the errors log for 404's from search engines.
- Parse out the directory paths.
- Add those to robots.txt.
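The three steps above can be sketched in a few lines. Everything specific here is an assumption made for illustration: the log is taken to be in Apache "combined" format, a crawler is recognized by a crude user-agent substring test, and the function name is made up.

```python
import posixpath
import re

# Assumes Apache "combined" log format: request, status, size, referer, agent.
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)
BOT_MARKERS = ("bot", "crawler", "spider", "slurp")  # assumed heuristic


def disallow_lines_from_log(log_lines):
    """Turn crawler 404s into sorted, de-duplicated Disallow lines."""
    dirs = set()
    for line in log_lines:
        m = LOG_LINE.search(line)
        if not m or m.group("status") != "404":
            continue  # step 1: keep only 404s...
        agent = m.group("agent").lower()
        if not any(marker in agent for marker in BOT_MARKERS):
            continue  # ...that came from something that looks like a crawler
        path = m.group("path").split("?", 1)[0]
        # step 2: if the last segment looks like a file, keep its directory
        if "." in path.rsplit("/", 1)[-1]:
            path = posixpath.dirname(path)
        dirs.add(path.rstrip("/") or "/")
    # step 3: emit lines ready to paste into robots.txt
    return ["Disallow: " + d for d in sorted(dirs)]
```

A script like this would produce entries only for paths that crawlers had already requested, so it does not by itself explain entries for paths nobody would ever have asked for.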
Which might explain why at least one of the directories -I have to agree that it's more strange than sinister. Besides, I'm not sure that the web site is the official archive for white house statements.
what's that old saying? "never attribute to malice that which can be attributed to stupidity" or something like that?
let's not get reactionary here, folks. it wouldn't make sense to do what's being alleged:
1. every major journalist worth his/her salt would be all over it within hours. so it wouldn't succeed in obscuring information.
2. it would create an incredible backlash as soon as detected. what purpose would this serve?
ed
# robots.txt for http://www.ingsoc.gov/
User-agent: *
Disallow: /cgi-bin
Disallow: /search
Disallow: /query.html
Disallow: /help
Disallow: /appointments/eurasia
Disallow: /appointments/eastasia
Disallow: /ask/images/eurasia
Disallow: /ask/images/eastasia
Disallow: /deptofhomeland/analysis/eurasia
Disallow: /deptofhomeland/analysis/eastasia
Disallow: /deptofhomeland/eurasia
Disallow: /deptofhomeland/eastasia
Disallow: /economy/eurasia
Disallow: /economy/eastasia
Disallow: /goodbye/eurasia
Disallow: /goodbye/eastasia
Disallow: /government/handbook/eurasia
Disallow: /government/handbook/eastasia
Disallow: /government/images/eurasia
Disallow: /government/images/eastasia
Disallow: /government/eurasia
Disallow: /government/eastasia
And now, an offering for the lameness filter...
Oceania was at war with Eastasia: Oceania has always been at war with Eastasia. A large part of the political literature of five years was now completely obsolete. Reports and records of all kinds, newspapers, books, pamphlets, films, sound tracks, photographs- all had to be rectified at lightning speed. Although no directive was ever issued, it was known that the chiefs of the Department intended that within one week no reference to the war with Eurasia, or the alliance with Eastasia, should remain in existence anywhere. The work was overwhelming, all the more so because the processes that it involved could not be called by their true names. Everyone in the Records Department worked eighteen hours in the twenty-four, with two three-hour snatches of sleep. Mattresses were brought up from the cellars and pitched all over the corridors; meals consisted of sandwiches and Victory Coffee wheeled round on trolleys by attendants from the canteen. Each time that Winston broke off for one of his spells of sleep he tried to leave his desk clear of work, and each time that he crawled back sticky-eyed and aching, it was to find that another shower of paper cylinders had covered the desk like a snowdrift, half burying the speakwrite and overflowing onto the floor, so that the first job was always to stack them into a neat-enough pile to give him room to work. What was worst of all was that the work was by no means purely mechanical. Often it was enough merely to substitute one name for another, but any detailed report of events demanded care and imagination. Even the geographical knowledge that one needed in transferring the war from one part of the world to another was considerable.
This was written in 1948. Things have really progressed!
The complaint is they've done it before - "combat operations are done" became "major combat operations are done" when the fighting didn't stop. You can check here.
Compare the screenshots of what used to be on the white house website vs what's currently on the website.
Yes, I know, "how do we know this blogger didn't alter the screenshots?" You don't.
Disallow: /climatechangefactsheet/iraq
Disallow: /climatechangefactsheet/text
Now why would they want to stop these being crawled?
Paul.
Downloading the "robots.txt" file and doing a quick ctrl-f on different words, I discovered that there are six instances of "Barney" coming up in the robots.txt:
Disallow: /holiday/2002/barney/iraq
Disallow: /holiday/2002/barney/text
Disallow: /kids/barney/iraq
Disallow: /kids/barney/text
Disallow: /kids/photoessays/barney/iraq
Disallow: /kids/photoessays/barney/text
That is the same number as "cheney"; "powell" had 4, "saddam" didn't have any, and "bush" only comes up in "bushpets".
Clearly, there is something to do with Barney and Iraq that The White House doesn't want you to know about.
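The ctrl-f tally is easy to reproduce with a short script. This is a sketch only: `keyword_counts` is a made-up helper, and the sample contains just the six Barney entries quoted above, not the full file.

```python
def keyword_counts(robots_txt, keywords):
    """Count Disallow lines whose path contains each keyword, ignoring case."""
    counts = {k: 0 for k in keywords}
    for line in robots_txt.splitlines():
        field, _, value = line.partition(":")
        if field.strip().lower() != "disallow":
            continue
        path = value.strip().lower()
        for k in keywords:
            if k.lower() in path:
                counts[k] += 1
    return counts


SAMPLE = """\
Disallow: /holiday/2002/barney/iraq
Disallow: /holiday/2002/barney/text
Disallow: /kids/barney/iraq
Disallow: /kids/barney/text
Disallow: /kids/photoessays/barney/iraq
Disallow: /kids/photoessays/barney/text
"""

print(keyword_counts(SAMPLE, ["barney", "saddam"]))
# prints {'barney': 6, 'saddam': 0}
```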
myke
Mimetics Inc. Twitter
It really doesn't look like it. It looks like someone screwed up, because none of those directories appear to exist at all. I mean really, what are the chances of /firstlady/photos/2003/01/iraq actually having at some time contained real data?
It looks like someone did a
find . -type d | perl -ne 'chomp; s/^\.//; print "Disallow: ${_}/iraq\n"; print "Disallow: ${_}/text\n";' > robots.txt
I have no idea what the purpose would be, but it seems like a funny thing to do if you were trying to hide something.
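For anyone who wants to see what such a script would emit, here is the same guess written out in Python. It is only a sketch of the commenter's hypothesis: the function name and the default suffixes are illustrative, and nothing here is known to be what the site's admins actually ran.

```python
import os


def disallow_entries(root, suffixes=("iraq", "text")):
    """Walk a document root and append each suffix to every directory path."""
    lines = []
    for dirpath, dirnames, _filenames in os.walk(root):
        dirnames.sort()  # visit subdirectories in a stable order
        rel = os.path.relpath(dirpath, root)
        url_dir = "" if rel == "." else "/" + rel.replace(os.sep, "/")
        for suffix in suffixes:
            lines.append(f"Disallow: {url_dir}/{suffix}")
    return lines
```

Run against a tree containing firstlady/ and kids/barney/, this yields entries such as "Disallow: /kids/barney/iraq" for directories that have no Iraq content at all, which matches the pattern people are noticing in the real file.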
By the way, who is going around looking at people's robots.txt files?
Engineering and the Ultimate
Paranoia aside, I object to these restrictions as a matter of principle. They're making it more difficult to access publicly available information. It's not classified, and it never was. I, as a citizen of the U.S.A., have a right to know what my leaders have said and done.
Let's assume the whitehouse.gov search engine is completely honest, and faithfully returns a complete listing of all materials on the site having to do with Iraq. If that's so, then there should be no reason to disable other search engines, since their results would just confirm the internal results.
But the restrictions are in place, meaning that someone thought there was a good reason to do so. Restricting access makes it more difficult for people to research information pertaining to Iraq on the whitehouse.gov web site. Who are the people most likely to be doing that? Answer: journalists, activists, and concerned citizens. Obviously these restrictions aren't enough by themselves to dissuade a determined researcher; but it might slow them down. And it might actually stop a diffident researcher completely.
I'm not even going to go into scenarios where the whitehouse.gov search engine is not trustworthy, because serving up "doctored" speeches or information is highly unlikely. There are too many other archives to compare against, and it would be a major scandal if the administration was found to be altering records on its website. They'd have to be really, really dumb to do that.
The whole thing still leaves a bad taste in my mouth, though.
Obviously robots.txt just happened to be in the path!
flossie
Write now. Defend liberty
"1. every major journalist worth his/her salt would be all over it within hours."
Don't be naive. How long do you think that any mainstream journalist who made a story of this would have a job for? The answer - not long. The US media in particular, although the UK is getting as bad, is little more than a relay system for government propaganda and real, detailed, complete examination of government behaviour, with equal air time to truly dissenting opinions (how many times has Chomsky been on CNN in the past 4 months?) is out of the question. What the government does is Good and Right and Should Not Be Questioned.
Media by the elite, serving the elite.
So, am I to understand that the same administration that was smart enough to rig an election, smart enough to cause 9/11, and smart enough to forge evidence and go to war is the same administration that came up with the brilliant plan of HIDING information by putting it in a PUBLICLY available file?
T Money
World Domination with a plastic spoon since 1984
The other rule for transparency is that all material information be made available, kept, or destroyed in accordance to public regulation and individual policy. Individual policy must be consistent and decisions must be defensible based on policy.
The fact that people do not understand these two aspects of transparency is what allows situations like Enron to develop. The latter is what caused the destruction of Arthur Andersen. They have done nothing wrong, but they did not follow their own policy on document destruction, which made them look like at best idiots and at worst criminals.
We may compare this to other ventures to suggest policy. The NYT does not want Google to cache articles because the NYT sells those articles after a certain time. Many other companies do not want deep linking because it reduces ad revenue. A fascist government may want to ensure all users enter their site from a top page to make sure all users must go through the daily propaganda. A library tries hard to not track patrons so that no one is afraid of using the library. The rationale of the White House is beyond me.
The White House is not hiding documents. However, they are reducing the transparency of the government by limiting the avenues by which the public may access documents. Since the White House has stated many times that it believes in transparency, and in fact requires transparency when dealing with other governments, one can stipulate that transparency is the appropriate standard. So, until someone comes up with a policy that was developed and vetted through the normal processes used in the U.S., one has every reason to suspect nefarious motives.
And, if I may modify a statement that conservatives like to make, if you do not like transparency, go move to Iraq.
"She's a scientist and a lesbian. She's not going to let it slide." Orphan Black
This really shouldn't shock anyone. It has been going on at the White House for ages. Look at this clip from the robots.txt file from 1998:
Disallow: /history/photoessays/blueroom/blowjobs
Disallow: /history/photoessays/blueroom/text
Disallow: /history/photoessays/cabinetroom/blowjobs
Disallow: /history/photoessays/cabinetroom/text
Disallow: /history/photoessays/crosshalls/blowjobs
Disallow: /history/photoessays/crosshalls/temp/blowjobs
Disallow: /history/photoessays/crosshalls/temp/text
Disallow: /history/photoessays/crosshalls/text
Disallow: /history/photoessays/diplomaticroom/blowjobs
Disallow: /history/photoessays/diplomaticroom/text
Disallow: /history/photoessays/downstairscorridor/blowjobs
Disallow: /history/photoessays/downstairscorridor/text
Disallow: /history/photoessays/easter/2002/blowjobs
Disallow: /history/photoessays/easter/2002/text
Disallow: /history/photoessays/easter/2003/defenselink/blowjobs
Disallow: /history/photoessays/easter/2003/defenselink/text
Disallow: /history/photoessays/easter/2003/blowjobs
Disallow: /history/photoessays/easter/2003/text
Disallow: /history/photoessays/easter/one/blowjobs
Disallow: /history/photoessays/easter/one/text
Disallow: /history/photoessays/easter/three/blowjobs
Viv
Gmail invites for ip
So, someone finds a problem with blocking search engine bots.
1) First, a lot of these docs involve Iraq. So, without real factual information, it's assumed they're trying to do something fishy regarding Iraq info.
2) Using that assumption, the next assumption is that they're purposely trying to keep people from trying to find contradictory statements.
This could all be true, or it might not be. Either way, making two assumptions without any real facts is just pathetic yellow journalism.
Sorry, I'm with Al Franken on him. (though Ivins is great!)
"I think he's mean. I think we're all too ready to blame Karl Rove, or Dick Cheney, or Ari Fleischer, or Gale Norton, or Donald Rumsfeld, or John Ashcroft when this administration does something despicable. When South Carolinians get push polls saying John McCain fathered an illegitimate black child, you know Karl Rove had something to do with it. But it's really Bush. When our energy policy is set by cronies from the oil, coal, and automobile industries, you can shake your fist at Dick Cheney. But it's Bush. When Ari Fleischer feeds rumors that the Clinton people vandalized the White House, doing $200,000 worth of damage, but a month later a GAO report says that ain't true, you can say that Ari Fleischer is a chimp. And he is. But it's Bush."
...
"And I'm through with him."
I want peace on earth and goodwill toward man.
We are the United States Government! We don't do that sort of thing.
Day by day and almost minute by minute the past was brought up to date. In this way every prediction made by the Party could be shown by documentary evidence to have been correct, nor was any item of news, or any expression of opinion, which conflicted with the needs of the moment, ever allowed to remain on record. All history was a palimpsest, scraped clean and reinscribed exactly as often as was necessary. In no case would it have been possible, once the deed was done, to prove that any falsification had taken place.
When you are sure of something, you probably are wrong (search for "Unskilled and Unaware of It").
Where have you been living the past five years? Journalists don't criticize Bush.
They still have not published the fact that he deserted from the national guard during Vietnam and they practically ignored his DUI conviction.
The GOP has the media cowed with their constant 'liberal media' babble. The number of journalists who are prepared to hold Bush to account is tiny - Krugman, Conason, Ivins, Alterman. After that it's Al Franken, Jon Stewart and David Letterman.
it would create an incredible backlash as soon as detected. what purpose would this serve?
The chances that the mainstream media will pick this one up are very small. Just think how they would have reacted if it was Clinton!
Looking for an Information Security student project suggestion?
Try http://dotcrimeManifesto.com/
Yeah, but it removes any pages it has stored when it finds itself disallowed from the page, IIRC.
Umm... it's on the tip of my tongue... what was that name... begins with a 'G'... oh, Google, that's right, Google. Supposed to be some kind of search engine or something.
Mod down people who tell people how to mod in their sigs
Correct me if I am wrong, but the data is still there, right? Also, wasn't the purpose of robots.txt (for crawlers that honor it) to stop crawlers from incessantly crawling the page and sapping your bandwidth? I just don't feel that this is a big issue. If they made it not searchable from the main whitehouse page, that's when I would have issues. They are just trying to save themselves bandwidth. Pages like these Iraq pages are probably updated often. They'd be getting crawled constantly.
Gorkman
I really do wonder what brings people to zealously defend actions like this. Sure, it could be a mix up, but a really ill conceived one. It's obvious that you don't have all the answers, just like others here.
My guess is that the poster feels that Slashdot posters are simply leaping to unjustified paranoid conclusions, and the depth of this faith (or so he pictures it) outrages him (or her).
The intensity of the poster's reaction is simply a reflection of his or her perception of Slashdot readers' zeal.
There are many possible explanations which do not involve conspiracy to hide information. For example, this could just be the work of some low-level IT guy who wanted to filter out one URL which happened to contain 'iraq' because the search-engine robots were burdensome to the webserver. I, for one, prefer to remain suspicious.
There are a lot of missing dates, but it looks to me like whitehouse.gov had a major site redesign sometime between Jul 13 and Sep 13 2001, and that when the new site was released they started putting in lots of the disallow statements for certain paths.
From Jul 13:
7-13 Whitehouse.gov
7-13 Robots.txt
From Sep 13:
9-13 Whitehouse.gov
9-13 Robots.txt
It seems to me like the simplest explanation is just that their redesigned site has multiple paths to the same information, and for some reason they felt that their search engine rankings would improve if they eliminated superfluous paths. Although I'll admit it's suspicious that their old robots.txt from 2 years ago had 151 Disallows, and the one from today has 1552 Disallows, while the site uses basically the same navigation structure.
Not true. Some of them do exist, like this one: /climatechangefactsheet/text
"Only the small secrets need to be protected. The big ones are kept secret by public incredulity." - Marshall McLuhan
Other posters have claimed it's more than one. I haven't checked, so I don't know. However, even if it is just infocus/iraq, that's still a hell of a lot.
That subdirectory seems to contain all or most of the transcripts of Ari Fleischer's and Bush's interviews and press conferences leading up to the war and after. An example is this:
http://www.whitehouse.gov/infocus/iraq/excerpts_sept26.html
Bullshit.
The Iraq entries could only have got there if someone was told to go and stop stories appearing in the Google cache.
The person who got the job appears to have done it in a pretty clumsy way, which is pretty much par for the course for this type of work. Nixon did not expect Gordon Liddy and his pals to get caught in a third-rate burglary either.
It looks to me like someone was told to block out the Iraq files and simply did a directory listing on the web server and then appended /iraq to everything.
If you want to find out for sure file some FOIAs.
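The "directory listing plus /iraq" guess is easy to picture as a script. This is only a sketch of that hypothesis, not a claim about how the file was actually produced; the function name and the sample directories are assumptions.

```python
def iraq_rules(directories, suffix="iraq"):
    """Build one Disallow rule per directory by appending the suffix."""
    rules = []
    for directory in directories:
        # Normalise a trailing slash so "/a/" and "/a" give the same rule.
        rules.append("Disallow: " + directory.rstrip("/") + "/" + suffix)
    return rules

if __name__ == "__main__":
    # Hypothetical listing; a real one would come from the web server.
    listing = ["/infocus", "/news/releases/2003/05/"]
    for rule in iraq_rules(listing):
        print(rule)
```

A script like this emits a rule for every directory whether or not an iraq subfolder exists under it, which would fit the observation in this thread that only some of the listed paths lead anywhere.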
Looking for an Information Security student project suggestion?
Try http://dotcrimeManifesto.com/
Nobody thinks Bush and Cheney are updating the website. Jeeze. But the folks that are running the website (and I would bet this extends down to the actual webmaster/tech guy) are political appointees who are there to make the president look good. That is their job. Their actions are all filtered through this political role.
Let's present an alternate scenario - since you have no evidence for yours, I don't have to present any evidence for mine.
It's May - the Pres. makes his speech on the carrier, and the assumption by those in charge is that Chalabi's government will have control of the country within a couple of weeks and the US troops will be heading on home. The web folks (who want to make B & C look good) declare "combat's done! the troops are coming home! re-elect Bush!"
A few months later, that rosy scenario hasn't quite panned out. The aircraft carrier speech is becoming a liability for Bush - people started counting the number of dead troops in Iraq since he gave the speech, and it keeps going up. The web folks (who want to make B & C look good) say to themselves "this is a potential embarrassment to the president - let's see how we can make it less embarrassing."
And there you have it.
The crux of the matter is that he refused to have his pilot's medical just after the Pentagon added a check for illegal drug use.
You can try to spin this whichever way that Karl Rove tells you but the facts are against you. The fact is that your great leader is a coward who ducked the draft and then deserted to avoid a drug test.
Would the White House sue for violation of the robots.txt file? Under what laws could they sue? Is robots.txt an implicit grant of permission to view copyrighted content? Would GWB press the Congress for a new bill, to mandate legal enforcement of the robots.txt?
That's probably not going to happen anytime soon, but it raises an interesting question. Is robots.txt legally enforceable? And if it was, would that be a good thing or a bad thing?
Your thoughts?
See:
http://www.whitehouse.gov/news/releases/2003/05/text/20030501-15.html
which differs from
http://www.whitehouse.gov/news/releases/2003/05/iraq/20030501-15.html
In the text version, the page says 'President Bush Announces Combat Operations in Iraq Have Ended' while in the robot-accessible version, it is 'President Bush Announces Major Combat Operations in Iraq Have Ended'.
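If you would rather not eyeball it, a word-level diff of the two headlines quoted above takes a few lines. This sketch uses Python's difflib on the headline strings as quoted in this comment; it does not fetch the pages, and the function name is just an illustration.

```python
import difflib

def word_changes(old, new):
    """Return (operation, old_words, new_words) for each differing run of words."""
    old_words, new_words = old.split(), new.split()
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            changes.append((tag,
                            " ".join(old_words[i1:i2]),
                            " ".join(new_words[j1:j2])))
    return changes

TEXT_VERSION = "President Bush Announces Combat Operations in Iraq Have Ended"
OTHER_VERSION = "President Bush Announces Major Combat Operations in Iraq Have Ended"

if __name__ == "__main__":
    for change in word_changes(TEXT_VERSION, OTHER_VERSION):
        print(change)
```

For these two strings the only reported change is the insertion of the single word "Major".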
Get your own screenshots.
Well, terrorists have been attacking us since we have been in Iraq until this point in time, but I guess that doesn't mean there is any link... naaaah
Native people fighting against an occupying force are known as freedom fighters, not terrorists.
Try again, sparky.
Pardon me, but some of them do lead to interesting things. /news/releases/2003/05/iraq/ exists, and even contains different data than news/releases/2003/05/text/ or news/releases/2003/05/
See for yourself:
http://www.whitehouse.gov/news/releases/2003/05/text/20030501-15.html versus http://www.whitehouse.gov/news/releases/2003/05/iraq/20030501-15.html and http://www.whitehouse.gov/robots.txt has /news/releases/2003/05/iraq/ in it.
Compare the headlines.
http://www.whitehouse.gov/infocus/iraq
Not any more.
Although the current Google cache lists
[snip 22 lines]
the current robots.txt leaps from
to
Conspiracy theory over...
"This is why you and I do something else for a living. We know shit as it relates to politics. Say it with me. IANAP. I Am Not A Politician."
so us common folks should just stay out of the political game altogether? we shouldn't have opinions about politicians, and hell, let's all stop voting while we're at it. career politicians do a fine job governing this country, and if we question their wisdom, it's only out of a sort of working class ignorance, right?
"Life is great; without it, you'd be dead." -Harmony Korine
Referring to a website critical of him (but correct in every detail)
Or the robots.txt file was updated since the last time Google crawled the web.
"We have got to make Stan understand the importance of voting, because he'll definitely vote for our guy." - South Park
I'm so sorry I expended my mod points earlier in the day. What a bunch of flamebait bullshit this line of crap is. "Dictatorship?" Get fucking real. Let me ask this in non-partisan terms:
Yeah, so what? I don't know about you, but part of my governmental conditioning program, er, public education, included a long history lesson attached to the flimsy statement of "Those who forget the past are doomed to repeat it." Noticing that many of the things going on in this country during Bush's term as president are reminiscent of things that happened in Germany in the '20s and '30s isn't "bullshit". It's trying not to repeat the past.
You can disagree with it all you want, of course, and there are plenty who want to portray this country as a dictatorship when it's not--yet. It may not become one, either. One thing we have that the Germans of the '20s and '30s did not have is the history of Germany in the '20s and '30s. We can apply the hindsight and use the lessons in the present to prevent this country from becoming this "Fourth Reich".
But we can't do it if we spend our time in denial of history and present events. It may well turn out that there's no correlation, and that all that's really happening here is an incompetent president during a time of crisis (after Homeland Security failed to become the Gestapo upon inception, I'm inclined to think our president's just incompetent, and I recall from his governorship that it's a documented fact). But we have to be prepared for the worst, and hope for the best. Reality, as always, will be somewhere in the middle.
Like what I said? You might like my music