White House Website Limits Iraq-Related Crawling
oscarcar writes "Dan Gillmor is reporting on the White House website's use of its robots.txt file to disable search engines from crawling certain material. Many excluded items in the robots.txt file involve mentions of Iraq, possibly to prevent people from finding changes to past statements and information when archived elsewhere."
Maybe, but I would think they might also be looking for "shady" spiders that ignore robots.txt. I wouldn't be surprised if there were a few honeypot pages in there too.
To ensure perfect aim, shoot first and call whatever you hit the target
Perhaps their goal is simply so that when people google or whatnot for information on the Bush Administration and Iraq, they will be likely to find the Bush Administration's current views on and actions in Iraq, rather than outdated material?
Completely ignoring for the moment the fact that these views and actions are really somewhat embarrassing for the Bush administration, this really makes sense from a practical viewpoint. Few things are as annoying as searching for something news-ish and finding primarily material from two years ago. And after all, if they were ONLY interested in people forgetting the old materials, they could have just removed those materials from the site totally. (Though perhaps they were aware that removing the materials completely would cause mirrors, which would be fully searchable, to spring up.)
Irritable, left-wing and possibly humorous bumper stickers and t-shirts
Robots accounted for well over 10% of all web traffic at a Huge E-commerce company I worked at a few years ago...
Those robots consumed Many millions in system capacity.
Of course this is completely different as our freedom is at stake.
While anything is possible in politics, is it possible that the web admin is trying to limit the amount of traffic on the site? Is it possible that his analysis of the web logs shows a lot of traffic from robots looking for Iraq-related info?
If you persist in contemplating a world where whatever statements that the WH puts out, no matter how they might seem to contradict previous statements, are not totally true and correct, then a relocation expert from Guantanamo will be by in a few minutes. Just step away from the computer.
This gets modded up as Insightful? I mean, the White House is routinely editing their transcripts, and if bots like Google and Wayback can go and find that no, Bush said that we found weapons, not a weapons program, then there goes Bush's latest FUD... *thud*. Just because it's a tinfoil hat worthy theory doesn't mean it isn't true... most aren't, but therein lies the issue: most.
#define DRM chmod 000
Here's a minor example of something those two sites didn't catch: Remember Iraq's so-called "mobile biological weapons factories"? A month after the story broke that they were for weather balloons, the CIA moved their report's URL.
An intriguing fact about this whitehouse.gov/*/iraq thing is that they do in fact cover some of the important statements which are apparently not duplicated in the press release, conference, and briefing directories. Perhaps there was a "unique urgency" to cover up some poor choices of words?
I found the original code on Usenet, modified it and left the original French comments in. Heh, originally they made the referer the CIA to scare unsuspecting webmasters. Silly French :) This could easily be made to cycle through the robots.txt file, but I don't have the time right now, I'm in lab :)
#!/usr/bin/perl -w
use strict;
use LWP::UserAgent;
use HTTP::Request;

my $ua = LWP::UserAgent->new;
# Pretend to be an ordinary desktop browser.
$ua->agent("Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; .NET CLR 1.1.4322)"); # super browser !
my $req = HTTP::Request->new(GET => 'http://www.whitehouse.gov/pathtostuff');
$req->header(
    'Accept'  => 'text/html',
    'Referer' => 'http://www.yahoo.com', # pour faire flipper le webmestre (to freak out the webmaster)
);
my $res = $ua->request($req);
if ($res->is_success) {
    # traitement resultat $res (process the result in $res)
}
else {
    print "Error: " . $res->status_line . "\n";
}
It seems like every single directory has had the word "iraq" appended to the end. Do you think that this might have been a knee-jerk reaction by some admin who didn't really know what they were doing? I can't really imagine there are legitimate iraq dirs under easter and teeball directories.
It appears that this robots.txt file was probably auto-generated. It looks like someone used a script to crawl the site's entire directory structure, appending /iraq and /text to every directory. In the process they seem to have created a pretty complete map of the site's underlying directory structure -- not necessarily a good thing.
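For what it's worth, a file like this is trivial to produce with a script. Here is a minimal sketch in Python, assuming you already have a list of the site's directories. The `build_robots_txt` helper and the `dirs` list are made up for illustration; nobody in this thread knows how the real file was built.

```python
def build_robots_txt(dirs, suffixes=("iraq", "text")):
    """Return robots.txt text that disallows <dir>/<suffix> for every directory.

    Hypothetical helper: it only illustrates the "append /iraq and /text to
    every directory" pattern described above, not the actual tool used.
    """
    lines = ["User-agent: *"]
    for d in dirs:
        base = "/" + d.strip("/")
        for suffix in suffixes:
            lines.append(f"Disallow: {base}/{suffix}")
    return "\n".join(lines) + "\n"


# Illustrative directory names only (the thread mentions easter and teeball).
dirs = ["easter", "teeball", "climatechangefactsheet"]
robots_txt = build_robots_txt(dirs)
print(robots_txt)
```

Note the side effect: every directory name fed in ends up published in the output, whether or not an iraq subdirectory actually exists under it.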
Having said that, I'm not even sure that this robots.txt file would work the way it's supposed to. Seems like these iraq references should all have a trailing slash or a .html if they're actual pages.
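On the trailing-slash question: under the original robots exclusion convention a Disallow value is treated as a plain path prefix, so the missing slash probably doesn't matter. Here is a quick check using Python's standard `urllib.robotparser`. The rule is one quoted in this thread; the test paths (including `iraqi-art.html`) are invented, and a real crawler may not match exactly the way this parser does.

```python
from urllib.robotparser import RobotFileParser

# One rule as quoted in the thread, wrapped in a catch-all user-agent record.
rules = [
    "User-agent: *",
    "Disallow: /climatechangefactsheet/iraq",
]

parser = RobotFileParser()
parser.parse(rules)


def blocked(path):
    """True if the parsed rules forbid fetching this path."""
    return not parser.can_fetch("*", "http://www.whitehouse.gov" + path)


# Invented test paths: with slash, with .html, a longer name, and the parent.
paths = [
    "/climatechangefactsheet/iraq/",
    "/climatechangefactsheet/iraq.html",
    "/climatechangefactsheet/iraqi-art.html",
    "/climatechangefactsheet/",
]
results = {p: blocked(p) for p in paths}
for path, is_blocked in results.items():
    print(path, "blocked" if is_blocked else "allowed")
```

With prefix matching, the first three paths come out blocked and only the parent directory is allowed, so a rule without the slash casts a wider net than one with it.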
Someone clearly doesn't want Google caching Whitehouse content on Iraq. The question is why? And how come they're so lame about it?
Disallow: /climatechangefactsheet/iraq
Disallow: /climatechangefactsheet/text
Now why would they want to stop these being crawled?
Paul.
Last year the Washington Post ran a story on who would benefit from a war in Iraq. It mentioned Halliburton and the $50+ million in stock options Dick Cheney received from that company as an employee. It also mentioned the fact that Saddam had cancelled the oil contracts of several American companies. I tried to find it again to show a friend but the story had disappeared from both the Washington Post search routine and Google. (The WP could have had it removed from Google; I doubt Google itself had anything to do with the story's disappearance.)
true. true. Apparently some poor fool made similar remarks on k5 a while back, and did indeed receive a personal visit from the SS. No charges filed, but 'tis a rude awakening indeed when your online words come and knock on your door.
Not only does http://www.whitehouse.gov/infocus/iraq/ exist, it is also currently indexed by google.
I guess the googlebot doesn't visit the page, but knows of its existence from other pages??? Either that, or the googlebot is a bad boy that ignores robots.txt.
Where have you been living the past five years? Journalists don't criticize Bush.
They still have not published the fact that he deserted from the national guard during Vietnam and they practically ignored his DUI conviction.
The GOP has the media cowed with their constant 'liberal media' babble. The number of journalists who are prepared to hold Bush to account is tiny - Krugman, Conason, Ivins, Alterman. After that it's Al Franken, Jon Stewart and David Letterman.
it would create an incredible backlash as soon as detected. what purpose would this serve?
The chances that the mainstream media will pick this one up are very small. Just think how they would have reacted if it was Clinton!
Looking for an Information Security student project suggestion?
Try http://dotcrimeManifesto.com/
No. We live in a republic.
Actually, we don't. The fix is in, friend. Much like the Roman Empire, where most people did not realize the republic was dead until over a century after the fact, our republic died many years ago and most people don't know it. Something akin to professional wrestling (i.e. a good guy and a bad guy... both working for the same promoter) was put in its place.
Yeah, but it removes any pages it has stored when it finds itself disallowed from the page, IIRC.
Bullshit.
The Iraq entries could only have got there if someone was told to go and stop stories appearing in the Google cache.
The person who got the job appears to have done it in a pretty clumsy way, which is pretty much par for the course for this type of work. Nixon did not expect Gordon Liddy and his pals to get caught in a third-rate burglary either.
It looks to me like someone was told to block out the Iraq files and simply did a directory listing on the web server and then appended /iraq to everything.
If you want to find out for sure file some FOIAs.
The crux of the matter is that he refused to have his pilot's medical just after the Pentagon added a check for illegal drug use.
You can try to spin this whichever way that Karl Rove tells you but the facts are against you. The fact is that your great leader is a coward who ducked the draft and then deserted to avoid a drug test.
Would the White House sue for violation of the robots.txt file? Under what laws could they sue? Is robots.txt an implicit grant of permission to view copyrighted content? Would GWB press the Congress for a new bill, to mandate legal enforcement of the robots.txt?
That's probably not going to happen anytime soon, but it raises an interesting question. Is robots.txt legally enforceable? And if it was, would that be a good thing or a bad thing?
Your thoughts?
While I agree that there ought to have been more transparency from the US govt regarding the Guantanamo Bay detainees by now, if you had SHOT at U.S. soldiers in an engagement in Great Britain, you'd be an illegal combatant. This is pretty much why these people are being detained.
These people were probably by and large draftees, which unfortunately in Afghanistan, meant they weren't going to _get_ a uniform. They certainly have a right to public trial, but by and large they were probably arrested legitimately. I see this more as an indictment of the unfairness of the Geneva Conventions with regard to poor nations, or forces that aren't backed by recognized governments.
It would be a lot easier to classify this as "disgusting" if we knew just what was happening down there. Right now, we don't really know much of anything, which is disturbing on several levels. But it isn't disgusting in the way I'd classify the very well documented types of suppression that were commonplace under the government these combatants were fighting for.
Pardon me, but some of them do lead to interesting things. /news/releases/2003/05/iraq/ exists, and even contains different data than news/releases/2003/05/text/ or news/releases/2003/05/
See for yourself:
http://www.whitehouse.gov/news/releases/2003/05/text/20030501-15.html versus http://www.whitehouse.gov/news/releases/2003/05/iraq/20030501-15.html and http://www.whitehouse.gov/robots.txt has /news/releases/2003/05/iraq/ in it.
Compare the headlines.
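If you'd rather not eyeball it, comparing the headlines of two saved copies can be scripted. This is a rough sketch in Python that assumes the headline lives in the page's `<title>` element, which may not be true of the real pages. The two HTML strings are placeholders, not the actual White House content, and fetching the pages is left out.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collects the text inside the first-level <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def headline(html):
    """Return the <title> text with whitespace normalised."""
    parser = TitleParser()
    parser.feed(html)
    return " ".join(parser.title.split())


def same_headline(html_a, html_b):
    return headline(html_a) == headline(html_b)


# Placeholder pages standing in for the saved /text/ and /iraq/ versions.
text_version = "<html><head><title>Placeholder headline A</title></head><body></body></html>"
iraq_version = "<html><head><title>Placeholder  headline B</title></head><body></body></html>"

print(headline(text_version))
print(headline(iraq_version))
print("same" if same_headline(text_version, iraq_version) else "different")
```

Save both URLs to disk first and pass the file contents in, so you keep your own copy in case either page changes later.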
Regarding your comment: it's a childish retaliation against another poster's .sig that appears (the link is broken so I'm going from the link text) to put down France over its quite obvious ties to Iraq. You know... the whole "pot calling the kettle black" thing?
So no, it's not news that the fundamentalist USA supported secular Iraq in a war against fundamentalist Iran.
Gee... and we wonder why so many Muslims in that part of the world think we're just a bunch of marauding, Koran-hating Christian crusaders. Mm mm... no mixed messages coming from THIS side of the ocean... noooooooo.
Here's a tip for anybody thinking about replying to start an argument over Iraq:
I don't care. Bush fucked the whole thing up from the beginning by "going it alone" and now it's too late, so we'll just have to slog through it.
And vote that asshole out of office when the elections come around. "Bring em on". Yea fucker... bring em right on in to the White House and see how big you are then. Tough talk from a military deserter... goddamn idiot. "Bring em on"... yea, as long as it's not YOU and YOUR kids that are meeting them on the field.... right?
Alito: A vote for Alito is a punch in the eye to put that bitch back in her place!
A dictatorship assumes George Bush took absolute power of the government. Our government is way too inefficient to support that political model.
I don't think the USA can ever be a dictatorship. However, it can very well turn into a fascist government. Look at someone like Hitler or Mussolini. People always claim that Hitler was a dictator but that is missing the point. If there were elections held in Germany (open fair elections monitored by the UN), Hitler would still have won by a massive majority.
The US govt CAN start practicing fascism. I'm not saying it is doing that now but it isn't inconceivable. Dictatorship, on the other hand, is highly unlikely...
Sivaram Velauthapillai
Seeking the meaning of life... @slashdot of all places
What wasn't reported widely in the media was that Saddam Hussein had possession of 2 of the 3 Egyptian God Cards! If he had been able to get ahold of the third remaining card and the Millennium Puzzle he would have been able to TAKE OVER THE WORLD!!!!!
On a more serious note, as much as I hear people joke about "We kept the receipts" that actually is how the UN Weapons Inspectors were able to find the weapons that they did.
(btw, what percentage of the country think that it was Saddam Hussein that kicked out the inspectors in 1997?)
Anyway, according to Scott Ritter, by the time that Clinton kicked the inspectors out of Iraq they had accounted for 95% of the WMD, and the main reason they were not able to complete the job was more because of the Clinton administration than the Iraqis. (Not to say that there were not a bunch of problems from the Iraqis.)
Scott Ritter has been very outspoken about these issues, and as a Marine Corps Captain during Desert Storm and a Chief UN Weapons Inspector he is a very qualified authority. He risked his life searching for weapons and I think more people need to listen to him.
This signature used to contain a cute kitty virus with ANSI art. Please set the slashdot editors on fire. Thank you