Britain's Conservatives Scrub Speeches from the Internet

Where's the torrent file? by Anonymous Coward · 2013-11-13 05:45 · Score: 3, Insightful

Where's the torrent file?

Re:Where's the torrent file? by Joce640k · 2013-11-13 06:56 · Score: 4, Insightful

I dunno, but I'm guessing none of these politicians have ever heard of the Streisand Effect.

--
No sig today...
Re:Where's the torrent file? by Jeremiah+Cornelius · 2013-11-13 07:41 · Score: 4, Insightful

I dunno, but I'm guessing none of these politicians have ever heard of the Streisand Effect.
I dunno, but I'm guessing none of these politicians have ever heard of 1984.

--
"Flyin' in just a sweet place,
Never been known to fail..."
Re:Where's the torrent file? by d3m0nCr4t · 2013-11-13 08:16 · Score: 5, Insightful

I dunno, but I'm guessing none of these politicians have ever heard of the Streisand Effect.
I dunno, but I'm guessing none of these politicians have ever heard of 1984.
Oh they have, but instead of feeling appalled, they just get a hard-on.
Re:Where's the torrent file? by LifesABeach · 2013-11-13 09:42 · Score: 2

I thought that America's Tea Party was over the edge, but I think they've been one-upped by the Tories. This is a time when we need WikiLeaks; could someone forward a message to Julian to "get back to work"?

As a side note, maybe someone at the NSA could send the data over to Snowden, who could then send it over to Julian; that would be epic.
Re:Where's the torrent file? by bfandreas · 2013-11-13 16:04 · Score: 4, Insightful

The UK Tories under Cameron are indeed appalling. It is hard to decide if they are merely incompetent or malicious. Their actions of late point to the latter. Indeed one could only speculate how bad it would have been without the LibDems.

The UK political scene has always been a bit foreign to my German tastes. A backbench MP suggesting that feckless fathers should be dragged to work in chains in defense of the badly executed bedroom-tax would have been forced to apologize in German politics. And he would have lost his seat come the next election. The comically idiotic ads urging "illegal" immigrants to turn themselves in are both malicious and incompetent. And even now there is another push to introduce the "snoopers' charter" which, in the light of the recent revelations about GCHQ, isn't even needed for them to do what they do.

The other parties in the UK look good in comparison because of the unmitigated disaster that is the current Tory crop. Thatcher was bad but potentially a necessary evil due to the unmaintainability of the Postwar Dream. But think as I may I can't begin to fathom where to start to look for a justification for that cabinet, that PM and that party. They do not even have the use of a compass needle that permanently points to the south. You can't say "let's do the opposite of what they are suggesting" due to the utter confusion that is their politics.

--
20 minutes into the future

Archive.org should not respect robots.txt by Anonymous Coward · 2013-11-13 05:47 · Score: 4, Interesting

People have used robots.txt to buy up domains they want to censor.

For example, this happened with partyvan.

Re:Archive.org should not respect robots.txt by Anonymous Coward · 2013-11-13 06:22 · Score: 2, Informative

He misspoke. He meant to say they bought up domains and then used robots.txt to subsequently censor the site (including all older content)
Re:Archive.org should not respect robots.txt by Obfuscant · 2013-11-13 06:42 · Score: 3, Informative

Of course, as a robot, archive.org should respect robots.txt. I have a website with millions of files of data that archive.org has no reason to keep for me, all behind a robots.txt that bars such nonsense.
I also have a link to a realtime predicted tide generator which takes about 30 seconds to calculate the information it sends back. Before I hacked in a robots.txt to cover it (it's on a different port than the normal web server and thus, according to the robot operators, a completely different website than the one that already had a robots.txt to stop them) one "helpful" robot indexer latched onto it and was sending ten requests per minute. Nice of them to throttle themselves, yeah, when they were running my apache server up to the connection limit (keeping other people from using the site) and driving the load up so the site was useless for anyone local.
So any suggestion that any robot operator ignore robots.txt should be shouted down as the complete nonsense it is.

People have used robots.txt to buy up domains they want to censor.
You can't buy a domain with a robots.txt. Once you own the domain, you have the right to "censor" it all you want, including the use of a robots.txt that bars all robots. But if your goal was to "censor" a website, just stop running an HTTP server. That's much better than any robots.txt in keeping everyone from getting your stuff.
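
A minimal sketch of why the tide generator needed its own robots.txt: crawlers generally look the file up per scheme, host, and port, so a service on another port counts as a separate site. The host example.com, port 8080, and the paths are placeholders, not the poster's real setup.

```python
from urllib.parse import urlsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt location a crawler would consult for page_url."""
    parts = urlsplit(page_url)
    # netloc keeps any explicit port, so host:8080 and host are distinct sites.
    return f"{parts.scheme}://{parts.netloc}/robots.txt"


# Hypothetical URLs: the main web server and a tide tool on another port.
main_site = robots_url("http://example.com/data/index.html")
tide_tool = robots_url("http://example.com:8080/tides?station=1")
print(main_site)
print(tide_tool)
```

The two results differ, which is why a robots.txt on the main server did nothing to protect the service on the other port.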
Re:Archive.org should not respect robots.txt by lgw · 2013-11-13 06:54 · Score: 4, Informative

As I understand it, Archive.org uses robots.txt to censor old, already captured data. That's a serious flaw in an archive IMO.

--
Socialism: a lie told by totalitarians and believed by fools.
Re:Archive.org should not respect robots.txt by fustakrakich · 2013-11-13 07:03 · Score: 2

When robots.txt is used for censorship, it no longer deserves any respect. I hope more people decide to ignore them. We should never let other people decide what we can see and hear. For the time being we can store stuff locally and employ P2P.

--
“He’s not deformed, he’s just drunk!”
Re:Archive.org should not respect robots.txt by morgauxo · 2013-11-13 07:27 · Score: 4, Interesting

The problem is that people are buying up the domain names of old websites which no longer exist just to publish a robots.txt file. Then archive.org automatically deletes, or at least blocks access to the entire history of everything that ever happened at that domain including the past website which the new owner has nothing to do with.
I suppose they are just trying to honor site owners' wishes even when they may have initially forgotten about robots.txt and added it later. The robot doesn't know that the old content belonged to someone else who DID NOT wish to block it. Maybe a good solution is that when they notice a new robots.txt, everything for the last 'X' months gets deleted. (Go ahead and debate values of X.) Data from prior to that should be left alone, even if it was posted by the same site owner who is posting the robots.txt today. Tough cookies! If you want to control how your data is used I don't see a problem with requiring you actually take the time to learn about things like robots.txt before you publish. It's really no different than releasing source code under the GPL and then later turning it into a closed source product. All your new work belongs to you but you don't get to force everyone to delete every copy they might have of the old code and you can't stop them from forking it.
-- I would totally consider an 'X value' of zero as being on the table btw
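
The 'X months' idea above could be sketched roughly as follows. This is only an illustration of the proposal, not anything the archive is known to do; the function names, the sample dates, and the day-clamping shortcut are all assumptions.

```python
from datetime import date


def months_before(d: date, months: int) -> date:
    """Step back a whole number of months from d."""
    total = d.year * 12 + (d.month - 1) - months
    # Clamp the day to 28 so the result is valid in every month (a simplification).
    return date(total // 12, total % 12 + 1, min(d.day, 28))


def split_snapshots(snapshot_dates, robots_first_seen, x_months):
    """Return (kept, hidden): only captures inside the X-month window are hidden."""
    cutoff = months_before(robots_first_seen, x_months)
    kept = [s for s in snapshot_dates if s < cutoff]
    hidden = [s for s in snapshot_dates if s >= cutoff]
    return kept, hidden


# Hypothetical capture dates and the day a new robots.txt was first noticed.
snapshots = [date(2005, 6, 1), date(2013, 9, 15), date(2013, 11, 1)]
kept, hidden = split_snapshots(snapshots, date(2013, 11, 13), x_months=3)
print(kept, hidden)
```

With `x_months=0` nothing captured before the robots.txt appeared is hidden, which matches the 'X value of zero' option.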
Re:Archive.org should not respect robots.txt by Bardez · 2013-11-13 07:44 · Score: 4, Insightful

Robots.txt should be respected at the time of retrieval. It should not be retroactively respected to censor or remove old data. That is a shame. I've used the Archive before on a site of a gaming company that I loved, which nearly went bankrupt (or perhaps did) but managed to eke its way through. Part of their relaunch nuked the Internet Archive's archives and I definitely felt a sense of loss.

--
Perception is the thin dividing line between reality and fiction.
Re:Archive.org should not respect robots.txt by RedBear · 2013-11-13 08:29 · Score: 4, Insightful

Robots.txt should be respected at the time of retrieval. It should not be retroactively respected to censor or remove old data. That is a shame. I've used the Archive before on a site of a gaming company that I loved, which nearly went bankrupt (or perhaps did) but managed to eke its way through. Part of their relaunch nuked the Internet Archive's archives and I definitely felt a sense of loss.
Yeah, I had the silly impression all this time that the entire purpose of the Internet Archive was to archive the goddamn Internet precisely so that people couldn't pull this kind of retroactive erasure "cleansing of history" bullshit and get away with it.
What a dope I am. It's amazing how inadequately we are protecting our freedoms and our history these days. If we don't do something much more drastic our grandchildren will end up being slaves to some theocratic corporatocracy and they'll have no idea that the world was ever any different.
Lately I think Orwell was overly optimistic.
Re:Archive.org should not respect robots.txt by mikael · 2013-11-13 08:34 · Score: 3, Interesting

They buy up a domain when it becomes available, set the robots.txt file to "do not archive", then the google-bot spider will send the instruction to delete all past archives.
You used to be able to visit old web-pages through the google-cache. Remember when google would always have a cached copy of what you wanted to read. Nowadays they just seem to be happy to be a proxy server which records everything you download from the target webpage.

--
Vintage computer adverts: http://www.vintageadbrowser.com/computers-and-software-ads
Re:Archive.org should not respect robots.txt by Obfuscant · 2013-11-13 10:21 · Score: 3, Insightful

When robots.txt is used for censorship, it no longer deserves any respect.
It's not censorship when I tell robot data scrapers to bugger off and not abuse the website I run by copying every image I have and looping through the multiple links that take people there, or by invoking a program that generates data on the fly tens of thousands of times a day to the detriment of real users who actually have an interest in the information and can't get it because some robot is using all the available server processes.

I hope more people decide to ignore them.
The day that the first scraper starts ignoring mine, his IP is going into the firewall. If he tries to be a sneaky shit and use multiple IPs, then the site where YOU could come get data for free may very well go away, and you wind up with nothing. Neither I nor my employer have the spare bandwidth and cpu cycles to have every robot come download the Tb of data I have on the web. If free public access becomes an abuse of the server, the free public access goes away.

We should never let other people decide what we can see and hear.
When you are talking about my data, I have every right to decide whether you can see or hear it. It is your attitude of entitlement that makes me always have second thoughts about putting anything on the web. Most people are reasonable, decent people who appreciate the service. Some think they have a right to demand it.
Re:Archive.org should not respect robots.txt by Obfuscant · 2013-11-13 10:49 · Score: 2

I suppose they are just trying to honor site owner's wishes even when they may have initially forgotten about robots.txt and added it later. The robot doesn't know that the old content belonged to someone else who DID NOT wish to block it.
That's probably why they do it that way. They could have picked either side and been wrong for some group. The website operator who didn't know about robots.txt to start with and found some of his material on a robot indexer shouldn't have to track down every robot who has ever visited to be able to rectify the mistake.
The other issue with keeping data after a robots.txt is published by a new owner of a domain is that the archive will contain data that claims to have come from that website but in fact did not. This can create issues for both the new and old owners. Someone who finds defamatory material in the archive may decide to focus his anger on the new website owner who is completely innocent and forced to prove a negative ("I never published that page.") The old owner may find out that his copyright material now has a presumed ownership by the new domain name owner. ("This image came from the domain example.com which is owned by Bill Smith ...") If the material was popular, the new owner may be deluged with requests for updates to material he never knew existed and doesn't have time to deal with.
The other side of the choice means that someone who buys an abandoned domain name can get all the previous content from that domain dumped from the archives. The question that could be asked, if the original domain owner cared about having his information archived, why did he abandon the domain? Is abandoning his domain also abandoning the data it served?

Tough cookies! If you want to control how your data is used I don't see a problem with requiring you actually take the time to learn about things like robots.txt before you publish.
Yes, "screw every website owner who didn't fall off the turnip truck knowing everything there was to know about websites" is one opinion. Being a bit more considerate and not making every mistake a permanent one is another opinion.

It's really no different than releasing source code under the GPL and then later turning it into a closed source product.
Ahh, yes, it is different. Putting information on a website so people can come look at it is not releasing it under GPL.
A reasonable solution might be for the archives to remove everything they have with the same link as is now protected by the robots.txt, but keep anything that isn't the same. This runs into the problem that to know if the links are the same the robot has to scrape the site.
It's not a simple issue.
As someone who has been running websites since around 1992 or so, long before robots.txt was necessary, and before almost anyone imagined trying to keep a full archive of everything that ever appeared on the web, I am actually amazed at the attitude that everything that has ever appeared on the web must be available for anyone who wants it no matter what the original author desires. I know there are too many things that I've had to correct to ever want all previous versions to be wandering about to confuse people. Too many things I've changed my mind on, too.
Re:Archive.org should not respect robots.txt by Obfuscant · 2013-11-13 12:56 · Score: 2

The problem is that people such as yourself often think that the presence of your data on someone else's machine somehow gives you the right to install invasive DRM software in an attempt to get their machine to do your bidding instead of the owner's.
I don't know what the fuck you are talking about. Robots.txt is not DRM, and I'm not trying to get some data scraper's system to "do my bidding". They can do whatever the fuck they want as long as they don't use my system to do it.

Once the data is recorded and someone else gets a copy, it's only a matter of time before it gets decrypted/distributed.
Still unclear on what you think you are contributing to this discussion. You want to look at my data, you can come do it all day, every day. Yeah, if you're a moron who tries to update a static image once a minute 24/7, I'll shut you off like the idiot abuser you are, but other than that, knock yourself out.
What robots.txt stops is people who think they can index my system better than I can and want to make money doing it, or who think that they need to archive tide predictions for every day from year 1900 through 2100 at every site where tidal constituents are available, they need my system to generate those predictions for them, and in doing so they keep legitimate users from being able to access my site.

The only solution that offers certainty of control is non-distribution. If you can't make money this way, it's time to find a new job.
Thanks. Your vote for me to take the free information I provide off-web has been recorded.
As for "mak[ing] money this way", I don't get paid to give out data for free, I get paid for collecting the data. That's whether or not I let people like you abuse my systems by trying to index them for me.
Re:Archive.org should not respect robots.txt by Obfuscant · 2013-11-13 17:55 · Score: 2

IPv6 users are laughing at your dumb ass right now. Can your idiot ass guess why?
Because it is easy for them to be sneaky shits and use millions of different IP addresses. Like I said, when the sneaky shits overwhelm the services they are being given for free, the services go away. I've dealt with people like you before. I'm still here. So are my websites.

Fuck, my company would have been dead long ago with an idiot like you behind the wheel.
Yeah, it's a horrible thing to try to make sure that company resources are available for the intended company use and not overwhelmed by leeches sucking up every cycle on a service that they aren't paying for but feel entitled to suck dry. Yeah, the company would simply love being told that they need to buy a rack full of servers and more bandwidth because you want to copy every bit of data on the website.

You're the kind of moron that thinks a TI 99/4A would make for a good switch or router.
It's such a pleasure talking to you. Do you do anything constructive or just toss insults at people you disagree with?

Doesn't that kinda defeat the point of the archive by spiritplumber · 2013-11-13 05:47 · Score: 2

How did they delete them from archive.org? Did they hack it?

--
Liberty - Security - Laziness - Pick any two.

Insert references to an Orwell work here by themushroom · 2013-11-13 05:48 · Score: 2

Because that's what they did in that book.

--
Laughter is the Spackle of the Soul.

Re: Insert references to an Orwell work here by Mabhatter · 2013-11-13 05:53 · Score: 3, Interesting

The main character's job was "correcting" stored historical documents to match what was being said "right now".
The reasoning why their government must keep EVERYTHING on private people, but can obstruct and hide PUBLICLY OFFERED documents has to be really really funny!
Re: Insert references to an Orwell work here by TapeCutter · 2013-11-13 10:39 · Score: 2

Also, politicians don't want you to hear every speech they make; someone might point out they are making contradictory statements to different demographics.

--
And did you exchange a walk on part in the war for a lead role in a cage? - Pink Floyd.

Internet Archive's Wayback Machine by FriendlyLurker · 2013-11-13 05:49 · Score: 5, Funny

Lucky they now have secret blacklists at every major UK ISP to block these. Think of the children that would be harmed by reading these speeches!

FTFA:

In a remarkable step the party has also blocked access to the Internet Archive's Wayback Machine, a San-Francisco-based library which captures webpages for future generations, using a software robot that directs search engines not to access the pages.

Re:Internet Archive's Wayback Machine by Obfuscant · 2013-11-13 06:51 · Score: 2

BTW, I am sure the NSA's archive crawler does not honor the robots thing.
As a website operator who carefully watches connection counts and has a robots.txt to exclude most of the site content, I am pretty sure there is no "NSA's archive crawler" visiting, at least none with any frequency that it matters.
Re:Internet Archive's Wayback Machine by MadCow42 · 2013-11-13 08:12 · Score: 2

Well, maybe it's not a "crawler", but if it copies all outbound traffic it does essentially the same thing while leaving no footprints.
Chew on that for a while. :)

--
I used to have a sig, but I set it free and it never came back.

And let's not forget why: by Joining+Yet+Again · 2013-11-13 05:49 · Score: 5, Insightful

because they broke almost all of their pre-election promises.

The most important thing to learn about the Tory party in the UK is that, contrary to popular opinion, it is not the party for the responsible, the capitalists, nor the hard-working (except in the sense that they want most people to work hard for them). It is a party representing a few wealthy individuals, and their mission is not small government, but privatised government, where nothing happens without their masters getting a cut.

Sorta like a mafia.

Re:And let's not forget why: by mpe · 2013-11-13 05:56 · Score: 4, Insightful

because they broke almost all of their pre-election promises.

When was the last time a political party (or even an individual politician) did anything else?
Re:And let's not forget why: by Joining+Yet+Again · 2013-11-13 06:01 · Score: 3, Informative

There have been more ideologically-oriented governments, from post-War Labour to Thatcher.
They might not keep all their promises, and all ideology is strongly diluted with practicality, but they're not the vacuous bunch of cunts we have in Britain today. (They're not that different from Blair, of course, but Blair had a more representative set of people to steer him.)
Re:And let's not forget why: by roninmagus · 2013-11-13 06:30 · Score: 5, Insightful

The main issue that conservatives (at least in the US) have in their thought process (trust me, I am one) is that they believe "responsible," "capitalist," and "hard-working" actually leads one to become one of those few wealthy individuals.

Unfortunately this is usually not the case at all; the responsible, capitalist and hard-working ones only lead those wealthy few to become more wealthy.

This is a truth I think conservatives should realize and embrace, so that we can actually come up with real solutions to problems.
Re:And let's not forget why: by Blue+Stone · 2013-11-13 06:58 · Score: 5, Informative

because they broke almost all of their pre-election promises.
Here's a nice little summary of all those broken promises, pledges and outright deceit.

--
Corporation, n. An ingenious device for obtaining individual profit without individual responsibility. - Ambrose Bierce
Re:And let's not forget why: by fnj · 2013-11-13 12:00 · Score: 2

I'll tackle that one, and very succinctly. A panel you can fire is preferable to a panel you can't. Any day. And they both have precisely the same motivation. To save money, the better to spend it somewhere else.

1984 by MyLongNickName · 2013-11-13 05:49 · Score: 5, Insightful

“He who controls the past controls the future. He who controls the present controls the past.” George Orwell, 1984

--
See my journal for slashdot ID's by year. Mine created in 2005. http://slashdot.org/journal/289875/slashdot-ids-by-year

Re:1984 by Anonymous Coward · 2013-11-13 06:05 · Score: 3, Insightful

He who controls the spice controls the universe.

Re:Lol! Good luck with that by x0ra · 2013-11-13 05:53 · Score: 2

The White House and its UK equivalent unfortunately have enough power to order Google to "forget" about this as well.

Wrong by symes · 2013-11-13 05:54 · Score: 3, Informative

This is not accurate. Speeches made in Parliament are archived in Hansard for a start. And there is no changing that.

Re:Wrong by game+kid · 2013-11-13 06:13 · Score: 3, Insightful

I like your optimism.
They'll find a way to close that to public access (except "on a need-to-know basis" and to Royal family members, staff, and "security" officials) too, as soon as they see how embarrassing (or criminal) parts of the archive may be. Clearly, they always find a way, however brutish.

--
You can hold down the "B" button for continuous firing.
Re:Wrong by EasyTarget · 2013-11-13 06:47 · Score: 2, Informative

Sigh... 'Wrong' in what way?
This was the archive of speeches, not just the parliamentary ones; but all the ones at election rallies and conferences too.
For instance, ToryBoy recently sat in a big gold chair and ate a four-course meal along with all his rich chums in the Guildhall, London. He then stood in front of a gilded podium and made a speech in which he told all the little people that they had not worked hard enough and that austerity is now here to stay.
This speech is exactly the sort of one that will never appear on Hansard, and in a few years may well be the sort of thing Tory spinsters will hope to make 'disappear'.

--
"Oops, I always forget the purpose of competition is to divide people into winners and losers." - Hobbes

Re:Doesn't that kinda defeat the point of the arch by uncle+slacky · 2013-11-13 05:55 · Score: 5, Insightful

No, but the Wayback Machine always respects takedown requests. Note that the British Library maintains an archive of UK sites, and still has the speeches in question (from April 2008 onwards): http://www.webarchive.org.uk/wayback/archive/20080410100951/http://www.conservatives.com/tile.do?def=news.speeches.page

--
Windows is like the faint smell of piss in a subway: it's there, and there's nothing you can do about it.

History will be lost by Anonymous Coward · 2013-11-13 05:55 · Score: 5, Interesting

There's a theory out there that states that because most of what we do in the so-called Information Age is stored in somewhat fragile digital storage systems (as opposed to, for example, parchment), historians in the future will have very little to base their research on about our age, as most of the info will be permanently lost.
Well, hundreds of thousands of posts on BBS systems from the '80s and '90s are already gone; delete the Internet Archive and the Web is gone too. Any thoughts?

Re:Doesn't that kinda defeat the point of the arch by LocalH · 2013-11-13 05:58 · Score: 5, Informative

It's not even a takedown request. IA will honor robots.txt totally and retroactively - if they have 10-15 years of archived data at a specific domain (or subdirectory on that domain), and someone puts up a robots.txt disallowing them access, not only will they refuse to archive it going forward, but they will remove all previously archived material from being viewable (I hope they don't actively remove it from their archive, but merely stop making it available).

--
FC Closer

Re:Lol! Good luck with that by PPH · 2013-11-13 05:58 · Score: 3, Funny

No problem. Just look right here.

--
Have gnu, will travel.

100 Years by BringsApples · 2013-11-13 06:06 · Score: 3, Interesting

We as humans are not able to "remember" back further than 100 years. I mean that you cannot get any information from anyone that would give you a clear, practical understanding of the mindset from 100 years ago. You can go ask your grandparent(s) things about the past, but the vocabulary that they use more than likely won't fit your vocabulary and therefore you will not be able to get the understanding that they're trying for. Maybe 100 years is too small, but it can't be far from the real number, plus it's nice and round ;)

In this way, our society(s) are going through life sorta like that movie Memento. All that has to happen is a slight variation of the real story, that would produce the same basic result, but with a new context - Christopher Columbus "discovered" America comes to mind. Perhaps the powers that be depend on this, and are looking to make that number (100 here) smaller.

--
Politics; n. : A religion whereby man is god.

Not in the USA! by edibobb · 2013-11-13 06:09 · Score: 4, Insightful

In the U.S., politicians post speeches full of lies online, and nobody cares. I'm not sure if this is because everybody believes the lies, or because nobody believes the politicians.

http://www.seattlepi.com/national/article/Rumsfeld-denies-making-claims-Iraq-had-WMDs-1202942.php

http://www.youtube.com/watch?v=CU0m6Rxm9vU

Re:Doesn't that kinda defeat the point of the arch by pixelpusher220 · 2013-11-13 06:19 · Score: 2

Actually no, those speeches don't seem to exist on the party website now either.

--
People in cars cause accidents....accidents in cars cause people :-D

Re:Doesn't that kinda defeat the point of the arch by Arthur+Dent+'99 · 2013-11-13 06:22 · Score: 5, Informative

I apologize for my mistake. Until just a few minutes ago, I was unaware that the Internet Archive agrees to RETROACTIVELY honor a robots.txt file. So once a robots.txt file restricts access to content, they voluntarily remove access to previously archived content from the archive. Here's the related item from their FAQ:

Some sites are not available because of robots.txt or other exclusions. What does that mean?

The Internet Archive follows the Oakland Archive Policy for Managing Removal Requests And Preserving Archival Integrity

The Standard for Robot Exclusion (SRE) is a means by which web site owners can instruct automated systems not to crawl their sites. Web site owners can specify files or directories that are disallowed from a crawl, and they can even create specific rules for different automated crawlers. All of this information is contained in a file called robots.txt. While robots.txt has been adopted as the universal standard for robot exclusion, compliance with robots.txt is strictly voluntary. In fact most web sites do not have a robots.txt file, and many web crawlers are not programmed to obey the instructions anyway. However, Alexa Internet, the company that crawls the web for the Internet Archive, does respect robots.txt instructions, and even does so retroactively. If a web site owner decides he / she prefers not to have a web crawler visiting his / her files and sets up robots.txt on the site, the Alexa crawlers will stop visiting those files and will make unavailable all files previously gathered from that site. This means that sometimes, while using the Internet Archive Wayback Machine, you may find a site that is unavailable due to robots.txt (you will see a "robots.txt query exclusion error" message). Sometimes a web site owner will contact us directly and ask us to stop crawling or archiving a site, and we endeavor to comply with these requests. When you come across a "blocked site error" message, that means that a site owner has made such a request and it has been honored.

Currently there is no way to exclude only a portion of a site, or to exclude archiving a site for a particular time period only.

When a URL has been excluded at direct owner request from being archived, that exclusion is retroactive and permanent.
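
For anyone wondering what such an exclusion looks like in practice, here is a rough sketch using Python's standard `urllib.robotparser`. The user-agent token `ia_archiver` is an assumption on my part about how the Alexa / Internet Archive crawler identifies itself, and example.com and the `/speeches/` path are placeholders.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that singles out one crawler and blocks everything.
# "ia_archiver" is assumed to be the archive crawler's user-agent token.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

url = "http://example.com/speeches/"
archive_allowed = parser.can_fetch("ia_archiver", url)
other_allowed = parser.can_fetch("SomeOtherBot", url)
print(archive_allowed, other_allowed)
```

Note that this only models the crawl-time check. The retroactive hiding of old captures described in the FAQ is a policy on the archive's side and is not something robots.txt itself expresses.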

Re:Doesn't that kinda defeat the point of the arch by pixelpusher220 · 2013-11-13 06:24 · Score: 5, Interesting

Couple that with the fact that the Google cached copy of the site has a 'search for speeches' section which is now, interestingly enough, missing as well.

--
People in cars cause accidents....accidents in cars cause people :-D

Re:Deleted from the Internet Archive? by flimflammer · 2013-11-13 06:30 · Score: 4, Informative

No, they put robots.txt on their website and the Internet Archive respects robots.txt retroactively. If they had 20 years' worth of data archived from one domain, and someone puts a robots.txt on the domain, all 20 years' worth of data is removed from the archive. Whether it's actually deleted or hidden is unknown, but I hope it isn't deleted.

Re:Doesn't that kinda defeat the point of the arch by asmkm22 · 2013-11-13 06:30 · Score: 2

So there's no actual internet archive? How was this not planned for years ago?

Re:Doesn't that kinda defeat the point of the arch by LocalH · 2013-11-13 06:45 · Score: 2

It fully explains it. Someone bought up the domain that you were hosted on previously, added a blanket disallow in robots.txt, and suddenly all your old stuff is gone.

--
FC Closer

Only partially. (Also a wishlist.) by Ungrounded+Lightning · 2013-11-13 06:48 · Score: 5, Informative

Indeed this is ridiculous that the IA would retroactively remove stuff though as you say hopefully just disable access instead.

I think the archive actually does just suppress access rather than purge the actual data, so they can again display it once copyright runs out (if it ever does...).

I also think the point is that newbies may not know about robots.txt and that even an experienced webmaster might accidentally allow access to something private long enough for it to get archived, or receive and honor a takedown notice, so this allows the correction of the error.

It's an 'archive' and should reflect how stuff 'was' at the time; legalities of that obviously being quite murky and hard to defend against expensive lawsuits, but still.

That's why. They have limited funds and need them to buy more disks and stuff, not fight lawsuits. If the choice is not display some stuff or go broke and not display anything, the choice is also obvious.

I wish, though, that they were able to detect when a domain changed hands and not honor robots.txt requests retroactively past the boundary. IMHO a new owner is a new web site that happens to have the same name.

Especially: I wish domain name parking sites didn't put up robots.txt files that cause the archive to immediately purge/hide the previous owners' content. I've lost access to a lot of content from dead sites that way. (It also keeps the owners from rescuing their old content if they don't have personal backups.)

--
Bantam Dominique roosters crow a four-note song. Once you've heard it as "Happy BIRTHday" you can't NOT hear it that way

If you're not doing anything wrong by hduff · 2013-11-13 07:01 · Score: 2

what do you have to fear? 8)

--
"I believe in Karma. That means I can do bad things to people all day long and I assume they deserve it." : Dogbert

Re: Doesn't that kinda defeat the point of the arc by iamhassi · 2013-11-13 07:08 · Score: 2

Not much of an archive if they delete the past because someone says it should be deleted. Even Wikipedia allows you to go back and see all changes to an article.

--
my karma will be here long after I'm gone

It gets worse by clickclickdrone · 2013-11-13 07:43 · Score: 3, Informative

I just tried to complain to my MP about this but it seems he's blocked me on Twitter. I guess that's it then, we are living in a fascist state.

--
I want a list of atrocities done in your name - Recoil

Re:It gets worse by niks42 · 2013-11-13 08:20 · Score: 2

http://www.theyworkforyou.com/ will find you a page where you can mail your MP and they will answer. I complained to my MP about the police use of Terror laws to detain David Miranda, and I know it got to him as he replied. He did reply saying it was a police matter and nothing to do with Parliament, but hell, it struck home! Power to the People ..

Winston Smith's job / 1984 by volvox_voxel · 2013-11-13 07:57 · Score: 3, Insightful

It makes Winston Smith's job at the Ministry of Truth more difficult if there are old archives available..

Re:The problem is career politicians by niks42 · 2013-11-13 08:33 · Score: 3, Funny

Small problem with your sig there ..

Firefox can't establish a connection to the server at isohunt.com

Robots.txt by LeadSongDog · 2013-11-13 10:22 · Score: 2

The Internet Archive says that it subscribes to the Oakland Archive Policy, which for "requests by governments" says:

Archivists will exercise best-efforts compliance with applicable court orders. Beyond that, as noted in the Library Bill of Rights, 'Libraries should challenge censorship in the fulfillment of their responsibility to provide information and enlightenment.'

Seems like this may just have slipped past them. Let's make sure they know they need to sort it out... Surely they only removed it from the Wayback Machine, not from the archive itself.

--
Oh, I'm sorry sir, I thought you were referring to me, Mr. Wensleydale.

What the hell is it with Conservatives? by msobkow · 2013-11-13 10:43 · Score: 3, Interesting

Here in Canada, Conservative PM Harper has taken heat lately for breaking all the links on our government's historical archive of the legislation that's been posted for the past decade or two. It's just... gone. The entire archive, except for maybe the past 5 years' worth.

That archive is public government information, not Conservative property.

--
I do not fail; I succeed at finding out what does not work.

Re:And thus invoking the . . . by Evil+Pete · 2013-11-13 12:08 · Score: 2

OTOH, this means that whenever reference is made to one of their speeches people can just insert scandalous bits. Objections by the Tories would be countered by pointing out that because they removed all copies from the Internet then anything they publish has been modified and is therefore not to be trusted. It should be easy to cultivate an aura of mistrust in anything that they say after that. Well, that is what I would do if I was Machiavelli. Or true to my username. :)

--
Bitter and proud of it.
