Slashdot Mirror


Britain's Conservatives Scrub Speeches from the Internet

An anonymous reader writes news of an attempt to erase a bit of history. From the article: "The Conservative Party have attempted to delete all their speeches and press releases online from the past 10 years, including one in which David Cameron promises to use the Internet to make politicians 'more accountable'. The Tory party have deleted the backlog of speeches from the main website and the Internet Archive — which aims to make a permanent record of websites and their content — between 2000 and May 2010."

23 of 234 comments (clear)

  1. Archive.org should not respect robots.txt by Anonymous Coward · · Score: 4, Interesting

    People have used robots.txt to buy up domains they want to censor.

    For example, this happened with partyvan.

    1. Re:Archive.org should not respect robots.txt by lgw · · Score: 4, Informative

      As I understand it, Archive.org uses robots.txt to censor old, already captured data. That's a serious flaw in an archive IMO.

      --
      Socialism: a lie told by totalitarians and believed by fools.
    2. Re:Archive.org should not respect robots.txt by morgauxo · · Score: 4, Interesting

      The problem is that people are buying up the domain names of old websites which no longer exist just to publish a robots.txt file. Then archive.org automatically deletes, or at least blocks access to the entire history of everything that ever happened at that domain including the past website which the new owner has nothing to do with.

      I suppose they are just trying to honor site owners' wishes even when they may have initially forgotten about robots.txt and added it later. The robot doesn't know that the old content belonged to someone else who DID NOT wish to block it. Maybe a good solution is that when they notice a new robots.txt, everything for the last 'X' months gets deleted. (Go ahead and debate values of X.) Data from prior to that should be left alone, even if it was posted by the same site owner who is posting the robots.txt today. Tough cookies! If you want to control how your data is used, I don't see a problem with requiring that you actually take the time to learn about things like robots.txt before you publish. It's really no different than releasing source code under the GPL and then later turning it into a closed source product. All your new work belongs to you, but you don't get to force everyone to delete every copy they might have of the old code, and you can't stop them from forking it.

      -- I would totally consider an 'X value' of zero as being on the table btw
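
      A minimal sketch of the 'X months' rule proposed above, not anything the Internet Archive actually implements. The function name, the sample dates, and the 30-day approximation of a month are all assumptions made for illustration:

```python
from datetime import date, timedelta

def visible_captures(capture_dates, robots_first_seen, grace_months):
    """Return the capture dates that stay visible under the proposed rule.

    Captures made within `grace_months` before the robots.txt first
    appeared (and anything after it) are hidden; older captures stay.
    """
    # Assumption: a month is approximated as 30 days.
    cutoff = robots_first_seen - timedelta(days=30 * grace_months)
    return [d for d in capture_dates if d < cutoff]

# Hypothetical capture history for one domain.
captures = [date(2003, 5, 1), date(2009, 11, 20), date(2010, 4, 2)]
robots_seen = date(2010, 5, 10)

print(visible_captures(captures, robots_seen, grace_months=6))
print(visible_captures(captures, robots_seen, grace_months=0))
```

      With X = 6 only the 2003 capture survives; with X = 0 everything captured before the robots.txt appeared stays visible.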

    3. Re:Archive.org should not respect robots.txt by Bardez · · Score: 4, Insightful

      Robots.txt should be respected at the time of retrieval. It should not be retroactively respected to censor or remove old data. That is a shame. I've used the Archive before on a site of a gaming company that I loved, which nearly went bankrupt (or perhaps did) but managed to eke its way through. Part of their relaunch nuked the Internet Archive's archives and I definitely felt a sense of loss.

      --
      Perception is the thin dividing line between reality and fiction.
    4. Re:Archive.org should not respect robots.txt by RedBear · · Score: 4, Insightful

      Robots.txt should be respected at the time of retrieval. It should not be retroactively respected to censor or remove old data. That is a shame. I've used the Archive before on a site of a gaming company that I loved, which nearly went bankrupt (or perhaps did) but managed to eke its way through. Part of their relaunch nuked the Internet Archive's archives and I definitely felt a sense of loss.

      Yeah, I had the silly impression all this time that the entire purpose of the Internet Archive was to archive the goddamn Internet precisely so that people couldn't pull this kind of retroactive erasure "cleansing of history" bullshit and get away with it.

      What a dope I am. It's amazing how inadequately we are protecting our freedoms and our history these days. If we don't do something much more drastic our grandchildren will end up being slaves to some theocratic corporatocracy and they'll have no idea that the world was ever any different.

      Lately I think Orwell was overly optimistic.

  2. Internet Archive's Wayback Machine by FriendlyLurker · · Score: 5, Funny

    Lucky they now have secret blacklists at every major UK ISP to block these. Think of the children that would be harmed by reading these speeches!

    FTFA:

    In a remarkable step the party has also blocked access to the Internet Archive's Wayback Machine, a San-Francisco-based library which captures webpages for future generations, using a software robot that directs search engines not to access the pages.

  3. And let's not forget why: by Joining+Yet+Again · · Score: 5, Insightful

    because they broke almost all of their pre-election promises.

    The most important thing to learn about the Tory party in the UK is that, contrary to popular opinion, it is not the party for the responsible, the capitalists, nor the hard-working (except in the sense that they want most people to work hard for them). It is a party representing a few wealthy individuals, and their mission is not small government, but privatised government, where nothing happens without their masters getting a cut.

    Sorta like a mafia.

    1. Re:And let's not forget why: by mpe · · Score: 4, Insightful

      because they broke almost all of their pre-election promises.

      When was the last time a political party (or even an individual politician) did anything else?

    2. Re:And let's not forget why: by roninmagus · · Score: 5, Insightful

      The main issue that conservatives (at least in the US) have in their thought process (trust me, I am one) is that they believe "responsible," "capitalist," and "hard-working" actually leads one to become one of those few wealthy individuals.

      Unfortunately this is usually not the case at all; the responsible, capitalist and hard-working ones only lead those wealthy few to become more wealthy.

      This is a truth I think conservatives should realize and embrace, so that we can actually come up with real solutions to problems.

    3. Re:And let's not forget why: by Blue+Stone · · Score: 5, Informative

      because they broke almost all of their pre-election promises.

      Here's a nice little summary of all those broken promises, pledges and outright deceit.

      --
      Corporation, n. An ingenious device for obtaining individual profit without individual responsibility. - Ambrose Bierce
  4. 1984 by MyLongNickName · · Score: 5, Insightful

    “He who controls the past controls the future. He who controls the present controls the past.” George Orwell, 1984

    --
    See my journal for slashdot ID's by year. Mine created in 2005. http://slashdot.org/journal/289875/slashdot-ids-by-year
  5. Re:Doesn't that kinda defeat the point of the arch by uncle+slacky · · Score: 5, Insightful

    No, but the Wayback Machine always respects takedown requests. Note that the British Library maintains an archive of UK sites, and still has the speeches in question (from April 2008 onwards): http://www.webarchive.org.uk/wayback/archive/20080410100951/http://www.conservatives.com/tile.do?def=news.speeches.page

    --
    Windows is like the faint smell of piss in a subway: it's there, and there's nothing you can do about it.
  6. History will be lost by Anonymous Coward · · Score: 5, Interesting

    There's a theory out there that states that because most of what we do in the so-called Information Age is stored in somewhat fragile digital storage systems (as opposed to, for example, parchment), historians in the future will have very little to base their research on about our age, as most of the info will be permanently lost.
    Well, hundreds of thousands of posts on BBS systems from the 80's and 90's are already gone; delete the Internet Archive and the Web is gone too. Any thoughts?

  7. Re:Doesn't that kinda defeat the point of the arch by LocalH · · Score: 5, Informative

    It's not even a takedown request. IA will honor robots.txt totally and retroactively - if they have 10-15 years of archived data at a specific domain (or subdirectory on that domain), and someone puts up a robots.txt disallowing them access, not only will they refuse to archive it going forward, but they will remove all previously archived material from being viewable (I hope they don't actively remove it from their archive, but merely stop making it available).

    --
    FC Closer
  8. Not in the USA! by edibobb · · Score: 4, Insightful

    In the U.S., politicians post speeches full of lies online, and nobody cares. I'm not sure if this is because everybody believes the lies, or because nobody believes the politicians.

    http://www.seattlepi.com/national/article/Rumsfeld-denies-making-claims-Iraq-had-WMDs-1202942.php

    http://www.youtube.com/watch?v=CU0m6Rxm9vU

  9. Re:Doesn't that kinda defeat the point of the arch by Arthur+Dent+'99 · · Score: 5, Informative

    I apologize for my mistake. Until just a few minutes ago, I was unaware that the Internet Archive agrees to RETROACTIVELY honor a robots.txt file. So once a robots.txt file restricts access to content, they voluntarily remove access to previously archived content from the archive. Here's the related item from their FAQ:


    Some sites are not available because of robots.txt or other exclusions. What does that mean?

    The Internet Archive follows the Oakland Archive Policy for Managing Removal Requests And Preserving Archival Integrity

    The Standard for Robot Exclusion (SRE) is a means by which web site owners can instruct automated systems not to crawl their sites. Web site owners can specify files or directories that are disallowed from a crawl, and they can even create specific rules for different automated crawlers. All of this information is contained in a file called robots.txt. While robots.txt has been adopted as the universal standard for robot exclusion, compliance with robots.txt is strictly voluntary. In fact most web sites do not have a robots.txt file, and many web crawlers are not programmed to obey the instructions anyway. However, Alexa Internet, the company that crawls the web for the Internet Archive, does respect robots.txt instructions, and even does so retroactively. If a web site owner decides he / she prefers not to have a web crawler visiting his / her files and sets up robots.txt on the site, the Alexa crawlers will stop visiting those files and will make unavailable all files previously gathered from that site. This means that sometimes, while using the Internet Archive Wayback Machine, you may find a site that is unavailable due to robots.txt (you will see a "robots.txt query exclusion error" message). Sometimes a web site owner will contact us directly and ask us to stop crawling or archiving a site, and we endeavor to comply with these requests. When you come across a "blocked site error" message, that means that a siteowner has made such a request and it has been honored.

    Currently there is no way to exclude only a portion of a site, or to exclude archiving a site for a particular time period only.

    When a URL has been excluded at direct owner request from being archived, that exclusion is retroactive and permanent.
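
    For anyone who hasn't looked at one, here is a rough sketch of how per-crawler rules in a robots.txt are evaluated, using Python's standard urllib.robotparser. The file contents and URLs are made up, and "ia_archiver" is assumed here to be the user-agent token of the archive's crawler. The retroactive_view function only illustrates the behavior the FAQ describes (today's rules applied to old snapshots); it is not the Archive's code:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block the archive crawler entirely,
# and block everyone else only from /private/.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

def may_crawl(agent, url):
    """True if the rules above let `agent` fetch `url`."""
    return parser.can_fetch(agent, url)

def retroactive_view(snapshots, agent):
    """Apply today's rules to old snapshots, as the FAQ describes."""
    return [s for s in snapshots if parser.can_fetch(agent, s["url"])]

# Hypothetical snapshots captured years before the robots.txt existed.
snapshots = [
    {"url": "http://example.com/speeches/2005.html", "year": 2005},
    {"url": "http://example.com/speeches/2009.html", "year": 2009},
]

print(may_crawl("ia_archiver", "http://example.com/speeches/2009.html"))
print(may_crawl("SomeOtherBot", "http://example.com/speeches/2009.html"))
print(retroactive_view(snapshots, "ia_archiver"))
```

    The point of the sketch: the rule check only ever sees the current robots.txt, so nothing in it can distinguish content captured before the file was added.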

  10. Re:Doesn't that kinda defeat the point of the arch by pixelpusher220 · · Score: 5, Interesting

    Couple that with the fact that the Google cached copy of the site has a 'search for speeches' section, which is now, interestingly enough, missing as well.

    --
    People in cars cause accidents....accidents in cars cause people :-D
  11. Re:Deleted from the Internet Archive? by flimflammer · · Score: 4, Informative

    No, they put robots.txt on their website and the Internet Archive respects robots.txt retroactively. If they had 20 years worth of data archived from one domain, and someone puts a robots.txt on the domain, all 20 years worth of data is removed from the archive. Whether it's actually deleted or hidden is unknown, but I hope it isn't deleted.

  12. Only partially. (Also a wishlist.) by Ungrounded+Lightning · · Score: 5, Informative

    Indeed this is ridiculous that the IA would retroactively remove stuff though as you say hopefully just disable access instead.

    I think the archive actually does just suppress access rather than purge the actual data, so they can again display it once copyright runs out (if it ever does...).

    I also think the point is that newbies may not know about robots.txt and that even an experienced webmaster might accidentally allow access to something private long enough for it to get archived, or receive and honor a takedown notice, so this allows the correction of the error.

    It's an 'archive' and should reflect how stuff 'was' at the time; legalities of that obviously being quite murky and hard to defend against expensive lawsuits, but still.

    That's why. They have limited funds and need them to buy more disks and stuff, not fight lawsuits. If the choice is not display some stuff or go broke and not display anything, the choice is also obvious.

    I wish, though, that they were able to detect when a domain changed hands and not honor robots.txt requests retroactively past the boundary. IMHO a new owner is a new web site that happens to have the same name.

    Especially: I wish domain name parking sites didn't put up robots.txt files that cause the archive to immediately purge/hide the previous owners' content. I've lost access to a lot of content from dead sites that way. (It also keeps the owners from rescuing their old content if they don't have personal backups.)
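
    A toy sketch of that wish, assuming the archive somehow knew the date a domain changed hands (for example from registration records; that data source and every name below are hypothetical):

```python
from datetime import date

def split_at_ownership_change(capture_dates, ownership_changed_on):
    """Split captures into (kept, hidden) around an ownership boundary.

    A new owner's robots.txt would hide only captures made on or after
    the hand-over date; the previous owner's captures stay visible.
    """
    kept = [d for d in capture_dates if d < ownership_changed_on]
    hidden = [d for d in capture_dates if d >= ownership_changed_on]
    return kept, hidden

# Hypothetical history: the domain lapsed and was parked in mid-2011.
captures = [date(2004, 3, 9), date(2008, 7, 1), date(2012, 1, 15)]
kept, hidden = split_at_ownership_change(captures, date(2011, 6, 1))

print(kept)
print(hidden)
```

    The hard part is of course knowing the hand-over date reliably, which this sketch simply takes as given.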

    --
    Bantam Dominique roosters crow a four-note song. Once you've heard it as "Happy BIRTHday" you can't NOT hear it that way
  13. Re:Where's the torrent file? by Joce640k · · Score: 4, Insightful

    I dunno, but I'm guessing none of these politicians have ever heard of the Streisand Effect.

    --
    No sig today...
  14. Re:Where's the torrent file? by Jeremiah+Cornelius · · Score: 4, Insightful

    I dunno, but I'm guessing none of these politicians have ever heard of the Streisand Effect.

    I dunno, but I'm guessing none of these politicians have ever heard of 1984.

    --
    "Flyin' in just a sweet place,
    Never been known to fail..."
  15. Re:Where's the torrent file? by d3m0nCr4t · · Score: 5, Insightful

    I dunno, but I'm guessing none of these politicians have ever heard of the Streisand Effect.

    I dunno, but I'm guessing none of these politicians have ever heard of 1984.

    Oh they have, but instead of feeling appalled, they just get a hard-on.

  16. Re:Where's the torrent file? by bfandreas · · Score: 4, Insightful

    The UK Tories under Cameron are indeed appalling. It is hard to decide if they are merely incompetent or malicious. Their actions of late point to the latter. Indeed one could only speculate how bad it would have been without the LibDems.

    The UK political scene has always been a bit foreign to my German tastes. A backbench MP suggesting that feckless fathers should be dragged to work in chains in defense of the badly executed bedroom-tax would have been forced to apologize in German politics. And he would have lost his seat come the next election. The comically idiotic ads targeting "illegal" immigrants to turn themselves in are both malicious and incompetent. And even now there is another push to introduce the "snooper's charter", which in the light of the recent revelations about GCHQ isn't even needed for them to do what they do.

    The other parties in the UK look good in comparison because of the unmitigated disaster that is the current Tory crop. Thatcher was bad but potentially a necessary evil due to the unmaintainability of the Postwar Dream. But think as I may I can't begin to fathom where to start to look for a justification for that cabinet, that PM and that party. They do not even have the use of a compass needle that permanently points to the south. You can't say "let's do the opposite of what they are suggesting" due to the utter confusion that is their politics.

    --
    20 minutes into the future