Slashdot Mirror


The Wayback Machine, Friend or Foe?

ShaunC asks: "As the webmaster of numerous sites, I'm curious how others feel about the Wayback Machine. What particularly interests me is the fact that the Machine is a relatively new animal, yet it contains snapshots from my sites dating back to 1998. I can't help but wonder: where did they get such old copies of my websites, and who gave them permission to make those copies? I certainly didn't provide either. Perhaps I'm too much of a purist, but I've always seen the internet as an ever-changing medium, not a permanent one. Archives have bothered me ever since the fledgling days of DejaNews." This site last made an appearance on Slashdot earlier this year. Internet archival sites are right smack in the crosshairs of copyright, but they are useful. Anyone who has ever used Google's cache (and there are plenty of those links on Slashdot) can attest to this. Of course, the issue that may bug many content providers is how to opt out of such services, since some see archiving as a copyright violation. Is it possible to balance the issues of copyright and history, or will these two Internet resources find themselves in legal trouble in the future?

"The way I see it, archives are much like SPAM; I never opted in, why should it be my responsibility to opt out? I manage a number of domains and the process of refining robots.txt files and submitting myself to the Wayback Machine for removal seems to be intrusive. Worse, domains I've abandoned (which have lapsed or been re-registered by someone else) are forever archived in the Machine and I have no way to exclude them. Why should I have to deliberately remove my copyrighted material from an archive which was never granted permission to replicate that material in the first place?"

28 of 508 comments

  1. Erm by adamwright · · Score: 3, Insightful

    Isn't this exactly the point of robots.txt? Google won't cache content it doesn't spider, and it won't spider content forbidden by your robots.txt. Does the WayBack Machine obey the robots rules?

    1. Re:Erm by kevinank · · Score: 5, Insightful
      The goal of the person who started archive.org was to record the history of the world wide web. The assumption was that whatever anyone thinks about the archive, there will never be another chance to go back and get that data once it is lost.

      The copies that they have archived in their databases are individual copies served from the original web requests, so they have the right to keep them. They became their copy when they were originally downloaded. Whether they have the right to make new copies and redistribute them depends on how you think fair use applies to that content.

      Ultimately if a lot of people start suing them they will probably shut down the archive to public access and only allow researchers to view their original copies on site. And if you'd prefer that, well, you'll end up with the world you deserve.

      --
      LibBT: BitTorrent for C - small - fast - clean (Now Versio
    2. Re:Erm by uncoveror · · Score: 2, Insightful

      I like the wayback machine's reason for being: preserving history. In 20 or 100 years, it will be very valuable information. I found old copies of my website, The Uncoveror, there. It really took me back. What I didn't like, though, is that it tried to force-feed me spyware, namely Gator and Bonzi Buddy. If the spyware and ads were removed, then it would be a true historical archive, the kind real historians and students can use for research. With the garbage on it, however, it has little, if any, academic value.

      --
      The Uncoveror: It's the real news.
  2. There are more than copyright concerns... by Anonymous Coward · · Score: 4, Insightful

    It's a scary thought that things kids are saying on message boards when they're teenagers are going to be back to haunt them when they apply for jobs in their mid 40s...

    I mean, if everything I posted on BBSes in the 1980s were still attributable to me... yikes.

    Remember kids. Use a nickname, and change it frequently if you ever want to run for any kind of office.

    1. Re:There are more than copyright concerns... by TheMonkeyDepartment · · Score: 4, Insightful

      Well, that's a great point, and it's a good illustration of the double-edged sword of free speech. You are free to say whatever dumbshit, ridiculous things you want. But you are also free to deal with the social consequences.

  3. Permission... by gorf · · Score: 3, Insightful

    who gave them permission to make those copies?

    The way I see it, you implicitly give people some limited form of permission by putting it up on the internet freely available to download in the first place. You put it up for people to download, print out and so forth (which amounts to copying), and therefore you've implied that people may do so.

    Sure, you own copyright, and blatant plagiarism is something that clearly is wrong. But I see nothing wrong with taking an article that you published on the web and reproducing it, as long as it is taken in context and is clearly attributed (and it is made obvious that the copy isn't the original, but proper attribution would do this and therefore suffice).

    Of course, this is republication and so the issue is not so clear and obviously subjective. That's just my opinion.

  4. Who DOES have permission to copy your site? by allism · · Score: 3, Insightful

    Do I have permission to copy the content of your site to my browser history directory, and if so, how long do I have permission to keep it? Can I show a copy of an html document that is stored in my browser history to my mother? What about my neighbor? Or the dude in another country I happen to be chatting with online?

    IANAL blah blah blah, but once you open your files up to being downloaded and stored by a browser, you've pretty much given up the right to tell people they can't be re-distributed--I would think the best you could hope for is that people would re-distribute them, in whole, the way you originally released them.

  5. Re:"The Wayback Machine" by Disevidence · · Score: 4, Insightful

    I think the question is not about its being publicly available, but rather about it archiving web pages that were taken down at later dates for various reasons.

    It's legally grey, and all it really takes is for some paranoid person to sue, and then the fireworks start.

    IANAL.

    --
    Think nothing is impossible? Try slamming a revolving door.
  6. I like it but... by rknop · · Score: 4, Insightful

    When I first discovered it, it was a lot of fun. Much nostalgia; it was fun seeing earlier versions of my webpages. Some go back quite a number of years.

    On the other hand, I was horrified when I realized that there was full archiving of www.dramex.org. If you visit that site, you will see that there are a large number of scripts (as in plays), many of which have restrictions on use. Over the years, we've had people request that scripts be removed from the site; of course, we did so. However, they weren't necessarily removed from the archive, and an archive keeps them forever. Specifically with the wayback machine, I was able to submit stuff that removed the specific directories I was worried about (they don't archive the scripts from www.dramex.org, just the "front page" stuff which is all part of the fun), and keep them from doing it again.

    I like the idea of archives; it preserves history. The web is a transient medium, but not entirely. Yes, much of the content is dynamic and should only be dynamic. Some of it, though, is like the front page of a newspaper. Each day, what's on "today's front page" is different-- but there is value and use in seeing what was on the front page in any day in history.

    But sometimes you need to delete something and make sure it really is no longer available. When you don't completely control your site (i.e. somebody else archives it, rather than just mirrors it), that becomes impossible.

    (Incremental backups can have a similar issue. If you only back up files which are "newer than the last backup", your backup doesn't have the information about files which have been *deleted* since the last backup. When you restore, you might find some files there you thought shouldn't exist any more.)
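The incremental-backup pitfall described in the parenthetical above can be sketched in a few lines. This is only an illustration under stated assumptions: files are modelled as a hypothetical dict mapping a path to an `(mtime, data)` pair, and the function names (`incremental_backup`, `restore`, `restore_with_manifest`) are invented for the example rather than taken from any real backup tool. The usual remedy, shown last, is to record a manifest of the paths that existed at backup time and drop everything else on restore.

```python
def incremental_backup(live_files, last_backup_time):
    """Copy only files modified after last_backup_time.

    Nothing is recorded about files that were deleted, which is the gap.
    """
    return {
        path: (mtime, data)
        for path, (mtime, data) in live_files.items()
        if mtime > last_backup_time
    }


def restore(full_backup, increments):
    """Naive restore: start from the full backup, then layer increments on top."""
    restored = dict(full_backup)
    for increment in increments:
        restored.update(increment)
    return restored


def restore_with_manifest(full_backup, increments, manifest):
    """Same as restore(), but keep only paths listed in the latest manifest."""
    restored = restore(full_backup, increments)
    return {path: entry for path, entry in restored.items() if path in manifest}


# Hypothetical data: a full backup at time 1, then script.txt is deleted
# and index.html is edited at time 2.
full_backup = {"index.html": (1, "v1"), "script.txt": (1, "a play")}
live_files = {"index.html": (2, "v2")}

increment = incremental_backup(live_files, last_backup_time=1)
manifest = set(live_files)  # paths that existed when the increment was taken

naive = restore(full_backup, [increment])
careful = restore_with_manifest(full_backup, [increment], manifest)
```

Here `naive` still contains `script.txt` even though it was deleted from the live site, while `careful` does not.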

    (Dramex.org has changed so that it's not straightforward to get directly to the scripts any more. META tags tell the search engines to leave the actual scripts alone, and you can only get the text itself via CGI. Yes, it's easy to subvert if you put your mind to it, but at least you do have to put your mind to it, and automated search engines or archivers won't. 90% of the security for 1% of the effort.)
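For readers unfamiliar with the META-tag approach mentioned above, here is a rough sketch of how a well-behaved crawler might honor it. The comment does not say which tags dramex.org actually uses, so the `robots` meta tag and the `noindex`, `noarchive`, and `none` directives below are assumptions based on common convention, and `RobotsMetaParser` and `may_archive` are hypothetical names. As the comment notes, this only stops crawlers that choose to check.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives from any <meta name="robots" content="..."> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() != "robots":
            return
        content = attr.get("content") or ""
        self.directives.update(
            part.strip().lower() for part in content.split(",") if part.strip()
        )


def may_archive(html):
    """Return False if the page asks robots not to index or archive it."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return not ({"noindex", "noarchive", "none"} & parser.directives)


protected = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
front_page = "<html><head><title>Front page</title></head></html>"
```

With these two sample pages, `may_archive(protected)` is `False` and `may_archive(front_page)` is `True`.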

    -Rob

  7. As a webmaster of various sites... by schon · · Score: 5, Insightful

    As a webmaster of various sites, I have no problem with archives.. if I didn't want people to see my stuff, I wouldn't have put it on the internet in the first place.

    where did they get such old copies of my websites, and who gave them permission to make those copies?

    They probably got the copies the same way everybody else did - by surfing. You (implicitly) gave them permission to cache your sites by not including an appropriate entry in your robots.txt.

    The way I see it, archives are much like SPAM; I never opted in, why should it be my responsibility to opt out?

    Archives are nothing like spam. Spam is primarily harassment. These guys aren't harassing you. They did ask your permission (by way of checking your robots.txt). If you've since changed your mind, it's your responsibility to notify them.

    Google caches material too - do you consider them to be spam as well?

    Archive sites provide a valuable resource to the rest of the 'net. If you don't like it, put an appropriate entry in your robots.txt file, and be done with it.
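A minimal sketch of what such a robots.txt entry could look like, and how a crawler that honors it would interpret it. Two assumptions are baked in: the user-agent token `ia_archiver` is used as the archive crawler's name (check the archive's own documentation for the token it currently honors), and the `/private/` rule is purely illustrative. The rules are embedded as a string and checked with Python's standard `urllib.robotparser`, so nothing is fetched from the network.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: shut out the archive crawler entirely, and keep
# every other robot out of one directory only.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow: /private/
"""


def allowed(agent, url, robots_txt=ROBOTS_TXT):
    """Return True if a robot calling itself `agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


archive_ok = allowed("ia_archiver", "http://example.com/index.html")
search_ok = allowed("Googlebot", "http://example.com/index.html")
search_private_ok = allowed("Googlebot", "http://example.com/private/notes.html")
```

Under these rules `archive_ok` is `False`, `search_ok` is `True`, and `search_private_ok` is `False`. Note that this only expresses a request: it works because the crawler chooses to check the file, which is the point being argued in the thread.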

  8. Preserving information is important. by Chiasmus_ · · Score: 5, Insightful

    I doubt that I'm alone in my belief that it is always tragic when any piece of information--no matter how trivial--is lost forever.

    If a person has offered that information for free at any point, to the extent that an automated script could access it, then I believe that information can be safely considered public domain. I doubt that there's any mechanism by which Richard M. Stallman could lose his mind and "rein in" all copies of GNU, or by which Stephen King could recall all his novels and refund the purchase price; once something is offered to the public, it no longer belongs exclusively to the publisher.

    In my opinion, the value of archives in the future immeasurably outweighs occasional inconveniences of having information stick around longer than the author would have wished.

    --
    "Beware he who would deny you access to information, for in his heart he deems himself your master."
  9. Archives need to be made by Waffle+Iron · · Score: 4, Insightful
    If the courts determine that it is technically illegal to make archives of electronic content, then the copyright laws should be changed to explicitly allow archiving. Otherwise, we could eventually lose track of history. The only written record of large portions of our civilization would be relegated to a few rusting web server hard drives buried in landfills.

    If you read 1984, you might remember that the government tightly controlled all old copies of documents so that they could manipulate history as they wished. We might get into a similar situation by accident if we don't allow independent archives of electronic information.

    With traditional media, you publish something on paper, but you don't get to control who puts the paper copies in which archives. That has served us well for keeping track of history, and an equivalent system needs to be maintained for electronic content.

  10. A Real World Example/Question by GeekLife.com · · Score: 2, Insightful

    Do libraries have to get permission to save and allow browsing of copies of newspapers (both physical and microfiche)?

  11. And what to do when info must die? by Nf1nk · · Score: 2, Insightful

    For the most part I don't have a problem with them archiving my sites (after all they can show me what a site used to look like faster than digging out my back ups), but recently one of my customers told me to remove all traces of a product from their site (something about nasty litigation). I pulled the info off our servers quickly, but three hours later I got a nasty phone call from the customer saying he could still see the product on the site. Seems it was hung up in some proxy server between here and there.

    Back to the point: how do you deal with an archive when you need to get rid of information that is a liability to you now? Maybe we are better off without them in some cases.

    --
    I used to have a cool sig, back when I cared
    1. Re:And what to do when info must die? by Anonymous Coward · · Score: 1, Insightful


      Let me paraphrase:

      "Archives make it harder to sweep nasty secrets under the rug"

      And that is bad how?

  12. Re:Opting out -- of publicly available HTTP??? by KillerCow · · Score: 4, Insightful

    When you publish something on the web, it is publicly available via HTTP. End of story.

    I don't think that that is a good enough standard. When a television show is broadcast, or when a book is published, it is publicly available -- but we don't think that the publisher loses their right to copyright protection in these cases. Publishing on the web is similar. The creator wants people to see his/her creation, but does not automatically give visitors the right to archive and retransmit the works.

  13. Re:"The Wayback Machine" by martyn+s · · Score: 4, Insightful

    So I suppose libraries should just stop carrying books because the author doesn't like what he wrote anymore? I mean, what the fuck?

  14. best thing since sliced bread by John+Sokol · · Score: 2, Insightful

    There is nothing worse than revisionist history. I can't stand seeing a site post something and then, a bit later, have it vanish forever or be altered to remove the very thing I was interested in.
    There are several GPL'ed Open Source software packages that I have copies of that have vanished, along with all references to them, and are no longer available on the net. Also a number of great sites have come and gone for lack of either cash or time. I think if someone open sources something it should stay that way.

    Also if it's open on the net for public viewing, then it should be fair game. Especially if the original author is credited and it is in the original context, like the Wayback Machine is. I know there are always special cases where something was put up that the webmaster was not entitled to like a copyrighted book or something, but for most stuff this is invaluable and a great service to humanity.

    Also think of all those users whose web site was lost without backup. Now they can get that data back.

    The Wayback Machine is one of the few web services I'd be willing to pay for.

    John

    --
    I am always doing that which I can not do, in order that I may learn how to do it. - Pablo Picasso
  15. Purist? Pure what? by American+AC+in+Paris · · Score: 5, Insightful
    Perhaps I'm too much of a purist, but I've always seen the internet as an ever-changing medium, not a permanent one. Archives have bothered me ever since the fledgling days of DejaNews.

    I'd say it makes you more of a control freak than a purist, personally.

    Seriously, how did you ever get it into your head that a medium that serves documents to the general public on demand would be somehow exempt from archiving?

    Would it bother you if John Q. Savant could recite the contents of your web pages from memory ten years after you'd taken them down?

    Would it bother you to learn that stock prices, perhaps the most "ever-changing" thing out there, are permanently archived by a variety of services?

    Or are you just jittery at the thought that your spouse/boss/Friendly Neighborhood Representative of The Man/kids may be able to someday look at the shite you plastered all over the web in your younger days? ("Ech, that stupid Netscape 2 animated title hack--honey, you actually -did- that?")

    --

    Obliteracy: Words with explosions

  16. You have given permission by MrResistor · · Score: 4, Insightful

    By the very act of posting your site on the web you have given permission to make copies of it. Otherwise, how would anyone view it? And if no one is supposed to view it, why have you published it in a publicly accessible space?

    If I went to your website 2 years ago and never closed or refreshed that browser window, would I now be violating your copyright? What if I saved the page so I could view it later offline? What if I never erased that file, would that mean that I'm violating your copyright? I have several floppies of web sites I saved at school for viewing at home from the days when I was stuck on a crappy dial-up service. Does that make me a pirate? What about all the copies of sites held in my browsers cache?

    Don't get me wrong, I understand where the sentiment is coming from, even if I disagree with it. I'm just trying to point out how incongruous it is with the basic nature of computers and the internet and how they work.

    These questions aside, though, I have to come down in favor of the historians. People here are always whining about old movies/books/music being lost because their owners refuse to let them go, even if they aren't using them, why should the web suffer the same fate? The rate of destruction is far faster on the internet, and since it isn't a physical media, the information has to be actively archived if it is to be preserved.

    --
    Under capitalism man exploits man. Under communism it's the other way around.
  17. A great tool for future historians /archeologists by msoldo · · Score: 2, Insightful

    Could you imagine if there was the equivalent of the wayback machine for everything published in 5th century Athens? We'd know an incredible amount more about where the human race had already been intellectually and where it's going.

    I publish several websites and I don't mind this a bit - If someone wants to host my content for free and offer my customers a way to get at older versions of the site for whatever reason (maybe they want to know what prices were 2 years ago), then they've done me a service. Cool.

  18. Excuse me? by innocent_white_lamb · · Score: 2, Insightful

    Er, you posted content on the WWW for world+dog to read. After all, that's the purpose of posting said content. And now you're unhappy because folks are reading it?

    If you don't want folks reading your stuff, for heaven's sake don't post it on the web!

    Seems obvious to me, somehow...

    --
    If you're a zombie and you know it, bite your friend!
  19. Re:"The Wayback Machine" by Rick+the+Red · · Score: 5, Insightful
    No, the issue is more akin to a library carrying newspapers and magazines for years, and their publishers suddenly telling the libraries "those copies are out of date, stop letting people read them." Why? If you didn't want anyone to read it, why did you put it out on the web?

    Are you ashamed of what you did back then, when you were young and foolish? Grow up -- we're all ashamed of what we did when we were young and foolish, and years from now you'll be ashamed of what you're doing today. Get over it.

    Personally, I think archives are great. Whenever I design an application I always ask about archiving, because inevitably they're gonna want it and it's easier to design in from the start. Oh, you want to know what your top 10 customers ordered last Christmas? Now you tell me! Geeze, we flushed that data last February, 'cause you said once the credit card cleared you didn't care to pay for the storage. But I digress.

    Someday your next client will want examples of your previous work, then you'll go crawling on your hands and knees to the Wayback Machine, begging them to show you what your pages looked like. And they'll honor your robots.txt file and tell you to get lost.

    --
    If all this should have a reason, we would be the last to know.
  20. Re:Opting out -- of publicly available HTTP??? by krypto246 · · Score: 4, Insightful

    People are just pissed about this archiving because they like the internet to be a 100% responsibility-free zone: no matter what you say or do, you can always change, edit or delete it later. How about standing behind your comments and opinions, instead of just deleting them when they can be held against you? Yes, use nicknames and aliases, but don't expect the things you put out there to be temporary. You put something out onto the internet, it stays there, and it can be found later. That's the power of the net, and the price you pay for it.

  21. A serious question? If so, it's OT by Anonymous Coward · · Score: 1, Insightful

    The author writes: and who gave them permission to make those copies?

    Honestly, is this a serious question to pose to /.? I don't know that /. is in any way connected to this site. So what's going on here? It sounds like the author is trying to rally public outrage by claiming to be a victim.

    Personally, I found the writer just a little bit insulting and selfish. (No offense, but that's how I read it.) To the author, I say: if you have copyright disputes with the site, contact the maintainers. Copyright problems happen all the time, and are handled gracefully and quickly. You don't need my help or /.'s involvement in this.

    I suspect the only injury was to the writer's pride. Had there been any commercial loss from the infringement, he would not have used this "wounded bird" ruse in his story.

    On the same topic, you might consider a more enlightened view, and place your old sites under the GNU Free Documentation License. Details are available at: http://www.fsf.org/copyleft/fdl.html

    So, to the author's (seemingly) rhetorical question, I reply: if you are serious, your question is completely off topic.

  22. The purpose of copyright... by kcbrown · · Score: 3, Insightful
    The Congress shall have Power To promote the Progress of Science and useful Arts, by securing for limited Times to Authors and Inventors the exclusive Right to their respective Writings and Discoveries
    -- United States Constitution

    The purpose of copyright is to promote progress, to entice authors and inventors to release their works and discoveries to the public.

    But that is not an end unto itself. The true end is the benefit to society that the release of such works brings.

    Now, remember that the whole incentive here, the entire reason for granting the monopoly privilege of copyright, is to allow the originators of works to make money from their works, which in turn (theoretically) gives them incentives to release their works to the public.

    When you publish something on the web, you're publishing your works for free, unless you go to the extra trouble of implementing some kind of access control. The Wayback Machine won't work on a site that has access control, so all it ends up archiving is stuff that was published for free public consumption.

    So the real question is: if a work has already been released for free to the general public, how would letting authors restrict the republication of that work after the fact bring greater benefit to society than not letting the author impose such restrictions?

    My opinion is that it is much more beneficial to society as a whole if the release of a work for free public consumption automatically implied that members of the public have the right to redistribute that work. So if an author doesn't want people in the general public to be able to redistribute his work, he has to control who receives the work and who doesn't. Certainly requiring payment for the work in question is sufficient to meet the requirement of controlling access. But whatever method the author chooses, it should be one that makes it clear that the work in question is not being released for free to the public.

    --
    Use 'slashdot stuff' in the subject line in any email you send me if you want to get past the spam filter.
  23. Re:"The Wayback Machine" by pjrc · · Score: 3, Insightful
    Are you ashamed of what you did back then, when you were young and foolish?

    I am. Well, sorta anyway. My site has all of the pages that have ever appeared, all the way back to 1995. For example, this circuit board schematic page got a lot of hits in 1995. For years, I got emails from people who attempted to build it... a few were successes but most were failures. So, in 1997 I redesigned the board/schematic so that it would be much easier to build and troubleshoot, and then I made another new rev in 1999 (because the flash rom chip became obsolete).

    Based on lots of user feedback, I redesigned it yet again in 2001, mainly to increase the speed, add more memory to be C compiler friendly, and I added the most user-requested feature, a port to plug in a standard LCD.

    Today, those old pages (well, still need to update the '99 ones) have a message at the top of the page that tells the visitor they're viewing obsolete material and strongly suggests they follow a link to the new version of the circuit board, which is easier to build (added in 1997), uses parts that are currently available on the market (added in 1999), and has more features (added in 2001).

    An archive of the original 1995 page, even archived in 1996, isn't going to warn the poor user about the usability improvements added in 1997, the part that became obsolete in 1999, and the nice new features that were added in 2001. At the very least, it'd be proper for archive.org to link to the current version of the page (if it's on-line)... but even that would be difficult since the site moved from a university to its permanent domain name in 1999 (the old site kept a redirect for a couple years, but even that is gone now).

    So, while it sucks that someone might find that old material and suffer through all the problems that have been corrected and miss out on the improvements of the last several years, it doesn't suck enough that I'd hire a lawyer, or even bother to tell them to exclude my material.

    But I can understand how a large company would not want its old products displayed with the then-current literature in a way that might confuse potential customers.

  24. No less stringent than the GPL by blueskies · · Score: 2, Insightful
    The copies that they have archived in their databases are individual copies served from the original web requests, so they have the right to keep them. They became their copy when they were originally downloaded.


    You have the right to something once you download it?

    If I copyright my content, other people are not allowed to distribute it without my consent. There is no way around this. I don't have to add extra disclaimers, just a copyright notice. How can there be any argument about this?

    Ok, someone GPLs some software they wrote and puts it on their website. If you download a compiled version of the software, you can't redistribute the compiled executable without making the source available. Why? Because the copyright owner (via the GPL) only gives you permission to redistribute if you also make the source available. The owner can do this because the GPL is backed by copyright laws, just like copyrighted web content. Notice I said owner, because the law grants special privileges to people that create content and copyright it. There is no implied social contract that says the content is up for grabs. And there is also no reason fair use even comes close to applying if you are talking about a large quantity of content.

    I do think the archive provides a useful service, but I think they are on shaky legal ground.