The Wayback Machine, Friend or Foe?
ShaunC asks: "As the webmaster of numerous sites, I'm curious how others feel about the Wayback Machine. What particularly interests me is the fact that the Machine is a relatively new animal, yet it contains snapshots of my sites dating back to 1998. I can't help but wonder: where did they get such old copies of my websites, and who gave them permission to make those copies? I certainly didn't provide either. Perhaps I'm too much of a purist, but I've always seen the internet as an ever-changing medium, not a permanent one. Archives have bothered me ever since the fledgling days of DejaNews."
This site last made an appearance on Slashdot earlier this year. Internet archival sites are right smack in the crosshairs of copyright, but they are useful. Anyone who has ever used Google's cache (and there are plenty of those links on Slashdot) can attest to this. Of course, the issue that may bug many content providers is how to opt out of such services, since some see them as a copyright violation. Is it possible to balance copyright and history, or will the Wayback Machine and Google's cache find themselves in legal trouble in the future?
"The way I see it, archives are much like SPAM; I never opted in, why should it be my responsibility to opt out? I manage a number of domains and the process of refining robots.txt files and submitting myself to the Wayback Machine for removal seems to be intrusive. Worse, domains I've abandoned (which have lapsed or been re-registered by someone else) are forever archived in the Machine and I have no way to exclude them. Why should I have to deliberately remove my copyrighted material from an archive which was never granted permission to replicate that material in the first place?"
When you publish something on the web, it is publicly available via HTTP. End of story. Responsible netizens can honor the requests in "robots.txt", but they don't have to. If you want something more controlled, create a VPN or intranet or some other kind of non-public data server.
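To make the opt-out mechanics concrete: robots.txt is only a request, and it works solely for crawlers that choose to read it. The sketch below uses Python's standard `urllib.robotparser` to show how a well-behaved crawler would interpret a file that turns away one archiving robot while admitting everyone else. The user-agent token `ia_archiver` is an assumption here (it is the name the Internet Archive's crawler has been associated with), and the robots.txt contents, the URL, and the `may_fetch` helper are hypothetical examples.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: ask the archive crawler (assumed to identify itself
# as "ia_archiver") to skip the whole site, while letting every other robot in.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow:
"""


def may_fetch(agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if a crawler that honors robots.txt may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse text directly, no network
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    url = "http://www.example.com/index.html"
    for agent in ("ia_archiver", "Googlebot"):
        print(agent, may_fetch(agent, url))  # ia_archiver False, Googlebot True
```

Nothing in this check is enforced by the server: a crawler that never calls anything like `can_fetch` simply downloads the page, which is the point being made about robots.txt being voluntary.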
Your argument is similar to that of newspaper publishers who didn't like "deep linking." What they couldn't (or didn't want to) understand is that the nature of an HTTP web server is quite simple. A client asks for a file, the server gives it back. Using that protocol implies that you are OK with that. If you're not, I suggest you look into different technologies instead of complaining about a lack of control in a medium that was never intended to provide it.
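The "client asks, server gives it back" exchange can be sketched with nothing but Python's standard library. This is an illustrative toy, not how any real archive or newspaper site works: the page content, `PageHandler`, and `fetch_once` are hypothetical names. The handler answers every GET with the same page, whatever path is requested, which mirrors the deep-linking argument. Plain HTTP carries no notion of who is asking or why unless the server operator adds it.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

PAGE = b"<html><body>Hello, archive!</body></html>"  # hypothetical page


class PageHandler(BaseHTTPRequestHandler):
    """Answers every GET with the same page, regardless of path or requester."""

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(PAGE)))
        self.end_headers()
        self.wfile.write(PAGE)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def fetch_once(path: str = "/") -> bytes:
    """Start a throwaway server on a free local port, GET one path, return the body."""
    server = HTTPServer(("127.0.0.1", 0), PageHandler)  # port 0 = pick a free port
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        host, port = server.server_address
        conn = http.client.HTTPConnection(host, port, timeout=5)
        conn.request("GET", path)  # the client asks...
        body = conn.getresponse().read()  # ...and the server gives it back
        conn.close()
        return body
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(fetch_once("/news/2002/deep-link.html"))
```

A "deep" path gets exactly the same treatment as the front page; refusing it would take extra server-side logic (authentication, referrer checks, and so on) that the protocol does not supply on its own.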
Went back and looked at the site of the .com I used to work for; very nostalgic. The Wayback Machine is a good resource for people who create content on someone else's site (a.k.a. me) and then lose access to it because the company goes under. Now that the company that once owned my old content is gone, I'm able to add it to my portfolio.
Well, the wayback machine helped me in confronting some companies for raising their prices when we changed to the euro :)
Especially Domino's Pizza. They raised their prices by more than 12%. I printed out the page and got a 15% discount.
I don't mind that my site is being added to indexes that the public can use for free. I have a problem when a company uses my site to make a profit with no public benefit.
There is case law where unauthorized access to a website is a copyright violation.
I am trying to use copyright law against some of the spammers who scrape my site for email addresses, and then to go after the spam software companies for contributory infringement (let the Napster rulings serve some good).
Fight Spammers!
Some have already drawn analogies to TV broadcasts, saying hey, it was broadcast, you get to keep a copy. You can't bitch now if people still have that copy, unless you're Jack Valenti.
You can spin this however you want. Here's one valid way to think about it, though: a TV network broadcasts a show. You make a private copy on a VCR tape. Jack Valenti aside, you can watch that copy again as often as you like, and it's no big deal. However, you do not have the right to rebroadcast your copy of that show to the public without the permission of the original copyright holder. (I have my B5 tapes. I'm watching them through again now, showing them to my wife. I'm sure nobody is upset about this. But I'd be in deep doo-doo if I managed to broadcast them on a local access station, or uploaded them to a public website.)
If you are inclined to be negative about the Wayback Machine, you could view it this way. While the page existed on the original site, it was broadcast to the public. If somebody made a personal copy, they have it and will always have it, even if the site goes down. However, when the site goes down, individuals do not necessarily have the right to then "rebroadcast" (i.e. post) the content they downloaded and kept. Yet this is exactly what the Wayback Machine is doing.
Mind you, except for the issue with www.dramex.org that I noted above (and which I fixed long ago), I like the WayBack machine, and am happy that they archived the content which was implicitly copyrighted to me. I would have opted in if I had wanted to. But, of course, I didn't know about it back in 1996 to opt in.
I don't have a good answer to the questions. Just thoughts.
-Rob
As a historian and future librarian, one thing has always bothered me about the Internet. Because change is a constant, it's very difficult to keep records. It isn't like newspapers, pamphlets, books, or any other form of written record of the past five thousand years. Unless they're printed out, our writings here leave no physical evidence of their existence. Because I feel that the Internet is as significant as the printing press was five centuries ago, the prospect of having no records from its early days is frightening.
We have books from five centuries ago. Will anything here still exist in a readable form five centuries from now? Unless something is done to preserve it, I feel there will be a massive gap in history.
And this is why I do not object to web archives. They are a half step toward printed, more permanent storage media, but preferable to nothing at all.
I didn't know that the Wayback Machine went that far back. I wonder if anyone is going to go to jail for posts they made in the past....
"Only one thing is impossible for God: to find any sense in any copyright law on the planet." Mark Twain
Only 'flamers' flame!
DejaNews was my best tool to weed out resumes
Before I scheduled even a phone interview, I'd always search DejaNews for the person in question. Sometimes I'd come up with a definite hit (first and last name as well as email, plus a mention of the local area or some work that was on their resume), and I'd be able to see what kind of person I was really dealing with. That's when I started looking at what I'd posted.
"We are not tolerant people. We prefer drastically effective solutions"
In my opinion, when you post publicly on the web, you are essentially saying "This is public information, it may be copyrighted, but it is public". Then it's a question of whether or not the Wayback Machine is considered "fair use", and I believe it is. If it is, then you can't stop them. End of discussion, right?
Now, if you don't want this stuff to be publicly accessible on the web, there is now a precedent (set by Google) for SSL sites. There is also the robots.txt convention you mentioned.
The only real issue I see with the archival sites is "How do they know that domain ownership has changed hands?" If a porn site comes along and buys the domain after you're done with it, how does the Wayback Machine protect you from consequential damages that might arise?
I don't know... But I do know that the web and the internet in general was never intended for privacy or copyright, as such, and maybe we just need a new protocol?
Dave
Yesterday I used the Wayback Machine for one of the lawyers at the law firm I work at to prove that a company at one point had an office in a certain location. The company in question was trying to duck out of a contracted agreement by saying they were not the people who signed the contract.
The Wayback Machine proved that they indeed knew of, approved, and granted authorization to this specific office, and the other people had a valid contract. In this specific case, the Wayback Machine prevented an apparently scumbag company from trying to screw some apparently good people over.
Kickstart
If you've ever heard of the Library of Alexandria, it was supposedly the most impressive knowledge base the world had ever assembled. Some crazy guy came by and burnt it to the ground, setting the entire industrialized planet back hundreds, perhaps thousands, of years. We are now in the process of surpassing this great library, and are making it even easier for people to have access to knowledge. That knowledge may be porn, the morning news, or sports scores; it may even be how to construct a nuclear bomb. Nevertheless it is knowledge, and EVERY person who is alive has the God (or any other higher power) given right to knowledge, despite what any government agency or copyright may say. 21st century libraries such as the Wayback Machine are providing the tools necessary for researchers to go "back to the future." This is a great service to mankind, and its overall importance should not be outweighed by greedy and/or overparanoid privacy rights activists. If you do not wish to be known, please do not post any information on the web; move to the jungles of Africa and step away from a time and place known as the PRESENT.
I don't know what kind of "purist" this person thinks they are. DejaNews (now Google) is one of the *best* places to look for info that's relevant but not this week's headline. We might as well burn all the libraries to the ground, since they contain books with embarrassing misprints or factual errors.
It might not be easy to get your site out of the Wayback Machine, but it doesn't sound impossible either. Consider the alternative: would you rather live in a world where the past can be "updated" as needed, as the (purportedly reputable) New York Times did to the web version of a Sep. 9 story warning about Osama bin Laden? Right after September 11 they replaced it with a puff piece. (The full details appeared on BuzzFlash. Warning: they link to the NYT's registration-required pages, and I think the content may have been re-scrubbed since this appeared on BuzzFlash.)
If there's no record of content, how am I supposed to provide a bibliography or references for "something I saw on the web somewhere?"
The idea of employers searching DejaNews before an interview kind of freaked me out when I started teaching in 1998. I'd been running a large fan web site devoted to one of my favorite bands, and being heavily into the band, I posted a lot in their newsgroup and participated in more than one flame war. Of course, I was in college and in my late teens and very early 20s, but it's all archived on DejaNews now, with no way to remove it. I really doubt any public school districts are going to wise up to this (or even care, considering the national teacher shortage), but I wouldn't be surprised if it came back to haunt me in some way some day. As a previous poster mentioned, such is the burden of free speech.
An interesting thing did happen to me at the beginning of this school year. I teach high school computer classes, and I was talking about managing that fan web site when one of my students (a junior) opened his eyes really big and pointed at me with his jaw dropped, sort of aghast. I paused and asked him what was wrong, and he exclaimed that he downloaded and used the guitar tabs I'd written years earlier when he was in junior high. I found that kind of amusing!
I think the archiving of the internet is particularly scary when I can still find a lousy guitar tab of Pearl Jam's "Footsteps" that I did back in 1992, when I was a senior in high school piggybacking off an account at the nearby university on my parents' Apple //e, while I was still learning how to play guitar. Obviously, the internet can have a much longer shelf life than a ProDOS 5.25" floppy (excluding news sites that "expire" their articles after a limited time).
First they ignore you, then they laugh at you, then they fight you, then you win. -- Gandhi
That's all changed. They've got the kinks worked out, as best I can tell, and have begun obeying robots.txt files. They weren't so diligent about it three months ago, or I wouldn't have gotten ticked at 'em.
BTW, my submission was edited in at least one place: I don't capitalize the word "SPAM," as the capitalized version is Hormel's trademark. (Maybe my submission was combined with someone else's; hard to remember what I wrote 3 months ago.)
Everything else I'd say has already been said; I wish I'd noticed the story sooner.
Shaun
Thanks to the War on Drugs, it's easier to buy meth than it is to buy cold medicine!