The Wayback Machine, Friend or Foe?
ShaunC asks: "As the webmaster of numerous sites, I'm curious how others feel about the Wayback Machine. What particularly interests me is the fact that the Machine is a relatively new animal, yet it contains snapshots from my sites dating back to 1998. I can't help but wonder: where did they get such old copies of my websites, and who gave them permission to make those copies? I certainly didn't provide either. Perhaps I'm too much of a purist, but I've always seen the internet as an ever-changing medium, not a permanent one. Archives have bothered me ever since the fledgling days of DejaNews." This site last made an appearance on Slashdot, earlier this year. Internet archival sites are right smack in the crosshairs
of copyright, but they are useful. Anyone who has ever used Google's cache (and there are plenty of those links on Slashdot) can attest to this. Of course, the issue that may bug many content providers is how to opt-out of such services, since some see it as a copyright violation. Is it possible to balance the issues of copyright and history, or will these two Internet resources find themselves in legal trouble in the future?
"The way I see it, archives are much like SPAM; I never opted in, why should it be my responsibility to opt out? I manage a number of domains and the process of refining robots.txt files and submitting myself to the Wayback Machine for removal seems to be intrusive. Worse, domains I've abandoned (which have lapsed or been re-registered by someone else) are forever archived in the Machine and I have no way to exclude them. Why should I have to deliberately remove my copyrighted material from an archive which was never granted permission to replicate that material in the first place?"
Isn't this exactly the point of robots.txt? Google won't cache content it doesn't spider, and it won't spider content forbidden by your robots.txt. Does the WayBack Machine obey the robots rules?
Slashdot from 1997.
In college, really poor, need a flatscreen.
"The Wayback Machine" has been a pet project for a long time, and we're only now seeing results. I know for a fact that they have pages back at least as far as 1996, and it's a damn shame they don't have anything that much earlier...
And yes, it obeys the Robot Exclusion Principle.
"Ask Google" strikes again; I would hope that you could find all of this information by searching, or reading an "About" page, or something. Fortunately, these abortions to journalism don't appear on the Front Page very often.
pb Reply or e-mail; don't vaguely moderate.
I had recently placed a restricted robots.txt file on my site and when trying to access any of the past revisions, I get a message saying that the owner has restricted access to the site via robots.txt. They seem to have that aspect under control.
It's a scary thought that things kids are saying on message boards when they're teenagers are going to be back to haunt them when they apply for jobs in their mid 40s...
I mean, if everything I posted on BBSes in the 1980s were still attributable to me... yikes.
Remember kids. Use a nickname, and change it frequently if you ever want to run for any kind of office.
When you publish something on the web, it is publicly available via HTTP. End of story. Responsible netizens can observe the requests of "robots.txt" but they don't have to. If you want something more controlled, create a VPN or intranet or some other kind of non-public data server.
Your argument is similar to that of newspaper publishers who didn't like "deep linking." What they couldn't (or didn't want to) understand is that the nature of an HTTP web server is quite simple. A client asks for a file, the server gives it back. Using that protocol implies that you are OK with that. If you're not, I suggest you look into different technologies, instead of complaining about lack of control, in a medium that was never intended to provide it.
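To make the "client asks for a file, the server gives it back" point concrete, here is a rough Python sketch. The page body, the handler name and the throwaway local server are all made up for illustration; it only shows that a plain HTTP server answers whoever asks, and says nothing about how any real archiver works.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

PAGE = b"<html><body>Hello, public web.</body></html>"  # made-up content


class PublicPageHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: serves the same page to any client that asks."""

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(PAGE)))
        self.end_headers()
        self.wfile.write(PAGE)

    def log_message(self, fmt, *args):
        pass  # keep the example quiet


def fetch_once():
    """Start a local server on a free port, GET '/', return (status, body)."""
    server = HTTPServer(("127.0.0.1", 0), PublicPageHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        url = "http://127.0.0.1:%d/" % server.server_port
        # Bypass any proxy settings so the request really goes to localhost.
        opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
        with opener.open(url, timeout=5) as response:
            return response.status, response.read()
    finally:
        server.shutdown()
        server.server_close()
```

Note that the server never learns whether the client is a browser, a search spider or an archiver. Anything beyond that (robots.txt, logins, a VPN) has to be layered on top.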
Went back and looked at the site for the .com I used to work for, very nostalgic. The wayback machine is a good resource for people who create content on someone's site (a.k.a. me), and then lose access to it because the company goes under. Now I'm able to add my old content to my portfolio, now that the company who once owned it is gone.
.....
who gave them permission to make those copies?
The way I see it, you implicitly give people some limited form of permission by putting it up on the internet freely available to download in the first place. You put it up for people to download, print out and so forth (which amounts to copying), and therefore you've implied that people may do so.
Sure, you own the copyright, and blatant plagiarism is something that clearly is wrong. But I see nothing wrong with taking an article that you published on the web and reproducing it, as long as it is taken in context and is clearly attributed (and it is made obvious that the copy isn't the original, but proper attribution would do this and therefore suffice).
Of course, this is republication and so the issue is not so clear and obviously subjective. That's just my opinion.
Of course in practice you have to pursue this and ask them to remove it.
If you really object I suggest a list of every site you have or have had and dates with a request to remove everything. Then you only need to notify them when you put up a new site that should also be excluded. That would not be such a nuisance, would it?
That said I think they are providing a service that is interesting so unless you are harmed by it, why object?
I am interested in knowing how they had such old versions of your site though. Do search engines keep archives?
www.cisco.com, 1 page (1996)
www.microsoft.com, 5 pages (1996)
www.ibm.com, 7 pages (1996)
This is in the FAQ.
As someone who makes lots of free sellable and unsellable (http://www.furinkan.net/fanfic/) content, I think The Wayback Machine is an invaluable resource. I can look back and see how big a dork I was and still am. I've also found stuff of mine that I've lost over time, amazed that anyone ever bothered to hold on to it.
Sherman: Mr. Peabody, I want to go back in time!
.... Damn it, boy, fire up the Wayback machine. And fetch me my chew toy.
Mr. Peabody: Be quiet, Sherman. This new Wayback Machine is now accessible via a browser. Be happy with that.
Sherman: But I wanted to go back in time and watch Cleopatra taking one of those milk baths again.
Mr. Peabody:
52 Weeks, 52 Religions with John Hummel
Do I have permission to copy the content of your site to my browser history directory, and if so, how long do I have permission to keep it? Can I show a copy of an html document that is stored in my browser history to my mother? What about my neighbor? Or the dude in another country I happen to be chatting with online?
IANAL blah blah blah, but once you open your files up to being downloaded and stored by a browser, you've pretty much given up the right to tell people they can't be re-distributed--I would think the best you could hope for is that people would re-distribute them, in whole, the way you originally released them.
Denver Isuzu Suzuki
When I first discovered it, it was a lot of fun. Much nostalgia; it was fun seeing earlier versions of my webpages. Some go back quite a number of years.
On the other hand, I was horrified when I realized that there was full archiving of www.dramex.org. If you visit that site, you will see that there are a large number of scripts (as in plays), many of which have restrictions on use. Over the years, we've had people request that scripts be removed from the site; of course, we did so. However, they weren't necessarily removed from the archive, and an archive keeps them forever. Specifically with the wayback machine, I was able to submit stuff that removed the specific directories I was worried about (they don't archive the scripts from www.dramex.org, just the "front page" stuff which is all part of the fun), and keep them from doing it again.
I like the idea of archives; it preserves history. The web is a transient medium, but not entirely. Yes, much of the content is dynamic and should only be dynamic. Some of it, though, is like the front page of a newspaper. Each day, what's on "today's front page" is different-- but there is value and use in seeing what was on the front page in any day in history.
But sometimes you need to delete something and make sure it really is no longer available. When you don't completely control your site (i.e. somebody else archives it, rather than just mirrors it), that becomes impossible.
(Incremental backups can have a similar issue. If you only back up files which are "newer than the last backup", your backup doesn't have the information about files which have been *deleted* since the last backup. When you restore, you might find some files there you thought shouldn't exist any more.)
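A small Python sketch of that incremental-backup point, for anyone who wants to see it spelled out. The function names and the in-memory "snapshots" (dicts of path to modification time or contents) are invented for illustration; a real backup tool would walk the filesystem.

```python
def incremental_backup(previous, current):
    """Compare two snapshots mapping path -> modification time.

    Returns (changed, deleted): paths to copy into the backup, and paths
    that must be recorded as removed so a restore can drop them too.
    """
    changed = sorted(
        path for path, mtime in current.items()
        if path not in previous or mtime > previous[path]
    )
    deleted = sorted(path for path in previous if path not in current)
    return changed, deleted


def restore(full, increments):
    """Replay increments on top of a full backup, honouring deletions.

    `full` maps path -> contents; each increment is a pair
    (changed_files, deleted_paths).
    """
    files = dict(full)
    for changed_files, deleted_paths in increments:
        files.update(changed_files)
        for path in deleted_paths:
            files.pop(path, None)
    return files
```

A "newer than the last backup" scheme only ever produces the `changed` list. Without the `deleted` list, `restore` would bring back files that were removed on purpose, which is the same problem as an archive that never hears about a takedown.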
(Dramex.org has changed so that it's not straightforward to get directly to the scripts any more. META tags tell the search engines to leave the actual scripts alone, and you can only get the text itself via CGI. Yes, it's easy to subvert if you put your mind to it, but at least you do have to put your mind to it, and automated search engines or archivers won't. 90% of the security for 1% of the effort.)
-Rob
What's the problem?
If you do something illegal on your website, you won't be held responsible more than once just because the data persists on the Wayback machine. If you remove the offensive material from your site, that's all you can do. The Wayback machine can deal with their own lawsuit threats. And I'm sure they'll remove material if you are the site owner and ask nicely.
As far as outdated information, anyone reading pages on the wayback machine and expecting them to be current would have to be crazy. It's an archive after all.
It's easy to opt out. Google provides instructions in their webmaster FAQ, which points out "There is a standard for robot exclusion at http://www.robotstxt.org/wc/norobots.html."
As a webmaster of various sites, I have no problem with archives.. if I didn't want people to see my stuff, I wouldn't have put it on the internet in the first place.
where did they get such old copies of my websites, and who gave them permission to make those copies?
They probably got the copies the same way everybody else did - by surfing. You (implicitly) gave them permission to cache your sites by not including an appropriate entry in your robots.txt.
The way I see it, archives are much like SPAM; I never opted in, why should it be my responsibility to opt out?
Archives are nothing like spam. Spam is primarily harassment. These guys aren't harassing you. They did ask your permission (by way of checking your robots.txt). If you've since changed your mind, it's your responsibility to notify them.
Google caches material too - do you consider them to be spam as well?
Archive sites provide a valuable resource to the rest of the 'net. If you don't like it, put an appropriate entry in your robots.txt file, and be done with it.
The submitter states that he never gave the Internet Archive permission to replicate his work. He is wrong.
By placing material on the web, one is implicitly granting permission for it to be read. If I put a poster up in my window, I lose the right to complain if someone walking by on the street reads it.
Equally, I lose the right to complain if someone walks by and takes a photograph of the front of my house, including the poster. The fact that someone might then be able to read the poster ten years from now is irrelevant.
If the Internet Archive were required to seek permission before archiving freely and publicly available material, then the same argument would require libraries to seek permission prior to archiving (free) newspapers.
Timeshifting is fair use, and it applies to web pages just as well as TV signals.
Tarsnap: Online backups for the truly paranoid
I would never have visited countless sites I regularly surf to. Google has definitely been a major gateway to the internet for me.
I think making an issue of the caching is a moot point, as about 99% of the time I go to the website for the content, since the source is always better than the cache. I use the cache only in cases when the content has disappeared or in some cases when the website itself is gone.
This is a valuable service Google is providing-- and webmasters get it for free.
Do not spread "09 F9 11 02 9D 74 E3 5B D8 41 56 C5 63 56 88 C0" over the internet, thank you.
I doubt that I'm alone in my belief that it is always tragic when any piece of information--no matter how trivial--is lost forever.
If a person has offered that information for free at any point, to the extent that an automated script could access it, then I believe that information can be safely considered public domain. I doubt that there's any mechanism by which Richard M. Stallman could lose his mind and "rein in" all copies of GNU, or by which Stephen King could recall all his novels and refund the purchase price; once something is offered to the public, it no longer belongs exclusively to the publisher.
In my opinion, the value of archives in the future immeasurably outweighs occasional inconveniences of having information stick around longer than the author would have wished.
"Beware he who would deny you access to information, for in his heart he deems himself your master."
"I can't help but wonder: where did they get such old copies of my websites, and who gave them permission to make those copies?"
You sound like Television broadcasters when you say something like that. "We'll broadcast content over the airwaves, but you better not capture it!"
Well, let me make it simple for you: When you make something public you cannot expect to bottle it up later. That's the whole reason that the internet is in existence: Extreme redundancy so that data is never lost. The original idea was to build a data network that could survive a nuclear attack.
I don't think anybody should ever post stuff on the web without expecting it to last forever in some form or another, regardless of whether permission is granted.
"Derp de derp."
"Of course, the issue that may bug many content providers is how to opt-out of such services, since some see it as a copyright violation."
So I need to burn all my old comics? Or perhaps I don't need to ever allow anybody to look at them?
Caches aren't republishing information, they are archiving it. That's what libraries do too. Hell, they can even charge for the service if they want and still be in the moral right.
My $0.02 will always be worth more than your €0.02, so
I browsed all of your sites (even the abandoned ones) and since my browser cache is set to 782TB (and I'm still running Netscape 1.0N), your sites are still there. And my cache is publicly accessible via my webserver. Yet another way you're being violated. Ah, the risks and perils of publishing on a public network.
If you read 1984, you might remember that the government tightly controlled all old copies of documents so that they could manipulate history as they wished. We might get into a similar situation by accident if we don't allow independent archives of electronic information.
With traditional media, you publish something on paper, but you don't get to control who puts the paper copies in which archives. That has served us well for keeping track of history, and an equivalent system needs to be maintained for electronic content.
Do libraries have to get permission to save and allow browsing of copies of newspapers (both physical and microfiche)?
For the most part I don't have a problem with them archiving my sites (after all they can show me what a site used to look like faster than digging out my backups), but recently one of my customers told me to remove all traces of a product from their site (something about nasty litigation). I pulled the info off our servers quickly, but three hours later I got a nasty phone call from the customer saying he could still see the product on the site. Seems it was hung up in some proxy server between here and there.
Back to the point: how do you deal with an archive when you need to get rid of information that is a liability to you now? Maybe we are better off without them in some cases.
I used to have a cool sig, back when I cared
I was talking to this guy who works for a web hosting company, and he says a fourth of his sales calls are people calling him up cause they're pissed that their last hosting company 'lost' their site. (In reality, most of the time it's later found out that the guy deleted it himself or renamed index.html to index2.html, etc.) He says for 90% of the sites he can find a copy on the wayback machine. He'll then start to quote the website's contents to the guy on the phone and usually will have the amazed (and dumbfounded) customer signing a hosting contract by the end of the day.
Use robots.txt, stupido. It lets you prevent search engines from indexing and archiving your property. However, if you're that concerned about people copying your pages, you might try avoiding the internet.
I personally love the internet archive and google's cache.
This parent post said almost everything I was going to, but one thing that I wanted to add was that the web, if a spider is even able to get to a page, (even if it doesn't follow the robots protocol which the wayback machine does) is only seeing a public page that anyone with an internet connection can get to.
Otherwise you have bad control over your content and need to update your web server to not serve that content. If you don't want people to be able to copy your information then don't give it to them. Or only give it to them in a signed format that cannot be easily duplicated.
It's like being surprised that someone has forwarded an email that you sent them.
For such a "webMASTER" this guy doesn't seem to know a lot about the Internet; he seems more concerned with keeping his "Intellectual Property" safe than with actually understanding the way things work.
People like this ruin the concept of the Internet, the free exchange of knowledge. I hope other people on /. feel the same.
I don't mind that my site is being added to indexes that the public have use of for free. I have a problem where a company uses my site to make a profit, with no public benefit.
There is case law where unauthorized access to a website is a copyright violation.
I am trying to use copyright law against some of the spammers who scrape my site for email addresses. Then, go after the spam software companies for contributory infringement (let the napster rulings serve some good).
Fight Spammers!
According to Locke, the "natural rights" of man are life, liberty, and the ability to own property; when you enter into a society, you turn over all those rights to the State in return for whatever rights it deems fit to grant you.
Thus, no one has the right to eat, have children, work, or be sheltered, unless their government sees fit to grant those rights. Certainly, America does not acknowledge a right to be employed or to eat; in fact, it's been known to blacklist people in the hope that they'll do neither.
And no, no society I'm aware of has ever given its citizens the right to copy information indiscriminately. Personally, I would love to see a society do so, because I suspect that such a society would actually probably end up richer in technology and culture. Both sides of the argument make some sense, but only one is actually tried, and it's apparent that excessively restrictive copyright laws actually retard cultural and economic growth. But, no, as it stands, society has deemed that the exclusive right to copy a piece of work is something a government can hand out.
"Beware he who would deny you access to information, for in his heart he deems himself your master."
I understand the concerns, but I think it's a part of the net, a good part, that we have to wrap our minds around.
Especially when you mention Usenet archives, which are (ok, get ready to laugh) historically important. I'm not kidding! There is a little signal in there, it's a cultural brain dump, and that's of historic interest.
I think the rub is, if the archive presents the data exactly as you presented it (that is, it doesn't play with your content, present it in a frame or otherwise embed it as their own content), then it is a fair archive, a ghost of your site still walking the internet. There is no taking it back once you post it.
-pyrrho
Some have already drawn analogies to TV broadcasts, saying hey, it was broadcast, you get to keep a copy. You can't bitch now if people still have that copy, unless you're Jack Valenti.
You can spin this how you want. Here's one valid way to think about it though: a TV network broadcasts a show. You make a private copy on a VCR tape. Jack Valenti aside, you can watch that copy again as often as you like, and it's no big deal. However, you do *not* have the right to rebroadcast your copy of that show to the public without the permission of the original copyright holder. (I have my B5 tapes. I'm watching them through again now, showing them to my wife. I'm sure nobody is upset about this. But I'd be in deep doo-doo if I managed to broadcast them on a local access station, or uploaded them to a public website.)
If you are inclined to be negative about the Wayback Machine, you could view it this way. While the page existed on the original site, it was broadcast to the public. If somebody made a personal copy, they have it and will always have it, even if the site goes down. However, when the site goes down, individuals do not necessarily have the right to then "rebroadcast" (i.e. post) themselves the content they downloaded and kept. This, however, is what the WayBack machine is doing.
Mind you, except for the issue with www.dramex.org that I noted above (and which I fixed long ago), I like the WayBack machine, and am happy that they archived the content which was implicitly copyrighted to me. I would have opted in if I had wanted to. But, of course, I didn't know about it back in 1996 to opt in.
I don't have a good answer to the questions. Just thought.
-Rob
There is nothing worse than revisionist history. I can't stand seeing a site post something and a bit later have it vanish forever, or have it altered to remove the very thing I was interested in.
There are several GPL'ed Open Source software packages that I have copies of, that have vanished with all references to them and are no longer available on the net. Also a number of great sites that came and went for lack of either cash or time. I think if someone open sources something it should stay that way.
Also if it's open on the net for public viewing, then it should be fair game. Especially if the original author is credited and it is in the original context, like the Wayback Machine is. I know there are always special cases where something was put up that the webmaster was not entitled to like a copyrighted book or something, but for most stuff this is invaluable and a great service to humanity.
Also think of all those users whose web site was lost without backup. Now they can get that data back.
The Wayback Machine is one of the few web services I'd be willing to pay for.
John
I am always doing that which I can not do, in order that I may learn how to do it. - Pablo Picasso
.... and wayback is sponsored, amongst others, by the Library of Congress. The archive itself is a 501(c)(3) public nonprofit. See 17 U.S.C. Section 108(a)(3) for more information.
:)
Strange that such a complaint would appear within a group espousing that "information wants to be free."
What particularly interests me is the fact that the Machine is a relatively new animal, yet it contains snapshots from my sites dating back to 1998.
Interestingly, if you look at Slashdot's earliest entry (man, that page was ugly back then!), and then look at the bottom of the page, it shows the domain that was used to pull the page: "Welcome User From firestone.alexa.com".
Alexa.com appears to be some web search ("powered by Google") toolbar thingy. I can't determine if they are the same people as the wayback machine or not.
Sometimes it's best to just let stupid people be stupid.
I'd say it makes you more of a control freak than a purist, personally.
Seriously, how did you ever get it into your head that a medium that serves documents to the general public on demand would be somehow exempt from archiving?
Would it bother you if John Q. Savant could recite the contents of your web pages from memory ten years after you'd taken them down?
Would it bother you to learn that stock prices, perhaps the most "ever-changing" thing out there, are permanently archived by a variety of services?
Or are you just jittery at the thought that your spouse/boss/Friendly Neighborhood Representative of The Man/kids may be able to someday look at the shite you plastered all over the web in your younger days? ("Ech, that stupid Netscape 2 animated title hack--honey, you actually -did- that?")
Obliteracy: Words with explosions
Now, when that person redistributes it, then it becomes an issue of fair use, copyright and license.
Fight Spammers!
http://web.archive.org/web/19961020014044/http://www.microsoft.com/
Well back in 1996 you really could win a million dollars from Bill Gates... well at least a cruise.
See all the exciting things happening on the Internet in Latin America, and win big prizes at the same time! Register for the first Latin American Internet Explorer Race. You'll have a great time, and perhaps even win a Caribbean cruise!
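Side note on that link: the long number in the middle looks like a capture timestamp followed by the original address. A rough Python sketch for pulling the two apart; reading the 14 digits as year, month, day, hour, minute, second is my assumption from how the URL looks, and the function name is made up.

```python
import re
from datetime import datetime

# Pattern for links shaped like the one above: 14 digits, then the original URL.
SNAPSHOT_RE = re.compile(r"^https?://web\.archive\.org/web/(\d{14})/(.+)$")


def parse_snapshot_url(url):
    """Split a snapshot link into (capture datetime, original URL)."""
    match = SNAPSHOT_RE.match(url)
    if match is None:
        raise ValueError("not a 14-digit-timestamp snapshot URL: %r" % url)
    # Assumed layout: YYYYMMDDhhmmss.
    captured = datetime.strptime(match.group(1), "%Y%m%d%H%M%S")
    return captured, match.group(2)
```

Fed the link above, this gives a capture date of 20 October 1996 and http://www.microsoft.com/ as the original address.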
robots.txt
User-agent: ia_archiver
Disallow: /
--
Don't sweat the petty things, and don't pet the sweaty things.
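For anyone wondering what those two lines actually do, here is a quick Python check using the standard library's robots.txt parser. It shows how a well-behaved crawler would read the rules; it is not a claim about how the Wayback Machine's own crawler is written, and the example.com URL and the "SomeOtherBot" name are placeholders.

```python
from urllib.robotparser import RobotFileParser

# The same rules as in the comment above.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /
"""


def may_fetch(robots_txt, user_agent, url):
    """Return True if `user_agent` is allowed to fetch `url` under these rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# The archiver's user agent is shut out of the whole site...
print(may_fetch(ROBOTS_TXT, "ia_archiver", "http://example.com/index.html"))
# ...while a crawler that is not named in the file is unaffected.
print(may_fetch(ROBOTS_TXT, "SomeOtherBot", "http://example.com/index.html"))
```

The rule only names one user agent, so it blocks the archiver without touching search engines. A `User-agent: *` entry would be needed to turn everyone away.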
By the very act of posting your site on the web you have given permission to make copies of it. Otherwise, how would anyone view it? And if no one is supposed to view it, why have you published it in a publicly accessible space?
If I went to your website 2 years ago and never closed or refreshed that browser window, would I now be violating your copyright? What if I saved the page so I could view it later offline? What if I never erased that file, would that mean that I'm violating your copyright? I have several floppies of web sites I saved at school for viewing at home from the days when I was stuck on a crappy dial-up service. Does that make me a pirate? What about all the copies of sites held in my browser's cache?
Don't get me wrong, I understand where the sentiment is coming from, even if I disagree with it. I'm just trying to point out how incongruous it is with the basic nature of computers and the internet and how they work.
These questions aside, though, I have to come down in favor of the historians. People here are always whining about old movies/books/music being lost because their owners refuse to let them go, even if they aren't using them. Why should the web suffer the same fate? The rate of destruction is far faster on the internet, and since it isn't a physical medium, the information has to be actively archived if it is to be preserved.
Under capitalism man exploits man. Under communism it's the other way around.
Could you imagine if there was the equivalent of the wayback machine for everything published in 5th century Athens? We'd know an incredible amount more about where the human race had already been intellectually and where it's going.
I publish several websites and I don't mind this a bit - If someone wants to host my content for free and offer my customers a way to get at older versions of the site for whatever reason (maybe they want to know what prices were 2 years ago), then they've done me a service. Cool.
As a historian and future librarian, one thing has always bothered me about the Internet. Because change is a constant, it's very difficult to keep records. It isn't like newspapers, pamphlets, books, or any other form of written record of the past five thousand years. Unless they're printed out, our writings here leave no physical evidence of their existence. Because I feel that the Internet is as significant as the printing press five centuries ago, the prospect of having no records from its early days is frightening.
We have books from five centuries ago. Will anything here still exist in a readable form five centuries from now? Unless something is done to preserve it, I feel there will be a massive gap in history.
And this is why I do not object to web archives. They are a half step to printed and more permanent storage mediums, but preferable to nothing at all.
This is just....mind blowing. Look at Ebay from 1997 [archive.org].
You fool! You've just Slashdotted Ebay!
I think we've also taken out Slashdot, and we're probably on our way to taking out the whole damn history of the internet. It's one thing to knock out somebody's geocities account or web serving PDA, but the Slashdot effect has finally gone totally out of control!
My next sig will be ready soon, but friends can beat the rush!
I didn't know that the wayback machine went that far back. I wonder if anyone is going to go to jail from posts they made in the past....
"Only one thing, is impossible for god: to find any sense in any copyright law on the planet." Mark Twain
Only 'flamers' flame!
I was just digging through a few hundred pages of information in the wayback machine when the site became sluggish. I jokingly told my friends (you know, the kind that live in my head?) it seemed I was singlehandedly slashdotting the site.
*sighs* Seems I had some help...
Anyway, I love the Wayback Machine. Besides being an extremely useful tool, it proves that Zindell was right. Information is never lost, only ever created.
I used to run a Half-life map review site, and a TFC map review site called "radium". I took my sites down a couple of years ago, and recently some friends pointed out that they showed up on one of these archival sites. I took my sites down for a reason, and didn't appreciate them hovering about on someone else's server without my permission. Say what you will, but I just don't like it. I emailed them and had my property removed from their servers. It took a bit of badgering, but it finally got done.
The long-term plan is to have a copy of the history of the Internet, beyond the power of any single government to censor. To this end, there are copies of the archive at multiple locations around the world.
One of them is in the Bibliotheca Alexandrina, in Egypt. They too have a Wayback Machine. It's jointly operated by the Government of Egypt and the United Nations Educational, Scientific and Cultural Organization. While they will usually honor removal requests, they don't have to do so.
There are plans for two more archive sites around the world, affiliated with major national libraries.
Wayback machines should function exactly like search engines. If there's a robots.txt file, check it. If it tells you to get lost, do so. A search engine is going to cache at least the text part of your site, and you know it. And you can prevent it if you wish. And depending on the engine, it can take months or years to update.
Besides, wayback machines will run into the same snags that search engines do. They can't replicate cgi scripts any better than search engines can, so to deny them access to those resources for their sake as well as the server's makes sense.
I don't know how wayback works. At the very least they SHOULD read the robots file. If they do, then I consider most of the copyright issues to be a moot point.
-Restil
Play with my webcams and lights here
Anyone else find it mildly disturbing that 1998 is considered to be distant history?
I stole this Sig
It is suspected by many that archive.org also removes archives based on content.
For instance, try accessing news sites back in the days immediately before and after 9/11. It is a very spotty record.
I have seen this for myself as well, as a web site I am struggling to find the time to build, and which has controversial content, was at one time retrievable under archive.org, but no longer is.
For that matter, it seems impossible to get Google to index it anymore either (though they too once included the site.)
Presenting themselves as having a complete record of the Internet's web sites, and then selectively deleting or restricting access to sites based on content, is a very pernicious form of censorship. It isn't a First Amendment issue perhaps, since dotgov presumably isn't the one restricting content, but it is worrisome nonetheless.
Is this truly the only Earth I can live on?
You can't unregister a copyright.
You give a copy of your work to the Library of Congress, and there the evidence sits for eternity, free to be accessed by anyone with a request slip.
The price you pay for copyright protection is public availability and persistence of your old rantings.
--Blair
Once you're on the Internet you can never get out. It's a simple fact. Someone will always have a copy of that e-mail you sent professing your love to Missy Gringlebach or the nntp post about how brilliant Hitler was or your web site dedicated to New Kids on The Block.
Trying to get that stuff off is futile at best. A professor of mine once said that there is not a nanosecond when some computer isn't processing or storing something about you somewhere. And that was in 1991. I've got to side with McNealy on this. There is no such thing as privacy anymore.
A few things
1) They've been archiving since 1998, but they've only recently had the horsepower to provide a live connection to it.
2) It is very easy to not have your stuff indexed. The directions are here.
It's so funny that I've been sending around links to my friends of their old corporate websites for months now. Totally freaks them out.
On a different note, how long until the wayback machine is used as evidence in court?
"No, Your Honor, we never posted slanderous comments about XYZ Company. *Oh CRAP! Not the Wayback Machine?!?*"
Er, you posted content on the WWW for world+dog to read. After all, that's the purpose of posting said content. And now you're unhappy because folks are reading it?
If you don't want folks reading your stuff, for heaven's sake don't post it on the web!
Seems obvious to me, somehow...
If you're a zombie and you know it, bite your friend!
You know what, I actually found that amusing.
Why, somehow, does this strike me as similar to an author having published an utterly bad, horribly stinky book that, later in life, he regrets ever having let see the printing press, and complaining that some people won't turn in their copies to him to destroy now that he wants to unpublish it? Remember that copyright isn't an unlimited right to prevent copies. IMHO most of these archival sites fall into the same category as a library that bought a newspaper, scanned it onto microfilm and then subsequently had the original newspapers destroyed in a flood: they had legitimate access to the originals, the copies were legitimate fair-use copies when made, the originals haven't been transferred to anyone else, the copies remain legitimate fair-use copies.
It may be embarrassing to the creators to have copies of their sites preserved for posterity, but copyright isn't about preventing an author from being embarrassed.
Since material put on the web and made available for free access has no value, there can be no damage due to copying should someone copy it for their own use, or to use it against you in the future.
Your copyright is valid, but valueless.
While all that is true, proxy servers cache information to re-transmit and nobody complains about that. Don't my Usenet posts from 1990 implicitly have my copyright on them? Where do you draw the line? I say if you put it out there, you should just live with it and let the chips fall where they may. It's more like archeology than copyright theft...
I am not a number! I am a man! And don't you
It's funny the submitter should mention this...because I remember when the people who archived it started archiving it in the first place. A rather big to-do was made about it, as I recall; it was archived as a side-project of the folks at Alexa--you know, the ones who provide the "what's related" technology to Netscape? At the time they started, they didn't know for sure what they would do with it except store it for future generations...but they clearly had some ideas, judging from what they've done with it recently.
As to the poster's complaint about his old stuff being archived...my immediate response is to say, "Well, tough...you should have thought about that before you put your content out there in the open for anybody who wanted to look at it."
I mean, seriously, if you do something in public, you have no reasonable expectation of privacy thereafter.
Editor Emeritus and Senior Writer, TeleRead.org
The purpose of copyright is to promote progress, to entice authors and inventors to release their works and discoveries to the public.
But that is not an end unto itself. The true end is the benefit to society that the release of such works brings.
Now, remember that the whole incentive here, the entire reason for granting the monopoly privilege of copyright, is to allow the originators of works to make money from their works, which in turn (theoretically) gives them incentives to release their works to the public.
When you publish something on the web, you're publishing your works for free, unless you go to the extra trouble of implementing some kind of access control. The Wayback Machine won't work on a site that has access control, so all it ends up archiving is stuff that was published for free public consumption.
So the real question is: if a work has already been released for free to the general public, how would letting authors restrict the republication of that work after the fact bring greater benefit to society than not letting the author impose such restrictions?
My opinion is that it is much more beneficial to society as a whole if the release of a work for free public consumption automatically implied that members of the public have the right to redistribute that work. So if an author doesn't want people in the general public to be able to redistribute his work, he has to control who receives the work and who doesn't. Certainly requiring payment for the work in question is sufficient to meet the requirement of controlling access. But whatever method the author chooses, it should be one that makes it clear that the work in question is not being released for free to the public.
Use 'slashdot stuff' in the subject line in any email you send me if you want to get past the spam filter.
really slow proxy server. It's just got lots of options for which cached version you want to see. :-)
BUT...
You have to KNOW the thing exists in order to put them in your robots file.
This means that there are MANY sites in that archive that are being captured and re-published without any knowledge by the authors.
The robots.txt file is like a burglar alarm that only stops the burglars you already know about. Wayback should use the robots file to only archive sites that specifically allow them to do so.
Article X: The powers not delegated... by the Constitution...are reserved...to the people
There are removal instructions at:
http://www.archive.org/internet/remove.html
--
http://www.aikiweb.com - AikiWeb Aikido Information
.. which anyone can listen to.
Do you use caution when speaking into a microphone? Why?
Anything you publish can be used against you. Data wants to be free, remember?
=brian
If you want to OPT OUT, then don't put it up on the net. The NET is a public utility; put content out there and expect it to be accessed, cached, and backed up in numerous ways by LOTS of individuals, intentionally or unintentionally. If you want your data private DON'T put it on the net. Seems fairly straightforward and simple.
errr....umm...*whooosh* *whoosh* Is this thing on ?
Yesterday I used the Wayback Machine for one of the lawyers at the law firm I work at to prove that a company at one point had an office in a certain location. The company in question was trying to duck out of a contracted agreement by saying they were not the people who signed the contract.
The Wayback Machine proved that they indeed knew of, approved, and granted authorization to this specific office, and the other people had a valid contract. In this specific case, the Wayback Machine prevented an apparently scumbag company from trying to screw some apparently good people over.
Kickstart
This wayback machine is invaluable!
I was able to travel back to the early days of internet pr0n (click here to launch sex.com from '96) and research ancient authentication methods including "Click here if you are over 18".
There are really two issues: 1. Should the archives be made? Which is what everyone seems to be discussing, and 2. Should the archives be publicly accessible?
I agree that any interpretation of copyright law that says the answer to "1" is "No" means that copyright law needs to be changed, not that it is "illegal and therefore immoral".
But a case can be made for "2" that the distribution should only be made when copyright on the material has either expired, or could reasonably be expected to have expired. Which brings up two other issues, which are the absurd length of copyright on materials, and the near impossibility of determining if a material is still copyrighted.
So, I don't have any answers, just better questions.
If anyone has ever heard of the Library of Alexandria it was supposedly the most impressive knowledge base the world had ever assembled. Some crazy guy came by and burnt it to the ground -- setting the entire industrialized planet back hundreds perhaps thousands of years. We are now in the process of surpassing this great library, and are making it even easier for people to have access to knowledge. That knowledge may be porn, may be the morning news, or sports scores, it may even be how to construct a nuclear bomb. Nevertheless it is knowledge and EVERY person who is alive has the God (and any other higher power) given right to knowledge, despite what any government agency, or copyright may say. 21st century libraries such as the WayBack Machine are providing the tools necessary for researchers to go "back to the future." This is a great service to mankind, and its overall importance should not be outweighed by greedy and/or overparanoid privacy rights activists. If you do not wish to be known, please do not post any information on the web, and move to the jungles of Africa and step away from a time and place known as the PRESENT.
Happy lawsuits... when you steal a logo from a corporation that just wants to screw someone...
Only 'flamers' flame!
Would be an archive site that kept versions of news articles before and after they were changed by editors. Often, an article making allegations of corruption or bad intent gets changed shortly after it is published, and the replacement gives a more neutral stance, which doesn't give readers the whole story anymore, and in many instances makes the story a non-story, leading me to wonder why it was even published in the first place.
You see? You see? Your stupid minds! Stupid! Stupid!
Strange that such a complaint would appear within a group espousing that "information wants to be free." :)
Not strange at all.
Slashdot is not populated by a bunch of lockstepping conformists. Its postership is large and diverse. The individuals are NOT the average, nor are they the stereotype.
Perhaps on the average the posters think that IP laws are 'way too tight. But some think they're too loose. Post an article about somebody making them tighter and the make-em-loosers will complain, post one about somebody apparently not respecting them at all and the make-em-tighters will sound off.
Further: Few if any Slashdot posters think a published author has no rights at all over the distribution of his work. (How would Copyleft work if that were true? B-) ) So when it looks like a service may be copying and republishing past works far beyond the authors' intended distribution they may sound off.
And even the most fanatic of the "information wants to be free" faction may still post a cautionary note about how a particular act of radically freeing it may attract opposition.
Which seems to be what happened here.
Bantam Dominique roosters crow a four-note song. Once you've heard it as "Happy BIRTHday" you can't NOT hear it that way
Yeah, that's actually the book I had in mind when I said this. That and "How to get the women you desire into bed", by Ross Jeffries. He also mentioned in the foreword that he was thinking about, but was talked out of, taking the book off the market.
:)
See one day I might regret having admitted that I read that book "How to get the women you desire into bed", but there ain't nothing I can do about it
1) I was glad that they had one of my old pages on there. I lost it due to a crash (my brain crashed and I wiped it out). I was able to pull it back off their site and get it right back running.
2) Are we not the same collective group that gets mad at NBC for not wanting us to use our TiVos? I realize that there are a crapload of people on /., but sometimes the irony is just too much.
Arguing otherwise is like saying retaining old copies of magazines after the new ones have come out is an infringing use of those magazines.
I found some information on the Wayback that I would really like to archive myself - for legally defensive reasons (i.e. trademark use, and to kill patents).
:)
Is there a way to archive sites from the Wayback machine in a clean (linked) way? I tried using standard web downloaders (Webreaper, Offline Explorer), but they didn't work correctly. Their FAQ says it can't be done, but for some reason I don't believe them...
Anyone have advice? Thanks.
My ancient vanity site that received no traffic, nor deserved any, has been duly archived. I'm dying of embarrassment at my rudimentary HTML- back in the day.
My question is why I was even on their radar?
Those that suggest you "dance like no one is watching" really want to see you make a complete fool of yourself.
What's that quote from Cryptonomicon, when the guy tells his buddy to use 4096 bit encryption? Something like "I want this encrypted until men no longer do evil."
Vintage computer games and RPG books available. Email me if you're interested.
You have the right to something once you download it?
If I copyright my content, other people are not allowed to distribute it without my consent. There is no way around this. I don't have to add extra disclaimers, just a copyright notice. How can there be any argument about this?
Ok, someone GPLs some software they wrote and puts it on their website. If you download a compiled version of the software, you can't redistribute the compiled executable without making the source available. Why? Because the copyright owner (via the GPL) only gives you permission to redistribute if you also make the source available. The owner can do this because the GPL is backed by copyright laws, just like copyrighted web content. Notice I said owner, because the law grants special privileges to people that create content and copyright it. There is no implied social contract that says the content is up for grabs. And there is also no reason fair use even comes close to applying if you are talking about a large quantity of content.
I do think the archive provides a useful service, but I think they are on shaky legal ground.
transaction companies decide to integrate
their historical transaction databases.
That way, when this game is over, we get all of our money back.
?sp
The bigger issue is the rudeness of the archive in ignoring robots.txt and rifling through files that one does not wish to have linked or accessed (e.g. stuff under development that isn't ready for 'prime time' yet).
I sincerely hope that they don't ever really delete things, and that they ignore robots.txt as far as archiving goes. It's fine for them to not serve back your pages if you ask them not to. For a while. Say, until you are long dead.
But this information might be interesting to future generations, and frankly, any librarian or archivist owes more to those unborn people than they have any obligation to obey your transitory wishes.
Copyright laws change.
Oblivion is forever.
I read an article about the site.. the project has actually been running since 1998 - that's when they started collecting people's websites, and adding hardware to their 'collective' to store all the data.. they only made the site public in like 2001 (or whenever it was) despite collecting it for so long.
I think if you use the Wayback Machine to go back to their own site in 1998/1999 their front page tells you this.
"Hey! Unless this is a nude love-in, get the hell off my property!!"
http://web.archive.org/web/19980113191222/http://slashdot.org/
How much should employers find out about you based on the Wayback Machine?
... in the same way that water wants to run downhill. Finding it strange that people object to certain uses of their information is like finding it strange that people object when you spill their beer.
--
E_NOSIG
Learn the robots.txt protocol; you can shut off all bots and only allow the ones you want by simply having:
User-agent: good_bot
Allow: /
User-agent: *
Disallow: /
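If you want to see what that whitelist actually does to different crawlers, here is a rough sketch that feeds the rules to Python's standard urllib.robotparser, written one directive per line with "Allow: /" for good_bot. The other bot names, the example.com URL and the who_gets_in helper are invented for the example, and real crawlers may interpret edge cases differently from Python's parser.

```python
from urllib.robotparser import RobotFileParser

# The whitelist policy from the comment above: one named bot is let in,
# every other user agent falls through to the catch-all block.
WHITELIST_ROBOTS_TXT = """\
User-agent: good_bot
Allow: /

User-agent: *
Disallow: /
"""


def who_gets_in(robots_txt, agents, url):
    """Map each user agent name to True (may fetch url) or False (blocked)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


if __name__ == "__main__":
    # archive_bot and random_crawler are hypothetical names, not real bots.
    agents = ["good_bot", "archive_bot", "random_crawler"]
    result = who_gets_in(WHITELIST_ROBOTS_TXT, agents, "http://example.com/index.html")
    for agent, allowed in result.items():
        print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

The point of the catch-all block is that you do not need to know a crawler's name to turn it away; you only need to know the names of the ones you want to let in.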
I know the robots.txt, but I (along with most web publishers) have better things to do than to keep track of every web bot that may visit my site. Given how fast crawlers come and go, just keeping up with a list would probably be a daunting task.
Maybe the robots.txt spec should have a new tag that the archive bots look for:
archive-agent: *
Disallow: /
Article X: The powers not delegated... by the Constitution...are reserved...to the people