The Wayback Machine, Friend or Foe?
ShaunC asks: "As the webmaster of numerous sites, I'm curious how others feel about the Wayback Machine. What particularly interests me is the fact that the Machine is a relatively new animal, yet it contains snapshots from my sites dating back to 1998. I can't help but wonder: where did they get such old copies of my websites, and who gave them permission to make those copies? I certainly didn't provide either. Perhaps I'm too much of a purist, but I've always seen the internet as an ever-changing medium, not a permanent one. Archives have bothered me ever since the fledgling days of DejaNews."
This site last made an appearance on Slashdot earlier this year. Internet archival sites are right smack in the crosshairs of copyright, but they are useful. Anyone who has ever used Google's cache (and there are plenty of those links on Slashdot) can attest to this. Of course, the issue that may bug many content providers is how to opt out of such services, since some see them as a copyright violation. Is it possible to balance the issues of copyright and history, or will these two Internet resources find themselves in legal trouble in the future?
"The way I see it, archives are much like SPAM; I never opted in, why should it be my responsibility to opt out? I manage a number of domains and the process of refining robots.txt files and submitting myself to the Wayback Machine for removal seems to be intrusive. Worse, domains I've abandoned (which have lapsed or been re-registered by someone else) are forever archived in the Machine and I have no way to exclude them. Why should I have to deliberately remove my copyrighted material from an archive which was never granted permission to replicate that material in the first place?"
Isn't this exactly the point of robots.txt? Google won't cache content it doesn't spider, and it won't spider content forbidden by your robots.txt. Does the WayBack Machine obey the robots rules?
Slashdot from 1997.
In college, really poor, need a flatscreen.
"The Wayback Machine" has been a pet project for a long time, and we're only now seeing results. I know for a fact that they have pages back at least as far as 1996, and it's a damn shame they don't have anything that much earlier...
And yes, it obeys the Robots Exclusion Standard.
"Ask Google" strikes again; I would hope that you could find all of this information by searching, reading an "About" page, or something similar. Fortunately, these abortions of journalism don't appear on the Front Page very often.
pb Reply or e-mail; don't vaguely moderate.
I recently placed a restrictive robots.txt file on my site, and when I try to access any of the past revisions, I get a message saying that the owner has restricted access to the site via robots.txt. They seem to have that aspect under control.
It's a scary thought that things kids are saying on message boards when they're teenagers are going to be back to haunt them when they apply for jobs in their mid 40s...
I mean, if everything I posted on BBSes in the 1980s were still attributable to me... yikes.
Remember, kids: use a nickname, and change it frequently if you ever want to run for any kind of office.
When you publish something on the web, it is publicly available via HTTP. End of story. Responsible netizens can honor the requests in robots.txt, but they don't have to. If you want something more controlled, create a VPN, an intranet, or some other kind of non-public data server.
Your argument is similar to that of newspaper publishers who didn't like "deep linking." What they couldn't (or didn't want to) understand is that the nature of an HTTP web server is quite simple. A client asks for a file, the server gives it back. Using that protocol implies that you are OK with that. If you're not, I suggest you look into different technologies, instead of complaining about lack of control in a medium that was never intended to provide it.
I went back and looked at the site of the .com I used to work for; very nostalgic. The Wayback Machine is a good resource for people (a.k.a. me) who create content on someone else's site and then lose access to it because the company goes under. Now that the company that once owned my old content is gone, I'm able to add it to my portfolio.
Come on, people, wake up! First NPR, now this brain-dead crack monkey who calls himself a "webmaster". Anyone who doesn't understand the simple rule stated above is not qualified to be a webmaster.
I can understand clueless users, but clueless sysadmins is something with which I will not put up.
Nathan's blog
who gave them permission to make those copies?
The way I see it, you implicitly give people some limited form of permission by putting it up on the internet freely available to download in the first place. You put it up for people to download, print out and so forth (which amounts to copying), and therefore you've implied that people may do so.
Sure, you own copyright, and blatant plagiarism is something that clearly is wrong. But I see nothing wrong with taking an article that you published on the web and reproducing it, as long as it is taken in context and is clearly attributed (and it is made obvious that the copy isn't the original, but proper attribution would do this and therefore suffice).
Of course, this is republication and so the issue is not so clear and obviously subjective. That's just my opinion.
If I choose Friend, I can get half or none of the Wayback Machine's content...
but if I choose Foe, I can get all or none of its content?
Better choose Foe.
Of course, in practice you have to pursue this and ask them to remove it.
If you really object, I suggest sending them a list of every site you have or have had, with dates, and a request to remove everything. Then you only need to notify them when you put up a new site that it should also be excluded. That would not be such a nuisance, would it?
That said, I think they are providing an interesting service, so unless you are harmed by it, why object?
I am interested in knowing how they had such old versions of your site though. Do search engines keep archives?
www.cisco.com, 1 page (1996)
www.microsoft.com, 5 pages (1996)
www.ibm.com, 7 pages (1996)
This is in the FAQ.
As someone who makes lots of free sellable and unsellable content (http://www.furinkan.net/fanfic/), I think the Wayback Machine is an invaluable resource. I can look back and see how big a dork I was and still am. I've also found stuff of mine that I've lost over time, amazed that anyone ever bothered to hold on to it.
Sherman: Mr. Peabody, I want to go back in time!
.... Damn it, boy, fire up the Wayback machine. And fetch me my chew toy.
Mr. Peabody: Be quiet, Sherman. This new Wayback Machine is now accessible via a browser. Be happy with that.
Sherman: But I wanted to go back in time and watch Cleopatra taking one of those milk baths again.
Mr. Peabody:
52 Weeks, 52 Religions with John Hummel
This is just... mind-blowing. Look at eBay from 1997.
Do I have permission to copy the content of your site to my browser history directory, and if so, how long do I have permission to keep it? Can I show a copy of an html document that is stored in my browser history to my mother? What about my neighbor? Or the dude in another country I happen to be chatting with online?
IANAL blah blah blah, but once you open your files up to being downloaded and stored by a browser, you've pretty much given up the right to tell people they can't be re-distributed--I would think the best you could hope for is that people would re-distribute them, in whole, the way you originally released them.
Denver Isuzu Suzuki
When I first discovered it, it was a lot of fun. Much nostalgia; it was fun seeing earlier versions of my webpages. Some go back quite a number of years.
On the other hand, I was horrified when I realized that there was full archiving of www.dramex.org. If you visit that site, you will see that there are a large number of scripts (as in plays), many of which have restrictions on use. Over the years, we've had people request that scripts be removed from the site; of course, we did so. However, they weren't necessarily removed from the archive, and an archive keeps them forever. Specifically with the wayback machine, I was able to submit stuff that removed the specific directories I was worried about (they don't archive the scripts from www.dramex.org, just the "front page" stuff which is all part of the fun), and keep them from doing it again.
I like the idea of archives; it preserves history. The web is a transient medium, but not entirely. Yes, much of the content is dynamic and should only be dynamic. Some of it, though, is like the front page of a newspaper. Each day, what's on "today's front page" is different-- but there is value and use in seeing what was on the front page in any day in history.
But sometimes you need to delete something and make sure it really is no longer available. When you don't completely control your site (i.e. somebody else archives it, rather than just mirrors it), that becomes impossible.
(Incremental backups can have a similar issue. If you only back up files which are "newer than the last backup", your backup doesn't have the information about files which have been *deleted* since the last backup. When you restore, you might find some files there you thought shouldn't exist any more.)
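To make the backup analogy concrete, here is a minimal sketch under a made-up model: a file tree is a dict mapping path to modification time, and `incremental` and `restore` are hypothetical helpers, not any real backup tool. It shows how a file deleted after the full backup comes back on restore, because no increment records the deletion.

```python
def incremental(tree: dict[str, int], last_backup_time: int) -> dict[str, int]:
    # Copy only files modified after the previous backup ran.
    return {path: mtime for path, mtime in tree.items() if mtime > last_backup_time}


def restore(full: dict[str, int], *increments: dict[str, int]) -> dict[str, int]:
    # Replay the full backup, then each increment on top of it.
    # Nothing in an increment says "this file was deleted",
    # so deleted files survive from the full backup.
    state = dict(full)
    for inc in increments:
        state.update(inc)
    return state


# Day 0: full backup of two files (hypothetical names and times).
full = {"index.html": 1, "script.txt": 1}
# Day 1: script.txt was removed on request; index.html was edited.
day1 = {"index.html": 5}
inc1 = incremental(day1, last_backup_time=1)
restored = restore(full, inc1)
print(sorted(restored))  # ['index.html', 'script.txt']
```

An archive that only ever adds snapshots behaves like this restore: removing a page from your own server doesn't remove it from the archive unless the archive is told separately.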
(Dramex.org has changed so that it's not straightforward to get directly to the scripts any more. META tags tell the search engines to leave the actual scripts alone, and you can only get the text itself via CGI. Yes, it's easy to subvert if you put your mind to it, but at least you do have to put your mind to it, and automated search engines or archivers won't. 90% of the security for 1% of the effort.)
-Rob
" Perhaps I'm too much of a purist, but I've always seen the internet as an ever-changing medium, not a permanent one."
If you don't have a record of what something was before, how do you know it's changed?
Personally, I love seeing older versions of previous work and watching the trends in web development as they progress.
thirsty*i^2
"Ya I finished that last week, it just doesn't work"
What's the problem?
If you do something illegal on your website, you won't be held responsible more than once just because the data persists on the Wayback machine. If you remove the offensive material from your site, that's all you can do. The Wayback machine can deal with their own lawsuit threats. And I'm sure they'll remove material if you are the site owner and ask nicely.
As far as outdated information, anyone reading pages on the wayback machine and expecting them to be current would have to be crazy. It's an archive after all.
It's easy to opt out. Google provides instructions in their webmaster FAQ, which points out: "There is a standard for robot exclusion at http://www.robotstxt.org/wc/norobots.html."
As a webmaster of various sites, I have no problem with archives. If I didn't want people to see my stuff, I wouldn't have put it on the internet in the first place.
where did they get such old copies of my websites, and who gave them permission to make those copies?
They probably got the copies the same way everybody else did - by surfing. You (implicitly) gave them permission to cache your sites by not including an appropriate entry in your robots.txt.
The way I see it, archives are much like SPAM; I never opted in, why should it be my responsibility to opt out?
Archives are nothing like spam. Spam is primarily harassment. These guys aren't harassing you. They did ask your permission (by way of checking your robots.txt). If you've since changed your mind, it's your responsibility to notify them.
Google caches material too - do you consider them to be spam as well?
Archive sites provide a valuable resource to the rest of the 'net. If you don't like it, put an appropriate entry in your robots.txt file, and be done with it.
The submitter states that he never gave the Internet Archive permission to replicate his work. He is wrong.
By placing material on the web, one is implicitly granting permission for it to be read. If I put a poster up in my window, I lose the right to complain if someone walking by on the street reads it.
Equally, I lose the right to complain if someone walks by and takes a photograph of the front of my house, including the poster. The fact that someone might then be able to read the poster ten years from now is irrelevant.
If the Internet Archive were required to seek permission before archiving freely and publicly available material, then the same argument would require libraries to seek permission prior to archiving (free) newspapers.
Timeshifting is fair use, and it applies to web pages just as well as TV signals.
Tarsnap: Online backups for the truly paranoid
I would never have visited countless sites I regularly surf to. Google has definitely been a major gateway to the internet for me.
I think making an issue of the caching is moot, as about 99% of the time I go to the website itself for the content, since the source is always better than the cache. I use the cache only when the content has disappeared or, in some cases, when the website itself is gone.
This is a valuable service Google is providing-- and webmasters get it for free.
Do not spread "09 F9 11 02 9D 74 E3 5B D8 41 56 C5 63 56 88 C0" over the internet, thank you.
I doubt that I'm alone in my belief that it is always tragic when any piece of information--no matter how trivial--is lost forever.
If a person has offered that information for free at any point, to the extent that an automated script could access it, then I believe that information can be safely considered public domain. I doubt that there's any mechanism by which Richard M. Stallman could lose his mind and "rein in" all copies of GNU, or by which Stephen King could recall all his novels and refund the purchase price; once something is offered to the public, it no longer belongs exclusively to the publisher.
In my opinion, the value of archives in the future immeasurably outweighs occasional inconveniences of having information stick around longer than the author would have wished.
"Beware he who would deny you access to information, for in his heart he deems himself your master."
I like it...I'm just the latest in a long line of webmasters for the site I run, my boss ran it before me. I will gleefully pull out his work for him anytime he gripes about the current incarnation. :)
"I can't help but wonder: where did they get such old copies of my websites, and who gave them permission to make those copies?"
You sound like Television broadcasters when you say something like that. "We'll broadcast content over the airwaves, but you better not capture it!"
Well, let me make it simple for you: when you make something public, you cannot expect to bottle it up later. That's the whole reason that the internet is in existence: extreme redundancy so that data is never lost. The original idea was to build a data network that could survive a nuclear attack.
I don't think anybody should ever post stuff on the web without expecting it to last forever in some form or another, regardless of whether permission is granted.
"Derp de derp."
"Of course, the issue that may bug many content providers is how to opt-out of such services, since some see it as a copyright violation."
So I need to burn all my old comics? Or perhaps I should never allow anybody to look at them?
Caches aren't republishing information, they are archiving it. That's what libraries do too. Hell, they can even charge for the service if they want and still be in the moral right.
My $0.02 will always be worth more than your €0.02, so
I myself am a fan of the Wayback Machine. I really like seeing snapshots of how my sites and some of my favorite websites have evolved over the years. I would also like to think that I could actually show my grandchildren what the internet was like in my prime instead of saying "back in my day we read Slashdot and we liked it, now pass me my teeth".
~~Some people never go crazy what truly horrible lives they must lead.~~ Charles Bukowski
I browsed all of your sites (even the abandoned ones), and since my browser cache is set to 782TB (and I'm still running Netscape 1.0N), your sites are still there. And my cache is publicly accessible via my webserver. Yet another way you're being violated. Ah, the risks and perils of publishing on a public network.
how cute...baby slashdot...
If you read 1984, you might remember that the government tightly controlled all old copies of documents so that they could manipulate history as they wished. We might get into a similar situation by accident if we don't allow independent archives of electronic information.
With traditional media, you publish something on paper, but you don't get to control who puts the paper copies in which archives. That has served us well for keeping track of history, and an equivalent system needs to maintained for electronic content.
Do libraries have to get permission to save and allow browsing of copies of newspapers (both physical and microfiche)?
We have the right to live, to feed ourselves, to have children, to work, to have shelter. We have no "right to copy". Copying is free! And please don't restrict other people's right of access to information!
(Yes, I know about copying and copyright.)
For the most part I don't have a problem with them archiving my sites (after all, they can show me what a site used to look like faster than I can dig out my backups), but recently one of my customers told me to remove all traces of a product from their site (something about nasty litigation). I pulled the info off our servers quickly, but three hours later I got a nasty phone call from the customer saying he could still see the product on the site. It seems it was hung up in some proxy server between here and there.
Back to the point: how do you deal with an archive when you need to get rid of information that is now a liability to you? Maybe we are better off without them in some cases.
I used to have a cool sig, back when I cared
It's even as slow as it was back then!
I was talking to a guy who works for a web hosting company, and he says a fourth of his sales calls are from people calling him up because they're pissed that their last hosting company "lost" their site. (In reality, most of the time it's later found out that the guy deleted it himself, renamed index.html to index2.html, etc.) He says he can find a copy of 90% of those sites on the Wayback Machine. He'll then start to quote the website's contents to the guy on the phone, and usually he'll have the amazed (and dumbfounded) customer signing a hosting contract by the end of the day.
Move it to Sealand or something like that, or some other country where copyrights are meaningless.
depending on whether the site you had up when you were scanned is/was any good!
sulli
RTFJ.
Caching of web pages on the internet is considered fair use and is central to the Web. Isn't this just a time-delayed caching server? It's caching for a different purpose... and they aren't making money off of other people's content.
Use robots.txt, stupido. It lets you prevent search engines from indexing and archiving your property. However, if you're that concerned about people copying your pages, you might try avoiding the internet.
I personally love the internet archive and google's cache.
The parent post said almost everything I was going to, but one thing I wanted to add is that if a spider is able to get to a page at all (even a spider that doesn't follow the robots protocol, which the Wayback Machine does), it is only seeing a public page that anyone with an internet connection can get to.
Otherwise you have bad control over your content and need to update your web server to not serve that content. If you don't want people to be able to copy your information then don't give it to them. Or only give it to them in a signed format that cannot be easily duplicated.
It's like being surprised that someone has forwarded an email that you sent them.
I know everyone is going to say, "just make a robots.txt file and everything will be okay." Sadly, that is naive and incorrect. What makes you think that the people who send out 'bots looking for content (rather than create their own or use hyperlinks!) would honor such a noble convention?
This is like trying to solve music piracy by putting a "No Napster" sticker on the jewel box. Nice thought, but it's a dead-end.
Karma: Good (despite my invention of the Karma: sig)
someone backed up the Internet to floppy.
Well, the wayback machine helped me in confronting some companies for raising their prices when we changed to the euro :)
:)
Especially Domino's Pizza. They raised their prices more than 12%. I printed out the page and got a 15% discount.
For such a "webMASTER", this guy doesn't seem to know a lot about the Internet; he seems more concerned with keeping his "Intellectual Property" safe than with actually understanding the way things work.
People like this ruin the concept of the Internet, the free exchange of knowledge. I hope other people on /. feel the same.
I don't mind that my site is being added to indexes that the public can use for free. I do have a problem when a company uses my site to make a profit with no public benefit.
There is case law where unauthorized access to a website is a copyright violation.
I am trying to use copyright law against some of the spammers who scrape my site for email addresses. Then, go after the spam software companies for contributory infringement (let the napster rulings serve some good).
Fight Spammers!
I understand the concerns, but I think it's a part of the net, a good part, that we have to wrap our minds around.
Especially when you mention Usenet archives, which are (ok, get ready to laugh) historically important. I'm not kidding! There is a little signal in there, it's a cultural brain dump, and that's of historic interest.
I think the rub is, if the archive presents the data exactly as you presented it (that is, it doesn't play with your content, present it in a frame or otherwise embed it as their own content), then it is a fair archive, a ghost of your site still walking the internet. There is no taking it back once you post it.
-pyrrho
Some have already drawn analogies to TV broadcasts, saying hey, it was broadcast, you get to keep a copy. You can't bitch now if people still have that copy, unless you're Jack Valenti.
You can spin this how you want. Here's one valid way to think about it, though: a TV network broadcasts a show. You make a private copy on a VCR tape. Jack Valenti aside, you can watch that copy again as often as you like, and it's no big deal. However, you do *not* have the right to rebroadcast your copy of that show to the public without the permission of the original copyright holder. (I have my B5 tapes. I'm watching them through again now, showing them to my wife. I'm sure nobody is upset about this. But I'd be in deep doo-doo if I managed to broadcast them on a local access station, or uploaded them to a public website.)
If you are inclined to be negative about the Wayback Machine, you could view it this way. While the page existed on the original site, it was broadcast to the public. If somebody made a personal copy, they have it and will always have it, even if the site goes down. However, when the site goes down, individuals do not necessarily have the right to then "rebroadcast" (i.e. post) the content they downloaded and kept. This, however, is what the Wayback Machine is doing.
Mind you, except for the issue with www.dramex.org that I noted above (and which I fixed long ago), I like the WayBack machine, and am happy that they archived the content which was implicitly copyrighted to me. I would have opted in if I had wanted to. But, of course, I didn't know about it back in 1996 to opt in.
I don't have a good answer to the questions. Just thoughts.
-Rob
There is nothing worse than revisionist history. I can't stand seeing sites that post something, and a bit later it has vanished forever or been altered to remove the very thing I was interested in.
There are several GPL'ed Open Source software packages that I have copies of, that have vanished with all references to them and are no longer available on the net. Also a number of great sites that came and gone for either lack of cash or time. I think if someone open sources something it should stay that way.
Also if it's open on the net for public viewing, then it should be fair game. Especially if the original author is credited and it is in the original context, like the Wayback Machine is. I know there are always special cases where something was put up that the webmaster was not entitled to like a copyrighted book or something, but for most stuff this is invaluable and a great service to humanity.
Also, think of all those users whose web site was lost without a backup. Now they can get that data back.
The Wayback Machine is one of the few web services I'd be willing to pay for.
John
I am always doing that which I can not do, in order that I may learn how to do it. - Pablo Picasso
And they can bite my shiny metal ass. Especially Bloomberg.
This reminds me a lot of the old opt-in/opt-out P2P debate.
Don't publish a website available to anyone on the Internet if you don't want a "snapshot" taken. I'm personally very comfortable with my work and writings being available to anyone, forever. If I wasn't I wouldn't have put them online.
I like to think of the Wayback Machine as my personal backup server.
I just put all my most vital files in a web folder, and their crawlers take care of the rest.
And for encryption? Two words, baby:
ROT-13
A customer service representative will be with me shortly.
...and the Wayback Machine is sponsored, amongst others, by the Library of Congress. The archive itself is a 501(c)(3) public nonprofit. See 17 U.S.C. Section 108(a)(3) for more information.
:)
Strange that such a complaint would appear within a group espousing that "information wants to be free."
What particularly interests me is the fact that the Machine is a relatively new animal, yet it contains snapshots from my sites dating back to 1998.
Interestingly, if you look at Slashdot's earliest entry (man, that page was ugly back then!), and then look at the bottom of the page, it shows the domain that was used to pull the page: "Welcome User From firestone.alexa.com".
Alexa.com appears to be some web search ("powered by Google") toolbar thingy. I can't determine if they are the same people as the wayback machine or not.
Sometimes it's best to just let stupid people be stupid.
I'd say it makes you more of a control freak than a purist, personally.
Seriously, how did you ever get it into your head that a medium that serves documents to the general public on demand would be somehow exempt from archiving?
Would it bother you if John Q. Savant could recite the contents of your web pages from memory ten years after you'd taken them down?
Would it bother you to learn that stock prices, perhaps the most "ever-changing" thing out there, are permanently archived by a variety of services?
Or are you just jittery at the thought that your spouse/boss/Friendly Neighborhood Representative of The Man/kids may be able to someday look at the shite you plastered all over the web in your younger days? ("Ech, that stupid Netscape 2 animated title hack--honey, you actually -did- that?")
Obliteracy: Words with explosions
Now, when that person redistributes it, then it becomes an issue of fair use, copyright and license.
It's always been a monument to bad grammar and spelling on my part. So years down the road I can go back to see how terrible it was, then realize it hasn't improved one bit.
Plus the darn thing crawls my web sites everyday.
In Denmark, it is a legal obligation to hand over a copy of any and all published material to the Royal Library, including anything published on websites, for archiving and historical services/research.
That so few do it just indicates that nobody knows about that law.
But, I think it's a wonderful law that there is one central place that at least tries to be complete...
I'd like to see a similar law passed internationally for services such as the Wayback Machine, so that they are not only allowed to, but required to, take copies of any and all public material.
For academic, research, history, whatever reasons...
-- Tino Didriksen / Project JJ
http://web.archive.org/web/19961020014044/http://www.microsoft.com/
Well, back in 1996 you really could win a million dollars from Bill Gates... well, at least a cruise.
See all the exciting things happening on the Internet in Latin America, and win big prizes at the same time! Register for the first Latin American Internet Explorer Race. You'll have a great time, and perhaps even win a Caribbean cruise!
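The snapshot URL quoted above packs the capture time into its path. A small sketch that pulls it apart, assuming the `web.archive.org/web/<14-digit YYYYMMDDhhmmss timestamp>/<original URL>` layout seen in that one example holds generally; the regex and `parse_snapshot_url` are illustrative, not an official Wayback Machine API:

```python
import re
from datetime import datetime

# Assumed layout, inferred from the single example URL in this thread:
# http://web.archive.org/web/<YYYYMMDDhhmmss>/<original URL>
SNAPSHOT_RE = re.compile(r"^https?://web\.archive\.org/web/(\d{14})/(.+)$")


def parse_snapshot_url(url: str) -> tuple[datetime, str]:
    """Split a snapshot URL into its capture time and the archived URL."""
    m = SNAPSHOT_RE.match(url)
    if m is None:
        raise ValueError(f"not a recognized snapshot URL: {url}")
    stamp, original = m.groups()
    return datetime.strptime(stamp, "%Y%m%d%H%M%S"), original


when, page = parse_snapshot_url(
    "http://web.archive.org/web/19961020014044/http://www.microsoft.com/"
)
print(when, page)  # 1996-10-20 01:40:44 http://www.microsoft.com/
```

So the Microsoft page with the cruise contest was captured on 20 October 1996.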
For those who aren't in control of the root domain, and still want to exclude portions of their site, you can (try to) use meta-tags. No guarantees if a spider will honor it.
<META NAME="ROBOTS" CONTENT="NOINDEX">
<META NAME="ROBOTS" CONTENT="NOFOLLOW">
<META NAME="ROBOTS" CONTENT="NOARCHIVE">
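For illustration, here is how a spider that *chooses* to honor these tags might collect them from a page, sketched with Python's standard `html.parser`. The `RobotsMetaParser` class and the sample page are hypothetical; as the poster says, nothing forces a crawler to do this.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: set[str] = set()

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names, but not values,
        # so the value comparisons below lowercase explicitly.
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            # CONTENT may hold several comma-separated directives.
            for token in (attr.get("content") or "").split(","):
                if token.strip():
                    self.directives.add(token.strip().lower())


page = """<html><head>
<META NAME="ROBOTS" CONTENT="NOINDEX">
<META NAME="ROBOTS" CONTENT="NOFOLLOW">
<META NAME="ROBOTS" CONTENT="NOARCHIVE">
</head><body>script text</body></html>"""

parser = RobotsMetaParser()
parser.feed(page)
print(sorted(parser.directives))  # ['noarchive', 'nofollow', 'noindex']
```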
robots.txt
User-agent: ia_archiver
Disallow: /
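You can check what those two robots.txt lines mean to a compliant crawler with Python's standard `urllib.robotparser`. A minimal sketch, feeding the rules in directly instead of fetching them; `example.com` and the page path are placeholders:

```python
from urllib.robotparser import RobotFileParser

# The robots.txt from the post above, supplied as text rather than fetched.
rules = """\
User-agent: ia_archiver
Disallow: /
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

# The archiver named in the file is shut out of the whole site...
print(rp.can_fetch("ia_archiver", "http://example.com/scripts/play.html"))  # False
# ...while crawlers with other user-agent names are not covered by it.
print(rp.can_fetch("SomeOtherBot", "http://example.com/scripts/play.html"))  # True
```

To shut out every compliant crawler rather than just the archiver, the `User-agent` line would be `*` instead.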
--
Don't sweat the petty things, and don't pet the sweaty things.
By the very act of posting your site on the web you have given permission to make copies of it. Otherwise, how would anyone view it? And if no one is supposed to view it, why have you published it in a publicly accessible space?
If I went to your website 2 years ago and never closed or refreshed that browser window, would I now be violating your copyright? What if I saved the page so I could view it later offline? What if I never erased that file, would that mean that I'm violating your copyright? I have several floppies of web sites I saved at school for viewing at home from the days when I was stuck on a crappy dial-up service. Does that make me a pirate? What about all the copies of sites held in my browser's cache?
Don't get me wrong, I understand where the sentiment is coming from, even if I disagree with it. I'm just trying to point out how incongruous it is with the basic nature of computers and the internet and how they work.
These questions aside, though, I have to come down in favor of the historians. People here are always whining about old movies/books/music being lost because their owners refuse to let them go, even if they aren't using them; why should the web suffer the same fate? The rate of destruction is far faster on the internet, and since it isn't a physical medium, the information has to be actively archived if it is to be preserved.
Under capitalism man exploits man. Under communism it's the other way around.
Could you imagine if there were the equivalent of the Wayback Machine for everything published in 5th century Athens? We'd know an incredible amount more about where the human race had already been intellectually and where it's going.
I publish several websites and I don't mind this a bit - If someone wants to host my content for free and offer my customers a way to get at older versions of the site for whatever reason (maybe they want to know what prices were 2 years ago), then they've done me a service. Cool.
Some people worry too much. If I want information, I want to be able to find it whether someone wants to host it anymore or not. If I'm bored, I can find entertainment from the Wayback Machine. If you don't want your site to be part of the Wayback Machine, program it so that it can't be snagged by the Wayback Machine. The wayback machine will not be confused with the real thing, and since the HTML / images format of most of the www is inherently unprotectable, content owners have no claim to stop linking and caching. It was the nature of the beast they signed up for.
As a historian and future librarian, one thing has always bothered me about the Internet. Because change is a constant, it's very difficult to keep records. It isn't like newspapers, pamphlets, books, or any other form of written record of the past five thousand years. Unless they're printed out, our writings here leave no physical evidence of their existence. Because I feel that the Internet is as significant as the printing press five centuries ago, the prospect of having no records from its early days is frightening.
We have books from five centuries ago. Will anything here still exist in a readable form five centuries from now? Unless something is done to preserve it, I feel there will be a massive gap in history.
And this is why I do not object to web archives. They are only a half step toward printed, more permanent storage media, but they are preferable to nothing at all.
The way I see it (maybe it's been said before on here; I don't have time to check now), if you put up something on the web that is FREELY available to anyone, you don't exactly lose the rights to it, but you have to expect that people may distribute it around long after your site is down. If you don't want people seeing stuff in a few years' time, don't put it up on your website.
"Save me jebus!" - Homer Simpson (btw, I'm probably talkin out of me arse)
The Wayback Machine recognizes the poster's viewpoint. Not only will they pass over your site for archiving if robots.txt advises so, but they will also make your previous archive entries unavailable until you change your robots.txt policy to allow indexing by web crawlers.
Cheers.
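The hide-rather-than-delete behavior described above can be modeled as a filter applied at viewing time. The snapshots stay in storage, and the archive declines to show any snapshot whose URL the site's *current* robots.txt disallows for its crawler. This is a hypothetical sketch, not archive.org's actual code. `Snapshot` and `visible_snapshots` are invented names, and the crawler name `ia_archiver` is the one mentioned further down this thread.

```python
from dataclasses import dataclass
from urllib.robotparser import RobotFileParser

@dataclass
class Snapshot:
    """Hypothetical stored copy of one page at one point in time."""
    url: str
    timestamp: str
    body: str

def visible_snapshots(snapshots, current_robots_txt, agent="ia_archiver"):
    """Return only the snapshots the site's current robots.txt lets `agent` fetch.

    Nothing is deleted: hidden snapshots remain in the list passed in, so
    they become visible again if the robots.txt rule is later removed.
    """
    parser = RobotFileParser()
    parser.parse(current_robots_txt.splitlines())
    return [s for s in snapshots if parser.can_fetch(agent, s.url)]

archive = [
    Snapshot("http://example.com/", "19981203", "<html>old home page</html>"),
    Snapshot("http://example.com/", "20011015", "<html>newer home page</html>"),
]

blocked = "User-agent: ia_archiver\nDisallow: /\n"
print(len(visible_snapshots(archive, blocked)))  # 0: hidden, not erased
print(len(visible_snapshots(archive, "")))       # 2: empty robots.txt allows all
```

Under this model, removing the robots.txt rule brings the old copies straight back. That matches what one poster below reports seeing.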
I didn't know that the Wayback Machine went that far back. I wonder if anyone is going to go to jail for posts they made in the past....
"Only one thing, is impossible for god: to find any sense in any copyright law on the planet." Mark Twain
Is that the Wayback Machine is part of archive.org, which sits in the same room (and on the same network) as alexa.com, which is probably where they got the web pages. Don't ask too much how I know this; when I was working for another company I used to have contacts with them (for instance, I know their sysadmin on a first-name basis). In other words, archive.org has really been around a lot longer than you think.
So stop freaking out and go back to browsing porn.
Only 'flamers' flame!
Yeah, I don't know why we keep all of those pesky pre-contemporary books around either. Let's get rid of it all. It really bothers me that you would want to rewrite or erase history. Even if your site is just a blog or some crap like that, it tells us something about the mental state of those who came before us, which can help us understand each other.
He who does not know about the dot com bubble is doomed to repeat it.
Browsers by default have a history folder that is only what, 15 days long? Websites rarely last longer than, what, 2 years?
The "internet" seems to be a transitory medium. Unlike paper, digital information is intangible and can easily be wiped and replaced, or edited. A perfect example of this is the way news sites often take their articles offline once they've been up for a week. Or just look at web-tracking software, which shows that links are dying faster than ever before.
If this trend continues (and given the current architecture of the 'net, I don't see it changing) then we might have a serious problem. I won't analyze it, but there is definitely something wrong when data is forever lost after existing for such a short period of time.
------- "From bored to fanboy in 3.8 asian girls" ----------
What happens if it archives a website infected with the Nimda or Code Red worm?
With some of the attitudes about control of information on the web that we're seeing, maybe we should flip the WWW over and call it MMM....
I was just digging through a few hundred pages of information in the Wayback Machine when the site became sluggish. I jokingly told my friends (you know, the kind that live in my head?) that it seemed I was single-handedly slashdotting the site.
*sighs* Seems I had some help...
Anyway, I love the Wayback Machine. Besides being an extremely useful tool, it proves that Zindell was right. Information is never lost, only ever created.
I used to run a Half-Life map review site, and a TFC map review site called "radium". I took my sites down a couple of years ago, and recently some friends pointed out that they showed up on one of these archival sites. I took my sites down for a reason, and didn't appreciate them hovering about on someone else's server without my permission. Say what you will, but I just don't like it. I emailed them and had my property removed from their servers. It took a bit of badgering, but it finally got done.
Scarier than the archives not asking your permission is their connection to Alexa Internet, and Alexa's ownership by Amazon for use as a marketing tool and guide. Also of interest is the change in the privacy notice for Alexa's search tool, from the original aggregated/generalized-data-only version to the newer we-track-who-you-are-and-where-you-go version; old users likely did not notice the change. Bezos is no dope... there's gold in them thar archives....
The long-term plan is to have a copy of the history of the Internet, beyond the power of any single government to censor. To this end, there are copies of the archive at multiple locations around the world.
One of them is in the Bibliotheca Alexandrina, in Egypt. They too have a Wayback Machine. It's jointly operated by the Government of Egypt and the United Nations Educational, Scientific and Cultural Organization (UNESCO). While they will usually honor removal requests, they don't have to do so.
There are plans for two more archive sites around the world, affiliated with major national libraries.
OK, all this "you can copy this, you cannot link to this" etc. junk should end. How should it end, you ask? Well, the guy/body/organization that holds the copyright/patent/pink fuzzy thing that is the HTTP protocol spec needs to include that all content delivered via this method is archivable, linkable, and generally available to be munged with as people desire. Why? Because you're using MY IP, the protocol, to deliver it, and that's what I say. Don't like it? Then use another protocol, and no, HTTPS doesn't count; that's just HTTP over SSL, a protocol in a protocol. So this would leave companies that want to complain about this serving up via HTTP only the things they want open.
Access to this web page is restricted at this time.
Reason:
The Websense category "Proxy Avoidance Systems" is filtered.
URL:
http://web.archive.org/
Never trust a man wearing a coat and tie!
Wayback machines should function exactly like search engines. If there's a robots.txt file, check it. If it tells you to get lost, do so. A search engine is going to cache at least the text part of your site, and you know it. And you can prevent it if you wish. And depending on the engine, it can take months or years to update.
Besides, wayback machines will run into the same snags that search engines do. They can't replicate CGI scripts any better than search engines can, so denying them access to those resources makes sense, for their sake as well as the server's.
I don't know how the Wayback Machine works. At the very least they SHOULD read the robots file. If they do, then I consider most of the copyright issues to be a moot point.
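The crawler-side half of this ("if it tells you to get lost, do so") can be sketched with Python's standard `urllib.robotparser`. The robots.txt contents, the `example.com` URLs and the `may_fetch` helper are illustrative assumptions. `ia_archiver` is the Internet Archive crawler name given elsewhere in this thread. The `/cgi-bin/` rule reflects the point above about keeping bots away from scripts they can't usefully copy.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt: keep the Internet Archive's crawler out entirely,
# and keep every other bot away from CGI scripts it cannot usefully copy.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow: /cgi-bin/
"""

def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if a crawler named user_agent may fetch url under robots_txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

for agent in ("ia_archiver", "Googlebot"):
    for url in ("http://example.com/index.html", "http://example.com/cgi-bin/search"):
        # A polite crawler skips any URL for which this prints False.
        print(agent, url, may_fetch(ROBOTS_TXT, agent, url))
```

With these rules, `ia_archiver` is refused everywhere. Other bots may fetch `/index.html` but not anything under `/cgi-bin/`. All of this binds only crawlers that choose to check.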
-Restil
Play with my webcams and lights here
Anyone else find it mildly disturbing that 1998 is considered to be distant history?
I stole this Sig
It is suspected by many that archive.org also removes archives based on content.
For instance, try accessing news sites back in the days immediately before and after 9/11. It is a very spotty record.
I have seen this for myself as well: a website I am struggling to find the time to build, which has controversial content, was at one time retrievable from archive.org, but no longer is.
For that matter, it seems impossible to get Google to index it anymore either (though they too once included the site.)
Presenting themselves as having a complete record of the Internet's websites, and then selectively deleting or restricting access to sites based on content, is a very pernicious form of censorship. It isn't a First Amendment issue, perhaps, since dotgov presumably isn't the one restricting content, but it is worrisome nonetheless.
Is this truly the only Earth I can live on?
You can't unregister a copyright.
You give a copy of your work to the Library of Congress, and there the evidence sits for eternity, free to be accessed by anyone with a request slip.
The price you pay for copyright protection is public availability and persistence of your old rantings.
--Blair
A few points about why I think the Wayback Machine is good:
We had an old "emergency pager" (read: customer bug calls because they can't get the spell checker to work right at 3am) that was turned off and then hidden so nobody would carry it. We lost the phone number, and then couldn't cancel the pager without it. Wayback had an old copy of our support site with the phone number.
A new web hosting client had 50% of their files on the hosting company's servers that went down in the WTC. Wayback had them. We got all the verbiage from there.
We also occasionally need to point out to a customer what state their website was in when we turned it over to them for maintenance. Having a third party demonstrate that the wiggly email GIF was not ours in the first place helps a lot.
I totally disagree with the original article; the Wayback Machine has some practical uses, and is fun for looking at old cheesy websites. They also seem to be cooperating with people to take out things that need to be taken out. So I have no problem with it at all.
Once you're on the Internet you can never get out. It's a simple fact. Someone will always have a copy of that e-mail you sent professing your love to Missy Gringlebach, or the NNTP post about how brilliant Hitler was, or your website dedicated to New Kids on the Block.
Trying to get that stuff off is futile at best. A professor of mine once said that there is not a nanosecond when some computer isn't processing or storing something about you somewhere. And that was in 1991. I've got to side with McNealy on this. There is no such thing as privacy anymore.
By present standards, no one says that I have to destroy a book after a certain amount of time, or with whom I may share it. No one says I can't print out a website and, within fair use laws, share it with posterity. No one says that I can't take a book whose copyright has expired and post it on the net. The kind of laws that would be necessary to protect online work beyond what is already granted for other works would lead to the kind of legislation promoted by the RIAA and their ilk.
That said, it is scary that everything we say may be saved for the future. There should be some social standard on what can be saved and what can't. I would say that general emails may not be good candidates for archiving, as they are not published (although notice that many people's personal letters do make it into books, so there is some wiggle room here). On the other hand, publicly accessible web pages are pretty much subject to the same archiving standards as other published works. We can certainly pick nits over copyrights, but this is Slashdot after all :-).
"She's a scientist and a lesbian. She's not going to let it slide." Orphan Black
A few things
1) They've been archiving since 1998, but they've only recently had the horsepower to provide a live connection to it.
2) It is very easy to not have your stuff indexed. The directions are here.
OK, clowns, get a grip.
If you print a book, a library is free to put it in its collection. Does it matter if you come out with other versions, or does it matter that more than one person might read it? NO!
I *highly* disrespect the notion that saying "this is Joe's website circa 1998" and showing that exact page to anyone who wants it is a copyright violation. It's libel if I change it, and copyright violation if I take credit for it, but otherwise, I'm more like a library than someone selling illegal copies of a book or movie.
If you don't want it available and reproduced, don't release it, especially digitally.
It's so funny that I've been sending around links to my friends of their old corporate websites for months now. Totally freaks them out.
On a different note, how long until the Wayback Machine is used as evidence in court?
"No, Your Honor, we never posted slanderous comments about XYZ Company." *Oh CRAP! Not the Wayback Machine?!?*
Er, you posted content on the WWW for world+dog to read. After all, that's the purpose of posting said content. And now you're unhappy because folks are reading it?
If you don't want folks reading your stuff, for heavens sake don't post it on the web!
Seems obvious to me, somehow...
If you're a zombie and you know it, bite your friend!
You know what, I actually found that amusing.
I'm not gay, I just hate my life..
Why, somehow, does this strike me as similar to an author having published an utterly bad, horribly stinky book that, later in life, he regrets ever having let see the printing press, and complaining that some people won't turn in their copies to him to destroy now that he wants to unpublish it? Remember that copyright isn't an unlimited right to prevent copies. IMHO most of these archival sites fall into the same category as a library that bought a newspaper, scanned it onto microfilm and then subsequently had the original newspapers destroyed in a flood: they had legitimate access to the originals, the copies were legitimate fair-use copies when made, the originals haven't been transferred to anyone else, the copies remain legitimate fair-use copies.
It may be embarrassing to the creators to have copies of their sites preserved for posterity, but copyright isn't about preventing an author from being embarrassed.
I guess you also object to libraries keeping copies of all those old books that the author doesn't "like" anymore?
There needs to be some sort of archive of the web, like the Library of Congress keeps for books and LexisNexis keeps for printed media. Make it free or payware, I don't care (as long as it is not controlled by a commercial company).
It's called preserving history, the main medium of this page is digital bits, and ironically, it's the most transient compared to all past media.
Those who fail to learn about the past are doomed to repeat it...
There's 10 types of people in this world, those who understand binary and those who don't.
Since material put on the web and made available for free access has no value, there can be no damage due to copying should someone copy it for their own use, or to use it against you in the future.
Your copyright is valid, but valueless.
The author writes: "and who gave them permission to make those copies?"
Honestly, is this a serious question to pose to /.? I don't know that /. is in any way connected to this site. So what's going on here? It sounds like the author is trying to rally public outrage by claiming to be a victim.
Personally, I found the writer just a little bit insulting and selfish. (No offense, but that's how I read it.) To the author, I say: if you have copyright disputes with the site, contact the maintainers. Copyright problems happen all the time, and are handled gracefully and quickly. You don't need my help or /.'s involvement in this.
I suspect the only injury was to the writer's pride. Had there been any commercial loss from the infringement, he would not have used this "wounded bird" ruse in his story.
On the same topic, you might consider a more enlightened view and place your old sites under the GNU Free Documentation License. Details are available at: http://www.fsf.org/copyleft/fdl.html
So, to the author's (seemingly) rhetorical question, I reply: if you are serious, your question is completely off topic.
"I never opted in, why should it be my responsibility to opt out?"
Because the burden of caching/archiving falls on the archiver, not the archived. You are not paying for the long-term storage space. You are not having to wade through an inbox full of junk to get to it.
If there's something you don't want archived, you'd better have a damn good reason for it. Because in 50 years there will otherwise be no evidence that this discussion - ANY discussion, work of art or content that is online RIGHT NOW - ever existed in the first place.
What do you want the record of your generation's online activities to be? A mere footnote saying "no data"? Or an archive that can be browsed, read and appreciated (for good or bad) exactly as it was at the time?
If nobody remembers - why do it in the first place?
The Internet Archive and Alexa were founded more-or-less simultaneously by Brewster Kahle in April 1996. (I'm really surprised you haven't heard of Alexa. It's old news by now.)
Alexa crawls the web with a bot named ia_archiver as part of their site analysis. archive.org and alexa.com are legally separate organizations, but Kahle runs both, and Alexa still donates a copy of everything they crawl to the Archive.
Proud to be / Smiley-free / Since Nineteen / Ninety-Three
While all that is true, proxy servers cache information to re-transmit and nobody complains about that. Don't my Usenet posts from 1990 implicitly have my copyright on them? Where do you draw the line? I say if you put it out there, you should just live with it and let the chips fall where they may. It's more like archeology than copyright theft...
I am not a number! I am a man! And don't you
If you worry about people saving things you've said or produced, maybe you should say/produce better things....
If your grandkids (or a Grand Jury) were to see this, would you be ashamed? (Or be shown guilty of a crime?)
Simply put... watch what you say/do... and leave a good legacy.
(BTW, I understand "kids will be kids," but you have to grow up eventually and take responsibility for your past actions!)
I'm killing 2 quotes with one fact:
"where did they get such old copies of my websites"
and
"I know for a fact that they have pages back at least as far as 1996"
ia_archiver (the bot that collects files for the Internet Archive) was unveiled in September 1996, just a few months after the Archive was founded.
Here's a copy of the original robot announcement from 5 Sep 1996.
Strange that such a complaint would appear within a group espousing that "information wants to be free." :)
Indeed. But there are two things to note about public archives of websites. One is that the archives were obviously started well before they were made available or even announced. That caused the misperception that the web is volatile by nature, and many did not realize how their actions could come back to haunt them. The other is that we'd like to see everyone treated the same: copyright for all or for no one. Usually the big players get all the rights, while the rest can complain or not; it doesn't make a difference. The well-known archives, Google and the Wayback Machine, do honor requests for content to be taken down, but instead of deleting sites from their archives, they only block them. They claim this is for technical reasons, but the suspicion remains that paying customers might still be able to access these blocked parts of the archive. BTW, I'd still opt for the free flow of information, but if that were to be the rule, I'd like to be informed about it at the same time as everybody else, and I wouldn't want exceptions made for anyone.
It's funny the submitter should mention this...because I remember when the people who archived it started archiving it in the first place. A rather big to-do was made about it, as I recall; it was archived as a side-project of the folks at Alexa--you know, the ones who provide the "what's related" technology to Netscape? At the time they started, they didn't know for sure what they would do with it except store it for future generations...but they clearly had some ideas, judging from what they've done with it recently.
As to the poster's complaint about his old stuff being archived...my immediate response is to say, "Well, tough...you should have thought about that before you put your content out there in the open for anybody who wanted to look at it."
I mean, seriously, if you do something in public, you have no reasonable expectation of privacy thereafter.
Editor Emeritus and Senior Writer, TeleRead.org
I dug out my account, which I haven't logged into for a year or so, to reply to this thread. I don't really have time to write this concisely, but hopefully it will reach at least some of you.
The idea of shutting down services like google or archive.org is a virulent stream of bad thought. It is largely predicated on fear and the fact that copyright law has become so perverted.
There is a _very_ strong societal interest in having history. If organizations are not allowed to archive things, then we end up with less history. We (society) are then left depending on organizations to accurately store their records and faithfully give them to us in the future. Historically, organizations have been likely to clean up their history to reflect favorably on themselves (Disney and their WWII propaganda cartoons, for example).
Most organizations do not want to pay the cost of storing and have a general interest in avoiding the liability implicit in storing in this day and age.
The argument that some organization must get permission from _every single_ web page it archives is just insanity. Based on that theory everything comes to a halt. I can't take a picture of folks in Times Square because I need everyone's permission, permission of building owners, permission of advertisers, etc.
This is all so frustrating because the U.S.'s constitutional framers were wary of creating IP rights at all. They feared an overarching monopoly on ideas, which is unfortunately what we are running toward. Surprisingly, copyright wasn't designed to give folks a monopolistic production right. It was designed to give a right ensuring the accuracy of reproductions of your work.
It WAS NOT the idea of copyright that some entity could control production rights EFFECTIVELY FOREVER.
Archive.org obeys robots and you can opt out. They are providing imo a very useful service. They are one of the organizations out there on the front line arguing and fighting battles to preserve _your_ right to history.
I think it is imperative that the Web have archives like the Wayback Machine so people can't go back and erase history. The argument that the Web should be fluid is nice, but what about accuracy? If you have read George Orwell's "1984," you will remember that the main character in the book had a job in which he deleted excerpts from newspapers and media in order to make history match the present. The Web would have made his job a cinch, because all he would have had to do is delete or replace a website if they didn't want people to believe it anymore... With the Wayback Machine taking snapshots, there is no chance that people will be able to erase history. A bit of a rant, but you get the point.
The purpose of copyright is to promote progress, to entice authors and inventors to release their works and discoveries to the public.
But that is not an end unto itself. The true end is the benefit to society that the release of such works brings.
Now, remember that the whole incentive here, the entire reason for granting the monopoly privilege of copyright, is to allow the originators of works to make money from their works, which in turn (theoretically) gives them incentives to release their works to the public.
When you publish something on the web, you're publishing your works for free, unless you go to the extra trouble of implementing some kind of access control. The Wayback Machine won't work on a site that has access control, so all it ends up archiving is stuff that was published for free public consumption.
So the real question is: if a work has already been released for free to the general public, how would letting authors restrict the republication of that work after the fact bring greater benefit to society than not letting the author impose such restrictions?
My opinion is that it is much more beneficial to society as a whole if the release of a work for free public consumption automatically implied that members of the public have the right to redistribute that work. So if an author doesn't want people in the general public to be able to redistribute his work, he has to control who receives the work and who doesn't. Certainly requiring payment for the work in question is sufficient to meet the requirement of controlling access. But whatever method the author chooses, it should be one that makes it clear that the work in question is not being released for free to the public.
Use 'slashdot stuff' in the subject line in any email you send me if you want to get past the spam filter.
really slow proxy server. It's just got lots of options for which cached version you want to see. :-)
is not quite that easy. Although they say that with the correct robots.txt your pages will not be searchable, and they also say that all indexed content will be erased, they do not erase it.
I checked this by removing the robots.txt entries and, bang, the content is back in the machine. I call them a bunch of liars....
Unfortunately the present copyright system encourages defensiveness and the "but where's my share?" mentality.
If the "Wayback machine" is crippled or killed, entire old internet sites would probably be pirated, traded or sold.
BTW: I know I wouldn't want to work for someone who searched my adolescent transgressions, which may be found on school records, police records and who knows where else. It would reflect more on the seeker than the writer, I think.
This is great for getting around the corporate firewall so I can once again browse porn on company time! Woohoo!
This post is licensed under the Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 License.
There are removal instructions at:
http://www.archive.org/internet/remove.html
--
http://www.aikiweb.com - AikiWeb Aikido Information
You're making a problem where one doesn't exist. Why did you even ask this question? What do you care? You should be happy they've archived your website. It doesn't make any difference to anybody. Stop whining about pointless stuff. And no, you can't sue them; if you posted stuff on the web, it's publicly accessible. I can't believe you're even asking this question.
looses != loses
It is very nice to search for wayback porno and remember all of the good wayback porno memories.
Fair use does include provisions for archiving for your own use. Just as you cannot tape Major League Baseball (a "free" broadcast) and rebroadcast it later without consent, the copyright holders of websites have every right to be upset that their websites are being "retransmitted" without their knowledge or consent.
Archive.org does obey robots.txt. Unfortunately, it will still crawl a site even though the robots.txt ban is there. So you have to add them to your .htaccess IP/agent ban list.
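The .htaccess ban described here is Apache-specific. As a rough Python equivalent of the agent half of that idea, here is a minimal WSGI middleware sketch that answers 403 Forbidden to requests whose User-Agent matches a ban list, regardless of what robots.txt says. The ban list, the stand-in `site` app and the `request_status` helper are hypothetical. Like any user-agent filter, it only stops bots that identify themselves honestly, and it does not cover IP-based banning.

```python
# Hypothetical ban list of user-agent substrings; ia_archiver is the
# Internet Archive crawler name mentioned in this thread.
BLOCKED_AGENTS = ("ia_archiver",)

def block_agents(app, blocked=BLOCKED_AGENTS):
    """Wrap a WSGI app so requests whose User-Agent contains a blocked
    substring get 403 Forbidden instead of the page."""
    def middleware(environ, start_response):
        agent = environ.get("HTTP_USER_AGENT", "").lower()
        if any(name.lower() in agent for name in blocked):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        return app(environ, start_response)
    return middleware

def site(environ, start_response):
    """Stand-in for the real site."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello\n"]

application = block_agents(site)

def request_status(app, user_agent: str) -> str:
    """Push one fake GET request through a WSGI app and return its status line."""
    statuses = []
    def start_response(status, headers):
        statuses.append(status)
    environ = {"REQUEST_METHOD": "GET", "PATH_INFO": "/", "HTTP_USER_AGENT": user_agent}
    b"".join(app(environ, start_response))  # consume the body as a server would
    return statuses[0]

print(request_status(application, "ia_archiver"))  # 403 Forbidden
print(request_status(application, "Mozilla/4.0"))  # 200 OK
```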
Additionally, this isn't just the Wayback Machine we are dealing with. Remember, there is a relationship with Alexa. You remember Alexa from their days in the crosshairs of privacy problems, right?
There are so many big questions left hanging about archive.org. I can't figure it out; can you? There is something more going on here. This isn't a normal site.
Unanswered or short-answered questions:
What has archive.org been doing with all that text for all these years? They haven't been sharing it publicly for any time at all.
Have they been selling data (your site) to third parties other than Alexa?
Does archive.org have contractual agreements with any governments?
Who are they feeding? Hmm, collecting data for how long? And only now putting it online.
How are they making any money? Where's the revenue stream to fund such a mass collection?
Who is funding such a massive long-term effort?
Think about how long they have been doing this: since '96, when a good workstation would cost a couple of years' salary. This is a massive, just massive, tech investment that would probably put most of the search engines on the net to shame. Where's it coming from?
Finally, with rogue bots being the #1 problem of many sites, it is time for a robots INCLUSION standard. All bots are banned unless specifically allowed. That is a whole lot different than the deprecated, unworkable joke known as the robots.txt standard (that was never endorsed by any major net organization).
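For what it's worth, a default-deny ("inclusion") policy can already be written in ordinary robots.txt syntax. A named bot gets an empty `Disallow` line, meaning nothing is off-limits to it, and the catch-all record bans everyone else. It still binds only crawlers that choose to honor it, so rogue bots still need server-side blocking. Below is a minimal sketch checked with Python's standard `urllib.robotparser`; the allowlisted bot name and the URL are illustrative assumptions.

```python
from urllib.robotparser import RobotFileParser

# Default-deny policy in ordinary robots.txt syntax.
# An empty Disallow line means "nothing is disallowed" for the named bot;
# the catch-all "*" record shuts out every other compliant bot.
# "Googlebot" is just an example of an allowlisted crawler.
ALLOWLIST_ROBOTS_TXT = """\
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ALLOWLIST_ROBOTS_TXT.splitlines())

for agent in ("Googlebot", "ia_archiver", "OtherBot"):
    print(agent, parser.can_fetch(agent, "http://example.com/page.html"))
# Googlebot True
# ia_archiver False
# OtherBot False
```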
"Welcome to ABC's Monday Night Football. This telecast is for the sole exclusive use of our viewing audience. Any retransmission...."
Why should the web be any different? Copyright is copyright whether it is TV, MP3, or text on a SlashDot story.
/tanstaafl
But it could be embarrassing, in the way that Google's "Complete USENET Archive" is. Reading my posts there from 12-14 years ago makes me wince!
Anyway, I was involved with a site that was pulled down because we got a credible threat of a lawsuit. I'm pleased to see it's in the Wayback Machine!
--
Ask the Ya-Hoot Oracle Anything!
It's fair use to keep a personal copy in your browser's cache. It's arguably not fair use to redistribute that copy to millions of others through the Wayback machine.
One thing that may affect copyright claims is that it's not correct about the pages given the dates. I just checked a former employer, and the page that the WayBack Machine said was from Dec. 1998 had a 1999/2000 copyright notice, and announced a product I know was not available in 1998.
So a copyright holder could claim the Wayback Machine misrepresents their site.
... which anyone can listen to.
Do you use caution when speaking into a microphone? Why?
Anything you publish can be used against you. Data wants to be free, remember?
=brian
Man, I've found pages from old porn sites I worked on that never made it out of the fuckin' ISP. (Management troubles, a.k.a. internecine warfare.)
What a STUPENDOUS waste of storage.
Who the fuck paid for all these drives?
Do his doctors know he's off his meds?
Could I get him to donate a few terabytes to my boxen?
MSBPodcast.com The opinions expressed here are my own. If you don't like 'em... Think up your own stuff.
In my opinion, when you post publicly on the web, you are essentially saying "This is public information, it may be copyrighted, but it is public". Then it's a question of whether or not the Wayback Machine is considered "fair use", and I believe it is. If it is, then you can't stop them. End of discussion, right?
Now, if you don't want this stuff to be publicly accessible on the web, there is now a precedent (set by Google) for SSL sites. There is also the robots.txt convention you mentioned.
The only real issue I see with the archival sites is "How do they know that domain ownership changed hands?" If a porn site comes along and buys the domain after you're done with it, how does the Wayback Machine protect you from consequential damages that might arise?
I don't know... But I do know that the web and the internet in general was never intended for privacy or copyright, as such, and maybe we just need a new protocol?
Dave
If you want to OPT OUT, then don't put it up on the net. The NET is a public utility: put content out there and expect it to be accessed, cached, and backed up in numerous ways by LOTS of individuals, intentionally or unintentionally. If you want your data private, DON'T put it on the net. Seems fairly straightforward and simple.
errr....umm...*whooosh* *whoosh* Is this thing on ?
So the wayback machine is just fine by me.
I'm of the belief that if you put something online, it's public domain for anyone to do whatever they want with it.
Information wants to be free as long as it doesn't involve me or what I don't want to see... then I want total control, because I am a hypocrite.
Yesterday I used the Wayback Machine for one of the lawyers at the law firm I work at to prove that a company at one point had an office in a certain location. The company in question was trying to duck out of a contracted agreement by saying they were not the people who signed the contract.
The Wayback Machine proved that they indeed knew of, approved, and granted authorization to this specific office, and the other people had a valid contract. In this specific case, the Wayback Machine prevented an apparently scumbag company from trying to screw some apparently good people over.
Kickstart
This wayback machine is invaluable!
I was able to travel back to the early days of internet pr0n (click here to launch sex.com from '96) and research ancient authentication methods including "Click here if you are over 18".
http://www.archive.org/internet/remove.html
There are really two issues: 1. Should the archives be made? Which is what everyone seems to be discussing, and 2. Should the archives be publicly accessible?
I agree that any interpretation of copyright law that says the answer to "1" is "No" means that copyright law needs to be changed, not that it is "illegal and therefore immoral".
But a case can be made for "2" that distribution should only happen once copyright on the material has either expired or could reasonably be expected to have expired. Which brings up two other issues: the absurd length of copyright terms, and the near impossibility of determining whether material is still copyrighted.
So, I don't have any answers, just better questions.
If anyone has ever heard of the Library of Alexandria, it was supposedly the most impressive knowledge base the world had ever assembled. Some crazy guy came by and burnt it to the ground -- setting the entire industrialized planet back hundreds, perhaps thousands, of years. We are now in the process of surpassing this great library, and of making it even easier for people to have access to knowledge. That knowledge may be porn, the morning news, or sports scores; it may even be how to construct a nuclear bomb. Nevertheless, it is knowledge, and EVERY person who is alive has the God (and any other higher power) given right to knowledge, despite what any government agency or copyright may say. 21st-century libraries such as the Wayback Machine are providing the tools necessary for researchers to go "back to the future." This is a great service to mankind, and its overall importance should not be outweighed by greedy and/or overparanoid privacy rights activists. If you do not wish to be known, please do not post any information on the web, move to the jungles of Africa, and step away from a time and place known as the PRESENT.
What would happen if the wayback machine starts archiving its own site?
Would be an archive site that kept versions of news articles before and after they were changed by editors. Often, an article making allegations of corruption or bad intent gets changed shortly after it is published, and the replacement gives a more neutral stance, which doesn't give readers the whole story anymore, and in many instances makes the story a non-story, leading me to wonder why it was even published in the first place.
You see? You see? Your stupid minds! Stupid! Stupid!
A friend of mine discovered that Google Groups, when searching on his name, is reporting lots of spoofed postings to Usenet under his name. Really asinine, weird stuff. Not at all my friend's style, but a prospective employer might not understand this or even give him an opportunity to explain. Not only that, but there are tons of follow-ups quoting the articles and attributing them to him. I don't know what the hell he is going to do about it, since Google says it's his problem. Get all the follow-up posters to understand and remove the flame-fest posts? Ha! Is Google a publisher or a republisher of this, I wonder? Anyone with a legal opinion? It certainly could damage his reputation when people do searches about him. Life in the new era, yikes!
There's a difference between copies made as a necessary part of reading, and the copies made by the Wayback Machine. The very nature of the Internet means that an intermediate copy must be made to read a web page. That's a fair use copy. A browser cache copy makes reloading the page more convenient for the user, and doesn't give profits to anyone--again, fair use.
Retaining a copy indefinitely and serving it up to other users isn't the same thing at all. The Wayback Machine isn't a necessary part of using the Internet. With the addition of a banner ad, it instantly becomes an income-generating enterprise. In that case, clearly there's a copyright issue, because they would be profiting from the work of others without compensation.
By putting up a web page, I'm not giving any permission for people to copy it beyond those copies strictly required to view it in the first place. Putting something up for public view does not place it in the public domain! U.S. law is quite clear on that count.
If I put my garden tractor on the front lawn, where others can see it, does that give my next door neighbor the right to come take it and use it on his lawn without asking? Nope.
Why are libraries different from Wayback Machine? Photocopies are expensive. It wouldn't be cost effective to photocopy a whole book. It'd be cheaper to buy a new copy. The costs of copying a web page are much lower... so there's no disincentive that keeps people from violating copyright flagrantly. In this case, though, it's not about profit like it is with the record companies and MP3s. It's about an author's right to decide who may profit from his work. Even if Wayback Machine isn't in it for the money, their reputation profits from other people's work. At best, this is a shady practice.
A better analogy would be: Wayback Machine is a public library consisting of photocopies of books. Anyone may check out books. It costs them nothing to do so. No profit is made... but photocopying the book in the first place was still illegal, because of copyright. Once the library buys a book, the First Sale Doctrine says they can lend it out. There's a consideration paid for the work. Wayback Machine isn't giving any monetary consideration for their use... and they aren't even being polite and asking permission!
(If someone sets up a tent on your lawn and camps out, do you think the cops will not arrest them for trespassing if they say "hey, the property owner never said we couldn't!" ? )
Like it or not, copyright is the law. Everything created in the U.S. has copyright invested in its author from the moment of creation until the copyright expires (if Congress ever permits that again) or until the author explicitly places it in the public domain. Publishing it, whether on paper or electronically, doesn't put it in the public domain. If I printed out my web page and handed the printouts to passersby on the street, I'm not giving away my copyright on the work.
Strange that such a complaint would appear within a group espousing that "information wants to be free." :)
Not strange at all.
Slashdot is not populated by a bunch of lockstepping conformists. Its postership is large and diverse. The individuals are NOT the average, nor are they the stereotype.
Perhaps on the average the posters think that IP laws are 'way too tight. But some think they're too loose. Post an article about somebody making them tighter and the make-em-loosers will complain, post one about somebody apparently not respecting them at all and the make-em-tighters will sound off.
Further: Few if any Slashdot posters think a published author has no rights at all over the distribution of his work. (How would Copyleft work if that were true? B-) ) So when it looks like a service may be copying and republishing past works far beyond the authors' intended distribution they may sound off.
And even the most fanatic of the "information wants to be free" faction may still post a cautionary note about how a particular act of radically freeing it may attract opposition.
Which seems to be what happened here.
Bantam Dominique roosters crow a four-note song. Once you've heard it as "Happy BIRTHday" you can't NOT hear it that way
No, all the mp3 snorkers out there now KNOW what it's like to have their warez stolen. It's exactly the same thing, just with a different set of files. If it is the Library of Congress, they made themselves exempt from these sorts of laws YEARS ago. They want to keep large volumes of recorded, non-recorded, digital, or whatever material to help preserve our history as a nation... What they are doing is a good thing, even if there is copyrighted work on there. You can get the same sort of service just by going to your local public library and requesting a copy of something from them. They may charge you to cover the shipping costs, but they will do it. They have been trying for years to make it easier and easier to get at their content. Think I'll go try mp3.com on there.
As an interesting aside to this, William Powell, author of everyone's favorite tome of tyranny, the Anarchist Cookbook, publicly denounced the book. There used to be a note from the author on Amazon urging people not to buy the book, but I see that it's been removed. Guess Barricade Books wasn't too fond of his idea.
I've got some encrypted messages which I posted a long time ago that have been archived. I'm not going to tell you where, you'll have to find them yourself. Their contents are not catastrophically embarrassing, but they're definitely not something I would enjoy having out there completely in public view - hence the encryption.
My problem is: the encryption I used when I was working on a 386 is now trivial to decrypt on modern machinery - potentially rendering my messages fully in the public view - at least to anyone who is marginally motivated.
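To put "trivial to decrypt" in rough numbers, here is a back-of-the-envelope sketch in Python. It assumes a symmetric cipher attacked by exhaustive key search and a hypothetical attacker testing one billion keys per second; the key sizes (40-bit, like the export-grade crypto common in the 1990s, 56-bit, the size of a DES key, and 128-bit) are illustrative, since the post doesn't say which cipher was used.

```python
# Back-of-the-envelope brute-force estimate for symmetric keys.
# ASSUMPTION: a hypothetical attacker who tests one billion keys per second.
# Real rates depend heavily on the cipher and the hardware.
TRIALS_PER_SECOND = 10**9
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def brute_force_seconds(key_bits, trials_per_second=TRIALS_PER_SECOND):
    """Worst-case seconds needed to try every key in a key_bits-bit key space."""
    return 2**key_bits / trials_per_second


if __name__ == "__main__":
    # Illustrative key sizes: 40-bit (1990s export grade), 56-bit (DES), 128-bit.
    for bits in (40, 56, 128):
        seconds = brute_force_seconds(bits)
        years = seconds / SECONDS_PER_YEAR
        print(f"{bits:3d}-bit key: {seconds:.3g} seconds, about {years:.3g} years")
```

Under those assumptions a 40-bit key space is exhausted in about 18 minutes and a 56-bit one in a little over two years, while a 128-bit key space stays far out of reach. Each extra bit doubles the work, which is why keys that were a real obstacle on a 386 can be cheap to search on later hardware.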
If Internet archiving is more than a passing trend, I urge you to be very careful about what you put online - period. Encrypted content may be safe now, but when you're applying for a job 20 years down the road and your potential employer can view all of your PGP'd email from today, you might have one less job opportunity.
I'm not even inclined to entertain thoughts about how bad things could get for you if the changing climate of politics were to count your antiquated encrypted correspondence as disloyal.
Waaaah. Waaah. Civil Lib-babies want to opt out of consequences!
So making archival copies of copyrighted material on the Web is bad, but making copies of other copyrighted material (music, etc.) is okay?
Boo hoo - how dare somebody copy my Web site! The nerve of them!
If you don't want your data to be cached, then put it behind a password-protected site.
You put something on the internet, and it's going to be cached by a lot of places. Some places may dump that cache weekly (proxy servers), or they may keep it up for a while (Google).
Seems to me that everyone is a little confused. Any web site is like a book. Once it's written and posted, it's out in the world. So if this opt-in/out conversation is carried all the way, I guess we can burn the Bible/Koran/Buddhist texts... etc., since they were scribed in the past. The Net NEEDS an archive of past sites/pages/texts. Without it, how can one quote/research past ideas/thoughts and back it up with an actual document? Someone PLEASE explain to me why a site that is in effect a library of history is causing such a stir... I just don't get it. If retaining history is such a bad idea... let's burn all books and forget about the past.... OH... unlike 'spam', you don't have to look at it... and every browser I know has a history... so do I need your permission for that too? I think I see a class action suit against all browser makers...
/. does suck these days, nothing but a**es like me
Deal with it people...or stop writing something you consider "private" (PERIOD)
NOW GET OVER IT FOR F**K's SAKE.
oh, btw, yes
(censored by me, typos and spelling mistakes by me too, deal with that)
Remember that once you've made a public statement in the real world, it's out there and there's nothing anyone can really do about it. You can issue(*) a later correction, retraction, clarification, or whatever, but the original doesn't go away, despite what politicians and other public figures might wish. Now that we all have access to the world's screens, we need to be careful what we say if we care about later consequences.
(*) There are other aspects of the Web that make this difficult. See the "Related Projects..." section at crit.org for more.
(apologies if I'm fuzzy on any details)
Credo sim. - I think I am.
Putting something on the web is 'opting in' to allowing people to look at it, index it, and save it if they wish. Do you have a problem with search engines? If not, you shouldn't be complaining about this. Personally, I think the WBM is a good thing (tm), but I think webmasters should be able to opt out if they want, for whatever reason. I don't know for sure, but maybe the WBM reads robots.txt?
...until the Wayback machine archives Google...and Google caches the Wayback machine...and the Wayback machine archives Google...and the entire internet gets sucked into a singularity.
1) I was glad that they had one of my old pages on there. I lost it due to a crash (my brain crashed and I wiped it out). I was able to pull it back off their site and get it right back running.
2) Are we not the same collective group that gets mad at NBC for not wanting us to use our TiVos? I realize that there are a crapload of people on /., but sometimes the irony is just too much.
I wonder if the archives contain any information that the government has since taken down in the wake of 9/11?
The wayback machine goes back to 1998. Upon searching, the slashdot search engine goes back to 25558. Here's a link. Problem is, with the older slashdot archive articles, there's NO YEAR. :-( I don't wanna go back through each page marking every december month, trying to figure out about where they occur, counting the years backwards.. someone else maybe?
-DrkShadow
Arguing otherwise is like saying retaining old copies of magazines after the new ones have come out is an infringing use of those magazines.
I found some information on the Wayback that I would really like to archive myself - for legally defensive reasons (i.e. trademark use, and to kill patents).
:)
Is there a way to archive sites from the Wayback machine in a clean (linked) way? I tried using standard web downloaders (Webreaper, Offline Explorer), but they didn't work correctly. Their FAQ says it can't be done, but for some reason I don't believe them...
Anyone have advice? Thanks.
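As one small building block for the kind of tool asked about above, here is a sketch in Python that splits a Wayback snapshot URL into its capture date and the original URL. It assumes the address form seen in links posted in this thread: http://web.archive.org/web/, then a 14-digit YYYYMMDDhhmmss timestamp, then the original URL. The function name parse_snapshot_url and the example.com address are hypothetical, and URLs in any other form simply won't match.

```python
import re
from datetime import datetime

# ASSUMPTION: snapshot URLs of the form
#   http://web.archive.org/web/<YYYYMMDDhhmmss>/<original URL>
SNAPSHOT_RE = re.compile(r"^https?://web\.archive\.org/web/(\d{14})/(.+)$")


def parse_snapshot_url(url):
    """Split a Wayback snapshot URL into (capture datetime, original URL).

    Returns None if the URL doesn't match the assumed pattern.
    """
    match = SNAPSHOT_RE.match(url)
    if match is None:
        return None
    stamp, original = match.groups()
    return datetime.strptime(stamp, "%Y%m%d%H%M%S"), original


captured, original = parse_snapshot_url(
    "http://web.archive.org/web/19991008013724/http://www.example.com/"
)
print(captured, original)
```

A downloader could use the recovered original URL to rewrite links between archived pages back to local copies, and the timestamp to pick the snapshot closest to a given date.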
Just because you are too lazy to create a robots.txt doesn't make me feel sorry for you. I have had nothing but good luck with the Wayback Machine. I was able to find work I did 4 years ago that I thought was long gone. I was able to find phone numbers to people who had taken down their web sites. I was able to research press releases and license agreements that had been changed by the authors (without telling anyone!).
So, in my opinion, the Wayback Machine is a great tool to data-mine the past. Just because you don't like it -- tough shit. Create a robots.txt file and maybe you won't get spidered. Your argument is weak.
IANAL, so I can't comment on the legality of archive.org. However, based upon my own sense of "fair play", I think that if you put information on a public web server and allow people free access to it (as opposed to making people pay to view it), you can hardly cry foul when somebody actually makes use of the information. If you didn't want the information to be distributed across the Internet, then why did you post it there in the first place?
I could see you having a beef if somebody took what you put on your web page, copied it, and claimed it as their own work. That is wrong. However, archive.org doesn't claim to have authored the pages. Visitors know up front that it is an _archive_.
------
www.moneybythenumbers.com
If Wayback Machine is slashdotted, can I find their archive in Google cache? And does the Wayback Machine archive items from Google cache? And does Google cache the Wayback archives of its cache? And does Wayback archive the Goog**stack overflow**
"Only the small secrets need to be protected. The big ones are kept secret by public incredulity." - Marshall McLuhan
Have you been stalked by Seth today?
Ok... a lot of people here seem to be bringing up robots.txt as a solution, as if archiving were the same as indexing. It isn't.
People are saying that it is like libraries being asked to pull papers, journals, books, etc. if the publisher no longer wants them published. It isn't.
First, indexing is not making a copy of the pages at all. Most indexes work via a dictionary method and store at most phrases of about 3 words. Those words/phrases are entered into a database. When you search for "microsoft windows", the database looks up that key and sees which URLs contain that word/phrase; it does that for all the words, phrases, and subphrases and then computes the join of all of the records... the result is what you see in your search results (basic theory; everyone does it their own way). Even if one could consider this archiving (operating only on words or short phrases), it would certainly fall under fair use. What the Wayback Machine is doing is copying WHOLE documents (questionable) and then REPUBLISHING them (just plain wrong, and in violation of copyright unless the author gave them permission or placed the document in the public domain). So now that we have established that indexing and archiving are different, how is robots.txt going to help? If I use it to prevent archiving, I apparently prevent indexing too; that is not a real solution. (Note that in this case I should need to opt in to archiving, as the default under copyright is to NOT allow others to copy my work.)
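The dictionary-style index described above can be sketched in a few lines of Python. This is a simplified illustration, not how any particular search engine works: it stores single words rather than phrases of up to three words, and the function names and example URLs are hypothetical. It does show the post's point: the index keeps only words mapped to URLs, never the page text, and a query is answered by intersecting (joining) the URL sets for each word.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase text and split it into simple alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Build an inverted index: word -> set of URLs containing that word.

    Only the words are kept; the page text itself is discarded.
    """
    index = defaultdict(set)
    for url, text in docs.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every word in the query.

    This is the "join" step: intersect the URL sets of all query words.
    """
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], ()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical example pages.
docs = {
    "http://example.com/a": "Microsoft Windows update released",
    "http://example.com/b": "Running Windows programs on Linux",
    "http://example.com/c": "Linux kernel news",
}
index = build_index(docs)
print(sorted(search(index, "microsoft windows")))  # ['http://example.com/a']
```

Page b mentions Windows but not Microsoft, so it drops out at the intersection step; nothing in the index would let you reconstruct either page.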
Ok, now to address the issue of libraries (or individuals, for that matter). It is quite obvious that a publisher cannot reach into your home or library and pull back something you (or they) purchased (well, unless it has a EULA which says they can, but that's a different flame). What they do prevent is copying, regardless of whether the publisher stopped printing it or not. If your library or you started copying entire papers, books, or software and giving them out to whoever asked, and were discovered, do you really believe that you would be allowed to continue? (Makes a lot more sense when you look at it like software, doesn't it? The same laws apply to both.)
Ok, so what is the solution? The simplest... keep archiving it, and publish it in 70 years, or 90, or whenever the copyright expires. No it isn't "what people want", it is however the "right" solution (legally).
If, say, a federal agency or someone was tracking a child pornographer who had erased his site's content along the way, and archive.org had archived bits of the site, they could find enough evidence to take the owner of the website to court and perhaps jail him for life for distributing child pornography.
My ancient vanity site that received no traffic, nor deserved any, has been duly archived. I'm dying of embarrassment at my rudimentary HTML back in the day.
My question is why I was even on their radar?
Those that suggest you "dance like no one is watching" really want to see you make a complete fool of yourself.
If something is put out there for everyone, then why shouldn't there be an archive system of some kind? All it seems to be is a simple historical record of what has gone on.
Complaining would be like Disney complaining about "Steamboat Willie" being played again & again. Unlike Disney, you can't show any loss of revenue from a quick replay....
"Good, bad, I'm the guy with the gun."
Wayback is new? If you have been on the net for any time at all, you would know that the Wayback project was started many years before your website. They have been capturing this data for at least the last five years now.
You don't lose control of your copyright because Wayback has a copy. They have a copy made under fair use for research purposes, which is how libraries can copy vast amounts of work for college courses and research.
... you are a moron. You publish stuff, it gets seen and archived. Tough. Can't be helped, move on.
I am sure Dave Winer would not find the previous post ironic, nor funny.
You have the right to something once you download it?
If I copyright my content, other people are not allowed to distribute it without my consent. There is no way around this. I don't have to add extra disclaimers, just a copyright notice. How can there be any argument about this?
Ok, someone GPLs some software they wrote and puts it on their website. If you download a compiled version of the software, you can't redistribute the compiled executable without making the source available. Why? Because the copyright owner (via the GPL) only gives you permission to redistribute if you also make the source available. The owner can do this because the GPL is backed by copyright law, just like copyrighted web content. Notice I said owner, because the law grants special privileges to people who create content and copyright it. There is no implied social contract that says the content is up for grabs. And there is also no reason fair use even comes close to applying if you are talking about a large quantity of content.
I do think the archive provides a useful service, but I think they are on shaky legal ground.
Personally, I think that society will very soon progress to a stage where the line between human memory (in our brains, information stored by biological reactions) and computer memory (information stored in bits and bytes, currently on magnetic disk drives and silicon-substrate "memory" chips, but feasibly in the future on biological or sub-atomic storage arrays) will become so blurred that society will cease to differentiate between the two.
I ask you to ponder the difference between the copies of the website/BBS/Usenet comments everybody is so paranoid about that are stored in a reader's human memory (admittedly inaccurate in 99.9999% of cases) and the copy stored (perfectly) by electronic means. Is it a breach of copyright if I tell my friend what you said on Usenet ten years ago? What if I forward him the post?
As a species we have progressed by learning from our parents; if in the animal world parents were to refuse to teach their young, the young would die very quickly. If we had had to re-invent the wheel every generation, I'm sure I wouldn't be posting this comment now. We have progressed as a species by passing on information, and electronic copies of data are merely an extension of our own memories, with the advantage of being a lot more accurate than human memory. This, however, doesn't address the line between what is public and what is private....
And what purpose does that serve, other than to discriminate against people? Would you hire a Usenet troll? What if he was trolling alt.os.windows? Would you hire someone who got into flame wars? What if they were flame wars against the win-trolls? Would you not hire someone because they subscribed to a Scientology group? What about a gay/lesbian group?
Using any evaluation tool which places any applicant at a disadvantage is not only unscrupulous, but also against employment laws in the US.
Perhaps you did it to find the people who blasted you out of ng's for being a kook and wanted to hurt those people like they hurt your feelings.
OOOOOh, maybe you based employment on misspelling on Usenet. All those damn illiterate people who wrongly understood Usenet to be for informal discussions and didn't mind their P's & Q's, dot their i's, or cross their t's.
Well I guess you wouldn't hire me because you would have seen all the people I called "fairy-mushroom-rapists" and "meal-worm-cornhole-fucker." But that is good, because I really wouldn't want to work for a "tutu-wearing-mole-humping" net kook like you.
PS.... I work for you, you "snivel-shit-cavitated-peanut-headed-half-wit."
Slashdot stories.
has a way back machine, so it must be good.
Obscure reference only hard-core News Radio fans would get, but not necessarily find funny...
transaction companies decide to integrate their historical transaction databases.
That way, when this game is over, we get all of our money back.
?sp
The internet is a medium for sharing information. It was created for military and later for educational sharing of data and other information. Commercialization of the internet and copyrighted content is nothing but a bastardization of it. Simple truth: the internet is a giant collection of stacks of papers. If you grab a copy of one, no big deal; it was provided without cost anyway. As long as you don't claim that you own it or that you created it if you didn't, there's nothing wrong with it. After all, facts cannot be copyrighted. If someone creates an archive of the internet and makes it available freely, I see no reason whatsoever for anyone to object without a flagrantly correct reason why not (e.g. you indexed my passworded site, you publicly published my email that I didn't make public, etc.). But when you make something freely available to the public over the internet, you have no justification for complaining if someone passes it around in its original form (or as an unmodified text doc of it, if it's something that can be distributed in that manner, for that matter).
In SOVIET RUSSIA... erm...NSA AMERICA, the Internet logs onto YOU!
The bigger issue is the rudeness of the archive in ignoring robots.txt and rifling through files that one does not wish to have linked or accessed (e.g. stuff under development that isn't ready for 'prime time' yet).
got such old data you'll read the above message.
http://web.archive.org/web/19991008013724/http://www.goatse.cx/
Goatse.cx from 1999!!!
I am an adoptee.
I posted a message on May 17th, 1989, on what was then the FidoNet ADOPTION message board. It was then gated to UseNet, and sent presumably across the world, seeking my birth family.
Because my mother found that message, in a Usenet archive, on the Internet, on May 17th, 1999 (ten years later), we have met. I know who my family is.
If you think archives are bad, Fuck You.
"Champagne for my real friends - and real pain for my sham friends!" http://ericblade.postalboard.com/
I'm sorry, this seems ignorant to me.
You can't just put information on the internet and only have people using it the way you want. Once information is on the internet, available to the general public, it's no longer your concern what is done with it. Saying it's wrong to archive something you've found on the web is almost exactly the same as the RIAA saying you can't convert a CD to mp3 for archival purposes... Heck, it's even worse because we're supposedly talking about *FREE* information here (I'm assuming the Wayback machine doesn't try to crack through protection in order to archive private data, but correct me if I'm wrong).
It's your own responsibility (and it always has been) to make copyright information available alongside the documents being accessed, that way this data will be archived along with the rest.
If you want to put something on the web, you have to deal with the pros and cons. You can't have it one way without the other.
"But the cars are all flashing me, bright lights are passing me, I feel life passing me by" - Stiff Little Fingers
I sincerely hope that they don't ever really delete things, and that they ignore robots.txt as far as archiving goes. It's fine for them to not serve back your pages if you ask them not to. For a while. Say, until you are long dead.
But this information might be interesting to future generations, and frankly, any librarian or archivist owes more to those unborn people than they have any obligation to obey your transitory wishes.
Copyright laws change.
Oblivion is forever.
If I travel back to 1994 and I'm using Mosaic in 1993, I could go look at my old Mosaic web page. But if I'm still using Mosaic in 1993, how could I have loaded the Wayback JavaScript page in '02 and traveled back to '94? Oh, no, I've gone cross-eyed!
Relive an old browser at http://dejavu.org/
Not only does it respect robots.txt, but it does so retroactively. In other words, if you create a robots.txt which blocks your site today, all previous content will be blocked the next time they spider your site.
I know this because we had a robots.txt blocking everything on our site during some new development when the Wayback Machine was announced, and found that we couldn't access anything. Fixed up the robots.txt and now we have archives going back to 1996.
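For webmasters who want to try the robots.txt route, here is a sketch using Python's standard urllib.robotparser module to check what a given robots.txt allows. The file contents are hypothetical: they block a crawler identifying itself as ia_archiver (an assumption: that is the user-agent the Internet Archive's removal instructions have historically named, so check the removal page linked earlier in this thread) while letting every other robot in except under /dev/. This only shows how a well-behaved crawler reads the rules; it says nothing about the retroactive blocking described above, which happens on the archive's side.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt. The first record targets the Internet Archive's
# crawler by user-agent (ASSUMPTION: "ia_archiver", as historically named in
# the Archive's removal instructions); the "*" record covers everyone else.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow: /dev/
"""


def make_parser(robots_txt):
    """Parse robots.txt text supplied as a string (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = make_parser(ROBOTS_TXT)
for agent, url in [
    ("ia_archiver", "http://example.com/index.html"),
    ("Googlebot", "http://example.com/index.html"),
    ("Googlebot", "http://example.com/dev/new.html"),
]:
    print(agent, url, parser.can_fetch(agent, url))
```

Because rules are matched per user-agent, a file like this is one way a site could try to stay out of the archive while remaining in search indexes, assuming each crawler honors its own record.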
I don't care if it's a simple test page or a great work of art -- if someone made it, it's theirs. It's not public domain. It's not free-for-all.
When you post something on the web, you don't forfeit copyright. Since when does robots.txt supersede copyright? That's ridiculous. I have pages with copyright notices in the Wayback Machine -- they chose to ignore them.
Web authors depend on copyright laws. Open source software depends on copyright laws. The only way you can enforce GPL is if you have strong copyright laws.
The web is dynamic, immediate, and conversational. People express their ideas freely. This is the way it should be. An archive threatens that freedom. Have you ever pulled out a video camera and watched people's behaviour change? They act a little differently when they're being recorded, right? I think content on the web might change too when people find out that everything is being stored in an archive.
I don't have a problem with the concept of an archive, I just have a problem with the Wayback Machine's implementation. I appreciate the desire to preserve knowledge and information, but an archive needs to be made openly with the cooperation of web authors and administrators, not clandestinely by a third party. It needs to be "Opt-In" only. Right now it's without the knowledge or consent of the site owners.
Here's the oldest copy of Slashdot that seemed to work on the Wayback Machine: Nov. 11, 1998. It doesn't look that much different design-wise, but the atmosphere of the comments seems to be significantly different.
The whole list.
It has allowed me to go back and view a website I designed for NTT (the Japanese telco) back in 1998. As it appears that internet avatars aren't part of their business anymore, it's nice to be able to show people, "Hey look, I did this!" :)
Also, it allows me to go back and laugh at failed post-production companies that had websites. (www.brickhouse-editorial.com)
It's mandatory to wash your hands before returning to the land of Dairy Queen.
PacBell just jacked up my DSL price from $39 to $49 a month. When I called to ask why, they lied and said the price was a special promotion.
Considering I signed up in March of 2000, how the hell should I remember?
Checking back in the archive I can find that they never said it was a promotion at all. They had a promotion where they gave away a free computer....I never got a free computer.
There seem to be two main objections. Concern over copyright violations? These are all items that were freely available to the public. They are also all no longer available. Sounds like abandonware. If you want to make your work private, put up a password.
The other concern seems to be people embarrassed over something they used to have up. Who publishes something with the expectation that they will be able to disavow the publication in the future? Don't write it if you think you might be embarrassed by it later.
I propose that the possible positive impacts arising from consumer protection vastly outweigh the concerns over
1) a non-profit making free copies of something previously provided for people to freely copy.
2) poster's remorse (to poorly coin a term)
Here's /.'s history: /. around December 20, 1997.
If you're religishitty, KILL YOURSELF!
Ages ago, I put a quick bit of JavaScript in the HEAD of my webpages that checked the URL and, if it wasn't as expected, booted you to where you should be.
As a result, when I heard about the Wayback Machine and tried to view old copies of my website I got booted to the current one...
Oh well.
~Fizzgig
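Fizzgig's actual script isn't shown, so here is only a rough sketch of the idea, written in Python rather than the original JavaScript: compare the host in the current URL with the expected one and, if they differ, send the visitor to the live site. `redirect_target` and `www.example.com` are hypothetical placeholders. Because Wayback serves its copies from web.archive.org, a check like this fails on every archived page, which would explain why the old versions bounce straight to the current site.

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit

# Hypothetical: the host these pages are meant to be served from.
EXPECTED_HOST = "www.example.com"


def redirect_target(current_url: str, expected_host: str = EXPECTED_HOST) -> Optional[str]:
    """Return where to send the visitor, or None if the page is already on the expected host."""
    if urlsplit(current_url).hostname == expected_host:
        return None  # served from the right place: leave the visitor alone
    # Any other host (a mirror, a cache, an archive) gets bounced to the live site's root.
    return urlunsplit(("http", expected_host, "/", "", ""))


# An archived copy is served from web.archive.org, so the check fails and redirects.
print(redirect_target("http://www.example.com/index.html"))  # None
print(redirect_target("http://web.archive.org/web/19990101000000/http://www.example.com/"))
# http://www.example.com/
```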
I read an article about the site. The project has actually been running since 1998; that's when they started collecting people's websites and adding hardware to their 'collective' to store all the data. They only made the site public around 2001 (or whenever it was), despite collecting for so long.
I think if you use the Wayback Machine to go back to their own site in 1998/1999 their front page tells you this.
"Hey! Unless this is a nude love-in, get the hell off my property!!"
I'm sure the reason Slashdot posted this story is that they knew it would be flamebait. And could you think of an easier way to increase traffic to a site like Slashdot, if its parent company complained to the Slashdot founders that the site's banner click-through numbers had dropped? Under the threat of losing funding for this money-consuming venture, I'm sure the Slashdot founders would act like OSDN lapdogs.
And, like a true democracy, this article will be moderated out of existence.
Two comments from me: 1) What do you have to hide that you're so bothered by archives? 2) OK, so now we go and copyright history as well?
Actually, it seems the Wayback Machine also contains wrong content. For my site (www.wastl.net) it shows the default start page of MS IIS for the year 2000. However, the site has run under Linux ever since it went up, and I have always had full control over DNS...
That makes me think even more. Actually, this is a kind of forgery.
Sebastian
http://web.archive.org/web/19980113191222/http://slashdot.org/
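The 14-digit run in a Wayback link is the snapshot's capture time in YYYYMMDDhhmmss form, so the link above points at a copy of Slashdot taken on January 13, 1998. A minimal Python sketch that splits such a link into its capture time and original URL, assuming the `/web/<timestamp>/<url>` layout seen here; the function name `capture_time` is hypothetical.

```python
from datetime import datetime
from typing import Tuple
from urllib.parse import urlsplit


def capture_time(wayback_url: str) -> Tuple[datetime, str]:
    """Split a Wayback snapshot URL into (capture time, original URL).

    Assumes the form http://web.archive.org/web/<14-digit timestamp>/<original URL>.
    """
    path = urlsplit(wayback_url).path  # e.g. "/web/19980113191222/http://slashdot.org/"
    _, web, stamp, original = path.split("/", 3)
    if web != "web" or len(stamp) != 14 or not stamp.isdigit():
        raise ValueError(f"not a snapshot URL: {wayback_url}")
    return datetime.strptime(stamp, "%Y%m%d%H%M%S"), original


when, original = capture_time("http://web.archive.org/web/19980113191222/http://slashdot.org/")
print(when, original)  # 1998-01-13 19:12:22 http://slashdot.org/
```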
With today's shitty economy and the multitudes of "web developers" who are out of a job, the wayback machine is a real life saver.
Most of the projects I've worked on in the last few years have gone bankrupt for one reason or another. The sites themselves were technically fine, and there would be no way for me to prove I'd done them if it weren't for the Wayback Machine. My CV/resume would have been cut by 50% if I couldn't back up a project description with a URL... It's not the developers' fault people can't run their company (or actually, CAN run it, but only into the ground...)
And besides, we've all witnessed some of the dumbest ideas ever put online in the name of 'business'... how else can we teach our children about 'stupidity' if we can't show them those blunders? Eh?
"If we worked on it, we want to show it off" (mao tse bong)
-----------RL------------ http://www.harelmalka.com
Hey, you published the stuff on the Web, right? You did the work to get an HTTP server up and running (or leased one), put the right files in the right dirs, and voila... there were your pages. So, that's it. Now, for the information to be unpublished, you have to take special steps (opt-out).
---h.
It's better to be the foot on the boot than the face on the pavement. ~~ tkx Kadin2048
Oh, how I love those geeks who cheerfully pirate MP3s and warez, support P2P networks and such, and then, when somebody perfectly legally takes their publicly distributed content, raise a fuss all over Slashdot. Come to think of it, "archives are like SPAM"!!! Even the RIAA's talking heads haven't come up with such a ridiculous formula.
I think things like WebArchive straddle that line ... in some ways, they're making a snapshot of the entirety of the Internet as a whole and providing access to that. However, since it's done by copying data from individual servers, it isn't really all that similar.
--- Jason Olshefsky
Karma: Poser (mostly affected by adding this line long after everyone else did)
In one of the articles about the Wayback Machine, one of the creators commented that the amount of content on the internet has an upper bound (5 billion people typing 60 words per minute, 24 hours per day, etc.).
I was just thinking, if Google caches the Wayback machine, and the Wayback machine caches Google, don't we have an infinitely growing, ever changing cache? (Assuming that the systems constantly checked for changes in the other site...)
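Taking only the figures quoted (the rest of the creator's estimate is elided as "etc."), the bound works out to a few hundred trillion words a day. A quick Python check of that arithmetic; the per-year line simply multiplies by 365 and is not part of the original estimate.

```python
# Upper bound on new text, using only the figures quoted above.
# The original estimate continued with "etc.", so any further factors are left out.
PEOPLE = 5_000_000_000      # "5 billion people"
WORDS_PER_MINUTE = 60       # "typing 60 words per minute"
MINUTES_PER_DAY = 60 * 24   # "24 hours per day"

words_per_day = PEOPLE * WORDS_PER_MINUTE * MINUTES_PER_DAY
words_per_year = words_per_day * 365

print(f"{words_per_day:.2e} words/day")  # 4.32e+14 words/day
print(f"{words_per_year:.2e} words/year")
```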
> Strange that such a complaint would appear
> within a group espousing that "information
> wants to be free."
Who told you we were a "group" ?
BoD
How much should employers find out about you based on the Wayback Machine?
Well, frankly, if your primary concern is not "how you feel" but "whether other people will view my site or not", then you should let WayBack do its job.
Whenever I've gone to a web page and found that they've blocked themselves (usually only obvious if their main page is unavailable on WayBack) I know that the people running that website don't give a damn about the content there. People who underestimate the value of content usually aren't worth my time; they say stupid things, like "Why would I read a book?" or "So what if the plot sucks, it's just an action movie."
If your concern is appearing intelligent to your customers/readers, then you want WayBack crawling all over your pages. If you have no such concern, then feel free to tell WayBack to stop archiving you.
As for why you should have to ask someone to remove their copies of your crap from their archive...you offered it publicly. If you can't handle a public archive of your public site, get the hell off the Net.
Think of the trust you can gain when a user ends up at the WayBack and sees that you've been publishing for X years. Think of the spirit of cooperation produced when you tell WayBack, "Yeah, archive me. I'm a valuable source, and I participate in a community."
WayBack is a free resource. If what you're doing has no return value, no ability to be updated, and no reason to be archived...then what's worthwhile about what you're doing?
-----------------------
You are what you think.
The company I work for (a big telecom concern) has wayback blocked by the s-e-c-u-r-i-t-y p-r-o-x-y f-i-r-e-w-a-l-l.
Your old content may not be getting seen by as many people as you fear.
Actually, the day you posted your web site, way back when, you gave permission. You put that content out on the web with the intention of allowing others to view your material. The fact that some people chose to store it and redisplay it is simply an element beyond your control.
Wow, and you just admitted to being a pothead on a page that will end up in Google's cache forever. Great idea.
Oh no, everyone will know the one and only "zootread" is a pothead. What will I ever do? I've ruined the reputation of my web alias. There's only one thing I can do now... cough hack cough.. damn, that's some good stuff. now what were we talking about?
Zoot!
I believe the Wayback Machine started as "Alexa", a browser enhancement tool for MSIE and Netscape that predated (and probably inspired) the "find similar" buttons on modern web browsers. Alexa was a very good tool, and I used it. One of its features was a cache that archived items, and that in itself has turned into Alexa's killer app. With spidering and the Alexa clients still out there, I'm not surprised they have stuff from 1998.
As for "opting out": legally, you don't have a leg to stand on. Wayback does acknowledge robots.txt, and you published the information publicly. Wayback provides a service, the ability to archive the internet, that would not work under an "opt-in" policy; the "opt-out" is neither intrusive, nor illegal, nor a violation of copyright. Implied in the fact that the material was published online is the fact that, in order to access that information, copies of it would be downloaded to multiple hard drives; otherwise the information would not have been accessible. Once on the hard drive, the physical bits and bytes become property of the user and can be accessed at any time (while the content those bits and bytes translate into may remain yours, copyright-wise).
What the Wayback Machine does is take the cache on the hard drive and make it available. It makes no claims to ownership of the property, it provides opt-out mechanisms to the owners of the property, and it does not alter the content. In that respect, I cannot see any violation of copyright law. Yes, through advertising they may make a profit on other people's work, but if a specific complainant wants his work no longer indexed by Alexa or Wayback, the information is removable. Lack of complaint becomes implied consent, until such time as one complains.
The similarities to SPAM are not useful in the least. Spam is unsolicited advertising that forces the recipient to bear the burden of receiving the message. Wayback's "victims" have recourse and are not bothered or harassed. Furthermore, spam (usually) provides no service, unlike Wayback, which provides an archive.
Yes, there is concern for copyright: Wayback does redistribute material that perhaps the original owner did not want redistributed. But the "opt-out" mechanism doesn't impinge on the operation of the database, or on the owner of the original content, any further than is required for routine maintenance... Cease & Desist letters are neither necessary nor effective, since Wayback has the recourse of saying that non-authorized material can be removed at any time.
I have found Wayback invaluable. In high school, I took a lot of web design jobs on the cheap that now look good on a resume. Since many of those companies have gone under, however, I thought that work was lost permanently. Wayback has been a savior.
Brian.
I'm rather amazed that the wayback machine found *6* old versions of my college-days website! Does anyone have an impressively obscure website they've found on the machine?
... in the same way that water wants to run downhill. Finding it strange that people object to certain uses of their information is like finding it strange that people object when you spill their beer.
--
E_NOSIG
I just discovered the Washington Post killed off NewsBytes.com, and I had three of their articles in my timeline. Unfortunately, WebArchive's last NewsBytes record was Jan 24, but I recovered one article.
IMO it's hard to use the WWW as a *serious* resource when stuff like news articles just *vanish.* Or, arguably worse, get silently diddled.
BTW, does anyone know why there's such a hole in Web Archive news-site records from mid-July to 9/11?
my thing: http://geocities.com/hclsmith/my-tl/
WA's NB records: http://web.archive.org/web/*/http%3A//newsbytes.c
Sorry I didn't make pretty HTML, but /. gives me such hell when I do that, I now use plain text.
Oh wait, so you like it when the same question gets asked time after time on USENET? Yes, yes, there are FAQs, but they aren't always read and don't contain everything. And what if it isn't a FAQ, but merely an occasionally recurring question?
Are search engines spam? No? Right. They aren't.
And yet you must manually opt out with robots.txt (which doesn't guarantee your protection).
Were that I say, pancakes?
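The opt-out being argued about is a plain-text robots.txt file at the site root. Here is a minimal sketch, using Python's standard `urllib.robotparser`, of a file that excludes only the archive's crawler. The user-agent `ia_archiver` is assumed here (it is the crawler name commonly associated with Alexa and the Internet Archive at the time), and `www.example.com` is a placeholder. It also illustrates the caveat above: the file only states a policy, and a crawler is excluded only if it chooses to check and honor it.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt: block only the archive's crawler (assumed to be "ia_archiver");
# every other crawler may fetch everything (an empty Disallow means "allow all").
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# can_fetch() only reports what the file asks for; compliance is up to each crawler.
print(parser.can_fetch("ia_archiver", "http://www.example.com/index.html"))  # False
print(parser.can_fetch("Googlebot", "http://www.example.com/index.html"))    # True
```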
This site has a more historically accurate analysis of the burning of the royal library. There is no possible basis to the claim that the planet was set back hundreds or thousands of years by this event, as Mediterranean civilization at the time was not much more advanced than it was 100 years later, if it was more advanced at all (for any of the three possible dates suggested; it seems most likely the crazy guy was Julius Caesar). Also, there were several other relatively advanced civilizations in existence at the time (e.g. China, India) which were completely unaffected. We have already surpassed the achievement of the library several times over: the most inflated accounts of the Alexandria holdings number 700,000 scrolls, which is orders of magnitude less information than is contained in, say, the Library of Congress. When we lose an information store like a library or an internet archive, the greatest loss is not to the advancement of industrialization, which tends to rely on human expertise, but to the knowledge of later historians and anthropologists. The lesson we should be learning is that a single repository of information presents a single point of failure, and the Wayback Machine presents a means to keep our history from disappearing.