Wayback Machine Safe, Settlement Disappointing
Jibbanx writes "Healthcare Advocates and the Internet Archive have finally resolved their differences, reaching an undisclosed out-of-court settlement. The suit stemmed from HA's anger over the Wayback Machine showing pages archived from their site even after they added a robots.txt file to their webserver. While the settlement is good for the Internet Archive, it's also disappointing because a trial would have tested HA's claims in court. As the article notes, you can't really un-ring the bell of publishing something online, which is exactly what HA wanted to do. Obeying robots.txt files is voluntary, after all, and if the company didn't want the information online, they shouldn't have put it there in the first place."
Check out their robots.txt: http://www.healthcareadvocates.com/robots.txt They ONLY restrict the Internet Archive from accessing their web site, but don't restrict any other spider... Haven't they heard of Google's cache?
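For anyone curious what "only restricts the Internet Archive" means in practice, here's a rough sketch using Python's standard urllib.robotparser. The robots.txt text below is my assumption about what such a file would look like (one rule block for ia_archiver, the user-agent the Internet Archive has honored), not a copy of HA's actual file, and the example.com URL is a placeholder.

```python
from urllib.robotparser import RobotFileParser

# ASSUMPTION: a robots.txt that blocks only the Internet Archive's crawler.
# This is illustrative text, not the real file from HA's server.
ASSUMED_ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /
"""


def allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Placeholder URL; any path on the site gives the same answer here.
    page = "http://www.example.com/page.html"
    for agent in ("ia_archiver", "Googlebot"):
        print(agent, "allowed" if allowed(ASSUMED_ROBOTS_TXT, agent, page) else "blocked")
```

With a file like that, ia_archiver is turned away and every other well-behaved spider is still welcome, since no rule mentions them. And of course none of this is enforced: a crawler only gets "blocked" if it bothers to ask.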
Another example: someone I know wrote an essay that he thought only people in his class would ever see. It contained one or two mildly embarrassing disclosures, not terribly personal, but not something you'd want a complete stranger to know about you. Some idiot put it up on the school web site without his permission.
Here's a nasty possibility. Suppose somebody unintentionally publishes information useful to terrorists. DHS drops by and points out the error, and the information is withdrawn. Does the Wayback Machine have a right to keep the information online?
In fact, the Wayback Machine has never asserted a right to keep anything online. As the article points out, they'll remove stuff that's noncompliant with the current robots.txt, even though it was compliant at the time it was spidered. This lawsuit wasn't about their right to keep stuff online. It was just somebody accusing them of being negligent about enforcing their own policies.
Many people think of the Wayback Machine as being a tool for history and nostalgia. However, consider copyright expiration (IANAL, etc.). Many web pages have items like "Copyright 1995-2006 Blah". Some of the content was created as early as 1995. Assuming, of course, that items created in modern times eventually have their copyright expire, we will need a record of the content of these pages at that time.
As more content moves online, the idea of publishing a work becomes blurred. Revisions years later can effectively update the copyright of the work, if the reader cannot distinguish when the content was created. So the Wayback Machine will hopefully provide that resource. The amount of potentially public-domain content there is huge.
As a side note, it will be interesting to see when the first GPL programs (for example) lose their copyright. Of course, by then, the languages will seem more than archaic.
"The universe seems neither benign nor hostile, merely indifferent." --Carl Sagan
You missed the best part of the quote.
If I post your credit card and bank information on a forum site, does that mean it is now public domain and you have no protection?
If anything bad comes from it, it only means that the banks employ weak security. That information by itself should mean nothing. Complain to the financial institutions, not the person who posts it. Make it the bank's problem and it will go away. Don't use their services until they make them secure without making them unduly inconvenient for the customer. The silly passwords and 20-minute waits for failed logins do nothing for security. Make financial security the institution's responsibility instead of suppressing the flow of information. And furthermore, you know what you can do with your copyrights. If you don't want people to use your photos, keep them to yourself. If you don't want your information divulged, then don't reveal it to anybody.
What?