Wayback Machine Safe, Settlement Disappointing
Jibbanx writes "Healthcare Advocates and the Internet Archive have finally resolved their differences, reaching an undisclosed out-of-court settlement. The suit stemmed from HA's anger over the Wayback Machine showing pages archived from their site even after they added a robots.txt file to their webserver. While the settlement is good for the Internet Archive, it's also disappointing because it would have tested HA's claims in court. As the article notes, you can't really un-ring the bell of publishing something online, which is exactly what HA wanted to do. Obeying robots.txt files is voluntary, after all, and if the company didn't want the information online, they shouldn't have put it there in the first place."
I want a search engine that only indexes items excluded in the robots.txt file :-)
What's interesting is that I've heard of robots that do that exclusively. It may have been here on Slashdot, but I've heard of people putting stuff in their exclude list in robots.txt and some robots _ONLY_ searched those files.
Check out their robots.txt: http://www.healthcareadvocates.com/robots.txt They ONLY restrict the Internet Archive from accessing their web site, but don't restrict any other spider... Haven't they heard of Google's cache?
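For anyone who wants to see what "only restricts one spider" means in practice, here is a minimal sketch using Python's standard urllib.robotparser. The robots.txt text below is a hypothetical example, not a copy of the real file at that URL, and it assumes the Archive's crawler identifies itself as ia_archiver. The helper name allowed and the example.com URL are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents (an assumption, NOT the real site's file):
# it blocks one named crawler and says nothing about any other.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /
"""


def allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("ia_archiver", "Googlebot"):
        print(agent, allowed(agent, "http://example.com/page.html"))
```

With a file like this, every crawler other than the named one is still free to fetch and cache the pages, which is the point about Google's cache. Note that the check only tells a polite robot what the site asks for; nothing here enforces it.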
So by that logic, if I didn't want AOL to release my search information, I shouldn't be mad, as it's my fault for having used them in the first place?
You never intended to make your search results publicly available. These guys intentionally made their web page publicly available.
Or that if I want my copyrighted information to not be republished by someone else, I should just simply not publish at all?
That's a better point, but the question is whether the Wayback Machine "republished copyrighted material". If they instead archived material available in the public domain, it is a different matter entirely, regardless of what the creators of that material want.
How about, if I don't want my GPL code resold by someone in a closed source product I should just know better and not put it out in the open to begin with.
If you don't want something to be used freely, don't release it into the public, unless there are legal protections in place. If it's the GPL, people are legally forbidden from incorporating it into a (publicly released) closed source product. If it's the LGPL, people can do so. If you don't like that, don't release it publicly.
And that if I post something stupid when I'm 9 we believe it should follow me around throughout my entire lifetime, because a 9 year old should know better.
This is a fact of Internet life and always has been. This isn't different from other activities of 9-year-olds or anyone else in the public sphere. If you streak through a mall naked and someone snaps your picture, too bad: you can't make the photos disappear.
Another example: someone I know wrote an essay that he thought only people in his class would ever see. It contained one or two mildly embarrassing disclosures, not terribly personal, but not something you'd want a complete stranger to know about you. Some idiot put it up on the school web site without his permission.
Here's a nasty possibility. Suppose somebody unintentionally publishes information useful to terrorists. DHS drops by and points out the error, and the information is withdrawn. Does Wayback Machine have a right to keep the information online?
In fact, Wayback Machine has never asserted their right to keep anything online. As the article points out, they'll remove stuff that's noncompliant with the current robots.txt, even though it was compliant at the time it was spidered. This lawsuit wasn't about their right to keep stuff online. It was just somebody accusing them of being negligent about enforcing their own policies.
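The retroactive policy described above can be sketched roughly as follows. This is only an illustration of the idea (hide old captures that the site's current robots.txt would disallow), not the Archive's actual implementation. The function visible_snapshots, the snapshot dictionaries, and the ia_archiver user agent are all assumptions made for the example.

```python
from urllib.robotparser import RobotFileParser


def visible_snapshots(snapshots, current_robots_txt, user_agent="ia_archiver"):
    """Keep only the snapshots that the CURRENT robots.txt still permits.

    snapshots is a list of dicts with a "url" key (a hypothetical format).
    The robots.txt that was in force at capture time is deliberately ignored.
    """
    parser = RobotFileParser()
    parser.parse(current_robots_txt.splitlines())
    return [s for s in snapshots if parser.can_fetch(user_agent, s["url"])]


if __name__ == "__main__":
    # Captures made years ago, when the site had no robots.txt at all.
    old_captures = [
        {"url": "http://example.com/index.html", "captured": "1999-05-01"},
        {"url": "http://example.com/private/plan.html", "captured": "1999-05-01"},
    ]
    # The site adds this file later; the old /private/ capture is now hidden.
    robots_today = "User-agent: *\nDisallow: /private/\n"
    for snap in visible_snapshots(old_captures, robots_today):
        print(snap["url"])
```

The design choice worth noticing is that the filter runs at display time against today's file, so a site owner can pull old material after the fact. The complaint in the lawsuit was about cases where that filtering allegedly did not happen.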
Many people think of the Wayback Machine as being a tool for history and nostalgia. However, consider copyright expiration (IANAL, etc.). Many web pages have items like "Copyright 1995-2006 Blah". Some of the content was created as early as 1995. Assuming, of course, that items created in modern times eventually have their copyright expire, we will need a record of the content of these pages at that time.
As more content moves online, the idea of publishing a work becomes blurred. Revisions years later can effectively update the copyright of the work, if the reader cannot distinguish when the content was created. So the Wayback Machine will hopefully provide that resource. The amount of potentially public-domain content there is huge.
As a side note, it will be interesting to note when the first GPL programs (for example) lose their copyright. Of course, by then, the languages will seem more than archaic.
"The universe seems neither benign nor hostile, merely indifferent." --Carl Sagan
Furthermore, there are perfectly good ways to lock content away from the outside more rigorously: password-protected pages, pages only accessible via VPN to the intranet, etc. All other information that is put unlocked and unencrypted on the internet can be considered open. There will always be some chance that someone finds it accidentally, for example.
molmod.com - computing tips from a molecular modeling
why do people make such god awful analogies?
if you give private information to AOL and they release it publicly then you can get upset
if you post private information on "check-out-my-ssn.com" and it's public to the whole world then you can't get mad.
Lawyers should be required to instruct (off the clock) clients how to complain, and judges should ask clients if they've been informed (checking against a form the client signs). Failure should be like violating Miranda rights.
Yes, a more prolific lawyer should be more likely to be audited. Probably every nth case (by all lawyers) should have an audit initiated secretly to follow the proceedings, reporting malpractice as it's observed, so corrections aren't applied only after the case is derailed. That doesn't sound so sophisticated, but it does seem like lawyers would spend their careers learning to abuse it. NP complete, but best effort counts.
Another big reform that seems essential is to direct all punitive damages (not compensatory damages) to the state, or perhaps even to some certified victims' fund, rather than to the plaintiff (and a percentage to their lawyers). That seems like a fundamental abuse that needs to be fixed, and would help fund a better justice department to make better decisions. Oh, and big penalties for lawyers introducing invalid evidence, all evidence determined before trial in separate hearings... anything for lawyer accountability to standards would make big improvements.
--
make install -not war
You missed the best part of the quote.
The US has copyright laws, and lots of people rely on them, including open source projects.
The robots.txt file is a clear indication of the conditions under which a copyright holder gives you access to their copyrighted materials. As such, it is not "voluntary".
In addition to probably being in violation of copyright law, it is simply rude for companies to ignore robots.txt files; if the Internet Archive does this, they are badly behaved.
If courts should decide that robots.txt files can be ignored at will, then more sites will require registration, click-through licenses, and those annoying "try to read this" safeguards, making life more miserable for all of us.
The best thing for everybody, including the Internet Archive, would be for the robots.txt standard to be enforced strongly by courts.
If I post your credit card and bank information on a forum site, does that mean it is now public domain and you have no protection?
If anything bad comes from it, it only means that the banks employ weak security. That information by itself should mean nothing. Complain to the financial institutions, not the person who posts it. Make it the bank's problem and it will go away. Don't use their services until they make it secure without making it unduly inconvenient for the customer. The silly passwords and 20 minute waits for failed logins do nothing for security. Make financial security the institution's responsibility instead of suppressing the flow of information. And furthermore, you know what you can do with your copyrights. If you don't want people to use your photos keep them to yourself. If you don't want your information divulged, then don't reveal it to anybody.
What?