The Internet Archive Sued Over Stored Pages
Kailash Nadh writes "The Internet Archive, which has been storing snapshots of millions of webpages since 1996, has been sued by the firm Harding Earley Follmer & Frailey, Philadelphia. The firm was defending Health Advocate, a company in suburban Philadelphia that helps patients resolve health care and insurance disputes, against a trademark action brought by a similarly named competitor. In preparing the case, representatives of Earley Follmer used the Wayback Machine to turn up old Web pages - some dating to 1999 - originally posted by the plaintiff, Healthcare Advocates of Philadelphia. Last week Healthcare Advocates sued both the Harding Earley firm and the Internet Archive, saying the access to its old Web pages, stored in the Internet Archive's database, was unauthorized and illegal." CT: Update: note that the submitter got it backwards: Healthcare Advocates is the one suing the Wayback Machine and Harding Earley Follmer & Frailey, not the other way around.
....why not just ask them to take them off?
"But on at least two dates in July 2003, the suit states, Web logs at Healthcare Advocates indicated that someone at Harding Earley, using the Wayback Machine, made hundreds of rapid-fire requests for the old versions of the Web site. In most cases, the robot.txt blocked the request. But in 92 instances, the suit states, it appears to have failed, allowing access to the archived pages." It would appear that they did, kind of. That seems to be at the heart of the matter. The Internet Archive don't seem to be very surprised that it is happening. I don't think the company doing the suing have much of a case really, but IANAL.
They got caught with their pants down and now are suing because someone kept the evidence. Boy do I hope this lawsuit meets a swift and decisive end in favor of the Internet Archive.
To be candid, I'm surprised it took this long for someone to sue them.
Stella!
And they said zombies weren't real!
Well, that raises the question about other things that people own. Can one get sued for taking a picture of someone else's property without their permission? The things people sue for these days just seem to become more and more crazy.
Quid Pro Quo, nothing more, nothing less.
you mean it's like being a library?
Again, not comparable (but this didn't stop you from getting modded up of course). The libraries had permission to buy the papers and allow access to them in the first place. Internet Archive had no such agreement with this company. IA took the absence of them saying no as an implicit agreement, which, for pretty much anything else, isn't legal (it hasn't been tested yet with websites and caches). They did, in fact, say no. But a bug caused this message not to be delivered, or it was ignored some of the time.
I have to disagree with you slightly.
I think this is more like if I were to take an old book (or collection of old books) and store them together in a single publicly accessible place (hmmm, like a library).
Then those books sit there for 6 years, and someone (law firm) decides to (gasp) check out those books and use them as reference material in their suit against the people who originally published those books.
Could the original authors of these stored books then sue the library for providing those books to the public? (Let's assume the robots.txt issue has been resolved, and the library hasn't posted these books illegally.)
Don't anthropomorphize computers: they hate that.
It's going to get worse before it gets better. Our culture is being forced to confront issues of privacy and information ownership that have previously stayed under the radar only because violating them was inconvenient or expensive.
But the internet is changing that, and now an errant picture or snippet of text can be reproduced and distributed widely for practically zero dollars.
I think eventually we'll settle on some kind of bubble of privacy concept, in which anything inside is legally protected, but anything you distribute outside that bubble is fair game for anyone, forever.
This is generally the case in the real world. If someone wears clothes, they effectively have created a privacy bubble, only allowing limited information about themselves to be distributed (via reflected light) to be seen by others. But what information they do allow to escape is fair game for distribution in photographs.
In a sci-fi series (Neverness et al), Zindell argues that in the future, even identity will be as carefully concealed in public as one's privates. As information technology saturates our culture, even revealing our identity in public is going to be increasingly dangerous.
Of course DRM advocates will try to attach little bubbles of limited privacy to specific bits of content released into the wild. Eventually, I hope, common sense will prevail and such ridiculous notions will be abandoned.
For the "It's too early to think" crowd...
How did Healthcare Advocates determine that Harding Earley was making hundreds of requests for files on the Wayback Machine? The logs would have been kept on the Wayback Machine's servers, not on anything Healthcare Advocates would have access to easily. Harding Earley would be accessing the files via the Wayback Machine's copies, not the copies that are kept on Healthcare Advocates' website.
I would argue it comes down to the media more than anything. See, they publish all these stories about stupid lawsuits, but make them sound like they actually stand a chance. Then they fail to mention that a few months later the lawsuit was tossed out or lost at summary judgment and fell apart.
So what they've done (quite successfully) is make everyone think that all people do in the US is sue each other to death, but the fact is that most of the lawsuits like this that you hear about never go anywhere, and just end up ruining the lawyers' reputations. Then the media stories turn the public against the legal system and lawyers in general.
When business is bad, you just pick on lawyers and things turn around it seems. It works for the media and politicians at least.
What?
When will I be sued for remembering old stories and telling them to others?
http://en.wikipedia.org/wiki/I_have_a_dream
KFG
Problem is, the plaintiffs claim that their robots.txt file instructs the web crawler to disallow access to the older archives of the plaintiffs' web content. The fact that you can block access to your older PUBLISHED content is disturbing in itself, because that content is so useful in finding the truth. It should be illegal to tamper with the evidence. The Wayback Machine should show everything and set the standard before it's too late. Perhaps it is time to "Rise Up".
The government which is strong enough to protect you from everything is strong enough to take everything from you.
Boffoonery - downloadable Comedy Benefit for Bletchley Park
Huh ??? Is this really what they are saying:
1) That the Wayback Machine came and archived their site sometime in 1999.
2) Since then they have added a robots.txt file
3) Because they now have a robots.txt file previously archived material should no longer be available
If so that's complete nonsense.
But on at least two dates in July 2003, the suit states, Web logs at Healthcare Advocates indicated that someone at Harding Earley, using the Wayback Machine, made hundreds of rapid-fire requests for the old versions of the Web site. In most cases, the robot.txt blocked the request. But in 92 instances, the suit states, it appears to have failed, allowing access to the archived pages.
In so doing, the suit claims, the law firm violated the Digital Millennium Copyright Act, which prohibits the circumventing of "technological measures" designed to protect copyrighted materials. The suit further contends that among other violations, the firm violated copyright by gathering, storing and transmitting the archived pages as part of the earlier trademark litigation.
Wow, that is stretching things!! I've never read the DMCA, but to claim that the Wayback Machine ignoring a robots.txt file (which isn't a legally binding mechanism by any means) added to the site after the pages had been indexed was a circumvention of their copyright and a violation of that act... well, I'd fully expect any judge to have a good laugh at this.
HOWEVER, given how poor the US legal system is, I wouldn't be surprised to hear that robots.txt gains legal status as a binding document for crawlers!!
I can tell you exactly where the problem lies (and I know this because I have customers who behave this way):
When they write documents, they write them in HTML format. When they send their email, they send it in HTML format. When I asked them to prepare content for their website, they gave me a Microsoft Word document in HTML format, and said "You don't have to use the same fonts I used in this document, but please keep the layout the same on my website."
These users equate "a document" to "a website", and they think that once they stop using or sending that document out, that their "website" should be removed as well. They think websites are "sent" to people, not requested "by" people, and that when you close your browser, your "document" is gone.
That simply is not the case, and people need to be re-educated to understand these technologies and how they work. The Internet was MEANT to be self-healing, in case one node or another went down, information and information pathways would still be functioning.
I disagree. In Northern Europe it's usual that even illegally collected evidence counts. Abuse of power (by the police) is usually treated as a much harsher crime.
In Finland, there was one case where the police ran an undercover operation on a known drug seller. Too bad that at the time they didn't have the right to buy drugs undercover, resulting in two officers being charged and convicted of drug trading. Even more, the seller got an easier sentence because he was interpreted to be selling the drug at the request of the officers. It was especially bad because it was planned. This in effect avoids the "guilty man walking because of a technicality" cases.
Besides, I see more problems with police violence in the US than I see problems with illegal evidence in Europe. And no, I'm not trying to start a flame war.
?SYNTAX ERROR
Actually no..
The courts have held that things not plainly visible (plainly visible meaning obvious to a human at a reasonable distance or in a public place) are illegal to disseminate. Like when you turn on night vision during the day. It captures IR and translates it to B&W; the problem is that our bodies reflect more of it than our clothes do, giving all clothes a semi-transparent look. The courts have held that even though they were recording in public, they violated the privacy of the people taped. This doesn't mean that all IR captures in public are illegal, but when it's specifically used to reveal information about a person that is not plainly visible, it might be a crime.
The courts have also held that augmentation of the senses cannot be used as an excuse to break the 4th Amendment. Cops can only use items that are plainly visible to initiate a search of a private residence. This precedent was set after they used heat signatures to get warrants for pot growers (because of the grow lamps used). Remember that with technology today you can basically see movement and hear speech through walls.
Here's the deal, and it's not very good. If the Wayback Machine doesn't have permission (implied or otherwise) to archive websites and serve copies of them, it's technically breaking copyright law except in a very small number of cases. I believe in some cases, if you fail to assert your copyright and (and yes, there's an "and" in there) you distribute your content to all comers for free, it's considered public domain. Asserting your copyright is as simple a matter as putting a copyright notice on your content. I've heard of, albeit third hand, by word of mouth, and IANAL, cases in some jurisdictions where leaflets pushed in mailboxes without some form of copyright notice were considered public domain.
That's the best I can think of in terms of defenses for the IA. The IA doesn't honour expiry dates on webpages (if it did, it'd be useless.) It doesn't quote small portions in the context of a review.
So why hasn't it been more widely sued? Well, I think it's largely because (a) most people consider the Wayback Machine to be an invaluable public service, including most of the websites whose content it archives, and (b) the Wayback Machine has an honourable record of removing content whose owners don't want it displayed. And given (a) and (b), the costs of litigation, and the fact that it doesn't appear (to me) to make any money from the operation (and so, as I understand it, is guilty of a civil offense only), people are reluctant to sue.
My personal opinion? The law needs to be changed to protect groups who do exactly this. This is one of many areas where copyright law needs to be diluted in order to remain credible. If people performing what is obviously a public service, who do make best-efforts to honour the wishes of those who do not consent to be a part of what they're doing, need to worry about the legality of doing so, the law is wrong and liable to fall into disrepute.
You are not alone. This is not normal. None of this is normal.
I used the Internet Archive to grab a manual (in PDF format) for a product that the company had long since retired. I'd say the service (the IA) was well worth it at that moment.
In the UK, the copyright act was amended; photocopying ANY part of a copyrighted document is now considered illegal. The concept of 'fair use' is no longer applicable.
That said, certain professions (librarians etc) can register as an exception so that they can photocopy a percentage of a document legally (just like the old days...)
-Jar.
Together, We Can Make Slashdot Better. I Do NOT Mod ACs. - Check Me Out
Let's see the other side of the story:
Don't slashdot them too hard, and please remember to disable your cache when you browse their pages (your brain's cache too!)... ;)
Anything posted on the web should automatically be in the public domain. The physical act of viewing a web page requires me to download its contents to my computer. That means the website in question is volunteering content for me to download (or at least view). Maybe if I'm a content provider, I have a right to be angry if someone uses that content to impersonate me, or whatever. But otherwise I must understand that I've just put the content on my reader's computer; I have no real control at that point over what the reader does with that content.
You know, it's funny. The web used to be mostly about free organizations offering up content. Then companies figured out that people like the web, and so they started jumping on. Unfortunately though, they don't seem to want to play the web as it was meant to be played. The web was not designed to support copyright controls, and I can't understand why companies constantly expect that it does.
Seriously, we have this discussion every time Google or the Wayback Machine or whatever comes up. Putting material on the Internet does not give up your copyright on it, place it in the public domain, grant others the right to reproduce it any way they see fit, or otherwise work differently to copyright laws as they apply to all other media. There are necessarily certain implied rights, but arguing that actually ripping someone else's material and then making it publicly available after they've withdrawn it from their own site is a pretty big stretch to anyone without a vested interest.
Actually there is a simple principle here.
The Supreme Court ruled in Feist v. Rural Telephone (1991) that directories cannot be copyrighted if the information they contain is purely factual.
An example is the telephone book, those are all facts and that was what the case was about.
The wayback machine could be called a directory of old web pages, cached as they existed at the time. Facts.
Thus protected from copyright claims.
Well, there's their defense. It would be kind of fun to argue!
In any case it looks like the wayback machine needs a couple hundred mirrors. Heh.
It seems rather more likely that the plaintiffs fucked up their robots.txt file entries and that is why they were spidered.
At the risk of receiving yet another deposition: I was part of the conversations that led to robots.txt. It was never intended to be an access control mechanism or an effective content control mechanism within the meaning of the DMCA. The objective was simply to allow sites with automatically generated content to tell the robots that parts of their site are not suitable for spidering.
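To make the "not an access control mechanism" point concrete, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt contents, the `example.com` URLs and the helper name `crawler_may_fetch` are made up for illustration, and `ia_archiver` is assumed to be the user-agent token the archive's crawler answers to. The thing to notice is that the file only gets consulted if the client chooses to ask; nothing on the server side enforces it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: asks one named crawler to stay out entirely,
# and asks everyone else to skip only the generated pages.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow: /cgi-bin/
"""


def crawler_may_fetch(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if a *well-behaved* crawler named `agent` would fetch `url`.

    This is purely advisory: a client that never calls this function
    can still request the page, and the server will still serve it.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for agent, url in [
        ("ia_archiver", "http://example.com/index.html"),
        ("SomeOtherBot", "http://example.com/index.html"),
        ("SomeOtherBot", "http://example.com/cgi-bin/report"),
    ]:
        print(agent, url, crawler_may_fetch(ROBOTS_TXT, agent, url))
```

A crawler that skips this check, or that hits a bug while doing it, gets the pages anyway, which is roughly what the suit describes in the 92 instances.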
So now it looks like we are going to have to revisit the business model for the Wayback Machine and work out how to float a litigation fund.
Actually one way that it could be done is to sign and timestamp material on receipt and offer the signatures as a premium service.
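A rough sketch of what "sign and timestamp on receipt" could look like, using only the Python standard library. Everything here is an assumption for illustration: the function names, the receipt fields, and the use of an HMAC with a secret key as a stand-in for a real signature. An actual service would more likely use public-key signatures and a trusted timestamp authority so that third parties can verify a receipt without holding the key.

```python
import hashlib
import hmac
import json


def make_receipt(content: bytes, url: str, received_at: str, key: bytes) -> dict:
    """Bind a page's bytes to a URL and a receipt time (hypothetical format)."""
    record = {
        "url": url,
        "received_at": received_at,
        "sha256": hashlib.sha256(content).hexdigest(),
    }
    # Serialize deterministically so the same record always signs the same bytes.
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    record["signature"] = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return record


def verify_receipt(content: bytes, receipt: dict, key: bytes) -> bool:
    """Check that `content` matches the receipt and the receipt is untampered."""
    record = {k: receipt[k] for k in ("url", "received_at", "sha256")}
    if hashlib.sha256(content).hexdigest() != record["sha256"]:
        return False
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    expected = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, receipt["signature"])


if __name__ == "__main__":
    key = b"archive-secret-key"  # placeholder key for the demo
    page = b"<html>old version of the page</html>"
    receipt = make_receipt(page, "http://example.com/", "1999-05-01T00:00:00Z", key)
    print(verify_receipt(page, receipt, key))
    print(verify_receipt(b"<html>edited later</html>", receipt, key))
```

The receipt covers the URL and the time as well as the content hash, so changing any one of the three invalidates the signature.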
Looking for an Information Security student project suggestion?
Try http://dotcrimeManifesto.com/
Yes, but the difference is that under copyright law, I can require that you not make copies of my site just so that you continue to come to my site to get updated versions of things.
I do this routinely with my technical papers exactly because I know they will be updated. They're usually available for people to view and read and link to, but I ask people not to make copies elsewhere and technically that "request" is enforceable under copyright law.
Now it's true that fair use allows some things to be copied for certain reasons. And, curiously, I think the need to copy for a lawsuit might stand up. But copying the entirety of everything everywhere in anticipation of something being needed for a lawsuit sounds to me to be a questionable thing. To stretch the analogy one further: making a complete repository of photos of all store windows, almost as a workaround for the fact that those store windows were not directly accessible for use. That doesn't sound like fair use to me. One of the fair use criteria is about the totality of the work, and while it's undefined how that comes into play in each instance, it's clear that the amount of bulk matters here.
Personally, I was initially uncomfortable with the Internet Archive, and I continue to be of mixed feelings about it. I think it serves a huge historical interest. However, in the near term it has some ill effects that run counter to the copying/distribution/presentation laws of copyright and may need some correction.
I might think it reasonable if
The disturbing part is that the legal term of copyright seems to continue to lengthen with time. I'm a big fan of copyright as a form of personal control for authors to get income from their works, but copyright must lapse after some point, and it is already well exceeding what I think is reasonable in that regard, with the trend looking to extend indefinitely as rich copyright holders influence Congress to extend every time, say, Mickey Mouse comes into jeopardy.
Kent M Pitman
Philosopher, Technologist, Writer
If obtaining evidence illegally is acceptable, what ensures it was properly secured and documented? Why not just forge it?
Part of legal evidence gathering is ensuring you have a trail to prove it is valid.
Secondly what rights are trampled in the cause of getting that evidence?
By making illegally obtained evidence invalid you encourage proper behaviour. If illegally obtained evidence can be used to convict a killer, the police will get off with a slap on the wrist despite any crimes they may have committed.
If illegally obtaining evidence could cause that same killer to get off, you can bet the police would be extremely careful to ensure that the evidence is collected properly.
Or just move the hosting to Sealand and ignore lawsuits. Although IANAL, I think this is the more reasonable course of action, since you have to be insane to deal with the insanity of today's copyright law.
Well, either that or try to get absorbed by the Library of Congress or something...
"[Regarding the 'cloud,'] ownership was what made America different than Russia." -- Woz
However, knowing the people in Congress, such a "Corporation Control Act" would not serve to control corporations. Instead it would give ultimate control of the country to corporations. It's all in how you read the title.
Actually, it doesn't matter what the original intent was, the end result would be as you describe. See regulatory capture.
How to solve most of our problems: 1.Lots of nuclear plants. 2.Cure aging.
In fact, limiting the rights of others to distribute your works in order to encourage you to make them available is exactly what copyright is for, and this sort of case is a textbook example of why the principle matters.
Really?
I agree that content creators should have some limited rights that allow them to profit from their content, as that encourages the creation of more public content.
But in this case, nobody is saying that they wanted to publish this content later for profit. The plaintiffs intentionally made something public to the entire world, and went to some trouble and expense to do so. Now they want to pretend it never happened because the facts have become inconvenient.
What the Internet Archive does may or may not be technically legal, but it's certainly in harmony with the spirit of copyright law. When one publishes books, one is obliged to give a copy to the Library of Congress so that it remains on the permanent record.
Personally, I think the Library of Congress should just fund the Internet Archive and bless the project with their special powers of copyright exemption. Failing that, Congress should make legal this sort of non-profit archiving of public material. Copyright is the right to reasonable profit from your creative efforts, not the right to manipulate the historical record.
Putting material on the Internet does not give up your copyright on it, place it in the public domain, grant others the right to reproduce it any way they see fit,
Putting material publicly visible on the Internet is a decision to distribute it, just like selling a book is a decision to distribute the book.
A difference is that when printing a book, a limited number of copies are made, whereas when you put something on your web site, you have produced a theoretically infinite number of copies.
Now what you have posted publicly is a matter of record and not only can be copied but ought to be, to ensure continued availability of the information.
The Internet Archive is an electronic equivalent to a library where old works are preserved, the difference is, of course, since the public Internet is inherently a medium where infinitely many copies are made (which differs greatly from the physical world), there is an appearance that the Internet Archive is totally redistributing a work... ...in fact, they are just lending it out, lending out one of their infinitely many copies of the material
Just like if you send someone an e-mail, you have sent an unlimited number of copies of the message, because such copies will be made every time they launch their e-mail client and open the message... they can do this as often as they like, and every time they do this they have found a new copy of that which you sent them.
| That's akin to advocating free speech, even if it means shouting "Fire" in a crowded theatre
The correct quote is "The most stringent protection of free speech would not protect a man in falsely shouting fire in a theatre and causing a panic," and it has absolutely nothing to do with the issues at play in this case. The statement is a limitation where speech endangers public health and/or safety, and I don't think anyone would apply those standards to this case. Similarly, defamation has no relation to this case, and breach of contract in the form of robots.txt is already a bone of contention in the case.
I'm not having this discussion about free speech. I'm having this discussion about the scope of current copyright arguments. I'm not arguing that this suit is trying to muzzle the IA in violation of the First Amendment. That would be a flawed argument anyway - the IA is an organization, not an individual, and thus not subject to constitutional protection in most cases (including this one). I'm arguing that the value proposition of copyright does not extend to preventing organizations like the IA from documenting the state and history of any publicly accessible website. It is a documentary and educational role, and as such, is every bit as appropriate as the library system.
The Internet Archive is, in essence, cataloging the evolution of one of the most significant cultural phenomena of the last century. That IS of tremendous historical significance - most especially because 80% of the content that is there now will be lost forever within the next year if they DON'T do it.
The internet is evolving at a breakneck pace. As new business models, new businesses, new display models, new UI elements and new languages crop up, older sites are redesigned, replaced or relegated to non-existence by Darwinian market forces. What is there now bears little resemblance to what was there a year ago, and likely little resemblance to what will be there a year from now. This evolution is of tremendous social, scientific and cultural significance.
Do I believe that documenting this growth and evolution trumps the overextended copyright argument presented by the plaintiff in this case? Yes, I do. The plaintiff had no reasonable expectation of privacy where the public website was concerned. The information was placed in what amounts to a public space, with no access controls and no barriers to public navigation. To assert, then, that the IA was acting inappropriately in documenting the state of the public portion of that site is patently absurd.
Now, to your other points...
We aren't discussing Joe User's website, we are discussing a website developed and deployed by a corporation at some expense with the express purpose of giving them a 'web presence'. Many jurisdictions will now permit the argument of publication where an agency has spent time and money to make information available to an audience. The expense and resources expended to make it available to the public presuppose an intent and desire that it be seen and consumed by the public. Further, the case in question references the DMCA, and specifically, a means to bypass security measures to prevent copying. That clause of the DMCA (and arguably the whole law) applies _specifically_ to published media.
Regardless, flyers distributed in a parking lot or on a public bulletin board also carry all of the assumptions needed to carry the rest of my argument. The 'provider' of such information has no control over the future use or reference to those fliers in an educational, historical or editorial context.
Again, free speech is only a peripheral issue here. Fair use doctrine is as much, or more, a limitation of copyright as it is a defense of the First Amendment. While First Amendment protections have been used to DEFEND fair use in the past, it does not necessarily follow that all questions relating to fair use are also First Amendment challenges. At question here is the IA's right to
I've been wondering if the issue isn't simpler than all of this legal wrangling? What I mean is that, whatever has happened all throughout history, we only have 1) evidence of things through artifacts, interpreted by those who find and study them, and 2) the written word, by those who research, then try to wrap up "facts" in a coherent package.
Technology such as the internet archive now exists to automatically, systematically, and rather thoroughly store very specific artifacts (old web pages). These artifacts happen to also be the written word. The complication is that much of that written word (that the legal system and corporations care about) is propaganda which, by its very nature, is not 100% true. What is true from a historical perspective, though, is that it existed as a part of the Internet/WWW which, in turn, is a huge part of our society and culture.
So do we view it in the context of an accurate historical representation of a body of knowledge that existed at a given snapshot of time, which is a decent encapsulated version of "truth," which is theoretically what a good justice system should be rooted in? Or, do we blatantly use outmoded, weasel-ish legal wranglings to suppress what is, indeed, truth that is relevant to deciding a given court case?
If we choose the latter, what does that say about the integrity of our justice system?