The Internet Archive Sued Over Stored Pages

Posted by timothy on Wednesday July 13, 2005 @12:16AM from the philadelphia-lawyers dept.

Kailash Nadh writes "The Internet Archive, which has been storing snapshots of millions of webpages since 1996, has been sued by the firm Harding Earley Follmer & Frailey, Philadelphia. The firm was defending Health Advocate, a company in suburban Philadelphia that helps patients resolve health care and insurance disputes, against a trademark action brought by a similarly named competitor. In preparing the case, representatives of Earley Follmer used the Wayback Machine to turn up old Web pages - some dating to 1999 - originally posted by the plaintiff, Healthcare Advocates of Philadelphia. Last week Healthcare Advocates sued both the Harding Earley firm and the Internet Archive, saying the access to its old Web pages, stored in the Internet Archive's database, was unauthorized and illegal." CT: Update: note that the submitter got it backwards: Healthcare Advocates is suing Wayback and Harding Earley Follmer & Frailey, not the other way around.

20 of 801 comments

obvious man question by 0110011001110101 · 2005-07-13 00:18 · Score: 5, Insightful

fsck me if i'm wrong, but wouldn't this be similar to suing someone for referencing an old book I wrote, just because I'd released a new one that didn't contain much of the old information?

--
Don't anthropomorphize computers: they hate that.
1. Re:obvious man question by Professor_UNIX · 2005-07-13 00:31 · Score: 5, Insightful
  
  No, this is like suing someone for distributing an old book you've written without that person having your permission.
  Putting up an unprotected web site is akin to putting up a billboard. If I take a picture of the billboard and publish it in a textbook that kids read for the next 20 years, should I be expected to be sued by the billboard company? I'm really sick and tired of companies that have absolutely no clue how the Internet and the world wide web works putting up sites and then expecting you to never cache them anywhere. They have this old mentality that they control the flow of information and frankly, that's just not true anymore.
2. Re:obvious man question by Chuck+Chunder · 2005-07-13 01:13 · Score: 5, Interesting
  
  Putting up an unprotected web site is akin to putting up a billboard. If I take a picture of the billboard and publish it in a textbook that kids read for the next 20 years, should I be expected to be sued by the billboard company?
  Apparently, yes.
  
  --
  Boffoonery - downloadable Comedy Benefit for Bletchley Park
3. Re:obvious man question by hacker · 2005-07-13 01:20 · Score: 5, Interesting
  
  I can tell you exactly where the problem lies (and I know this because I have customers who behave this way):
  
  When they write documents, they write them in HTML format. When they send their email, they send it in HTML format. When I asked them to prepare content for their website, they gave me a Microsoft Word document in HTML format, and said "You don't have to use the same fonts I used in this document, but please keep the layout the same on my website."
  
  These users equate "a document" to "a website", and they think that once they stop using or sending that document out, that their "website" should be removed as well. They think websites are "sent" to people, not requested "by" people, and that when you close your browser, your "document" is gone.
  
  That simply is not the case, and people need to be re-educated to understand these technologies and how they work. The Internet was MEANT to be self-healing: if one node or another went down, information and information pathways would still be functioning.
4. Re:obvious man question by Zeinfeld · 2005-07-13 02:21 · Score: 5, Interesting
  
  To sum it up, the plaintiffs are claiming that the Wayback Machine didn't obey the robots.txt at their site and are calling it breach of contract.
  It seems rather more likely that the plaintiffs fucked up their robots.txt file entries and that is why they were spidered.
  
  At the risk of receiving yet another deposition, I was part of the conversations that led to robots.txt. It was never intended to be an access control mechanism or an effective content control mechanism within the meaning of the DMCA. The objective was simply to allow sites with automatically generated content to tell the robots that parts of their site are not suitable for spidering.
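
To make the "voluntary, not access control" point concrete, here is a minimal sketch of how a polite crawler might consult robots.txt before fetching a page. It uses Python's standard `urllib.robotparser`; the robots.txt contents, the `ExampleBot` user agent, the example.com URLs, and the `may_fetch` helper are all made up for illustration. Nothing in it stops a crawler that simply skips the check.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an example site; the paths are invented.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /search
"""


def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if a well-behaved crawler should fetch `url`.

    This is purely advisory: the server never enforces it, the crawler
    chooses to ask the question before making the request.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "http://example.com/index.html",
        "http://example.com/cgi-bin/report",
    ):
        print(url, may_fetch(ROBOTS_TXT, "ExampleBot", url))
```

The check runs entirely on the crawler's side, which is why the file works as a request to cooperating robots rather than as a lock.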
  
  So now it looks like we are going to have to revisit the business model for the Wayback Machine and work out how to float a litigation fund.
  
  Actually one way that it could be done is to sign and timestamp material on receipt and offer the signatures as a premium service.
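
As a rough illustration of the "sign and timestamp on receipt" idea, the sketch below hashes a captured page, records when it was received, and attaches a keyed MAC over that record. Everything here is an assumption for illustration: the `make_receipt` and `verify_receipt` names, the record layout, and the key are hypothetical, and HMAC is swapped in as a stand-in for a real digital signature (a public service would more plausibly use an asymmetric signature so that third parties could verify receipts without the secret key).

```python
import hashlib
import hmac
import json
import time

# Hypothetical secret held only by the archive (illustration only).
ARCHIVE_KEY = b"example-key-not-for-production"


def make_receipt(content: bytes, url: str, received_at: int,
                 key: bytes = ARCHIVE_KEY) -> dict:
    """Build a tamper-evident record of what was received and when."""
    record = {
        "url": url,
        "sha256": hashlib.sha256(content).hexdigest(),
        "received_at": received_at,  # Unix timestamp of capture
    }
    # Serialize deterministically so the MAC is reproducible.
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    record["mac"] = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return record


def verify_receipt(content: bytes, receipt: dict,
                   key: bytes = ARCHIVE_KEY) -> bool:
    """Recompute the receipt from the content and compare the MACs."""
    expected = make_receipt(content, receipt["url"],
                            receipt["received_at"], key)
    return hmac.compare_digest(expected["mac"], receipt["mac"])


if __name__ == "__main__":
    page = b"<html>old page</html>"
    receipt = make_receipt(page, "http://example.com/", int(time.time()))
    print(verify_receipt(page, receipt))
    print(verify_receipt(b"<html>edited page</html>", receipt))
```

A receipt like this only shows that the archive held these bytes at the stated time, and only to someone who trusts the archive's clock and key handling.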
  
  --
  Looking for an Information Security student project suggestion?
  Try http://dotcrimeManifesto.com/
summary is incorrect by paulbd · 2005-07-13 00:22 · Score: 5, Informative

The archive is being sued by Health Advocates, not the legal firm that had defended Health Advocates. In fact, the legal firm is named in the suit as well.

And to clarify: it's not a simple "you have our stuff stored on your systems" claim. Rather, Health Advocates is claiming that the archive failed to follow the instructions in robots.txt that were intended to prevent access to historical material.
1. Re:summary is incorrect by kevmo · 2005-07-13 01:06 · Score: 5, Informative
  
  HealthCARE Advocates is suing, not Health Advocates. There is a trademark case of Healthcare Advocates (plaintiff) suing Health Advocates (defendant). The legal firm defending Health Advocates dug up the old archive. HealthCare Advocates, the plaintiff, got desperate and is suing the legal firm and IA, probably in order to try to exclude whatever evidence the defense legal firm dug up.
  
  I guess you were trying to be informative, but in this case it makes a big difference as to which company is doing the lawsuit. It's the plaintiff, not the defendant.
Information Extracted by inkdesign · 2005-07-13 00:23 · Score: 5, Informative

..on at least two dates in July 2003, the suit states, Web logs at Healthcare Advocates indicated that someone at Harding Earley, using the Wayback Machine, made hundreds of rapid-fire requests for the old versions of the Web site. In most cases, the robots.txt blocked the request. But in 92 instances, the suit states, it appears to have failed, allowing access to the archived pages.

For the "I don't wanna rtfa because it's early" crowd.
the bottom line by countzer0interrupt · 2005-07-13 00:29 · Score: 5, Insightful

He said that the robots.txt file is part of an entirely voluntary system, and that no real contract exists between the nonprofit Internet Archive and any of the historical Web sites it preserves.
Exactly right. The plaintiff is an asshat. The bottom line for publishing anything to the Web is: if you don't want it copied across the world, saved on people's hard disks (either automatically in a browser cache, or deliberately by the user), and potentially redistributed (after your initial act of publishing) for the rest of time, don't publish it to the Web. I'm not advocating the breach of copyright here - sure, I want credit of paternity for anything I put on the Web, at the very least. Pragmatically, however, I know that the Web (and the Internet at large) is a much more fluid medium. Somebody may save my webpage, copy a quote from it, download an image and use it as their desktop wallpaper, simply because they can. I can't stop them, and I'll never have proof that they did it, so I couldn't sue them if I wanted to. Therefore, I should exercise some common sense, and remember that the Web is a public medium, and if my work is so precious then maybe I shouldn't put it up there. Some web site owners want to use the power of the web to reach huge numbers of people, but they don't want to pay the price of such a fast and powerful medium. Once your words are out there, you may never get them back.
Turn on the shredder! by hhghghghh · 2005-07-13 00:35 · Score: 5, Insightful

This is a case where a plaintiff of an action (that they probably lost) is suing opposing counsel for using the Internet Archive to look for old documentation that is used as evidence against its claims. In effect, they're claiming that because they had a robots.txt, any page that might have been on the Internet Archive was there illegally, and shouldn't have been used as evidence.

In effect, they're saying "we were wrong, we tried to destroy the evidence of our wrongdoing, but because the shredder jammed and you found the evidence anyway, you're abusing our copyright".

The court hearing their argument should thoroughly smack them. Perhaps they should be brought to justice for trying to destroy evidence (or instructing a third party to do so), surely that's illegal in these post-Enron days.
If there is hope, it lies with the proles? by FooHentai · 2005-07-13 00:56 · Score: 5, Insightful

"Day by day and almost minute by minute the past was brought up to date. In this way every prediction made by the Party could be shown by documentary evidence to have been correct; nor was any item of news, or any expression of opinion, which conflicted with the needs of the moment, ever allowed to remain on record. All history was a palimpsest, scraped clean and reinscribed exactly as often as was necessary."
RTFA Addendum by poena.dare · 2005-07-13 01:04 · Score: 5, Funny

The suit contends, however, that representatives of Harding Earley should not have been able to view the old Healthcare Advocates Web pages - even though they now reside on the archive's servers - because the company, shortly after filing its suit against Health Advocate, had placed a text file on its own servers designed to tell the Wayback Machine to block public access to the historical versions of the site.

So the robots.txt was added YEARS AFTER the site had been archived. I don't think they correctly used the "no-archive-time-travel" directive.
Short translation of the article by mwvdlee · 2005-07-13 01:06 · Score: 5, Insightful

"We've lost our case based on evidence and will now be suing the organisation that provided the evidence for doing so".

--
Slashdot social media options: AIM, ICQ, Yahoo, Jabber and Mobile Text. Why no MySpace?
Re:Lookng forward by aussie_a · 2005-07-13 01:18 · Score: 5, Insightful

Having a public website is implicitly allowing anyone to read/view what you've made available.

But NOT to redistribute it.
Oops! by Marc2k · 2005-07-13 01:26 · Score: 5, Informative

Oh man, that sucks! I guess I better turn off all caching in my browser, lest I get sued for copyright infringement, because it's storing and rebroadcasting copyrighted materials that you may no longer want me to see at a later date.

However, if you RTFA'd, you'd know that lots of IP law firms use the Wayback Machine on a daily basis, and in fact, the company suing the Internet Archive is not suing them for republishing copyrighted information. Rather, the case is that they recently placed a robots.txt file on their site that disallows viewing historical versions of the website, and the Archive is being sued because the Wayback Machine apparently ignored the robots.txt file (which, I might note, is a voluntary standard, and by no means implies a contract between the two parties), which the plaintiff claims violates the DMCA. This has nothing to do with copyright violation.

It has everything to do with robots.txt. Read.

--
--- What
Analogies by MyLongNickName · 2005-07-13 01:27 · Score: 5, Insightful

I've read about 500 analogies on what electronic information "is like".

Every analogy is bad. We cannot equate electronic information with physical information of ages past. Every analogy just plain sucks.

The reason the information age has taken off is the ease of transmitting, storing and copying electronic data. These methods weren't available fifty years ago, and weren't widespread until about twenty years ago. Trying to stuff these concepts into one-hundred-plus-year-old ways of thinking is just useless.

This does not mean we can't use older solutions to problems to guide us in the future. But we need to stop shackling ourselves to old ways of thinking. The fundamental way we transmit thoughts and ideas has changed; our fundamental way of thinking about information needs to change as well.

Does this mean "all information is free"? No. But trying to treat electronic information like a book is useless. Web sites are put out to be publicly consumed. It is contradictory to say that someone cannot cache it for non-profit purposes. Trying to reuse the "creative" parts of the web site for commercial purposes should be prohibited.

Bottom line: Stop with the analogies. Start thinking fresh.

--
See my journal for slashdot ID's by year. Mine created in 2005. http://slashdot.org/journal/289875/slashdot-ids-by-year
Re:We have this one every time... by Dr.+Evil · 2005-07-13 02:06 · Score: 5, Insightful

Oddly, the Internet Archive honours robots.txt, so if you don't want people to surf your archive, you can just post their robots.txt file and it will block everything, even into the past.
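
A minimal sketch of what "blocks everything, even into the past" could look like: the decision to show an archived snapshot is made against the site's current robots.txt, not the one (if any) that existed when the page was captured. This is an illustration under stated assumptions, not the Internet Archive's actual implementation; the `Snapshot` type, the `visible_snapshots` helper, the example.com URLs, and the `ia_archiver` user-agent string are all assumed for the example.

```python
from dataclasses import dataclass
from urllib.robotparser import RobotFileParser


@dataclass
class Snapshot:
    url: str       # original URL of the captured page
    captured: str  # ISO date of the capture
    body: str      # stored page content


def visible_snapshots(snapshots, current_robots_txt,
                      user_agent="ia_archiver"):
    """Return the snapshots that may still be displayed.

    The capture date is deliberately ignored: only the robots.txt
    served by the site today decides what the archive shows.
    """
    parser = RobotFileParser()
    parser.parse(current_robots_txt.splitlines())
    return [s for s in snapshots if parser.can_fetch(user_agent, s.url)]


if __name__ == "__main__":
    archive = [
        Snapshot("http://example.com/about.html", "1999-04-12", "<html>v1</html>"),
        Snapshot("http://example.com/about.html", "2001-09-30", "<html>v2</html>"),
    ]
    block_all = "User-agent: ia_archiver\nDisallow: /\n"
    print(len(visible_snapshots(archive, "")))         # no robots.txt rules
    print(len(visible_snapshots(archive, block_all)))  # rules added later
```

Because the check happens at display time, a robots.txt added years after the captures hides all of them at once, while the stored copies themselves are untouched.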

I would say that caching and archiving are so well understood to be part of the Internet that posting a web page and not expecting it to be archived or spidered is absurd. In other words, by posting their site to the web without a robots.txt, they knowingly published it in a medium which contains facilities for archiving and later redistribution.
Re:Who has the right right to store store windows? by Artfldgr · 2005-07-13 03:16 · Score: 5, Informative

There are several law firms in the NY city area that pay to have every sidewalk and storefront and such filmed on video... they then send that video in to the state.... now when a person trips on a bad sidewalk they can get the case to court! i know.. you say WTF.. but it's pretty simple. say there is a big upheaval in the sidewalk.. you trip, and try to sue the city for not maintaining its property, etc... (i am making this simple, there are all kinds of better examples but this is simpler). the city though will tell you, and so will the courts, that the city is not responsible. why? because you can't prove negligence. negligence is willful, and not knowing there is a crack is not negligence. and here is the rub. being told that you have a problem and then ignoring it till something happens IS negligence. so in the past the lawyer would have to subpoena the city's records to see if someone reported the issue. if so, then great for the client; if not, they're plumb out of luck. so when the legal firm sends in the tapes, they are reporting the state of every block in that area... the city not looking at the tape that would define all the bad areas is negligence, since now they DO have a method of seeing the problems and are ignoring them.. and voila, you now win cases that you couldn't before... so given that there is precedent on such (and that store windows, especially in manhattan, are copyrightable, given that they are artistic displays!) my friends say i should have been a lawyer. :)
Re:obvious man question (now, in a 2nd Ed.) by drakaan · 2005-07-13 04:12 · Score: 5, Insightful

© 2005, by Adrian Stovall
If that's true, we had all better be careful not to visit *too* many pages on a given website during a given day. Either that or make sure that our web browser is set to immediately flush all downloaded content once it has been rendered.

The argument being made is that copyright is being violated, but the way the archive works might well be considered fair use, since the *only* reason it exists is for archival purposes. If having a copy of website content is illegal, in and of itself, then everyone who uses a web browser (unless they're running knoppix or something that doesn't store anything to the HD) is just as guilty as the Internet Archive.

I hereby rescind your permission to copy any of my posts, which means that if you're reading this, you're in violation of copyright law.

Okay, I now release my copyrighted work officially into the public domain. You're safe now.

--
"Murphy was an optimist" - O'Toole's commentary on Murphy's Law
Re:We have this one every time... by 99BottlesOfBeerInMyF · 2005-07-13 04:13 · Score: 5, Insightful

Putting material on the Internet does not give up your copyright on it, place it in the public domain, grant others the right to reproduce it any way they see fit, or otherwise work differently to copyright laws as they apply to all other media. There are necessarily certain implied rights, but arguing that actually ripping someone else's material and then making it publicly available after they've withdrawn it from their own site is a pretty big stretch to anyone without a vested interest.

Actually, while they do not give up any copyright, there are a number of explicitly stated, legal uses of copyrighted materials and there is a great deal of public benefit to enumerating a few more of them. Can you honestly argue it is not in the public's best interest that a historical archive of the internet exists, for educational reasons if no other? This case should be a poster child for just such legislation. A company published something, lied about it, and is now suing the people who made a copy and proved its guilt. Are you saying it is in the best interests of society that copyrights be used as a tool to promote lies and censorship?

Copyright is supposed to be about one thing and one thing only, promoting science and arts. That is the only constitutional provision for its existence. If someone is copying legally obtained works into an archive for educational, historical, or non-profit uses then they are almost invariably helping to promote science and arts, and anyone trying to stop them is up to no good.

As to the letter of the law (which is probably unconstitutional although it is impossible to prove that) you're right. The Internet Archive is screwed in the U.S. and many other countries. They tried to do what copyright law originally required of copyright holders and the Library of Congress. If a work is to be copyrighted then ethically it needs to be available. That is the whole point of copyright. According to the letter of the law it is probably illegal for me to print out the receipt some e-businesses display when I buy something online. The law needs to be fixed.

In fact, limiting the rights of others to distribute your works in order to encourage you to make them available is exactly what copyright is for, and this sort of case is a textbook example of why the principle matters.

What? How does this limiting of the rights of others encourage them to distribute the material? They, like the majority of copyright holders these days, don't want the work to be available at all. It does not encourage them to publish it, it just gives them a way to prevent works from being distributed.

The archive is in trouble not because they violated the intention of copyright. They, in fact, are trying to uphold the very principles upon which it is founded. Unfortunately, the laws have been changed by the corrupt and greedy to create a situation where copyright does exactly the opposite of its original purpose. This is a perfect example of copyright laws that have been rewritten being used to hold back progress and remove works from public availability. It is unethical and sickening, and your implication that a business's financial considerations should trump both the rights of our descendants to have access to our works and the ability to find and present the truth in the courts... well, it makes me want to vomit. Go to hell.