Slashdot Mirror


The Internet Archive Sued Over Stored Pages

Kailash Nadh writes "The Internet archive, which has been storing snapshots of millions of webpages since 1996, has been sued by the firm Harding Earley Follmer & Frailey, Philadelphia. The firm was defending Health Advocate, a company in suburban Philadelphia that helps patients resolve health care and insurance disputes, against a trademark action brought by a similarly named competitor. In preparing the case, representatives of Earley Follmer used the Wayback Machine to turn up old Web pages - some dating to 1999 - originally posted by the plaintiff, Healthcare Advocates of Philadelphia. Last week Healthcare Advocates sued both the Harding Earley firm and the Internet Archive, saying the access to its old Web pages, stored in the Internet Archive's database, was unauthorized and illegal." CT: Update. Note that the submitter got it backwards: Healthcare Advocates is suing both the Wayback Machine and Harding Earley Follmer & Frailey, not the other way around.

36 of 801 comments

  1. summary is incorrect by paulbd · · Score: 5, Informative

    The archive is being sued by Health Advocates, not the legal firm that had defended Health Advocates. In fact, the legal firm is named in the suit as well.

    And to clarify: it's not a simple "you have our stuff stored on your systems" claim. Rather, Health Advocates is claiming that the archive failed to follow the instructions in robots.txt that were intended to prevent access to historical material.

    1. Re:summary is incorrect by kevmo · · Score: 5, Informative

      HealthCARE Advocates is suing, not Health Advocates. There is a trademark case of Healthcare Advocates (plaintiff) suing Health Advocates (defendant). The legal firm defending Health Advocates dug up the old archive. HealthCare Advocates, the plaintiff, got desperate and is suing the legal firm and IA, probably in order to try to exclude whatever evidence the defense legal firm dug up.

      I guess you were trying to be informative, but in this case it makes a big difference which company is bringing the lawsuit. It's the plaintiff, not the defendant.

    2. Re:summary is incorrect by Prothonotar · · Score: 2, Informative

      To be even more nit-picking, it's Health Advocate (singular, not plural) and Healthcare Advocates (plural).

      --
      "Every man is a mob, a chain gang of idiots." - Jonathan Nolan, Memento Mori
  2. Information Extracted by inkdesign · · Score: 5, Informative

    ...on at least two dates in July 2003, the suit states, Web logs at Healthcare Advocates indicated that someone at Harding Earley, using the Wayback Machine, made hundreds of rapid-fire requests for the old versions of the Web site. In most cases, the robot.txt blocked the request. But in 92 instances, the suit states, it appears to have failed, allowing access to the archived pages.

    For the "I don't wanna rtfa because it's early" crowd.

    1. Re:Information Extracted by Stalyn · · Score: 4, Informative

      you forgot,

      In so doing, the suit claims, the law firm violated the Digital Millennium Copyright Act, which prohibits the circumventing of "technological measures" designed to protect copyrighted materials. The suit further contends that among other violations, the firm violated copyright by gathering, storing and transmitting the archived pages as part of the earlier trademark litigation.

      and

      Even if they had, it is unclear that any laws would have been broken.

      "First of all, robots.txt is a voluntary mechanism," said Martijn Koster, a Dutch software engineer and the author of a comprehensive tutorial on the robots.txt convention (robotstxt.org). "It is designed to let Web site owners communicate their wishes to cooperating robots. Robots can ignore robots.txt."

      William F. Patry, an intellectual property lawyer with Thelen Reid & Priest in New York and a former Congressional copyright counsel, said that violations of the copyright act and other statutes would be extremely hard to prove in this case.

      --
      The best education consists in immunizing people against systematic attempts at education. - Paul Feyerabend
  3. Re:God by GigsVT · · Score: 2, Informative

    The writeup says the archive is being sued by Harding et al. Then later it says it's being sued by one of those Health companies.

    I didn't even pick up on the fact there were two similarly named health care companies!

    --
    I've had enough abrasive sigs. Kittens are cute and fuzzy.
  4. Re:Robots.txt? by Baddas · · Score: 3, Informative

    As it says in the article, the robots.txt is an entirely voluntary measure. The IA doesn't need to obey it, but they do, in order to be a courteous member of the internet.
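To make the "voluntary" point concrete, here is a minimal Python sketch of what a cooperating crawler does, using the standard library's `urllib.robotparser`. The robots.txt contents, the example URL and the `polite_fetch` helper are hypothetical. Note that the only thing enforcing the rule is an `if` in the crawler's own code; a robot that skips the check is not stopped by robots.txt itself.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that shuts out the Internet Archive's crawler
# (user-agent "ia_archiver") while allowing every other robot.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow:
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly, without any network access."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def polite_fetch(parser: RobotFileParser, user_agent: str, url: str) -> str:
    """Hypothetical crawler step: robots.txt is honoured only by choice."""
    if not parser.can_fetch(user_agent, url):
        return "skipped"  # a cooperating robot voluntarily stops here
    return "fetched"      # a real crawler would issue the HTTP request here


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    url = "http://www.example.com/index.html"
    print(polite_fetch(rp, "ia_archiver", url))   # skipped
    print(polite_fetch(rp, "SomeOtherBot", url))  # fetched
```

The empty `Disallow:` line under `User-agent: *` means "allow everything", so only the named crawler is turned away.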

  5. Re:Robots.txt? by Illserve · · Score: 2, Informative

    They don't have a case either way!

    Adherence to robots is voluntary, done in good faith by crawlers for the general well being of the web.

  6. Re:God by MrKahuna · · Score: 2, Informative
    Actually, it DOES. The summary says "Internet archive, ... has been sued by the firm Harding Earley Follmer & Frailey, Philadelphia" which is false. The crazy thing is it's correct several sentences later where it says "Last week Healthcare Advocates sued both the Harding Earley firm and the Internet Archive".

    Why does Slashdot even bother with the summaries any more? They're outright wrong many times and just plain confusing and poorly written the rest. Either hire some better editors or just post the links to the original stories and be done with it. As it is, I'm about ready to delete my bookmark to this site and move on.

  7. Comment removed by account_deleted · · Score: 3, Informative

    Comment removed based on user account deletion

  8. Re:obvious man question by aussie_a · · Score: 2, Informative

    you mean it's like being a library?

    I was under the impression that libraries had permission to distribute the content that they do. In fact, in Canada, authors (Canadian ones at least) get paid some money for their books that are in libraries. I'd say that pretty much means there's an agreement (and not an assumption of one because the author hasn't said no) between libraries and authors.

  9. The write up is indeed, bollocks! by @madeus · · Score: 4, Informative

    Sorry, the writeup is bollocks. It says:

    "The Internet archive, which has been storing snapshots of millions of webpages since 1996 has been sued by the firm Harding Earley Follmer & Frailey, Philadelphia."

    and also:

    "Last week Healthcare Advocates sued both the Harding Earley firm and the Internet Archive".

    So to believe the write up, they are being sued by BOTH parties.

    However, it says, in TFA:

    "... John Earley, a member of the firm being sued, said he was not surprised by the action, because Healthcare Advocates had tried to amend similar charges to its original suit against Health Advocate, but the judge denied the motion. Mr. Earley called the action baseless, adding: "It's a rather strange one, too, because Wayback is used every day in trademark law. It's a common tool."

    Christ knows where the idea that they are being sued by the firm Harding Earley Follmer & Frailey came from.

    Doesn't anyone else read the stories first? o_O

  10. Re:Robots.txt? by RealityMogul · · Score: 3, Informative

    Larger images aren't cached on the archive servers, so they'd go to the real server. Most likely the original images weren't there so they started getting a flood of 404s and started investigating the problem.

  11. Just to clear this up for everyone... by willisbueller · · Score: 1, Informative

    Healthcare Advocates is suing Health Advocate. When Health Advocate (Defense) and their lawyers (Defense) used the Wayback Machine to try to prove the case frivolous, Healthcare Advocates (Plaintiff) tried to block their access to historical content (which does seem to make their case look dubious). However, the access was not successfully blocked, so the plaintiff is going after the Internet Archive and Health Advocate's (Defense) lawyers. Seems like more of a smokescreen than anything else.

  12. The obvious explanations are just too many to list by mrRay720 · · Score: 3, Informative

    The problem is that by allowing illegally obtained evidence, you are officially and legally endorsing criminal activity.

    Who wins in a criminal vs criminal case? Would police officers be forced to fall on their own swords - i.e. commit criminal acts to gain evidence - on the orders of those above them? It also gives the thumbs-up to vigilante justice.

    Oh, and let's not forget that it'd put the idea of court orders, search warrants, the presumption of innocence until proven guilty (admittedly already fading), and a whole host of other rights in their graves.

    There's just so much wrong with the idea of allowing illegal evidence, I'm surprised when anyone asks why it's wrong.

  13. Re:If you put something on the web..... by cdrudge · · Score: 3, Informative

    By nature of copyrights, everything that you create is automatically copyrighted the instant that you create an original work. This post has already been copyrighted by me.

    You don't HAVE to register the copyright with the Copyright Office in order to retain the copyright. Doing so, though, gains you added benefits in case there is a dispute. Published works ARE REQUIRED to be registered. However, just putting the website up does not count as publishing. In order to be considered published works, the work must be sold or otherwise have a transfer of ownership, or through rental, lease, or lending. The Copyright Office has even said "The reports also state that it is clear that any form of dissemination in which the material object does not change hands, for example, performances or displays on television, is not a publication no matter how many people are exposed to the work." (Source)

  14. Oops! by Marc2k · · Score: 5, Informative

    Oh man, that sucks! I guess I better turn off all caching in my browser, lest I get sued for copyright infringement, because it's storing and rebroadcasting copyrighted materials that you may no longer want me to see at later date.

    However, if you RTFA'd, you'd know that lots of IP law firms use the Wayback Machine on a daily basis, and in fact, the company suing the Internet Archive is not suing them for republishing copyrighted information. Rather, the case is that they recently placed a robots.txt file on their site that disallows viewing historical versions of the website, and the Archive is being sued because the Wayback Machine apparently ignored the robots.txt file (which, I might note, is a voluntary standard, and by no means implies a contract between the two parties), which the plaintiff claims violates the DMCA. This has nothing to do with copyright violation.

    It has everything to do with robots.txt. Read.

    --
    --- What
    1. Re:Oops! by AnObfuscator · · Score: 4, Informative
      This has nothing to do with copyright violation.

      Ahem. Perhaps, if YOU had RTFA'd, you would have seen this little gem:

      From TFA:
      The lawsuit, filed in Federal District Court in Philadelphia, seeks unspecified damages for copyright infringement and violations of two federal laws: the Digital Millennium Copyright Act and the Computer Fraud and Abuse Act. (emphasis mine)

      I'd also like to point out that the Digital Millennium Copyright Act is about preventing copyright infringement.

      Read.

      Pot. Kettle. Black.

      --
      multifariam.net -- yet another nerd blog
  15. Re:Please RTFA by Anonymous Coward · · Score: 2, Informative

    And yet, that's how the Internet Archive tells people to remove previously archived material.

  16. We have this one every time... by Anonymous+Brave+Guy · · Score: 2, Informative
    I'm really sick and tired of companies that have absolutely no clue how the Internet and the world wide web works putting up sites and then expecting you to never cache them anywhere.

    <mini-rant> And I'm really sick and tired of people that have absolutely no regard for how the law works copying material off the Internet and then expecting never to get sued for it, claiming some legally naive and ethically dubious justification. </mini-rant>

    Seriously, we have this discussion every time Google or the Wayback Machine or whatever comes up. Putting material on the Internet does not give up your copyright on it, place it in the public domain, grant others the right to reproduce it any way they see fit, or otherwise work differently to copyright laws as they apply to all other media. There are necessarily certain implied rights, but arguing that actually ripping someone else's material and then making it publicly available after they've withdrawn it from their own site is a pretty big stretch to anyone without a vested interest.

    Before anyone shoots back the inevitable responses about information wanting to be free, not controlling the flow, yada yada, please stop and think for a minute. A lot of the useful content on the web is made available by volunteers or companies who don't expect to profit from it immediately, but whose future business may be damaged if the information is taken and republished by others. Many of these people will just stop putting information on the web at all (see Slashdot discussions passim) if you abuse the access, and that doesn't benefit anyone.

    In fact, limiting the rights of others to distribute your works in order to encourage you to make them available is exactly what copyright is for, and this sort of case is a textbook example of why the principle matters.

    --
    If you disagree, post your argument. (-1, Overrated) isn't your personal censorship tool for views you don't like.
    1. Re:We have this one every time... by CausticPuppy · · Score: 3, Informative
      Oddly, the Internet Archive honours robots.txt, so if you don't want people to surf your archive, you can just post a robots.txt file and it will block everything, even into the past.

      From TFA:

      Web logs at Healthcare Advocates indicated that someone at Harding Earley, using the Wayback Machine, made hundreds of rapid-fire requests for the old versions of the Web site. In most cases, the robot.txt blocked the request. But in 92 instances, the suit states, it appears to have failed, allowing access to the archived pages.


      So it appears that the basis of the lawsuit is that the robots.txt was NOT honored. The plaintiff claims that the robots.txt is a "contract" and that the wayback machine violated the contract by still allowing archived pages to be viewed in a limited number of attempts, for reasons unknown.

      However, TFA also mentions that honoring the robots.txt is strictly voluntary and does not constitute a contract.
      --
      -CausticPuppy "Of all the people I know, you're certainly one of them." -Somebody I don't know
    2. Re:We have this one every time... by DerekLyons · · Score: 2, Informative
      That's a ridiculous distinction.
      No, it's an important distinction - and one that does not rely on calling a tail a leg. One item is a pointer towards content, the other is a copy of the content. These are two very different things at every level.
      Those pages were accessible on the internet when the archive crawler archived them.
      So? That doesn't destroy the rights of the owners of content over that content. The CNN coverage of the Discovery launch I am currently watching is publicly available, but even if I were taping it, I don't have the right to then make copies available to third parties. This is basic copyright law, well supported by precedent.
      They existed at that time for anyone to view. You can't take it back.
      That's an assumption (read 'wishful thinking'), not a fact.
  17. Re:Turn on the shredder! by iainl · · Score: 2, Informative

    It's not even that. The robots.txt wasn't in place until the previous court case started.

    What they're actually suing the Wayback Machine for is failing to see that there was now a robots.txt in place and so purge their entire archive history for the page.

    Tragically, search-engine advisory information files have yet to develop time travel. This is somehow Wayback's fault.

    --
    "I Know You Are But What Am I?"
  18. Re:The library clause by dtfinch · · Score: 2, Informative

    The Internet Archive has received two DMCA exemptions from the US Copyright Office, but only for archiving copy protected software. I don't think they needed one to archive the web.

  19. Did you even read the article you linked? by Safety+Cap · · Score: 3, Informative

    From the article:

    During the case it was discovered that McDonald's required franchises to serve coffee at 180-190 degrees Fahrenheit (82-88 degrees Celsius). At that temperature, the coffee would cause a third-degree burn in two to seven seconds.

    Testimony by witnesses for McDonald's revealed that:

    • consumers were not aware the coffee was so hot that there was a risk of serious burns
    • McDonald's did not warn customers of this risk
    • they could offer no explanation as to why there was no warning
    • McDonald's did not intend to reduce the heat of its coffee

    ~.

    Documents obtained from McDonald's also showed that from 1982 to 1992, more than 700 people were burned by McDonald's coffee with varying degrees of severity.

    [Emphasis mine]

    Frivolous Lawsuit? Hardly.
    Excellent Spin-doctoring on McDonald's Part? Absolutely.

    --
    Yeah, right.
  20. Re:obvious man question by robslimo · · Score: 4, Informative

    I should have pointed out that the aspect of robots.txt they're complaining about is an "extension" of sorts where archive.org will remove any archived copies of your site if your robots.txt disallows the ia_archiver spider *and* you submit your site to be recrawled (guaranteeing that the spider will see the new directive).
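The retroactive exclusion described above can be modelled in a few lines. The Python sketch below is a hypothetical access gate, not the Internet Archive's actual code: before serving any stored snapshot, it parses the site's *current* robots.txt and refuses if `ia_archiver` is disallowed, whatever the capture date. `serve_snapshot`, `archive_allows`, the in-memory snapshot store and the `fail_open` switch are all assumptions made for illustration.

```python
from typing import Dict, Optional, Tuple
from urllib.robotparser import RobotFileParser

# User-agent of the Internet Archive's crawler, as named in the thread.
ARCHIVER_UA = "ia_archiver"

# Hypothetical archive store: (url, capture timestamp) -> stored page body.
Snapshots = Dict[Tuple[str, str], str]


def archive_allows(current_robots_txt: Optional[str], url: str,
                   fail_open: bool = False) -> bool:
    """Check the site's robots.txt as it is *now*, not as it was at capture time.

    current_robots_txt is None when the live file could not be retrieved;
    fail_open (a hypothetical knob) decides whether to serve in that case.
    """
    if current_robots_txt is None:
        return fail_open
    parser = RobotFileParser()
    parser.parse(current_robots_txt.splitlines())
    return parser.can_fetch(ARCHIVER_UA, url)


def serve_snapshot(snapshots: Snapshots, url: str, timestamp: str,
                   current_robots_txt: Optional[str],
                   fail_open: bool = False) -> Optional[str]:
    """Return an archived page, or None if it is excluded or was never captured."""
    if not archive_allows(current_robots_txt, url, fail_open):
        return None  # excluded: even a years-old capture is hidden
    return snapshots.get((url, timestamp))


if __name__ == "__main__":
    store: Snapshots = {("http://www.example.com/", "19990427"): "<html>old page</html>"}
    blocked = "User-agent: ia_archiver\nDisallow: /\n"
    print(serve_snapshot(store, "http://www.example.com/", "19990427", blocked))  # None
    print(serve_snapshot(store, "http://www.example.com/", "19990427", ""))  # <html>old page</html>
```

In this model, whether a 1999 page is served depends entirely on what the site's robots.txt says today. The thread does not say why 92 requests got through; the `fail_open` switch only illustrates that any gate of this kind has to decide what to do when the live file cannot be checked.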

  21. Usenet parallels by Anonymous+Brave+Guy · · Score: 2, Informative
    I wonder if there have been suits over the Google, formerly DejaNews, archive of Usenet.

    As far as I know, there hasn't been any definitive court case anywhere on the basic concept of Usenet archives. They at least have some kind of defence, because you inevitably expect Usenet messages to propagate beyond your control, and you expect that services providing Usenet feeds will charge money to their customers in exchange for relaying the information. IMHO, for Usenet archives it's more a question of whether permanent storage is a reasonable expectation for which permission is implied by posting, which can at least be argued reasonably either way (e.g., it usually isn't, and it's a common expectation that messages will disappear after a few days, vs. the technical standards saying nothing about mandatory expiration and the growth in cheap storage space at service providers compared to when Usenet was first running).

    I'm pretty sure that at least one business that reproduced Usenet via the web and added those annoying automated keyword-linked ads on top of someone's posts has been screwed for it in court, though; IIRC, they were found to be publishing a derivative work without permission. I've come across at least one techie forum that was abusing many posts I've made to a programming newsgroup this way, which I did find inappropriate (they are generating ad revenue purely from distorting words I wrote, even advertising compilers in a post whose whole point was that you shouldn't write code depending on a specific compiler!), so I don't have much sympathy. If anyone can remember the case that established this one, I'd appreciate a reference.

    --
    If you disagree, post your argument. (-1, Overrated) isn't your personal censorship tool for views you don't like.
  22. Re:Who has the right right to store store windows? by Artfldgr · · Score: 5, Informative

    There are several law firms in the NYC area that pay to have every sidewalk, storefront and so on filmed on video, and then send that video to the state. Now when a person trips on a bad sidewalk, they can get the case to court! I know, you say WTF, but it's pretty simple.

    Say there is a big upheaval in the sidewalk. You trip, and try to sue the city for not maintaining its property (I am making this simple; there are all kinds of better examples, but this one is simpler). The city will tell you, and so will the courts, that the city is not responsible. Why? Because you can't prove negligence. Negligence is willful, and not knowing there is a crack is not negligence. And here is the rub: being told that you have a problem and then ignoring it till something happens IS negligence.

    So in the past the lawyer would have to subpoena the city's records to see if someone had reported the issue. If so, great for the client; if not, they're plumb out of luck. So when the law firm sends in the tapes, it is reporting the state of every block in that area. The city not looking at the tape that would identify all the bad areas is negligence, since it now DOES have a way of seeing the problems and is ignoring them. And voila, you now win cases that you couldn't before.

    So given that there is precedent for this (and that store windows, especially in Manhattan, are copyrightable, given that they are artistic displays!)... My friend says I should have been a lawyer. :)

  23. Re:obvious man question by blamanj · · Score: 2, Informative

    Putting up an unprotected web site is akin to putting up a billboard. If I take a picture of the billboard and publish it in a textbook that kids read for the next 20 years, should I be expected to be sued by the billboard company?

    Sadly, the answer to this is probably yes. Two examples:
    1) Coke sues a photographer for including one of its billboards in a picture.
    2) The filmmakers of "Bewitched" were forced to edit the Transamerica Pyramid out of their shots of the San Francisco skyline because the building is a registered trademark.

    Our IP laws seem destined to be controlled by corporate greed and congressional stupidity.

  24. Re:Who has the right right to store store windows? by Artfldgr · · Score: 2, Informative

    The tapes, in order to be valid, are made with street names recorded and such, so the report would have to include the location. So your assumption that it's not reasonable to search them all (outside of normal maintenance) is correct.

  25. Re:The Archive faces a lot of potential problems.. by millennial · · Score: 2, Informative

    In the United States, putting any creative work into a fixed tangible form automatically confers the protection of copyright, effective from that moment. No notice or registration is required.

    That's simply not true. You have to be able to prove that you created it first, and if you want the right to be the sole receiver of royalties from your work, you have to register it with the copyright office. This isn't free, either.

    --
    I am scientifically inaccurate.
  26. This is a frivolous lawsuit by jafiwam · · Score: 2, Informative

    Archive.org has always had a good policy of removing data on request.

    They have an automatic version that uses robots.txt: when forbidden to crawl, they go back and make the other, older versions unavailable as well. (It only works when the re-crawl happens, though I think you can initiate it by going to the site.)

    Furthermore, additional requests can be made via email to remove content. The only "damage" here is that the wrong (in their opinion) law firm got ahold of the data before they could do that.

    The company suing broke the law, got sued, got fucked, and now wants to recover the money it lost by breaking the law and getting busted, by going after archive.org, which provided evidence in the original lawsuit. Sorry guys, you got fucked when you first stole trademarked stuff off someone else's web site. The rest of it is just sour apples. They should be charged with intimidating a witness and put in pound-me-in-the-ass federal prison for it. It's racketeering like that that gives lawyers such a bad name.

    Had they any brains, they would have employed a geek to go seek out these cached sources and remove them the first time around.

    AND the company suing the original offending company should have used a simple entry in their HOSTS file to keep from accidentally causing requests to go to the original web server; that's simple data forensics.

    Let me tell you a story about my week in mid-September 2001. After wasting tons of time reading news, I got a desperate call from a certain client (soon to be rather). Their web host was in the towers, and both server farms were demolished, along with all the backup tapes. Their site was gone. AND due to other complications they were losing customers left and right.

    I used Archive.org, Google cache and a few bits they had to reassemble the web site and get it back on line. In this case, un-pre-approved caching was critical in keeping this company from going out of business.

    There are 1,000s of other systems that cache data and make it available later: Inktomi, Akamai, corporate networks, those "high speed dial-up" things; my friggin open source firewall does that (Squid?). It's simply stupid to sue archive.org for that. Caching is part of the web, get fucking used to it.

    It's the webmaster's damn job to know or learn all about this stuff (including caching). Slapping HTML up on some server is not the end of web management. There's a whole lot more to it.

  27. Sue a witness? by Neurotoxic666 · · Score: 4, Informative

    Can you sue a witness because he remembered the facts against you during a trial, the same way the Wayback Machine is being sued because it "remembers" old facts and sayings and has been used in courts?...

    --
    You are more than the sum of what you consume. Desire is not an occupation.
  28. Re:MODS: Parent is wrong. by techno-vampire · · Score: 2, Informative
    But negligence does not require evidence of willful conduct. Negligence is merely a failure to act as a reasonably prudent person under the circumstances.

    That's true, now. There was a time when a specific act had to be shown, and the person specified. Over a hundred years ago, a man was injured when a loose barrel came flying out of a brewery and hit him. He sued for negligence, and won, even though nobody could be shown to have caused it. This was because the incident was so outrageous that there was no possible explanation without assuming negligence, and it established a new legal principle: res ipsa loquitur, the thing speaks for itself.

    --
    Good, inexpensive web hosting
  29. RTFA--the Internet Archive is absolvable by Nukenin · · Score: 2, Informative

    From the FA (emphasis mine):

    But on at least two dates in July 2003, the suit states, Web logs at Healthcare Advocates indicated that someone at Harding Earley, using the Wayback Machine, made hundreds of rapid-fire requests for the old versions of the Web site. In most cases, the robot.txt blocked the request. But in 92 instances, the suit states, it appears to have failed, allowing access to the archived pages.

    I'm fairly certain that the Internet Archive has no control over access to Healthcare Advocates' own webserver(s). I'm also fairly certain that the Internet Archive would not log access to archived web content back to the "Web logs at Healthcare Advocates". So someone at Healthcare Advocates or its legal firm is really, really grasping at nonexistent straws here, or is just plain stupid/ignorant. Suing the Internet Archive because Healthcare Advocates' own webserver(s) served up outdated content that they themselves left accessible? (robots.txt is no substitute for simply removing the old files from the webserver(s)' document tree or otherwise restricting access at the server side.)

    Hopefully sanity prevails and this lawsuit is dropped. Either that or Healthcare Advocates and/or its legal representation is made a laughingstock in the courtroom.

  30. Re:The Archive faces a lot of potential problems.. by ubernostrum · · Score: 2, Informative

    That's simply not true.

    IANAL, but... actually, it is. Since 1978, copyright has been granted automatically on the creation of the work, with registration required only to exercise certain legal options such as recovering statutory damages. See Title 17 USC Chapter 3, Section 302 (a) for this, and Chapter 4, Sections 411 and 412 for the limitation on what you can do without registering. The exclusive rights granted by copyright (Chapter 1, Section 106) remain in effect regardless of registration.