The Internet Archive Sued Over Stored Pages
Kailash Nadh writes "The Internet Archive, which has been storing snapshots of millions of webpages since 1996, has been sued by the firm Harding Earley Follmer & Frailey, Philadelphia. The firm was defending Health Advocate, a company in suburban Philadelphia that helps patients resolve health care and insurance disputes, against a trademark action brought by a similarly named competitor. In preparing the case, representatives of Earley Follmer used the Wayback Machine to turn up old Web pages - some dating to 1999 - originally posted by the plaintiff, Healthcare Advocates of Philadelphia. Last week Healthcare Advocates sued both the Harding Earley firm and the Internet Archive, saying the access to its old Web pages, stored in the Internet Archive's database, was unauthorized and illegal." CT: Update: note that the submitter got it backwards: Healthcare Advocates is the one suing the Wayback Machine and Harding Earley Follmer & Frailey, not the other way around.
fsck me if i'm wrong, but wouldn't this be similar to suing someone for referencing an old book I wrote, just because I'd released a new one that didn't contain much of the old information?
Don't anthropomorphize computers: they hate that.
Did they set up their robots.txt file properly? If not, they may not have a case.
The simple truth is that interstellar distances will not fit into the human imagination
- Douglas Adams
Would that make archived newspaper editorials, TV reports, etc. illegal as well? Google beware.
Looking forward to newspapers filing similar frivolous lawsuits against libraries for maintaining old copies of the papers in their collections; copies that the newspaper company might be embarrassed about now.
Don't blame Durga. I voted for Centauri.
Wouldn't this reference site be covered under some of the same protections as a library? It serves some of the very same purposes.
Hopefully this falls flat.
I wonder where the servers are located.
Pablo
Lawsuits these days sound more like people whining like spoiled brats than someone who has really been done an injustice.
They publish the thing, person X stores it, person Y uses stored info to prove they published it. So what? If they'd written the thing in a newspaper, would they sue someone for keeping the newspaper?
Huh
"Those who cast the votes decide nothing; those who count the votes decide everything." (attrib. Joseph Stalin)
Would a lawsuit be considered if instead of a cache of web pages, the other side had used old newspapers from the library or VHS recordings of an old television broadcast? Once they've put their web pages into the public, don't they lose control of who keeps a copy?
I'm not saying that legally they don't have a legitimate case, but is it really necessary to pursue an organisation such as the Internet Archive over something so passive as this? In my opinion, hell no it isn't.
She's built like a steak house, but she handles like a bistro....
So, they want to make their webpage freely available to the entire world, but they don't want people to download the pages? Make up your fucking mind, if you're going to put something on the internet, people are going to download it.
There is a Wayback machine mirror in the Bibliotheca Alexandrina. It would be very difficult for them to find any legal basis in Egypt to get this one offline.
Assuming the judge has more than one brain cell then this case should take no more than 30 seconds and will be summarised in two sentences.
"You published information on a public medium. Case dismissed."
But then again this is America we're talking about.. home of the idiot lawsuit and lunatic judicial decisions so I don't hold out much hope for the triumph of reason...
Sky subscribers are morons. They pay to be advertised at !
Because that would be UnAmerican(tm)
Technoli
Possibly because they are making a copy of copyrighted material and distributing such a copy or making it available for download. Is this fundamentally different from for example recording music from radio or shows from TV and redistributing them?
A century from now all profit will be gathered from suing one another about IP & copyright rights :)
-- Sig down
A book is a physical object, you can reference a book as long as you do not republish it in its entirety. The internet isn't a physical object, it's a collection of bytes arranged in a specific manner. It's that collection that makes it simple to take someone else's work and republish it, almost effortlessly.
The law has the ugly job of sorting out what constitutes copyright infringement -- republishing a website, perhaps? With the internet, it has become infinitely easier to republish works in their entirety, and hence the lawsuit. If they are guilty of anything, it is not of just 'referencing' a work, it is of taking that work, and republishing it without the authorization of its author. (heh, gotta love the wordplay)
To bring all of this to a point, it's as if I took your old book, put it into a book that talks about old stuff, and recopied everything, verbatim.
War isn't about who's right. It's about who's left.
Seriously, if you don't want someone to see something, THEN WHY DID YOU PUT IT ON THE INTERNET TO BEGIN WITH???
;)
but no worries, its all cool cause we just found an excuse to pull the lever on the american justice jackpot
This is a case where the plaintiff in an action (that they probably lost) is suing opposing counsel for using the Internet Archive to look for old documentation that was used as evidence against its claims. In effect, they're claiming that because they had a robots.txt, any page that might have been on the Internet Archive was there illegally, and shouldn't have been used as evidence.
In effect, they're saying "we were wrong, we tried to destroy the evidence of our wrongdoing, but because the shredder jammed and you found the evidence anyway, you're abusing our copyright".
The court hearing their argument should thoroughly smack them. Perhaps they should be brought to justice for trying to destroy evidence (or instructing a third party to do so), surely that's illegal in these post-Enron days.
info@healthcareadvocates.com
Be gentle, they might be in the right after all.
It should be treated the same way trespassing for unfenced property is treated.
The case should be dismissed as it reproduces verbatim with attribution content that was published for public bot scraping.
Now what, will someone sue Yahoo! or Google for caching pages or converting PDFs to HTML? Or Coral Cache for unauthorized reproduction of websites?
Quidquid latine dictum sit, altum videtur
It's a constitutional guarantee, at least in the US.
The thinking runs that any wrongs that are created by illegally obtained evidence are outweighed by the wrongs that would result from abuses that would ensue if illegally obtained evidence were allowed to be used.
... if they lose this fight.
For example, 2600 Magazine's old web site containing a copy of the DeCSS source code is stored in the Archive. Could the Archive be held in violation of the DMCA for mirroring someone else's old site?
I am scientifically inaccurate.
"Day by day and almost minute by minute the past was brought up to date. In this way every prediction made by the Party could be shown by documentary evidence to have been correct; nor was any item of news, or any expression of opinion, which conflicted with the needs of the moment, ever allowed to remain on record. All history was a palimpsest, scraped clean and reinscribed exactly as often as was necessary."
Let me understand this.
If Company X publishes a brochure (paper) that states Y and then later decides that they shouldn't have written that, and I keep a copy they can ask me to destroy the copy????
Not only that, they can't use it against me in a civil suit about trademarks??
It's true that they own the copyright, but the point is that they *freely* distributed it, and you have the right to keep a copy (per Bern 1976 copyright act).
You sir are a troll, have a nice day.
"We've lost our case based on evidence and will now be suing the organisation that provided the evidence for doing so".
Slashdot social media options: AIM, ICQ, Yahoo, Jabber and Mobile Text. Why no MySpace?
First and foremost, the existence of a robots.txt does not constitute a contract between the client (a web surfer/browser agent) and the server (the site hosting the content proper). Repeat that over and over. There is nothing stating that my crawler or spider must request the robots.txt on your server.
It's preferred, but not required. Even so, I am free to ignore it if I want, and parse whatever links I see fit to grab. If you make the content public and I want to read that content, I'm going to get it, whether you have robots.txt in place or not.
Secondly, has anyone taken the time to validate the robots.txt file found on the site in question? Note too that they just changed robots.txt on July 8th of this year. Did the previous version validate? Are they trying to rewrite history again? What did the old version look like?
If there is even so much as one error, robots/crawlers are free to ignore/parse/merge/break it as they see fit. It happens all the time, and even when robots.txt is perfectly valid, many robots and crawlers ignore it anyway (msnbot and Yahoo's crawlers are two of the worst offenders here).
But back to the first point, robots.txt is a guideline, not a rule, not a contract, and certainly not something that can be enforced. Does lack of a robots.txt file constitute the legal right to publicly redistribute the content? Or store it for later review and retrieval? How do you know any of your former employees from 1996 haven't stored your entire website on floppy, one page at a time? Did they adhere to robots.txt? Did ANYONE adhere to robots.txt in 1996? It seems that there was evaluation of the Robots Exclusion Standard in 1996, but was everyone using it? Not likely.
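To make the "guideline, not a rule" point concrete, here is a minimal Python sketch of what honouring robots.txt looks like from the crawler's side. The check is just a call the crawler chooses to make; leaving it out changes nothing on the server. The rules, the bot name ExampleBot, and the example.com URLs are all made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, not taken from any real site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

def polite_fetch_allowed(agent: str, url: str) -> bool:
    """Return True if a crawler that chooses to obey robots.txt may fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)

# A well-behaved crawler asks before fetching; a rude one simply skips this step.
public_ok = polite_fetch_allowed("ExampleBot", "http://example.com/index.html")
private_ok = polite_fetch_allowed("ExampleBot", "http://example.com/private/old.html")
print(public_ok, private_ok)
```

Nothing here talks to the server at all, which is the point: compliance lives entirely in the client.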
Microsoft Internet Explorer will certainly store the entire website for "reading offline" if you ask it to do so when bookmarking it. They don't parse robots.txt to exclude pages that shouldn't be stored locally.
It's too bad that people need to try to erase history to prevail in litigation. This isn't George Orwell's 1984... well, at least not yet anyway.
Not many people would publish something hugely embarrassing and then draw attention to it by suing a popular project.
I've read about 500 analogies on what electronic information "is like".
Every analogy is bad. We cannot equate electronic information with physical information of ages past. Every analogy just plain sucks.
The reason the information age has taken off is because of the ease of transmitting, storing and copying of electronic data. These methods weren't available fifty years ago, and weren't widespread until about twenty years ago. Trying to stuff these concepts into one-hundred plus year old ways of thinking is just useless.
This does not mean we can't use older solutions to problems to guide us in the future. But, we need to stop shackling ourselves to old ways of thinking. The fundamental way we transmit thoughts and ideas has changed, our fundamental way of thinking about information needs to change as well.
Does this mean "all information is free"? No. But trying to treat electronic information like a book is useless. Web sites are put out to be publicly consumed. It is contradictory to say that someone cannot cache it for non-profit purposes. Trying to reuse the "creative" parts of the web site for commercial purposes should be prohibited.
Bottom line: Stop with the analogies. Start thinking fresh.
See my journal for slashdot ID's by year. Mine created in 2005. http://slashdot.org/journal/289875/slashdot-ids-by-year
And if you printed out the site, would they want to sue you for "reproducing" that site? Along the same lines, would someone want to sue you because you kept a book you bought 10 years ago and the author had written a new version? This all smacks of this "on demand" nonsense and self-destructing media and even shades of Orwell's 1984 where the Ministry of Truth modifies ancient history when it suits their purposes. This is all part of an attempt of corporations with the complicity of the legal establishment to place absolute control of all media in the hands of said corporations. Which all leads to the fact that it's time for the Congress to enact a Corporation Control Act that would finally put a leash on these rabid idiots.
"Is this Winkhorst a nova criminal?" "No just a technical sergeant wanted for interrogation."
Really? Gee, then maybe Google and Yahoo should stop crawling those sites because God forbid even the metatags could be copyrighted, let alone cached information used to generate search entries (and yeah, I regularly find pages in Google's cache that no longer exist). Maybe they should just remove all the offending entries and render the motherfuckers unfindable by anybody running a search.
It'll work wonders for business.
Oddly, the Internet Archive honours robots.txt, so if you don't want people to surf your archive, you can just post their robots.txt file and it will block everything, even into the past.
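As a sketch of the kind of exclusion described above: the rules below single out the Archive's crawler by the user-agent name ia_archiver, which is the name the Archive has historically told site owners to use, and leave every other agent alone. The check uses Python's standard-library parser; SomeOtherBot and example.com are placeholders. Whether the Archive then hides already-stored snapshots is the claim made above, not something this code demonstrates.

```python
from urllib.robotparser import RobotFileParser

# Rules aimed only at the Internet Archive's crawler.
ARCHIVE_BLOCK = """\
User-agent: ia_archiver
Disallow: /
"""

rules = RobotFileParser()
rules.parse(ARCHIVE_BLOCK.splitlines())

# The named agent is shut out of everything; agents with no matching rule are not.
archive_allowed = rules.can_fetch("ia_archiver", "http://example.com/about.html")
other_allowed = rules.can_fetch("SomeOtherBot", "http://example.com/about.html")
print(archive_allowed, other_allowed)
```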
I would say that caching and archiving are so well understood to be part of the Internet that posting a web page and not expecting it to be archived or spidered is absurd. In other words, by posting their site to the web without a robots.txt, they knowingly published it in a medium which contains facilities for archiving and later redistribution.
Irritable, left-wing and possibly humorous bumper stickers and t-shirts
Legal decisions are often based on previous decisions, and at this point they are comparing "similar" (though not the same) situations in the physical world with those in the digital world.
In terms of public domain, visibility, and various other terms the analogies aren't bad. It's quite similar to patent cases... just because you do something on the internet doesn't make it unique (which makes for a lot of dumb patents), and really taking an electronic 'snapshot' of a publicly visible webpage shouldn't be any different than taking a picture+photocopy of a physical notice/bulletin/sign/etc in a publicly visible location.
If they'd published the same information on a sign in their front lawn... what's to separate it from the e-Version other than the fact that one is paint and the other is bytes?
"unfortunately for your argument, the legal truth is that copyright protection is the DEFAULT, so robots.txt has it backwards. the fact of the matter is that to be (more) compatible with existing law, there should be an allowcache.txt, not a robots.txt."
I would agree, but I'm arguing that archiving and redistribution is part of the medium that the copyrighted work was published in. The webmaster of the site certainly knew that the site would be archived. They would also know that robots.txt is a voluntary process.
In other words... if they wanted to make sure it wouldn't be archived, they shouldn't have put it on the web.
The Wayback Machine, on the other hand, stores copies of pages, not copies of their addresses.
Not exactly the kind of people you want to be defending. The fact that copyright law can be used this way suggests it is broken. The fact that it was created before our modern information economy was formed also suggests it is broken and in need of revision.
Credibility is not something you can easily steal, and the people who "get there first" tend to get the lion's share of credibility, even when competing against companies much larger than themselves.
So your cheap shot about how people on Slashdot have nothing to lose is just that, a cheap shot with no substance or truth. We're not saying that copyright laws need to be entirely abolished, but they do need to be updated to reflect our modern society and information infrastructure. The fact that we're compelled to lay unenforceable law after unenforceable law down on top of copyright law in a vain attempt to keep it afloat in its current state should be evidence enough that things need to change.
Excuse me? What? A public website is by its very nature meant to be redistributed. It is replicated on millions of machines per day for many different purposes. If you do not agree to at least some redistribution for your website, then take it offline because it doesn't work without redistribution.
Sue the caching proxy servers! Sue people who use internet cache! Sue Proxomitron users! Sue link-of-the-day sites because they helped people replicate the data. Wait, why not just sue everyone online, because they were party to the crime by using the same routers!
Nice, avoiding his point entirely. Part of copyright law is the intent with which you distribute it. This helps prevent entrapment scenarios. If you place a public site on the internet, your intent is to have it treated like a public site. This means it will be crawled by search engines, cached by proxies, linked to by interested users, downloaded for personal offline browsing, pre-cached by Earthlink and AOL services, and archived by the Wayback Machine.
This is the cost of doing business on the internet. This is how it works, and how it's going to continue to work. If you are not willing to express this intent with your website, take it down now.
Bear in mind that the article is about a large insurance corporation trying to deny benefits to a group of people, using copyright law as a club to beat evidence into inadmissibility. We're not talking about Chinese knockoffs ruining a poor independent artist. We're talking about Yet Another Way Out for corporate America's scumbags.
If copyright law allows this, then we need to tear it down and fix it, because I'm not willing to pay such a stiff price for a basic kind of protection that I probably can't afford to fight in court anyways.
Slashdot. It's Not For Common Sense
I respectfully disagree.
The wrongs that would ensue if illegally obtained evidence is allowed to be used are the same as the wrongs that are created by illegally obtained evidence.
If by "the wrongs that are created by illegally obtained evidence" you mean the results of a crime (say, the wrong created by a handgun fired at the cashier of a liquor store during the course of an armed robbery, resulting in the cashier's death); then I would argue that the wrongs are not the same at all.
The idea of not allowing illegally obtained evidence to be used in a criminal trial is to protect 'The People' from abuses by 'The State'. Using evidence that has been illegally obtained may result in the conviction of a guilty person, except that doing so is (usually) found to be a violation of a U.S. citizen's right "...to be secure in their persons, houses, papers, and effects, against unreasonable searches and seizures".
I believe the theory is that if the police were regularly permitted to use illegally obtained evidence in a trial, it would moot the Fourth Amendment protections of the Constitution, resulting in something awfully similar to a police state. Think of this: if a police officer knows that any "evidence" they find during the course of an investigation will be permitted in a trial, then there is no check upon their power to search and seize (illegal searches are often the reason why evidence is suppressed). If an officer knows they don't need a warrant or probable cause to conduct a search, what's to stop them from randomly searching ANY person, place, or thing, at ANY time, for ANY reason (under the guise of 'conducting an investigation', of course)?
An innocent person may feel they have nothing to hide, but do you really want the police tossing your home at 3AM because you fit the description of somebody who committed a crime recently? What if that description was merely "black male"? What if you happen to be a black male, and the cops go digging through your home looking for a firearm and find your marijuana stash instead? Oops, now you're gonna go to jail for drug charges, and it doesn't even matter that you were nowhere near that liquor store when it was robbed. The sort of damage that such a system would do to our freedoms is far worse than the damage done to the prosecutor's case by suppressing evidence of a murder/armed robbery that had been illegally obtained.
Furthermore, what if the police don't find anything during the search of your house, but decide that they need a conviction, so they simply plant evidence instead? Things like warrants and chain of evidence are designed to prevent such abuses, but if police do not have to follow those procedures, you can kiss that notion goodbye.
If that's true, we had all better be careful not to visit *too* many pages on a given website during a given day. Either that or make sure that our web browser is set to immediately flush all downloaded content once it has been rendered.
The argument being made is that copyright is being violated, but the way the archive works might well be considered fair use, since the *only* reason it exists is for archival purposes. If having a copy of website content is illegal, in and of itself, then everyone who uses a web browser (unless they're running knoppix or something that doesn't store anything to the HD) is just as guilty as the Internet Archive.
I hereby rescind your permission to copy any of my posts, which means that if you're reading this, you're in violation of copyright law.
Okay, I now release my copyrighted work officially into the public domain. You're safe now.
"Murphy was an optimist" - O'Toole's commentary on Murphy's Law
Putting material on the Internet does not give up your copyright on it, place it in the public domain, grant others the right to reproduce it any way they see fit, or otherwise work differently to copyright laws as they apply to all other media. There are necessarily certain implied rights, but arguing that actually ripping someone else's material and then making it publicly available after they've withdrawn it from their own site is a pretty big stretch to anyone without a vested interest.
Actually, while they do not give up any copyright, there are a number of explicitly stated, legal uses of copyrighted materials and there is a great deal of public benefit to enumerating a few more of them. Can you honestly argue it is not in the public's best interest that a historical archive of the internet exists, for educational reasons if no other? This case should be a poster child for just such legislation. A company published something, lied about it, and is now suing the people who made a copy and proved its guilt. Are you saying it is in the best interests of society that copyrights be used as a tool to promote lies and censorship?
Copyright is supposed to be about one thing and one thing only, promoting science and arts. That is the only constitutional provision for its existence. If someone is copying legally obtained works into an archive for educational, historical, or non-profit uses then they are almost invariably helping to promote science and arts, and anyone trying to stop them is up to no good.
As to the letter of the law (which is probably unconstitutional although it is impossible to prove that) you're right. The Internet Archive is screwed in the U.S. and many other countries. They tried to do what copyright law originally required of copyright holders and the Library of Congress. If a work is to be copyrighted then ethically it needs to be available. That is the whole point of copyright. According to the letter of the law it is probably illegal for me to print out the receipt some e-businesses display when I buy something online. The law needs to be fixed.
In fact, limiting the rights of others to distribute your works in order to encourage you to make them available is exactly what copyright is for, and this sort of case is a textbook example of why the principle matters.
What? How does this limiting of the rights of others encourage them to distribute the material? They, like the majority of copyright holders these days, don't want the work to be available at all. It does not encourage them to publish it, it just gives them a way to prevent works from being distributed.
The archive is in trouble not because they violated the intention of copyright. They, in fact, are trying to uphold the very principles upon which it is founded. Unfortunately, the laws have been changed by the corrupt and greedy to create a situation where copyright does exactly the opposite of its original purpose. This is a perfect example of copyright laws that have been rewritten being used to hold back progress and remove works from public availability. It is unethical and sickening and your implication that a business's financial considerations should trump both the rights of our descendants to have access to our works and the ability to find and present the truth in the courts... well it makes me want to vomit. Go to hell.
Spider: Hi, I'm an Internet Spider, may I access this page?
RT: No, no, one thousand and twenty-four times NO! I will not give this page to a spider.
Spider: Okay. How about this other page?
If this were the case, then the only way of bypassing this mechanism -- and one that would violate the spirit (IMHO), if not the letter (IANAL), of the DMCA -- would be for your Spider to not identify itself as a spider. Then it would be trying to trick an Access Control Mechanism.
BTW, it's my own opinion that once you publish a page on the Internet for public viewing, you cannot complain if they've Time-Shifted that viewing to a later point by recording -- er, saving -- it on recordable media. Seems to me that the plaintiffs are totally wrong, got caught at it by their own web-postings, and are now trying to kill the messenger.
"It's the height of ridiculousness to say for those 9 lines you get hundreds of millions."
There is an overwhelming interest indeed. For many reasons. Such as 90% (or close) of literature, music or movies published are complete, utter, useless, crap which should never have seen the light of day and only did because some vast marketing organizations sought to sell donkey manure wrapped in shiny packaging to the sheep known as "consumers". Or that writing is supposed to be art, and not an industrial process, and as such it is supposed to be sponsored by wealthy patrons, voluntary donations and art foundations. If you are a "technical" writer, you are supposed to be doing it under the auspices of academia or technical organisations whose members are financing you. Copyright, Software Patents (and soon storyline patents) and similar attempts at treating information as if it were capable of being "private property" are perversions of logic and law, and artifacts of pure greed. Greed stronger than common sense, science and morality. An all-encompassing greed which threatens to strangle all progress and destroy humanity itself.
Your reference to the "starving spouse" with fingers "worked to the bone" is a classic propagandist device designed to evoke sympathy for the "poor writer" who is toiling to manufacture yet another piece of mind-vomit in order to "score it big". Tough luck. If you are an artist and what you do is art, find a way to finance it. World could use far fewer inept amateur "artists" and more dedication and quality from those who remain. In case you did not notice, all the art before the age of copyright and even long after (as copyright did not apply to music and paintings for a long time) was produced this way. I'll take Shakespeare, Plato, Da Vinci and Mozart over the likes of Rolland and Britney any day. If you are in it for the money, screw you, find real work and stop lobbying for laws which attempt to rape us all for your benefit. I already consider anyone who thinks Intellectual Property laws are "beneficial" to be either confused beyond hope or a vicious enemy of humanity whose only agendas are his ego and wealth. In either case a mortal enemy of mine.
I know you're all going to find this shocking, but it looks like the
1. Healthcare Advocates of Philly sues Health Advocate. The law firm representing the plaintiff is McCarter & English. The defendant's is Harding et al.
2. Healthcare Advocates modifies robots.txt to tell IA not to allow access to older versions of their site.
3. Harding et al. manages to get the IA to give them a peek anyway*.
4. Healthcare Advocates sees "rapid fire" requests from Harding in their logs, and a few times the IA slipped up and granted access anyway.
5. Healthcare Advocates sues Harding et al. and the IA.
*My guess is that the IA checks the current robots.txt every time an archived page is accessed. If the server doesn't respond quickly enough, they assume it's OK to give access to the archived files. Harding et al. might have realized this and requested the pages over and over in rapid succession to slow down the Healthcare Advocates servers enough to trick IA into thinking they're not responding. This is all just my speculation, so take it with a grain of salt.
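The guess above boils down to a "fail open" check: if the live robots.txt can't be fetched in time, serve the snapshot anyway. Here is a minimal Python sketch of that idea and of the stricter "fail closed" alternative. Everything in it is hypothetical (the function names, the fake fetchers, the example.com URL) and it illustrates the speculation only, not how the Archive actually works.

```python
from urllib.robotparser import RobotFileParser

def may_serve_snapshot(fetch_robots, agent, url, fail_open):
    """Decide whether to serve an archived page after consulting the live robots.txt.

    fetch_robots is a hypothetical callable that returns the robots.txt lines,
    or raises TimeoutError when the origin server is too slow to answer.
    """
    try:
        lines = fetch_robots()
    except TimeoutError:
        # The speculated behaviour corresponds to fail_open=True.
        return fail_open
    parser = RobotFileParser()
    parser.parse(lines)
    return parser.can_fetch(agent, url)

def slow_server():
    # Stand-in for an origin server bogged down by rapid-fire requests.
    raise TimeoutError("origin did not answer in time")

def healthy_server():
    # Stand-in for an origin server that answers with a block on the archive.
    return ["User-agent: ia_archiver", "Disallow: /"]

URL = "http://example.com/old-page.html"
served_when_slow = may_serve_snapshot(slow_server, "ia_archiver", URL, fail_open=True)
served_when_healthy = may_serve_snapshot(healthy_server, "ia_archiver", URL, fail_open=True)
served_fail_closed = may_serve_snapshot(slow_server, "ia_archiver", URL, fail_open=False)
print(served_when_slow, served_when_healthy, served_fail_closed)
```

With fail_open=True, the same block that works against a responsive server stops working as soon as the server times out, which is the loophole being guessed at.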
(Sorry for the crappiness of the diagram. Apparently Slashdot is more concerned with preventing 13-year-olds from posting ASCII art versions of gotse man that will be modded down to -1 in 2 seconds than it is with allowing people to make diagrams to illustrate something. And why the eff doesn't the "ecode" tag work properly?)
You do realize that this is patently incorrect?
Pardon me if this sounds pedantic, but tort law is so misunderstood that I'd like the opportunity to correct this post.
In Common Law countries, all people have a duty to act as would a reasonably prudent person in the same or similar circumstances. A person is negligent if they breach that duty and cause injury to another.
In other words, the city is negligent if it fails to repair a sidewalk that a reasonably prudent person would have repaired. In situations where a party lacks notice of a defect, the same analysis applies: should a reasonably prudent person responsible for the maintenance of a sidewalk have been aware of the defect?
Here's an example: 10 minutes ago, NYC experienced a minor, highly localized earthquake that fractured the sidewalk outside my apartment. 7 minutes ago, I realized I was out of milk. 4 minutes ago, I stepped out of the building, tripped on the damaged sidewalk, and broke my leg in 12 places.
A reasonably prudent city probably wouldn't be able to repair the damage in the six minutes between earthquake and injury. Probably, the city government wouldn't even be aware of it by then. Thus, the city couldn't be negligent.
A more likely explanation is that the law firms are videotaping the sidewalks and sending them to the city government to put them on notice of serious flaws in the sidewalk. Then they can argue that the city was on notice of the defects and failed to act reasonably by not repairing the damage.
But negligence does not require evidence of willful conduct. Negligence is merely a failure to act as a reasonably prudent person under the circumstances.
--AC
Now that you've modded the parent down, you should mod me down too.
Those are just the first few examples that come to mind, but the significance is clear: just because some information was available somewhere at some time, that doesn't automatically mean there's a benefit to society in preserving that information in an obvious place for all time.
The answer to problems with information like drafts and trade secrets being public knowledge after being published is simple: don't publish them. If you don't want people to read drafts of unfinished works, don't publish them online. You do realize copyright law, even today in theory, ensures that all copyrighted works are to be preserved for the public and given over to the public for all time once it expires, right? And how many better authors would we have today if we did have Shakespeare's drafts to look at to help understand his writing process?
I'm going to skip your constitutional arguments, because copyright is an international convention, and most of the world isn't subject to your constitution. Can we agree the more neutral definition that copyright exists to promote the creation and distribution of works for the benefit of society?
Most copyright law in the world is pretty similar to that in the U.S., but fine lets ignore the U.S. constitution. Lets talk about natural versus artificial rights. Freedom of speech is in my opinion a natural right. Copyright is, in my opinion an artificial right, granted as part of an agreement between authors and those who would benefit from said authorship. Authors are rewarded for giving works to the public with the rights to make money. What advantage does a copyrighted work that is not available to the public give to the people who are giving up their natural right to copy it freely?
Your position is illogical. We're talking about material that has already been made available. If it's a work of value, then probably it was removed because the copyright holder was going to distribute it via some other means, or was working on a newer, better version and didn't want the out-of-date material getting in the way. If it's not a work of value, then there is little public interest to be served in preserving it, particularly if doing so causes any harmful effects to the parties involved.
And here is where your argument falls apart completely. You're making a whole slew of assumptions here, most of which are not true. First, you're putting responsibility for deciding what is and is not of value to the public into the hands of the copyright owner (note that in most cases this is NOT the author anymore). Next, you're assuming that not only will the copyright owners know what works are valuable to the public, but they will act in the best interests of the public rather than in their own best interests.
You do realize that the vast majority of copyrighted works, including art, literature, film, and music, are completely unavailable to the average person, right? About .05% of all copyrighted books are still in print and maybe 3% are still available either new or used. The same holds true for music. This is mostly because so many works are copyrighted but no one knows who holds the copyright, or because the large companies that own millions of copyrights don't want older works to compete with current offerings. Is it in the best interests of the public as a whole to have no access to the majority of our artistic, musical, theatrical, and literary heritage? How many great works are in those collections that will never be seen again because the last copy is lost and it was illegal for anyone to make more except some company that did not see the profit in it?
If you remove copyright...
I never said anything about removing copyright, only reforming it. For example, it used to be that every copyrighted work in the U.S. had to have two good copies sent to the Library of Congress to be archived for reference and to preserve the work for future generations. Sound familiar? If that law was still in effect
Seeing as how every website is copied to your cache when you view it, is the problem not that the website/page was copied but rather that it was available for viewing?
Does this mean that everything publicly viewable on the internet may be copied so long as it is not re-shown (or for lack of a better term, re-"published")?
*In this hypothetical the Wayback Machine does not exist.
For instance, would I have the same legal trouble as the archive if...
Entity A puts up a website on which a crime was committed. Namely, a copyrighted image was shown without the copyright holder's (Entity B's) permission.
I had viewed the site and the cached copy is still on my hard drive. I have not re-"published" (re-shown) it on the internet (it is not publicly available). Entity A takes down the site. Later, Entity B finds out about the copyright violation, but the site is no longer available in its former state.
Entity B finds out that I have a cached copy left from when I viewed the site. While in litigation, Entity B asks that my copy be subpoenaed as evidence.
What's wrong with that?
In this case, I am the archive. Why would I be sued? Entity A can sue me because I hold incriminating evidence against them? This whole thing is ridiculous...
Which is the problem:
1) the fact that website was "copied" in the first place or
2) the fact that the "copy" was available to the public?
If it is "2)" then what do we say about the Library of Congress?
Again...the whole thing is ludicrous.