Google's Cache Ruled Fair Use
jbarr writes "An EFF Article states that: 'A district court in Nevada has ruled that the Google Cache is a fair use ... the Google Cache feature does not violate copyright law.' Notable is the basis that 'The Google Cache qualifies for the DMCA's 512(b) caching 'safe harbor' for online service providers.'" From the article: "The district court found that Mr. Field 'attempted to manufacture a claim for copyright infringement against Google in hopes of making money from Google's standard [caching] practice.' Google responded that its Google Cache feature, which allows Google users to link to an archival copy of websites indexed by Google, does not violate copyright law."
So if someone created a search engine which automatically, randomly and non-volitionally searches and caches MP3 files from websites which do not have a "noarchive" meta tag, it's not breaking the law?
When those searched websites disappear, may this search engine still serve those cached MP3 files for archival purposes?
Uncensored Google results requested and delivered by email
Google wasn't really copying as much as they were archiving the past... Look at Archive.org's Wayback Machine. Same principle.
Most browsers have a built-in cache. They don't violate copyright law,
do they?
This is good news for the Wayback Machine at archive.org. I think the case against Google Images, and especially Google Video, is a little stronger, however.
Religion for nerds. Stuff that really matters
This should apply to archive.org as well, right?
As far as I'm aware archive.org hasn't had a specific test case yet.
Google's cache is oftentimes the only way to read an article posted on here. Also, it's a good resource for people behind firewalls who can only pull up a cached version. It's a good win for Google.
http://religiousfreaks.com/
Finally, a frivolous lawsuit that got its just desserts. We can only hope that will herald a new age, where the insanely stupid lawsuits are going to finally die the death they so rightly deserve.
The judge then left the bench, walked over, and whacked the plaintiff and his counsel on the head with a salami.
This sig, aah-ah, is comin' like a ghost-sig...
avoid a lawsuit
He who knows best knows how little he knows. - Thomas Jefferson
Mr. Field will just have to find another way of making a fortune. "Jackpot Justice" didn't pay off this time.
try { do() || do_not(); } catch (JediException err) { yoda(err); }
So who has a link to the Google cache of the article?
That Mr. Field should start working for the RIAA.
Those of you who do the "yesbutNOCACHEtag" dance have got it backwards too: it's not the responsibility of the copyright holder to sing to the tune of whatever the latest fad is. Rather, it's the other way around - Google should convince people that it's in their interest to put a "CACHEME!" tag.
Anyone can see this was one of those idiotic lawsuits, a cheap attempt to make some cash.
That overwhelmingly loud noise everyone just heard was the sound of every Slashdot reader gasping at the fact that the DMCA just got used for something positive.
I thought we sold "fair use" a while ago for three magic beans and some DVDs.
the preceding comment is my own and in no way reflects the opinion of the Joint Chiefs of Staff
wonder if the guy bothered with a robots.txt or used the meta NOARCHIVE - not that actually preventing that was his intent.
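For anyone wondering what the meta NOARCHIVE check looks like from the crawler's side, here is a minimal sketch in Python. It assumes the convention of a `<meta name="robots" content="...">` tag carrying a `noarchive` directive; the class and function names are made up for illustration and say nothing about how Google actually implements it.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values may be None (e.g. <meta name>), so default to "".
        attr_map = {name.lower(): (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() == "robots":
            for directive in attr_map.get("content", "").split(","):
                self.directives.add(directive.strip().lower())


def allows_caching(html_text):
    """Hypothetical helper: True unless the page carries a noarchive directive."""
    parser = RobotsMetaParser()
    parser.feed(html_text)
    return "noarchive" not in parser.directives
```

A page without the tag is treated as cacheable, which is exactly the opt-out default the thread is arguing about.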
i don't mind the google cache at all, what drives me up a wall is what jeeves and other engines do with external pages by sticking them in a frame. so, if you put code in the page to force it out of frames, then engines like yahoo penalize (or drop from the index entirely) for messing with the user navigation.....
You misspelled "large trout."
Forget the fair use analysis, the most important thing here is the success of the "Implied License" claim. Basically, it goes like this: You operate a website. The web was created specifically with the idea that "robots" would crawl across it, and there is a standard well-known way to prevent them from crawling your site. Even more specifically, there's another standard well-known way to keep search engines from caching your content. Being on the web but not using these techniques means that you give search engines permission to cache your content.
It's sort of like what happens when you leave a potful of candy at your front-door on Oct. 31st. In theory, you could claim that all those kids who come to your door and help themselves are stealing. But, because everybody knows how Halloween works, you've implicitly given permission for them to do it.
In this opinion, the Fair Use analysis was basically just used as a stopgap of "what little infringement that's left after you account for the implied license is a fair use." If the website had included a robots.txt file, the fair use case would have been much harder to make.
The Implied License is a stake in the ground for "This is the Internet. The rules are different here." IMO, that's a good thing -- there are a bunch of things that just couldn't happen if you had to get explicit permission from every content owner.
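To make the "standard well-known way" concrete, here is a rough sketch of how a crawler might honor robots.txt, using Python's standard `urllib.robotparser`. The bot name and URLs are placeholders, and `may_crawl` is a hypothetical helper, not anything taken from the opinion or from Google.

```python
from urllib.robotparser import RobotFileParser


def may_crawl(robots_txt, user_agent, url):
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines; no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Example robots.txt: everything is open except /private/.
EXAMPLE_ROBOTS = "User-agent: *\nDisallow: /private/\n"
```

Note the default: an empty or missing robots.txt means everything may be fetched, so silence is read as permission. That is the technical counterpart of the implied license argument.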
The Google cache is absolutely ridiculous. As an individual who has had quite a bit of experience on both sides of the white hat / black hat search engine industry, the cache is NOT a webmaster's friend.
1. The cache takes content control away from the author. For example, a site like EzineArticles.com prevents scraping by using an IP blocking method based on the speed at which pages are spidered by that IP. It is absurdly easy to circumvent this by simply spidering the Google cache of that article instead of spidering the site. Google's IP blocking is far less restrictive, and combined with the powerful search tool, it allows for easy, anonymous contextual scraping of sites whose Terms of Service explicitly refuse it.
2. The cache extends access to removed content, often for months if not years at a time. Google rarely replaces 404 pages (perhaps it is because of their wish to have the largest number of indexed pages). I have clients who have nearly 48,000 nonexistent pages still cached in Google that have not been present in over 14 months. Despite using 404s, 301s, etc. these pages have not yet been removed. Furthermore, Google's frequent mishandling of robots.txt, nocache, and nofollow leaves webmasters dependent upon search traffic hesitant to force removal of these pages using the supposedly standardized methods of removal.
3. The cache allows Google to serve site content anonymously. Don't want the owner of a site to know you are looking at their goods (think of companies grepping for competitor IPs)? Just view the cache instead.
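For readers unfamiliar with the speed-based IP blocking mentioned in point 1, here is a rough Python sketch of the general idea: a sliding-window counter per IP. How EzineArticles.com actually does it is not stated here, so the class name, thresholds, and logic are all assumptions made for illustration.

```python
from collections import defaultdict, deque


class CrawlRateBlocker:
    """Blocks an IP once it exceeds max_requests within window_seconds."""

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.recent = defaultdict(deque)  # ip -> timestamps of recent hits
        self.blocked = set()

    def allow(self, ip, now):
        """Record a request at time `now` (seconds); return False if blocked."""
        if ip in self.blocked:
            return False
        hits = self.recent[ip]
        # Drop hits that have fallen out of the sliding window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        hits.append(now)
        if len(hits) > self.max_requests:
            self.blocked.add(ip)
            return False
        return True
```

A scraper that reads the cached copies instead never sends those requests to the site at all, so a counter like this never sees it. That is the circumvention point 1 complains about.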
The list goes on and on. But I think the point is this...
Why should a web author have to be technologically savvy to keep his or her content from being reproduced by a multi-billion dollar US company? Content control used to be as simple as "you write it, it's yours". It got a little more complicated with time to the point at which it might be useful to use, perhaps, a Terms of Service. Even a novice could write "No duplication allowed without express consent". Now, a web author must know how to manipulate HTML meta tags and/or a robots.txt file.
Fair use is for users, for people, not multi-billion dollar companies.
Fight Link Spam with LinkSleeve.org
"Finally, a frivolous lawsuit that got its just desserts. We can only hope that will herald a new age, where the insanely stupid lawsuits are going to finally die the death they so rightly deserve."
They could have said he was right, but then not awarded any monetary damages. This sets a bad precedent. Copyright used to be automatic. Now, if I don't put the right tag in my html, I forfeit copyright to search engines.
Vote for Pedro
After reading the actual opinion that granted summary judgement, if this same logic is applied to the scanning and offering of search on "real world" materials, Google may be able to withstand lawsuits on the book scanning effort quite well. There are some differences that could create a different outcome, but this outcome was 100% favorable to Google and the idea of indexing and caching of materials to allow such search and reference was solidly defended by the judge.
Sig under construction since 1998.
Ok, no more excuses Slashdot... It's time to start caching pages and preventing the Slashdot effect.
What about web caches? They violate copyright law too! My firefox does it too. Should I use 0 Mbytes of disk space for browser caching?
By now someone must have created a search engine that only indexes sites whose robots.txt tells them not to index. I'm surprised I haven't heard of a particular one. Bet it would raise a few hackles though...
This is no different than being required to put up a no trespassing sign on your property to keep people off of it.
:)
Not everything is "secure by default"
Google gained precedent, but shouldn't have to foot the legal bill for fighting this. The system wrongfully (IMHO) allows these people (the gold digger and his ambulance chaser) to put no skin in the game.
What a pity that Chinese citizens will never know if the archived material discusses Tibet or Tiananmen Square. But that's alright, greed trumps decency and ethics anyways. Oh, and I have every expectation that the Google apologists will mod me flamebait, so go on, I've got karma to burn and a deep abiding hatred of evil corporations that aid tyrants that are scared of simple words.
The world's burning. Moped Jesus spotted on I50. Details at 11.
So if I use Google's cache to locate torrents for say I don't know ... King Kong (2005) ... I'm all good?
---- "Logoff! That cookie shit makes me nervous!" - A. Soprano
Sorry, mumbles, but I don't buy it. My money's on the judge being right, and you being a loudmouth with too much time to post over and over, as you have in reply to everybody who argued with you.
And to try to actually contribute something to the debate: So you don't like the "fair use" part of the decision. Fair enough; though as I said, between you and the judge, my money's on the judge as the one who correctly groks what fair use is all about. But that aside, what about the other three points? Most interestingly, what about the "implied license" point?
And what about the judge's assessment that Field was "attempting to manufacture a suit against Google"? That is, this wasn't about actual injury to Field, it was about Field actively looking for a chance to sue someone? Does any of that matter to you?
You act as if copyright exists for the benefit of the copyright holders, and fair use is a bone thrown to the rest of the public by the copyright holders so long as it does not impose an undue burden on the copyright holders. This is backward.
In fact copyright exists for the benefit of the public itself, and fair use is a mechanism by which the public ensures that the bones they've thrown the copyright holders do not impose an undue burden on everyone else.
The police and government may have been acting as if they work for the copyright holders rather than the public lately, but "intellectually" that is not where the responsibilities are or are supposed to lie. If the public cannot make backup caches of publicly available material in case the copyright holder becomes unavailable later, the utility of that material is severely diminished to everyone else in a way the copyright holder cannot fairly claim they have earned. Google happens to run one such cache; that is a coincidence. It is not the only one.
All through your post you're using the language of capitalism, libertarianism, and property. This language means nothing when you're talking about a government-created market interference like copyright. Copyright is not a form of property.
I'm not sure about that. Although the result of this case seems fair and clearly indicated on several counts, there's a lot that might not apply to archives more generally, so I'm not sure how much of a precedent has been set.
In particular, the case was brought by someone who practically admitted trying to set Google up: he knew about mechanisms like META tags and robots.txt, knew that Google was caching his site, made no attempt to stop them, and indeed actually set up robots.txt explicitly to allow bots to crawl his site. This supports Google's first two defences here, having an implied licence and estoppel.
The most interesting discussion, IMHO, is on the fair use defence. The court considered in a lot of detail whether the use made by Google qualifies as fair use. On the first criterion (how the material is being used), it was found significant that the material was being used for different purposes in the cache than on the original site: the latter was presumed artistic, while the former allowed access to the material when the original site was down, historical comparisons of the site content, highlighting of search terms that made a page relevant to the user's search, etc. Hence the court concludes as follows:
The court also noted that Google made no attempt to profit from the display of the material, did not attach advertisements, made clear that the copy could be out of date, and linked clearly to the original source. (I wonder whether that non-profit, no-ads observation will come back to kick Google later...)
The other fair use discussion is less interesting, although the fact that the plaintiff had made his works available for free and not made any other attempt to profit from them was important, because this meant the market value of the original hadn't been damaged. One interesting tidbit is that apparently the SCOTUS has ruled that the fourth fair use factor (any damage to the market/value of the original work) can't be used to argue that the copyright holder could have licensed an otherwise fair use (such as the caching here) and thus the use can't be fair.
Some of the DMCA defence stuff could have quite significant implications. In particular, the fact that Google caches material only for a fairly short time (14-20 days is mentioned) is relevant, since a prior ruling about Usenet servers could be used.
In summary, Google would basically have won out on four different defences here, even without the fact that the original use might not qualify as direct copyright infringement (since the plaintiff went after the downloading done automatically in response to users; he didn't go after GoogleBot's initial copying process that caches the site on Google's system). It doesn't seem at all clear that a lot of the arguments would apply to other caching services, though: amongst other things, Google's cache in this case is temporary; known to the plaintiff, who had not tried to stop it and actually encouraged it; not for direct profit nor carrying any advertising; and clearly not damaging the market value of the original works.
If you disagree, post your argument. (-1, Overrated) isn't your personal censorship tool for views you don't like.
http://www.snopes.com/language/notthink/deserts.htm
:| *eyes watch*
Slashdot requires you to wait at least 15 seconds before blablabla
If the kind of damage you're talking about were actually happening here, I'd agree with you, but note that the judgement relies on (among other things) Google not displaying ads with the cached page or otherwise profiting from it, the originals not generating any income for the copyright holder, and the fact that the plaintiff was well aware of the conventions that could be used to prevent his site being copied and in fact used robots.txt to request quite the opposite.
What did you expect from a guy who is also a lawyer? Shakespeare was right 500 years ago, and it hasn't changed yet.
"It's the height of ridiculousness to say for those 9 lines you get hundreds of millions."
Your summary is rather misleading. The court also knew that the plaintiff was well aware of those preventative mechanisms and opted not to use them. In fact, he deliberately set up robots.txt so that his content would be considered. The same might not be true at all for Ma and Pa AOL's family homepage.
I'm not sure that's a good thing at all. Firstly, I don't think the Internet per se should receive any special treatment in law. The activities it facilitates -- such as making and distributing digital copies of works very quickly and cheaply -- may have dramatic implications, but then any applications of the law should be based on those activities, not on one particular medium.
Secondly, there are also a bunch of things that will go wrong if content providers have to play ball with anyone and everyone who wants to run any service that uses their site's content in any way based on this implicit permission. Unless there are a very small number of universal and legally enforceable mechanisms (which META or robots.txt could become, but aren't at present in any jurisdiction AFAIK) the implied licence is a very dangerous precedent.
So they are dependent on Google giving them traffic for their livelihood, and complain when they are not capable enough (or are too cheap to hire someone who is capable enough) to bother properly forming their content to the medium through which it is transmitted.
robots.txt may not be a law of the web, but neither is the specification for hypertext transfer.
One would think that a 'web' author should at least know how the 'web' works before authoring their precious little IP there.
They're there affecting their effect.
And so did everybody else's browser that ever visited that site. I'm sure he'll want to sue us all next.
Mirrordot and company do a decent job, but too often they don't cache enough (like Pages 2-5 of a story), and having it official would be great for users and would-be slashdotting victims. ... though this does bring potential advertising revenue into perspective; good for OSTG, bad for article hosters.
Use my userscript to add story images to Slashdot. There's no going back.
You, with your liberal anything-goes views, contribute zero value to the society by using the web. The GPP, with his desire to prevent certain detrimental activities, is contributing content that is presumably of value to some people even allowing for his wishes to control certain behaviour. Which of you do you think the law should support here?
By putting something out there for all and sundry, you would appear, to this non-lawyer, to be giving an implied license to read and use it.
Because you can, in fact, say that you don't want it cached, the fact that you are negligent in doing so is your own fault and not that of a third party. The burden on Google's shoulders is far greater than the one for responsible webmasters, therefore it is reasonable that the webmaster, the only one in a position to know whether or not they wish to give permission, should be given that burden. This goes double when one may request that Google remove it after the fact without need of a lawsuit.
Furthermore, it is a transformative use--it has the search terms highlighted, links to the original page, and tells you that it is not Google's. It even gives "convert to HTML" links as well as translations in some cases. In these cases *even if* the whole work is copied, it CAN still be fair use. And this is a good thing. Because otherwise I'd just tell you that you cannot cache this work right here--you know, the one you're reading right now. Oops, your browser already automatically did that for you? Too damn bad. As the copyright holder, I can license it any way I damn well please. Thankfully, the Court was more sensible than that in this case.
Moreover, it's a further financial drain on Google rather than a profit center. True, it's one of the great features that makes Google popular, but it's not something they make money on in and of itself--they don't put any ads on the pages they cache or anything like that.
Lastly, I am not a lawyer and this is not legal advice. Even so, Google is promoting the progress of arts & sciences here with this service. That is the ONLY legitimate reason for copyrights, and insofar as that aim is not being served copyright law should be curtailed. Money is one of the ways it is intended to promote them, but insofar as money gets in the way, too bad. No one has a right to money due to some artificial legal construct. And that includes me, even though I am a programmer, an author and have been a musician (albeit not a very good musician).
Google's search engine works exactly like any human browsing the web -- it scans webpages and follows links. In order for it to see MP3 files, those files have to be made available for public download already. If that is legal then caching those files also is, arguably. If the files were posted illegally, the fault is not Google's.
Of course Google Cache does not actually copy graphics, as far as I am aware, and it certainly wouldn't cache MP3 files. Archive.org is a more complete cacher.
~CGameProgrammer( );
To quote, it requires that the cache exist "for the purpose of making the material available to users... who... request access to the material [from the originating site]." We understand what this is referring to - an ISP or a web proxy intercepts a web request and serves a cached copy of a page rather than getting a fresh copy. However, Google offers up its cached pages to people as an alternative to requesting the material from the originating site.
The judge's claim is that some people use the Google cache after unsuccessfully trying to load the original site (e.g. b/c of the slashdot effect). However, the user does not need to request the original page to load the page from the Google cache.
It seems clear that the law allows caching that is transparent to the user, while Google has gone to great pains to make their caching as untransparent as possible. On the other hand, this seems to have strengthened their fair use and estoppel claims.
I was always under the assumption that there was a reason for this net thing being referred to as the "information highway"---never have I heard it hyped as the information tollroad.
Down With Slashdot BETA!!! I've been around the corner and seen the oliphant; you can only abuse me from your perspecti
A similar argument was thrown around a lot back when we had those stories about people being arrested for using unsecured WiFi networks without permission.
The argument works just as well there: You operate a wireless network. Wireless networks were created specifically with the idea that clients would try to connect to nearby networks (automatically, in some cases), and there are multiple standard well-known ways to prevent strangers from connecting to your network or accessing the internet through it. Running a wireless network without using these techniques means that you give strangers permission to connect to it.
It didn't seem to fly back then, though. Does this mean courts will now take the "implied license" argument more seriously when applied to other technologies besides HTTP?
Visual IRC: Fast. Powerful. Free.
This ruling comes as I have been asked to remove a web page from my personal web site. The page was a copy of an obituary that appeared in January of 2002. It was a touching tribute to Michell Robotham, who was killed on 9/11 in the attacks on the World Trade Center.
The newspaper deleted all the memorials, so had I chosen to just link to the newspaper web site, the link would not have worked. I have agreed to remove the copyrighted material, but let them know that I cannot control what Google has cached, and that they would have to contact Google to remove their cache of the page.
I wrote about my run-in with copyright laws on my blog.
The link will be removed tomorrow, but will live on in Google.
"Let us raise a standard to which the wise and honest can repair" - George Washington
..since it means that I can now create a search engine that caches and displays Google's results.
I wonder how long it would take for Google to cry foul for doing exactly what they do?
The poor boy was threatened with lawsuits. I don't know whether he tried to fight them or not, but he did get shafted. If memory serves, he settled and the RIAA or whoever squeezed him for as much as they could (which wasn't very much). Hopefully this case with Google sets a precedent that may be used in other cases.
I don't care *what* someone's ToS says. If you need a ToS, then you can't allow anonymous access.
But you can, and you are wrong.
A web site is a work that can be protected by copyright. As such, a person holding the copyright in a website can set terms and conditions as to who, if anyone, may distribute and reproduce works embodied in the site, orthogonally to how he or she makes the site available from a purely technical standpoint.
This is well established in law. In fact there are cases which are exactly on point, such as Intellectual Reserve v. Utah Lighthouse Ministries. That case dealt with someone who had, effectively, cached a copy of someone else's website, and was found to be infringing for doing so.
More than that, that case established (although it is not the only one) that people who merely view a web site infringe the copyright of the author, absent some permission to do so. (The reason is based on reasoning in MAI Systems Corp v. Peak Computer, which sets out that storing a work in RAM is making a copy under the Copyright Act definition).
Now, the idea that implicit licenses exist is also established in law. However, the scope of any "implicit license" is going to be quite limited, and it is not clear that a permanent and unauthorized -- and "opt out" --- cache is at all justifiable.
The instant case was decided as it was because the lawyer in question basically orchestrated the circumstances to bilk Google for some money. The court found this to be an abuse and sided with Google. If the lawyer's case wasn't such an obvious scam, I doubt it would have gone the way it did. On appeal, I have no doubt parts of this ruling would be overturned.
'nuff said