On the Google Book Scanning Project and the Library We Will Never See (theatlantic.com)
For a decade, Google's enormous project to create a massive digital library of books was embroiled in litigation with a group of writers who say it was costing them a lot of money in lost revenue. Even as Google notched a victory when a federal appeals court ruled that the company's project was fair use, the company quietly shut down the project. From an article published in April this year: Despite eventually winning Authors Guild v. Google, and having the courts declare that displaying snippets of copyrighted books was fair use, the company all but shut down its scanning operation. It was strange to me, the idea that somewhere at Google there is a database containing 25-million books and nobody is allowed to read them. It's like that scene at the end of the first Indiana Jones movie where they put the Ark of the Covenant back on a shelf somewhere, lost in the chaos of a vast warehouse. It's there. The books are there. People have been trying to build a library like this for ages -- to do so, they've said, would be to erect one of the great humanitarian artifacts of all time -- and here we've done the work to make it real and we were about to give it to the world and now, instead, it's 50 or 60 petabytes on disk, and the only people who can see it are half a dozen engineers on the project who happen to have access because they're the ones responsible for locking it up. But Google appears to be thinking of ways to make use of it. Last month, it added a new feature to its search function that instantly connects you with eBook data from libraries near you. From a report: Now, every time you search for a book through Google, information about your local library rental options will be easily available. Yeah, that's right. Your local library not only still exists, but it has eBooks, which are things you can totally borrow (for free) online!
Before, this perk was hidden somewhere deep within your local library's website -- assuming it had one -- but now these free literary wonders are all yours for the taking.
Well, actually, isn't the problem that they want to sell it / use it for commercial purposes? If Google simply wanted to put this on the web for absolutely free, with no links to anything else, couldn't they?
I thought it's only when you're trying to sell something that these issues arise.
An endeavor of this magnitude must have some type of value. You don't want to just give this away to the world. Corporations are not idealistic or altruistic. Once someone figures out how to extract some of the value from this collection, it'll be back.
I saw this go by back in April and was made sad by it. Now I am being made sad by it again. I wonder how hard it would be to crowdsource the same work. Like, just have everybody who thinks this is a tragedy do 10 books, and see how many that adds up to. The Google OCR API is available for use, and I think they may even have open sourced it so you don't have to run it in the cloud.
They have a great corpus to train their AI with now. Maybe the best in the world.
They were able to scan the books and data mine all the text. Why would they want someone else to be able to do the same?
I'm sure others will note... Google almost certainly just wanted the data. Why would they need/want anything else out of the arrangement?
There is no XUL, only WebExtensions...
This and many other wrongs have happened because publishers, the RIAA, the MPAA, and especially Disney have been able to bribe lawmakers and buy extremely insanely long extensions of copyright. Works that should have long ago been in the public domain are being kept under copyright to the great detriment of our society. These same entities listed above are also doing everything that they can to eliminate Fair Use, and Right of First Sale. All in the name of price gouging and insane levels of uncontrolled corporate greed! Copyright (and patents) need to be limited to 5-7 years with no extensions at all for any reasons. Works then need to go permanently into the public domain, never to be put back under copyright under any circumstances whatsoever!
The purpose of Copyright has been totally distorted from its original purpose, which was to give the creators of the copyrighted work a limited time to profit from that work. Now Copyright has been extended to such an insane extent that it doesn't expire until the creator, their children, and even their grandchildren (in many cases) have passed on! All so the entities listed above can profit more and longer. And if these entities had their way, Copyright would be forever, never expiring at all, and there would be no Fair Use or Right of First Sale!
I think what happened is they got 1 terabyte in and realized that the data started to repeat over and over...and over.
Lodragan Draoidh
The more you explain it, the more I don't understand it. - Mark Twain
Hey Google, use some of that vast money stockpile to undo the damage that companies have been doing to Copyright laws. Get some reductions in copyright duration to something more reasonable (15 years!) and then you'll be able to release the vast majority of your scanned books.
----------------------------------- My Other Sig Is Hilarious -----------------------------------
So is it possible Google is shooting to secure a place in history?
sed -e 's/Chuck Norris/Rajnikant/g' joke > fact
I wonder how hard it would be to crowdsource the same work.
Project Gutenberg has been at it since the 1970s. But they currently have only 54,000 books, not a whole lot compared to Google's 25 million.
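For a rough sense of scale, here is a back-of-the-envelope sketch in Python. It uses only the figures quoted in this thread (25 million books for Google, about 54,000 for Project Gutenberg, and the "10 books each" idea); the function names are made up for illustration, and the result ignores duplicate titles and copyright status entirely.

```python
import math

GOOGLE_BOOKS = 25_000_000     # figure quoted in the story summary
GUTENBERG_BOOKS = 54_000      # figure quoted in this thread
BOOKS_PER_VOLUNTEER = 10      # the "everybody does 10 books" idea

def volunteers_needed(target_books: int, books_each: int) -> int:
    """Number of volunteers required if each one scans `books_each` books."""
    return math.ceil(target_books / books_each)

def size_ratio(big: int, small: int) -> float:
    """How many times larger the first collection is than the second."""
    return big / small

if __name__ == "__main__":
    print("Volunteers needed:", volunteers_needed(GOOGLE_BOOKS, BOOKS_PER_VOLUNTEER))
    print("Google vs. Gutenberg:", round(size_ratio(GOOGLE_BOOKS, GUTENBERG_BOOKS)), "x")
```

Under those assumptions it would take about 2.5 million volunteers, and Google's collection is roughly 460 times the size of Gutenberg's.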
Down with the creators seeking to control their creations! How dare they?..
In Soviet Washington the swamp drains you.
Getting to see the books is not what Google Books is for. It was never what Google Books was for. You've bought into the fallacy promoted by the Authors Guild, who came in after the fact and tried to wangle their lawsuit against Google Books into an orphaned-works library without actually having any authority to do so. Google shrugged and went along with it, because why not, but it was never what they had intended.
From the very beginning, Google Books (nee Google Print) was intended to populate a search database so people could search within paper books as easily as they could search within the web. If the book was still in copyright, then finding that book to read was the searcher's problem. (Interlibrary loan works a treat.) Google was very straightforward about that in early blog posts and publicity about the project. Don't blame them for falling short of the Authors Guild's goals. Those goals were never theirs to begin with. See the link in the first paragraph for more information.
Editor Emeritus and Senior Writer, TeleRead.org
Erm, it's a LOT of effort to scan a book on a regular scanner. 99% of people have flatbed scanners, and if you are among the 1% who have self-feeding scanners, you would have to separate all the pages first (destroying the book in the process). That being said, people are doing it: there is a place on IRC (Internet Relay Chat) where you can find pretty much any work of fiction ever produced (google it; I would rather not have the details indexed by Google and associated with me). What I have trouble getting my grubby paws on are nonfiction books. I still haven't found a central place for those, and I end up having to fire up a VM and dig through 20 million dodgy websites before I can find what I am looking for. Oh yeah, be warned: a LOT of the books have OCR errors. Some have been proofread and corrected, but not a lot. Some people have loaded the text into Word or some other spell checker and clicked "Autofix spelling and grammer", and we all know how well that works.
There are three kinds of falsehood: the first is a 'fib,' the second is a downright lie, and the third is statistics.
Gutenberg is curating and only scanning things that are out of copyright. Very useful work, but not the same thing. I'm talking about having a database and essentially gamifying the process, with the goal of seeing how many titles we can get, rather than the goal of getting the stuff people think of to add.
What is stopping Google from operating as a library? For each city, have a pool of ebooks that users can borrow for a week. They could have books that you can borrow for 1 min for search purposes. It should be cheaper than publicly funded libraries.
Google Books helped me find books from 1838 that mentioned ancestors of mine by name and what they were doing. This is priceless to me.
The problem is that they want to *give* it to the world, instead of paying writers for their work. The US court has agreed for some weird reason, but foreign courts have not, and rightly so. Writers want to get paid for their work, just like you! They just happen to get paid in royalties, not hourly wages. Google wanted to be the only one to profit (from ads I might add).
So yes: the library can be available to all, but once Google is willing to pay the writers.
Amusing quote, and what's even more ironic, in the context of this discussion, is that you didn't bother to credit the author:
J. K. Rowling, Harry Potter and the Deathly Hallows (Chapter 25).
So, your worldview is apparently that not only should authors not be paid, they shouldn't even be credited.
http://www.geoffreylandis.com
In a world where every book, every music recording, every movie, tv show, all media is readily available for free somewhere on the internet, there's not enough hours in the day to read/listen to/watch all of it. And that's just what's in English. It's a noble cause but in the final analysis over 90% of it is not worth the time or effort. The totality of human knowledge is a real mess.
Throughout much of history artists and musicians got full pay up-front for their work instead of this BS about getting paid only after some middle man feels they've siphoned away enough of the work's value.
If you need an entertaining lesson on the history of this, at least go watch "Amadeus" and learn a bit about how money-grubbing Mozart was.
I actually read that as "dead authors don't need to get paid, copyright shouldn't outlive the author". I suppose I could stretch it to imply that copyright should be more limited than that, as well; say, the 14 years it was originally. And remember, when copyright was 14 years, printing and distribution were much slower than what we're capable of today. A book that would have taken a year to go to press and be shipped across the globe can now arrive on everyone's shelf tomorrow; if anything, that should further shorten copyright terms.
APK quotes people (including myself) without context and should not be trusted. Just thought you should know.
For thousands of years, authors, artists and musicians didn't expect to get paid for their work, and they did it anyway.
And for thousands of years peasants starved to death in years when the local harvest was poor, and died of disease when a plague passed through. And, more to the point, had their stuff taken away by anybody who passed by who was equipped with swords, spears, arrows, and armor.
Your point is that ancient societies were somehow better than ours? That societies for thousands of years condoned slavery, so we should, too?
The problem is that many of the books have unknown copyright status, meaning no one agrees on who holds the copyrights. Google had an idea to start a blind trust, so that whenever a court makes a final ruling, that person, corp, entity, family, or random winning kangaroo would be able to collect any money made off of these out-of-print books. So let that sink in. These books are never being reprinted. Not because they are dangerous, not because people are arguing over who owns the rights, but because everyone else in this scheme is worried that Google will have a monopoly on OoP books. So then their legal copyright conundrums might wind up in a pocketbook they don't control.
TL/DR;
Fark these arseholes who scrape the corpses of dead writers. I for one vote that Google finishes their project then makes it available to the world at large. A few scholars will rejoice, most folks won't care & some arsewipes will file legal motions that in the end pay some lawyers to do nothing.
Still TL/DR; I am not a big fan of Google, but screw the butt munches that have gagged this project. & screw Google for stopping the project for these butt munches. ////Spleen Vented!
As an author, Shirley you can do better than introducing a straw man into the argument? The poster did not comment on living authors, so it ain't reasonable to criticise him for your unsupported inference.
Muslims preserved the knowledge of Greek and Roman cultures while Christians were busy burning it. In fact by the time of the Muslims conquering Egypt the Christians had held sway for centuries in Egypt and the library of Alexandria was long burnt.
**Life is too short to be serious**
Meanwhile, archive.org is scanning a thousand new books every day and nobody's writing news stories about it...
I repeat something I said back then: In my opinion, Google would be providing a much greater service to mankind if they scanned the enormous number of books that are now completely public domain, and not just books that were published for the general market within the last 200 years. There must be hundreds of thousands, maybe even a few million, books, scrolls and tablets sitting tucked away in private libraries, monasteries, temples, the Vatican archives, museums, the British Admiralty archives and so on. As an example, I suggested sending one or two technicians to some remote monastery with a solar-powered, multi-spectral scanner (multi-spectral in hopes of finding previously unidentified palimpsests) and paying the resident monks some small fee per page that they scan in. (Having the monks do the scanning would ensure that the effective content owners get the final say in what gets brought into the public eye.)
From there, Google could put the raw visible spectrum images out there for free access, and charge fees for additional spectra, OCR processed and searchable text and auto-translated data. Done right, even the field technicians could be essentially free for Google, since there are numerous graduate students and researchers who would love to get their hands on this stuff.
I need a wheelchair van for my son. Help me get the word out. https://www.gofundme.com/wheelchair-van-for-jj
Completely irrelevant - copyright law doesn't care if the book is out of print or not.
Another irrelevancy because it sidesteps the half of the books that are still in copyright - and which Google planned to distribute anyway.
And again you leave out the relevant point... Normally, it's the responsibility of the person wishing to reprint material to seek permission to do so. Google wished to turn this idea on its head, to be free to distribute the material and only be on the hook to pay for it when the owners of the material found out that it was being distributed.
Not to mention, the Authors Guild didn't have standing to make a deal with Google in the first place.
No, it was win-loss-loss. Google won the right to turn the law on its head and profit thereby. The public lost because the deal practically ensured Google a monopoly on the material. (The agreement only covered Google; everyone else would still be bound by the law.) The authors lost because now the onus was on them to seek recompense from a third party (the Authors Guild) rather than the infringing party (Google).
Most people don't know that there are a LOT of dark archives out there. They're used to back up journals and rare books to ensure that they survive should something happen (publishers go out of business, fires, etc.).
I saw a talk once about a dark archive for music research. (I think it was at Research Data Access and Preservation, but could've been ASIS&T). They allowed people to submit jobs to run against it, but it was important that the results couldn't be used to recreate the music (possibly in conjunction with other results), as that could violate copyright.
It would be nice if Google would do something similar. It could be used to find when words and phrases were first used (perhaps without showing the context, but at least giving a reference), etc.
Build it, and they will come^Hplain.
The second part of this post states:
"But Google seems to be thinking ways to make use of IT"
"Last month, it added a new feature to its search function"
How do these statements relate to the library of books that we cannot see, which is the subject of the first part?
If you don't understand how these are related, then you don't understand what Google Books is about. Google Books was never intended to give you a digital copy of the book. Google Books was designed to index paper books and return those along with the search results. Google Books wasn't designed to be a digital library but rather to allow you to search the paper books at your local library as easily as the web. This new feature is exactly what Google Books' original purpose was. It's like the digital index that sometimes comes in the front cover of a paper reference book. It's designed to allow you to easily find something in the paper book.
Google isn't innovating here. Overdrive has been around for quite a while and provides a very nice search interface showing which ebooks are available at your selected libraries. Also considerable integration with local libraries appears to be happening.
You are proposing that copyright carries a compulsory right to grant licenses in perpetuity.
Yes, it actually does. That's the whole point of copyright; that, in exchange for a time-limited protected monopoly on the authors' work, the work is granted in its entirety to the public once the copyright period expires.
Why would anyone engrave "Elbereth"?
How did authors make money before copyright? I mean, written works predate copyright, so someone must have paid for them, right? The original 14 years was a gift to authors, as it allowed them to earn a bit more than the initial writing would afford them, while balancing against the greater good of an enriched society via the public domain.
If an author hasn't made anything in the handful of years before their work goes out of print (and that's a smaller handful if it's not selling), they're not going to make anything on that work before they die and their family isn't going to make anything on it in the 70 years that follow. Because it's out of print. Because it wasn't selling.
If you haven't made a profit in 14 years, you're not going to. If you haven't made something else profitable in 14 years, I should say you've not contributed enough to society to deserve to continue profiting.
Copyright is what gets me paid, by the way. If it took me 14 years to profit off of my work, I'd fucking starve.
Take two sheets of glass, tape them into a V shape with cardboard to hold them up, place the book open on the V, take a picture from below with your cell phone camera. Repeat for each pair of pages.
Imaginary Property is theft. Culture belongs to the People - it is not the personal property of degenerate capitalists.
It must be very obvious to everyone now, that ownership of ideas makes the whole world needlessly stupider, and should be ended now.
Until these bad laws are removed we must honor those heroes like Alexandra Elbakyan who are expropriating scientific knowledge from the rich hoarders and freeing it for the enlightenment of the whole people.
Here's my proposal for how to fix the major flaws of copyright while ensuring that authors get paid:
Replace copyright with payright. Here's what it means:
The author gets a right to a clearly defined slice of revenue (e.g. 20% by default) from every commercial use of their work. If you register your work in a central registry, you get to set the percentage yourself and commercial users will have to contact you. If you don't register it, the statutory default applies and commercial users will just need to hold your slice of revenue in escrow until you contact them.
You as an author don't get to pick and choose who may or may not use your work. Anyone can use it as long as you get your share of revenue and they take care not to damage your reputation. You may not play favorites by charging some users less than others, either. Non-commercial use would be completely free in every sense of the word.
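To make the mechanics concrete, here is a minimal sketch of the revenue split as described above. Everything in it is hypothetical: the names `Work`, `Payout` and `split_revenue` are invented for illustration, the 20% default is just the example figure from the proposal, and it assumes that non-commercial use simply shows up as zero revenue.

```python
from dataclasses import dataclass
from typing import Optional

DEFAULT_RATE = 0.20  # the "20% by default" example from the proposal

@dataclass
class Work:
    title: str
    # Rate the author set in the (hypothetical) central registry;
    # None means the work was never registered.
    registered_rate: Optional[float] = None

@dataclass
class Payout:
    author_share: float    # the author's slice of the revenue
    user_keeps: float      # what the commercial user retains
    held_in_escrow: bool   # True when the work is unregistered

def split_revenue(work: Work, commercial_revenue: float) -> Payout:
    """Split the revenue from one commercial use between author and user."""
    if commercial_revenue < 0:
        raise ValueError("revenue cannot be negative")
    registered = work.registered_rate is not None
    rate = work.registered_rate if registered else DEFAULT_RATE
    share = commercial_revenue * rate
    # Unregistered works: the user holds the share until the author shows up.
    return Payout(share, commercial_revenue - share, held_in_escrow=not registered)

if __name__ == "__main__":
    print(split_revenue(Work("Unregistered novel"), 1000.0))
    print(split_revenue(Work("Registered novel", registered_rate=0.35), 1000.0))
```

Note that the rate belongs to the work rather than to the user, which is one way to reflect the "no playing favorites" rule: every commercial user of the same work pays the same percentage.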
Here's my proposal for how to fix the major flaws of copyright while ensuring that authors get paid: Replace copyright with payright. Here's what it means: The author gets a right to a clearly defined slice of revenue (e.g. 20% by default) from every commercial use of their work. If you register your work in a central registry, you get to set the percentage yourself and commercial users will have to contact you. If you don't register it, the statutory default applies and commercial users will just need to hold your slice of revenue in escrow until you contact them.
So, you're saying that I can put up a site that makes the work of all the bestselling authors in America available for free, and the bestselling authors will get nothing. Because in your view they don't own their work, and aren't allowed to decide what their work is worth, or even if it is worth anything at all.
Why do you think this is good?
Requiem for the American Dream
Proof by first derivative. Works every time. No dilemma ever.
Er, um, hold the presses.
No Google books: authors control most revenue, no soup for Google.
Partial Google books: one author rats out the other (economically) by signing up. Author who signs up wins, author who holds out loses. Plenty of canned alphabet soup for Google.
Full Google books: Google creams almost the whole of the economic surplus due to better consumption matching, authors left in roughly the same place (though a smaller piece of the whole pie). Cream of truffle soup for Google.
Society usually ends up deciding these matters in the large by a process of fait accompli.
Sun on Privacy: 'Get Over It' — January 1999
It's so routine that McNealy completely forgot himself in his rush to get there before the fait accompli paint was dry.
The judge decides that the authors have already lost the power game, crosses that cell off the game theory matrix (out of superficial prudence), and then—Lo and Behold—corporate America wins again.
We shoot ourselves in the foot by claiming victory for network effects that aren't network effects.
This is a power heuristic, make no mistake about it. With enough power, no network required (though of course, actually having a network does tend to boost power, as well).
I've bought books because the Google Books service let me look inside a book and see that it was going to be useful to me. Shutting down GB means closing this channel for you as an author. A stupid move, I would say.
I agree. But it should be your choice to decide what and how much of your work to give away for free, not somebody else's.
Your work, your decision.
One thing authors often do is sell some sort of first publication rights to get some cash sooner rather than to wait for the royalties. One thing magazines etc. do is to buy first publication rights so they can ensure that they can publish before anyone else, rather than having the December issue come out with a featured story that the competitor had in their November issue.
So, this would reduce the desire to publish an author's works and reduce the amount of stuff published, while adding overhead to the whole system. Doesn't look like a win to me.
Also, it completely sidesteps the copyright issue. How do we make sure the 20% is paid? How do we deal with illicit free copies potentially hurting the commercial market?
"When you have eliminated the unacceptable, whatever is left, however improbable, must be the truthiness" - Holmes
First publication rights would still exist. The rules I've described above would apply only to works that have already been published with author's consent. A short period of commercial exclusivity (only a few months) would also be acceptable, but only if you register your work.
There will be no such thing as "illicit free copies". Non-commercial use will be completely legal. Period.
The main problem with the current copyright system for potential commercial users is that it's ridiculously difficult and expensive to actually get the copyright holder to take your money. Eliminate this hurdle and commercial pirates will be drowned out by legitimate services.
And we're back to the question of how we assure that authors have a good chance to get paid. With free eBooks readily and legally available, who's going to buy a copy when they can wait a few months and get a free one? Either we have a reasonably long period of exclusivity, or we need to find another way to pay authors.
And we're back to the question of how we assure that authors have a good chance to get paid. With free eBooks readily and legally available, who's going to buy a copy when they can wait a few months and get a free one? Either we have a reasonably long period of exclusivity, or we need to find another way to pay authors.
I'm all for experimenting with other ways of making money. Most of them are currently blocked by copyright bureaucracy. Why should the law prefer selling ebooks as if they were physical goods over other business models?
Selling PDFs is not that much more expensive than selling dead tree editions. We've had people connected with publishers post on Slashdot before. The bulk of the cost of a dead tree with ink on it is amortizing the same expenses ebooks would have to amortize. The additional cost of a hardcover book is mostly a premium to read the book before the paperback comes out. The demand for books is fairly inelastic. I'll buy lots of them, in some form or another, but not that many more than I can read. Other people will buy no books, no matter how inexpensive. The number of books sold doesn't depend that heavily on price.
Selling ebooks for $3 and still providing editing and the other services that turn the manuscript into a polished, readable, book isn't going to make it. The cheap ebooks are either from the public domain (already edited books with no need to pay the author) or ones without significant editing.
I'm in favor of experimenting until we find a better way. I'm not in favor of cutting off the current way we pay authors without finding and implementing something better first. Not just books, but all copyrighted materials are sold as if they were physical.
Currently, the system allows the following:
The downside is that it restricts how many people can enjoy a book and requires enforcement. This seems to me to be reasonable for the benefits gained, and I'd like any new scheme to be an overall improvement.
Take another look at the bigger picture of the current copyright system:
The current copyright system is an evolutionary dead end. There is no way to improve it that would fix the above problems. The only way forward is to redesign the whole system from scratch.
Publishers are not necessary in the copyright system. You can always self-publish. Back when books were all made of dead trees, there were "vanity publishers". Currently, it's really easy to self-publish through Amazon and perhaps Barnes & Noble. The advantage, to an author, of using a regular publisher is that the publisher will provide services like editing and good formatting and proofreading (not necessarily doing it well), publishers have publicity channels ready to go, and publishers will often absorb some of the risk.
Moreover, copyright doesn't forbid competition. If you and I both write books that have similar premises and plots, we can both publish. It does forbid copying, so I couldn't take your book, change a few things, and publish without your permission. I don't see that as a problem. If you do, could you explain further?
DMCA takedown notices don't quite work because there's no penalty for filing zillions of ones that don't apply. (There's a possible penalty if you claim copyright over something you don't have the copyright to, but not for saying that my symphony infringes on your /. post.) There's also the issue that people have come to think they have rights over sites that display their work (and perhaps monetize it) for free.
Copyright law can be changed so it can't be used to restrict ownership of anything else. Such claims violate the old principle of copyright law that it can't restrict you from doing something, as long as it's your own words or whatever, and the courts have not been entirely friendly to these claims.
So, if the DMCA were amended to have penalties for filing frivolous takedown notices, and if there were restrictions on copyright law explicitly saying it can't be used to restrict ownership, that would satisfy all of your complaints.
Pardon me, I meant that PDFs aren't that much less expensive than dead tree books.
As I said, people who know the figures for costs of books have posted them on Slashdot. The economics weren't all that accurate, but publishers need to make a lot of money on individual book sales to cover fixed costs unless the book is a best-seller or something.
There are people whose business it is to know the demand curve for books, and it turns out to be pretty inelastic. If a book from a real publisher is priced pretty much as similar books are priced now, halving the price is not going to double the sales. I haven't seen anything on the demand curve for cheaply written self-published books, but that has little to do with published books.
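The arithmetic behind "halving the price is not going to double the sales" can be sketched with a constant-elasticity demand curve. This is a textbook simplification, not a model of the real book market: the elasticity value of 0.5 is an assumed number picked for illustration, and the function names are made up.

```python
def quantity_multiplier(price_multiplier: float, elasticity: float) -> float:
    """Change in units sold under constant-elasticity demand, Q ~ P^(-elasticity)."""
    return price_multiplier ** (-elasticity)

def revenue_multiplier(price_multiplier: float, elasticity: float) -> float:
    """Change in revenue: new price times new quantity, relative to before."""
    return price_multiplier * quantity_multiplier(price_multiplier, elasticity)

if __name__ == "__main__":
    half_price = 0.5
    for elasticity in (0.5, 1.0):  # 0.5 is an assumed "inelastic" value
        q = quantity_multiplier(half_price, elasticity)
        r = revenue_multiplier(half_price, elasticity)
        print(f"elasticity={elasticity}: sales x{q:.2f}, revenue x{r:.2f}")
```

With an elasticity below 1, halving the price raises unit sales by less than a factor of two, so revenue falls; only at an elasticity of exactly 1 does the price cut break even on revenue.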