On the Google Book Scanning Project and the Library We Will Never See (theatlantic.com)
For a decade, Google's enormous project to create a massive digital library of books was embroiled in litigation with a group of writers who said it was costing them a lot of money in lost revenue. Even as Google notched a victory when a federal appeals court ruled that the company's project was fair use, the company quietly shut down the project. From an article published in April this year: Despite eventually winning Authors Guild v. Google, and having the courts declare that displaying snippets of copyrighted books was fair use, the company all but shut down its scanning operation. It was strange to me, the idea that somewhere at Google there is a database containing 25-million books and nobody is allowed to read them. It's like that scene at the end of the first Indiana Jones movie where they put the Ark of the Covenant back on a shelf somewhere, lost in the chaos of a vast warehouse. It's there. The books are there. People have been trying to build a library like this for ages -- to do so, they've said, would be to erect one of the great humanitarian artifacts of all time -- and here we've done the work to make it real and we were about to give it to the world and now, instead, it's 50 or 60 petabytes on disk, and the only people who can see it are half a dozen engineers on the project who happen to have access because they're the ones responsible for locking it up. But Google appears to be thinking of ways to make use of it. Last month, it added a new feature to its search function that instantly connects you with eBook data from libraries near you. From a report: Now, every time you search for a book through Google, information about your local library rental options will be easily available. Yeah, that's right. Your local library not only still exists, but it has eBooks, which are things you can totally borrow (for free) online!
Before, this perk was hidden somewhere deep within your local library's website -- assuming it had one -- but now these free literary wonders are all yours for the taking.
Well, actually, isn't the problem that they want to sell it / use it for commercial purposes? If Google simply wanted to put this on the web for absolutely free, with no links to anything else, couldn't they?
I thought it's only when you're trying to sell something that these issues arise.
An endeavor of this magnitude must have some type of value. You don't want to just give this away to the world. Corporations are not idealistic or altruistic. Once someone figures out how to extract some of the value from this collection, it'll be back.
It's such a wealth of knowledge that's so valuable to them I doubt they'll stop. It can be used for training their AI by having their AI consume vast quantities of knowledge and literature.
I saw this go by back in April and was made sad by it. Now I am being made sad by it again. I wonder how hard it would be to crowdsource the same work. Like, just have everybody who thinks this is a tragedy do 10 books, and see how many that adds up to. The Google OCR API is available for use, and I think they may even have open sourced it so you don't have to run it in the cloud.
They have a great corpus to train their AI with now. Maybe the best in the world.
They were able to scan the books and data mine all the text. Why would they want someone else to be able to do the same?
I'm sure others will note... Google almost certainly just wanted the data. Why would they need/want anything else out of the arrangement?
There is no XUL, only WebExtensions...
This and many other wrongs have happened because publishers, the RIAA, the MPAA, and especially Disney have been able to bribe lawmakers and buy insanely long extensions of copyright. Works that should have long ago been in the public domain are being kept under copyright to the great detriment of our society. These same entities are also doing everything they can to eliminate Fair Use and Right of First Sale. All in the name of price gouging and insane levels of uncontrolled corporate greed! Copyright (and patents) need to be limited to 5-7 years with no extensions at all for any reason. Works then need to go permanently into the public domain, never to be put back under copyright under any circumstances whatsoever!
The purpose of Copyright has been totally distorted from its original purpose, which was to give the creators of the copyrighted work a limited time to profit from that work. Now Copyright has been extended to such an insane extent that it doesn't expire until the creator, their children, and even their grandchildren (in many cases) have passed on! All so the entities listed above can profit more and longer. And if these entities had their way, Copyright would be forever, never expiring at all, and there would be no Fair Use or Right of First Sale!
I think what happened is they got 1 terabyte in and realized that the data started to repeat over and over...and over.
Lodragan Draoidh
The more you explain it, the more I don't understand it. - Mark Twain
Hey Google, use some of that vast money stockpile to undo the damage that companies have been doing to Copyright laws. Get some reductions in copyright duration to something more reasonable (15 years!) and then you'll be able to release the vast majority of your scanned books.
----------------------------------- My Other Sig Is Hilarious -----------------------------------
So is it possible Google is shooting to secure a place in history?
sed -e 's/Chuck Norris/Rajnikant/g' joke > fact
I wonder how hard it would be to crowdsource the same work.
Project Gutenberg has been at it since the '70s, but they currently have only 54,000 books, not a whole lot compared to Google's 25 million books.
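For a sense of scale, here is a back-of-the-envelope sketch of the crowdsourcing arithmetic. It uses only the figures quoted in this thread (25 million books at Google, about 54,000 at Project Gutenberg, 10 books per volunteer); the function name is made up for illustration, and it ignores duplicates, scan quality, and copyright status entirely.

```python
import math

GOOGLE_BOOKS = 25_000_000    # figure quoted in the story summary
GUTENBERG_BOOKS = 54_000     # figure quoted in the comment above
BOOKS_PER_VOLUNTEER = 10     # the "everybody does 10 books" suggestion


def volunteers_needed(total_books: int, books_each: int) -> int:
    """Volunteers required if each one scans `books_each` books (rounded up)."""
    return math.ceil(total_books / books_each)


if __name__ == "__main__":
    # Volunteers needed to match Google's corpus at 10 books each.
    print(volunteers_needed(GOOGLE_BOOKS, BOOKS_PER_VOLUNTEER))
    # Rough ratio of Google's corpus to Project Gutenberg's.
    print(round(GOOGLE_BOOKS / GUTENBERG_BOOKS))
```

In other words, at 10 books apiece it would take about 2.5 million volunteers, and Google's corpus is several hundred times the size of Gutenberg's.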
libgen is still alive!
The Net interprets censorship as damage and routes around it -- John Gilmore
Down with the creators seeking to control their creations! How dare they?..
In Soviet Washington the swamp drains you.
Gutenberg, and the problem is time and hardware. Really, it's not as easy as sticking a DVD into a machine and making millions of copies. That's why piracy doesn't add anything to the collective consciousness. Someone else had to do all the hard work first.
Getting to see the books is not what Google Books is for. It was never what Google Books was for. You've bought into the fallacy promoted by the Authors Guild, who came in after the fact and tried to wangle their lawsuit against Google Books into an orphaned-works library without actually having any authority to do so. Google shrugged and went along with it, because why not, but it was never what they had intended.
From the very beginning, Google Books (nee Google Print) was intended to populate a search database so people could search within paper books as easily as they could search within the web. If the book was still in copyright, then finding that book to read was the searcher's problem. (Interlibrary loan works a treat.) Google was very straightforward about that in early blog posts and publicity about the project. Don't blame them for falling short of the Authors Guild's goals. Those goals were never theirs to begin with. See the link in the first paragraph for more information.
Editor Emeritus and Senior Writer, TeleRead.org
The second part of this post states:
"But Google seems to be thinking ways to make use of IT"
"Last month, it added a new feature to its search function"
How do these statements relate to the library of books that we cannot see, which is the subject of the first part?
Erm, it's a LOT of effort to scan a book on a regular scanner. 99% of people have flatbed scanners, and if you are one of the 1% who have self-feeding scanners, you would have to separate all the pages first (destroying the book in the process). That being said, people are doing it: there is a place on IRC (Internet Relay Chat) where you can find pretty much any work of fiction ever produced (Google it; I would rather not have the details indexed by Google and associated with me). What I have trouble getting my grubby paws on are non fiction books. I still haven't found a central place for those, and I end up having to fire up a VM and dig through 20 million dodgy websites before I can find what I am looking for. Oh yeah, be warned: a LOT of the books have OCR errors. Some have been proofread and corrected, but not a lot. Some people have loaded the text into Word or some other spell checker and clicked "Autofix spelling and grammar", and we all know how well that works.
There are three kinds of falsehood: the first is a 'fib,' the second is a downright lie, and the third is statistics.
Instead of having to go through Wikipedia, we would be able to access the reliable sources ourselves. As it is, we have to deal with whatever article the reverting admins want us to have.
What is stopping Google from operating as a library? For each city, have a pool of ebooks that users can borrow for a week. They could have books that you can borrow for 1 minute for search purposes. It should be cheaper than publicly funded libraries.
Google Books helped me find books from 1838 that mentioned ancestors of mine by name and what they were doing. This is priceless to me.
The problem is that they want to *give* it to the world, instead of paying writers for their work. The US court has agreed for some weird reason, but foreign courts have not, and rightly so. Writers want to get paid for their work, just like you! They just happen to get paid in royalties, not hourly wages. Google wanted to be the only one to profit (from ads I might add).
So yes: the library can be available to all, but once Google is willing to pay the writers.
> What I have trouble getting my grubby paws on are non fiction books
Try your local public library. Mine has tons of non-fiction through Overdrive ebooks and audiobooks. They expire after 3 weeks thanks to DRM, but there are ways to make it an "indefinite loan" if you get my drift.
I also joined the "Friends" program at a nearby public university library. $80 a year and all the ebooks I can read - catch is, I have to connect to the campus network for access.
Copyright length is the main issue, not a differing business model. There's a lot of content out there whose authors are dead, and income is the least of their worries.
So, Mr. Anonymous Coward, what you're basically saying is that since dead authors don't need to be paid, you think it's ok if living ones don't get paid either.
Yeah, great.
http://www.geoffreylandis.com
Amusing quote, and what's even more ironic, in the context of this discussion, is that you didn't bother to credit the author:
J. K. Rowling, Harry Potter and the Deathly Hallows (Chapter 25).
So, your worldview is apparently that not only should authors not be paid, they shouldn't even be credited.
http://www.geoffreylandis.com
You are proposing that copyright carries a compulsory obligation to grant licenses in perpetuity. While an author should not be able to obtain a copyright on a work which has never been published, it would be exceedingly unfair to insist they cannot use scarcity to influence the value of the licenses they do grant.
You have also carefully and disingenuously conflated Google's inability to force licensing of copyrighted works with Google's choice not to do something with those works which are genuinely in the public domain. It was Google who decided that if they could not get the licensing terms they wanted for copyrighted books then they weren't going to do anything with the PD stuff either.
In a world where every book, every music recording, every movie, tv show, all media is readily available for free somewhere on the internet, there's not enough hours in the day to read/listen to/watch all of it. And that's just what's in English. It's a noble cause but in the final analysis over 90% of it is not worth the time or effort. The totality of human knowledge is a real mess.
Throughout much of history artists and musicians got full pay up-front for their work instead of this BS about getting paid only after some middle man feels they've siphoned away enough of the work's value.
If you need an entertaining lesson on the history of this, at least go watch "Amadeus" and learn a bit about how money-grubbing Mozart was.
For thousands of years, authors, artists and musicians didn't expect to get paid for their work, and they did it anyway.
And for thousands of years peasants starved to death in years when the local harvest was poor, and died of disease when a plague passed through. And, more to the point, had their stuff taken away by anybody who passed by who was equipped with swords, spears, arrows, and armor.
Your point is that ancient societies were somehow better than ours? That societies for thousands of years condoned slavery, so we should, too?
http://www.geoffreylandis.com
is that many of the books have unknown copyright status, meaning no one agrees on who holds the copyrights. Google had an idea to start a blind trust, so that whenever a court makes a final ruling, that person, corp, entity, family, or random winning kangaroo would be able to collect any money made off of these out-of-print books. So let that sink in. These books are never being reprinted. Not because they are dangerous, not because people are arguing over who owns the rights, but because everyone else in this scheme is worried that Google will have a monopoly on OoP books, and that the money from their legal copyright conundrums might wind up in a pocketbook they don't control.
TL/DR;
Fark these arseholes who scrape the corpses of dead writers. I for one vote that Google finishes their project then makes it available to the world at large. A few scholars will rejoice, most folks won't care & some arsewipes will file legal motions that in the end pay some lawyers to do nothing.
Still TL/DR; I am not a big fan of Google, but screw the butt munches that have gagged this project. & screw Google for stopping the project for these butt munches. ////Spleen Vented!
As an author, Shirley you can do better than introducing a straw man into the argument? The poster made no comment about living authors, so it ain't reasonable to criticise him for your unsupported inference.
Muslims preserved the knowledge of Greek and Roman cultures while Christians were busy burning it. In fact by the time of the Muslims conquering Egypt the Christians had held sway for centuries in Egypt and the library of Alexandria was long burnt.
**Life is too short to be serious**
I've been borrowing ebooks from libraries for years using overdrive
https://www.ebay.com/sch/i.htm...
Meanwhile, archive.org is scanning a thousand new books every day and nobody's writing news stories about it...
Oh so google wants to steal everyone's data again and make money for themselves on it again? Poor Google.
I repeat something I said back then: In my opinion, Google would be providing a much greater service to mankind if they scanned the enormous number of books that are now completely public domain, and not just books that were published for the general market within the last 200 years. There must be hundreds of thousands, maybe even a few million, books, scrolls and tablets sitting tucked away in private libraries, monasteries, temples, the Vatican archives, museums, the British Admiralty archives and so on. As an example, I suggested sending one or two technicians to some remote monastery with a solar-powered, multi-spectral scanner (multi-spectral in hopes of finding previously unidentified palimpsests) and paying the resident monks some small fee per page that they scan in. (Having the monks do the scan would ensure that the effective content owners get final say in what gets brought into the public eye.)
From there, Google could put the raw visible spectrum images out there for free access, and charge fees for additional spectra, OCR processed and searchable text and auto-translated data. Done right, even the field technicians could be essentially free for Google, since there are numerous graduate students and researchers who would love to get their hands on this stuff.
I need a wheelchair van for my son. Help me get the word out. https://www.gofundme.com/wheelchair-van-for-jj
Completely irrelevant - copyright law doesn't care if the book is out of print or not.
Another irrelevancy because it sidesteps the half of the books that are still in copyright - and which Google planned to distribute anyway.
And again you leave out the relevant point... Normally, it's the responsibility of the person wishing to reprint material to seek permission to do so. Google wished to turn this idea on its head, to be free to distribute the material and only on the hook to pay for it when the owners of the material found out that it was being distributed.
Not to mention, the Authors Guild didn't have standing to make a deal with Google in the first place.
No, it was win-loss-loss. Google won the right to turn the law on its head and profit thereby. The public lost because the deal practically ensured Google a monopoly on the material. (The agreement only covered Google; everyone else would still be bound by the law.) The authors lost because now the onus was on them to seek recompense from a third party (the Authors Guild) rather than the infringing party (Google).
if you do not permit the work to have any value, people will stop doing it.
Millions of authors write tens of millions of books every year without any expectation of compensation and often give their works away. Some of those books even become bestsellers while being given away and are turned into blockbuster movies starring Matt Damon.
Work might, just maybe, have value outside of making a (brett) buck. I suspect most great artworks were not done for the love of money.
Most people don't know that there are a LOT of dark archives out there. They're used to back up journals and rare books to ensure that they survive should something happen (publishers going out of business, fires, etc.).
I saw a talk once about a dark archive for music research. (I think it was at Research Data Access and Preservation, but could've been ASIS&T). They allowed people to submit jobs to run against it, but it was important that the results couldn't be used to recreate the music (possibly in conjunction with other results), as that could violate copyright.
It would be nice if Google would do something similar. It could be used to find when words and phrases were first used (maybe not showing them in context, but at least giving a reference), etc.
Build it, and they will come^Hplain.
The books are out there, legal or not. It's time for someone to show the way and innovate where big corporations and their publishing cabals fail to, as Napster once did and Sci-Hub continues to.
Google isn't innovating here. Overdrive has been around for quite a while and provides a very nice search interface showing which ebooks are available at your selected libraries. Also considerable integration with local libraries appears to be happening.
There's plenty of projects that try to bring the goal of libraries through the digital divide and get the information exactly where it wants to go, rights respected through the institutions or rights be damned by those readers who truly care about the information and the art enough to examine it critically. I've read it already, have my copy. If authors are more concerned with their bottom line, it isn't a project readers should support financially. As with anything else, fuck any publishers/journal distributors who use "their" exclusive rights to the information to drown out what the authors have to say, against the authors' wishes. If Google invested so much time and energy in digitizing, but only so long as they could play publisher in this regard, fuck them too. It's piracy when you gotta unload the booty. Any employees who find value in their work/the content have been doing their jobs and giving this stuff away to whoever needs it in the meantime, I'm sure.
Take two sheets of glass, tape them into a V shape with cardboard to hold them up, place the book open on the V, take a picture from below with your cell phone camera. Repeat for each pair of pages.
OTOH a dataset with the 'right' 54k books could well have vastly more usefulness than the dataset with the 'wrong' 25M books. Using numbers for the comparison of importance is disingenuous here.
Fair use is important. Copyright can be, is, and often has been abused. A world with the strongest (*cough* Ayn Rand style *cough*) copyright protections and no fair-use balance would be a worse world. In fact, the existing scenario, in which fair use is heavily persecuted, is far from optimal.
My $0.02: kids these days deserve to be able to watch the original Star Wars trilogy, perhaps at NTSC resolution, as well as every episode of Star Trek, without being subjected to weaponized psychology in the form of targeted advertisements. It does not seem fair to deprive kids of this kind of cultural availability because their parents, due to misfortune, laziness, or misplaced priorities, cannot afford to give them paid access to it.
The world would be a better place if teens today had unfettered access to a reasonable library of digitized culture of recent generations. Instead, they get access to a subset that is architected with the sole purpose of getting them to make economic choices they would otherwise not make (advertisements), or access to a subset that involves decreasing their personal cybersecurity (The Pirate Bay is the media equivalent of unregulated back-room abortions).
And in no way does the current typical library DVD collection fall under the classification of 'reasonable library of digitized culture of recent generations'. The supply/demand inventory, let alone the scratched DVD factor, are pure bullshit in a world where such limitations are not technical, but social constructs.
$0.02, not exactly what's going to happen of course
If you really have the time to type it up.
Imaginary Property is theft. Culture belongs to the People - it is not the personal property of degenerate capitalists.
It must be very obvious to everyone now, that ownership of ideas makes the whole world needlessly stupider, and should be ended now.
Until these bad laws are removed we must honor those heroes like Alexandra Elbakyan who are expropriating scientific knowledge from the rich hoarders and freeing it for the enlightenment of the whole people.
The scanning project was never for humans. It was to feed their AI. All this publicity was a great way for Google to be doing something altruistic.
Here's my proposal for how to fix the major flaws of copyright while ensuring that authors get paid: Replace copyright with payright. Here's what it means: The author gets a right to a clearly defined slice of revenue (e.g. 20% by default) from every commercial use of their work. If you register your work in a central registry, you get to set the percentage yourself and commercial users will have to contact you. If you don't register it, the statutory default applies and commercial users will just need to hold your slice of revenue in escrow until you contact them.
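To make the rule concrete, here is a minimal sketch of the payright split as described above. Everything beyond the 20% default, the registry, and the escrow is an assumption made for illustration: the `Work` class, the `author_slice` function, and the status strings are hypothetical names, and the proposal does not say how "revenue" or "commercial use" would actually be measured.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

DEFAULT_SHARE = 0.20  # the "20% by default" figure from the proposal


@dataclass
class Work:
    title: str
    # Percentage chosen by the author; None means the work was never registered.
    registered_share: Optional[float] = None


def author_slice(work: Work, commercial_revenue: float) -> Tuple[float, str]:
    """Return (amount owed to the author, what the commercial user does with it)."""
    if work.registered_share is not None:
        # Registered work: the author's own percentage applies.
        return commercial_revenue * work.registered_share, "pay author via registry"
    # Unregistered work: statutory default, held until the author turns up.
    return commercial_revenue * DEFAULT_SHARE, "hold in escrow"


if __name__ == "__main__":
    print(author_slice(Work("Registered novel", registered_share=0.35), 1000.0))
    print(author_slice(Work("Orphaned pamphlet"), 1000.0))
```

Note that a site giving the work away with no revenue at all owes nothing under this sketch, which is exactly the objection raised in the reply below.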
So, you're saying that I can put up a site that makes the work of all the bestselling authors in America available for free, and the bestselling authors will get nothing. Because in your view they don't own their work, and aren't allowed to decide what their work is worth, or even if it is worth anything at all.
Why do you think this is good?
http://www.geoffreylandis.com
Internet "entrepreneur" shocked that people believe their work has value, and shouldn't be stolen:
https://www.diyphotography.net/internet-entrepreneur-shocked-copyright-owner-sued-stealing-work/
Proof by first derivative. Works every time. No dilemma ever.
Er, um, hold the presses.
No Google books: authors control most revenue, no soup for Google.
Partial Google books: one author rats out the other (economically) by signing up. Author who signs up wins, author who holds out loses. Plenty of canned alphabet soup for Google.
Full Google books: Google creams almost the whole of the economic surplus due to better consumption matching, authors left in roughly the same place (though a smaller piece of the whole pie). Cream of truffle soup for Google.
Society usually ends up deciding these matters in the large by a process of fait accompli.
Sun on Privacy: 'Get Over It' — January 1999
It's so routine that McNealy completely forgot himself in his rush to get there before the fait accompli paint was dry.
The judge decides that the authors have already lost the power game, crosses that cell off the game theory matrix (out of superficial prudence), and then—Lo and Behold—corporate America wins again.
We shoot ourselves in the foot by claiming victory for network effects that aren't network effects.
This is a power heuristic, make no mistake about it. With enough power, no network required (though of course, actually having a network does tend to boost power, as well).
I've bought books because of the Google Books service, which let me look inside a book and see that it was going to be useful for me. Shutting down GB means closing this channel for you as an author. A stupid move, I would say.
I agree. But it should be your choice to decide what and how much of your work to give away for free, not somebody else's.
Your work, your decision.
http://www.geoffreylandis.com