How Google Book Search Got Lost (backchannel.com)
Google Books was the company's first moonshot. But 15 years later, the project is stuck in low-Earth orbit, argues an article on Backchannel. From the article: When Google Books started almost 15 years ago, it also seemed impossibly ambitious: An upstart tech company that had just tamed and organized the vast informational jungle of the web would now extend the reach of its search box into the offline world. By scanning millions of printed books from the libraries with which it partnered, it would import the entire body of pre-internet writing into its database. [...] Two things happened to Google Books on the way from moonshot vision to mundane reality. Soon after launch, it quickly fell from the idealistic ether into a legal bog, as authors fought Google's right to index copyrighted works and publishers maneuvered to protect their industry from being Napsterized. A decade-long legal battle followed -- one that finally ended last year, when the US Supreme Court turned down an appeal by the Authors Guild and definitively lifted the legal cloud that had so long hovered over Google's book-related ambitions. But in that time, another change had come over Google Books, one that's not all that unusual for institutions and people who get caught up in decade-long legal battles: It lost its drive and ambition. Google stopped updating the Books blog in 2012 and folded it into the main Google Search blog. The author reports that Google still has people working on Book Search and that they are still adding new books, but at a much slower pace.
Are they a shareholder-answerable business?
Does it make them money?
No? What did you expect?
This isn't surprising. It never took off like some other projects, so it turned into an expense with little return (Do they charge a percentage of book sales found through their searches? Can they enforce that and stop you from just taking the ISBN and buying from Amazon once you've found it?), and it will die when people lose personal interest in it.
The only things I can see sticking around for any significant length of time are Google Search and Google Apps. Everything else is just a boredom/filler project that can disappear like so many others, Google or not.
When I worked at the Google IT help desk in 2008, the building next door had all the book scanners. It was supposedly a miserable place to work: low pay for flipping book pages, a relentless daily quota, and a high turnover rate. Makes help desk support look like paradise.
"I haven't failed. I've just found 10,000 ways that won't work." - Thomas Edison, on the electric light bulb.
2) Automate page flipping for books that couldn't be spine-cut or sheet fed.
My understanding is that the early book scanners had a chair the operator sat back in to watch an overhead monitor. One button took a picture of the page; the other flipped the page. If the book went out of alignment, the operator had to readjust it. The technology may have changed since then, since the human component was a big problem for the program at the time.
http://hackaday.com/2012/11/16/google-books-team-open-sources-their-book-scanner/
I have a friend who is weird even by my social group's standards.
One of his 'interests' is preserving old DEC documentation. They just use a binding guillotine and a high-speed sheet-feed scanner, along with countless tricks to restore tape for one last read pass, etc.
John McAfee 'It was like that time I hired that Bangkok prostitute; to do my taxes, while I fucked my accountant'
90% of books that need scanning should be cut up.
Just not the books that Google borrowed from a library. Librarians are the people who can tell the difference, but I'm sure Google could come up with something to do 99% of the sorting (mostly, already scanned...)
What they really need are portable scanning solutions. Librarians are just the kind of people who would love to help, so long as their books don't go too far out of their control. Even absent that, most libraries produce a steady stream of 'discards' that should be checked against the 'books database' first.
Anybody should be able to take a picture of a title page and have Google tell them whether it wants the book for scanning. 'Book people' would do it.
Google had to do it in the least damaging way possible. It was a necessary condition if they wanted libraries to cooperate.
Non-library books were processed destructively, by cutting off the spine.
Project Gutenberg specializes in notable books that are more than three generations old.
No. Gutenberg makes text versions of fairly common books. You might think that they're uncommon, but as an academic who specializes in European books of the 15th-17th centuries, I can tell you that Google has found things that are absolutely miraculous. I've seen Google scans of books that exist in only four copies in libraries across Europe. I've seen whole sub-genres of literature that were thought lost suddenly appear on the internet. If you work in early modern literature, especially older forms of German and French or newer forms of Latin, Google Books and its associated HathiTrust project are a revolution, and the Gutenberg Project isn't even a blip on the radar.
Project Gutenberg scans books which are out of copyright, and only famous ones.
Google Books scans contemporary works. That in itself made it worth doing. Basically, if the Library of Congress burned down, there would be millions if not billions of contemporary books and magazines that existed only on the authors' computers and in printed form on collectors' shelves. There would be no central database of these works, much less a searchable one. Regardless of what you think of Google Books or how boring it is to work there (I'm having similar boredom problems scanning dozens of my family's photo albums), it's a project well worth doing.
Google Books always seemed like a great idea, but the idea of the search giant owning all of the data always made me incredibly uncomfortable. This data should be in the public domain. Authors should feel *privileged* to submit their works for inclusion in the database, not fight it. It seems that, at least recently, Google Books has served primarily as a means to drive book *sales*. That's not an admirable goal. It's time for Google Books to be converted into a community-driven effort, like Wikipedia. Release all the data under a free database license that ensures the data cannot be used commercially, and allow the community to help with the effort. This would be an incredible achievement for humanity in general. Oh well, one can dream...