Google To Resume Scanning Books
SenseOfHumor writes "The Wall Street Journal is reporting that Google will resume scanning copyrighted books from the Stanford and University of Michigan libraries. Let the battle resume!" From the article: "It isn't known just what percentage of library holdings fall into the category of being in copyright but out of print. About 18% of the books held by the libraries working with Google were printed prior to 1923 and are therefore in the public domain, according to an analysis by the Online Computer Library Center, a Dublin, Ohio, nonprofit library cooperative. An unknown percentage of the rest still are protected by copyright, depending on whether it was renewed. Google's resumption of its scanning of copyrighted works comes amid heated debate in the library community over participation in the program."
"Scanning" of old books is typically done with a camera photographing the book as it lies in a cradle, which avoids splitting the binding. Either one image is taken per page, or one image captures a two-page spread; the spread is faster but makes it harder to keep both pages in focus.
Once the pages are photographed, OCR software grinds through the images. The output contains errors. Some projects proofread the text to correct them (this is very expensive), but at Google's volume that is not feasible. Even without proofreading, however, the OCR'ed text is highly valuable for search engines.
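One reason uncorrected OCR text is still searchable: most words come through intact, and a search engine can tolerate small errors with approximate matching. The toy sketch below is an assumption for illustration, not how Google actually indexes books. It uses Levenshtein edit distance with a tolerance of one character. The names `edit_distance` and `fuzzy_find` and the sample OCR line are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed one dynamic-programming row at a time."""
    prev = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def fuzzy_find(query: str, ocr_text: str, max_errors: int = 1) -> list[str]:
    """Return OCR tokens within max_errors edits of query (case-insensitive).

    Hypothetical helper: a real engine would use an index, not a linear scan.
    """
    q = query.lower()
    hits = []
    for token in ocr_text.split():
        word = token.strip(".,;:!?\"'()").lower()  # drop surrounding punctuation
        if word and edit_distance(q, word) <= max_errors:
            hits.append(token)
    return hits


# Hypothetical OCR output with typical misreadings ("b" for "h", "l" for "i").
page = "Tbe rallroad reached tbe town."
print("railroad" in page)              # exact search misses the misread word
print(fuzzy_find("railroad", page))    # approximate search still finds it
```

Running it prints `False` and then `['rallroad']`. The trade-off is precision: with a one-edit tolerance, very short queries such as "the" will also match unrelated short words, so a practical system would scale the tolerance with word length.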
For examples of the resulting product, see the University of Michigan's Making of America collection or the Library of Congress's American Memory collection.
New, in-print books can be scanned destructively: the binding is sawed off and the loose pages are fed through a sheet-feed scanner. This works for publishers who have spare copies they can sacrifice.