Google To Resume Scanning Books

Posted by Zonk on Tuesday November 1, 2005 @08:09AM from the love-of-the-digital-printed-word dept.

SenseOfHumor writes "The Wall Street Journal is reporting that Google will resume scanning copyrighted books from Stanford and Univ of Michigan libraries. Let the battle resume!" From the article: "It isn't known just what percentage of library holdings fall into the category of being in copyright but out of print. About 18% of the books held by the libraries working with Google were printed prior to 1923 and are therefore in the public domain, according to an analysis by the Online Computer Library Center, a Dublin, Ohio, nonprofit library cooperative. An unknown percentage of the rest still are protected by copyright, depending on whether it was renewed. Google's resumption of its scanning of copyrighted works comes amid heated debate in the library community over participation in the program."

Re:How exactly are they doing this? by GoodOmens · 2005-11-01 08:16 · Score: 3, Informative

Here is one solution:
http://www.rod-neep.co.uk/books/production/scan/scanning.htm
Notice that it requires someone to turn the pages. While tedious, that would protect some of the much older books Google will be scanning. If there is an automated solution, I do not know of one ....

Index! by Karma_fucker_sucker · 2005-11-01 08:19 · Score: 3, Informative

There are actually books that do not have an index. And boy is it a pain in the ass! I can understand why. From what I've heard from authors, indexing a book is the most boring and tedious thing to do.

--
Evil people don't think they're evil. - George Lucas, Making of Ep III

Re:Google[black]mail by 99BottlesOfBeerInMyF · 2005-11-01 08:51 · Score: 3, Informative

Though I personally believe what Google are doing is not ethically/morally wrong, they are most probably 'breaking' our unjust (injust?) copyright laws.

My research into the subject suggests the opposite. Although the laws are somewhat vague, Google appears to meet all four criteria for fair use, and every single district has filed briefs supporting a case with significant precedent, except the district in which the case against Google has been filed. I suspect this is because the lawyers involved know they are unlikely to prevail in the end, but are hoping to win the initial case and force the issue to the Supreme Court, possibly with an injunction in place. They hope to delay and possibly temporarily stop Google's actions while they try to get laws pushed through the courts to make what Google is doing illegal.

The only reason they are 'getting away' with it is because they are the most powerful domain on the net. No-one dares mess with Google.

I think you are overstating Google's influence by a lot. First, the people suing Google don't care if they are findable by Google, as they are not a consumer-facing body. Second, they are a bunch of middlemen; what do they care about publicity? Will you stop buying books from those publishers and hurt the authors (who mostly support Google's actions)? No, I think you have this backwards. Google is legally going to prevail, and these publishers are just delaying while trying to pass some laws to avoid the future possibility of being cut out of the deal. They fear for their position as middlemen and are fighting hard to stop anything that might be progress.

"Scanning" is done with a camera and cradle by dananderson · 2005-11-01 09:26 · Score: 4, Informative

Scanning of old books isn't done with a consumer-grade (or even business-grade) flatbed scanner. That's too expensive and too damaging to old books.
"Scanning" of old books is typically done with a camera photographing a book lying in a cradle (so as not to split the binding). One image is taken of each page or of every two pages (the latter is faster, but has focus problems).
Once photographed, OCR software grinds away. There are errors. Some projects proof-read the errors (this is very expensive), but with Google's volume they cannot. Even when not proof-read, however, the OCR'ed text has high value in search engines.
For examples of the resulting product, see U of Michigan's Making of America or the Library of Congress American Memory.
New, in-print books can be scanned destructively. That is, saw off the binding and feed the loose pages into a sheet-feed scanner. This works with publishers who have extra copies they can expend.