Slashdot Mirror


Google To Resume Scanning Books

SenseOfHumor writes "The Wall Street Journal is reporting that Google will resume scanning copyrighted books from Stanford and Univ of Michigan libraries. Let the battle resume!" From the article: "It isn't known just what percentage of library holdings fall into the category of being in copyright but out of print. About 18% of the books held by the libraries working with Google were printed prior to 1923 and are therefore in the public domain, according to an analysis by the Online Computer Library Center, a Dublin, Ohio, nonprofit library cooperative. An unknown percentage of the rest still are protected by copyright, depending on whether it was renewed. Google's resumption of its scanning of copyrighted works comes amid heated debate in the library community over participation in the program."

17 of 257 comments

  1. Re:How exactly are they doing this? by GoodOmens · · Score: 3, Informative
    Here is one solution:

    http://www.rod-neep.co.uk/books/production/scan/scanning.htm

    Notice that it requires someone to turn the pages. While tedious, it would protect some of the much older books Google will be scanning. If there is an automated solution, I do not know of it ....

  2. Index! by Karma_fucker_sucker · · Score: 3, Informative

    There are actually books that do not have an index. And boy is it a pain in the ass! I can understand why. From what I've heard from authors, indexing a book is the most boring and tedious thing to do.

    --
    Evil people don't think they're evil. - George Lucas, Making of Ep III
  3. Re:Out of print - fair game by dascandy · · Score: 2, Informative

    Nope, but I DO have a full copy of an IT book that has been out of print for 10 years and is only 25 years old. It's one of the few books in IT where age doesn't matter: it doesn't sell well, but it is still very useful. So you can only buy it second-hand at a greatly inflated price (the original cost between 30 and 50 dollars; the cheapest second-hand copy is 150 dollars).

    F*** that, I'll copy.

  4. Re:How exactly are they doing this? by GoodOmens · · Score: 2, Informative
    I'm sorry, here is a better link, to the actual manufacturer of the scanner.

    http://www.imageware.de/

  5. Re:How exactly are they doing this? by Anonymous Coward · · Score: 1, Informative

    No. They do it manually. Automated scanning was found to take more time than manual (weird but true). Besides, there are many old books that need to be handled with care. Hence.

  6. Re:Good. by Kelson · · Score: 2, Informative

    The big advantage of paper is that all you need to view it are your eyes. (Or your fingers, if it's written in Braille.)

    On the other hand, anyone who has visited the US National Archives can see the efforts needed to preserve the original US Constitution and Declaration of Independence. We have zillions of copies, but the originals are kept in sealed vaults with limited lighting. Exposure to air damages the paper. Exposure to light fades the ink. They've struck a careful balance to make sure that people can still see the physical artefact without destroying it.

    Both paper and digital need to be copied to new physical media from time to time. It's a lot easier to make that digital-to-digital copy!

  7. Re:Google[black]mail by 99BottlesOfBeerInMyF · · Score: 3, Informative

    Though I personally believe what Google are doing is not ethically/morally wrong, they are most probably 'breaking' our unjust (injust?) copyright laws.

    My research into the subject suggests the opposite. Although the laws are somewhat vague, Google appears to meet all four criteria for fair use, and every single district has filed briefs supporting a case with significant precedent, except the district in which the case against Google has been filed. I suspect this is because the lawyers involved know they are unlikely to prevail in the end, but are hoping to win the initial case and force the issue to the Supreme Court, possibly with an injunction in place. They hope to delay and possibly temporarily stop Google's actions while they try to get laws pushed through the courts to make what Google is doing illegal.

    The only reason they are 'getting away' with it is because they are the most powerful domain on the net. No-one dares mess with Google.

    I think you are overstating Google's influence by a lot. First, the people suing Google don't care if they are findable by Google, as they are not a consumer-facing body. Second, they are a bunch of middle men; what do they care about publicity? Will you stop buying books from those publishers and hurt the authors (who mostly support Google's actions)? No, I think you have this backwards. Google is legally going to prevail, and these publishers are just delaying while trying to pass some laws to avoid the future possibility of being cut out of the deal. They fear for their position as middle men and are fighting hard to stop anything that might be progress.

  8. Re:How exactly are they doing this? by Anonymous Coward · · Score: 1, Informative

    Kirtas Technologies http://kirtas-tech.com/ makes a bound book scanner that turns the pages, at a rate of 1200 pages per hour, and claims to be gentler than a human. I've seen it in action -- it's a pretty neat mechanism.

  9. Re:Out of print - fair game by neoform · · Score: 2, Informative

    Ever heard of the "Disney Vault" as they call it?

    They remove their movies from the market for 10 years in order to create artificial demand.

    --
    MABASPLOOM!
  10. Re:Good. by oliverthered · · Score: 2, Informative

    You know nothing about paper do you?

    Most modern books will be printed on acidic wood pulp paper (as opposed to acid-free hemp, cloth (cotton-based paper) or more expensive acid-free wood pulp). Over a period of time the acid will erode the paper until, between 20 and 60 years later, all you're left with is crumbs.

    Modern print is crippled by paper tech that means you'll never have a copy survive until it's out of copyright, just like modern digital is crippled by DRM and can never be released into the public domain by someone who owns a copy.

    --
    thank God the internet isn't a human right.
  11. Re:How exactly are they doing this? by Anonymous Coward · · Score: 1, Informative

    With newer books they cut the spine and feed the entire thing into a scanner. With older books they have human drones turning pages. A foot pedal triggers the scan. Multiple scans are made from each book, and software mixes-and-matches the best scans from each one. Every now and then they audit the output of OCR manually to compare.

    Their software also poses yes/no questions to humans: Is this a thumb blocking some text? Is this crooked? Is this a 'D' or a 'B'? Apparently it generates thousands of these questions per hour or some such silly number.

    Also, as far as I know they never stopped scanning books in the first place. That was some kind of PR smokescreen.
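
The mix-and-match step described in this comment could be sketched roughly as below. This is only an illustration under stated assumptions, not Google's actual software: the `Scan` record, the numeric `quality` score, and the 0.8 review threshold are all hypothetical, since the comment does not say how the "best" scan is chosen or when a question is sent to a human.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scan:
    page: int        # page number within the book
    pass_id: int     # which scanning pass produced this image
    quality: float   # hypothetical score in [0, 1]; higher is better


def best_scans(scans):
    """Return {page: Scan}, keeping the highest-quality scan for each page."""
    best = {}
    for scan in scans:
        current = best.get(scan.page)
        if current is None or scan.quality > current.quality:
            best[scan.page] = scan
    return best


def needs_human_review(scan, threshold=0.8):
    """Flag a chosen scan for a yes/no human check (threshold is assumed)."""
    return scan.quality < threshold


# Two passes over a two-page book, with made-up quality scores.
scans = [
    Scan(page=1, pass_id=0, quality=0.91),
    Scan(page=1, pass_id=1, quality=0.72),
    Scan(page=2, pass_id=0, quality=0.40),
    Scan(page=2, pass_id=1, quality=0.65),
]
chosen = best_scans(scans)
review_queue = [s for s in chosen.values() if needs_human_review(s)]
```

Here page 1 is taken from the first pass and page 2 from the second, and page 2 still lands in the review queue because even its best scan falls below the assumed threshold.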

  12. "Scanning" is done with a camera and cradle by dananderson · · Score: 4, Informative
    Scanning of old books isn't done with a consumer-grade (or even business-grade) flatbed scanner. That's too expensive and too damaging to old books.

    "Scanning" of old books is typically done with a camera photographing a book lying in a cradle (to not split the binding). One image is taken of each page or every two pages (the latter is faster, but has focus problems).

    Once photographed, OCR software grinds away. There are errors. Some projects proof-read the errors (this is very expensive), but with Google's volume they cannot. Even when not proof-read, however, the OCR'ed text has high value in search engines.

    For examples of the resulting product, see U of Michigan's Making of America or the Library of Congress American Memory.

    New, in-print books can be scanned destructively. That is, saw off the binding and feed the pages into a sheet-fed scanner. This works with publishers who have extra copies they can expend.

  13. Corrected, live link by dananderson · · Score: 2, Informative
  14. Re:Out of print - fair game by Simonetta · · Score: 2, Informative

    I support this 100%. By bribing politicians to extend the copyright period, the global media corporations have stolen the public domain. For that reason, they have forfeited any claim to any intellectual property that they believe they 'own'.

    Suppose that you have bought a car and are making regular monthly payments on it. After five years you have one more payment to make and the ownership of the car is completely yours. The finance company bribes a politician, who puts a rider into a huge bill that can't be prevented from passing into law, and suddenly all the monthly payment schedules for cars are doubled. You won't own the car for another five years and you must continue to make monthly payments on it in order to prevent it from being repossessed. It's great for them, bad for you.

    The copyright laws work the same way. For a set period, everyone must pay to consume the copyrighted material. After that period is over, the consumption of the material is free. By extending the period that the public must pay in order to consume this material, the public has had their free access stolen from them.
    This theft vastly exceeds any financial loss from people ignoring copyrights. And the copyright period has been extended every time that it has been set to expire, which is a consistent pattern of criminal behavior. The global media companies, under the RICO act of the USA, are criminal organizations and have no legal right to demand people be placed in prison or be fined for accessing what they claim is copyrighted material.

    The USA courts think nothing about stealing a person's house or car if they are found with a few pennies' worth of cannabis on their person. This is 'asset forfeiture' and is a sweet little legalized theft business for the police and their associates.

    When we make copies of materials that are supposedly 'owned' by the global media corporations, we are making a citizen's 'asset forfeiture' of the property acquired by these criminal organizations through illegal means, through the theft of the public domain.

    There is no such thing as 'theft' of intellectual property. There is only the use of violence to either remove or protect the world's culture from the use by the people of the world. Everything else is just a legal smokescreen.

    Nobody has any intellectual property rights to anything anymore in the information age. If you understand this basic concept, you will be able to make intelligent and balanced decisions concerning your own use of cultural resources and 'intellectual property'.

  15. Re:How exactly are they doing this? by Fordiman · · Score: 2, Informative

    Check this out.
    http://kirtas-tech.com/

    --
    110100 1101000 1101000 1100110 0 1101111 1101000 1100011 1
  16. Re:How exactly are they doing this? by Anonymous Coward · · Score: 1, Informative

    Many times these books are sent to India where they are scanned
    and then the electronic versions and the physical versions
    are shipped back. It costs just $4 to scan a book in India.
    see http://in.rediff.com/money/2004/aug/16spec1.htm

  17. Re:People always forget by codyk · · Score: 2, Informative

    At the time, the common usage of those words referred to something else - "useful arts" was essentially what we would consider science and engineering today, while "science" essentially meant human knowledge in any field, including music or visual art.