Slashdot Mirror


Human-Powered Internet Archive Book Project

Carl Bialik from the WSJ writes "A group led by the Internet Archive is planning a massive, ambitious effort to scan millions of old books and make them available for Web searching early next year. Behind that effort are about a dozen scanners, employees making about $10 an hour to manually scan volumes -- some more than a century old -- one page at a time, on special contraptions. The Wall Street Journal Online visits a University of Toronto library to watch one of the scanners in action: 25-year-old Liz Ridolfo."

4 of 113 comments

  1. Re:Diffrent? by way2trivial · · Score: 3, Insightful

    Stories over 75 years old don't have the same copyright protections..

    anyone can do 'a christmas carol' because its copyright has expired..

    using someone's PRECISE arrangement of the text is not permissible, however; that has its own copyright...
    so if I buy a current-day copy from amazon, I can't scan it in... but if I buy a copy whose last edition/printing was more than 75 years ago, it is fair game.

    --
    every day http://en.wikipedia.org/wiki/Special:Random
  2. Good Bad Ugly by mpapet · · Score: 4, Insightful

    The good:
    Old books prior to copyright laws are being scanned.

    The bad:
    Pay is roughly $10/hr. I'm concerned that someone being paid so little is handling rare books. Not to mention that a college graduate is getting paid so little.

    The ugly:
    The digital camera contraption costs $30,000!! There are a few scanner manufacturers left in the world and none of them has exploited this niche. Shame on them.

    --
    http://www.maxineudall.com/2010/02/should-economists-be-sued-for-malpractice.html
  3. Re:It's lighter! by Hosiah · · Score: 3, Insightful

    Ahem: years ago, I made up the "moving time" rule that books *must* be packed in the smallest available boxes. Anything of dimensions around 2x1x1 feet. After straining on the book boxes previously, it occurred to me that it's human nature to (a) pack books first, reasoning that you're not going to be doing much reading in the next couple days anyway... and (b) upon first beginning to pack, grab the biggest box to start with.

  4. Re:Why not join the Gutenberg Project by flimnap · · Score: 3, Insightful
    So, when the summary says they will "make them available for Web searching", does that not mean there will be a complete text index available (that is, full-text search), but only that you can search for specific works?

    That probably means that the search index will be uncorrected OCR, which leads to some inaccurate searches. The problem with using raw OCR is scannos: words that get recognised as a different word that "looks" the same, for example modem and modern, or single characters that get misread, such as an i recognised as a slash.
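
    To make the scanno problem concrete, here is a rough sketch in Python. The confusion pairs, the function names and the tiny vocabulary are all made up for illustration; this is not how the Internet Archive or Distributed Proofreaders actually check text. It only shows why a spell checker alone won't catch modem/modern: both are real words.

```python
# Hypothetical confusion pairs (misread glyphs -> intended glyphs).
# These are illustrative assumptions, not a real OCR engine's error table.
CONFUSIONS = [("rn", "m"), ("m", "rn"), ("/", "i"), ("l", "1"), ("1", "l")]


def variants(word):
    """Return every string made by applying one confusion at one position."""
    out = set()
    for wrong, right in CONFUSIONS:
        start = word.find(wrong)
        while start != -1:
            out.add(word[:start] + right + word[start + len(wrong):])
            start = word.find(wrong, start + 1)
    return out


def possible_scannos(word, vocabulary):
    """Vocabulary words that are one confusion away from `word`."""
    return sorted(v for v in variants(word) if v in vocabulary and v != word)


if __name__ == "__main__":
    vocab = {"modem", "modern", "trivial"}  # toy vocabulary, assumed
    for token in ("modem", "modern", "tr/vial"):
        print(token, "->", possible_scannos(token, vocab))
```

    Anything this flags still needs a human to look at the page image and decide which word was actually printed, which is the job proofreaders do.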

    I do that every once in a while on their German counterpart: GaGa

    Your time might be better spent at the real Distributed Proofreaders, or DP-Europe, since Projekt Gutenberg-DE is not an official branch of PG, and actually copyrights its output (unlike the real PG).