Human-Powered Internet Archive Book Project
Carl Bialik from the WSJ writes "A group led by the Internet Archive is planning a massive, ambitious effort to scan millions of old books and make them available for Web searching early next year. Behind that effort are about a dozen scanners, employees making about $10 an hour to manually scan volumes -- some more than a century old -- one page at a time, on special contraptions. The Wall Street Journal Online visits a University of Toronto library to watch one of the scanners in action: 25-year-old Liz Ridolfo."
Last time I moved, it took many VERY HEAVY boxes to move all my books. Maybe I'll scan them all..
:(
Although anything useful has to be illegal...
0xB315AA8D852DCD3F3DCA578FD2E0BF88
Project Gutenberg frequently makes use of page scans as source material. What PG does is run the images through OCR, then proofread and post-process the resulting text. That's more useful than a stack of page images, but considerably more work.
If you look at the current books on Distributed Proofreaders, you'll see that some of them credit the Million Books Project for the page scans.
Laws do not persuade just because they threaten. --Seneca
The good:
Old books prior to copyright laws are being scanned.
The bad:
Pay is roughly $10/hr. Now, I happen to be concerned that someone being paid so little should be handling rare books. Not to mention the college graduate getting paid so little.
The ugly:
The digital camera contraption costs $30,000!! There are a few scanner manufacturers left in the world and none of them have exploited this niche. Shame on them.
http://www.maxineudall.com/2010/02/should-economists-be-sued-for-malpractice.html
From the Wikipedia article on the Open Content Alliance:
The Open Content Alliance is a consortium of non-profit and for-profit groups which is dedicated to building a free archive of digital text and multimedia. It was conceived in 2005 by Yahoo and the Internet Archive. It was conceived in response to Google Print's closed nature, and aims to keep public domain works in the public domain on-line. These results will then be used in the search results of participating search engines. You can see a sample of the open content at openlibrary.org
A large difference between the OCA's approach and that of Google Print is that the OCA intends to ask a copyright holder before digitising a work that is still under copyright, while Google Print will digitise any book unless explicitly told not to do so by November 1, 2005.
So, Google Print will almost certainly be better when searching for copyrighted material. For public domain works, we'll have to wait and see.
IMHO, it seems like a little cooperation here would make a lot of sense for both parties - they could save money trading digital copies 1-for-1 while remaining in (healthy) competition.
The scans won't be added to Project Gutenberg, but it's very likely that the scans will be used by Project Gutenberg's Distributed Proofreading project, which I'm involved in. We're already 'harvesting' images from quite a few sites, as well as all the images our volunteers scan. Now that there are several large and relatively well funded scanning operations getting off the ground, I imagine that over time an ever-increasing proportion of the works that go through DP will be based on harvested images.
I maintain several lists that show the DP harvesting status of several image collections, including The Internet Archive's Canadian Libraries collection, Google Print, and Early Canadiana Online. As you can see, we will not be running short of material to work on for a very long time, even without any of these recently announced initiatives. That said, it's always great to see more material made freely available, rather than locked up behind expensive subscription services like JSTOR and EEBO.
-- Help Digitise the Public Domain at DP.
> bullshit
I too want to be modded Insightful!
being smart is exausting