Slashdot Mirror


Yahoo Competes with Google in Book Scanning

UltimaGuy writes "A consortium backed by Yahoo has launched an ambitious effort to digitize classic books and technical papers and make them freely available on the Web. The company is partnering with the newly formed Open Content Alliance, which aims to offer PDF documents of books to the public at no charge. Consumers will be able to search the contents of the Open Content Alliance's database and download the entire content of any work, such as a scanned copy of a book."

5 of 193 comments

  1. What do these guys know... by dada21 · · Score: 5, Interesting

    ...that we don't?

    It seems to me that they're throwing money at an unnecessary application. Does Yahoo know something that we don't? I'd venture that they're starting with PD books to shake the bugs out of their platform so the app works well in round 2.

    Round 2 (current commercial books) won't occur without a massive copyright law change or the support of the Authors Guild.

    Hmm.

  2. Whew! by op12 · · Score: 4, Interesting

    I almost panicked after seeing we had gone so long without a Google-related article.

    The opt-in rather than opt-out strategy is really what Google probably should have done, but it'll be interesting to see who comes out as a winner, Yahoo or Google, in all of this.

  3. Re:Project Gutenberg by harmonica · · Score: 4, Interesting

    More books are a good thing. A scanned PDF version includes graphics as well, which are missing from Gutenberg ebooks. So I see this as a very positive development.

  4. University of Calif: Yahoo OK, Gutenberg banned by dananderson · · Score: 5, Interesting

    I find it funny (in an ironic way only) that the University of California is allowing its public domain books to be scanned by Yahoo. At the same time, UC libraries prohibit scanning for Project Gutenberg or other true "open" content projects unless they receive $$$$ in royalties.

    I hate to see a university shut out open projects while, at the same time, welcoming commercial interests such as Yahoo. Money talks, and I'm sure UC is being paid a lot, but libraries are supposed to be public resources too, not exclusive profit-centers :-(.

  5. Bookripper on its way? by serutan · · Score: 4, Interesting

    Google maintains its scanning represents "fair use" allowed under the law because it only allows Web surfers to view excerpts from copyrighted books.


    Soon after Google Mail was introduced, somebody created a SourceForge project that lets you use Google Mail as a database. How long until somebody releases a "Bookripper" app that assembles a whole book from search extracts? As I understand it Google displays two pages at a time (or wait, that's Amazon, but I bet they're similar). All you would need to know is a quote from a book's first page as a seed, and you should be able to grab the whole book by doing a series of searches using text from the second page returned by each search. The trick would be to knit the pieces together and eliminate the overlapping text. Seems almost trivial. Another possibility would be to search for random words and look for overlaps between the results, assembling them like a linear jigsaw puzzle until there are no gaps.
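
    The "knit the pieces together" step might look roughly like the sketch below. It only works on excerpts already held in memory, in reading order; fetching them from any search service is left out entirely. The names overlap_length and stitch, and the min_overlap threshold, are made up for illustration.

```python
def overlap_length(left: str, right: str, min_overlap: int = 10) -> int:
    """Return the length of the longest suffix of `left` that is also a
    prefix of `right`, or 0 if it is shorter than `min_overlap`."""
    longest = min(len(left), len(right))
    # Try the longest candidate first so we remove as much duplicate text as possible.
    for size in range(longest, min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def stitch(excerpts: list[str], min_overlap: int = 10) -> str:
    """Join excerpts that arrive in reading order, dropping the overlapping text.

    If two neighbours share no overlap of at least `min_overlap` characters,
    the second one is appended whole (which may leave a gap in the result).
    """
    if not excerpts:
        return ""
    text = excerpts[0]
    for excerpt in excerpts[1:]:
        size = overlap_length(text, excerpt, min_overlap)
        text += excerpt[size:]
    return text


if __name__ == "__main__":
    pieces = [
        "It was the best of times, it was the worst",
        "it was the worst of times, it was the age",
        "it was the age of wisdom",
    ]
    print(stitch(pieces))
```

    The min_overlap threshold is there so that a short accidental match (a shared "the", say) does not glue unrelated excerpts together. The random-word "jigsaw" variant would need the same overlap test, applied to every pair of excerpts instead of just neighbours.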