Yahoo Competes with Google in Book Scanning

Posted by ScuttleMonkey on Monday October 3, 2005 @09:04AM from the my-literary-collection-is-bigger-than-yours dept.

UltimaGuy writes "A consortium backed by Yahoo has launched an ambitious effort to digitize classic books and technical papers and make them freely available on the Web. The company is partnering with the newly formed Open Content Alliance, which aims to offer PDF documents of books to the public at no charge. Consumers will be able to search the contents of the Open Content Alliance's database and download the entire content of any work, such as a scanned copy of a book."

11 of 193 comments

no mention of project gutenberg by justforaday · 2005-10-03 09:08 · Score: 3, Insightful

I find it interesting that, of all the articles I've looked at today about this, only one has mentioned Project Gutenberg. Naturally, I can't recall which source it was...

--
I'll turn into a supernova and burn up everything. Well I'll turn into a black little hole and you'll turn into string.
The difference between Google and Yahoo's effort by doctor_no · 2005-10-03 09:17 · Score: 4, Insightful

Seems like the crucial difference between Google's efforts and the OCA (Open Content Alliance) is that Google has an "opt-out" policy for copyrighted material, while the OCA specifically requires the copyright holder to contact them and essentially allow them to use the material.

The OCA likely won't be sued by the Authors Guild the way Google was. For searching material, however, Google will likely be better, since Google's search will likely include a massive plethora of copyrighted material, legal or not. Also, it seems that Google itself will be allowed to use all the material from the OCA in its project as well.
Companies should Get Original by TarrySingh · 2005-10-03 09:18 · Score: 2, Insightful

Why can't companies come up with some cooler ideas? Why ape each other? First Google and then Yahoo. Surely MS will also want to play.

--
Scott McNealy to Michael: "Suck my Sun!" Michael Dell to Scott : "Lick my Dell!"
Re:RIAA Problems Solved by Anonymous Coward · 2005-10-03 09:30 · Score: 1, Insightful

I hate the RIAA's methods as much as anyone else, but I still have to disagree that it has to be free for all. The artists somehow have to be paid too, just as you have to be paid for the work you do. This is why iTunes is such a hit. They found a price level that creates an income without ripping us all off (at least not as much as the RIAA wants to).
Re:Annoying by ScentCone · 2005-10-03 09:44 · Score: 2, Insightful

I am getting tired of the big internet companies straight up copying each other.

Should we turn to you to tell us which provider of each major online activity is the one we should all use? Even if the differences are incremental and subtle, I'm glad when I get to choose between Yahoo's and Google's take on a particular app/service. I'm also glad that Audi and Toyota and GM and Honda all have different ideas on cars... even though someone else built one once already. Come on - not every service offered is going to be wholly unique, and shouldn't be. It's competition - for eyeballs, brand loyalty, etc. Same reason there are a zillion Linux distros, even though many overlap. Everyone's got their own idea of what would make it just a little bit better.

--
Don't disappoint your bird dog. Go to the range.
PDF?! yuck by BillHop · 2005-10-03 09:48 · Score: 2, Insightful

Does anyone else find there is no way to read a PDF with the scroll buttons (mouse wheel, etc.) without the viewer constantly breaking your flow by jumping to the next page?

This goes along with the concept that for an electronic format, I do NOT need a sentence (or even worse, hyphenated word) broken up by two inches of top and bottom margin filled with page numbers, miscellaneous watermarks, repetitive titles, etc.

PS. This being flamebait does not make it false.
"Do no Evil" done right by Chunni Babu · 2005-10-03 10:03 · Score: 5, Insightful

Now this is a right step towards making book contents searchable online. I would hate to see one company like Google copying and caching all books in its massive cluster of servers. I know that the Google kool-aid that "we are about general good" is running deeply in the veins of slashdot types.

Since when was scanning books from libraries and making them available to the public for a profit considered "fair use"? This kind of stuff is done by pirates. Go to the major cities in China and India and you will see piles of copied books in the streets, all sold for 1/10th the original price without giving anything back to the authors. The pirates can say that they are doing a favor to the authors by bringing them out of obscurity.

The message the alliance is sending out to the authors is
- we are not for profit
- we will scan your book only if you want us to do so
- your book will be indexed based on your approval and copyright agreement with you and the publishers
Compare this to what Google is telling the authors
- we will scan your book, fill a form and tell us if you don't want us to do so
- we will take sales commissions from amazon, buy.com, bn.com, etc. without sharing anything with you
- if we show ads, we will share the profits with you
- we will show excerpts of your book, so if a researcher is researching on a topic he can find what you have written about a topic without ever having to buy your book, too bad, heh heh, write a fiction book dude
- we will cache your book in our servers and only we will reserve the right to profit from your scanned book
So much for do no evil. Kudos to Yahoo for bringing the Open Content Alliance, Gutenberg, and other similar projects into the limelight - these are some really nice collections that were hidden by the noise created by 'google print'.
1. Re:"Do no Evil" done right by nursegirl · 2005-10-03 10:40 · Score: 2, Insightful
  
  Compare this to what Google is telling the authors
  * we will show excerpts of your book, so if a researcher is researching on a topic he can find what you have written about a topic without ever having to buy your book, too bad, heh heh, write a fiction book dude
  
  Except that Google only shows 2-3 sentences of books that are under copyright. I've never found a researcher that can write on a topic by only reading 2 sentences. It's only posters on /. that can claim expertise on a topic without actually learning anything about it.
2. Re:"Do no Evil" done right by Anonymous Coward · 2005-10-03 11:42 · Score: 3, Insightful
  
  Since when was scanning books from libraries and making them available to the public for a profit considered "fair use"?
  
  How disingenuous. Google Print shows only a snippet of the text and tells you how to buy the book if it seems like what you need. Not pages, not paragraphs - a couple of sentences. In fact, Google Print instantly returns pretty much what you'd get if you hired a researcher to go find X number of books with such and such text and the researcher prepared a paper with a short quote from each. Such a paper would be unquestionably fair use and could be published anywhere. Google Print merely automates that process and makes it instant. I have no special fetish for Google; anybody who builds a system like this is doing us all a favor: it's a 21st century version of a card catalog, and a huge win for readers and authors. It's only being fought because, in our sue-happy culture, fair use rights have been eroded so much and copyright protections have been expanded so far that people seem to believe that even the most trivial use of their work - in a futuristic card catalog, for example - should bring a pay day. It's another case of cutting off your nose to spite your face.
  
  we will scan your book, fill a form and tell us if you don't want us to do so
  
  Which is, of course, exactly the model that Google and every single other search engine on the web has used since day one: Yahoo, AltaVista, everybody. It's the only sane way to make the web indexable. If it's not copyright violation on the web, then it's not copyright violation in print. Bringing that sort of searchable index to the history of printed material is a huge win for everybody, including authors. If courts eventually rule that this is copyright violation, then let's all say goodbye to the usefulness of Google and every single other decent search engine in the history of the web. Which would be a damn shame, but not surprising considering how twisted and lopsided against the public the bargain of copyright has become.
This is huge. IA beat Google and Yahoo to this... by Anonymous Coward · 2005-10-03 10:04 · Score: 4, Insightful

I've read through the first few posts, and people really don't have a clue about what this is all about. "Open Content Alliance"... It means what it says. Open f'ing content. Let there be content available to the masses... Is it more important that I can get a snippet from some copyrighted text, or that millions of children can read Alice in Wonderland with all its wonderful illustrations?

This is beyond PDF or anything like that. Some people want PDF, so Adobe will make them. Some people want decent OCR versions, perhaps to go into Distributed Proofreaders or into someone's text-only PDA. It's ALL possible. This is NOT an exclusive club, it's an INCLUSIVE community that is dedicated to Open f'ing Content.

Why don't you people get it? By allowing people to have full texts of some of humanity's greatest works we are doing more than a few snippets of the latest Ken Follett novel... a lot more.

It's bigger than Yahoo or Google. Yahoo is NOT an also-ran.... The Internet Archive has been scanning books and hosting Million Book Project texts as well as Project Gutenberg texts for a long time... long before Yahoo or even Google were in the picture. Ignorant comments made here suggest somehow Yahoo is following.

I say Yahoo is leading by embracing a project that by definition is bigger than themselves. Good for them.
Re:its to see... by twiddlingbits · 2005-10-03 10:05 · Score: 3, Insightful

PDFs of "public domain" or donated works will always be available. Amazon has gotten enough sh*t about the excerpts that they publish to entice the reader to buy the book. Google "e-book" and you'll see Yahoo! is nowhere near the only source. There is even an open-source e-book idea at Open eBook - http://www.openebook.org/ -- Information on the publication specification for electronic books that will allow compatibility between different e-book devices.

I just wonder how Yahoo! will make $$$ off this very small market of public domain works, or, if they DO get repro rights to other books, what the price model to download them will be. Or will you just see advertisements in your e-books? The authors are not going to give up their $$$, nor is Yahoo, so somebody is going to have to pay for this content.