Yahoo Competes with Google in Book Scanning


Posted by ScuttleMonkey on Monday October 3, 2005 @09:04AM from the my-literary-collection-is-bigger-than-yours dept.

UltimaGuy writes "A consortium backed by Yahoo has launched an ambitious effort to digitize classic books and technical papers and make them freely available on the Web. The company is partnering with the newly formed Open Content Alliance, which aims to offer PDF documents of books to the public at no charge. Consumers will be able to search the contents of the Open Content Alliance's database and download the entire content of any work, such as a scanned copy of a book."

18 of 193 comments

What a concept. by Anonymous Coward · 2005-10-03 09:09 · Score: 5, Informative

I liked the idea the first time I heard it - back when it was called Project Gutenberg. :P
Project Gutenberg by timeToy · 2005-10-03 09:11 · Score: 5, Informative

16k ebooks to choose from today, more to come, no Google, no Yahoo.
http://www.gutenberg.org/
1. Re:Project Gutenberg by timeToy · 2005-10-03 09:49 · Score: 4, Informative
  
  It depends; some books do carry graphics, for instance the Slashdot-friendly "Amusements in Mathematics" by Henry Ernest Dudeney, 1917.
  http://www.gutenberg.org/etext/16713 The zipped HTML version carries all the original drawings.
But will they digitize PD works from after 1922? by Anonymous Coward · 2005-10-03 09:12 · Score: 5, Informative

In the US, books published after 1922 can still be public domain if the author was American, the book was originally published in the US, and the copyright was not renewed at the end of the original copyright period. Google Library does not seem to be making an exception for this; will OCA? Project Gutenberg does.
Not really an up-stage by ChocoBean · 2005-10-03 09:14 · Score: 4, Informative

Actually, this won't "upstage" Google in any way.

FTA:
all the content will be made available so it can be indexed by all the other major search engines, including Google's

Yahoo is just going to scan, scan and scan. We all already prefer Google's indexing and searching and cleaner interfaces, so the only thing Yahoo! will accomplish by this is to help Google Print along, shielding all (other) copyright lawsuits. Once the stuff is online, we all know that Google-bots will be all over it "like a fly on a pile of very seductive manure (Zapp)"

Excellent.

I just hope publishers realise that in this case neither Google nor Yahoo is trying to be their best friend.
NOT competing by daniil · 2005-10-03 09:23 · Score: 4, Informative

There's a slight difference between an 'Internet-based library' and 'searching inside books'.

--
Man is a slave because freedom is difficult, whereas slavery is easy.
Re:Why PDF? by david+duncan+scott · 2005-10-03 09:25 · Score: 4, Informative

10 years down the road when everything is in PDF format, who's to stop them from charging us to view material in their format?

The fact that it's an open, documented format?
Adobe has made their money the old-fashioned way, by making tools that work well, rather than by locking people into a format. GhostScript, among others, will read those PDFs with or without Adobe.

--
This next song is very sad. Please clap along. -- Robin Zander
Apples and Oranges! This is not Google Print! by merreborn · 2005-10-03 09:26 · Score: 4, Informative

Google Print's goal is to allow people to search book content, WITHOUT giving them the content of the book.

For example, searching "Zoroastrianism" would return a list of book titles on the subject, and links to purchase the books in question. You CANNOT download the content of the book!

The OCA (The group Yahoo just joined) is an opt-in, full content hosting project.

Searching "Zoroastrianism" would return a (much smaller) list of books, with the *full* content of the book available for download with the explicit consent of the publisher/author!
Re:PDF?! yuck by Fiver- · 2005-10-03 10:11 · Score: 4, Informative

"Does anyone else find there is no way to read a PDF with the scroll buttons..."

No. I just set it to Continuous. See those four icons in the lower right corner? (assuming you've got a recent version) Play with those. You want the second button from the left.

"This goes along with the concept that for an electronic format, I do NOT need a sentence (or even worse, hyphenated word) broken up by two inches of top and bottom margin filled with page numbers, miscellaneous watermarks, repetitive titles, etc."

Well, the whole purpose of PDF is to "preserve the look and integrity of your original documents ... regardless of the application and platform used to create it." Blame the creators of that particular pdf file if you don't like the headers, footers and margin size. When I make pdf books to read on the train...I just finished Dream Quest of Unknown Kadath by Lovecraft...I open the original ascii text file in Word, make the top & bottom margins tiny, change the font to something tolerable and export it.
Re:But will they digitize PD works from after 1922 by thisissilly · 2005-10-03 10:40 · Score: 2, Informative

In the US, that is only true of works published after 1978.
"When U.S. Works Pass into the Public Domain" is a good summary of the U.S. issues.
Me, I just want 14+14 back.
Re:"Do no Evil" done right by Jeff+DeMaagd · 2005-10-03 10:44 · Score: 2, Informative

"Since when was scanning books from libraries and making them available to the public for a profit considered 'fair use'?"

It's not. You are mischaracterizing Google's system. The problem with your claim is that Google's system doesn't make the book available to users to download, it is only a search method that points to the relevant books and provides short excerpts like their search engine does. Google won't provide the book or even whole page without the copyright owner's permission. My impression is that Google was just trying to make an improved card catalog.

"we will take sale commissions from amazon, buy.com, bn.com, etc. without sharing anything with you"

The sale of the book meant that the author got their share of the money.

"we will show excerpts of your book, so if a researcher is researching on a topic he can find what you have written about a topic without ever having to buy your book, too bad, heh heh, write a fiction book dude"

The researcher could just go to the local research library, no books purchased. Another problem is that the research would be horribly flawed, given that the descriptions are so short and the allowed excerpts only cover certain fixed pages.
Re:Bookripper on its way? by gasaraki · 2005-10-03 11:39 · Score: 2, Informative

It's already been done. The guy was sent a 'please stop doing this' letter by Google if I recall, which I think he went along with. No formal suit or anything, but they didn't like it. I'll be damned if I can remember the link, I think there was a K5 story or two on it though.
Re:Annoying by Moofie · 2005-10-03 11:39 · Score: 3, Informative

"very few new features come out"

Have you seen Google Earth?

How about the disaster wiki that went together in about 20 minutes, where people were posting status reports of New Orleans properties?

I think you're damning with faint praise. Google, at least, consistently builds superb offerings, and the price is right. Not quite sure what you're grousing about...

--
Why yes, I AM a rocket scientist!
Re:"Do no Evil" done right by _Sprocket_ · 2005-10-03 12:26 · Score: 2, Informative
"Since when was scanning books from libraries and making them available to the public for a profit considered 'fair use'?"

Since when is Google doing this? As others have pointed out, Google provides a portion of the work to give the search context - 3 pages. In another post, you claim that 3 pages is enough information to invalidate the sale of a book. If this is the case, I would have to seriously question the value of your work. Either that - or take a serious look at public libraries, private loaning, Amazon.com, book stores, and other avenues of viewing those precious 3 pages that apparently cost you sales.

It might be worth noting that no case of "fair use" is clear. Court cases often contradict each other, so there are no clear precedents to follow. However, among the common factors potentially in Google's favor are that they:
1. Provide additional insight into the work(s)
2. Provide a service to the public, in many cases providing facts and information
3. Provide a limited subset of the work
4. Are not making offensive use of the work
Factors that may not weigh in Google's favor include:
1. Limited modification of the original work
2. Potential damage to the market for the work - providing that someone such as yourself can prove that 3 pages is damaging.
3. Google's behavior may be interpreted as hostile and offend the Court
Having said that - I'm not a lawyer. But then, even experts are occasionally shocked at the outcomes of these cases.

It might be worth noting that fair use does not require notification or permission of the copyright holder. Nor does it require that the one invoking fair use not make a profit.
"we will take sale commissions from amazon, buy.com, bn.com, etc. without sharing anything with you"

When do authors currently get a cut of sale commissions?
"we will show excerpts of your book, so if a researcher is researching on a topic he can find what you have written about a topic without ever having to buy your book, too bad, heh heh, write a fiction book dude"

Again - this might stand up in court. Possibly. But note that most examples of this having weight tend to involve images and songs - not books. It may be difficult to prove 3 pages as damaging for a work as large as a book - especially if the damaging material is a fact.
"Kudos to yahoo for bringing the open content alliance, gutenberg, and other similar projects to limelight - these are some really nice collections that were hidden by the noise created by 'google print'."

Kudos to Yahoo for coming up with something different to do. But I missed it where the OCA or Yahoo even makes mention of Project Gutenberg. Furthermore, I find it a hard stretch to claim that the "noise created by 'google print'" did anything more to obscure Project Gutenberg than Yahoo's project.
Re:no mention of project gutenberg by Anonymous Coward · 2005-10-03 14:21 · Score: 1, Informative

Umm, isn't this a feature? Formats come and go; plain ASCII seems to have a habit of hanging around (well, for the last >50 years, anyhow). If you want something fancier/easier to read, then conversion really isn't that hard (I've PDF-ed several books for printing in the last few years - using LaTeX - it really is trivial, and the output is excellent quality; binding+printing costs about $2 per book). YMMV
Re:Bookripper on its way? by Dan+East · 2005-10-03 14:29 · Score: 2, Informative

According to Google, there are specific portions of each book that it will never show, making it impossible to harvest an entire book.

I'm already logged in. Why are you telling me the page is unavailable?

As part of our efforts to protect a book's copyright, a set of pages in every in-copyright book will be unavailable to all users.

http://print.google.com/googleprint/help.html#page limit

Dan East

--
Better known as 318230.
Re:Project Gutenberg (Michael Hart essay) by gbnewby · 2005-10-03 17:01 · Score: 2, Informative

Here's something Michael Hart wrote about this today. He's
the founder of Project Gutenberg, and inventor of eBooks.
-- Greg

Yet another consortium of multi-billion dollar institutions
has thrown its hat into the eBook/eLibrary ring today, just
9 months before the 35th Anniversary of Project Gutenberg's
placement on the Internet of the first eLibrary element, on
July 4th, 1971.

Last December 14th Google used a multi-million dollar blitz
of television, radio and print media to announce the Google
Print revolution: "Today is the day the world changes," but
so far it has been difficult to get even a handful of books
from their project, some 10 months later.

I am wondering if the news media will give the same kind of
coverage to a second such announcement, which will also put
up an alliance of an Internet search engine giant with some
multi-billion dollar libraries. I will be watching all the
news programs tonight in eager anticipation, as I was doing
last December, but I fear that "once burned/twice cautious"
might take some of the wind out of their sails/sales.

However, this effort has one huge advantage: "The Internet
Archive," run by my friend Brewster Kahle. Brewster is one
person who has a proven ability to put an enormous resource
on the Internet for the whole wide world to use.

This difference is such that I am willing to bet that Yahoo!
gets off to a better start in the next 10 months than did a
rather completely false start by Google.

Of course, the real test will be to see how long it takes a
project such as this to reach a million eBooks, since there
are well over 100,000 eBooks already available free
for the taking on various Internet sites, perhaps 50,000 of
them from the various Project Gutenberg sites.

Here's a hope that a few years from now anyone can have the
advantage of a million book home library, and in even a few
years more to ten million books sitting on one inch of your
own bookshelf next to your computer.

Michael S. Hart
Founder
Project Gutenberg
Erosion of Public Domain--not just Disney and RIAA by dananderson · 2005-10-03 19:06 · Score: 2, Informative

The physical owner of a PD book (library) can prohibit scanning or even viewing. For modern books, it's not a problem--just go to another library. For some books it is a problem. Few copies exist, and they are scattered around the world.
The library can require a legal agreement to view or scan the book, and that is where a lawsuit can occur. Of course, the legal agreement doesn't apply to 3rd parties that haven't signed. It's another example of the erosion of the public domain--it's not just Disney and the music industry that's doing it folks--it's the University of California and other libraries.