Slashdot Mirror


Yahoo Competes with Google in Book Scanning

UltimaGuy writes "A consortium backed by Yahoo has launched an ambitious effort to digitize classic books and technical papers and make them freely available on the Web. The company is partnering with the newly formed Open Content Alliance, which aims to offer PDF documents of books to the public at no charge. Consumers will be able to search the contents of the Open Content Alliance's database and download the entire content of any work, such as a scanned copy of a book."

193 comments

  1. RIAA Problems Solved by GreggyBUIUC · · Score: 5, Funny

    Someone start up an "Open Content Alliance" for music... then we can digitize and share it all we want.

    1. Re:RIAA Problems Solved by Anonymous Coward · · Score: 1, Insightful

      As much as I hate the RIAA's methods, I still have to disagree that music has to be free for all. The artists have to be paid somehow too, just as you have to be paid for the work you do. This is why iTunes is such a hit: they found a price level that creates an income without ripping us all off (at least not as much as the RIAA wants).

    2. Re:RIAA Problems Solved by Freexe · · Score: 0

      Woooosh!!

      I think you missed the joke.

      If Google and Yahoo are providing books for free, where do the authors get their income from?

      --
      "In a time of universal deceit - telling the truth is a revolutionary act." - George Orwell
    3. Re:RIAA Problems Solved by Anonymous Coward · · Score: 0

      I don't think that groups that sell out thousands of $40-500 seats and new albums/T-shirts/merchandise selling at the concert are going to have any problems staying millionaires if they aren't selling any music outside of the concert.

    4. Re:RIAA Problems Solved by rf0 · · Score: 1

      Just do a load of covers which sound very very like them and you can legally play them

      Rus

    5. Re:RIAA Problems Solved by no+reason+to+be+here · · Score: 1

      Ummm...not without ASCAP being paid off first. You think the RIAA is bad, just wait 'til you run into the thuggish tactics of ASCAP.

    6. Re:RIAA Problems Solved by carlos_benj · · Score: 1

      Google's not providing books for free. They're making them searchable. It would be more work to compile a book from Google searches than it would be worth (if it's even possible).

      The Yahoo thing though sounds different (haven't read the article itself yet).....

      --

      As a matter of fact, I am a lawyer. But I play an actor on TV.

    7. Re:RIAA Problems Solved by Anonymous Coward · · Score: 0

      Where do they get their income when someone goes to the library?

    8. Re:RIAA Problems Solved by sik0fewl · · Score: 1

      In PDF format, no less!

      --
      I remember when legal used to mean lawful, now it means some kind of loophole. - Leo Kessler
    9. Re:RIAA Problems Solved by blibbler · · Score: 1

      Isn't that what the original MP3.com was? Independent artists essentially letting everyone download their music for free.

    10. Re:RIAA Problems Solved by Achromatic1978 · · Score: 1
      Gah.

      From the fact that libraries pay a significantly higher amount of money per copy than you or I do at retail, and that this price factors in expected borrowings. They also then pay an ongoing royalty based on actual borrowings, for precisely this reason.

    11. Re:RIAA Problems Solved by Anonymous Coward · · Score: 0

      "I don't think that groups that sell out thousands of $40-500 seats and new albums/T-shirts/merchandise selling at the concert are going to have any problems staying millionaires if they aren't selling any music outside of the concert."

      Then you don't have any idea how the music business "works".

    12. Re:RIAA Problems Solved by marco13185 · · Score: 1

      Actually, Yahoo cannot, by law, provide a complete download of a book unless:

      A)The copyright has expired
      B)The author has given explicit written permission

      The downloadable books will probably all have expired copyrights, as I don't see many authors giving away their books for free. You won't find a complete download of all your favorite tech books, sorry to burst your bubble.

    13. Re:RIAA Problems Solved by mdecarle · · Score: 1

      I worked at a college library. We bought all our books at a local (big) bookstore, and magazine subscriptions at the same rate you get them. (Although, we did specialize in a field that has more expensive books and magazines)

  2. Will Yahoo scan it like they have yahoo.com? by Anonymous Coward · · Score: 5, Funny

    I can't wait to read the whole book on one page.

    1. Re:Will Yahoo scan it like they have yahoo.com? by Anonymous Coward · · Score: 0

      That made me 'lol'. Nice one :)

    2. Re:Will Yahoo scan it like they have yahoo.com? by m50d · · Score: 1

      You jest, but I'd find that much better than pdfs. There's a perfectly good format for reading things on a screen how you want to, it's called html. I want to have longer lines on my huge monitor, be able to apply my own stylesheets to the document, etc. PDF is for printing.

      --
      I am trolling
  3. Who cares! Yahoo is a dying engine by Anonymous Coward · · Score: 0, Interesting

    Yahoo services are often slow, riddled with annoying ads and cluttered.

  4. Google/Yahoo by Anonymous Coward · · Score: 0

    Oh yay. Competition. This might stop Google from becoming a monopoly, and make people less concerned about that.

    Oh damnit. The human-confirm thing was fucked, so I missed first post >_<

    1. Re:Google/Yahoo by crashelite · · Score: 1, Interesting

      google isnt evil so they cant become a monopoly like M$...

      --
      (yes i know i suck at spelling fell free to correct my grammar and/or spellin i dont care, im still not going to change
  5. no mention of project gutenberg by justforaday · · Score: 3, Insightful

    I find it interesting that in all the articles I've looked at today about this that only one has mentioned Project Gutenberg. Naturally, I can't recall which source it was...

    --
    I'll turn into a supernova and burn up everything. Well I'll turn into a black little hole and you'll turn into string.
    1. Re:no mention of project gutenberg by skelly33 · · Score: 1

      My only gripe with Gutenberg is the lack of foresight to store the texts in a translatable format such as XML - if users want a plain-text version, a quick XSL transform is all it would take to deliver it. PDF? No problem. Perhaps one of these successors is willing to put a bit more effort into it - quality, not quantity, right?
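A rough sketch of the idea above, assuming a made-up XML layout for a book (the `book`, `title`, `chapter`, `p` and `em` element names are hypothetical and not anything Project Gutenberg actually uses). It also swaps the XSL transform for Python's standard-library ElementTree, since the standard library ships no XSLT processor; the point is only that a structured source flattens to plain text in a few lines.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the element names are illustrative only.
SAMPLE = """<book>
  <title>Moby Dick</title>
  <chapter>
    <p>Call me <em>Ishmael</em>.</p>
  </chapter>
</book>"""


def xml_to_plain_text(xml_source: str) -> str:
    """Flatten block-level elements to plain-text paragraphs."""
    root = ET.fromstring(xml_source)
    paragraphs = []
    for element in root.iter():
        if element.tag in ("title", "p"):
            # itertext() also collects text inside nested inline
            # elements such as <em>, so the emphasis markup is dropped.
            paragraphs.append("".join(element.itertext()).strip())
    return "\n\n".join(paragraphs)


if __name__ == "__main__":
    print(xml_to_plain_text(SAMPLE))
```

Going the other way, from plain text back to a structured format, has no equally short equivalent, which is the asymmetry the gripe is about.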

    2. Re:no mention of project gutenberg by Anonymous Coward · · Score: 1, Informative

      Umm, isn't this a feature? Formats come and go; plain ASCII seems to have a habit of hanging around (well, for the last >50 years, anyhow). If you want something fancier/easier to read, then conversion really isn't that hard (I've PDF-ed several books for printing in the last few years using LaTeX; it really is trivial, the output is excellent quality, and binding+printing costs about $2 per book). YMMV.

  6. Dupe by JordanL · · Score: 0, Troll

    The editors should talk to each other more. I mean, I don't mind seeing two different takes on the same story, but I'd be pissed if I had bought the rights to see a story early.... only to find out it was a dupe.

    1. Re:Dupe by Nuttles1 · · Score: 0

      "but I'd be pissed if I had bought the rights to see a story early.... only to find out it was a dupe."

      Oh man, /. is worth some kind of subscription fee regardless of whether you get to see a story early or whether it is a dupe. Besides, you make it sound like one has to mortgage his/her house to get a subscription. IT IS 5 DOLLARS FOR 1000 PAGES. Morning coffee for half the people in the office is more than that!

    2. Re:Dupe by JordanL · · Score: 1

      Wow... I never checked out the pricing... I retract that. Quite the bargain actually.

    3. Re:Dupe by Nuttles1 · · Score: 4, Funny

      You must not be a true /.er because you know that if you were you would read up on every bit of documentation about anything that we do....Like how we always RTFA...errr....wait, scratch that

    4. Re:Dupe by JordanL · · Score: 1

      lol...

      If I hadn't posted in this article I'd mod you up on that post...

    5. Re:Dupe by Nuttles1 · · Score: 1

      You're an oldie, a SIX DIGIT!, don't you have that perk?

    6. Re:Dupe by JordanL · · Score: 1

      I've got five points ATM, but unfortunately I responded to this article. :( So I can't mod.

  7. What a concept. by Anonymous Coward · · Score: 5, Informative

    I liked the idea the first time I heard it - back when it was called Project Gutenberg. :P

    1. Re:What a concept. by ugmoe · · Score: 0, Offtopic

      Anyone know where I can get dna-paterniti-testing done?

  8. I'd never heard of such a thing by Anonymous Coward · · Score: 0

    Getting an author's permission before scanning and indexing their copyrighted works? What a novel concept.

  9. What do these guys know... by dada21 · · Score: 5, Interesting

    ...that we don't?

    It seems to me that they're throwing money at an unnecessary application. Does Yahoo know something that we don't? I'd venture that they're starting with PD books to shake the bugs out of their platform so the app works well in round 2.

    Round 2 (current commercial books) won't occur without a massive copyright law change or the support of the Authors Guild.

    Hmm.

    1. Re:What do these guys know... by Brigadier · · Score: 1



      well they know it's all about content. Being advertisement-driven sites, they have to offer content and experiences that will attract people to their portal, i.e. search engine, e-mail, clubs, blogs etc.

    2. Re:What do these guys know... by dada21 · · Score: 1

      Yet ancient content isn't a driving element for even tiny groups, is it?

    3. Re:What do these guys know... by adtifyj · · Score: 1

      The website with the most hits, wins the fight.

  10. its to see... by CDPatten · · Score: 0, Troll

    that Yahoo! picked up the pieces and will succeed where google failed miserably.

    1. Re:its to see... by twiddlingbits · · Score: 3, Insightful

      PDFs of "public domain" or donated works will always be available. Amazon has gotten enough sh*t about the excerpts that they publish to entice the reader to buy the book. Google "e-book" and you'll see Yahoo! is nowhere near the only source. There is even an open-source e-book idea at Open eBook - http://www.openebook.org/ -- Information on the publication specification for electronic books that will allow compatibility between different e-book devices.

      I just wonder how Yahoo! will make $$$ off this very small market of public domain works, or, if they DO get repro rights to other books, what the price model is to download them. Or will you just see advertisements in your e-books? The authors are not going to give up their $$$, nor is Yahoo, so somebody is going to have to pay for this content.

  11. Project Gutenberg by timeToy · · Score: 5, Informative

    16k ebooks to choose from today, more to come, no Google, no Yahoo.
    http://www.gutenberg.org/

    1. Re:Project Gutenberg by harmonica · · Score: 4, Interesting

      More books are a good thing. Having a scanned PDF version includes graphics as well, which are missing from Gutenberg ebooks. So I see this as a very positive development.

    2. Re:Project Gutenberg by timeToy · · Score: 4, Informative

      It depends, some books do carry graphics, for instance the Slashdot-friendly "Amusements in Mathematics" by Henry Ernest Dudeney, 1917:
      http://www.gutenberg.org/etext/16713 (the zipped HTML version carries all the original drawings).

    3. Re:Project Gutenberg by Reality+Master+101 · · Score: 1, Troll
      No images, graphics, no typography, no typesetting...

      Project Gutenberg is great and all, but there's something to be said for some effort made at presentation. Sometimes italics are a good thing.

      --
      Sometimes it's best to just let stupid people be stupid.
    4. Re:Project Gutenberg by Infinityis · · Score: 2, Funny

      Well this is a problem waiting to get solved. Why don't they incorporate image-to-ASCIIart software so we can get high-quality images from these books?

    5. Re:Project Gutenberg by Anonymous Coward · · Score: 0
      No images, graphics, no typography, no typesetting...

      Looked at PG lately? Half or more of the books being added include HTML versions these days. Examples: An 1881 issue of Scientific American, An issue of Punch from 1892, a book about hand shadows, Beatrix Potter, and even some occasional PDF books to preserve the original's layout, and more....

    6. Re:Project Gutenberg by Anonymous Coward · · Score: 0
      Having a scanned PDF version includes graphics as well, which are missing from Gutenberg ebooks.

      Yes, without pictures, My Pet Goat is completely unreadable.

    7. Re:Project Gutenberg by shellbeach · · Score: 3, Interesting

      Project Gutenberg is great and all, but there's something to be said for some effort made at presentation. Sometimes italics are a good thing.

      It's not a great solution, but emphasis _is_ preserved in the etexts, just like that. Or occasionally like THIS ... Pity there's no consistency, but for most texts it works well enough.

      Also, the fact that they are plain text, with no markup, formatting, binary code, whatever in them means that they'll always be accessible to anyone, regardless of software or platform. And that's a good thing, too!

    8. Re:Project Gutenberg by Anonymous Coward · · Score: 0

      http://www.jus.uio.no/sisu/
      allows for fairly easy conversion from Gutenberg text to a number of other formats
      (manual tweaking/markup required though)
      you find a few output samples here
      http://www.jus.uio.no/sisu/SiSU/2
      http://www.jus.uio.no/sisu/SiSU/2#books
      including the markup used, (a few markup samples also here)
      http://www.jus.uio.no/sisu/SiSU/2#markup

      the markup for War and Peace:
      http://www.jus.uio.no/sisu/sample/syntax/war_and_peace.leo_tolstoy.s2.html
      output
      http://www.jus.uio.no/sisu/SiSU/2#wap
      http://www.jus.uio.no/sisu/war_and_peace.leo_tolstoy/

      more complicated sample documents are also provided,
      (LaTeX generated and used for pdf conversion)
      primarily Linux / Unix software though.

    9. Re:Project Gutenberg by m50d · · Score: 1
      Also, the fact that they are plain text, with no markup, formatting, binary code, whatever in them means that they'll always be accessible to anyone, regardless of software or platform. And that's a good thing, too!

      HTML would accomplish the same thing. It's a public standard, implementable by anyone on any platform, and convertible to plain text by a simple regex substitution. You're no more likely to find someone who can't read an HTML file than someone who can't read an ASCII text file.

      --
      I am trolling
    10. Re:Project Gutenberg by shellbeach · · Score: 1

      HTML would accomplish the same thing. It's a public standard, implementable by anyone on any platform, and convertible to plain text by a simple regex substitution. You're no more likely to find someone who can't read an HTML file than someone who can't read an ASCII text file.

      I agree, personally. However, you could also argue that _this_ sort of emphasis is convertible to html with a simple regex substitution - my point was simply that the texts haven't lost any information. Ultimately, it doesn't really matter _that_ much, does it?? Ascii is the lowest common denominator - it's not a bad place to start.
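To make the two "simple regex substitution" claims concrete, here is a minimal Python sketch of both directions. The function names are made up for illustration, and both regexes are deliberately naive: the tag stripper ignores entities, comments and script blocks, and the emphasis converter only recognizes the `_word_` convention, so an etext that marks emphasis LIKE THIS is not recovered.

```python
import re


def html_to_text(html: str) -> str:
    """Naively strip tags; does not decode entities such as &amp;."""
    return re.sub(r"<[^>]+>", "", html)


def emphasis_to_html(text: str) -> str:
    """Turn _underscore_ emphasis into <em> elements.

    The lookarounds require the underscores to sit at word boundaries,
    so identifiers like snake_case_name are left alone.
    """
    return re.sub(r"(?<!\w)_([^_\n]+)_(?!\w)", r"<em>\1</em>", text)


if __name__ == "__main__":
    print(emphasis_to_html("emphasis _is_ preserved in the etexts"))
    print(html_to_text("<p>emphasis <em>is</em> preserved</p>"))
```

Note that the round trip is lossy in one direction only: stripping tags always works, while rebuilding them depends on the plain text having used one convention consistently.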

    11. Re:Project Gutenberg by sootman · · Score: 1

      They're a great group, but they're using some *really* shitty compression algos. :-)

      Format  Encoding    Compression  Size
      HTML    iso-8859-1  none         1.27 MB
      HTML    iso-8859-1  zip          5.95 MB

      --
      Dear Slashdot: next time you want to mess with the site, add a rich-text editor for comments.
    12. Re:Project Gutenberg by m50d · · Score: 1
      I agree, personally. However, you could also argue that _this_ sort of emphasis is convertible to html with a simple regex substitution - my point was simply that the texts haven't lost any information.

      But when it's not done consistently then it can't be converted back and has lost information. Does _this_ mean underlined or bold?

      Ascii is the lowest common denominator - it's not a bad place to start.

      It's a better place to finish. Converting down from a richer format to ascii is very easy, converting up not so much.

      --
      I am trolling
    13. Re:Project Gutenberg by bbc · · Score: 1

      "They're a great group, but they're using some *really* shitty compression algos. :-)

      Format  Encoding    Compression  Size
      HTML    iso-8859-1  none         1.27 MB
      HTML    iso-8859-1  zip          5.95 MB"

      The uncompressed number refers to just the HTML file. The compressed number refers to the HTML file and the files of all the embedded images.

  12. Why PDF? by matr0x_x · · Score: 0, Interesting

    An OS solution would be better, would it not? 10 years down the road when everything is in PDF format, who's to stop them from charging us to view material in their format?

    --
    LINUX ONLINE POKER: Linux Poker
    1. Re:Why PDF? by david+duncan+scott · · Score: 4, Informative
      10 years down the road when everything is in PDF format, who's to stop them from charging us to view material in their format?

      The fact that it's an open, documented format?

      Adobe has made their money the old-fashioned way, by making tools that work well, rather than by locking people into a format. GhostScript, among others, will read those PDF's with or without Adobe.

      --

      This next song is very sad. Please clap along. -- Robin Zander

    2. Re:Why PDF? by amliebsch · · Score: 1

      Parent's point is still valid. PDF-related technology is patented, and the free licenses they currently grant are not to my knowledge perpetual. Therefore, theoretically, the license could be revoked, and while Ghostscript would still technically be able to read (old) PDF files, it could not do so legally. There are lots of open, documented formats that are still pay-to-play.

      --
      If you don't know where you are going, you will wind up somewhere else.
    3. Re:Why PDF? by TTK+Ciar · · Score: 1

      You're right, sorta. The djvu format is better than PDF for scanned books in most respects. Looks better, compresses better (and compresses by default), decompresses + renders faster while using less memory, more easily transformed to/from other formats due to availability of high-quality open source and free tools, etc. The Internet Archive's books collection has several books archived in djvu format.

      The downside is that most users do not have a djvu reader installed on their computers, and even though it's trivial to download and install djview for free, most people will not bother. The Internet Archive more or less solves this problem with a java applet which turns users' web browsers into djvu readers. This should work for other content providers as well, except nobody knows about it, so everyone stops at "oh no, nobody has a viewer installed". The end.

      On a slightly different note, though, PDF isn't that bad. It's an open format, and even though most people seem to think Acrobat is the only viewer, there are others like xpdf, which is faster, more stable, and easier to use than Acrobat (though not as fully-featured).

      -- TTK

    4. Re:Why PDF? by david+duncan+scott · · Score: 1
      Not perpetual, no, but then, neither are patents.
      Part 2: Patent Clarification Notice: Reading and Writing PDF Files
      (exact quote of [3])

                  Adobe has a number of patents covering technology that is disclosed
                  in the Portable Document Format (PDF) Specification, version 1.3
                  and later, as documented in PDF Reference and associated Technical
                  Notes (the "Specification"). Adobe desires to promote the use of PDF
                  for information interchange among diverse products and
                  applications.

                  Accordingly, the following patents are licensed on a royalty-free,
                  non-exclusive basis for the term of each patent and for the sole
                  purpose of developing software that produces, consumes, and
                  interprets PDF files that are compliant with the Specification:

                  U.S. Patent Numbers:

                      * 5,634,064
                      * 5,737,599
                      * 5,781,785
                      * 5,819,301
                      * 6,028,583
                      * 6,289,364
                      * 6,421,460

                  In addition, the following patent is licensed on a royalty-free,
                  non-exclusive basis for its term and for the sole purpose of
                  developing software that produces PDF files that are compliant with
                  the Specification (specifically excluding, however, software that
                  consumes and/or interprets PDF files):

                  U.S. Patent Numbers:

                  * 5,860,074


      Life of the patent seems long enough for me.

      I note also that the GNU folks felt comfortable enough about this to work on GhostScript. Would they have done so if they felt the terms encumbered? I mean, I'm not a lawyer, but some of the FSF people are, and they seem pretty careful to me.

      --

      This next song is very sad. Please clap along. -- Robin Zander

  13. Whew! by op12 · · Score: 4, Interesting

    I almost panicked after seeing we had gone so long without a Google-related article.

    The opt-in rather than opt-out strategy is really what Google probably should have done, but it'll be interesting to see who comes out as a winner, Yahoo or Google, in all of this.

    1. Re:Whew! by ChocoBean · · Score: 1
      yeah i commented about that and somebody had a knee-jerk response and called that trolling.

      putting stuff online is good, making it troublesome for others is not

      that being said, we just had one that's only vaguely related to google earlier today though: http://it.slashdot.org/article.pl?sid=05/10/03/1057258&tid=185&tid=217&tid=218

  14. But will they digitize PD works from after 1922? by Anonymous Coward · · Score: 5, Informative

    In the US, books published after 1922 can still be public domain if the author was American, it was originally published in the US, and the copyright was not extended at the end of the original copyright period. Google Library does not seem to be making an exception for this, will OCA? Project Gutenberg does.

  15. Not really an up-stage by ChocoBean · · Score: 4, Informative

    Actually this won't "Upstage" google in any way.

    FTA:
    all the content will be made available so it can be indexed by all the other major search engines, including Google's

    Yahoo is just going to scan, scan and scan. We all already prefer google's indexing and searching and cleaner interfaces, so the only thing Yahoo! will accomplish by this is to help google print along, shielding it from all (other) copyright law suits. Once the stuff is online, we all know that Google-bots will be all over it "like a fly on a pile of very seductive manure (Zapp)"

    Excellent.

    I just hope publishers realise that in this case neither google nor yahoo is trying to be their best friend.

    1. Re:Not really an up-stage by Anonymous Coward · · Score: 0

      I don't prefer Google's indexing. So please talk about yourself only. I use a9.com these days which seem to give "better" results in the first 2-3 pages.

    2. Re:Not really an up-stage by Scowler · · Score: 1
      If you use Google search to get to Yahoo content, who do you think is getting the bulk of the ad dollars? Hint: Yahoo is fine with this arrangement.

      And replace "We all" with "Most of us". I for one am roughly equally happy with the search results from both engines.

    3. Re:Not really an up-stage by op12 · · Score: 1

      If you use Google search to get to Yahoo content, who do you think is getting the bulk of the ad dollars? Hint: Yahoo is fine with this arrangement.

      Not necessarily...you are going to see a Google ad related to your search before you see a Yahoo one related to your search. If you didn't care about ads the first time (at Google), why would you when Yahoo hits you with them again? I think that probably Google benefits from someone finding something through them, and Yahoo's benefit is much reduced.

    4. Re:Not really an up-stage by Anonymous Coward · · Score: 0

      I don't prefer Google's indexing. So please talk about yourself only.

      And by this comment I infer that the day has arrived when google is so cool that it is no longer cool to think that google is cool.

      (I feel a great disturbance in the force)

    5. Re:Not really an up-stage by Woy · · Score: 1
      I just hope publishers realise that in this case neither google nor yahoo is trying to be their best friend.

      And here's to hoping they don't.

      --
      "If God created us in his own image we have more than reciprocated." - Voltaire
    6. Re:Not really an up-stage by krunk4ever · · Score: 1

      I'm guessing the parent was trying to be funny since a9 uses google results, just with a more personalized interface. I've been using a9 because amazon gives me the pi/2% off all amazon products.

    7. Re:Not really an up-stage by krunk4ever · · Score: 1

      I don't really think what Google and Yahoo are doing is exactly the same. Yahoo seems to be only digitizing specific books and text (probably the ones that Open Content Alliance has licenses to). In fact, it clearly says so in the article:

      Internet powerhouse Yahoo Inc. is setting out to build a vast online library of copyrighted books that pleases publishers -- something that rival Google Inc. hasn't been able to achieve.

      The Open Content Alliance, a project that Yahoo is backing with several other partners, plans to provide digital versions of books, academic papers, video and audio. Much of the material will consist of copyrighted material voluntarily submitted by publishers and authors, said David Mandelbrot, Yahoo's vice president of search content.


      So this isn't really shielding Google from anything and Google Print's project is trying to index ALL books and will only remove you from the list if you specifically request so.

    8. Re:Not really an up-stage by Scowler · · Score: 1
      Think about the timeline. You'll spend less than 5 minutes in Google's site. You'll spend the remaining time (hours? this is heavy reading we're talking about) looking at Yahoo content.

      Of course Yahoo would prefer that you went to them in the first place, but Yahoo is still getting the lion's share of the loot in this scenario.

    9. Re:Not really an up-stage by Anonymous Coward · · Score: 0

      Thanks for reminding me. I hadn't used my A9 search in a while. Hate to lose that pi/2% discount!

      Oh yeah, also time to check my gold box. I love the new gold box. It no longer feels like they are trying to get me to buy shit that is in no way related to things I would actually buy. Half of my offers actually match my purchase history. The other half... camping gear.

    10. Re:Not really an up-stage by ediron2 · · Score: 1

      And I just hope that writers realize that the publishers might realize that in this case neither google nor yahoo is trying to be their (the publishers') best friend.

  16. What about China? by DAldredge · · Score: 3, Interesting

    Will Yahoo provide sorted or unsorted lists of the books that China's Internet users view to the thugs that run China?

    1. Re:What about China? by Anonymous Coward · · Score: 0

      Yes, the same ones available to the thugs of America will be available to the thugs of China.

    2. Re:What about China? by m50d · · Score: 1

      It's google that was doing that the last I saw. But slashdotters are strangely quieter about that.

      --
      I am trolling
    3. Re:What about China? by DAldredge · · Score: 1

      Yahoo Helped China Imprison Journalist Says Reporters Without Borders

      According to a report released by Reporters Without Borders (Reporters sans frontieres), Yahoo Holdings (Hong Kong, China) helped China convict a journalist by providing the Chinese government with information to identify Shi Tao. Shi Tao was sentenced to 10 years for leaking government information to the external media. Tao sent foreign websites copies of a message Chinese authorities sent to his newspaper which warned of the "dangers of social destabilisation and risks resulting from the return of certain dissidents on the 15th anniversary of the Tiananmen Square massacre."

      http://news.google.com/news?hl=en&ned=us&q=yahoo+china&btnG=Search+News

  17. *sigh* oh look, it's the wheel again by Anonymous Coward · · Score: 0

    Project Gutenberg could sure use the help that Yahoo and Google are throwing at these projects.

    Why do these companies, especially Google, have to go this route? Why couldn't they just help a pre-established project?

  18. The difference between Google and Yahoo's effort by doctor_no · · Score: 4, Insightful

    Seems like the crucial difference between Google's efforts and the OCA (Open Content Alliance) is that Google has an "opt-out" policy for copyrighted material, while the OCA specifically requires the copyright holder to contact them and essentially allow them to use the material.

    The OCA likely won't be sued by the Authors Guild the way Google was. For searching material, however, Google will likely be better, since Google's search will likely include a massive plethora of copyrighted material, legal or not. Also, it seems that Google themselves will be allowed to fold all the material from the OCA into their project as well.

  19. Companies should Get Original by TarrySingh · · Score: 2, Insightful

    Why can't companies come up with some cooler ideas? Why ape each other? First Google and then Yahoo. Sure, MS will also want to play.

    --
    Scott McNealy to Michael: "Suck my Sun!" Michael Dell to Scott : "Lick my Dell!"
    1. Re:Companies should Get Original by Anonymous Coward · · Score: 0

      Then come up with a better idea, Einstein.

    2. Re:Companies should Get Original by TTK+Ciar · · Score: 1

      You mean "First the Internet Archive, then Google, then Yahoo". The Million Books Project predates Google's bookscanning efforts by a few years.

      -- TTK

    3. Re:Companies should Get Original by some+guy+I+know · · Score: 1
      Then come up with a better idea, Einstein.
      Wait! I know!
      Let's do the exact same thing, but not use computers!
      Hmmm...
      Yes, do the exact same thing that others have done for years, but not using computers.
      What an incredibly original idea!
      Please excuse me while I go file a patent.
      --
      Those who sacrifice security to condemn liberty deserve to repeat history or something. - Benjamin Santayana
  20. NOT competing by daniil · · Score: 4, Informative

    There's a slight difference between an 'Internet-based library' and 'searching inside books'.

    --
    Man is a slave because freedom is difficult, whereas slavery is easy.
  21. Apples and Oranges! This is not Google Print! by merreborn · · Score: 4, Informative

    Google Print's goal is to allow people to search book content, WITHOUT giving them the content of the book.

    For example, searching "Zoroastrianism" would return a list of book titles on the subject, and links to purchase the books in question. You CANNOT download the content of the book!

    The OCA (The group Yahoo just joined) is an opt-in, full content hosting project.

    Searching "Zoroastrianism" would return a (much smaller) list of books, with the *full* content of the book available for download with the explicit consent of the publisher/author!

  22. Sad thing about Yahoo though by totallygeek · · Score: 2, Interesting

    You will be reading the content to Moby Dick on Yahoo and in the top right it will say, "content provided by Google."

    1. Re:Sad thing about Yahoo though by WindBourne · · Score: 1

      As opposed to "The Minnow" , content provided by Microsoft?

      --
      I prefer the "u" in honour as it seems to be missing these days.
    2. Re:Sad thing about Yahoo though by Anonymous Coward · · Score: 0

      White Whale
      Great deals on White Whale
      Shop on eBay and Save!
      www.eBay.com

  23. Annoying by rm999 · · Score: 1

    I am getting tired of the big internet companies straight up copying each other. Yes, it means that products slowly get improved over time (eg. yahoo mail -> gmail -> yahoo mail) but it also means that the companies aren't innovating enough. Yahoo is spending time and money on providing a product that is already offered. We would probably be better off if they spent the effort on providing a unique service - like scanned magazines or something.

    1. Re:Annoying by ScentCone · · Score: 2, Insightful

      I am getting tired of the big internet companies straight up copying each other.

      Should we turn to you to tell us which provider of each major online activity is the one we should all use? Even if the differences are incremental and subtle, I'm glad when I get to choose between Yahoo's and Google's take on a particular app/service. I'm also glad that Audi and Toyota and GM and Honda all have different ideas on cars... even though someone else built one once already. Come on - not every service offered is going to be wholly unique, and shouldn't be. It's competition - for eyeballs, brand loyalty, etc. Same reason there are a zillion Linux distros, even though many overlap. Everyone's got their own idea of what would make it just a little bit better.

      --
      Don't disappoint your bird dog. Go to the range.
    2. Re:Annoying by rm999 · · Score: 1

      I never said I hate the lack of choice. In fact I like it (duh). I just said it annoys me that there isn't more large-scale innovation - very few new features come out. Two large, multi-billion dollar companies should be able to do a little more.

      As an example of my point, two image search engines require double the effort of one, but only provide incremental benefit to the user. Instead of copying altavista's image search (which I still think is better), google could have implemented something entirely new. This is just an example, and I understand the image search probably didn't take much effort, but that misses my point. My point is that the large search engines haven't innovated much because all they care about is offering the same service as everyone else, just improved a little.

    3. Re:Annoying by Moofie · · Score: 3, Informative

      "very few new features come out"

      Have you seen Google Earth?

      How about the disaster wiki that went together in about 20 minutes, where people were posting status reports of New Orleans properties?

      I think you're damning with faint praise. Google, at least, consistently builds superb offerings, and the price is right. Not quite sure what you're grousing about...

      --
      Why yes, I AM a rocket scientist!
  24. Finnegan's Wake, now available online! by megify · · Score: 1

    I think this is only good for short documents....
    I think if I read Finnegan's Wake or Hawaii on-screen, my eyes would bleed and tear themselves out of my skull. (not to mention downloading PDFs for days.) In that case, I'd much rather just go buy a paperback for $3. Then I don't have to read on-screen, the pages are conveniently sized and bound, and I can take my book to places I wouldn't bring a laptop. Like a bubble bath, bed, or my commute to work every day.

    1. Re:Finnegan's Wake, now available online! by Voltaire759 · · Score: 1
      "think if I read Finnegan's Wake or Hawaii on-screen, my eyes would bleed and tear themselves out of my skull."

      It's actually not that bad with a laptop. You can change the angle, distance and brightness, so it's tolerable for quite some time.

      --
      Écrasez l'infâme
    2. Re:Finnegan's Wake, now available online! by megify · · Score: 1

      maybe i'm just biased because the last thing i read entirely on-screen were the functional specs and documentation for new chip engineering software, in their entirety.

      or maybe i've just ruined my eyes with bi-weekly, 48-hour bouts of being awake in front of my computer. my eyes don't put up with a lot of that anymore.

  25. Yikes, How long ... by pin_gween · · Score: 1

    will it take to download that PDF of War and Peace?

    --
    Ignorance is not a crime; neither should it be a way of life

    Congress control $ = inmates run the asylum
    1. Re:Yikes, How long ... by Kevinv · · Score: 1

      couple of minutes (on dsl) from project gutenberg. it's text instead of pdf though.

      http://www.gutenberg.org/etext/2600

    2. Re:Yikes, How long ... by Anonymous Coward · · Score: 0

      War and Peace is online as a pdf (and in other formats)
      (and has been for quite a while), rather nicely formatted too.
      Check your own download times, no doubt they will vary :-p ;-) :-)

      http://www.jus.uio.no/sisu/war_and_peace.leo_tolstoy/portrait
      http://www.jus.uio.no/sisu/war_and_peace.leo_tolstoy/landscape

      http://www.jus.uio.no/sisu/SiSU/2#wap
      http://www.jus.uio.no/sisu/SiSU/2#books

      http://www.jus.uio.no/sisu/

  26. University of Calif: Yahoo OK, Gutenberg banned by dananderson · · Score: 5, Interesting
    I find it funny (in an ironic way only) that the University of California is allowing its public domain books to be scanned by Yahoo. At the same time, UC libraries prohibit scanning for Project Gutenberg or other true "open" content projects unless they receive $$$$ in royalties.

    I hate to see a university turn away open projects while, at the same time, welcoming commercial interests such as Yahoo. Money talks, and I'm sure UC is being paid a lot, but libraries are supposed to be public resources too, not exclusive profit-centers :-(.

  27. Reading Between the Lines by 99BottlesOfBeerInMyF · · Score: 1, Redundant

    Reading between the lines for this proposal we seem to have another print.google.com, except it will not index a huge number of works whose copyright holders do not "opt in" to the program. The advantage to this is that it may make some copyright holders feel better about the whole thing and, hopefully, submit entire works to be viewed by the public. It is also possible that Yahoo is worried about the legal issues and wants to wait and see how Google weathers any legal challenges.

    From a purely technical perspective, this system seems inferior in most ways. It only displays full text and does not give copyright holders the ability to show only an excerpt, or a set number of pages. Providing them as PDFs is nice, though. I wish Google would add that feature for works that are shown in their entirety. In general though, if I'm looking for particular data I don't see why I'd use Yahoo, which will have a much smaller index of works.

  28. PDF?! yuck by BillHop · · Score: 2, Insightful

    Does anyone else find there is no way to read a PDF with the scroll buttons (mouse wheel, etc.) without the viewer constantly breaking your flow by jumping to the next page?

    This goes along with the concept that for an electronic format, I do NOT need a sentence (or even worse, hyphenated word) broken up by two inches of top and bottom margin filled with page numbers, miscellaneous watermarks, repetitive titles, etc.

    PS. This being flamebait does not make it false.

  29. I prefer Gutenberg's formats by Anonymous Coward · · Score: 0

    ASCII text and HTML can be edited by the user to correct typographical errors, add notes, insert pictures (in HTML versions), and so on. You can't do that with PDF. So I prefer Gutenberg's more versatile formats. Also, plain text and HTML are readable with a much wider variety of software, which may become important in the future when PDF ceases to be so popular.

  30. Bookripper on its way? by serutan · · Score: 4, Interesting

    Google maintains its scanning represents "fair use" allowed under the law because it only allows Web surfers to view excerpts from copyrighted books.


    Soon after Google Mail was introduced, somebody created a SourceForge project that lets you use Google Mail as a database. How long until somebody releases a "Bookripper" app that assembles a whole book from search extracts? As I understand it Google displays two pages at a time (or wait, that's Amazon, but I bet they're similar). All you would need to know is a quote from a book's first page as a seed, and you should be able to grab the whole book by doing a series of searches using text from the second page returned by each search. The trick would be to knit the pieces together and eliminate the overlapping text. Seems almost trivial. Another possibility would be to search for random words and look for overlaps between the results, assembling them like a linear jigsaw puzzle until there are no gaps.
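
    The "knit the pieces together and eliminate the overlapping text" step could be sketched roughly as below. This is only an illustration, assuming the excerpts are already in reading order and that each one shares at least a few characters with the one before it; the names overlap_length and stitch are made up here, and nothing in it talks to Google or Amazon.

```python
def overlap_length(left: str, right: str, min_overlap: int = 8) -> int:
    """Length of the longest suffix of `left` that is also a prefix of `right`.

    Overlaps shorter than `min_overlap` characters are ignored, so a single
    shared word is not mistaken for a real join point.
    """
    max_len = min(len(left), len(right))
    for size in range(max_len, min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def stitch(excerpts: list[str], min_overlap: int = 8) -> str:
    """Join excerpts (assumed to be in reading order), dropping duplicated text."""
    if not excerpts:
        return ""
    book = excerpts[0]
    for piece in excerpts[1:]:
        size = overlap_length(book, piece, min_overlap)
        if size == 0:
            # No shared text: there is a gap we cannot fill from these excerpts.
            raise ValueError("gap between excerpts; cannot stitch")
        book += piece[size:]
    return book


# Hypothetical excerpts standing in for consecutive search results.
excerpts = [
    "The quick brown fox jumps over",
    "fox jumps over the lazy dog and then",
    "lazy dog and then runs far away",
]
print(stitch(excerpts))
# prints: The quick brown fox jumps over the lazy dog and then runs far away
```

    The "linear jigsaw puzzle" variant with random searches would need the same overlap test, just applied to every pair of excerpts instead of only neighbours.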

    1. Re:Bookripper on its way? by Chunni+Babu · · Score: 1

      Consider the case of ppl researching on a particular topic. They just look for a keyword and then read the 3 relevant pages that Google shows. They get the material and there is no need to buy the book.

    2. Re:Bookripper on its way? by gasaraki · · Score: 2, Informative

      It's already been done. The guy was sent a 'please stop doing this' letter by Google if I recall, which I think he went along with. No formal suit or anything, but they didn't like it. I'll be damned if I can remember the link, I think there was a K5 story or two on it though.

    3. Re:Bookripper on its way? by Anonymous Coward · · Score: 0

      "Does Google keep track of the pages I'm viewing?

      In order to help us protect copyrights, Google Print users can view only a limited portion of the books we present. Enforcing these limits requires us to keep track of page views by our users."

      So, no, it's not possible :(

    4. Re:Bookripper on its way? by momerath2003 · · Score: 1

      Google Print limits the total portion a user sees of a book to 20% of it (it ties records of the book viewing to your Google ID). No matter how many searches you do, you can't extract more than a fifth of the book.

      --
      I had but a simple dream, to destroy all humans.
    5. Re:Bookripper on its way? by Dan+East · · Score: 2, Informative

      According to Google, there are specific portions of each book that it will never show, making it impossible to harvest an entire book.

      I'm already logged in. Why are you telling me the page is unavailable?

      As part of our efforts to protect a book's copyright, a set of pages in every in-copyright book will be unavailable to all users.


      http://print.google.com/googleprint/help.html#pagelimit

      Dan East

      --
      Better known as 318230.
    6. Re:Bookripper on its way? by herrison · · Score: 1

      Amazon's easier than this: you can search and find for the page number.

      --
      You know what I miss? Leeches.
  31. If Google rose to the competition by expro · · Score: 1

    If there was ever anything we need competition in, it is search engines. Whether project Gutenberg needed any competition is another question.

    I don't see a lot of similarity between this project and the one Google is doing. Open versus proprietary. Free (free as in speech) information versus non-free information.

    In the case of the other search engines Google has put out of business (AltaVista's web site still exists, but it now uses the facilities of others and is no longer the more-advanced search engine it was), the competition did not make Google improve at all, beyond its insight to make searching a popularity contest instead of an accurate search.

  32. "Do no Evil" done right by Chunni+Babu · · Score: 5, Insightful

    Now this is a right step towards making book contents searchable online. I would hate to see one company like Google copying and caching all books in its massive cluster of servers. I know that the Google kool-aid that "we are about general good" is running deep in the veins of slashdot types.

    Since when was scanning books from libraries and making them available to the public for a profit considered "fair use"? This kind of stuff is done by pirates. Go to the major cities in China and India and you will see piles of copied books in the streets, all sold for 1/10th the original price without giving anything back to the authors. The pirates can say that they are doing a favor to the authors by driving them out of obscurity.

    The message the alliance is sending out to the authors is

    • we are not for profit
    • we will scan your book only if you want us to do so
    • your book will be indexed based on your approval and copyright agreement with you and the publishers
    Compare this to what Google is telling the authors
    • we will scan your book, fill a form and tell us if you don't want us to do so
    • we will take sale commissions from amazon, buy.com, bn.com, etc. without sharing anything with you
    • if we show ads, we will share the profits with you
    • we will show excerpts of your book, so if a researcher is researching on a topic he can find what you have written about a topic without ever having to buy your book, too bad, heh heh, write a fiction book dude
    • we will cache your book in our servers and only we will reserve the right to profit from your scanned book
    So much for do no evil. Kudos to yahoo for bringing the open content alliance, gutenberg, and other similar projects to limelight - these are some really nice collections that were hidden by the noise created by 'google print'.
    1. Re:"Do no Evil" done right by nursegirl · · Score: 2, Insightful

      Compare this to what Google is telling the authors
      * we will show excerpts of your book, so if a researcher is researching on a topic he can find what you have written about a topic without ever having to buy your book, too bad, heh heh, write a fiction book dude

      Except that Google only shows 2-3 sentences of books that are under copyright. I've never found a researcher that can write on a topic by only reading 2 sentences. It's only posters on /. that can claim expertise on a topic without actually learning anything about it.

    2. Re:"Do no Evil" done right by Jeff+DeMaagd · · Score: 2, Informative

      Since when was scanning books from libraries and making them available to public for a profit considered "fair use"?

      It's not. You are mischaracterizing Google's system. The problem with your claim is that Google's system doesn't make the book available to users to download, it is only a search method that points to the relevant books and provides short excerpts like their search engine does. Google won't provide the book or even whole page without the copyright owner's permission. My impression is that Google was just trying to make an improved card catalog.

      we will take sale commissions from amazon, buy.com, bn.com, etc. without sharing anything with you

      The sale of the book meant that the author got their share of the money.

      we will show excerpts of your book, so if a researcher is researching on a topic he can find what you have written about a topic without ever having to buy your book, too bad, heh heh, write a fiction book dude

      The researcher could just go to the local research library, no books purchased. Another problem is that the research would be horribly flawed, given that the descriptions are so short and the allowed excerpts only cover certain fixed pages.

    3. Re:"Do no Evil" done right by ValuJet · · Score: 1
      we will scan your book, fill a form and tell us if you don't want us to do so
      If you want to protect your work, this isn't that difficult, and something your publisher would probably do for you if you didn't want to do it yourself.

      we will take sale commissions from amazon, buy.com, bn.com, etc. without sharing anything with you
      Um... it's not like they're not getting the money they normally would've received from the sale. In fact their sales probably increase due to more people finding out about their book.

      if we show ads, we will share the profits with you
      ya, i'm sure they're really moaning about extra money.

      we will show excerpts of your book, so if a researcher is researching on a topic he can find what you have written about a topic without ever having to buy your book, too bad, heh heh, write a fiction book dude
      If you are hoping to make lots of sales of your book based on a couple of paragraphs within your book, then you have a lot more to worry about than google.

      we will cache your book in our servers and only we will reserve the right to profit from your scanned book
      Dude, you just said they'll share profits with the author from ads...

    4. Re:"Do no Evil" done right by Chunni+Babu · · Score: 1

      1. The only thing Google is trying to do is make money out of other people's work.

      2. The sale of a book brings the author money. The click on a link without a sale only brings Google money.

      3. The local library may not have the book. The local library is funded by people paying taxes out of their hard-earned money. The local library buys the book.

      4. 2-3 pages are sometimes enough to get an idea. A researcher looks at the index of a book and then reads the pages based on a keyword. Google provides this service to the researcher.

    5. Re:"Do no Evil" done right by Chunni+Babu · · Score: 1
      If you want to protect your work, this isn't that difficult, and something your publisher would probably do for you if you didn't want to do it yourself.

      May I ask why? An author spends years writing a book, so why can't Google get his written permission before caching his book on its servers?

      Um... it's not like they're not getting the money they normally would've received from the sale. In fact their sales probably increase due to more people finding out about their book.

      The problem is that Google will make money from a click that does not even generate a sale. Not to mention the fact that Google will track people's book searches and clicks to send them targeted ads from crappy companies.

      ya, i'm sure they're really moaning about extra money.

      They will moan and groan about it, because I am sure the pay will be as pathetic as adsense or even worse

      If you are hoping to make lots of sales of your book based on a couple of paragraphs within your book, then you have a lot more to worry about than google.

      Dude.. have you ever done research for writing?

      Dude, you just said they'll share profits with the author from ads...

      Heh Heh..Yeah google will make writers millionaires

    6. Re:"Do no Evil" done right by Moofie · · Score: 1

      Way to not let facts get in the way of your opinion.

      1) Making money is not inherently evil. Note that Google's scheme will also make money for authors. Google's scheme takes nothing from authors at all.

      2) The click on a link also only brings 2-3 sentences (not pages, Sparky...) of text.

      3) The virtue of libraries is not that they pay for books, it is that they make as much information as possible available to as many people as possible.

      4) See 2.

      When the copyright holders start remembering that the purpose of copyright is to promote the progress of science and the useful arts, not to build walled gardens, their objections might be a little more persuasive. Until then...

      --
      Why yes, I AM a rocket scientist!
    7. Re:"Do no Evil" done right by Moofie · · Score: 1

      1) Because lots of authors are dead, and dead people are hard to talk to.

      2) "The problem is that Google will make money from a click that does not even generate a sale."

      So? That click, and that sale, wouldn't exist anyway. The authors are exchanging almost nothing (IE their tacit permission) for a chance at making more sales than they otherwise would.

      3) "They will moan and groan about it, because I am sure the pay will be as pathetic as adsense or even worse"

      More than zero is more than zero.

      4) "Heh Heh..Yeah google will make writers millionaires"

      If it breaks the bestseller monopoly the publishers currently have, it'll make more writer millionaires than currently exist.

      --
      Why yes, I AM a rocket scientist!
    8. Re:"Do no Evil" done right by jp10558 · · Score: 1

      So if Google bought one copy of the book (or heck, 100 so they were ever only showing one copy to one person at a time) it would be ok?

      It'd basically be a faster interlibrary loan system.

      --
      Opera, Proxomitron-Grypen,GPG 0x0A1C6EE3
    9. Re:"Do no Evil" done right by Anonymous Coward · · Score: 3, Insightful

      Since when was scanning books from libraries and making them available to public for a profit considered "fair use"?

      How disingenuous. Google Print shows only a snippet of the text and tells you how to buy the book if it seems like what you need. Not pages, not paragraphs - a couple of sentences. In fact, Google Print instantly returns pretty much what you'd get if you hired a researcher to go find X number of books with such and such text and the researcher prepared a paper with a short quote from each. Such a paper would be unquestionably fair use and could be published anywhere. Google Print merely automates that process and makes it instant. I have no special fetish for Google; anybody who builds a system like this is doing us all a favor: it's a 21st century version of a card catalog, and a huge win for readers and authors. It's only being fought because, in our sue-happy culture, fair use rights have been eroded so much and copyright protections have been expanded so far that people seem to believe that even the most trivial use of their work - in a futuristic card catalog, for example - should bring a pay day. It's another case of cutting off your nose to spite your face.

      we will scan your book, fill a form and tell us if you don't want us to do so

      Which is, of course, exactly the model that Google and every single other search engine on the web has used since day one: Yahoo, AltaVista, everybody. It's the only sane way to make the web indexable. If it's not copyright violation on the web, then it's not copyright violation in print. Bringing that sort of searchable index to the history of printed material is a huge win for everybody, including authors. If courts eventually rule that this is copyright violation, then let's all say goodbye to the usefulness of Google and every single other decent search engine in the history of the web. Which would be a damn shame, but not surprising considering how twisted and lopsided against the public the bargain of copyright has become.

    10. Re:"Do no Evil" done right by _Sprocket_ · · Score: 2, Informative
      Since when was scanning books from libraries and making them available to public for a profit considered "fair use"?

      Since when is Google doing this? As others have pointed out, Google provides a portion of the work to give the search context - 3 pages. In another post, you claim that 3 pages is enough information to invalidate the sale of a book. If this is the case, I would have to seriously question the value of your work. Either that - or take a serious look at public libraries, private loaning, Amazon.com, book stores, and other avenues of viewing those precious 3 pages that apparently cost you sales.

      It might be worth noting that no case of "fair use" is clear. Court cases often contradict each other, so there are no clear precedents to follow. However, among the common factors potentially in Google's favor are that they:
      1. Provide additional insight in to the work(s)
      2. Provide a service to the public, in many cases providing facts and information
      3. Provide a limited subset of the work
      4. Are not making offensive use of the work


      What may not factor in Google's favor include:
      1. Limited modification of the original work
      2. Potential damage to the market for the work - providing that someone such as yourself can prove that 3 pages is damaging.
      3. Google's behavior may be interpreted as hostile and offend the Court


      Having said that - I'm not a lawyer. But then, even experts are occasionally shocked at the outcomes of these cases.

      It might be worth noting that fair use does not require notification or permission of the copyright holder. Nor does it require that the one invoking fair use not make a profit.
      we will take sale commissions from amazon, buy.com, bn.com, etc. without sharing anything with you

      When do authors currently get a cut of sale commissions?
      we will show excerpts of your book, so if a researcher is researching on a topic he can find what you have written about a topic without ever having to buy your book, too bad, heh heh, write a fiction book dude

      Again - this might stand up in court. Possibly. But note that most examples of this having weight tend to involve images and songs - not books. It may be difficult to prove 3 pages as damaging for a work as large as a book - especially if the damaging material is a fact.
      Kudos to yahoo for bringing the open content alliance, gutenberg, and other similar projects to limelight - these are some really nice collections that were hidden by the noise created by 'google print'.

      Kudos to Yahoo for coming up with something different to do. But I missed it where the OCA or Yahoo even makes mention of Project Gutenberg. Furthermore, I find it a hard stretch to claim that the "noise created by 'google print'" did anything more to obscure Project Gutenberg than Yahoo's project.
    11. Re:"Do no Evil" done right by Jeff+DeMaagd · · Score: 1

      1. The only thing Google is trying is to make money out of other people's work.

      So do book stores.

      2. The sale of a book brings author money. The click on a link without sale only brings Google money.

      But what you first complained was:
      we will take sale commissions from amazon, buy.com, bn.com, etc. without sharing anything with you

      Sales commissions are different from link clicks.

      4. 2-3 pages are sometimes enough to get an idea. A researcher looks at an index of a book and then reads the pages based on keyword. Google provides this service to the researcher.

      You can't read the pages based on the keyword without permission of the copyright holder. You only get like about a sentence's worth.

    12. Re:"Do no Evil" done right by Achromatic1978 · · Score: 1
      Google won't provide the book or even whole page without the copyright owner's permission.

      So what? Google shouldn't HAVE the book without the copyright owner's permission. That's copyright law. Acting all gracious and saying "Well, unless you say so, we won't spew it forth unto the net, completely unprotected" is a disingenuous 'concession', at best.

    13. Re:"Do no Evil" done right by Achromatic1978 · · Score: 1
      1) Because lots of authors are dead, and dead people are hard to talk to.

      Most authors have estates, to which royalties are paid.

    14. Re:"Do no Evil" done right by serutan · · Score: 1

      Go to the major cities in China and India and you will see piles of copied books in the streets, all sold for 1/10th the original price without giving anything back to the authors. The pirates can say that they are doing a favor to the authors by driving them out of obscurity.

      Interesting. Except for the cut-rate pricing, this is how the recording industry has been operating for a century.

  33. This is huge. IA beat Google and Yahoo to this... by Anonymous Coward · · Score: 4, Insightful


    I've read through the first few posts, and people really don't have a clue about what this is all about. "Open Content Alliance"... It means what it says. Open f'ing content. Let there be content available to the masses... Is it more important that I can get a snippet from some copyrighted text, or that millions of children can read Alice in Wonderland with all its wonderful illustrations?

    This is beyond PDF or anything like that. Some people want PDF, so Adobe will make them. Some people want decent OCR versions, perhaps to go into Distributed Proofreaders or onto someone's text-only PDA. It's ALL possible. This is NOT an exclusive club, it's an INCLUSIVE community that is dedicated to Open f'ing Content.

    Why don't you people get it? By allowing people to have full texts of some of humanity's greatest works we are doing more than a few snippets of the latest Ken Follett novel... a lot more.

    It's bigger than Yahoo or Google. Yahoo is NOT an also-ran.... The Internet Archive has been scanning books and hosting Million Books Project texts as well as Project Gutenberg texts for a long time... long before Yahoo or even Google were in the picture. Ignorant comments made here suggest somehow Yahoo is following.

    I say Yahoo is leading by embracing a project that by definition is bigger than themselves. Good for them.

  34. Competition is good by obli · · Score: 1

    With all this competition it won't be long until they start indexing comic books, then I'll finally be able to find that awesome Donald Duck comic I read back in 1996 and couldn't ever find again.

    1. Re:Competition is good by Anonymous Coward · · Score: 0

      obli,

      please try to recall the story, artist and inker.

      i am sure i can help you out. i really like carl barks's
      ability to tell a story.

      imagine i use the reprints for wrapping paper ;-)

      so in summary, i have read a lot of duck stories.
      which one are you referring to?

  35. Re:But will they digitize PD works from after 1922 by Shamashmuddamiq · · Score: 1

    I don't understand this. My favorite book was published in 1956, and the author died just 7 years later. He had no offspring and he outlived his wife. Now would someone please explain to me why someone was allowed to extend the copyright and why the work isn't yet in the public domain?

    --
    ...just my 2 gil.
  36. Yahoo searches for Creative Commons by mrklin · · Score: 1

    Yahoo! Advanced Search at http://search.yahoo.com/web/advanced?ei=UTF-8 allows one to search for Creative Commons licensed content.

  37. Re:PDF?! yuck by Fiver- · · Score: 4, Informative

    "Does anyone else find there is no way to read a PDF with the scroll buttons..."

    No. I just set it to Continuous. See those four icons in the lower right corner? (assuming you've got a recent version) Play with those. You want the second button from the left

    "This goes along with the concept that for an electronic format, I do NOT need a sentence (or even worse, hyphenated word) broken up by two inches of top and bottom margin filled with page numbers, miscellaneous watermarks, repetitive titles, etc."

    Well, the whole purpose of PDF is to "preserve the look and integrity of your original documents ... regardless of the application and platform used to create it." Blame the creators of that particular pdf file if you don't like the headers, footers and margin size. When I make pdf books to read on the train...I just finished Dream Quest of Unknown Kadath by Lovecraft...I open the original ascii text file in Word, make the top & bottom margins tiny, change the font to something tolerable and export it.

  38. Different than Gutenberg by phantomfive · · Score: 1

    That's what I thought too when I read the summary. The difference is that they also have an "opt in" program, wherein any publisher can have their works indexed upon request, without being redistributed in full.

    Yahoo is worried that they are falling behind Google, and now they are starting their own book scanning project to compete. They have to find some way to market it as being better than Google, so they are making it a "publisher friendly" project. It's great marketing in that it removes the focus from the fact that they are a year behind, and emphasizes that Google is the evil one (although allying yourself with publishers is only slightly better on the evil-meter than allying yourself with the RIAA).

    Personally, I think this is great news. The more digitized media we have the better. Now we will have (more or less) the same content from two different sources. Redundancy is a good thing. Hopefully Microsoft will get in on the effort too (Whoa! That is a trip! If you've never said something good about Microsoft, you should try it sometime just to see what it feels like!).

    --
    Qxe4
    1. Re:Different than Gutenberg by _Sprocket_ · · Score: 1
      The difference is that they also have an "opt in" program, wherein any publisher can have their works indexed upon request, without being redistributed in full.

      Back in the early 90's, that was called the World Wide Web (and search engines). Which puts Yahoo... well... where they began.
  39. duplicate of what? by Anonymous Coward · · Score: 0

    duplicate of what? I can't find any other article on /. about this, and the press release was embargoed until this morning.

  40. Re:But will they digitize PD works from after 1922 by blibbler · · Score: 1

    I am not specifically familiar with US copyright, but copyright in most jurisdictions extends for 50 or 70 years after the death of the author. In your example, the copyright naturally should extend to 2013, or 2033.
    There are some exceptions to this. Perhaps the best known is Peter Pan, for which the UK has granted a perpetual copyright in favour of the Great Ormond Street Hospital.
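
    The arithmetic behind those two years can be sketched as below. This is a rough illustration only: the function name is made up, the death year is inferred from the parent post (published 1956, author died 7 years later), and real terms vary by jurisdiction and usually run to the end of a calendar year.

```python
# Illustrative arithmetic only; not legal advice. Real copyright terms depend
# on jurisdiction and typically run to the end of a calendar year.

def term_end_year(death_year: int, years_after_death: int) -> int:
    """Return the year in which a life+N copyright term ends."""
    return death_year + years_after_death

publication_year = 1956
death_year = publication_year + 7  # "the author died just 7 years later"

for term in (50, 70):
    print(f"life+{term}: protected through {term_end_year(death_year, term)}")
```

    Under those assumptions the two terms land on the 2013 and 2033 figures given above.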

  41. Re:Apples and Oranges! This is not Google Print! by DJCF · · Score: 1
    Ahh, this is where it gets confusing. Don't worry, a lot of +5 insightful comments on Google in the past few months have made this mistake.

    What will library books in Google look like?
    If you are in the United States and you search for Books and Culture by Hamilton Wright Mabie, for instance, you'll be able to page through as much of it as you like, because its 1896 copyright means it's now in the public domain in the United States. These public domain books look very similar to publisher-submitted books except you will be able to click through all the pages of the book.
    (source).

    Google Print does what you say, but it also indexes public domain books and offers these to a user for free.

  42. Speculations... by Armadni+General · · Score: 0

    I predict that the winner of this epic showdown will not be determined by the number of books, or the number of pages...but instead, by who gets sued the most!

  43. that one thing by Anonymous Coward · · Score: 0

    If I could ask one gift for my new school year... it would be that. Books are too expensive (my parents don't pay for my extracurricular activities). It is the one thing that keeps me from knowing more.

    I worship Yahoo & Google for their efforts on this front.

    --may the source be with you--

    1. Re:that one thing by ValuJet · · Score: 1

      repeat after me...

      Library
      Library
      Library

  44. Opendocument? by Spy+der+Mann · · Score: 0, Troll

    Will they come in opendocument format? Or proprietary PDF?

    Just wondering.

  45. Re:But will they digitize PD works from after 1922 by thisissilly · · Score: 2, Informative
    In the US, that is only true of works published after 1978.

    When U.S. works pass into the Public Domain is a good summary of the U.S. issues.

    Me, I just want 14+14 back.

  46. New and Radical by Corydon76 · · Score: 3, Funny

    Hey, wow, that is completely original. Nobody else could have possibly thought of this idea before.

  47. PDF Isn't Proprietary by everphilski · · Score: 1

    duuuuuhhhhh....

    -everphilski-

    1. Re:PDF Isn't Proprietary by amliebsch · · Score: 2
      --
      If you don't know where you are going, you will wind up somewhere else.
    2. Re:PDF Isn't Proprietary by DECS · · Score: 1

      Is Linux proprietary? Somebody owns the name.

      Is BSD proprietary? UC offers it under license, and grants a similar royalty free license as Adobe does for PDF.

      Anyone can create an implementation of the PDF standard without paying Adobe anything. Apple used it for their imaging model; every app in Mac OS X can generate PDFs. Apple don't pay Adobe anything.

      Insisting that PDF is proprietary, simply because Adobe invented it, makes "proprietary" a worthless word.

      SGI invented OpenGL; is it proprietary?

  48. enough with books already by Anonymous Coward · · Score: 0

    I don't have the attention span to read anything more than a newspaper article. When is someone going to work on getting every newspaper ever printed scanned and available to search for free online? I know ProQuest is working on it, but I doubt that they will make it available, even in libraries. It's easy enough to search the New York Times archive from any library, but say I want to search the Indianapolis Star archives. I can't even do it in the library. I still have to use interlibrary-loan and have them send the microfilm from a library in Indiana. I mean, seriously, MICROFILM?!?!? What, are we living in the dark ages?!

  49. best format? by j1m+5n0w · · Score: 2, Interesting

    Actually, I prefer plain txt to pdf if I'm reading from a computer (assuming the book is not illustrated), since I have more control over fonts and colors (and I have read quite a few Gutenberg books that way). However, I think the best native format (despite its general user-unfriendliness) would be LaTeX, from which txt, pdf, and html could be generated. On the other hand, I suppose it's much easier to generate txt or pdf from scanned pages than LaTeX.

  50. A DRM-free e-Ink e-book reader on the horizon by Catbeller · · Score: 1

    Noticed on boingboing.net that a Chinese company is marketing a DRM-free version of an ebook reader using an eInk screen.

    Although I don't think it's on sale, it is the Holy EBook Reader Grail we've been seeking for ten years.

    If we're gonna download ebooks, we should have a reader to read them with, no?

  51. Wow, 16k by Anonymous Coward · · Score: 0

    16,000 books is small compared to what Amazon started with, 120,000[1]. It's tiny compared to what they have now, which is probably over a million[2] -- and that's just what's live on their website. Who knows how many they have scanned and ready to go? Contrast this to Google Print, who eschew Amazon's "scan what we have" philosophy and instead go for "scan whatever we can lay our hands on." You'd only guess that they have about 200,000 from a simple search[3], but since their program is fairly young it's reasonable to assume that this represents a fraction of the material they haven't yet made live. Plus it was just a search for the number 1, so presumably there are several books that don't contain that particular numeral.

    My point, for those of you who were asleep during the first paragraph, is that Project Gutenberg will never come anywhere near to the scope of Google Print, A9, or ultimately Yahoo!.

    [1] http://www.amazon.com/exec/obidos/tg/feature/-/507108/102-4368024-5588945
    [2] http://a9.com/4 -- make it show books, not the web
    [3] http://print.google.com/print?q=1

  52. But how? by Anonymous Coward · · Score: 0

    "The artists somehow has to be paid too."

    Elvis agrees! So does John Lennon. And Buddy Holly.

    These artists won't produce any more songs unless they get paid. Oh. Wait.

  53. Re:University of Calif: Yahoo OK, Guttenburg banne by jp10558 · · Score: 1

    Yeah, but can't anyone just take the online library text and put it in Gutenberg? I mean, it's public domain content, no one can sue for anything there.

    --
    Opera, Proxomitron-Grypen,GPG 0x0A1C6EE3
  54. Dumb question... by tkrotchko · · Score: 1

    How can you prohibit scanning of PD books?

    --
    You were mistaken. Which is odd, since memory shouldn't be a problem for you
  55. Re:University of Calif: Yahoo OK, Guttenburg banne by esme · · Score: 2, Interesting
    At the same time, UC libraries prohibit scanning for Project Gutenberg or other true "open" content projects unless they receive $$$$ in royalities.

    do you have a source for this? do you mean that a UC library tried to stop someone from checking out books and scanning them? or do you mean that they didn't allow the gutenberg folks to setup a scanning shop inside a library? there's a huge difference between those two.

    i work at a UC library, and i've certainly never heard of any policies about project gutenberg. i'm not sure what kind of arrangements yahoo made, where the scanning is going to happen, etc. but i would imagine that yahoo agreed to (at least) cover the expense and hassle of any library facilities they're going to be using. project gutenberg might not have that kind of funding.

    this is all assuming that this was involving public domain books, where the only leverage that UC libraries would have would be their facilities and lending policies. if you're talking about stuff that UC owns the copyright to, then that would be another kettle of fish. it would not surprise me to learn that a campus counsel or some such wouldn't let a library give away rights to content that UC held the rights to (like a library's special collections holdings).

    -esme

  56. Yahoo does not compete by baomike · · Score: 1

    They can't even keep their message boards and finance services running. I doubt that they will be any more successful at this.
    Google is safe.
    I give 'em three years and they're toast.

    1. Re:Yahoo does not compete by Anonymous Coward · · Score: 0

      And if that doesn't happen you will roll naked in SanF streets next to googleplex. Deal! Thanks, I will get the lawyer and make it formal.

    2. Re:Yahoo does not compete by baomike · · Score: 1

      Ok, 4 years. As for Sfran, I was there in the 60's and that was quite enough. Was fun tho.

  57. Bad Redundancy by Kattana · · Score: 1

    What a waste of time. Now we have 3 projects doing the same thing; can they not think of something original to do? Now we effectively have 1 FOSS collection of books and 2 redundant proprietary collections because they could not work together. Oh no, then we might have all this scanning done 3 times faster. Someone needs to scan them a book to teach them cooperation.
    No amount of computer resources can make up for human inefficiency.

  58. That's a lot of scanning by icepick72 · · Score: 1

    they must be using rooms full of trained squirrels.

  59. Ever hear of a printer? by B4RSK · · Score: 1

    There is this great new invention called a printer! You can use it to turn on-screen documents into printed hard copy!

    But wait, there's more! There are even ones that will print on both sides of the paper, and will automatically print two pages onto one side! So you can get 4 pages onto one A4 sheet, thus having text about the same size as a paperback! Put a couple of binding clips on one side and you have an instant book.

    More seriously though... Besides the fact that it is both cheaper and nearly instant, you can easily replace the book if it is lost or damaged, and you can just give it away to someone who expresses interest and later print yourself another copy. Plus you still have the electronic version on your computer, so if you want to search for a certain passage you can do this instantly without having to flip pages.

    --
    Some people are like slinkies--basically useless but they bring a smile to your face when pushed down the stairs.
  60. Instead of scanning, why not useful tools? by Leobinus · · Score: 1

    Instead of just providing the texts, why don't Yahoo or Google go in-depth with a collection of texts, and provide insights on them? Open Source Shakespeare is an example of what I mean. There's an automatically-generated concordance, you can look at all of a character's lines at once, and see statistics about the plays, etc. Those are actual research tools. Being able to search a text is useful, but you could do that in 1975. I would be more impressed if these big search companies figured out how to do something more useful.
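
    For anyone wondering what an automatically-generated concordance involves, here is a minimal sketch. It is not how Open Source Shakespeare does it; the function name and the two sample lines are just for illustration, and the word-splitting rule is a simplifying assumption.

```python
import re
from collections import defaultdict

def build_concordance(lines):
    """Map each lowercased word to the 1-based line numbers where it occurs."""
    index = defaultdict(list)
    for number, line in enumerate(lines, start=1):
        # set() so a word repeated within one line is recorded only once
        for word in set(re.findall(r"[a-z']+", line.lower())):
            index[word].append(number)
    return dict(index)

sample = [
    "To be, or not to be, that is the question:",
    "Whether 'tis nobler in the mind to suffer",
]
concordance = build_concordance(sample)
print(concordance["to"])  # line numbers that contain the word "to"
```

    A real tool would also need per-character and per-play metadata, which is where the actual research value comes from.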

  61. File format issues by harmonica · · Score: 1

    Also, the fact that they are plain text, with no markup, formatting, binary code, whatever in them means that they'll always be accessible to anyone, regardless of software or platform. And that's a good thing, too!

    I know about the problems that old file formats can cause. However, I doubt that formats like PDF or JPG will ever get "lost". There's just too much information stored in them, and various free libraries available with source code which read and write them.

    And if I'm wrong I won't live to see it. :)

    1. Re:File format issues by shellbeach · · Score: 1

      I know about the problems that old file formats can cause. However, I doubt that formats like PDF or JPG will ever get "lost".

      My point was that since the emphasis is included in the file, you could always convert it to a nicely formatted PDF if you wanted to. In fact, I used to do almost exactly that a while back - I wrote a Perl script to convert etexts to RTF and peanut markup language, and it worked pretty nicely. Keeping things at the lowest common denominator level isn't always a bad thing...

      Personally, I wish PG would use basic html markup for italics and bold formatting, rather than plain text. But I'd rather cope with a lack of formatting than having the texts only in binary format.
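
      As a rough sketch of the kind of conversion described above (in Python rather than Perl, and not the original script), the snippet below assumes that emphasis in the plain text is marked with _underscores_ and rewrites it as basic HTML italics. The function name is made up for the example.

```python
import html
import re

def emphasis_to_html(text: str) -> str:
    """Escape text for HTML, then turn _word_ spans into <i>word</i>."""
    escaped = html.escape(text, quote=False)
    # Assumption: emphasis never spans a line break or contains an underscore.
    return re.sub(r"_([^_\n]+)_", r"<i>\1</i>", escaped)

print(emphasis_to_html("It was _not_ a dream."))
```

      Going the other way (HTML back to plain text) is just as mechanical, which is the argument for keeping the markup minimal.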

  62. i have heard of these "printer" inventions, yes. by megify · · Score: 1

    that's why I said "the pages are conveniently sized and bound."

    What I thought was self-evident was that I meant: "and I don't have to print, fold, and somehow bind them together on my own, because it's time-consuming and 1/3 an ink cartridge and a ream of paper is more expensive than three dollars."

    You can also just give books to people....sometimes they even like them better than hand-made pre-owned sheaves of inkjet prints.

    BUT I do think Ctrl+F is a way better way to find your favorite passage - especially in a long document.

  63. Re:i have heard of these "printer" inventions, yes by B4RSK · · Score: 2, Interesting

    I do see your points as well, and definitely there will be demand for commercially produced books for some time to come.

    However, what I described does not require any folding, and binding takes all of about 10 seconds. I've done this more than a few times and it does work out well.

    I have a Brother laser printer that cost about US$300. I bought this printer for other reasons, but it is a great book printer too. (Has a duplexer, supports both PCL6 and PS3, built-in standard 10/100 LAN port. Basically it will work on any OS that supports PS or PCL6.)

    Anyway, it prints duplexed pages at about 16ppm and the toner is cheap. The Windows driver also lets me easily (one click) print two pages onto one side of a sheet. The result of all this is that I can print a 300 page book perfectly in under 5 minutes using only 75 sheets of A4 paper. I then apply two of those triangular binding clips (the ones with the fold in handles), and it's done!

    Total cost of around US$1 including the clips, and total time of about 5 minutes. It's not as pretty as a bound paperback but I'm willing to trade that off for the instant availability and the ability to reprint again any time if needed.

    (The fact that I live in Japan definitely plays some role in my choice. English books here are very expensive and only available from major downtown bookstores -- and even then selection is pretty limited. Ordering from Amazon Japan (or US/UK) is possible, but the shipping increases the prices and takes time. A $1 five minute book is a dream!)
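
    The sheet count quoted above is easy to check. A small sketch, assuming two pages per side and duplex printing as described; the function name is just for illustration:

```python
import math

def sheets_needed(pages: int, pages_per_side: int = 2, duplex: bool = True) -> int:
    """Return how many sheets of paper a print job uses."""
    sides = math.ceil(pages / pages_per_side)       # printed sides required
    return math.ceil(sides / 2) if duplex else sides

print(sheets_needed(300))  # 300 pages, 2-up, duplexed
```

    For a 300 page book that works out to 150 printed sides on 75 sheets.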

    --
    Some people are like slinkies--basically useless but they bring a smile to your face when pushed down the stairs.
  64. get your points - but by megify · · Score: 1

    for starters - I get your points. Secondly, please don't condescend. I know a lot about printers, book design, paper sizes and weights, and printing time/issues with downloads/OS compatibility/ripping a large doc through a small printer/ETC. However, my preference is still for books.

    I live in the U.S, and for what it's worth, it's still WAY cheaper to buy a $3 book. Better still to go to the library and get a pile of books for FREE.

    Right now I'm reading "Lonesome Dove." 950 pages. even at 2 pages per side, that's about 240 duplexed sheets.

    so: downloading a 950-page book, sending a 950-page document through a printer, and clipping 1/2 a ream of paper together would be at your rate at least 20 minutes. and have you tried to carry a binder-clipped sheaf of A4 paper around? it's awkward. granted, you could grab 20 pages at a time, but I'd rather just have the book and be able to go back and look things up or re-read them.

    so for my 30 minutes of time (minimum), lack of printer, not wanting to carry around a half ream of paper, availability of free or cheap paperbacks - I'll stick to commercially printed books. plus, that way if the author is still living, that means they are more likely to get a cut. If I spend $5 on a book, and the author gets $1 or 50 cents, I feel it's better than just spending $5 on printing supplies and binder clips.

  65. Re:Project Gutenberg (Michael Hart essay) by gbnewby · · Score: 2, Informative

    Here's something Michael Hart wrote about this today. He's
    the founder of Project Gutenberg, and inventor of eBooks.
        -- Greg

    Yet another consortium of multi-billion dollar institutions
    has thrown its hat into the eBook/eLibrary ring today, just
    9 months before the 35th Anniversary of Project Gutenberg's
    placement on the Internet of the first eLibrary element, on
    July 4th, 1971.

    Last December 14th Google used a multi-million dollar blitz
    of television, radio and print media to announce the Google
    Print revolution: "Today is the day the world changes," but
    so far it has been difficult to get even a handful of books
    from their project, some 10 months later.

    I am wondering if the news media will give the same kind of
    coverage to a second such announcement, which will also put
    up an alliance of an Internet search engine giant with some
    multi-billion dollar libraries. I will be watching all the
    news programs tonight in eager anticipation, as I was doing
    last December, but I fear that "once burned/twice cautious"
    might take some of the wind out of their sails/sales.

    However, this effort has one huge advantage: "The Internet
    Archive," run by my friend Brewster Kahle. Brewster is one
    person who has a proven ability to put an enormous resource
    on the Internet for the whole wide world to use.

    This difference is such that I am willing to bet that Yahoo!
    gets off to a better start in the next 10 months than did a
    rather completely false start by Google.

    Of course, the real test will be to see how long it takes a
    project such as this to reach a million eBooks, since there
    are already well over 100,000 eBooks already available free
    for the taking on various Internet sites, perhaps 50,000 of
    them from the various Project Gutenberg sites.

    Here's a hope that a few years from now anyone can have the
    advantage of a million book home library, and in even a few
    years more to ten million books sitting on one inch of your
    own bookshelf next to your computer.

    Michael S. Hart
    Founder
    Project Gutenberg

  66. More expensive books? by Grendel+Drago · · Score: 2, Interesting

    Huh? Where are you from? I worked at a research library at a large state university, and I have no idea what you're talking about. True, libraries pay extortionate rates for journal subscriptions, but when they purchase monographs, they frequently get them off the used book market, just like you or I would. It costs them extra to get it bound in a durable fashion, and to enter it into their Byzantine catalog system, but I've never, ever heard of libraries having to pay extra for books simply because they were libraries.

    Also, ongoing royalties? What country does that happen in? I've never heard of such a thing.

    --
    Laws do not persuade just because they threaten. --Seneca
  67. Oh, it's not quite the same as PG. by Grendel+Drago · · Score: 1

    Project Gutenberg doesn't just scan books. Actually, they take a lot of their scans from outside sources, like the Internet Archive's Million Book Project. The work that PG does is largely in proofreading and essentially re-typesetting the book. The output of Yahoo!'s work here will be scads of page images, maybe with dicey OCR. The output of PG's work is plaintext (and sometimes HTML) ebooks.

    As a fan of Project Gutenberg, I look forward to more page images being made available, since it means more high-quality eBook scans to choose from when picking a project.

    I wonder if they'll be hosting anything in Canada, where the copyrights are only Life+50, instead of Life+70 (Europe) or worse (USA). Last I knew, someone was trying to start up PG Canada...

    --
    Laws do not persuade just because they threaten. --Seneca
    1. Re:Oh, it's not quite the same as PG. by Anonymous Coward · · Score: 0

      I wonder if they'll be hosting anything in Canada, where the copyrights are only Life+50, instead of Life+70 (Europe) or worse (USA). Last I knew, someone was trying to start up PG Canada...

      Project Gutenberg of Australia posts some books that may violate U.S. copyrights.

  68. Right you are! See TEI. by Grendel+Drago · · Score: 2, Interesting

    Indeed. It's bothered me for some time now that it takes a good deal of doing to make a nice LaTeX edition of the book, so that it's nontrivial to go from the eBook to a really high-quality printed page.

    Luckily, someone's decided to do something about it. See PGTEI, a very verbose and flexible method for marking up literary works. The full TEI spec is gargantuan, so PGTEI is actually a dialect of a subset called TEI Lite. It's an XML markup scheme which has output filters (it uses XSLT, it seems) for plain vanilla TXT (for longevity, and on general principle), HTML and PDF. (Probably some others as well.)

    You can try it out yourself. Grab some examples, and run them through the online tools.

    Post-processors are very set in their ways, but as I've recently joined their ranks, I hope to use PGTEI for my first post-production job. It certainly seems more elegant than generating and tweaking multiple formats by hand.

    --
    Laws do not persuade just because they threaten. --Seneca
  69. Gutenberg is more than book-scanning. by Grendel+Drago · · Score: 1

    Project Gutenberg does a lot more than scan books. (Actually, they frequently don't actually scan the books themselves; projects like the Million Book Project do that.) The value that PG provides is in the proofreading and formatting of their eBooks. That said, any massive scanning project which provides page images for PG to pick up is quite a good thing.

    --
    Laws do not persuade just because they threaten. --Seneca
  70. Different scope. by Grendel+Drago · · Score: 1

    Project Gutenberg does proofreading and postproduction, which requires a lot more human eyeballs than scanning a lot of pages. While this archive may be tremendously useful to PG by providing raw material, it's not a duplication of effort.

    --
    Laws do not persuade just because they threaten. --Seneca
  71. Condescending? by B4RSK · · Score: 1

    My first reply poked some fun, but I don't think I have been condescending.

    I'm not sure where you are getting the $3 books from unless they are used, "stripped", review copies, or unauthorized print runs. In any of those cases the author is not getting a cut. "Lonesome Dove" is $7.99 on Amazon + shipping.

    I support authors as well -- I certainly buy (more than) my share of books. I print some too though, mostly because of the need to get the information immediately. If the book is particularly good and I feel I will need it again I will usually pass the printed copy along to someone else and buy a copy for myself. I rarely read fiction these days -- non-fiction is far more important to consume during my limited available time.

    To each their own. As I said before, my geographic location plays a part in my choices too. There are many excellent points about life in Japan (especially if you are not a teacher here), but the easy availability of non-fiction English books is not one of them.

    --
    Some people are like slinkies--basically useless but they bring a smile to your face when pushed down the stairs.
  72. You're in luck! by Grendel+Drago · · Score: 1

    Assuming the work was written only by American citizens:

    Actually, if no one renewed the copyright (renewal became automatic for works published in 1964 or later), it may be public domain. Read the new and improved Rule 6 HOWTO that the fine folks at Project Gutenberg have put together. You can put together a reasonable case that copyright was not renewed, and heck, maybe you could get PG to pick up the book.

    Or you could move to Canada and wait until January 1, 2013, when the author's work will enter the public domain there. (Life+50, and all.)

    Out of curiosity, what was the book and the author?

    --
    Laws do not persuade just because they threaten. --Seneca
  73. Awesome, indeed! by Grendel+Drago · · Score: 2, Interesting

    I remember seeing some of Dudeney's puzzles referred to before, but I couldn't remember where. Then the book popped up on my RSS feed (it was released within the last month, I think), and indeed, it was full of fun math puzzles. Man, that was nice.

    But they don't just have HTML; see various examples of files released with filetype "TEI", including PDF (through LaTeX), TXT (in a variety of encodings, i.e. Latin-1, US-ASCII and UTF-8) and HTML.

    --
    Laws do not persuade just because they threaten. --Seneca
  74. More like... by Grendel+Drago · · Score: 1

    It's more like the Million Books project. Project Gutenberg does a lot more than just scan the books; they proofread and post-produce them.

    --
    Laws do not persuade just because they threaten. --Seneca
  75. Re:The difference between Google and Yahoo's effor by Anonymous Coward · · Score: 0

    When Google starts getting real legal problems, they will probably change that. Google may think they can do everything they want because of their name, but they are going to be in for a surprise once they get popular enough and the publishers start legal action. Google will have no choice but to remove copyrighted material to avoid this. They're not stupid. The result will be a stripped down service.

  76. Physical owner of PD book controls its use by dananderson · · Score: 1

    This is possible if the book is rare and the owner has physical custody. For libraries, this is usually through a controlled-access "special collection" area. They can and do prohibit scanning or transcribing of books, even if PD. They can require signing a legal agreement (license) with any terms they like, such as requiring royalties or restricting further distribution.

  77. Erosion of Public Domain--not just Disney and RIAA by dananderson · · Score: 2, Informative
    The physical owner of a PD book (library) can prohibit scanning or even viewing. For modern books, it's not a problem--just go to another library. For some books it is a problem. Few copies exist, and they are scattered around the world.

    The library can require a legal agreement to view or scan the book, and that is where a lawsuit can occur. Of course, the legal agreement doesn't apply to 3rd parties that haven't signed. It's another example of the erosion of the public domain--it's not just Disney and the music industry that's doing it folks--it's the University of California and other libraries.

  78. Good by Anonymous Coward · · Score: 0

    Google is leading, Yahoo is following, they will never be number one. :)
    But it is good that Yahoo do this, because Google was criticised for doing it, but now Yahoo does it too.

    Now it is easier to get hands on information, and there is no wear and tear on digital formats like in books, so data can be preserved longer (hopefully).

    Wikibooks also has free content. http://wikibooks.org/
    "Think free. Learn free."

    Also http://wikisource.org/

    Hope this really sparks Free/Open Content!

  79. University of California locks away public domain by dananderson · · Score: 2, Interesting
    The source is my personal experience with the UCSD, UCI, and UCLA libraries. I assume the other UCs have the same or similar policy against digitizing books. Gutenberg is not a corporation, it's private individuals (volunteers). It's usually one guy (or gal) with a scanner, OCR software, and a little bit of time to proofread.

    would not surprise me to learn that a campus counsel or some such wouldn't let a library give away rights to content that UC held the rights to (like a library's special collections holdings)

    So in other words, the public domain is locked away. The PD consists of OLD books, which are largely in special collections.

    Here are some policies I dug up. It's worse than the policy though. They say write a letter explaining your needs and they ignore you.

  80. Google as the only publisher by take5 · · Score: 1

    It is rather obvious that Google is not trying to rip off authors.
    Rather, it is trying to send the publishers to the trash bin of
    history. Two years from now, if I have a book to publish, I can make a deal
    with Google to put it online and they will do on demand printing and shipping for me.
    Then I sit back and cash my royalties check every 6 months.
    Out of print books will generate revenue this way also. As it
    turns out, 99.8% of titles ever printed are currently out of print and
    unprofitable for another print run. The Authors Guild Backinprint.com
    has only 8,000 authors and is profitable, imagine the money to be made with
    millions of authors.

  81. Userfriendly sums it up pretty well.. by Cabby · · Score: 1
  82. Re:University of California locks away public doma by esme · · Score: 1

    the UCSD policy you cite says:

    Permission to quote is normally freely given, as is the permission to reproduce text or images for such noncommercial use as illustrating a thesis or a dissertation. The Mandeville Special Collections Library assesses a fee for the publication of reproductions for commercial purposes.

    which sounds to me like a non-commercial project like gutenberg would probably not have to pay the access fees. the other UCSD policy mostly talks about limiting duplication because it stresses the rare and original materials that the special collection is setup to preserve.

    it doesn't surprise me that gutenberg didn't fall into their ordinary categories (commercial or academic). so any request to copy an entire volume would probably result in an internal discussion about how the special collections materials are preserved, accessed, possibilities for digitization, etc. in particular, librarians are very wary of digitizing things piecemeal without sustainable plans for organizing and maintaining them afterwards. (because it usually leads to doing the same work over again later).

    if you gave me a list of materials you wanted to scan, i could talk to some people about what plans are being developed to digitize things. especially if the materials were things that only UCSD had, we could probably give priority to digitizing the materials where people had expressed actual interest in the content.

    -esme

  83. eh, or i was reading into your comments too much. by megify · · Score: 1

    I thought you were... but to make this entirely argumentative is more than i care to do.

    cover price on lonesome dove is 5.95 - there's a bookstore around here that sells new books at 20% off cover, and used books for 1/2 cover price. 950 awesome pages = $3.

  84. Re:Erosion of Public Domain--not just Disney and R by jp10558 · · Score: 1

    Sorry, I wasn't clear. Say they let Yahoo scan the books, because Yahoo decides to pay for the OCA. Can't anyone just copy or OCR the Yahoo PDF or whatever to gutenberg, as the text is public domain?

    --
    Opera, Proxomitron-Grypen,GPG 0x0A1C6EE3
  85. Re:University of Calif: Yahoo OK, Guttenburg banne by sootman · · Score: 1

    I find it funny (in an ironic way only) that the University of California is allowing its public domain books to be scanned by Yahoo. At the same time, UC libraries prohibit scanning for Project Gutenberg or other true "open" content projects unless they receive $$$$ in royalties... libraries are supposed to be public resources...

    University library != public library.

    --
    Dear Slashdot: next time you want to mess with the site, add a rich-text editor for comments.
  86. Re:University of Calif: Yahoo OK, Guttenburg banne by cant_get_a_good_nick · · Score: 1

    University library != public library
    Library at a public state university that receives state and federal money to be a public institution == ?

  87. Re:University of Calif: Yahoo OK, Guttenburg banne by sootman · · Score: 1

    Being funded by tax dollars does not make an institution 100% open. Try bringing some friends to your local IRS office for a picnic in the lobby. Or maybe your local police station--hell, it says "to protect AND SERVE" right on the cars, right? Walk in there and ask one of them to bring you a drink. Or hell, see how far you can get at a nearby military base. They've got lots of cool stuff there, and you're paying for it, right?

    Uni libraries are usually restricted to those who pay $$$ to attend said uni. Some will let anyone in, but most won't let anyone but students and faculty check out materials. Those materials need to be there for the students. It wouldn't do to let someone come along and check out 10,000 volumes. And even if it did, who's gonna pay the attendant to check out all those books? And keep track of which are where? And get them back after a certain time? The world doesn't run on nothing. There are administrative costs and lots of other things involved. Just because you can check out a book for free doesn't mean you can check out a thousand at once. Things like that don't scale. Walk into a McDonald's, buy an iced tea, and put some sugar in it. Works, right? Now ask the lady at the counter for all the sugar in the place. Different result, right?

    --
    Dear Slashdot: next time you want to mess with the site, add a rich-text editor for comments.
  88. Oh, that's not all they violate. by Grendel+Drago · · Score: 1

    Thanks to a strong possibility that Australia will adopt Life+50, PG Australia's works may have to be moved to New Zealand or Canada. Hence PG Canada. (Though the site design is currently horrible. What's wrong with adapting gutenberg.org or pge.rastko.net? gutenberg.org.au made this same mistake and I'll never understand it.)

    Of course, you know that whoever inherited A.A. Milne's estate will be fighting tooth and nail to get Canada to up their terms prior to January 1, 2007, when Winnie the Pooh will enter the public domain. (Same for C.S. Lewis and January 1, 2014, and J.R.R. Tolkien and January 1, 2024.)
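
    For anyone who wants to check the Life+50 arithmetic, here is a rough sketch. It assumes the term runs through the end of the calendar year of the author's death plus the stated number of years, so a work enters the public domain on the following January 1. The function name is made up for illustration, and Milne's death year (1956) is supplied by me rather than taken from the post above.

```python
def public_domain_year(death_year: int, term_years: int = 50) -> int:
    """Return the year whose January 1 a work enters the public domain.

    Assumption: protection lasts through December 31 of
    (death_year + term_years), so the work is free the next January 1.
    """
    return death_year + term_years + 1


# A.A. Milne died in 1956; under Life+50 that gives January 1, 2007.
print(public_domain_year(1956))
```

    Pass a different term_years value (say 70) to see how much a term extension would push the date back.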

    --
    Laws do not persuade just because they threaten. --Seneca
  89. Clarifying. by Grendel+Drago · · Score: 1

    Err, to clarify a bit, TEI is a source format which can automatically generate PDF, TXT (in various encodings), HTML and so forth. If a text has been released as TEI, then it will almost certainly be available in all of those formats.

    --
    Laws do not persuade just because they threaten. --Seneca
  90. Re:Project Gutenberg (Michael Hart essay) by bbc · · Score: 1

    1. s/the different is/the difference is/

    2. Disclaimer: a G. Newby works for the Project Gutenberg Literary Archive Foundation.

    3. It is not too hard to predict this consortium will fare better, as one of its members, the Internet Archive, has been collecting scans of books in its Million Books Project and Canadian Libraries archive for months, and is thus able to make a running start.

    (Disclaimer, I am a PG volunteer.)

  91. Re:But will they digitize PD works from after 1922 by bbc · · Score: 1

    "My favorite book was published in 1956, and the author died just 7 years later. He had no offspring and he outlived his wife. Now would someone please explain to me why someone was allowed to extend the copyright and why the work isn't yet in the public domain?"

    Without knowing which book you are talking about, it is difficult to give an answer.

    Generally though, when somebody dies, there is an heir. That heir would then hold any copyrights the deceased may have owned.

    If you are talking about Heaven and Hell by Aldous Huxley or either of The Last Battle and Till We Have Faces by C.S. Lewis, a complicating matter may be that these authors were not American. Unless they published these books also in the US, a different set of rules applies that basically comes down to Life+75.

    I doubt that the estates of Huxley and Lewis would not have renewed these works, although of course you never know. I seem to have read that Project Gutenberg is producing one of Marion Zimmer Bradley's works that wasn't registered or renewed. Science fiction may yield good finds anyway, because SF stories were often published in magazines, and the author may have transferred rights, or may have forgotten to register a copyright.

  92. Re:But will they digitize PD works from after 1922 by Shamashmuddamiq · · Score: 1

    Wow! Yes, I was talking about "Till We Have Faces" by Lewis. I checked the book and the copyright has been renewed. So that means the copyright expires in 2038 (well, for now...) That's a long time.

    --
    ...just my 2 gil.