Slashdot Mirror


Yahoo Competes with Google in Book Scanning

UltimaGuy writes "A consortium backed by Yahoo has launched an ambitious effort to digitize classic books and technical papers and make them freely available on the Web. The company is partnering with the newly formed Open Content Alliance, which aims to offer PDF documents of books to the public at no charge. Consumers will be able to search the contents of the Open Content Alliance's database and download the entire content of any work, such as a scanned copy of a book."

48 of 193 comments

  1. RIAA Problems Solved by GreggyBUIUC · · Score: 5, Funny

    Someone start up an "Open Content Alliance" for music... then we can digitize and share it all we want.

  2. Will Yahoo scan it like they have yahoo.com? by Anonymous Coward · · Score: 5, Funny

    I can't wait to read the whole book on one page.

  3. no mention of project gutenberg by justforaday · · Score: 3, Insightful

    I find it interesting that in all the articles I've looked at today about this that only one has mentioned Project Gutenberg. Naturally, I can't recall which source it was...

    --
    I'll turn into a supernova and burn up everything. Well I'll turn into a black little hole and you'll turn into string.
  4. What a concept. by Anonymous Coward · · Score: 5, Informative

    I liked the idea the first time I heard it - back when it was called Project Gutenberg. :P

  5. What do these guys know... by dada21 · · Score: 5, Interesting

    ...that we don't?

    It seems to me that they're throwing money at an unnecessary application. Does Yahoo know something that we don't? I'd venture that they're starting with PD books to shake the bugs out of their platform so the app works well in round 2.

    Round 2 (current commercial books) won't occur without a massive copyright law change or the support of the Authors Guild.

    Hmm.

  6. Project Gutenberg by timeToy · · Score: 5, Informative

    16k ebooks to choose from today, more to come, no Google, no Yahoo.
    http://www.gutenberg.org/

    1. Re:Project Gutenberg by harmonica · · Score: 4, Interesting

      More books are a good thing. Having a scanned PDF version includes graphics as well, which are missing from Gutenberg ebooks. So I see this as a very positive development.

    2. Re:Project Gutenberg by timeToy · · Score: 4, Informative

      It depends; some books do carry graphics, for instance the Slashdot-friendly "Amusements in Mathematics" by Henry Ernest Dudeney, 1917:
      http://www.gutenberg.org/etext/16713 The zipped HTML version carries all the original drawings.

    3. Re:Project Gutenberg by Infinityis · · Score: 2, Funny

      Well this is a problem waiting to get solved. Why don't they incorporate image-to-ASCIIart software so we can get high-quality images from these books?

    4. Re:Project Gutenberg by shellbeach · · Score: 3, Interesting

      Project Gutenberg is great and all, but there's something to be said for some effort made at presentation. Sometimes italics are a good thing.

      It's not a great solution, but emphasis _is_ preserved in the etexts, just like that. Or occasionally like THIS ... Pity there's no consistency, but for most texts it works well enough.

      Also, the fact that they are plain text, with no markup, formatting, binary code, whatever in them means that they'll always be accessible to anyone, regardless of software or platform. And that's a good thing, too!

  7. Whew! by op12 · · Score: 4, Interesting

    I almost panicked after seeing we had gone so long without a Google-related article.

    The opt-in rather than opt-out strategy is really what Google probably should have done, but it'll be interesting to see who comes out as a winner, Yahoo or Google, in all of this.

  8. But will they digitize PD works from after 1922? by Anonymous Coward · · Score: 5, Informative

    In the US, books published after 1922 can still be public domain if the author was American, it was originally published in the US, and the copyright was not extended at the end of the original copyright period. Google Library does not seem to be making an exception for this, will OCA? Project Gutenberg does.

  9. Not really an up-stage by ChocoBean · · Score: 4, Informative

    Actually this won't "Upstage" google in any way.

    FTA:
    all the content will be made available so it can be indexed by all the other major search engines, including Google's

    Yahoo is just going to scan, scan and scan. We all already prefer Google's indexing, searching and cleaner interface, so the only thing Yahoo! will accomplish by this is to help Google Print along, shielding it from all (other) copyright lawsuits. Once the stuff is online, we all know that Google-bots will be all over it "like a fly on a pile of very seductive manure (Zapp)"

    Excellent.

    I just hope publishers realise that in this case neither Google nor Yahoo is trying to be their best friend.

  10. What about China? by DAldredge · · Score: 3, Interesting

    Will Yahoo provide sorted or unsorted lists of the books that China's Internet users view to the thugs that run China?

  11. The difference between Google and Yahoo's effort by doctor_no · · Score: 4, Insightful

    Seems like the crucial difference between Google's efforts and the OCA (Open Content Alliance) is that Google has an "opt-out" policy for copyrighted material, while the OCA specifically requires the copyright holder to contact them and essentially allow them to use the material.

    The OCA likely won't be sued by the Authors Guild as Google was. For searching, however, Google will likely be better, since its index will likely include a huge amount of copyrighted material, legal or not. Also, it seems that Google itself will be allowed to use all the material from the OCA in its project as well.

  12. Companies should Get Original by TarrySingh · · Score: 2, Insightful

    Why can't companies come up with some cooler ideas? Why ape each other? First Google and then Yahoo. Sure, MS will also want to play.

    --
    Scott McNealy to Michael: "Suck my Sun!" Michael Dell to Scott : "Lick my Dell!"
  13. NOT competing by daniil · · Score: 4, Informative

    There's a slight difference between an 'Internet-based library' and 'searching inside books'.

    --
    Man is a slave because freedom is difficult, whereas slavery is easy.
  14. Re:Why PDF? by david+duncan+scott · · Score: 4, Informative
    10 years down the road when everything is in PDF format, who's to stop them from charging us to view material in their format?

    The fact that it's an open, documented format?

    Adobe has made their money the old-fashioned way, by making tools that work well, rather than by locking people into a format. GhostScript, among others, will read those PDFs with or without Adobe.

    --

    This next song is very sad. Please clap along. -- Robin Zander

  15. Apples and Oranges! This is not Google Print! by merreborn · · Score: 4, Informative

    Google Print's goal is to allow people to search book content, WITHOUT giving them the content of the book.

    For example, searching "Zoroastrianism" would return a list of book titles on the subject, and links to purchase the books in question. You CANNOT download the content of the book!

    The OCA (The group Yahoo just joined) is an opt-in, full content hosting project.

    Searching "Zoroastrianism" would return a (much smaller) list of books, with the *full* content of the book available for download with the explicit consent of the publisher/author!

  16. Sad thing about Yahoo though by totallygeek · · Score: 2, Interesting

    You will be reading the content to Moby Dick on Yahoo and in the top right it will say, "content provided by Google."

  17. University of Calif: Yahoo OK, Gutenberg banned by dananderson · · Score: 5, Interesting
    I find it funny (in an ironic way only) that the University of California is allowing its public domain books to be scanned by Yahoo. At the same time, UC libraries prohibit scanning for Project Gutenberg or other true "open" content projects unless they receive $$$$ in royalties.

    I hate to see a University pander to commercial interests, while at the same time, welcome commercial interests such as Yahoo. Money talks, and I'm sure UC is being paid a lot, but libraries are supposed to be public resources too, not exclusive profit-centers :-(.

  18. Re:Annoying by ScentCone · · Score: 2, Insightful

    I am getting tired of the big internet companies straight up copying each other.

    Should we turn to you to tell us which provider of each major online activity is the one we should all use? Even if the differences are incremental and subtle, I'm glad when I get to choose between Yahoo's and Google's take on a particular app/service. I'm also glad that Audi and Toyota and GM and Honda all have different ideas on cars... even though someone else built one once already. Come on - not every service offered is going to be wholly unique, and shouldn't be. It's competition - for eyeballs, brand loyalty, etc. Same reason there are a zillion Linux distros, even though many overlap. Everyone's got their own idea of what would make it just a little bit better.

    --
    Don't disappoint your bird dog. Go to the range.
  19. Re:Dupe by Nuttles1 · · Score: 4, Funny

    You must not be a true /.er because you know that if you were you would read up on every bit of documentation about anything that we do....Like how we always RTFA...errr....wait, scratch that

  20. PDF?! yuck by BillHop · · Score: 2, Insightful

    Does anyone else find there is no way to read a PDF with the scroll buttons (mouse wheel, etc.) without the viewer constantly breaking your flow by jumping to the next page?

    This goes along with the concept that for an electronic format, I do NOT need a sentence (or even worse, hyphenated word) broken up by two inches of top and bottom margin filled with page numbers, miscellaneous watermarks, repetitive titles, etc.

    PS. This being flamebait does not make it false.

  21. Bookripper on its way? by serutan · · Score: 4, Interesting

    Google maintains its scanning represents "fair use" allowed under the law because it only allows Web surfers to view excerpts from copyrighted books.


    Soon after Google Mail was introduced, somebody created a SourceForge project that lets you use Google Mail as a database. How long until somebody releases a "Bookripper" app that assembles a whole book from search extracts? As I understand it Google displays two pages at a time (or wait, that's Amazon, but I bet they're similar). All you would need to know is a quote from a book's first page as a seed, and you should be able to grab the whole book by doing a series of searches using text from the second page returned by each search. The trick would be to knit the pieces together and eliminate the overlapping text. Seems almost trivial. Another possibility would be to search for random words and look for overlaps between the results, assembling them like a linear jigsaw puzzle until there are no gaps.
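
    For what it's worth, the "knit the pieces together" step might look roughly like the sketch below. It assumes the excerpts are already plain strings in reading order, and the function names (overlap_len, stitch) are made up for illustration; nothing here talks to Google or Amazon, and the sample text is only a stand-in.

```python
def overlap_len(left: str, right: str, min_overlap: int = 3) -> int:
    """Length of the longest suffix of `left` that is also a prefix of `right`."""
    for size in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def stitch(fragments: list[str], min_overlap: int = 3) -> str:
    """Join fragments given in reading order, dropping the duplicated overlap."""
    if not fragments:
        return ""
    text = fragments[0]
    for frag in fragments[1:]:
        n = overlap_len(text, frag, min_overlap)
        if n == 0:
            # No shared text means there is a gap between excerpts.
            raise ValueError("no overlap found; fragments leave a gap")
        text += frag[n:]
    return text


# Hypothetical excerpts standing in for successive search results.
pages = [
    "Call me Ishmael. Some years ago",
    "Some years ago, never mind how long",
    "how long precisely, having little",
]
print(stitch(pages))
```

    The random-word variant would additionally need to work out the order of the fragments before stitching, which this sketch does not attempt.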

    1. Re:Bookripper on its way? by gasaraki · · Score: 2, Informative

      It's already been done. The guy was sent a 'please stop doing this' letter by Google if I recall, which I think he went along with. No formal suit or anything, but they didn't like it. I'll be damned if I can remember the link, I think there was a K5 story or two on it though.

    2. Re:Bookripper on its way? by Dan+East · · Score: 2, Informative

      According to Google, there are specific portions of each book that it will never show, making it impossible to harvest an entire book.

      I'm already logged in. Why are you telling me the page is unavailable?

      As part of our efforts to protect a book's copyright, a set of pages in every in-copyright book will be unavailable to all users.


      http://print.google.com/googleprint/help.html#page limit

      Dan East

      --
      Better known as 318230.
  22. "Do no Evil" done right by Chunni+Babu · · Score: 5, Insightful

    Now this is the right step towards making book contents searchable online. I would hate to see one company like Google copying and caching all books in its massive cluster of servers. I know that the Google kool-aid that "we are about the general good" runs deep in the veins of slashdot types.

    Since when was scanning books from libraries and making them available to the public for a profit considered "fair use"? This kind of stuff is done by pirates. Go to the major cities in China and India and you will see piles of copied books in the streets, all sold for 1/10th the original price without giving anything back to the authors. The pirates can say that they are doing a favor to the authors by driving them out of obscurity.

    The message the alliance is sending out to the authors is

    • we are not for profit
    • we will scan your book only if you want us to do so
    • your book will be indexed based on your approval and copyright agreement with you and the publishers
    Compare this to what Google is telling the authors
    • we will scan your book, fill a form and tell us if you don't want us to do so
    • we will take sale commissions from amazon, buy.com, bn.com, etc. without sharing anything with you
    • if we show ads, we will share the profits with you
    • we will show excerpts of your book, so if a researcher is researching on a topic he can find what you have written about a topic without ever having to buy your book, too bad, heh heh, write a fiction book dude
    • we will cache your book in our servers and only we will reserve the right to profit from your scanned book
    So much for do no evil. Kudos to yahoo for bringing the open content alliance, gutenberg, and other similar projects to limelight - these are some really nice collections that were hidden by the noise created by 'google print'.
    1. Re:"Do no Evil" done right by nursegirl · · Score: 2, Insightful

      Compare this to what Google is telling the authors
      * we will show excerpts of your book, so if a researcher is researching on a topic he can find what you have written about a topic without ever having to buy your book, too bad, heh heh, write a fiction book dude

      Except that Google only shows 2-3 sentences of books that are under copyright. I've never found a researcher that can write on a topic by only reading 2 sentences. It's only posters on /. that can claim expertise on a topic without actually learning anything about it.

    2. Re:"Do no Evil" done right by Jeff+DeMaagd · · Score: 2, Informative

      Since when was scanning books from libraries and making them available to the public for a profit considered "fair use"?

      It's not. You are mischaracterizing Google's system. The problem with your claim is that Google's system doesn't make the book available to users to download, it is only a search method that points to the relevant books and provides short excerpts like their search engine does. Google won't provide the book or even whole page without the copyright owner's permission. My impression is that Google was just trying to make an improved card catalog.

      we will take sale commissions from amazon, buy.com, bn.com, etc. without sharing anything with you

      The sale of the book meant that the author got their share of the money.

      we will show excerpts of your book, so if a researcher is researching on a topic he can find what you have written about a topic without ever having to buy your book, too bad, heh heh, write a fiction book dude

      The researcher could just go to the local research library, no books purchased. Another problem is that the research would be horribly flawed, given that the descriptions are so short and the allowed excerpts only cover certain fixed pages.

    3. Re:"Do no Evil" done right by Anonymous Coward · · Score: 3, Insightful

      Since when was scanning books from libraries and making them available to the public for a profit considered "fair use"?

      How disingenuous. Google Print shows only a snippet of the text and tells you how to buy the book if it seems like what you need. Not pages, not paragraphs - a couple of sentences. In fact, Google Print instantly returns pretty much what you'd get if you hired a researcher to go find X number of books with such and such text and the researcher prepared a paper with a short quote from each. Such a paper would be unquestionably fair use and could be published anywhere. Google Print merely automates that process and makes it instant. I have no special fetish for Google; anybody who builds a system like this is doing us all a favor: it's a 21st century version of a card catalog, and a huge win for readers and authors. It's only being fought because, in our sue-happy culture, fair use rights have been eroded so much and copyright protections have been expanded so far that people seem to believe that even the most trivial use of their work - in a futuristic card catalog, for example - should bring a pay day. It's another case of cutting off your nose to spite your face.

      we will scan your book, fill a form and tell us if you don't want us to do so

      Which is, of course, exactly the model that Google and every single other search engine on the web has used since day one: Yahoo, AltaVista, everybody. It's the only sane way to make the web indexable. If it's not copyright violation on the web, then it's not copyright violation in print. Bringing that sort of searchable index to the history of printed material is a huge win for everybody, including authors. If courts eventually rule that this is copyright violation, then let's all say goodbye to the usefulness of Google and every single other decent search engine in the history of the web. Which would be a damn shame, but not surprising considering how twisted and lopsided against the public the bargain of copyright has become.

    4. Re:"Do no Evil" done right by _Sprocket_ · · Score: 2, Informative
      Since when was scanning books from libraries and making them available to the public for a profit considered "fair use"?

      Since when is Google doing this? As others have pointed out, Google provides a portion of the work to give the search context - 3 pages. In another post, you claim that 3 pages is enough information to invalidate the sale of a book. If this is the case, I would have to seriously question the value of your work. Either that - or take a serious look at public libraries, private loaning, Amazon.com, book stores, and other avenues of viewing those precious 3 pages that apparently cost you sales.

      It might be worth noting that no case of "fair use" is clear. Court cases often contradict each other, so there are no clear precedents to follow. However, among common factors potentially in Google's favor is that they:
      1. Provide additional insight into the work(s)
      2. Provide a service to the public, in many cases providing facts and information
      3. Provide a limited subset of the work
      4. Are not making offensive use of the work


      What may not factor in Google's favor include:
      1. Limited modification of the original work
      2. Potential damage to the market for the work - providing that someone such as yourself can prove that 3 pages is damaging.
      3. Google's behavior may be interpreted as hostile and offend the Court


      Having said that - I'm not a lawyer. But then, even experts are occasionally shocked at the outcomes of these cases.

      It might be worth noting that fair use does not require notification or permission of the copyright holder. Nor does it require that the one invoking fair use not make a profit.
      we will take sale commissions from amazon, buy.com, bn.com, etc. without sharing anything with you

      When do authors currently get a cut of sale commissions?
      we will show excerpts of your book, so if a researcher is researching on a topic he can find what you have written about a topic without ever having to buy your book, too bad, heh heh, write a fiction book dude

      Again - this might stand up in court. Possibly. But note that most examples of this having weight tend to involve images and songs - not books. It may be difficult to prove 3 pages as damaging for a work as large as a book - especially if the damaging material is a fact.
      Kudos to yahoo for bringing the open content alliance, gutenberg, and other similar projects to limelight - these are some really nice collections that were hidden by the noise created by 'google print'.

      Kudos to Yahoo for coming up with something different to do. But I missed it where the OCA or Yahoo even makes mention of Project Gutenberg. Furthermore, I find it a hard stretch to claim that the "noise created by 'google print'" did anything more to obscure Project Gutenberg than Yahoo's project.
  23. This is huge. IA beat Google and Yahoo to this... by Anonymous Coward · · Score: 4, Insightful


    I've read through the first few posts, and people really don't have a clue about what this is all about. "Open Content Alliance"... It means what it says. Open f'ing content. Let there be content available to the masses... Is it more important that I can get a snippet from some copyrighted text, or that millions of children can read Alice in Wonderland with all its wonderful illustrations?

    This is beyond PDF or anything like that. Some people want PDF, so Adobe will make them. Some people want decent OCR versions, perhaps to go into Distributed Proofreaders or into someone's text-only PDA. It's ALL possible. This is NOT an exclusive club, it's an INCLUSIVE community that is dedicated to Open f'ing Content.

    Why don't you people get it? By allowing people to have full texts of some of humanity's greatest works we are doing more than a few snippets of the latest Ken Follett novel... a lot more.

    It's bigger than Yahoo or Google. Yahoo is NOT an also-ran.... The Internet Archive has been scanning books and hosting Million Book Project texts as well as Project Gutenberg texts for a long time... long before Yahoo or even Google were in the picture. Ignorant comments made here suggest somehow Yahoo is following.

    I say Yahoo is leading by embracing a project that by definition is bigger than themselves. Good for them.

  24. Re:its to see... by twiddlingbits · · Score: 3, Insightful

    PDFs of "public domain" or donated works will always be available. Amazon has gotten enough sh*t about the excerpts that they publish to entice the reader to buy the book. Google "e-book" and you'll see Yahoo! is nowhere near the only source. There is even an open-source e-book idea at Open eBook - http://www.openebook.org/ -- Information on the publication specification for electronic books that will allow compatibility between different e-book devices.

    I just wonder how Yahoo! will make $$$ off this very small market of public domain works, or, if they DO get repro rights to other books, what the price model is to download them. Or will you just see advertisements in your e-books? The authors are not going to give up their $$$, nor is Yahoo, so somebody is going to have to pay for this content.

  25. Re:PDF?! yuck by Fiver- · · Score: 4, Informative

    "Does anyone else find there is no way to read a PDF with the scroll buttons..."

    No. I just set it to Continuous. See those four icons in the lower right corner? (assuming you've got a recent version) Play with those. You want the second button from the left

    "This goes along with the concept that for an electronic format, I do NOT need a sentence (or even worse, hyphenated word) broken up by two inches of top and bottom margin filled with page numbers, miscellaneous watermarks, repetitive titles, etc."

    Well, the whole purpose of PDF is to "preserve the look and integrity of your original documents ... regardless of the application and platform used to create it." Blame the creators of that particular pdf file if you don't like the headers, footers and margin size. When I make pdf books to read on the train...I just finished Dream Quest of Unknown Kadath by Lovecraft...I open the original ascii text file in Word, make the top & bottom margins tiny, change the font to something tolerable and export it.

  26. Re:But will they digitize PD works from after 1922 by thisissilly · · Score: 2, Informative
    In the US, that is only true of works published after 1978.

    When U.S. works pass into the Public Domain is a good summary of the U.S. issues.

    Me, I just want 14+14 back.

  27. New and Radical by Corydon76 · · Score: 3, Funny

    Hey, wow, that is completely original. Nobody else could have possibly thought of this idea before.

  28. best format? by j1m+5n0w · · Score: 2, Interesting

    Actually, I prefer plain txt to pdf if I'm reading from a computer (assuming the book is not illustrated), since I have more control over fonts and colors (and I have read quite a few gutenberg books that way). However, I think the best native format (despite its general user-unfriendliness) would be latex, from which txt, pdf, and html could be generated. On the other hand, I suppose it's much easier to generate txt or pdf from scanned pages than latex.

  29. Re:Annoying by Moofie · · Score: 3, Informative

    "very few new features come out"

    Have you seen Google Earth?

    How about the disaster wiki that went together in about 20 minutes, where people were posting status reports of New Orleans properties?

    I think you're damning with faint praise. Google, at least, consistently builds superb offerings, and the price is right. Not quite sure what you're grousing about...

    --
    Why yes, I AM a rocket scientist!
  30. Re:PDF Isn't Proprietary by amliebsch · · Score: 2
    --
    If you don't know where you are going, you will wind up somewhere else.
  31. Re:University of Calif: Yahoo OK, Gutenberg banne by esme · · Score: 2, Interesting
    At the same time, UC libraries prohibit scanning for Project Gutenberg or other true "open" content projects unless they receive $$$$ in royalties.

    do you have a source for this? do you mean that a UC library tried to stop someone from checking out books and scanning them? or do you mean that they didn't allow the gutenberg folks to setup a scanning shop inside a library? there's a huge difference between those two.

    i work at a UC library, and i've certainly never heard of any policies about project gutenberg. i'm not sure what kind of arrangements yahoo made, where the scanning is going to happen, etc. but i would imagine that yahoo agreed to (at least) cover the expense and hassle of any library facilities they're going to be using. project gutenberg might not have that kind of funding.

    this is all assuming that this was involving public domain books, where the only leverage that UC libraries would have would be their facilities and lending policies. if you're talking about stuff that UC owns the copyright to, then that would be another kettle of fish. it would not surprise me to learn that a campus counsel or some such wouldn't let a library give away rights to content that UC held the rights to (like a library's special collections holdings).

    -esme

  32. Re:i have heard of these "printer" inventions, yes by B4RSK · · Score: 2, Interesting

    I do see your points as well, and definitely there will be demand for commercially produced books for some time to come.

    However, what I described does not require any folding and binding takes all of about 10 seconds. I've done this more than a few times and it does work out well.

    I have a Brother laser printer that cost about US$300. I bought this printer for other reasons, but it is a great book printer too. (Has a duplexer, supports both PCL6 and PS3, built-in standard 10/100 LAN port. Basically it will work on any OS that supports PS or PCL6.)

    Anyway, it prints duplexed pages at about 16ppm and the toner is cheap. The Windows driver also lets me easily (one click) print two pages onto one side of a sheet. The result of all this is that I can print a 300 page book perfectly in under 5 minutes using only 75 sheets of A4 paper. I then apply two of those triangular binding clips (the ones with the fold in handles), and it's done!

    Total cost of around US$1 including the clips, and total time of about 5 minutes. It's not as pretty as a bound paperback but I'm willing to trade that off for the instant availability and the ability to reprint again any time if needed.

    (The fact that I live in Japan definitely plays some role in my choice. English books here are very expensive and only available from major downtown bookstores -- and even then selection is pretty limited. Ordering from Amazon Japan (or US/UK) is possible, but the shipping increases the prices and takes time. A $1 five minute book is a dream!)

    --
    Some people are like slinkies--basically useless but they bring a smile to your face when pushed down the stairs.
  33. Re:Project Gutenberg (Michael Hart essay) by gbnewby · · Score: 2, Informative

    Here's something Michael Hart wrote about this today. He's
    the founder of Project Gutenberg, and inventor of eBooks.
        -- Greg

    Yet another consortium of multi-billion dollar institutions
    has thrown its hat into the eBook/eLibrary ring today, just
    9 months before the 35th Anniversary of Project Gutenberg's
    placement on the Internet of the first eLibrary element, on
    July 4th, 1971.

    Last December 14th Google used a multi-million dollar blitz
    of television, radio and print media to announce the Google
    Print revolution: "Today is the day the world changes," but
    so far it has been difficult to get even a handful of books
    from their project, some 10 months later.

    I am wondering if the news media will give the same kind of
    coverage to a second such announcement, which will also put
    up an alliance of an Internet search engine giant with some
    multi-billion dollar libraries. I will be watching all the
    news programs tonight in eager anticipation, as I was doing
    last December, but I fear that "once burned/twice cautious"
    might take some of the wind out of their sails/sales.

    However, this effort has one huge advantage: "The Internet
    Archive," run by my friend Brewster Kahle. Brewster is one
    person who has a proven ability to put an enormous resource
    on the Internet for the whole wide world to use.

    This difference is such that I am willing to bet that Yahoo!
    gets off to a better start in the next 10 months than did a
    rather completely false start by Google.

    Of course, the real test will be to see how long it takes a
    project such as this to reach a million eBooks, since there
    are already well over 100,000 eBooks already available free
    for the taking on various Internet sites, perhaps 50,000 of
    them from the various Project Gutenberg sites.

    Here's a hope that a few years from now anyone can have the
    advantage of a million book home library, and in even a few
    years more to ten million books sitting on one inch of your
    own bookshelf next to your computer.

    Michael S. Hart
    Founder
    Project Gutenberg

  34. More expensive books? by Grendel+Drago · · Score: 2, Interesting

    Huh? Where are you from? I worked at a research library at a large state university, and I have no idea what you're talking about. True, libraries pay extortionate rates for journal subscriptions, but when they purchase monographs, they frequently get them off the used book market, just like you or I would. It costs them extra to get it bound in a durable fashion, and to enter it into their Byzantine catalog system, but I've never, ever heard of libraries having to pay extra for books simply because they were libraries.

    Also, ongoing royalties? What country does that happen in? I've never heard of such a thing.

    --
    Laws do not persuade just because they threaten. --Seneca
  35. Right you are! See TEI. by Grendel+Drago · · Score: 2, Interesting

    Indeed. It's bothered me for some time now that it takes a good deal of doing to make a nice LaTeX edition of the book, so that it's nontrivial to go from the eBook to a really high-quality printed page.

    Luckily, someone's decided to do something about it. See PGTEI, a very verbose and flexible method for marking up literary works. The full TEI spec is gargantuan, so PGTEI is actually a dialect of a subset called TEI Lite. It's an XML markup scheme which has output filters (it uses XSLT, it seems) for plain vanilla TXT (for longevity, and on general principle), HTML and PDF. (Probably some others as well.)

    You can try it out yourself. Grab some examples, and run them through the online tools.

    Post-processors are very set in their ways, but as I've recently joined their ranks, I hope to use PGTEI for my first post-production job. It certainly seems more elegant than generating and tweaking multiple formats by hand.

    --
    Laws do not persuade just because they threaten. --Seneca
  36. Awesome, indeed! by Grendel+Drago · · Score: 2, Interesting

    I remember seeing some of Dudeney's puzzles referred to before, but I couldn't remember where. Then the book popped up on my RSS feed (it was released within the last month, I think), and indeed, it was full of fun math puzzles. Man, that was nice.

    But they don't just have HTML; see various examples of files released with filetype "TEI", including PDF (through LaTeX), TXT (in a variety of encodings, i.e. Latin-1, US-ASCII and UTF-8) and HTML.

    --
    Laws do not persuade just because they threaten. --Seneca
  37. Erosion of Public Domain--not just Disney and RIAA by dananderson · · Score: 2, Informative
    The physical owner of a PD book (library) can prohibit scanning or even viewing. For modern books, it's not a problem--just go to another library. For some books it is a problem. Few copies exist, and they are scattered around the world.

    The library can require a legal agreement to view or scan the book, and that is where a lawsuit can occur. Of course, the legal agreement doesn't apply to 3rd parties that haven't signed. It's another example of the erosion of the public domain--it's not just Disney and the music industry that's doing it folks--it's the University of California and other libraries.

  38. University of California locks away public domain by dananderson · · Score: 2, Interesting
    The source is my personal experience with the UCSD, UCI, and UCLA libraries. I assume the other UCs have the same or similar policy against digitizing books. Gutenberg is not a corporation, it's private individuals (volunteers). It's usually one guy (or gal) with a scanner, OCR software, and a little bit of time to proofread.

    would not surprise me to learn that a campus counsel or some such wouldn't let a library give away rights to content that UC held the rights to (like a library's special collections holdings)

    So in other words, the public domain is locked away. The PD consists of OLD books, which are largely in special collections.

    Here are some policies I dug up. It's worse than the policy though. They say write a letter explaining your needs and they ignore you.