Slashdot Mirror


Carnegie Mellon's Digital Library Exceeds 1.5 Million Books

cashman73 writes "Most Slashdot readers are probably familiar with Google's book scanning project, a collaboration with several major universities to digitize works of literature, art, and science. But Google may have been beaten to the punch this time -- about a decade ago, Carnegie Mellon University embarked on a project to scan books into digital format, to be made available online. According to new reports, they now have a collection of 1.5 million books, the equivalent of a typical university library, available online."

21 of 119 comments

  1. Link here by autophile · · Score: 5, Informative

    http://tera-3.ul.cs.cmu.edu/

    --
    Towards the Singularity.
    1. Re:Link here by Rebelgecko · · Score: 4, Informative
      If you're looking for the Mac or Linux versions of the plugin, try rereading the part of the page that says

      To see the book pages of ULIB, please dowload free TIFF plugin or DjVu plugin
      Then try following the link to the DjVu plugin and downloading the Windows, Mac or Unix one, depending on what you need. They're available here.
      --
      CATS/Diebold '08- All your vote are belong to us!
    2. Re:Link here by himanshuarora · · Score: 2, Informative

      Another link here

      http://dli.iiit.ac.in/

      --
      Spam: Any activity on internet to gain popularity without paying to advertising companies like Google.
  2. Nice to have alternatives by MrAndrews · · Score: 5, Informative

    This site (which is found at ulib.org BTW) seems to have a pretty good collection of obvious titles to choose from, though having to download a custom plug-in to read anything is a bit annoying (and apparently temporary). I played around for a while, seeing what I could dig up, and didn't see any obvious gaps (though I purposely avoided anything modern).

    As an author, I was always a bit worried about having Google as the sole gatekeeper for this kind of service... not that I necessarily distrust Google's intentions, but if they changed their worldview one day, it'd be a pity to have so much work invested in only one place and have to rebuild it all somewhere else. It's nice that there are proper choices, and not all from a commercial stance either.

    I don't know how smooth the integration process is (I submitted one of my books, but it appears it's a very un-automated system involving email etc, so it will probably take a while to see results). But still, I'm glad they're giving authors a way to help grow the library. Here's hoping it becomes even better than its promise!

    1. Re:Nice to have alternatives by Kadin2048 · · Score: 3, Informative

      Also, there are more than twice as many books in Chinese as in English ... I guess I should brush up on my Mandarin, if this is where the world's headed.

      --
      "Ladies and gentlemen, my killbot features Lotus Notes and a machine gun. It is the finest available."
  3. Re:Yay! by Joe Tie. · · Score: 4, Insightful

    Traditional libraries are long dead in a pretty significant percentage of the US. I live in a fairly large city, and it's pretty much useless for anything but the level of book one would expect high school students to need. No real database access, no journals, very little in the way of primary sources for anything. It's all novels, magazines, newspapers, "subject X for dummies", and out of date encyclopedias. The wireless access there has been useful at times, but that's about it. You don't get a good library without a public willing to put in the requisite money, and fewer and fewer people are.

    --
    Everything will be taken away from you.
  4. search engine by HemmingSay · · Score: 5, Funny

    I really like the idea of online libraries, but I had to laugh when I got the following result for the first book that came to mind: "Please provide a valid query (Word greater than length 3)". The book was "The Old Man and the Sea".

  5. Re:Digitize our history with slave labor? by truesaer · · Score: 3, Insightful
    > In the FA it stated that most of the digitization was done in India and China. Low wage poverty-level workers, how dandy. Am I the only one who found it odd/sad that "we" digitized our knowledge with uneducated, underpaid slave labor? Maybe they were allowed to read some books and get educated? Nah.


    In case you haven't noticed, the economies of India and China are booming...in large part because of the offshoring/outsourcing from more developed countries. The wages and employment opportunities only get better in India and China due to projects like this.

  6. Re:Yay2! by jmorris42 · · Score: 2

    > Those books are still copyrighted, the publisher won't sell you a copy, yet they
    > want to deny everyone access to it.

    They have to follow the law so I forgive them on books under copyright. But they don't appear to even want to make it easy to access complete copies of books that are out of copyright. You can write them and ask for a full copy of a book. Bah. And no easy way to mirror the site (even just the out of copyright material) either.

    Our library already hosts a Project Gutenberg mirror. Some back-of-the-envelope math says we would need to bulk up the RAID somewhat to take even the public-domain English content from this project, since it is all TIFFs, but it would be something we would consider if it were easy (rsync) and the content were in a form that would actually be, ya know, USEFUL!

    --
    Democrat delenda est
  7. They use a Captcha to validate the scanned words by chipasd · · Score: 3, Informative

    For those that missed the articles about C.M.'s associated project for validating all those scanned words on all those scanned pages: http://recaptcha.net/

    reCAPTCHA improves the process of digitizing books by sending words that cannot be read by computers to the Web in the form of CAPTCHAs for humans to decipher. More specifically, each word that cannot be read correctly by OCR is placed on an image and used as a CAPTCHA.

  8. need a visualisation by ross.w · · Score: 3, Funny

    So how many Libraries of Congress is that?

    --
    If my call is important, why am I talking to a recording?
  9. Re:There isnt a great collection there really by theMerovingian · · Score: 3, Insightful


    Copyright law in the US started out pretty reasonable: 14 years from the date of registration, renewable once for another 14. Walt Disney spent a lot of money and lobbied the government for another 20 year period. Before this could expire, they lobbied to have copyright terms extended to the life of the author plus 50 years. As a result of the Sonny Bono Act, it was expanded to the life of the author plus 70 years. (NOTE: this is a very brief approximation of US copyright law history - it was actually somewhat more complex than this and with several more twists and turns). See here for a detailed explanation.

    The functional result of this lobbying is that no US copyrighted work published since 1923 has lapsed into the public domain (unless the owner screwed up by not renewing the copyright at the appropriate juncture).

    --
    "If you think you have things under control, you're not going fast enough." --Mario Andretti
  10. Guess they couldn't afford proofreaders. by liftphreaker · · Score: 5, Informative

    I picked a book at random, Dickens' A Tale of Two Cities. Here are the first few lines:

    "TIT was the best of tunes, it was the worst of times,..."

    "li was tie winter of despair, we had everything before us,..."

    I guess they just OCR'd books en masse without proofreading. Oh well, think of it as an exercise for your brain.

  11. Libraries Are Not Dying by EEPROMS · · Score: 2, Interesting

    The definition of a library is just changing. When you look at a small Internet cafe, what you are really seeing is the modern version of a library that also caters for those who want some refreshments. If the old Dickensian hard-copy libraries want to survive, they will have to become more communal and socially active. Yes, that means having network access and a place for young people to talk. While you have them captive, you can promote books with posters on the walls, seminars and social events. It's time libraries stopped hiding behind dusty books and started becoming public social spaces where people can exchange ideas, which is what libraries originally were, way back in the days of the Great Library of Alexandria in ancient Egypt.

  12. Re:Any reckless venture capitalists in here tonight by SpaceWanderer · · Score: 2, Informative

    Already been done. Check this site: http://www.teach12.com/store/courses.asp?t=&sl=&s=905&sbj=Literature%20and%20English%20Language&fMode=s I've listened to some of their recordings and they were pretty good.

  13. Re:Well... by agrippa_cash · · Score: 4, Insightful

    You are mistaken, and for this you should be glad. It often takes several years for masterpieces to be recognized as such, so it shouldn't surprise you that nothing you like has been acclaimed. I'm not a high culture joe myself, so please don't be offended, but today's high culture may be incomprehensible to you because you aren't sophisticated enough to appreciate it. If you grow up watching Fantasia, it is easier to enjoy Stravinsky. As for originality, the tale is in the telling. People of years past lived and died much as we do, with a bit more fresh air and hard work maybe, but basically the same. Basically. They were us first; what are you going to do? Culturally we are far, far ahead of the 1907 crowd. Your image of 1899 is almost certainly based on the western upper class (listening to Wagner) rather than the teeming western poor (listening to minstrel shows) or the uncountable colonized listening to whips, maxim guns, pickaxes and sermons.

  14. Re:Digitize our history with slave labor? by Bwana Geek · · Score: 2, Informative

    Sure, most of the digitization was done in China... but the vast majority of the books on the site are Chinese, too. Of the 1.5 million books in the collection, almost 1 million of them are Chinese. English accounts for most of the rest at 362,508 books.

  15. Heck, I think I might have that many... by aminorex · · Score: 2, Informative

    1.5 million books? Ok, maybe my tastes are a bit more focussed on mathematics, physics, programming, economics, and linguistics than would be the CMU library, but I just burned 3 DVDs worth of math books alone, 12GB of PDF, at roughly 8MB/title, for 1500 titles. And that was just one week's worth of crap filtering for one man. Methinks CMU isn't really trying.

    --
    -I like my women like I like my tea: green-
  16. Re:Well... by Transtrek · · Score: 2, Insightful

    Also worth asking: are you willing to learn 2000+ year old Greek to read Euclid, or Latin (the language of scholarship in his time) to read Euler? One reason we have and use more modern math textbooks is that language and notation change over time. It is also often the case that the original proof is far from the best one found since, because later works developed more structure that allows either condensing it or taking a novel approach. If you limit yourself to pre-1900 works, you throw out the vast majority of graph theory, losing all contributions by Erdős, Kuratowski, Tutte, Ramsey, etc. Sorry, there are areas of math that need at least up to the 1950s to get major theorems.

  17. Re:Any reckless venture capitalists in here tonight by phantomfive · · Score: 4, Insightful

    > As an average, educated male, I hate being in a discussion with someone who name-drops a book I never heard of before, as a proof that my point is invalid because I am not well read enough. It's the ultimate bitch-slap of the intellectual boxing

    If something like that stops you, then you totally need to work on your technique. What you have there is a clear and you fell for it even though it was ONLY IMPLIED.

    If someone comes up and says, "oh, this book clearly proves my point" then you can easily come back with, "Interesting. What does it say?" And you're off again, arguing the truth against real facts. Don't let them escape by saying, "oh, it's complicated." Respond, "it's ok, I have time. Please explain."

    The point is, make your goal to find out the truth, and you will always win. Don't defend ideas anymore once you know them to be false. Switch over as soon as you know you are wrong, and you will always be right. Not to mention switching drives your opponent batty.
    --
    Qxe4
  18. Re:Yay! by dlevitan · · Score: 3, Insightful

    > Traditional libraries are long dead in a pretty significant percentage of the US. I live in a fairly large city, and it's pretty much useless for anything but the level of book one would expect high school students to need. No real database access, no journals, very little in the way of primary sources for anything. It's all novels, magazines, newspapers, "subject X for dummies", and out of date encyclopedias. The wireless access there has been useful at times, but that's about it. You don't get a good library without a public willing to put in the requisite money, and fewer and fewer people are.

    How many people actually want journals and technical books? You're talking about a very small portion of the population. The goal of a library is to cater to what people want - and that's mostly basic books about how to do basic things, popular fiction/nonfiction, magazines, newspapers, and basic encyclopedias. There are only two types of people who want access to journals and the like: scientists at companies and universities (who already have it as provided by their employer/school) and the few people who aren't employed in a field they want to learn about. It's not worth thousands of dollars per year per journal for a library to subscribe to even one journal when only two people will ever read it.

    If you really want access, then you have to pay up and/or take the extra time to find somewhere you can get them for free.

    First, in my field (astrophysics) most articles are now e-printed or at least opened up after a few years. ApJ (the Astrophysical Journal) offers unrestricted access to all articles older than 3 years, and all articles from before 1996 are available at a free NASA/Harvard site (ADS). So basically, unless you want the absolute latest articles (which for most things you don't need), you can get them for free (and even then usually through arXiv). And if you need the latest article, then, as you said, pay the fee and buy it.

    Second, if you need some kind of technical book, talk to the librarians. Most of them will try to help and you can usually get it for free (or a small fee) through an inter-library loan. It might take a few weeks, but you can definitely do it without even leaving the library.

    Third, take a look at the universities near you. Most allow open access to the stacks and computers. You can spend a whole day reading a book or using the university computers to access journals without paying anything. Some even allow borrowing privileges for free or for a fee. Take a look at Columbia in New York City or UCLA.

    So yes, public libraries don't have journals. They're far from dead though, because they don't serve that need. If you really want those sorts of things, then you need to go out there and get access yourself.
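The reCAPTCHA mechanism described in chipasd's comment ("They use a Captcha to validate the scanned words") can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not reCAPTCHA's actual code: it assumes each challenge pairs a control word whose reading is already known with one word the OCR failed on, and that a user's reading of the unknown word only counts as a vote when the control word is typed correctly. The names `Challenge`, `Transcriber`, `record_answer` and the three-vote agreement threshold are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field

# Hypothetical threshold: how many matching human readings are needed
# before an unknown word's transcription is accepted.
AGREEMENT_THRESHOLD = 3


@dataclass
class Challenge:
    control_word: str  # a word whose correct reading is already known
    unknown_id: str    # identifier of a word the OCR could not read


@dataclass
class Transcriber:
    votes: dict = field(default_factory=dict)     # unknown_id -> Counter of readings
    accepted: dict = field(default_factory=dict)  # unknown_id -> agreed reading

    def record_answer(self, challenge: Challenge, control_answer: str,
                      unknown_answer: str) -> bool:
        """Return True if the user typed the control word correctly.

        The reading of the unknown word is only counted as a vote when
        the control word was right; otherwise it is discarded.
        """
        if control_answer.strip().lower() != challenge.control_word.lower():
            return False  # failed the known word: ignore the guess entirely
        guess = unknown_answer.strip().lower()
        counts = self.votes.setdefault(challenge.unknown_id, Counter())
        counts[guess] += 1
        if counts[guess] >= AGREEMENT_THRESHOLD:
            self.accepted[challenge.unknown_id] = guess
        return True


t = Transcriber()
c = Challenge(control_word="morning", unknown_id="page12-word7")
t.record_answer(c, "morning", "times")
t.record_answer(c, "mourning", "tunes")  # control word wrong: vote discarded
t.record_answer(c, "morning", "times")
t.record_answer(c, "morning", "times")
print(t.accepted)  # {'page12-word7': 'times'}
```

The control word is what lets untrusted users do the proofreading: someone who cannot (or will not) read the known word gets no say in the unknown one, and requiring several agreeing readings filters out the remaining typos.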
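The 1923 cutoff in theMerovingian's copyright comment follows from term arithmetic. A simplified sketch in Python, assuming a US work published with notice between 1923 and 1977 whose copyright was renewed: the 1976 Copyright Act gave such works 75 years from publication, and the 1998 Sonny Bono Copyright Term Extension Act raised that to 95 years. Terms run to the end of the calendar year, so a work enters the public domain on January 1 of the following year. The function names `public_domain_year` and `effective_pd_year` are illustrative only; unpublished works, works from 1978 on, and unrenewed works follow other rules and are not modeled.

```python
# Simplified US rules for works published 1923-1977 with notice and renewal.
PRE_CTEA_TERM = 75   # years from publication under the 1976 Copyright Act
CTEA_TERM = 95       # years from publication after the 1998 extension
CTEA_ENACTED = 1998  # the extension became law in October 1998


def public_domain_year(pub_year: int, term: int) -> int:
    """Year on whose January 1 the work enters the public domain.

    Terms run to the end of the calendar year, so the last protected
    year is pub_year + term and the work is free the following January.
    """
    return pub_year + term + 1


def effective_pd_year(pub_year: int) -> int:
    """Apply the 1998 extension only to works still protected in 1998."""
    old = public_domain_year(pub_year, PRE_CTEA_TERM)
    # A work that already lapsed on January 1, 1998 (or earlier) was not revived.
    if old > CTEA_ENACTED:
        return public_domain_year(pub_year, CTEA_TERM)
    return old


for year in (1922, 1923):
    print(year, public_domain_year(year, PRE_CTEA_TERM), effective_pd_year(year))
# 1922 1998 1998
# 1923 1999 2019
```

Works from 1922 had already lapsed on January 1, 1998, a few months before the extension passed, while works from 1923 were pushed from 1999 to 2019. That is why, at the time of this thread, nothing published in 1923 or later had entered the public domain through term expiry.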