Carnegie Mellon's Digital Library Exceeds 1.5 Million Books

Posted by Zonk on Thursday November 29, 2007 @01:30PM from the might-just-be-enough-to-read dept.

cashman73 writes "Most Slashdot readers are probably familiar with Google's book-scanning project, a collaboration with several major universities to digitize works of literature, art, and science. But Google may have been beaten to the punch this time. About a decade ago, Carnegie Mellon University embarked on a project to scan books into digital format and make them available online. According to new reports, the collection now holds 1.5 million books, the equivalent of a typical university library, all available online."

10 of 119 comments

Link here by autophile · 2007-11-29 13:36 · Score: 5, Informative

http://tera-3.ul.cs.cmu.edu/

--
Towards the Singularity.
1. Re:Link here by Rebelgecko · 2007-11-29 16:09 · Score: 4, Informative
  
  If you're looking for the Mac or Linux versions of the plugin, try rereading the part of the page that says:
  "To see the book pages of ULIB, please download free TIFF plugin or DjVu plugin"
  Then follow the link to the DjVu plugin and download the Windows, Mac, or Unix version, depending on what you need. They're all available from that page.
  
  --
  CATS/Diebold '08- All your vote are belong to us!
2. Re:Link here by himanshuarora · 2007-11-29 16:45 · Score: 2, Informative
  
  Another link here
  
  http://dli.iiit.ac.in/
  
  --
  Spam: Any activity on the internet to gain popularity without paying advertising companies like Google.
Nice to have alternatives by MrAndrews · 2007-11-29 13:37 · Score: 5, Informative

This site (which is found at ulib.org BTW) seems to have a pretty good collection of obvious titles to choose from, though having to download a custom plug-in to read anything is a bit annoying (and apparently temporary). I played around for a while, seeing what I could dig up, and didn't see any obvious gaps (though I purposely avoided anything modern).

As an author, I was always a bit worried about having Google as the sole gatekeeper for this kind of service. Not that I necessarily distrust Google's intentions, but if they changed their worldview one day, it would be a pity to have so much work invested in only one place and have to rebuild it all somewhere else. It's nice that there are real choices, and not all of them from a commercial stance.

I don't know how smooth the integration process is (I submitted one of my books, but it appears to be a very manual system involving email, etc., so it will probably take a while to see results). Still, I'm glad they're giving authors a way to help grow the library. Here's hoping it becomes even better than its promise!

--
The world's only surviving livewriter.
1. Re:Nice to have alternatives by Kadin2048 · 2007-11-29 18:13 · Score: 3, Informative
  
  Also, there are more than twice as many books in Chinese as in English ... I guess I should brush up on my Mandarin, if this is where the world's headed.
  
  --
  "Ladies and gentlemen, my killbot features Lotus Notes and a machine gun. It is the finest available."
They use a Captcha to validate the scanned words by chipasd · 2007-11-29 14:17 · Score: 3, Informative

For those who missed the articles about CMU's associated project for validating all those scanned words on all those scanned pages: http://recaptcha.net/

reCAPTCHA improves the process of digitizing books by sending words that cannot be read by computers to the Web in the form of CAPTCHAs for humans to decipher. More specifically, each word that cannot be read correctly by OCR is placed on an image and used as a CAPTCHA.
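To make the idea concrete, here is a minimal Python sketch of how such a scheme could validate OCR-unreadable words. It rests on assumptions that the quote above doesn't spell out: each challenge pairs the unreadable word with a "control" word whose answer is already known, only answers from users who pass the control count as votes, and a word is accepted once a hypothetical `votes_needed` number of users agree. The class `WordVerifier` and all of its names are illustrative only. They are not reCAPTCHA's actual code or API.

```python
import random
from collections import Counter, defaultdict


class WordVerifier:
    """Hypothetical sketch: pairs an OCR-unreadable word with a known control word."""

    def __init__(self, known_words, unknown_word_ids, votes_needed=3):
        self.known = dict(known_words)       # image id -> verified text (control words)
        self.unknown = set(unknown_word_ids)  # image ids that OCR could not read
        self.votes = defaultdict(Counter)     # image id -> tally of human answers
        self.votes_needed = votes_needed      # assumed agreement threshold
        self.transcribed = {}                 # image id -> accepted transcription

    def make_challenge(self, rng=random):
        # Show one known word and one unknown word; the user can't tell which is which.
        # (Raises IndexError once no unknown words remain; fine for a sketch.)
        control = rng.choice(sorted(self.known))
        unknown = rng.choice(sorted(self.unknown))
        return control, unknown

    def submit(self, control_id, unknown_id, control_answer, unknown_answer):
        # The user passes the CAPTCHA only if the control word is typed correctly.
        if control_answer.strip().lower() != self.known[control_id].lower():
            return False
        # A passing user's reading of the unknown word counts as one vote.
        if unknown_id in self.unknown:
            self.votes[unknown_id][unknown_answer.strip().lower()] += 1
            word, count = self.votes[unknown_id].most_common(1)[0]
            if count >= self.votes_needed:
                # Enough humans agree: accept the word and promote it to a control word.
                self.transcribed[unknown_id] = word
                self.known[unknown_id] = word
                self.unknown.discard(unknown_id)
        return True
```

For example, with `votes_needed=2`, two users who both get the control word right and both read the smudged word as "despair" would settle that word. An answer that fails the control word is simply ignored.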
Guess they couldn't afford proof readers. by liftphreaker · 2007-11-29 14:51 · Score: 5, Informative

I picked a book at random, Dickens' A Tale of Two Cities. Here are the first few lines:

"TIT was the best of tunes, it was the worst of times,..."

"li was tie winter of despair, we had everything before us,..."

I guess they just OCR'd books en masse without proofreading. Oh well, think of it as an exercise for your brain.
Re:Any reckless venture capitalists in here tonigh by SpaceWanderer · 2007-11-29 15:26 · Score: 2, Informative

Already been done. Check this site: http://www.teach12.com/store/courses.asp?t=&sl=&s=905&sbj=Literature%20and%20English%20Language&fMode=s I've listened to some of their recordings and they were pretty good.
Re:Digitize our history with slave labor? by Bwana+Geek · 2007-11-29 15:48 · Score: 2, Informative

Sure, most of the digitization was done in China... but the vast majority of the books on the site are Chinese, too. Of the 1.5 million books in the collection, almost 1 million are Chinese. English accounts for most of the rest, at 362,508 books.
Heck, I think I might have that many... by aminorex · 2007-11-29 17:34 · Score: 2, Informative

1.5 million books? OK, maybe my tastes lean more toward mathematics, physics, programming, economics, and linguistics than the CMU library's do, but I just burned 3 DVDs' worth of math books alone: 12GB of PDF at roughly 8MB per title, for 1,500 titles. And that was just one week's worth of crap filtering for one man. Methinks CMU isn't really trying.

--
-I like my women like I like my tea: green-