Carnegie Mellon's Digital Library Exceeds 1.5 Million Books
cashman73 writes "Most Slashdot readers are probably familiar with Google's book scanning project, a collaboration with several major universities to digitize works of literature, art, and science. But Google may have been beaten to the punch this time -- about a decade ago, Carnegie Mellon University embarked on a project to scan books into digital format, to be made available online. Today, according to new reports, they have a collection of 1.5 million books, the equivalent of a typical university library, available online."
Congratulations on killing traditional libraries, Carnegie Mellon University!
You just got troll'd!
http://tera-3.ul.cs.cmu.edu/
Towards the Singularity.
This site (which is found at ulib.org BTW) seems to have a pretty good collection of obvious titles to choose from, though having to download a custom plug-in to read anything is a bit annoying (and apparently temporary). I played around for a while, seeing what I could dig up, and didn't see any obvious gaps (though I purposely avoided anything modern).
As an author, I was always a bit worried having Google as the sole gatekeeper for this kind of service... not that I necessarily distrust Google's intentions, but if they changed their worldview one day, it'd be a pity to have so much work invested in only one place, and have to re-build it all somewhere else. It's nice that there are proper choices, and not all from a commercial stance either.
I don't know how smooth the integration process is (I submitted one of my books, but it appears it's a very un-automated system involving email etc, so it will probably take a while to see results). But still, I'm glad they're giving authors a way to help grow the library. Here's hoping it becomes even better than its promise!
The world's only surviving livewriter.
We can access them online. Don't need no stinkin' library ticket :-)
If I had an Ass, I'd call it Fanny Bottom, then I could slap my Ass; Fanny Bottom, on the Arse.
Because apparently the Slashdot editors can't be bothered...
http://www.ulib.org/
In the FA it stated that most of the digitization was done in India and China. Low wage poverty-level workers, how dandy. Am I the only one who found it odd/sad that "we" digitized our knowledge with uneducated, underpaid slave labor? Maybe they were allowed to read some books and get educated? Nah.
i really like the idea of online libraries, but i had to laugh when i got the following result for the first book that came to mind: "Please provide a valid query (Word greater than length 3)" the book was "the old man and the sea".
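For anyone wondering why that title trips the error: every word in "the old man and the sea" is exactly three letters long. A minimal sketch of the kind of check the message implies, assuming the site wants at least one word longer than three characters (the function name and the threshold are hypothetical, not taken from ulib.org's code):

```python
def has_long_enough_word(query, min_length=4):
    """Return True if at least one word has min_length or more characters.

    Assumption: this mirrors the "Word greater than length 3" message;
    the real site's validation logic is not public.
    """
    return any(len(word) >= min_length for word in query.split())


# Every word in the Hemingway title is three letters, so the check fails.
hemingway_ok = has_long_enough_word("the old man and the sea")
# "tale" and "cities" are long enough, so this title passes.
dickens_ok = has_long_enough_word("a tale of two cities")
```

A search box that accepted quoted phrases, rather than rejecting queries word by word, would avoid this particular failure.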
I suggested to my company that we use Carnegie Mellon's reCAPTCHA program to solve two problems 1) Improve our CAPTCHA implementation 2) Help Carnegie Mellon with their online publishing initiative. To my pleasant surprise I recently found the company decided to go ahead with reCAPTCHA. Sweet! If you are not familiar then check it out and do some good for everyone! http://recaptcha.net/
For those that missed the articles about C.M.'s associated project for validating all those scanned words on all those scanned pages: http://recaptcha.net/
reCAPTCHA improves the process of digitizing books by sending words that cannot be read by computers to the Web in the form of CAPTCHAs for humans to decipher. More specifically, each word that cannot be read correctly by OCR is placed on an image and used as a CAPTCHA.
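A rough sketch of the idea described above, in Python. Everything here is an assumption made for illustration: the confidence threshold, the `pick_captcha_words` and `AnswerPool` names, and the "accept once enough people agree" rule are hypothetical and not taken from reCAPTCHA's actual implementation.

```python
from collections import Counter, defaultdict


def pick_captcha_words(ocr_words, threshold=0.8):
    """Return ids of words the OCR engine could not read confidently.

    ocr_words: iterable of (word_id, best_guess, confidence) tuples,
    a hypothetical shape for OCR output.
    """
    return [word_id for word_id, _guess, conf in ocr_words if conf < threshold]


class AnswerPool:
    """Collects human transcriptions for each unreadable word."""

    def __init__(self, votes_needed=3):
        self.votes_needed = votes_needed  # assumed consensus rule
        self.votes = defaultdict(Counter)

    def record(self, word_id, typed_text):
        # Normalize so "Times" and " times " count as the same answer.
        self.votes[word_id][typed_text.strip().lower()] += 1

    def resolved(self, word_id):
        """Return the agreed transcription, or None if there is no consensus yet."""
        counter = self.votes.get(word_id)
        if not counter:
            return None
        text, count = counter.most_common(1)[0]
        return text if count >= self.votes_needed else None
```

For example, feeding in words with confidences 0.35, 0.99 and 0.55 would send the first and third out as CAPTCHAs, and a word is only written back into the book text once `resolved` returns something other than None.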
on book from '20s
wow. Universal access pffft
I don't want to be a party-pooper, but I wasn't that impressed with the collection. The latest chemistry books and engineering books were from 1920. A LOT has happened in chemistry and engineering since then. Are they starting with older books and slowly moving to newer ones? The Chinese collection is impressive, but hard to read (unless you know Chinese). Also the plugin only works on windows-based computers. That is sad for me.
I don't think anyone should be impressed by the size of a library. It should only depend on the quality of content.
I'll bet 99% of the content is drivel. Perhaps more.
So how many Libraries of Congress is that?
If my call is important, why am I talking to a recording?
I picked a book at random, Dickens' tale of 2 cities. Here's the first few lines:
"TIT was the best of tunes, it was the worst of times,..."
"li was tie winter of despair, we had everything before us,..."
I guess they just OCR'd books en masse without proofreading. Oh well, think of it as an exercise for your brain.
Am I really the only one so far to complain that this requires a Windows-only plugin? I expected better from you, Slashdot!
The definition of a Library is just changing. When you look at a small Internet cafe, what you are really seeing is the modern version of a Library that also caters for those who wish for some refreshments. If the old Dickensian hard-copy libraries want to survive, they will have to become more communal and socially active. Yes, that means having network access and a place for young people to talk. While you have them captive you can promote books with posters on the walls and seminars and social events. It's time Libraries stopped hiding behind dusty books and started becoming a public social space where people can exchange ideas, you know, what Libraries were originally, way back in the ancient Egyptian days of the Great Library of Alexandria.
Church.
I have an idea. Hear me out...
:-)
Sure, Google is currently losing the scanning wars, but they'll catch up. Someone else may join the race, and eventually there will be a single collection containing 1 BILLION books. Sure, I like to read. I also suspect other people like to read too, but who has the fucking time to read 1 BILLION books? As an average, educated male, I hate being in a discussion with someone who name-drops a book I never heard of before, as proof that my point is invalid because I am not well read enough. It's the ultimate bitch-slap of intellectual boxing. Wouldn't it be great to excuse yourself to the bathroom, where you would use your smart phone/mp3 player to download such a book in a sound format, and within five minutes be able to get the gist of the text, and then come back for round two of the discussion, where you can even drop quotes from the previously name-dropped text?
No, no, no, I'm not talking about classic 'books on tape' as read by William Shatner or Olsen Twins. I'm talking about a sound file ripping off the Googlebooks (or whomever) summaries as read by cattle auctioneers!
No one can prove that you used another company's scanned work to create these book-casts. You will not be infringing copyright as long as the original scanned source isn't infringing copyright. We are all tight on time. Have you had the time to read all of Dostoyevsky? Imagine each one of these files being no longer than 5 minutes... Now imagine reading all of Hemingway in the time it takes to cut your front lawn... Can you? I can! Imagine!
I'm not joking here. I definitely see this race going that way eventually. Why not be the first one there? We just need a dozen cattle-fair hosts, gallons and gallons of Mountain Dew and we are set. As an ultimate FU to Google, we can even have the whole operation partially funded by AdSense.
So, kind millionaire, won't you lend me a blank cheque or two?
I lived in a small town and I agree that the small local library was a better version of the public school library, but ...
Now that I live in Portland, Oregon, this library kicks ass. 1,000s of DVDs. 1,000s of CDs. Why download songs when you can painlessly check them out from the library? I have a massive collection now.
Same with DVDs. It wasn't until my job changed and I stopped going downtown that Netflix became viable.
Plus they have a lovely web interface. I just search the library from my house. Put the items on hold, and a day or two later (depending on demand) the item shows up at my local library.
Even the suburb libraries (i.e. those in a different system) are pretty good.
90% of books available today are not worth the paper they are printed on; most of the rest contain nothing original. It seems like almost all that is worth reading or studying was written before 1900. What masterpieces has the 21st century offered so far? A New Kind of Science? Is that all we, the intelligent species, are capable of now?
... You get the point.
It seems that, culturally, we are way behind compared to where we were a hundred years ago. Want to learn geometry? Read Euclid. He wrote his books thousands of years ago. Calculus? Euler is your best teacher, and has been since the 1700s. Fiction? Music? Architecture?
A lot of these books would languish in obscurity, only to be touched by very few people. Now the information is available via search, which means even more useful information can be had and these lost "works" might finally serve the purpose they were meant to serve...to educate the masses.
Printed books have their place, and so does the digital library. The quality of our information depends on how easily it can be accessed. A report written from 3 sources the old way might benefit from having 100 sources that were quickly and efficiently found digitally.
Poor countries will benefit from this digitization the most. A country's government might not be able to afford to build a library and buy 1.5 million books, but now it doesn't have to.
Bearded Dragon
I've seen this story on slashdot before.
If you are a student and don't have a lot of money to buy your books (mostly in third-world countries), you can find all your college textbooks in there. I think it is a way better library than Google and others, obviously because of the copyrighted material. But anyway, in those countries you would have photocopied the book. Or maybe because when you want to legally buy a book you find out that it can't be shipped to your country.
Already been done. Check this site: http://www.teach12.com/store/courses.asp?t=&sl=&s=905&sbj=Literature%20and%20English%20Language&fMode=s I've listened to some of their recordings and they were pretty good.
Have you considered visting Senator Larry Craig?
What was once true, is no longer so
No viewers for Linux / Firefox, and the website feedback gives:
Not Found
The requested URL /cgi-bin/udlcgi/ULIBCopyrightreport2.cgi was not found on this server.
Apache/2.0.55 (Ubuntu) mod_perl/2.0.2 Perl/v5.8.7 Server at tera-3.ul.cs.cmu.edu Port 80
"90% of books available today are not worth the paper they are printed on; most of the rest contain nothing original."
That Shakespeare guy was always ripping off Plutarch and Plautus, you know, all the truly great writers.
Seriously, though, considering we're barely eight years into the 21st century, why don't you give it a little time? And give the 20th century some credit, too-- we had a nice range of original writers, from Damon Runyon to Flannery O'Connor to Wallace Stevens to Tom Stoppard. And for that matter, what makes you think that 90% of books written before 1900 were at all worth reading? Go find some Bulwer-Lytton or some Paracelsus and try to get through that tripe. Most books are bad, but you ain't got access to most books.
1.5 million books? Ok, maybe my tastes are a bit more focussed on mathematics, physics, programming, economics, and linguistics than would be the CMU library, but I just burned 3 DVDs worth of math books alone, 12GB of PDF, at roughly 8MB/title, for 1500 titles. And that was just one week's worth of crap filtering for one man. Methinks CMU isn't really trying.
-I like my women like I like my tea: green-
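The figures in the math-books comment above do add up, and the same arithmetic gives a feel for the scale of 1.5 million titles. A small sketch, assuming the commenter's rough 8MB average held for the whole CMU collection (it may well not, since scanned page images and PDFs differ in size) and using decimal units:

```python
MB_PER_TITLE = 8          # commenter's rough average per title
PERSONAL_TITLES = 1500    # commenter's count
CMU_TITLES = 1_500_000    # figure from the story summary


def total_gb(titles, mb_per_title=MB_PER_TITLE):
    """Total size in decimal gigabytes (1 GB = 1000 MB)."""
    return titles * mb_per_title / 1000


personal_gb = total_gb(PERSONAL_TITLES)  # the commenter's 12GB
cmu_gb = total_gb(CMU_TITLES)            # hypothetical extrapolation only
```

At that average, 1500 titles come to 12 GB, matching the comment, while 1.5 million titles would be on the order of 12,000 GB, or about 12 TB.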
I find digitizing books is a cool thing, because
1. Less time to browse a book - you don't need to flip pages, just SEARCH.
2. Accessibility - In my university, the library was not that big, so we had to share 2 copies among 400+ students. It was a nightmare; you had to be in a queue to get the book, and the queue usually lasted till the end of the semester.
3. Unlimited reading time - It takes a few weeks or months to extract all the details you want from a big, thick book. But borrowing a book from the library means you've got to return it in 2 weeks. Ahhh...
4. Reasonable from a price point of view - In university, most books are only needed for a single semester, 2 at most. And most courses only concentrate on a few chapters of a book. So it is sometimes not worth buying the big book for a high price (well... general assumption, students are broke all the time) when you're hardly gonna use it after the course is done.
5. Ease of referencing - Sometimes when I write papers, I write things I heard or read somewhere. It is much easier to find a reference by searching a digital library than by browsing through a library (most of the time, you don't find all the books on the shelf). Plus, most interestingly, you have access to many titles, which will definitely bring you a good reference at the end of the day.
how many library of congress is that?
Last time I counted, I had 800,000 e-books on disk. For a large institution, I'd expect better. Their collection probably isn't mostly sci-fi and D&D manuals though :/
If someone comes up and says, "oh, this book clearly proves my point" then you can easily come back with, "Interesting. What does it say?" And you're off again, arguing the truth against real facts. Don't let them escape by saying, "oh, it's complicated." Respond, "it's ok, I have time. Please explain."
The point is, make your goal to find out the truth, and you will always win. Don't defend ideas anymore once you know them to be false. Switch over as soon as you know you are wrong, and you will always be right. Not to mention switching drives your opponent batty.
Qxe4
It packs black and white images like crazy. Though a Firefox plugin would be nice, this really is one of the best online book viewers I've seen, technology-wise. It's fast and pretty easy to interface with scripts, and all the images seem to be cropped.
Try the men's bathroom, Minneapolis-St. Paul International Airport. I hear that's a great place for some hot right-foot-tapping action
...that's nearly as big as my...er...friend's...MP3 collection.
AT&ROFLMAO
http://tera-3.ul.cs.cmu.edu/ULIBHelp.htm#faqBkMark
That just wasn't a good experience. I found the one book I looked for (Pilgrim's Progress) but I found the User Experience next to bad. They need to kick that up a couple of notches before I would use this over Google's books.
....Have more books available than Google, but at least Google lets you download the books for offline viewing. This library currently does not, which means for certain purposes (EG taking a library with you to an area that doesn't have internet access) this library at CMU is useless.
Mr. Burns: 'Lets see. It was the best of times, it was the "blurst" of times! You stupid monkey!'
Maybe CMU just needs to hire smarter monkeys...
I went to the site, searched for OpenGL, and got 10 books - all in Chinese.
You can review books from CMU's University Project at this Internet Archive page.
I'm assuming that they have only scanned the texts as images and have not done any OCR. (whoopeedoo...) OK, then if they are image files, then tell me again why I need a plugin?
The site is useless to me. Let me know when they can provide data in a standard format.
.
.
.
(IP address changed for this post to defeat slashdot's insanely long post flood interval.)
My last post was this one, in a completely different discussion:
http://it.slashdot.org/comments.pl?sid=374783&cid=21534247
It would be nice if they could fix that, or find a better way of insulating mod_perl from high performance demands, or at least give the real reason; that lame and obviously totally contrived "fair chance" excuse grows pretty damn tired.
I can't use the proprietary DjVu image viewer plug-in with my Mac/Firefox combo and I don't understand the use of TIFF images served one (very slow) page at a time requiring both Flash & Quicktime in my browser. Other book sources offer text and PDF formats. It is possible that there is some intent to restrict users by these unconventional formats and awkward serving procedures.
Indeed, the FAQ says that if you want to download an entire book for offline reading, you must send an email request. Your use of the book is restricted to non-commercial. So despite the fact that most of these books are around 100 years old, they seem to be claiming some rights by making copies of them.
When I zip over to Gutenberg.org, in seconds I find a list of interesting books, and when I pick the venerable Kama Sutra, the entire text is on my screen in a flash. Unfortunately I acted too fast and discovered that the text I got was in French, but with such an efficient system I can correct my mistake quickly.
The CMU system would take forever to get such results and it can only be due to a deliberate obfuscation. How many of us want to read ancient science books anyway?
...omphaloskepsis often...