Carnegie Mellon's Digital Library Exceeds 1.5 Million Books
cashman73 writes "Most Slashdot readers are probably familiar with Google's book scanning project, a collaboration with several major universities to digitize works of literature, art, and science. But Google may have been beat to the punch this time -- about a decade ago, Carnegie Mellon University embarked on a project to scan books into digital format, to be made available online. Today, according to new reports, they now have a collection of 1.5 million books, the equivalent of a typical university library, available online."
http://tera-3.ul.cs.cmu.edu/
Towards the Singularity.
This site (which is found at ulib.org BTW) seems to have a pretty good collection of obvious titles to choose from, though having to download a custom plug-in to read anything is a bit annoying (and apparently temporary). I played around for a while, seeing what I could dig up, and didn't see any obvious gaps (though I purposely avoided anything modern).
As an author, I was always a bit worried having Google as the sole gatekeeper for this kind of service... not that I necessarily distrust Google's intentions, but if they changed their worldview one day, it'd be a pity to have so much work invested in only one place, and have to re-build it all somewhere else. It's nice that there are proper choices, and not all from a commercial stance either.
I don't know how smooth the integration process is (I submitted one of my books, but it appears it's a very un-automated system involving email etc, so it will probably take a while to see results). But still, I'm glad they're giving authors a way to help grow the library. Here's hoping it becomes even better than its promise!
The world's only surviving livewriter.
We can access them online. Don't need no stinkin' library ticket :-)
If I had an Ass, I'd call it Fanny Bottom, then I could slap my Ass; Fanny Bottom, on the Arse.
My local traditional library was doing just fine, last week, too. :(
Bastards!
Rotten, filthy, scum-sucking bastards!
Information is not free, and it wants to be largely neglected by a significant portion of [American] society!
Heil Physical Media!
Because apparently the Slashdot editors can't be bothered...
http://www.ulib.org/
Traditional libraries are long dead in a pretty significant percentage of the US. I live in a fairly large city, and the library here is pretty much useless for anything but the level of book one would expect high school students to need. No real database access, no journals, very little in the way of primary sources for anything. It's all novels, magazines, newspapers, "subject X for dummies", and out-of-date encyclopedias. The wireless access there has been useful at times, but that's about it. You don't get a good library without a public willing to put in the requisite money, and fewer and fewer people are.
Everything will be taken away from you.
i really like the idea of online libraries, but i had to laugh when i got the following result for the first book that came to mind: "Please provide a valid query (Word greater than length 3)" the book was "the old man and the sea".
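A rough sketch of the kind of check that would produce that error. This is only a guess at the rule implied by the message; the function name and the threshold constant are made up for illustration, not taken from the site:

```python
# Hypothetical minimum, read off the error text "Word greater than length 3".
MIN_WORD_LENGTH = 4


def has_searchable_word(query):
    """Return True if at least one word in the query is long enough to search on."""
    return any(len(word) >= MIN_WORD_LENGTH for word in query.split())
```

Every word in "the old man and the sea" is exactly three letters long, so a filter like this rejects the whole title.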
In case you haven't noticed, the economies of India and China are booming...in large part because of the offshoring/outsourcing from more developed countries. The wages and employment opportunities only get better in India and China due to projects like this.
I suggested to my company that we use Carnegie Mellon's reCAPTCHA program to solve two problems: 1) improve our CAPTCHA implementation, and 2) help Carnegie Mellon with their online publishing initiative. To my pleasant surprise, I recently found that the company decided to go ahead with reCAPTCHA. Sweet! If you are not familiar with it, check it out and do some good for everyone! http://recaptcha.net/
For those that missed the articles about C.M.'s associated project for validating all those scanned words on all those scanned pages: http://recaptcha.net/
reCAPTCHA improves the process of digitizing books by sending words that cannot be read by computers to the Web in the form of CAPTCHAs for humans to decipher. More specifically, each word that cannot be read correctly by OCR is placed on an image and used as a CAPTCHA.
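A minimal sketch of how that could work, not reCAPTCHA's actual code. It assumes each unreadable word is shown next to a control word whose answer is already known, and that a transcription is accepted once enough people agree on it. The class name, method names, and vote threshold are all hypothetical:

```python
from collections import Counter, defaultdict

# Hypothetical threshold: matching human answers required before a
# transcription is accepted. Not a value taken from reCAPTCHA.
VOTES_NEEDED = 3


class WordPool:
    """Collects human guesses for words that OCR could not read."""

    def __init__(self):
        # word_id -> Counter of normalized guesses for that word
        self.votes = defaultdict(Counter)

    def submit(self, word_id, control_answer, control_truth, guess):
        """Record a guess only if the known control word was typed correctly."""
        if control_answer.strip().lower() != control_truth.strip().lower():
            return False  # the user failed the CAPTCHA, so ignore the guess
        self.votes[word_id][guess.strip().lower()] += 1
        return True

    def resolved(self, word_id):
        """Return the agreed transcription, or None if there is no consensus yet."""
        if not self.votes[word_id]:
            return None
        guess, count = self.votes[word_id].most_common(1)[0]
        return guess if count >= VOTES_NEEDED else None
```

The control word is what keeps this working as a CAPTCHA: a bot that cannot read the known word never gets to vote on the unknown one.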
on book from '20s
wow. Universal access pffft
I don't want to be a party-pooper, but I wasn't that impressed with the collection. The latest chemistry books and engineering books were from 1920. A LOT has happened in chemistry and engineering since then. Are they starting with older books and slowly moving to newer ones? The Chinese collection is impressive, but hard to read (unless you know Chinese). Also, the plugin only works on Windows-based computers. That is sad for me.
So how many Libraries of Congress is that?
If my call is important, why am I talking to a recording?
I picked a book at random, Dickens' A Tale of Two Cities. Here are the first few lines:
"TIT was the best of tunes, it was the worst of times,..."
"li was tie winter of despair, we had everything before us,..."
I guess they just OCR'd books en masse without proofreading. Oh well, think of it as an exercise for your brain.
The definition of a library is just changing. When you look at a small Internet cafe, what you are really seeing is the modern version of a library that also caters for those who wish for some refreshments. If the old Dickensian hard-copy libraries want to survive, they will have to become more communal and socially active. Yes, that means having network access and a place for young people to talk. While you have them captive you can promote books with posters on the walls and seminars and social events. It's time libraries stopped hiding behind dusty books and started becoming a public social space where people can exchange ideas. You know, what libraries originally were, way back in the ancient Egyptian days of the great Library of Alexandria.
Church.
I have an idea. Hear me out...
:-)
Sure, Google is currently losing the scanning wars, but they'll catch up. Someone else may join the race, and eventually there will be a single collection containing 1 BILLION books. Sure, I like to read. I suspect other people like to read too, but who has the fucking time to read 1 BILLION books? As an average, educated male, I hate being in a discussion with someone who name-drops a book I've never heard of, as proof that my point is invalid because I am not well read enough. It's the ultimate bitch-slap of intellectual boxing. Wouldn't it be great to excuse yourself to the bathroom, where you would use your smart phone/mp3 player to download that book in an audio format, and within five minutes be able to get the gist of the text, and then come back for round two of the discussion where you can even drop quotes from the previously name-dropped text?
No, no, no, I'm not talking about classic 'books on tape' as read by William Shatner or the Olsen Twins. I'm talking about a sound file ripping off the Googlebooks (or whoever's) summaries as read by cattle auctioneers!
No one can prove that you used another company's scanned work to create these book-casts. You will not be infringing copyright as long as the original scanned source isn't infringing copyright. We are all tight on time. Have you had the time to read all of Dostoyevski? Imagine each one of these files being no longer than 5 minutes... Now imagine reading all of Hemingway in the time it takes to cut your front lawn... Can you? I can! Imagine!
I'm not joking here. I definitely see this race going that way eventually. Why not be the first one there? We just need a dozen cattle-fair hosts, gallons and gallons of Mountain Dew and we are set. As an ultimate FU to Google, we can even have the whole operation partially funded by AdSense.
So, kind millionaire, won't you lend me a blank cheque or two?
90% of books available today are not worth the paper they are printed on; most of the rest contain nothing original. It seems like almost all that is worth reading or studying was written before 1900. What masterpieces has the 21st century offered so far? A New Kind of Science? Is that all we, the intelligent species, are capable of now?
... You get the point.
It seems that, culturally, we are way behind compared to where we were a hundred years ago. Want to learn geometry? Read Euclid. He wrote his books thousands of years ago. Calculus? Euler is your best teacher, and has been so since the 1700s. Fiction? Music? Architecture?
A lot of these books would languish in obscurity, only to be touched by very few people. Now the information is available via search, which means even more useful information can be had and these lost "works" might finally serve the purpose they were meant to serve...to educate the masses.
Printed books have their place, and so does the digital library. The quality of our information depends on how easily it can be accessed. A report written from 3 sources the old way might benefit from having 100 sources that were quickly and efficiently found digitally.
Poor countries will benefit from this digitization the most. A country's government might not be able to afford to build a library and buy 1.5 million books, but now they don't have to.
Bearded Dragon
If you are a student and don't have a lot of money to buy your books (mostly in third-world countries), you can find all your college textbooks in there. I think it is a way better library than Google's and others, obviously because of the copyrighted material. But anyway, in those countries you would have photocopied the book. Or maybe it's because when you want to legally buy a book, you find out that it can't be shipped to your country.
Already been done. Check this site: http://www.teach12.com/store/courses.asp?t=&sl=&s=905&sbj=Literature%20and%20English%20Language&fMode=s I've listened to some of their recordings and they were pretty good.
Sure, most of the digitization was done in China... but the vast majority of the books on the site are Chinese, too. Of the 1.5 million books in the collection, almost 1 million of them are Chinese. English accounts for most of the rest at 362508 books.
Have you considered visiting Senator Larry Craig?
What was once true, is no longer so
I don't think anyone should be impressed by the size of a library. It should only depend on the quality of content.
I'll reserve judgement on quality until I can read 970,000+ Chinese books
You're all thinking it, I just said it for you
No viewers for Linux / Firefox, and the website feedback gives:
Not Found
The requested URL /cgi-bin/udlcgi/ULIBCopyrightreport2.cgi was not found on this server.
Apache/2.0.55 (Ubuntu) mod_perl/2.0.2 Perl/v5.8.7 Server at tera-3.ul.cs.cmu.edu Port 80
1.5 million books? Ok, maybe my tastes are a bit more focussed on mathematics, physics, programming, economics, and linguistics than would be the CMU library, but I just burned 3 DVDs worth of math books alone, 12GB of PDF, at roughly 8MB/title, for 1500 titles. And that was just one week's worth of crap filtering for one man. Methinks CMU isn't really trying.
-I like my women like I like my tea: green-
How many Libraries of Congress is that?
Last time I counted, I had 800,000 e-books on disk. For a large institution, I'd expect better. Their collection probably isn't mostly sci-fi and D&D manuals though :/
If someone comes up and says, "oh, this book clearly proves my point" then you can easily come back with, "Interesting. What does it say?" And you're off again, arguing the truth against real facts. Don't let them escape by saying, "oh, it's complicated." Respond, "it's ok, I have time. Please explain."
The point is, make your goal to find out the truth, and you will always win. Don't defend ideas anymore once you know them to be false. Switch over as soon as you know you are wrong, and you will always be right. Not to mention switching drives your opponent batty.
Qxe4
It packs black and white images like crazy. Though a Firefox plugin would be nice, this really is one of the best online book viewers I've seen, technology-wise. It's fast and pretty easy to interface with scripts, and all the images seem to be cropped.
If you really want access, then you have to pay up and/or take the extra time to find somewhere you can get them for free.
First, in my field (astrophysics) most articles are now e-printed or at least opened up after a few years. ApJ (Astrophysical Journal) has unrestricted access to all articles older than 3 years and all articles older than 1996 are available at a free NASA/Harvard site (ADS). So basically, unless you want the absolute latest articles (which for most things you don't need) you can get them for free (and even then usually through arxiv). And if you need the latest article then, as you said, pay the fee and buy it.
Second, if you need some kind of technical book, talk to the librarians. Most of them will try to help and you can usually get it for free (or a small fee) through an inter-library loan. It might take a few weeks, but you can definitely do it without even leaving the library.
Third, take a look at the universities near you. Most allow open access to the stacks and computers. You can spend a whole day reading a book or using the university computers to access journals without paying anything. Some even allow borrowing privileges for free or for a fee. Take a look at Columbia in New York City or UCLA.
So yes, public libraries don't have journals. They're far from dead though, because they don't serve that need. If you really want those sorts of things, then you need to go out there and get access yourself.
Hey! CMU did not kill it.
I continue to own a large number of books as print copies (Churchill's 6-volume Second World War, William Shirer's Rise and Fall of the Third Reich, Clausewitz's On War, the Arthashastra, etc.).
I do own many books as Mobipocket copies, but nothing beats paper.
Let's see how Kindle catches up.
"Doing what i can, with what i have." ~ Burt Gummer
...that's nearly as big as my...er...friend's...MP3 collection.
AT&ROFLMAO
http://tera-3.ul.cs.cmu.edu/ULIBHelp.htm#faqBkMark
That just wasn't a good experience. I found the one book I looked for (Pilgrim's Progress), but I found the user experience close to bad. They need to kick that up a couple of notches before I would use this over Google's books.
Mr. Burns: 'Let's see. It was the best of times, it was the "blurst" of times! You stupid monkey!'
Maybe CMU just needs to hire smarter monkeys...
You can review books from CMU's University Project at this Internet Archive page.
I happen to love libraries... I think many of them are outdated, but ... hey... it's a public space to read, browse the internet, listen to lectures, etc.
Recently our campus library allowed people to bring food and talk on 3 of the 4 levels. It's great... I don't think I could have learned Calculus anywhere else. There simply are too many distractions at the local coffee shop.
In small towns, public libraries are a resource for local history. In medium towns and up, financial databases like Value Line may be available. In all cases, as you said, libraries are a source of novels, magazines, and newspapers for free, which can be a significant benefit for those without a fair amount of available cash. Internet access is in place at most libraries, and it's popular enough that use has to be rationed.
Contribute to civilization: ari.aynrand.org/donate
I can't use the proprietary DjVu image viewer plug-in with my Mac/Firefox combo and I don't understand the use of TIFF images served one (very slow) page at a time requiring both Flash & Quicktime in my browser. Other book sources offer text and PDF formats. It is possible that there is some intent to restrict users by these unconventional formats and awkward serving procedures.
Indeed, the FAQ says that if you want to download an entire book for offline reading, you must send an email request. Your use of the book is restricted to non-commercial. So despite the fact that most of these books are around 100 years old, they seem to be claiming some rights by making copies of them.
When I zip over to Gutenberg.org, in seconds I find a list of interesting books, and when I pick the venerable Kama Sutra, the entire text is on my screen in a flash. Unfortunately I acted too fast and discovered that the text I got was in French, but with such an efficient system I can correct my mistake quickly.
The CMU system would take forever to get such results and it can only be due to a deliberate obfuscation. How many of us want to read ancient science books anyway?
...omphaloskepsis often...