Carnegie Mellon's Digital Library Exceeds 1.5 Million Books
cashman73 writes "Most Slashdot readers are probably familiar with Google's book scanning project, a collaboration with several major universities to digitize works of literature, art, and science. But Google may have been beaten to the punch this time -- about a decade ago, Carnegie Mellon University embarked on a project to scan books into digital format, to be made available online. According to new reports, they now have a collection of 1.5 million books, the equivalent of a typical university library, available online."
http://tera-3.ul.cs.cmu.edu/
Towards the Singularity.
This site (which is found at ulib.org BTW) seems to have a pretty good collection of obvious titles to choose from, though having to download a custom plug-in to read anything is a bit annoying (and apparently temporary). I played around for a while, seeing what I could dig up, and didn't see any obvious gaps (though I purposely avoided anything modern).
As an author, I was always a bit worried having Google as the sole gatekeeper for this kind of service... not that I necessarily distrust Google's intentions, but if they changed their worldview one day, it'd be a pity to have so much work invested in only one place, and have to re-build it all somewhere else. It's nice that there are proper choices, and not all from a commercial stance either.
I don't know how smooth the integration process is (I submitted one of my books, but it appears it's a very un-automated system involving email etc, so it will probably take a while to see results). But still, I'm glad they're giving authors a way to help grow the library. Here's hoping it becomes even better than its promise!
The world's only surviving livewriter.
Traditional libraries are long dead in a pretty significant percentage of the US. I live in a fairly large city, and it's pretty much useless for anything but the level of book one would expect high school students to need. No real database access, no journals, very little in the way of primary sources for anything. It's all novels, magazines, newspapers, "subject X for dummies", and out of date encyclopedias. The wireless access there has been useful at times, but that's about it. You don't get a good library without a public willing to put in the requisite money, and fewer and fewer people are.
Everything will be taken away from you.
i really like the idea of online libraries, but i had to laugh when i got the following result for the first book that came to mind: "Please provide a valid query (Word greater than length 3)" the book was "the old man and the sea".
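For anyone wondering why that query bounces, here is a minimal sketch of a minimum-word-length filter. It assumes "greater than length 3" means a word needs at least 4 characters; the function names and the threshold constant are hypothetical and not taken from the site's actual code.

```python
# Assumption: a word must have at least 4 characters to count as searchable.
MIN_WORD_LENGTH = 4

def usable_terms(query: str) -> list[str]:
    """Return the words long enough for the (hypothetical) search to accept."""
    return [word for word in query.split() if len(word) >= MIN_WORD_LENGTH]

def validate(query: str) -> str:
    """Reject a query when none of its words is long enough."""
    if not usable_terms(query):
        return "Please provide a valid query (Word greater than length 3)"
    return "ok"

print(validate("the old man and the sea"))  # every word has exactly 3 characters
print(validate("a tale of two cities"))     # "tale" and "cities" qualify
```

Every word in "the old man and the sea" is exactly three letters long, so a filter like this leaves nothing to search for.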
In case you haven't noticed, the economies of India and China are booming...in large part because of the offshoring/outsourcing from more developed countries. The wages and employment opportunities only get better in India and China due to projects like this.
> Those books are still copyrighted, the publisher won't sell you a copy, yet they
> want to deny everyone access to it.
They have to follow the law so I forgive them on books under copyright. But they don't appear to even want to make it easy to access complete copies of books that are out of copyright. You can write them and ask for a full copy of a book. Bah. And no easy way to mirror the site (even just the out of copyright material) either.
Our library already hosts a Project Gutenberg mirror. Some back-of-the-envelope math says we would need to bulk up the RAID somewhat more just to take the public domain English content from this project, since it is all TIFFs, but it would be something we would consider if it were easy (rsync) and the content were in a form that would actually be, ya know, USEFUL!
Democrat delenda est
For those that missed the articles about C.M.'s associated project for validating all those scanned words on all those scanned pages: http://recaptcha.net/
reCAPTCHA improves the process of digitizing books by sending words that cannot be read by computers to the Web in the form of CAPTCHAs for humans to decipher. More specifically, each word that cannot be read correctly by OCR is placed on an image and used as a CAPTCHA.
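A rough sketch of how the human answers could be turned into a transcription. It assumes each unreadable word is shown alongside a control word whose answer is already known, and that a guess only counts if the control word was typed correctly. The vote threshold, the names, and the data structures here are all hypothetical, not reCAPTCHA's real code.

```python
from collections import Counter, defaultdict

VOTES_NEEDED = 3  # hypothetical: matching answers required before a word is accepted

# word_id -> tally of the transcriptions humans have typed for that word
votes = defaultdict(Counter)

def submit(word_id, control_answer, control_truth, unknown_answer):
    """Record a guess for the unknown word only if the control word was typed correctly."""
    if control_answer.strip().lower() != control_truth.lower():
        return False  # failed the CAPTCHA, so the guess is ignored
    votes[word_id][unknown_answer.strip().lower()] += 1
    return True

def resolved(word_id):
    """Return the agreed transcription once one answer has enough votes, else None."""
    if not votes[word_id]:
        return None
    answer, count = votes[word_id].most_common(1)[0]
    return answer if count >= VOTES_NEEDED else None
```

Requiring several independent humans to agree is what would let the unknown word be trusted even though no computer could read it.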
So how many Libraries of Congress is that?
If my call is important, why am I talking to a recording?
Copyright law in the US started out pretty reasonable - 20 years from the date of registration. Walt Disney spent a lot of money and lobbied the government for another 20 year period. Before this could expire, they lobbied to have copyright terms extended to the life of the author plus 20 years. As a result of the Sonny Bono act, it was expanded to the life of the author plus 70 years. (NOTE: this is a very brief approximation of US copyright law history - it was actually somewhat more complex than this and with several more twists and turns). See here for a detailed explanation.
The functional result of this lobbying is that no US copyrighted work created since 1923 has lapsed into the public domain (unless the owner screwed up by not renewing the copyright at the appropriate juncture).
"If you think you have things under control, you're not going fast enough." --Mario Andretti
I picked a book at random, Dickens' A Tale of Two Cities. Here are the first few lines:
"TIT was the best of tunes, it was the worst of times,..."
"li was tie winter of despair, we had everything before us,..."
I guess they just OCR'd books en masse without proofreading. Oh well, think of it as an exercise for your brain.
The definition of a Library is just changing. When you look at a small Internet cafe, what you are really seeing is the modern version of a Library that also caters for those who wish for some refreshments. If the old Dickensian hard copy libraries want to survive, they will have to become more communal and socially active. Yes, that means having network access and a place for young people to talk. While you have them captive, you can promote books with posters on the walls and seminars and social events. It's time Libraries stopped hiding behind dusty books and started becoming a public social space where people can exchange ideas. You know, what Libraries were originally, way back in the ancient Egyptian days of the great library of Alexandria.
Already been done. Check this site: http://www.teach12.com/store/courses.asp?t=&sl=&s=905&sbj=Literature%20and%20English%20Language&fMode=s I've listened to some of their recordings and they were pretty good.
You are mistaken, and for this you should be glad. It often takes several years for masterpieces to be recognized as such, so it shouldn't surprise you that nothing you like has been acclaimed. I'm not a high culture joe myself, so please don't be offended, but today's high culture may be incomprehensible to you because you aren't sophisticated enough to appreciate it. If you grow up watching Fantasia, it is easier to enjoy Stravinsky. As for originality, the tale is in the telling. People of years past lived and died much as we do, a bit more fresh air and hard work maybe but basically the same. Basically. They were us first, what are you going to do? Culturally we are far, far ahead of the 1907 crowd. Your image of 1899 is almost certainly based on the western upper class (listening to Wagner) rather than the teeming western poor (listening to minstrel shows) or the uncountable colonized listening to whips, maxim guns, pickaxes and sermons.
Sure, most of the digitization was done in China... but the vast majority of the books on the site are Chinese, too. Of the 1.5 million books in the collection, almost 1 million of them are Chinese. English accounts for most of the rest at 362508 books.
1.5 million books? Ok, maybe my tastes are a bit more focussed on mathematics, physics, programming, economics, and linguistics than would be the CMU library, but I just burned 3 DVDs worth of math books alone, 12GB of PDF, at roughly 8MB/title, for 1500 titles. And that was just one week's worth of crap filtering for one man. Methinks CMU isn't really trying.
-I like my women like I like my tea: green-
Also worth asking: are you willing to learn 2000+ year old Greek to read Euclid, or to learn Latin (the language of scholarship in his time) to read Euler? One reason that we have and use more modern math textbooks is changes in language and notation over time. Also, it is often the case that the original proof is far from the best that has been found, since there is now more structure developed in later works that allows either condensing or a novel approach. If you limit yourself to pre-1900 works, you throw out the vast majority of Graph Theory, losing all contributions by Erdos, Kuratowski, Tutte, Ramsey, etc. Sorry, there are areas of math that need at least up to the 1950's to get major theorems.
If someone comes up and says, "oh, this book clearly proves my point" then you can easily come back with, "Interesting. What does it say?" And you're off again, arguing the truth against real facts. Don't let them escape by saying, "oh, it's complicated." Respond, "it's ok, I have time. Please explain."
The point is, make your goal to find out the truth, and you will always win. Don't defend ideas anymore once you know them to be false. Switch over as soon as you know you are wrong, and you will always be right. Not to mention switching drives your opponent batty.
Qxe4
If you really want access, then you have to pay up and/or take the extra time to find somewhere you can get them for free.
First, in my field (astrophysics) most articles are now e-printed or at least opened up after a few years. ApJ (Astrophysical Journal) has unrestricted access to all articles older than 3 years, and all articles from before 1996 are available at a free NASA/Harvard site (ADS). So basically, unless you want the absolute latest articles (which for most things you don't need) you can get them for free (and even then usually through arxiv). And if you need the latest article then, as you said, pay the fee and buy it.
Second, if you need some kind of technical book, talk to the librarians. Most of them will try to help and you can usually get it for free (or a small fee) through an inter-library loan. It might take a few weeks, but you can definitely do it without even leaving the library.
Third, take a look at the universities near you. Most allow open access to the stacks and computers. You can spend a whole day reading a book or using the university computers to access journals without paying anything. Some even allow borrowing privileges for free or for a fee. Take a look at Columbia in New York City or UCLA.
So yes, public libraries don't have journals. They're far from dead though, because they don't serve that need. If you really want those sort of things, then you need to go out there and get access yourself.