Internet Archive Challenges Google
richards1052 writes "The Internet Archive, whose main claim to fame is the Wayback Machine, designed to archive the internet's web history, has created a new project: the Open Content Alliance. Its purpose is to open the nation's library collections to universal web search.
A number of major library systems, including the Boston Public Library and the Smithsonian, have refused to sign up with competing ventures by Microsoft and Google because they do not provide for universal access to digitized books. These commercial ventures prohibit books from being accessed by competing search engines.
So far, 80 libraries and research institutions have signed on with Open Content Alliance. They must pay for the scanning of their books while Google and Microsoft offset that cost for their participating institutions."
I believe I've commented on something like this before. Might be a good idea to archive the books lest somewhere in the future we relive something like the Spanish Inquisition, where important literature was lost. It's also making this society a bunch of couch potatoes. Whatever happened to walking into a quiet library, the smell of stale books, looking around at people? It's slowly being replaced by reading books online and hitting Ctrl-W to close annoying popups while you read. Currently I have 30+ Cisco (CCIE/NP/IP/etc.) books, and each comes with its PDF. At first I thought, neat, I can read them on my laptop... Nowadays I find it's easy to just open the book; nothing like butchering my books up with highlighters... This world is coming to one where companies will be fighting to keep us locked in our houses. Call me a troll; just speculation.
Infiltrated dot Net
... but on a much larger scale?
Have EVDO, will travel.
The Libraries Shun Deals to Place Books on Web story in The New York Times covers the subject fairly well.
Recognising the restrictions of the current iterations and working to provide a better resource that most or all libraries will support. The free exchange of ideas (not entertainment, for those of you who download your entire music libraries from Kazaa) will promote progress across the board.
Someone save me from this sanity.
Is there any estimate of how long it will take all these projects combined to scan the entire existing catalog of books, accounting for expansion and the development of better scanning technologies?
I buy a lot of books. I've got probably 10,000 or so. I wish I could search through them. Some for reference, sometimes because I read something that sounds familiar that I want to find where I first read it. I'd also like to read them on my PC sometimes, or even on my phone like when I'm waiting for a while somewhere. And I'd like to copy/paste short passages from them into messages I send on the Internet.
If this project is really "open", can I have my own library scanned? How much does it cost? I own the rights to copy my own books for my own personal use. Does something make these other "official" libraries eligible to use their full rights to their content in a way that I cannot?
--
make install -not war
How many of these libraries think of Open Source and software platform choice? How many of them make sure their web sites are platform agnostic, equally accessible from all browsers? These people are willing to stand up and are willing to pay more to preserve their liberty. Hats off to them. But does this stand also extend to not having their documents locked down in a proprietary format encumbered with licenses and restrictions? I would very much like it if such ideas, like being independent of vendors, extended to Corporate America too.
sed -e 's/Chuck Norris/Rajnikant/g' joke > fact
It has got many more documents. And often /all/ versions ever of the document. And it doesn't just store HTML but much much more.
Internet archive is an excellent service. Best of luck!
...it boils down to whether I can use my torrent client to download the stuff (=good) or not (=irrelevant to my life).
In case you missed this discussion back on October 2, Carnegie Mellon has a service which helps to better digitize these books. It's called reCAPTCHA, and it uses otherwise wasted human cycles to convert text that was hard for computers to OCR.
There's a story about this in The New York Times this morning (free reg required). It begins:
The opposition between the Open Content Alliance and Google may not be as much as it seems at first glance. From the NYT article:
It looks like Google will digitize the collection for free in exchange for exclusive rights to offering searches of the digital data, but the libraries don't give up rights to have someone else digitize the stuff again and do with it as they see fit. So they can go with Google for now if they want and the O.C.A. later as they have the resources. This seems pretty reasonable to me. I don't know what the deal Microsoft is offering looks like, but I wouldn't be surprised if it's much more restrictive.
"You call it a new way of thinking; I call it regression to ignorance!" -- Operation Ivy
don't call him shirley.
Nobody expects the Spanish Inquisition.....
(Can't believe I'm the first one to respond with that. Of course by now I'm probably not. )
Nobody expects the Spanish Inquisition!
(I couldn't bear to leave you hanging.)
Ben Hocking
Need a professional organizer?
All is fine just as long as they don't resort to shredding the books ;)
Hi. My name is *________*. This goes to all Anonymous Cowards who Troll the earth. I am sure me and the rest of us here at Slashdot would greatly appreciate your opinions if you could refrain from blatantly using perverted childish sarcasm and completely racist corollaries. Unfortunately, it seems you have no other means of communication and diction, hereby eliminating you from ever possibly meeting somebody of the opposite sex, with communication being the foundation of a strong relationship and all, and therefore by posting your highly inspirational post you have thus erased your future offspring from the gene pool by preemptive default. Continue your moronic postings to further validate my case.
'We are trying to prove ourselves wrong as quickly as possible, because only in that way can we find progress.' RPF
Which books are digitized anyway? With copyright being as ridiculous as it is (what is it, 50-100 years after the death of the author?), are we likely to see anything modern in such a collection? I would hope that libraries would have some sort of exemption from this, except that in this case it sounds like the data might be used for commercial searches. I also wonder if these will be regular PDFs or if there will be some sort of DRM on them. Can anyone more knowledgeable weigh in on this?
It is a miracle that curiosity survives formal education. - Einstein
Most modern books are created in electronic form to begin with and are printed with high speed offset printers from files. Only older books have to be scanned.
sed -e 's/Chuck Norris/Rajnikant/g' joke > fact
I hope libraries in other countries will be included as well. Please start with Sweden. :-)
Swedish plasma phys. PhD student; MSc EE; knows maths, programming, electronics; finance interest; seeks opportunities
I agree that a nice, hardbound book is, at the moment, more pleasant to read. However, technologies such as e-Ink and others that allow you to read something digitally without the eye strain of using a backlit monitor are catching on. I think a few factors make digital copies more advantageous: cost of duplication, storage, protection from damage, searchability.
Storage: I just moved, and I moved three bookcases full of books. That sucked. If those were all digital, I'd have hauled my computer from A to B and brought all of my books with me. In addition, I moved to a smaller house. Trying to find a place for my three bookcases of books has been impossible.
Cost of duplication: With digital copies, books can be distributed without the overhead costs of printing and shipping.
Protection from damage: Many of the books housed in libraries, particularly places like the Smithsonian, are no longer in print. If one is destroyed, regardless of whether it's an accident or a malicious act, it's gone. The library may be able to get another copy from a benevolent individual, or the last copy may have just been destroyed. With a digital copy, you can make back-ups of your back-ups... safeguarding the content of that book.
Searchability: This is my favorite... Who hasn't spent 30 minutes skimming a book trying to find THAT ONE PAGE!? It drives me nuts. Searching would make books sooo much more convenient.
You are using English. Please learn the difference between loose and lose; they're, there, and their; your and you're.
I always have loved the archives... Add to this the National Digital Newspaper Program (NDNP)
http://www.neh.gov/projects/ndnp.html
and Project Gutenberg and LibriVox, and you have one heck of a great amount of info online freely available to everyone!
The wayback machine...I don't think I've ever used a website that was so lagged yet still technically functioned. Sort of.
The Internet Archive is collecting public domain scanned books, whereas Google and M$ are collecting both PD and copyrighted books. The idea is that once you find a copyrighted book in the search, links appear where you can purchase a hard copy of the book. You get to view "selected pages" from these books; PD books, however, can be downloaded etc., with the watermark of the scanning parent at the foot. The watermark prevents opposing scanners from adding them to their collection, since it would essentially be advertising the other service. When you go to 'book search', only the books scanned by their own respective service will appear. The Internet Archive is simply not watermarking their collection, but as far as I know they still don't include PD Google scans in their searches.
Cause googles totally sucks with it's incompetent design.
If Google really cared they would fix Android Chrome to reflow text, instead of discriminating
Yeah, conspiracy theories are usually quite easy to posit. That doesn't mean they have a bit of merit. Get over yourself—you're the majority, and you're not being persecuted in this country. (Yes, there are Christians being persecuted in countries where they're not the majority, and it is genuinely a travesty. Don't you dare try to use their suffering to perpetuate your persecution complex in this country.) That future you posit is actually less likely than Bush masterminding 9/11 (which he didn't).
Ben Hocking
Need a professional organizer?
This project predates Google's scanning project by several years. Brewster tried to get Google involved, but as usual they decided to go it alone. While the OCA was announced in 2005, it was an offshoot of the Internet Archive/CMU/Raj Reddy's Million Book project, which was started in 2001 with books being scanned in India.
Pay close attention to that last line. I replayed the game not long ago and was shocked how many quotes from it could be taken in context and applied to today. Of course, generic enough quotes can apply to many eras, but I think my point is still valid.
We Must Dissent
Dream as if you'll live forever.
Live as if you'll die tomorrow.
~Anonymous~
First of all, are you referring to that liberal democratic state that elected Schwarzenegger to be governor? The same state that gave us Ronald Reagan? (Just checking.) That said, gun control is admittedly an issue of the left, so your primary point is valid. Still, the left is anti-pornography?!? More so than the right?!? (This was the claim that surprised me the most.)
Ben Hocking
Need a professional organizer?
"Cause googles totally sucks with it's incompetent design."
Indeed - they don't care about readability for anyone but their little office. Try disabling font size in MSIE 6 and you have a stamp-sized window to read the messages in.
It used to be good before Google ruined it.
If Google really cared they would fix Android Chrome to reflow text, instead of discriminating
I'm not sure exactly which special case you're referring to, but I agree whole-heartedly that Democratic politicians are typically more politician than Democrat. The same holds for Republican politicians as well, of course. Well, obviously they're more politician than Democrat. I mean they're also more politician than Republican. Now stop being so silly. ;)
Ben Hocking
Need a professional organizer?