Proposal: Put Library of Congress' Contents Online
Mark_Uplanguage writes "The idea to scan in all materials available at the U.S. Library of Congress was presented at the Web 2.0 conference this week (as just one of many ideas presented). The proposed cost of $260 million would create a huge benefit to society (well, at least to those who can read English)."
Pardon me for sounding like an eegnoramoose, but isn't at least some of the material in the Library of Congress copyrighted material? Putting it all online would let people get copies of it for *gasp* FREE.
Can't have that, now can we?
This would violate the publishers' god-given right to milk their "creations" until the heat-death of the Universe.
How much data storage would this require? Could someone give it to me in layman's terms?
Since Congress and the President can so easily pull out a hundred billion dollars to bomb the hell out of another country, I see no reason we can't come up with a wimpy $260 million for something as worthwhile as this.
I want a new quote. One that won't spill. One that don't cost too much. Or come in a pill.
The government has proposed recently. I would also suggest that they put in place requirements that all future material that is to be copyrighted present appropriate copies in machine readable form so this will be cheaper in the future.
well, at least to those who can read English
Correct me if I'm wrong, but doesn't the LOC contain all materials registered with the US copyright office? In which case it would have any foreign materials registered for copyright protection.
Javascript + Nintendo DSi = DSiCade
It would probably pay for itself too since FBI agents would no longer have to travel to libraries to secretly gather records of who borrowed what. They can just use Carnivore to do it instead.
Finally, Slashdot can establish that for official purposes:
1 Library of Congress = $260M
And the 2004 US Federal budget can be spec'd at 0.000243754522 LoC:s (Libraries of Congress per second).
--
make install -not war
At long last, we shall finally know just how much one unit of Libraries of Congress is. This could quite possibly have profound effects on how we understand the universe. For example, for many years we have known that the universe is approximately 42 Libraries of Congress. Now we can fully understand its meaning.
Putting the LoC on-line is only the first step. How long before those Internet book printing stations that can create an entire book for you from an electronic image in a deciminute for $1 tap into this? I'd have to think that this would be good for everyone except B&N who are busy reprinting old classics under their own label right now.
"It's the height of ridiculousness to say for those 9 lines you get hundreds of millions."
you've perused the Library of Congress, but have you perused the Library of Congress Online
In a traditional library it's not really easy to...
1. walk in and pick up a book
2. strike the author's name from it and replace it with your own
3. replace the copyright notice with your own
4. make one thousand perfect copies
5. offer it for sale, start taking orders, and PROFIT!
...all within 30 minutes.
I could easily do that on the internet.
Right now, Internet2 can download the entire Library of Congress in about 20 seconds.
I'm not aware of any PIAA for publishers, but somebody is going to have a problem with this. And by the time this actually happens, I bet there will be an Internet4 that can do it all in 20ms.
Punctanym: alternate spelling of words using punctuation or numerals in place of some or all of its letters; see 'leet'
The article claims that the LOC stored as image data would take up 1 TB.
That's wildly underestimated IMO. The LOC has 26 million books. If we conservatively assume that they each have at least 100 pages, that is 2.6 billion images. At 1 TB total, that equals about 0.4 kB (roughly 385 bytes) per image. That's some REAL good compression for an image as large as a full page of text.
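For anyone who wants to check that arithmetic, here is a minimal sketch. It assumes decimal units (1 TB = 10^12 bytes) and takes the 26 million books and 100 pages per book from the comment above; the function and variable names are made up for illustration.

```python
# Per-page storage budget implied by a 1 TB estimate for the whole collection.
# Assumptions: 1 TB = 10**12 bytes, 26 million books, 100 pages per book.
TOTAL_BYTES = 10**12
BOOKS = 26_000_000
PAGES_PER_BOOK = 100


def bytes_per_page(total_bytes, books, pages_per_book):
    """Return how many bytes each page image may use, on average."""
    return total_bytes / (books * pages_per_book)


budget = bytes_per_page(TOTAL_BYTES, BOOKS, PAGES_PER_BOOK)
print(f"{budget:.0f} bytes per page image")  # prints "385 bytes per page image"
```

A few hundred bytes is barely enough to hold one line of plain text, never mind a scanned page, so the conclusion stands whichever unit convention you pick.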
Oh yeah, put 'em all online. I have a hard enough time already in libraries and book stores! If I could read any book I wanted to (even if they're only the ones already out of copyright) online, I'd probably not leave my computer until I passed out!!
A 0-rated post noted that this type of free access is a big deal to people who make an honest living publishing their creations.
This raises a big, important question. The rise and flourishing of the information age has provided, and will continue to provide, unbelievable freedom of access to unbelievable amounts of information. Where and how do we draw the line between the freedom of the consumers and the rights of the creators?
I'm a software developer who loves movies: I'm a creator and a consumer, so I see both sides of this coin. And I think there needs to be a compromise between consumers and creators.
Consumers need to realize that at a certain point, amassing more music, or more books, or more movies, or more whatever, becomes a luxury, not a right. So if the price of music prevents you from having a 10,000 song collection, I'm sorry but, "so sad too bad." That's how it's always been for just about every other purchasable product. Sometimes you have to sacrifice what you merely want to get what you really desire.
Creators need to understand that the information they produce is a drop in the bucket compared to, for example, the estimated yottabyte (1x10^24 bytes) of information on the Internet. So if you want to make money off your creation, it had better stand out, because there's a lot of noise out there to drown it out. Simply put, if you want to get paid, make something people are willing to pay for.
I might inherit a portion of his farm. But that's a result of money that he saved at the time. I do not collect royalties on the *work* that he did 70 years ago.
If an author or musician wants to leave an inheritance, then they should save the money they make during a reasonable copyright term, and give that to their children. They can leave their typewriters, musical instruments, and other tools of the trade (analogous to a farm) as well.
They might have to actually forgo blowing everything they earn on cocaine and refrain from signing away most of their income on bad contracts to actually achieve this, but then so do the rest of us.
Maybe I'm the odd duck here but somehow waay back in early net days.. the 90's.. I thought that this was such an obvious application of internet technology that it must be part of the original design purposes for the internet (darpanet and all that funding of course)
So the only surprise to me is that we're just now hearing a proposal to do this??? sheesh, if I hadn't thought it so completely obvious to every netizen at those old public library terminals I woulda lost so much sleep making it happen!!!
so now who's going to do it? and while it's limboing through congress can we just put together a consortium to visit the library we already own with our digital cameras and OCR the thing into existence... how many of us would need to donate our gmail 1g accounts to store it all?
Work for who? I think you are still confused from the dotcom era. You must be thinking that "change society and business" means that scanning the entire LoC can make someone money (advertising??)
The important part in this case is the changing society part of the statement, which is what the vast potential of the net is capable of doing. It won't help you make money based on a bad idea (in fact, it may only help you lose money faster!) but it does have the potential to change the way a society views and deals with information.
Right now there is a vast amount of knowledge in the LoC that is effectively out of the ordinary citizen's hands. That is not how it should be. If knowledge is power, there is a storehouse of power waiting to be unleashed by giving everyone access to what is being stockpiled. It won't happen overnight, or in a few years, but eventually it will have a ripple effect. Historians lament the loss of the Great Library of Alexandria, but what difference would it have made if only a few could actually use the information that was contained?
I just downloaded the LoC.ps.tgz from the local WPI Internet2 tap using gnutella and my printer just ran out of ink....
Not only the Library of Congress of the United States of America; we should also scan every big library in the world to create a pool of human work to freely share and preserve.
What's in a sig?
I'd take you up on that offer, but it would be money wasted as you simply can not do the job for that little money.
The LOC doesn't just contain nice black and white typed texts. There are hand written documents in organic inks on animal hide and poorly constructed paper. There are paintings in every medium you can imagine and there are sound recordings on just about every media ever used: wax tubes, glass disks, wire spools, open reel, 8-track, cassette, CD, DVD, etc.
Each of these things needs to be digitized, categorized, indexed and offered in a searchable manner. A printed page, for example, will need to be photographed and transcribed/OCRed.
Much of the work needs to be done on delicate objects that may be destroyed if not handled correctly. If you were to play a wax recording disk with too much pressure, or under the wrong environmental conditions, the disk would shatter into an irreparable pile of small bits.
What formats will you store them in? What formats will you make them available in?
Article X: The powers not delegated... by the Constitution...are reserved...to the people
As an author, I wonder how much of your valued craft was honed by reading the work of others for education and inspiration. How many books did you buy in elementary school, or high school? Yet that's where you learned your precious language skills you now market.
Knowledge, even the limited knowledge of an author, does not exist in a vacuum. You read, you learn, you practice, then you create. You could not have done this without the beneficence of others who aren't making a dime off the education they provided you.
To unleash the vast amounts of knowledge stored up in the LOC to the world would be one of the single best things this country could do for mankind. One book, one reader my hairy ass. Why not open the floodgates so everyone can benefit?
I understand the motivation of monetary incentives, but I also know a lot of great authors who died penniless. And they were at least brave enough to sign their names to their ideas.
What a cool idea and, even "if" the dollar estimate is too low, who cares? $260M is chump change for our gov't.
Right now, the only way to access the stuff in LoC is to go there in person. Anyone can do it but you have to travel to WashDC and pass through security and so forth to get into the LoC public reading room. Then you have to ask the librarian to pretty-please bring you the book that you want.
Now imagine that you can access any item in the LoC by simply entering the building and using a public kiosk with a browser. LoC's software would only permit use within the copyright so that is OK. But you don't have to mess with as much security because LoC isn't handing over the physical book.
Now imagine that, from any web browser, you can access any book in the LoC for which the copyright has expired. I like that idea!
My opinion... skip the buy on the next couple of cruise missiles and digitize LoC's books instead.
Oh yeah, before I forget, LoC already has tons of seriously neat stuff online. My favorite is this collection of tons of photos from Russia. These were taken between about 1907 and 1915! I don't know about you, but I never dreamed that I would see color photos that are almost 100 years old.
Cheers,
-- Art Z.
Of course this instantly deteriorates into a discussion about the shameful state of IP and copyright laws, the need to pool all human knowledge, and how crappy the US budget deficit is.
If you go to the LOC's site, you'll notice American Memory on the front page.
American Memory is where you can get a good portion of the public domain stuff (books, letters from immigrants to their families back home, photos of civil war enlistees, audio, Edison-era short movies) for free in a low-quality format. Archival quality copies and custom scans/recordings are available for $$$. Almost any work in the LOC can be scanned on request (3 week waiting time or so); this is how they manage to continue adding scans to their collection without requiring public or private funding. It's underfunded as it is and needs more bandwidth.
The proposal from the idiot in the article is completely unrealistic. Books can contain 100,000 to 5,000,000 characters. That's 100 kB to 5 MB of plain text per book, times 26,000,000 books. That's not including the images and illustrations in some of these works. Many of the texts have value beyond the words they contain. We may be talking about image scanning the pages to preserve the look of the type, paper, and images. Archival TIFFs, since that's what the LOC uses.
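To put rough totals on those per-book figures, here is a back-of-the-envelope sketch. It assumes one byte per character (plain uncompressed text, no images or markup) and decimal terabytes; the function name and the `bytes_per_char` parameter are just for illustration.

```python
# Rough plain-text size of the collection from the per-book character counts.
# Assumptions: 1 byte per character, no compression, no images,
# 1 TB = 10**12 bytes, 26 million books.
BOOKS = 26_000_000
MIN_CHARS_PER_BOOK = 100_000
MAX_CHARS_PER_BOOK = 5_000_000


def total_terabytes(books, chars_per_book, bytes_per_char=1):
    """Return the total size in decimal terabytes."""
    return books * chars_per_book * bytes_per_char / 10**12


low = total_terabytes(BOOKS, MIN_CHARS_PER_BOOK)
high = total_terabytes(BOOKS, MAX_CHARS_PER_BOOK)
print(f"plain text alone: {low:.1f} TB to {high:.1f} TB")
```

So the low end of the text-only range is already past 1 TB, before a single page image or illustration is counted.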
The article also mentions $60 thousand to 'store' this data (per month?, per year?, just once???, what about access?, searching?, redundant backups?). Another unrealistic number, even working off of the 1TB estimate.
Death and danger are my various breads and various butters.