Proposal: Put Library of Congress' Contents Online
Mark_Uplanguage writes "The idea to scan in all materials available at the U.S. Library of Congress was presented at the Web 2.0 conference this week (as just one of many ideas presented). The proposed cost of $260 million would create a huge benefit to society (well, at least to those who can read English)."
well, at least to those who can read English
Correct me if I'm wrong, but doesn't the LOC contain all materials registered with the US copyright office? In which case it would have any foreign materials registered for copyright protection.
Javascript + Nintendo DSi = DSiCade
Let's realize, this is Slashdot. MS is evil, no matter what. You have to ignore the fact that Gates is one of the most philanthropic billionaires (relatively speaking on the amount) of all time. That NEVER gets press here, because we can't say anything positive for M$!
Crap, there goes my karma.
Those first two estimates are based on the text content alone. If the graphical contents of those books were rendered into digital format. The third one assumes maps, photographs, sound recordings, etc.
Tyranny! And don't laugh, I'm serious about this.
All material is copyrighted at the instant of creation. All of it. You write a love letter to your girlfriend and it's copyrighted. It's all copyrighted! Beyond that, you're requiring them to *present* copies. I'm assuming this is to the LoC.
You could make a case for this when a copyright is *registered*, but please don't make a blanket statement like that without first engaging brain.
Don't blame me, I didn't vote for either of them!
The article claims that the LOC stored as image data would take up 1 TB.
That's wildly underestimated IMO. The LOC has 26 million books. If we conservatively assume that they each have at least 100 pages, that is 2.6 billion images. At 1 TB total, that works out to roughly 385 bytes (about 0.38 KB) per image. That's some REAL good compression for an image as large as a full page of text.
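For anyone who wants to check the division, here is a rough sketch in Python. It uses only the numbers quoted above (1 TB, 26 million books, 100 pages each); treating 1 TB as a decimal terabyte is my assumption, and the function name is just for illustration.

```python
# Back-of-the-envelope check of the 1 TB image estimate.
TOTAL_BYTES = 10**12      # 1 TB, taken as a decimal terabyte (assumption)
BOOKS = 26_000_000        # book count quoted in the comment
PAGES_PER_BOOK = 100      # the comment's conservative guess


def bytes_per_page(total_bytes, books, pages_per_book):
    """Return the storage budget per scanned page, in bytes."""
    return total_bytes / (books * pages_per_book)


pages = BOOKS * PAGES_PER_BOOK
budget = bytes_per_page(TOTAL_BYTES, BOOKS, PAGES_PER_BOOK)

print(f"{pages:,} page images")
print(f"{budget:.0f} bytes per image")
```

Either way you slice it, the budget is a few hundred bytes per page, which is nowhere near enough for a scanned page image.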
However, if you have an entire library's contents available in digital format, it's possible to make perfect copies of it an infinite number of times. In contrast, there are restrictions as to how and how often copyrighted materials in a physical library can be run through a copy machine.
I can't see publishers liking the idea of an online Library of Congress at all. Viewers would be able to make their own e-books at a whim. Not that *I* would mind, but . . .
We're specifically talking about the Library of Congress, which has millions of books, not your local library with maybe 100k or so (I remember my university had about 800k books, probably a million by now). The idea is not to give access to the NYT bestsellers, but rare books that you would have a hard time finding anywhere else.
This guy has a $150,000 machine that scans 1,500 bound pages per hour. That would certainly help, though it sounds expensive . . .
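To get a feel for what that throughput means at LOC scale, here is a rough sketch. The 1,500 pages/hour and $150,000 figures are from the post above; the 2.6 billion page total (26 million books at 100 pages each), the nonstop operation, and the fleet of 100 machines are all assumptions picked purely for illustration.

```python
# Rough scanning-time estimate for a single machine and for a small fleet.
PAGES_PER_HOUR = 1_500        # throughput quoted above
MACHINE_COST_USD = 150_000    # price quoted above
TOTAL_PAGES = 2_600_000_000   # assumption: 26 million books x 100 pages
HOURS_PER_YEAR = 24 * 365     # assumption: running nonstop, no downtime
FLEET_SIZE = 100              # hypothetical number of machines


def machine_years(total_pages, pages_per_hour, hours_per_year):
    """Years one machine would need to scan total_pages at a steady rate."""
    return total_pages / pages_per_hour / hours_per_year


years_one_machine = machine_years(TOTAL_PAGES, PAGES_PER_HOUR, HOURS_PER_YEAR)
years_fleet = years_one_machine / FLEET_SIZE
fleet_cost = FLEET_SIZE * MACHINE_COST_USD

print(f"One machine: {years_one_machine:.0f} years")
print(f"{FLEET_SIZE} machines: {years_fleet:.1f} years, ${fleet_cost:,} in hardware")
```

This ignores staff, page handling, and fragile materials, so take it as a lower bound on effort rather than a plan.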
http://www.busyweather.com/
A librarian that respects copyright I can understand. A library capable of maintaining a secure client I think not.
There are a number of issues at play in libraries:
o Funding is scarce enough for books, let alone computers and networks.
o End-Client network security requirements are considered a hindrance and parasitic cost for web service development and deployment.
o IT is considered to be the art of reinstalling Win9x.
o Egress filtering and authentication are largely unknown.
o Public access PCs are invariably on the same network as the "trusted" Librarian PC.
o Most libraries have public access PCs using obsolete and invariably unpatched software.
Or maybe Bill and DRM will save us all from running unpaid, sorry unsigned, code.
See below...
______________________________________________
TITLE 17 > CHAPTER 4 > 407
407. Deposit of copies or phonorecords for Library of Congress
(a) Except as provided by subsection (c), and subject to the provisions of subsection (e), the owner of copyright or of the exclusive right of publication in a work published in the United States shall deposit, within three months after the date of such publication--
(2) if the work is a sound recording, two complete phonorecords of the best edition, together with any printed or other visually perceptible material published with such phonorecords.
Neither the deposit requirements of this subsection nor the acquisition provisions of subsection (e) are conditions of copyright protection.
(b) The required copies or phonorecords shall be deposited in the Copyright Office for the use or disposition of the Library of Congress. The Register of Copyrights shall, when requested by the depositor and upon payment of the fee prescribed by section 708, issue a receipt for the deposit.
(c) The Register of Copyrights may by regulation exempt any categories of material from the deposit requirements of this section, or require deposit of only one copy or phonorecord with respect to any categories. Such regulations shall provide either for complete exemption from the deposit requirements of this section, or for alternative forms of deposit aimed at providing a satisfactory archival record of a work without imposing practical or financial hardships on the depositor, where the individual author is the owner of copyright in a pictorial, graphic, or sculptural work and
(ii) the work has been published in a limited edition consisting of numbered copies, the monetary value of which would make the mandatory deposit of two copies of the best edition of the work burdensome, unfair, or unreasonable.
(d) At any time after publication of a work as provided by subsection (a), the Register of Copyrights may make written demand for the required deposit on any of the persons obligated to make the deposit under subsection (a). Unless deposit is made within three months after the demand is received, the person or persons on whom the demand was made are liable--
(2) to pay into a specially designated fund in the Library of Congress the total retail price of the copies or phonorecords demanded, or, if no retail price has been fixed, the reasonable cost to the Library of Congress of acquiring them; and
(3) to pay a fine of $2,500, in addition to any fine or liability imposed under clauses (1) and (2), if such person willfully or repeatedly fails or refuses to comply with such a demand.
(e) With respect to transmission programs that have been fixed and transmitted to the public in the United States but have not been published, the Register of Copyrights shall, after consulting with the Librarian of Congress and other interested organizations and officials, establish regulations governing the acquisition, through deposit or otherwise, of copies or phonorecords of such programs for the collections of the Library of Congress.
(2) Such re
You can do this at the French national library (see http://www.bnf.fr/pages/zNavigat/frame/accedocu.htm, yes, it's in French)....
I have an idea.
This idea is mine, I own it. I invented it. It took a lot of effort for me to produce. When other people use my idea to create something else based on my hard work, I'd like something in return. Money, perhaps.
What is wrong with this? Why do you feel that I should give my idea away for free? When I produce a physical object, such as a chair, I am not obligated to share that chair with the world without compensation. Why do you think I should give away my ideas without compensation?
I think IP law is important. Getting rid of it is a step backwards for humanity.
Of course this instantly deteriorates into a discussion about the shameful state of IP and copyright laws, the need to pool all human knowledge, and how crappy the US budget deficit is.
If you go to the LOC's site, you'll notice American Memory on the front page.
American Memory is where you can get a good portion of the public domain stuff (books, letters from immigrants to their families back home, photos of civil war enlistees, audio, Edison-era short movies) for free in a low-quality format. Archival quality copies and custom scans/recordings are available for $$$. Almost any work in the LOC can be scanned on request (3 week waiting time or so); this is how they manage to continue adding scans to their collection without requiring public or private funding. It's underfunded as it is and needs more bandwidth.
The proposal from this idiot in the article is completely unrealistic. Books can contain 100,000 to 5,000,000 characters. That's 100 KB to 5 MB per book, times 26,000,000 books. That's not including the images and illustrations in some of these works. Many of the texts have value beyond the words they contain. We may be talking about image scanning the pages to preserve the look of the type, paper, and images. Archival TIFFs, since that's what the LOC uses.
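Multiplying that out, as a quick sketch: the character counts and the 26 million book figure are from the paragraph above, while one byte per character (plain uncompressed text) and decimal terabytes are my assumptions.

```python
# Text-only storage range for the whole collection.
BOOKS = 26_000_000
MIN_CHARS = 100_000       # low end of characters per book, from the post
MAX_CHARS = 5_000_000     # high end of characters per book, from the post
BYTES_PER_CHAR = 1        # assumption: plain uncompressed text


def total_terabytes(books, chars_per_book, bytes_per_char=BYTES_PER_CHAR):
    """Total size in decimal terabytes if every book had chars_per_book characters."""
    return books * chars_per_book * bytes_per_char / 10**12


low = total_terabytes(BOOKS, MIN_CHARS)
high = total_terabytes(BOOKS, MAX_CHARS)

print(f"Text only: {low:.1f} TB to {high:.0f} TB")
```

So even the plain text lands somewhere between a few terabytes and well over a hundred, before a single page image or illustration is counted.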
The article also mentions $60 thousand to 'store' this data (per month? per year? just once? And what about access, searching, redundant backups?). Another unrealistic number, even working off of the 1 TB estimate.
Death and danger are my various breads and various butters.
Honestly, the important searchability is author/title, not so much full text searching. I'd bet on PDF files.
I guess that depends on what you're looking for. I'd like to be able to search on quotes or keywords and authors at the same time. If I already know the exact source of the information I'm looking for, I can probably find it using other resources.