Slashdot Mirror


Proposal: Put Library of Congress' Contents Online

Mark_Uplanguage writes "The idea to scan in all materials available at the U.S. Library of Congress was presented at the Web 2.0 conference this week (as just one of many ideas presented). The proposed cost of $260 million would create a huge benefit to society (well, at least to those who can read English)."

15 of 394 comments

  1. Only English? by AKAImBatman · · Score: 3, Informative

    well, at least to those who can read English

    Correct me if I'm wrong, but doesn't the LOC contain all materials registered with the US copyright office? In which case it would have any foreign materials registered for copyright protection.

    1. Re:Only English? by AKAImBatman · · Score: 4, Informative

      From Wikipedia:

      [T]he Library assumed a role as a legal repository to guarantee copyright protection. All authors seeking American copyright had to submit two copies of the work to the Library. This requirement is no longer enforced, but copies of many books published in the US still arrive at the Library regularly.

      Damn trolls.

  2. Re:If Bill Gates by jkmiecik · · Score: 1, Informative

    Let's realize, this is slashdot. MS is evil, no matter what. You have to ignore the fact that Gates is one of the most philanthropic billionaires (relatively speaking on the amount) of all time. That NEVER gets press here, because we can't say anything positive for M$!

    Crap, there goes my karma.

  3. Re:Storage by Big Bob the Finder · · Score: 3, Informative
    About ten terabytes. Or maybe 20 terabytes. Or maybe as much as 3 petabytes.

    Those first two estimates are based on the text content alone. The third one assumes the graphical contents of those books were rendered into digital format, along with maps, photographs, sound recordings, etc.

  4. Re:One of the More interesting projects by Brandybuck · · Score: 2, Informative

    Tyranny! And don't laugh, I'm serious about this.

    All material is copyrighted at the instant of creation. All of it. You write a love letter to your girlfriend and it's copyrighted. It's all copyrighted! Beyond that, you're requiring them to *present* copies. I'm assuming this is to the LoC.

    You could make a case for this when a copyright is *registered*, but please don't make a blanket statement like that without first engaging brain.

    --
    Don't blame me, I didn't vote for either of them!
  5. Fuzzy math on storage reqs by ravenspear · · Score: 3, Informative

    The article claims that the LOC stored as image data would take up 1 TB.

    That's wildly underestimated IMO. The LOC has 26 million books. If we conservatively assume that they each have at least 100 pages, that is 2.6 billion images. That works out to roughly 0.38 KB (about 385 bytes) per image. That's some REAL good compression for an image as large as a full page of text.
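
    A quick sketch of that division, using only the figures in the comment (1 TB total, 26 million books, at least 100 pages each). Treating 1 TB as 10^12 bytes is an assumption made here, not something the article or the comment states.

```python
# Figures from the comment above; the decimal definition of a terabyte
# (10**12 bytes) is an assumption for this sketch.
TOTAL_BYTES = 10**12
BOOKS = 26_000_000
PAGES_PER_BOOK = 100


def bytes_per_image(total_bytes: int, books: int, pages_per_book: int) -> float:
    """Average storage budget per page image."""
    return total_bytes / (books * pages_per_book)


total_images = BOOKS * PAGES_PER_BOOK
per_image = bytes_per_image(TOTAL_BYTES, BOOKS, PAGES_PER_BOOK)

print(f"{total_images:,} page images")
print(f"{per_image:.0f} bytes per image")
```

    Either way, a few hundred bytes is far too little for a scanned page of text.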

  6. Re:Er by DrMrLordX · · Score: 2, Informative

    However, if you have an entire library's contents available in digital format, it's possible to make perfect copies of it an infinite number of times. In contrast, there are restrictions as to how and how often copyrighted materials in a physical library can be run through a copy machine.

    I can't see publishers liking the idea of an online Library of Congress at all. Viewers would be able to make their own e-books at a whim. Not that *I* would mind, but . . .

  7. Re:Er by 1u3hr · · Score: 2, Informative
    I might umm, sound insensitive, but are you missing legs or something? Libraries are one of the easiest places to get to in pretty much every community

    We're specifically talking about the Library of Congress, which has millions of books, not your local library with maybe 100k or so (I remember my university had about 800k books, probably a million by now). The idea is not to give access to the NYT bestsellers, but rare books that you would have a hard time finding anywhere else.

  8. Re:Agreed, what about labour? by erick99 · · Score: 2, Informative

    This guy has a $150,000 machine that scans 1,500 bound pages per hour. That would certainly help, though it sounds expensive . . .
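
    For a rough sense of scale, here is a back-of-the-envelope sketch. The scanner speed and price are the ones quoted above; the page count (26 million books at 100 pages each) is borrowed from another comment in this thread, and the fleet of 100 machines running around the clock is purely hypothetical.

```python
# From the comment above.
PAGES_PER_HOUR = 1_500
MACHINE_COST_USD = 150_000

# Assumption borrowed from elsewhere in the thread: 26 million books, 100 pages each.
TOTAL_PAGES = 26_000_000 * 100

# Hypothetical: number of machines, all running 24 hours a day, 365 days a year.
MACHINES = 100
HOURS_PER_YEAR = 24 * 365

machine_hours = TOTAL_PAGES / PAGES_PER_HOUR
years = machine_hours / (MACHINES * HOURS_PER_YEAR)
hardware_cost = MACHINES * MACHINE_COST_USD

print(f"{machine_hours:,.0f} machine-hours in total")
print(f"{years:.2f} years with {MACHINES} machines")
print(f"${hardware_cost:,} in scanners alone")
```

    This ignores operators, page handling, downtime, and everything that isn't a bound book, so treat it as a lower bound at best.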

    --
    http://www.busyweather.com/
  9. Re:Er by Anonymous Coward · · Score: 2, Informative

    A librarian that respects copyright I can understand. A library capable of maintaining a secure client I think not.

    There are a number of issues at play in libraries:
    o Funding is scarce enough for books, let alone computers and networks.
    o End-Client network security requirements are considered a hindrance and parasitic cost for web service development and deployment.
    o IT is considered to be the art of reinstalling Win9x.
    o Egress filtering and authentication are largely unknown.
    o Public access PCs are invariably on the same network as the "trusted" Librarian PC.
    o Most libraries have public access PCs using obsolete and invariably unpatched software.

    Or maybe Bill and DRM will save us all from running unpaid, sorry unsigned, code.

  10. Re:This will be DRM'd correct? by Lord Moz · · Score: 2, Informative
    In order to register your copyright you agree to send a copy to the LoC if they request it. It's the law... in other words, the LoC gets a copy of anything they want that is protected by copyright in the United States.

    See below...
    ______________________________________________
    TITLE 17 > CHAPTER 4 > 407

    407. Deposit of copies or phonorecords for Library of Congress

    (a) Except as provided by subsection (c), and subject to the provisions of subsection (e), the owner of copyright or of the exclusive right of publication in a work published in the United States shall deposit, within three months after the date of such publication--
      (1) two complete copies of the best edition; or

      (2) if the work is a sound recording, two complete phonorecords of the best edition, together with any printed or other visually perceptible material published with such phonorecords.

    Neither the deposit requirements of this subsection nor the acquisition provisions of subsection (e) are conditions of copyright protection.

    (b) The required copies or phonorecords shall be deposited in the Copyright Office for the use or disposition of the Library of Congress. The Register of Copyrights shall, when requested by the depositor and upon payment of the fee prescribed by section 708, issue a receipt for the deposit.

    (c) The Register of Copyrights may by regulation exempt any categories of material from the deposit requirements of this section, or require deposit of only one copy or phonorecord with respect to any categories. Such regulations shall provide either for complete exemption from the deposit requirements of this section, or for alternative forms of deposit aimed at providing a satisfactory archival record of a work without imposing practical or financial hardships on the depositor, where the individual author is the owner of copyright in a pictorial, graphic, or sculptural work and

      (i) less than five copies of the work have been published, or

      (ii) the work has been published in a limited edition consisting of numbered copies, the monetary value of which would make the mandatory deposit of two copies of the best edition of the work burdensome, unfair, or unreasonable.

    (d) At any time after publication of a work as provided by subsection (a), the Register of Copyrights may make written demand for the required deposit on any of the persons obligated to make the deposit under subsection (a). Unless deposit is made within three months after the demand is received, the person or persons on whom the demand was made are liable--

      (1) to a fine of not more than $250 for each work; and

      (2) to pay into a specially designated fund in the Library of Congress the total retail price of the copies or phonorecords demanded, or, if no retail price has been fixed, the reasonable cost to the Library of Congress of acquiring them; and

      (3) to pay a fine of $2,500, in addition to any fine or liability imposed under clauses (1) and (2), if such person willfully or repeatedly fails or refuses to comply with such a demand.

    (e) With respect to transmission programs that have been fixed and transmitted to the public in the United States but have not been published, the Register of Copyrights shall, after consulting with the Librarian of Congress and other interested organizations and officials, establish regulations governing the acquisition, through deposit or otherwise, of copies or phonorecords of such programs for the collections of the Library of Congress.

      (1) The Librarian of Congress shall be permitted, under the standards and conditions set forth in such regulations, to make a fixation of a transmission program directly from a transmission to the public, and to reproduce one copy or phonorecord from such fixation for archival purposes.

      (2) Such re
  11. this is available at the french national library by Anonymous Coward · · Score: 1, Informative

    You can do this at the French national library (see http://www.bnf.fr/pages/zNavigat/frame/accedocu.htm, yes it's in French)....

  12. Re:Er by Anonymous Coward · · Score: 1, Informative

    I have an idea.

    This idea is mine, I own it. I invented it. It took a lot of effort for me to produce. When other people use my idea to create something else based on my hard work, I'd like something in return. Money, perhaps.

    What is wrong with this? Why do you feel that I should give my idea away for free? When I produce a physical object, such as a chair, I am not obligated to share that chair with the world without compensation. Why do you think I should give away my ideas without compensation?

    I think IP law is important. Getting rid of it is a step backwards for humanity.

  13. www.loc.gov by pNutz · · Score: 4, Informative

    Of course this instantly deteriorates into a discussion about the shameful state of IP and copyright laws, the need to pool all human knowledge, and how crappy the US budget deficit is.

    If you go to the LOC's site, you'll notice American Memory on the front page.

    American Memory is where you can get a good portion of the public domain stuff (books, letters from immigrants to their families back home, photos of civil war enlistees, audio, Edison-era short movies) for free in a low-quality format. Archival quality copies and custom scans/recordings are available for $$$. Almost any work in the LOC can be scanned on request (3 week waiting time or so); this is how they manage to continue adding scans to their collection without requiring public or private funding. It's underfunded as it is and needs more bandwidth.

    The proposal from this idiot in the article is completely unrealistic. Books can contain 100,000 to 5,000,000 characters. At one byte per character, that's 100 KB to 5 MB per book, times 26,000,000 books. That's not including the images and illustrations in some of these works. Many of the texts have value beyond the words they contain. We may be talking about image scanning the pages to preserve the look of the type, paper, and images. Archival TIFFs, since that's what the LOC uses.
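
    Multiplying that out gives a sense of the range for plain text alone. This is only a sketch: it assumes one byte per character with no compression, and decimal terabytes (10^12 bytes), neither of which is stated in the article.

```python
# Figures from the comment above.
BOOKS = 26_000_000
MIN_CHARS_PER_BOOK = 100_000
MAX_CHARS_PER_BOOK = 5_000_000

# Assumptions for this sketch: 1 byte per character, uncompressed,
# and 1 TB = 10**12 bytes.
BYTES_PER_CHAR = 1
TB = 10**12


def text_size_tb(chars_per_book: int) -> float:
    """Total plain-text size in terabytes for the whole collection."""
    return BOOKS * chars_per_book * BYTES_PER_CHAR / TB


low_tb = text_size_tb(MIN_CHARS_PER_BOOK)
high_tb = text_size_tb(MAX_CHARS_PER_BOOK)

print(f"Text only: {low_tb:.1f} TB to {high_tb:.1f} TB")
```

    Even the low end of that range is more than the 1 TB estimate, before a single page image is stored.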

    The article also mentions $60 thousand to 'store' this data (per month?, per year?, just once???, what about access?, searching?, redundant backups?). Another unrealistic number, even working off of the 1TB estimate.

    --
    Death and danger are my various breads and various butters.
  14. Re:Er by 2old2rockNroll · · Score: 2, Informative

    Honestly, the important searchability is author/title, not so much full text searching. I'd bet on PDF files.

    I guess that depends on what you're looking for. I'd like to be able to search on quotes or keywords and authors at the same time. If I already know the exact source of the information I'm looking for, I can probably find it using other resources.