Slashdot Mirror


Proposal: Put Library of Congress' Contents Online

Mark_Uplanguage writes "The idea to scan in all materials available at the U.S. Library of Congress was presented at the Web 2.0 conference this week (as just one of many ideas presented). The proposed cost of $260 million would create a huge benefit to society (well, at least to those who can read English)."

5 of 394 comments

  1. Only English? by AKAImBatman · Score: 3, Informative

    well, at least to those who can read English

    Correct me if I'm wrong, but doesn't the LOC contain all materials registered with the US Copyright Office? If so, it would hold any foreign materials registered for copyright protection.

    1. Re:Only English? by AKAImBatman · Score: 4, Informative

      From Wikipedia:

      [T]he Library assumed a role as a legal repository to guarantee copyright protection. All authors seeking American copyright had to submit two copies of the work to the Library. This requirement is no longer enforced, but copies of many books published in the US still arrive at the Library regularly.

      Damn trolls.

  2. Re:Storage by Big Bob the Finder · Score: 3, Informative
    About ten terabytes. Or maybe 20 terabytes. Or maybe as much as 3 petabytes.

    Those first two estimates are based on the text content alone. If the graphical contents of those books were rendered into digital format. The third one assumes maps, photographs, sound recordings, etc.

  3. Fuzzy math on storage reqs by ravenspear · Score: 3, Informative

    The article claims that the LOC stored as image data would take up 1 TB.

    That's wildly underestimated IMO. The LOC has 26 million books. If we conservatively assume that they each have at least 100 pages, that is 2.6 billion images. At 1 TB total, that works out to roughly 0.4 KB per image. That's some REAL good compression for an image as large as a full page of text.
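
    The per-image budget is easy to check by hand. A minimal sketch, assuming a decimal terabyte (10^12 bytes) and using the book and page counts from the comment; the constant and function names are only illustrative:

```python
# Back-of-the-envelope check of the 1 TB claim.
# Figures come from the comment; the decimal TB is an assumption.
BOOKS = 26_000_000
PAGES_PER_BOOK = 100            # the comment's conservative guess
CLAIMED_TOTAL_BYTES = 10**12    # 1 TB, assuming decimal terabytes


def bytes_per_image(total_bytes, books, pages_per_book):
    """Average storage budget available for each scanned page image."""
    return total_bytes / (books * pages_per_book)


images = BOOKS * PAGES_PER_BOOK
per_image = bytes_per_image(CLAIMED_TOTAL_BYTES, BOOKS, PAGES_PER_BOOK)
print(f"{images:,} images, about {per_image:.0f} bytes each")
```

    Under these assumptions each page image gets fewer than 400 bytes, which is less than the raw text of a typical page would occupy.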

  4. www.loc.gov by pNutz · Score: 4, Informative

    Of course this instantly deteriorates into a discussion about the shameful state of IP and copyright laws, the need to pool all human knowledge, and how crappy the US budget deficit is.

    If you go to the LOC's site, you'll notice American Memory on the front page.

    American Memory is where you can get a good portion of the public domain stuff (books, letters from immigrants to their families back home, photos of Civil War enlistees, audio, Edison-era short movies) for free in a low-quality format. Archival-quality copies and custom scans/recordings are available for $$$. Almost any work in the LOC can be scanned on request (3 week waiting time or so); this is how they manage to continue adding scans to their collection without requiring public or private funding. It's underfunded as it is and needs more bandwidth.

    The proposal from the idiot in the article is completely unrealistic. Books can contain 100,000 to 5,000,000 characters. At one byte per character, that's 100 KB to 5 MB per book, times 26,000,000 books, or roughly 2.6 TB to 130 TB of plain text alone. That's not including the images and illustrations in some of these works. Many of the texts have value beyond the words they contain. We may be talking about image-scanning the pages to preserve the look of the type, paper, and images, as archival TIFFs, since that's what the LOC uses.
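
    The text-only range can be worked out the same way. A rough sketch, assuming one byte per character (plain single-byte text, no markup, no compression) and decimal terabytes; the names are only illustrative:

```python
# Text-only storage estimate from the character counts in the comment.
# Assumption: 1 byte per character, no markup, no compression.
BOOKS = 26_000_000
MIN_CHARS = 100_000
MAX_CHARS = 5_000_000
BYTES_PER_CHAR = 1


def total_terabytes(chars_per_book, books=BOOKS, bytes_per_char=BYTES_PER_CHAR):
    """Total size in decimal terabytes if every book had this many characters."""
    return chars_per_book * bytes_per_char * books / 10**12


low = total_terabytes(MIN_CHARS)
high = total_terabytes(MAX_CHARS)
print(f"text only: {low:.1f} TB to {high:.0f} TB")
```

    Even the low end of that range is above the 1 TB figure, and that is before any page images are counted.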

    The article also mentions $60 thousand to 'store' this data (per month? per year? just once? what about access, searching, and redundant backups?). Another unrealistic number, even working off of the 1 TB estimate.

    --
    Death and danger are my various breads and various butters.