Slashdot Mirror


Proposal: Put Library of Congress' Contents Online

Mark_Uplanguage writes "The idea to scan in all materials available at the U.S. Library of Congress was presented at the Web 2.0 conference this week (as just one of many ideas presented). The proposed cost of $260 million would create a huge benefit to society (well, at least to those who can read English)."

82 of 394 comments

  1. Er by DrMrLordX · · Score: 5, Insightful

    Pardon me for sounding like an eegnoramoose, but isn't at least some of the material in the Library of Congress copyrighted material? Putting it all online would let people get copies of it for *gasp* FREE.

    Can't have that, now can we?

    1. Re:Er by silentbozo · · Score: 5, Insightful

      Many of the libraries in the country carry copyrighted material. You can walk in and peruse the books at your leisure, for free. Same idea, only you grant access to a lot more people. Scholars routinely pay to get copies of rare items from libraries for research, and every time a query comes in, they have to haul the book out, and run it through a copier. It would be a lot more intelligent to scan once, store it, and make it available on demand.

      The chief benefit? Even if the original is lost or destroyed, the digital version lives on - a big issue, assuming that ANY item ever enters the public domain from now on, the way that they were supposed to. Hell, I'd lay out money for a copy of the Library of Congress on a set of blue-ray DVDs, and so would many large corporations (those that still have research labs, that is), universities and colleges, as well as other organizations and governmental entities around the world.

    2. Re:Er by jrockway · · Score: 4, Insightful

      Interesting concept, though. It's okay if I go to the Library and look at it, but not if I look at it online? Why? (I guess I know the answer; in real life only one person can see it at a time. Online, everyone on Earth can see it at the same time. Oh well. Information wants to be free. Don't want someone to know it? Don't write a book about it!)

      --
      My other car is first.
    3. Re:Er by siriuskase · · Score: 4, Interesting
      This is one more reason that the whole basis behind IP law needs to be reevaluated. Although we do want authors, inventors, and other creative types to be rewarded for their efforts, it is also true that what they create becomes more valuable the more it gets out into the world. Any academic knows that the more a paper gets cited, the more valuable it is. Likewise, the more a book is read, the more likely it will wind up in the canon of culturally significant books.

      Creating primarily for money is shortsighted when a work has the chance to impact the larger culture. Just look at Michael Moore (ooh, isn't he ugly, but that's not the point), he's more interested in people seeing and being influenced by his movies than in getting richer off them. Enough money to be comfortable is great, but then, barriers to free movement of ideas should be relaxed.

      --
      If you must moderate, please moderate as irrelevent, not something bad, because I'm sure someone will find this interest
    4. Re:Er by sonamchauhan · · Score: 4, Interesting
      Putting it all online would let people get copies of it for *gasp* FREE.

      Can't have that, now can we?


      No, we can't... it would not be fair to lots of people whose copyrights haven't yet lapsed.

      But scanning the materials is _still_ a good idea. It allows for automated OCR that allows searching for text _within_ a book (like A9.com does, and as Google plans to do.) The difference is that all books published in the US could be searched.

      It would also make this scenario possible:
      • I walk into a public library
      • On a library computer, I enter keywords that search the new "library of congress book text search database".
      • Based on the results (matching text snippets from _within_ books), I decide to buy two books.
      • I walk to the librarian and pay the purchase price
      • She fires up a local print run on the library's new laser book printer
      • 500 automatically laser-printed-punched-and-bound pages later, I have my new two books.


      Since this process is handled by people trained to respect copyright (i.e. the librarians), it is a win-win for everyone.
    5. Re:Er by DrMrLordX · · Score: 2, Informative

      However, if you have an entire library's contents available in digital format, it's possible to make perfect copies of it an infinite number of times. In contrast, there are restrictions as to how and how often copyrighted materials in a physical library can be run through a copy machine.

      I can't see publishers liking the idea of an online Library of Congress at all. Viewers would be able to make their own e-books at a whim. Not that *I* would mind, but . . .

    6. Re:Er by mcrbids · · Score: 4, Insightful

      One of the greatest catastrophes in human history was the burning of the great library at Alexandria, Egypt.

      See, the ancient world had many items of great wisdom, and many of the only copies of these works were contained there. The burning of the great library was the end for countless such works.

      Today, however, our knowledge is much more widely spread. We all owe a tremendous debt to Gutenberg, whose printing press (movable type press, 1436) made this possible.

      It's quite arguable that the dawn of the Renaissance stemmed not from Galileo, or Kepler, but from the widespread nature of books in general after the movable type printing press made this possible.

      How many of these works are unique or very rare? I'd consider that a large percentage of these works fall into this category, in which case it would be a wonderful thing to build some redundancy into the preservation of not only these works, but the wisdom, insight, and humor contained therein!

      Warm up the scanner, says I!

      --
      I have no problem with your religion until you decide it's reason to deprive others of the truth.
    7. Re:Er by 1u3hr · · Score: 2, Informative
      I might umm, sound insensitive, but are you missing legs or something? Libraries are one of the easiest places to get to in pretty much every community

      We're specifically talking about the Library of Congress, which has millions of books, not your local library with maybe 100k or so (I remember my university had about 800k books, probably a million by now). The idea is not to give access to the NYT bestsellers, but rare books that you would have a hard time finding anywhere else.

    8. Re:Er by Jeff+DeMaagd · · Score: 4, Insightful

      Even if the original is lost or destroyed, the digital version lives on

      Assuming a large sum of money is spent maintaining the digital versions. Computers lose and destroy data, even good computers fail. So it would require good backups done on a regular basis. File formats tend to change too.

    9. Re:Er by mrgreen4242 · · Score: 2, Funny
      assuming that ANY item ever enters the public domain from now on

      The ghost of Sonny Bono will haunt you forever, being sure that you know nothing will ever reach the public domain again...

    10. Re:Er by Anonymous Coward · · Score: 2, Informative

      A librarian that respects copyright I can understand. A library capable of maintaining a secure client I think not.

      There are a number of issues at play in libraries:
      o Funding is scarce enough for books, let alone computers and networks.
      o End-client network security requirements are considered a hindrance and a parasitic cost for web service development and deployment.
      o IT is considered to be the art of reinstalling Win9x.
      o Egress filtering and authentication are largely unknown.
      o Public access PCs are invariably on the same network as the "trusted" Librarian PC.
      o Most libraries have public access PCs using obsolete and invariably unpatched software.

      Or maybe Bill and DRM will save us all from running unpaid, sorry unsigned, code.

    11. Re:Er by slashdot.org · · Score: 4, Insightful

      I guess I know the answer; in real life only one person can see it at a time.

      And that's exactly the biggest mistake people keep making; analogies don't work. The stuff we are dealing with is *new*. A library != Internet. There is no analogy.

      I'm not saying that I have a solution to any of this, but I think the first thing people will have to realize is that things have changed in a dramatic way. The traditional way of thinking about IP (or really, information) no longer works.

      There is no simple answer to any of this, and it makes no sense to come up with analogies and try to justify or make judgement based on that.

      Fact of the matter is, all of a sudden it is possible for people to view/copy information pretty much instantly. What we need to realize is that _we_ are the ones that can/will put together the foundation of how to deal with this. No current laws really are suitable. Look at the mess with P2P networks and the music industry. Surely P2P networks _should_ be perfectly legal, but on the other hand if copying music would become so easy that you could listen to any song you'd like, at any given time without paying for it, it's hard to imagine how artists will be paid (and please don't give me the "they'll have to do live performances to make money" bs).

      The people that will be able to figure out what the _real_ answers are to these issues are the ones that will do really well. Think about it. /rant

    12. Re:Er by rjstephens · · Score: 2, Interesting
    13. Re:Er by sonamchauhan · · Score: 2, Interesting

      The library does not have to be capable of maintaining the client, just of funding the infrastructure. The client may even be a thin client run by an external company. After all, network connectivity would probably be essential to access the Congress book database (for copyright reasons, the entire database would probably run out of a Google-like government contracted facility somewhere.)

      About funding being scarce: after initial seed funding by the government, a library should easily be able to fund infrastructure in the same manner the internet funds itself.

      That is because giving the local library the ability to sell any of _26 million books_ to anyone who walks in their door, on demand, with effectively zero inventory costs, adds a _HUGE_ improvement to their basic information dissemination activity, which is very valuable to the library's customers. The commission on book sales would easily recoup the investment in infrastructure. A library could become a sort of low-cost competitor to Barnes and Noble. Barnes and Noble would probably do the same thing as the library, but differentiate themselves by having a better quality printer, and being able to grant 24x7 access to a bn.com server where an electronic copy of the book could be accessed by the purchaser. Ah, and the nice coffee shop, where you can read the electronic copy of your book on a nice LCD screen as it is being printed.

      Other advantages:
      - knowledge hidden in books would be suddenly visible and searchable
      - for most people, reading a book is more natural than reading a screen
      - when people buy a printed book, they retain first sale property rights
      (unlike DRM'd ebook software and music licenses)
      - the library could become a focal point for paper recycling efforts. For example, as part of a loyalty program, it could issue credits for old printed books that people turned in.

      Disadvantages:
      - paper consumption
      - printer consumption

      I think the only loser will be online bookstores that have no mortar component, like amazon.com

    14. Re:Er by sonamchauhan · · Score: 2, Interesting

      > Barnes and Noble has a right to make money without having to compete
      > with government-subsidized pseudo-businesses.

      Sure they do. But I imagine Kinkos and B&N would have access to the same LoC book database, and would be able to print books for purchase too (otherwise it would be unfair to them). This just puts libraries on an even footing - libraries that don't want to sell books to the public could stay that way.

      > If you want information or to read a book, go to the library.
      > If you want to OWN the book, go to a bookstore.

      Yup, that's the current model. Let's break down what you said:

      Go to a library for this:
      1. information about a book
      2. to read a book

      Go to a bookstore for this:
      3. To own a book

      One of the key reasons people use libraries is the library database, and the assurance that a book in the library database is probably "in stock" for lending out. Now if this proposal goes ahead, both Kinko's and B&N will suddenly get #1 - the best library database there can be. With #3 becoming more attractive, (book price reduction due to a larger market -- see below), one of the USPs of libraries simply isn't so anymore.

      > I'd also say that the concept of really cheap books
      > because of lack of physical inventory isn't guaranteed.
      A book typically has a single fixed cost at the start (the authoring). After that, the more copies you sell, the more the profit.

      > It certainly hasn't panned out with magazines or academic journals
      That's because a journal is different from a typical book - each journal issue is like a book that comes out each month with the authoring costs paid each time, but sold to a very limited market.

      Since this proposal would broaden the market immensely for both books and journals:
      - size of the print run is no longer an issue
      - inventory is no longer an issue
      - royalties keep flowing in for longer durations
      ... the costs for all categories of information - both books and journals - would come down.

      > There's also a big copyright issue with the whole concept of scanning in the LoC collection.
      > With physical library items, only one person may have the item at a time,
      > so there's no copyright issue. (No copies are being made.)
      > With a digital version, multiple people can access it at one time.

      Many copyright holders _want_ this to happen and are already doing this.
      For instance, you can go to Amazon's A9.com site and search on Gandhi's wife:
      http://a9.com/gandhi%20kasturba
      (be sure to click on the books button on the left - this returns matches within a book)

      Now if entire books were scanned into the LoC database, a canny person could type in the name of a book, and then "page 1", "page 2"... and so on... to essentially read the book without paying for it.

      One way to secure copyright against behavior like this is by restrictions that can be imposed on both the server and the clients that are searching the book (say, the client cannot view more than 30 words surrounding each match). Amazon's restrictions seem to be that they just scan in the table of contents, not the entire book.
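      The server-side restriction described above (showing only a limited number of words around each match) could look roughly like the sketch below. The function name `snippets`, the single-word query, and the choice to split the 30-word budget evenly before and after the match are all assumptions made for illustration; nothing here reflects how A9.com or any real service implements it.

```python
def snippets(text, query, window=30):
    """Return one snippet per match of a single-word `query` in `text`.

    `window` is the total number of context words allowed around a match.
    Assumption: half of the budget goes before the match, half after.
    """
    words = text.split()
    needle = query.lower()
    half = window // 2
    results = []
    for i, word in enumerate(words):
        # Case-insensitive substring match against each word.
        if needle in word.lower():
            start = max(0, i - half)
            end = min(len(words), i + half + 1)
            results.append(" ".join(words[start:end]))
    return results


# Hypothetical page text: the match is buried in 100 words of filler.
page = " ".join(["filler"] * 50 + ["Kasturba"] + ["filler"] * 50)
for s in snippets(page, "kasturba"):
    print(len(s.split()), "words returned")
```

      A limit like this only caps what a single query reveals. Stopping someone from stitching many snippets together would also need per-user limits on the server, which this sketch does not attempt.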

      > Finally, the upkeep cost for scanned items is huge.
      Well, that scanning would only be done once, using government funds to scan it into the LoC database. The only thing a library would need would be network access to the LoC database (just like they currently do with some electronic journals and databases.)

    15. Re:Er by 2old2rockNroll · · Score: 2, Informative

      Honestly, the important searchability is author/title, not so much full text searching. I'd bet on PDF files.

      I guess that depends on what you're looking for. I'd like to be able to search on quotes or keywords and authors at the same time. If I already know the exact source of the information I'm looking for, I can probably find it using other resources.

  2. Can't do that. by Pig+Hogger · · Score: 5, Funny

    This would violate the publishers' god-given right to milk their "creations" until the heat-death of the Universe.

    1. Re:Can't do that. by craXORjack · · Score: 2, Interesting

      Yes, the article does say 'all 26 million books in the US Library of Congress.' I think all the books should be scanned. Imagine if a terrorist detonated a nuclear bomb there and destroyed the largest library in the world. What a loss that would be. But just because we would have a backup of the data doesn't mean they must allow full access to copyrighted works. They could release DVDs of a subset which includes only information in the Public Domain. It would be a huge boon for Project Gutenberg though each scanned work would still need to be transcribed to text.

      --
      Liberals call everyone Nazis yet they are the closest thing to it.
    2. Re:Can't do that. by operagost · · Score: 2, Interesting

      You'll find that many here (including me - and I'm one of the most conservative) find the copyright period oppressively long. Just because you wrote one useful book shouldn't entitle you to a generation of monopoly on its art and ideas. The copyright period was once much shorter, and that encouraged derivative works.

      --

      Gamingmuseum.com: Give your 3D accelerator a rest.
    3. Re:Can't do that. by InfiniteWisdom · · Score: 4, Funny

      until the heat-death of the Universe.

      Hey, it's still a finite time
      - Walt Disney

    4. Re:Can't do that. by Waffle+Iron · · Score: 2, Insightful
      I mean, God forbid they or their children are able to make money off their own creations and ideas.

      Their children can go out and get their own damned jobs. They would then be making a productive contribution to the economy.

      My grandpa was a farmer who died over 50 years ago. Since I don't get to collect royalties on the corn he grew in the 1930s, I've had to work to produce my own income. Imagine that.

    5. Re:Can't do that. by Senjutsu · · Score: 4, Interesting

      you must mean that whole 70 years after the author's death.

      You must mean currently. But we all know that as soon as anything major (like Steamboat Willie) comes close to coming out of copyright, we'll see Congress extend the term of copyright yet again, thanks to 'encouragement' from Disney.

      Copyright terms are nigh on infinite in fact, if not in law.

    6. Re:Can't do that. by Saeger · · Score: 2, Funny
      How did you get on my /.friends-list with an attitude like that?

      The internet has stripped away the convenient medium(s) that used to contain an inherently scarce message that could physically command the price you asked for. The new reality of the situation is that either you think DRM + DMCA can and should be used to keep doing things the old way, by keeping a decades-old instance of information artificially scarce, or you think-- like millions already do --that information is cheap, and the value lies in the inherently scarce service of performing or creating NEW information. New systems will emerge to fund creation and promote progress, but it won't be the "right to profit" gravytrain of the past.

      --
      Power to the Peaceful
    7. Re:Can't do that. by geminidomino · · Score: 2, Insightful

      Just because you wrote one useful book shouldn't entitle you to a generation of monopoly on its art and ideas

      death+70 years would actually be about 4 generations (if you include the author as the first).

  3. well, at least to those who can read English by ForestGrump · · Score: 2, Funny

    and to those who can't, they can copy and paste the text into a translator.

    So yes, it would benefit society as a whole.

    Grump.

    --
    Is it true that more people vote for the winner of American Idol, than vote for the president? -Ali G.
  4. For once by theskeptic · · Score: 2, Funny

    Library of Congress jokes will be on topic.

  5. Storage by Neon+Spiral+Injector · · Score: 4, Funny

    How much data storage would this require? Could someone give it to me in laymen's terms?

    1. Re:Storage by jrockway · · Score: 4, Funny

      About 1.0003 libraries of congress.

      --
      My other car is first.
    2. Re:Storage by SunPin · · Score: 2, Funny

      About 750 million cubic hogsheads.

      --
      Laws are for people with no friends.
    3. Re:Storage by Ghostgate · · Score: 4, Funny

      This will require 1.28 Libraries of Congress to store. The overhead is for all of the faulty copy protection to be added, which a 13-year-old somewhere in Europe is already working on cracking.

    4. Re:Storage by Big+Bob+the+Finder · · Score: 3, Informative
      About ten terabytes. Or maybe 20 terabytes. Or maybe as much as 3 petabytes.

      Those first two estimates are based on the text content alone. The third assumes the graphical contents of those books were rendered into digital format, along with maps, photographs, sound recordings, etc.

    5. Re:Storage by BorgCopyeditor · · Score: 2, Funny
      Let's see:
      engaging "3-2-1 Contact" mode...
      • If each byte of data were the size of a grain of sand, the LOC archive would be roughly the size of Laguna Beach!
      • If each byte of data were the thickness of a hair on a fly's ass, the LOC's collection, laid side by side, would be over 6 feet long!
      • If each byte of data were worth $0.01 U.S., the LOC database would rival the gross national product of Uzbekistan!
      --
      Shop as usual. And avoid panic buying.
    6. Re:Storage by operagost · · Score: 4, Funny

      Don't use a Pentium chip next time.

      --

      Gamingmuseum.com: Give your 3D accelerator a rest.
    7. Re:Storage by Dun+Malg · · Score: 2, Interesting
      About ten terabytes. Or maybe 20 terabytes. Or maybe as much as 3 petabytes.

      Heh. Whichever it turns out to be, the LoC, being yet another part of the federal government, will probably make it available for viewing/downloading as a single PDF file.

      PDF sucks.

      --
      If a job's not worth doing, it's not worth doing right.
  6. We need to get our priorities straight by davmoo · · Score: 4, Insightful

    Since Congress and the President can so easily pull out a hundred billion dollars to bomb the hell out of another country, I see no reason we can't come up with a wimpy $260 million for something as worthwhile as this.

    --
    I want a new quote. One that won't spill. One that don't cost too much. Or come in a pill.
    1. Re:We need to get our priorities straight by Zoop · · Score: 4, Funny

      Since Congress and the President can so easily pull out a hundred billion dollars to bomb the hell out of another country, I see no reason we can't come up with a wimpy $260 million for something as worthwhile as this.

      I'm sorry, I don't get it. How does your proposal bomb anybody?

      Are you suggesting we should bomb libraries?

      I mean, I see libraries, I see money, but I'm missing the bombs.

      Tell you what, rewrite your proposal with bombs and maybe some cool submunitions and make sure they're Furin libraries, and we'll talk.

  7. Units?! by skraps · · Score: 2, Funny
    He estimated that the scanned images would take up about a terabyte of space [...]
    Uhh.. "terabyte"? Again with these esoteric units!
    Someone, please.. how much is that in LOC?
    --
    Karma: -2147483648 (Mostly affected by integer overflow)
  8. One of the More interesting projects by randall_burns · · Score: 4, Interesting

    The government has proposed recently. I would also suggest that they put in place requirements that all future material that is to be copyrighted present appropriate copies in machine readable form so this will be cheaper in the future.

    1. Re:One of the More interesting projects by Brandybuck · · Score: 2, Informative

      Tyranny! And don't laugh, I'm serious about this.

      All material is copyrighted at the instant of creation. All of it. You write a love letter to your girlfriend and it's copyrighted. It's all copyrighted! Beyond that, you're requiring them to *present* copies. I'm assuming this is to the LoC.

      You could make a case for this when a copyright is *registered*, but please don't make a blanket statement like that without first engaging brain.

      --
      Don't blame me, I didn't vote for either of them!
    2. Re:One of the More interesting projects by brusk · · Score: 2, Interesting

      That's an absurd, dangerous proposal.

      If I write some poems, print them myself on my old-fashioned movable type press (not that I do this myself, but I know folks who do), and distribute 100 copies, why the heck should I have to submit an electronic copy to anyone?

      --
      .sig withheld by request
  9. Only English? by AKAImBatman · · Score: 3, Informative

    well, at least to those who can read English

    Correct me if I'm wrong, but doesn't the LOC contain all materials registered with the US copyright office? In which case it would have any foreign materials registered for copyright protection.

    1. Re:Only English? by AKAImBatman · · Score: 4, Informative

      From Wikipedia:

      [T]he Library assumed a role as a legal repository to guarantee copyright protection. All authors seeking American copyright had to submit two copies of the work to the Library. This requirement is no longer enforced, but copies of many books published in the US still arrive at the Library regularly.

      Damn trolls.

    2. Re:Only English? by InterGuru · · Score: 2, Interesting
      Let me correct you. They do not keep all the materials from the copyright office. Some they forward to other appropriate places, such as the National Library of Medicine.

      Their collections policy statement states that they only keep material specific to their very broad mission statement. This means that they will not keep a copy of a laundry list they received through the copyright office.

  10. Homeland Security Savings by Anonymous Coward · · Score: 3, Funny
    The idea to scan in all materials available at the U.S. Library of Congress was presented at the Web 2.0 conference this week (as just one of many ideas presented). The proposed cost of $260 million would create a huge benefit to society

    It would probably pay for itself too since FBI agents would no longer have to travel to libraries to secretly gather records of who borrowed what. They can just use Carnivore to do it instead.

  11. Re:If Bill Gates by Anonymous Coward · · Score: 2, Insightful

    >wanted to do something really important and
    >contributive, he would fund this.

    Yeah, not like funding the B&M Gates foundation is doing anything worthwhile with all that immunization, AIDS research and anti-poverty work.

    Darned, useless Microsoft profits. Helping people. Imagine that!

  12. Ametrica! by Doc+Ruby · · Score: 5, Funny

    Finally, Slashdot can establish that for official purposes:

    1 Library of Congress = $260M

    And the 2004 US Federal budget can be spec'd at 0.000243754522 LoC/s (Libraries of Congress per second).
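    For anyone who wants to run the unit conversion backwards, here is a small sketch that turns the quoted rate into the annual dollar figure it implies. It uses only the two numbers in this comment ($260M per LoC and the quoted rate); the 365-day year and the helper name `implied_annual_budget` are assumptions for illustration, and no official budget figure is used.

```python
LOC_COST_USD = 260e6                    # 1 Library of Congress = $260M (from the comment)
RATE_LOC_PER_SECOND = 0.000243754522    # rate quoted in the comment
SECONDS_PER_YEAR = 365 * 24 * 60 * 60   # assumption: a 365-day year


def implied_annual_budget(rate_loc_per_s, loc_cost=LOC_COST_USD,
                          seconds=SECONDS_PER_YEAR):
    """Convert a spending rate in LoC per second into dollars per year."""
    return rate_loc_per_s * loc_cost * seconds


budget = implied_annual_budget(RATE_LOC_PER_SECOND)
print(f"${budget / 1e12:.2f} trillion per year")  # prints "$2.00 trillion per year"
```

    So the quoted rate corresponds to an annual figure of about two trillion dollars.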

    --
    make install -not war

  13. The theory of everything by antikarma · · Score: 3, Funny

    At long last, we shall finally know just how much one unit of Libraries of Congress is. This could quite possibly have profound effects on how we understand the universe. For example, for many years we have known that the universe is approximately 42 Libraries of Congress. Now we can fully understand its meaning.

    1. Re:The theory of everything by Pfhorrest · · Score: 2, Funny

      Oh dear God, they've found the Question.

      Goodbye, Universe.

      --
      -Forrest Cameranesi, Geek of all Trades
      "I am Sam. Sam I am. I do not like trolls, flames, or spam."
  14. Only the first step by Nom+du+Keyboard · · Score: 3, Insightful

    Putting the LoC on-line is only the first step. How long before those Internet book printing stations that can create an entire book for you from an electronic image in a deciminute for $1 tap into this? I'd have to think that this would be good for everyone except B&N who are busy reprinting old classics under their own label right now.

    --
    "It's the height of ridiculousness to say for those 9 lines you get hundreds of millions."
  15. Halfbaked by n08ody · · Score: 3, Funny

    you've perused the Library of Congress, but have you perused the Library of Congress Online

  16. Missing something? by ravenspear · · Score: 4, Funny

    In a traditional library it's not really easy to...

    1. walk in and pick up a book
    2. strike the author's name from it and replace it with your own
    3. replace the copyright notice with your own
    4. Make one thousand perfect copies
    5. Offer it for sale, start taking orders, and PROFIT!

    ...all within 30 minutes.

    I could easily do that on the internet.

    1. Re:Missing something? by thephotoman · · Score: 2, Insightful

      Even still, somebody along the way will get the idea to cross-reference you to the database, perhaps when they try to find out more about you by making an inquiry to the Library of Congress (which handles copyrights in the US) about your copyright.

      --
      Haec merda tauri est. Ceterum censeo Carthaginem esse delendam.
  17. a real application for internet2 by LuxFX · · Score: 3, Interesting

    Right now, Internet2 can download the entire Library of Congress in about 20 seconds.

    I'm not aware of any PIAA for publishers, but somebody is going to have a problem with this. And by the time this actually happens, I bet there will be an Internet4 that can do it all in 20ms.

    --
    Punctanym: alternate spelling of words using punctuation or numerals in place of some or all of its letters; see 'leet'
  18. Re:If Bill Gates by shubert1966 · · Score: 2, Insightful

    Don't feel slammed because they see an easy target. Your sentence is grammatically balanced; funding such a project WOULD be really important and contributive. It was your detractors who sought to insert injustice, and I think they are caffeinated, or worse, sleep deprived. You and I know Bill Gates isn't responsible for the lame-ass products his company delivers/promises. He just owns the company. It's the lame-ass M$ engineers who are to blame.

    Bill Gates does plenty of worthy things with the PHAT $$$ his company has liberated from millions of (l)users and this would be a fabulous project for him and Iron Mountain.

    --
    Stuff that matters.
  19. More dotcom hype... by stubear · · Score: 2, Insightful
    "Despite the hype surrounding the dotcom era, many believe that the vast potential of the net to change society and business remains largely untapped."


    If this is such a wonderful idea why doesn't he get a bunch of artists, musicians and writers to donate their own work to this project and actually prove the concept works?

    I'm tired of all the rhetoric about business models failing and how the web is going to transform the way society learns, works, and entertains itself. The dotcom era should have taught these so-called visionaries one thing: you actually have to have a business plan before you can transform business models.

    If these business models are so full of potential he should start one, with his own intellectual property, and prove that the old economy intellectual property businesses are extinct. If his ideas work then the dinosaurs of the MPAA and RIAA will either have to adapt to the new economy or die. Forcing them to risk their entire business on a gamble like this is wrong from any perspective.
    1. Re:More dotcom hype... by MrWa · · Score: 4, Insightful
      If this is such a wonderful idea why doesn't he get a bunch of artists, musicians and writers to donate their own work to this project and actually prove the concept works?

      Work for whom? I think you are still confused from the dotcom era. You must be thinking that "change society and business" means that scanning the entire LoC can make someone money (advertising??)

      The important part in this case is the changing society part of the statement, which is what the vast potential of the net is capable of doing. It won't help you make money based on a bad idea (in fact, it may only help you lose money faster!) but it does have the potential to change the way a society views and deals with information.

      Right now there is a vast amount of knowledge in the LoC that is effectively out of the ordinary citizen's hands. That is not how it should be. If knowledge is power, there is a storehouse of power waiting to be unleashed by giving everyone access to what is being stockpiled. It won't happen overnight, or in a few years, but eventually it will have a ripple effect. Historians lament the loss of the Great Library of Alexandria, but what difference would it have made if only a few could actually use the information that was contained?

  20. Fuzzy math on storage reqs by ravenspear · · Score: 3, Informative

    The article claims that the LOC stored as image data would take up 1 TB.

    That's wildly underestimated IMO. The LOC has 26 million books. If we conservatively assume that they each have at least 100 pages, that is 2.6 billion images. At 1 TB total, that works out to about 0.4 KB per image. That's some REAL good compression for an image as large as a full page of text.
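
    A quick sanity check of that per-image budget, as a rough sketch only. It assumes decimal units (1 TB = 10^12 bytes) and takes the book count and the 100-pages guess straight from the comment; none of these are measured figures.

```python
# Back-of-envelope check; every input is an assumption taken from the comment.
TOTAL_BYTES = 10**12        # 1 TB in decimal units (assumption)
BOOKS = 26_000_000          # book count quoted in the comment
PAGES_PER_BOOK = 100        # the commenter's conservative guess


def bytes_per_page(total_bytes: int, books: int, pages_per_book: int) -> float:
    """Return the storage budget available for each scanned page."""
    return total_bytes / (books * pages_per_book)


budget = bytes_per_page(TOTAL_BYTES, BOOKS, PAGES_PER_BOOK)
print(f"{BOOKS * PAGES_PER_BOOK:,} pages, {budget:.0f} bytes per page")
```

    A few hundred bytes per page is less than a typical page of plain text, never mind an image of one.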

  21. What, you want me to starve to death? by Banner · · Score: 3, Funny

    Oh yeah, put 'em all online. I have a hard enough time already in libraries and book stores! If I could read any book I wanted to (even if they're only the ones already out of copyright) online, I'd probably not leave my computer until I passed out!!

  22. Great by Konster · · Score: 2, Funny

    This will be great! You know all those ads that claim such and such can transmit the Library of Congress in so and so seconds?

    Now we'll be able to test their notions!

  23. Re:As an author... by WaterBreath · · Score: 3, Insightful

    A 0-rated post noted that this type of free access is a big deal to people who make an honest living publishing their creations.

    This raises a big, important question. The rise and flourishing of the information age has provided, and will continue to provide, unbelievable freedom of access to unbelievable amounts of information. Where and how do we draw the line between the freedom of the consumers and the rights of the creators?

    I'm a software developer who loves movies: I'm a creator and a consumer, so I see both sides of this coin. And I think there needs to be a compromise between consumers and creators.

    Consumers need to realize that at a certain point, amassing more music, or more books, or more movies, or more whatever, becomes a luxury, not a right. So if the price of music prevents you from having a 10,000 song collection, I'm sorry but, "so sad too bad." That's how it's always been for just about every other purchasable product. Sometimes you have to sacrifice what you merely want to get what you really desire.

    Creators need to understand that the information they produce is a drop in the bucket compared to, for example, the estimated yottabyte (1x10^24 bytes) of information on the Internet. So if you want to make money off your creation, it had better stand out, because there's a lot of noise out there to drown it out. Simply put, if you want to get paid, make something people are willing to pay for.

  24. Re:Can't do that-Inheritance. by Waffle+Iron · · Score: 5, Insightful
    If you get an inheritance? You effectively do.

    I might inherit a portion of his farm. But that's a result of money that he saved at the time. I do not collect royalties on the *work* that he did 70 years ago.

    If an author or musician wants to leave an inheritance, then they should save the money they make during a reasonable copyright term, and give that to their children. They can leave their typewriters, musical instruments, and other tools of the trade (analogous to a farm) as well.

    They might have to actually forgo blowing everything they earn on cocaine and refrain from signing away most of their income on bad contracts to actually achieve this, but then so do the rest of us.

  25. Library of Congress Transfer Rates by xombo · · Score: 2, Funny

    Isn't the size of the Library of Congress what people used to use as a quantifier for the speed of high-bandwidth connections? I remember several years ago that companies would brag that they can transfer the entire Library of Congress to England or wherever in less than 2 seconds and what have you. I suppose a statement like that would indicate that there are already digital versions of the Library of Congress out there somewhere meaning it will take virtually nothing dollar-wise to put it online (since I guess it's been flowing back and forth for years).

  26. this makes news? by Anonymous Coward · · Score: 3, Insightful

    Maybe I'm the odd duck here, but way back in the early net days (the '90s) I thought this was such an obvious application of internet technology that it must have been part of the original design purpose for the internet (DARPAnet and all that funding, of course).

    So the only surprise to me is that we're just now hearing a proposal to do this??? Sheesh, if I hadn't thought it so completely obvious to every netizen at those old public library terminals, I would have lost so much sleep making it happen!!!

    So now who's going to do it? And while it's limboing through Congress, can we just put together a consortium to visit the library we already own with our digital cameras and OCR the thing into existence... how many of us would need to donate our Gmail 1 GB accounts to store it all?

  27. Re:Agreed, what about labour? by erick99 · · Score: 2, Informative

    This guy has a $150,000 machine that scans 1,500 bound pages per hour. That would certainly help though it sounds expensive . . .
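
    To get a feel for "expensive", here is a rough sketch of how many such machines a deadline would require. The throughput and price are the figures quoted above; the total page count (26 million books at 100 pages each), the 5-year deadline, and nonstop operation are hypothetical assumptions for illustration only.

```python
import math

PAGES_PER_HOUR = 1_500          # scanner throughput quoted above
MACHINE_COST_USD = 150_000      # scanner price quoted above
TOTAL_PAGES = 2_600_000_000     # hypothetical: 26 million books x 100 pages
HOURS_PER_YEAR = 24 * 365       # assumes nonstop operation, which is optimistic


def machine_years(total_pages: int, pages_per_hour: int) -> float:
    """Years a single machine would need to scan every page."""
    return total_pages / pages_per_hour / HOURS_PER_YEAR


def machines_needed(total_pages: int, pages_per_hour: int, deadline_years: float) -> int:
    """Smallest number of machines that finishes within the deadline."""
    return math.ceil(machine_years(total_pages, pages_per_hour) / deadline_years)


fleet = machines_needed(TOTAL_PAGES, PAGES_PER_HOUR, deadline_years=5)
print(f"one machine: {machine_years(TOTAL_PAGES, PAGES_PER_HOUR):.0f} years")
print(f"5-year deadline: {fleet} machines, ${fleet * MACHINE_COST_USD:,} in hardware")
```

    Under those assumptions the hardware is a small slice of $260 million; the labour of feeding the machines is not counted here.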

    --
    http://www.busyweather.com/
  28. Now I have a real problem... by HellYeahAutomaton · · Score: 3, Funny

    I just downloaded the LoC.ps.tgz from the local WPI Internet2 tap using gnutella and my printer just ran out of ink....

  29. Scupper copyrights by freedom_india · · Score: 2, Interesting

    There's a Vulcan saying: "The needs of the many outweigh the needs of the few."
    I would say, scupper copyrights for all volumes owned by LoC. Scan and put every volume on the internet.
    Within a few years we would witness a Renaissance of sorts once again in human knowledge and education.

    --
    "Doing what i can, with what i have." ~ Burt Gummer
  30. Human's Book Pool by 12357bd · · Score: 3, Insightful

    Not only the Library of Congress of the United States of America, we should also scan every big library in the world to create a pool of human work to freely share and preserve.

    --
    What's in a sig?
  31. Re:Government Spending by gerardrj · · Score: 3, Insightful

    I'd take you up on that offer, but it would be money wasted as you simply cannot do the job for that little money.

    The LOC doesn't just contain nice black and white typed texts. There are hand written documents in organic inks on animal hide and poorly constructed paper. There are paintings in every medium you can imagine and there are sound recordings on just about every media ever used: wax tubes, glass disks, wire spools, open reel, 8-track, cassette, CD, DVD, etc.

    Each of these things needs to be digitized, categorized, indexed and offered in a searchable manner. A printed page, for example, will need to be photographed and transcribed/OCRed.
    Much of the work needs to be done on delicate objects that may be destroyed if not handled correctly. If you were to play a wax recording disk with too much pressure, or under the wrong environmental conditions, the disk would shatter into an irreparable pile of small bits.

    What formats will you store them in? What formats will you make them available in?

    --
    Article X: The powers not delegated... by the Constitution...are reserved...to the people
  32. Re:This will be DRM'd correct? by Lord+Moz · · Score: 2, Informative
    In order to register your copyright you agree to send a copy to the LoC if they request it. It's the law... in other words, the LoC gets a copy of anything they want that is protected by copyright in the United States.

    See below...
    ______________________________________________
    TITLE 17 > CHAPTER 4 > § 407

    § 407. Deposit of copies or phonorecords for Library of Congress

    (a) Except as provided by subsection (c), and subject to the provisions of subsection (e), the owner of copyright or of the exclusive right of publication in a work published in the United States shall deposit, within three months after the date of such publication--
      (1) two complete copies of the best edition; or

      (2) if the work is a sound recording, two complete phonorecords of the best edition, together with any printed or other visually perceptible material published with such phonorecords.

    Neither the deposit requirements of this subsection nor the acquisition provisions of subsection (e) are conditions of copyright protection.

    (b) The required copies or phonorecords shall be deposited in the Copyright Office for the use or disposition of the Library of Congress. The Register of Copyrights shall, when requested by the depositor and upon payment of the fee prescribed by section 708, issue a receipt for the deposit.

    (c) The Register of Copyrights may by regulation exempt any categories of material from the deposit requirements of this section, or require deposit of only one copy or phonorecord with respect to any categories. Such regulations shall provide either for complete exemption from the deposit requirements of this section, or for alternative forms of deposit aimed at providing a satisfactory archival record of a work without imposing practical or financial hardships on the depositor, where the individual author is the owner of copyright in a pictorial, graphic, or sculptural work and

      (i) less than five copies of the work have been published, or

      (ii) the work has been published in a limited edition consisting of numbered copies, the monetary value of which would make the mandatory deposit of two copies of the best edition of the work burdensome, unfair, or unreasonable.

    (d) At any time after publication of a work as provided by subsection (a), the Register of Copyrights may make written demand for the required deposit on any of the persons obligated to make the deposit under subsection (a). Unless deposit is made within three months after the demand is received, the person or persons on whom the demand was made are liable--

      (1) to a fine of not more than $250 for each work; and

      (2) to pay into a specially designated fund in the Library of Congress the total retail price of the copies or phonorecords demanded, or, if no retail price has been fixed, the reasonable cost to the Library of Congress of acquiring them; and

      (3) to pay a fine of $2,500, in addition to any fine or liability imposed under clauses (1) and (2), if such person willfully or repeatedly fails or refuses to comply with such a demand.

    (e) With respect to transmission programs that have been fixed and transmitted to the public in the United States but have not been published, the Register of Copyrights shall, after consulting with the Librarian of Congress and other interested organizations and officials, establish regulations governing the acquisition, through deposit or otherwise, of copies or phonorecords of such programs for the collections of the Library of Congress.

      (1) The Librarian of Congress shall be permitted, under the standards and conditions set forth in such regulations, to make a fixation of a transmission program directly from a transmission to the public, and to reproduce one copy or phonorecord from such fixation for archival purposes.

      (2) Such re
  33. I have to ask... by Civil_Disobedient · · Score: 4, Insightful

    As an author, I wonder how much of your valued craft was honed by reading the work of others for education and inspiration. How many books did you buy in elementary school, or high school? Yet that's where you learned your precious language skills you now market.

    Knowledge, even the limited knowledge of an author, does not exist in a vacuum. You read, you learn, you practice, then you create. You could not have done this without the beneficence of others who aren't making a dime off the education they provided you.

    To unleash the vast amounts of knowledge stored up in the LOC to the world would be one of the single best things this country could do for mankind. One book, one reader my hairy ass. Why not open the floodgates so everyone can benefit?

    I understand the motivation of monetary incentives, but I also know a lot of great authors who died penniless. And they were at least brave enough to sign their names to their ideas.

  34. Re:Government Spending by Dr_Barnowl · · Score: 2, Interesting

    The best way so far of capturing wax recordings and the like is to run the disk under a high-resolution scanner and use a piece of software to render the image of the grooves as a waveform; this involves no physical wear of the medium. In fact, I'd think that a commercial version of this could well catch on for old-timers with large vinyl collections....

  35. Re:Can't do that-Inheritance. by linzeal · · Score: 2

    Gambling is no way to run an economy.

  36. hmmm.... by Atrax · · Score: 2, Funny

    well, at least to those who can read English

    So that leaves out most Americans. Thanks from the rest of the world!

    (tongue firmly in cheek)

    --
    Screw you all! I'm off to the pub
  37. Re:That's a lot of developers .. are they off-shor by jas79 · · Score: 2

    I think that they are talking about developers who use Amazon's webservice.

  38. Pilot Program by PMuse · · Score: 2, Insightful
    Putting it all online would let people get copies of it for *gasp* FREE. Can't have that, now can we?

    No, we can't... it would not be fair to lots of people whose copyrights haven't yet lapsed.

    Let us scan only things for which the copyright has lapsed. This has several advantages.
    1. Promotes and makes accessible works that are now free. (Project Gutenberg would be over the moon at a $1MIL grant, let alone $260MIL.)
    2. Provides citizens a cheap method of checking that a copyright has, in fact, expired for debunking false claims of continuing copyright.
    3. Shows the public what a public domain is and why it's valuable. Helps demonstrate that perpetual copyright is a theft from the public at large.
    4. Is far cheaper than scanning everything, requires no legal battles, and needs no DRM.
    5. Avoids promoting works still subject to copyright, which is the job of their owners, not the govt.
    --
    "We reject as false the choice between our safety and our ideals." --The American President (20.1.2009)
  39. Better Access for Everyone by azemon · · Score: 3, Interesting

    What a cool idea and, even "if" the dollar estimate is too low, who cares? $260M is chump change for our gov't.

    Right now, the only way to access the stuff in LoC is to go there in person. Anyone can do it but you have to travel to WashDC and pass through security and so forth to get into the LoC public reading room. Then you have to ask the librarian to pretty-please bring you the book that you want.

    Now imagine that you can access any item in the LoC by simply entering the building and using a public kiosk with a browser. LoC's software would only permit use within the copyright so that is OK. But you don't have to mess with as much security because LoC isn't handing over the physical book.

    Now imagine that, from any web browser, you can access any book in the LoC for which the copyright has expired. I like that idea!

    My opinion... skip the buy on the next couple of cruise missiles and digitize LoC's books instead.

    Oh yeah, before I forget, LoC already has tons of seriously neat stuff online. My favorite is this collection of tons of photos from Russia. These were taken between about 1907 and 1915! I don't know about you, but I never dreamed that I would see color photos that are almost 100 years old.

    Cheers,
    -- Art Z.

  40. Project Gutenberg by cpghost · · Score: 2, Interesting

    Now imagine that, from any web browser, you can access any book in the LoC for which the copyright has expired. I like that idea!

    That's the idea of Project Gutenberg. It's been around for quite some time now, and everybody is free to join their distributed proofreading network!

    --
    cpghost at Cordula's Web.
  41. Does the money estimate add up? by sbaker · · Score: 2, Interesting

    According to the LOC website, they have 119 million items in the library.

    They tell us that there are:

    4.5 million maps.
    14 million 'images' ...so I guess we assume the rest are books and newspapers.

    So in round numbers, let's say there are 50 million books and 50 million newspapers, periodicals, comic books, etc.

    $260 million to scan all that stuff? $2.60 per book or newspaper? That seems a little unlikely. The book would have to be carried off the shelf to the scanning machine, mounted in the machine (which would clearly have to turn the pages and scan and index them 100% automatically), the title and such would probably have to be typed in manually, then the book carried back to the shelf and placed back in the correct place.

    I find it hard to believe that a machine for scanning newspapers could be devised that could turn the pages automatically...but even without that, the project is still possible. At minimum wage, you'd need to pay people to scan a complete newspaper in maybe 20 minutes.

    Then some significant fraction of the collection would probably be too fragile for the automatic page turning machines...the cost of hand-scanning those would be FAR more than the bulk of the books. Some books would be *so* fragile and valuable that scanning them would be a considerable expense.

    Then there is the cost of the storage media. Suppose those 100 million books and newspapers had just 100 pages each on average. To get a readable image of the page you're going to need to scan at maybe 2000 x 2000 resolution. So we'll have something like 10^16 pixels, let's be generous and allow 100:1 compression ratios - and one byte per pixel. So we have 1000 terabytes. That's a lot - but to put it in context, it's only about a fifth of the amount that Google is estimated to have in their main cluster. Google spent $250 mil to buy that - so maybe only 20% of the LOC's budget needs to be for storage.
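
    For anyone who wants to rerun the storage estimate with different guesses, here is a minimal sketch. Every input is one of the rough assumptions stated above (item count, pages per item, scan resolution, one byte per pixel, 100:1 compression), and terabytes are taken as 10^12 bytes.

```python
# All inputs are the comment's rough guesses, not measured values.
ITEMS = 100_000_000             # books plus newspapers and periodicals
PAGES_PER_ITEM = 100            # assumed average
PIXELS_PER_PAGE = 2000 * 2000   # assumed scan resolution
BYTES_PER_PIXEL = 1             # assumed colour depth
COMPRESSION_RATIO = 100         # the "generous" 100:1 figure


def scanned_terabytes(items, pages_per_item, pixels_per_page,
                      bytes_per_pixel, compression_ratio):
    """Compressed size of the whole scan in terabytes (10^12 bytes)."""
    raw_bytes = items * pages_per_item * pixels_per_page * bytes_per_pixel
    return raw_bytes / compression_ratio / 10**12


total_tb = scanned_terabytes(ITEMS, PAGES_PER_ITEM, PIXELS_PER_PAGE,
                             BYTES_PER_PIXEL, COMPRESSION_RATIO)
print(f"about {total_tb:.0f} TB compressed")
```

    Plugging the stated inputs in exactly gives roughly 400 TB, somewhat below the round 1000 TB quoted above but in the same ballpark, so the conclusion that storage is a minority of the budget still holds.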

    OCR'ing and indexing all that data would be an incredibly valuable thing - the extra storage is trivial and the cost can be low if you aren't in a hurry to get the project done. Just stick a few thousand PCs in a room and wait!

    Dunno - $260 mil sounds like a low end estimate to me - but it seems do-able.

    --
    www.sjbaker.org
  42. www.loc.gov by pNutz · · Score: 4, Informative

    Of course this instantly deteriorates into a discussion about the shameful state of IP and copyright laws, the need to pool all human knowledge, and how crappy the US budget deficit is.

    If you go to the LOC's site, you'll notice American Memory on the front page.

    American Memory is where you can get a good portion of the public domain stuff (books, letters from immigrants to their families back home, photos of civil war enlistees, audio, Edison-era short movies) for free in a low-quality format. Archival quality copies and custom scans/recordings are available for $$$. Almost any work in the LOC can be scanned on request (3 week waiting time or so); this is how they manage to continue adding scans to their collection without requiring public or private funding. It's underfunded as it is and needs more bandwidth.

    The proposal from this idiot in the article is completely unrealistic. Books can contain 100,000 to 5,000,000 characters. That's 100 KB to 5 MB per book, times 26,000,000 books. That's not including the images and illustrations in some of these works. Many of the texts have value beyond the words they contain. We may be talking about image scanning the pages to preserve the look of the type, paper, and images. Archival TIFFs, since that's what the LOC uses.
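
    Multiplying that out, as a rough sketch: it assumes one byte per character (plain uncompressed ASCII-style text, which is an assumption) and uses the character range and book count given above.

```python
BOOKS = 26_000_000              # book count used above
MIN_CHARS = 100_000             # shortest book, per the estimate above
MAX_CHARS = 5_000_000           # longest book, per the estimate above
BYTES_PER_CHAR = 1              # assumption: plain single-byte text, no compression


def text_terabytes(books: int, chars_per_book: int) -> float:
    """Plain-text size in terabytes (10^12 bytes) if every book had this length."""
    return books * chars_per_book * BYTES_PER_CHAR / 10**12


low = text_terabytes(BOOKS, MIN_CHARS)
high = text_terabytes(BOOKS, MAX_CHARS)
print(f"text alone: {low:.1f} TB to {high:.0f} TB")
```

    Even the low end of that range, for text with no images at all, is already above 1 TB.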

    The article also mentions $60 thousand to 'store' this data (per month?, per year?, just once???, what about access?, searching?, redundant backups?). Another unrealistic number, even working off of the 1TB estimate.

    --
    Death and danger are my various breads and various butters.
  43. Where do I send my dollar? by Corwyn+ap · · Score: 2, Interesting

    $260 million is $1 per US citizen. A bargain if ever there was one. I suspect that this estimate is extremely low.

    The hard part is, of course, proofreading. See distributed proofreading at http://www.pgdp.net/c/default.php

    Let's get started on the out-of-copyright stuff NOW. Maybe by the time it is online, people will see the benefit of making everything available.

    Thank You Kindly.

  44. Re:Er -- public domain by testadicazzo · · Score: 2, Insightful
    Well, there's a huge portion of it which is already in the public domain. So we could start with that.

    While we are at it, let's scale the copyright limits back to life of creator + 20 years (or even further back as far as I'm concerned), and bring back more of the booty which the corporations have plundered from us, the public.

  45. Re:Plain Text by CatMan79 · · Score: 2, Insightful

    But what about all the pretty pictures? I can think of a good many textbooks or art collections that would be rather worthless without the images. Including high resolution images in addition to plain text would take a TON of disk space--is this factored into the proposal?