Slashdot Mirror


Digital Future of the Library of Congress

lesinator writes "On Monday the 28th the US Library of Congress is holding the eighth lecture in its series on Managing Knowledge and Creativity in a Digital Context. Previous speakers include David Weinberger on blogging, Brewster Kahle - founding member of archive.org and the wayback machine, and Lawrence Lessig on intellectual property and the creative commons. After the lecture questions will be taken from the audience and the internet. C-Span will be broadcasting the lecture live at 6:30 PM EST, and also has archives of previous lectures. Audio archives of previous lecture are available at Audible.com in the Selected Free Media section."

141 comments

  1. At last! by Shadow+Wrought · · Score: 3, Funny

    We'll know just how much storage really is required to hold the Library of Congress.

    --
    If brevity is the soul of wit, then how does one explain Twitter?
    1. Re:At last! by cmburns69 · · Score: 4, Insightful

      While it's an interesting question, it really depends on how you want to store the contents of each book.

      Would you store each page of each book as an image? As flat ASCII text (except for pictures and diagrams, of course!)? What kind of indexing would you do? Basic indexing of book names? Full-text indexing of the contents? All that storage adds up!

      In summary, the library of congress (depending on the method used) could probably fit into something ranging from a couple of gigabytes to a couple of petabytes.

      --
      Online Starcraft RPG? At
      Dietary fiber is like asynchronous IO-- Non-blocking!
    2. Re:At last! by Shadow+Wrought · · Score: 4, Interesting
      Well I would think that they would have to start with an image first. Once they OCR'd it and generated ASCII text files, they could save a tremendous amount of space by simply deleting the images. However, after that much effort in imaging all those pages, I just can't see them doing that. The best bet is probably two databases, one of ASCII text and one of images.

      They might even be able to generate revenue by having the ascii text freely available and searchable, while the images would cost money. That way folks just interested in the text can find it easily, while scholars and others who need to see the source material can have access at a moderate price.

      --
      If brevity is the soul of wit, then how does one explain Twitter?
    3. Re:At last! by WillAdams · · Score: 2, Interesting

      There's a cue for a question I've been wondering about for a while.

      What was the first reference / usage of ``LoC'' as a unit of knowledge measurement?

      The first time I recall seeing it was in Michael Gear's novels, _The Artifact_ if memory serves, ~1976.

      Anyone have an earlier instance?

      William

      --
      Sphinx of black quartz, judge my vow.
    4. Re:At last! by Anonymous Coward · · Score: 0

      I'm thinking it would depend on the book. Most books you'd want to keep as just ASCII text, but others, where the picture content and page numbers are important (children's books, encyclopedias, etc.) should probably be kept as images.

    5. Re:At last! by lelitsch · · Score: 1

      Well, you could always google for this kind of information.

      10 Terabytes: Printed collection of the U. S. Library of Congress

    6. Re:At last! by NoMoreNicksLeft · · Score: 1

      Uncompressed text, maybe markup, and you're looking at about 20 terabytes I believe. Adding in the works with either illustrations or photographs, in some decent but lossy compressed format, and you're easily quadrupling that (just a guess).

      Indexing, by what, subject, author, and title? 1% overhead at most. Fancier googlesque searching though, could be a big hit.

      And correct me if I'm wrong, but there are quite a few videos too.

      Not to mention some historical stuff that can't even be digitized all that well (think texts and other documents that could only be stored as images, because Unicode Consortium hasn't worked out the encoding for that language yet).
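
A rough sketch of the arithmetic in the estimate above, in Python. The 20 TB, the "quadrupling", and the 1% figures are the poster's own guesses; applying the index overhead to the combined text-plus-images total is an assumption made here, since the post does not say what the 1% is measured against.

```python
# Back-of-envelope totals for the figures quoted in the post above.
TEXT_TB = 20.0          # "about 20 terabytes" of uncompressed text (poster's figure)
IMAGE_MULTIPLIER = 4    # "easily quadrupling that" (explicitly "just a guess")
INDEX_OVERHEAD = 0.01   # "1% overhead at most" for subject/author/title indexing


def estimate_total_tb(text_tb, multiplier, index_overhead):
    """Return (content_tb, index_tb, total_tb) for the given guesses."""
    content_tb = text_tb * multiplier       # text plus illustrated works
    index_tb = content_tb * index_overhead  # assumption: overhead applies to all content
    return content_tb, index_tb, content_tb + index_tb


content_tb, index_tb, total_tb = estimate_total_tb(
    TEXT_TB, IMAGE_MULTIPLIER, INDEX_OVERHEAD
)
print(f"content ~{content_tb:.1f} TB, index ~{index_tb:.1f} TB, total ~{total_tb:.1f} TB")
```

Video and image-only historical documents are left out, so this is a floor on the poster's estimate rather than a full total.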

    7. Re:At last! by punkass · · Score: 0, Flamebait

      Go back and read his question. He was asking when was the term first used, not what it meant. Ass.

      --
      "Nobody owns the fucking words man." - James Dean
    8. Re:At last! by Clay+Pigeon+-TPF-VS- · · Score: 0, Troll

      No, you're the ass. Just look at your UID.

      --
      Viral software licensing is not freedom, it is in fact GNU/Socialism.
    9. Re:At last! by Anonymous Coward · · Score: 0

      Go back and read his response! He said if he wants to know the answer he can fucking google for it. ASS.

    10. Re:At last! by DustMagnet · · Score: 1

      I just tried Google and it's not supported by Google math.

      --
      'SBEMAIL!' is better than a goat!!
    11. Re:At last! by FlopEJoe · · Score: 1

      I can't stress this enough... whatever they do, for the love of God, DON'T USE .LIT!

    12. Re:At last! by Anonymous Coward · · Score: 0

      That makes too much damn sense... who are you and why are you on /.?

    13. Re:At last! by caseydk · · Score: 2, Interesting


      I was working on this project just a few years back (2001-2002).

      Our estimates projected that by 2005, it would take about 4 TB of digitization EACH day to keep pace.

      The first storage phase called for a 180 TB server.
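
For scale, a minimal Python sketch of how long that first phase would last at the projected intake. It assumes decimal terabytes, a constant intake rate, an empty server at the start, and no compression or deduplication; none of those assumptions come from the post.

```python
# How many days of intake fit in the first storage phase?

def days_until_full(capacity_tb: float, intake_tb_per_day: float) -> float:
    """Return the number of days before the given capacity is exhausted."""
    if intake_tb_per_day <= 0:
        raise ValueError("intake_tb_per_day must be positive")
    return capacity_tb / intake_tb_per_day


FIRST_PHASE_TB = 180   # first storage phase, from the post
DAILY_INTAKE_TB = 4    # projected daily digitization volume, from the post

days = days_until_full(FIRST_PHASE_TB, DAILY_INTAKE_TB)
print(f"{days:.0f} days of intake")  # 45 days of intake
```

Under those assumptions the first phase covers about a month and a half of new material, which suggests it was sized as a starting point rather than a long-term store.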

    14. Re:At last! by aboyko · · Score: 2, Insightful

      A couple of gigabytes?! Only if you burn it first. There's something like 10^8 books, never mind the other stuff. How do you compress any given book into 100 bytes?

      The "20 TB" figure comes from the smallest possible measure, treating the flat books as ASCII text. Even just considering current digital content, it's also inaccurately small by >1 order of magnitude.

      It's a really really really big library.
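
The per-book budget behind that objection can be checked with a short Python sketch. The book count is the "something like 10^8" figure from the comment above; decimal units (1 GB = 10^9 bytes) are an assumption.

```python
# Bytes available per book for a given total archive size.
BOOK_COUNT = 10**8   # rough book count quoted in the comment above
GB = 10**9           # decimal gigabyte (assumption)
TB = 10**12          # decimal terabyte (assumption)


def bytes_per_book(total_bytes: int, book_count: int = BOOK_COUNT) -> float:
    """Average number of bytes each book would get in an archive of total_bytes."""
    return total_bytes / book_count


for label, size in [("2 GB", 2 * GB), ("10 GB", 10 * GB), ("20 TB", 20 * TB)]:
    print(f"{label}: {bytes_per_book(size):,.0f} bytes per book")
```

A 10 GB archive leaves about 100 bytes per book, which is the figure the comment ridicules, while 20 TB leaves about 200 KB per book, in the range of a plain-text novel.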

    15. Re:At last! by spectasaurus · · Score: 1

      I'm betting it takes about one Library of Congress to store the Library of Congress. Any takers?

    16. Re:At last! by cduffy · · Score: 1

      Mmm. I've seen document archival formats available (patented, I think) tailored for printed documents -- using one of these, it should be possible to get your typical page well below 100K, and stay in that general range even with drawn illustrations (though anything w/ color or photo-style images is no longer suitable). DjVu is a prime example of these, though others exist.

      So keeping the scanned images shouldn't really require such a tremendous amount of space.

  2. Here's an idea related to audio archiving by filmmaker · · Score: 4, Insightful

    Maybe the fine folks at Audible.com might consider making their audio clips available by means other than the Real or MS media players?

    1. Re:Here's an idea related to audio archiving by Anonymous Coward · · Score: 0

      And what would you propose?

    2. Re:Here's an idea related to audio archiving by boarder8925 · · Score: 1

      How about MP3 or Ogg?

    3. Re:Here's an idea related to audio archiving by Anonymous Coward · · Score: 0

      My mistake. I was thinking video for some reason...
      I concur with you and say ogg though; better compression and OS
      Is there an alternative for video though that has the same benefits of ogg?

    4. Re:Here's an idea related to audio archiving by zotz · · Score: 1

      I will second that request. If you are really trying to benefit the public, try a format that is Free please.

      all the best,

      drew

      I was indeed taught that "beggars can't be choosers," but I am not begging, just giving "a word to the wise."

      http://www.archive.org/audio/audio-details-db.php?collection=opensource_audio&collectionid=JohnConstantakisdrewRobertsRainwaterBlues

      --
      FreeMusicPush If you want to see more Free Music made, listen to Free
    5. Re:Here's an idea related to audio archiving by zotz · · Score: 1

      OGG - theora and vorbis perhaps. Yes, I agree.

      "I am your father's brother's nephew's cousin's former roommate."

      Dude, I bremembah you now. Why didn't ya say so sooner?

      all the best,

      drew

      http://www.archive.org/search.php?query=creator%3A%22drew%20Roberts%22

      --
      FreeMusicPush If you want to see more Free Music made, listen to Free
  3. Re:How many.... by tquinlan · · Score: 0, Troll

    Ya know, when you do that, you ruin the opportunity to actually make the jokes for the rest of us! ;)

    --
    DBA? Software Engineer? My company is hiring! Click
  4. Dammit! by dteichman2 · · Score: 2, Insightful

    What are they thinking! Airing this at 6:30 PM EST! CSpan has just ensured that nobody on the west coast will see this. Or, is that what they are aiming for?

    --


    Silence is golden... and duct tape is silver.
    1. Re:Dammit! by BroadwayBlue · · Score: 1

      No VCRs, Tivo, mythTV, or tv-tuner cards out there? Or maybe just wait for the torrent when you get home?

    2. Re:Dammit! by dteichman2 · · Score: 1

      Yes. Maybe the torrent. I've got the VCR, but no tape for it.

      --


      Silence is golden... and duct tape is silver.
    3. Re:Dammit! by lukewarmfusion · · Score: 5, Funny

      C-SPAN is clearly concerned with ratings. Didn't you see the stuff they pulled out for Sweeps week? I think it was something like "old guy reading boring text to empty room."

    4. Re:Dammit! by 0x461FAB0BD7D2 · · Score: 1, Funny

      Sounds like college to me.

    5. Re:Dammit! by Hachey · · Score: 1

      I live on the west coast (California) and I am going to watch it.


      -----
      Check out the Uncyclopedia.org :
      The only wiki source for politically incorrect non-information about things like Kitten Huffing and Pong! the Movie !

      --
      Please allow me to hate the creator of the 120-character limit: *HATES*. Thank you.
    6. Re:Dammit! by Anonymous Coward · · Score: 0

      Who on the West Coast cares about knowledge? They're watching Fear Factor.

    7. Re:Dammit! by gregorio · · Score: 1
      What are they thinking! Airing this at 6:30 PM EST! CSpan has just ensured that nobody on the west coast will see this. Or, is that what they are aiming for?
      From the submitter: C-Span will be broadcasting the lecture live at 6:30 PM EST, and also has archives of previous lectures.

      Well, if it's a LIVE broadcast, I'm pretty sure that C-Span will have to air it at whatever time the lecture will be happening. =]

      Chill out, they archive their broadcasts.
    8. Re:Dammit! by Scott7477 · · Score: 1

      I think parent was aiming for "Funny" and not "Insightful".

      --
      "Lack of technical competence coupled with the arrogance of power, as usual, leads to no good end."
  5. Nice, but how long? by Anonymous Coward · · Score: 2, Funny

    How long is it going to take to digitize the entire library?

    Anyone have a good approximation? I'd like to know in Burning Libraries of Congress (BLC) please.

    I'm guessing somewhere around 10-200 BLC.

    1. Re:Nice, but how long? by yuriismaster · · Score: 3, Interesting

      Well, I would imagine that unless they have a massive staff and many OCR scanners or automation with REALLY good OCR, this may take a LOONNNG time.

      I'm not quite sure about the length of a BLOC, but this is a job for not-quite-manual labor. Each book requires a simple task: Scan page 1, flip page, scan page 2, page 3, flip, ad infinitum.

      One way to save on time would be to contact the publishers of any book made after 1985-ish, where you can get electronic copies from the author. Some older books may have been already digitized, but it's still going to take more than 25 years unless there's a massive army working on this.

    2. Re:Nice, but how long? by Blue-Footed+Boobie · · Score: 4, Informative
      Nonsense. I put together solutions with high-speed scanners all the time. Some of our highest-end average 118ipm (Duplex) and have 1000pg ADFs.

      Also, you would generally split the load between 4-6 of these scanners for a job this big. The software is automated, and will OCR/Convert/Archive the file in one step.

      As a general rule, you can fit 10,000 b/w text pages in 1GB of storage.

      --
      DAMN YOU OCTODOG! DAMN YOU TO HELL!
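
A rough Python sketch of what those scanner figures imply for a collection-sized job. The 118 ipm rate, the 4-6 scanners, and the 10,000 pages per GB rule come from the post above. The book count (10^8, quoted elsewhere in this thread), the 300 pages per book, round-the-clock operation with no loading or downtime, and treating one scanned image as one page are all assumptions added for illustration.

```python
# Scan time and storage implied by the quoted scanner throughput.
SCANNER_IPM = 118                 # images per minute per scanner (from the post)
PAGES_PER_GB = 10_000             # b/w text pages per GB (from the post)
MINUTES_PER_YEAR = 60 * 24 * 365  # assumes scanners never stop

BOOKS = 10**8                     # assumption: figure quoted elsewhere in the thread
PAGES_PER_BOOK = 300              # hypothetical average, not from any post


def scan_years(total_pages: int, scanners: int, ipm: int = SCANNER_IPM) -> float:
    """Years of continuous scanning needed for total_pages, one image per page."""
    minutes = total_pages / (scanners * ipm)
    return minutes / MINUTES_PER_YEAR


def storage_gb(total_pages: int, pages_per_gb: int = PAGES_PER_GB) -> float:
    """Storage needed for total_pages of b/w text images, in GB."""
    return total_pages / pages_per_gb


total_pages = BOOKS * PAGES_PER_BOOK
for scanners in (4, 6):
    print(f"{scanners} scanners: {scan_years(total_pages, scanners):.0f} years")
print(f"storage: {storage_gb(total_pages) / 1e6:.1f} PB")
```

With these inputs, a handful of scanners would need decades even running nonstop, so a job on this scale would call for far more than 4-6 machines.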
    3. Re:Nice, but how long? by Anonymous Coward · · Score: 0

      Scanners that have a frame for holding the book and an automatic page turner have been around for a while. I know that IBM has (had?) them in their corporate duplication department or whatever you call it for quite a while.

    4. Re:Nice, but how long? by erick99 · · Score: 1

      That's great for loose sheets but what about scanning bound books? Aren't you then back to scanning a page, flip a page, scan a page, etc.?

      --
      http://www.busyweather.com/
    5. Re:Nice, but how long? by Blue-Footed+Boobie · · Score: 2, Informative
      Nope, Canon (and others) make Book Scanners that actually flip and scan each page automatically. They can handle all sizes too.

      They are very expensive, but cool as hell.

      --
      DAMN YOU OCTODOG! DAMN YOU TO HELL!
    6. Re:Nice, but how long? by melandy · · Score: 1

      I used to work in that industry too. Typically, bound material would be cut into loose sheets... you basically sacrifice a book to get the images electronically. Also, any decent high volume scanner can scan both sides of a sheet at once, so there's no flipping.

      As an unrelated aside, some even scan in color, but your storage requirements go way up if you do anything other than bitonal (even greyscale eats up the bytes pretty quick).

    7. Re:Nice, but how long? by AnFraX · · Score: 1

      That's great for loose sheets but what about scanning bound books? Aren't you then back to scanning a page, flip a page, scan a page, etc.?

      Cut the binding off?

    8. Re:Nice, but how long? by sribe · · Score: 1

      I'm not quite sure about the length of a BLOC, but this is a job for not-quite-manual labor. Each book requires a simple task: Scan page 1, flip page, scan page 2, page 3, flip, ad infinitum.

      Uhmm, no. You cut the binding off and run the pages through a document feeder, then rebind the book, using these things that some people refer to as "machines" ;-)

  6. Some ideas by gowen · · Score: 5, Insightful

    Here are some interesting talks they might give:

    i) What if the Apostles had had technological means to prevent the reproduction of the New Testament?

    ii) Would our culture be diminished if the people who rediscovered Beowulf had been unable to decrypt the manuscript?

    iii) Is the continual repetition and reworking of myth and fable through the Oral Tradition disrespectful of the content creators who first recorded these stories?

    --
    Athletic Scholarships to universities make as much sense as academic scholarships to sports teams.
    1. Re:Some ideas by Scrameustache · · Score: 3, Insightful
      i) What if the Apostles had had technological means to prevent the reproduction of the New Testament?

      Main Entry: apostle
      Pronunciation: ə-ˈpä-səl
      Function: noun
      Etymology: Middle English, from Old French & Old English; Old French apostle & Old English apostol, both from Late Latin apostolus, from Greek apostolos, from apostellein to send away, from apo- + stellein to send
      1 : one sent on a mission: as a : one of an authoritative New Testament group sent out to preach the gospel and made up especially of Christ's 12 original disciples and Paul b : the first prominent Christian missionary to a region or group

      They wouldn't have prevented the distribution of the story it was their mission to distribute, that's for sure.
      --

      You can't take the sky from me...

    2. Re:Some ideas by gowen · · Score: 1

      Ooops my bad. What's the collective noun for the dudes that wrote the Gospels?

      --
      Athletic Scholarships to universities make as much sense as academic scholarships to sports teams.
    3. Re:Some ideas by lanswitch · · Score: 0

      Kernighan & Ritchie?

    4. Re:Some ideas by maroonhat · · Score: 0

      that would be 'gospel writers'

      --
      The more I learn about Windows the more I am surprised it runs at all
    5. Re:Some ideas by Anonymous Coward · · Score: 0

      None of the Gospels were written until at least 300 years after the death of Christ and the last apostle. And that's being very generous historically, it's probably more like 500 years. Several of the apostles, by the way, probably never saw the Man from Nazareth with their own eyes, having been born after and come to the religion decades later.

    6. Re:Some ideas by Anonymous Coward · · Score: 4, Interesting

      It's been continually re-written. For example, until 1954 Jesus never actually said "I am the Son of God"; when Pontius Pilate accused him of claiming to be the Jewish Messiah, he cryptically responded "It is you who said it." The fact Jesus didn't claim to be the Son of God but was surrounded by intense believers was one of the essential "mysteries" of Christianity that you were supposed to accept as a Christian.

      In 1954, the American "New International" edition just edited the trial dialog and "re-interpreted" "it is you who said it" into "I am the Son of God." I don't think the European and Catholic churches have edited that part yet.

    7. Re:Some ideas by zotz · · Score: 1

      "iii) Is the continual repetition and reworking of myth and fable through the Oral Tradition disrespectful of the content creators who first recorded these stories?"

      iv) Why do people of oral traditions get no legal protections for their work? (From those outside their tradition who would fix it and lock them out from their own work?) Why must it be fixed?

      I know that is at least halfway to zany, but please try to give a halfway to reasonable answer.

      all the best,

      drew

      --
      FreeMusicPush If you want to see more Free Music made, listen to Free
    8. Re:Some ideas by Mazem · · Score: 1

      This, folks, is why I read Slashdot. Despite all the dupes, trolls, groupthink and pseudoscience, occasionally I read a gem of a post. That is one of the most scathing, concise attacks on DRM and IP ridiculousness that I have ever read. Parent poster, I salute you!

    9. Re:Some ideas by Anonymous Coward · · Score: 0
      I assume you don't mean the New International Version (NIV). The translation for this version started in 1968, and it was finished in 1978. The NIV does not have the text as you indicated, but if there is a different translation that has it this way, I would be interested (a reference would be nice).

      Also, Pilate doesn't ask Jesus if he's the Messiah. The chief priests asked him that (they were interested in his theology). Pilate asked him if he was the King of the Jews (he was interested in whether he was starting a revolution).

      I don't know what you mean by which churches are editing the Bible. Most denominations do not have an official translation. There are plenty to choose from. The good news is you can find one with a good balance of readability and faithfulness to the original texts. The bad news is anyone can publish a translation, even a bad one (it is perfectly feasible that someone would write a translation with the errors you noted). But you can always go back to the original languages or compare any of the scores of translations out there.

    10. Re:Some ideas by Anonymous Coward · · Score: 0

      How do you explain the many quotations of the gospels from authors writing in the second century AD?

    11. Re:Some ideas by Anonymous Coward · · Score: 0

      How do you explain the many quotations from throughout the centuries on Slashdot? Time travel? Ancient Korean monks?

    12. Re:Some ideas by doctorcisco · · Score: 1

      If you actually go read Mark 15:2 and John 18:37 in the New International Version, which was not first published until the late 1970's, you'll find out that the previous poster is misinformed.

  7. too many.... by charon_1 · · Score: 0

    Wow.. that's like the most links I have ever seen.. Which one do I click first..

    1. Re:too many.... by Anonymous Coward · · Score: 0

      Just close your eyes, move your mouse around the screen, and click. You've got to be more adventurous and spontaneous with life.

      Oh one more thing, just hope that you don't get goatse or something.

  8. That's the right idea .. carry it further by Anonymous Coward · · Score: 5, Insightful

    It is amusing that this story follows directly after a story about Microsoft proprietary file formats.

    The Library of Congress should insist that all 'publications' be submitted to it in open formats. What good is it if they have something on file that nobody can read! The extreme is that they have to have a licensed copy of every piece of software that ever created a file. If all the formats have to be open then at least historians can cobble together something that can read a file of interest.

    With the ip laws as stupid as they are now, we run the real risk of losing the record of our age.

    1. Re:That's the right idea .. carry it further by Anonymous Coward · · Score: 1, Insightful

      "...What good is it if they have something on file that nobody can read!..."

      I wouldn't say nobody. The paying members of a private club would be able to read it.

    2. Re:That's the right idea .. carry it further by zotz · · Score: 1

      Not if the only people with the rights to make readers stop doing so.

      The government would then have to get into some eminent-domain-type takings. Right?

      all the best,

      drew

      --
      FreeMusicPush If you want to see more Free Music made, listen to Free
    3. Re:That's the right idea .. carry it further by John+Seminal · · Score: 2, Insightful
      It is amusing that this story follows directly after a story about Microsoft proprietary file formats. The Library of Congress should insist that all 'publications' be submitted to it in open formats. What good is it if they have something on file that nobody can read!

      Why even have it on any digital media? I want the original records. Screw having computerized copies. This is the nation's library, where a copy of everything in its original form must be.

      I have no problem with the card catalogue system. Some things should not change. If someone wants to open the "Digital Library of Congress" then go for it. But leave the original as-is. I can only imagine someone wanting to digitize the Great Library in Alexandria back 2000 years ago that resulted in the great fire. HA! We screw ourselves again.

      --

      Rosco: "If brains were gunpowder, Enos couldn't blow his nose."

  9. God is on the side of DRM by Ohreally_factor · · Score: 1

    You'd better read this.

    --
    It's not offtopic, dumbass. It's orthogonal.
  10. Next series by E+IS+mC(Square) · · Score: 4, Funny

    "Managing Knowledge and Creativity with DRM"...

    Sponsored by Apple and Microsoft!

    1. Re:Next series by dteichman2 · · Score: 1

      No, not Apple. Apple has gone out of their way to ensure that you have somewhat generous rights with the music you purchase from their store. This is keeping the RIAA people happy.

      More like: Sponsored by the RIAA and Microsoft!

      If you aren't happy with the DRM on the iTMS songs, I suggest the HYMN project.

      --


      Silence is golden... and duct tape is silver.
  11. Re:CSPAN... O great by first.last · · Score: 0

    Not as quiet as if you yelled:
    Hey Everyone! I'M A /. JUNKY!

    --
    Wishing I was a millionaire since 1969.
  12. Lawrence Lessig by Anonymous Coward · · Score: 0

    The one by Lawrence Lessig is extremely good. I was very surprised.

    --JayR

  13. Hello, Project Gutenberg?!? by Infosquawk · · Score: 5, Interesting

    I can never understand why there isn't more acknowledgment of our debt to Project Gutenberg on these issues.

    Michael Hart was digitizing books before digitizing books was cool, as far back as 1971, and the Project's efforts have been hugely successful on very little money. Nevertheless, I rarely see any official or media acknowledgment of the Project's efforts. If anyone should be on that panel for their ability to give advice from practical experience and performance in this field, while on a shoestring budget, it would be Hart!

    --


    OoO

    Please do not publish outside of /.
    1. Re:Hello, Project Gutenberg?!? by Quiet_Desperation · · Score: 1

      It's just the way the media works. They are lazy, and point their sensor arrays at the noisiest targets. Look at the Terri Schiavo case. I've heard the televised opinion of eighty seven million doctors *EXCEPT* the ones that have actually examined her.

    2. Re:Hello, Project Gutenberg?!? by Anonymous Coward · · Score: 0

      > I've heard the televised opinion of eighty seven million doctors *EXCEPT* the ones that have actually examined her.

      That's because they're not talking. There's a doctor-patient relationship, with all the privacy and ethical concerns that entails.

    3. Re:Hello, Project Gutenberg?!? by Quiet_Desperation · · Score: 1

      I know. My point was more the pointlessness of hearing from anyone else.

  14. Outsource parts of LOC to Google or Amazon? by G4from128k · · Score: 4, Insightful

    With the current wave of outsourcing, privatization, and government use of commercial contractors, I wonder if Amazon or Google don't have a major role to play in the process of cataloging/archiving/serving digital content in the future.

    Although LOC could never be replaced by a Google or Amazon, these private companies could provide services that augment or reduce the cost of LOC-like services. For example, if Amazon scans a book, why should LOC scan it too?

    --
    Two wrongs don't make a right, but three lefts do.
    1. Re:Outsource parts of LOC to Google or Amazon? by HeedlessYouth · · Score: 2, Interesting

      You mean like this?

    2. Re:Outsource parts of LOC to Google or Amazon? by SirGarlon · · Score: 1

      Recent IP law allows copyright on aggregations of data even if the data itself is public domain. So if Google were to digitally archive a bunch of public domain books (copyright expired on each book) then the searchable database could still be copyrighted and owned by Google.

      In order to outsource the digitization of the collection to a private company, the LOC would have to license its own collection back from that company!

      --
      [Sir Garlon] is the marvellest knight that is now living, for he destroyeth many good knights, for he goeth invisible.
  15. Profit by lilrowdy18 · · Score: 0, Offtopic

    1.)Steal LOC Carmen Sandiego style 2.)???? 3.)Profit 4.)???? 5.)???? 6.)Jail time 7.)???? 8.)President of US

    1. Re:Profit by zotz · · Score: 1

      " 1.)Steal LOC Carmen Sandiego style 2.)???? 3.)Profit 4.)???? 5.)???? 6.)Jail time 7.)???? 8.)President of US"

      Dude, that is some business plan/method! Did you try to patent it yet?

      all the best,

      drew

      I would have given you +1 Funny

      --
      FreeMusicPush If you want to see more Free Music made, listen to Free
    2. Re:Profit by Anonymous Coward · · Score: 0

      LOL! Too bad the mods fucked you. I thought it was funny.

    3. Re:Profit by Anonymous Coward · · Score: 0

      2. Run a game show on PBS and sell merchandising rights

      4. Get caught
      5. Lose court case

      7. Run for president

  16. Re:it has to be said... by TheGavster · · Score: 1

    Assuming 10% overhead for indexing, 1.1LOC.

    --
    "Because Science" is one step from "Because old book". Try "Because of my experiment testing my falsifiable assertion".
  17. Publication of New Testament by dpilot · · Score: 3, Interesting

    Authorship of the New Testament is not a simple question at all. First off, the Apostles didn't sit down and start collecting the New Testament. That was done hundreds of years later by some chaps in Rome or Turkey who also had political axes to grind. Every few decades or centuries, there's also Yet Another Translation, and in the foreword they talk about the prayer, consideration, and attempts to divine the True Word of God that went into it. Common belief is that over the centuries there has been so much prayer, consideration, and attempts to divine the True Word of God that today's bibles MUST be correct. Yet in spite of all that, I have this feeling that precedent is even stronger in the Bible than in the US legal system, and that we're still carrying the weight of perhaps improper decisions made over a thousand years ago, plus trying to justify them.

    Then you also get to the issue of what is and isn't in the Bible. Consider "The suppressed Gospels and Epistles of the original New Testament of Jesus the Christ, Complete" http://www.gutenberg.org/etext/6516 for an example. Would the Apostles have wanted them published, or not? What about "The Forgotten Books of Eden"? Or less/more controversial, how about Maccabees, Sirach, Tobit, and company - the ones in the Catholic, but not the Protestant Bible? (Perhaps Maccabees is the most historically verifiable book IN the Bible, too.)

    By the way, most of the Bible ended up being written down much later - after even US copyrights would have expired. Good thing Steamboat Willie doesn't date back to BC.

    --
    The living have better things to do than to continue hating the dead.
  18. What about a backup copy? by voss · · Score: 3, Interesting

    It would seem if the LOC is going to have X number of Petabytes on computers...why not have a second copy stored AWAY from DC. If something were to happen to DC at least we would have backup copies of everything...and we probably should have a separate backup location at a third site.

    1. Re:What about a backup copy? by Anonymous Coward · · Score: 0

      It's already in the works. :)

    2. Re:What about a backup copy? by aboyko · · Score: 1

      Say! that's brilliant! I'm going to go down the hall and mention that to the LC IT guys!
      .
      .
      .
      Oh. Apparently it had occurred to them. Well, thanks, just the same. You think of anything else, please, drop us a line!

    3. Re:What about a backup copy? by karlrado · · Score: 1

      Brewster Kahle said on a podcast (IT Conversations) that they are working on an agreement with the library at Alexandria Egypt to back up each other's archives. Sounds like a good deal, since Alexandria doesn't have most of the LOC content and the LOC has little of what Alexandria is archiving.

  19. No money is precisely Why by Baldur_of_Asgard · · Score: 1

    The fact that Project Gutenberg has not consumed huge amounts of money to produce a great amount of value is PRECISELY WHY it does not get more recognition.

    The business of charity does not want competition from groups that create better products for less money, as that would put pressure on them to create a reasonable amount of value themselves, without the benefits of cushy offices and hefty salaries.

    The business of education also does not want competition from organizations that produce greater value at lower cost, without the benefits of cushy offices and hefty salaries (for the administrators - not the profs).

    Plus, Project Gutenberg has focused more on PRODUCING VALUE than on getting publicity and recognition for doing so. And the folks in the media tend to be too lazy and stupid to recognize what is REALLY worth reporting.

    Baldur_of_Asgard

  20. DRM and archiving are so diametrically opposed... by PornMaster · · Score: 3, Insightful

    DRM and archiving are quite conflicting. But then again, how do you make available information on which you want to retain technical methods of copyright protection?

    I think the obvious solution is to archive it in a non-DRM, non-proprietary format, but transcode to a DRM/proprietary format when retrieved, if the content is not in the public domain.
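A minimal sketch of the archive-open, protect-on-retrieval idea above, assuming a hypothetical `ArchivedWork` record and a placeholder `apply_protection` step. No real DRM scheme is implemented here; the placeholder only tags the bytes so the control flow is visible.

```python
from dataclasses import dataclass


@dataclass
class ArchivedWork:
    """Hypothetical archive record: the master copy is kept in an open format."""
    title: str
    public_domain: bool
    master: bytes  # unencumbered, non-proprietary master copy


def apply_protection(data: bytes) -> bytes:
    """Placeholder for a DRM/proprietary transcoding step (assumption, not a real scheme)."""
    return b"PROTECTED:" + data


def retrieve(work: ArchivedWork) -> bytes:
    """Return the open master for public-domain works, a protected copy otherwise."""
    if work.public_domain:
        return work.master
    return apply_protection(work.master)


old_book = ArchivedWork("Moby Dick", public_domain=True, master=b"Call me Ishmael.")
new_book = ArchivedWork("Recent Novel", public_domain=False, master=b"Chapter 1")
```

The point of the design is that the stored master never changes: when a work enters the public domain, only the `public_domain` flag has to flip.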

  21. Re:it has to be said... by Clay+Pigeon+-TPF-VS- · · Score: 2, Funny

    But how are we going to measure asteroids and meteors now that the larger imperial unit (Libraries of Congress) is going to get smaller? Will we have to fall back to the smaller unit (VW Beetles) for all of them now?

    --
    Viral software licensing is not freedom, it is in fact GNU/Socialism.
  22. Re:How many.... by punkass · · Score: 0, Troll

    That would be the point... I've been a user here since 1999, and while quips and jokes are amusing, memes like "hot grits" and such are really just noise.

    --
    "Nobody owns the fucking words man." - James Dean
  23. Small representations. by Grendel+Drago · · Score: 2, Interesting

    Have you ever seen someone's hundred and fifty page thesis, diagrams and all, fit onto a 3.5" floppy? People who wrote their theses in TeX or LaTeX, with a few postscript diagrams. I was impressed by how tiny the code for a real, well-produced book could be.

    'Course, the problem is that these representations work if you're entering in the content with that method in the first place.

    --grendel drago

    --
    Laws do not persuade just because they threaten. --Seneca
    1. Re:Small representations. by Eternally+optimistic · · Score: 0, Flamebait

      Yes, but you are missing the $100 bill I left in the printed copy in the library :)

      --
      What keeps me going is my inertia.
  24. Out of curiosity by Anonymous Coward · · Score: 0

    How do tv shows stand in creative commons?

    If a studio or outlet released over the internet a standalone encrypted video file/included player (with commercials), 720x(whatever) digital,...maybe a monthly service charge???TCPA enabled MB's+CPU's/tiers---etc. An odd combination of distributor servers and P2P could seriously offload bandwidth costs, etc..

    Problems exist where...

    Canada (for example) heavily censors through Canadian content quotas and "cultural concerns". Having control over foreign content is a demonstrated concern.

  25. Conspiracy much? by Grendel+Drago · · Score: 1

    Man, you're appealing to malice a lot more than laziness and stupidity, when the latter is a much, much more likely culprit.

    --grendel drago

    --
    Laws do not persuade just because they threaten. --Seneca
  26. The problem I see with Project Gutenberg... by Anonymous Coward · · Score: 0

    The problem is the books are not digitally preserved, but only the discernible content. Book covers of long ago, the crusty pages, the discoloration; all of it is not expressed in the transcribed document. Some secrets are found in the picture elements that can't be expressed by ASCII characters. Books, more so the oldest, should be preserved as would art. I have some 1800s books on law and there are errata all around the book that were excluded when some jackass thought to transcribe the body content and sell the public domain work on eBay under his/her claim of copyright. Did you hear that? People partially copy a "public domain" book, claim their copyright on the derivative, and sell the copyrighted material on eBay as their content. Of course, the first claim made is that the cost is for compensation of the labor to transcribe the material, yet that never prevents a tyrant from claiming the content as their property. According to UNITED STATES, any work not expressing "copyright" and a legible name is void and is to be reclaimed. The problem with the uncatalogued military Crypto-clearance aspiring books held in the Library of Congress is they all exhibit a copyright system much more realistic than what is currently held by UNITED STATES CONGRESS and UNITED STATES; books during the time of Thomas Jefferson and Benjamin Franklin for instance always had a "copyright" on them; someone's name at a printing press. All that UNITED STATES Copyright Act "law", and the prejudice to reclaim works not recognized by that Copyright Act, are repugnant to every script and spirit complementing the Constitution up until the year 1871 to today.

    1. Re:The problem I see with Project Gutenberg... by Baldur_of_Asgard · · Score: 2, Informative

      (1) Under the old US law, content had to be marked "Copyright" to be copyrighted. Under the present US law, all work is automatically copyrighted the moment it is created, UNLESS the author specifies otherwise. I think this holds true for works since, was it 1987? I forget exactly - but it's been a little while now.

      (2) A person who transcribes a book that is in the public domain can CLAIM a copyright on it, but this is not enforceable unless they have changed the text significantly enough for it to be a new work - in which case you probably don't want it anyhow, except possibly as a work of satire or fiction.

      Baldur of Asgard

  27. Re:Riiiight.. by MynockGuano · · Score: 1

    Excuse me, sir. I think you are lost. I believe the thread you are looking for is here.

    Have a nice day, and good karma to you!

  28. Re:How many.... by Anonymous Coward · · Score: 0

    I've been a user here since 1999

    Let me guess; you're posting from Korea?

  29. This just in... by SmokeHalo · · Score: 5, Funny

    The LOC has announced that they are accepting volunteers to digitize texts. Their first volunteer is Earl the night janitor, who has been busily keying in the last 20 years of New York City phone books. He hopes to move on to Chicago soon.

    --
    I'm not good in groups. It's difficult to work in a group when you're omnipotent. - Q
    1. Re:This just in... by superpulpsicle · · Score: 2, Funny

      Don't worry Earl will soon have the assistance of hundreds of non-English speaking Iraqi prisoners to help him.

  30. Re:Conspiracy not so much? by Baldur_of_Asgard · · Score: 1

    Let's just say that I have seen so many examples that I can only conclude that:

    (1) People in many "charitable" organizations and "educational" establishments are quite corrupt; or

    (2) People in many "charitable" organizations and "educational" establishments are amazingly, astoundingly stupid.

    Neither bodes well, but only corruption seems to explain all the facts, especially in the case of the "education" establishment.

    Baldur of Asgard

  31. Merger with the CIA? by DJCF · · Score: 1

    I wonder how long before they merge with the CIA and become the Central Intelligence Corporation...

    (It's a joke.)

  32. Are they requiring publishers to submit PDF files? by melted · · Score: 4, Interesting

    Are they requiring publishers to submit PDF files for new entries yet? Or files in another open format? Man, I'd hate to see taxpayers' money wasted on doing work that they could avoid doing by simply mandating PDF submissions from publishers.

    I can see that some publishers may just say, "oh, my book isn't gonna be in libraries if I don't submit PDF, so much the better, I'll sell more copies". I hope these fellas realize how badly they're shooting themselves in the foot.

  33. Re:Conspiracy not so much? by dpilot · · Score: 1

    Maybe:
    (3) People in many charitable organizations are out DOING charity, not talking about it. Kind of like Project Gutenberg.

    I suspect it's the (3)s that make charity work, and make people want to keep it alive, but it's the (1)s that make the most noise and draw the most money.

    IMHO there's an unfortunately large class of people who specialize in smelling the flow of money, and inserting themselves into that flow. The world would be for the most part better off without them.

    --
    The living have better things to do than to continue hating the dead.
  34. At current typical data density, by Digital+Pizza · · Score: 0, Redundant

    how many Volkswagens are required to hold each Library of Congress and how fast must they drive to achieve decent bandwidth?

    --
    We apologize for the inconvenience.
  35. [piracy & religion] are so diametrically oppos by Anonymous Coward · · Score: 0

    Hehe. I'm sitting here thinking about the place that piracy (which isn't a new idea, nor confined to a particular medium) has brought us to. And I'm thinking that one thing those who put out religious material don't have to worry about, is that. What atheist in their right mind would swipe, and widely distribute THAT?

  36. DRM? by 192939495969798999 · · Score: 1

    Isn't the Library of Congress' digital collection, especially with respect to music, going to totally screw iTunes and any other online DRM stuff, in order to bring us our library materials?

    --
    stuff |
  37. I'd say... by Liquid+Len · · Score: 1

    ... that as a universal unit of measurement, it's gonna be around for a while.

  38. Publication of [The New Balancing Act] by Anonymous Coward · · Score: 0

    "By the way, most of the Bible ended up being written down much later - after even US copyrights would have expired. Good thing Steamboat Willie doesn't date back to BC."

    Good thing most ranters forget that Disney wasn't the only one supporting the Sonny Bono Act. Can't be looking balanced, can we?

    1. Re:Publication of [The New Balancing Act] by Anonymous Coward · · Score: 0

      ... forget that Disney wasn't the only one supporting the Sonny Bono Act

      Yeah, but they paid for most of the votes.

    2. Re:Publication of [The New Balancing Act] by dpilot · · Score: 1

      Name a few more, and I'll add them to future rants.

      --
      The living have better things to do than to continue hating the dead.
  39. no compression by pablo_max · · Score: 0

    Personally, I think they should not be allowed to compress anything from LOC. This way when they say, it can hold the LOC 20000000 times over it will mean something. Though to be honest, you can never really use it as a measurement of something since the LOC is always growing. Any time those monkeys on the hill rape us, it's in there.

  40. Or even the sources. by pavon · · Score: 1

    Since nearly all typesetting is done electronically these days, I wonder if they shouldn't just have publishers send them the raw typesetting documents in addition to a hardcopy. It wouldn't be much work for the LOC to write (or buy) software to convert all the common typesetting formats into whatever standard format(s) they would like to use internally, and for dispersion to the public.

    It would certainly be smarter than scanning them in themselves, or demanding extra work on the publisher's part to convert to a format like PDF that might not be preferable 100 years from now. Heck for all I know they may very well be doing what I said - I know nothing about how the LOC works :)

  41. Re:it has to be said... by Anonymous Coward · · Score: 0

    I thought that LoC was a measure of information equivalent to 20TB (that should be a Google conversion). Asteroids require a unit of volume and, so, are always measured in Beetles and Empire State Buildings.

    Note that the traditional unit (the 20TB figure above) is only all the text in the LoC in ASCII. It would be MUCH less than it would take to digitize the entire LoC including pictures and film.
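As a rough worked example of what the 20TB figure means in practice, here is a back-of-the-envelope sketch. It assumes decimal terabytes (10^12 bytes), and the link speed, distance, and driving speed are arbitrary illustrative values, not anything from the thread.

```python
BITS_PER_TB = 8 * 10**12  # assumption: decimal terabytes, 8 bits per byte


def transfer_days(size_tb: float, link_mbps: float) -> float:
    """Days needed to send size_tb terabytes over a link of link_mbps megabits/s."""
    seconds = size_tb * BITS_PER_TB / (link_mbps * 10**6)
    return seconds / 86400


def car_bandwidth_gbps(size_tb: float, distance_km: float, speed_kmh: float) -> float:
    """Effective bandwidth, in gigabits/s, of driving size_tb terabytes of media distance_km."""
    trip_seconds = distance_km / speed_kmh * 3600
    return size_tb * BITS_PER_TB / trip_seconds / 10**9


one_loc_tb = 20  # the traditional "1 LoC" text-only figure quoted above
days_over_100mbps = transfer_days(one_loc_tb, 100)
beetle_gbps = car_bandwidth_gbps(one_loc_tb, distance_km=400, speed_kmh=100)
```

Under these assumptions one LoC takes about 18.5 days over a 100 Mbit/s link, while a single car carrying the same data 400 km at 100 km/h works out to roughly 11 Gbit/s, ignoring the time to write and read the media.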

  42. Re:it has to be said... by mog007 · · Score: 1

    Just assume they'll store all that data on the old type punch cards, or those big drums from before I was a fetus. Then, the LoC can retain its position as the largest unit of mass, data, AND volume.

  43. Yes, and yet...no. by oneiros27 · · Score: 2, Insightful
    You're making a large number of assumptions in your first paragraph:
    1. The OCR is always correct.
    2. The documents could be represented in ASCII
    3. The text is the only part of the document with any value
    Of course, your second paragraph shows that clearly those assumptions can't be true -- why would someone pay more for something without an additional benefit?

    And you wouldn't maintain separate databases -- pictures aren't searchable. You'd want to use any OCR'd text (preferably vetted afterwards) as the basis for indexing the images, so that you could help people find more images that might be of interest to them (which you mentioned in the second paragraph). However, I'm not sure what requirements the LOC operates under, or even if they're allowed to do cost recovery or otherwise charge fees.
    --
    Build it, and they will come^Hplain.
    1. Re:Yes, and yet...no. by Shadow+Wrought · · Score: 1
      I did make quite a few assumptions, but it is, after all, a thought process. The actual image would have greater value for some people than for others. If you want to read Moby Dick, you can do that in just a text format, you don't have to view the actual images (which are significantly larger files). If, on the other hand, you are doing your doctoral thesis on Moby Dick then you will likely want to view the actual images. It is also more likely that you, since you require more than just the text, would spend money on being able to see the image. While you could certainly view the original, in person, the cost of airfare to DC would be more expensive than simply downloading the images.

      As for the separate databases, I misspoke (miswrote?). You would have the images indexed by a few fields and, if it's a robust enough program (like Concordance) then you could include the OCR as well. Then you could actually link the database to the images. I just don't know how well it's going to scale when you're talking tera- or petabytes worth of material. The distinction would be that the ascii text and the images would likely be stored separately for ease of retrieval.

      I've done this on litigation databases reaching into the millions of images, but even that is only a fraction of what the LOC has to offer.
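A toy sketch of the arrangement described above, where the OCR text is indexed and each hit links back to a separately stored page image. Everything here is an assumption for illustration: the sample pages, the `images/...` path layout, and the function names are hypothetical, and a real system at this scale would use a proper search engine rather than an in-memory dictionary.

```python
import re
from collections import defaultdict

WORD = re.compile(r"[a-z0-9']+")


def build_index(pages):
    """Map each lowercase word to the set of (book_id, page_no) keys whose OCR text has it."""
    index = defaultdict(set)
    for key, ocr_text in pages.items():
        for word in WORD.findall(ocr_text.lower()):
            index[word].add(key)
    return index


def search(index, query):
    """Return the page keys whose OCR text contains every word of the query."""
    words = WORD.findall(query.lower())
    if not words:
        return set()
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return hits


def image_path(key):
    """Hypothetical layout: images live apart from the text, addressed by the same key."""
    book_id, page_no = key
    return f"images/{book_id}/page_{page_no:04d}.png"


pages = {
    ("moby_dick", 1): "Call me Ishmael. Some years ago...",
    ("moby_dick", 2): "the watery part of the world",
    ("walden", 1): "I went to the woods because I wished to live deliberately",
}
index = build_index(pages)
```

A search such as `search(index, "call me ishmael")` returns page keys, and `image_path` turns each key into the location of the scanned page, so the text and images can be stored separately while still being linked.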

      --
      If brevity is the soul of wit, then how does one explain Twitter?
  44. it might be cool if... by montygreen · · Score: 1

    sometime before the "finale" Enterprise gets destroyed and they rebuilt it as Enterprise B heh

    --
    long time lurker, rare poster
  45. Why the Apocrypha Isn't in the Bible. by Anonymous Coward · · Score: 0

    Why the Apocrypha Isn't in the Bible.

    Catholics will tell you, "You Protestants are missing part of the Bible. We have the rest of it." This can throw people off, but it no longer has to. These false Catholic additions to the Bible are commonly called the Apocrypha or sometimes the Deuterocanonical books. This is a short treatise on WHY these books are not in the Bible.

    What is the Apocrypha anyway?

    The Apocrypha is a collection of uninspired, spurious books written by various individuals. The Catholic religion considers these books as scripture just like a Bible-believer believes that our 66 books are the word of God, i.e., Genesis to Revelation. We are going to examine some verses from the Apocrypha later in our discussion.

    At the Council of Trent (1546) the Roman Catholic religion pronounced the following apocryphal books sacred. They asserted that the apocryphal books together with unwritten tradition are of God and are to be received and venerated as the Word of God. So now you have the Bible, the Apocrypha and Catholic Tradition as co-equal sources of truth for the Catholic. In reality, the Bible is the last source of truth for Catholics. Catholic doctrine comes primarily from tradition stuck together with a few Bible names. In my reading of Catholic materials, I find notes like this: "You have to keep the Bible in perspective." Catholics do not believe that the Bible is God's complete revelation for man.

    The Roman Catholic Apocrypha

    Tobit
    Judith
    Wisdom
    Ecclesiasticus
    Baruch
    First and Second Maccabees
    Additions to Esther and Daniel

    Apocryphal Books rejected by the Catholic Religion:

    First and Second Esdras
    Prayer of Manasses
    Susanna*

    *A reader says: "Susanna is in the Roman Catholic canon. It is Daniel 13."

    Why the Apocrypha Isn't in the Bible.

    1. Not one of the apocryphal books is written in the Hebrew language, which was alone used by the inspired historians and poets of the Old Testament. All Apocryphal books are in Greek, except one which is extant only in Latin.
    2. None of the apocryphal writers laid claim to inspiration.
    3. The apocryphal books were never acknowledged as sacred scriptures by the Jews, custodians of the Hebrew scriptures (the apocrypha was written prior to the New Testament). In fact, the Jewish people rejected and destroyed the apocrypha after the overthrow of Jerusalem in 70 A.D.
    4. The apocryphal books were not permitted among the sacred books during the first four centuries of the real Christian church (I'm certainly not talking about the Catholic religion which is not Christian).
    5. The Apocrypha contains fabulous statements which not only contradict the "canonical" scriptures but themselves. For example, in the two Books of Maccabees, Antiochus Epiphanes is made to die three different deaths in three different places.
    6. The Apocrypha includes doctrines in variance with the Bible, such as prayers for the dead and sinless perfection. The following verses are taken from the Apocrypha translation by Ronald Knox dated 1954:

    Basis for the doctrine of purgatory:

    2 Maccabees 12:43-45, 2.000 pieces of silver were sent to Jerusalem for a sin-offering...Whereupon he made reconciliation for the dead, that they might be delivered from sin.

    Salvation by works:

    Ecclesiasticus 3:30, Water will quench a flaming fire, and alms maketh atonement for sin.

    Tobit 12:8-9, 17, It is better to give alms than to lay up gold; for alms doth deliver from death, and shall purge away all sin.

    Magic:

    Tobit 6:5-8, If the Devil, or an evil spirit troubles anyone, they can be driven away by making a smoke of the heart, liver, and gall of a fish...and the Devil will smell it, and flee away, and never come again anymore.

    Mary was born sinless (immaculate conception):

  46. Who modded up that troll? 'ere is a rebuttal. by Anonymous Coward · · Score: 0

    The word apocrypha is a Greek word meaning "hidden." This identifies that the origin of the Apocryphal books is unknown, or doubtful. The Greek word pseudepigrapha means "false writing." This identifies that certain books of the apocrypha were considered to be false writings in the first Century.

    The Old Testament Apocrypha include from 14 to 19 books, depending on the method of counting, which were written in the period of 200 B.C. to 100 A.D. The number of books, the verse numbering and the actual verses themselves vary greatly depending on who prints the Apocrypha. Catholic versions of the Bible include 12 of these, but do not consider 1st & 2nd Esdras and the Prayer of Manasseh to be canonical, which is interesting. The Church of England accepts the Apocrypha as "semi-canonical."

    Apocrypha and Pseudepigrapha writings from Bible times:

    Old Testament Apocrypha

    1.First Esdras,
    2.Fourth Ezra,
    3.Tobit,
    4.Judith,
    5.Additions to Esther,
    6.The Wisdom of Solomon,
    7.Sirach
    8.Baruch,
    9.Letter of Jeremiah,
    10.Prayer of Azariah,
    11.Daniel and Susanna,
    12.Bel and the Dragon,
    13.The Prayer of Manasseh,
    14.First Maccabees,
    15.Second Maccabees,
    16.Third Maccabees,
    17.Fourth Maccabees,
    18.Psalm 151

    Old Testament Pseudepigrapha

    1.The Book of Jubilees,
    2.The Books of Adam and Eve,
    3.Life of Adam and Eve-Slavonic Version
    4.A Fragment of the Apocalypse of Moses
    5.The Martyrdom of Isaiah
    6.First Enoch
    7.The Letter of Aristeas
    8.The Apocalypse of Adam
    9.The Revelation of Esdras
    10.The Second Treatise of the Great Seth
    11.The Testament of Abraham

    Old Testament Apocrypha and Pseudepigrapha.

    The Book of Tobit.
    The Vita of Adam and Eve.
    The Wisdom of Solomon
    Baruch.
    Testament of Abraham.
    Testament of the twelve Patriarchs.
    The Book of Abraham (discovered by Joseph Smith).
    The Book of Judith.
    The Revelation of Esdras.
    1.Esdras
    2.Esdras
    First Enoch.
    The Slavonic version of the Life of Adam and Eve
    The Martyrdom of Isaiah
    The Wisdom of Jesus, son of Sirach (Ecclesiasticus)
    The Letter of Aristeas
    Azariah
    The Letter of Jeremiah
    The History of Susannah
    Bel and the Dragon
    The Prayer of Manasseh
    1st Maccabees
    2nd Maccabees
    3rd Maccabees
    4th Maccabees
    Fragment- The Apocalypse of Moses.

    Some of these books, such as 1st Maccabees and Ecclesiasticus, are truly interesting, but that does not mean that they are inspired. There are many valid reasons why the Apocrypha and the Pseudepigrapha cannot be accepted as Scripture.

    1. These books were never included in the Hebrew canon of the Old Testament. The Jews never considered them part of their sacred canon. Josephus expressly limited the Hebrew canon to the same material contained in the 39 books we know as the Old Testament. Josephus knew of other Jewish writings down to his time, but he did not regard them as having equal authority with the canonical books.

    2. These books, as far as the evidence contained in the New Testament, were never accepted as canonical by Jesus and His apostles. When Jesus made reference to the Scriptures he said:

    Luke 24:27
    "Then beginning with Moses and with all the prophets, He explained to them the things concerning Himself in all the Scriptures."

    Luke 24:44
    "These are My words which I spoke to you while I was still with you, that all things which are written about Me in the Law of Moses and the Prophets and the Psalms must be fulfilled."

    We know what was contained in the Law of Jesus' day; we know what was contained in the Prophets and in the Psalms. We have the canon of the J

  47. What's wrong with people? PNG or JPG each page! by Anonymous Coward · · Score: 0

    Just capture a Portable Network Graphics or JPEG image of all angles of the book and every page and detail. Skip the PDF format because it is copyright for intrusive purposes. All the old books are public domain, so we need to use an image format that can't encroach on public domain content; this is why it should use something other than PDF, such as the PNG format.

  48. Open formats, a definition by Anonymous Coward · · Score: 0
    What is an open format anyway?
    • Completely documented in sources that are freely and publicly available. RFCs and the W3C web site are good examples.
    • Unencumbered by patents or other restrictions on independent implementation.
    • Readable and writable by multiple tools on multiple operating systems.
    Can anyone think of things that should be added to this list, or any clarifications?
  49. Re:it has to be said... by Anonymous Coward · · Score: 0

    Then, the LoC can retain its position as the largest unit of mass, data, AND volume.

    Nope, I believe that honor is reserved for none other than your mom.

  50. Re:it has to be said... by kaens · · Score: 1

    No such thing as Left Over Crack

  51. That's the beauty of PDF by melted · · Score: 1

    It's an open format. Adobe does not control it, just like they don't control TIFF or PostScript (despite having invented both).

  52. Re:How many.... by punkass · · Score: 1

    Troll? That's cool. I could comment on how the moderation system has completely derailed since then too, but I wouldn't want to upset anybody :).

    --
    "Nobody owns the fucking words man." - James Dean