Slashdot Mirror


Linux and OSS to Aid the Library of Congress

flakeman2 writes with a link to a Linux.com article about Linux's new role at the Library of Congress. The national archive of books is looking to begin an ambitious digitization project, aimed at getting some rare and crumbling documents into the public record online. These will include "Civil War and genealogical documents, technical and artistic works concerning photography, scores of books, and the 850 titles written, printed, edited, or published by Benjamin Franklin. According to Brewster Kahle of the Internet Archive, which developed the digitizing technology, open source software will play an 'absolutely critical' role in getting the job done. The main component is Scribe, a combination of hardware and free software. 'Scribe is a book-scanning system that takes high-quality images of books and then does a set of manipulations, gets them in optical character recognition and compressed, so you can get beautiful, printable versions of the book that are also searchable,' says Kahle." Linux.com and Slashdot.org are both owned by OSTG.

6 of 63 comments

  1. Help the Library of Congress save American History by JanusFury · · Score: 5, Funny

    For the past few months Microsoft has been dispatching crack teams of special operatives into the past to alter the course of American History for their benefit, in hopes of eventually transforming the United States into the New Microsoft Empire. But little do they know that a world-weary Librarian and Ex-Marine at the Library of Congress won't stand for that shit. He's put together a team of agents in hopes of reversing the damage to the timestream before it becomes irreversible. Together with Agents Linus Torvalds (Technology Specialist - Special power: x-ray glasses), Donald Trump (Logistics Specialist - Special power: nuclear fusion comb-over) and Stephen Hawking (Quantum Physics Specialist - Special power: Medusa glare), he just may be the only hope for American History's future.

    --
    using namespace slashdot;
    troll::post();
  2. Re:Hmm... by rednuhter · · Score: 4, Informative

    RaTFA (note the lowercase "a" for "all")
    "the Internet Archive has migrated Scribe entirely to Linux, and Windows support has been dropped."
    Seems focused on Linux to me.

    --
    ERR 411[Max number of witty sigs reached]
  3. The most important part is not free software by Ed+Avis · · Score: 4, Interesting

    As the article says, the OCR itself is still done with proprietary software. I wonder if Google is using Tesseract for their digitization efforts. It would be cool if the original raw scanned images could also be archived and available for download; then you could print your own copy of the book, check the OCR for errors, or even do some weird genetic algorithm thing to make a LaTeX style that typesets the text in the same format as the original book.

    --
    -- Ed Avis ed@membled.com
  4. The sad part of digitization. by Lethyos · · Score: 4, Interesting

    Eventually we will have no physical record of these writings and may someday learn from the digital copies that Benjamin Franklin, George Washington, and others had offered enthusiastic support for wiretapping and other forms of electronic surveillance.

    --
    Why bother.
  5. Re:All copyrighted works should be held by jimicus · · Score: 4, Insightful

    At the same time, unless congress wants to hold and distribute material of questionable moral quality,

    Stop right there.

    When the purpose of your organisation is, to put it in very simple terms, "catalogue everything", you can't start making exceptions on moral grounds, for the simple reason that what constitutes "questionable moral quality" today may be totally different tomorrow. Furthermore, who gets to define "questionable moral quality"? The closest anyone's ever come to creating such a definition is to say "Well, I can't actually come up with a concrete definition but I knows it when I sees it".