Slashdot Mirror


Digitizing Your Dead Trees?

smart2000 asks: "I'm tired of lugging around dead trees. I've just moved offices and had to move over 100 pounds of 'essential' technical books. It is clear to me that the dead tree industry is never going to supply the books I want in electronic form, so it's time to do it myself. What hardware and software should I use?"

"The Plan: Take the binding of each book and cut it off. Feed the pages into a scanner with duplex and a cut-sheet feeder. Scan as a 300 DPI JPEG with compression. Then OCR them overnight. I don't expect the OCR to be perfect, just good enough to use as a searchable index.

What are the suitable scanner choices for Linux? Any recommendations for OCR software that will write in an open format? Has anyone done this before?"
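The questioner only wants the OCR to be "good enough to use as a searchable index." A minimal sketch of that last step might look like the following. It assumes the overnight OCR run leaves one plain-text string per scanned page; the page data, book names, and the functions build_index and search are all hypothetical. It builds a small inverted index in Python:

```python
import re
from collections import defaultdict

# Only index lowercase alphanumeric runs of 3+ characters; the cutoff is an
# arbitrary choice meant to skip single-letter OCR debris.
TOKEN = re.compile(r"[a-z0-9]{3,}")


def build_index(pages):
    """Map each token to a sorted list of (book, page) locations.

    `pages` is a hypothetical dict of {(book, page_number): ocr_text}.
    """
    index = defaultdict(set)
    for location, text in pages.items():
        for token in TOKEN.findall(text.lower()):
            index[token].add(location)
    return {token: sorted(locations) for token, locations in index.items()}


def search(index, query):
    """Return the (book, page) locations containing every query token (AND search)."""
    terms = TOKEN.findall(query.lower())
    if not terms:
        return []
    hits = set(index.get(terms[0], []))
    for term in terms[1:]:
        hits &= set(index.get(term, []))
    return sorted(hits)


if __name__ == "__main__":
    # Made-up OCR output, standing in for the real overnight run.
    pages = {
        ("Book A", 1): "The TCP three-way handshake opens a connection.",
        ("Book A", 2): "Closing a connection uses FIN segments.",
        ("Book B", 7): "A handshake in TLS negotiates ciphers.",
    }
    index = build_index(pages)
    print(search(index, "handshake"))
    print(search(index, "TCP handshake"))
```

Searching by individual tokens tolerates some OCR noise. A query still finds a page as long as each query word was recognized correctly somewhere on that page, even if the rest of the text is garbled.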

4 of 347 comments

  1. Re:Go To Kinko's!!!! by Microsift · Score: 4, Interesting

    I seriously doubt Kinko's would do this. They are ultra-paranoid about violating copyright. I imagine if you could do it at Kinko's, you'd have to do all the work yourself in the Self-Service area, and I doubt they have machines like that in self-service.

    --
    My other sig is extremely clever...
  2. Try one of these... by matthew.thompson · Score: 3, Interesting
    Canon DR-5020

    Canon's 90 ppm high-speed scanner. The only problem with high-speed scanning is that the scanner needs loose leaves, so any decent books you have and want to copy will need a Stanley knife taken to the spine.

    Please remember to make decent backups on a long-lasting medium with a high chance of recoverability. Failing that, place the loose-leaf versions with a document recovery firm and take out their insurance for the full purchase value of the originals.

    --
    Matt Thompson - Actuality - Insert product here.
  3. Somewhat on topic... Historical Papers by Embedded+Geek · Score: 3, Interesting
    My father passed on Sunday and we were going through all the family papers. We have lots of original documents from my family during the Civil War and earlier. My sister and I were thinking of donating them to a museum, so there would be no risk of their loss should my house get damaged (there are far too many documents to fit in my fire safe).

    Before doing this, though, we were thinking of scanning/copying all the documents to keep copies for ourselves, and we could use some advice:

    What special steps must we take in scanning 150+ year old documents, some very yellowed and fragile?

    What is the best format in which to store them (assuming we want them to be easily readable in 20+ years for our kids)?

    What is the best media upon which to store the data (again, hoping for readability in 20+ years)? (I'm thinking online storage to allow easy conversion to the media of the moment, but I still want something to stash in the safe deposit box)

    Does anyone have experience with digital preservation/restoration of archival documents? Should I just try cleaning them up in Photoshop, or should I find a pro to help out? Maybe I can make it a term of the donation to the museum/library, for that matter.

    Thanks in advance for your advice.

    --

    "Prepare for the worst - hope for the best."

  4. You *need* to be aware of OpenDJVu by Effugas · Score: 5, Interesting

    Run, don't walk, to http://djvu.research.att.com/home.html . DjVu is an image-based competitor to PDF and a feat of beautiful engineering: 300 DPI scans break down to about 10-30K a page, the viewer is about an order of magnitude faster than PDF's, the format cleanly supports separate encoding of page texture/graphics vs. page text, there is a significant amount of open-source code for it, and more.

    It's truly a brilliant format. Go check it out.

    Yours Truly,

    Dan Kaminsky
    DoxPara Research
    http://www.doxpara.com
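The "10-30K a page" figure in the DjVu comment above is easier to appreciate when compared with the size of an uncompressed scan. The following back-of-envelope sketch relies on two assumptions that the thread does not state: a US-letter page (8.5 x 11 inches), and "K" meaning 1024 bytes. The helper names are hypothetical.

```python
# Back-of-envelope check of the "10-30K a page" figure for 300 DPI scans.
# Assumptions (not from the original comment): a US-letter page of
# 8.5 x 11 inches, and "K" meaning 1024 bytes.


def raw_page_bytes(dpi, width_in, height_in, bits_per_pixel):
    """Uncompressed size in bytes of one page scanned at `dpi`."""
    pixels = round(dpi * width_in) * round(dpi * height_in)
    return pixels * bits_per_pixel // 8


def compression_ratio(raw_bytes, compressed_kb):
    """How many times smaller the compressed page is than the raw scan."""
    return raw_bytes / (compressed_kb * 1024)


if __name__ == "__main__":
    for label, bpp in [("bilevel", 1), ("grayscale", 8), ("24-bit color", 24)]:
        raw = raw_page_bytes(300, 8.5, 11, bpp)
        low = compression_ratio(raw, 30)   # page compresses to 30K
        high = compression_ratio(raw, 10)  # page compresses to 10K
        print(f"{label:>12}: {raw:>10,} bytes raw, ~{low:.0f}x to ~{high:.0f}x smaller")
```

A 300 DPI letter page is 2550 x 3300 = 8,415,000 pixels, or 1,051,875 bytes even at 1 bit per pixel. Against that bilevel scan, 10-30K works out to roughly 34x to 103x smaller, and against grayscale or color scans the implied ratio is larger still. Most book pages are smaller than letter size, so treat these numbers as rough.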