Preserving Old Research Notes and Documents?
twistedcubic asks: "I have several thousand 8.5 x 11 inch dead tree pages of notes and research that take up too much storage space. I would like to have all these notes scanned into PDF files (for example) so I can recycle the pages and reclaim storage space. Does anyone know of a store that provides this service, or an inexpensive machine that will do the job in a reasonable amount of time?"
Sorry to reply to my own post, but I felt bad about the unhelpfulness of my previous comment. I headed over to Visioneer's site (www.visioneer.com) and found a few scanners that handle about 25 pages at a time. The more you spend, the faster it scans. Sorry, I cannot personally recommend a particular scanner. I've never had one like this.
Good luck!
"Derp de derp."
There are tons of companies that specialize in electronic document scanning & OCR, usually for the legal industry. It would probably cost $0.05 to $0.10 a page, but you might be able to cut a deal as an individual rather than a law firm.
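For a rough idea of what that per-page range adds up to, here is a minimal sketch. The 3,000-page count is only an assumption standing in for "several thousand", and `scanning_cost_dollars` is a made-up helper name, not anything a scanning service provides.

```python
def scanning_cost_dollars(pages: int, cents_per_page: int) -> float:
    """Total cost in dollars for scanning `pages` sheets at a flat per-page rate.

    The rate is given in whole cents so the multiplication stays exact.
    """
    return pages * cents_per_page / 100


# Assumption: "several thousand" pages taken as 3,000 for illustration.
PAGES = 3000

low = scanning_cost_dollars(PAGES, 5)    # 5 cents per page
high = scanning_cost_dollars(PAGES, 10)  # 10 cents per page

print(f"{PAGES} pages: ${low:.2f} to ${high:.2f}")
```

Plug in your own page count and whatever rate you actually get quoted; the range in the comment is a guess, not a price list.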
There are companies that will do this for you. For example, IMC in WV (http://www.imcwv.com/). They can scan it all to PDF using the image as what you see in the PDF backed up with the OCR'd text. That way the document is somewhat searchable, but you always see the exact scan of the doc when you look at the PDF.
I'm better, because I'm bigger
Disclaimer: I used to work for this company as a co-op student.
I would contact PRG Schultz, as they have done this for large clients in the past. They have a program called imDex, which is pretty slick. Basically, it's a searchable, cross-indexable database, so you'll have OCR'd text, along with TIFFs or PDFs of the documents. If you would like more information, let me know.
The problem is that you then have to come up with a safe long-term way to store digital data.
Clue:
There isn't one.
The best thing to do is NOT convert the paper to digitized format. Find some space instead, and store the paper. Your data will be much safer.
Many libraries have reader-printers on which, for a small fee (e.g., $0.20/page?), you can print a copy.
Most of the expense with fiche is the production of the silver halide original; diazo copies are relatively cheap. If it's really important to you, have a copy made and lock the original film in a safe deposit box (or at least offsite).
Check out the Fujitsu ScanSnap. It's their lowest-end document scanner, but it's still faster than all the slow consumer-level junk, and it comes with a version of Acrobat that will OCR the images and put the text in a "hidden" layer for searching.
Dude! I already found a $100 scanner that does the job and works in Linux (HP Officejet 4215). It scans really fast. My only problem up till now was that PDF rendering was too slow. But then I compared the results to DjVu... Wow! The DjVu files render incredibly fast! Thanks!
(Of course, you will still need to spend lots of time scanning, naming and classifying those pages. The ADF and 10-year-old nephew suggested in another post might be useful for that.)
DjVu offers a very compact representation without the need to OCR the document (I converted a 13 MB scanned PDF into a 600 KB DjVu, which was much faster and easier to read), and optionally a "hidden text layer" if you want to OCR it to make it searchable.
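To put the 13 MB to 600 KB figure in perspective, here is a quick back-of-the-envelope sketch. It assumes 1 MB = 1000 KB, and `compression_ratio` is just an illustrative helper, not part of any DjVu tool. One anecdote is not a benchmark, so treat the result as a single data point.

```python
def compression_ratio(original_kb: float, compressed_kb: float) -> float:
    """How many times smaller the compressed file is than the original."""
    if compressed_kb <= 0:
        raise ValueError("compressed size must be positive")
    return original_kb / compressed_kb


# Figures from the comment above; 1 MB = 1000 KB is an assumption.
pdf_kb = 13 * 1000
djvu_kb = 600

ratio = compression_ratio(pdf_kb, djvu_kb)
saved_percent = 100 * (1 - djvu_kb / pdf_kb)

print(f"DjVu is about {ratio:.1f}x smaller ({saved_percent:.0f}% less space)")
```

That works out to somewhere over twenty times smaller for that one file; your own scans may compress better or worse depending on content and resolution.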
"I'm never quite so stupid as when I'm being smart" (Linus van Pelt)