Preserving Old Research Notes and Documents?
twistedcubic asks: "I have several thousand 8.5 x 11 inch dead tree pages of notes and research that takes up too much storage space. I would like to have all these notes scanned into PDF files (for example) so I can recycle the pages and reclaim storage space. Does anyone know of a store that provides this service, or an inexpensive machine that will do the job in a reasonable amount of time?"
"I have several thousand PDF files taking up too much disk storage space. I would like to have all these files printed on to 8.5 x 11 inch dead tree pages of notes so I can delete the files, empty the recycle bin and reclaim storage space. Does anyone know of a store that provides this service, or an inexpensive machine that will do the job in a reasonable amount of time?"
For future reference, I suggest a printer.
--BladeMelbourne
10-year-old nephew and a scanner.
Filing cabinet.
/^([Ss]ame [Bb]at (time, |channel.)){2}$/
I would try to convert the pages to some sort of text format to allow searching...
Sorry to reply to my own post, but I felt bad about the unhelpfulness of my previous comment. I headed over to Visioneer's site (www.visioneer.com) and found a few scanners that handle like 25 pages at a time. The more you spend, the faster it scans. Sorry, I cannot personally recommend a scanner in particular. Never had one like this.
Good luck!
"Derp de derp."
ADF (Automatic Document Feeder) scanners are fairly pricey (good ones are in the US$400 - US$1000 range), but you can get a cheapie Brother MFC-3240C All-In-One (C$140) that has a 20-page document feeder and then get a slave (e.g. some grad student) to feed in your pages for you.
My Brother MFC-3240C scanner comes with the PaperPort application, which generates PDFs and supports double-sided scanning even though the scanner doesn't support it. (You just flip over the whole stack once you've scanned one side, and start scanning the other side. PaperPort knows how to automatically reconcile the pages.)
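For anyone without PaperPort, the reconciling step is easy to do by hand. A minimal sketch, assuming the flipped stack feeds the back sides in reverse order (last sheet first); the function name `interleave_duplex` is made up for illustration and this is not how PaperPort itself is implemented:

```python
def interleave_duplex(fronts, backs):
    """Merge two single-sided scan passes into reading order.

    fronts: front sides in scan order (pages 1, 3, 5, ...)
    backs:  back sides scanned after flipping the whole stack,
            assumed to arrive in reverse order (..., 6, 4, 2)
    """
    if len(fronts) != len(backs):
        raise ValueError("both passes must contain the same number of sheets")
    merged = []
    # Undo the reversal caused by flipping the stack, then alternate sides.
    for front, back in zip(fronts, reversed(backs)):
        merged.append(front)
        merged.append(back)
    return merged
```

If your feeder happens to deliver the backs in forward order, drop the `reversed()` call.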
If you have Acrobat Professional, you can do a Paper Capture(TM) which is basically doing an OCR on the PDF and then storing the recognized words as "keywords" so that the PDF is searchable via Spotlight or other indexing mechanisms.
A document scanner is indeed a very useful piece of equipment -- I use it to scan notes and scrap paper containing rough ideas, often with lots of mathematics. Sometimes writing stuff on paper is just easier than typing in LaTeX...
The eminent computer scientist Edsger Dijkstra also liked to write stuff using pen and paper. His digitized works, called EWDs (after his initials, Edsger Wybe Dijkstra) are available here:
http://www.cs.utexas.edu/users/EWD/
Are the notes graphics-heavy (i.e., scientific/engineering)?
If not, give it to a typing service. Once you show them how much "stuff" you have, I'm sure they'll give you a discount. They might even agree to use OpenOffice2 (because it handles huge documents well, the files are small, and it has an excellent PDF exporter).
You'd still have to scan in the pictures/drawing/graphs, and place them appropriately, which will take time.
Also, there are firms that specialize in digitizing paper documents (mostly forms and regularized documents for businesses). Depending on the amount of handwriting & graphics, it might not be appropriate, though.
All in all, no matter how you do it, the project will
"I don't know, therefore Aliens" Wafflebox1
There are companies that will do this for you. For example, IMC in WV (http://www.imcwv.com/). They can scan it all to PDF using the image as what you see in the PDF backed up with the OCR'd text. That way the document is somewhat searchable, but you always see the exact scan of the doc when you look at the PDF.
I'm better, because I'm bigger
Disclaimer: I used to work for this company as a co-op student.
I would contact PRG Schultz as they have done this for large clients in the past. They have a program called imDex which is pretty slick. Basically, it's a searchable, cross-indexable database, so you'll have OCR'd text, along with TIFFs or PDFs of the documents. If you would like more information, let me know.
The problem is then you have to come up with a safe long term way to store digital data.
Clue:
There isn't one.
The best thing to do is NOT convert the paper to digitized format. Find some space instead, and store the paper. Your data will be much safer.
Many libraries have reader-printers on which, for a small fee (e.g., $0.20/page?), you can print a copy.
Most of the expense with fiche is the production of the silver halide original; diazo copies are relatively cheap. If it's really important to you, have a copy made and lock the original film in a safe deposit box (or at least offsite)
Print them in a book and wait for Google Print to scan them all for you.
Go buy a modem and grab an old fax machine, then fax the documents to yourself. You should be able to fax a decent number of pages at a time and can walk away and leave it running. These will be saved as multi-page TIFFs, which, while not searchable PDFs, at least solve part of your problem.
RandomAndInteresting.com: defending the world from stupidity since 1979
Try a legal copyist.
In a drawer or filing cabinet.
And what are the guarantees that it'll actually stay preserved for that long?
Wet-film microfilm has an estimated survivability of 500 years in ideal conditions and a minimum of 100 years in any reasonable conditions. To my knowledge this exceeds the lifetime of any digital medium.
It's fairly trivial to store redundant copies of your digital files, even in multiple locations worldwide. The costs are minimal too.
It's fairly trivial to store redundant copies of your microfilm, even in multiple locations worldwide. The costs are minimal too.
This is not my sandwich.
You don't want to spend money on physical storage, yet you're asking about a service that will do the job of scanning for you? Here's a hint: for the cost of hiring someone to do this job for you, you can rent a small room at a self-store place for 15-20 years.
I'll turn into a supernova and burn up everything. Well I'll turn into a black little hole and you'll turn into string.
This is what we do at work. We spent about $5000 on the setup, but remember that this is an enterprise where we scan about 125,000 pages to PDF a month. For what you are looking at, it is probably possible for about $500 or so (oh, and some programming)
;)
First, you'll need a low-volume scanner. (Check the duty cycle to make sure it can handle your bookshelf of papers.) Then, you'll need something to convert the images to PDF. If you have any programming experience, write a quick app that uses ImageMagick (http://www.imagemagick.org/) to convert from TIFF to PDF. Put each binding in its own folder, and pretend the "untitled1.pdf" says "page1.pdf".
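The "quick app" could look something like this. It is only a sketch, assuming ImageMagick's `convert` command is on your PATH (newer installs call it `magick`) and that every scan in a binding's folder ends in `.tiff`; the function names and the `binding.pdf` output name are made up for illustration:

```python
import re
import subprocess
from pathlib import Path


def natural_key(path):
    # Split "untitled10.tiff" into ["untitled", 10, ".tiff"] so that
    # page 2 sorts before page 10.
    return [int(part) if part.isdigit() else part.lower()
            for part in re.split(r"(\d+)", path.name)]


def build_convert_command(folder, output_name="binding.pdf"):
    """Build the ImageMagick command that joins one folder of TIFFs into a PDF."""
    folder = Path(folder)
    pages = sorted(folder.glob("*.tiff"), key=natural_key)
    if not pages:
        raise FileNotFoundError(f"no .tiff files in {folder}")
    # "convert page1.tiff page2.tiff ... out.pdf" writes one multi-page PDF.
    return ["convert", *map(str, pages), str(folder / output_name)]


def convert_folder(folder, output_name="binding.pdf"):
    # Runs ImageMagick; raises CalledProcessError if the conversion fails.
    subprocess.run(build_convert_command(folder, output_name), check=True)
```

Call `convert_folder("path/to/one/binding")` once per folder. Building the command separately from running it makes it easy to print and check the page order before committing to a long conversion.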
If you want to get fancier, have the front-end app rename untitled1.tiff to whatever you'd like. Also, you can embed extra information into the PDF by using metadata and the Adobe XMP SDK (free download from Adobe). Make the metadata like:
TITLE="My Book"
AUTHOR="Bart Simpson"
etc.
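To give an idea of what that metadata ends up looking like, here is a rough sketch that builds a bare-bones XMP packet with the title and author as Dublin Core fields. It does not use the Adobe XMP SDK at all (it is plain string templating), the `build_xmp` name is made up, and getting the packet into the PDF is still a job for the SDK or whatever tool you prefer:

```python
from xml.sax.saxutils import escape

# Minimal XMP packet: dc:title is a language alternative, dc:creator a sequence.
XMP_TEMPLATE = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description rdf:about=""
        xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:title>
        <rdf:Alt><rdf:li xml:lang="x-default">{title}</rdf:li></rdf:Alt>
      </dc:title>
      <dc:creator>
        <rdf:Seq><rdf:li>{author}</rdf:li></rdf:Seq>
      </dc:creator>
    </rdf:Description>
  </rdf:RDF>
</x:xmpmeta>
"""


def build_xmp(title, author):
    """Return XMP packet text with the title and author XML-escaped."""
    return XMP_TEMPLATE.format(title=escape(title), author=escape(author))


if __name__ == "__main__":
    print(build_xmp("My Book", "Bart Simpson"))
```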
Dude! I already found a $100 scanner that does the job and works in Linux (HP Officejet 4215). It scans really fast. My only problem up till now was that PDF rendering was too slow. But then I compared the results to DjVu... Wow! The DjVu files render incredibly fast! Thanks!
(Of course, you will still need to spend lots of time scanning, naming and classifying those pages. The ADF and 10yo nephew suggested in another post might be useful for that.)
DjVu offers a very compact representation without the need to OCR the document (I've converted a 13 MB scanned PDF into a 600K DjVu, which was much faster and easier to read), and optionally a "hidden text layer" if you want to OCR it to make it searchable.
"I'm never quite so stupid as when I'm being smart" (Linus van Pelt)
Ah, fuck it. I'm tired of doing your research for you. You log in as an AC, then expect a legitimate user to Google "lifetime of microfilm" and "cost of microfilm transfer" because you're too sorry to educate yourself. I no longer see any benefit in changing the relationship between my knowledge and your ignorance.
The only reason to use Slashdot as an Anonymous Coward is if you would be fired, arrested, or sued for your post.
This is not my sandwich.