How Would You Archive Mounds of Genealogy Data?
dexter riley asks: "Hello, all. My mother, a librarian, historian and genealogist for over twenty years, died about a year ago. She left a huge amount of genealogy information, culled from books, magazines, and the internet, mostly in the form of typewritten, photocopied, and printed pages. My main goals are: Preservation - converting the documents into a compact format that can be easily copied and transferred to others; and Indexing - making it possible for someone else to easily find the documents referring to a particular person, family, place, or document type (like land, marriage, military, birth or death records). To this end, I would like to convert her work into a format that can be stored digitally and scanned for keywords, to make it easier for others to use this information for their genealogy projects later on. What tools do you recommend for handling a project of this size?"
"I'd estimate there are at least 10,000 pages of documents in all. Much of it is organized by binder into family groups, but a lot of it is unorganized, loose paper. Besides being an irreplaceable resource for any future genealogists in my family, there are other researchers working on related lines who may find some part of this data useful. At the very least, I would like the satisfaction of keeping some part of her work from being lost for a few years more.
Here's a general list of things that I've determined I would need:
- Scanners: What flatbed scanners would you recommend for fast, high-resolution scanning of documents?
- Image formats: What lossless image formats would you scan your original documents into?
- OCR software: Although OCR is not perfect, would you recommend using it to allow keyword searching of the original documents? If so, which software would you suggest?
- Document Indexing: In addition to OCR, are there other tools (document tags?) that you would use to help classify and organize images and other digital documents?
- File organization software: Ultimately, many thousands of text and image files will be generated. Since I don't want to just convert a paper mess into a digital mess, what tools would you use to organize related image and text files?"
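To make the indexing goal concrete, here is a minimal sketch of a keyword index built over OCR text, assuming each scanned page has already been converted to plain text. The file names, sample text, and function names (`tokenize`, `build_index`, `search`) are all hypothetical and only illustrate the idea; a real project of this size would more likely rely on an existing search or document-management tool.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each word to the set of page names whose OCR text contains it.

    `pages` is a dict of {page_name: ocr_text}; both are assumptions here.
    """
    index = defaultdict(set)
    for name, text in pages.items():
        for word in tokenize(text):
            index[word].add(name)
    return index


def search(index, query):
    """Return the sorted page names that contain every word in the query."""
    words = tokenize(query)
    if not words:
        return []
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return sorted(hits)


# Hypothetical sample data standing in for OCR output of scanned pages.
pages = {
    "smith/binder1/page001.txt": "Marriage record of John Smith, 1872, Adams County",
    "smith/binder1/page002.txt": "Land deed, Adams County, 1880",
    "loose/page117.txt": "Military service record for John Smith",
}
index = build_index(pages)
```

Calling `search(index, "john smith")` would return both pages that mention the name, and `search(index, "marriage smith")` would narrow that to the marriage record. Keeping the page name as a path that mirrors the binder layout is one simple way to tie each text file back to its scanned image.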
Dead tree lasts longer than computer media. Ask anyone who has ever tried to get data off twenty-year-old tapes...
You might also want to ask the guys from rotten.com if they'll let you see the code behind the nndb.
After all, I am strangely colored.