How Would You Archive Mounds of Genealogy Data?
dexter riley asks: "Hello, all. My mother, a librarian, historian and genealogist for over twenty years, died about a year ago. She left a huge amount of genealogy information, culled from books, magazines, and the internet, mostly in the form of typewritten, photocopied, and printed pages. My main goals are: Preservation - converting the documents into a compact format that can be easily copied and transferred to others; and Indexing - making it possible for someone else to easily find the documents referring to a particular person, family, place, or document type (like land, marriage, military, birth or death records). To this end, I would like to convert her work into a format that can be stored digitally and scanned for keywords, to make it easier for others to use this information for their genealogy projects later on. What tools do you recommend for handling a project of this size?"
I'd estimate there are at least 10,000 pages of documents in all. Much of it is organized by binder into family groups, but a lot of it is unorganized, loose paper. Besides being an irreplaceable resource for any future genealogists in my family, there are other researchers working on related lines who may find some part of this data useful. At the very least, I would like the satisfaction of keeping some part of her work from being lost for a few years more.
Here's a general list of things that I've determined I would need:
- Scanners: What flatbed scanners would you recommend for fast, high-resolution scanning of documents?
- Image formats: What lossless image formats would you scan your original documents into?
- OCR software: Although OCR is not perfect, would you recommend using it to allow keyword searching to the original document? If so, which software would you suggest?
- Document Indexing: In addition to OCR, are there other tools (document tags?) that you would use to help classify and organize images and other digital documents?
- File organization software: Ultimately, many thousands of text and image files will be generated. Since I don't want to just convert a paper mess into a digital mess, what tools would you use to organize related image and text files?
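As a rough illustration of the indexing goal above (finding every document for a given person, family, place, or record type), here is a minimal Python sketch of a tag-plus-OCR index. The `ScannedPage` class, its field names, and the file names are all hypothetical, made up for this example; real software would keep this in a database rather than in memory.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class ScannedPage:
    """One scanned page plus the hand-entered tags that describe it (hypothetical layout)."""
    image_file: str                                   # path to the scanned image
    ocr_text: str = ""                                # raw OCR output, errors and all
    surnames: set = field(default_factory=set)
    places: set = field(default_factory=set)
    record_types: set = field(default_factory=set)    # land, marriage, military, birth, death


def build_index(pages):
    """Map each lowercased tag or OCR word to the image files that mention it."""
    index = defaultdict(set)
    for page in pages:
        terms = page.surnames | page.places | page.record_types
        terms |= set(page.ocr_text.split())
        for term in terms:
            key = term.strip(".,;:()\"'").lower()     # drop surrounding punctuation
            if key:
                index[key].add(page.image_file)
    return index


def search(index, *terms):
    """Return the files that match every search term, sorted by name."""
    hits = [index.get(t.lower(), set()) for t in terms]
    return sorted(set.intersection(*hits)) if hits else []
```

With an index built this way, `search(index, "Riley", "marriage")` would list only the pages tagged or OCRed with both words. Hand-entered tags matter because OCR will miss or mangle some names.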
1. Check the Document Management Continuum: http://www.archivebuilders.com/whitepapers/index.html
2. Get two reasonable scanners that work with whatever software you choose: one with a document feeder (it can be monochrome; modern office MFPs work fine), and a cheap color flatbed scanner for anything the big one won't process.
3. Doc prep and indexing will take much longer than the scanning and, unlike OCR, are a lot of manual labour. Expect a couple of weeks, minimum, especially if you haven't got an indexing scheme in place.
4. Use TIFF G4 and PDF (OCRed text over the images).
5. Profit.
You might also want to ask the guys from rotten.com if they'll let you see the code behind the nndb.
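Step 4 could be scripted roughly like this. It is only a sketch, and it assumes two tools the post does not name: ImageMagick (`convert`) to produce the Group 4 TIFF and Tesseract to produce the PDF with OCRed text over the image. The function names and directory layout are made up for the example, and `dry_run=True` just returns the commands without running anything.

```python
import subprocess
from pathlib import Path


def g4_command(scan: Path, out_dir: Path) -> list:
    """ImageMagick command: reduce a scan to black and white, save as Group 4 TIFF."""
    tiff = out_dir / (scan.stem + ".tif")
    return ["convert", str(scan), "-monochrome", "-compress", "Group4", str(tiff)]


def ocr_pdf_command(tiff: Path, lang: str = "eng") -> list:
    """Tesseract command: write <name>.pdf with a searchable text layer over the image."""
    output_base = tiff.with_suffix("")    # Tesseract appends ".pdf" itself
    return ["tesseract", str(tiff), str(output_base), "-l", lang, "pdf"]


def archive_page(scan: Path, out_dir: Path, dry_run: bool = True) -> list:
    """Build (and optionally run) both commands for one scanned page."""
    tiff_cmd = g4_command(scan, out_dir)
    pdf_cmd = ocr_pdf_command(Path(tiff_cmd[-1]))
    if not dry_run:
        out_dir.mkdir(parents=True, exist_ok=True)
        for cmd in (tiff_cmd, pdf_cmd):
            subprocess.run(cmd, check=True)   # raises if a tool is missing or fails
    return [tiff_cmd, pdf_cmd]
```

Group 4 only suits black-and-white pages, so anything from the color flatbed (photos, faded originals) would need a different lossless format.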
After all, I am strangely colored.
The Mormon faith believes in tracing humans back to Adam and Eve. They have a huge genealogy library in Salt Lake City. There are several programs available that you can put the information into and then submit it to them. They will keep it forever, and others can research it as well.
Great Linux Site
Ask anyone who has ever tried to get data off twenty year old tapes...
After I give them a dopeslap for not keeping their data current.
I mean, my first Mac had a 40MB hard drive, but I still have all the data from it - it's become easier and easier each generation to copy all my old data forward.
Granted, there's always the odd lost tape found behind a cabinet, but that's someone who didn't have a good data retention plan in place and didn't care about that data too much.
There will always be a need for forensic recovery, but compared with just a few years ago, almost all the casual users I know keep all their data on a hard drive. The floppies and Zip disks are gone. Some of it is on CD-R, but that's the new backup medium, not current storage.
Now getting them to do a good backup so I don't have to go rescue their drives with dd_rescue - somebody let me know how to do that!
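One way to check that a copy-forward or backup actually worked is to keep a checksum manifest and verify each new copy against it. This is a minimal sketch, not something anyone in this thread describes using; the function names are made up, and SHA-256 is just one reasonable choice of hash.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large scans do not have to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_copy(manifest: dict, copy_root: Path) -> list:
    """Return the relative paths that are missing or changed in the copy."""
    problems = []
    for rel, expected in manifest.items():
        target = copy_root / rel
        if not target.is_file() or sha256_of(target) != expected:
            problems.append(rel)
    return problems
```

An empty list from `verify_copy` means every file in the manifest made it across intact. Files that exist only in the copy are not reported.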
My God, it's Full of Source!
OUTSIDE_IP=$(dig +short my.ip @outsideip.net)