Slashdot Mirror


How Would You Archive Mounds of Genealogy Data?

dexter riley asks: "Hello, all. My mother, a librarian, historian and genealogist for over twenty years, died about a year ago. She left a huge amount of genealogy information, culled from books, magazines, and the internet, mostly in the form of typewritten, photocopied, and printed pages. My main goals are: Preservation - converting the documents into a compact format that can be easily copied and transferred to others; and Indexing - making it possible for someone else to easily find the documents referring to a particular person, family, place, or document type (like land, marriage, military, birth or death records). To this end, I would like to convert her work into a format that can be stored digitally and searched for keywords, to make it easier for others to use this information for their genealogy projects later on. What tools do you recommend for handling a project of this size?

I'd estimate there are at least 10,000 pages of documents in all. Much of it is organized by binder into family groups, but a lot of it is unorganized, loose paper. Besides being an irreplaceable resource for any future genealogists in my family, there are other researchers working on related lines who may find some part of this data useful. At the very least, I would like the satisfaction of keeping some part of her work from being lost for a few years more.

Here's a general list of things that I've determined I would need:
  • Scanners: What flatbed scanners would you recommend for fast, high-resolution scanning of documents?
  • Image formats: What lossless image formats would you scan your original documents into?
  • OCR software: Although OCR is not perfect, would you recommend using it to allow keyword searching of the original documents? If so, which software would you suggest?
  • Document Indexing: In addition to OCR, are there other tools (document tags?) that you would use to help classify and organize images and other digital documents?
  • File organization software: Ultimately, many thousands of text and image files will be generated. Since I don't want to just convert a paper mess into a digital mess, what tools would you use to organize related image and text files?
Did I miss anything in the above list? Any suggestions you all might have would be hugely welcomed."
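
One way to picture the indexing goal above is a small keyword index built over the OCR output. The following is only a minimal sketch under stated assumptions, not a recommendation of any particular tool: it assumes each scanned page already has a plain-text OCR result saved as a .txt file, grouped into one folder per family. The folder layout and the function names (tokenize, build_index, search) are hypothetical and chosen for illustration.

```python
import re
from collections import defaultdict
from pathlib import Path

# Words are runs of letters, digits, or apostrophes; everything is lowercased.
WORD = re.compile(r"[a-z0-9']+")


def tokenize(text):
    """Split text into lowercase keywords."""
    return WORD.findall(text.lower())


def build_index(text_dir):
    """Map each keyword to the set of OCR text files that contain it.

    Assumes (hypothetically) one .txt file per scanned page, anywhere
    under text_dir. Paths are stored relative to text_dir.
    """
    index = defaultdict(set)
    root = Path(text_dir)
    for path in sorted(root.rglob("*.txt")):
        # OCR output is often messy, so undecodable bytes are replaced
        # rather than allowed to abort the whole run.
        words = tokenize(path.read_text(encoding="utf-8", errors="replace"))
        for word in set(words):
            index[word].add(path.relative_to(root).as_posix())
    return index


def search(index, query):
    """Return the sorted list of files containing every word in the query."""
    words = tokenize(query)
    if not words:
        return []
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return sorted(hits)
```

With a layout like text/riley/p001.txt, calling build_index("text") and then search(index, "riley marriage") would list only the pages whose OCR text contains both words. OCR misreads will cause misses, so hand-entered tags (family, place, record type) would still be worth keeping alongside an index like this.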

4 of 73 comments

  1. Dead tree by keesh · Score: 2, Insightful

    Dead tree lasts longer than computer media. Ask anyone who has ever tried to get data off twenty-year-old tapes...

    1. Re:Dead tree by bill_mcgonigle · Score: 4, Insightful

      Ask anyone who has ever tried to get data off twenty-year-old tapes...

      After I give them a dopeslap for not keeping their data current.

      I mean, my first Mac had a 40MB hard drive, but I still have all the data from it - it's become easier and easier each generation to copy all my old data forward.

      Granted, there's always the odd lost-tape found behind a cabinet, but that's someone who didn't have a good data retention plan in place and didn't care about that data too much.

      There will always be a need for forensic recovery, but compared with just a few years ago, almost all the casual users I know keep all their data on a hard drive. The floppies and Zip disks are gone. Some of it is on CD-R, but that's the new backup media, not current storage.

      Now getting them to do a good backup so I don't have to go rescue their drives with dd_rescue - somebody let me know how to do that!

      --
      My God, it's Full of Source!
      OUTSIDE_IP=$(dig +short my.ip @outsideip.net)
    2. Re:Dead tree by 4of12 · Score: 3, Insightful
      Dead tree lasts longer than computer media.

      Frightful.

      I was looking at 100-year-old newspapers from a small town in AL about 5 years ago. Yellowed, brittle, crumbly. Half of those old newspapers were useless to the amateur genealogists pawing through them, who were slowly contributing to their demise just by attempting to turn the pages.

      Then there are the tons of valuable paper records that get passed to random descendants of record keepers (e.g., Grandpa has a bunch of records of marriages, births, and baptisms from some old church from 60 years ago that no longer exists).

      If you make dead tree records I'd recommend making multiple copies that get distributed to different people in different places, preferably on acid-free paper.

      The biggest enemies of genealogical records, IMHO, are

      • uncaring descendants chucking out a bunch of "junk",
      • their antecedents who never even bother to tell them or, better, write down who the hell is in those old photos, etc., and
      • the odd house fire that consumes everything that can burn.
      --
      "Provided by the management for your protection."
  2. Organization by poopdeville · Score: 4, Insightful
    Are you sure there's actually a mess? Since your mom was a librarian, it seems to me that she would know how to organize this information. Go through it and make sure the information isn't already structured before you start messing around.

    You might also want to ask the guys from rotten.com if they'll let you see the code behind the nndb.

    --
    After all, I am strangely colored.