Slashdot Mirror


How Would You Archive Mounds of Genealogy Data?

dexter riley asks: "Hello, all. My mother, a librarian, historian and genealogist for over twenty years, died about a year ago. She left a huge amount of genealogy information, culled from books, magazines, and the internet, mostly in the form of typewritten, photocopied, and printed pages. My main goals are preservation (converting the documents into a compact format that can be easily copied and transferred to others) and indexing (making it possible for someone else to easily find the documents referring to a particular person, family, place, or document type, such as land, marriage, military, birth, or death records). To this end, I would like to convert her work into a format that can be stored digitally and searched by keyword, so that others can more easily use this information for their own genealogy projects later on. What tools do you recommend for handling a project of this size?"

"I'd estimate there are at least 10,000 pages of documents in all. Much of it is organized by binder into family groups, but a lot of it is unorganized, loose paper. Besides being an irreplaceable resource for any future genealogists in my family, some of this data may be useful to other researchers working on related lines. At the very least, I would like the satisfaction of keeping some part of her work from being lost for a few more years.

Here's a general list of things that I've determined I would need:
  • Scanners: What flatbed scanners would you recommend for fast, high-resolution scanning of documents?
  • Image formats: What lossless image formats would you scan your original documents into?
  • OCR software: Although OCR is not perfect, would you recommend using it to allow keyword searching of the original documents? If so, which software would you suggest?
  • Document Indexing: In addition to OCR, are there other tools (document tags?) that you would use to help classify and organize images and other digital documents?
  • File organization software: Ultimately, many thousands of text and image files will be generated. Since I don't want to just convert a paper mess into a digital mess, what tools would you use to organize related image and text files?
Did I miss anything in the above list? Any suggestions you all might have would be hugely welcomed."

6 of 73 comments

  1. Google by Phillup · · Score: 3, Interesting

    Give it to google, let them do it.

    You'll have it forever, and anyone will be able to pull it up.

    (Too bad they don't offer this service... yet.)

    --

    --Phillip

    Can you say BIRTH TAX
  2. Tools by ka9dgx · · Score: 2, Interesting
    Image formats: It appears that TIF is currently the gold standard in terms of archival storage of documents. JPEG2000 will be the way to go, once it becomes commonplace.

    Document Indexing/File Organization: A Wiki is the proper tool for this job, in my opinion. It makes editing very easy, and hyperlinking is instinctive. You can easily attach documents to pages, and you can usually export the whole thing as a directory tree. Most Wiki software also keeps track of all of the versions of a page, so you can worry less about making bad mistakes.
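    To make the hyperlinking point concrete, here is a minimal Python sketch of the "what links here" index a wiki maintains for you. It is not how MoinMoin or WikidPad work internally; it assumes one plain-text page per person, family, or place, and a hypothetical [[Page Name]] link syntax.

```python
import re
from collections import defaultdict

# Hypothetical link syntax [[Page Name]]; real wiki engines each have their own markup.
LINK_RE = re.compile(r"\[\[([^\]]+)\]\]")


def backlinks(pages: dict[str, str]) -> dict[str, set[str]]:
    """Map each linked-to page name to the set of pages that link to it."""
    index: dict[str, set[str]] = defaultdict(set)
    for name, text in pages.items():
        for target in LINK_RE.findall(text):
            index[target.strip()].add(name)
    return dict(index)


if __name__ == "__main__":
    # Sample pages; "Fred Smith" is a made-up relative for illustration.
    pages = {
        "John Smith": "Owned 12 acres in [[Norfolk County]] in 1728. Son: [[Fred Smith]].",
        "Fred Smith": "Son of [[John Smith]]. Land records: [[Norfolk County]].",
        "Norfolk County": "Land and marriage records for the county.",
    }
    for target, sources in sorted(backlinks(pages).items()):
        print(target, "<-", sorted(sources))
```

    Opening the "Norfolk County" page then tells you every person whose records mention it, without anyone having to maintain that list by hand.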

    I've used both MoinMoin, which is a traditional web-based Wiki, and WikidPad, which is an IDE-style environment for Windows that does Wiki-like things. Both of these programs are open source, Python-based applications.

    You also might want to check out ThumbsPlus by Cerious Software, which stores thumbnails of images in a database (including SQL backends), along with keywords and user fields. It can help you as well.
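    The idea behind a keyword database like this can be sketched with Python's built-in sqlite3 module. This is not ThumbsPlus's schema; the table layout, the "family" and "record_type" user fields, and the file paths are all hypothetical.

```python
import sqlite3


def make_catalog(conn: sqlite3.Connection) -> None:
    """Create a hypothetical schema: one row per scanned image, many keywords per image."""
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS scan (
            id INTEGER PRIMARY KEY,
            path TEXT UNIQUE NOT NULL,   -- image file on disk
            family TEXT,                 -- user field: family group / binder
            record_type TEXT             -- user field: land, marriage, military, ...
        );
        CREATE TABLE IF NOT EXISTS keyword (
            scan_id INTEGER NOT NULL REFERENCES scan(id),
            word TEXT NOT NULL,
            PRIMARY KEY (scan_id, word)
        );
    """)


def add_scan(conn, path, family, record_type, keywords):
    """Register one image and its keywords (stored lowercase for case-insensitive lookup)."""
    cur = conn.execute(
        "INSERT INTO scan (path, family, record_type) VALUES (?, ?, ?)",
        (path, family, record_type),
    )
    conn.executemany(
        "INSERT OR IGNORE INTO keyword (scan_id, word) VALUES (?, ?)",
        [(cur.lastrowid, k.lower()) for k in keywords],
    )


def find(conn, word, record_type=None):
    """Return image paths tagged with `word`, optionally limited to one record type."""
    sql = "SELECT s.path FROM scan s JOIN keyword k ON k.scan_id = s.id WHERE k.word = ?"
    params = [word.lower()]
    if record_type is not None:
        sql += " AND s.record_type = ?"
        params.append(record_type)
    return [row[0] for row in conn.execute(sql + " ORDER BY s.path", params)]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    make_catalog(conn)
    add_scan(conn, "smith/0001.tif", "Smith", "land", ["Norfolk County", "John Smith"])
    add_scan(conn, "smith/0002.tif", "Smith", "marriage", ["John Smith"])
    print(find(conn, "john smith"))
    print(find(conn, "John Smith", "land"))
```

    Keeping keywords in their own table lets one page carry any number of names, places, and record types.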

    --Mike--

  3. In response to your questions... by Shimdaddy · · Score: 2, Interesting

    * Scanners: I would go with something basic. I'm a debater on a high school squad, and last year, when we decided to digitize literally thousands of pages of evidence, we used one of these: an HP 5550. It's great and cheap (only $300), and it has USB 2.0, which is the only thing I would call an absolute requirement.
    * Image formats: I would use TIF. The files can be huge at higher resolutions, but scanning at 1 bit (black and white) seems to keep things under control. You could also try Adobe's PDF, but then you are locked in with Adobe.
    * OCR software: If your copies are clean, I would say go for OCR, but don't let it replace the page images. OCR can now capture 99.9%-ish of the text, but it loses all the formatting. So OCR the text for searching, but keep the images around for viewing.
    * Document Indexing: I would just index them by date, which I would use as the filenames.
    * File organization software: PaperPort is absolutely great for this task: you can "stack" images together, put them into colored folders, and convert between formats by drag and drop. I would highly recommend it; it's on version 9 or 10 by now.
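    Combining the "keep the images, OCR for searching" and "date as filename" suggestions, one possible scheme gives each page's image and OCR text the same date-based stem. This Python sketch is an assumption, not something the commenter specified: the ISO date prefix, the _pNNN page suffix, and the .tif/.txt pairing are all hypothetical, and undated loose pages would need some other key.

```python
from datetime import date
from pathlib import Path


def page_stem(doc_date: date, page: int) -> str:
    """Hypothetical name: ISO date first, so plain alphabetical order is chronological."""
    return f"{doc_date.isoformat()}_p{page:03d}"


def page_files(root: Path, doc_date: date, page: int) -> tuple[Path, Path]:
    """Return the (image, OCR text) pair; a shared stem keeps the two files together."""
    stem = page_stem(doc_date, page)
    return root / f"{stem}.tif", root / f"{stem}.txt"


def missing_text(root: Path) -> list[Path]:
    """List scanned images in `root` that have no OCR text file beside them yet."""
    return sorted(p for p in root.glob("*.tif") if not p.with_suffix(".txt").exists())


if __name__ == "__main__":
    image, text = page_files(Path("scans"), date(1728, 3, 14), 2)
    print(image, text)
```

    Because the image and text share a stem, a search hit in a .txt file points straight at the page image to view.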

  4. Google says search, don't sort! by jimbro2k · · Score: 2, Interesting

    Meaning that you don't necessarily need to organize the data, just be able to search it quickly.
    If you agree with that philosophy, then, after you have it all in ASCII, just do a full text index of the data (which makes sense if the data is rarely or never updated) and it is quick to pull out anything you need.
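    The full-text index described here is usually an inverted index: a map from each word to the documents that contain it, built once and then queried quickly. A minimal Python sketch, assuming the OCR output is already loaded as plain-text strings keyed by filename (the sample documents are made up):

```python
import re
from collections import defaultdict

WORD_RE = re.compile(r"[a-z0-9]+")


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric words."""
    return WORD_RE.findall(text.lower())


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Inverted index: word -> set of document names containing it (built once)."""
    index: dict[str, set[str]] = defaultdict(set)
    for name, text in docs.items():
        for word in tokenize(text):
            index[word].add(name)
    return dict(index)


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the documents containing every word of the query (AND search)."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))  # copy, so the index is never mutated
    for word in words[1:]:
        result &= index.get(word, set())
    return result


if __name__ == "__main__":
    docs = {
        "0001.txt": "John Smith owned 12 acres in Norfolk County in 1728.",
        "0002.txt": "Marriage of John Smith and Mary Jones.",
    }
    idx = build_index(docs)
    print(sorted(search(idx, "john smith")))
    print(sorted(search(idx, "Norfolk 1728")))
```

    Since the archive is essentially read-only, the index can be rebuilt from scratch whenever pages are added, which keeps the code this simple.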

    --
    There is not nearly enough love in the world, but there is far too much trust.
  5. Re:Mormons can help by Anonymous Coward · · Score: 1, Interesting

    Not to turn this into a discussion on religion and whatnot, but it's probably worth mentioning that any dead relatives you turn over to the Mormons will most likely be posthumously baptized into their faith. Being agnostic myself, I don't care one way or another, but some people do. I have a friend, more into genealogy than I am, who does research at one of the local LDS centers. He says he is hounded for his information every time he goes there, but he won't give it for this very reason.

    This article: http://archives.cnn.com/2002/US/West/12/10/baptizing.the.dead.ap/
    talks about the problem Jewish descendants of Holocaust victims have with such baptisms.

  6. Re:Mormons can help by menscher · · Score: 2, Interesting
    Although I'm sure the Mormons would welcome a fixed, formalized family tree, most of the information I have is far lower-level than they might be interested in. There's a little data like, "Jehod begat Ezekial begat Fred", but most of it is like "John Smith owned 12 acres in Norfolk County in 1728."

    Although they might not be able to take that level of detail into their central database, I think they would welcome info like that at a more local level. Local centers often have small libraries of genealogical data relevant to that specific locality. This way, a researcher can simply contact the local family history library and look up any information they might want.

    Definitely look up The Church of Jesus Christ of Latter-day Saints in your phonebook and give them a call. Most areas have a local genealogy expert that can help you.