Slashdot Mirror


Batch Cataloging of Scanned Documents via OCR?

munwin99 asks: "I am looking for some software to process a batch of images (scanned forms). We want to use Gallery to view the images, and be able to search them by 3 or 4 attributes. We want to get these attributes from the form (date, name, etc). We want the software to check one or more sections of each scanned form, read the info from those sections (using OCR / ICR), and load the retrieved info into Gallery. Is there any (preferably) free or open source software that can do this? Supported OSes should include Windows, Linux or Mac OS X. Even Gallery is optional, if someone has a better suggestion."

1 of 31 comments

  1. Custom Layout by tonsofpcs · Score: 4, Informative

    Many OCR packages let you create a 'layout' for OCRing, that is, specify where the images and the textual data are on the page. If your forms all follow the same layout, or you have only a few layouts, you can set these up once and, in many packages, reuse them. The only caveat is that the forms must be scanned the same way every time; if your forms have prepunched holes or markings at specific points on the edge, you can run animation software [like Bauhaus Software's Mirage] on a batch to 'pixel-track' the pages and align them based on these marks, then export lossless or low-loss TGAs, TIFFs, PNGs, or similar for OCRing.
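
    To make the layout-plus-alignment idea concrete, here is a minimal sketch in plain Python. It is not how any particular OCR package or Mirage works; the names `Zone`, `find_mark`, `shift_zones` and `crop` are hypothetical, the page is assumed to be a grayscale grid (a list of rows, 0 = black, 255 = white), and only a sideways/up-down shift is corrected, not rotation or scaling. The cropped regions would then be handed to whatever OCR engine you use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Zone:
    """A named rectangle on the template form, in pixels (right/bottom exclusive)."""
    name: str
    left: int
    top: int
    right: int
    bottom: int


def find_mark(pixels, search_box, threshold=128):
    """Return the centroid (x, y) of dark pixels inside search_box, or None.

    pixels is a list of rows of grayscale values; search_box is
    (left, top, right, bottom) with right/bottom exclusive.
    """
    left, top, right, bottom = search_box
    xs, ys = [], []
    for y in range(top, bottom):
        for x in range(left, right):
            if pixels[y][x] < threshold:
                xs.append(x)
                ys.append(y)
    if not xs:
        return None  # no registration mark found in this area
    return (sum(xs) / len(xs), sum(ys) / len(ys))


def shift_zones(zones, template_mark, scanned_mark):
    """Translate every zone by the offset between template and scanned mark."""
    dx = round(scanned_mark[0] - template_mark[0])
    dy = round(scanned_mark[1] - template_mark[1])
    return [
        Zone(z.name, z.left + dx, z.top + dy, z.right + dx, z.bottom + dy)
        for z in zones
    ]


def crop(pixels, zone):
    """Cut the zone out of the page; the result is what you would pass to OCR."""
    return [row[zone.left:zone.right] for row in pixels[zone.top:zone.bottom]]
```

    In use, you would define the zones once per form layout (say, one each for date and name), locate the punched hole or printed mark on every scan with `find_mark`, shift the zones with `shift_zones`, and OCR each crop separately. Pages that are skewed rather than merely shifted would need at least two marks and a rotation estimate, which this sketch leaves out.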