Slashdot Mirror


Automated OCR for Forms Processing?

Oscar Carrillo asks: "We have a large NIH grant that involves collecting tons of data, much of it in the form of questionnaires. The forms will be available on the web, but it's mostly not feasible to have the subjects sit in front of a computer all day (not to mention that people get annoyed doing so). The study is being conducted at several universities and institutions around the country. Using Linux/JSP/Struts/PostgreSQL will take care of most of our needs. But it would save a lot of data entry if all forms could be scanned at each site, the images uploaded to the website, and then automatically put through OCR (Optical Character Recognition) to extract only the relevant raw data that subjects wrote. Does anyone know of something that can handle this? Are there any open source projects that can handle this? Any good commercial alternatives?"
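
To make the "get only the relevant raw data" step concrete, here is a rough sketch of zone-based extraction, assuming every questionnaire is printed from one fixed template and scanned at a consistent size and alignment. Everything named here (TEMPLATE, the box coordinates, the recognize callback) is hypothetical and stands in for whatever OCR engine ends up being used. The page is modeled as a plain grid of grayscale pixel values so the example runs without any imaging library.

```python
from typing import Callable, Dict, List, Tuple

Image = List[List[int]]          # grayscale page: rows of 0-255 pixel values
Box = Tuple[int, int, int, int]  # (left, top, right, bottom), right/bottom exclusive

# Hypothetical template: where each answer field sits on the scanned page.
# Real coordinates would come from measuring the printed form.
TEMPLATE: Dict[str, Box] = {
    "subject_id": (0, 0, 4, 2),
    "age": (4, 0, 6, 2),
}


def crop(page: Image, box: Box) -> Image:
    """Return only the pixels inside one field's bounding box."""
    left, top, right, bottom = box
    return [row[left:right] for row in page[top:bottom]]


def extract_fields(
    page: Image,
    template: Dict[str, Box],
    recognize: Callable[[Image], str],
) -> Dict[str, str]:
    """Crop each field zone and hand it to the supplied recognizer.

    `recognize` is an assumption: a placeholder for a real OCR call.
    """
    return {name: recognize(crop(page, box)) for name, box in template.items()}
```

Cropping to known zones first keeps the printed question text away from the recognizer, so it only ever sees what the subject wrote. It does nothing to make handwriting itself easier to read, so a manual review pass for low-confidence fields is probably still needed.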

2 of 30 comments

  1. NIH has an OCR website by Hee+Hee+Hee · Score: 3, Interesting

    Check out NLM's DocMorph at docmorph.nlm.nih.gov/docmorph/default.htm. It's a site put up by the NIH (coincidentally) where you post your scanned image and they post an OCR'ed document, in the format you choose, for a short period. It does a fairly good job for the price (free).

    --
    - Bill
  2. Solutions by Wrexen · Score: 3, Interesting

    Does anyone know of something that can handle this?

    High school students? Technology isn't the answer to everything, and if these are handwritten, you're not going to have much success trying to automate the recognition. My name is Fod Na1oyyy, etc.