Slashdot Mirror


Automated OCR for Forms Processing?

Oscar Carrillo asks: "We have to do a large NIH grant which collects tons of data. And much of that data is in the form of questionnaires. The forms will be available on the web, but it's mostly not feasible to have the subjects sit in front of the computer all day (not to mention that people get annoyed sitting in front of a computer all day). The study is being conducted at several universities and institutions around the country. Using Linux/JSP/Struts/PostgreSQL will take care of most of our needs. But it would save a lot of data entry, if all forms could be scanned at each site, images uploaded to the website, and then automatically put through OCR (Optical Character Recognition) to get only the relevant raw data that subjects wrote. Does anyone know of something that can handle this? Are there any open source projects that can handle this? Any good commercial alternatives?"

2 of 30 comments

  1. Bubbles by adamy · · Score: 3, Insightful

    Remember those old annoying CTBS tests and SAT stuff? If these surveys are multiple choice, use the old #2 lead pencil and scan 'em in that way. Your data will already be entered. Most universities have the facilities for this already.

    Do not count on handwriting recognition to be successful for the people who fill out the surveys. While it works fine for typeset and computer-generated print, it won't cope with many different handwritings and many different idiomatic expressions.

    --
    Open Source Identity Management: FreeIPA.org
  2. OCR forms processing is problematic at best by Pauly · · Score: 4, Insightful

    Having worked at one of the world's largest OCR/Forms processing vendors, take it from me: don't do this.

    OCR forms processing does:

    • waste money and time
    • create unnecessary pain
    • require high-quality and expensive printed forms
    • require high-quality and expensive scanning equipment
    • introduce more human error

    OCR forms processing does NOT:

    • "save a lot of data entry"
    • do anything automatically (unless your forms are all checkboxes)
    • save money or time

    That said, if you have a lot of questions to be answered, a well-designed form using as few handwritten responses as possible (all checkboxes is best) may be viable.
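
For the all-checkbox case, the reading step can be little more than counting dark pixels inside each box. What follows is a minimal sketch of that idea, not a description of any particular product. It assumes the scan is already deskewed, aligned to the printed form, and loaded as rows of grayscale values (0 = black, 255 = white). The names `fill_ratio` and `read_checkboxes`, the box coordinates, and both thresholds are hypothetical and would need tuning against real scans.

```python
from typing import Dict, List, Tuple

Image = List[List[int]]          # grayscale rows: 0 = black, 255 = white
Box = Tuple[int, int, int, int]  # (top, left, height, width) in pixels


def fill_ratio(image: Image, box: Box, dark_threshold: int = 128) -> float:
    """Return the fraction of pixels inside `box` darker than `dark_threshold`."""
    top, left, height, width = box
    dark = 0
    for row in image[top:top + height]:
        for pixel in row[left:left + width]:
            if pixel < dark_threshold:
                dark += 1
    # Pixels that fall outside the image are simply not counted as dark.
    return dark / (height * width)


def read_checkboxes(image: Image, boxes: Dict[str, Box],
                    min_fill: float = 0.4) -> Dict[str, bool]:
    """Mark a box as checked when enough of its area is dark (assumed cutoff)."""
    return {name: fill_ratio(image, box) >= min_fill
            for name, box in boxes.items()}


if __name__ == "__main__":
    # Synthetic 10x20 white "page" with one box pencilled in.
    page = [[255] * 20 for _ in range(10)]
    for r in range(2, 6):
        for c in range(2, 6):
            page[r][c] = 0
    layout = {"yes": (2, 2, 4, 4), "no": (2, 12, 4, 4)}
    print(read_checkboxes(page, layout))
```

Anything near the cutoff (a stray mark, a half-erased bubble) is better routed to a human than guessed at, which is where the manual effort in these projects tends to end up.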

    Frankly, most of the large projects I worked on could have gotten the task done more easily and cheaply by writing an app to run on low-end Palms given to each interviewee. Seriously.

    If you would like more concrete advice or contacts with people in the industry, email me.