Slashdot Mirror


Automated OCR for Forms Processing?

Oscar Carrillo asks: "We have a large NIH grant that collects tons of data, and much of that data is in the form of questionnaires. The forms will be available on the web, but it's mostly not feasible to have the subjects sit in front of the computer all day (not to mention that people get annoyed sitting in front of a computer all day). The study is being conducted at several universities and institutions around the country. Using Linux/JSP/Struts/PostgreSQL will take care of most of our needs. But it would save a lot of data entry if all forms could be scanned at each site, the images uploaded to the website, and then automatically put through OCR (Optical Character Recognition) to extract only the relevant raw data that subjects wrote. Does anyone know of something that can handle this? Are there any open source projects that can handle this? Any good commercial alternatives?"

2 of 30 comments

  1. another lowly subject by tps12 · Score: 4, Funny

    it's mostly not feasible to have the subjects sit in front of the computer all day

    Then I guess somebody forgot to tell my boss.

    --

    Karma: Good (despite my invention of the Karma: sig)
  2. OCR forms processing is problematic at best by Pauly · Score: 4, Insightful
    Having worked at one of the world's largest OCR/Forms processing vendors, take it from me: don't do this.

    OCR forms processing does:

    • waste money and time
    • create unnecessary pain
    • require high-quality and expensive printed forms
    • require high-quality and expensive scanning equipment
    • introduce more human error

    OCR forms processing does NOT:

    • "save a lot of data entry"
    • do anything automatically (unless your forms are all checkboxes)
    • save money or time
    That said, if you have a lot of questions to be answered, a well-designed form using as few handwritten responses as possible (all checkboxes is best) may be viable.
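To illustrate why checkbox-only forms are the tractable case: reading a checkbox needs no character recognition, only a count of dark pixels inside a known region of the scanned page. The following is a minimal sketch, not a tested production approach. It assumes the scan has already been deskewed and converted to grayscale, and that the box coordinates are known from the form template. The names `dark_fraction`, `checkbox_is_marked`, and `read_form`, and the threshold values, are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple

Image = List[List[int]]          # rows of grayscale pixels: 0 = black, 255 = white
Box = Tuple[int, int, int, int]  # (top, left, height, width) in pixels


def dark_fraction(image: Image, box: Box, dark_threshold: int = 128) -> float:
    """Return the fraction of pixels inside `box` darker than `dark_threshold`."""
    top, left, height, width = box
    if height <= 0 or width <= 0:
        raise ValueError("box must have positive height and width")
    dark = 0
    # Assumes the box lies fully inside the image; a box that runs off the
    # page would be under-counted because slicing silently truncates.
    for row in image[top:top + height]:
        for pixel in row[left:left + width]:
            if pixel < dark_threshold:
                dark += 1
    return dark / (height * width)


def checkbox_is_marked(image: Image, box: Box,
                       dark_threshold: int = 128,
                       min_fill: float = 0.2) -> bool:
    """Treat a box as ticked when enough of its area is dark.

    Both thresholds are illustrative guesses; real values would have to be
    tuned against sample scans of the actual printed form.
    """
    return dark_fraction(image, box, dark_threshold) >= min_fill


def read_form(image: Image, boxes: Dict[str, Box]) -> Dict[str, bool]:
    """Map each named checkbox region to True (marked) or False (empty)."""
    return {name: checkbox_is_marked(image, box) for name, box in boxes.items()}


if __name__ == "__main__":
    # A blank 10x10 "page" with two 4x4 checkbox regions (made-up layout).
    page = [[255] * 10 for _ in range(10)]
    boxes = {"q1_yes": (1, 1, 4, 4), "q1_no": (1, 6, 4, 4)}
    # Draw an X inside the q1_yes box.
    for i in range(4):
        page[1 + i][1 + i] = 0
        page[1 + i][4 - i] = 0
    print(read_form(page, boxes))
```

Even this simple case depends on consistent printing and scanning so that the boxes land where the template expects them, which is the equipment cost noted in the lists above. Handwritten answers have no equivalent shortcut.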

    Frankly, most of the large projects I worked on could have gotten the task done more easily and cheaply by writing an app to run on low-end Palms given to each interviewee. Seriously.

    If you would like more concrete advice or contacts with people in the industry, email me.