Slashdot Mirror


Accurate OCR?

theBrownfury asks: "I work at a lab on a university campus that provides services for disabled students. One of the main functions of this lab is to convert printed materials such as books, reading packets, etc. into electronic text (RTF or Word) that is either going to be fed to a text-to-speech synthesizer or going to be further processed for use in braille devices. Ideally we'd like to be able to process 1000 pages a week. However, our current solution (a Bell&Howell 4040D scanner coupled to a mid-level PC workstation with OmniPage Pro 11 and 2-3 proofing stations) is limited to an average of 10-11 (16 on a good day) pages per hour because of the constant hand-holding the OCR process requires. We've already made sure we're feeding the OCR engine good quality scans. It should also be clarified that the materials we deal with are so varied that most of them cannot be covered by any kind of 'general' scanning or OCR template."

"Do any of you know of a solution which can exploit our current scanner, which we're rather happy with, but bring in a better OCR method to improve our efficiency? It should be noted that the solution should be financially reasonable (as in less than US$10K).

Our biggest bottlenecks:
- software's terrific inability to accurately pick up the areas of text on the scanned page to OCR
- marking words as possibly erroneous without checking them against a dictionary, which lengthens the proofing process
- stability of OCR software

Bonuses:
- dealing with multiple languages such as Spanish and French
- capability to OCR mathematical texts and papers. Currently we hand type math textbooks for students."
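
As a rough illustration of the dictionary check described in the bottleneck list, here is a minimal Python sketch. It assumes the OCR engine can export the words it flagged as a plain list, which may not be true of OmniPage; the function name `filter_suspects` and the sample word list are hypothetical.

```python
import re


def filter_suspects(suspects, dictionary):
    """Return only the flagged words that are absent from the dictionary.

    suspects   -- words the OCR engine marked as possibly wrong (assumed input)
    dictionary -- iterable of known-good words (hypothetical word list)
    """
    known = {w.lower() for w in dictionary}
    still_suspect = []
    for word in suspects:
        # Strip surrounding punctuation so "scanner," matches "scanner".
        core = re.sub(r"^\W+|\W+$", "", word).lower()
        # Punctuation-only tokens have an empty core and are dropped.
        if core and core not in known:
            still_suspect.append(word)
    return still_suspect


if __name__ == "__main__":
    flagged = ["scanner,", "Braille", "rnath", "text"]
    words = ["scanner", "braille", "text"]
    print(filter_suspects(flagged, words))
```

Only the words that survive this filter would need to go to a proofing station. For the Spanish and French material, a separate word list per language would be needed.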

1 of 59 comments

  1. I have the solution ... wait ... by twoflower · Score: 1, Troll

    Perfect or near-perfect OCR is one of the holy grails of information technology. Various companies are therefore constantly coming up with the "next big thing" and applying the latest buzzwords to the problem. I can remember when perfect OCR was just around the corner due to "fuzzy logic", then it was just around the corner due to "neural nets", then it was coming soon because of "heuristic analysis", then ... ad infinitum, ad nauseam.

    I don't think we'll ever have near-perfect OCR. 90% is as good as it gets.

    --
    Twoflower