Slashdot Mirror


Optical Character Recognition Still Struggling With Handwriting

Ian Lamont recently asked Google if they planned to extend their transcription of books and other printed media to include public records, many of which were handwritten before word processors became ubiquitous. Google wouldn't talk about any potential plans, but Lamont found out a bit more about the limits of optical character recognition in the process: "Even though some CAPTCHA schemes have been cracked in the past year, a far more difficult challenge lies in using software to recognize handwritten text. Optical character recognition has been used for years to convert printed documents into text data, but the enormous variation in handwriting styles has thwarted large-scale OCR imports of handwritten public documents and historical records. Ancestry.com took a surprising approach to digitizing and converting all publicly released US census records from 1790 to 1930: It contracted the job to Chinese firms whose staff manually transcribed the names and other information. The Chinese staff are specially trained to read the cursive and other handwriting styles from digitized paper records and microfilm. The task is ongoing with other handwritten records, at a cost of approximately $10 million per year, the company's CEO says."

7 of 150 comments

  1. Beat up Martin = Eat up Martha by Joe The Dragon · · Score: 3, Funny

    Beat up Martin = Eat up Martha

  2. Translation server error by BadAnalogyGuy · · Score: 2, Funny
  3. Use them as CAPTCHA... by mevets · · Score: 2, Funny

    1. Use the handwritten words as CAPTCHAs
    2. Wait for the bad guys to come up with programs to break them.
    3. ...
    4. Profit!

  4. Re:Better approach? by mrsteveman1 · · Score: 3, Funny

    Chinese proof-reading? Only if you want your documents in Engrish.

  5. Re:Optical Character Recognition is the Correct Te by Alwin Henseler · · Score: 5, Funny

    For a moment there, I was picturing some new technology that could distinguish between C, Perl and Java written on scratch paper.

    In pseudocode:

    IF LooksLikeC THEN "This must be C code"
    ELSE IF LooksLikeJava THEN "This must be Java code"
    // undecipherable
    ELSE "Must be Perl code"

  6. Re:Fortunately, there is an alternative by Anonymous Coward · · Score: 1, Funny

    Yeah, and given my level of ability deciphering Indian people speaking English, I'm sure that software will have no problem whatsoever.

  7. Re:Fortunately, there is an alternative by Anonymous Coward · · Score: 5, Funny

    i've outsourced all of my computer applications and software needs to India.

    instead of using PowerPoint at meetings, i just have two Indian women in bikinis hold up large displays with my bullet points written on them--they even do slide transitions.

    instead of an e-mail client, i use an Indian courier. it takes a while for me to communicate with international clients, but i receive practically no spam.

    and rather than a word processor i have a guy with a notepad that i dictate to. he also offers me helpful tips when he notices that i'm trying to write a letter.

    then there's the 17-year-old i have doing my taxes. i don't even think he's out of high school yet, but he beats Turbo Tax any day.

    but you should really see the guy i have simulating Windows Vista for me. he wears this really slick suit, moves really slow, and every once in a while he comes up to me and kicks me in the balls.