Slashdot Mirror


Breakthrough In Automatic Handwritten Character Recognition Sans Deep Learning (technologyreview.com)

subh_arya writes: Researchers from NYU, UToronto and MIT have come up with a technique that captures human learning abilities for a large class of simple visual concepts, and used it to recognize handwritten characters from the world's alphabets. Their computational model (abstract) represents concepts as simple programs that best explain observed examples under a Bayesian criterion. Unlike recent deep learning approaches that require thousands of examples to train an efficient model, their model can achieve human-level performance with only one example. Additionally, the authors present several "visual Turing tests" probing the model's creative generalization abilities, which in many cases are indistinguishable from human behavior.

2 of 66 comments

  1. Improvements to OCR? by pipedwho · · Score: 4, Interesting

    I hope this ushers in some significant improvements to basic OCR. It amazes me that OCR of a printed document still doesn't always yield 100% success. Even worse is OCR of printed music manuscripts. The recognition and transcription quality is atrocious.

    And yet, these guys can recognise handwriting with incredible accuracy.

    I keenly await the day these algorithms are extended to general OCR / document recognition, even if there need to be specific models for each type of document.

    1. Re:Improvements to OCR? by Richard Kirk · · Score: 3, Interesting

      Suppose you had a bit of your handwriting that you could not read. How do you figure out what you wrote? One thing that I do, and you may do too, is to try and imagine writing the thing, and work out the rhythm of what you are writing. If you can get some sense of how your hand is writing, you may see that what was a 'u', or maybe an 'n' or half of an 'm', makes sense because of the way it joins up to other stuff. We seem to have some sort of kinematic two-and-a-half axis model for writing. We use different muscles if we are writing with a pen (fingers and wrist), on a blackboard (wrist and upper arm), with a spray-can (upper and lower arm), or with a tiny engraving tool (just fingers), and yet our handwriting remains much the same. So a computer that can try and fit the same kinematic model should make better guesses for a word it has not met before than anything that was just trained on the shape.

      This does not directly transfer to OCR. If you have a page of fixed-width text, then every letter has its own little rectangle, and you can either recognize that using the traditional OCR model, or you can't. However, there is something we can do along the same lines. Suppose you have a document that you guess was rendered from PostScript. If you have a guess for a particular word, and the font it was rendered in, you could render that part of the text. You can then degrade that rendered image to mimic the properties of the printing and scanning, and check the fit. The best solution will probably be the one that achieves the best fit with the shortest, and hence most probable, bit of PostScript. When you have more text, you can pick up hints from the spacing, the justification, and other larger page layout structures.
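The render, degrade, and check-the-fit loop described above can be sketched roughly as follows. This is only a toy illustration under loud assumptions: the 3x5 bitmap `FONT` stands in for a real PostScript font, the box blur in `degrade` stands in for real print and scan effects, and the names `render`, `degrade`, `misfit` and `best_guess` are all made up for this example. It also leaves out the "shortest bit of PostScript" preference and scores on fit alone.

```python
# Hypothetical 3x5 bitmap font; a stand-in for a real PostScript font.
FONT = {
    "c": ["###", "#..", "#..", "#..", "###"],
    "o": ["###", "#.#", "#.#", "#.#", "###"],
    "e": ["###", "#..", "###", "#..", "###"],
    "l": ["#..", "#..", "#..", "#..", "###"],
    "i": [".#.", ".#.", ".#.", ".#.", ".#."],
}


def render(word):
    """Render a word as 5 rows of floats (1.0 = ink), one blank column between glyphs."""
    rows = []
    for r in range(5):
        row = []
        for k, ch in enumerate(word):
            if k:
                row.append(0.0)  # inter-glyph gap
            row.extend(1.0 if px == "#" else 0.0 for px in FONT[ch][r])
        rows.append(row)
    return rows


def degrade(img):
    """Crudely mimic print/scan softening with a 3-tap horizontal box blur."""
    out = []
    for row in img:
        blurred = []
        for x in range(len(row)):
            window = row[max(0, x - 1):x + 2]  # shorter window at the edges
            blurred.append(sum(window) / len(window))
        out.append(blurred)
    return out


def misfit(a, b):
    """Sum of squared pixel differences; infinite if the image sizes differ."""
    if len(a) != len(b) or len(a[0]) != len(b[0]):
        return float("inf")
    return sum((p - q) ** 2 for ra, rb in zip(a, b) for p, q in zip(ra, rb))


def best_guess(scan, candidates):
    """Return the candidate word whose degraded rendering fits the scan best."""
    return min(candidates, key=lambda word: misfit(degrade(render(word)), scan))


# Fake a "scan": a degraded rendering of a known word plus one smudged pixel.
scan = degrade(render("cole"))
scan[2][1] += 0.3
print(best_guess(scan, ["coil", "cole", "cell", "clio"]))  # prints cole
```

A real version would search over fonts, sizes and positions as well as words, and would need a much better model of the printing and scanning than a box blur.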

      I actually worked on OCR, and tried both of these once. It might have worked with a large software team, but I hadn't got one.