Google Docs' OCR Quality Tested

Posted by timothy on Thursday April 28, 2011 @10:12AM from the weighed-in-the-balance-and-found-wanting dept.

orenh writes "Google has released a Google Docs application for Android, which includes the ability to create documents by OCR-ing photos. I tested the application's OCR quality and found that it's mediocre under the best conditions and poor under real-world conditions. However, I believe that this poor performance is caused in part by an intentional decision by Google."
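
The post doesn't say how OCR quality was scored. One common yardstick is character accuracy: one minus the edit (Levenshtein) distance between the OCR output and the true text, divided by the length of the true text. That metric is an assumption here, not the method used in the linked test. Below is a minimal Python sketch; the function names `levenshtein` and `char_accuracy` are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn string a into string b."""
    # At the start of row i, prev[j] is the distance between a[:i-1] and b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,              # delete ca
                curr[j - 1] + 1,          # insert cb
                prev[j - 1] + (ca != cb), # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def char_accuracy(ground_truth: str, ocr_output: str) -> float:
    """Character accuracy = 1 - edits / len(ground_truth).
    Can drop below 0 if the OCR output adds lots of spurious text."""
    if not ground_truth:
        raise ValueError("ground truth must be non-empty")
    return 1 - levenshtein(ground_truth, ocr_output) / len(ground_truth)


# Two misread characters out of eleven
print(char_accuracy("Google Docs", "Go0gle Dots"))
```

Scoring by characters rather than whole words is a design choice: one misread letter costs one edit instead of failing the entire word.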

5 of 99 comments

/b/ by stonewallred · 2011-04-28 10:13 · Score: 1, Interesting

Since the standard practice on 4chan is to use the word niggers for any word in a recaptcha that has a punctuation mark, I question just how good the OCR is.
Better to scan to PDF by icebike · 2011-04-28 10:34 · Score: 3, Interesting

There are a number of scanner apps on the market that do a much better job at the first step of this process, which is taking the picture. They then concentrate their efforts on producing a clean, usable PDF of the document. I tested one of these and found that the PDF it rendered was much better than the one produced by Google.
Everything is crisp and readable.
If the first step fails, it's no wonder the second (OCR) step fails.

CAPTCHA Breakers by MoonBuggy · 2011-04-28 10:55 · Score: 3, Interesting

If the increasing absurdity of the CAPTCHAs I tend to see is anything to go by, there are programs out there that can read normal printed text from even the crappiest photo without missing a beat. The question is: are the spammers using standard commercial solutions, or do they have useful tech of their own that we might be able to get our hands on (by seizing it as part of a settlement and making it public domain, for instance)?
More like Masters/PhD Thesis than Summer of Code by perpenso · 2011-04-28 11:39 · Score: 3, Interesting

Does that mean it couldn't be a viable candidate for some Summer of Code work then?
More like a bunch of master's/PhD theses just to get started.

OCR is an area of AI research under the topic of Computer Vision. It is yet another area that seems simple in concept but turns out to be incredibly difficult in practice.
Re:99% success rate is crappy ... by martin-boundary · 2011-04-28 13:08 · Score: 3, Interesting

Heh, it's always fun to reinterpret requirements to make them easier to implement :)
A 99% success rate could also mean 99 pages with zero errors out of 100 pages attempted. With 250 words per page, that would represent a mandated per-word success rate of roughly 99.996%.
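
The step from a page-level target to a per-word rate can be checked with a few lines of Python. This sketch adds an assumption the comment doesn't state: each word is misread independently with the same probability, so a page is error-free with probability p**250. The variable names are illustrative only.

```python
WORDS_PER_PAGE = 250
PAGE_SUCCESS = 0.99  # 99 of 100 pages must come out with zero errors

# Independent-errors assumption: P(page error-free) = p ** WORDS_PER_PAGE,
# so solve p ** 250 = 0.99 for the per-word success rate p.
per_word = PAGE_SUCCESS ** (1 / WORDS_PER_PAGE)

# Cruder estimate: allow exactly one bad word in the 25,000 words of 100 pages.
one_error = 1 - 1 / (100 * WORDS_PER_PAGE)

print(f"independent errors: {per_word:.4%}")
print(f"single bad word:    {one_error:.4%}")
```

Both estimates come out at roughly 99.996% per word, which supports the comment's point: a 99% page-level target demands near-perfect word-level accuracy.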