Where Did Affordable OCR Go?

Posted by Cliff on Thursday August 12, 2004 @06:50AM from the from-print-to-data dept.

Goeland86 asks: "Has OCR (Optical Character Recognition) died down? Where have all the magical programs that translate your handwriting to office-compatible files gone? Most of the Windows programs nowadays are expensive (ReadIris Pro 9 is about $400), and not many OSS projects for OCR have released a recent update (Kognition was last updated on July 17th, 2003, according to Freshmeat). Has everyone already scanned/translated all of their paper files? Has OCR outlived its use, or is it just a fancy technology that hit a dead end in terms of the market? Have Slashdot readers used it? If so, are you still using it? If not, why?"

ocr and pdf by i621148 · 2004-08-12 06:52 · Score: 3, Interesting

i think that pdf's and the availability of the free adobe viewer have pretty much obsoleted ocr.
ocr has to be babysat also. it is not 100% reliable like scanning to pdf is...

Personally... by WildFire42 · 2004-08-12 07:00 · Score: 2, Interesting

Personally, I believe the return on any further research into OCR technology isn't really worth it at this point. OCR is actually pretty darn reliable for printed characters, even if it sucks wind for handwriting. Mostly, people are interested in OCR'ing printed characters, and handwriting recognition is just one of those nifty, shiny technologies that wouldn't be used that often.

At this point, OCR is a commodity. It's not really worth the hundreds of thousands or millions of dollars for research to get an extra 2% accuracy, so the technology is stagnant and the prices for standard, printed character OCR are dirt cheap.

With that being said, I see voice dictation as the next big thing. Voice recognition is where OCR was 10 years ago: still new, with not many players in the market and a lot of room for technological improvement. The accuracy isn't that great, even with extensive "training". And more and more, because of the need for archiving, data warehousing, and captioning for accessibility (Section 508, W3C WAI and the like), captioning without training is going to become a shining goal within the next 10 years.

Where did useful OCR go? by Karma+Farmer · 2004-08-12 07:01 · Score: 2, Interesting

I want OCR that works, and I want a flying car.

I'm assuming people got sick of paying $39.95 for OCR software that didn't do jack squat, and was about as reliable as handing your documents to a spastic monkey. I'm also assuming software makers got sick of making $3 or $4 (or less) on each package, only to get a million tech support calls along the lines of "It doesn't work. I want my money back."

For $400, I'm guessing the software vendors can afford a small amount of support, and can expect the users to be willing to understand the limits of the software.

Free OCR? by Asprin · 2004-08-12 07:12 · Score: 3, Interesting

So far as I can tell, NON-free OCR isn't doing so hot either -- you pretty much have to proof-read and correct everything you scan anyway, which just makes it impractical for most purposes. If I had to scan a bunch of records, I'd probably outsource it to a pay service that specializes in that sort of thing, which means it would have to be worth the cost of getting it done.

What I want to know is what's Google going to do about this? They have a catalog search in their Google Labs playpen that indexes products and their descriptions to make them searchable. ...and by searchable, I mean you can search for "bicycle" and it will highlight all of the instances of that word in some 200+ PRINTED catalogs, not similar HTML/XML/PDF electronic documents. So clearly, they know some things about OCR we don't (and probably 2D map indexing, too), but durned if they aren't letting on about it.

In the next few years, I expect to see a fully automated Google OCR product that can not only scan your paper docs, but index them and help you search them too, all while maintaining the electronic copies in their original scanned (think photograph) state, not some bastardized, mistranslated and screwed up PDF or DOC format.

**THAT'S** what's going to kill Microsoft, and probably why they're so keen to risk overreaching on their IPO.

--
"Lawyers are for sucks."
- Doug McKenzie

Re:I too recently noticed... by tchuladdiass · 2004-08-12 07:21 · Score: 2, Interesting

To add to this, no OCR package is 100% accurate. Most will be 95 - 99%, which still means you have to have someone proofread / correct each page, which is just as expensive as having the text entered manually.
Side note: I remember a number of years ago, trying out OCR, and it turned out that I could type the page in slightly faster than it could be scanned and recognized.
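
To put rough numbers on the 95 - 99% figure, here is a minimal sketch. The 2,000 characters per page is an assumed figure for a typed page, not something from the thread, and `expected_errors` is a hypothetical helper that treats the accuracy as a per-character rate.

```python
# Assumption: a typed page holds roughly 2,000 characters (illustrative figure only).
CHARS_PER_PAGE = 2000


def expected_errors(accuracy: float, chars: int = CHARS_PER_PAGE) -> float:
    """Expected number of misread characters at a given per-character accuracy."""
    return (1.0 - accuracy) * chars


for accuracy in (0.95, 0.99):
    errors = expected_errors(accuracy)
    print(f"{accuracy:.0%} accurate: about {errors:.0f} errors per page")
```

Under that assumption, 95% accuracy leaves about 100 misread characters per page and 99% still leaves about 20, which is why every page ends up needing a proofreader.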
Re:Why bother?? by photon317 · 2004-08-12 08:51 · Score: 2, Interesting

The problem is that jpegs can't be grepped like text. People don't just want to scan a stack of images, they want the data to have meaning. In some cases they even want to parse typed hospital forms into an xml format for example.

--
11*43+456^2
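
The point about grepping and parsing forms can be sketched roughly as follows. This assumes the OCR step has already produced plain text; the sample form, its field names, and the `form_to_xml` helper are all made up for illustration and do not come from any particular OCR package.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical OCR output for a typed form; the fields are invented for this example.
ocr_text = """Patient Name: Jane Doe
Date of Birth: 1970-01-01
Ward: 4B"""

# Once the page is text rather than an image, a grep-style search is a one-liner.
matches = [
    line for line in ocr_text.splitlines()
    if re.search(r"ward", line, re.IGNORECASE)
]


def form_to_xml(text: str) -> str:
    """Turn 'Label: value' lines into a flat XML record."""
    root = ET.Element("form")
    for line in text.splitlines():
        label, sep, value = line.partition(":")
        if not sep:
            continue  # skip lines that are not 'Label: value'
        # Build a tag name from the label: lowercase, non-word runs become '_'.
        tag = re.sub(r"\W+", "_", label.strip().lower())
        ET.SubElement(root, tag).text = value.strip()
    return ET.tostring(root, encoding="unicode")


print(matches)
print(form_to_xml(ocr_text))
```

Neither step is possible on a JPEG of the same page, and both inherit whatever recognition errors the OCR pass introduced.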