Google Pushes Open Source OCR
SocialWorm writes "Google has just announced work on OCRopus, which it says it hopes will 'advance the state of the art in optical character recognition and related technologies.' OCRopus will be available under the Apache 2.0 License. Obviously, there may be search and image search implications from OCRopus. 'The goal of the project is to advance the state of the art in optical character recognition and related technologies, and to deliver a high quality OCR system suitable for document conversions, electronic libraries, vision impaired users, historical document analysis, and general desktop use. In addition, we are structuring the system in such a way that it will be easy to reuse by other researchers in the field.'"
Good one. Yeah, GOCR is crap.
As someone who was consistently getting recognition rates in the high 90s (percent) with OmniPage for work in 1996, with basic layout and images preserved, I can say Linux is a non-starter here and pathetically WAY, WAY behind in this area. It isn't even a GIMP vs. Photoshop ("Yeah, well GIMP is just different and 'special'!") argument. I'll look at a couple of the other suggestions here, but I had basically given up and written this off as a Linux blind spot.
So if Google _also_ wants to use it to torture kittens, or whatever, I'd have to say, "Well, let's weigh the pros and cons before we make a hasty judgement."