Digital Cameras vs Scanners for OCR?
ttennebkram asks: "With 6 and 8 Megapixel cameras on the market, some now with Wifi built in, it might be more convenient to shoot pictures of your bills and papers with a camera than fussing with the scanner. By the numbers, it would seem feasible. 300dpi for an 8.5"x11" sheet of paper works out to about 8 megapixels; 300 dpi is usually what OCR vendors suggest. I imagine for high volume good results you'd want to maybe mount the camera on a tripod arm over your desk. Heck, I was thinking of a glass desk and maybe one camera below and one above, and maybe a foot pedal to trigger the cameras (and I suppose a flash and high F-stop would help as well). If I could quickly 'snap' all the junk paper I have and electronically file it, maybe OCR the images at night in batch while I'm asleep, and then maybe get rid of all that paper once and for all. Using a traditional cheap scanner just takes too long. So has anybody tried this? I realize that camera optics are different than scanner optics, so maybe it's not just a question of raw pixel counts. Any thoughts?"
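For anyone who wants to check the submitter's numbers, here is a minimal sketch of the arithmetic. The function names are made up for illustration, and the 4:3 sensor shape is an assumption (common on compact cameras, but not stated in the question). A letter page is slightly less elongated than a 4:3 frame, so some sensor pixels fall outside the page and the camera needs a bit more than the raw page count.

```python
def pixels_needed(width_in, height_in, dpi):
    """Pixel grid needed to cover a page at the given resolution."""
    w = round(width_in * dpi)
    h = round(height_in * dpi)
    return w, h, w * h


def sensor_pixels_for_page(width_in, height_in, dpi, aspect=(4, 3)):
    """Smallest sensor of the given aspect ratio (long:short) that still
    covers the whole page at `dpi`. The 4:3 default is an assumption."""
    w, h, _ = pixels_needed(width_in, height_in, dpi)
    page_long, page_short = max(w, h), min(w, h)
    a_long, a_short = aspect
    # The sensor's long side must cover the page's long side, and be long
    # enough that the matching short side covers the page's short side.
    long_for_short = -(-page_short * a_long // a_short)  # ceiling division
    sensor_long = max(page_long, long_for_short)
    sensor_short = -(-sensor_long * a_short // a_long)   # ceiling division
    return sensor_long, sensor_short, sensor_long * sensor_short


w, h, total = pixels_needed(8.5, 11, 300)
print(f"page at 300 dpi: {w} x {h} = {total / 1e6:.2f} MP")

sl, ss, stotal = sensor_pixels_for_page(8.5, 11, 300)
print(f"4:3 sensor covering it: {sl} x {ss} = {stotal / 1e6:.2f} MP")
```

So "about 8 megapixels" holds up on pixel count alone. It says nothing about lens distortion, focus, or lighting, which the question already flags as open.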
...the aspect ratio and uneven lighting are your enemies. It's almost impossible to shoot a bill or a check stub dead on, at close range, without fish-eyeing, and without getting in your own shadow. Sure, you might have a little white linen box that you use to take your eBay photos, but, seriously, this is a job for a scanner.
Get a scanner
If a job's not worth doing, it's not worth doing right.
What you want is a scanner with a sheet feeder, and a GOOD one at that.
Absolutely.
I tried this, myself, a few years ago. I guarantee that, using a camera, you'll get through, maybe, 100 pages. I got a decent scanner (HP something or other) with a sheet feeder. It does about 12ppm and that turned out to be too slow. I got tired of it in a day or two.
I tried a bunch of different solutions, but I finally had to take it all to work. We had a Fujitsu M4097D and an enormous Ricoh Copier/Scanner/Fax machine. Both did 60ppm, both sides (120 images a minute). I actually made some headway with that setup, but I still didn't finish.
As far as OCR is concerned, don't bother. Even today, it's nowhere near accurate enough. In my experience, the best software out there gets an average of one error per page on a really good scan. Trust me: it will take a lot more of your time than you think to fix that. Assuming you're doing mostly black and white text, G4 compression will compress a 300dpi, 8.5x11 image down to about 100k. At that rate, you can store close to 7000 pages on one CD.
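The storage estimate is easy to check. This is a rough sketch, assuming a 700 MB CD counted in decimal bytes and taking "about 100k" as 100,000 bytes per page; both constants and both function names are my own picks for illustration, not anything from a real tool.

```python
def raw_bitonal_bytes(width_in, height_in, dpi):
    """Uncompressed size of a 1-bit-per-pixel scan, ignoring row padding."""
    w = round(width_in * dpi)
    h = round(height_in * dpi)
    return (w * h + 7) // 8  # 8 pixels per byte, rounded up


def pages_per_disc(disc_bytes, page_bytes):
    """Whole pages that fit on a disc, ignoring filesystem overhead."""
    return disc_bytes // page_bytes


CD_BYTES = 700_000_000   # assumption: 700 MB CD-R, decimal megabytes
PAGE_BYTES = 100_000     # the "about 100k" per G4-compressed page from above

raw = raw_bitonal_bytes(8.5, 11, 300)
print(f"raw 1-bit page: {raw:,} bytes")
print(f"implied compression ratio: {raw / PAGE_BYTES:.1f}:1")
print(f"pages per CD: {pages_per_disc(CD_BYTES, PAGE_BYTES):,}")
```

A raw 1-bit page at 300dpi is a little over 1 MB, so 100k per page implies roughly 10:1 compression, and 7000 pages per CD follows directly. Actual G4 sizes depend heavily on what is on the page.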
Sit, Ubuntu, sit. Good dog.
So to clarify... You want to trade the hassle of:
1) lift a lid
2) stick a paper in a well-defined corner
3) press a button
for the hassle of:
1) align a camera on a tripod, including angle as well as position
2) align a paper with no guide
3) adjust the lighting so that you get an even tone
4) make sure you didn't accidentally move the camera, the tripod, or bump the desk
5) step on a foot pedal that you jury-rigged to take a picture
OR
5) Push a button on a camera that you can't afford to move even a hair.
6) Use image software to continue adjusting the photo so that the OCR will read it properly
7) Hope you did everything right the first time.
I think I'd pick door number 1.
"If you make people think they're thinking, they'll love you; But if you really make them think, they'll hate you." - DM