Slashdot Mirror


Digital Cameras vs Scanners for OCR?

ttennebkram asks: "With 6 and 8 Megapixel cameras on the market, some now with Wifi built in, it might be more convenient to shoot pictures of your bills and papers with a camera than fussing with the scanner. By the numbers, it would seem feasible. 300dpi for an 8.5"x11" sheet of paper works out to about 8 megapixels; 300 dpi is usually what OCR vendors suggest. I imagine for high volume good results you'd want to maybe mount the camera on a tripod arm over your desk. Heck, I was thinking of a glass desk and maybe one camera below and one above, and maybe a foot pedal to trigger the cameras (and I suppose a flash and high F-stop would help as well). If I could quickly 'snap' all the junk paper I have and electronically file it, maybe OCR the images at night in batch while I'm asleep, and then maybe get rid of all that paper once and for all. Using a traditional cheap scanner just takes too long. So has anybody tried this? I realize that camera optics are different than scanner optics, so maybe it's not just a question of raw pixel counts. Any thoughts?"
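The pixel arithmetic in the question can be checked with a short sketch. The function names are hypothetical, and the 3264x2448 frame is an assumption (a common size for a 4:3 8-megapixel sensor), not something stated in the post. Because the sensor and the page have different aspect ratios, the usable resolution is set by whichever dimension is tighter.

```python
def pixels_needed(width_in, height_in, dpi):
    """Return (width_px, height_px, megapixels) for a page captured at `dpi`."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px, height_px, width_px * height_px / 1_000_000


def effective_dpi(sensor_long_px, sensor_short_px, page_long_in, page_short_in):
    """Best-case dpi when the page exactly fills the frame along its tighter side.

    Assumes the page is shot dead on with its long side along the sensor's
    long side, with no lens distortion and no margin around the page.
    """
    return min(sensor_long_px / page_long_in, sensor_short_px / page_short_in)


# US Letter at the 300 dpi that OCR vendors usually suggest.
w, h, mp = pixels_needed(8.5, 11, 300)
print(w, h, round(mp, 1))  # 2550 3300 8.4

# Assumed 8 MP, 4:3 frame of 3264x2448 pixels (hypothetical camera).
print(round(effective_dpi(3264, 2448, 11, 8.5)))  # 288
```

Under these assumptions an 8-megapixel camera falls slightly short of 300 dpi even in the ideal case, before any allowance for framing margin, lens distortion, or uneven lighting.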

5 of 95 comments

  1. Aspect Ratio and Even Lighting by mythosaz · · Score: 5, Insightful

    ...the aspect ratio and even lighting are your enemies. It's almost impossible to shoot a bill or a check stub dead on, at close range, without fish-eyeing, and without getting in your own shadow. Sure, you might have a little white linen box that you use to take your eBay photos, but, seriously, this is a job for a scanner.

    1. Re:Aspect Ratio and Even Lighting by TheWanderingHermit · · Score: 4, Insightful

      Considering that, and I'm speaking not just as a former student, but as a former teacher, there is a delicate balance in all professors between ego and laziness, most of what is taught in college is in the text books. As for handouts, I found it pretty easy to file them as well. As for notes -- you mean someone who is scribbling notes in a hurry actually takes them in good enough handwriting that OCR would be able to read them without a lot of prompting? I should have mentioned that a lot of similar material like that is included in my 4 drawers. You have to think to file them in folders, and the same thought is needed to figure out which directory to put them in, but a lot more is needed to photograph papers so they are legible. If it's that important, a sheet-feed scanner would be more practical, but there's the difference between theory and practice: it's not as easy to batch convert as it sounds.

      I've also found that there is a lot more of value to learn from practical experience than from pedants.

      Unless one is a geek gone wild.

  2. scanners are FOR documents by Dun+Malg · · Score: 4, Insightful
    Digital Cameras vs Scanners for OCR?
    What, are you kidding? You can use a joystick in place of a mouse, but why? Cameras are for capturing a 2D image of a 3D scene. Like you noted, the optics are designed specifically for it. Scanners are for capturing a digital version of a 2D paper image. Musing over whether today's new, heavier wrenches might be stout enough to drive nails is silly, as what you really need is a hammer.

    Get a scanner
    --
    If a job's not worth doing, it's not worth doing right.
  3. Re:Sheetfeeder by Dadoo · · Score: 3, Insightful

    What you want is a scanner with a sheet feeder and a GOOD one at that.

    Absolutely.

    I tried this, myself, a few years ago. I guarantee that, using a camera, you'll get through, maybe, 100 pages. I got a decent scanner (HP something or other) with a sheet feeder. It does about 12ppm and that turned out to be too slow. I got tired of it in a day or two.

    I tried a bunch of different solutions, but I finally had to take it all to work. We had a Fujitsu M4097D and an enormous Ricoh Copier/Scanner/Fax machine. Both did 60ppm, both sides (120 images a minute). I actually made some headway with that setup, but I still didn't finish.

    As far as OCR is concerned, don't bother. Even today, it's nowhere near accurate enough. In my experience, the best software out there gets an average of one error per page on a really good scan. Trust me: it will take a lot more of your time than you think to fix that. Assuming you're doing mostly black and white text, G4 compression will compress a 300dpi, 8.5x11 image down to about 100k. At that rate, you can store close to 7000 pages on one CD.
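The storage estimate above can be reproduced with a rough sketch. The 100k-per-page figure comes from the comment; the 700 MB disc capacity (taken as 700,000,000 bytes) and the function names are assumptions made for illustration.

```python
def raw_bilevel_bytes(width_in, height_in, dpi):
    """Uncompressed size in bytes of a 1-bit-per-pixel page image.

    Assumes pixels are packed 8 per byte with no per-row padding, so real
    file formats may come out slightly larger.
    """
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return pixels // 8


def pages_per_disc(disc_bytes, page_bytes):
    """Whole pages that fit on a disc, ignoring filesystem overhead."""
    return disc_bytes // page_bytes


raw = raw_bilevel_bytes(8.5, 11, 300)
compressed = 100_000         # about 100k per page, as stated in the comment
cd_capacity = 700_000_000    # assumed capacity of one CD

print(raw)                                      # 1051875
print(round(raw / compressed, 1))               # 10.5
print(pages_per_disc(cd_capacity, compressed))  # 7000
```

On these numbers, G4 is doing roughly a 10:1 reduction on a bilevel page, and the 7000-pages-per-CD figure follows directly from the assumed disc size.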

    --
    Sit, Ubuntu, sit. Good dog.
  4. To Clarify... by Aladrin · · Score: 4, Insightful

    So to clarify... You want to trade the hassle of:

    1) lift a lid
    2) stick a paper in a well-defined corner
    3) press a button

      for the hassle of:

    1) align a camera on a tripod, including angle as well as position
    2) align a paper with no guide
    3) adjust the lighting so that you get an even tone
    4) make sure you didn't accidentally move the camera, the tripod, or bump the desk
    5) step on a foot pedal that you jury-rigged to take a picture
    OR
    5) Push a button on a camera that you can't afford to move even a hair.
    6) Use image software to continue adjusting the photo so that the OCR will read it properly
    7) Hope you did everything right the first time.

    I think I'd pick door number 1.

    --
    "If you make people think they're thinking, they'll love you; But if you really make them think, they'll hate you." - DM