Slashdot Mirror


Digital Cameras vs Scanners for OCR?

ttennebkram asks: "With 6 and 8 Megapixel cameras on the market, some now with Wifi built in, it might be more convenient to shoot pictures of your bills and papers with a camera than fussing with the scanner. By the numbers, it would seem feasible. 300dpi for an 8.5"x11" sheet of paper works out to about 8 megapixels; 300 dpi is usually what OCR vendors suggest. I imagine for high volume good results you'd want to maybe mount the camera on a tripod arm over your desk. Heck, I was thinking of a glass desk and maybe one camera below and one above, and maybe a foot pedal to trigger the cameras (and I suppose a flash and high F-stop would help as well). If I could quickly 'snap' all the junk paper I have and electronically file it, maybe OCR the images at night in batch while I'm asleep, and then maybe get rid of all that paper once and for all. Using a traditional cheap scanner just takes too long. So has anybody tried this? I realize that camera optics are different than scanner optics, so maybe it's not just a question of raw pixel counts. Any thoughts?"
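
The pixel arithmetic in the question can be checked with a rough sketch. The function names are made up for illustration, and the 3264x2448 sensor is only an assumed example of a 4:3 camera sold as "8 megapixel"; it is not a figure from the post. Because a sensor's aspect ratio does not match a letter page, the usable resolution is set by whichever page edge gets the fewest pixels per inch.

```python
def megapixels(width_in, height_in, dpi):
    """Pixels needed to cover a page at the given resolution, in millions."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px / 1_000_000


def effective_dpi(sensor_long_px, sensor_short_px, page_long_in, page_short_in):
    """Resolution actually reached when the page just fills the frame.

    The tighter of the two axes limits the result, since the page and the
    sensor generally have different aspect ratios.
    """
    return min(sensor_long_px / page_long_in, sensor_short_px / page_short_in)


# US Letter at the 300 dpi that OCR vendors usually suggest
needed = megapixels(8.5, 11, 300)
print(f"Letter page at 300 dpi needs about {needed:.1f} MP")

# Assumed example sensor: 3264 x 2448 pixels (4:3)
print(f"Effective resolution: {effective_dpi(3264, 2448, 11, 8.5):.0f} dpi")
```

Under these assumptions the camera lands a little short of 300 dpi even before lens softness, focus, and lighting are considered, which fits the asker's own suspicion that raw pixel counts are not the whole story.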

95 comments

  1. Aspect Ratio and Even Lighting by mythosaz · · Score: 5, Insightful

    ...the aspect ratio and even lighting are your enemies. It's almost impossible to shoot a bill or a check stub dead on, at close rage, without fish-eye'ing, and without getting in your own shadow. Sure, you might have a little white linen box that you use to take your eBay photos, but, seriously, this is a job for a scanner.

    1. Re:Aspect Ratio and Even Lighting by TheWanderingHermit · · Score: 5, Informative

      That's about it. I used to transfer photos to video professionally. We had a nice rig with lighting and a video camera mounted on a stand. We had to do a lot of adjusting of focus because of different types of photos and other issues. More often than not it was not just put down and click, then move on to the next one. If you're dealing with letters, and you're not scanning, you'll have problems with some fonts and other oddities that make sure many shots won't turn out as perfect as you'd think.

      I have my own business. I keep all my bills, receipts of deductible expenses, home records, and so on. I keep personal records 7 years except in special cases. I just take the bill, when I get it (and most bills are e-mail now!) and put it in the envelope for the biller for that year. At the end of the year I spend less than 30 minutes writing up labels for the next year and when I get time, I burn the stuff that is past 7 years old. For "all those bills" I've never needed more than 4 filing drawers, which can be stacked as one cabinet that doesn't take up much space, or I use the two cabinets (2 drawers each) as legs for one of my desks.

      I thought about keeping things electronically, but then I realized I'd have to take time to scan them and file them and that would take a lot more time, overall, than just dropping them in folders. If you want, you can spend all that time scanning. I prefer not to, but then again, I have a life and would rather be cycling or rock climbing than scanning bills.

      This, to me, sounds like a geek gone wild, over thinking the solution and trying to come up with a hi-tech answer to a low-tech problem that really doesn't need an answer if one uses a little common sense and simple organization.

    2. Re:Aspect Ratio and Even Lighting by Null+Nihils · · Score: 2, Interesting

      It's almost impossible to shoot a bill or a check stub dead on, at close rage, without fish-eye'ing, and without getting in your own shadow.

      That's assuming you need a pristine, perfect photo of the item to be OCR'ed. I suspect this is not the case: chances are that as long as you are trying to digitize printed (not handwritten) documents, the OCR won't mind a little fisheye distortion and offish lighting (as long as you make sure there is enough contrast and no dark shadows). It really depends on the flexibility of your OCR software; it might not work well with the imperfections resulting from using a digital camera.

      Best way would be to test out a few handfuls of documents, tweak your setup so you get the best resulting digital images, and then see how your OCR software handles things. I have used a digital camera to digitize things on paper when a scanner wasn't available, and sometimes got decent results for what I needed. Usually using the camera's flash is a mistake; instead, try setting up a fluorescent tube so you get even ambient lighting. YMMV.

    3. Re:Aspect Ratio and Even Lighting by mrchaotica · · Score: 1, Interesting

      Now imagine that those receipts are notes and handouts from your college class, and that you'll want to search them later. Does it still sound like a "geek gone wild?"

      --

      "[Regarding the 'cloud,'] ownership was what made America different than Russia." -- Woz

    4. Re:Aspect Ratio and Even Lighting by TheWanderingHermit · · Score: 4, Insightful

      Considering that, and I'm speaking not just as a former student, but as a former teacher, there is a delicate balance in all professors between ego and laziness, most of what is taught in college is in the text books. As for handouts, I found it pretty easy to file them as well. As for notes -- you mean someone who is scribbling notes in a hurry actually takes them in good enough handwriting that OCR would be able to read them without a lot of prompting? I should have mentioned that a lot of similar material like that is included in my 4 drawers. You have to think to file them in folders, and the same thought is needed to figure out which directory to put them in, but a lot more is needed to photograph papers so they are legible. If it's that important, a sheet-feed scanner would be more practical, but there's the difference between theory and practice: it's not as easy to batch convert as it sounds.

      I've also found that there is a lot more of value to learn from practical experience than from pedants.

      Unless one is a geek gone wild.

    5. Re:Aspect Ratio and Even Lighting by TA · · Score: 1

      My cheap Ricoh Caplio R4 can correct the aspect ratio.

      I use it to take pictures of whiteboards, documents,
      anything rectangular, from the (for me) most convenient
      angle. The camera detects the rectangle and warps it
      so that it looks like a perfectly angled top (or front)
      shot. This works very well for a lot of stuff. Never
      tried it for OCR purposes, but from looking at some of
      the document photos I have there don't appear to be
      any obvious problems.

    6. Re:Aspect Ratio and Even Lighting by tdemark · · Score: 5, Interesting

      I thought about keeping things electronically, but then I realized I'd have to take time to scan them and file them and that would take a lot more time, overall, than just dropping them in folders.

      That's what I thought until I actually tried it.

      I have a Fujitsu ScanSnap document scanner which I use on all my documents. It scans both sides of a page at the same time, can hold 15 pages (I think) in its feeder tray, and takes 5 or 6 seconds to scan a page. Since it scans both sides of a page at the same time, this actually ends up being 5 or 6 seconds per two pages.

      It is small enough to sit on my desk and its "on" switch is the loading tray flap - flap closed is "off".

      When I want to scan something, I open the flap, load the tray with the document, and hit the "scan" button.

      It quickly scans all the pages and sends the scan to a program called Readiris Pro (v11) - this program will OCR the document and save it into my digital cabinet as a PDF "Image + Text". This is a really cool format because there are actually two "layers" to each page - the actual scan of the page (so it looks right) and then a text layer below that has all the OCR information. What this means is that, although you are looking at a raster image, you can search the PDF for specific information and copy and paste text right out of the document.

      Let me clarify that with an example:

      Let's say you have a PDF of a utility bill. The PDF you are looking at is a scan of the bill itself - not a text-based representation. However, you can grab the "text" cursor and copy your account number right from the image! Obviously, you are not copying from the image, but from the text layer that has all the OCR'd text positioned correctly on the page, but hidden from view.

      Since all the text has been OCR'd, the PDFs are now searchable. Since my digital cabinet is just a collection of folders based on category (Utility, Financial, etc), I use another program (DEVONthink Personal) to index it. Let's say I am talking with my insurance company and they have a question about a claim. I can type in the claim number into DEVONthink and, boom, all the documents which reference that claim will be displayed. Simply clicking on an entry in the result list will bring up the document itself and highlight where the claim number appears on the page. BTW, if a provider allows PDF downloads of actual bills, I can drop them directly into the digital cabinet and they will be indexed along with my other documents.

      Yes - this cost a little much to set up ($300 for the scanner (on sale), $90 total for DEVONthink and Readiris Pro), but I was able to sell the full copy of Adobe Acrobat that came with the scanner on eBay for $175, so the actual cost was closer to $225.

      It's probably not for everybody, but I am certainly happy with the process.

      - Tony

    7. Re:Aspect Ratio and Even Lighting by CastrTroy · · Score: 1

      Fish-eye would probably be the biggest problem, although I'm not sure how much it would affect the ability of the OCR program. In university we did a robotics project that used a webcam to take a picture of a table with a block, so the robot could pick up the block. We had to do quite a bit of calculation in order to account for the fish-eye of the lens. However, the biggest problem is the setup that is needed. The author of the post is talking about an expensive 8 MP camera, on a tripod. You'd have to either leave it in place, or adjust and calibrate it every time you wanted to "scan" a document. With the inexpensive scanners that are available today that can do much better than 300 dpi (try 1200x2400), it's probably better to just stick with a scanner. I realize that it's sometimes tedious to wait for the scanner to finish scanning, and that the software doesn't really help in automating the process, but I don't think the camera setup would solve all those problems. Plus, the entire setup would take up a considerable amount of space.

      --

      Anthropic principle: We see the universe the way it is because if it were different we would not be here to see it.
    8. Re:Aspect Ratio and Even Lighting by denisbergeron · · Score: 1
      ...the aspect ratio and even lighting are your enemies. It's almost impossible to shoot a bill or a check stub dead on, at close rage, without fish-eye'ing, and without getting in your own shadow. Sure, you might have a little white linen box that you use to take your eBay photos, but, seriously, this is a job for a scanner.

      When you know nothing about a subject, the first thing to do is close your mouth and learn.
      With any decent lens it's very easy to shoot anything dead on at close range (not close rage) without fish-eye'ing anything. Only cheap lenses and cheap digicams will fish-eye the subject at close range. For the close rage, I don't know.

      "And without getting in your own shadow": how silly is that? Have you never heard of something called a flash?

      Follow this link: http://www.photo.net/photodb/photo?photo_id=10051590

      Do you see my shadow? Do you see any fish-eye'ing on the subject?

      --
      Ceci n'est pas une Signature !
    9. Re:Aspect Ratio and Even Lighting by thatnerdguy · · Score: 2, Interesting

      How about just typing your notes up each day? That way you will be re-reading them, allowing them another chance to sink in, you'll be able to spot if there is anything in them that you don't understand and your notes will be searchable.

      --
      I saw the Sign, and it opened up my eyes
    10. Re:Aspect Ratio and Even Lighting by drinkypoo · · Score: 1

      Not only that but static handwriting OCR is rarely accurate... and an error of just one letter can change the meaning entirely.

      --
      "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
  2. I sort of tried this by Wizzerd911 · · Score: 1

    I haven't used cameras that would really be called "good" but it was really hard to get the text perfectly in focus and readable even with the camera on a stand. Also, lighting was kinda a pain but it sounds like you have that under control.

    --
    Is it just me or is it not going to upgrade to Vista in here?
    1. Re:I sort of tried this by flewp · · Score: 2, Interesting

      It sounds to me like he doesn't have the lighting under control at all.

      Using a direct flash isn't exactly the best option. The ink, even though black, may pick up noticeable and troublesome highlights. Depending on the range, it may even lead to uneven lighting on the paper itself (having part of the paper brighter than the rest).

      Ideally, perhaps you'd want to use softboxes or some other method for more diffuse lighting.

      Disclaimer: I'm not really familiar with OCR software though, so I don't know how well it can compensate/overcome such lighting issues as I described.

      --
      WWJD.... for a Klondike bar?
    2. Re:I sort of tried this by flewp · · Score: 2, Interesting

      Forgot to mention that I often sketch on paper, and then bring my sketches into the computer for digital painting, and when using a direct flash, I've often encountered the problem I've described. I currently don't have a scanner, so when I am in need of bringing a sketch into the computer, I'm using a Digital Rebel XT (350D, for those outside North America/the United States)

      --
      WWJD.... for a Klondike bar?
    3. Re:I sort of tried this by purduephotog · · Score: 1

      That's why you put two flashes 45 degrees off axis. It's called a copy stand :)

  3. Sheetfeeder by cerberusss · · Score: 2, Informative

    What you want is a scanner with a sheet feeder and a GOOD one at that. They're not that expensive anymore, since there are lots of cheap machines which have a feeder anyway due to them having a fax function. This alone will go faster than manually swapping the papers and shooting with a camera.

    --
    8 of 13 people found this answer helpful. Did you?
    1. Re:Sheetfeeder by Dadoo · · Score: 3, Insightful

      What you want is a scanner with a sheet feeder and a GOOD one at that.

      Absolutely.

      I tried this, myself, a few years ago. I guarantee that, using a camera, you'll get through, maybe, 100 pages. I got a decent scanner (HP something or other) with a sheet feeder. It does about 12ppm and that turned out to be too slow. I got tired of it in a day or two.

      I tried a bunch of different solutions, but I finally had to take it all to work. We had a Fujitsu M4097D and an enormous Ricoh Copier/Scanner/Fax machine. Both did 60ppm, both sides (120 images a minute). I actually made some headway with that setup, but I still didn't finish.

      As far as OCR is concerned, don't bother. Even today, it's nowhere near accurate enough. In my experience, the best software out there gets an average of one error per page on a really good scan. Trust me: it will take a lot more of your time than you think to fix that. Assuming you're doing mostly black and white text, G4 compression will compress a 300dpi, 8.5x11 image down to about 100k. At that rate, you can store close to 7000 pages on one CD.
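
      The storage estimate above can be reproduced with a back-of-the-envelope sketch. The 700 MB disc capacity is an assumption (the post only says "one CD"), the 100 KB page size is the post's own rough figure, and the function names are invented for illustration.

```python
CD_CAPACITY_BYTES = 700 * 1000 * 1000  # assumption: a 700 MB CD-R
G4_PAGE_BYTES = 100 * 1000             # "about 100k" per page, per the post


def raw_bilevel_bytes(width_in, height_in, dpi):
    """Size of an uncompressed 1-bit-per-pixel page, ignoring row padding."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return pixels // 8


def pages_per_disc(disc_bytes, page_bytes):
    """How many whole pages of a given size fit on one disc."""
    return disc_bytes // page_bytes


raw = raw_bilevel_bytes(8.5, 11, 300)
print(f"Uncompressed 1-bit page: {raw} bytes")
print(f"Implied compression ratio: about {raw / G4_PAGE_BYTES:.1f}:1")
print(f"Pages per disc: {pages_per_disc(CD_CAPACITY_BYTES, G4_PAGE_BYTES)}")
```

      Under these assumptions the "close to 7000 pages" figure holds, and the implied ratio of roughly ten to one is what makes image-only archiving practical without OCR.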

      --
      Sit, Ubuntu, sit. Good dog.
    2. Re:Sheetfeeder by donak · · Score: 1

      I work for a government office that uses a similar scanner, every document is scanned at 200dpi, strictly Black & White (not greyscale or colour) at roughly 60% contrast, and each individual page is (about) a 70kb *.tif file. The documents are readable, and officially deemed of a sufficient archive quality.

      Go with a scanner + sheet feeder!

      --
      Don't blame me, it's usually 2 in the morning when I post ...
  4. I tried this once by KNicolson · · Score: 3, Interesting

    The problems I had were (a) getting the book flat, and (b) getting the lighting right. With flash, you end up with a ring of brightness and my OCR software got very confused, as the grey newsprint outside the flash's ring was being handled as black.

    If I were a whizz with Photoshop/GIMP/etc, I suppose I could have done some sort of correction to the picture, but...

    I've heard how Kinko's have book scanners that will copy and bind a book for you - perhaps they also have a scanning to CD/DVD service? Would that be cheaper for you?

    1. Re:I tried this once by MobileTatsu-NJG · · Score: 1

      "If I were a whizz with Photoshop/GIMP/etc, I suppose I could have done some sort of correction to the picture, but..."

      Getting rid of distortion is tricky, but making the image a proper B&W is easy.

      - Image/Adjust/Desaturate
      - Image/Adjust/Levels .. Just under the Histogram, grab the middle arrow and drag it left or right. Use this to even the contrast. .. Grab the left arrow and drag it right, that'll make the darks darker. .. Grab the right arrow and drag it left to make the lights lighter.

      Since text is black and just about every other shade of gray can be turned to white, this'll work 90% of the time for you. This particular step isn't hard and wouldn't take more than 30 seconds to do. The process in GIMP should be similar.

      This isn't a rebuttal, per-se. It doesn't fully address the problem. I just wanted to point out that that particular step isn't the difficult part. :)
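
      For readers without an image editor handy, the idea behind the Levels step can be sketched in a few lines. This is only an approximation of what such a dialog does, not any editor's actual implementation: `levels` and `threshold` are hypothetical helper names, and the sample gray values are made up. The black point, white point, and gamma parameters stand in for the left, right, and middle arrows under the histogram.

```python
def levels(value, black=0, white=255, gamma=1.0):
    """Remap one 0-255 gray value between a black point and a white point.

    Values at or below `black` become 0, values at or above `white`
    become 255, and `gamma` bends the midtones (above 1.0 brightens them).
    """
    if not 0 <= black < white <= 255:
        raise ValueError("need 0 <= black < white <= 255")
    t = (value - black) / (white - black)
    t = min(1.0, max(0.0, t))  # clip to the 0..1 range
    return int(255 * t ** (1.0 / gamma) + 0.5)


def threshold(value, cutoff=128):
    """Reduce a gray value to pure black (0) or pure white (255)."""
    return 255 if value >= cutoff else 0


# Made-up sample: dark text (30, 60), a smudge (120), grayish paper (180-230)
scan_row = [30, 60, 120, 180, 200, 230]
stretched = [levels(v, black=60, white=180) for v in scan_row]
binary = [threshold(v) for v in stretched]
print(stretched)  # [0, 0, 128, 255, 255, 255]
print(binary)     # [0, 0, 255, 255, 255, 255]
```

      Pulling the black point up and the white point down stretches the contrast so that text goes to black and paper goes to white, which is the effect described in the steps above.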

      --

      "I like to lick butts!" by MobileTatsu-NJG (#32700246) (Score:5, Informative)

    2. Re:I tried this once by the+donner+party · · Score: 1

      Simple level adjustment only works well if you've got very even lighting; otherwise the greys representing the background at one end of the sheet end up too close to the greys representing the text at the other end of the sheet. I know, I've tried scanning by digital camera myself, and you need a lot of lighting to get it right. And because you need a glass pane to keep a book level, you also need to worry about the lighting angles to avoid reflections.

      An image processing tool that would adjust the black and white levels adaptively over the picture would be perfect, anyone know about one?
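
      The general technique being asked for is usually called adaptive (or local) thresholding: each pixel is compared with the average brightness of its own neighborhood instead of with one cutoff for the whole picture. The following is a minimal pure-Python sketch under that assumption; the function names, window radius, offset, and sample values are all illustrative and do not describe any particular tool.

```python
def global_threshold(img, cutoff):
    """Binarize with a single cutoff for the whole image."""
    return [[0 if v < cutoff else 255 for v in row] for row in img]


def adaptive_threshold(img, radius=2, offset=10):
    """Binarize each pixel against the mean of its local window.

    `img` is a list of rows of 0-255 gray values. A pixel turns black
    only if it is darker than its neighborhood mean by more than `offset`.
    """
    height, width = len(img), len(img[0])
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            y0, y1 = max(0, y - radius), min(height, y + radius + 1)
            x0, x1 = max(0, x - radius), min(width, x + radius + 1)
            window = [img[j][i] for j in range(y0, y1) for i in range(x0, x1)]
            local_mean = sum(window) / len(window)
            row.append(0 if img[y][x] < local_mean - offset else 255)
        out.append(row)
    return out


# Made-up strip of a page lit from the left: paper fades from 220 to 100,
# with "ink" at positions 1 and 7 (values 125 and 35).
page = [[220, 125, 190, 175, 160, 145, 130, 35, 100]]
print(adaptive_threshold(page))
# [[255, 0, 255, 255, 255, 255, 255, 0, 255]]
```

      In this sample the ink on the bright side (125) is lighter than the bare paper on the dim side (100), so no single global cutoff can separate them, while the local comparison picks out both ink pixels. A real photo would need a much larger window, and this naive loop would be slow on a full-size image.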

    3. Re:I tried this once by MobileTatsu-NJG · · Score: 1

      "Simple level adjustment only works well if you've got a very even lighting, otherwise the greys representing the background at the other end of the sheet end up too close to the greys representing the text at the other end of the sheet. I know, I've tried scanning by digital camera myself, and you need a lot of lighting to get it right."

      Any possibility of providing an example? I'd be curious to take a stab at it and see if I could offer a tip or two that'd help. The reason I think this would work is that you only need for the text to be black and the BG to be white. You could go very extreme with the Levels Adjustment and still get what you want, provided the dark areas of the page don't get very close to the black level of the text. Even with uneven lighting, you can use the Levels Tool (in this context, anyway) to remove the difference of the gradated light from the text... in effect, filtering out all but the text. I've had good luck with this in the past, but I'd also have to admit I haven't tried this particular example before. You may need to get it close then use the Threshold filter to send the image back to 1-bit B&W (or levels adjust it twice: once to exaggerate the difference between the text and the gradated light and once to finish it off down to B&W).

      --

      "I like to lick butts!" by MobileTatsu-NJG (#32700246) (Score:5, Informative)

    4. Re:I tried this once by flewp · · Score: 1

      Another alternative method is the curves tool/adjustment.

      --
      WWJD.... for a Klondike bar?
    5. Re:I tried this once by MobileTatsu-NJG · · Score: 1

      Just for giggles, I found an image from the web and tried the technique I described. The only thing I did differently was that I used Threshold on top of it to make it 1-bit B&W. Have a peek.

      This example does illustrate the distortion problem, though. I don't know if an OCR could actually read this. (I'd be impressed!)

      --

      "I like to lick butts!" by MobileTatsu-NJG (#32700246) (Score:5, Informative)

    6. Re:I tried this once by Sique · · Score: 1

      I am currently remastering a cartoon I drew about 15 years ago. Now the paper is starting to yellow, and there are several dust specks on the original. I am using exactly the setup described: digital camera, no flash, GIMP.

      It took some experimentation at first, but now the process is quite easy. In GIMP just load the digital print, go to Tools->Color Tools->Contrast, increase the contrast, then to Tools->Color Tools->Threshold, and choose a black/white separation that is a good compromise between completeness of the letters and readability.

      Sometimes I have to split the area into several sub areas and work on each separately, because the light levels are too different. That's a nuisance though.

      --
      .sig: Sique *sigh*
    7. Re:I tried this once by mrchaotica · · Score: 1
      This example does illustrate the distortion problem, though. I don't know if an OCR could actually read this. (I'd be impressed!)

      I'm taking a computer vision class right now, and using some of the techniques I've learned it doesn't look too difficult to create software that would automatically warp the page back to flatness.

      I would think the most difficult part of OCRing this would be all that underlining/circling that somebody did (what the hell kind of idiot ruins a book like that?!).

      --

      "[Regarding the 'cloud,'] ownership was what made America different than Russia." -- Woz

    8. Re:I tried this once by DavidHumus · · Score: 1

      I do this now for incidental scraps of paper - it's easier than jotting down a note if it's more than a couple of lines.

      Just last night my daughter asked me to take pictures of her class schedules. Now they're right there in an e-mail and she doesn't have to keep track of a couple of pieces of paper.

      I use a camera with good low-light capability (Fuji Finepix F10 - the new F30's even better) without a flash. I adjust the perspective in Paintshop Pro (to make it look like I took the picture straight on) - or don't - it's readable either way.

      It's much faster than a scanner. Also, OCR, as good as it's gotten, is no good for much of what I save this way: handwritten notes, signs and maps, pages of books with pictures or equations.

      The subsequent burden of retrieval falls on having a good directory structure for my needs (even though @#$%@ MS Windows doesn't give me linked files - which would allow me to file in more than one place cheaply) and a good, descriptive name for the file.

  5. HP ScanJet 4600 by YrWrstNtmr · · Score: 1

    The scanner is in the top, instead of the base. Flip the lid over, and you can scan without lifting the lid every time. Could probably scan sheets almost as fast as you could with a camera setup. And far cheaper.

    Some people report not being able to get good scans with it, but I've had no problems.

  6. woah by Anonymous Coward · · Score: 0
    Heck, I was thinking of a glass desk and maybe one camera below and one above, and maybe a foot pedal to trigger the cameras (and I suppose a flash and high F-stop would help as well).
    dumbest...askslashdot...ever...
  7. maybe the wrong approach by gumbi+west · · Score: 1

    I pay (almost) all of my bills with my online bank and just get e-bills. Then there is no paper -> OCR -> file issue. It's just website -> file. Saves you a big investment too!

    1. Re:maybe the wrong approach by CastrTroy · · Score: 1

      I use this for just about everything too. You don't need all your bills online either to go without paper. I just download my bank account files every week and put them in GnuCash. I mark the transactions under the appropriate accounts, so I can see what I'm spending where. Bills are filed in paper format, because it's the least amount of work for storing them, and it's the default format that most of my bills come in.

      --

      Anthropic principle: We see the universe the way it is because if it were different we would not be here to see it.
  8. Cartoonists and ADF by BKX · · Score: 1

    While cartoonists used cameras suspended from tables to make animations for years, I don't really think this is what you want. A better solution would be to buy a scanner with an Automatic Document Feeder. These aren't particularly expensive and are usually worth it for other reasons. The most cost effective ones are the Laser Multifunction devices. They are basically laser printers with a USB port so they can act as a printer or scanner as well. Just make sure you get one with an ADF; it's worth it. Most also throw in fax functionality as well. My Brother MFC-7420 has lasted me over a year (still going strong) and two toner cartridges and can scan a batch of papers at 300 dpi in no time flat. While it prints B&W, it scans in color at a max of 9600 dpi, which is good enough even for pictures. It only cost me $250 and was totally worth it.

  9. Cameras aren't all that easy to use by Anonymous Coward · · Score: 0

    Google uses cameras to digitize their books, so it's feasible. I would shoot from 3-4 feet away with the lights at a 45-degree angle from the glass on either side.

    dom

    1. Re:Cameras aren't all that easy to use by otherniceman · · Score: 3, Informative

      Google used dedicated book scanners called Planetary or Orbital Scanners, see http://www.dlsg.net/bookeye.htm and http://en.wikipedia.org/wiki/Planetary_scanner. They are a lot better than a digital camera on a tripod.

    2. Re:Cameras aren't all that easy to use by Anonymous Coward · · Score: 0

      How do you know this? That Bookeye thing takes 2 seconds to scan a page, while a Canon 20D will take photos as quickly as your minimum wage worker can flip pages. It's also a good bet that one of those machines costs 2-10 times as much as a DSLR and some strobes.

      dom

    3. Re:Cameras aren't all that easy to use by otherniceman · · Score: 1

      It is my job to know this. I advise, plan, implement and manage large volume scanning operations for a variety of multinational companies and media organisations, serving the information back on the intranet / internet. I have worked with books going back to the early 1700's. The planetary cams are just for bound books; you can get cheaper professional document scanners for loose paper.

      Google is advertising for people skilled with imaging equipment (http://www.google.co.uk/support/jobs/bin/answer.py?answer=36925), these scanners are the type of equipment the imaging industry uses. At the volumes that Google are scanning these are good value. True, for a small collection they are not worth it, but I have not seen any digital camera pictures that have matched the quality of a professional document scanner.

    4. Re:Cameras aren't all that easy to use by Anonymous Coward · · Score: 0

      while a Canon 20D will take photos as quickly as your minimum wage worker can flip pages.

      Right, and you'll get maybe a 40% OCR rate on your "quickly as your minimum wage worker can flip" page. You miss the crux of the issue here. Do you really think Google (or any of the massive number of people looking for ocr'ing solutions) would go through the expense of using those specialized machines if all it took was a room full of monkeys with somewhat expensive digicams? Hell, with a 300dpi flatbed scan of a non-handwritten page, the ocr rates still suck big time.

  10. Why bother with OCR? by Badfysh · · Score: 2, Insightful

    If it's just for keeping a record of bills and other junk, why even bother with OCR? As long as you can read the results, just snap away.

    --

    I was conned by an old man in a cloak. It turns out those *were* the droids I was looking for.

    1. Re:Why bother with OCR? by flewp · · Score: 1

      Well, he may want to be able to search through his bills for specific payments, that kinda thing. Also, if he has a lot of stuff to digitize, text files could take up less hard disk space.

      --
      WWJD.... for a Klondike bar?
    2. Re:Why bother with OCR? by Wiseleo · · Score: 1

      I use NeatReceipts and scan everything in auto mode. See my amazon review for more details.

      Works great :-D

      --
      Leonid S. Knyshov
      Find me on Quora :)
  11. Or better yet... by svunt · · Score: 2, Funny

    C'mon, your work doesn't have a scanner/photocopier/printer with a feeder? I take my paperwork into the office once a quarter or so, feed the lot through the scanner in the print room, and email the output to myself at home. If you're one of the rare cases who'd feel bad about this, you could always offset the expense by not using their water cooler or coffee for a week :)

  12. scanners are FOR documents by Dun+Malg · · Score: 4, Insightful
    Digital Cameras vs Scanners for OCR?
    What, are you kidding? You can use a joystick in place of a mouse, but why? Cameras are for capturing a 2D image of a 3D scene. Like you noted, the optics are designed specifically for it. Scanners are for capturing a digital version of a 2D paper image. Musing over whether today's new, heavier wrenches might be stout enough to drive nails is silly, as what you really need is a hammer.

    Get a scanner
    --
    If a job's not worth doing, it's not worth doing right.
    1. Re:scanners are FOR documents by hords · · Score: 2, Insightful

      Besides that, scanners are usually cheaper than digital cameras anyway and can do much more than 300 DPI if you need it for another task. The scanner gets the lighting even and doesn't have to be focused. Maybe the reasoning is that it's faster to take a picture than to scan a document.

    2. Re:scanners are FOR documents by dmewhort · · Score: 1

      The one application I have been waiting for along these lines: what if you want to OCR something that you simply can't put in a scanner? Camera -> WiFi -> service -> OCR -> ?. The lighting issues make it much harder on the OCR software, so it needs to be much smarter than it was the last time I tried, but from there I have a ton of applications. Doug

    3. Re:scanners are FOR documents by flewp · · Score: 2, Insightful

      Unless he has a proper lighting setup, it may actually take longer to clean up the photos of the documents than to simply scan them. Also, if the bills/receipts/etc are of different sizes, he would have to zoom and fit it in frame, and crop the images among other considerations.

      --
      WWJD.... for a Klondike bar?
    4. Re:scanners are FOR documents by CastrTroy · · Score: 1

      Also, most bills come folded in the mail; getting the paper to lie flat enough would also create problems.

      --

      Anthropic principle: We see the universe the way it is because if it were different we would not be here to see it.
    5. Re:scanners are FOR documents by Karma+Farmer · · Score: 1

      Did I just type "they're" instead of "their"? Dear god, whatever is wrong with teh rest of you is appearently CONTAGIUS! Your a rediculous looser.

    6. Re:scanners are FOR documents by suv4x4 · · Score: 1

      What, are you kidding? You can use a joystick in place of a mouse, but why?

      Because "I'm using a mouse" won't make it to Slashdot?

      Also for the chicks.

  13. reverse engineer this.. by RuBLed · · Score: 2, Interesting
  14. Focus is easy... manually set it. by WoTG · · Score: 1

    I quite often take photos of receipts and forms rather than walk to the other room to boot up the PC with the scanner. The trick for me is to force the focus to 1 metre and take the photo from about a meter away while using the optical zoom to make the paper fill in the field of view.

    I admit that the quality isn't quite as good as the scanner, but it's a heck of a lot more convenient and it's good enough for many uses.

    FWIW, macro mode doesn't really work well on my old digicam, plus, I'm not sure it makes sense to use it in this case.

  15. Must have reliable files -and- reliable system by Anonymous Coward · · Score: 1, Insightful

    It really sounds like you haven't actually tried it. I've got a Powershot 450, I think they just upped the versions so they're past that now, I imagine. Anyways, it's 5~mp, and I used it all the time in the library to take pictures of books so I could accurately quote them...much much easier than copying long quotes word for word, and then just look at the picture and re-type the text.

    My suggestion is to not take things so far digital with your process. Paper doesn't take up so much space that you can't hold onto digital and physical copies:

    1) Gather your big stack of "documents I probably don't need but if I do need, I will really really need"
    2) Take digital images of them all, either with a scanner or a digital camera. Choose a method, try it a few times, check the output to make sure it works and is legible, and then go with it.
    3) File the digital image on your computer, and file the paper in a system with the same setup and title. Even if you put all the papers in an office-style cardboard box and stick them all in the back of the closet, the attic, whatever, do it in the same system as best you can.

    The point here is that you use the computer files if you ever need the information, and if they fail, you've still got the paper in the back of the closet in your basement. If you keep the papers organized, you can store them in such a way that after a few years, you'll know that an entire box is safe to shred and recycle.

    The key to any system like this is being able to trust that you have what you need when you need it. You can't ensure digital or physical documents 100%, but with both, you can feel pretty safe (store a copy of the digitals at a relative's house or somewhere else significantly off-site, or on an FTP server somewhere, of course). But having the files isn't the only part of a reliable system: you also have to be able to find them.

  16. Not as easy as you think.. by sakusha · · Score: 5, Informative

    I have some experience doing what you're trying to do. I've even done this type of work in professional labs with serious pro equipment (it was my job). It's a huge pain in the butt.

    I'm currently digitizing my collection of old tabloid punk magazines from the 1970s. I had to use a digital camera because flatbed scanners that do 11x17 or larger are extremely expensive, they're like $3000 or more. So I did some experiments with my consumer-grade 5 megapixel digital camera. The results were adequate, barely (and I have an art degree in Photography, this stuff is easy for me, YMMV). I've currently suspended my project until I can afford a higher rez digital camera, mostly because 5Mp is barely enough to capture the little 6 point type that is used in large sections of the magazines. But let me tell you more generally what I've learned.
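A back-of-the-envelope sketch shows why 5 megapixels is marginal for small type on a tabloid page. It assumes a typical 4:3 5 MP frame of 2592x1944 pixels (the comment does not give the sensor dimensions) and a page that fills the frame exactly; the function names are made up for illustration.

```python
def effective_dpi(px_long, px_short, page_long_in, page_short_in):
    """Resolution actually delivered on the page, limited by the tighter axis."""
    return min(px_long / page_long_in, px_short / page_short_in)


def type_height_px(point_size, dpi):
    """Pixel height of the full type body (1 point = 1/72 inch)."""
    return point_size / 72 * dpi


# Assumed 5 MP frame (2592x1944) covering an 11x17 inch tabloid page.
dpi = effective_dpi(2592, 1944, 17, 11)
body_px = type_height_px(6, dpi)

print(f"effective resolution: {dpi:.0f} dpi")
print(f"6 pt type body: {body_px:.1f} px tall")
```

Under those assumptions the page is captured at roughly half the 300 dpi that OCR vendors usually suggest, and the lowercase letters of 6 point type get only a handful of pixels each.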

    First off, you'll need a copy stand. This is a fairly standard photo accessory, but a good copy stand is fairly expensive. You need something that is easily adjustable, so you can raise and lower the camera to get the document to fill the frame, without using too much zooming. The copy stand keeps the camera parallel to the target at all distances. It is important to have quick adjustability in height, rather than zooming. You'd be much better off using a "prime lens" rather than a zoom, as zooms tend to have barrel and keystone distortion.

    Secondly, you need lights. If you only want to copy written documents (or B&W magazines like me) you can use cheap spotlights. If you want to do color, you need much better lighting, something with a fixed color temperature, or a flash system. Spotlights are really hot, and when I work in my small office, it gets intolerably hot when I spend about an hour photographing. For better, more repeatable results, you'd be better off getting a flash system. BUT...

    Here is the sticking point. You need something to keep the documents flat. That means placing them under a sheet of glass. So you are going to get reflections from the lights, and flash is high intensity lighting which makes it even more difficult to control reflections. The usual method is to put polarizing filters over the lights and the lens, to cancel out the reflections. This is a rather complex method, and a LOW END professional copystand with polarized lighting will set you back about $2500.

    OK, so what I did is I adapted my old disused photo enlarger. It was a huge monster for 4x5 negatives, I took off the enlarger head, and used a Bogen photo clamp with a ball-head joint attached to the motorized arm that goes up and down. It does a fairly good job as an improvised copy stand, but it is pretty cramped, the baseboard is only designed to make max 20x24 prints. Also it is a HUGE pain in the ass getting the camera leveled with the baseboard, I use a bubble level. Then I attached a cheap set of tungsten photofloods to the wings of the enlarger, so the light hits the baseboard at a 45 degree angle, to reduce glare. Note that it is best to point each light at the far side of the document, so the light paths cross each other. This gives the light a little distance to fan out and eliminate hot spots. I don't put my documents under glass, they're newspaper pages, so I flatten them for several weeks (!!!) under weights, then if there's a little curl, I use weights (like heavy metal rulers) at the edges, or hold the edges down with post-it notes. That eliminates the need for a glass plate to hold them down, and I don't have to deal with reflections. However, it takes a LOT of time and effort to get the documents positioned and flattened correctly, it is not a quick process.
    I use a Canon camera, so I use the Canon Camera Remote to my laptop to preview and take the shot. Even with the lights and some fill flash, I can end up with exposures of 1 or 2 seconds, so I can use a narrow f-stop. This shouldn't be necessary for a flat object, which requires no depth of field, but I find that the lens is sharper stopped down. It takes quite a bit of fiddling to get the optimal

    1. Re:Not as easy as you think.. by flewp · · Score: 1

      I had to use a digital camera because flatbed scanners that do 11x17 or larger are extremely expensive, they're like $3000 or more.

      They're closer to 1000-2500 dollars than 3000 dollars. At least for consumer equipment.

      --
      WWJD.... for a Klondike bar?
    2. Re:Not as easy as you think.. by sakusha · · Score: 1

      Hmm.. I didn't believe that 11x17 scanner prices had dropped so much, the last I'd seen was the Epson scanners for $3000 and $4000. But I checked it out and found Epson now makes a low-end 11x17 scanner for $1500 MSRP. Where did you find something for $1000? A $1500 scanner could finally make my project affordable, but $1000 would be better. Since I'm not making any money on this project, it's all expenses out of pocket, so I'm trying to keep it cheap. And quality isn't such an issue, these old tabloid magazines are printed so cheaply, the print quality is horrible, so even a poor scanner would be able to do a good job.

    3. Re:Not as easy as you think.. by flewp · · Score: 1

      Newegg has a 12x17 inch flatbed for $1042.99 USD.

      Specific model: http://www.newegg.com/Product/Product.asp?Item=N82 E16838150035/

      I don't have any experience with Microtek scanners, but it'd probably get the job done.

      --
      WWJD.... for a Klondike bar?
  17. You're assuming a standard paper format here by cheros · · Score: 1

    I'd agree with you, if I didn't have the impression that the original poster wants to scan things that are not of a 'regular' printed size. Most consumer grade ADFs are designed to only handle A4 or letter.

    --
    Insert .sig here. Send no money now. Owner may sue, contents will settle. Batteries not included.
    1. Re:You're assuming a standard paper format here by BKX · · Score: 1

      Mine handles all those weird bill sizes as well. Even Home Depot receipts will get sucked through, although it has a bit of trouble with feeding multiple crumpled receipts unless they're sandwiched between thicker sheets of non-crumpled paper.

  18. $100 8.5x11 scanner, and scan half-pages? by billstewart · · Score: 2, Insightful
    It sounds like you've got to handle each page by hand anyway, so get yourself an A-size scanner and just scan each page in two parts?

    Or if there aren't too many grayscales that you'd trash, just run it all through a photocopier to shrink to 8.5x11 and scan that?

    --

    Bill Stewart
    New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks
  19. www.scanr.com by mge · · Score: 1

    scanR provides a way for you to experiment. Basically, they take an image, ostensibly from your cell phone, but it could be from any digital camera, and convert it to a searchable .pdf file.

    Depending on the quality of the camera, they suggest conference room whiteboards, diagrams, notes, and flow charts.

    Their website has samples of what you'd expect, but basically you're capturing the image and sending it to scanR. They do the conversion and send it back as a .pdf.

    1. Re:www.scanr.com by wknoxwalker · · Score: 1

      I am now waiting for google to buy them out..

  20. Parallax? by Schraegstrichpunkt · · Score: 1

    I'm curious: Are parallax and other image distortion a significant problem when you take pictures of a paper vs. scanning it?

    1. Re:Parallax? by photoweenie · · Score: 1

      Off-axis geometry isn't as big an issue as uniform focus. If there is enough depth of field to have the whole image in focus, the geometry can be fixed; this is a nice closed problem as long as the paper is flat and the boundaries can be located.
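As a rough illustration of "the geometry can be fixed": once the four corners of a flat page have been located in the photo, a projective transform (homography) maps them back to a rectangle. The sketch below is plain Python with made-up corner coordinates and function names; it only maps points, and a real tool would also resample the pixels.

```python
def solve_linear(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate corner configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def homography(src, dst):
    """Return 8 coefficients mapping each src (x, y) onto its dst (u, v)."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    return solve_linear(a, b)


def warp_point(h, x, y):
    """Apply the projective transform to one point."""
    w = h[6] * x + h[7] * y + 1.0
    return ((h[0] * x + h[1] * y + h[2]) / w,
            (h[3] * x + h[4] * y + h[5]) / w)


# Hypothetical page corners found in a photo (clockwise from top left),
# and the 8.5x11 inch rectangle they should map to at 100 dpi.
photo_corners = [(120, 80), (980, 140), (1010, 1350), (60, 1260)]
page_corners = [(0, 0), (850, 0), (850, 1100), (0, 1100)]

h = homography(photo_corners, page_corners)
print(warp_point(h, 120, 80))
```

This only works under the conditions the comment names: the paper must be flat and its boundaries must be findable. It does nothing about lens (barrel) distortion or out-of-focus regions.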

  21. Bulk indexing by mutube · · Score: 1
    As far as OCR is concerned, don't bother. Even today, it's nowhere near accurate enough. In my experience, the best software out there gets an average of one error per page on a really good scan. Trust me: it will take a lot more of your time than you think to fix that. Assuming you're doing mostly black and white text, G4 compression will compress a 300dpi, 8.5x11 image down to about 100k. At that rate, you can store close to 7000 pages on one CD.
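The storage figures quoted above can be checked with a few lines of arithmetic. This sketch assumes a 700 MB CD and takes the "about 100k" per G4-compressed page at face value; both numbers are round assumptions, not measurements.

```python
DPI = 300
width_px = round(8.5 * DPI)    # pixels across a letter page
height_px = 11 * DPI           # pixels down a letter page

# Bilevel (black and white) scan: 1 bit per pixel before compression.
raw_bytes = width_px * height_px // 8

g4_bytes = 100_000             # assumed compressed page size ("about 100k")
cd_bytes = 700_000_000         # assumed CD capacity

compression_ratio = raw_bytes / g4_bytes
pages_per_cd = cd_bytes // g4_bytes

print(f"raw page: {raw_bytes} bytes, about {compression_ratio:.1f}:1 under G4")
print(f"pages per CD: {pages_per_cd}")
```

So an uncompressed bilevel page is about 1 MB, and the "close to 7000 pages" claim follows directly from the 100k figure.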

    Is there any software available that can roughly OCR a document and store keywords or snippets of text in metadata or an index? This would solve the problem of both keeping an accurate record and having searchable content.

    What formats allow an easy mix of image and text data (without formatting)?

    1. Re:Bulk indexing by mlk · · Score: 2, Informative
      I work in the Media Monitoring industry. What we do is scan in newspapers (we have some 4000 publications), OCR them, throw 'em in a search engine and do some bloody complicated searches on that dataset before sending out hits.

      roughly OCR a document and store keywords or snippets of text in metadata or an index?

      Lots.
      You could be OK with GOCR and Apache Lucene if you do not require zoning (working out blocks of text and columns).

      OCR is not good enough

      Oh it is. You will need to add "variants" to your searches. E.g. if you are looking for Microsoft you would search for "M[i1]cr[o0]s[o0]ft". Some search engines can do this for you, others can say "max of two errors".
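A minimal sketch of the "variants" idea, for anyone who wants to try it with plain regular expressions. The confusion table and the function name are invented for illustration; a real table would be tuned to the mistakes your OCR engine actually makes.

```python
import re

# Hypothetical table of characters that OCR commonly confuses.
CONFUSIONS = {
    "i": "i1l",
    "l": "l1i",
    "o": "o0",
    "s": "s5",
}


def variant_pattern(word):
    """Build a case-insensitive regex that tolerates common OCR substitutions."""
    parts = []
    for ch in word:
        alternatives = CONFUSIONS.get(ch.lower())
        if alternatives:
            parts.append("[" + alternatives + "]")
        else:
            parts.append(re.escape(ch))
    return re.compile(r"\b" + "".join(parts) + r"\b", re.IGNORECASE)


pattern = variant_pattern("Microsoft")
print(pattern.pattern)
print(bool(pattern.search("New M1cr0s0ft release announced")))
```

This covers one-for-one character swaps only. Dropped or merged characters (for example "rn" read as "m") need the "max of N errors" style of fuzzy search instead.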

      What formats allow an easy mix of image and text data (without formatting)?

      XML (hehe). PDF can. Most systems would have the image as file somewhere on your file store, and the text in a database.
      --
      Wow, I should not post when knackered.
    2. Re:Bulk indexing by mrchaotica · · Score: 2, Insightful

      Oh it is. You will need to add "variants" to your searches. E.g. if you are looking for Microsoft you would search for "M[i1]cr[o0]s[o0]ft". Some search engines can do this for you, others can say "max of two errors".

      Once you've OCRd, is there any (preferably Free) software that can parse the text against a grammar and word list and hopefully fix some of these errors? Surely "if there's a digit in the middle of a word, it's probably really the letter with the similar shape," "if an unknown word is a character or two different from a known word, it's probably the known word," etc. aren't difficult heuristics, right?
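The two heuristics in that question are easy to prototype with Python's standard library. This is only a sketch: the digit table, the tiny word list, and the 0.8 similarity cutoff are all assumptions picked for the example, and `difflib` measures string similarity rather than true edit distance.

```python
import difflib

# Hypothetical digit-to-letter substitutions for heuristic 1.
DIGIT_TO_LETTER = {"0": "o", "1": "l", "5": "s"}


def fix_word(word, vocabulary):
    """Return a corrected word, or the original if no fix looks safe."""
    lower = word.lower()
    if lower in vocabulary:
        return word
    if not any(c.isalpha() for c in word):
        return word  # pure numbers and punctuation are left alone

    # Heuristic 1: a digit inside a word is probably a look-alike letter.
    candidate = "".join(DIGIT_TO_LETTER.get(c, c) for c in word)
    if candidate.lower() in vocabulary:
        return candidate

    # Heuristic 2: an unknown word very close to a known word is
    # probably that known word.
    matches = difflib.get_close_matches(lower, vocabulary, n=1, cutoff=0.8)
    return matches[0] if matches else word


vocabulary = ["microsoft", "invoice", "scanner"]
for raw in ["Micr0soft", "invoce", "2006", "zebra"]:
    print(raw, "->", fix_word(raw, vocabulary))
```

The hard part in practice is the word list: bills are full of names, account numbers, and abbreviations that no dictionary knows, so an aggressive cutoff will "correct" things that were right.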

      --

      "[Regarding the 'cloud,'] ownership was what made America different than Russia." -- Woz

    3. Re:Bulk indexing by mlk · · Score: 1

      I believe some OCR packages do things like this, as do some search engines. However, the ones we have tested have not been commercially viable.

      Some require more OCR machines than variant-based search (lots more) to do the load we do. This would mean more space, bigger air con, lots more cash for little gain.

      Some will not give information out in a way our current system can use, so we would have to rebuild a large chunk of our system, or scrap products and/or work flow procedures.

      --
      Wow, I should not post when knackered.
  22. To Clarify... by Aladrin · · Score: 4, Insightful

    So to clarify... You want to trade the hassle of:

    1) lift a lid
    2) stick a paper in a well-defined corner
    3) press a button

      for the hassle of:

    1) align a camera on a tripod, including angle as well as position
    2) align a paper with no guide
    3) adjust the lighting so that you get an even tone
    4) make sure you didn't accidentally move the camera, the tripod, or bump the desk
    5) step on a foot pedal that you jury-rigged to take a picture
    OR
    5) Push a button on a camera that you can't afford to move even a hair.
    6) Use image software to continue adjusting the photo so that the OCR will read it properly
    7) Hope you did everything right the first time.

    I think I'd pick door number 1.

    --
    "If you make people think they're thinking, they'll love you; But if you really make them think, they'll hate you." - DM
    1. Re:To Clarify... by vanyel · · Score: 2, Insightful

      I was thinking about this recently, and what I want is:

      1. stick the paper in the slot, it feeds, scans and files in "New Docs"
      2. drag thumbnail to register entry in gnucash, it optionally (sometime in the distant future) ocrs it and tries to find the total and the vendor, as well as matching the last 4 to one of your cards to verify it's going into the right account, then gives you a chance to correct its mistakes. The scanned image is included in the financial db attached to the register entry.

      Unfortunately, few of the sheet feed scanners seem to get very good marks in reviews...

  23. The solution by C4st13v4n14 · · Score: 1, Informative

    I have a four-year-old Canon Powershot G2 that has been indispensable in the digitising of my documents. Given adequate lighting, all you need is to line up the document in the view-finder and take the photo. Autofocus is usually adequate, but if you just can't seem to get a clear shot (certain things will prove problematic), manual focus is your next best feature to utilise. If you're doing James Bond-type work and are in a hurry, then you'll often end up with blurry images that won't be useful in OCR. Given that I already own a digital camera, I will never invest money in a scanner. If anything, I'll buy a better camera when I can find one with all the features I want. Hope this helps in some strange way :)

  24. Works for me by Cycon · · Score: 1

    A lot of people seem to be criticizing the idea, but there are some uses for it.

    I have a multifunction scanner/printer/copy/fax that cost around $100 when it was purchased with a computer a year or two ago. It's great for scanning in receipts for work expense claims, and having soft copies of important paperwork. I used to hand-hold a digital camera, with receipts and papers on a well-lit, flat surface, and photograph in macro mode. Now I'll be going back to that method as I've moved into a very small place where space and power are limited. I don't have room for the big bulky device, nor a filing cabinet for storage of the originals.

    Long story short, taking the pictures by hand, without a tripod means that yes the image of the papers will often be distorted (taken from an angle) or have shadows in them. But the point is you can read all of the lettering and 9 times out of 10 throw away the original. I would never bother with OCR, just keep the image files well-organized and backed up.

    --
    Your Brain + EEG + LEGO Robots = Brainstorms
    1. Re:Works for me by iamhassi · · Score: 2, Interesting

      "A lot of people seem to be critisizing the idea, but there are some uses for it."

      I agree. I don't know about the OCR thing but I take a picture of everything. Every business card I get, every little receipt or scrap of paper. And why not? Just takes a second and it's done, I always have a digital copy to go back and read or print out if need be.

      If I only had a scanner I'd never bother, in fact I had a scanner for years before I had a digital camera capable of doing this and I never bothered then. It's saved my ass sometimes too, especially when the paper is filed away somewhere and it's easier to find the photo on the computer.

      --
      my karma will be here long after I'm gone
  25. Scanner fuss by omega9 · · Score: 2, Funny

    I was thinking of a glass desk and maybe one camera below and one above, and maybe a foot pedal to trigger the cameras

    Boy, you're right! Who'd want to fuss with a scanner!?

    --
    I'm against picketing, but I don't know how to show it.
  26. Bulk indexing-DjVu by Anonymous Coward · · Score: 0

    "What formats allow an easy mix of image and text data (without formatting)?"

    http://www.lizardtech.com/

    DjVu is what some libraries use and there's some free software out there.

    BTW there are legal size scanners out there. Shop around, plus don't forget to check manufacturers' websites for refurbished and discontinued models.

  27. No scanning required by coinreturn · · Score: 2, Informative

    I find that most bill providers have an option to receive your bills electronically, keeping them either in their "safe" (ie, website) or to receive them in e-mail. This is true for credit cards, banks, major utilities; the main exception being the city-run water and trash company.

  28. I've tried this... stick with a scanner for now. by Ankh · · Score: 1

    I routinely scan pages of old books (and other documents) for my Web site. I use an Epson Expression 10000 XL, which, as someone else noted, isn't cheap, but it does A3/11x17/tabloid at 2800dpi. At 400dpi grayscale it can scan a regular page in a few seconds.

    I've also used a Casio Exilim camera to photograph pages.

    The way that it's done for archival purposes is to have a mount that holds a book and also holds a medium-format camera about four feet away. To get good resolution for OCR you'll need something that's about an 11 megapixel camera or more, for a full page at (say) 7x10 inches of actual text. Hugin and ptstitcher and friends, the panorama tools, include software to correct for lens distortion. Phase One sells a camera mount (in Canada you can get it from Vistek), together with their 40 megapixel back end for a medium-format camera. Or you could make a suitable mount yourself. The trick is that it holds the book open half-way (or less, using mirrors) so that you don't get as much page distortion. Holding the book and the camera rock steady is absolutely necessary if you are photographing text.
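The 11 megapixel figure is consistent with the 400dpi recommended elsewhere in this comment. A quick sketch of the arithmetic, assuming the text block fills the frame exactly with no margin wasted (the helper name is made up):

```python
def megapixels_needed(width_in, height_in, dpi):
    """Pixels required to cover a text area at a given resolution, in MP."""
    return (width_in * dpi) * (height_in * dpi) / 1_000_000


for dpi in (300, 400):
    mp = megapixels_needed(7, 10, dpi)
    print(f"7x10 inches at {dpi} dpi needs about {mp:.1f} MP")
```

In practice you lose some pixels to framing slack and to the sensor's aspect ratio not matching the page, so the real requirement is somewhat higher.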

    For small items like a cheque (say), use a flatbed scanner, and scan at 400dpi grayscale. Project Gutenberg's guidelines are outdated (they use 300dpi black and white as I recall) and don't get such good results. If you go much higher than 400dpi, the OCR software starts having tantrums at you and the quality may actually degrade.

    The best OCR software on the market today as far as I can tell is Abbyy Finereader. I tried several, and found this had, for example, at least two orders of magnitude fewer errors than the GNU OCR package. You should expect errors, though, especially in digits.

    Frankly I'd go with a scanner just because they're designed for this application, and you have less hassle. Transferring images from the camera to the computer twenty minutes after taking the photo means you need to keep a separate log of where each photo came from, or you'll muddle them up. I save images with filenames like Ball-Sussex/086-Pevensey-Castle.png so that the page number is in the filename. And the image quality with even a low-end scanner is much higher than you can get in practice with a camera without an elaborate set-up, and reliably better, comes out every time regardless of lighting, camera settings, wobbly hands, etc.

    Having said all that, I do photograph pages sometimes to make manual transcriptions. Afterwards I do careful proof-reading against the original.

    Liam

    --
    Live barefoot!
    free engravings/woodcuts
  29. None of that matters for his purpose. by twitter · · Score: 1

    It's almost impossible to shoot a bill or a check stub dead on, at close rage, without fish-eye'ing, and without getting in your own shadow.

    If his purpose is simply not to file paper, a scanner is not required. A $200 Canon from Walmart is all you need if you don't worry about OCR. You move the camera back and use the zoom and it works. I take 1600x1200 pictures of my classnotes and the results are perfectly legible. A good desk light saves your batteries by eliminating the need to flash. You move it to the side to avoid your shadow. The result might be fisheyed, but so what, it's just your phone bill. Being able to OCR and text search would be like icing on the cake. The real prize is keeping your records without stuffing file cabinets with unimportant junk.

    --

    Friends don't help friends install M$ junk.

  30. Desktop duplex scanners by time961 · · Score: 1
    Over the last two years, I've scanned about 200,000 pages using several low-cost desktop duplex scanners. I particularly like the Xerox/Visioneer Documate 262 (street price around $800) and the Fujitsu ScanSnap fi-5110EOX2 (street price around $500). Earlier, lower-end versions of these (DM252, fi-4110) are still available, nearly as good, and significantly less expensive.

    These units scan both sides of the document in the same pass, at between 4 and 30 monochrome sheets per minute depending on resolution (up to 600 DPI) and model. They can also do color (and grayscale), but that tends to be slower and the files MUCH bigger. The Xerox is faster, but more persnickety; the Fujitsu is unflappable and optimized for one-step desktop use. With either one, you can just drop a stack of pages, checks, receipts, or whatever into the sheet feeder, press a button, and PDF files appear on your disk. This is VASTLY more convenient than feeding, or photographing, one sheet at a time, and the scanners are small--footprint is not much bigger than a sheet of paper.

    Speed isn't as important as you might think: unless you're scanning huge stacks of paper, the time spent fussing with the paper itself tends to dominate the process, and it's easy to do something else productive while the sheets are whirring through the auto-feeder. I've used them for everything from manuals to business cards, and I think they're the ideal solution to getting rid of all that paper. The software is idiosyncratic and not extremely stable (I occasionally have to reboot the machine that the Xerox unit is connected to), but it gets the job done.

    1. Re:Desktop duplex scanners by Anonymous Coward · · Score: 1, Informative

      I recently bought a used HP Network Scanjet 5 for $50 on ebay and upgraded it, following instructions at http://www.madole.net/scanjet/. In addition to installing BSD, I upgraded to a bigger hard drive, so now it's both my scanner and my document repository. The scanner does a great job of 300dpi black and white scans, and I use NFS to mount the scanner's drive and organize the scanned documents, so I can easily access my files from any computer. It doesn't have an automatic duplex feed, so for two-sided originals you have to pick up the stack and turn it over when prompted, but you only have to do that once per scan job. I'm really impressed with how easy it was to upgrade the scanner and how well it works, now.

  31. you are making it too hard. by twitter · · Score: 1

    I'm currently digitizing my collection of old tabloid punk magazines from the 1970s.

    That's hard to do but it's not what's required. Snapping legible pictures of a phone bill is not hard if all you want to do is get rid of your paper. Getting OCR is harder, but still not as difficult as making museum grade preservation of artwork.

    Be sure to post the results on line some time after the copyright expires. In 2070, they will probably read like Elizabethan English but at long last the public domain will be served thanks to your efforts.

    --

    Friends don't help friends install M$ junk.

    1. Re:you are making it too hard. by sakusha · · Score: 2, Insightful

      Well, I'm trying to max quality with modest equipment, but the basics are always the same. You still need some sort of support like a camera stand, lighting, and something like glass to hold down the documents. Lighting and reflections will always be a problem. I've done this for real quickie jobs using camera on a tripod, and the results sucked. A flatbed scanner is still a much quicker, cheaper, and better way to do the job.

      BTW, I have privately circulated a few of my PDFs amongst some online punk communities, and they went nuts over them. The old school punks love them for the nostalgia, but to the new punks who weren't even born in the 1970s it might as well be Elizabethan English, they don't get it at all. Ha! Some of these magazines are still around, and even have major online websites, but none of this old material is available through the official sites. It's a shame, since they presumably have high quality reproductions in their archives, I just have 30 year old mouldering newsprint. They could probably never re-release this material, it all depends on context, half the fun is the advertisements next to the articles, and they could probably never get all the rights and sort out all the royalties to reproduce all the trademarks in the ads. But I could probably get away with circulating my scans openly, I don't think a British court could touch me here in the US. And some of these magazines don't exist anymore and no company has any financial interests in the content, so there's nobody left to file a lawsuit.

  32. You could buy one of these... by cwgmpls · · Score: 1

    Ray Kurzweil has tweaked a digital camera to do just what you describe, for the purpose of providing a portable text reader for blind people. They are for sale now, for about $3500 each.

  33. What I've seen by Ironsides · · Score: 1

    If you're doing low volume work, using a camera may be fine. However, once you start getting into the higher volume work, you want a scanner with a document feeder. Also, non-sheetfeed scanners are generally cheaper than 6-8 megapixel cameras, so I'm not sure why one would use a camera. Especially since OCR really only needs B&W and any camera of that quality is going to be color only. A black and white scan takes a lot less time than a color one.

    Back to the sheet feeds, I've worked with the Fujitsu fi-5220C series scanners before. The only time I've ever had a problem with the document feeder was when I forgot to remove a staple or paperclip. It's also quite fast, 3 seconds tops for a BW scan of an 8.5x14 and that includes the time to transfer the scanned image to the PC. I'd challenge you to find a camera that could keep that up for long, especially as you would have to manually change the pages yourself.

    --
    Fly me to the moon Let me sing among those stars Let me see what spring is like On jupiter and mars
  34. Everything should be free. by Anonymous Coward · · Score: 0

    With that mentality, why don't you hope they buy out a grocery store, or a gas station?

    1. Re:Everything should be free. by wknoxwalker · · Score: 1

      I live off tinned food bought from froogle.

  35. Not an entirely original idea... by AceCaseOR · · Score: 1

    Following World War II, a lot of photography businesses would photograph the discharge and service records of US servicemen and women, so they could present a copy of their records when applying for various benefits offered to veterans. My grandmother did this (the photography) after the war. She was a veteran as well - she worked for US Navy Intelligence working on Japanese military cyphers. She also analysed aerial photographs of enemy positions, but I digress.

    Unfortunately, my grandmother passed away, so I cannot get any information from her on how the photographs were taken, but one of my family members is working on a biography of my grandmother, and she might have the information.

    --
    Zagreus sits inside your head, Zagreus lives among the dead, Zagreus sees you in your bed and eats you in your sleep.
  36. Manga scanning setup with digital camera by Kalak · · Score: 1

    It took a bit of digging to find it again, but I ran across this in some huge chain of events that is way off topic, but the result is one translator's approach to using a digital camera to scan his manga collection without unbinding. The foot pedal idea is what tipped off my memory.

    http://www.mrdummy.net/mangatranslation/tutorial01 .php

    --
    I am, and always will be, an idiot. Karma: Coma (mostly effected by .hack)
    1. Re:Manga scanning setup with digital camera by mencomenco · · Score: 1

      Excellent link, Kalak.

      This setup looks very good. The emphasis on reflection control is very important, especially copying text and images on coated stock.

      My first improvement would be daylight-balanced fluorescents (probably circular) rather than straight &/or a bunch of cheap 3M monitor anti-glare screens to polarize the lights. My current preferred lighting is Colortran halogen soft-boxes and a ring light around the lens, but that's specialized and expensive.

      I discovered when scanning two-sided pages that backing the subject page with black construction paper effectively kills "bleed-through" of the reverse side. I use a black felt camera table for this reason -- white tables also create uncontrollable glare. Use grey felt for 3d objects.

      Most zoom lenses show lowest distortion at mid to high focal lengths. This puts a lot of distance between lens and subject, so I mounted my camera stand upside down. It used to be screwed to the ceiling or wall, but now I have a section of cheap used pallet racking. Overhead mounting works well and stays out of my life most of the time. Digicams are incredibly cheap -- I have three - one on the stand, one for the field and one for backup. Canons are excellent, but HP's stuff is impressive and cheaper -- I suspect that like their laser & inkjet printers much of the camera technology is bought from Canon.

      A vacuum table will very effectively flatten single sheets - it can be as simple as pegboard & a shop vac if you make your own. Just turn off the vacuum between shots.

      Visit any good local printer if you want to see how a proper copy camera works. They have been doing this for decades.

  37. Scanning with scanR by Anonymous Coward · · Score: 0

    Hi this is Chris from scanR.

    It's nice to see discussion on this topic. We think cameras are a much better consumer platform for capturing physical information and making it accessible digitally. There are roughly 600 million cameras + camera phones shipping each year and they are easy to use. So if you don't have a scanner, and you certainly don't have one in your pocket, using a camera is a great substitute. Of course, scanR makes it better, including OCR keywording for search.

    Read this post on what makes a good camera for scanning: http://blog.scanr.com/scanr_blog/2006/04/making_every_pi.html

  38. Done it and it works well by Anonymous Coward · · Score: 0

    I have done document capture using a scanner and a digital camera.

    It falls into three main steps:
    - 1. Acquire the image via camera or scanner
    - 2. Fix image quality
    - 3. (optional) OCR the image

    I fix the image quality (i.e., reduce colors, fix the spotlight effect of a digital camera's flash, etc) by:

    1. Stretch the image's contrast using ImageMagick (similar to automatic level adjustment)

    convert.exe input_image_0.png -equalize image_1.png

    2. Reduce the number of shades of white. (changes all pixels within 10% of white to be white)

    convert.exe image_1.png -white-threshold 90% image_2.png

    3. Reduce the number of shades of black. (changes all pixels within 10% of black to be black)

    convert.exe image_2.png -black-threshold 10% image_3.png

    4. Adaptive threshold the image. It is important to set the window (WidthxHeight) to be about the height of two lines of text and about 4 characters wide. A smaller window runs faster but can introduce noise in between two adjacent lines of text.

    convert.exe image_3.png -lat 20x20 image_4.png

    5. (optional) OCR image_4.png
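
    The window size in step 4 depends on how many pixels a line of text occupies, which in turn depends on the capture resolution. A minimal sketch of that rule of thumb, assuming a 12 pt line pitch and an average character width of 6 pt (illustrative guesses, not figures from the post); `lat_window` is a hypothetical helper name:

```python
def lat_window(dpi, line_pitch_pt=12.0, char_width_pt=6.0):
    """Estimate a WIDTHxHEIGHT window for ImageMagick's -lat option.

    Rule of thumb from the post: about 4 characters wide and about
    2 lines of text tall. One typographic point is 1/72 inch.
    The default text metrics are assumptions; measure your own pages.
    """
    width = round(4 * char_width_pt * dpi / 72.0)
    height = round(2 * line_pitch_pt * dpi / 72.0)
    return f"{width}x{height}"


if __name__ == "__main__":
    for dpi in (72, 150, 300):
        print(dpi, "dpi ->", "-lat", lat_window(dpi))
```

    Under these assumptions a 300 dpi capture suggests a window near 100x100, so the 20x20 in the command above would suit much smaller text or a lower-resolution image. Measuring the line height in an image editor is more reliable than guessing the font size.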

    Other options:
    - Threshold by a percentage

    --> 50% threshold using
    convert x.png -monochrome out.png

    --> User definable threshold amount (e.g., 80%) using
    convert x.png -threshold 80% out.png

    Lastly, you can chain together commands so that the steps can be simply written as one line:

    convert input_image_0.png -equalize -white-threshold 90% -black-threshold 10% -lat 20x20 output_image.png
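
    For an unattended overnight batch, the one-line command can be wrapped in a small script. This is a sketch only: it assumes ImageMagick's `convert` is on the PATH and that the captures are PNG files in one folder; the function names and the `dry_run` switch are hypothetical.

```python
import subprocess
from pathlib import Path


def build_command(src, dst, window="20x20"):
    """Return the argument list for the chained clean-up command."""
    return [
        "convert", str(src),
        "-equalize",                 # stretch contrast
        "-white-threshold", "90%",   # near-white pixels become white
        "-black-threshold", "10%",   # near-black pixels become black
        "-lat", window,              # local adaptive threshold
        str(dst),
    ]


def clean_folder(in_dir, out_dir, dry_run=True):
    """Build (and optionally run) one command per PNG found in in_dir."""
    out = Path(out_dir)
    commands = []
    for src in sorted(Path(in_dir).glob("*.png")):
        cmd = build_command(src, out / src.name)
        commands.append(cmd)
        if not dry_run:
            out.mkdir(parents=True, exist_ok=True)
            subprocess.run(cmd, check=True)  # stop on the first failure
    return commands


if __name__ == "__main__":
    # Folder names are placeholders; dry_run=True only prints the commands.
    for cmd in clean_folder("captures", "cleaned", dry_run=True):
        print(" ".join(cmd))
```

    Running it with `dry_run=True` first shows exactly which commands would be executed before any files are written.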

    Noise removal can optionally be done before any other image processing.
    Noise removal methods:
    - NL filter for 'edge enhancement' in gimp (Filters -> Enhance -> NL Filter) (Use settings, Edge Enhancement, Alpha(0.60) Radius (1.0)). This is similar to the pnmnlfilt.exe in Netpbm - found at netpbm.sourceforge.net

    - Despeckle (not always helpful)
    - Dust and scratches in Photoshop
    - Smart blur in PhotoShop
    - Unsharp mask - It does a good job smoothing background noise but does not do a very good job at edge enhancement (NL Filter does a better job).

    ImageMagick is open source and can be had here http://www.imagemagick.org/
    Netpbm can be had here http://netpbm.sourceforge.net/

    I did this in two projects:

    - Scanning a large number of oversized newspapers (10 inches by 12.5 inches) (several hundred pages)
    - Scanning in 8.5 inch x 11 inch tinted color pages from books (about 500 pages, messy because pages had a color background)

    The multiple scan for an oversize page takes too much time because you have to scan the page face down on the scanner.
    This requires you to:
    1. lift up the paper,
    2. flip to the next page,
    3. flip over,
    4. place on scanner (align it)
    5. scan (suggest 300 dpi or higher black and white (monochrome) scan)
    6. align page for second scan
    7. scan (suggest 300 dpi or higher black and white (monochrome) scan)
    8. (later) put both scans into a single image file, or, much more time consuming, join the two scans into a single seamless image

    This takes about 1 minute per page given a 10 second per scan scanning time.

    A digital camera would take about 10 seconds per page without any post processing, page joining in an image editor, etc.
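
    To put those per-page times in perspective, here is the arithmetic for a hypothetical 500-page job (the page count is an assumption chosen for illustration; the per-page times are the ones quoted above):

```python
def total_hours(pages, seconds_per_page):
    """Total capture time in hours for a job of the given size."""
    return pages * seconds_per_page / 3600.0


if __name__ == "__main__":
    pages = 500  # hypothetical job size
    print(f"two-pass flatbed scan: {total_hours(pages, 60):.1f} h")
    print(f"camera snapshot:       {total_hours(pages, 10):.1f} h")
```

    That works out to roughly eight hours of scanning versus under an hour and a half of shooting, before any post-processing.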

    The main OCR issue with a camera is that you need to approach the quality of a 250+ DPI scanner image. You need to take a camera snapshot at more than 250 dpi to get near scanner quality because the camera will distort the image, add noise, over-sharpen, etc. I've had good results taking images that work out to 350 dpi with a digital camera.
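
    The resolution a given camera delivers on a given page can be checked with a little arithmetic. A rough sketch, assuming the page exactly fills the frame and taking 3264 x 2448 pixels as an example 8-megapixel sensor (that sensor size and both function names are illustrative assumptions):

```python
def effective_dpi(px_long, px_short, page_long_in, page_short_in):
    """Dots per inch on the page, limited by whichever axis is tighter."""
    return min(px_long / page_long_in, px_short / page_short_in)


def required_megapixels(dpi, page_long_in, page_short_in):
    """Megapixels needed to cover the page at the requested dpi."""
    return (dpi * page_long_in) * (dpi * page_short_in) / 1e6


if __name__ == "__main__":
    # Letter page and the 10 x 12.5 inch newspaper page on the example sensor.
    print(round(effective_dpi(3264, 2448, 11, 8.5)))
    print(round(effective_dpi(3264, 2448, 12.5, 10)))
    # Pixels needed for 350 dpi on a letter page.
    print(round(required_megapixels(350, 11, 8.5), 2))
```

    With these numbers the example sensor yields about 288 dpi on a letter page and about 245 dpi on the newspaper page, while 350 dpi on a letter page needs roughly 11.5 megapixels. Real framing always wastes some border, so the usable figure will be somewhat lower.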

  39. You'd be amazed. by Grendel+Drago · · Score: 1

    They had USB scanners at the local Goodwill for about six bucks. Six bucks! 'Course, you had to dig through a box of wall warts, and at least one of them had a mangled drive belt, but if you're willing to bring a laptop, it's hella cheap.

    --
    Laws do not persuade just because they threaten. --Seneca
  40. Too big by GWBasic · · Score: 1

    It'll be too big. I have a standalone USB-powered scanner that's 1" thick. (It's now retired because my printer came with an integrated scanner.) It was great when I was in college, because I could stick it in my backpack and take it to the library.

    It sounds like you want to perform scanning in a batch job. Perhaps an off-the-shelf solution is better, even if it's slow? (You'll be asleep, at work, etc.) Do the old HP ScanJets allow for batch scanning?

  41. Real camera solution by nuggz · · Score: 2, Informative

    1 Don't use a tripod, use a document photo stand.
    Think of an overhead projector with the camera where the mirror is for vertical adjustment.
    2 Have a guide for the paper, not that hard.
    3 Lighting is an important one, but as long as it's even the type of light doesn't really matter if you set your white balance correctly.
    4 If it is a rigid setup doesn't really matter
    5 Use the camera control software on the computer; you don't really need to touch the camera.
    6 Save the file and run the OCR software.

    I use a similar setup to take photos of test parts at work, works nicely.

  42. only if you need to scan books non destructively by petermgreen · · Score: 1

    i believe that for in-print books even google is cutting off the spines and shoving them through a sheet feeder, far less labor that way.

    yes, if you are archiving rare books there isn't much choice, but for most applications sheet feeding or flatbed is fine (yes, flatbed without sheet feeding is laborious, but i'm not convinced these "planetary scanners" are any less so)

    --
    note: i'm known as plugwash most places but i screwed up registering that here somehow in the past and now can't register
  43. low res goatse by Anonymous Coward · · Score: 0

    OCR seems to work with Goatse.

    _______
    |=(0)=|
    | | | |

  44. Scan your receipts at least. by Anonymous Coward · · Score: 0

    It is a good idea to scan the receipts you get from cash registers when buying something with a warranty.

    They are often in odd formats, making it hard to punch holes in them and store them in a binder.

    But more important: if the receipt is printed on cheap thermo-sensitive paper, the print fades away.
    What good is a 2-year warranty if the receipt is just a blank piece of paper after 3 months?

    Leif