Slashdot Mirror


Open Source OCR That Makes Searchable PDFs

An anonymous reader writes "In my job all of our multifunction copiers scan to PDF, but many of our users want and expect those PDFs to be text-searchable. I looked around for software that would create text-searchable PDFs, but most of it is very expensive, and I couldn't find any that was open source (free). I did find some open source packages, like CuneiForm and ExactImage, that could in theory do the job, but they were hard to install and difficult to set up and use over a network. Then I stumbled upon WatchOCR. This is a Live CD distro that can easily create a server on your network that provides an OCR service using watched folders. Now all my scanners scan to a watched folder; WatchOCR picks up those files, OCRs them, and then spits them out into another folder. It uses CuneiForm and ExactImage, but it is all configured and ready to deploy. It can even be remotely managed via the Web interface. Hope this proves helpful to someone else in the same situation."
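
For readers unfamiliar with the watched-folder pattern the submitter describes, here is a minimal sketch of the general idea in Python. It is not WatchOCR's code and says nothing about how WatchOCR is implemented. The function names (`process_once`, `copy_only`) and the folder layout are hypothetical, and the OCR step is a placeholder that only copies the file; a real deployment would substitute a call to an actual OCR tool.

```python
import shutil
from pathlib import Path
from typing import Callable, List


def copy_only(src: Path, dst: Path) -> None:
    """Hypothetical stand-in for the OCR step: copies the file unchanged."""
    shutil.copyfile(src, dst)


def process_once(
    inbox: Path,
    outbox: Path,
    ocr: Callable[[Path, Path], None] = copy_only,
) -> List[Path]:
    """Make one pass over the watched folder and return the files written.

    Every *.pdf in `inbox` is handed to `ocr`, which must write its result
    to the given destination path. The source file is removed afterwards
    so it is not picked up again on the next pass.
    """
    outbox.mkdir(parents=True, exist_ok=True)
    written = []
    for src in sorted(inbox.glob("*.pdf")):
        dst = outbox / src.name
        ocr(src, dst)      # placeholder: a real setup would run OCR here
        src.unlink()       # consume the input file
        written.append(dst)
    return written
```

A service built this way would call `process_once` in a loop with a short sleep between passes. The sketch leaves out details a production setup would likely need, such as skipping files the scanner is still writing and handling OCR failures.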

5 of 133 comments

  1. Wait a sec by inKubus · Score: 5, Funny

    There's something wrong with this Slashvertisement--it's for a free product!

    --
    Cool! Amazing Toys.
  2. Re:ocr by 0100010001010011 · Score: 3, Funny

    Now it just needs to incorporate a Recaptcha Lite to improve accuracy.

    Maybe something on the web interface, so that when it doesn't recognize a word you can correct it.

    [Given the success of Cow Clicker on Facebook, maybe turn it into a Facebook game. Tell people they're only allowed to correct words every 6 hours. If they want to correct more words, they'll have to pay for it. Add friends and correct more words to level up!]

  3. Re:added. by b4dc0d3r · Score: 3, Funny

    Saw this on facebook.

    That isn't a good sign, my friend.

  4. Re:Anyone got error rates? by adavies42 · Score: 1, Funny

    > tesseract-based

    you need 4d software to scan 2d text? trippy....

    --
    Media that can be recorded and distributed can be recorded and distributed.
    -kfg
  5. Re:commercial? by FelixNZ · Score: 3, Funny

    The sole support staffer's user name is 'ganjadude'; I am a little wary :)