Slashdot Mirror


Paperless Office Solutions Under Linux?

sholgate asks: "I've been asked to look into implementing a paperless office under Linux. We receive emails, letters, Word documents, PDFs, etc., and need a way of converting and storing them that makes them easy to search and access. We've been offered two Windows solutions, one based on Canon ScanFile and the other using Lotus Notes. My office went with Canon back in 1995 and now has a load of unreadable CDs, as the original software was DOS-based and doesn't seem to work under Win98/XP. We now face paying for conversion to the new system plus new license fees. We are primarily Linux/Unix-based here, so Windows is inconvenient, and history has shown that a closed product is not a good solution. I favour a directory-browsing system based on thumbnails (such as Nautilus or Konqueror) and searching with grep, but I can see the benefits of more complex systems that store a database of search terms, etc. Have other Slashdotters thought about paperless offices? What answers did you come up with?"
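The trade-off between grepping files and keeping "a database of search terms" can be illustrated with a minimal inverted index. This is a hypothetical sketch, assuming the documents have already been converted to plain text (for example by OCR or a PDF-to-text step); `build_index` and `search` are illustrative names, not part of any product mentioned in this thread.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs):
    """Map each word to the set of document names that contain it.

    `docs` is a dict of {document name: extracted plain text}.
    """
    index = defaultdict(set)
    for name, text in docs.items():
        for word in tokenize(text):
            index[word].add(name)
    return index

def search(index, query):
    """Return the names of documents containing every word in `query`."""
    words = tokenize(query)
    if not words:
        return set()
    # Copy the first posting set so the intersection never mutates the index.
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

docs = {
    "letter-2003-01.txt": "Invoice for scanner maintenance, due March",
    "email-0042.txt": "Re: scanner invoice query",
    "report.txt": "Annual report on document storage",
}
index = build_index(docs)
print(sorted(search(index, "scanner invoice")))
# prints ['email-0042.txt', 'letter-2003-01.txt']
```

Unlike grep, which rescans every file on each query, the index is built once and each lookup is a dictionary access plus a set intersection; the price is keeping the index current as new documents arrive.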

4 of 44 comments

  1. Google search appliance by Tomah4wk · Score: 4, Interesting

    A Google Search Appliance sounds like it would suit at least your search requirements. It can also look through MS Office documents (I assume these get emailed to you) and PDF documents, and display them as HTML in your browser. For your letters, Clara OCR is free (as in beer; not sure about as in speech) for Linux (it's Debian-packaged, anyway).

    Hope this helps.

  2. Good scanners and outsourced proofreading. by duffbeer703 · Score: 3, Interesting

    It depends on what you want to do.

    I've worked with a state agency which, not surprisingly, handles a lot of paperwork. They have a scanning solution that brings in the images, stores them in a graphics format (I think TIFF), and indexes each document under the case number it is associated with. Meta-info can be added by the people who work with the documents.
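    A setup like this (page images stored as files, indexed under a case number, with metadata added by staff) might be modelled with a small relational schema. This is a hypothetical sketch using Python's built-in `sqlite3`; the table names, columns, and file paths are invented for illustration and are not taken from the agency's actual system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # use a file path for a persistent index
conn.executescript("""
CREATE TABLE documents (
    id          INTEGER PRIMARY KEY,
    case_number TEXT NOT NULL,
    image_path  TEXT NOT NULL  -- e.g. a TIFF on the file server
);
CREATE TABLE metadata (
    document_id INTEGER NOT NULL REFERENCES documents(id),
    key         TEXT NOT NULL,
    value       TEXT NOT NULL
);
CREATE INDEX idx_case ON documents(case_number);
""")

def add_document(case_number, image_path):
    """Register a scanned image under its case number; return its id."""
    cur = conn.execute(
        "INSERT INTO documents (case_number, image_path) VALUES (?, ?)",
        (case_number, image_path),
    )
    return cur.lastrowid

def tag(document_id, key, value):
    """Attach a piece of meta-info added by someone working the case."""
    conn.execute(
        "INSERT INTO metadata (document_id, key, value) VALUES (?, ?, ?)",
        (document_id, key, value),
    )

def images_for_case(case_number):
    """Return the image paths filed under one case, in insertion order."""
    rows = conn.execute(
        "SELECT image_path FROM documents WHERE case_number = ? ORDER BY id",
        (case_number,),
    )
    return [path for (path,) in rows]

doc = add_document("C-1042", "/scans/C-1042/0001.tif")
add_document("C-1042", "/scans/C-1042/0002.tif")
add_document("C-2001", "/scans/C-2001/0001.tif")
tag(doc, "type", "signed application")
print(images_for_case("C-1042"))
# prints ['/scans/C-1042/0001.tif', '/scans/C-1042/0002.tif']
```

    Keeping the original images as the record and putting only the index and metadata in the database means the images stay readable even if the indexing software is later replaced.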

    Note that if you need legal proof of a signature, or if your auditors require you to keep documents for x years, they must be kept in graphic format; an OCR'd document in ASCII text won't fly.

    If you are looking to automate data entry, get a high-speed commercial scanner (if you have a large volume) from a company like Bell & Howell and outsource the OCR work to another company. Tons of companies do this (Lockheed Martin does it for most federal agencies). The outsourcers send your documents to a third-world country like Ghana for proofreading. OCR is only about 95% accurate, so fully automated OCR without proofreading is not reliable enough for anything!
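    To see why 95% accuracy still means heavy proofreading, assume (for illustration only) that the figure is per-character accuracy p and that a typical typed page holds about N = 2,000 characters. The expected number of errors per page, and the chance of an error-free page if errors were independent, work out as:

```latex
\mathbb{E}[\text{errors per page}] = (1 - p)\,N = (1 - 0.95) \times 2000 = 100

P(\text{error-free page}) = p^{N} = 0.95^{2000} \approx 3 \times 10^{-45}
```

    Under those assumptions essentially every page needs a human pass, which is the case for outsourced proofreading.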

    The free Ziff-Davis magazine "Baseline" ran an article about this a couple of months ago; you might want to find their website (or look through the pile of free mags on your desk) and see if you can find it.

    Don't shop for a solution based on platform, "Free"/non-"Free", etc. A "Free" solution will take longer, and your cost driver will be the implementation, not the initial licensing cost.

    Get whatever provides you with the best solution, period.

    --
    Conformity is the jailer of freedom and enemy of growth. -JFK

  3. DjVu is better for this than PDF by 0x0d0a · Score: 4, Interesting

    I will grant that PDF can store scanned documents, but it's really designed for, and best at, storing files printed directly to PDF; otherwise, you end up with absolutely massive files. Unfortunately, it's commonly used for scans anyway. Even PNG would be much better.

    DjVu is an interesting format that was designed primarily for storing scanned documents.

    It uses a couple of techniques, such as an OCR'd (or pseudo-OCR'd) text layer and multiple embedded image layers within the file: a sharp black-and-white layer for text and line art, and compressed images for photographs and backgrounds. The idea is that, say, a scanned magazine page with text and a photographic image is stored as text, a compact description of the letter shapes, and a compressed image of the photograph.
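    The size problem is easy to quantify. A minimal sketch, assuming a US Letter page (8.5 x 11 in) scanned at 300 dpi and counting raw, uncompressed bytes only: a full-colour scan is 24 times larger than a 1-bit scan of the same page, which is why keeping text as a high-resolution black-and-white layer and the photograph as a separate, lower-resolution colour image pays off. The 100 dpi colour layer is an illustrative assumption, not a figure from the DjVu specification.

```python
def raw_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Uncompressed size in bytes of a page scanned at `dpi`."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return pixels * bits_per_pixel // 8

# US Letter page, 8.5 x 11 inches
colour_300 = raw_bytes(8.5, 11, 300, 24)      # whole page as a colour image
mask_300 = raw_bytes(8.5, 11, 300, 1)         # black-and-white text layer
background_100 = raw_bytes(8.5, 11, 100, 24)  # colour layer at 1/3 resolution

print(f"24-bit colour at 300 dpi: {colour_300:,} bytes")      # 25,245,000
print(f"1-bit text at 300 dpi:    {mask_300:,} bytes")        # 1,051,875
print(f"24-bit colour at 100 dpi: {background_100:,} bytes")  # 2,805,000
print(f"layered total:            {mask_300 + background_100:,} bytes")  # 3,856,875
```

    These are sizes before any compression; real formats compress each layer with a codec suited to it, but the layered version starts from roughly 3.9 MB instead of about 25 MB.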

  4. Paperless Solution by Anonymous Coward · Score: 1, Interesting

    One great system we've installed where I work is from a company called Stellent (www.stellent.com). It's a great piece of software, configurable and changeable to your heart's content. The server is Java-based and runs on Linux/Solaris/Windows with Apache.
    It can convert documents automatically to PDF, and it stores both the PDF and the native file in a specified directory in the filesystem.
    The only sad thing is that it's not open source (though you can modify anything you want anyway), and it can get expensive depending on the number of users who will be checking in files.

    We've been using it here for over a year now, and most people love it. Documents are easier to find than before, and we don't lose documents like we used to.
    Just wanted to pass this one on.
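    The "store both the PDF and the native file" approach is simple to replicate in a home-grown system. A hypothetical sketch in Python, assuming the PDF rendition has already been produced by some external converter; the directory layout and the `check_in` function are invented for illustration and are not Stellent's API.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path

def check_in(repo, native, pdf):
    """Copy a native file and its PDF rendition into `repo`.

    Files are filed under a directory named after the first 12 hex
    digits of the native file's SHA-256, so checking in the same file
    again lands in the same place. Returns that directory.
    """
    digest = hashlib.sha256(Path(native).read_bytes()).hexdigest()[:12]
    dest = Path(repo) / digest
    dest.mkdir(parents=True, exist_ok=True)
    shutil.copy2(native, dest / Path(native).name)
    # Name the PDF after the native file so the pair is easy to spot.
    shutil.copy2(pdf, dest / (Path(native).stem + ".pdf"))
    return dest

# Demonstration with throwaway files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    native = tmp / "minutes.doc"
    pdf = tmp / "minutes-rendered.pdf"
    native.write_bytes(b"native word-processor bytes")
    pdf.write_bytes(b"%PDF-1.4 placeholder")
    stored = check_in(tmp / "repo", native, pdf)
    print(sorted(p.name for p in stored.iterdir()))
    # prints ['minutes.doc', 'minutes.pdf']
```

    Keeping the native file alongside the PDF means you are not locked into whatever converter produced the PDF, which is the lesson of the unreadable-CD story in the question.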