Paperless Office Solutions Under Linux?
sholgate asks: "I've been asked to look into implementing a paperless office under Linux. We receive emails, letters, Word documents, PDFs, etc. and need a way of converting and storing them that provides easy searching and access. We've been offered two Windows solutions, one based on Canon ScanFile and the other using Lotus Notes. My office went with Canon back in 1995 and now has a load of unreadable CDs, as the original software was DOS-based and doesn't seem to work under Win98/XP. We now face paying for conversion to the new system plus new license fees. We are primarily Linux/Unix-based here, so Windows is inconvenient, and history has shown that a closed product is not a good solution. I favour a directory-browsing system based on thumbnails (such as Nautilus or Konqueror) with searching via grep, but I can see the benefits of more complex systems that store a database of search terms, etc. Have other Slashdotters thought about paperless offices? What answers did you come up with?"
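The trade-off between grep and "a database of search terms" comes down to this: grep rereads every file on each query, while an inverted index maps each word to the documents that contain it once, up front. Below is a minimal Python sketch, assuming each document's text has already been extracted to plain text (by OCR, pdftotext or similar); the file names are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each word to the set of document names that contain it."""
    index = defaultdict(set)
    for name, text in docs.items():
        for word in tokenize(text):
            index[word].add(name)
    return index


def search(index, query):
    """Return the documents that contain every word in the query (AND search)."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


if __name__ == "__main__":
    # Hypothetical extracted text; in practice it would come from OCR,
    # PDF-to-text conversion, email bodies, etc.
    docs = {
        "letters/2002-03-01-acme.txt": "Invoice 1234 from Acme for toner cartridges",
        "email/2002-03-04-bob.txt": "Bob asks about the Acme contract renewal",
        "pdf/2002-02-11-bank.txt": "Quarterly bank statement",
    }
    index = build_index(docs)
    print(sorted(search(index, "acme")))
    print(sorted(search(index, "acme invoice")))
```

For a modest archive, `grep -ril` over the extracted text files is probably fast enough; an index like this starts to pay off as the archive grows.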
A Google Search Appliance sounds like it would suit at least your search requirements. It can also look through MS Office documents (I assume these get emailed to you) and PDF documents and display them as HTML in your browser. With regard to your letters, Clara OCR is free (as in beer; I'm not sure about free as in speech) for Linux (it's packaged for Debian, anyway).
Hope this helps.
Zope might be a good start for you:
http://www.zope.org
Vote green: crap in the ballot box!
Every attempt I've ever seen to go "paperless office" has been a failure - if all you end up with is a set of unreadable CDs a few years later, you've done very well so far. Personally, I think a paperless office is about as useful as a paperless toilet.
-John
htdig (ht://Dig) supports MS Office documents and PDFs, and sounds a little cheaper than a Google Search Appliance (although I'm sure a shiny yellow solution does a good job).
I've never used a modem in Linux, so I have no idea what the telephony capabilities are.
I tend to agree with most of the replies here, however. I tried my hardest to save a tree here and there, but the other system administrator here prints EVERYTHING out. Until you can fire all the idiots and be left working alone, I'd skip the "paperless office" idea and spend more time working on projects.
HP makes 40% of its money on printer cartridges. Printer stuff is consumable. That means I use it up and buy more.
I have spent more on printer paper and ink than all my computer hardware put together in the last 5 years.
Paperless office is a dream.
-- Andy
There are a few solutions. Basically what you're looking for is a nice front end to a virtual filesystem, with some bells and whistles.
Take a bunch of paper, scan it, index it, file it. Additionally, do the same for non-scanned work (email, doc, pdf, ...).
Windows-wise, Doctrieve (now Redmap Networks; look for a similar product) is a good solution. There's a range of products, all providing more or less similar functionality (some more bells here, some fewer whistles there...). Non-Windows-wise, there's an open-source one called DocMgr which provides similar functionality, albeit a bit immature.
OCR is really the big issue here with scanned work. I've only dabbled with OCR under Linux (using GOCR), with limited success. Bad OCR == bad indexing == useless searching.
I'm currently in the process of writing something similar targeted at the higher-end market. If you're interested in testing or evaluating it, drop me an email.
The Doctrieve link is not Mozilla friendly:
"To view this site you must be using Microsoft® Internet Explorer 4 or above.
If you do not have a copy of Microsoft® Internet Explorer please use the links below to download a FREE copy.
We look forward to you visiting us at www.doctrieve.com"
A software company that is that ignorant about how to make web pages might not be the best business partner.
> We are primarily Linux/Unix based here so Windows is inconvenient
You are a lucky SOB.
Since it sounds like you are already receiving the documents electronically, you need a content management system. There are plenty out there, and the right one depends on the types of things you want to do. There's Stellent, which is primarily a content management system for documents, but I don't know what sort of Linux support they have. There's also Interwoven, which is geared a little more toward web content management.
Another poster has mentioned Lotus, but there is also a product from IBM called IBM Content Manager that runs on DB2 and WebSphere (both of which run on Linux) and gives you really powerful storage and delivery of your stored content.
Of course, you could always check SourceForge which shows at least a dozen projects with "Content Management" in their descriptions...
It depends on what you want to do.
I've worked with a state agency which, not surprisingly, handles a lot of paperwork. They have a scanning solution which brings in the images, stores them in a graphics format (I think TIFF), and indexes each document under the case number it is associated with. Meta-info can be added by the people who work with the documents.
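The state-agency setup (images on disk, indexed by case number, with notes added by staff) maps naturally onto a small relational table. Here is a sketch using Python's built-in sqlite3 module; the schema, case numbers and paths are all hypothetical, and it records where each image lives rather than storing the image itself.

```python
import sqlite3


def open_store(path=":memory:"):
    """Create (if needed) a table of scanned images keyed by case number."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS documents (
               id INTEGER PRIMARY KEY,
               case_number TEXT NOT NULL,
               image_path TEXT NOT NULL,  -- e.g. a TIFF file on disk
               note TEXT DEFAULT ''       -- meta-info added by staff
           )"""
    )
    # An index on case_number keeps "show me everything for this case" fast.
    conn.execute("CREATE INDEX IF NOT EXISTS idx_case ON documents(case_number)")
    return conn


def add_document(conn, case_number, image_path):
    """Register a scanned image under a case number; return its row id."""
    cur = conn.execute(
        "INSERT INTO documents (case_number, image_path) VALUES (?, ?)",
        (case_number, image_path),
    )
    conn.commit()
    return cur.lastrowid


def annotate(conn, doc_id, note):
    """Append a line of meta-info to a document, keeping earlier notes."""
    conn.execute(
        "UPDATE documents SET note = note || ? WHERE id = ?",
        (note + "\n", doc_id),
    )
    conn.commit()


def documents_for_case(conn, case_number):
    """List (image_path, note) pairs for one case, in the order they were added."""
    rows = conn.execute(
        "SELECT image_path, note FROM documents WHERE case_number = ? ORDER BY id",
        (case_number,),
    )
    return rows.fetchall()


if __name__ == "__main__":
    conn = open_store()  # in-memory database, just for the demo
    doc = add_document(conn, "C-1001", "scans/c1001-p1.tif")
    annotate(conn, doc, "Signed by applicant")
    print(documents_for_case(conn, "C-1001"))
```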
Note that if you need legal proof of a signature, or if your auditors require you to keep documents for x years, they must be kept in image format; an OCR'd document in ASCII text won't fly.
If you are looking to automate data entry, get a high-speed commercial scanner (if you have a large volume) from a company like Bell & Howell and outsource the OCR work to another company. Tons of companies do this (Lockheed Martin does it for most federal agencies). The outsourcers send your documents to a third-world country like Ghana for proofreading. OCR is only about 95% accurate, and automated OCR is not reliable enough for anything!
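To see why an accuracy figure like 95% is not good enough for unattended data entry, here is the arithmetic. It assumes the figure is per character, that errors are independent, and that a page holds about 2,000 characters; all three are assumptions for illustration.

```python
def expected_char_errors(chars_per_page, char_accuracy):
    """Expected number of misread characters on one page."""
    return chars_per_page * (1 - char_accuracy)


def word_correct_probability(word_length, char_accuracy):
    """Chance a whole word is read perfectly, assuming independent character errors."""
    return char_accuracy ** word_length


if __name__ == "__main__":
    acc = 0.95  # the "about 95%" figure, read as per-character accuracy
    print(f"errors per 2,000-character page: {expected_char_errors(2000, acc):.0f}")
    print(f"chance a 6-letter word is intact: {word_correct_probability(6, acc):.1%}")
```

Under those assumptions, roughly one six-letter word in four would contain at least one error, which is also why bad OCR makes for bad indexing.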
The free Ziff-Davis magazine "Baseline" ran an article about this a couple of months ago; you might want to find their website (or look through the pile of free magazines on your desk) and see if you can find it.
Don't shop for a solution based on platform, "Free"/non-"Free", etc. A "Free" solution will take longer, and your cost driver will be the implementation, not the initial licensing cost.
Get whatever provides you with the best solution, period.
Conformity is the jailer of freedom and enemy of growth. -JFK
Otherwise, really, that's about all there is to it: normal Linux/Unix LP print services. Switch to that and you'll never have to replace your toner cartridges again!
:-)
DO NOT LEAVE IT IS NOT REAL
I will grant that PDF can store scanned documents, but it's really designed for, and best at, storing documents printed directly to PDF; otherwise, you end up with absolutely massive files. Unfortunately, it's commonly used for scans anyway. Even PNG would be much better.
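To put "absolutely massive" in perspective, this computes the uncompressed size of one scanned US Letter page (8.5 x 11 inches) at 300 dpi, an assumed but typical scanning resolution. Whatever container you use (PDF, PNG, TIFF, DjVu), the final size depends on how these pixels are compressed, so scanning text pages in 1-bit black and white rather than 24-bit color makes a large difference before compression even starts.

```python
def raw_page_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Uncompressed size in bytes of one scanned page (ignores row padding and headers)."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bits_per_pixel // 8


if __name__ == "__main__":
    # US Letter at an assumed 300 dpi.
    for label, bits in [("24-bit color", 24), ("8-bit grayscale", 8), ("1-bit black/white", 1)]:
        size = raw_page_bytes(8.5, 11, 300, bits)
        print(f"{label:18} {size / 1_000_000:6.2f} MB")
```

That is roughly 25 MB per page in color versus roughly 1 MB in black and white, before any compression is applied.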
DjVu is an interesting format that was designed primarily for storing scanned documents.
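It uses a couple of techniques: a kind of pseudo-OCR, in which the text is kept as a high-resolution black-and-white mask whose repeated character shapes are stored only once (JB2 compression), and separate, lower-resolution wavelet-compressed (IW44) images for the background and colors. An optional hidden OCR text layer makes the pages searchable. The idea is that, say, a scanned magazine page with text and a photograph is stored as a compact set of character shapes plus a separately compressed image of the photograph.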
More ignorance: "With the recent release of the new operating system Windows XP by Microsoft, Redmap Networks support wishes to advise that existing ManageEzy and ManagePoint will not run on the Microsoft Windows XP platform. We are currently striving towards a solution for Windows XP and this is expected to be completed by the 3rd quarter of 2002."
The company looks understaffed and underskilled. They gave themselves a year, and missed that deadline.
Forget about buying "paperless office" software. This is a dumb idea that only serves to funnel money into some unimaginative software company's pocket. If you need to save your correspondence, save it to a directory (folder). Make a rule or standard for filing and naming these documents - hell, in the old days companies would hire 'secretaries' to do this sort of thing - they didn't have to be intelligent or useful either - just organized. The cost of hiring a secretary wasn't too bad either - still isn't. The problems with these "paperless office" or "document management" systems are:
1. They are overkill.
2. They are usually proprietary.
3. It's one more thing for the average employee to f*ck up.
4. New employees will have to learn the system.
5. It costs money.
6. If you have a problem with it, you'd better hope the software provider is capable of fixing it.
7. It makes your data less mobile: if you decide to go with another system 10 years down the road, you will have to figure out a way to translate the data from the old system to the new one.
My advice: K.I.S.S. - Keep It Simple, Stupid!
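A filing rule can be as simple as one function that every scan or saved attachment passes through. This sketch uses a hypothetical convention (year/month directories, file names of the form date_correspondent_subject.ext) with ISO 8601 dates, so an ordinary `ls` lists files in chronological order.

```python
import re
from datetime import date
from pathlib import PurePosixPath


def slug(text):
    """Lower-case the text and collapse runs of non-alphanumerics into single hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def filing_path(received, correspondent, subject, extension):
    """Build a predictable relative path, e.g. 2002/03/2002-03-01_acme-corp_invoice-1234.pdf."""
    ext = extension.lstrip(".").lower()
    name = f"{received.isoformat()}_{slug(correspondent)}_{slug(subject)}.{ext}"
    return PurePosixPath(f"{received.year:04d}", f"{received.month:02d}", name)


if __name__ == "__main__":
    print(filing_path(date(2002, 3, 1), "Acme Corp.", "Invoice #1234", ".PDF"))
```

Because the structure is just directories and file names, it survives a change of software, which is exactly the data-mobility worry in point 7.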