How To Manage Hundreds of Thousands of Documents?
ajmcello78 writes "We're a mid-sized aerospace company with over a hundred thousand documents stored out on our Samba servers that also need to be accessed from our satellite offices. We have a VPN set up for the remote sites and use the Samba net use command to map the remote shares. It's becoming quite a mess, sometimes quite slow, and there is really no naming or numbering convention in place for the files and directories. We end up with mixed casing, all uppercase, all lowercase, dashes and ampersands in the file names, and there are literally hundreds of directories to sort through before you can find the document you are looking for. Does anybody know of a good system or method to manage all these documents, and also make them available to our satellite offices?"
Google them? http://www.google.com/enterprise/search/gsa.html
I Need someone to rebuild a Digitech Digital Delay pedal for me....for me...for me...for me.
I know I'm gonna get hit for blurting out the Microsoft Solution but...give SharePoint a shot...
...in bed
If you users are naming their files with strange characters in them (assuming it's not due to Samba) then they will just have to live with it, you won't have time to sort out all the wierd names that (mostly MS-Word) users give to their filenames. The primary objective should be to give your users access to the files. Making the directory listing pretty ought to be a secondary concern.
Absolutely correct. However, I would take it a step further and say that you need a document management system that manages security, meta-data, retention, disposition, etc. Examples are Documentum, IBM FileNet P8, Alfresco, etc. Here's a place to start readin: http://www.cmswire.com/.
Other than this text, there is no discernible information contained in this sig.
http://en.wikipedia.org/wiki/Document_management_system
For that level of documentation you need to have a staff and get it properly indexed. You need a high level librarian. This would be someone with a masters degree at minimum in library science and at least a bachelors in information technology. They will not come cheap and they are a long term investment. The software is available, it is not trivial. Hiring a large number of people to recategorize and tag all the documents for the length of time that takes is also an expense but worth it. Once it's all in place maintaining it gets much easier.
I've seen a system developed for Raytheon. They took all the old compartmentalized data Hughes had and put every scrap of paper through a scanner. It was exceptionally well done. This would display electronic files and would have the location of hard copy. Classified documents were in some cases indexed but were hard copy only afaik. There were some documents that were hard copy only, those were usually ones with an NDA or other restriction on making electronic copies. It had every thing mentioned wrt versioning and such. Documents spanned decades with hundreds of revisions and you could pull up and view any revision. Depending on how recent and what type of document you could view a change log. Older scanned ones did not have that unless they'd been important enough to reenter as modern documents which meant OCR or manually transcribed. Some schematics were reentered into the system in a modern format. The effort was worth it. Having that data is the only way some devices or parts could be made or repaired.
http://en.wikipedia.org/wiki/Document_management_system
I'd go on a Vegan diet but the delivery time from Vega is too long. --brownkitty
It's becoming quite a mess, sometimes quite slow, and there is really no naming or numbering convention in place for the files and directories. We end up with mixed casing, all uppercase, all lowercase, dashes and ampersands in the file names, and there are literally hundreds of directories to sort through before you can find the document you are looking for.
Slow. Upgrade your network and VPN. You know that VPN layer is just killing your performance.
No naming or numbering convention. Get one.
Mixed casing. Learn How to Properly Case Folders (and documents).
Dashes and ampersands. Are they a problem? Aesthetically unpleasant? I personally restrict punctuation in a filesystem to dashes, periods, and parenthesis (unless the punctuation is a replicable part of the name of the file/folder).
Examples:
01 - The First Track (vocal)
02 - $lashhvertisements Attack!
03 - Where Have All the A.C.'s Gone
Develop your own method that works and be obsessed about it to the point where you would reburn a disc if one of the filenames was "01-Name" instead of "01 - Name".
Hundreds of directories.
Each file should have it's own folder.
"That's insane!" you say. Start out with this mentality. If there is no reason at all to separate two files (they are part of the same thing) then place them in one folder, and make sure the folder is named all-encompasingly. Repeat for all files. If you get into a AB, BC, but not ABC situation, the solution is to have A and B and C, with A and C linking to B with your choice of shortcut/link/symlink/etc.
Do this until all files are in folders. Then repeat with folders.
There is NO substitute for organization and getting people on the same page. Develop some conventions. Task people to fix as they go. Check up to make sure people accessing documents are fixing as they go, and doing so according to convention. Once people are used to the convention, and once things are relatively organized, they won't ever need to search again. They'll instantly know where 99% of things are, and will be able to dig around and find anything else within seconds.
The main problem you face is getting organized after already being unorganized. It isn't easy, but at least you're not dealing with millions of paper documents.