How To Manage Hundreds of Thousands of Documents?

Google wave by Anonymous Coward · 2009-06-10 09:28 · Score: 1, Funny

I think it's in beta though.

Re:Google wave by Gerzel · 2009-06-10 09:51 · Score: 2, Insightful

Or, better yet, talk to people who've done it before. Seriously, there have been organizations managing hundreds of thousands of documents since the Roman Era; it's nothing new.
Re:Google wave by EQ · 2009-06-10 10:29 · Score: 4, Funny

"[O]rganizations managing hundreds of thousands of documents since the Roman Era,"
You mean The Vatican? I doubt that "small aerospace company" could afford to staff up on monks and monasteries.

--
Buffalo buffalo Buffalo buffalo buffalo buffalo Buffalo buffalo! http://goo.gl/J9bkO
Re:Google wave by theNetImp · 2009-06-10 10:36 · Score: 5, Funny

Monks work for free; they just need food and enlightenment. And if you get lucky, they fast, and then they only need the enlightenment aspect.
Re:Google wave by Anarchduke · 2009-06-10 12:51 · Score: 5, Insightful

There is a whole profession dedicated to this, and there is a college major specifically designed to teach people how to organize documents into meaningful collections.

I suggest your company look at hiring a library science major, since this is what they do.

--
who prays for Satan? Who in 18 centuries has had the humanity to pray for the 1 sinner that needed it most? ~Mark Twain
Re:Google wave by Hooya · 2009-06-10 17:05 · Score: 1

> ... food and enlightenment
You mean this? I didn't know monks were that picky about their window manager (wm).
Re:Google wave by iluvcapra · 2009-06-10 17:05 · Score: 1

And if you get lucky and the monks are still living in the dark ages, you only need the food aspect.

--
Don't blame me, I voted for Baltar.
Re:Google wave by makapuf · 2009-06-10 19:28 · Score: 1

Well, for e17 you certainly need enlightenment to achieve eternal life ...

Google to the rescue? by Shatrat · 2009-06-10 09:29 · Score: 4, Insightful

Isn't this the sort of thing that a google search appliance would be helpful for? Then you don't need to know the exact filename, just some specific information that can identify the file. This certainly solved my problem with having thousands of emails.

--
09 F9 11 02 9D 74 E3 5B D8 41 56 C5 63 56 88 C0

Re:Google to the rescue? by mikael · 2009-06-10 10:21 · Score: 2, Funny

Just put everything up on a P2P server - then everyone can look for the documents at the same time as they are looking for their favourite Linux distro.

--
Vintage computer adverts: http://www.vintageadbrowser.com/computers-and-software-ads
Re:Google to the rescue? by gwait · 2009-06-10 10:27 · Score: 2, Insightful

I agree - although you might want to eventually implement a systematic method of naming/storing your documents.
The Google appliance (or some other reasonably fast "WAN" search tool) would let you find files in the current rat's nest as is, making it easier to reorganize them according to the new "standard".

--
Bavarian Purity Law of Rice Krispie Squares: Rice Krispies, Marshmallows, Butter, Vanilla.
Re:Google to the rescue? by liquidsin · 2009-06-10 11:04 · Score: 5, Insightful

Use your users, if you can. I'm just talking out my ass here, but I'd think it a not-too-difficult matter to add some sort of user input form along the lines of "Hey, now that you've found the document you need, does the name fit the new naming scheme? If not, why not rename it so it fits!" This assumes you can trust your userbase not to be asshats and to follow the naming protocol.
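A minimal sketch of the check such a form could run before showing its prompt. The naming scheme here (`PROJECT-DOCTYPE-NNNN_title.ext`, the four doc types, and the helper names) is entirely hypothetical; substitute whatever protocol your company actually adopts.

```python
import re
from typing import Optional

# Hypothetical naming scheme (an assumption for illustration only):
#   <PROJECT>-<DOCTYPE>-<4-digit number>_<lowercase-title>.<ext>
#   e.g. TIGER-SPEC-0042_ion-injector.docx
NAME_PATTERN = re.compile(
    r"(?P<project>[A-Z]{2,10})-"
    r"(?P<doctype>SPEC|DRAW|TEST|MEMO)-"
    r"(?P<number>\d{4})_"
    r"(?P<title>[a-z0-9]+(?:-[a-z0-9]+)*)"
    r"\.(?P<ext>[a-z0-9]+)"
)


def follows_scheme(filename: str) -> bool:
    """True if the whole filename matches the naming scheme."""
    return NAME_PATTERN.fullmatch(filename) is not None


def rename_prompt(filename: str) -> Optional[str]:
    """Text for the 'now that you've found it...' form, or None if no prompt is needed."""
    if follows_scheme(filename):
        return None
    return (f"Now that you've found '{filename}', does the name fit the new scheme "
            "(PROJECT-DOCTYPE-NNNN_title.ext)? If not, why not rename it so it fits?")
```

So `rename_prompt("ion injector part 1 FINAL (2).doc")` produces a nag, while an already-conforming name like `TIGER-SPEC-0042_ion-injector.docx` gets no prompt at all, which keeps the form from annoying people who are already following the rules.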

--
do not read this line twice.
Re:Google to the rescue? by CozmicCharlie · 2009-06-10 11:12 · Score: 4, Insightful

Now I actually LOL'd on that one! Getting our userbase to actually give a flying fart about a naming protocol and then getting them to follow it!? I won't be holding my breath for either of those two things to happen...
Re:Google to the rescue? by Anonymous Coward · 2009-06-10 12:14 · Score: 2, Informative

Wow - Slashdot users must all be on their meds this week. Judging by the number of responses saying to buy a Google appliance, I'd put the paranoia level closer to blue than red. Where the hell is the open source insight?
I guess it's right here from good old Anonymous - ever heard of Solr (http://lucene.apache.org/solr/)? It's free, it's open source, and even if you hire a consulting company to set up an index of everything you have, you'll pay pennies on the dollar compared to a Google appliance! Plus, your soul will remain intact!
Just google for Solr consultants and you'll find them, no problem :-)
Re:Google to the rescue? by shri · 2009-06-10 13:43 · Score: 3, Informative

May I also suggest Yahoo/IBM's OmniFind as a free as beer alternative?
Re:Google to the rescue? by vrmlguy · 2009-06-10 13:54 · Score: 4, Interesting

Now I actually LOL'd on that one!
Getting our userbase to actually give a flying fart about a naming protocol and then getting them to follow it!?
I won't be holding my breath for either of those two things to happen...
You obviously don't know how to motivate people. Tell your boss you can get everything renamed for $100/week. Then post a leader board showing who has renamed the most documents each week, and give each week's winner a gift certificate to a local restaurant. Don't let anyone win more than once a month, to prevent too much disruption of normal job duties, and set up some sort of meta-moderation to prevent gaming the system. (You could probably use Slashcode out of the box: just make each document a story and suggest better names in the comments.)
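The winner-picking rule is simple enough to sketch. This assumes "no more than once a month" means a 30-day cooldown after a win, and that the weekly rename counts have already survived meta-moderation; the function and parameter names are made up for illustration.

```python
from datetime import date, timedelta
from typing import Dict, Optional

# Assumption: "no more than once a month" is read as a 30-day cooldown
# after a win; a calendar-month rule would work just as well.
COOLDOWN = timedelta(days=30)


def pick_weekly_winner(renames: Dict[str, int],
                       week_end: date,
                       last_win: Dict[str, date]) -> Optional[str]:
    """Return the eligible user with the most renames this week, or None.

    renames:  user -> documents renamed this week (after meta-moderation)
    last_win: user -> date of that user's most recent win
    Ties go to the alphabetically first user so the result is deterministic.
    """
    eligible = [
        (count, user)
        for user, count in renames.items()
        if count > 0
        and (user not in last_win or week_end - last_win[user] >= COOLDOWN)
    ]
    if not eligible:
        return None
    # Sort key: highest count first, then alphabetical by name.
    _, winner = min(eligible, key=lambda cu: (-cu[0], cu[1]))
    return winner
```

After each pick you would record `last_win[winner] = week_end`, so last week's champion sits out until the cooldown expires and someone else gets a shot at the gift certificate.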

--
Nothing for 6-digit uids?
Re:Google to the rescue? by SlashWombat · 2009-06-10 17:14 · Score: 2, Interesting

I agree - although you might want to eventually implement a systematic method of naming/storing your documents.
While this seems like a good idea on the surface, it never seems to work very well. Even verbose file names seem to fail miserably, as the first 100 or so letters are always the same (e.g. "Project Tiger Sausage rocket module assembly - Ion injector hardware part 1 ...").

Then there is the problem of getting all the employees to fully understand directory structures. Just look at your workmates' screens to see how many people save everything on their desktop. (Yes, really a Windows problem ... but so what.)

I used to get a local WAN search engine and let it index the entire site. It was much more useful, as it would find documents most people thought had disappeared years ago.

Another approach would be to have a database that assigns file names for various projects and/or functions, and to mandate that this be the only way files are named for storage on the WAN. This, however, does not get around the thousands of files already stored in weird places under weird names! That is why an already-indexed search engine works so well: not only does it extract the file names, it also picks up random (but significant) phrases within the scanned documents. (I used to use MAMMA, and it worked a treat! http://www.mamma.com/)
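For the "database that assigns file names" idea, a minimal single-writer sketch using SQLite from the Python standard library might look like this. The table layout, the `PROJECT-FUNCTION-NNNNN` format, and the function names are assumptions; a real multi-site service would need stronger locking around the counter than this shows.

```python
import sqlite3

# Hypothetical registry: every file stored on the WAN gets its name from
# allocate_name(); numbers are sequential per project.
SCHEMA = """
CREATE TABLE IF NOT EXISTS counters (
    project TEXT PRIMARY KEY,
    next_no INTEGER NOT NULL
);
CREATE TABLE IF NOT EXISTS documents (
    name         TEXT PRIMARY KEY,
    project      TEXT NOT NULL,
    doc_function TEXT NOT NULL,
    title        TEXT NOT NULL
);
"""


def open_registry(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def allocate_name(conn: sqlite3.Connection, project: str,
                  doc_function: str, title: str) -> str:
    """Reserve and return the next file name for a project, e.g. TIGER-SPEC-00001."""
    project, doc_function = project.upper(), doc_function.upper()
    with conn:  # commit on success, roll back on error
        row = conn.execute(
            "SELECT next_no FROM counters WHERE project = ?", (project,)
        ).fetchone()
        number = row[0] if row else 1
        conn.execute(
            "INSERT OR REPLACE INTO counters (project, next_no) VALUES (?, ?)",
            (project, number + 1),
        )
        name = f"{project}-{doc_function}-{number:05d}"
        conn.execute(
            "INSERT INTO documents (name, project, doc_function, title) "
            "VALUES (?, ?, ?, ?)",
            (name, project, doc_function, title),
        )
    return name
```

Because the registry also keeps the human-readable title, the name itself can stay short and unique; the descriptive part lives in the database instead of in the first 100 characters of the filename.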
Re:Google to the rescue? by Orlando · 2009-06-10 18:52 · Score: 1

Tell your boss you can get everything renamed for $100/week
Great, there's always a capitalistic solution to every problem.

--
-= This is a self-referential sig =-
Re:Google to the rescue? by jetole · 2009-06-10 19:59 · Score: 1

as a free as beer alternative?
Depends on what you mean by free-beer alternative.
"Software supports up to 500,000 documents" -Omnifind.

Well, it had me sold until I read that. I'm not looking for this software right now, but I thought it would be good to know about in case our firm ends up needing it. Still, I don't want to get software for free and then have to pay to upgrade it to something capable later; that's just not my style. OTOH, Solr from Apache looks worth checking out further.
Re:Google to the rescue? by polar red · 2009-06-10 21:57 · Score: 1

Capitalize the first letter of every word in a name.
Why? Caps don't add value.

--
Yes, I'm left. You have a problem with that?
Re:Google to the rescue? by BlackPignouf · 2009-06-10 22:41 · Score: 1

I might be a bit late to the party, but I'd like to say that I developed a free alternative to the Google Search Appliance:
http://github.com/EricDuminil/picolena/tree/master
It's a small Ruby on Rails app (~1k LOC) that uses either Ferret or Sphinx and implements full-text search for .pdf, .doc, .docx, .odt, .xls, .ods, .ppt, .pptx, .odp, .rtf, and .html files, plus metadata from music, pictures, and videos.
It also includes language recognition, file thumbnails, and a cache à la Google.
We use it in our research center to index ~100,000 documents from 50 users on a Samba share, and we get relevant results in ~0.1 s.
Users don't need to learn any convention; they find what they want fast and can use it as easily as Google.
If you're interested, drop me an email through GitHub.
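The core data structure behind this kind of full-text search is an inverted index. picolena delegates that work to Ferret or Sphinx; the sketch below is not their code or API, just a minimal pure-Python illustration that assumes the text has already been extracted from the .doc/.pdf files (the paths are made up).

```python
import re
from collections import defaultdict
from typing import Dict, List, Set

TOKEN = re.compile(r"[a-z0-9]+")


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric terms."""
    return TOKEN.findall(text.lower())


class InvertedIndex:
    """Maps each term to the set of document paths that contain it."""

    def __init__(self) -> None:
        self.postings: Dict[str, Set[str]] = defaultdict(set)

    def add(self, path: str, text: str) -> None:
        for term in tokenize(text):
            self.postings[term].add(path)

    def search(self, query: str) -> Set[str]:
        """Return paths containing every query term (AND semantics)."""
        terms = tokenize(query)
        if not terms:
            return set()
        # .get() avoids creating empty entries in the defaultdict.
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result
```

A query only touches the posting sets for its own terms, not every file on the share, which is why lookups stay fast even when the share holds ~100,000 documents; the expensive part is the one-time crawl and text extraction.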
Re:Google to the rescue? by Phreakiture · 2009-06-11 00:03 · Score: 1

I am not so sure. My employer has a Google appliance, and it has never been able to find relevant content for me on the company intranet. It isn't that the content isn't there, but there is so much boilerplate language in place that, quite often, a glut of documents contains my search terms. Your mileage, of course, may vary.
I think, though, that what may be needed here is a process, not a product. It will be long and painful, but your best bet, always, is to put a small group of humans authoritatively in charge of the documents. They can use technology to help them (such as the aforementioned Google appliance, Bayesian categorizers, etc.), but the ultimate decision needs to be a human one.
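The boilerplate glut is the problem the classic TF-IDF weighting is meant to soften: a term that appears in every document gets an inverse document frequency of zero and stops influencing the ranking. This is a generic textbook sketch with pre-tokenized toy documents, not a description of how the Google appliance actually ranks results.

```python
import math
from collections import Counter
from typing import Dict, List


def inverse_document_frequency(docs: List[List[str]]) -> Dict[str, float]:
    """idf(t) = log(N / df(t)); a term found in every document scores 0."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log(n / count) for term, count in df.items()}


def tfidf_score(query: List[str], doc: List[str], idf: Dict[str, float]) -> float:
    """Sum of term frequency times idf over the query terms."""
    tf = Counter(doc)
    return sum(tf[term] * idf.get(term, 0.0) for term in query)
```

If every file carries the same "proprietary and confidential" footer, those words contribute nothing to the score, and the distinctive terms in the query decide the ordering. It helps, but it cannot fix a query made up entirely of boilerplate words, which is one more argument for the human curation described above.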

--
www.wavefront-av.com
Re:Google to the rescue? by Geotopia · 2009-06-11 02:58 · Score: 1

I like the way this guy thinks. I recommend, however, that he rename his post "Motivating the Userbase" and this particular thread "Involve the Userbase". Later, I'm going to come back and rename my article, and start a new thread for which I'll recommend another name. That will give me 5 or 6 points towards a Chili's GC, right?
Re:Google to the rescue? by stelling · 2009-06-11 03:19 · Score: 1

The Google Search Appliance could be the way to go as a small-scale, and most likely temporary, solution.
The real questions are:
How valuable are your documents?
How much money do they generate?
What is the cost of not being able to locate a document?
What kind of processes are these documents used in?
How distributed is your user base?
How close to your business core are these documents?
You certainly need a document management solution (or ECM); which one depends basically on the answers to the previous questions.
Re:Google to the rescue? by plague3106 · 2009-06-11 03:57 · Score: 1

Humans are motivated only by self-interest. Why is it surprising that this solution is proposed, when it has a good chance of working?
Re:Google to the rescue? by plague3106 · 2009-06-11 03:59 · Score: 1

Perhaps OSS doesn't have a good solution? While everyone bitches about Windows indexing, I'm not even sure I know the equivalent in the Linux world.
And what's wrong with paying money for a solution that works and can be implemented very quickly?
Re:Google to the rescue? by Estanislao Martínez · 2009-06-11 04:38 · Score: 1

Isn't this the sort of thing that a google search appliance would be helpful for?
Yes, but don't make the mistake of thinking that just because Google are the leading web search engine, they must also be the leading document search solution. Google's web search relies heavily on links between HTML documents to assess their relative importance. In an office with a lot of plain old documents, there will be no links.
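The point about missing links can be seen in a textbook PageRank sketch (the classic published formulation, not Google's production ranking or the appliance's internals). When no document links to any other, every document ends up with exactly the same score, so link analysis contributes nothing and ranking falls back on text matching alone.

```python
from typing import Dict, List


def pagerank(links: Dict[str, List[str]], damping: float = 0.85,
             iterations: int = 50) -> Dict[str, float]:
    """Textbook PageRank by power iteration.

    links maps every page to the pages it links to (all targets must be keys).
    Pages with no outgoing links spread their rank evenly over all pages.
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        dangling = sum(rank[p] for p in pages if not links[p])
        new_rank = {p: (1 - damping) / n + damping * dangling / n for p in pages}
        for page in pages:
            targets = links[page]
            for target in targets:
                new_rank[target] += damping * rank[page] / len(targets)
        rank = new_rank
    return rank
```

With a file share of unlinked .doc files, every entry of the result is 1/N; give a few HTML pages links to a hub page and the hub rises above them. That is the signal an office document pile simply doesn't have.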

--
Are you adequate?
Re:Google to the rescue? by Anonymous Coward · 2009-06-11 10:09 · Score: 1, Insightful

Now I actually LOL'd on that one!
Getting our userbase to actually give a flying fart about a naming protocol and then getting them to follow it!?
I won't be holding my breath for either of those two things to happen...
You obviously don't know how to motivate people. Tell your boss you can get everything renamed for $100/week. Then post a leader board showing who has renamed the most documents each week, and give each week's winner a gift certificate to a local restaurant. Don't let anyone win more than once a month, to prevent too much disruption of normal job duties, and set up some sort of meta-moderation to prevent gaming the system. (You could probably use Slashcode out of the box: just make each document a story and suggest better names in the comments.)
Laughing my ass off, here...
What do you think an enterprise working environment is? College dorm?
"Document renamer of the week"?
You must either still be in high school or be in Dilbert-like upper management to think that this has any remote chance of working (or makes any sense).

Hummingbird Document management by Anonymous Coward · 2009-06-10 09:30 · Score: 1, Informative

http://en.wikipedia.org/wiki/Hummingbird_Ltd

and

http://connectivity.hummingbird.com/home/connectivity.html?cks=y

Re:Hummingbird Document management by HikingStick · 2009-06-10 09:34 · Score: 3, Insightful

If they're going to consider Hummingbird, they need to be ready to cough up the dollars to get an *EXPERIENCED* Hummingbird administrator. If not, the product will be set up, but basic search functionality will be hosed because of some of the same issues described in the original question (arising from differences in how the documents' property sheets are populated). If done well, it can be fantastic. If not, users will hate it and do everything possible to avoid it (including installing their own NAS devices).

--
I use irony whenever I can, but my shirts are still wrinkled...
Re:Hummingbird Document management by kiwimate · 2009-06-10 09:43 · Score: 3, Informative

Yes, but it's not that hard to find someone. Either way, go with Hummingbird (now owned by Open Text) or any other document management system. You've got a bunch of documents. You need to manage them. Ergo, a document management system.
Parent makes an excellent point, however: the single most critical component of a successful implementation is to get a skilled* consultant who can work with you to properly define the taxonomy. Everything else flows from there.
* If you go with Hummingbird DM, "skilled" means "not one of their overpriced professional services people". They're dreadful.
Re:Hummingbird Document management by pkluss · 2009-06-10 09:44 · Score: 1

HikingStick is exactly right. We used Hummingbird for a while, but it got out of hand, and then it was every bit as bad as what you're experiencing now, except that we'd paid an arm and a leg for it (so it felt much worse). It's a decent product, but we ended up migrating to SharePoint, since it fit our needs.
Re:Hummingbird Document management by dimeglio · 2009-06-10 10:52 · Score: 3, Insightful

Skilled consultants are great, but without training employees you'll keep paying big $ for consultants whenever there's a change to make. Let the consultant show how, and let the employees do the work. BTW: we have 3000+ users (all happy) on their system and no consultant.

--
Views expressed do not necessarily reflect those of the author.
Re:Hummingbird Document management by anexkahn · 2009-06-10 12:14 · Score: 1

I completely agree. We pay a third party to come in and help with all the Hummingbird issues we can't solve in house, and any time we have a major upgrade or change, we have them come out as well. I have had many bad experiences with Hummingbird support... I wouldn't want to see what the professional services people at Open Text (the people who sell Hummingbird) are like.

--
Curious about Storage and Virtualization? Check out
Re:Hummingbird Document management by HikingStick · 2009-06-11 00:43 · Score: 1

You're on the key issue, but I'll take a different tack: not only do users need training, but user requirements (sometimes extensive amounts of them) need to be gathered before implementing a solution (and this goes way beyond DM systems). If time is spent with the users before the DM system is deployed, the project team can be aware of how things are currently done. This means they might need to understand the naming conventions used by multiple business units or many admin staff, and that is only one example. The goal, then, is to sit down with a representative user group (a group that represents all stakeholders, from the end users to management, IT, information security, legal, and audit) and review the gathered requirements. Where requirements seem to conflict, all stakeholders need to come together and agree on a common set. From there, they need to go back and start prepping their own groups if those requirements result in changes to their current practices.

If that's done (a major component of project management that seems to be shortchanged all too often), you'll find yourself deploying a system in which the users have some sense of ownership, and which is less likely to run into significant resistance based on old arguments like "but we don't do it that way" or "the system just doesn't meet our needs."

--
I use irony whenever I can, but my shirts are still wrinkled...

Google Appliance by TornCityVenz · 2009-06-10 09:31 · Score: 4, Informative

Google them? http://www.google.com/enterprise/search/gsa.html

--
I Need someone to rebuild a Digitech Digital Delay pedal for me....for me...for me...for me.

Re:Google Appliance by VTBlue · 2009-06-10 12:44 · Score: 1, Informative

Google them? http://www.google.com/enterprise/search/gsa.html

Try Search Server 2008 Express from Microsoft. Although it has no hard limits, it can index up to 1 million documents before you have to scale out. Best of all, it's free!
If you need high availability, redundancy, fail-over or more document support, look at the standard version of the product or consider SharePoint 2007/2010 or FAST.
http://www.microsoft.com/enterprisesearch/en/us/search-server-express.aspx#none
Message me if you have questions; I work at Microsoft.
Re:Google Appliance by scooterhanson · 2009-06-10 13:43 · Score: 1

A search appliance would be great, but there's not really a lot of structure on top of an index unless coupled with some other sort of knowledge-management infrastructure.
http://www.yakabod.com/ is a company that I've heard about that has been doing this kind of thing for the US intelligence agencies for a while -- coupling a search appliance with taxonomy / folksonomy and some other kinds of voodoo. I've heard these guys refer to it as a "knowledge network" in the sense that a social networking app keeps you aware of what your friends and colleagues are doing, but the knowledge networking app keeps you aware of what your whole business is doing.
There's always Sharepoint and Documentum type solutions, but trust me, brother, I've been down those roads before and I don't wish them on my enemies.
Re:Google Appliance by spyrochaete · 2009-06-10 23:35 · Score: 1
Enterprise Search Server is a really nifty app based on the excellent MOSS search functionality, but in my tests it doesn't hold a candle to the Google Search Appliance. To scratch the surface...
- the GSA will index the first 2.5MB of text in a document while SSX only indexes 256KB.
- SSX isn't exactly free because you need hardware plus a Windows Server license.
- implementations of SSX larger than 1 million documents are very complex, sometimes requiring multiple servers for query, index, and crawling, whereas the GSA supports 10 million documents in a single 2U server.
- the relevance of SERPs and snippets is just superior on GSA (my subjective opinion)
If you've got a spare Windows server lying around, then ESS is a terrific way to put it to use, but if you're fleshing out an enterprise search solution from the ground up, I would recommend GSA in almost any scenario.
Re:Google Appliance by spyrochaete · 2009-06-11 01:39 · Score: 1

I think a search solution is exactly what you want if you don't have a solid structure in place. Give people multifaceted navigation and let them choose whether they want to browse a hierarchical structure, or perform a search to try to cut to the chase, or even both so that users can choose a specific category or directory to narrow down the search corpus.
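Multifaceted navigation boils down to "filter by the facets the user picked, optionally search within what's left, and show counts for the remaining choices." A small in-memory sketch follows; the facet names and values are hypothetical, and the naive substring match stands in for a real full-text index.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Doc:
    path: str
    facets: Dict[str, str]  # hypothetical facets, e.g. {"project": "Tiger", "type": "spec"}
    text: str


def faceted_search(docs: List[Doc], selected: Dict[str, str],
                   query: Optional[str] = None) -> List[Doc]:
    """Narrow by the chosen facet values, then (optionally) by keywords."""
    hits = [d for d in docs
            if all(d.facets.get(name) == value for name, value in selected.items())]
    if query:
        words = query.lower().split()
        # Naive substring match; a real system would query a full-text index.
        hits = [d for d in hits if all(w in d.text.lower() for w in words)]
    return hits


def facet_counts(docs: List[Doc], facet: str) -> Dict[str, int]:
    """How many of the given docs fall under each value of one facet."""
    return dict(Counter(d.facets[facet] for d in docs if facet in d.facets))
```

Calling `facet_counts` on the current hits is what lets the UI show "Tiger (1), Falcon (1)" next to each choice, so users can browse, search, or mix both as described above.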
Re:Google Appliance by scooterhanson · 2009-06-12 07:20 · Score: 1

Yeah, but you still run into the same old story of issues with discovery -- How do you know something's been added inside the file cabinet if you're looking at the closed drawer?
A knowledge sharing app sitting on top of a search appliance would show activity and interaction across content.
While people may just want to be able to find stuff in their great big pile of content, they'll still run into obstacles with only a search appliance. It's the connections between pieces of content that really count for understanding that pile (e.g. me as a user and all of my content, related to the document you were looking for that I published, related to a new document that you need but never knew you needed, etc.).

How not to do it by Daimanta · 2009-06-10 09:33 · Score: 3, Funny

Store it on a single FAT32 partition and hope for the best. Only meant for people with guts or really really nice bosses.

--
Knowledge is power. Knowledge shared is power lost.

Re:How not to do it by CarpetShark · 2009-06-10 09:43 · Score: 4, Funny

Pfft. This is a serious job. 320k floppies are what you want.
Or... you know... you could try managing those documents with a document management system.
Re:How not to do it by selven · 2009-06-10 12:08 · Score: 4, Funny

Two of those should be enough for everyone!
Re:How not to do it by mysidia · 2009-06-10 13:15 · Score: 1

Should use punch cards.
Floppies have this problem that the magnetism degrades over time, also, when placed on top of a CRT monitor for long periods of time, the data just seems to disappear.

Answered your own question by Sir_Lewk · 2009-06-10 09:33 · Score: 5, Insightful

and there is really no naming or numbering convention in place for the files and directories.

I think you already know the answer.

--
"linux is just DOS with a UNIX like syntax" -- Galactic Dominator (944134)

Re:Answered your own question by peektwice · 2009-06-10 09:53 · Score: 4, Informative

Absolutely correct. However, I would take it a step further and say that you need a document management system that manages security, meta-data, retention, disposition, etc. Examples are Documentum, IBM FileNet P8, Alfresco, etc. Here's a place to start reading: http://www.cmswire.com/.

--
Other than this text, there is no discernible information contained in this sig.
Re:Answered your own question by hedwards · 2009-06-10 09:58 · Score: 1

In all honesty, I tend to agree with what you're implying. A database solution is great, if you put it into place immediately, otherwise you have to spend a lot of time getting all of the items into the database and properly tagged and sorted.

One way or another the work is going to have to be done; the relevant questions are how easily it will be maintained, how it will handle increases in size, and how easily it can be backed up.

I'm doing this sort of thing right now with my digital images. Thankfully, I can fall back on metadata to do most of the heavy lifting, which just leaves creating subjective tags for pulling up random files and figuring out a decent backup system. I've been at it all week and haven't found a proper solution. That is really a minor hassle compared to what the OP is dealing with: finding the files, reading them, and putting them into some reasonable category, when presumably many were created by employees no longer at the company.

To boil it all down a bit: make absolutely sure you've got all the tags you're going to want, a file hierarchy of some sort for storing the physical files, and the thumbscrews for anybody who's not willing to do their part. A system doesn't stay neat and organized on its own; just because it resides in some sort of database doesn't mean it's automatically easy to find things. The best bet for the files themselves is to organize them roughly by date; depending on how many there are, that may mean by day, week, month, or year, to keep them in a reasonable place to find.

Take it relatively slowly: demand that any new files be created within the new system, and make a regular effort to put the older files into it in a consistent manner.
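The date-based storage hierarchy is easy to generate mechanically, so nobody has to decide by hand where a file goes. A minimal sketch; the root path and granularity names are assumptions, and the week option uses ISO week numbering so a week never straddles two year folders.

```python
from datetime import date
from pathlib import PurePosixPath


def dated_path(root: str, created: date, filename: str,
               granularity: str = "month") -> PurePosixPath:
    """Build root/YYYY[/MM[/DD]]/filename, or root/YYYY/Www/filename for weeks."""
    if granularity == "year":
        parts = [f"{created.year:04d}"]
    elif granularity == "month":
        parts = [f"{created.year:04d}", f"{created.month:02d}"]
    elif granularity == "day":
        parts = [f"{created.year:04d}", f"{created.month:02d}", f"{created.day:02d}"]
    elif granularity == "week":
        # ISO week numbering, so a week never straddles two year folders.
        iso_year, iso_week, _ = created.isocalendar()
        parts = [f"{iso_year:04d}", f"W{iso_week:02d}"]
    else:
        raise ValueError(f"unknown granularity: {granularity!r}")
    return PurePosixPath(root, *parts, filename)
```

For example, `dated_path("/docs", date(2009, 6, 10), "spec.doc")` gives `/docs/2009/06/spec.doc`. Pick the coarsest granularity that keeps each folder to a browsable size, and let the tags in the database do the rest.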
Re:Answered your own question by nine-times · 2009-06-10 10:05 · Score: 2, Insightful

Yeah, some people mentioned Google appliances, which I suppose is a sort-of solution. I've never used one of those internally, but I wouldn't trust that to be the end-all solution to your organizational problems. What if there's a file that Google can't read or gather good metadata for? What if you're searching for common terms, and the file you're looking for is on the 75th page? What if you're not remembering the correct search parameters and so your file just isn't turning up in your searches?
There's really no substitute yet for real organization and discipline. The first thing you should do is define your needs/parameters. Does everyone from every site need read access to all files? Do they all need write access? Most likely, the answer to both of these questions is "no", so narrow it down to specifically "who needs access to what". That will help you figure out the rest of these things. Also ask, who needs to be able to find which documents under which circumstances? What information will they have? You're going to want to use those pieces of information in your organization so that people can intuitively find the files that they need, without necessarily needing to see everyone else's files.
Come up with a hierarchical organization for your files, requesting user input if appropriate. Then create a directory structure that matches it. Make sure you've communicated the organization clearly to your users, and try to get them to use it.
If necessary, use directory permissions to try to restrict writing files to appropriate places. For example, if you break down the file structure by particular engineering groups or departments, then only provide write access to members of that group or department. Designate the head of that department as the person responsible for organization within that folder. If need be, restrict write access in a particular folder to only one person, and make that person responsible for checking files in and maintaining the organization for the group or department. Do the same sort of control with individual satellite sites, if appropriate.
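The folder rules above (department members write to their own tree, one designated gatekeeper for controlled folders, default deny elsewhere) can be written down as a small policy model before you translate them into actual Samba or NTFS ACLs, which is where enforcement really belongs. The folder names, groups, and the `docctl` gatekeeper are hypothetical.

```python
from dataclasses import dataclass, field
from pathlib import PurePosixPath
from typing import Dict, Optional, Set


@dataclass
class FolderPolicy:
    readers: Set[str]                                # groups that may read
    writers: Set[str] = field(default_factory=set)   # groups that may write
    gatekeeper: Optional[str] = None                 # if set, only this user may write


Policies = Dict[PurePosixPath, FolderPolicy]


def _policy_for(path: PurePosixPath, policies: Policies) -> Optional[FolderPolicy]:
    """Most specific policy: check the path itself, then each parent folder."""
    for folder in (path, *path.parents):
        if folder in policies:
            return policies[folder]
    return None


def can_read(groups: Set[str], path: PurePosixPath, policies: Policies) -> bool:
    policy = _policy_for(path, policies)
    return policy is not None and bool(groups & policy.readers)


def can_write(user: str, groups: Set[str], path: PurePosixPath,
              policies: Policies) -> bool:
    policy = _policy_for(path, policies)
    if policy is None:
        return False  # default deny for folders nobody has organized yet
    if policy.gatekeeper is not None:
        return user == policy.gatekeeper
    return bool(groups & policy.writers)
```

Writing the rules out like this makes it easy to sanity-check questions such as "can sales read engineering specs?" with the department heads before anyone touches the real permissions.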
Be a little tiny bit of a control freak, but you might want to give people a particular folder share where they can transfer files in a more freeform manner in a pinch. Someone might want to share one particular file, back something up for a minute, or whatever, but make it clear that this share is completely insecure and temporary. Let people know that everyone has access to that share, anyone can delete any file, you won't be backing it up, and in fact you might be clearing it out (deleting it) on a regular basis. Make a habit of deleting it all on a regular basis, or people will start dumping everything there to sidestep the organization. To be careful, you might want to actually move everything into a non-shared folder for a week and then delete it, so if someone shows up and says, "Oh crap! You deleted business-critical information!" you can sigh, and say, "I'll see what I can do, but you really shouldn't store business-critical data there."
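The two-stage cleanup (sweep the scratch share into a non-shared folder, purge it a week later) could be a small script run nightly from cron or Task Scheduler. This is a sketch under assumptions: the paths, the timestamp-named batch folders, and the one-week window are all illustrative choices.

```python
import shutil
import time
from pathlib import Path
from typing import Optional

ONE_WEEK = 7 * 24 * 3600  # seconds


def sweep_scratch(scratch: Path, quarantine: Path,
                  now: Optional[float] = None,
                  keep_seconds: float = ONE_WEEK) -> Path:
    """Move everything out of the shared scratch area, purging old batches first.

    Each sweep creates quarantine/<unix-timestamp>/ holding that sweep's files;
    batches older than keep_seconds are deleted for good.
    """
    now = time.time() if now is None else now
    quarantine.mkdir(parents=True, exist_ok=True)
    # 1. Permanently delete batches that have sat in quarantine long enough.
    for batch in list(quarantine.iterdir()):
        if batch.is_dir() and batch.name.isdigit() and now - int(batch.name) > keep_seconds:
            shutil.rmtree(batch)
    # 2. Move the current contents of the scratch share into a new batch.
    batch = quarantine / str(int(now))
    batch.mkdir(exist_ok=True)
    for entry in list(scratch.iterdir()):
        shutil.move(str(entry), str(batch / entry.name))
    return batch
```

When the "Oh crap!" visit happens, the file is sitting in a dated batch folder that only admins can see, so you can hand it back while still making the point about where business-critical data belongs.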
So, to go back and summarize: Come up with an organization, stick to it, enforce it, and retrain your users to use it properly.
Re:Answered your own question by Bill Dimm · 2009-06-10 10:44 · Score: 1

otherwise you have to spend a lot of time getting all of the items into the database and properly tagged and sorted
Document clustering software can make that less painful by giving an overview of what you have (possibly hierarchical), and allowing you to categorize dozens (or even thousands) of related documents with a single mouse click. Blatant plug: Clustify.
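Clustify's actual algorithms aren't described here, so the following is only a generic illustration of the idea: a single-pass "leader" clustering over word sets with Jaccard similarity, plus a helper that tags a whole cluster at once. The 0.5 threshold and all names are arbitrary assumptions.

```python
import re
from typing import Dict, List, Set


def word_set(text: str) -> Set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Size of the intersection divided by size of the union (1.0 if both empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def leader_cluster(texts: List[str], threshold: float = 0.5) -> List[List[int]]:
    """Single pass: join the first cluster whose leader is similar enough,
    otherwise start a new cluster with this document as its leader."""
    leaders: List[Set[str]] = []
    clusters: List[List[int]] = []
    for i, text in enumerate(texts):
        words = word_set(text)
        for leader, members in zip(leaders, clusters):
            if jaccard(words, leader) >= threshold:
                members.append(i)
                break
        else:
            leaders.append(words)
            clusters.append([i])
    return clusters


def tag_cluster(paths: List[str], cluster: List[int], category: str,
                tags: Dict[str, str]) -> None:
    """Categorize every document in a cluster in one step."""
    for i in cluster:
        tags[paths[i]] = category
```

Real clustering tools use better similarity measures and hierarchical grouping, but even this crude version shows why reviewing a few hundred clusters beats reviewing a few hundred thousand files one by one.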
Re:Answered your own question by dimeglio · 2009-06-10 10:55 · Score: 1

You can look also at OpenDMS. It's not very active lately but might have a good core that you can expand on.

--
Views expressed do not necessarily reflect those of the author.
Re:Answered your own question by CorporateSuit · 2009-06-10 10:59 · Score: 5, Funny

No kidding, men are practically born with this instinct.

The most basic is dividing the images up according to hair color or the number of girls appearing in each photo. Then you usually divide them up between hardcore and softcore, type of performance, fetish, etc. For your favorites, you can keep a folder in the home directory, of course. I know this guy works for an aerospace company, but keeping track of 500,000+ files isn't rocket science! We've all been able to do that since the advent of the 200GB hard drive.

--
I am the richest astronaut ever to win the superbowl.
Re:Answered your own question by SydShamino · 2009-06-10 11:06 · Score: 1

My wife is a fan of Livelink, which she implemented for document and workflow management at her last job.

--
It doesn't hurt to be nice.
Re:Answered your own question by BillAtHRST · 2009-06-10 11:23 · Score: 1

Depends on whether you want to "manage" the docs, or just be able to find them. The google thing looks promising (http://ask.slashdot.org/comments.pl?sid=1264509&cid=28285635), and is probably a LOT cheaper to get going with. Then, if you think you need more you could look at some of the more heavyweight solutions.
"Perfect is the enemy of the good" -- Voltaire
Re:Answered your own question by oldspewey · 2009-06-10 12:36 · Score: 2, Informative

LMAO ... Documentum is almost dead ... that's why they released V6.5 several months back and are on track to release V7 in H1'10.
Why don't you tell us all how your "competitor" company scales to tens of millions of documents with high availability, disaster recovery, and content caching across 5 continents?

--
If libertarians are so opposed to effective government, why don't they all move to Somalia?
Re:Answered your own question by mysidia · 2009-06-10 13:23 · Score: 1

Put the temporary area on a "RAMDISK" and schedule a nightly reboot.
Re:Answered your own question by Anonymous Coward · 2009-06-10 17:22 · Score: 1, Interesting

Why don't you tell us why EMC feels the need to spew out an entirely new platform every 6-12 months. Oh, that's right, cause their product is so wonderful.
I used to be a Documentum developer. I've worked in everything from Workspace/Smartspace, through Smartspace Intranet, and into their Webtop/WDK platform.
IMHO, their product is crap. And expensive crap at that. It solves problems very poorly without extensive customization, and customization is a painful exercise in learning their horribly written and poorly documented development "framework".
Even worse, once you have things working the way you want, you can almost be guaranteed that your customizations won't work in the next release...which is probably going to happen in the next 2-3 quarters.
Re:Answered your own question by Phil06 · 2009-06-10 17:41 · Score: 1

What you need to do is get the right amount of meta-data. Everyone is going to be pushing you to add all kinds of fields for everything, you need to push for less. The metric should be that 80% of the information should be one or two clicks away, the next 15% should be 3-5 clicks away, let the last 5% go. If you try to make everything one click away you are going to fail.

--
"...and yet, I blame society" Duke - Repo Man
Re:Answered your own question by omglolbah · 2009-06-10 18:16 · Score: 1

While Documentum can be a royal pain in the ass at times I would hate to do my job without it...
Just keeping the hundred or so documents for my current project organized AND properly revisioned would be a major undertaking if it were all stored on a Samba share...
Get something started already, or you'll end up in the crapper sooner or later. Revision/version control of documents is quite useful in case someone screws up. It also allows locking of documents, etc., which can be useful in a myriad of situations.
Re:Answered your own question by Sobrique · 2009-06-10 22:22 · Score: 1

find /scratch -mtime +14 -exec rm {} \;

seems to work quite nicely. It also means everything hits the backup before it's deleted, in case of muppetry.
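For admins who would rather not shell out, the same cleanup can be sketched in Python. This is a minimal sketch, assuming `/scratch` is the temporary area and that "older than 14 days" means the last modification was more than 14×24 hours ago. Note that `find -mtime +14` rounds ages down to whole days, so it actually waits until a file is at least 15 full days old. The function name `purge_older_than` and its `dry_run` flag are hypothetical, not part of any existing tool.

```python
import os
import time


def purge_older_than(root, days=14, dry_run=True):
    """Return (and optionally delete) files under `root` not modified in `days` days.

    Hypothetical helper: roughly mirrors `find root -mtime +N -exec rm`,
    but compares exact timestamps instead of whole-day ages.
    """
    cutoff = time.time() - days * 86400  # 86400 seconds per day
    purged = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.path.getmtime(path) < cutoff:
                    if not dry_run:
                        os.remove(path)  # only files are removed; directories stay
                    purged.append(path)
            except FileNotFoundError:
                # Someone else removed the file between listing and checking it.
                continue
    return purged
```

Calling `purge_older_than("/scratch")` with the default `dry_run=True` only reports what would go, which is a cheap way to check the cutoff before letting it delete anything from a nightly cron job.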
Re:Answered your own question by Elbowgeek · 2009-06-11 02:29 · Score: 1

Well, my confidence was a tad shaken when the documentum.com listing that Google finds when you do a search came up as a parked domain.
But it was completely shattered when clicking the link on EMC's site that promises more information on their Collaboration and Document Management led to a Page Not Found error. If they can't find their own information, how in the hell can they lay claim to being able to manage *mine*?

--
Who is this delectable creature with an insatiable love of the dead?

Re:Organize the files by Anonymous Coward · 2009-06-10 09:34 · Score: 1, Funny

I don't think this is one of those times, though.

it's all about the index by Hognoxious · 2009-06-10 09:34 · Score: 2, Informative

The lack of a naming convention for the filenames and directories is neither here nor there. What matters is how well it's indexed.

Now, I do use naming conventions for my own files (photos, MP3s, etc.). Am I contradicting myself? No: I just don't have enough of them to need a separate index.
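To make "how well it's indexed" concrete: an index maps words to the documents that contain them, so finding something no longer depends on its filename or folder. Below is a minimal in-memory sketch of such an inverted index, assuming document text has already been extracted to plain strings. The function names are illustrative; real systems add text extraction, stemming, and ranking.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each word to the set of document paths that contain it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for path, text in docs.items():
        for word in tokenize(text):
            index[word].add(path)
    return index


def search(index: Dict[str, Set[str]], query: str) -> Set[str]:
    """Return the documents containing every word of the query (AND search)."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results
```

Because lookups go through the index, `X1/misc/scan0042.pdf` is as easy to find as a perfectly named file, provided its contents mention the words you search for.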

--
Confucius say, "Find worm in apple - bad. Find half a worm - worse."

Re:it's all about the index by jd · 2009-06-10 11:22 · Score: 4, Interesting

Very true. I'd take a look at DSpace or Open Library for examples of software designed to handle gigantic numbers of documents and maintain sensible indexes for them.

--
It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)

OpenDocMan by loVolt · 2009-06-10 09:34 · Score: 1

OpenDocMan has helped a lot with our Graphics and Engineering department issues, which are similar to yours.
LDAP access to storage helped sort out who could put what where. The implementation took a bit of
time to get the original files into the right locations, but it's easier to manage now.

--
Darwin Enforcement Agent

Google Search Appliance by Swampash · 2009-06-10 09:35 · Score: 2, Informative

http://www.google.com/enterprise/gsa/

Alfresco or SharePoint by flydpnkrtn · 2009-06-10 09:35 · Score: 3, Insightful

Or some other corporate content management system.

--
Here's to the crazy ones

Re:Alfresco or SharePoint by flydpnkrtn · 2009-06-10 10:01 · Score: 3, Interesting

...and I found an article backing up Alfresco pretty well:

"You can now stand up an Alfresco Labs server next to a SharePoint Server, and Office will not be able to tell the difference between the two," said John Newton, CTO of Alfresco. "But we are offering considerably more scale than SharePoint can deliver," he said.

--
Here's to the crazy ones
Re:Alfresco or SharePoint by Kadin2048 · 2009-06-10 10:23 · Score: 3, Informative

I have a personal bias, but I think IBM's FileNet would solve this quite neatly. I've done implementations of it that are pretty much exactly what the OP describes.
Customer has a share that's gotten totally out of control, just stuffed full of files. They want to make them available across multiple offices, generally without getting into complex VPN crap, and also want to simplify management, add more security / compartmentalization, or integrate it with corporate SSI. All doable. Runs on your choice of platforms, too. (Linux, Unix/AIX, Windows all OK as servers.)
There are even tools that basically take a share drive and walk the directory structure, importing documents at extremely high volume and using the folder structure to categorize and tag the documents within FileNet. It's quite slick and can either be used as a one-shot migration from a traditional fileserver to FileNet, or as an ongoing thing (take all files in a particular directory or set of directories and commit them).
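The folder-walking import described above can be sketched in a few lines. This is not FileNet's import tool or API. It is a hypothetical Python outline, assuming each directory level below the share root is a meaningful category (for example program, then document type). `ImportRecord` and `walk_share` are made-up names, and the actual commit into a repository is left out.

```python
import os
from dataclasses import dataclass, field
from typing import List


@dataclass
class ImportRecord:
    path: str                                      # full path on the share
    name: str                                      # original filename
    tags: List[str] = field(default_factory=list)  # one tag per folder level


def walk_share(root: str) -> List[ImportRecord]:
    """Turn every file under `root` into a record tagged with its folder path."""
    records = []
    for dirpath, _dirnames, filenames in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        # Files sitting directly in the root get no folder tags.
        tags = [] if rel == os.curdir else rel.split(os.sep)
        for name in sorted(filenames):
            records.append(ImportRecord(os.path.join(dirpath, name), name, list(tags)))
    return records
```

A file at `Programs/X1/Drawings/spar.dwg` would come out tagged `["Programs", "X1", "Drawings"]`, which the target system can then store as metadata instead of as a path.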
Once you have the documents into FileNet you can access them over a web interface or via various desktop clients, and there is a nice API for integrating it with custom in-house applications if that's a requirement. Also, IBM makes some add-ons for Word and Excel (and maybe PowerPoint) that allow you to work directly with items stored in a FileNet repository. Plus, if down the road you want to get into "workflow" (basically building your document management system around your business process), that can be easily bolted on.
Email is in profile if you want specific case studies or whitepapers, or if you want me to put you in touch with people who do these sorts of things regularly.

--
"Ladies and gentlemen, my killbot features Lotus Notes and a machine gun. It is the finest available."
Re:Alfresco or SharePoint by afidel · 2009-06-10 11:26 · Score: 2, Informative

I'd suggest Livelink by OpenText. I know the Air Force uses it, since our Livelink guy worked on their systems before coming to work for us, and they obviously work with large volumes of aerospace-related documents! =) That probably means OpenText can find consultants who have already designed and worked with an aerospace taxonomy.

--
There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
Re:Alfresco or SharePoint by glwtta · 2009-06-11 03:39 · Score: 1

I wouldn't call it backing up when the comment came from the CTO of Alfresco.

I wouldn't call "indistinguishable from SharePoint" an endorsement.

--
sic transit gloria mundi
Re:Alfresco or SharePoint by flydpnkrtn · 2009-06-11 04:16 · Score: 1

The first parent comment makes a good point: an independent review of Alfresco would be a better source than listening to the car salesman tell you how great the used Fiat is.
As for the reply to his comment: when folks can point Office directly at Alfresco and Office can't tell that it's talking to an Alfresco server instead of a SharePoint server, I'd say that's pretty significant. He wasn't saying that Alfresco is a SharePoint clone; he's saying it provides equivalent features. The reality is that a _lot_ of corporate offices run Microsoft Office, and being able to provide a drop-in backend for Office is a good idea, especially from a user-training perspective.

--
Here's to the crazy ones

Start with.... by s0litaire · 2009-06-10 09:35 · Score: 1

...setting up a standard naming convention and making sure bosses and managers enforce it. It won't help with older files, but it will stop things getting worse!

Then, if you can be bothered, you can start going through older files and renaming them to the convention, or entering them into the document management system of your choice...
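A convention is easier to keep when a script, rather than a manager, does the checking. Here is a minimal sketch, assuming a hypothetical scheme of `PROGRAM-DOCTYPE-NNNNNN_revX.ext` (for example `X1-DWG-000123_revB.pdf`). The pattern, the document-type codes, and the function names are illustrative, not an existing standard.

```python
import re
from typing import Dict, List, Optional

# Hypothetical convention: PROGRAM-DOCTYPE-NNNNNN_revX.ext
NAME_PATTERN = re.compile(
    r"^(?P<program>[A-Z0-9]{2,6})"
    r"-(?P<doctype>DWG|SPEC|RPT|MEMO)"
    r"-(?P<number>\d{6})"
    r"_rev(?P<rev>[A-Z])"
    r"\.(?P<ext>[a-z0-9]+)$"
)


def parse_name(filename: str) -> Optional[Dict[str, str]]:
    """Return the name's fields if it follows the convention, else None."""
    match = NAME_PATTERN.match(filename)
    return match.groupdict() if match else None


def noncompliant(filenames: List[str]) -> List[str]:
    """List the names that break the convention (e.g. for a nightly report)."""
    return [name for name in filenames if parse_name(name) is None]
```

Running `noncompliant` over a directory listing each night produces a report of offenders, and the parsed fields double as metadata if the files later move into a document management system.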

--
Laters Sol "Have you found the secrets of the universe? Asked Zebade "I'm sure I left them here somewhere"

Re:Start with.... by LandDolphin · 2009-06-10 09:48 · Score: 1

Hire a temp employee for $10/hr to go through and rename everything, or do any other clerical grunt work.

--
Spelling and Grammar errors have been added to this post for your enjoyment
Re:Start with.... by hedwards · 2009-06-10 10:00 · Score: 1

Only problem is that this is an aerospace company; they might get lucky finding somebody who's capable and willing to work for peanuts, but I wouldn't count on it. Realistically, they may require somebody with technical know-how about what the files actually are in order to properly categorize them. A temp might be able to handle reformatting the file names based on information in the name, but probably not much more than that.
Re:Start with.... by LandDolphin · 2009-06-10 10:16 · Score: 1

Some training and the temp should be able to recognize different file types and where/how to classify them. The temp doesn't have to understand the information, just has to know that "Hey, this looks like X; I was told X goes here" or "Oh, it says Y in spot B in the file; it must go to Place Z". The person with technical know-how is going to have to look in the file for a clue as to where it should be filed; they can impart that small bit of information to a temp without having to teach them everything.
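The "if it says Y in spot B, it goes to Place Z" instructions a temp would follow are essentially a rule table, and writing them down that way also lets a script take a first pass. Here is a minimal sketch. The rules, destination folders, and the `classify` helper are all hypothetical. Anything no rule matches comes back as `None`, so a person with the technical background can file it.

```python
from typing import Callable, List, Optional, Tuple

# (description, test on filename and first-page text, destination folder)
Rule = Tuple[str, Callable[[str, str], bool], str]

RULES: List[Rule] = [
    ("CAD drawing", lambda name, text: name.lower().endswith(".dwg"),
     "Engineering/Drawings"),
    ("test report", lambda name, text: "TEST REPORT" in text.upper(),
     "Engineering/Test Reports"),
    ("purchase order", lambda name, text: text.upper().startswith("PURCHASE ORDER"),
     "Procurement/Purchase Orders"),
]


def classify(filename: str, first_page_text: str,
             rules: List[Rule] = RULES) -> Optional[str]:
    """Return the destination of the first matching rule, or None for manual review."""
    for _description, matches, destination in rules:
        if matches(filename, first_page_text):
            return destination
    return None
```

The order of the rules matters because the first match wins, which mirrors how you would brief a temp: check the obvious cases first and escalate anything unusual.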

--
Spelling and Grammar errors have been added to this post for your enjoyment
Re:Start with.... by s0litaire · 2009-06-10 10:31 · Score: 1

Well, they could then use the temp as a scapegoat when planes start falling out of the skies...

--
Laters Sol "Have you found the secrets of the universe? Asked Zebade "I'm sure I left them here somewhere"
Re:Start with.... by dotgain · 2009-06-10 11:19 · Score: 1

No matter how prudent and methodical you are in your filestore-sorting exercise, nothing will stop somebody getting up in arms about something moving or changing. One of the reasons I left my last job was that a filestore that was already out of control when I took it on was proving insurmountable, at least without support from management.
Think of a medium-sized company (for the country, NZ) that acquired at least five other businesses in two years and effectively just chucked the new fileservers on the LAN along with all the others, spilling over into outsourced datacenters when it ran out of space and aircon in the original server room. Stacks of MS Access apps using hardcoded UNC paths dictating names of servers, etc. Commodity PCs with JBODs tacked on when space and money got tight. ntbackup.exe. Ugghh.
Re:Start with.... by LandDolphin · 2009-06-10 19:07 · Score: 1

The thought makes me a little sick. I wouldn't even want the hassle of reorganizing that.

--
Spelling and Grammar errors have been added to this post for your enjoyment
Re:Start with.... by Javaman59 · 2009-06-10 21:30 · Score: 1

a standard naming convention and make sure bosses and managers enforce it.

Won't work. Never has, never will. People won't comply. Bosses won't enforce. Some will make a good-faith effort for a while. More will make a good-faith effort, but get it wrong. Some will ignore it. Threatening memos will be issued from managers. Then it will emerge that one of the memos came from a manager who doesn't use the conventions himself (because he's "too busy"). The people who invested (wasted) time in understanding the system, and using it, will see that their efforts are futile because of the amount of non-compliance, and give up. Then the company will be left with only a minor portion of the files in this system. Three years later, people will wonder what the hell these bizarre files are, along with the 17 other naming conventions they see around (and the people whose names are on those files will look silly), and then someone will say "we need a standard naming convention and make sure bosses and managers enforce it."
I'm reminded of one of Joel's chestnuts: whenever you have two incompatible systems and introduce a third system to unify them, all you end up with is three incompatible systems (or words to that effect).

--
I'm a software visionary. I don't code.
Re:Start with.... by Sobrique · 2009-06-10 22:27 · Score: 1

Standard naming conventions are ugly. Just use a proper directory structure instead. At least then you know where you need to start looking for something you want.
Although I'm seriously wondering how long it'll be before we see the 'next generation' of filesystems, that stop actually treating files and directories as the 'basic' object, and treat everything as a document - essentially 'forcing' categorization of them. Doesn't work so well in programspace, but would work fine for almost everything that _should_ be on a user/network share.
Re:Start with.... by dotgain · 2009-06-11 08:07 · Score: 1

It's not a difficult job if you're not rushed, I quite enjoy it when it goes well. Take your time, take stock of what you've got, and make a plan. Take the time to find fault with your plan, and don't be rushed into implementing it if you're not sure. It's when you ARE rushed to do it, when you're NOT supported by your higher-ups when you need to be, that it becomes a nightmare.
Hence the resignation. Not long after, my co-admin handed his in too. Between us we had 15 years of IT experience with the company.
The remaining IT department has less than two years collectively, and includes six people. They're FUCKED now.

Use a cataloging system by vondo · 2009-06-10 09:35 · Score: 4, Interesting

I happen to have written one:

http://sourceforge.net/projects/docdb-v/

could be what you are looking for. Of course, it'll take effort to catalog the documents.

SharePoint? by tekiegreg · 2009-06-10 09:36 · Score: 4, Informative

I know I'm gonna get hit for blurting out the Microsoft Solution but...give SharePoint a shot...

--
...in bed

Re:SharePoint? by goffster · 2009-06-10 09:39 · Score: 4, Insightful

Why should you give sharepoint a chance? Even if it works well, it is proprietary and you are locked in.
Re:SharePoint? by EnhancedPanda · 2009-06-10 09:51 · Score: 1

I am going to have to second the Sharepoint suggestion; we have been using it for 2 years now to do exactly what you need. But I would recommend investing in SANs; no more VPN.
Re:SharePoint? by moosesocks · 2009-06-10 10:08 · Score: 2, Informative

Mod parent up. I helped create a tag-based document retrieval system for my former employer using SharePoint. It actually worked quite well.
Use the right tool for the job. It's got a nice interface (that's also very familiar-looking to most users), scales well, and integrates well with MS Office, which (like it or not) is used by 99.99% of the corporate world. It also handles non-office files just fine.
That's not to say that Unix-based solutions don't have their place. During the migration, I actually employed a series of shell/python scripts to assist with several of the more mundane aspects of the process. These probably saved us a couple thousand man-hours that would have otherwise been spent categorizing the files.

--
-- If you try to fail and succeed, which have you done? - Uli's moose
Re:SharePoint? by jockeys · 2009-06-10 10:13 · Score: 1

+1.

I'm no MS fanboy, but Sharepoint is great. I work for a large engineering company and we use it to organize blueprints, as well as pretty much all of our non-code documents. Even the most clueless HR-types can use it, and it's really not hard to set up.

--

In Soviet Russia jokes are formulaic and decidedly non-humorous.
Re:SharePoint? by moosesocks · 2009-06-10 10:15 · Score: 4, Interesting

Why should you give sharepoint a chance? Even if it works well, it is proprietary and you are locked in.
It's no less proprietary than other similar systems; everything else in this space is equally proprietary, or even more so. Getting files in and out of Sharepoint is a fairly trivial process, and the API is open enough to craft your own migration plan if you ever decide to move away from it.
MS Office might be proprietary, but is so widespread that it's a 'standard' in its own right -- Sharepoint integrates excellently with Office, and keeps your users happy.
I'm typically not one to advocate the use of Microsoft products. However, Sharepoint worked just fine when I was using it, and is definitely a huge step up from any of the competing products at the same price-level.

--
-- If you try to fail and succeed, which have you done? - Uli's moose
Re:SharePoint? by DigiShaman · 2009-06-10 10:27 · Score: 1

That's true of any other solution in the same manner as SharePoint. But at least the data is stored in a SQL database and not in something proprietary like the MS Exchange information store.

--
Life is not for the lazy.
Re:SharePoint? by Itninja · 2009-06-10 10:31 · Score: 1

Totally agree. SharePoint is one of the few recent products that Microsoft actually got right. Of course, they will probably find a way to screw it up down the road, but currently it rocks as an enterprise level document repository.

--
I judt got a nre Kinesis keybiartf so please excusr ant egregiou typos.
Re:SharePoint? by pete-classic · 2009-06-10 10:36 · Score: 3, Interesting

How does Sharepoint address his problem? It uses the exact same folder/file paradigm that is failing in his existing solution.
-Peter
Re:SharePoint? by glitch23 · 2009-06-10 10:42 · Score: 2, Informative

That's true of any other solution in the same manner as SharePoint. But at least the data is stored in a SQL database and not in something proprietary like the MS Exchange information store.
Although the files are in a database, you can change the view in the browser to "Explorer" and access the files using Windows File Sharing-like features (copy/paste) through the browser. This method of access, though, is an end-run around SharePoint's versioning system. New files can be uploaded this way as well. I presume that when you modify an existing document this way, SharePoint just makes that version the newest one in the actual database. SharePoint is still no substitute for a properly standardized naming convention and folder structure. Yeah, you can always do a SharePoint search for what you want, but at work I never do searches, because we have specific folders where we place stuff, and I know that as long as people follow the standard, I can find what I'm looking for and so can everyone else. We don't have thousands of documents, though, so maybe when the document count runs to six digits a standard naming convention is asking too much.

--
this nation, under God, shall have a new birth of freedom. -- Lincoln, Gettysburg Address
Re:SharePoint? by Anonymous Coward · 2009-06-10 10:43 · Score: 3, Insightful

What you say:

Why should you give sharepoint a chance? Even if it works well, it is proprietary and you are locked in.
What you mean:

Regardless of how perfect a solution might be for you, if it doesn't conform to MY personal ideological viewpoint, it shouldn't be given a chance.
God I hate people like you.
--AC
Re:SharePoint? by Sylver+Dragon · 2009-06-10 10:44 · Score: 1

Add another me too for Sharepoint.
From the initial question, I'd guess that just WSS3 will get the job done, and it's free. One important piece of this, though: plan your deployment. Figure out what type of site structure you plan to use before you implement anything. Sharepoint can be a wonderful tool, but if you just jump into it and let it grow organically, you will end up hating it and yourself. And trying to monkey around with the site structure after the fact can be trouble. Oh, and get familiar with ASP.NET master pages: what they do and how they work. You will be using them in WSS, and if you go into it without care you can trash your entire site fast.

--
Necessity is the mother of invention.
Laziness is the father.
Re:SharePoint? by dave562 · 2009-06-10 11:00 · Score: 1

Yeah you can always do a SharePoint search for what you want...
Unless SharePoint has gotten significantly better in the last few years, I wouldn't trust SharePoint search to find a file located in the root of the directory I point it at. When I was using it, SharePoint search didn't seem to understand the underlying hierarchy, so it required a lot of parameters and qualifiers to do what should have been a single search (Go find me blah.doc for example).
Re:SharePoint? by nighty5 · 2009-06-10 11:21 · Score: 2, Informative

We use SharePoint in a large enterprise. Although it's pretty good at mashing together websites, unfortunately it's really poor at search. I think Search 4.0 may improve the situation, but it's nowhere near Yahoo, Google, or other search technology. Technology doesn't solve all problems; I'd say this company needs to focus on strengthening business processes and implementing some user awareness programs.
Re:SharePoint? by preystalker · 2009-06-10 12:35 · Score: 4, Informative

I would recommend using Alfresco. Correctly configured and deployed, it lets you access files via Windows Explorer, WebDAV, a web interface, etc., and the data is stored in a SQL database. Alfresco uses open standards and should be considered instead of SharePoint.
Re:SharePoint? by slater86 · 2009-06-10 12:38 · Score: 1

We use Sharepoint (the free version) at the moment; it does an excellent job when you have no budget. But if you have the time, skills, or budget (the usual "pick any two" rule), there are better CMS products available.

even works well with ldap/samba domain controllers.

--
When people ask if I'm an optimist, I say "I hope so". --Bill Bailey
Re:SharePoint? by blincoln · 2009-06-10 12:47 · Score: 1

Of course, they will probably find a way to screw it up down the road, but currently it rocks as an enterprise level document repository.
In its inner workings, it's already pretty screwed up. For example, for anything that shows up as a list (which includes any type of library), the things that look like database columns actually have their data stored in XML format in giant text fields in the database. For wiki articles in particular this is a problem, because the entire text of each article is one of those values in the giant "properties" text field. It's also a problem for lists with lots of "columns", especially if the list/library is set to allow multiple content types, because then each "row" in the list gets all of the properties for each of the content types inserted into its XML text field, even the ones its content type doesn't use. Besides the obvious performance/scalability issues (i.e., you can't create a meaningful SQL index on this data, because all of the data you'd want to index is in that one stupid XML field), the SharePoint search indexer basically does a SELECT * into RAM for each list it comes across. So if you have a wiki library with a few thousand articles in it, *bam!* you just ran out of memory and none of them will be indexed.
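To illustrate why that layout hurts, here is a small SQLite sketch comparing the two designs. It uses a hypothetical schema, not SharePoint's actual one. In the first table every property sits inside one XML text column, so a filter has to fetch and parse every row. In the second, the property is a real column that can carry an ordinary index.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")

# Design 1 (hypothetical): all "columns" packed into one XML text field per row.
conn.execute("CREATE TABLE items_xml (id INTEGER PRIMARY KEY, properties TEXT)")
conn.executemany(
    "INSERT INTO items_xml VALUES (?, ?)",
    [
        (1, "<props><Title>Wing spar</Title><Program>X1</Program></props>"),
        (2, "<props><Title>Rivet spec</Title><Program>X2</Program></props>"),
    ],
)

# Design 2: the same data as real columns, with an index on program.
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, title TEXT, program TEXT)")
conn.execute("CREATE INDEX idx_items_program ON items (program)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?, ?)",
    [(1, "Wing spar", "X1"), (2, "Rivet spec", "X2")],
)


def find_program_xml(program):
    """Full scan: every row is fetched and its XML parsed just to test one field."""
    hits = []
    for item_id, xml_text in conn.execute("SELECT id, properties FROM items_xml"):
        if ET.fromstring(xml_text).findtext("Program") == program:
            hits.append(item_id)
    return hits


def find_program_indexed(program):
    """The database can use idx_items_program instead of scanning every row."""
    rows = conn.execute("SELECT id FROM items WHERE program = ?", (program,))
    return [row[0] for row in rows]
```

With two rows the difference is invisible, but the first function's cost grows with the size of every row's XML, while the second can be answered from the index.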
Most people seem to love SharePoint, so I think MS has done a great job on the front end. I just wish they'd devote the resources to make the back end a lot more solid.

--
"...always new atoms but always doing the same dance, remembering what the dance was yesterday." -Richard Feynman
Re:SharePoint? by FooRat · 2009-06-10 12:59 · Score: 1

For an aerospace company, you probably need something from a company with a better security track record - sorry, that's just due diligence.
Re:SharePoint? by MeanMF · 2009-06-10 13:16 · Score: 1

It lets you attach metadata to files and it full-text indexes pretty much anything you can throw at it.
Re:SharePoint? by scooterhanson · 2009-06-10 13:49 · Score: 1

Here's a great paper on the drawbacks of Sharepoint: http://www.yakabod.com/library/downloadDocument.html?docId=10805/
Re:SharePoint? by ewhac · 2009-06-10 14:19 · Score: 1

...give SharePoint a shot...
Bah. SharePoint is what you end up with when you don't know about Qtask.
Schwab

--
Editor, A1-AAA AmeriCaptions
Re:SharePoint? by ajlisows · 2009-06-10 14:30 · Score: 1

The nice thing about Sharepoint is, depending on the functionality you need it can be FREE (as in beer) if you can get away with Windows Sharepoint Services.
The company I work for really wanted a document management system. They had tons of paperwork lying around. We put in Sharepoint along with a product called KnowledgeLake. KnowledgeLake reads barcodes off documents that are printed or scanned to a network drive, grabs metadata from a SQL Server based on that barcode, and files the thing. It is really no hassle at all. KnowledgeLake also adds a search component that is much better than the Sharepoint search, so finding documents is really, really easy. There is also a client program for the KnowledgeLake system that lets you right-click on a document, pick a document library to send it to, and manually input the key field to grab the metadata and file the document properly.
I don't know what types of documents you are looking to index, but all MS Office documents integrate with Sharepoint, obviously... the real issue is other file types. AutoCAD files, for example, can be integrated into the Sharepoint system using third-party applications (we ended up not going that direction, so I can't remember what it is called; the company is named Bentley, maybe?), and I am sure there are many other programs that have similar applications written for them.
So yeah, you can make fun of me for sounding like a Microsoft shill, but I evaluated several other document management systems, and Sharepoint with KnowledgeLake turned out to be the one the company felt most comfortable with... and it has served its purpose well!
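The barcode-driven filing step described above boils down to a keyed lookup. This is not KnowledgeLake's implementation. It is a hypothetical Python sketch, assuming the barcode has already been decoded from the scan and that a `doc_metadata` table (made-up name and columns) maps each barcode to its metadata. SQLite stands in for SQL Server.

```python
import os
import sqlite3
from typing import Dict, Optional, Tuple


def file_scanned_document(
    conn: sqlite3.Connection, barcode: str, library_root: str
) -> Optional[Tuple[str, Dict[str, str]]]:
    """Look up metadata for a decoded barcode and decide where the scan belongs.

    Returns (destination folder, metadata), or None if the barcode is unknown
    and the scan should go to a manual-review queue instead.
    """
    row = conn.execute(
        "SELECT customer, doc_type, year FROM doc_metadata WHERE barcode = ?",
        (barcode,),
    ).fetchone()
    if row is None:
        return None
    customer, doc_type, year = row
    metadata = {"customer": customer, "doc_type": doc_type, "year": str(year)}
    destination = os.path.join(library_root, doc_type, str(year), customer)
    return destination, metadata
```

The point of the design is that nobody types metadata by hand: the barcode printed on the cover sheet is the only key, and everything else comes from the database that generated it.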
Re:SharePoint? by symbolset · 2009-06-10 15:04 · Score: 1

Sharepoint is wonderful. I used to get all my cross-company plans, developments and projects from it. I could enter a couple of searches and have everything: executive travel, department budgets, next year's product strategies, customer and vendor lists, even skunkworks projects with circuit layouts and logic diagrams. Definitely a huge career pusher once the gig was over.
And I was just a temp clerk in the mailroom. I wonder what people with privileges had access to.

--
Help stamp out iliturcy.
Re:SharePoint? by Seraphim_72 · 2009-06-10 15:08 · Score: 1

It uses the exact same folder/file paradigm

Actually it doesn't. All the SP gurus tell you to never make folders, everything is a list of files. There is a shift of how things are done in SP, it really is a hurdle. Alfresco does the same sort of thing. Plus the files are data aware. SharePoint actually has a few good ideas, but things like Wave will eat it alive eventually.

--
Slashdot, where armchair scientists get shouted down and armchair theologians get modded up.
Re:SharePoint? by Seraphim_72 · 2009-06-10 15:22 · Score: 1

it has limited (well no workflow)

uh? In reality the bitch about SharePoint is that it has too many ways to do a workflow.

--
Slashdot, where armchair scientists get shouted down and armchair theologians get modded up.
Re:SharePoint? by Dadoo · 2009-06-10 15:26 · Score: 1

Try not to select a solution that stores the files in a database
If you do that, it becomes problematic to back it up. We've got around 10 million documents, taking up about a terabyte, and it takes roughly 4 days to back it up. (We only have to do that once a year, thanks to incrementals, but it's still a pain.) Lots of small files will kill your backup performance, every time.
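Those figures show why file count, not raw size, is the bottleneck. Taking "around 10 million documents" as 10^7 files, "about a terabyte" as 10^12 bytes, and 4 days as 4 × 86,400 = 345,600 seconds (all rounded assumptions), a back-of-the-envelope calculation gives:

```latex
\begin{aligned}
\text{average file size} &= \frac{10^{12}\,\text{B}}{10^{7}\ \text{files}} = 10^{5}\,\text{B} \approx 100\,\text{kB} \\
\text{file rate} &= \frac{10^{7}\ \text{files}}{345\,600\,\text{s}} \approx 29\ \text{files/s} \\
\text{throughput} &= \frac{10^{12}\,\text{B}}{345\,600\,\text{s}} \approx 2.9\,\text{MB/s}
\end{aligned}
```

Roughly 3 MB/s is far below typical sequential disk or tape speeds, which suggests that per-file overhead (opening, reading metadata, cataloging each of those ~29 files a second) dominates, as the poster says.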

--
Sit, Ubuntu, sit. Good dog.
Re:SharePoint? by Seraphim_72 · 2009-06-10 15:29 · Score: 1

Your link errors ... great paper on why I should trust yakabod.

--
Slashdot, where armchair scientists get shouted down and armchair theologians get modded up.
Re:SharePoint? by moxitek · 2009-06-10 15:55 · Score: 1

Possibly because business doesn't normally give a shit if a blind monkey with three fingers wrote the code, as long as it just fucking works and they can see some cost savings or business benefit to implementing it. Those of us in the real business world are less concerned about "lock in" or what license something was written under and just want our shit to run and run well. MOSS enterprise search does a really kick-ass job of indexing file shares and making them available in an easy-to-use, easy-to-manage central location.

Software is a tool to achieve a business objective. If I've got the best tool to do the job, I don't care what political/social dynamic the license of the code falls into.
Re:SharePoint? by Itninja · 2009-06-10 16:17 · Score: 1

Which is why I said it rocks as a document repository. All the wiki stuff is woefully inadequate. The very concept of a wiki really has no enterprise purpose in general (unless, of course, your enterprise is wikis). The recommended limits on content types and documents per library keep the XML caching under control.

In my experience, most (if not all) complaints about the performance of MOSS can be traced back to someone who did not know (or chose to ignore) the recommended limits of the product.

--
I judt got a nre Kinesis keybiartf so please excusr ant egregiou typos.
Re:SharePoint? by jawahar · 2009-06-10 17:10 · Score: 1

Why not http://www.mediawiki.org/

--
Slashdot = Sarcasm
Re:SharePoint? by trendzetter · 2009-06-10 18:13 · Score: 1

I remember endless stories of troubles with Sharepoint. I think it's very buggy by design (one example: it stores its users in multiple places, not in one DB). You need enormous quantities of hardware and a large amount of money on licensing to get it running. Microsoft is making lots of money on support, I guess, especially since the release of this product. It doesn't work well with non-Microsoft software (like browsers). When I was using it, it had no support for non-Microsoft formats; maybe this has changed. I think it's more proprietary than Alfresco, which is released as open source.
Re:SharePoint? by HavocXphere · 2009-06-11 00:06 · Score: 1

Does it come with MS Clippy office assistant?
Re:SharePoint? by david_thornley · 2009-06-11 02:39 · Score: 1

Businesses typically don't care much about whether they're using free or proprietary software (there are exceptions on both sides), but they at least should care about lock-in.
There's always the chance that a business will have to stop using a given product, probably more for a proprietary product (which can just be dropped by the vendor) than a free software product, but not by all that much. There's always the chance that the business will want to shift to another product (and that applies equally to all software). This means that there's a distinct advantage for the business (although not typically for the vendor) to avoid such lock-in.
Obviously, moving to another product will never be free; even if the new thing doesn't cost actual money, there will be a certain amount of training and general disruption. However, there's a difference between expensive and near-impossible. Free software can always be modified to get the content out, although that's not necessarily a very useful option. Proprietary software can be a real bitch, as can cloud-based products.
However, an earlier poster addressed this for Sharepoint. Apparently, lock-in isn't a problem in this particular case, since it's fairly easy to get everything out in a more-or-less standard format.

--
"When you have eliminated the unacceptable, whatever is left, however improbable, must be the truthiness" - Holmes
Re:SharePoint? by Larryish · 2009-06-11 13:04 · Score: 1

Can anyone recommend a good Alfresco tutorial?
Got an ebook collection here that has gotten out of hand, and the Alfresco free version sounds juicy.
Re:SharePoint? by scooterhanson · 2009-06-12 07:11 · Score: 1

Thanks for pointing that out -- I accidentally added an extra slash at the end, so this time you actually can shoot the messenger. http://www.yakabod.com/library/downloadDocument.html?docId=10805
Re:SharePoint? by badkarmadayaccount · 2009-06-14 06:46 · Score: 1

I love the matching sig...

--
I know tobacco is bad for you, so I smoke weed with crack.

Google Search Appliance by yakatz · 2009-06-10 09:36 · Score: 1

Google Search Appliance

Cygnet by Rob+Kaper · 2009-06-10 09:37 · Score: 1

Cygnet ECM might work for you.

Documentum by trondwn · 2009-06-10 09:37 · Score: 2, Interesting

Use EMC's document solution, where you have all documents in a central database with metadata that describes their content, and they can be accessed through caching servers from different sites.

Just the doc, or collaboration? by geekoid · 2009-06-10 09:38 · Score: 1

If you just need to use plain documents, store them in one big directory and update the meta information.
Let people move links onto their systems and organize the links however they like, but don't let them move the documents.

Think iTunes for documents. I loathe that example, since I set up this sort of thing long before iTunes came around.
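The "one big directory plus metadata, with users arranging links rather than files" idea can be sketched as a tiny store. This is a hypothetical Python outline, not the poster's system. It assumes documents are identified by a SHA-256 hash of their content, a design choice made here so that duplicate copies collapse into one entry. `DocumentStore` and its methods are made-up names.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List, Tuple


class DocumentStore:
    """One flat store of documents; users only ever rearrange links to them."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}  # doc_id -> content (the "one big directory")
        self._meta: Dict[str, dict] = {}    # doc_id -> metadata
        # user -> collection name -> ordered list of doc_ids (the "playlists")
        self._collections = defaultdict(lambda: defaultdict(list))

    def add(self, content: bytes, **metadata) -> str:
        doc_id = hashlib.sha256(content).hexdigest()
        self._blobs[doc_id] = content  # identical content is stored only once
        self._meta.setdefault(doc_id, {}).update(metadata)
        return doc_id

    def link(self, user: str, collection: str, doc_id: str) -> None:
        if doc_id not in self._blobs:
            raise KeyError(doc_id)
        self._collections[user][collection].append(doc_id)

    def collection(self, user: str, collection: str) -> List[Tuple[str, dict]]:
        return [(d, self._meta[d]) for d in self._collections[user][collection]]
```

Users can build and delete as many collections as they like without ever moving or duplicating the underlying document, which is exactly the iTunes-library behavior being described.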

If you're focused on collaborative use of your documents, get something like this:
Jive.com

--
The Kruger Dunning explains most post on /. http://en.wikipedia.org/wiki/Dunning%E2%80%93Kruger_effect

Document Management to the Rescue by Anonymous Coward · 2009-06-10 09:38 · Score: 1, Informative

Sounds like you need a real document management system.

Depending on your requirements, you could go with something open source like Alfresco or one of the big boys like EMC Documentum or IBM FileNet P8. Either way, you will end up with an indexed repository of documents that makes it easy to find old documents, add new ones, etc. (assuming you and/or your integrator do the project correctly). It will also provide a web front end, so you don't have as much killer WAN traffic as you do now.

With a good document management system in place, you are also on your way to workflow and other benefits. For example, when Bob submits a document with XYZ as an index value, the system can automatically tell Joe that it is in and ask him to approve it. When Joe approves it, the system tags it "Approved" and lets Jim know.
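The Bob/Joe/Jim routing above boils down to a small approval workflow. Here is a minimal Python sketch of the logic, assuming a hypothetical in-memory `Document` record and a `notify` callback standing in for whatever email or messaging hook your system actually provides. Commercial products configure this through their own workflow designers, so treat it as an illustration, not any vendor's API.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Document:
    name: str
    index_value: str
    tags: set = field(default_factory=set)


class ApprovalWorkflow:
    """Route documents carrying a trigger index value to an approver, then tell a watcher."""

    def __init__(self, index_value: str, approver: str, watcher: str,
                 notify: Callable[[str, str], None]):
        self.index_value = index_value
        self.approver = approver
        self.watcher = watcher
        self.notify = notify  # hypothetical hook: notify(recipient, message)
        self.pending = {}     # document name -> Document awaiting approval

    def submit(self, doc: Document, submitter: str) -> bool:
        """Queue the document if its index value matches; return whether it was queued."""
        if doc.index_value != self.index_value:
            return False
        self.pending[doc.name] = doc
        self.notify(self.approver, f"{submitter} submitted {doc.name}; please approve.")
        return True

    def approve(self, name: str) -> None:
        doc = self.pending.pop(name)  # raises KeyError if nothing is pending under that name
        doc.tags.add("Approved")
        self.notify(self.watcher, f"{doc.name} was approved.")


# Usage: print notifications instead of emailing them.
if __name__ == "__main__":
    wf = ApprovalWorkflow("XYZ", "Joe", "Jim", lambda who, msg: print(f"to {who}: {msg}"))
    wf.submit(Document("wing-spec.doc", "XYZ"), "Bob")
    wf.approve("wing-spec.doc")
```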

Depending on your requirements for document retention, archiving, e-discovery, etc. the document management system can help you fulfill all of those automatically.

Simple answer... by Anonymous Coward · 2009-06-10 09:38 · Score: 1, Interesting

Hire human beings to sift through it and label each file with a numbering/labeling system devised by your engineers. The human mind is a relatively inexpensive and already well-designed piece of machinery. A few dozen of them, given enough time, can work through those hundreds of thousands of documents and get them sorted correctly. The problem you have is that you have unsorted, improperly labeled material. It is cheaper to hire sufficiently (or even insufficiently) evolved groups of people than to invent a machine capable of doing so. And with the economy the way it is, you'll be doing everyone a favor by giving them years of employment. When the Manhattan Project needed to produce large quantities of fissile material for the war with Japan, with all the men away at war, they hired dozens of women to sit at machines: turning knobs, checking meter levels, verifying output. The scientists themselves did not even need to be there; they designed a process, and the women were trained in it and followed it.

Re:Simple answer... by Mia'cova · 2009-06-10 12:54 · Score: 1

Not a great idea to pay kids min wage to organize all of your company's secrets. Presumably anyone with 100k+ documents has a good deal of intellectual property. They'd want a long term solution which improves productivity.
So my thought is, if it came down to dumb labor, I would still recommend that they do it in house with the people who wrote the documents. It's a giant distraction but it has the best result going forward.
Re:Simple answer... by budgenator · 2009-06-10 13:11 · Score: 1

Yes, I was thinking: hire a librarian. That's what they do: organize large amounts of documents for retrieval. A big part of the poster's problem is legacy documents; she/he should start there.

--
Apocalypse Cancelled, Sorry, No Ticket Refunds

Document management software by Wrexs0ul · 2009-06-10 09:39 · Score: 4, Insightful

Most print companies like Xerox have their own proprietary Document management tools you can buy, and a bunch of CRM and ERP solutions (like OpenERP - it's free AND Open Source) provide some good simple document searching and indexing tools.

Really, it comes down to how complex you want searching to be. Are there specific keys in the document you could index by? Do you require the full-text search capabilities of a Google Search Appliance?

A really good solution I've come across for some clients in Edmonton is called MetalTrace by Trace Applications. Don't let the name fool you about its specificity: software like this can scan, index, and even read barcodes on all sorts of documents, then let people search for them via the web. Their "killer app" feature, multiple user-defined document types with multiple search fields, combined with some back-filing (digital and scanning), really saved the day.

Do your research on "document management", though, and see what product best fits your needs. It's a really well-established field, so reinventing the wheel is a little masochistic... not that there's anything wrong with that. ;)

-Matt

--
--- Need web hosting?

Re:Document management software by MyDixieWrecked · 2009-06-10 10:47 · Score: 3, Insightful

Most print companies like Xerox have their own proprietary Document management tools you can buy
Document management software is great, but when you have enormous numbers of documents (100s of thousands like in the summary), it becomes necessary to have a content management system in place. Something that's intelligent enough to break the documents up into pieces and allow searches, but something more robust than full-text search.
We've been using this software called MarkLogic Server (http://marklogic.com). It's an XML database with a content processing framework for document ingestion. So, basically, assuming that documents are structured similarly, they can be converted into XML and queried with custom weights applied to content in different portions of the document. The software has built-in Word support, so it'll automatically convert .doc files with proper formatting, and you can add custom handlers for other formats, including plaintext.
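MarkLogic does this with XQuery against its own indexes; the following is not its API, just a rough Python illustration of the underlying idea of field-weighted scoring over XML. The element names (`title`, `abstract`, `body`) and the weights are hypothetical, and matching is a crude case-insensitive substring count.

```python
import xml.etree.ElementTree as ET

# Hypothetical weights: a hit in the title counts five times as much as one in the body.
WEIGHTS = {"title": 5.0, "abstract": 2.0, "body": 1.0}


def score(xml_text: str, term: str, weights: dict = WEIGHTS) -> float:
    """Weighted count of case-insensitive substring hits for `term` in one XML document."""
    root = ET.fromstring(xml_text)
    term = term.lower()
    total = 0.0
    for tag, weight in weights.items():
        for elem in root.iter(tag):
            text = "".join(elem.itertext()).lower()
            total += weight * text.count(term)
    return total


def search(docs: dict, term: str) -> list:
    """Return ids of documents that mention `term`, highest score first."""
    scored = [(score(xml, term), doc_id) for doc_id, xml in docs.items()]
    return [doc_id for s, doc_id in sorted(scored, reverse=True) if s > 0]
```

A single title hit outranks two body hits here, which is the kind of behavior the weights are meant to tune.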
We're currently managing a couple million documents and generating dynamic documents on the fly for some processes. Since on-the-fly documents may take time to generate, we have a system in place that saves the result in the database which can also be queried at a later date. It's all really cool.
Of course, there's a bit of a learning curve to writing your own software for it since it uses XQuery, but it's not much harder to learn than SQL, and so far, it seems to be far more powerful.
Disclaimer: I'm not a shill nor am I being paid in any way by MarkLogic... I'm just seriously blown away by what their technology has enabled us to do.

--

...spike
Ewwwwww, coconut...
Re:Document management software by thoglette · 2009-06-10 14:11 · Score: 1

There are a dozen or so companies providing software in this area, from littlies like Atrove to big players like Xerox's DocuShare.
You have three problems
a) MS Windows does not work with large end-to-end delays. You are going to need something third party (SharePoint, as has been pointed out, is not a solution to your problems)
b) you apparently don't know who owns your documents. You need to sort your documents by publisher, IP ownership rules and then publisher's ID
c) I worry when a "midsized aerospace company" hasn't worked out how to identify documents, revision-control drafts, and baseline-manage issued documentation.
The problem has been solved for many years; the tools and best practice are constantly evolving (particularly for managing AV data).
Hire a DM/CM dude from a proper aerospace company. Or two. Or even a properly qualified librarian.
Finally, how on earth are you currently meeting your contractual obligations?

--
-- Butlerian Jihad NOW!
Re:Document management software by cmdean · 2009-06-11 16:46 · Score: 1

The problem with document management software is that it requires users to do some "extra" work filling in metadata. This fails. Generally users will not fill in more than the title; adding keywords, short descriptions, and file numbers is simply too much effort. When the metadata fails, the document management system also fails.
I suggest you first look at getting a good enterprise search engine. Lucene (apache.org) is open source and free; MindServer (www.recommind.com) from Recommind is not, but it is amazing (I'm a happy client, not a shill).
If your users can find everything they need to do their work, who cares how badly it is sorted or filed.
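For a feel of what an engine like Lucene does at its core, here is a toy inverted index in Python: each term maps to the set of documents containing it, and a query intersects those sets. This is an illustrative sketch only; it is not Lucene's API and has none of its ranking, stemming, or on-disk index structures.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list:
    """Lowercase alphanumeric runs; real engines add stemming, stop words, etc."""
    return re.findall(r"[a-z0-9]+", text.lower())


class InvertedIndex:
    """Toy inverted index: term -> set of ids of documents containing that term."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, doc_id: str, text: str) -> None:
        for term in tokenize(text):
            self.postings[term].add(doc_id)

    def search(self, query: str) -> set:
        """AND semantics: ids of documents containing every query term."""
        terms = tokenize(query)
        if not terms:
            return set()
        # .get() avoids inserting empty entries into the defaultdict for unknown terms.
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result
```

The point is that lookups touch only the posting sets for the query terms, never the documents themselves, which is why search stays fast as the collection grows.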

Knowledge Tree by crackervoodoo · 2009-06-10 09:39 · Score: 3, Informative

http://www.knowledgetree.com/ If you're looking for a no-cost (read as no license fee) option then Knowledge Tree Community Edition is a decent Document Management tool. We've been using it for a couple of years.

try wiki by bitsmith · 2009-06-10 09:41 · Score: 1

JamWiki.org, for instance, has search capabilities built in. It also has security built in and is easily manageable. You can upload the documents and even migrate them to wiki format later. Keeping the documents in a near-text open format will help you migrate them again sometime in the future.

--
A man without religion is like a fish without a bicycle. -- Ron "Doc" Ferrell

Re:try wiki by evil_aar0n · 2009-06-10 10:39 · Score: 1

Our documentation is not nearly as bad as the OP's, but when I considered an approach to wrangling this mess into a usable state, Wiki was the first thing that came to mind. Wikipedia seems to work pretty well, and supports thousands of users all over the place. Couldn't be _that_ bad, could it?

--
Truth, Justice. Or the American Way.

Knowledge Tree? by gilesjuk · 2009-06-10 09:42 · Score: 1

I used an old version a while ago and it was pretty good then. Does versioning and other things.

http://www.knowledgetree.com/

Get yourself a good management system. by Anonymous Coward · 2009-06-10 09:42 · Score: 2, Informative

While this may be an odd suggestion, here are two things:
1) Get yourself a damn good document or content management system. Get it set up on the baddest machines you can afford. Overshoot the capability you need, so that you have room to grow.
2) Get a librarian to look at the kinds of documents you create and develop a system to catalog documents while maintaining reasonable standards for file names. As the simplest possible system, maybe use document names that indicate (at a minimum) what project or what overhead department they belong to, a broad category of subject matter, and, if it's versioned, a version number.
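A naming convention along the lines of point 2 is easy to enforce mechanically. A minimal Python sketch, assuming a hypothetical `<project-or-dept>_<CATEGORY>_<short-title>_v<NN>.<ext>` pattern; substitute whatever fields your librarian settles on.

```python
import re

# Hypothetical convention: <project-or-dept>_<CATEGORY>_<short-title>_v<NN>.<ext>
NAME_RE = re.compile(
    r"^(?P<project>[A-Za-z0-9-]+)_(?P<category>[A-Z]+)_"
    r"(?P<title>[a-z0-9-]+)_v(?P<version>\d{2,})\.(?P<ext>[a-z0-9]+)$"
)


def build_name(project: str, category: str, title: str, version: int, ext: str) -> str:
    """Assemble a conforming file name; the title is reduced to a lowercase hyphenated slug."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return f"{project}_{category.upper()}_{slug}_v{version:02d}.{ext.lower()}"


def parse_name(filename: str):
    """Return the name's fields as a dict, or None if the name breaks the convention."""
    m = NAME_RE.match(filename)
    if m is None:
        return None
    fields = m.groupdict()
    fields["version"] = int(fields["version"])
    return fields
```

`build_name` is what a save macro or template would call; `parse_name` returning `None` is the hook for flagging files that slipped through with ad hoc names.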

I tried to bludgeon a small company I worked for (around 40 engineers, one overworked QA person, and one system administrator) into moving towards a storage system for Word documents that was not "Create a new folder for each version of the document set, place them all in the right folder, and if you don't, Ray will eat your head." We wound up using (of all things) Perforce SCM to house fifty thousand Word documents, and were starting to put actual code revisions for automated test sets into the system when our avionics testing focus became a serious liability and overhead workers were drastically cut. (Why have one QA guy and one system admin guy? We can get an intern to do BOTH!)

Get a Document Management System by bsy-1 · 2009-06-10 09:42 · Score: 1

Any of the many document management systems will do. They allow the extraction of metadata, which is in turn used to "find" the document you are looking for. Nearly all contain some security settings and a viewer for many types of files. One thing to note: this magic doesn't happen by itself. If you get stuck doing this, be prepared for two things: (a) no one really knows how they want to do this; they all just wonder whether one of the many docs has their answer and want the correct doc located and opened for them; and (b) you are about to become a stranger to everyone who knows you outside of work.
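Much of that metadata is already sitting inside the files. As one example, Word .docx files are ZIP archives whose `docProps/core.xml` part holds title, author, and keyword properties; the Python sketch below reads them with only the standard library. Which fields to pull is an assumption here, and older binary .doc files need a different tool entirely.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespaces used by the Office Open XML core-properties part.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}
# Output key -> element to read (a hypothetical selection of fields).
FIELDS = {"title": "dc:title", "author": "dc:creator",
          "keywords": "cp:keywords", "modified": "dcterms:modified"}


def docx_metadata(path) -> dict:
    """Read core document properties from a .docx file path or binary file object."""
    with zipfile.ZipFile(path) as zf:
        try:
            root = ET.fromstring(zf.read("docProps/core.xml"))
        except KeyError:  # some files carry no core-properties part
            return {}
    result = {}
    for key, tag in FIELDS.items():
        elem = root.find(tag, NS)
        if elem is not None and elem.text:
            result[key] = elem.text.strip()
    return result
```

Feeding these values into the document management system's index fields covers the title and author even when users won't type anything in.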

Indexing and Cataloguing by Zerocool3001 · 2009-06-10 09:43 · Score: 1

If you don't like the idea of sending your information to Google to have it indexed, you can look into some server-side applications (with associated client apps) that do the indexing and searching for you. I'm not familiar with Windows ones (although I'm sure there are some), but there are quite a few for Linux, and primarily Spotlight for the Mac. The option to have the actual indexing done server-side would save on your bandwidth tremendously. You may also want to consider using a different filesystem, one that has indexing capabilities built in.

--
Science will save us. The question is, will it destroy us first?

Lots of ECM solutions out there... by jwilkins13 · 2009-06-10 09:43 · Score: 2, Informative

Sure, with any number of ECM solutions. At the simplest end, many of them simply enforce naming conventions; at the more robust end, they support many different file types for viewing, indexing, etc., and can also provide rich metadata on a document-by-document basis. Some of them have been named in the comments, including but certainly not limited to SharePoint 2007, Cygnet, Documentum, Open Text, FileNet, etc. Any system worth looking at has a web-based interface, at least for searching, and many of them offer far more meaningful interaction as well. Alfresco, Hyland, and SpringCM all have web-based ECM solutions, and more comprehensive web-based offerings are available all the time. Oh, and if you're in aerospace, there are a number of regulatory requirements for information management you'll need to comply with. That does complicate the situation, but spending the ducats on software and/or consulting help is probably cheaper than whatever your litigation and regulatory audit support processes cost today.

Hope this helps,
Jesse Wilkins
ECM and other stuff consultant
jwilkins13 at gmail dot com

Re:Lots of ECM solutions out there... by NeoSkandranon · 2009-06-10 09:57 · Score: 1

I don't think electronic countermeasures are gonna help in this case.

--
If you can't see the value in jet powered ants you should turn in your nerd card. - Dunbal (464142)
Re:Lots of ECM solutions out there... by afidel · 2009-06-10 11:41 · Score: 1

Enterprise Content Management.

--
There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.

where is the slowdown? by the_denman · 2009-06-10 09:46 · Score: 1

I think step one is to pick a storage/naming convention and stick with it. Depending on your needs, a document management system could also help. The other thing I would do is figure out where the bottleneck is for your speed issue: is it the VPN connection, the network not being able to keep up, or the computer running Samba? Once you know where the slowdown is, work on that spot.
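One crude way to start locating the bottleneck is to time a sequential read of the same large file from a few vantage points: local disk on the Samba server, a client on the LAN, and a client across the VPN. A minimal Python sketch; the file is whatever large test file you choose, and because the OS may cache recently read files, use one that hasn't just been opened or the numbers will look too good.

```python
import time


def read_throughput(path: str, block_size: int = 1 << 20) -> float:
    """Read a file sequentially and return throughput in MB/s (1 MB = 10**6 bytes)."""
    total = 0
    start = time.perf_counter()
    with open(path, "rb") as f:
        while chunk := f.read(block_size):
            total += len(chunk)
    elapsed = time.perf_counter() - start
    return total / 1e6 / max(elapsed, 1e-9)  # guard against a zero elapsed time
```

If the server's local number is fine and the LAN number is fine but the VPN number collapses, the link (or latency-sensitive file sharing over it) is the problem rather than the Samba box.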

Switch to Apple... by Tibor+the+Hun · 2009-06-10 09:46 · Score: 3, Informative

I only partly jest. I know such a thing is damn near impossible to actually do, but in our Mac shop, such things are trivial. With one click of the mouse we enabled Spotlight searching on our Leopard AFP server and bam... all the clients have almost instantaneous search access to their docs.

--
If you don't know what AltaVista is (was), get off my lawn.

nothing beats a folder structure and naming by fxdgear · 2009-06-10 09:47 · Score: 2, Insightful

I'm gonna say nothing beats a proper folder structure and naming convention. I'd also recommend using svn. Also spend some time developing some macros to assist in the creation/saving/retrieval of said documents from the repository. Maybe create some standard templates too... just my 2 cents!

WebDav by SplashMyBandit · 2009-06-10 09:48 · Score: 4, Informative

There are a few options:

For relatively unstructured data without versioning you could serve them over HTTP with WebDAV (Apache) and use your existing HTTP security mechanisms. You wouldn't believe how relieved I've often been when I can get my (secured) resources from home-base while located at a clients site.
My outfit uses KnowledgeTree for versioned stuff (http://www.knowledgetree.com/)
Or you could embrace your dark side and use Microsoft SharePoint (plus, with all the Microsoft bugs, you'd have a job for life until your employer goes bust). If you are a friend to your company you won't do this; besides, your outfit has engineers, and the good ones can spot trash solutions.
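To give a sense of how little a client needs for the WebDAV option, here is a Python sketch that builds a PROPFIND request (the WebDAV method for listing a collection) and pulls the resource paths out of the server's Multi-Status reply, using only the standard library. The server URL is hypothetical, and authentication, which your existing HTTP security would supply, is left out.

```python
import urllib.request
import xml.etree.ElementTree as ET
from urllib.parse import unquote

# Ask for just a couple of properties rather than everything the server knows.
PROPFIND_BODY = (
    b'<?xml version="1.0" encoding="utf-8"?>'
    b'<propfind xmlns="DAV:"><prop><displayname/><getlastmodified/></prop></propfind>'
)


def propfind_request(url: str) -> urllib.request.Request:
    """Build a Depth: 1 PROPFIND, which lists a collection and its immediate members."""
    return urllib.request.Request(
        url,
        data=PROPFIND_BODY,
        method="PROPFIND",
        headers={"Depth": "1", "Content-Type": "application/xml; charset=utf-8"},
    )


def list_hrefs(multistatus_xml: bytes) -> list:
    """Extract the decoded href of every resource in a 207 Multi-Status response body."""
    root = ET.fromstring(multistatus_xml)
    return [unquote(h.text) for h in root.iter("{DAV:}href") if h.text]
```

Against a real server you would pass the request to `urllib.request.urlopen(...)` and hand `resp.read()` to `list_hrefs`; in practice most people will simply map the WebDAV folder as a drive and never see this layer.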

If your users are naming their files with strange characters (assuming it's not due to Samba), then they will just have to live with it; you won't have time to sort out all the weird names that (mostly MS Word) users give their files. The primary objective should be to give your users access to the files. Making the directory listing pretty ought to be a secondary concern.

Re:WebDav by Anonymous Coward · 2009-06-10 11:44 · Score: 1, Insightful

The weird characters could easily be taken care of by something like Ant Renamer (even supports RegEx). Just replace the weird ones with an underscore or some other suitable character.
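If you would rather script it than use a GUI tool like Ant Renamer, the same regex replacement is a few lines of Python. This sketch assumes the characters Windows forbids in file names are what counts as "weird", and `plan_renames` only reports what it would change, so you can review the list before renaming anything.

```python
import re
from pathlib import Path

# Characters Windows forbids in file names, plus ASCII control characters.
BAD_CHARS = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def clean_name(name: str, replacement: str = "_") -> str:
    """Replace forbidden characters, then trim trailing dots/spaces (Windows rejects those)."""
    cleaned = BAD_CHARS.sub(replacement, name).rstrip(" .")
    return cleaned or replacement


def plan_renames(folder: str) -> list:
    """Dry run: (old, new) pairs for entries whose names would change; renames nothing."""
    plan = []
    for p in Path(folder).iterdir():
        new = clean_name(p.name)
        if new != p.name:
            plan.append((p.name, new))
    return sorted(plan)
```

Once the plan looks right, applying it is a loop over the pairs calling `Path.rename`, after checking that no two old names collapse to the same new one.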
Re:WebDav by SplashMyBandit · 2009-06-10 17:50 · Score: 1

"Sharepoint as a effective document management solution"
You must work in a completely homogeneous environment with the exact same desktop image and software install. For the rest of us, SharePoint is a relatively poor solution, requiring a specific client system, and usually a specific version of the OS and productivity suite, or lots of problems arise.
When you work on client sites for very large organisations (which have lots of versions of "everything" due to accretion), you realise that the Microsoft Way of replacing everything all at once in A Big Rollout is actually quite flawed compared with just sticking to standards that work no matter what version of Windows, Mac OS X, Linux, or Solaris (they're engineers, after all) is being used. With standards-based solutions you can upgrade your infrastructure piecemeal while continuing to provide access.
I've found that in the bigger organisations I've been in (national level), Windows is only on the desktop and some servers; the real heavy lifting is done by all sorts of systems (mainframes, DataPower devices, Un!x boxen). Sometimes the admins of these systems need corporate docs too. I have found SharePoint to be an inferior solution in this kind of environment (yes, I have used SharePoint before, which is why I've recommended other solutions that I've found to work better).

Most big companies seem to use.. by fluffernutter · 2009-06-10 09:49 · Score: 1

..something like FileNet or SAP. Sounds like you have big-corporation needs; get a big-corporation solution.

--
Laws are rules for the court, but merely a bottom bar to hit for life. Think beyond laws in your actions always.

Re:Most big companies seem to use.. by Kadin2048 · 2009-06-10 10:56 · Score: 1

I agree that document management is the way to go, but I would just point out in their defense that they're not just exclusively "big corporation" products anymore.
All my knowledge is FileNet-centric, but at least with FN you can stand up a system quite easily; it's not a huge investment for what you get. I've seen deployments done for small and medium-size businesses (and relatively small departments within large companies) that justified the cost pretty easily in terms of not losing or having documents accidentally deleted, and being able to guarantee compliance and conformance to a backup strategy.
Versioning is also a big plus. You can let people edit documents without worrying that they're going to wipe out anyone else's work -- if you don't like their changes, just grab the previous version instead. Most places I've seen introduce most of their file-share complexity because they try to basically do version control in a non-versioning filesystem using file names, and everyone does it a bit differently. Total mess. Much better to do it the right way and use some sort of version control system or ECM product from the beginning, rather than try to use bare-filesystem share drives until they're totally unmanageable and then migrate.
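The "just grab the previous version" idea can be shown with a minimal in-memory Python sketch; the `VersionedStore` class is hypothetical and stands in for what FileNet or any ECM or version-control product provides. Note that rolling back saves the old content as a new revision rather than deleting history.

```python
from dataclasses import dataclass, field


@dataclass
class VersionedStore:
    """Keep every saved revision so an unwanted edit can always be rolled back."""
    history: dict = field(default_factory=dict)  # doc name -> list of (author, content)

    def save(self, name: str, author: str, content: bytes) -> int:
        revisions = self.history.setdefault(name, [])
        revisions.append((author, content))
        return len(revisions)  # 1-based revision number

    def get(self, name: str, revision: int = 0) -> bytes:
        """Return revision `revision` (1-based); 0 means the latest."""
        revisions = self.history[name]
        if not 0 <= revision <= len(revisions):
            raise IndexError(f"{name} has no revision {revision}")
        return revisions[revision - 1][1]  # index -1 (revision 0) is the latest

    def revert(self, name: str, revision: int, author: str) -> int:
        """Roll back by re-saving an old revision as the newest; nothing is deleted."""
        return self.save(name, author, self.get(name, revision))
```

Because nobody's edit can destroy an earlier revision, there's no need for the "report_v2_final_BOB.doc" naming games described above.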

--
"Ladies and gentlemen, my killbot features Lotus Notes and a machine gun. It is the finest available."

Mindoka Technology Corp. by Alethes · 2009-06-10 09:50 · Score: 1

Mindoka (http://www.mindoka.com) has a document management product that is designed to solve the problem that you have.

Riverbed Steelhead mobiles by DecepticonEazyE · 2009-06-10 09:50 · Score: 1

Put Steelhead Mobile on all the clients. Document transfer over the VPN will GREATLY improve. Since it's mostly text and pictures, much of the data is duplicated and doesn't need to be transferred over the wire multiple times; round-trip times will drop so much they'll forget they're on a VPN.
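The effect described here comes from data deduplication: once a chunk of bytes has crossed the link, later transfers can send a short reference instead of the bytes. Below is a minimal Python sketch of that general technique using fixed-size chunks and SHA-256 digests. It is not Riverbed's actual algorithm (content-defined, variable-size chunking is a common refinement), and the `seen` dictionaries stand in for the caches on each end of the link.

```python
import hashlib

CHUNK_SIZE = 4096  # fixed-size chunks keep the sketch simple


def encode(data: bytes, seen: dict) -> list:
    """Sender side: emit ("ref", digest) for chunks already sent, ("raw", bytes) otherwise.

    `seen` maps SHA-256 digest -> chunk and models the sender's copy of the shared cache.
    """
    stream = []
    for i in range(0, len(data), CHUNK_SIZE):
        chunk = data[i:i + CHUNK_SIZE]
        digest = hashlib.sha256(chunk).digest()
        if digest in seen:
            stream.append(("ref", digest))
        else:
            seen[digest] = chunk
            stream.append(("raw", chunk))
    return stream


def decode(stream: list, seen: dict) -> bytes:
    """Receiver side: rebuild the data, caching every raw chunk for later references."""
    parts = []
    for kind, payload in stream:
        if kind == "raw":
            seen[hashlib.sha256(payload).digest()] = payload
            parts.append(payload)
        else:
            parts.append(seen[payload])
    return b"".join(parts)
```

The second time the same document crosses the link, every chunk goes as a 32-byte reference, which is where the dramatic speedup on repeated opens of the same files comes from.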

FileNet by Ohio+Calvinist · 2009-06-10 09:50 · Score: 4, Interesting

I worked at a place that used FileNet, which is now an IBM product, to do this sort of thing. We had millions of scanned documents in the system. I wasn't personally very impressed with it, in that whenever anything "bad" happened, you had to call IBM because finding support online was impossible, and even then their support wasn't very good. It was also a very picky system, though it seemed to handle the load well. If you go with it, I strongly encourage running it on UNIX/Oracle, because it screamed "poorly ported" when we used it on Windows/MSSQL. It has an API for integration, but it is also poorly documented and would take some time to integrate into your existing business systems.

This is more of a rant at this point, but it is a stop-gap solution that allows people to continue to use outdated business processes, storing important data in image formats or in documents scattered about with minimal indexing/search capabilities, rather than as analyzable "data" that can lead to "information." I always take the position that if the goal is something on paper, or the goal is to store something that "was" on paper, it is time to rethink the business process to see if we can automate it, or store/present the data electronically in the first place. The old school fights against it, but no one has ever been able to say it wasn't more efficient in the end, or that it didn't enable IT to say "yes we can" when the next great idea came along, instead of "here is a stack of papers, figure out $trend."

--
Forgive my spelling from time to time. I'm often posting during short breaks.

Re:FileNet by flnca · 2009-06-10 12:25 · Score: 1

FileNET can be monitored using Tivoli TME 10 and the FileNET integration module and/or CALA (cenit Advanced Logfile Adapter). This way, you can automatically react to problems. BTW, there's another post up there from someone who had a better time with FileNET. ;)

Technical issues aside by Vroom_Vroom · 2009-06-10 09:50 · Score: 3, Insightful

Hire a document manager / clerk person who will create order. Your engineers won't.

--
Boing boing boing....

Re:Technical issues aside by Seraphim_72 · 2009-06-10 15:33 · Score: 1

The word you are looking for is: 'Librarian'

And yeah, they are that good.

--
Slashdot, where armchair scientists get shouted down and armchair theologians get modded up.
Re:Technical issues aside by James+McP · 2009-06-11 01:08 · Score: 1

Engineers can; they just generally don't. I spent three years working at a library, so I have a fondness for good organization systems.
I was the file system nazi at my last company, a civil engineering firm. I was hired for engineering and IT support right as they started implementing a standard. The nazi-ism started by marking every directory in the existing file store read-only on every project that was complete, according to the accountants.
Then "create directory" permissions were limited to senior project managers and their one administrative assistant. I set up a script to check for new directories every day, and I'd email anyone who didn't follow protocol. I pre-seeded the directory structure by getting the list of open project numbers from finance, so in theory everything billable already had a home waiting for it. For new projects I simplified things by creating a little widget that asked for a project number and the contract name and created the directory tree.
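The daily check described above might look something like this Python sketch. The folder protocol (`<project number> <contract name>`, with numbers like `2009-041`) is a hypothetical example, the set of open project numbers would come from finance, and emailing the offender is left out. Directory modification times also change when a folder's contents change, so this flags recently touched folders, not strictly newly created ones.

```python
import re
from pathlib import Path

# Hypothetical protocol: top-level folders are "<project number> <contract name>",
# e.g. "2009-041 Runway Rehab", and the number must belong to an open project.
FOLDER_RE = re.compile(r"^(\d{4}-\d{3}) \S.*$")


def nonconforming_dirs(root: str, open_projects: set, since: float) -> list:
    """Names of top-level folders modified after `since` (epoch seconds) that break the protocol."""
    bad = []
    for p in Path(root).iterdir():
        # Skip plain files and folders nobody has touched since the last run.
        if not p.is_dir() or p.stat().st_mtime < since:
            continue
        m = FOLDER_RE.match(p.name)
        if m is None or m.group(1) not in open_projects:
            bad.append(p.name)
    return sorted(bad)
```

Run once a day with `since=time.time() - 24 * 3600` and mail the resulting list to whoever owns each folder.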
We created a separate volume that contained data that was not project-specific but might be needed across multiple projects, e.g. the various CAD standards (national, Corps of Engineers, DoD, DoT, etc.) along with company/client logos, all the stock patterns/icons for the various CAD programs, etc. All the CAD programs were set to point to that shared directory by default, to encourage the worker bees to put shared data there so they wouldn't have to set project-specific directory overrides.
That directory allowed everyone to add data, but only the CAD/marketing/PR/QA managers could delete/overwrite files. A report listing files with similar names and extensions was generated monthly and sent to the managers, to make sure we didn't wind up with 25 versions of one logo or hatch pattern.
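A report of look-alike file names can be produced with Python's `difflib`. This sketch compares every pair of same-extension names, so it is quadratic in the number of files, which should be acceptable for a monthly run over a shared standards directory; the 0.8 similarity threshold is an arbitrary starting point to tune.

```python
from difflib import SequenceMatcher
from itertools import combinations
from pathlib import PurePath


def similar_names(filenames: list, threshold: float = 0.8) -> list:
    """(a, b, ratio) for same-extension files whose base names are at least `threshold` similar."""
    pairs = []
    for a, b in combinations(sorted(filenames), 2):
        pa, pb = PurePath(a), PurePath(b)
        if pa.suffix.lower() != pb.suffix.lower():
            continue  # a .dwg and a .png of the same logo are not duplicates
        ratio = SequenceMatcher(None, pa.stem.lower(), pb.stem.lower()).ratio()
        if ratio >= threshold:
            pairs.append((a, b, round(ratio, 2)))
    return pairs
```

Feed it the output of a directory listing and mail the pairs to the managers who are allowed to delete or overwrite files.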
This was staff-intensive but that's because capital expenditures were the devil since they couldn't be charged easily to a project. File management, however, is something that was billable.

--
I've been on slashdot so long I'm starting to get out of touch with the cool stuff if it ain't on slashdot.

SQL... nuff said... by Youngbull · 2009-06-10 09:50 · Score: 1

I think the right option for you would be putting the documents in a database and serving them up through a website. That would be helpful for your satellite offices, since mapping shares through Samba over VPN is sometimes unstable and always nontrivial. Besides, the current system doesn't seem to be working for you. You really don't have to be that proficient with functional web pages to make something like this, especially if you use Ruby on Rails. A Ruby on Rails guy would probably need only a couple of hours to make such an application. Then you could have functionality like searching and sorting by author, department, type and so on.
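The database half of that idea is small. The poster suggests Rails; to keep to one language here, this is a Python sketch using the standard `sqlite3` module, with a hypothetical `documents` table holding metadata and the path to each file. The web front end is not shown. Sort columns are checked against a whitelist because SQL parameters can't stand in for column names.

```python
import sqlite3

# Column names cannot be bound as SQL parameters, so only these may be used for sorting.
SORTABLE = {"title", "author", "department", "doc_type"}


def open_catalog(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute("""CREATE TABLE IF NOT EXISTS documents (
        id INTEGER PRIMARY KEY,
        title TEXT NOT NULL,
        author TEXT,
        department TEXT,
        doc_type TEXT,
        file_path TEXT NOT NULL UNIQUE)""")
    return conn


def add_document(conn, title, author, department, doc_type, file_path) -> None:
    conn.execute(
        "INSERT INTO documents (title, author, department, doc_type, file_path) "
        "VALUES (?, ?, ?, ?, ?)",
        (title, author, department, doc_type, file_path))
    conn.commit()


def find(conn, text: str, sort_by: str = "title") -> list:
    """Title search (SQLite's LIKE ignores ASCII case), sorted by one whitelisted column."""
    if sort_by not in SORTABLE:
        raise ValueError(f"cannot sort by {sort_by!r}")
    rows = conn.execute(
        "SELECT title, author, department, doc_type, file_path FROM documents "
        f"WHERE title LIKE ? ORDER BY {sort_by}, title",
        (f"%{text}%",))
    return rows.fetchall()
```

A web page then only needs to call `find` and turn each `file_path` into a download link.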

Alfresco by SplashMyBandit · 2009-06-10 09:53 · Score: 2, Informative

I forgot to mention Alfresco as well, although I've never personally tried it.
http://www.alfresco.com/index-b2.html

Just Don't Use Livelink by Myrv · 2009-06-10 09:53 · Score: 1

Can't really suggest a good document management program but I can tell you one to avoid. We use Livelink at my place of work and its indexing and search capabilities are horrible (some would say non-existent). For example every document added to Livelink gets a document number assigned to it. One would expect to be able to retrieve that document by using the same document number but if you enter it into the search bar Livelink returns no results found. Huh? Not to mention some odd UI behaviours like when you add a folder to the favourites box the original folder disappears from the standard file listing (meaning there is no single canonical listing of files and directories, you need to always look in 2 places).

Re:Just Don't Use Livelink by CodeMonkey22 · 2009-06-10 10:40 · Score: 1

It doesn't return search results because you haven't configured Livelink properly, or you are using a very old version of Livelink.
Searching on the DataID can work. You should read up on 'Best Bets' functionality, or how to use the Livelink Query Language.

Regarding your second comment, it's not called the favourites box but 'Featured Items', and this behaviour is configurable in later versions, too.
Upgrade Livelink to version 9.7.1 if you are not already there.

Livelink is incredibly powerful and can be configured to do anything you need it to do, but the key is knowing how to do it. A skilled administrator is definitely needed.

Full Disclosure:
I work for Open Text and am a certified Livelink Systems Administrator.

Institutional repository? by sidb · 2009-06-10 09:53 · Score: 1

What kind of documents are they? If they're mostly text and you want versioning, the only drawback to Subversion is getting people to learn the tools, but that might be too much.

If they're archival/static documents, an institutional repository could work. Something like DSpace isn't that hard to deploy and will provide basic archival and search features.

The middle ground between those two solutions is probably what you want, though. Everyone I work with uses SharePoint for that, and I hate recommending proprietary lock-in.

Laserfiche by wguy00 · 2009-06-10 09:53 · Score: 2, Informative

Laserfiche (or LF) is just what this is for. It is DOD and DOJ certified and crap, and is used by all branches of the military and several other areas of the government as their document management system. With several different software offerings, just about any situation can be taken care of. Its features include the ability to search based on document name, template information, or OCR'd text (which the software also takes care of). With add-on features such as Quick Fields, it may be able to automatically sort, add template information, OCR, name, and then store the documents. It really is a nice way to go. Satellite offices can access it as either full or read-only users. It has the ability and modules to connect to just about any other type of data/information system (GIS, financial software, etc.) and is very scalable.

I was a tech for 5 years with an LF VAR. I'm not there anymore. We were constantly cleaning up messes left by other document management systems. Take your time with this thing and really plan your naming convention, folder hierarchy and user setup. It's easier to get it right (or as close to it as possible) than to go back and fix it later. A good LF VAR should help you with this. Definitely check references of competing companies. Some VARs are A LOT better than others.

It's called a DAM system. Do some research. by Logic+Bomb · 2009-06-10 09:54 · Score: 1

Digital Asset Management

http://www.lmgtfy.com/?q=digital+asset+management

Re:It's called a DAM system. Do some research. by Chris+Mattern · 2009-06-10 10:17 · Score: 1

Well, maybe he doesn't want your DAM system!

I work for a part 121 air carrier by maric · 2009-06-10 09:54 · Score: 1

We have extensive documentation and tracking needs. We use two sets of software for records and also keep a hard copy for long-term storage. For tracking parts on/off and hours in service, TSO, TSI, etc., we use TRAX Evo2. We scan all written paperwork into a database, which we interface with via Alchemy. This allows us to view the current status of all of our aircraft and their parts and to track the paperwork for each action taken. Alchemy has a browser interface, and we use IE to access it; this allows a person to access the documentation from any of our stations or offices internally on the network. Both Alchemy and TRAX are acceptable to our local FSDO. The hardware setup for this is not something I can shed light on, as I do not get to play with computers that are ground-bound.

Hope that helps,
maric

Organize.... by Fallen+Kell · 2009-06-10 09:55 · Score: 1

As may have been pointed out, organizing the files is really the best way. Develop a strict schema for naming conventions as well as a hierarchical directory structure for maintaining and organizing. Something like:

/projectname/projectpart/data (contains the final draft of any document)
/projectname/projectpart/working (contains files that people are modifying so that they can be merged/checked in to the data dir)
/projectname/projectpart/misc (contains misc. notes or files that need to be filed with the project)

The "projectpart" dirs are really just logical groupings of data/files for the project. Say you are designing a plane, well, break it up into relevant systems, like electronics, power plant, structure, etc., and each of those are the "projectpart" directories. The "projectname" is simply the overall project itself, be it the name of the plane, maybe the name of the contract, etc.
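Creating that layout for a new project can be scripted so nobody improvises folder names. A minimal Python sketch using `pathlib`; the root path, project, and part names in the usage note are hypothetical.

```python
from pathlib import Path

# data: final drafts; working: files being edited; misc: loose notes filed with the project
SUBDIRS = ("data", "working", "misc")


def create_project_tree(root: str, project: str, parts: list) -> list:
    """Create <root>/<project>/<part>/{data,working,misc}; existing folders are left alone.

    Returns the paths of all the leaf folders.
    """
    leaves = []
    for part in parts:
        for sub in SUBDIRS:
            d = Path(root, project, part, sub)
            d.mkdir(parents=True, exist_ok=True)
            leaves.append(d)
    return leaves
```

For example, `create_project_tree("/srv/projects", "XP-1", ["electronics", "powerplant", "structure"])` yields nine leaf folders, and running it again later to add a part is harmless because `exist_ok=True` skips folders that already exist.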

--
We were all warned a long time ago that MS products sucked, remember the Magic 8 Ball said, "Outlook not so good"

windows Terminal Server by smalltimecrime · 2009-06-10 09:56 · Score: 1

The OP did not mention exactly how many remote branches or computers need to access the documents at once; however, Windows Terminal Server licenses aren't too expensive, and the remote desktop experience is silky smooth. Also, the documents would all reside on a central server RAID array or NAS device and never need to travel over the internet to remote sites. This would also free up massive amounts of bandwidth over the VPN, considering TS just needs an internet connection and uses SSL encryption (although I don't know what you would even need a VPN for after making this conversion).

Re:windows Terminal Server by JustNiz · 2009-06-10 10:07 · Score: 1

>> windows Terminal Server licenses aren't too expensive and the remote desktop experience is silky smooth.
BWAHHAHAHAHAAHAHAHAHAHAHAHAHAHAHA
Thanks for that. I needed a laugh. Silky smooth? Having to do anything remotely technical via Terminal Server is the biggest pain in the butt I've ever experienced.
BTW if you're really not a paid shill for Microsoft then WTF are you smoking?
Re:windows Terminal Server by smalltimecrime · 2009-06-10 11:25 · Score: 1

How hard is it to set AD permissions? Just install your users' applications once, install printers once, map network drives once... on the TS, etc. (and give your users appropriate permissions on each program's installation directory... once). The only time I have ever had trouble doing anything "technical" over TS/remote desktop was trying to remotely flash-update a WatchGuard Firebox X500's firewall settings. FYI, when I described TS as "silky smooth" I was mostly referring to the quick responsiveness of the mouse and the crisp, clearly drawn desktop. Of course, it is going to take a halfway experienced MS techie.

Who else read this and thought... by tlambert · 2009-06-10 09:56 · Score: 1, Interesting

Who else read this and thought... working in a satellite office for an aerospace company would involve a lot of cool travel perks?

-- Terry

Odd that the next story... by ak_hepcat · 2009-06-10 09:58 · Score: 2, Informative

Odd that the next story has a great idea for document management right in the summary...

Hadoop!

--
Support FSF: Stop thinking with your wallet, and think with your imagination. (cc/non-commercial)

Re:Odd that the next story... by msantosn · 2009-06-11 03:40 · Score: 1

Targeted merchandising? First create the necessity, then show the solution.

Sharepoint by jayhawk88 · 2009-06-10 09:59 · Score: 1

...seems like a natural solution for your connectivity issues, or perhaps whatever the open source variety of Sharepoint is. You really do need to tackle the naming convention question though. You can have all the file indexing you want, but sometimes a nice, logical, clean file name will get you what you're after much faster than any kind of searching.

It's going to be horrible, painful, thankless work that will put you on the shit list of just about every department manager and administrative assistant ("You want me to rename how many files?"), but it has to be done.

Aerospace QMS by dwarf75 · 2009-06-10 10:03 · Score: 1

What worries me more than anything else is that you claim to be a mid-sized aerospace company. If you are having problems finding documents, what happened to the traceability processes your QMS requires, and how do you guarantee that employees use up-to-date documents? How did you handle the process in the past? And what does your QMS stipulate for records and traceability?

Re:Aerospace QMS by icebrain · 2009-06-10 13:51 · Score: 1

To be fair, that sounds like the aerospace company I worked at for a while. The giant Samba share drives didn't store certification data or production drawings or anything like that (such things were handled in a document-control system with version tracking and all that), but it was rather a big share drive for convenience... pictures, videos, presentations, department budget data, spreadsheets, etc. It was basically just an interdepartmental shared space for things that didn't need to be emailed or whatever. It was convenient because everyone could access it, and you could get to it from anywhere in the company (like if you had to present in another building; just pull it up straight from the share on the presentation computer).
Stuff that needs versioning or document control should be handled through SmarTeam or Serena PVCS or something, at least.

--
The meek may inherit the earth, but the strong shall take the stars.

IBM OmniFind - a simple easy solution by sfalc · 2009-06-10 10:11 · Score: 1

IBM OmniFind should do the trick. It indexes your files, and then you can search the index very quickly. It also caches documents and does other nifty stuff. It is based on Apache Lucene, and there is a free (as in beer) version, IBM OmniFind Yahoo Edition. The free version will work with up to 500,000 documents. I used it for searching a number of networked drives with circa 50,000 files on them, which it did very well.
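OmniFind's internals aren't described here. As a rough illustration of why searching a prebuilt index is fast, here is a toy inverted index in Python. It maps each term to the set of documents containing it, so a query only does set lookups and never re-reads the files. The function names are hypothetical, and real engines such as Lucene add stemming, ranking and on-disk storage.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase alphanumeric tokens; real engines do much more analysis."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return sorted ids of documents containing every query term."""
    terms = tokenize(query)
    if not terms:
        return []
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return sorted(result)


if __name__ == "__main__":
    idx = build_index({
        "spec.doc": "Fuel pump specification rev B",
        "memo.txt": "Pump test schedule",
    })
    print(search(idx, "pump"))
    print(search(idx, "fuel pump"))
```

The expensive work (reading every file) happens once at indexing time; after that, each search touches only the index.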

SharePoint by PIPBoy3000 · 2009-06-10 10:12 · Score: 3, Informative

NASA is a big user of SharePoint, strangely enough. My coworkers run into their folks at conferences from time to time.

I personally am ambivalent about SharePoint. Its roots are in document management, so it seems to do that relatively well. The publishing features are fairly nice as well. I don't think it's the best system for making web sites, but it may some day get there. Currently it feels like a 2.0 product (the magic rule is to never buy anything from Microsoft before 3.0).

There are gotchas. SharePoint is tightly coupled with your clients. If everyone accessing the documents is using the latest version of Office, you'll be okay. If not, you'll run into problems. You may also need to throw a lot of hardware at SharePoint, as storing files inside SQL Server has some built-in inefficiencies.

Still, some of our users seem to love SharePoint, so it might be a good option for you.

Good luck by kilodelta · 2009-06-10 10:13 · Score: 1

When I worked for the state Attorney General's office as I.T. Director, a request came into I.T. that immediately gave me an upset stomach. The request was for all documents on the server that contained the word "lead" as in the chemical element Pb. The issue was that the element is spelled the same as the everyday word "lead" (as in "to lead"), so a plain text search turns up both.

I kicked in and wrote an app that generated a web list on the fly and had clickable links so the documents could be examined and then marked as part of discovery.
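The original app isn't shown. Below is a minimal Python sketch of the same idea, assuming plain-text files (Word or PDF documents would need a text-extraction step first). It lists every file containing "lead" as a whole word and builds an HTML list of clickable file:// links for reviewers. The function names and the extension filter are hypothetical. The regex still can't tell the metal from the verb, which is why a person has to look at each hit.

```python
import html
import os
import re
import tempfile
from pathlib import Path

# Whole-word, case-insensitive match. This still cannot tell "lead" (Pb)
# from "lead" (to guide); a person reviews each hit.
PATTERN = re.compile(r"\blead\b", re.IGNORECASE)


def find_matches(root, pattern=PATTERN, exts=(".txt",)):
    """Return sorted paths under root whose text matches pattern."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.lower().endswith(exts):
                continue
            path = Path(dirpath) / name
            try:
                text = path.read_text(errors="ignore")
            except OSError:
                continue  # unreadable file: skip rather than abort the sweep
            if pattern.search(text):
                hits.append(path)
    return sorted(hits)


def render_html(paths):
    """Build an HTML list of file:// links, one per matching document."""
    items = "\n".join(
        f'<li><a href="{html.escape(p.resolve().as_uri())}">{html.escape(str(p))}</a></li>'
        for p in paths
    )
    return f"<ul>\n{items}\n</ul>"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "memo.txt").write_text("Lead paint samples attached.")
        Path(tmp, "notes.txt").write_text("A misleading title.")
        print(render_html(find_matches(tmp)))
```

The word-boundary anchors keep "misleading" out of the results, but the page still has to be reviewed by hand.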

I also brought in three Xerox 490's. Those were the hardware part of the document management system. I don't know if they ever got the servers for it but at least they had the gear. In the meantime I suggested using meta-data in filenames.

New Hire. by deimtee · 2009-06-10 10:13 · Score: 1

Hire a real librarian, it's what they do.
On the plus side, you also get to hire a librarian. nudge, nudge, wink, wink, say no more.

--
I'm guessing that wasn't on their radar screen...

Re:Sharepoint by cfryback · 2009-06-10 10:14 · Score: 1

We run an EDMS for our local council here. The filename doesn't matter; what matters is how it is all indexed. Too many people here think that you need to rename EVERY document. I don't have any experience with Hummingbird, but what about HP's TRIM software? Yes, $$$$, but it also has a web GUI. Just a thought.

Alfresco of course! by thule · 2009-06-10 10:14 · Score: 2, Interesting

It can scale extremely well. It is the backend to Adobe's acrobat.com website! So you know it can handle millions of documents if you need it to. Sharepoint requires MS SQL Server for searching documents. With Alfresco, that feature is built in.

Sharepoint is teaming software and not really designed for large document repositories. Alfresco has a teaming interface (Alfresco Share) and a more generic document repository interface.

Alfresco can expose the repository via FTP, SMB, WebDAV, and a web client interface.

Regular Expressions by EvilGrin5000 · 2009-06-10 10:15 · Score: 1

Your solution:

http://xkcd.com/208/

--
A black cat crossing your path signifies that the animal is going somewhere. -- Groucho Marx

WIKI by unum15 · 2009-06-10 10:15 · Score: 3, Interesting

Maybe not the best solution for this particular job, but man am I glad we started using Dokuwiki for all our scattered documents.

There is a right way. by mrmeval · 2009-06-10 10:16 · Score: 5, Informative

http://en.wikipedia.org/wiki/Document_management_system

For that level of documentation you need a staff, and you need to get it properly indexed. You need a high-level librarian: someone with at minimum a master's degree in library science and at least a bachelor's in information technology. They will not come cheap, and they are a long-term investment. The software is available, but it is not trivial. Hiring a large number of people to recategorize and tag all the documents for as long as that takes is also an expense, but worth it. Once it's all in place, maintaining it gets much easier.

I've seen a system developed for Raytheon. They took all the old compartmentalized data Hughes had and put every scrap of paper through a scanner. It was exceptionally well done. It would display electronic files and would give the location of hard copies. Classified documents were in some cases indexed but were hard copy only, afaik. Some other documents were also hard copy only, usually ones with an NDA or other restriction on making electronic copies. It had everything mentioned regarding versioning and such. Documents spanned decades with hundreds of revisions, and you could pull up and view any revision. Depending on how recent the document was and what type it was, you could view a change log. Older scanned ones did not have that unless they'd been important enough to re-enter as modern documents, which meant OCR or manual transcription. Some schematics were re-entered into the system in a modern format. The effort was worth it. Having that data is the only way some devices or parts could be made or repaired.

--
I'd go on a Vegan diet but the delivery time from Vega is too long. --brownkitty

Re:There is a right way. by mucous · 2009-06-10 11:55 · Score: 1

I don't know where to put this, so I'll put it here. This is not a problem which can be fixed with technology. The organisation clearly has no recordkeeping policy. It has no information management policies. No one is trained, no one is monitored. They don't need a new computer or a bit of software or even a librarian. They need a trained records manager, a policy, a lot of change management, high-level support and about a year of hard work to straighten out the mess. The best way to fix problems like this is to avoid them. Failing that, you've got a hard slog ahead of you.
Re:There is a right way. by mrmeval · 2009-06-10 13:20 · Score: 1

Yes, that would be the overhead I missed. I've not heard of the title "records manager", but that is what I meant when I said librarian. The person I talked to was in charge of all of their records pertaining to proprietary data concerning software, hardware, build instructions and the like.
I am experiencing that ineptitude at my new job. IT, all two of them, managed to LOSE all the data when a server's hard drive crashed. 10 years' worth of design data went poof.
How embarrassing is it to have to go to one of your contract board houses and beg for copies of your data back?

Re:There is a right way. by mucous · 2009-06-10 15:40 · Score: 1

The New Zealand government has written up a handy guide to problems like this: What to consider prior to implementing an IT 'solution' to a recordkeeping solution http://continuum.archives.govt.nz/files/file/guides/g3/index.html It's actually very easy to understand.
Re:There is a right way. by nil_orally · 2009-06-10 22:25 · Score: 1

Why are you being helpful? This is Slashdot. Gotta agree, though. There is a time to call in the professionals. Not having one got you into this mess, so you can't get out of it without one. No amount of software will replace a librarian who knows how it should be fixed and implemented.

Old tech by DerekLyons · 2009-06-10 10:21 · Score: 2

It's called an index or a bibliography. There exists a profession known as 'librarian' specifically trained in the creation of such and in the management of large numbers of documents.

Oracle or Alfresco by steverar · 2009-06-10 10:29 · Score: 2

We went through this for both document management and a web front end for access. We looked at SharePoint, Alfresco, Oracle UCM, RedDot and a few others. We dropped most due to cost, functionality, and ease of use for non-developers doing page work. SharePoint was dropped due to cost in an internet setting (CALs), no non-developer front end for page layout (they couldn't use HTML), and the fact that it stores everything in the database. From prior experience, this made backup/restore difficult, as it keeps the IP of the web site in the database when you back up. If you restore to a different machine, it gets confused. It came down to Oracle and Alfresco. You cannot go wrong with either. Both are extensible; each either has what you need built in or lets you add it easily. Both are good for non-developers to use. Support is very good with either. We went with Oracle. While it did cost more, it matched our existing infrastructure.

Open Text - Document Management Solutions by CodeMonkey22 · 2009-06-10 10:29 · Score: 1

This is built for the exact situation you described:

http://www.opentext.com/2/global/sol-products/sol-pro-docmgmt-collaboration.htm
You can either import the files into the system, or leave them in place, index them and use the search engines to locate the needles in your haystacks...

About Open Text:

http://en.wikipedia.org/wiki/Open_Text

Hummingbird is a subsidiary of Open Text, the solution mentioned above...

Full Disclosure:

I am an Open Text employee.

Google is the answer by BlackSabbath · 2009-06-10 10:31 · Score: 1

Google?
http://www.google.com.au/enterprise/mini/index.html

Seriously, if you can't be bothered collecting/maintaining the metadata that more structured solutions require, then just let Google index the lot. It'll work just as well (or not) as it does on the Internet. Although it's not free, it seems reasonably priced. It could be a quick answer to your problem.

SharePoint wiki by Anonymous Coward · 2009-06-10 10:34 · Score: 1, Insightful

I know I'm gonna get hit for blurting out the Microsoft Solution but...give SharePoint a shot...

Just avoid the wiki functionality like the plague. It completely sucks.

Mac OS X Server - Spotlight Server by Gary+W.+Longsine · 2009-06-10 10:35 · Score: 4, Insightful

Since your organization probably has Windows clients, you can only long for something as nice as Mac OS X Spotlight Server.

Google Search Appliance is definitely what you want.

If you have a mid-sized company, you definitely don't have a surplus of highly talented systems administrators lying about to run one of the document management systems that others here are likely to suggest. Be very careful going down the document management server path. It's far, far more work than you think it will be, and more than the vendor will tell you it is. Not simply more work for you, but for your IT staff and your users, too.

The Google Search Appliance, by contrast, is "fire and forget". Plug it in. Turn it on. Patch it when Google suggests you do so. That's about it.

--
If you mod me down, I shall become more powerful than you could possibly imagine.

ProjectWise by adamziegler · 2009-06-10 10:37 · Score: 1

We use a Bentley product called ProjectWise. It is a document management system with file attribution, among other things. It is primarily useful for Bentley's line of products, but we have also used it as an archival system and for working documents that are not Bentley-specific. No... I do not work for Bentley, but my job heavily uses their products.

Start with the WAN by PatJensen · 2009-06-10 10:42 · Score: 1

Take a look at network-based WAN acceleration products that will significantly reduce the overhead of SMB/CIFS traffic. This will make it easier to index, cache frequently used documents locally and improve your WAN utilization company wide. It will even cache directory lookups and they will "feel" instant to the end user.

A good example is Cisco WAAS, a cool video showing how it works is here: http://www.cisco.com/cdc_content_elements/flash/ans/index.html

See here for data sheets and specs: http://www.cisco.com/en/US/products/ps5680/Products_Sub_Category_Home.html

Cisco's solution is inexpensive and you can use your existing router investment to do all the heavy lifting.

Pat

A Document Management System? by Super+Jamie · 2009-06-10 10:43 · Score: 1

Unsurprisingly, the answer to managing many documents is to use a document management system. There are several commercial and free products available, both linked here and on the Wikipedia page for Document Management Systems.

I've worked next to the team who administered Bentley ProjectWise in a previous engineering job, which is expensive but definitely suited to your task. There may be other good options out there.

DMS by jjshoe · 2009-06-10 10:43 · Score: 1

DMS -- http://en.wikipedia.org/wiki/Document_management_system

--
-- botsex is {grep;touch;strip;unzip;head;mount} /dev/girl -t {wet;fsck;fsck;yes;yes;yes;umount} {/de

LaserFiche by Hadlock · 2009-06-10 10:44 · Score: 1

We're using a Win3.1 app called LaserFiche on XP with > 250,000 documents and it's lightning fast, works with TIFF files and PDF and probably more. Includes file and folder permissions.

--
moox. for a new generation.

try iPhoto by docbrody · 2009-06-10 10:45 · Score: 1

Step 1: Print out all 100 thousand docs and draw different little smiley faces on each of them. Step 2: scan all your docs back in as jpegs. Step 3: import all those jpegs into iPhoto and use "Faces" to magically organize them - just like on the television commercial.

Thunderstone by Darth+Cider · 2009-06-10 10:48 · Score: 1

Check out Thunderstone. It's what they do, and they do it very well.

The big guys use... by benow · 2009-06-10 10:49 · Score: 1

Documentum, docushare, livelink, sharepoint. I've heard of documentum installs with 100m+ docs. It's quite good, but expensive.

NetDocuments by bradvoy · 2009-06-10 10:50 · Score: 1

Take a look at NetDocuments. It's a SaaS (Software as a Service) document management system. It handles millions of documents, can be accessed from anywhere, and is relatively inexpensive compared to maintaining your own servers.

Garbage In Garbage Out by sexconker · 2009-06-10 10:52 · Score: 4, Informative

It's becoming quite a mess, sometimes quite slow, and there is really no naming or numbering convention in place for the files and directories. We end up with mixed casing, all uppercase, all lowercase, dashes and ampersands in the file names, and there are literally hundreds of directories to sort through before you can find the document you are looking for.

Slow. Upgrade your network and VPN. You know that VPN layer is just killing your performance.

No naming or numbering convention. Get one.

Mixed casing. Learn How to Properly Case Folders (and documents).

Dashes and ampersands. Are they a problem? Aesthetically unpleasant? I personally restrict punctuation in a filesystem to dashes, periods, and parentheses (unless the punctuation is a replicable part of the name of the file/folder).

Examples:
01 - The First Track (vocal)
02 - $lashhvertisements Attack!
03 - Where Have All the A.C.'s Gone

Develop your own method that works and be obsessed about it to the point where you would reburn a disc if one of the filenames was "01-Name" instead of "01 - Name".
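One way to be that obsessive without doing it by hand is to encode the rules in a small normalizer. The Python sketch below assumes one possible house rule, loosely based on the list above: keep letters, digits, spaces, dashes, periods and parentheses; turn "&" into "and"; write a leading two-digit number as "NN - ". The rule and the function name are illustrative, not a standard.

```python
import re

# Hypothetical house rule: anything other than letters, digits, spaces,
# dashes, periods and parentheses is dropped.
DISALLOWED = re.compile(r"[^A-Za-z0-9 .()\-]")
# A leading two-digit number followed by a dash, e.g. "01-Name" -> "01 - Name".
# The lookahead requires a non-digit next, so dates such as "09-06-10" are left alone.
LEADING_NUMBER = re.compile(r"^(\d{2})\s*-\s*(?=\D)")


def normalize_name(name):
    """Rewrite a file or folder name to the house convention."""
    name = name.replace("&", " and ")
    name = DISALLOWED.sub("", name).strip()
    name = LEADING_NUMBER.sub(lambda m: f"{m.group(1)} - ", name)
    return re.sub(r"\s+", " ", name).strip()  # collapse runs of spaces


if __name__ == "__main__":
    for raw in ("01-Name", "Specs & Drawings!", "09-06-10 report"):
        print(repr(raw), "->", repr(normalize_name(raw)))
```

Running the same function over every name means "01-Name" and "01 - Name" can never coexist.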

Hundreds of directories.
Each file should have its own folder.
"That's insane!" you say. Start out with this mentality. If there is no reason at all to separate two files (they are part of the same thing), then place them in one folder, and make sure the folder is named all-encompassingly. Repeat for all files. If you get into an AB, BC, but not ABC situation, the solution is to have A and B and C, with A and C linking to B with your choice of shortcut/link/symlink/etc.
Do this until all files are in folders. Then repeat with folders.

There is NO substitute for organization and getting people on the same page. Develop some conventions. Task people to fix as they go. Check up to make sure people accessing documents are fixing as they go, and doing so according to convention. Once people are used to the convention, and once things are relatively organized, they won't ever need to search again. They'll instantly know where 99% of things are, and will be able to dig around and find anything else within seconds.

The main problem you face is getting organized after already being unorganized. It isn't easy, but at least you're not dealing with millions of paper documents.

Re:Garbage In Garbage Out by sexconker · 2009-06-10 10:56 · Score: 1

By "replicable part of the name of the file/folder" I mean in regards to illegal characters in the filesystem/OS. Windows claims these are ><\/:|*?" for example (dunno if it's Windows, NTFS, or NTFS+FAT+whatever else Windows needs to support).
I didn't intend to do an example with /. references when I started. I wanted something showing the dollar sign, and then stuff with periods and a quote mark and a question mark (dropped). First stuff that came to mind. Had I planned it, or previewed my post, the first line would be "01 - The First Post (frosty mix)" or similar.
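For reference, the characters Windows rejects in file names are < > : " / \ | ? * plus control characters. A few device names (CON, PRN, AUX, NUL, COM1 to COM9, LPT1 to LPT9) are also reserved, and names ending in a space or period cause trouble. A small Python checker along those lines might look like this (the function name is hypothetical):

```python
import re

# Characters Windows rejects in file names, plus ASCII control characters.
# Note that "^" is allowed.
WINDOWS_ILLEGAL = re.compile(r'[<>:"/\\|?*\x00-\x1f]')
# Device names Windows reserves, with or without an extension (CON, CON.txt, ...).
RESERVED = {"CON", "PRN", "AUX", "NUL",
            *(f"COM{i}" for i in range(1, 10)),
            *(f"LPT{i}" for i in range(1, 10))}


def windows_problems(name):
    """Return a list of reasons a single file/folder name is unsafe on Windows."""
    problems = []
    bad = sorted(set(WINDOWS_ILLEGAL.findall(name)))
    if bad:
        problems.append("illegal characters: " + " ".join(repr(c) for c in bad))
    if name.split(".")[0].upper() in RESERVED:
        problems.append("reserved device name")
    if name.endswith((" ", ".")):
        problems.append("trailing space or period")
    return problems


if __name__ == "__main__":
    for candidate in ("02 - $lashhvertisements Attack!", "What?", "CON.txt", "notes."):
        print(repr(candidate), windows_problems(candidate))
```

Checking one name at a time keeps it simple to run over a whole share before migrating files between systems.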
Re:Garbage In Garbage Out by jrumney · 2009-06-11 14:40 · Score: 1

Dashes and ampersands. Are they a problem? Aesthetically unpleasant? I personally restrict punctuation in a filesystem to dashes, periods, and parentheses (unless the punctuation is a replicable part of the name of the file/folder).
Examples:
01 - The First Track (vocal)
02 - $lashhvertisements Attack!
03 - Where Have All the A.C.'s Gone

I'm not sure if you've done it deliberately, but all of your examples are a problem for cross-platform use. To answer the question, ampersands are always a problem, as they have special meaning in many contexts. Dashes are a problem only when they are the first character in a file name, where they can be misinterpreted as starting a list of options, and it isn't obvious how to make them be understood as a file name (quoting doesn't always work).
Re:Garbage In Garbage Out by sexconker · 2009-06-12 04:14 · Score: 1

Yes, it's intentional that I mixed a bunch of shit in.
I seriously doubt these people are digging through all these folders in a gui and then feeding them to command line shit.
Besides:
command -input -file.ext
command -input ./-file.ext
?
My question about ampersands (and punctuation in general) was rhetorical. It means, "sort out your filesystem and OS restrictions and come up with a definitive super set of restrictions".
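The "./" prefix works because the argument no longer begins with "-". Many (but not all) command-line tools also accept "--" to mark the end of options. When a script builds command lines from names on the share, a tiny helper like this hypothetical one avoids the problem:

```python
import os


def safe_arg(name):
    """Return a path argument that cannot be mistaken for a command-line option.

    Relative names starting with "-" get a "./" (".\\" on Windows) prefix.
    """
    if name.startswith("-"):
        return os.path.join(".", name)
    return name


if __name__ == "__main__":
    print(safe_arg("-file.ext"))
    print(safe_arg("report.txt"))
```

Names that don't start with a dash, including absolute paths, pass through unchanged.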

file naming conventions and folders by PhantomHarlock · 2009-06-10 11:18 · Score: 1

I use the 'job' system, which I learned from working at Digital Domain (the Visual Effects Company) and then passed it on to the Aerospace company where I now work.

Effects companies deal with enormous amounts of data, and many different versions of a shot as well as all the elements that make up that shot, along with other data such as project settings files from software used in the making of that shot. They had a very specific file naming system to keep that all organized, and it was referred to as the job system, because first and foremost everything was logically separated by project.

How that has translated for me into the Aerospace field is at the root of the main drive share, there are two primary folders, job and departments. Departments contains generic documents for each department such as forms, standards, etc.

The 'job' folder contains several categories of jobs or projects, such as vehicles, engines, pumps, etc.

Inside those are folders with the project name. Inside each project folder is a series of folders for different data types, such as solidworks, reports, proposals, documentation images, etc.

File naming:

File naming should be consistent, and I always start my own files with the date with year first, because I do not trust meta-data one single iota. I have had dates wiped out when a backup system kept a backup, but did not preserve the file creation / modify date on copy.

After that it is the thing, then the version.

So 09-06-10_widget_v01.sldprt

version two should be exactly the same, with the number iterated up. There should never be a document named something_FINAL because you always end up with FINAL_FINAL_FINAL etc. :)
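A minimal Python sketch of that convention, assuming the pattern is exactly YY-MM-DD_thing_vNN.ext as in the example. The regex and function names are hypothetical. Bumping a version keeps the date and the thing and only increments the number, so nothing ever ends up as FINAL_FINAL.

```python
import re
from datetime import date

# Hypothetical pattern for YY-MM-DD_thing_vNN.ext, e.g. 09-06-10_widget_v01.sldprt
NAME = re.compile(r"^(\d{2}-\d{2}-\d{2})_(.+)_v(\d+)\.(\w+)$")


def make_name(thing, ext, day, version=1):
    """Build a new file name with a year-first date prefix."""
    return f"{day:%y-%m-%d}_{thing}_v{version:02d}.{ext}"


def next_version(filename):
    """Return the same name with the version number incremented."""
    m = NAME.match(filename)
    if m is None:
        raise ValueError(f"does not follow the naming convention: {filename}")
    prefix, thing, version, ext = m.groups()
    return f"{prefix}_{thing}_v{int(version) + 1:02d}.{ext}"


if __name__ == "__main__":
    first = make_name("widget", "sldprt", date(2009, 6, 10))
    print(first)
    print(next_version(first))
```

As a reply further down points out, tools that reference other files by name (SolidWorks assemblies, for instance) won't follow a renamed file, so this suits stand-alone documents better than linked CAD data.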

Now, as you probably know, the difficulty is enforcing a uniform standard when people are busy doing actual work. Things get sloppy, things get messy. You have to keep after people, and policing stuff like this is not fun. At Digital Domain it was an urgent necessity for everyone to use the standard, and there was automated software that relied upon it. At the aerospace company, I gave up years ago trying to enforce a perfect policy. Now, people generally follow the example I set, to the point where you can easily find things. When I first got to this company, when it was really small, nearly all files were (seriously) piled into a single folder. Even at that size it was already a disaster, and it was impossible to find anything. People were used to working on their own computers and did not have a concept of a shared file server, at least not in a modern sense.

Now you can just scan down the left pane in Windows Explorer and get what you want very quickly.

This system is designed to use the left pane (lots of folders for organization) and people who were used to the Windows 3.1 way of double clicking through folders without the left pane had to change their (awful) habits. That was the biggest concession among the old school users.

The trick is also not to over-do the nested folders. Just enough to keep it nice and tidy.

Every once in a long while you run into a file that really wants to belong to several folders, and that's what shortcuts are for. Even if the shortcut gets broken you can look at the shortcut file to see what it originally pointed to, and you can probably find it that way.

At home I use the same methodology to archive 30,000 photographs. I can find anything in an instant by expanding folder icons. When that fails, plain old windows search is able to turn up what I am looking for, in those rare instances.

I have always been against anything that 'collects' your files into meta data, such as iTunes, or various photo editing programs. It's a big mess because one day that software won't be around and your files will be a mess.

Even my MP3s are organized as genre/album/1.song.MP3. I just drag album folders or songs into Winamp and I am off and running as my own DJ. I don't use a media organizer.

Re:file naming conventions and folders by PhantomHarlock · 2009-06-10 11:27 · Score: 1

I should also add that this works for small to medium-sized companies. Large corporations in an enterprise environment must take on more intricate data management policies.
Digital Domain had / has around 300 employees, and this company has less. DD at that size already had internal tools to manage and archive the files, and check for compliance with the structure.
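The thread doesn't describe DD's internal tools, but a basic compliance check can be as simple as walking the share and listing every file that breaks the naming rule. A minimal Python sketch, assuming the YY-MM-DD_thing_vNN.ext pattern from the post above; the function name is hypothetical:

```python
import os
import re
import tempfile

# Hypothetical rule: every file must look like YY-MM-DD_thing_vNN.ext
PATTERN = re.compile(r"^\d{2}-\d{2}-\d{2}_.+_v\d+\.\w+$")


def noncompliant(root):
    """Return paths (relative to root) of files whose names break the rule."""
    bad = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not PATTERN.match(name):
                bad.append(os.path.relpath(os.path.join(dirpath, name), root))
    return sorted(bad)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        job = os.path.join(tmp, "job", "pumps", "P-100", "solidworks")
        os.makedirs(job)
        for name in ("09-06-10_widget_v01.sldprt", "widget_FINAL.sldprt"):
            open(os.path.join(job, name), "w").close()
        print(noncompliant(tmp))
```

Run nightly, a report like this does some of the policing that otherwise falls on whoever cares most.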
I also did not address the issue of compartmentalization and security for classified vs. non classified material. The government has its own IT security standards that you must adhere to when dealing with classified information.
Re:file naming conventions and folders by smitty97 · 2009-06-10 15:35 · Score: 1

The 'job' folder contains several categories of jobs or projects, such as vehicles, engines, pumps, etc.
Inside those are folders with the project name. Inside each project folder is a series of folders for different data types, such as solidworks, reports, proposals, documentation images, etc.
File naming:
File naming should be consistent, and I always start my own files with the date with year first, because I do not trust meta-data one single iota. I have had dates wiped out when a backup system kept a backup, but did not preserve the file creation / modify date on copy.
After that it is the thing, then the version.
So 09-06-10_widget_v01.sldprt
version two should be exactly the same, with the number iterated up. There should never be a document named something_FINAL because you always end up with FINAL_FINAL_FINAL etc. :)
Because of the way SolidWorks looks for and uses referenced files, having the version number as part of the filename is BAD. How on earth do you update your assemblies when you've got to replace all the files you modify all the time? Either they don't get updated or you spend a lot of time with SW Explorer's Replace command.
If you ever move to a PDM system (and you should, even the free Workgroup one that's part of SW Office Pro), it will treat foo_v01.sldprt and foo_v02.sldprt as different files altogether, not versions. In Workgroup PDM the version is a custom property; in Enterprise PDM it's in SQL. We have "job folders" similar to yours, with Docs (MS Office, etc.), Drawings (all CAD), Photos, Correspondence, etc. We moved the SolidWorks stuff out to Workgroup PDM. It also takes care of write-access issues when a few people are working on the same project: people can take "ownership" of a file they would like to change and check in a new version.
Give it a shot: you can set up a Workgroup vault on your local machine and play with the settings. After working with it for a while, you'll wonder how you did without it.

--
mod me funny

All CMS is crap, total crap by wsanders · 2009-06-10 11:22 · Score: 1

Oh no, not another CMS.

I've never seen a CMS that was anywhere near up to date.

The only way to index more than a few dozen documents is to use Enterprise search.

For the really cheap, you can install Google Desktop on the PC that holds the Enormous Shared Drive, and then let people log in via Remote Desktop or VNC and look stuff up. (Is there a Google Desktop API?)

You eventually could have a lot of people making personal indexes of the Enormous Shared Drive with Google Desktop, which is going to cause problems that will motivate you to obtain a real enterprise search package.

--
Give a man a fish and you have fed him for today. Teach a man to fish, and he'll say "WHERE'S MY FISH, YOU IDIOT?"

Obviously by kitsunewarlock · 2009-06-10 11:28 · Score: 2, Funny

Obviously throw them on the desktop. Once it fills, throw them into a New Folder. Once your desktop fills with Folders, throw those in My Documents. Repeat until your computer crashes.

--
Ginga no Rekshiya Mata Each page.

Find and Egrep by antirelic · 2009-06-10 11:31 · Score: 1

Basic Unix tools can do the trick: find (-atime, -ctime, etc.) mixed with egrep, or just egrep with -R... all sorts of solutions, right at your command line.
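On a Unix box that is roughly "find ROOT -type f -mtime -N -exec grep -El PATTERN {} +". Where those tools aren't available (say, on the Windows file server itself), a rough Python equivalent, with hypothetical names, is:

```python
import os
import re
import tempfile
import time


def recent_matches(root, pattern, max_age_days):
    """List files under root modified within max_age_days that contain pattern.

    Rough stand-in for: find ROOT -type f -mtime -N -exec grep -El PATTERN {} +
    (find counts whole 24-hour periods; this compares exact seconds).
    """
    regex = re.compile(pattern)
    cutoff = time.time() - max_age_days * 86400
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.path.getmtime(path) < cutoff:
                    continue  # too old, like failing find's -mtime test
                with open(path, errors="ignore") as fh:
                    if any(regex.search(line) for line in fh):
                        hits.append(path)
            except OSError:
                continue  # unreadable or vanished file
    return sorted(hits)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, "memo.txt"), "w") as fh:
            fh.write("TSO for engine 2\n")
        print(recent_matches(tmp, r"TSO|TSI", 7))
```

Like grep -l, it stops reading a file at the first matching line.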

--
20th century Marxism is not progress...

Why can't you use a database? by damn_registrars · 2009-06-10 11:32 · Score: 1

I'm pretty sure there are databases that can store and serve up documents based on criteria. Couldn't you set up a centralized web server with an SQL backend that hosts those files for you? You would be able to then keep track of who is using which document and when, and regulate who can do what with different documents as well. As a bonus you should be able to ditch SMB while you're at it and move to a more robust OS for your critical files. Centralizing those documents would also make it dramatically easier to back them up at regular intervals.
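A minimal sketch of the metadata half of that idea, using Python's built-in sqlite3 module as a stand-in for whatever SQL backend you'd actually run. It stores paths and titles rather than the files themselves, plus a log of who did what and when. The schema and function names are hypothetical.

```python
import sqlite3

# Hypothetical schema: one row per document (path and metadata, not the file
# contents) plus a log of who read, checked out or checked in what, and when.
SCHEMA = """
CREATE TABLE IF NOT EXISTS documents (
    id      INTEGER PRIMARY KEY,
    path    TEXT NOT NULL UNIQUE,
    project TEXT NOT NULL,
    title   TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS access_log (
    doc_id      INTEGER NOT NULL REFERENCES documents(id),
    username    TEXT NOT NULL,
    activity    TEXT NOT NULL CHECK (activity IN ('read', 'checkout', 'checkin')),
    accessed_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
"""


def open_db(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_document(conn, path, project, title):
    with conn:  # commits on success
        cur = conn.execute(
            "INSERT INTO documents (path, project, title) VALUES (?, ?, ?)",
            (path, project, title),
        )
    return cur.lastrowid


def log_access(conn, doc_id, username, activity):
    with conn:
        conn.execute(
            "INSERT INTO access_log (doc_id, username, activity) VALUES (?, ?, ?)",
            (doc_id, username, activity),
        )


def find_documents(conn, project, title_word):
    """Paths in a project whose title contains title_word (case-insensitive for ASCII)."""
    rows = conn.execute(
        "SELECT path FROM documents WHERE project = ? AND title LIKE ? ORDER BY path",
        (project, f"%{title_word}%"),
    )
    return [row[0] for row in rows]


def history(conn, doc_id):
    """(username, activity) pairs for a document, oldest first."""
    return conn.execute(
        "SELECT username, activity FROM access_log WHERE doc_id = ? ORDER BY rowid",
        (doc_id,),
    ).fetchall()
```

In a real deployment the web front end would call functions like these and enforce permissions; the access log is what answers "who is using which document and when".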

--
Damn_registrars has no butt-hole. Damn_registrars has no use for a butt-hole.

WAN Optimisation - Riverbed & Cisco WAAS by kava_kicks · 2009-06-10 11:35 · Score: 1

This is not going to help you with your 'finding the right document' problem, but it is essential for your remote offices to be able to open (and save) those documents in a reasonable time. It will also have the added benefit of dramatically reducing your WAN traffic (think 50% reduction). When I initially trialled these, Riverbed was miles ahead of Cisco. That was 2 years ago, but they are still the only one with a remote client and a few other tricks. Well worth the investigation & money.

SharePoint by rennerik · 2009-06-10 11:39 · Score: 1

Yes, I know it's been mentioned before. Yes, I know it's Microsoft. But SharePoint is an excellent document management system. It supports clustering natively, load balancing, search, information rights management, web editing for most Office formats, and InfoPath web integration. Users can also save natively to SP via WebDAV, either directly from Office apps or through Explorer. There's a whole crapload more that you may want to check out at the SP site.

To get yourself organized and imported, there are .NET libraries available that let you natively access SP and manipulate the whole system via scripts. Importing and exporting files is a cinch using these APIs. There are also web services exposed via SOAP that let you do the same thing. And, in the end, there's the actual SQL backend, which is very straightforward, so if you don't want to use SOAP or the SP .NET libraries, you can manipulate the database directly.

So no, you are not locked in. And, the licensing cost is the most reasonable out of all the document management software out there.

Real Men Use by maz2331 · 2009-06-10 11:41 · Score: 1

Real men use an old TI-99/4A machine with a cassette recorder, and files sent via RS-232 connections.

Talk to a Large Lawfirm IT department by thinktech · 2009-06-10 11:46 · Score: 1

Law firms are experts at managing millions of documents using document management software. If you want state-of-the-art document management, then the software that law firms use is what you're looking for.

--
What's up with this box everyone has to think inside of or outside of? Why does there have to be a box?

Documentum bad by KhaymanUCSD · 2009-06-10 11:51 · Score: 1

I'm on the IT Applications side of things, not operations so my experience with this has been more as a user than as an admin (though I've helped that group on a few things)...

...but we implemented Documentum and have found it to be slow, difficult to deal with and I've heard no end of horror stories about how hard it was to implement.

In all honesty, we had a properly set up SharePoint (tsk!) solution at another company, and it pretty much ran itself and did the job we needed it to do. YMMV.

--
Kneel before Sig!

Very simple setup by massons · 2009-06-10 11:57 · Score: 1

Simple: use CVS. Documents are both centralized and decentralized. You get versioning, logs, comments, and on top of all this... it's free.

Re:Filenet vs OnBase by kaatochacha · 2009-06-10 12:00 · Score: 1

we use onbase, I like it. AND, one day when a tech was onsite for training, the entire home office was having a day off at Cedar Point. mmmm, rollercoasters.

I've done this before. by Anonymous Coward · 2009-06-10 12:09 · Score: 1, Interesting

I personally dealt with an issue like this at the Australian arm of a large international mining equipment manufacturer. I wrote the software solutions mentioned below and went on to do my engineering honors project in the area. My first recommendation is to stay away from document management systems: they are bulky and inefficient, and they tend to lock you into "their way" of doing things. As soon as you want something different, you will find yourself stuck. This is a simple problem; don't make it too hard for yourself.

My solution was multi-layered:
1) Place exactly 1 person in charge.
2) Enforce a naming convention. - Our CAD Drafters and Engineers (of which I did both) were notoriously bad at naming their documents correctly. Most of this was ignorance. Document your naming convention and make it well known.
3) Write or come up with a standardized way of generating document numbers. In my current job as a software engineer, I would recommend a simple, incremental numbering approach: every document, every revision, simply gets a new number. Our engineers did not like this, so we went for a middle ground, something like XXX-YYY-ZZ.eee, where XXX is the equipment type, YYY is the sub type, ZZ is the revision number, and eee is the extension/file type.
4) Standardize the way you store your documents. For instance, use a folder structure like C:\xxx\yyy\XXX-YYY-ZZ.eee
5) Register ALL documents in a database with location, comments, purpose, revision, author name etc etc.
6) Take the draftsperson or the engineer out of the archiving process. I wrote a utility that checks a single "to be archived" folder, fixes obvious mistakes such as using "_" or "." instead of "-", checks the database to make sure that the document has been registered, and then drops the file into the file system. Make the archive read-only for everyone except the person in charge (and any utilities, of course).
7) Clean up your existing archive. This can be a semi-automated process. I wrote a utility to do this partially, but it just takes a lot of painstaking effort. With 70,000 documents this was a slow and painful process but it can be done.
8) STICK TO IT. Any exception will erode the system over time making it useless.
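The naming and archiving steps (2 to 6) can be sketched roughly as below. This is not the poster's utility. It assumes three-character alphanumeric XXX and YYY fields, a two-digit revision, and a plain Python set standing in for the register database. `normalize` and `archive_path` are hypothetical names.

```python
import re
from pathlib import PureWindowsPath

# XXX = equipment type, YYY = sub type, ZZ = revision, eee = file type.
# Assumed field widths/alphabets; adjust to your own convention.
DOC_NAME = re.compile(r"^([A-Z0-9]{3})-([A-Z0-9]{3})-([0-9]{2})\.([a-z0-9]+)$")


def normalize(filename):
    """Fix obvious slips: '_', '.' or spaces used as separators, mixed case."""
    stem, dot, ext = filename.strip().rpartition(".")
    if not dot:
        return filename  # no extension: leave it for the person in charge
    stem = re.sub(r"[_.\s]+", "-", stem).upper()
    return f"{stem}.{ext.lower()}"


def archive_path(filename, registered, root="C:\\"):
    """Return where FILENAME belongs in the archive, or raise ValueError.

    REGISTERED stands in for the document register database: the set of
    normalized names that have already been registered.
    """
    name = normalize(filename)
    match = DOC_NAME.match(name)
    if match is None:
        raise ValueError(f"{filename!r} does not follow XXX-YYY-ZZ.eee")
    if name not in registered:
        raise ValueError(f"{name} has not been registered yet")
    equipment, subtype = match.group(1), match.group(2)
    # Folder layout: <root>\XXX\YYY\XXX-YYY-ZZ.eee
    return PureWindowsPath(root, equipment, subtype, name)
```

Anything that raises `ValueError` stays in the "to be archived" folder for the person in charge to sort out.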

Document Management Systems by anexkahn · 2009-06-10 12:09 · Score: 1

There are a ton of document management systems out there; our company uses DM from Open Text (http://www.opentext.com/, look for DM). You can use Microsoft SharePoint as a document management system, but it is not really what it was designed for. DM integrates with all the Microsoft applications. It will give you document numbers, version numbers, etc., and you can profile your emails as well if you want. We have had some performance problems at the remote locations, but it is still usable. I did a search for open source document management systems on Google, and there are a ton out there if you don't feel like paying for something.

--
Curious about Storage and Virtualization? Check out

Re:Document Management Systems by Shados · 2009-06-10 14:52 · Score: 1

You can use Microsoft Share point as a document management system, but it is not really what it was designed for.
Then please tell me what it was designed for, since a large portion of the default features involve pure document management.

Re:SharePoint by Shados · 2009-06-10 12:11 · Score: 1

Manipulating the SQL backend is a pretty bad idea. It's not quite THAT straightforward, since a lot of the elements end up crunched into one table as XML, so you have to be careful with that. Data is heavily duplicated, direct access isn't supported, and the schema changes drastically between versions, making migrations difficult.

WebDAV, however, is indeed the way to go (for documents), especially since Vista lets you map a WebDAV folder as a drive letter, and Linux has tools to mount them like any other volume, too. Good stuff.

Use Confluence by Dani+Filth · 2009-06-10 12:24 · Score: 1

or Confluence Hosted: http://www.atlassian.com/software/confluence/hosted/

Oracle UCM by Everything+Else+Was · 2009-06-10 12:28 · Score: 1

I've worked with Oracle UCM (formerly Stellent) for a few years now and would thoroughly recommend it. It's scalable into (at least) the 10s of billions of documents. A single repository for Doc Management, Records, Web Content Management, workflow, imaging. It comes with security, library services, metadata, and search OOTB. Using the WCM, you can make your documents available on an intranet, extranet or internet site, according to specified security policies.

BTW... offices on satellites... that's so cool! ;-)

--
My other account has mod points!

Re:Oracle UCM by profaneone · 2009-06-11 01:07 · Score: 1

I totally agree (on the UCM and the satellites). My wife is the sole admin for her company's (a power utility) UCM; it is only a very small part of her responsibilities. My wife is a civil engineer, not an EE or computer engineer, and her department needed a document management solution years ago. Prior employees had evaluated and installed the system. The IS department is only brought in when an upgrade is installed; the hardware is managed by IS, after all. The system is so easy to use that additional departments keep putting in requests to have their documents added to the system, thanks to word of mouth around the company. In addition to increased productivity, the company has saved hundreds of thousands of dollars in paper/printing.

An Inhouse System by sasha328 · 2009-06-10 12:31 · Score: 1

We're an old engineering company, and our products last decades, so we need to keep lots of records.
Recently, we started scanning old documents (a warehouse full of them) to make room for expansion.
It is a very tedious process, because we can't risk shredding the old files unless we know for sure that the scans are correct. Anyway, for storage we decided to go for an in-house web-based system (someone developed it for us) that is quite basic and does two important things for us:
1- it references the file in its location, rather than storing the file in a database and copying it to the web server
2- gives us the ability to change meta data (the document indexes) as we find errors in them

Referencing a file in its "physical" location gives us two layers of access control: one through the database permissions, and the other through file system permissions. This is important for restricted files...

Obviously, searching is the important part, and indexing is absolutely critical and the most time-consuming process.

Someone suggested a Google appliance to us, but none of the scanned documents can be searched: they are all images.

The actual application is pretty basic concept (nice interface features, but the concept is simple)
1- A database to hold the info
2- a table per document type containing the metadata and the filename and filepath
3- a web interface to search and re-search to narrow down the list.
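Here is a rough sketch of that design, using SQLite from Python's standard library. The `drawings` table and its columns are hypothetical examples of one document type. The table stores only the filename and path, so opening a hit still goes through the file server's own permissions.

```python
import sqlite3
from pathlib import PurePosixPath

# One table per document type; "drawings" and its columns are hypothetical.
SCHEMA = """
CREATE TABLE drawings (
    id          INTEGER PRIMARY KEY,
    drawing_no  TEXT NOT NULL,
    project     TEXT,
    description TEXT,
    filename    TEXT NOT NULL,
    filepath    TEXT NOT NULL  -- directory on the file server; the file is not copied
);
"""
SEARCHABLE = {"drawing_no", "project", "description"}


def search(conn, **criteria):
    """AND together substring filters; each extra criterion narrows the list."""
    clauses, params = [], []
    for column, text in criteria.items():
        if column not in SEARCHABLE:  # column names can't be bound as parameters
            raise ValueError(f"cannot search on {column!r}")
        clauses.append(f"{column} LIKE ?")
        params.append(f"%{text}%")
    where = " AND ".join(clauses) or "1"
    rows = conn.execute(
        f"SELECT filepath, filename FROM drawings WHERE {where} ORDER BY drawing_no",
        params,
    ).fetchall()
    # Hand back the file's own location, so file-system permissions still apply.
    return [str(PurePosixPath(path, name)) for path, name in rows]


def correct_metadata(conn, doc_id, column, value):
    """Fix an index field in place when an error is found (point 2 above)."""
    if column not in SEARCHABLE:
        raise ValueError(f"cannot edit {column!r}")
    conn.execute(f"UPDATE drawings SET {column} = ? WHERE id = ?", (value, doc_id))
    conn.commit()
```

Calling `search` again with one more keyword is the "re-search to narrow down" step.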

Content Management Systems by BentonMiller · 2009-06-10 12:35 · Score: 1

I'm sure it's been said by now, but you really should be looking at a content management system. There are several vendors out there that sell various types of document control systems: Pilgrim, Master Control, and I'm sure Oracle has something that does that. There are also open source frameworks like Drupal that you can develop on in-house. All of those are online document management systems. Users upload documents to them. File naming conventions can be enforced, as well as directory structure etc. Many of them allow for document collaboration and approval. It's a complex problem, and a valuable solution will take some serious thought and time. I've heard of some people using Google Docs, but for a company of your size I wouldn't recommend it. In any case, folders on network drives are NOT the answer.

Two words... by roc97007 · 2009-06-10 12:39 · Score: 1

Google appliance.

--
Oliver's law of assumed responsibility: If you're seen fixing it, you will be blamed for breaking it.

We use ImageNow 6 by Nimey · 2009-06-10 12:51 · Score: 1

I work at a midwestern public university in the USA, and we've been using this program for several years and a few versions. Backend can work on AIX, Linux, or Windows, and the frontend at least Windows (don't know if Macs or *nix are supported, we don't have many of those on users' desks). We probably have several gigs of imaged documents in this system, and it seems to work pretty well.

You'll have to import all the documents into the system, of course. The company recommends certain tractor-feed scanners for this; lighter-duty ones are USB, heavier are SCSI. I think it also has a software printer emulator to let you dump e.g. Word documents into the system; how you organize things is up to you.

--
Hail Eris, full of mischief...

E pluribus sanguinem

It's all about Taxonomy and Metadata by TrekBody · 2009-06-10 12:57 · Score: 1

Whatever the solution, you have to get staff to declare what it is on the front end. It's not all about the technology. I see some of the benefits of SharePoint, but depending on your audience (tech-savvy or not) it may become a training issue. Prepare for change management.

What I like about SharePoint is the Office integration, the improvements over the last few years, document history (versions), and mostly, the ability to require metadata. If you have a taxonomy of topics, it will make it much easier to create a search appliance that can find what people are looking for. You may be forced to look at auto-classification if you can't get staff to do it, or hire knowledge managers (librarians) to properly catalogue. The trouble for us is getting to agreed-upon taxonomies and hierarchies across divisions (I'm in the knowledge management trenches here).

A good way to start might be SharePoint repositories: require a topic field, seed it with however many topics you can come up with, and leave an OTHER value so you can collect what you have not organized. If you analyze what comes into the OTHER topic, you can keep adding new topics.
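The "analyze what comes into OTHER" step could be as simple as counting the free-text suggestions. Here is a sketch. It assumes each document's metadata has been exported as a dict with a `topic` field and an `other_topic` free-text field (both hypothetical names). It also assumes a suggestion needs at least three occurrences to be worth promoting.

```python
from collections import Counter


def topics_to_promote(documents, taxonomy, min_count=3):
    """Suggest new topics from documents filed under OTHER.

    Each document is a dict with a "topic" field and, when the topic is
    OTHER, a free-text "other_topic" field typed by the author.
    """
    known = {topic.lower() for topic in taxonomy}
    counts = Counter(
        doc["other_topic"].strip().lower()
        for doc in documents
        if doc["topic"] == "OTHER" and doc.get("other_topic", "").strip()
    )
    # Only recurring suggestions that are not already in the taxonomy.
    return sorted(t for t, n in counts.items() if n >= min_count and t not in known)
```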

Find the logical buckets to scope searches before users even start searching. If your staff only care about one project at a time, break searches up by project. Basically, offer them one level of selection before they get to search; it may make things easier (if you are structured that way). They may look for something from a particular function: Marketing search vs. Operations search.

Also, SharePoint can leverage Active Directory info, so you may be able to get some metadata automation (docs from sales staff vs. R&D, etc.).

Hope these points help. Contact me if you need more.

--
Jim - your name is Jim...

swish-e by ggpauly · 2009-06-10 13:06 · Score: 1

I implemented swish-e, http://swish-e.org/ for a client with html and .pdf indexing (nightly) in 11 hours from a standing start (never used swish-e before).

--
Verbum caro factum est

A cool web application :D by zeekren · 2009-06-10 13:21 · Score: 1

Hi there, I am one of the developers of this nice web tool, which was in fact designed to meet the requirements you describe. We are calling it anydata, but dunno if we'll need to change its name as it's a registered trademark; at least you can see our goal ;)

http://devel.anydata.tv/

Try it out with Firefox if you don't want to see something ugly right now. It's a beta, but in less than a month you will see it complete. It looks like a file manager, a well-known user interface for browsing documents and information. The system lets you store files, bookmarks, text notes, contacts and, soon, PGP-encrypted passwords for secure sharing across system administrators.

In short, it keeps the typical tree-browsing schema of file systems and adds document previews, tagging, automatic keyword gathering from documents, and a search engine.

By the way, it's GPL :D

Anyone interested just send me an email to kenneth at gnun d-o-t net and I'll give you a testing user or whatever needed.

Cheers!

Kenneth

The ultimate document managment system by zerofoo · 2009-06-10 13:21 · Score: 1

OK, so it is a bit hard to get your documents out once you put them into this system, but man, does it tidy up a mess of documents.

-ted

Document Management System?? by bytethese · 2009-06-10 13:42 · Score: 1

How about Desksite (formerly iManage) or PC Docs?

Two paths by jocknerd · 2009-06-10 13:47 · Score: 1

You could set up a Document Management System like Alfresco or god-forbid, Sharepoint. Or you could run OS X Server and let Spotlight index everything.

Want it done right the first time? by Xadnem · 2009-06-10 13:48 · Score: 1

I've got this car, and it doesn't run and it's got all these strange bits inside under this hood thingie. . . . Hire a librarian or someone with a degree in knowledge management who has experience in the corp world.

First, you need a procedure, not a "Solution" by CAIMLAS · 2009-06-10 13:51 · Score: 1

First, you're potentially dealing with more than one problem here you're trying to solve: slowness, and naming convention. I'm guessing they're somewhat related (large directory listings due to lack of organization), but there might be a deeper infrastructure issue that needs to be dealt with, too.

As for organizing files, you need a naming convention for your project files, first and foremost. Throwing a bunch of disparate files at a CMS is going to do nothing but complicate things more (from a sane-management perspective).

Data categorization is key. You need to figure out a way to organize it in a fashion which is both contextual to how people use it as well as how it relates to the other data (in, say, a project).

For instance, you will want (at a minimum) the equivalent of user-level and group-level data shares. This would, in all likelihood, get kind of tricky with shifting working groups. For this there are multiple ways to use ACLs (as opposed to just user/group/all permissions) within Samba (with or without shackling the machine to a Windows domain/authentication server). ext3 and XFS both have the ability to use ACLs (XFS natively), last I checked. Ultimately, this would probably be better than just using user/group, as it would be more extensible.

As for a Solution...

Something to look into, specific to Samba, is the "veto files" directive in smb.conf. It is per-share. I am uncertain whether it supports regex (it didn't in early 2005, when I last used it); if it does, it could be very useful for enforcing a specific namespace (going forward).

I would recommend "enforcing" namespace. While this is likely a self-created problem (ie you or your predecessor did not set things up properly in the first place), you really need to push to your users the importance of this. You need to tell them "organize your files, it'll make things faster" if there's any bitching.

There was an article in LinuxMagazine a while ago about determining the age of data. Utilizing this in some sort of auto-sort script to move "old" data to a "pre$date" directory within the original messy directory might speed things up. Also, archiving (or at least moving it to an "old shit" directory) past, unused data is important. It eases the "human element" of data organization.
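Here is a minimal sketch of that auto-sort. It assumes modification time as the measure of age, because access times are often unreliable (e.g. on file systems mounted with `noatime`). The `pre<YYYYMMDD>` folder name is hypothetical. The function defaults to a dry run so you can review the plan before anything moves.

```python
import shutil
from datetime import datetime
from pathlib import Path


def archive_old_files(directory, cutoff, dry_run=True):
    """Plan (and optionally perform) moving stale files into a "pre<date>" folder.

    Top-level files in DIRECTORY whose modification time is before CUTOFF
    (seconds since the epoch) go to DIRECTORY/preYYYYMMDD. Returns the list
    of (source, destination) pairs; nothing is moved when DRY_RUN is True.
    """
    directory = Path(directory)
    target = directory / f"pre{datetime.fromtimestamp(cutoff):%Y%m%d}"
    moves = [
        (entry, target / entry.name)
        for entry in sorted(directory.iterdir())
        if entry.is_file() and entry.stat().st_mtime < cutoff
    ]
    if not dry_run:
        target.mkdir(exist_ok=True)
        for source, destination in moves:
            shutil.move(str(source), str(destination))
    return moves
```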

Projects should all have a reference number (because there is, in all certainty, hard paper associated with the projects, and sometimes you need to cross-reference). Keeping this consistent is important. Use what works, and keep it short and clearly delimited so users don't avoid using it. I like each project folder name to start with a project number that relates to the contract: the (short) start date (e.g. 080112 for Jan 12th, '08), followed by a 2-3 digit sequence number (depending on how many projects are started per day), followed by the major revision. End result: something like "080112.01.a Jennings Construction". Or organize by client ID. Or something.
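Here is a sketch of generating those folder names. It assumes exactly the format in the example: a six-digit date, a two- or three-digit sequence, a one-letter revision, then the client name. `next_project_folder` is a hypothetical helper, not part of any product.

```python
import re
from datetime import date

# "<YYMMDD>.<sequence>.<revision> <name>", e.g. "080112.01.a Jennings Construction"
FOLDER = re.compile(r"^(\d{6})\.(\d{2,3})\.([a-z]) (.+)$")


def next_project_folder(existing, start: date, client, revision="a"):
    """Name the next project folder for a project started on START.

    The per-day sequence number continues after the highest one already
    used among EXISTING folder names with the same date prefix.
    """
    prefix = start.strftime("%y%m%d")
    used = [
        int(m.group(2))
        for m in map(FOLDER.match, existing)
        if m is not None and m.group(1) == prefix
    ]
    sequence = max(used, default=0) + 1
    return f"{prefix}.{sequence:02d}.{revision} {client}"
```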

Requiring and/or encouraging project naming conventions through the managers (at the behest of your manager/CIO/whomever, or just pleading) might also be worth a try. One department out of 5 doing it would be better than none.

IMO, once you've reached this step, you can consider putting it in a CMS to help perpetuate/encourage the organization. But remember that a CMS is not a panacea, and might even complicate things further (ie, instead of navigating to a file, -everyone- just searches the whole index, slowing things down further).

--
~/ssh slashdot.org ssh: connect to host slashdot.org port 22: too many beers

Alfresco by brassmaster · 2009-06-10 14:00 · Score: 1

A document management system is a must for that many documents. Check out Alfresco. It's open source and as such isn't outrageously expensive like its competitors. If setup seems too daunting for you, check out tsgrp.com. Technology Services Group is a consulting firm in Chicago with experience working with Alfresco and may be able to make this transition easier for you.

You need a Document Management System... by Derwood5555 · 2009-06-10 14:10 · Score: 1

Or DMS. Commercial packages include Docs Open, and Soft Solutions.
Open Source DMS = http://mydms.sourceforge.net/

An Aerospace company without process == FAIL by Platinumrat · 2009-06-10 14:16 · Score: 1

It sounds like they're heading for an epic fail. Aerospace == Process + CMS. They will never survive the NTSB audits and safety Nazis without both. They will need to prove the change trail for every nut/bolt/software path/data item/paper clip, and who authorised/designed/checked/tested it, for the rest of their natural lives. So if they don't have Process + CMS, they are screwed beyond belief. To me it sounds like a medium-sized software house that's decided to switch to aerospace because it's cool or high tech, or because the marketing guys sold some product.

I wrote a few articles about that by nbauman · 2009-06-10 14:25 · Score: 1

I wrote a few articles about that for Law Office Computing magazine, so I'm very interested in these comments. It was a long time ago, and the software has changed, but the concepts are still the same.

http://www.nasw.org/users/nbauman/txtsrch.htm

http://www.nasw.org/users/nbauman/lawdb.htm

http://www.nasw.org/users/nbauman/discover.htm

They were imaging and indexing up to several million documents. During a civil suit, in discovery, companies on each side of the lawsuit have to disclose every relevant document to each other.

Lawyers probably use the most flexible and all-encompassing systems, since they have to deal with every industry, every profession, everything. They also spend more money on their systems than most people can afford. They told me it costs them about $1 a page to thoroughly index big databases.

Information scientists told me the best model of a document database was PubMed, which indexes virtually every significant published medical article. http://www.ncbi.nlm.nih.gov/sites/entrez?db=pubmed

The big limitation of Google is that you can't search too well by date. Another limitation of text searches is that you can't search for concepts -- just words. Sometimes words (particularly names) match concepts very well, but if they don't, you've got a problem.

Yeah, it would have been nice if you had set up coding and naming conventions at the beginning, so the original authors could have sorted them as they went along. It may be difficult or impossible to go back and re-code them after the fact. It could wind up costing $1 a document. OTOH, you could be lucky: some industries have been using standardized filing schemes and standardized jargon since the days of slide rules and T-squares.

There should be standard filing schemes and procedures throughout your industry, so your solutions may be industry-specific. There should be consultants that deal with your industry who would be happy to talk to you (for the prospect of maybe getting your business). There should be trade magazines in your industry that have covered the same issue for companies of your size. (Hell, if the price is right I'll write a roundup for them.) Or you might have a trade or professional association with some friendly people who have done it before. Trade and professional associations usually have a computer or information technology section, and if you're a member of the association, you can call up the members of the section.

librarian by confused+one · 2009-06-10 14:39 · Score: 1

everyone is talking about document management software and search appliances. You're going about it all wrong...

Hire a document management staff.

Librarians. Hot librarians.

DAM and Extensis Portfolio + Filemaker by digitalcurator · 2009-06-10 14:39 · Score: 1

Back in the '90s I helped create a media department for a large textbook publisher. One of the first projects was an asset library and tracking system. To keep this message brief: we first needed a naming convention. Look for a constant throughout your products; ours was the ISBN. That became the main identity of the product/project and of its main digital folder. Every item or product was dropped into a subfolder such as images, design, text, etc. From there, the main folders were always scanned by Portfolio, which was configured to take the main descriptions from the folder names. This allowed anyone who knew the product ISBN to find details on the project. It also greatly reduced needless keyboarding of metadata onto the files. Portfolio also allows check-in and check-out (versioning), so you can stay abreast of any edits or updates. The whole metadata catalog was also exported and brought into FileMaker as a secondary backup.

Look to a constant for your naming convention, keep it simple, look for ways to minimize keyboarding metadata, and go over the counter (off-the-shelf products are much easier to work with, you can experiment, and they are more than capable of handling 100K documents). Lastly, good luck, and if needed, look for help.
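The folder-driven metadata idea can be sketched in a few lines. This is not how Portfolio does it internally. The sketch assumes a `<root>/<ISBN>/<category>/<file>` layout. It uses the standard ISBN-10 and ISBN-13 check-digit rules to reject folders that are not really ISBNs.

```python
from pathlib import PurePosixPath


def is_valid_isbn(raw):
    """Check an ISBN-10 or ISBN-13 check digit (hyphens allowed)."""
    s = raw.replace("-", "").upper()
    if len(s) == 10 and s[:9].isdigit() and (s[9].isdigit() or s[9] == "X"):
        digits = [int(c) for c in s[:9]] + [10 if s[9] == "X" else int(s[9])]
        # Weights 10, 9, ..., 1; a valid ISBN-10 sums to a multiple of 11.
        return sum(d * w for d, w in zip(digits, range(10, 0, -1))) % 11 == 0
    if len(s) == 13 and s.isdigit():
        # Alternating weights 1, 3, 1, 3, ...; valid sums are multiples of 10.
        return sum(int(c) * (3 if i % 2 else 1) for i, c in enumerate(s)) % 10 == 0
    return False


def metadata_from_path(path, root):
    """Catalog fields for a file stored as <root>/<ISBN>/<category>/.../<file>."""
    parts = PurePosixPath(path).relative_to(root).parts
    if len(parts) < 3 or not is_valid_isbn(parts[0]):
        raise ValueError(f"{path} is not under <root>/<ISBN>/<category>/")
    return {"isbn": parts[0], "category": parts[1], "filename": parts[-1]}
```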

Subversion by lars_boegild_thomsen · 2009-06-10 14:52 · Score: 1

Why not use subversion? Files will be accessible using a subversion client (including log + history), as webdav (only current version) and through a standard browser (read-only).

Document Locator from ColumbiaSoft by ASBands · 2009-06-10 14:52 · Score: 1

The company I work for uses a system called Document Locator. It is a Windows-shell-integrated document management system; think of Subversion with extremely fine-grained control over repositories, folders and the like. It scales decently, too: we have millions of documents spread across 25 major repositories, many of which include AutoCAD, Bentley MicroStation, SmartPlant 3D and other sizable files. The system is also fairly extensible, as we've built quite a few internal applications on top of the DL system, and there are plenty of third-party plug-ins available (a notable one being Brava, an application that allows adding QC and other markup to repository files). And if you don't want to be constrained to Windows, there is a web client available, which works decently. While it is not without its problems, the overall experience has been pretty good.

Full disclosure: My company is ColumbiaSoft's largest customer and, as such, we know a good deal of the development team.

--
My UID is a prime number. Yeah, I planned that.

Query Based Document Management Software by indytx · 2009-06-10 14:57 · Score: 1

My last company relied on a program called ISYS to index and search documents and email. You don't have to worry about what a document is named, just the type of content you're looking for. This solution can save a lot of time, especially if your users are good at phrasing queries. On the other hand, I did not have to maintain it, so I have no idea how much administration time went into keeping it working.

--
Make love, not reality television.

Salesforce Content is another option by 0xbeefcake · 2009-06-10 15:18 · Score: 1

One of your options is to use Salesforce Content, which is a very usable content & collaboration piece from salesforce.com. It's fully wired in to the rest of the force.com platform and CRM apps suite too, so if you're looking to build out more of your company's apps in the cloud, it's worth taking a look at it. http://www.salesforce.com/crm/marketing-automation/document-content-management/

You're better off doing it yourself by stoicio · 2009-06-10 15:25 · Score: 1

After looking at backup systems and maintaining libraries of data, our company found that we needed something that fit our needs. We designed a system that worked and knuckled down to programming it. We now have a searchable database of documents and files, with attributes as well as context from content, covering over 20 years of data and documents. We can pretty much find any file in less than 5 minutes. We could still make it better, but we couldn't have done anything like it with C.O.T.S. software, Google included.

If Google failed tomorrow, where would your documents be then?

naming convention, DAM, archive by capsteve · 2009-06-10 15:31 · Score: 1

Establish a naming convention. Come up with a few simple rules regarding:
file names
directory names
customer names
job/project names
department names
Limit the total number of allowable characters in a file name, and publish and distribute your rules in an easy-to-follow cheat sheet. For example:
all files for client "Smith Inc." reside in a directory named "SI"
all files for Smith Inc. for project "Widget X" reside in a subdirectory named "WX"
all files for Smith Inc. for project Widget X have a unique number generated by your accounting system
all files generated by the sales department need to have "S" after the project number
enforce using file name extensions for all file types
So a PowerPoint deck created by the sales department for a sales pitch to Smith Inc. for project Widget X, with an internal job number of 1234, would be named "SIWX1234_S.ppt".
A well-structured naming convention with simple but rigid rules will allow users to navigate a file system to find files and identify wrongly filed assets.
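The cheat-sheet rules lend themselves to a small helper that builds names instead of trusting people to type them. Here is a sketch with hypothetical lookup tables and an assumed 31-character limit:

```python
from pathlib import PurePosixPath

# Hypothetical cheat-sheet tables; fill in your own abbreviations.
CLIENTS = {"Smith Inc.": "SI"}
PROJECTS = {("SI", "Widget X"): "WX"}
DEPARTMENT_SUFFIX = {"sales": "S"}  # departments not listed get no suffix (assumed)
MAX_NAME_LENGTH = 31  # assumed limit on total characters


def build_path(client, project, job_number, department, extension):
    """Return <client>/<project>/<client><project><job>[_<dept>].<ext>."""
    client_code = CLIENTS[client]
    project_code = PROJECTS[(client_code, project)]
    extension = extension.lower().lstrip(".")
    if not extension:
        raise ValueError("every file needs an extension")
    stem = f"{client_code}{project_code}{job_number}"
    suffix = DEPARTMENT_SUFFIX.get(department)
    if suffix:
        stem += f"_{suffix}"
    name = f"{stem}.{extension}"
    if len(name) > MAX_NAME_LENGTH:
        raise ValueError(f"{name} is longer than {MAX_NAME_LENGTH} characters")
    return PurePosixPath(client_code, project_code, name)
```

Unknown clients or projects raise `KeyError`, which is a reasonable prompt to add them to the cheat sheet first.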

Invest in a digital asset management (DAM) system with a database backend.
There are many DAM systems available, both commercial and open source.
Use one that has a web front end, so you can enforce a consistent end-user experience (as opposed to a fat client). Embed metadata into the files themselves in XML format through the DAM if possible.

Based on the naming convention you've established and the DAM system you've deployed, you should be able to track when a file was created, modified, and last accessed. Establish rules regarding when a file moves from disk to tape, from online tape (in the jukebox) to offline tape (out of the jukebox), and then to cold storage (offsite).
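The tiering rules could be expressed as a simple age lookup. The thresholds below are hypothetical placeholders, and `storage_tier` assumes your DAM records a reliable last-access time.

```python
from datetime import datetime, timedelta

# Hypothetical thresholds: pick values that match your retention policy.
TIERS = [
    (timedelta(days=180), "disk"),
    (timedelta(days=2 * 365), "online tape (jukebox)"),
    (timedelta(days=7 * 365), "offline tape"),
]
COLD_STORAGE = "cold storage (offsite)"


def storage_tier(last_accessed: datetime, now: datetime) -> str:
    """Pick the tier for a file based on how long ago it was last accessed."""
    age = now - last_accessed
    for limit, tier in TIERS:
        if age < limit:
            return tier
    return COLD_STORAGE
```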

--
three can keep a secret, if two are dead - benjamin franklin

Bring out the Pitchforks and Rope by moxitek · 2009-06-10 15:46 · Score: 1

I know that I'll probably get verbally lynched for saying this here, but MOSS 2007 enterprise search is a REALLY nice way of dealing with this. Since MOSS can index your file shares, all of your users can search for documents contextually using a simple web portal across multiple sites... I'd better leave before I'm hanging from the Slashdot tree.

Simple is Best by Diagoras+of+Melos · 2009-06-10 15:54 · Score: 1

I decommissioned a document management system at my client, a smallish law firm, because the system was too complicated, insecure, and expensive. Updating it to run w/ the latest version of MS-Office would have cost thousand$ just for the s/w. We replaced it with Google Search, and we defined a file hierarchy and naming convention for all documents created after the switchover. Client is very happy, their file access is more efficient, and they saved a bundle of money on administration, not to mention all the h/w and s/w they never bought.

Obviously, documents are the lifeblood of any law firm. These guys only have about 100,000 or so, fewer than the aerospace company in question, but the lesson applies. It's extremely unlikely the IT admin of the aerospace company has the resources to manage, much less install, a proprietary document management system.

The ONLY reason to have a formal document management system with a database (like Microsoft SQL *ugh*) is to control access. But access control is something that really, really should be done through the directory. So unless you're NASA or another organization with many, many millions of documents and a legally mandated auditing requirement, there's no reason to make this more complicated than necessary. And even then....

Of course, if we're talking about images with no searchable text, that's another story.

--
-- "The only thing that is ever new in the world is the history you do not know." -- Harry Truman

Contract a librarian by Anonymous Coward · 2009-06-10 15:58 · Score: 1, Interesting

I don't at all mean to be pat or facetious with such a short answer. But, seriously, you're asking the wrong crowd. Librarians have master's degrees in answering just the question you're asking, and it goes far beyond just books. A couple dozen hours of contract consulting with a good librarian can set you straight, whether you keep the Samba store or pony up for document management software. If you have a strategy for organizing your information and execute on it, you will reap benefits that don't show up on any productivity spreadsheet. And a good librarian will tailor the system to how the people in your organization actually use the information. Get an internship program going with a library school to have someone remotely do the cleaning and maintenance every once in a while. The whole thing should be doable for a few grand.

multiple points by mr100percent · 2009-06-10 15:59 · Score: 1

You need to deal with this issue on multiple points

1. Consider PDF with OCR. That way you can search within files for specific words
2. File naming. Use a standard like date_headline.pdf
3. Hire a library sciences major, as an earlier poster suggested. They spend years studying how to organize and retrieve.
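A minimal Python sketch of point 2, assuming ISO 8601 dates (so names sort chronologically) and a lower-case, hyphenated headline. The exact format and the `make_filename` helper are illustrative choices, not part of the original suggestion:

```python
import re
from datetime import date


def make_filename(doc_date: date, headline: str, ext: str = "pdf") -> str:
    """Build a 'date_headline.ext' name such as 2009-06-10_wing-spar-test-report.pdf."""
    # Lower-case the headline and collapse each run of other characters into one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", headline.lower()).strip("-")
    return f"{doc_date.isoformat()}_{slug}.{ext}"


print(make_filename(date(2009, 6, 10), "Wing Spar Test Report"))
# 2009-06-10_wing-spar-test-report.pdf
```

Because the headline only ever contains hyphens, the single underscore unambiguously separates the date from the headline.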

Use Permissions: User & Group: Company Structu by blavallee · 2009-06-10 16:08 · Score: 1

Outside of shoring up your connectivity to the remote site, you should use the structure of your company to your advantage.

It sounds like the wild west. You gave everyone full RW access to the fileserver.

Build a file structure that mirrors the organization of the company and apply permissions appropriately.
Map drives in the same fashion. An added advantage: you can later split the files across separate Samba servers with only a minor change to the drive map.
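To illustrate why the drive map makes a later split cheap, here is a hypothetical Python sketch that generates Windows `net use` login-script lines from a department-to-share table. The server names, share names and drive letters are made up, and access control itself still lives in the share permissions:

```python
# Hypothetical department-to-share table: group name -> (drive letter, UNC path).
DRIVE_MAP = {
    "engineering": ("E:", r"\\files1\engineering"),
    "finance": ("F:", r"\\files1\finance"),
    "sales": ("S:", r"\\files1\sales"),
}


def login_script(groups: list[str], drive_map: dict[str, tuple[str, str]] = DRIVE_MAP) -> list[str]:
    """Emit one 'net use' line for each department share the user's groups map to."""
    lines = []
    for group in groups:
        if group in drive_map:
            letter, unc = drive_map[group]
            lines.append(f"net use {letter} {unc}")
    return lines


# Moving engineering to a second server changes one table entry, not every user's setup.
moved = {**DRIVE_MAP, "engineering": ("E:", r"\\files2\engineering")}
print(login_script(["engineering", "staff"], moved))
```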

The finance department has no reason to dig around in your design documents.
The engineers don't have any reason to poke around in your sales collateral.
Does everyone in the company need to be tempted to open "DOD_GPS_NOYB_47-090611.xls"?

Getting every employee to adhere to a single naming convention is like herding cats. Delegate responsibility to the directors and managers to keep their areas on the server organized to their own needs. Then you just need to deal with the occasional outlaw.

You may also want to deploy Samba servers to the local offices and back them up to a central server regularly. Use this for personal shares and anything that is primarily used ONLY in the local office.

In most cases, I doubt that "the single person" working on Project X at Remote Site A needs to work from a centralized copy of their document. Do you really need to share this document across your entire organization? Let the employee keep the file on the local office's share. Let the employee or a manager share it with the entire department. Let the director share it with sales.

In the end, you may find one small part of the organization that REALLY needs a naming or numbering convention. You can address that when they approach you. For now, you need to stop everyone from treating the company share like their own desktop.

There's technology for that by sribe · 2009-06-10 16:14 · Score: 1

It's called a "database". You might want to look into it.

Checkout Isys by Odyssey by LBook3 · 2009-06-10 16:17 · Score: 1

As a PC user, I have found that one of the best products for managing hundreds of thousands of documents (*.doc, *.txt, *.wpd, *.xls, *.ppt, plus email, images, etc.) is Isys by Odyssey. It requires very little work on the part of the end users: they just search. For the IT person, it takes very little to get up and running. You can schedule automatic indexing to run at any time without restricting usage and searching, and it can cover all hard drives. I found this little company (and their software) about 15 years ago, when I was still using DOS. They have, of course, kept their software current with every Windows version that has come out, and they have Web versions too.

I manage a huge library of both physical and digital documents, all of which must be located within seconds. Without this software, I would not be able to do this job at the level I currently do. Yes, Google is a great contender, but it has its limitations. Google Desktop, for example, does not index all the different file types that hundreds of users may have, use or need. I have found Isys by Odyssey to be extremely fast and high quality; they also have great customer service, and their prices are reasonable. You can always start slow, with a low number of licenses, and work your way up depending on the company's finances and needs. We have 2 licenses where I work. I am currently the main end user of the product, and people request documents or information from me, which I can find and email to them in an instant. It's worth the time to check them out. Their home page is: http://www.isys-search.com/

Use a content management system: e.g. IBM/FileNet by peterofoz · 2009-06-10 16:25 · Score: 1

The content engines like IBM/FileNet are set up to manage millions of documents. Many also have the ability to add remote cache servers to improve local performance for repeat document access in satellite offices. Contact Dave at Softech-assoc.com if you need help.

Two words by jevring · 2009-06-10 18:01 · Score: 1

Search engine

--
Move sig!

Re:Sharepoint by XDirtypunkX · 2009-06-10 18:02 · Score: 1

TRIM also has good Sharepoint integration if you're so inclined.

Suggestion - A proper Content Management System by NacMacFeegle · 2009-06-10 19:29 · Score: 2, Informative

Some of the suggestions above say that you should just chuck everything haphazardly into a big pile and then use search engines to trawl the whole mess. I don't buy that. Instead (like some others), I'd suggest a proper content management system such as the ones from http://www.alfresco.com/, http://www.interwoven.com/ or http://www.hummingbird.com/.

The reason for this suggestion is that I know these systems are used by organisations that handle, as the OP said, hundreds of thousands of documents and that have satellite offices (e.g. large multinational law firms). They provide several benefits: you can structure projects, save and index both project-related documents and e-mails in the project folders, search everything, and keep proper document version chains (meaning that you can revert to an older version of a document if some klutz breaks a newer one).
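A toy Python sketch of the version-chain idea. It is hypothetical and not how Alfresco, Interwoven or Hummingbird store versions internally. Reverting appends a copy of the old version rather than deleting history, so even the broken version stays recoverable:

```python
from dataclasses import dataclass, field


@dataclass
class VersionChain:
    """Hypothetical version chain for a single document, oldest version first."""
    versions: list[bytes] = field(default_factory=list)

    def check_in(self, content: bytes) -> int:
        """Store a new version and return its 1-based version number."""
        self.versions.append(content)
        return len(self.versions)

    def latest(self) -> bytes:
        return self.versions[-1]

    def revert_to(self, number: int) -> int:
        """Re-check-in an older version as the newest one; nothing is deleted."""
        if not 1 <= number <= len(self.versions):
            raise ValueError(f"no version {number}")
        return self.check_in(self.versions[number - 1])


chain = VersionChain()
chain.check_in(b"spec v1")
chain.check_in(b"spec v2, broken by a klutz")
chain.revert_to(1)
print(chain.latest(), len(chain.versions))  # b'spec v1' 3
```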

Of course, this means quite an investment, a learning curve for everyone at your company and, most likely, the hiring of an individual with experience of the chosen system.

Users and Spotlight Server by namgge · 2009-06-10 20:31 · Score: 1

Firstly, you can absolutely forget about any system that requires users to name documents in a way that is descriptive, consistent, unique or anything else that a sane person would do.

Secondly, Mac OS X Spotlight Server (as of version 10.5.7) doesn't work as one would expect/hope. Users' files stored on the server get indexed by the server, but this index can only be read by users logged in to the server console (or via ssh), not by clients that access the files by mounting them as shared volumes. If a client wants to search the files, it must build its own index over the network. The workload on the server/network can cause severe performance issues until the clients have built their indexes, a process that will take hours and may take days to complete if you have a lot of files.

Namgge

Gina2 - web service by steve.decaux · 2009-06-10 20:34 · Score: 1

Dark Green have just this week gone live with Gina2, a web solution for document archives.

Have a look at http://www.gina2.net/ - the text is currently in German, but the English translation will be up there in the next couple of weeks.

Dark Green are offering Gina2 as a hosted service for companies whose core business is not managing IT infrastructure.

Two products by brentc3114 · 2009-06-10 21:15 · Score: 1

I work for a company that stores terabytes of documents. There are two products that do this well: EMC's Documentum and Microsoft SharePoint. Pick your poison depending on whom you want to abuse you.

200,000 Resumes by Gob+Gob · 2009-06-10 21:49 · Score: 1

I've written a recruitment app that has 200k resumes and other document types indexed as text.

The files live on the disk in /TYPE/YEAR/MONTH and are converted to text and inserted into MySQL database.

They can be searched on name, date, record ID, free text, type, etc.; or just browsed to on disk.

The front end is PHP on MySQL.

These were imported from an earlier files-on-disk approach.

It can scale with master slave replication, etc. Just keeping it simple helps.
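A rough Python sketch of the layout described above. It swaps in the standard library's `sqlite3` for MySQL so it runs without a server, assumes the document text has already been extracted, and uses a plain `LIKE` scan where a production system would want a full-text index. The table and function names are hypothetical:

```python
import sqlite3
from datetime import date
from pathlib import PurePosixPath
from typing import Optional


def storage_path(root: str, doc_type: str, received: date, filename: str) -> PurePosixPath:
    """Place a file under ROOT/TYPE/YEAR/MONTH, e.g. /docs/resume/2009/06/smith.doc."""
    return PurePosixPath(root, doc_type, str(received.year), f"{received.month:02d}", filename)


def open_index() -> sqlite3.Connection:
    # In-memory SQLite stands in for the MySQL database described above.
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE docs (id INTEGER PRIMARY KEY, name TEXT, type TEXT,"
        " received TEXT, path TEXT, body TEXT)"
    )
    return db


def add_document(db: sqlite3.Connection, name: str, doc_type: str,
                 received: date, filename: str, text: str) -> int:
    """Record a document's metadata, on-disk location and extracted text; return its id."""
    path = storage_path("/docs", doc_type, received, filename)
    cur = db.execute(
        "INSERT INTO docs (name, type, received, path, body) VALUES (?, ?, ?, ?, ?)",
        (name, doc_type, received.isoformat(), str(path), text),
    )
    return cur.lastrowid


def search(db: sqlite3.Connection, term: str, doc_type: Optional[str] = None) -> list[str]:
    """Free-text search over the extracted text, optionally restricted to one type."""
    sql = "SELECT path FROM docs WHERE body LIKE ?"
    params: list[str] = [f"%{term}%"]
    if doc_type is not None:
        sql += " AND type = ?"
        params.append(doc_type)
    return [row[0] for row in db.execute(sql + " ORDER BY id", params)]


db = open_index()
add_document(db, "Smith", "resume", date(2009, 6, 10), "smith.doc", "Avionics engineer, 10 years")
add_document(db, "Jones", "letter", date(2009, 5, 1), "jones.doc", "Re: avionics vacancy")
print(search(db, "avionics", "resume"))  # ['/docs/resume/2009/06/smith.doc']
```

The files stay browsable on disk because the path is derived from the same metadata the database holds.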

Google search appliance by Nefarious+Wheel · 2009-06-10 22:01 · Score: 1

Go to Google's main page and look for business solutions. They have a scheme where they charge you x dollars to index y hundred thousand documents, and they throw in the tinware (a custom, pre-configured rack of search hardware, very scalable) for you to plug into your LAN. It all stays strictly inside your firewall. Set it up to crawl all your file shares and it won't matter whether you have a document management system or not. Most document management systems depend on keywords, taxonomies and special file name codes, all of which are decidedly old hat. Index it and let 'em go search. The smallest version is fairly basic, but go up one level and they'll crawl PDFs, Word docs, pretty much anything with text in it, whether compressed, in source libraries or whatnot. They're pretty good. Not cheap, but then you're an aerospace firm...

--
Do not mock my vision of impractical footwear

Google? No. CMS! by Elixon · 2009-06-10 22:05 · Score: 1

Google is just a search engine. They need document management. :-) Correct me if what they need isn't the thing called content management.

Import it into some CMS, sort it and make it available through a password-protected website. We did something like this for http://www.olympus-ims.com/ (though these are public documents): it contains thousands of documents in a dozen languages, and counting all the document revisions it is over a hundred thousand documents. Easy to search, easy to navigate, easy to manage.

Simply: CMS is what you need. Do research.

--
Well, I've got to get back to work. When I stop rowing, the slave ship just goes in circles.

Re:Google? No. CMS! by kdekorte · 2009-06-11 01:53 · Score: 1

Correct!
Tools that should fit the need include FileNet from IBM and Documentum from EMC. I'm sure there are others, but I'm familiar with both of these.
I've never really seen a good open source tool that does this.
Document Management tools allow organizing, searching, tagging, access control and filesystem or web based access. And 100,000 documents is nothing for one of those systems.
Re:Google? No. CMS! by Lord+Apathy · 2009-06-11 02:37 · Score: 1

Wrong! Leave Documentum out of that list. Documentum is a piece of shit. You would be far better off piling all your documents in one directory and searching them using grep. Or, even better, printing them all out and tossing them around the office.
I don't have a real solution for your question, I'm looking into this myself. But I know Documentum is not what you are looking for. Using Documentum is like using a CA product for, well anything.

--
Supporting World Peace Through Nuclear Pacification

Check with the NTTC by JSC · 2009-06-10 23:42 · Score: 1

Several years ago I worked for a NASA project called the National Technology Transfer Center. A big part of the job there is organizing and searching through tens of thousands of pages of research documents. They used a document oriented database at the time although they may have migrated to something else since then. You might want to contact them for advice.

A friend of mine was the person primarily responsible for scanning in the documents. IIRC, the process involved OCRing the scans for keyword search and indexing, then storing a compressible graphic image of each page; this got them around the problem of text databases not storing technical drawings, etc.
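The OCR-plus-image approach boils down to an inverted index from words to stored page images. A minimal Python sketch, assuming the OCR step has already produced text for each scanned page (the OCR engine itself is not shown, and the file names are hypothetical):

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages: dict[str, str]) -> dict[str, set[str]]:
    """Map each word of the OCR text to the set of page images it appears on."""
    index: dict[str, set[str]] = defaultdict(set)
    for image, text in pages.items():
        for word in tokenize(text):
            index[word].add(image)
    return dict(index)


def lookup(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the page images that contain every word of the query."""
    words = tokenize(query)
    if not words:
        return set()
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return hits


# Keys are the stored page images; values are the text OCR produced for them.
pages = {
    "report-0042_p1.png": "Thermal test of composite wing panel",
    "report-0042_p2.png": "Figure 3: wing spar drawing",
}
index = build_index(pages)
print(sorted(lookup(index, "wing")))          # ['report-0042_p1.png', 'report-0042_p2.png']
print(sorted(lookup(index, "thermal wing")))  # ['report-0042_p1.png']
```

The text is only used to find pages; what the user gets back is the image, so drawings survive intact.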

--
Time's fun when you're having flies. - Kermit the Frog

Asking for pain by jandersen · 2009-06-11 00:45 · Score: 1

Samba shared over a VPN? Man, you are asking for no end of painful trouble. There are many good ways of sharing docs, but putting MS docs in a filesystem shared over a VPN is not one of them. A simple way to improve things would be to drop all the filesystem sharing and create some sort of searchable index on a web server. If you want more sophistication and have money to burn (who hasn't these days?), go and talk to Oracle, they have some very good software for this very purpose.

I don't know why companies always do it this way; it is the worst possible way of organizing your documents. When you put them in a filesystem, people have to try to remember how to find the one they need; a directory is like a hierarchical database, badly implemented. Sharing it via a networked filesystem makes it even worse, because now you have a huge network overhead and the risk of undetectable corruption when the network stumbles. And the VPN means that your network traffic is something like 10 times as heavy because of the encryption.

I know you are going to laugh by hesaigo999ca · 2009-06-11 00:53 · Score: 1

The latest version of Visual SourceSafe is pretty good. They improved its performance over the network, which used to be a killer on a domain spread across multiple cities (back in the VB6 days); now it is a really good repository tool. I also used another tool, but it lacked the history/detail section and could only hold a maximum number of files, which rules it out given that you have hundreds of thousands.

Smeadsoft by aapold · 2009-06-11 01:00 · Score: 1

Smeadsoft might work for you. -- note: I don't work for them or any affiliate of theirs, and have no vested interest in them being used --

I'm in the process of setting up one of their systems for document management, and it seems quite capable. It's not open source and it would take some cash to set up, but I think it's worth looking into if those two things don't eliminate it from consideration. (They also handle management of physical files, which is where they came from.) So far, setup has involved building a lot of framework and tags for the actual documents, and scanning a lot of physical files to be stored. The system uses large scanners with something called VRS, with barcode identifier sheets placed between stacks of documents.

So, for example, you could have a large stack of papers, half of which belong to one category (or subcat, or subsubetc.) and the other half to a second. You put a barcode sheet (a blank page except for one barcode) for the first category, then all those papers, then a barcode sheet for the next category, and so on. You load them into the scanner (obviously a high-capacity one), and it reads them all and automatically puts the scanned documents into the proper location in the database.
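The separator-sheet logic is easy to reason about as code. This Python sketch is hypothetical and is not Smeadsoft's or VRS's actual interface; it assumes the scanner software has already decoded any barcode on each page and hands over the batch in scan order:

```python
from typing import Optional


def route_pages(pages: list[tuple[Optional[str], str]],
                categories: dict[str, str]) -> dict[str, list[str]]:
    """Split one scanned batch into categories using barcode separator sheets.

    Each page is (barcode, image): barcode is the decoded value on a separator sheet,
    or None for an ordinary document page. Separator sheets themselves are not kept.
    """
    batches: dict[str, list[str]] = {}
    current: Optional[str] = None
    for barcode, image in pages:
        if barcode is not None:
            current = categories[barcode]  # an unknown barcode raises KeyError
            batches.setdefault(current, [])
        elif current is None:
            raise ValueError(f"{image} was scanned before any separator sheet")
        else:
            batches[current].append(image)
    return batches


batch = [
    ("CAT-0107", "sep1.tif"), (None, "p001.tif"), (None, "p002.tif"),
    ("CAT-0212", "sep2.tif"), (None, "p003.tif"),
]
print(route_pages(batch, {"CAT-0107": "Contracts", "CAT-0212": "Drawings"}))
# {'Contracts': ['p001.tif', 'p002.tif'], 'Drawings': ['p003.tif']}
```

Failing loudly on a page with no preceding separator is a deliberate choice: a misfiled scan is harder to find later than a rejected batch.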

--
"Waste not one watt!" - CZ

Content Addressable Storage by hicksw · 2009-06-11 01:17 · Score: 1

Don't try to use the file name or directory structure for this. Such a scheme is difficult to adapt or relocate, and over time the namespace drifts away from what the files actually contain.

Try this instead:

Assign arbitrary file names.
Adopt a directory structure derivable from those names, if you must.
Build a database of several tables to link keywords, project names, authors, etc, to the arbitrary file names.
Award small prizes for verified corrections to the database.

See http://en.wikipedia.org/wiki/Content-addressable_storage for more information.
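A minimal Python sketch of the scheme, assuming the "arbitrary" name is a SHA-256 digest of the file's contents (the usual choice in content-addressable storage) and an in-memory SQLite catalog. The two-level directory split, table layout and function names are illustrative assumptions:

```python
import hashlib
import sqlite3
from pathlib import PurePosixPath


def content_name(data: bytes) -> str:
    """Arbitrary but stable file name: the SHA-256 hex digest of the file's bytes."""
    return hashlib.sha256(data).hexdigest()


def content_path(root: str, name: str) -> PurePosixPath:
    """Directory derived from the name alone: ROOT/<first 2 hex>/<next 2 hex>/<name>."""
    return PurePosixPath(root, name[:2], name[2:4], name)


def open_catalog() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    db.executescript(
        """
        CREATE TABLE files (name TEXT PRIMARY KEY, title TEXT, author TEXT, project TEXT);
        CREATE TABLE keywords (word TEXT, name TEXT REFERENCES files(name));
        """
    )
    return db


def add_file(db: sqlite3.Connection, data: bytes, title: str, author: str,
             project: str, words: list[str]) -> str:
    """Catalogue a file under its content name; identical content gets one files row."""
    name = content_name(data)
    db.execute("INSERT OR IGNORE INTO files VALUES (?, ?, ?, ?)", (name, title, author, project))
    db.executemany("INSERT INTO keywords VALUES (?, ?)", [(w.lower(), name) for w in words])
    return name


def find(db: sqlite3.Connection, word: str) -> list[str]:
    """Titles of all files tagged with a keyword."""
    rows = db.execute(
        "SELECT DISTINCT f.title FROM files AS f JOIN keywords AS k ON k.name = f.name"
        " WHERE k.word = ? ORDER BY f.title",
        (word.lower(),),
    )
    return [title for (title,) in rows]


db = open_catalog()
n = add_file(db, b"spar load test data", "Spar test", "Ada", "Project X", ["wing", "spar"])
add_file(db, b"panel thermal data", "Panel test", "Bob", "Project X", ["wing"])
print(content_path("/store", n))
print(find(db, "WING"))  # ['Panel test', 'Spar test']
```

Renaming a project or fixing a keyword only touches the database; no file ever moves.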

From someone who has been down this road by CompMD · 2009-06-11 01:43 · Score: 1

Teamcenter. It freaking rules. Also, as evil as StarTeam is, it will do the job for you as well.

I have been a user/admin of both Teamcenter and StarTeam.

EMC Documentum by mu51c10rd · 2009-06-11 02:32 · Score: 1

We use EMC's Documentum suite here to manage our large volumes of documents. Expensive, but works great...and integrates with Fax software, MS Exchange, etc.

CamelCase by dna_(c)(tm)(r) · 2009-06-11 02:49 · Score: 1

Similar to CamelCase. Limits the number of variations on the same name considerably (no: camelcase, Camelcase, Camel case, Cam El Case,...)
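A small Python sketch of enforcing the convention mechanically; the `to_camel_case` function is hypothetical. It normalizes case and separators, but it only keeps the word boundaries the user actually typed, so `camelcase` still comes out as `Camelcase`. The convention still depends on people:

```python
import re


def to_camel_case(name: str) -> str:
    """Normalize the typed words of a name to one CamelCase form."""
    words = re.findall(r"[A-Za-z0-9]+", name)
    return "".join(word[:1].upper() + word[1:].lower() for word in words)


for variant in ["camel case", "Camel Case", "CAMEL_case", "camel-case"]:
    print(to_camel_case(variant))  # CamelCase each time
print(to_camel_case("camelcase"))  # Camelcase: the missing word break can't be guessed
```

Lower-casing the rest of each word is a trade-off: it also flattens acronyms, so "DOD GPS" becomes "DodGps".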

Reminds me of the command 'passwd' in *nix, I always have to 'apropos password' to find the correct spelling. Why is it not 'password' or 'psswrd'? Arbitrarily dropping 1 vowel and 1 consonant is silly.

Content Management by wuglas · 2009-06-11 03:12 · Score: 1

What you need is a content management system. Such systems do more than store and find documents. They allow true document taxonomy management, records management for compliance and control, and many other features.
I personally specialize in IBM Content Manager. It's great for companies like yours where you have distributed offices. You can keep your metadata at one central location but have the documents themselves stored at your remote locations, all while maintaining centralized control.
Doug Hansknecht
Certified IT Architect
DougFromOhio@us.ibm.com

Universal online document viewer by crisgrey · 2009-06-11 03:25 · Score: 1

To help you with the challenge of sharing documents with your remote sites, there are universal web-based document viewers on the market that you can use to embed document viewing capability into your intranet or web site. The documents can be of different file formats too, they don't all need to be PDF. Some options use Adobe Flash, so a plug-in needs to be downloaded by the end user, but other options do not. Adeptol and Vuzit are two examples, but if you search for "online document viewer" in Google you'll find a number of options.

DMS vs. Repository by oneiros27 · 2009-06-11 03:45 · Score: 2

I'm surprised that quite a few programs aren't mentioned on the DMS Wikipedia page. People might consider them repository software rather than a DMS (or RMS), but some others worth mentioning that would be useful for managing already existing documents:

And if you're looking for librarians with an IT background, in the libraries they're called "Systems Librarians". You might also check out the oss4lib and code4lib communities.

--
Build it, and they will come^Hplain.

Organization and Procedure by Edrick · 2009-06-11 03:57 · Score: 1

It seems that the first response to a request like this is to suggest new technologies and programs to solve the problem. It sounds, though, like 95% of the problem is that there are no procedures or organization in place already that give files a purpose or a place to go. A good file storage policy, with the appropriate instructions sent to the users, could just as easily make this work going forward. I've seen collections of millions of files that were perfectly fine because they were organized by user, purpose, source, destination, etc., and then subdivided as needed, and users knew what the organization was and how to maintain it (to their own benefit, since it means they can find their own stuff). You can also institute a more structured system where the organization is already there for them to use, but it's your call. ALWAYS ALWAYS ALWAYS figure out how you want things to be organized first! What are the functions of these files, why are they saved, who created them, who accesses them? This will make the job of sorting the mess out easier.

You forgot the top heirarchy level by linear+a · 2009-06-11 04:08 · Score: 1

Species.

IntraLinks or similar? by cloud0909 · 2009-06-11 06:59 · Score: 1

Related question: has anyone used, or would anyone recommend, IntraLinks to help manage a similar scenario?

Lotus Notes by dogugotw · 2009-06-11 11:00 · Score: 1

If you don't want to go the google appliance route, Notes works great, is cheap to set up, and simple to administer.
One db.
One form with a couple of fields
One view
Render to the web
Write a simple agent that crawls your directory structure, snags the files and attaches each one to a Notes doc. Stuff in the directory/file name if you care.
Let Notes build an index (and it can index damn near any file).
Poof - done.
Remove users' rights to leave crap in file directories, make 'em put new stuff into Notes, and you have something that's maintainable without a ton of work.
If you then want to get fancy, you can make users enter some metadata before they can save new docs.
You can set up access control, etc, etc, etc.

Documentum costs about a quarter mil just to get it in the door and a boat load of cash to make it useful. (at least it did in the late '90s).
A Notes server license is a couple grand. If you need user authentication, it's around $150/client (ask your rep for prices, because IBM is working tons of pricing schemes). If you don't need authentication, all you need is the server license.

Re:SharePoint? Doesn't scale by slashqwerty · 2009-06-11 12:36 · Score: 1

Where I work we wrote our own Document Management System that integrates with the rest of our systems. The integration has proven quite beneficial. Off-the-shelf systems can integrate but it generally doesn't work very well. Anyway, we were looking at using SharePoint as our back-end to get the indexing support and improved versioning. What we discovered is that SharePoint just doesn't scale very well. When you get into the hundreds of thousands of documents it has problems. When you get into the tens of millions it has major problems.

Given that the submitter already needs to file 500,000 documents I question if SharePoint is feasible.

Office Evolve by DocumentGuy · 2009-06-11 16:42 · Score: 1

Consider Office Evolve by Documatics. They have a system that organises your directories into projects, provides fully indexed searching of all your documents, caters for document generation from templates, keeps a complete history of all your documents, integrates with Outlook and manages workflow. It's in use at GE. We love it.

Hire a professional librarian. by Half+Balford · 2009-06-13 03:31 · Score: 1

Not a student. This is not a summer job. Even if someone at your office has nothing else to do, they will not be able to do a better job than a pro.
