Slashdot Mirror


How To Manage Hundreds of Thousands of Documents?

ajmcello78 writes "We're a mid-sized aerospace company with over a hundred thousand documents stored out on our Samba servers that also need to be accessed from our satellite offices. We have a VPN set up for the remote sites and use the Samba net use command to map the remote shares. It's becoming quite a mess, sometimes quite slow, and there is really no naming or numbering convention in place for the files and directories. We end up with mixed casing, all uppercase, all lowercase, dashes and ampersands in the file names, and there are literally hundreds of directories to sort through before you can find the document you are looking for. Does anybody know of a good system or method to manage all these documents, and also make them available to our satellite offices?"

438 comments

  1. Google wave by Anonymous Coward · · Score: 1, Funny

    I think it's in beta though.

    1. Re:Google wave by Gerzel · · Score: 2, Insightful

      Or better yet, talk to people who've done it before. I mean, seriously, there have been organizations managing hundreds of thousands of documents since the Roman Era; it's nothing new.

    2. Re:Google wave by EQ · · Score: 4, Funny

      "[O]rganizations managing hundreds of thousands of documents since the Roman Era,"

      You mean The Vatican? I doubt that "small aerospace company" could afford to staff up on monks and monasteries.

      --
      Buffalo buffalo Buffalo buffalo buffalo buffalo Buffalo buffalo! http://goo.gl/J9bkO
    3. Re:Google wave by theNetImp · · Score: 5, Funny

      monks work for free, they just need food and enlightenment, and if you get lucky they fast and then only need the enlightenment aspect.

    4. Re:Google wave by Anarchduke · · Score: 5, Insightful

      There is a whole profession dedicated to this, and there is a major in college specifically designed to assist in organizing documents into meaningful collections.

      I suggest your company look at hiring a library sciences major, since this is what they do.

      --
      who prays for Satan? Who in 18 centuries has had the humanity to pray for the 1 sinner that needed it most? ~Mark Twain
    5. Re:Google wave by Hooya · · Score: 1

      > ... food and enlightenment

      you mean this? I didn't know monks were that picky about the wm.

    6. Re:Google wave by iluvcapra · · Score: 1

      And if you get lucky and the monks are still living in the dark ages, you only need the food aspect.

      --
      Don't blame me, I voted for Baltar.
    7. Re:Google wave by makapuf · · Score: 1

      Well, for e17 you certainly need enlightenment to achieve eternal life ...

    8. Re:Google wave by Anonymous Coward · · Score: 0

      Mod this up. Parent is right. This is exactly the kind of problem that library sciences folks are trained to solve.

  2. Google to the rescue? by Shatrat · · Score: 4, Insightful

    Isn't this the sort of thing that a google search appliance would be helpful for? Then you don't need to know the exact filename, just some specific information that can identify the file. This certainly solved my problem with having thousands of emails.

    --
    09 F9 11 02 9D 74 E3 5B D8 41 56 C5 63 56 88 C0
    1. Re:Google to the rescue? by mikael · · Score: 2, Funny

      Just put everything up on a P2P server - then everyone can look for the documents at the same time as they are looking for their favourite Linux distro.

      --
      Vintage computer adverts: http://www.vintageadbrowser.com/computers-and-software-ads
    2. Re:Google to the rescue? by gwait · · Score: 2, Insightful

      I agree - although you might want to eventually implement a systematic method of naming/storing your documents.
      The Google appliance (or some other reasonably fast "WAN" search tool) would let you find files in the current rat's nest "as is", making it easier to organize them to the new "standard".

      --
      Bavarian Purity Law of Rice Krispie Squares: Rice Krispies, Marshmallows, Butter, Vanilla.
    3. Re:Google to the rescue? by Anonymous Coward · · Score: 0

      Depending on the size of the company and number of files, a GoogleMini might be a cheaper, equally effective option.
      http://www.googlestore.com/appliance/product.asp?catid=3

    4. Re:Google to the rescue? by liquidsin · · Score: 5, Insightful

      use your users, if you can. i'm just talking out my ass here, but i'd think it a not-too-difficult matter to add some sort of user input form along the lines of "hey, now that you've found the document you need, does the name fit the new naming scheme? if not, why not rename it so it fits!". this is assuming you can trust your userbase not to be asshats and to be able to follow the naming protocol.

      --
      do not read this line twice.
    5. Re:Google to the rescue? by Anonymous Coward · · Score: 0

      If you're going to redo it and you want a real document management system at a reasonable price, get Xerox Docushare. Free download to try it out... And you'll never want anything else.

    6. Re:Google to the rescue? by CozmicCharlie · · Score: 4, Insightful

      Now I actually LOL'd on that one! Getting our userbase to actually give a flying fart about a naming protocol and then getting them to follow it!? I won't be holding my breath for either of those two things to happen...

    7. Re:Google to the rescue? by Anonymous Coward · · Score: 2, Informative

      Wow - Slashdot users must all be on their meds this week. Judging by the number of responses that say to buy a Google appliance, I judge the paranoia level to be closer to blue than red. Where the hell is the open source insight?

      I guess it's right here from good old Anonymous - ever hear of SOLR http://lucene.apache.org/solr/ ? It's free, it's opensource and even if you hire a consulting company to set up an index of everything you have, you'll pay pennies on the dollar compared to a google appliance! Plus - your soul will remain intact!

      Just google for SOLR consultants and you'll find them, no problem :-)

    8. Re:Google to the rescue? by shri · · Score: 3, Informative

      May I also suggest Yahoo/IBM's OmniFind as a free as beer alternative?

    9. Re:Google to the rescue? by vrmlguy · · Score: 4, Interesting

      Now I actually LOL'd on that one!

      Getting our userbase to actually give a flying fart about a naming protocol and then getting them to follow it!?

      I won't be holding my breath for either of those two things to happen...

      You obviously don't know how to motivate people. Tell your boss you can get everything renamed for $100/week. Then post a leaderboard showing who has renamed the most documents each week, and give each week's winner a gift certificate to a local restaurant. Don't let anyone win more than once a month, to prevent too much disruption of normal job duties, and set up some sort of meta-moderation to prevent gaming the system. (You could probably use slashcode out of the box: just make each document a story and suggest better names in the comments.)
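
      A rough sketch of the weekly pick described above, assuming rename counts are already tallied per user. The function name, the data shapes, and the alphabetical tie-break are illustrative assumptions, not part of slashcode or any existing tool:

```python
from typing import Dict, Optional, Set


def pick_weekly_winner(rename_counts: Dict[str, int],
                       recent_winners: Set[str]) -> Optional[str]:
    """Return the user with the most renames this week.

    Users in recent_winners (anyone who already won this month) are
    skipped, as are users with zero renames. Ties go to the name that
    sorts first, which is an arbitrary assumption of this sketch.
    """
    eligible = {user: count for user, count in rename_counts.items()
                if user not in recent_winners and count > 0}
    if not eligible:
        return None
    # max() keeps the first maximum it sees, so sorting the names first
    # makes the tie-break deterministic.
    return max(sorted(eligible), key=lambda user: eligible[user])
```

      The meta-moderation step is left out here; it would have to decide which renames count before the tally is made.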

      --
      Nothing for 6-digit uids?
    10. Re:Google to the rescue? by Anonymous Coward · · Score: 0

      Laserfiche.

    11. Re:Google to the rescue? by Anonymous Coward · · Score: 0

      Try this naming convention:
      Capitalize the first letter of every word in a name. Make the subject the file name, add further identification information as needed to the name to delineate among files.
      You could put all the files in some sort of version control system. It takes up more space and requires hands-on maintenance of the program sometimes, but it's rather useful if there's a lot of changes being made to a single document.
      You should clearly lay out who is responsible for what documents. Usually this is a manager in each functional team. The users will need to know how to use the version control system, each person will need the client program installed on their machine.

      From the sounds of it, right now you have no idea who is using what documents. It is possible to get from here to there, but it will take time and attention. You're going to have to have everyone involved, because if you think about it, just about everyone can generate documents, and each one that is generated outside the system just adds to the confusion.

      Finally, make sure that the server the files are stored on has a regular backup schedule and that the restore procedures are tested to a different server regularly.
      Make sure that all the programs you actively use have their install disks kept in a centralized location.

      I think that just about covers the simple overview.
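
      For illustration, the capitalization rule could be applied mechanically along these lines. This is only a sketch: the function name and the choice to treat dashes, underscores and ampersands as word separators are assumptions, and it lowercases acronyms, which may not be what you want:

```python
import re
from pathlib import PurePath


def normalize_name(filename: str) -> str:
    """Capitalize the first letter of every word in a file name.

    Runs of whitespace, dashes, underscores and ampersands are treated
    as word separators and collapsed to a single space. The extension
    is lowercased and otherwise left alone.
    """
    path = PurePath(filename)
    stem, suffix = path.stem, path.suffix.lower()
    words = [word for word in re.split(r"[\s_\-&]+", stem) if word]
    return " ".join(word[:1].upper() + word[1:].lower() for word in words) + suffix


print(normalize_name("ION-injector&HARDWARE_part1.PDF"))
```

      A dry run that only prints the old and proposed names is probably wise before renaming anything on a live share.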

    12. Re:Google to the rescue? by SlashWombat · · Score: 2, Interesting

      I agree - although you might want to eventually implement a systematic method of naming/storing your documents.

      While this seems like a good idea on the surface, it never seems to work very well. Even verbose file names seem to fail miserably, as the first 100 or so letters are always the same (e.g. "Project Tiger Sausage rocket module assembly - Ion injector hardware part 1 ...").

      Then there is the problem of getting all the employees to fully understand directory structures. Just look at your workmates' screens to see how many people save everything on their desktop. (Yes, really a Windows problem ... but so what.)

      I used to get a local WAN search engine, and let it index the entire site. Much more useful as it would find documents most people thought had disappeared years ago.

      Another approach would be to have a database that assigns file names for various projects and/or functions, and mandate that this be the only way files are named for storage on the WAN. This, however, does not get around the thousands of files already stored in weird places using weird names! That is why an already indexed search engine works so well: not only does it extract the file names, but searches on random (but significant) phrases are also picked up within the scanned documents. (I used to use "MAMMA", it worked a treat!) http://www.mamma.com/
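
      To show why content indexing finds files regardless of their names, here is a toy inverted index. It is a minimal sketch, not how any particular search product works; the function names and the in-memory dict of already extracted text are assumptions:

```python
import re
from collections import defaultdict
from typing import Dict, Set


def build_index(docs: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map every lowercase word to the set of document paths containing it."""
    index = defaultdict(set)
    for path, text in docs.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(path)
    return index


def search(index: Dict[str, Set[str]], query: str) -> Set[str]:
    """Return the paths of documents that contain every word in the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    # Copy the first posting set so the index itself is never modified.
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results
```

      Real engines add text extraction for each file format, ranking and incremental updates on top of this basic structure.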

    13. Re:Google to the rescue? by Anonymous Coward · · Score: 0

      We had a similar situation on a Windows fileserver. They didn't want to install IIS, so I smbmount'ed everything to a Linux box under Apache's doc root. Then, with everything on the intranet, I was able to crawl it for free (using htDig, later mnoGoSearch). With doc2html and other converters, we now search all our Word, PDF, Excel, PowerPoint, HTML and txt files from any browser. Works fine, but it's not as slick as Google's search appliance (which is not free).

      Files that shouldn't be shared can be hidden from the JoeUser account used to smbmount. Apache can be told not to serve restricted subnets (eg foreign nationals, other divisions on the same campus, the vpn, etc).
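
      As a sketch of the first step of such a crawl, the mounted share can be walked to list the files worth handing to a converter. The extension set and function name are assumptions for illustration; htDig and mnoGoSearch have their own configuration for this:

```python
from pathlib import Path
from typing import Iterable, List

# Hypothetical set of extensions the converters can handle.
INDEXABLE = frozenset({".doc", ".pdf", ".xls", ".ppt", ".html", ".txt"})


def find_indexable(root: str, extensions: Iterable[str] = INDEXABLE) -> List[Path]:
    """Return every file under root whose extension is in the given set.

    The comparison ignores case, so REPORT.PDF and report.pdf both match.
    """
    wanted = {ext.lower() for ext in extensions}
    return sorted(path for path in Path(root).rglob("*")
                  if path.is_file() and path.suffix.lower() in wanted)
```

      Files the mounting account cannot read simply never show up in the walk, which matches the access approach described above.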

    14. Re:Google to the rescue? by Orlando · · Score: 1

      Tell your boss you can get everything renamed for $100/week

      Great, there's always a capitalistic solution to every problem.

      --
      -= This is a self-referential sig =-
    15. Re:Google to the rescue? by jetole · · Score: 1

      as a free as beer alternative?

      Depends what you mean by free beer alternative.
      "Software supports up to 500,000 documents" -Omnifind.

      Well it had me sold till I read that. I'm not looking for this software right now although I thought it's good to know if our firm ends up needing it but I don't want to get software for free and then have to pay to upgrade it to capable later. Just not my style. OTOH solr from apache looks worth checking out further.

    16. Re:Google to the rescue? by polar red · · Score: 1

      Capitalize the first letter of every word in a name.

      why? caps don't add value.

      --
      Yes, I'm left. You have a problem with that?
    17. Re:Google to the rescue? by BlackPignouf · · Score: 1

      I might come a bit late to the party, but I'd like to say that I developed a free alternative to Google Search Appliance :
      http://github.com/EricDuminil/picolena/tree/master

      It's a small Ruby on Rails app (~1kLOC), uses either Ferret or Sphinx and implements full text search for .pdf, .doc, .docx, .odt, .xls, .ods, .ppt, .pptx, .odp, .rtf, .html, and metadata from music, pictures and videos.
      It also includes language recognition, files thumbnailing and cache à la google.

      We use it in our research center to index ~100 000 documents from 50 users on a Samba share, and we get relevant results in ~0.1s
      Users don't need to learn any convention, they find what they want fast, and can use it as easily as Google.

      If you're interested, drop me an email from Github.

    18. Re:Google to the rescue? by Phreakiture · · Score: 1

      I am not so sure. My employer has a Google appliance, and it has never been able to find relevant content for me on the company Intranet. It isn't that the content isn't there, but there is so much boilerplate language in place that, quite often, there is a glut of documents that contain my search terms. Your mileage, of course, may vary.

      I think, though, that what may be needed here is a process, not a product. It will be long and painful, but your best bet, always, is to put a small group of humans authoritatively in charge of the documents. They can use technology to help them (such as the aforementioned Google appliance, Bayes categorizers, etc), but the ultimate decision needs to be a human one.

      --
      www.wavefront-av.com
    19. Re:Google to the rescue? by Anonymous Coward · · Score: 0

      We are in exactly the same boat. Same business and we have the same issues. We've been using an open source alternative. It started on Linux, which sounds like it would work for you since you're already using Samba. The product name is KnowledgeTree. We also have a document control person who manages a lot of business-critical docs, but there are many things that the users can manage themselves.

      Here's the link to get the community edition (free) http://www.knowledgetree.com/community-download

      I feel your pain, but this has worked pretty well for us.

    20. Re:Google to the rescue? by Geotopia · · Score: 1

      I like the way this guy thinks. I recommend, however, that he rename his post "Motivating the Userbase" and this particular thread "Involve the Userbase". Later, I'm going to come back and rename my article, and start a new thread for which I'll recommend another name. That will give me 5 or 6 points towards a Chili's GC, right?

    21. Re:Google to the rescue? by stelling · · Score: 1

      A Google search appliance could be the way to go as a small-scale, and most likely temporary, solution.

      The real questions are:

      How valuable are your documents ?
      How much money do they generate ?
      What is the cost for not being able to locate a document ?
      What kind of processes are these documents used in ?
      How distributed is your user base ?
      How close to your business core are these documents ?

      You certainly need a document management solution (or ECM), which one will depend basically on the answers to the previous questions.

    22. Re:Google to the rescue? by plague3106 · · Score: 1

      Humans are motivated only through self interest. Why is it surprising this solution is proposed, when it has a good chance of working?

    23. Re:Google to the rescue? by plague3106 · · Score: 1

      Perhaps OSS doesn't have a good solution? While everyone bitches about Windows indexing, I'm not even sure I know the equivalent in the Linux world.

      And what's wrong with paying money for a solution that works and can be implemented very quickly?

    24. Re:Google to the rescue? by Anonymous Coward · · Score: 0

      And you'll get this:

      \\server\Archives\Old Documents From Engineering Team 7b\Plans From 1993 Quarter 1 Month of June\1993 Plans and Drawings\Plans Approved By JMORV\A small plan of a somewhat insignificant nature.pdf

      Which you may laugh at, but I've seen this naming system. It's frightening really, and does nothing to help out. The best method is to have some form of document management software that documents are checked in and out of and can be searched on. We wound up having a custom application built by these guys: http://qonsort.com/ and that saved us a ton of issues.

    25. Re:Google to the rescue? by Estanislao Martínez · · Score: 1

      Isn't this the sort of thing that a google search appliance would be helpful for?

      Yes, but don't make the mistake of thinking that just because Google are the leading web search engine, they must also be the leading document search solution. Google's web search relies heavily on links between HTML documents to assess their relative importance. In an office with a lot of plain old documents, there will be no links.

    26. Re:Google to the rescue? by Anonymous Coward · · Score: 1, Insightful

      Now I actually LOL'd on that one!

      Getting our userbase to actually give a flying fart about a naming protocol and then getting them to follow it!?

      I won't be holding my breath for either of those two things to happen...

      You obviously don't know how to motivate people. Tell your boss you can get everything renamed for $100/week. Then post a leader board showing who has renamed the most documents each week, and give each week's winner a gift certificate to a local restaurant. Don't let anyone win more than once a month, to prevent too much disruption of normal job duties, and set up some sort of meta-moderation to prevent gaming the system. (You could probably use slashcode out-of the-box, just make each document a story and suggest better names in the comments.)

      Laughing my ass off, here...

      What do you think an enterprise working environment is? College dorm?

      "Document renamer of the week"?

      You are either still in high school or in Dilbert-like upper management to think that this has any remote chance of working (or makes any sense).

  3. Hummingbird Document management by Anonymous Coward · · Score: 1, Informative

    http://en.wikipedia.org/wiki/Hummingbird_Ltd

    and

    http://connectivity.hummingbird.com/home/connectivity.html?cks=y

    1. Re:Hummingbird Document management by HikingStick · · Score: 3, Insightful

      If they're going to consider Hummingbird, they need to be ready to cough up the dollars to get an *EXPERIENCED* Hummingbird administrator. If not, the product will be set up, but basic search functionality will be hosed because of some of the same issues in the original problem description (arising from differences in how the documents' properties sheets are populated). If done well, it can be fantastic. If not, users will hate it and do everything possible to avoid it (including installing their own NAS devices).

      --
      I use irony whenever I can, but my shirts are still wrinkled...
    2. Re:Hummingbird Document management by kiwimate · · Score: 3, Informative

      Yes, but it's not that hard to find someone. Buy Hummingbird (now owned by Open Text) or any other document management system. You've got a bunch of documents. You need to manage them. Ergo, a document management system.

      Parent makes an excellent point, however: the single most critical component of a successful implementation is to get a skilled* consultant who can work with you to properly define the taxonomy. Everything else flows from there.

      * If you go with Hummingbird DM, "skilled" means "not one of their overpriced professional services people". They're dreadful.

    3. Re:Hummingbird Document management by pkluss · · Score: 1

      HikingStick is exactly right. We used Hummingbird for a while and it got out of hand and then it was every bit as bad as what you're experiencing now but we paid an arm and a leg for it (so it felt much worse). It's a decent product, but we ended up with migrating to SharePoint since it fit our needs.

    4. Re:Hummingbird Document management by dimeglio · · Score: 3, Insightful

      Skilled consultants are great but without training employees you'll keep on paying big $ for consultants whenever there's a change to make. Let the consultant show how and let the employees do the work. BTW: We have 3000+ users (all happy) on their system and no consultant.

      --
      Views expressed do not necessarily reflect those of the author.
    5. Re:Hummingbird Document management by anexkahn · · Score: 1

      I completely agree, we pay a third party to come in and help with all our Hummingbird issues that we can't solve in house. Any time we have a major upgrade or change we have them come out as well. I have had many bad experiences with Hummingbird Support....I wouldn't want to see what Opentext's (The people that sell hummingbird) professional services people are like.

      --
      Curious about Storage and Virtualization? Check out
    6. Re:Hummingbird Document management by HikingStick · · Score: 1

      You're on the key issue, but I'll take a different tack: not only do users need training, but user requirements (sometimes, extensive amounts of user requirements) need to be gathered before implementing a solution (and this goes way beyond DM systems). If time is spent with the users before the DM system is deployed, the project team can be aware of how things currently are done. This means they might need to understand the naming conventions being used by multiple business units or many admin staff, and that is only one example. The goal, then, is to sit down with a representative user group--a group that represents all stakeholders (from the end users to management, IT, information security, legal, and audit)--and review the gathered requirements. In places where there seem to be conflicting requirements, all stakeholders need to come together and agree on a common set of requirements. From there, they need to go back and start prepping their own groups if those requirements result in changes from their current practices.

      If that's done (a major component of project management that seems to be shortchanged all too often), you'll find yourself deploying a system in which the users have some sense of ownership, and which is less likely to run into significant resistance based on old arguments like "but we don't do it that way" or "the system just doesn't meet our needs."

      --
      I use irony whenever I can, but my shirts are still wrinkled...
  4. Google Appliance by TornCityVenz · · Score: 4, Informative
    --
    I Need someone to rebuild a Digitech Digital Delay pedal for me....for me...for me...for me.
    1. Re:Google Appliance by Anonymous Coward · · Score: 0

      some additional enterprise search options -

      Autonomy IDOL,
      Microsoft Fast ESP,
      Microsoft Search Server,
      IBM OmniFind

      (i believe there is a free version of omnifind called yahoo edition)

    2. Re:Google Appliance by Anonymous Coward · · Score: 0

      One thing I don't like about the GSA is that the licensing requires you to pay per year, otherwise the appliance will stop working:

      At the end of your license term, the Google Search Appliance expires and no longer searches or serves data.

      Source: http://www.google.com/support/gsa/bin/answer.py?hl=en&answer=18282

      Maybe they give you the appliance at a discounted price which offsets the cost of the licensing, but it is nice to be able to purchase a piece of hardware without having to worry about yearly maintenance fees for it to keep working. It would be interesting to know whether the appliance can be outright purchased.

    3. Re:Google Appliance by VTBlue · · Score: 1, Informative

      Google them? http://www.google.com/enterprise/search/gsa.html

      Try Search Server 2008 Express from Microsoft. Although it has no hard limits, it can index up to 1 million documents before you have to scale out. Best of all, it is free!

      If you need high availability, redundancy, fail-over or more document support, look at the standard version of the product or consider SharePoint 2007/2010 or FAST.

      http://www.microsoft.com/enterprisesearch/en/us/search-server-express.aspx#none

      msg me, if you have questions, I work at Microsoft.

    4. Re:Google Appliance by scooterhanson · · Score: 1

      A search appliance would be great, but there's not really a lot of structure on top of an index unless coupled with some other sort of knowledge-management infrastructure.

      http://www.yakabod.com/ is a company that I've heard about that has been doing this kind of thing for the US intelligence agencies for a while -- coupling a search appliance with taxonomy / folksonomy and some other kinds of voodoo. I've heard these guys refer to it as a "knowledge network" in the sense that a social networking app keeps you aware of what your friends and colleagues are doing, but the knowledge networking app keeps you aware of what your whole business is doing.

      There's always Sharepoint and Documentum type solutions, but trust me, brother, I've been down those roads before and I don't wish them on my enemies.

    5. Re:Google Appliance by spyrochaete · · Score: 1

      Enterprise Search Server is a really nifty app based on the excellent MOSS search functionality, but in my tests it doesn't hold a candle to the Google Search Appliance. To scratch the surface...

      • the GSA will index the first 2.5MB of text in a document while SSX only indexes 256KB.
      • SSX isn't exactly free because you need hardware plus a Windows Server license.
      • implementations of SSX larger than 1 million documents are very complex, sometimes requiring multiple servers for query, index, and crawling, whereas the GSA supports 10 million documents in a single 2U server.
      • the relevance of SERPs and snippets is just superior on GSA (my subjective opinion)

      If you've got a spare Windows server laying around then ESS is a terrific way to put it to use, but if you're fleshing out an enterprise search solution from the ground up I would recommend GSA in almost any scenario.

    6. Re:Google Appliance by spyrochaete · · Score: 1

      I think a search solution is exactly what you want if you don't have a solid structure in place. Give people multifaceted navigation and let them choose whether they want to browse a hierarchical structure, or perform a search to try to cut to the chase, or even both so that users can choose a specific category or directory to narrow down the search corpus.

    7. Re:Google Appliance by VTBlue · · Score: 0

      I should correct myself: Search Server 2008 Express scales up to 400,000 documents, not 1,000,000, primarily due to the 4GB limitation of SQL Server 2005 Express. If you have SQL 2008 Express, I'd have to check the scaling.

      One of the big benefits with Microsoft is the ability for granular search tuning. Enterprise search is very different from internet search, and having access to the search algorithm is key to getting better results. Below is a partner who deals with GSA and SharePoint/Search Server.

      http://www.nonlinearcreations.com/blog/index.php/2008/06/30/google-search-appliance-and-microsoft-search-side-by-side/

      If you want my powerpoint presentation on Search Server 2008, please visit:
      http://www.slideshare.net/ukdpe/microsoft-search-server-2008-technical-overview

    8. Re:Google Appliance by scooterhanson · · Score: 1

      Yeah, but you still run into the same old story of issues with discovery -- How do you know something's been added inside the file cabinet if you're looking at the closed drawer?

      A knowledge sharing app sitting on top of a search appliance would show activity and interaction across content.

      While people may just want to be able to find stuff in their great big pile of content, they'll still run into obstacles with only a search appliance. It's the connections between pieces of content that really count for understanding that pile (e.g. me as a user and all of my content, related to the document you were looking for that I published, related to a new document that you need but never knew you needed, etc.).

  5. Organize the files by Anonymous Coward · · Score: 0

    Sometimes you just have to do the work and not look for the magic bullet.

    1. Re:Organize the files by Anonymous Coward · · Score: 1, Funny

      I don't think this is one of those times, though.

  6. How not to do it by Daimanta · · Score: 3, Funny

    Store it on a single FAT32 partition and hope for the best. Only meant for people with guts or really really nice bosses.

    --
    Knowledge is power. Knowledge shared is power lost.
    1. Re:How not to do it by CarpetShark · · Score: 4, Funny

      Pfft. This is a serious job. 320k floppies are what you want.

      Or... you know... you could try managing those documents with a document management system.

    2. Re:How not to do it by selven · · Score: 4, Funny

      Two of those should be enough for everyone!

    3. Re:How not to do it by mysidia · · Score: 1

      Should use punch cards.

      Floppies have this problem that the magnetism degrades over time, also, when placed on top of a CRT monitor for long periods of time, the data just seems to disappear.

  7. Answered your own question by Sir_Lewk · · Score: 5, Insightful

    and there is really no naming or numbering convention in place for the files and directories.

    I think you already know the answer.

    --
    "linux is just DOS with a UNIX like syntax" -- Galactic Dominator (944134)
    1. Re:Answered your own question by peektwice · · Score: 4, Informative

      Absolutely correct. However, I would take it a step further and say that you need a document management system that manages security, metadata, retention, disposition, etc. Examples are Documentum, IBM FileNet P8, Alfresco, etc. Here's a place to start reading: http://www.cmswire.com/.

      --
      Other than this text, there is no discernible information contained in this sig.
    2. Re:Answered your own question by hedwards · · Score: 1

      In all honesty, I tend to agree with what you're implying. A database solution is great, if you put it into place immediately, otherwise you have to spend a lot of time getting all of the items into the database and properly tagged and sorted.

      One way or another the work is going to have to be done; the relevant questions are how easily it will be maintained, how it will handle increases in size, and how easily it can be backed up.

      I'm doing this sort of thing right now with my digital images. Thankfully, I can fall back on metadata to do most of the heavy lifting, which just leaves the process of creating subjective tags for pulling up random files and figuring out a decent backup system. I've been doing it all this week and haven't found a proper solution. That is really a minimal hassle compared to what the OP is dealing with: finding the files, reading them, and putting them into some reasonable category, when presumably many were created by employees no longer at the company.

      To boil it all down a bit, make absolutely sure you've got all the tags you're going to want in, a file hierarchy of some sort for storing the physical files, and the thumb screws for anybody that's not willing to do their part. A system doesn't stay neat and organized on its own; just because it's residing on some sort of database doesn't mean it's automatically easy to find things. Best bet for files is to organize those roughly by date; depending upon how many there are, that may require folders by day, week, month or year to keep them in a reasonable place to find.

      Take it relatively slow: demand that any new files be created within the new system, and make a regular effort at putting the older files into the new system in a consistent manner.
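
      A date-based layout of that kind could be computed roughly as follows. This is only a sketch: the function name, the folder format and the idea of using the modification date are assumptions, and weekly folders are left out for brevity:

```python
from datetime import date
from pathlib import PurePosixPath


def dated_path(root: str, filename: str, modified: date,
               granularity: str = "month") -> PurePosixPath:
    """Build a storage path such as root/1993/06/plan.pdf.

    granularity picks how deep the date folders go: year, month or day.
    """
    levels = {"year": 1, "month": 2, "day": 3}
    if granularity not in levels:
        raise ValueError(f"unknown granularity: {granularity!r}")
    folders = [f"{modified:%Y}", f"{modified:%m}", f"{modified:%d}"]
    return PurePosixPath(root, *folders[:levels[granularity]], filename)


print(dated_path("/archive", "plan.pdf", date(1993, 6, 14)))
```

      Choosing the granularity per share keeps any one folder from growing past what people are willing to scroll through.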

    3. Re:Answered your own question by nine-times · · Score: 2, Insightful

      Yeah, some people mentioned Google appliances, which I suppose is a sort-of solution. I've never used one of those internally, but I wouldn't trust that to be the end-all solution to your organizational problems. What if there's a file that Google can't read or gather good metadata for? What if you're searching for common terms, and the file you're looking for is on the 75th page? What if you're not remembering the correct search parameters and so your file just isn't turning up in your searches?

      There's really no substitute yet for real organization and discipline. The first thing you should do is define your needs/parameters. Does everyone from every site need read access to all files? Do they all need write access? Most likely, the answer to both of these questions is "no", so narrow it down to specifically "who needs access to what". That will help you figure out the rest of these things. Also ask, who needs to be able to find which documents under which circumstances? What information will they have? You're going to want to use those pieces of information in your organization so that people can intuitively find the files that they need, without necessarily needing to see everyone else's files.

      Come up with a hierarchical organization for your files, requesting user input if appropriate. Then create a directory structure that matches it. Make sure you've communicated the organization clearly to your users, and try to get them to use it.

      If necessary, use directory permissions to try to restrict writing files to appropriate places. For example, if you break down the file structure by particular engineering groups or departments, then only provide write access to members of that group or department. Designate the head of that department as the person responsible for organization within that folder. If need be, restrict write access in a particular folder to only one person, and make that person responsible for checking files in and maintaining the organization for the group or department. Do the same sort of control with individual satellite sites, if appropriate.

      Be a little tiny bit of a control freak, but you might want to give people a particular folder share where they can transfer files in a more freeform manner in a pinch. Someone might want to share one particular file, back something up for a minute, or whatever, but make it clear that this share is completely insecure and temporary. Let people know that everyone has access to that share, anyone can delete any file, you won't be backing it up, and in fact you might be clearing it out (deleting it) on a regular basis. Make a habit of deleting it all on a regular basis, or people will start dumping everything there to sidestep the organization. To be careful, you might want to actually move everything into a non-shared folder for a week, and then delete it later, so if someone shows up and says, "Oh crap! You deleted business-critical information!" you can sigh, and say, "I'll see what I can do, but you really shouldn't store business-critical data there."

      So, to go back and summarize: Come up with an organization, stick to it, enforce it, and retrain your users to use it properly.
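
      The "move it to a non-shared folder for a week, then delete it" routine for the temporary share could be sketched roughly like this in Python. The function name `sweep`, the dated batch folders, and the seven-day default are assumptions for illustration; the sketch also assumes it runs at most once a day.

```python
import shutil
from datetime import date, timedelta
from pathlib import Path


def sweep(scratch: Path, quarantine: Path, today: date, hold_days: int = 7) -> None:
    """Empty the scratch share into a dated quarantine batch, then purge old batches."""
    quarantine.mkdir(parents=True, exist_ok=True)

    # Step 1: permanently delete batches that have sat in quarantine too long.
    cutoff = today - timedelta(days=hold_days)
    for batch in quarantine.iterdir():
        try:
            batch_day = date.fromisoformat(batch.name)
        except ValueError:
            continue  # not one of our YYYY-MM-DD batch folders; leave it alone
        if batch.is_dir() and batch_day < cutoff:
            shutil.rmtree(batch)

    # Step 2: move whatever is in the scratch share into today's batch.
    entries = list(scratch.iterdir())
    if entries:
        batch = quarantine / today.isoformat()
        batch.mkdir(exist_ok=True)
        for entry in entries:
            shutil.move(str(entry), str(batch / entry.name))
```

      Because nothing is deleted until its batch is older than `hold_days`, there is a window in which a "deleted" file can still be handed back.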

    4. Re:Answered your own question by Bill+Dimm · · Score: 1

      otherwise you have to spend a lot of time getting all of the items into the database and properly tagged and sorted

      Document clustering software can make that less painful by giving an overview of what you have (possibly hierarchical), and allowing you to categorize dozens (or even thousands) of related documents with a single mouse click. Blatant plug: Clustify.

    5. Re:Answered your own question by dimeglio · · Score: 1

      You can look also at OpenDMS. It's not very active lately but might have a good core that you can expand on.

      --
      Views expressed do not necessarily reflect those of the author.
    6. Re:Answered your own question by CorporateSuit · · Score: 5, Funny

      No kidding, men are practically born with this instinct.

      The most basic is dividing the images up according to hair color or the number of girls appearing in each photo. Then you usually divide them up between hardcore and softcore, type of performance, fetish, etc. For your favorites, you can keep a folder in the home directory, of course. I know this guy works for an aerospace company, but keeping track of 500,000+ files isn't rocket science! We've all been able to do that since the advent of the 200GB harddrive.

      --
      I am the richest astronaut ever to win the superbowl.
    7. Re:Answered your own question by SydShamino · · Score: 1

      My wife is a fan of Livelink, which she implemented for document and workflow management at her last job.

      --
      It doesn't hurt to be nice.
    8. Re:Answered your own question by BillAtHRST · · Score: 1

      Depends on whether you want to "manage" the docs, or just be able to find them. The google thing looks promising (http://ask.slashdot.org/comments.pl?sid=1264509&cid=28285635), and is probably a LOT cheaper to get going with. Then, if you think you need more you could look at some of the more heavyweight solutions.
      "Perfect is the enemy of the good" -- Voltaire

    9. Re:Answered your own question by Anonymous Coward · · Score: 0

      This is exactly what should be done, and it has the added benefit that you can use existing infrastructure. You will however have to spend a good amount of time creating this structure, and it will require continued maintenance to keep it working as intended.

    10. Re:Answered your own question by Anonymous Coward · · Score: 0

      Something much cheaper than documentum...

      http://www.knowledgetree.com/

    11. Re:Answered your own question by Anonymous Coward · · Score: 0

      Documentum is almost dead. My company is somewhat of a competitor, but deals more with document management on an enterprise development level (product lifecycle management or PLM). I'd advertise, but our product is quite expensive and probably not what you're looking for ($1500 per user per year). Documentum's solution is considered a precursor to PLM systems.

    12. Re:Answered your own question by Anonymous Coward · · Score: 0

      For your own sanity please please please do not use Filenet.

    13. Re:Answered your own question by oldspewey · · Score: 2, Informative

      LMAO ... Documentum is almost dead ... that's why they released V6.5 several months back and are on track to release V7 in H1'10.

      Why don't you tell us all how your "competitor" company scales to tens of millions of documents with high availability, disaster recovery, and content caching across 5 continents?

      --
      If libertarians are so opposed to effective government, why don't they all move to Somalia?
    14. Re:Answered your own question by Anonymous Coward · · Score: 0

      My company was just bought a year ago by a company that uses Documentum. We're just getting access and when we asked for a guide to how files were named, and what folders were for what, no one could tell us. A database solution is probably a start, but without standards from the beginning, rigidly enforced, you'll end up with the same sort of mess. After all, what is a file system but a special type of database?

      You need first and foremost, dedicated corporate librarians. This is one problem that has a better human solution than a technical one.

    15. Re:Answered your own question by mysidia · · Score: 1

      Put the temporary area on a "RAMDISK" and schedule a nightly reboot.

    16. Re:Answered your own question by Anonymous Coward · · Score: 0

      it's not a CMS they need. It's a document management system.

    17. Re:Answered your own question by Anonymous Coward · · Score: 0

      I think you already know the answer.

      Yes, and your next question should be what your retention and change-management requirements are like. At my last startup company we decided to combine the naming/numbering scheme with a central SVN repository. So all significant documents must be committed into SVN to share with others, and this retains version history and allows central backup and disaster-recovery. Combine this with branching policy and nightly commits of drafts to protect against users damaging or losing their scratch workspace on their laptops or desktops.

      Getting users over the hump to use checkout/update/commit commands was well worth it in order to have sanity in documents, easy backups and replication (you can checkout a full working tree from the company LAN onto your laptop and update incrementally over VPN from remote work sites).

    18. Re:Answered your own question by Anonymous Coward · · Score: 0

      oldspewey is right .. EMC's Documentum isn't dead, it's firmly entrenched in a number of enterprises.

      Now, it is so horribly mis-applied in most of those enterprises, it should be dead, but that's just wishful thinking on my part ....

    19. Re:Answered your own question by Anonymous Coward · · Score: 1, Interesting

      Why don't you tell us why EMC feels the need to spew out an entirely new platform every 6-12 months. Oh, that's right, cause their product is so wonderful.

      I used to be a Documentum developer. I've worked in everything from Workspace/Smartspace, through Smartspace Intranet, and into their Webtop/WDK platform.

      IMHO, their product is crap. And expensive crap at that. It solves problems very poorly without extensive customization, and customization is a painful exercise in learning their horribly written and poorly documented development "framework".

      Even worse, once you have things working the way you want, you can almost be guaranteed that your customizations won't work in the next release...which is probably going to happen in the next 2-3 quarters.

    20. Re:Answered your own question by Phil06 · · Score: 1

      What you need to do is get the right amount of meta-data. Everyone is going to be pushing you to add all kinds of fields for everything, you need to push for less. The metric should be that 80% of the information should be one or two clicks away, the next 15% should be 3-5 clicks away, let the last 5% go. If you try to make everything one click away you are going to fail.

      --
      "...and yet, I blame society" Duke - Repo Man
    21. Re:Answered your own question by omglolbah · · Score: 1

      While Documentum can be a royal pain in the ass at times I would hate to do my job without it...

      Just keeping the hundred or so documents for my current project organized AND revisioned properly would be a major undertaking if it was all stored on a samba share...

      Get something started already, or you'll end up in the crapper sooner or later. Revision/Version control of documents is quite useful in case someone screws up. It also allows locking of documents etc which can be useful in a myriad of situations.

    22. Re:Answered your own question by Sobrique · · Score: 1

      find /scratch -type f -mtime +14 -exec rm {} \;

      seems to work quite nice. Also means it hits the backup, in case of muppetry.
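
      For anyone who would rather preview what such a cleanup would remove before running it, a rough Python approximation with a dry-run default might look like the following. The function names are made up for this sketch, and the age check is a plain "older than N times 24 hours" comparison, so it will not match the whole-day rounding of `find -mtime` exactly.

```python
import time
from pathlib import Path
from typing import List, Optional


def stale_files(root: Path, max_age_days: float = 14,
                now: Optional[float] = None) -> List[Path]:
    """List regular files under `root` last modified more than `max_age_days` ago."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and not p.is_symlink() and p.stat().st_mtime < cutoff
    )


def purge(root: Path, max_age_days: float = 14, dry_run: bool = True) -> List[Path]:
    """Delete stale files; with dry_run=True (the default) only report them."""
    victims = stale_files(root, max_age_days)
    if not dry_run:
        for path in victims:
            path.unlink()
    return victims
```

      Calling `purge(Path("/scratch"))` only returns the list of candidates; nothing is removed until `dry_run=False` is passed.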

    23. Re:Answered your own question by Anonymous Coward · · Score: 0

      Don't forget about OnBase....the best DMS out there.

    24. Re:Answered your own question by Anonymous Coward · · Score: 0

      We use Documentum and it is very far from dead. We are very pleased with it; our original in-house solution was awful.

    25. Re:Answered your own question by Elbowgeek · · Score: 1

      Well my confidence was a tad shaken when the documentum.com listing that google finds when you do a search comes up as a parked domain.

      But it was completely shattered when clicking the link on EMC's site which promises me more information on their Collaboration and Document Management led to a Page not Found error. If they can't find their own information, how in the hell can they lay claim to being able to manage *mine*?

      --
      Who is this delectable creature with an insatiable love of the dead?
    26. Re:Answered your own question by Anonymous Coward · · Score: 0

      Works fine here. Maybe you just fail at t3h internet.

    27. Re:Answered your own question by Anonymous Coward · · Score: 0

      Documentum is most definitely NOT dead - in fact it's under constant development. Additionally it's owned by EMC, the same parent company which owns VMware. Documentum is HIGHLY scalable, and also created with distributed locations and user bases in mind. It's really a family of products and you should review what's at hand and see if it's a solution appropriate for your company and situation. Good Luck!

  8. it's all about the index by Hognoxious · · Score: 2, Informative

    The lack of a naming convention for the filenames and directories is neither here nor there. What matters is how well it's indexed.

    Now I use naming conventions for my files (photos, mp3s, etc.). Am I contradicting myself? No, it's because I don't have enough of them that I need a separate index.
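
    To make the point about a separate index concrete, here is a minimal sketch in Python that walks a share and records each file in a SQLite table, so lookups no longer depend on how the file or folder was named. The table layout and the function names `build_index` and `search` are assumptions for the example; a real index would also cover document contents and metadata, not just names.

```python
import os
import sqlite3
from pathlib import Path
from typing import List


def build_index(root: Path, db_path: str = ":memory:") -> sqlite3.Connection:
    """Walk `root` and record every file's path, lowercased name, size and mtime."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS docs ("
        "path TEXT PRIMARY KEY, name TEXT, ext TEXT, size INTEGER, mtime REAL)"
    )
    for dirpath, _dirs, files in os.walk(root):
        for fname in files:
            full = Path(dirpath) / fname
            info = full.stat()
            conn.execute(
                "INSERT OR REPLACE INTO docs VALUES (?, ?, ?, ?, ?)",
                (full.relative_to(root).as_posix(), fname.lower(),
                 full.suffix.lower(), info.st_size, info.st_mtime),
            )
    conn.commit()
    return conn


def search(conn: sqlite3.Connection, term: str) -> List[str]:
    """Return paths whose file name contains `term`, ignoring case."""
    rows = conn.execute(
        "SELECT path FROM docs WHERE name LIKE ? ORDER BY path",
        (f"%{term.lower()}%",),
    )
    return [row[0] for row in rows]
```

    Names are lowercased on the way in, so mixed-case files on the share are found the same way regardless of how they were typed.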

    --
    Confucius say, "Find worm in apple - bad. Find half a worm - worse."
    1. Re:it's all about the index by Anonymous Coward · · Score: 0

      Now I use naming conventions for my files (photos, mp3s, etc.). Am I contradicting myself? No, it's because I don't have enough of them that I need a separate index.

      So, can you give me a good indexing scheme for all my pr0n? I find it hard to separate the facials from the hentai.

    2. Re:it's all about the index by jd · · Score: 4, Interesting

      Very true. I'd take a look at DSpace or Open Library for examples of software designed to handle gigantic numbers of documents and maintain sensible indexes for them.

      --
      It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
    3. Re:it's all about the index by Anonymous Coward · · Score: 0

      Greenstone is another software option. Or you could hire a librarian.

  9. OpenDocMan by loVolt · · Score: 1

    OpenDocMan has helped a lot with our Graphics and Engineering departments' issues, which are similar to yours.
    LDAP access to storage helped sort out who could put what where. The implementation took a bit of
    time to get the original files into the right locations, but it's easier to manage now.

    --
    Darwin Enforcement Agent
  10. Tiered storage by Anonymous Coward · · Score: 0

    You have a massive project on your hands! You need a tiered storage solution and a document management system that is stored on back-end SAN storage. How big of a budget do you have to solve this problem? Double it.

    Tiered storage requires the business to prioritize data by levels (1...n): level 1 is the highest priority, level 2 is lower than 1, and level 3 is lower than both 1 and 2.
    Generally 3 levels are employed, sometimes more.

    Does the mgmt understand the complexity of the issue? Do they support the project? You have a lot of data gathering to do before you can even determine what you need.

    Godspeed!

  11. Google Search Appliance by Swampash · · Score: 2, Informative
  12. Alfresco or SharePoint by flydpnkrtn · · Score: 3, Insightful

    Or some other corporate content management system

    1. Re:Alfresco or SharePoint by flydpnkrtn · · Score: 3, Interesting

      ...and I found an article backing up Alfresco pretty well:

      "You can now stand up an Alfresco Labs server next to a SharePoint Server, and Office will not be able to tell the difference between the two," said John Newton, CTO of Alfresco. "But we are offering considerably more scale than SharePoint can deliver," he said.

    2. Re:Alfresco or SharePoint by Kadin2048 · · Score: 3, Informative

      I have a personal bias, but I think IBM's FileNet would solve this quite neatly. I've done implementations of it that are pretty much exactly what the OP describes.

      Customer has a share that's gotten totally out of control, just stuffed full of files. They want to make them available across multiple offices, generally without getting into complex VPN crap, and also want to simplify management, add more security / compartmentalization, or integrate it with corporate SSI. All doable. Runs on your choice of platforms, too. (Linux, Unix/AIX, Windows all OK as servers.)

      There are even tools that basically take a share drive and walk the directory structure, importing documents at extremely high volume and using the folder structure to categorize and tag the documents within FileNet. It's quite slick and can either be used as a one-shot migration from a traditional fileserver to FileNet, or as an ongoing thing (take all files in a particular directory or set of directories and commit them).
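
      The general idea of using the folder structure to categorize documents can be illustrated with a few lines of Python. This is only a generic sketch and not the FileNet import tool or its API; the function name `tags_from_path` and the tag format are assumptions.

```python
from pathlib import PurePosixPath
from typing import List


def tags_from_path(relative_path: str) -> List[str]:
    """Turn each folder in a file's relative path into a lowercase tag.

    The file name itself is ignored; duplicate folder names yield one tag.
    """
    folders = PurePosixPath(relative_path).parts[:-1]
    tags: List[str] = []
    for folder in folders:
        tag = folder.strip().lower().replace(" ", "-")
        if tag and tag not in tags:
            tags.append(tag)
    return tags
```

      For example, `tags_from_path("Engineering/Wing Spar/Rev B/drawing.pdf")` returns `["engineering", "wing-spar", "rev-b"]`, which could then be attached to the document as metadata during a migration.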

      Once you have the documents into FileNet you can access them over a web interface or via various desktop clients, and there is a nice API for integrating it with custom in-house applications if that's a requirement. Also, IBM makes some add-ons for Word and Excel (and maybe PowerPoint) that allow you to work directly with items stored in a FileNet repository. Plus, if down the road you want to get into "workflow" (basically building your document management system around your business process), that can be easily bolted on.

      Email is in profile if you want specific case studies or whitepapers, or if you want me to put you in touch with people who do these sorts of things regularly.

      --
      "Ladies and gentlemen, my killbot features Lotus Notes and a machine gun. It is the finest available."
    3. Re:Alfresco or SharePoint by Anonymous Coward · · Score: 0

      I wouldn't call it backing up when the comment came from the CTO of Alfresco.

    4. Re:Alfresco or SharePoint by afidel · · Score: 2, Informative

      I'd suggest Livelink by OpenText. I know the Air Force uses it since our Livelink guy worked on their systems before coming to work for us, and they obviously work with large volumes of aerospace-related documents! =) That probably means OpenText can find consultants who have already designed and worked with an aerospace taxonomy.

      --
      There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
    5. Re:Alfresco or SharePoint by Anonymous Coward · · Score: 0

      They also have a remote cache server if you need to support remote offices over slower links. Livelink has a Linux version as well (they support Win and I think HPUX too).

    6. Re:Alfresco or SharePoint by glwtta · · Score: 1

      I wouldn't call it backing up when the comment came from the CTO of Alfresco.

      I wouldn't call "indistinguishable from SharePoint" an endorsement.

      --
      sic transit gloria mundi
    7. Re:Alfresco or SharePoint by flydpnkrtn · · Score: 1

      First parent comment makes a good point - seeing a review from Alfresco would be a better source, rather than listening to the car salesman tell you how great the used Fiat is

      To the reply to his comment - When folks can point Office directly at Alfresco and Office can't really tell that it's really talking to an Alfresco server instead of a SharePoint server, I'd say that's pretty significant. He wasn't saying that Alfresco is a SharePoint clone, he's saying it provides equivalent features. The reality is a _lot_ of corporate offices run Microsoft Office, and being able to provide a backend to Office that's just drop in is a good idea, especially from a user training perspective.

    8. Re:Alfresco or SharePoint by Anonymous Coward · · Score: 0

      Well, we evaluated Alfresco. The SharePoint protocol and WebDAV client for the Office suite both lacked SPNEGO support. Also, the ACL support for some areas was horrible, requiring manual edits to XML files to set access rights to content. We abandoned Alfresco instantly because security was a requirement for us.

  13. Start with.... by s0litaire · · Score: 1
    ...Setting up a standard naming convention and make sure bosses and managers enforce it. It won't help older files but will stop it getting worse!

    Then if you can be bothered, you can start going through older files and updating the naming conventions or entering them into the Document management system of your choice...
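
    A naming convention is easier to enforce when a script can apply it. As a minimal sketch in Python, assuming a made-up convention (lowercase, ampersands spelled out, runs of anything other than ASCII letters and digits collapsed to one underscore), the renaming rule could look like this; `normalize_name` is a hypothetical helper, not part of any tool mentioned in this thread.

```python
import re


def normalize_name(filename: str) -> str:
    """Map a messy file name onto the assumed convention, keeping the extension."""
    stem, dot, ext = filename.rpartition(".")
    if not dot or not stem:
        # No extension at all, or a leading-dot name: treat the whole thing as the stem.
        stem, ext = filename, ""
    stem = stem.lower().replace("&", " and ")
    # Collapse every run of characters outside a-z and 0-9 into a single underscore.
    stem = re.sub(r"[^a-z0-9]+", "_", stem).strip("_") or "unnamed"
    return f"{stem}.{ext.lower()}" if ext else stem
```

    With this rule, `normalize_name("Wing & Spar - FINAL Rev2.DOC")` gives `"wing_and_spar_final_rev2.doc"`. Running it in report-only mode over the old files first would show how many names collide after normalization before anything is renamed.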

    --
    Laters Sol "Have you found the secrets of the universe? Asked Zebade "I'm sure I left them here somewhere"
    1. Re:Start with.... by LandDolphin · · Score: 1

      Hire a temp employee for $10/hr to go through and rename everything, or do any other clerical grunt work.

      --
      Spelling and Grammar errors have been added to this post for your enjoyment
    2. Re:Start with.... by hedwards · · Score: 1

      Only problem is that this is an aerospace company, they might get lucky finding somebody that's capable and willing to work for peanuts, but I wouldn't count on it. Realistically they may require somebody with technical know how of what the files actually are in order to properly categorize them. A temp might be able to handle reformatting the file names based upon information in the name, but probably not much more than that.

    3. Re:Start with.... by LandDolphin · · Score: 1

      Some training and the temp should be able to recognize different file types and where/how to classify them. The temp doesn't have to understand the information, just has to know that "Hey this looks like X, I was told X goes here" or "Oh, it says Y in spot B in the file, it must go to Place Z". The person with technical know how is going to have to look in the file for a clue as to where it should be filed; they can impart that small bit of information to a temp without having to teach them everything.

      --
      Spelling and Grammar errors have been added to this post for your enjoyment
    4. Re:Start with.... by s0litaire · · Score: 1

      Well they could then use the temp as a scape goat when planes start falling out of the skies...

      --
      Laters Sol "Have you found the secrets of the universe? Asked Zebade "I'm sure I left them here somewhere"
    5. Re:Start with.... by dotgain · · Score: 1

      No matter how prudent and methodic you are in your filestore-sorting exercise, nothing will stop somebody getting up in arms about something moving or changing. One of the reasons I left my last job was a filestore that was already out of control when I took it on was proving insurmountable, at least with no support from management.
      Think medium sized company (considering the country, NZ), acquired at least five other businesses in the last two years, and effectively just chucking the new fileservers on the LAN along with all the others, spilling over into outsourced datacenters when running out of space & aircon in the original server room. Stacks of MS Access apps using hardcoded UNC paths dictating names of servers, etc. Commodity PCs with JBODs tacked on when space and money got tight. ntbackup.exe. Ugghh.

    6. Re:Start with.... by LandDolphin · · Score: 1

      The thought makes me a little sick. I wouldn't even want the hassle of reorganizing that.

      --
      Spelling and Grammar errors have been added to this post for your enjoyment
    7. Re:Start with.... by Javaman59 · · Score: 1

      a standard naming convention and make sure bosses and managers enforce it.

      Won't work. Never has, never will. People won't comply. Bosses won't enforce. Some will make a good faith effort for a while. More will make a good faith effort, but get it wrong. Some will ignore it. Threatening memos will be issued from managers. Then it will emerge that one of the memos came from a manager who doesn't use the conventions himself (because he's "too busy"). The people who invested (wasted) time in understanding the system, and using it, will see that their efforts are futile because of the amount of non-compliance, and give up. Then the company will be left with a minor portion of the files in this system. 3 years later people will wonder "what the hell" these bizarre files are, along with the 17 other naming conventions they see around (and people whose names are on those files will look silly), and then someone will say "we need a standard naming convention and make sure bosses and managers enforce it."

      I'm reminded of one of Joel's chestnuts - Whenever you have two incompatible systems, and introduce a third system to unify them, all you end up with is three incompatible systems. (or words to that effect).

      --
      I'm a software visionary. I don't code.
    8. Re:Start with.... by Sobrique · · Score: 1

      Standard naming conventions are ugly. Just use a proper directory structure instead. At least then you know where you need to start looking for something you want.
      Although I'm seriously wondering how long it'll be before we see the 'next generation' of filesystems, that stop actually treating files and directories as the 'basic' object, and treat everything as a document - essentially 'forcing' categorization of them. Doesn't work so well in programspace, but would work fine for almost everything that _should_ be on a user/network share.

    9. Re:Start with.... by dotgain · · Score: 1
      It's not a difficult job if you're not rushed, I quite enjoy it when it goes well. Take your time, take stock of what you've got, and make a plan. Take the time to find fault with your plan, and don't be rushed into implementing it if you're not sure. It's when you ARE rushed to do it, when you're NOT supported by your higher-ups when you need to be, that it becomes a nightmare.

      Hence resignation. Not long after my co-admin handed his in too. Between us we had 15 years IT experience with the company.

      The remaining IT department has less than two years collectively, and includes six people. They're FUCKED now.

  14. Use a cataloging system by vondo · · Score: 4, Interesting

    I happen to have written one:

    http://sourceforge.net/projects/docdb-v/

    could be what you are looking for. Of course, it'll take effort to catalog the documents.

  15. SharePoint? by tekiegreg · · Score: 4, Informative

    I know I'm gonna get hit for blurting out the Microsoft Solution but...give SharePoint a shot...

    --
    ...in bed
    1. Re:SharePoint? by goffster · · Score: 4, Insightful

      Why should you give sharepoint a chance? Even if it works well, it is proprietary and you are locked in.

    2. Re:SharePoint? by EnhancedPanda · · Score: 1

      I am going to have to second the Sharepoint suggestion, we have been using it for 2 years now to do exactly what you need. But I would recommend investing in SANS, no more vpn.

    3. Re:SharePoint? by moosesocks · · Score: 2, Informative

      Mod parent up. I helped create a tag-based document retrieval system for my former employer using SharePoint. It actually worked quite well.

      Use the right tool for the job. It's got a nice interface (that's also very familiar-looking to most users), scales well, and integrates well with MS Office, which (like it or not) is used by 99.99% of the corporate world. It also handles non-office files just fine.

      That's not to say that Unix-based solutions don't have their place. During the migration, I actually employed a series of shell/python scripts to assist with several of the more mundane aspects of the process. These probably saved us a couple thousand man-hours that would have otherwise been spent categorizing the files.

      --
      -- If you try to fail and succeed, which have you done? - Uli's moose
    4. Re:SharePoint? by jockeys · · Score: 1

      +1.

      I'm no MS fanboy, but Sharepoint is great. I work for a large engineering company and we use it to organize blueprints, as well as pretty much all of our non-code documents. Even the most clueless HR-types can use it, and it's really not hard to set up.

      --

      In Soviet Russia jokes are formulaic and decidedly non-humorous.
    5. Re:SharePoint? by Anonymous Coward · · Score: 0

      Posting anon because I am shilling for the company I work for, but DocuWare (www.docuware.com) is exactly what this person needs. It is document management software: scan, archive emails, index, and OCR all in one. It integrates with all (by all I mean most) ODBC-compliant databases and can use MySQL, Oracle, or MSSQL as the main foundation (it comes bundled with MySQL). Short answer: Google Desktop, and start indexing documents.

    6. Re:SharePoint? by moosesocks · · Score: 4, Interesting

      Why should you give sharepoint a chance? Even if it works well, it is proprietary and you are locked in.

      No less proprietary than other similar systems. Getting files in/out of Sharepoint is a fairly trivial process, and the API is open enough to craft your own migration plan if you ever decide to move away from it, given that everything else is equally (or even more) proprietary than Sharepoint.

      MS Office might be proprietary, but is so widespread that it's a 'standard' in its own right -- Sharepoint integrates excellently with Office, and keeps your users happy.

      I'm typically not one to advocate the use of Microsoft products. However, Sharepoint worked just fine when I was using it, and is definitely a huge step up from any of the competing products at the same price-level.

      --
      -- If you try to fail and succeed, which have you done? - Uli's moose
    7. Re:SharePoint? by DigiShaman · · Score: 1

      That's true of any other solution in the same manner as SharePoint. But at least the data is stored in a SQL database and not something proprietary like the MS Exchange information store.

      --
      Life is not for the lazy.
    8. Re:SharePoint? by Itninja · · Score: 1

      Totally agree. SharePoint is one of the few recent products that Microsoft actually got right. Of course, they will probably find a way to screw it up down the road, but currently it rocks as an enterprise level document repository.

      --
      I judt got a nre Kinesis keybiartf so please excusr ant egregiou typos.
    9. Re:SharePoint? by pete-classic · · Score: 3, Interesting

      How does Sharepoint address his problem? It uses the exact same folder/file paradigm that is failing in his existing solution.

      -Peter

    10. Re:SharePoint? by Anonymous Coward · · Score: 0

      I agree. This is exactly what Sharepoint is designed to do. It is a powerful document management tool which offers functionality such as:

      • Document Versioning
      • Metadata
      • Access Control
      • Search
      • Integration with Office products

      I

    11. Re:SharePoint? by Anonymous Coward · · Score: 0

      I agree 110%. Sharepoint is the best document management solution ever created. I'll bet my chair on it.

      --
      steveb@microsoft.com

    12. Re:SharePoint? by glitch23 · · Score: 2, Informative

      That's true of any other solution in the same manner as SharePoint. But at least the data is stored in a SQL database and not something proprietary like the MS Exchange information store.

      Although the files are in a database you can change the view in the browser to be "explorer" and access the files using Windows File Sharing-like features (copy/paste) through the browser. This method of access though is an end-run around SharePoint's versioning system. New files can be uploaded in this manner as well. I presume that when you modify an existing document in this way that SharePoint just makes that version the newest one in the actual database. SharePoint is still no substitute for a properly standardized naming convention and folder structure. Yeah you can always do a SharePoint search for what you want but at work I never do searches because we have specific folders where we place stuff and I know that as long as people follow the standard then I can find what I'm looking for and so can everyone else. We don't have thousands of documents though so maybe with documents counted using 6 digits a standard naming convention is asking too much.

      --
      this nation, under God, shall have a new birth of freedom. -- Lincoln, Gettysburg Address
    13. Re:SharePoint? by Anonymous Coward · · Score: 3, Insightful

      What you say:

      Why should you give sharepoint a chance? Even if it works well, it is proprietary and you are locked in.

      What you mean:

      Regardless of how perfect a solution might be for you, if it doesn't conform to MY personal ideological viewpoint, it shouldn't be given a chance.

      God I hate people like you.

      --AC

    14. Re:SharePoint? by Sylver+Dragon · · Score: 1

      Add another me too for Sharepoint.
      From the initial question, I'd guess that just WSS3 will get the job done and it's free. One important piece of this though is: plan your deployment. Figure out what type of site structure you plan to use before you implement anything. Sharepoint can be a wonderful tool, but if you just jump into it and let it grow organically you will end up hating it and yourself. And trying to monkey around with the site structure after the fact can be trouble. Oh and, get familiar with ASP.NET master pages and what they do and how they work. You will be using them in WSS, and if you go into it without care you can trash your entire site fast.

      --
      Necessity is the mother of invention.
      Laziness is the father.
    15. Re:SharePoint? by Anonymous Coward · · Score: 0

      SharePoint has lots of issues - you cannot implement enterprise-wide policies, you end up with multiple 'siloed' document stores, it has limited (well no workflow), it does not understand the relationship between objects, actors, artifacts, has no concept of lifecycle - it is a poor solution for a complex information environment - better to use Windchill Foundation from PTC (www.ptc.com) - does all the aforementioned. Even with accomplished tools you still have to get the 'mess' into a structure (taxonomy) that is meaningful. By the way NASA, Boeing, EADS, US DOD, etc, use this toolset. You would sit Windchill behind SharePoint.

    16. Re:SharePoint? by Anonymous Coward · · Score: 0

      With the economic downturn, the national company I work for was looking to get away from SharePoint. We have thousands of documents in it right now, and we're migrating the whole thing to Liferay Portal. It's been pretty great so far and we're saving a bundle.

    17. Re:SharePoint? by dave562 · · Score: 1
      Yeah you can always do a SharePoint search for what you want...

      Unless SharePoint has gotten significantly better in the last few years, I wouldn't trust SharePoint search to find a file located in the root of the directory I point it at. When I was using it, SharePoint search didn't seem to understand the underlying hierarchy, so it required a lot of parameters and qualifiers to do what should have been a single search (Go find me blah.doc for example).

    18. Re:SharePoint? by Anonymous Coward · · Score: 0

      you can change the view in the browser to be "explorer" and access the files using Windows File Sharing-like features (copy/paste) through the browser.

      The last time I checked, only if you were using Microsoft's Internet Explorer web browser.

    19. Re:SharePoint? by Anonymous Coward · · Score: 0

      SharePoint's got nothin on Equella http://www.equella.com/ . Great product, great people behind it

    20. Re:SharePoint? by nighty5 · · Score: 2, Informative

      We use SharePoint in a large enterprise. Although it's pretty good at mashing together websites, unfortunately it's really poor at search. I think Search 4.0 may improve the situation, but it's nowhere near Yahoo, Google or other search technology. Technology doesn't solve all problems; I'd say this company needs to focus on strengthening business process and implementing some user awareness programs.

    21. Re:SharePoint? by Anonymous Coward · · Score: 0

      We submitted nearly 100 bugs within the first week of using v1.

      v2 is slow. Folders with over 1,000 files slow performance down to an unusable speed. 3rd party search applications are needed to make it half decent. Folder to folder transfers, when moving hundreds of files, are quirky and sometimes throw errors. Office integration is quirky, the save as dialog looks like a web page, how cute. Cannot save files with an ampersand in the name, no friendly errors.

      v3, can't wait to roll that one out. Blogs and Wikis and all other manner of features no one will use.

      There must be something better.

    22. Re:SharePoint? by Anonymous Coward · · Score: 0

      SharePoint can work for small to moderate collaborative endeavors. If this is your goal, you'll find it a good fit. On the other hand, if you're working a lot with non-office documents, very large file sizes on average (1GB+), or large numbers of files at a time, you're going to find SharePoint will get unwieldy and backups will get to be more painful. A package like Documentum is designed for meta data and storing big piles of documents and managing them. Meta data requires the user base to be diligent, so organizational readiness is key. If the user base won't maintain it and it isn't mandated (e.g., a regulatory body or compliance to a corporate policy that is policed) you're going to have issues with finding and retrieving documents. All that said, I'm not clear how well any of the above will play into the VPN situation for your remote sites. I suspect you'll want to look into your WAN bandwidth utilization.

    23. Re:SharePoint? by preystalker · · Score: 4, Informative

      I would recommend using Alfresco. Correctly configured and deployed, you could access files via Windows Explorer, WebDav, web interface, etc. and data is stored in a SQL database. Alfresco uses open standards and should be considered instead of SharePoint.

    24. Re:SharePoint? by slater86 · · Score: 1

      we use sharepoint (the free version) at the moment, does an excellent job when you have no budget. but if you have the time, skills or budget (the usual "pick any two" rule) there are better CMS stuff available.

      even works well with ldap/samba domain controllers.

      --
      When people ask if I'm an optimist, I say "I hope so". --Bill Bailey
    25. Re:SharePoint? by Anonymous Coward · · Score: 0

      Wait til you see the Client Access licence charges

    26. Re:SharePoint? by blincoln · · Score: 1

      Of course, they will probably find a way to screw it up down the road, but currently it rocks as an enterprise level document repository.

      In its inner workings, it's already pretty screwed up. For example, for anything that shows up as a list (which includes any type of library), the things that look like database columns actually have their data stored in XML format in giant text fields in the database. For wiki articles in particular this is a problem because the entire text of the articles is one of those values in the giant "properties" text field in the database. It's also a problem for lists with lots of "columns", especially if the list/library is set to allow multiple content types, because then each "row" in the list gets all of the properties for each of the content types inserted into its XML text field, even the ones that its content type doesn't use. Besides the obvious performance/scalability issues here (IE you can't create a meaningful SQL index on this data because all of the data you'd want to index is in that one stupid XML field), the SharePoint search indexer basically does a SELECT * into RAM for each list it comes across. So if you have a wiki library with a few thousand articles in it, *bam!* you just ran out of memory and none of them will be indexed.

      Most people seem to love SharePoint, so I think MS has done a great job on the front end. I just wish they'd devote the resources to make the back end a lot more solid.

      --
      "...always new atoms but always doing the same dance, remembering what the dance was yesterday." -Richard Feynman
    27. Re:SharePoint? by FooRat · · Score: 1

      For an aerospace company, you probably need something from a company with a better security track record - sorry, that's just due diligence.

    28. Re:SharePoint? by Anonymous Coward · · Score: 0

      Not to boost M$ any more than we already have, but I once built a searchable text system using Microsoft's Index Server. The system isn't in use any longer, but before the project was abandoned and servers reformatted, the system had about 500 million documents (+/- 5 terabytes of data) and could return search results of just about any complexity in about one second on very modest hardware (lots of ram for the indexes, but not big on processor looking back).

      We also had a custom document filter (filters are used to tell index server how to index file formats it doesn't already know about, and to get custom properties that can be cached-- ci has an api of sorts, but it's not very well documented) for non-ms office files, but the thing comes ready to go for txt, html, and all the office formats.

      Plus, it's not like you're locked into anything. It's just an indexer. Build a web interface to it and you're good to go. Who cares what they are named or what folders they are in if users can find them after waiting 1 second for search results?

      PS: we looked at google's search appliance at the time and it was huge monies. and they didn't offer anything (at that time) that would do 500 million documents. (yeah, I know. I said the same thing. "REALLY? Doesn't seem like GOOGLE would have any trouble with a measly half a billion files!!!!")

    29. Re:SharePoint? by MeanMF · · Score: 1

      It lets you attach metadata to files and it full-text indexes pretty much anything you can throw at it.

    30. Re:SharePoint? by Anonymous Coward · · Score: 0

      Try not to select a solution that stores the files in a database (e.g., SharePoint, Oracle Portal). File systems are much cheaper than databases. Plus, the backup, restore, and anti-virus of SharePoint is more challenging (i.e. costly) than the file system.

      So look for a solution that stores the files on the file system, the meta-data in a database, and exposes the files via the web (and ideally via Windows Explorer). Examples include EMC Documentum (mucho expense, but for hundreds of thousands of files, maybe worth it) and Alfresco (and many others).
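      A rough sketch of that split, using Python's standard sqlite3 module: the documents stay wherever they are on the file system, and only their metadata goes into a database. The table layout, column names, and paths are made up for illustration; this is not how Documentum or Alfresco actually store things.

```python
import sqlite3

def open_index(db_path=":memory:"):
    """Create (or open) a metadata index; documents themselves stay on the file system."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS documents ("
        " path TEXT PRIMARY KEY,"   # location of the file on the share
        " title TEXT NOT NULL,"
        " project TEXT NOT NULL,"
        " doc_type TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_project ON documents (project, doc_type)"
    )
    return conn

def register(conn, path, title, project, doc_type):
    """Record metadata for a file that already exists on disk."""
    conn.execute(
        "INSERT OR REPLACE INTO documents VALUES (?, ?, ?, ?)",
        (path, title, project, doc_type),
    )
    conn.commit()

def find(conn, project=None, doc_type=None):
    """Return paths matching the given metadata, ignoring the folder layout entirely."""
    sql = "SELECT path FROM documents WHERE 1=1"
    args = []
    if project is not None:
        sql += " AND project = ?"
        args.append(project)
    if doc_type is not None:
        sql += " AND doc_type = ?"
        args.append(doc_type)
    sql += " ORDER BY path"
    return [row[0] for row in conn.execute(sql, args)]
```

      The point of the design is that backups and virus scans keep working on plain files, while lookups go through the (small) database instead of the directory tree.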

      Posted by, apparently, an Anonymous Coward...

    31. Re:SharePoint? by Anonymous Coward · · Score: 0

      SharePoint? Oh my God no! SharePoint stores files as BLOBs in the database - fine if you have a small number of docs, but not if you have a lot of them. The MS fix for this is to tie SharePoint to "external storage". Read "a real DMS". Go find an enterprise DMS, with plenty of indexes available so you can build good taxonomies. IBM, Oracle, OpenText, Interwoven - they all make good DMS systems, frequently with web front ends, and full text indexing too. You'll invest time and money in the setup and the movement of documents into the system, but in the long run you'll be able to search for and find information. And that's the idea, right?

    32. Re:SharePoint? by scooterhanson · · Score: 1

      Here's a great paper on the drawbacks of Sharepoint: http://www.yakabod.com/library/downloadDocument.html?docId=10805/

    33. Re:SharePoint? by ewhac · · Score: 1

      ...give SharePoint a shot...

      Bah. SharePoint is what you end up with when you don't know about Qtask.

      Schwab

    34. Re:SharePoint? by ajlisows · · Score: 1

      The nice thing about Sharepoint is, depending on the functionality you need it can be FREE (as in beer) if you can get away with Windows Sharepoint Services.

      The company I work for really wanted A Document Management System. They had tons of paperwork laying around. We put in Sharepoint along with a product called KnowledgeLake. Knowledgelake reads bar codes off documents that are printed or scanned to a network drive, grabs metadata from a SQL Server based on that bar code, and files the thing. It is really no hassle at all. Knowledgelake also adds a search component that is much better than the Sharepoint Search so finding documents is really really easy. There is also a client program for the Knowledgelake system that lets you right click on a document, pick a document library to send the document to, and manually input the key field to grab the Metadata and file the document properly.

      I don't know what types of documents you are looking to index but all MS Office documents integrate with Sharepoint, obviously....but the real issue is other file types. Autocad Files, for Example, can be integrated into the Sharepoint System using third party applications (We ended up not going that direction so I can't remember what it is called..the company is named Bentley maybe?) and I am sure there are many other programs that have similar applications written for them.

      So yeah, you can make fun of me for sounding like a Microsoft shill but I evaluated several other Document Management Systems and Sharepoint with Knowledge Lake turned out to be the one that the company felt most comfortable with....and it has served its purpose well!

    35. Re:SharePoint? by Anonymous Coward · · Score: 0

      We use sharepoint, but it is an expensive overkill if all you want to do is manage documents.

    36. Re:SharePoint? by symbolset · · Score: 1

      Sharepoint is wonderful. I used to get all my cross-company plans, developments and projects from it. I could enter a couple of searches and have everything: executive travel, department budgets, next year's product strategies, customer and vendor lists, even skunkworks projects with circuit layouts and logic diagrams. Definitely a huge career pusher once the gig was over.

      And I was just a temp clerk in the mailroom. I wonder what people with privileges had access to.

      --
      Help stamp out iliturcy.
    37. Re:SharePoint? by Seraphim_72 · · Score: 1

      It uses the exact same folder/file paradigm

      Actually it doesn't. All the SP gurus tell you to never make folders, everything is a list of files. There is a shift of how things are done in SP, it really is a hurdle. Alfresco does the same sort of thing. Plus the files are data aware. SharePoint actually has a few good ideas, but things like Wave will eat it alive eventually.

      --
      Slashdot, where armchair scientists get shouted down and armchair theologians get modded up.
    38. Re:SharePoint? by Seraphim_72 · · Score: 1

      it has limited (well no workflow)

      uh? In reality the bitch about SharePoint is that it has too many ways to do a workflow.

      --
      Slashdot, where armchair scientists get shouted down and armchair theologians get modded up.
    39. Re:SharePoint? by Dadoo · · Score: 1

      Try not to select a solution that stores the files in a database

      If you do that, it becomes problematic to back it up. We've got around 10 million documents, taking up about a terabyte, and it takes roughly 4 days to back it up. (We only have to do that once a year, thanks to incrementals, but it's still a pain.) Lots of small files will kill your backup performance, every time.

      --
      Sit, Ubuntu, sit. Good dog.
    40. Re:SharePoint? by Seraphim_72 · · Score: 1

      Your link errors ... great paper on why I should trust yakabod.

      --
      Slashdot, where armchair scientists get shouted down and armchair theologians get modded up.
    41. Re:SharePoint? by moxitek · · Score: 1

      Possibly because business doesn't normally give a shit if a blind monkey with three fingers wrote the code as long as it just fucking works and they can see some cost savings or business benefit to implementing it. Those of us in the real business world are less concerned about "lock in" or what license something was written under and just want our shit to run and run well. MOSS enterprise search does a really kick ass job of indexing file shares and making them available in a really easy to use, easy to manage central location.

      Software is a tool to achieve a business objective. If I've got the best tool to do the job, I don't care what political/social dynamic the license of the code falls into.

    42. Re:SharePoint? by Itninja · · Score: 1

      Which is why I said it rocks as a document repository. All the wiki stuff is woefully inadequate. The very concept of wiki really has no enterprise purpose in general (unless of course your enterprise is wikis). The recommended limit for content types and documents per library keeps the XML caching under control.

      In my experience, most (if not all) complaints about the performance of MOSS can be tracked back to someone who did not know (or chose to ignore) the recommended limits of the product.

      --
      I judt got a nre Kinesis keybiartf so please excusr ant egregiou typos.
    43. Re:SharePoint? by Anonymous Coward · · Score: 0

      I've used Sharepoint, and it was relatively simple, but I still felt like it was a pain in the ass. I never really liked it.
      For what the OP suggests it for, it's too simple (not enough capabilities) really.

    44. Re:SharePoint? by jawahar · · Score: 1
    45. Re:SharePoint? by trendzetter · · Score: 1

      I remember endless stories of troubles with Sharepoint. I think it's very buggy by design (one example: it stores its users in multiple places, not in 1 db). You need enormous quantities of hardware and a large amount of money for licencing to get it running. Microsoft is making lots of money on support I guess, especially since the release of this product. It does not work well with non-microsoft software (like browsers). When I was using it it had no support for non-microsoft formats, maybe this has changed. I think it's more proprietary than Alfresco which is released as open source.

    46. Re:SharePoint? by Anonymous Coward · · Score: 0

      That's completely inaccurate. It does have an ability to create a folder in a list, but it should rarely be used. Documents are classified by content types, tagged with metadata and a taxonomy. You find them based on those attributes, not anything to do with storage hierarchies unless you just really have no idea what you're doing.

    47. Re:SharePoint? by Anonymous Coward · · Score: 0

      AND the vendor does not have an agenda/interest in releasing only (or preferring) the Windows stack AFAIK (IE, Hosting on windows server, proper DB)

    48. Re:SharePoint? by Anonymous Coward · · Score: 0

      Just a little nitpicking but the data isn't stored in a SQL Database. The documents are stored in the file system. Metadata etc. is stored in a database. But anyway Alfresco is generally a good choice.

    49. Re:SharePoint? by Anonymous Coward · · Score: 0

      LOL. Avoiding vendor lock-in is now an ideology? It's more like common sense. Of course in practice you can't always reach the best business solution in all senses, but that doesn't change the fact that all else being equal, avoiding lock-in is definitely a positive for any business. Sometimes you have to take a calculated risk and let yourself get locked in anyhow, but that's life, not an opposite ideological choice unless you really believe Microsoft is a religion.

    50. Re:SharePoint? by HavocXphere · · Score: 1

      Does it come with MS Clippy office assistant?

    51. Re:SharePoint? by Anonymous Coward · · Score: 0

      Actually, this is not the case. Although you _can_ use it just like a file system, this is not the optimal use. Documents can have relevant metadata assigned as part of a "Content Type" and then searched/sorted by type. As with anything else, the more up-front thought is given to the information being gathered, the more optimal the search result. This is true whether you are using Google, SharePoint, or any other search appliance.

    52. Re:SharePoint? by david_thornley · · Score: 1

      Businesses typically don't care much about whether they're using free or proprietary software (there are exceptions on both sides), but they at least should care about lock-in.

      There's always the chance that a business will have to stop using a given product, probably more for a proprietary product (which can just be dropped by the vendor) than a free software product, but not by all that much. There's always the chance that the business will want to shift to another product (and that applies equally to all software). This means that there's a distinct advantage for the business (although not typically for the vendor) to avoid such lock-in.

      Obviously, moving to another product will never be free; even if the new thing doesn't cost actual money, there will be a certain amount of training and general disruption. However, there's a difference between expensive and near-impossible. Free software can always be modified to get the content out, although that's not necessarily a very useful option. Proprietary software can be a real bitch, as can cloud-based products.

      However, an earlier poster addressed this for Sharepoint. Apparently, lock-in isn't a problem in this particular case, since it's fairly easy to get everything out in a more-or-less standard format.

      --
      "When you have eliminated the unacceptable, whatever is left, however improbable, must be the truthiness" - Holmes
    53. Re:SharePoint? by Larryish · · Score: 1

      Can anyone recommend a good Alfresco tutorial?

      Got an ebook collection here that has gotten out of hand, and the Alfresco free version sounds juicy.

    54. Re:SharePoint? by scooterhanson · · Score: 1

      Thanks for pointing that out -- I accidentally added an extra slash at the end, so this time you actually can shoot the messenger. http://www.yakabod.com/library/downloadDocument.html?docId=10805

    55. Re:SharePoint? by Anonymous Coward · · Score: 0

      Actually the Blob binary data is not in the DB but in structured folders on the file system with links in the DB. Sharepoint is actually in the DB so watch out when the DB gets big.

    56. Re:SharePoint? by badkarmadayaccount · · Score: 1

      I love the matching sig...

      --
      I know tobacco is bad for you, so I smoke weed with crack.
  16. Sharepoint by Anonymous Coward · · Score: 0

    invest in SANs and get a sharepoint server. you don't need SANs for sharepoint though.

    1. Re:Sharepoint by cfryback · · Score: 1

      We run an EDMS system for our local council here - doesn't matter about the filename, it is how it is all indexed. Too many people here are thinking that you need to re-name EVERY document. I don't have any experience with Hummingbird, but what about HP's TRIM software? Yes $$$$, but it also has a WEB GUI interface. Just a thought.

    2. Re:SharePoint by Anonymous Coward · · Score: 0

      They're also a big user of the Google Search Appliance.

    3. Re:SharePoint by Shados · · Score: 1

      Manipulating the sql backend is a pretty bad idea. It's not quite -THAT- straightforward, since a lot of the elements end up crunched in one table in xml, so you have to be careful with that. Things are pretty duplicated and it's not supported, plus it changes drastically between versions, making migrations difficult.

      WebDav however is indeed the way to go (for documents), especially since Vista lets you map a webdav folder as a drive (letter), and Linux has tools to mount them like any other volume, too. Good stuff.

    4. Re:SharePoint by Anonymous Coward · · Score: 0

      Currently it feels like a 2.0 product (the magic rule is to never buy anything from Microsoft before 3.0)

      So you must be quite excited about the upcoming release of Windows 7?

    5. Re:Sharepoint by XDirtypunkX · · Score: 1

      TRIM also has good Sharepoint integration if you're so inclined.

  17. Google Search Appliance by yakatz · · Score: 1

    Google Search Appliance

  18. Document Locator by Anonymous Coward · · Score: 0

    I'm not affiliated with them, but I do use their product, and it's a steal for the cost.

    www.documentlocator.com

    You get version control, auditing control, web access, and a bunch more stuff.

  19. Cygnet by Rob+Kaper · · Score: 1

    Cygnet ECM might work for you.

  20. Documentum by trondwn · · Score: 2, Interesting

    Use the EMC document solution, where you have all documents in a central database with metadata that can describe content. And can be accessed thru cached server from different sites.

    1. Re:Documentum by Anonymous Coward · · Score: 0

      We also use Documentum now after having terrible problems with an in house solution and then a Xerox that couldn't cut the mustard.

      We have over 10 million documents in a Documentum database, and are generating thousands more every day. I just wish it was faster, but I cannot say for certain whether it's the software or our hardware/network. Using an established piece of software designed to do the job is the only way to go.

    2. Re:Documentum by Anonymous Coward · · Score: 0

      Yes, once you start talking about "hundreds of thousands" of documents you need a CMS (a powerful one at that).
      Sharepoint probably won't cut it, you're only really left with Documentum and Filenet.

      And can be accessed thru cached server from different sites.

      To expand on this, you're basically running a Caching Server at each of these remote sites, when users request content it goes through Branch Office Caching Services (BOCS) to determine if there's a local copy of the file and that it's at the right version. If so, the download process (UCF) will get the file from the local cache automatically.
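      For anyone who wants the idea in code, here is a toy sketch of that check in Python. It is emphatically not the BOCS or UCF API; the class, its two callables, and the counter are all invented for illustration. The only thing it shows is the decision: compare the cached version against the central one, and pull the file across the WAN only when they differ.

```python
class BranchCache:
    """Toy branch-office cache: serve locally when the cached version is current."""

    def __init__(self, fetch_version, fetch_content):
        # Both callables are hypothetical stand-ins for requests to the central repository.
        self._fetch_version = fetch_version
        self._fetch_content = fetch_content
        self._store = {}  # doc_id -> (version, content)
        self.remote_downloads = 0

    def get(self, doc_id):
        current = self._fetch_version(doc_id)  # small metadata round trip
        cached = self._store.get(doc_id)
        if cached is not None and cached[0] == current:
            return cached[1]  # cache hit: no bulk transfer over the WAN
        content = self._fetch_content(doc_id)  # miss or stale copy: pull the full file
        self.remote_downloads += 1
        self._store[doc_id] = (current, content)
        return content
```

      Note that even a cache hit still costs one small version lookup against the central site, which is what keeps remote users from opening stale copies.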

      Documentum provides lots of products around getting all the content in as well.
      Then, once you've got it in-house you can look at automating a lot of the internal business processes (expense reports, HR, etc) with the workflow engines.

      (and yes, I do work for EMC, and no this is not an official statement)

  21. ECM by Anonymous Coward · · Score: 0

    Look into Enterprise Content Management solutions, there are many. Many of them are very expensive but depending on your needs it may be worth it. Several examples are EMC Documentum, Alfresco, and even Sharepoint to an extent. Alfresco is open source so that may be a good place to start.

  22. Just the doc, or collaboration? by geekoid · · Score: 1

    If you need to use just plain documents, store them in one big directory and update the meta information.
    Let people move links onto their systems and organize the links how they like, but don't let them move the documents.

    Think iTunes for documents. I loathe that example since I set this sort of thing up long before iTunes came around.

    If you need collaborative use of your documents, get something like this:
    Jive.com

    --
    The Kruger Dunning explains most post on /. http://en.wikipedia.org/wiki/Dunning%E2%80%93Kruger_effect
  23. Document Management to the Rescue by Anonymous Coward · · Score: 1, Informative

    Sounds like you need a real document management system.

    Depending on your requirements, you could go with something open source like Alfresco or one of the big boys like EMC Documentum or IBM/Filenet P8. Either way, you will end up with an indexed repository of documents that makes it easy to find old documents, add new ones, etc (assuming you and/or your integrator do the project correctly). It will also provide a web front-end so you don't have as much killer WAN traffic as you do now.

    With a good document management system in-place, you are also on your way to having a workflow and other benefits as well. e.g. When Bob submits a document with XYZ as an index value, automatically tell Joe that it is in and ask Joe to approve it. When Joe approves it, tag it "Approved", and let Jim know.
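    The Bob/Joe/Jim example is simple enough to sketch in a few lines of Python. This is only an illustration of the rule, not any product's workflow engine; the class and method names are made up, and "notification" here just means appending to a list.

```python
class Workflow:
    """Toy approval flow: an XYZ submission alerts Joe, and approval tags it and alerts Jim."""

    def __init__(self):
        self.notifications = []  # (recipient, message) pairs, in order sent
        self.tags = {}           # document name -> set of tags

    def _notify(self, who, message):
        self.notifications.append((who, message))

    def submit(self, doc, index_value, submitter):
        """Register a document; only the index value XYZ triggers an approval request."""
        self.tags.setdefault(doc, set())
        if index_value == "XYZ":
            self._notify("Joe", f"{submitter} submitted {doc}; please approve")

    def approve(self, doc, approver):
        """Tag the document as approved and tell Jim about it."""
        self.tags.setdefault(doc, set()).add("Approved")
        self._notify("Jim", f"{approver} approved {doc}")
```

    A real system would drive this from the index values users enter at check-in, which is another reason to get the metadata scheme right first.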

    Depending on your requirements for document retention, archiving, e-discovery, etc. the document management system can help you fulfill all of those automatically.
     

  24. Simple answer... by Anonymous Coward · · Score: 1, Interesting

    Hire human beings to sift through it and label each file with a numbering/labeling system devised by your engineers. The human mind is a relatively inexpensive and already well designed piece of machinery. A few dozen of them given enough time can work through those hundreds of thousands of documents and get them sorted correctly. The problem you have is that you have unsorted, improperly labeled material. It is cheaper to hire sufficiently (or even insufficiently) evolved groups of people than to invent a machine capable of doing so. And, with the economy the way it is, you'll be doing everyone a favor by giving them years of employment. When the Manhattan project needed to create a large excess of fissile material for the war with Japan, and with all the men away at war, they hired dozens of women to sit at machines; turning knobs, checking meter levels, verifying output. The scientists themselves did not even need to be there, they designed a process and the women were trained in it and followed it.

    1. Re:Simple answer... by Mia'cova · · Score: 1

      Not a great idea to pay kids min wage to organize all of your company's secrets. Presumably anyone with 100k+ documents has a good deal of intellectual property. They'd want a long term solution which improves productivity.

      So my thought is, if it came down to dumb labor, I would still recommend that they do it in house with the people who wrote the documents. It's a giant distraction but it has the best result going forward.

    2. Re:Simple answer... by budgenator · · Score: 1

      Yes, I was thinking hire a librarian; that's what they do: organize large amounts of documents for retrieval. A big part of the poster's problem is legacy documents, so she/he should start there.

      --
      Apocalypse Cancelled, Sorry, No Ticket Refunds
  25. Document management software by Wrexs0ul · · Score: 4, Insightful

    Most print companies like Xerox have their own proprietary Document management tools you can buy, and a bunch of CRM and ERP solutions (like OpenERP - it's free AND Open Source) provide some good simple document searching and indexing tools.

    Really it comes down to how complex you want searching to be? Are there specific keys in the document you could index by? Do you require the full-text search capabilities of a Google search appliance?

    A really good solution I've come across for some clients in Edmonton is called MetalTrace by Trace Applications. Don't let the name fool you about the specificity; software like this can scan, index, and even read barcodes on all sorts of documents, then let people search for them via the web. Their "killer-app" has multiple user-defined document types with multiple search fields, which combined with some back-filing (digital and scanning) really saved the day.

    Do your research though on "Document management" and see what product best fits your needs. It's a really well established field so reinventing the wheel is a little masochistic... not that there's anything wrong with that. ;)

    -Matt

    --
    --- Need web hosting?
    1. Re:Document management software by Anonymous Coward · · Score: 0

      If you are an Aerospace company, you must not be ISO standard, which means you are under imminent risk of the mighty audit hammer.

      You're supposed to have a CHANGE CONTROL SYSTEM!!! for your documents.

      I like Agile, but you can implement it wrong. Parent post also has some good material.

      Just so you know, what you are about to install, will help or hinder the entire company.

    2. Re:Document management software by MyDixieWrecked · · Score: 3, Insightful

      Most print companies like Xerox have their own proprietary Document management [wikipedia.org] tools you can buy

      Document management software is great, but when you have enormous numbers of documents (100s of thousands like in the summary), it becomes necessary to have a content management system in place. Something that's intelligent enough to break the documents up into pieces and allow searches, but something more robust than full-text search.

      We've been using this software called MarkLogic Server (http://marklogic.com). It's an XML database and has a content processing framework for document ingestion. So, basically, assuming that documents are structured similarly, they can be converted into XML so they can be queried with custom weights being applied to content in different portions of the document. The software has built-in Word support so it'll automatically convert .doc files with proper formatting as well as the ability to add custom handlers for other formats including plaintext.

      We're currently managing a couple million documents and generating dynamic documents on the fly for some processes. Since on-the-fly documents may take time to generate, we have a system in place that saves the result in the database which can also be queried at a later date. It's all really cool.

      Of course, there's a bit of a learning curve to writing your own software for it since it uses XQuery, but it's not much harder to learn than SQL, and so far, it seems to be far more powerful.

      Disclaimer: I'm not a shill nor am I being paid in any way by MarkLogic... I'm just seriously blown away by what their technology has enabled us to do.

      --



      ...spike
      Ewwwwww, coconut...
    3. Re:Document management software by thoglette · · Score: 1

      There's a dozen or so companies providing software in this area, from littlies like Atrove to the big players like Xerox's Docushare.

      You have three problems
      a) MS Windows does not work with large end-to-end delays. You are going to need something third-party (SharePoint, as has been pointed out, is not a solution to your problems).
      b) You apparently don't know who owns your documents. You need to sort your documents by publisher, IP ownership rules, and then publisher's ID.
      c) I worry when a "midsized aerospace company" hasn't worked out how to identify documents, revision-control drafts, and baseline-manage issued documentation.

      The problem has been solved for many years; the tools and best practice are constantly evolving (particularly with managing AV data).

      Hire a DM/CM dude from a proper aerospace company. Or two. Or even a properly qualified librarian.

      Finally, how on earth are you currently meeting your contractual obligations?

      --
      -- Butlerian Jihad NOW!
    4. Re:Document management software by Anonymous Coward · · Score: 0

      As a publishing professional with successful CMS implementations under my belt over the years, I can tell you that, while MarkLogic is a fantastic solution for discovery and delivery of content, it is not a content mgt system. In fact, ML co-markets (or at least co-presents) with a mid-tier CMS from Really Solutions to address the difficult tasks of security models, versioning, and other critical functions.

      From the albeit brief description of the problem to be solved, I would lean very heavily towards an Alfresco based solution since OOTB it answers virtually every one of the issues presented, it's relatively inexpensive (forget the top-tier solutions, you wouldn't want the cost or upkeep...trust me, I'm the Application Director for a Documentum implementation). Alfresco was developed from the ground up as an enterprise solution for document and, later on, for web content management. It's robust, intuitive and very reasonable to implement and support.

      The only potential fly in the ointment (which all of the respondents that I read seem to be ignoring), is the remote office access issue. This begs for a central repository with branch office caching or similar capabilities...not something that lower tiered solutions usually provide or may not do well at. This will probably be the most challenging aspect of the project to 'get right'.

      Forget Sharepoint, unless you combine it with a real CMS. The Google Search appliance is fine with discovery but doesn't provide any mgt. tools. Much more to consider, but if you read this far you already know that.

    5. Re:Document management software by cmdean · · Score: 1

      The problem with document management software is that it requires users to do some "extra" work filling in metadata. This fails. Generally users will not fill in more than a title; adding keywords, short descriptions, and file numbers is simply too much effort. When the metadata fails, the document management system also fails.

      I suggest you first look at getting a good enterprise search engine. Lucene (apache.org) is open source and free; MindServer (www.recommind.com) from Recommind is not, but it is amazing (I'm a happy client, not a shill).

      If your users can find everything they need to do their work, who cares how badly it is sorted or filed?
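
      As a rough illustration of what a full-text search engine does under the hood, here is a minimal inverted-index sketch in Python. It is not Lucene's API and not how Lucene is implemented; the function names and sample documents are made up for the example, and a real engine adds ranking, stemming, and text extraction from Word/PDF files.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each term to the set of document paths that contain it."""
    index = defaultdict(set)
    for path, text in documents.items():
        for term in tokenize(text):
            index[term].add(path)
    return index


def search(index, query):
    """Return the paths that contain every term in the query (AND search)."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical sample data: path -> text already extracted from the file.
SAMPLE_DOCS = {
    "eng/wing_spar_rev2.doc": "Wing spar stress analysis, revision 2",
    "eng/fuselage.doc": "Fuselage frame stress analysis",
    "admin/travel.txt": "Travel policy for satellite offices",
}
INDEX = build_index(SAMPLE_DOCS)
```

      With this in place, `search(INDEX, "stress analysis")` finds both engineering documents regardless of how their files are named, which is the point being made above.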

  26. Knowledge Tree by crackervoodoo · · Score: 3, Informative

    http://www.knowledgetree.com/ If you're looking for a no-cost (read as no license fee) option then Knowledge Tree Community Edition is a decent Document Management tool. We've been using it for a couple of years.

    1. Re:Knowledge Tree by Anonymous Coward · · Score: 0

      I second KnowledgeTree. It has some truly awesome features in the community edition, is easy to set up and well worth the licensing fees if you need some of the more robust features.

    2. Re:Knowledge Tree by Anonymous Coward · · Score: 0

      Those who are wary of open source install docs or hardware procurement can buy an appliance pre-installed with Knowledge Tree (or even other open source apps) here:

      http://www.networktoaster.com

  27. Enterprise solution: by Anonymous Coward · · Score: 0

    Don't know enough about your company, budget, policies, real requirements. But throwing Documentum at it is probably good. Either that or something simple like Sharepoint. Both provide rich web based access and documentum can support long term archiving and version control. I have no idea how google appliances would do jack for access. However, if you need search there are google or cheaper/better commercial solutions from companies that actually do it right.

  28. try wiki by bitsmith · · Score: 1

    JamWiki.org, for instance, has search capabilities built in. Has security built in and easily manageable. You can upload the documents and even migrate them to wiki format later. Keeping the documents in near-text open format will help you re-migrate them into the future sometime later.

    --
    A man without religion is like a fish without a bicycle. -- Ron "Doc" Ferrell
    1. Re:try wiki by evil_aar0n · · Score: 1

      Our documentation is not nearly as bad as the OP's, but when I considered an approach to wrangling this mess into a usable state, Wiki was the first thing that came to mind. Wikipedia seems to work pretty well, and supports thousands of users all over the place. Couldn't be _that_ bad, could it?

      --
      Truth, Justice. Or the American Way.
  29. Worldox by Anonymous Coward · · Score: 0

    http://www.worldox.com

    Document management is generally very good. Forces people to fill out required fields. I've seen it implemented in law offices.

  30. Anonymous Jonas by Anonymous Coward · · Score: 0

    http://cdsware.cern.ch/invenio/index.html

  31. ask google by Anonymous Coward · · Score: 0

    http://www.google.com/search?hl=en&safe=off&q=document+management+system

  32. I worked on by Anonymous Coward · · Score: 0

    a web application 4 years ago at Konica-Minolta. It is called DocuBreeze. I am not sure whether you need all the functionality it provides, but you may want to take a look. Google Docubreeze and you will find it.

    I am no way related to this company any more and I have nothing to gain from recommending this to you.

  33. Your Website by Anonymous Coward · · Score: 0, Redundant

    You forgot the link to your website: www.nasa.gov

  34. Knowledge Tree? by gilesjuk · · Score: 1

    I used an old version a while ago and it was pretty good then. Does versioning and other things.

    http://www.knowledgetree.com/

  35. Get yourself a good management system. by Anonymous Coward · · Score: 2, Informative

    While this may be an odd suggestion, here's two things:
    1) Get yourself a damn good document or content management system. Get it set up on the baddest machines you can afford. Overshoot the capability you need, so that you have room to grow.
    2) Get a librarian to look at the kinds of documents you create, and develop a system to catalog documents while maintaining reasonable standards for file names. As the super simplest system, maybe document names that indicate (at a minimum) what project or what overhead department they belong to, a broad category of subject matter, and if it's versioned, a version number.
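
    To make point 2 concrete, a naming convention like that can be checked mechanically. The sketch below assumes a hypothetical pattern of `PROJECT_category_title_vN.ext` (for example `P1042_structures_wing-spar-analysis_v3.doc`); the pattern and field names are illustrative only, not a standard, and whatever your librarian comes up with would replace them.

```python
import re

# Assumed convention: PROJECT_category_title[_vN].ext
# e.g. P1042_structures_wing-spar-analysis_v3.doc
NAME_PATTERN = re.compile(
    r"^(?P<project>[A-Z0-9]+)_"      # project or overhead department code
    r"(?P<category>[a-z]+)_"         # broad subject category
    r"(?P<title>[a-z0-9-]+)"         # short title, words joined by dashes
    r"(?:_v(?P<version>\d+))?"       # optional version number
    r"\.(?P<ext>[a-z0-9]+)$"         # lowercase file extension
)


def parse_name(filename):
    """Return the fields of a conforming file name, or None if it does not conform."""
    match = NAME_PATTERN.match(filename)
    if match is None:
        return None
    fields = match.groupdict()
    if fields["version"] is not None:
        fields["version"] = int(fields["version"])
    return fields
```

    A nightly job could run `parse_name` over new files and report the ones that come back as `None`.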

    I tried to bludgeon a small company I worked for (around 40 engineers, one overworked Q&A person, and one system administrator) into moving towards a storage system for Word documents that was not "Create a new folder for each version of the document set, place them all in the right folder, and if you don't Ray will eat your head." We wound up using (of all things) Perforce SCM to house fifty thousand Word documents, and were starting on putting actual code revisions for automated test sets into the system when our avionics testing focus became a serious liability, and overhead workers were drastically cut. (Why have one Q&A guy and one system admin guy? We can get an intern to do BOTH!)

  36. Get a Document Management System by bsy-1 · · Score: 1

    Any of many document management systems. They allow the extraction of metadata, which is in turn used to 'find' the document you are looking for. Nearly all contain some security settings and a viewer for many types of files. One thing to note: this magic doesn't happen by itself. If you get stuck doing this, be prepared for two things:
    a. No one really knows how they want to do this; they all want to wonder if one of the many docs has their answer and have the correct doc located and opened for them.
    b. You are about to become a stranger to all those who know you outside of work.

    1. Re:Get a Document Management System by Anonymous Coward · · Score: 0

      We manage hundreds of thousands of documents with Laserfiche. It works pretty well and is not as expensive as some of the other systems out there (still not cheap...). The key is to do your homework and find the one that is right for your needs.

  37. Indexing and Cataloguing by Zerocool3001 · · Score: 1

    If you don't like the idea of sending your information to Google to have it indexed, you can look into some server-side applications (with associated client apps) that do the indexing and searching for you. I'm not familiar with Windows ones (although I'm sure there are some), but there are quite a few for Linux, and primarily Spotlight for the Mac. The option to have the actual indexing done server-side would save on your bandwidth tremendously. You may also want to consider using a different filesystem, one that has indexing capabilities built in.

    --
    Science will save us. The question is, will it destroy us first?
  38. Lots of ECM solutions out there... by jwilkins13 · · Score: 2, Informative

    Sure, with any number of ECM solutions. At the simplest end many of them simply enforce naming conventions; at the more robust end, they support many different file types for viewing, indexing, etc. and can also provide rich metadata on a document-by-document basis. Some of them have been named in the comments, including but certainly not limited to SharePoint 2007, Cygnet, Documentum, Open Text, FileNet, etc. Any system worth looking at has a web-based interface, at least for searching, and many of them offer more meaningful interaction as well. Alfresco, Hyland, and SpringCM all have web-based ECM solutions, and more comprehensive web-based offerings become available all the time.

    Oh, and if you're aerospace there are a number of regulatory requirements for information management you'll need to comply with, which does complicate the situation, but spending the ducats for software and/or consulting help is probably cheaper than whatever your litigation and regulatory audit support processes cost today.

    Hope this helps,
    Jesse Wilkins
    ECM and other stuff consultant
    jwilkins13 at gmail dot com

    1. Re:Lots of ECM solutions out there... by NeoSkandranon · · Score: 1

      I don't think electronic countermeasures are gonna help in this case.

      --
      If you can't see the value in jet powered ants you should turn in your nerd card. - Dunbal (464142)
    2. Re:Lots of ECM solutions out there... by afidel · · Score: 1

      Enterprise Content Management.

      --
      There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
  39. Wow by Locke2005 · · Score: 0

    "We're a mid-sized aerospace company with... satellite offices." Wow... apparently the state-of-the-art in aerospace is a lot more advanced than I thought! What kind of rocket do you use for commuting to those satellite offices?

    --
    I've abandoned my search for truth; now I'm just looking for some useful delusions.
  40. Shameless plug by Anonymous Coward · · Score: 0

    I work on a product whose focus is to address this very problem. Check us out at http://www.kalexo.com/

    It's integrated file/document/project management. It's targeted at industries that are geographically spread far and wide but need collaborative, secure access to common files to work on stuff.

  41. where is the slowdown? by the_denman · · Score: 1

    I think step one is to pick a storage/naming convention and stick with it. Also, depending on your needs, a document management system could help. The other thing I would do is figure out where the bottleneck is for your speed issue: is it the VPN connection, the network not being able to keep up, or the computer running Samba? Once you know more about where the slowdown is, work on that spot.

  42. Switch to Apple... by Tibor+the+Hun · · Score: 3, Informative

    I only partly jest, I know such a thing is damn near impossible to actually do, but in our Mac shop, such things are trivial. With one click of the mouse we enable spotlight searching on our Leopard AFP server and bam... all the clients have almost instantaneous search access to their docs.

    --
    If you don't know what AltaVista is (was), get off my lawn.
  43. nothing beats a folder structure and naming by fxdgear · · Score: 2, Insightful

    I'm gonna say nothing beats a proper folder structure and naming convention. I'd also recommend using svn. Also spend some time to develop some macros to assist in the creation/saving/retrieval of said documents from the repository. Maybe create some standard templates too... just my 2cents!

  44. Who tagged this delete? by Anonymous Coward · · Score: 0

    That's such a silly solution to the problem.

    Shift+Delete

    works so much better!

  45. OpenAFS by Anonymous Coward · · Score: 0

    OpenAFS will speed up local access, and also provide an automatic backup of important files at all the satellite offices. (could be a full backup if you mirror everything).

    As for the lack of any naming convention or other organization - first, the fact that you somehow manage to continue operating with a hundred thousand documents indicates that you actually DO have some form of organization in place.

    If it isn't structured - get on it.

  46. WebDav by SplashMyBandit · · Score: 4, Informative
    There are a few options:
    • For relatively unstructured data without versioning you could serve them over HTTP with WebDAV (Apache) and use your existing HTTP security mechanisms. You wouldn't believe how relieved I've often been when I can get my (secured) resources from home-base while located at a client's site.
    • My outfit uses KnowledgeTree for versioned stuff (http://www.knowledgetree.com/)
    • Or you could embrace your dark-side and use Microsoft SharePoint (plus, with all the Microsoft bugs you'd have a job for life until your employer goes bust). If you are a friend to your company you won't do this, plus your outfit has engineers and the good ones can spot trash solutions.

    If your users are naming their files with strange characters in them (assuming it's not due to Samba) then they will just have to live with it; you won't have time to sort out all the weird names that (mostly MS-Word) users give to their files. The primary objective should be to give your users access to the files. Making the directory listing pretty ought to be a secondary concern.

    1. Re:WebDav by Anonymous Coward · · Score: 0

      Or you could embrace your dark-side and use Microsoft SharePoint (plus, with all the Microsoft bugs you'd have a job for life until your employer goes bust). If you are a friend to your company you won't do this, plus your outfit has engineers and the good ones can spot trash solutions.

      I've used Sharepoint and it doesn't have "all the Microsoft bugs" you are talking about. Sure, there are bugs, but what software doesn't have bugs? Even if this guy uses another enterprise or open source piece of software, you think that software won't have bugs? And if it's open source I'm sure he will have time to sort through the code and fix those bugs. If he's a friend to his company, then he will find the right tool for the job. If that tool is Sharepoint, then so be it. And Sharepoint as a trash solution? Please. If those engineers really feel that way, then they have an incredible bias against Microsoft or are "open source only" advocates. If they are that gung ho about what software to use and ignore the idea of using a tool that works, then they should GTFO. You will find many people (including a few around these parts) recommend Sharepoint as an effective document management solution.

    2. Re:WebDav by Anonymous Coward · · Score: 1, Insightful

      The weird characters could easily be taken care of by something like Ant Renamer (even supports RegEx). Just replace the weird ones with an underscore or some other suitable character.
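
      The same idea can be scripted. Below is a minimal Python sketch of the regex replacement described above; it is not Ant Renamer's behaviour, and the set of "safe" characters (letters, digits, dot, underscore, dash) is an assumption you would adjust. `plan_renames` only computes a dry-run mapping and touches nothing on disk.

```python
import os
import re

# Assumption: anything other than letters, digits, dot, underscore, dash is "weird".
_UNSAFE = re.compile(r"[^A-Za-z0-9._-]+")


def sanitize(name, replacement="_"):
    """Replace each run of unsafe characters in the stem and lowercase the extension."""
    stem, ext = os.path.splitext(name)
    stem = _UNSAFE.sub(replacement, stem).strip(replacement)
    return (stem or "unnamed") + ext.lower()


def plan_renames(names):
    """Return {old: new} for names that would change; fail loudly on collisions."""
    plan, taken = {}, set()
    for old in sorted(names):
        new = sanitize(old)
        if new in taken:
            raise ValueError(f"{old!r} would collide with another file as {new!r}")
        taken.add(new)
        if new != old:
            plan[old] = new
    return plan
```

      Reviewing the plan before applying it (for example with `os.rename`) matters, because links and macros that refer to the old names will break.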

    3. Re:WebDav by SplashMyBandit · · Score: 1
      "Sharepoint as a effective document management solution"

      You must work in a completely homogenous environment with the exact same desktop image and software install. For the rest of us Sharepoint is a relatively poor solution, requiring a specific client system, and usually a specific version of the o/s and productivity suite or lots of problems arise.

      When you work on client sites for very large organisations (that have lots of versions of "everything" due to accretion) you realise that the Microsoft Way of replacing everything all at once in A Big Rollout is actually quite flawed, compared with just sticking to standards that work no matter what version of Windows, Mac OS X, Linux, or Solaris (they're engineers, after all) is being used. With standards-based solutions you can upgrade your infrastructure piecemeal while continuing to provide access.

      I've found that the bigger the organisation I've been in (national level) then Windows is only on the desktop and some servers, the real heavy lifting is done by all sorts of systems (mainframes, DataPower devices, Un!x boxen). Sometimes the admins of these systems need corporate docs too. I have found Sharepoint to be an inferior solution in this kind of environment (yes, I have used Sharepoint before, which is why I've recommended other solutions that I've found to work better).

  47. Most big companies seem to use.. by fluffernutter · · Score: 1

    ..something like Filenet or SAP. Sounds like you have big corporation needs; get a big corporation solution.

    --
    Laws are rules for the court, but merely a bottom bar to hit for life. Think beyond laws in your actions always.
    1. Re:Most big companies seem to use.. by Kadin2048 · · Score: 1

      I agree that document management is the way to go, but I would just point out in their defense that they're not just exclusively "big corporation" products anymore.

      All my knowledge is FileNet-centric, but at least with FN you can stand up a system quite easily; it's not a huge investment for what you get. I've seen deployments done for small and medium-size businesses (and relatively small departments within large companies) that justified the cost pretty easily in terms of not losing or having documents accidentally deleted, and being able to guarantee compliance and conformance to a backup strategy.

      Versioning is also a big plus. You can let people edit documents without worrying that they're going to wipe out anyone else's work -- if you don't like their changes, just grab the previous version instead. Most places I've seen introduce most of their file-share complexity because they try to basically do version control in a non-versioning filesystem using file names, and everyone does it a bit differently. Total mess. Much better to do it the right way and use some sort of version control system or ECM product from the beginning, rather than try to use bare-filesystem share drives until they're totally unmanageable and then migrate.

      --
      "Ladies and gentlemen, my killbot features Lotus Notes and a machine gun. It is the finest available."
  48. Sounds easy enough... by Anonymous Coward · · Score: 0

    If you need an easy way to find things, you're looking at a good searching algorithm. In order to use a good searching algorithm I'd have to recommend the bubblesort first. That way you don't need to worry about the data for a good millennium or two!

  49. Mindoka Technology Corp. by Alethes · · Score: 1

    Mindoka (http://www.mindoka.com) has a document management product that is designed to solve the problem that you have.

  50. Riverbed Steelhead mobiles by DecepticonEazyE · · Score: 1

    Put Steelhead mobile on all the clients. Document transfer over the VPN will GREATLY improve. Since it's mostly text/pictures, there will be so much duplicate data that doesn't need to be transferred over the wire multiple times, the round trip time will decrease so much they'll forget they're on a VPN.

  51. FileNet by Ohio+Calvinist · · Score: 4, Interesting

    I worked at a place that used FileNet, which is now an IBM product, to do this sort of thing. We had millions of scanned documents in the system. I wasn't personally very impressed with it, in that whenever anything "bad" happened, you had to call IBM because finding support online was impossible, and at that, their support wasn't very good. It was also a very picky system, though it seemed to handle the load well. If you go with it, I strongly encourage doing it on UNIX/Oracle, because it screamed "poorly ported" when we used it on Windows/MSSQL. It has an API for integration, but it is also poorly documented and would take some time to integrate into your existing business systems.

    This is more of a rant at this point, but it is a stop-gap solution that allows people to continue to use outdated business processes storing important data in image formats or in documents scattered about with minimal indexing/search capabilities, rather than analyzable "data" that can lead to "information." I always take the position that if the goal is something on paper, or the goal is to store something that "was" on paper, it is time to rethink the business process to see if we can automate it, or store/present the data electronically in the first place. The old school fights against it, but no one has ever been able to say it wasn't more efficient in the end and enabled IT to say "yes we can" when the next great idea came along versus "here is a stack of papers, figure out $trend."

    --
    Forgive my spelling from time to time. I'm often posting during short breaks.
    1. Re:FileNet by flnca · · Score: 1

      FileNET can be monitored using Tivoli TME 10 and the FileNET integration module and/or CALA (cenit Advanced Logfile Adapter). This way, you can automatically react to problems. BTW, there's another post up there from someone who had a better time with FileNET. ;)

    2. Re:FileNet by Anonymous Coward · · Score: 0

      Yeah, go ahead, fix IBM crap with more IBM crap, brilliant

      Meanwhile, I'd rather spend money in a system that works and doesn't cost a million dollars

  52. Technical issues aside by Vroom_Vroom · · Score: 3, Insightful

    Hire a document manager / clerk person who will create order. Your engineers won't.

    --
    Boing boing boing....
    1. Re:Technical issues aside by Seraphim_72 · · Score: 1

      The word you are looking for is: 'Librarian'

      And yeah, they are that good.

      --
      Slashdot, where armchair scientists get shouted down and armchair theologians get modded up.
    2. Re:Technical issues aside by James+McP · · Score: 1

      Engineers can; they just generally don't. I spent three years working at a library so I have a fondness for good organization systems.

      I was the file system nazi at my last company, a civil engineering firm. I was hired for engineering and IT support right as they started implementing a standard. The nazi-ism started by marking every directory in the existing file store read-only on every project that was complete, according to the accountants.

      Then "create directory" permissions were limited to senior project managers and their one administrative assistant. I set up a script to check for new directories every day and I'd email anyone who didn't follow protocol. I pre-seeded the directory structure by getting the list of open project numbers from finance so, in theory, everything billable already had a home waiting for it. For new projects I simplified things by creating a little widget that asked for a project number and the contract name and created the directory tree.
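
      For illustration, the daily check could look something like the Python sketch below. This is a guess at the shape of such a script, not the one described above: it assumes top-level directories are named after project numbers and that the list of open project numbers is supplied by finance; the function name is made up.

```python
from pathlib import Path


def find_unapproved_dirs(root, open_projects):
    """List top-level directories under root whose names are not open project numbers."""
    return sorted(
        entry.name
        for entry in Path(root).iterdir()
        if entry.is_dir() and entry.name not in open_projects
    )
```

      Anything it returns is a directory someone created outside the protocol, which is who gets the email.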

      We created a separate volume that contained data that was not project specific but may be needed across multiple projects. I.e. the various CAD standards (national, Corps of Engineers, DoD, DoT, etc) along with company/client logos, all the stock patterns/icons for the various CAD programs, etc. All the CAD programs were set to point to that shared directory by default to encourage the worker-bees to put shared data there so they wouldn't have to set project-specific directory over-rides.

      That directory allowed everyone to add data but only the CAD/marketing/PR/QA managers could delete/overwrite files. A report was generated monthly and sent to the managers that listed files with similar names and extensions to make sure we didn't wind up with 25 versions of one logo or hatch pattern.
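
      A report like that can be approximated with Python's standard `difflib`. This is only a sketch of the idea, not the original report: the 0.8 similarity threshold is an arbitrary assumption, and comparing every pair is fine for a shared directory of a few thousand files but not for the whole file store.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similar_names(filenames, threshold=0.8):
    """Return pairs of file names whose case-insensitive similarity meets the threshold."""
    pairs = []
    for a, b in combinations(sorted(filenames), 2):
        ratio = SequenceMatcher(None, a.lower(), b.lower()).ratio()
        if ratio >= threshold:
            pairs.append((a, b))
    return pairs
```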

      This was staff-intensive but that's because capital expenditures were the devil since they couldn't be charged easily to a project. File management, however, is something that was billable.

      --
      I've been on slashdot so long I'm starting to get out of touch with the cool stuff if it ain't on slashdot.
  53. SQL... nuff said... by Youngbull · · Score: 1

    I think the right option for you would be to catalog the documents in a database and serve them up through a website. I think that would be helpful for your satellite offices, since mapping shares through Samba over VPN is sometimes unstable and always nontrivial. Besides, the system doesn't seem to be working for you. You really don't have to be that proficient with functional webpages to make something like this, especially if you use Ruby on Rails. A Ruby on Rails guy would probably need only a couple of hours to make such an application. Then you could have functionality like searching and sorting by author, department, type and so on.
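
    To give a feel for the database half of that idea, here is a small sketch using Python's built-in `sqlite3` rather than Rails (a deliberate swap, to keep the example self-contained). The table layout, column names, and function names are all assumptions for illustration; the web front end is left out.

```python
import sqlite3

# Columns a caller is allowed to sort by (guards the ORDER BY clause).
SORTABLE = {"title", "author", "department", "doc_type"}


def open_catalog(path=":memory:"):
    """Open (or create) the catalog database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS documents (
               id INTEGER PRIMARY KEY,
               title TEXT NOT NULL,
               author TEXT,
               department TEXT,
               doc_type TEXT,
               location TEXT NOT NULL
           )"""
    )
    return conn


def add_document(conn, title, author, department, doc_type, location):
    """Record one document; location is wherever the file actually lives."""
    with conn:
        conn.execute(
            "INSERT INTO documents (title, author, department, doc_type, location) "
            "VALUES (?, ?, ?, ?, ?)",
            (title, author, department, doc_type, location),
        )


def find_documents(conn, text="", sort_by="title"):
    """Return rows whose title contains text, sorted by one of the SORTABLE columns."""
    if sort_by not in SORTABLE:
        raise ValueError(f"cannot sort by {sort_by!r}")
    cursor = conn.execute(
        "SELECT title, author, department, doc_type, location FROM documents "
        f"WHERE title LIKE ? ORDER BY {sort_by}",
        (f"%{text}%",),
    )
    return cursor.fetchall()
```

    The files themselves can stay where they are; the catalog only stores metadata and a location, so the satellite offices query a small database instead of browsing hundreds of directories over the VPN.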

  54. Alfresco by SplashMyBandit · · Score: 2, Informative

    I forgot to mention Alfresco as well, although I've never personally tried it.
    http://www.alfresco.com/index-b2.html

  55. Just Don't Use Livelink by Myrv · · Score: 1

    Can't really suggest a good document management program but I can tell you one to avoid. We use Livelink at my place of work and its indexing and search capabilities are horrible (some would say non-existent). For example every document added to Livelink gets a document number assigned to it. One would expect to be able to retrieve that document by using the same document number but if you enter it into the search bar Livelink returns no results found. Huh? Not to mention some odd UI behaviours like when you add a folder to the favourites box the original folder disappears from the standard file listing (meaning there is no single canonical listing of files and directories, you need to always look in 2 places).

    1. Re:Just Don't Use Livelink by CodeMonkey22 · · Score: 1

      It doesn't return search results because you haven't configured Livelink properly, or you are using a very old version of Livelink.
      Searching on the DataID can work. You should read up on 'Best Bets' functionality, or how to use the Livelink Query Language.

      Regarding your second comment, it is not called the favourites box, but instead 'Featured Items', and this behaviour is configurable in later versions, too.
      Upgrade Livelink to version 9.7.1 if you are not already there.

      Livelink is incredibly powerful and can be configured to do anything you need it to do, but the key is knowing how to do it. A skilled administrator is definitely needed.

      Full Disclosure:
      I work for Open Text and am a certified Livelink Systems Administrator.

  56. Institutional repository? by sidb · · Score: 1

    What kind of documents are they? If they're mostly text and you want versioning, the only drawback to subversion is getting people to learn the tools, but that might be too much.

    If they're archival/static documents, an institutional repository could work. Something like DSpace isn't that hard to deploy and will provide basic archival and search features.

    The middle ground between those two solutions is probably what you want, though. Everyone I work with uses SharePoint for that, and I hate recommending proprietary lock-in.

  57. Laserfiche by wguy00 · · Score: 2, Informative

    Laserfiche (or LF) is just what this is for. It is DOD, DOJ certified and crap, and is used by all branches of the military and several other areas of the government as their document management system. With several different software offerings, just about any situation can be taken care of. Its features include the ability to search based on document name, template information, or OCR'd text (which the software also takes care of). With add-on features such as Quick Fields, it may be able to automatically sort, add template information, OCR, name and then store the documents. It really is a nice way to go. Satellite offices can access and be either full or read-only users. It has the ability and modules to connect to just about any other type of data/information system (GIS, financial software, etc) and is very scalable.

    I was a tech for 5 years with a LF VAR. I'm not there anymore. We were constantly cleaning up messes left by other document management systems. Take your time with this thing and really plan your naming convention, folder hierarchy and user setup. It's easier to get it right (or as close to it as possible) than going back and having to fix it later. A good LF VAR should help you with this. Definitely check references of competing companies. Some VARs are A LOT better than others.

  58. It's called a DAM system. Do some research. by Logic+Bomb · · Score: 1
    1. Re:It's called a DAM system. Do some research. by Chris+Mattern · · Score: 1

      Well, maybe he doesn't want your DAM system!

  59. I work for a part 121 air carrier by maric · · Score: 1

    We have extensive documentation and tracking needs. We use two sets of software for records and also keep a hard copy for long-term storage. For tracking parts on/off and hours in service, TSO, TSI, etc., we use TRAX Evo2. We scan all written paperwork into a database, which we access through Alchemy. This allows us to view the current status of all of our aircraft and their parts and track the paperwork for each action taken. Alchemy has a browser interface and we use IE to access it. This allows a person to access the documentation from any of our stations and/or offices internally on the network. Both Alchemy and TRAX are acceptable to our local FSDO. The hardware setup for this is not something I can shed light on, as I do not get to play with computers that are ground bound. Hope that helps, maric

  60. Organize.... by Fallen+Kell · · Score: 1

    As may have been pointed out, organizing the files is really the best way. Develop a strict schema for naming conventions as well as a hierarchical directory structure for maintaining and organizing. Something like:

    /projectname/projectpart/data (contains the final draft of any document)
    /projectname/projectpart/working (contains files that people are modifying so that they can be merged/checked in to the data dir)
    /projectname/projectpart/misc (contains misc. notes or files that need to be filed with the project)

    The "projectpart" dirs are really just logical groupings of data/files for the project. Say you are designing a plane, well, break it up into relevant systems, like electronics, power plant, structure, etc., and each of those are the "projectpart" directories. The "projectname" is simply the overall project itself, be it the name of the plane, maybe the name of the contract, etc.
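
    A minimal sketch of how that layout could be enforced in a script, so nobody invents a fourth folder name by hand. The function name `project_path` and the validation rules are assumptions for illustration, not part of any existing tool:

```python
from pathlib import PurePosixPath

# The three stage directories named in the layout above.
STAGES = ("data", "working", "misc")

def project_path(project: str, part: str, stage: str, filename: str = "") -> PurePosixPath:
    """Build /projectname/projectpart/<stage>[/filename] for the layout above."""
    if stage not in STAGES:
        raise ValueError(f"stage must be one of {STAGES}, got {stage!r}")
    for label, value in (("project", project), ("part", part)):
        # Reject empty names and names that would silently add a directory level.
        if not value or "/" in value:
            raise ValueError(f"{label} name must be non-empty and contain no '/'")
    path = PurePosixPath("/") / project / part / stage
    return path / filename if filename else path
```

    For example, `project_path("plane", "electronics", "working", "wiring.doc")` gives a path under the electronics part's working directory, and an unknown stage such as "final" raises an error instead of creating a stray folder.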

    --
    We were all warned a long time ago that MS products sucked, remember the Magic 8 Ball said, "Outlook not so good"
  61. windows Terminal Server by smalltimecrime · · Score: 1

    The OP did not mention exactly how many remote branches or computers need to access the documents at once, however, windows Terminal Server licenses aren't too expensive and the remote desktop experience is silky smooth. Also the documents would all reside on a central server raid array or NAS device and never need to travel over the internet to remote sites. This would also free up massive amounts of bandwidth over the VPN, considering TS just needs an internet connection and uses SSL encryption. (although I don't know what you would even need a VPN for after making this conversion)

    1. Re:windows Terminal Server by JustNiz · · Score: 1

      >> windows Terminal Server licenses aren't too expensive and the remote desktop experience is silky smooth.

      BWAHHAHAHAHAAHAHAHAHAHAHAHAHAHAHA

      Thanks for that. I needed a laugh. Silky smooth? Having to do anything remotely technical via Terminal Server is the biggest pain in the butt I've ever experienced.
      BTW if you're really not a paid shill for Microsoft then WTF are you smoking?

    2. Re:windows Terminal Server by smalltimecrime · · Score: 1

      How hard is it to set AD permissions? Just install your users' applications once, install printers once, map network drives once... on the TS etc. (and give your users appropriate permissions from the program's installation directory... once). The only time I have ever had trouble doing anything "technical" over TS/remote desktop was trying to remotely flash update a Watchguard Firebox X500's firewall settings. FYI, when I was describing TS as "Silky Smooth" I was mostly referring to the quick responsiveness of the mouse and the crisp, clearly drawn desktop. Of course it is going to take a halfway experienced MS techie.

    3. Re:windows Terminal Server by Anonymous Coward · · Score: 0

      BWAHHAHAHAHAAHAHAHAHAHAHAHAHAHAHA

      You must be one of those freetards.

  62. Comment removed by account_deleted · · Score: 2, Informative

    Comment removed based on user account deletion

  63. Who else read this and thought... by tlambert · · Score: 1, Interesting

    Who else read this and thought... working in a satellite office for an aerospace company would involve a lot of cool travel perks?

    -- Terry

  64. Odd that the next story... by ak_hepcat · · Score: 2, Informative

    Odd that the next story has a great idea for document management right in the summary...

    Hadoop!

    --
    Support FSF: Stop thinking with your wallet, and think with your imagination. (cc/non-commercial)
    1. Re:Odd that the next story... by Anonymous Coward · · Score: 0

      You obviously don't know what Hadoop is. SOLR or Lucene (SOLR is an out of the box for the Lucene Libraries) -

    2. Re:Odd that the next story... by msantosn · · Score: 1

      Targeted Merchandising? First, create the necessity, show the solution.

  65. Sharepoint by jayhawk88 · · Score: 1

    ...seems like a natural solution for your connectivity issues, or perhaps whatever the open source variety of Sharepoint is. You really do need to tackle the naming convention question though. You can have all the file indexing you want, but sometimes a nice, logical, clean file name will get you what you're after much faster than any kind of searching.

    It's going to be horrible, painful, thankless work that will put you on the shit list of just about every department manager and administrative assistant ("You want me to rename how many files?"), but it has to be done.

  66. try this software by Anonymous Coward · · Score: 0

    www.Mindwrap.com

  67. Aerospace QMS by dwarf75 · · Score: 1

    What worries me more than anything else is that you claim to be a mid-sized aerospace company. If you are having problems finding documents, what happened to your traceability processes necessary for your QMS and how do you guarantee that employees use up-to-date documents? How did you handle the process in the past??? And, what does your QMS stipulate for records and traceability?

    1. Re:Aerospace QMS by icebrain · · Score: 1

      To be fair, that sounds like the aerospace company I worked at for a while. The giant Samba share drives didn't store certification data or production drawings or anything like that (such things were handled in a document-control system with version tracking and all that), but it was rather a big share drive for convenience... pictures, videos, presentations, department budget data, spreadsheets, etc. It was basically just an interdepartmental shared space for things that didn't need to be emailed or whatever. It was convenient because everyone could access it, and you could get to it from anywhere in the company (like if you had to present in another building; just pull it up straight from the share on the presentation computer).

      Stuff that needs versioning or document control should be handled through SmarTeam or Serena PVCS or something, at least.

      --
      The meek may inherit the earth, but the strong shall take the stars.
  68. Not quite what you want, but maybe similar enough by Anonymous Coward · · Score: 0

    In a previous job we dealt with the same problem but on a smaller scale: One main office with ~ 60 people with a branch office at quite some distance with ~ 6 people working there. In our case the problem wasn't documents but a combination of large profiles which had to be pumped through a VPN link over a rather narrow ADSL line at the branch office.

    In that case we placed an offsite login server which contains all the information that was also present on the main server, with nightly delta synchronisation. Users still use the main server for work that requires write access, but we were able to offer ~ 300 GB of data locally, instead of over the network.
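
    For anyone unfamiliar with the idea, a delta synchronisation copies only what has changed since the last run. The following is a rough sketch of the principle, not the setup we actually ran; the function names are made up for illustration, and it uses the common "size differs or source is newer" heuristic (compared to the whole second, since some filesystems store coarse timestamps). It never deletes anything at the destination:

```python
import shutil
from pathlib import Path

def needs_copy(src: Path, dst: Path) -> bool:
    """A file is in the delta if it is missing, a different size, or older at the destination."""
    if not dst.exists():
        return True
    s, d = src.stat(), dst.stat()
    return s.st_size != d.st_size or int(s.st_mtime) > int(d.st_mtime)

def delta_sync(src_root: Path, dst_root: Path) -> list[str]:
    """Copy only new or changed files from src_root to dst_root; return what was copied."""
    copied = []
    for src in sorted(src_root.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(src_root)
        dst = dst_root / rel
        if needs_copy(src, dst):
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dst)  # copy2 also carries over the modification time
            copied.append(rel.as_posix())
    return copied
```

    Run nightly against an unchanged tree, this copies nothing, which is the whole point over a narrow link. A dedicated tool will handle deletions, partial transfers and permissions far better than a sketch like this.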

    We also placed a so-called WAFS device in both offices. This is basically a network optimizer which intercepts inefficient network traffic and wraps this data with compression in its own network protocol. Next to that it also caches network traffic, which means that to some extent, often-referenced data / network traffic is also available locally. So far I've been positively surprised with the increased throughput we've seen (about a five-fold increase compared to the old situation).

    Lastly, we've been trying to push a version tracker system as a basis for documents, but hit a lot of walls with users who preferred their 'known' Samba environments over a versioning system. It does allow you to re-design your data structure for documents and string together old/related documents in an interesting way.

    Regardless, you'll have to rethink and restructure how you want to store documents, if only by using better directories and creating a 'method' which users will have to adhere to. And in the end you'll need some poor cheap students who will have the pleasure of migrating all this data to your new system.

    Just my 2 cents.

  69. Re:I got 3 letters for you by Anonymous Coward · · Score: 0

    S V N

    Do it right, or just don't freaking do it.

    What SVN repository manages hundreds of thousands of documents between users that do not know how to deal with SVN?

  70. IBM OmniFind - a simple easy solution by sfalc · · Score: 1

    IBM OmniFind should do the trick. It indexes your files and then you can search the index very quickly. It also does caching of documents and other nifty stuff. It is based on Apache Lucene and there is a free (as in beer) version, IBM OmniFind Yahoo Edition. The free version will work with up to 500 000 documents. I used it for searching a number of networked drives with circa 50 000 files on them, which it did very well.

  71. SharePoint by PIPBoy3000 · · Score: 3, Informative

    NASA is a big user of SharePoint, strangely enough. My coworkers run into their folks at conferences from time to time.

    I personally am ambivalent about SharePoint. Its roots are in document management, so it seems to do that relatively well. The publishing features are fairly nice as well. I don't think it's the best system for making web sites, but it may some day get there. Currently it feels like a 2.0 product (the magic rule is to never buy anything from Microsoft before 3.0).

    There are gotchas. SharePoint is tightly coupled with your clients. If everyone accessing the documents is using the latest version of Office, you'll be okay. If not, you'll run into problems. You may also need to throw a lot of hardware at SharePoint, as storing files inside of SQL has some built-in inefficiencies.

    Still, some of our users seem to love SharePoint, so it might be a good option for you.

  72. Good luck by kilodelta · · Score: 1

    When I worked for the state Attorney General's office as I.T. Director a request came into I.T. that immediately gave me an upset stomach. The request was for all documents on the server that contained the word "lead" as in the chemical element Pb. The issue was that the word lead and the element share the same spelling.

    I kicked in and wrote an app that generated a web list on the fly and had clickable links so the documents could be examined and then marked as part of discovery.
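
    The original app is long gone, but the idea can be sketched roughly as follows. This is an illustration only: the function names are invented, it looks at plain `.txt` files rather than office documents, and whole-word matching still cannot tell the metal from the verb, so a person has to review every hit:

```python
import html
import re
from pathlib import Path
from urllib.parse import quote

def find_documents(root: Path, word: str) -> list[Path]:
    """Return text files under root that contain `word` as a whole word, ignoring case."""
    pattern = re.compile(rf"\b{re.escape(word)}\b", re.IGNORECASE)
    hits = []
    for path in sorted(root.rglob("*.txt")):
        if pattern.search(path.read_text(errors="ignore")):
            hits.append(path)
    return hits

def as_html_list(root: Path, hits: list[Path]) -> str:
    """Render the hits as a list of clickable links, relative to root."""
    items = []
    for path in hits:
        rel = path.relative_to(root).as_posix()
        items.append(f'<li><a href="{quote(rel)}">{html.escape(rel)}</a></li>')
    return "<ul>\n" + "\n".join(items) + "\n</ul>"
```

    Matching on word boundaries at least keeps "misleading" and "pleaded" out of the list, which cuts the review pile down before anyone starts clicking.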

    I also brought in three Xerox 490's. Those were the hardware part of the document management system. I don't know if they ever got the servers for it but at least they had the gear. In the meantime I suggested using meta-data in filenames.

  73. New Hire. by deimtee · · Score: 1

    Hire a real librarian, it's what they do.
    On the plus side, you also get to hire a librarian. nudge, nudge, wink, wink, say no more.

    --
    I'm guessing that wasn't on their radar screen...
  74. Alfresco of course! by thule · · Score: 2, Interesting

    It can scale extremely well. It is the backend to Adobe's acrobat.com website! So you know it can handle millions of documents if you need it to. Sharepoint requires MS SQL Server for searching documents. With Alfresco, that feature is built in.

    Sharepoint is teaming software and not really designed for large document repositories. Alfresco has a teaming interface (Alfresco Share) and a more generic document repository interface.

    Alfresco can expose the repository via FTP, SMB, WebDAV, and a web client interface.

    1. Re:Alfresco of course! by Anonymous Coward · · Score: 0

      I agree! Alfresco is a winner

  75. Regular Expressions by EvilGrin5000 · · Score: 1

    Your solution:

    http://xkcd.com/208/

    --
    A black cat crossing your path signifies that the animal is going somewhere. -- Groucho Marx
  76. WIKI by unum15 · · Score: 3, Interesting

    Maybe not the best solution for this particular job, but man am I glad we started using Dokuwiki for all our scattered documents.

  77. There is a right way. by mrmeval · · Score: 5, Informative

    http://en.wikipedia.org/wiki/Document_management_system

    For that level of documentation you need to have a staff and get it properly indexed. You need a high level librarian. This would be someone with a master's degree at minimum in library science and at least a bachelor's in information technology. They will not come cheap and they are a long term investment. The software is available; it is not trivial. Hiring a large number of people to recategorize and tag all the documents for the length of time that takes is also an expense, but worth it. Once it's all in place, maintaining it gets much easier.

    I've seen a system developed for Raytheon. They took all the old compartmentalized data Hughes had and put every scrap of paper through a scanner. It was exceptionally well done. This would display electronic files and would have the location of hard copy. Classified documents were in some cases indexed but were hard copy only afaik. There were some documents that were hard copy only, those were usually ones with an NDA or other restriction on making electronic copies. It had every thing mentioned wrt versioning and such. Documents spanned decades with hundreds of revisions and you could pull up and view any revision. Depending on how recent and what type of document you could view a change log. Older scanned ones did not have that unless they'd been important enough to reenter as modern documents which meant OCR or manually transcribed. Some schematics were reentered into the system in a modern format. The effort was worth it. Having that data is the only way some devices or parts could be made or repaired.

    --
    I'd go on a Vegan diet but the delivery time from Vega is too long. --brownkitty
    1. Re:There is a right way. by mucous · · Score: 1

      I don't know where to put this, so I'll put it here. This is not a problem which can be fixed with technology. The organisation clearly has no recordkeeping policy. It has no information management policies. No one is trained, no one is monitored. They don't need a new computer or a bit of software or even a librarian. They need a trained records manager, a policy, a lot of change management, high-level support and about a year of hard work to straighten out the mess. The best way to fix problems like this is to avoid them. Failing that, you've got a hard slog ahead of you.

    2. Re:There is a right way. by mrmeval · · Score: 1

      Yes that would be the overhead I missed. I've not heard of the title "records manager" but that is what I meant when I used librarian. The person I talked to was over all of their records pertaining to proprietary data concerning software, hardware, build instructions and the like.

      I am experiencing that ineptitude at my new job. IT, all two of them, managed to LOSE all the data from a hard drive crash of a server. 10 years' worth of design data went poof.

      How embarrassing is it to have to go to one of your contract board houses and beg for copies of your data back?

      --
      I'd go on a Vegan diet but the delivery time from Vega is too long. --brownkitty
    3. Re:There is a right way. by mucous · · Score: 1

      The New Zealand government has written up a handy guide to problems like this: What to consider prior to implementing an IT 'solution' to a recordkeeping solution http://continuum.archives.govt.nz/files/file/guides/g3/index.html It's actually very easy to understand.

    4. Re:There is a right way. by nil_orally · · Score: 1

      Why are you being helpful? This is Slashdot. Gotta agree though. There is a time to call in the professionals. Not having one got you into this mess, so you can't get out of it without one. No amount of software will replace a Librarian who will know how it should be fixed and implemented.

  78. ls | grep by zindorsky · · Score: 0, Redundant

    ls | grep

    amiright?

    --
    If the geiger counter does not click, the coffee, she is not thick.
    1. Re:ls | grep by JamesP · · Score: 0, Redundant

      ls | grep

      amiright?

      NO!

      find | grep

      --
      how long until /. fixes commenting on Chrome?
  79. Old tech by DerekLyons · · Score: 2

    It's called an index or a bibliography. There exists a profession known as 'librarian' specifically trained in the creation of such and in the management of large numbers of documents.

  80. Filenet vs OnBase by Anonymous Coward · · Score: 0

    Sure, you could use a big solution from IBM (filenet) or one of the other products they make that compete with each other that were mashed together from years of acquisitions. Not to mention the large costs of "test" databases and extensive configuration.

    I have used OnBase from Hyland Software for years at my office. They are a family run company in Ohio with Google-esque leanings. (See photos of the large plastic slides on Wikipedia.) They have always been easy to use, robust, point and click configurable, and they have the ability to screen grab from almost any legacy application. (COLD/DIP as well.) Straightforward pricing... out of the box functionality. It just works.

    I strongly recommend you take a look for yourself. (I won't post any links... I am not an OnBase stooge bot... just a fan.)

    1. Re:Filenet vs OnBase by kaatochacha · · Score: 1

      we use onbase, I like it. AND, one day when a tech was onsite for training, the entire home office was having a day off at Cedar Point. mmmm, rollercoasters.

  81. Cognidox by Anonymous Coward · · Score: 0

    In my last company, which was a leading semiconductor designer with a large document repository and several branch offices, we used Cognidox:

    http://www.cognidox.com/

    This worked well for us; it has good document workflow management, tagging, search capability, user rights management, etc.

  82. Oracle or Alfresco by steverar · · Score: 2

    We went through this for both document management and a web front end for access. We looked at Sharepoint, Alfresco, Oracle UCM, Reddot and a few others. We dropped most due to cost, functionality, and ease of use for non-developers to do page work. Sharepoint was dropped due to cost in an internet setting (CALs), the lack of a non-developer front end for page layout (they couldn't use HTML), and the fact that it stores everything in the database. From prior experience this made backup/restore difficult, as it keeps the IP of the web site in the database when you back up. If you restore to a different machine it gets confused. It was between Oracle and Alfresco. You cannot go wrong with either. Both are extensible; whatever you need is either built in or can be added easily. Both are good for non-developers to use. Support is very good with either. We went with Oracle. While it did cost more, it matched our existing infrastructure.

  83. Open Text - Document Management Solutions by CodeMonkey22 · · Score: 1

    This is built for the exact situation you described:

    http://www.opentext.com/2/global/sol-products/sol-pro-docmgmt-collaboration.htm
    You can either import the files into the system, or leave them in place, index them and use the search engines to locate the needles in your haystacks...

    About Open Text:

    http://en.wikipedia.org/wiki/Open_Text

    Hummingbird is a subsidiary of Open Text, the solution mentioned above...

    Full Disclosure:

    I am an Open Text employee.

  84. Google is the answer by BlackSabbath · · Score: 1

    Google?
    http://www.google.com.au/enterprise/mini/index.html

    Seriously, if you can't be bothered collecting/maintaining the metadata that more structured solutions require, then just let Google index the lot. It'll work just as well (or not) as it does on the Internet. Although it's not free, it seems reasonably priced. It could be a quick answer to your problem.

  85. SharePoint wiki by Anonymous Coward · · Score: 1, Insightful

    I know I'm gonna get hit for blurting out the Microsoft Solution but...give SharePoint a shot...

    Just avoid the wiki functionality like the plague. It completely sucks.

  86. Mac OS X Server - Spotlight Server by Gary+W.+Longsine · · Score: 4, Insightful

    Since your organization probably has Windows clients, you can only long for something as nice as Mac OS X Spotlight Server.

    Google Search Appliance is definitely what you want.

    If you have a mid-sized company you definitely don't have a surplus of highly talented systems administrators lying about to run one of the document management systems that others here are likely to suggest. Be very careful going down the document management server path. It's far, far more work than you think it will be, and more than the vendor will tell you it is. Not simply more work for you, but for your IT staff and your users, too.

    The Google Search Appliance, by contrast, is "fire and forget". Plug it in. Turn it on. Patch it when Google suggests you do so. That's about it.

    --
    If you mod me down, I shall become more powerful than you could possibly imagine.
  87. ProjectWise by adamziegler · · Score: 1

    We use a Bentley product called ProjectWise. It is a document management system with file attribution among other things. It is primarily useful for Bentley's line of products, but we have used it as an archival system as well as for working documents that are non-Bentley specific. No... I do not work for Bentley, but my job heavily uses their products.

  88. KnowledgeTree by Anonymous Coward · · Score: 0

    You could do worse than to look into KnowledgeTree
    http://www.knowledgetree.com/
    it's released under GPL2.

  89. Start with the WAN by PatJensen · · Score: 1
    Take a look at network-based WAN acceleration products that will significantly reduce the overhead of SMB/CIFS traffic. This will make it easier to index, cache frequently used documents locally and improve your WAN utilization company wide. It will even cache directory lookups and they will "feel" instant to the end user.

    A good example is Cisco WAAS, a cool video showing how it works is here: http://www.cisco.com/cdc_content_elements/flash/ans/index.html

    See here for data sheets and specs: http://www.cisco.com/en/US/products/ps5680/Products_Sub_Category_Home.html

    Cisco's solution is inexpensive and you can use your existing router investment to do all the heavy lifting.

    Pat

  90. A Document Management System? by Super+Jamie · · Score: 1

    Unsurprisingly, the answer to managing many documents is to use a document management system. There are several commercial and free products available, both linked here and on the Wikipedia page for Document Management Systems.

    I've worked next to the team who administered Bentley ProjectWise in a previous engineering job, which is expensive but definitely suited to your task. There may be other good options out there.

  91. DMS by jjshoe · · Score: 1
    --
    -- botsex is {grep;touch;strip;unzip;head;mount} /dev/girl -t {wet;fsck;fsck;yes;yes;yes;umount} {/de
  92. LaserFiche by Hadlock · · Score: 1

    We're using a Win3.1 app called LaserFiche on XP with > 250,000 documents and it's lightning fast, works with TIFF files and PDF and probably more. Includes file and folder permissions.

    --
    moox. for a new generation.
  93. try iPhoto by docbrody · · Score: 1

    Step 1: Print out all 100 thousand docs and draw different little smiley faces on each of them.
    Step 2: Scan all your docs back in as jpegs.
    Step 3: Import all those jpegs into iPhoto and use "Faces" to magically organize them - just like on the television commercial.

  94. Don't forget to add WAN Acceleration to the mix by Anonymous Coward · · Score: 0

    No matter what system you use, it's still going to be slow. To overcome the slowness you will need something that makes the SMB/CIFS protocol less chatty. I would suggest:

    Cisco WAAS www.cisco.com/go/waas
    Riverbed www.riverbed.com

    As two great WAN acceleration products that will help you speed up document retrieval, access, and writes across the satellite link.

  95. Thunderstone by Darth+Cider · · Score: 1

    Check out Thunderstone. It's what they do, and they do it very well.

  96. The big guys use... by benow · · Score: 1

    Documentum, docushare, livelink, sharepoint. I've heard of documentum installs with 100m+ docs. It's quite good, but expensive.

  97. NetDocuments by bradvoy · · Score: 1

    Take a look at NetDocuments. It's a SaaS (Software as a Service) document management system. It handles millions of documents, can be accessed from anywhere, and is relatively inexpensive compared to maintaining your own servers.

  98. Alfresco (Open-Source) by Anonymous Coward · · Score: 0

    Look into Alfresco: http://www.alfresco.com/

  99. Garbage In Garbage Out by sexconker · · Score: 4, Informative

    It's becoming quite a mess, sometimes quite slow, and there is really no naming or numbering convention in place for the files and directories. We end up with mixed casing, all uppercase, all lowercase, dashes and ampersands in the file names, and there are literally hundreds of directories to sort through before you can find the document you are looking for.

    Slow. Upgrade your network and VPN. You know that VPN layer is just killing your performance.

    No naming or numbering convention. Get one.

    Mixed casing. Learn How to Properly Case Folders (and documents).

    Dashes and ampersands. Are they a problem? Aesthetically unpleasant? I personally restrict punctuation in a filesystem to dashes, periods, and parentheses (unless the punctuation is a replicable part of the name of the file/folder).

    Examples:
    01 - The First Track (vocal)
    02 - $lashhvertisements Attack!
    03 - Where Have All the A.C.'s Gone

    Develop your own method that works and be obsessed about it to the point where you would reburn a disc if one of the filenames was "01-Name" instead of "01 - Name".
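
    If you'd rather have a script be obsessive for you, here is a rough sketch. The pattern is only an assumption based on the examples above (a one- or two-digit number, a dash, then the name), and `normalize` is a made-up helper, so adapt both to whatever convention you actually pick:

```python
import re

# Assumed convention, taken from the examples above: number, " - ", then the name.
NUMBERED = re.compile(r"^(\d{1,2})\s*-\s*(.+)$")

def normalize(name: str) -> str:
    """Rewrite '1-Name' or '01 -Name' as '01 - Name'; leave non-matching names alone."""
    match = NUMBERED.match(name)
    if not match:
        return name
    number, title = match.groups()
    return f"{int(number):02d} - {title.strip()}"
```

    Names that already follow the convention come back unchanged, so it is safe to run over the same folder repeatedly.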

    Hundreds of directories.
    Each file should have its own folder.
    "That's insane!" you say. Start out with this mentality. If there is no reason at all to separate two files (they are part of the same thing) then place them in one folder, and make sure the folder is named all-encompassingly. Repeat for all files. If you get into an AB, BC, but not ABC situation, the solution is to have A and B and C, with A and C linking to B with your choice of shortcut/link/symlink/etc.
    Do this until all files are in folders. Then repeat with folders.

    There is NO substitute for organization and getting people on the same page. Develop some conventions. Task people to fix as they go. Check up to make sure people accessing documents are fixing as they go, and doing so according to convention. Once people are used to the convention, and once things are relatively organized, they won't ever need to search again. They'll instantly know where 99% of things are, and will be able to dig around and find anything else within seconds.

    The main problem you face is getting organized after already being unorganized. It isn't easy, but at least you're not dealing with millions of paper documents.

    1. Re:Garbage In Garbage Out by sexconker · · Score: 1

      By "replicable part of the name of the file/folder" I mean in regards to illegal characters in the filesystem/OS. Windows, for example, reserves < > : " / \ | ? * (dunno if it's Windows, NTFS, NTFS+FAT+whatever else Windows needs to support).
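
      A quick sketch of checking names against such a list. The set below is the one Microsoft documents as reserved in Windows file names; the helper name is invented, and this does not cover control characters or reserved device names like CON:

```python
# Characters Windows reserves in file and folder names.
WINDOWS_RESERVED = set('<>:"/\\|?*')

def illegal_characters(name: str) -> list[str]:
    """Return the reserved characters found in name, in order of first appearance."""
    found = []
    for ch in name:
        if ch in WINDOWS_RESERVED and ch not in found:
            found.append(ch)
    return found
```

      An empty result means the name is fine as far as this particular list goes; other platforms would need their own sets.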

      I didn't intend to do an example with /. references when I started. I wanted something showing the dollar sign, and then stuff with periods and a quote mark and a question mark (dropped). First stuff that came to mind. Had I planned it, or previewed my post, the first line would be "01 - The First Post (frosty mix)" or similar.

    2. Re:Garbage In Garbage Out by Anonymous Coward · · Score: 0

      Re " ... No naming or numbering convention. Get one. ..."

      Well, there; that certainly solves THAT problem.

      Re " ... There is NO substitute for organization and getting people on the same page. Develop some conventions ..."

      Google have certainly found a substitute. Come on, sexconker, the guy has a real problem and the few solutions that are around don't start with a huge - and impossible - purification task. (While freezing the document base while the purification is underway?)

    3. Re:Garbage In Garbage Out by Anonymous Coward · · Score: 0

      I think this is bad advice. Part of me wants to agree with it to keep things nice & orderly, but in the end I think it would slow things down a lot. You'll have to continually stop and ask yourself, "Where is the best place to put this and what should I call it exactly?". There will be so many ambiguous situations that it'll just be frustrating and time consuming - not to mention the upfront time spent on reorganizing everything.

      Just use something like a Google Search Appliance and get people to make a best effort organizing things in the future.

      Trying to get everything perfect is a pipe dream.

    4. Re:Garbage In Garbage Out by jrumney · · Score: 1

      Dashes and ampersands. Are they a problem? Aesthetically unpleasant? I personally restrict punctuation in a filesystem to dashes, periods, and parentheses (unless the punctuation is a replicable part of the name of the file/folder).

      Examples:
      01 - The First Track (vocal)
      02 - $lashhvertisements Attack!
      03 - Where Have All the A.C.'s Gone

      I'm not sure if you've done it deliberately, but all of your examples are a problem for cross-platform use. To answer the question, ampersands are always a problem, as they have special meaning in many contexts. Dashes are a problem only when they are the first character in a file name, where they can be misinterpreted as starting a list of options, and it isn't obvious how to make them be understood as a file name (quoting doesn't always work).
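
      One way to defuse both problems when a name has to go onto a POSIX shell command line, sketched in Python. The `safe_argument` helper is hypothetical and assumes relative file names; `shlex.quote` is the standard library's shell quoting function:

```python
import shlex

def safe_argument(filename: str) -> str:
    """Return filename in a form safe to paste into a POSIX shell command line."""
    # A leading dash would be parsed as an option; "./" makes it unambiguously a path.
    if filename.startswith("-"):
        filename = "./" + filename
    # shlex.quote wraps names containing spaces, '&', '$' and similar in single quotes.
    return shlex.quote(filename)
```

      Quoting handles the ampersand, and the "./" prefix handles the leading dash, which quoting alone does not fix because the program still sees an argument that starts with a dash.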

    5. Re:Garbage In Garbage Out by sexconker · · Score: 1

      Yes, it's intentional that I mixed a bunch of shit in.

      I seriously doubt these people are digging through all these folders in a gui and then feeding them to command line shit.

      Besides:
      command -input -file.ext
      command -input ./-file.ext
      ?

      My question about ampersands (and punctuation in general) was rhetorical. It means, "sort out your filesystem and OS restrictions and come up with a definitive super set of restrictions".

  100. Anonymous Coward by Anonymous Coward · · Score: 0

    Alfresco is likely your answer. We dumped the unbelievably expensive FileNet and jumped onboard with an Open Source solution. It can be done for free, but likely your company, like mine, would opt for paying a small license fee for support benefits. See http://www.alfresco.com/index-b1.html.

  101. A versioning system with check-in verification by Anonymous Coward · · Score: 0

    CVS, SVN, Git & friends with some sort of check-in verification scripts could provide what you're looking for. All of these can interact with your LDAP directory as well.

    A Tortoise client can provide Windows Explorer integration and simplify user operations, but a nice how-to with pictures could probably help you sell it.

  102. LLOL'd by Anonymous Coward · · Score: 0

    Literally LOL'd

  103. Just get OnBase, your own or have it hosted by Anonymous Coward · · Score: 0

    OnBase -> http://www.onbase.com/english/index.aspx

  104. DAM by Anonymous Coward · · Score: 0

    Digital Asset Management applications solve this problem; one of which is NetXposure (netx.net).

  105. How to make a vector space search engine in Perl by Anonymous Coward · · Score: 0

    The OP doesn't grok vector space. He should search for, "how to make a vector space search engine in 12 lines of Perl".

  106. file naming conventions and folders by PhantomHarlock · · Score: 1

    I use the 'job' system, which I learned from working at Digital Domain (the Visual Effects Company) and then passed it on to the Aerospace company where I now work.

    Effects companies deal with enormous amounts of data, and many different versions of a shot as well as all the elements that make up that shot, along with other data such as project settings files from software used in the making of that shot. They had a very specific file naming system to keep that all organized, and it was referred to as the job system, because first and foremost everything was logically separated by project.

    How that has translated for me into the Aerospace field is at the root of the main drive share, there are two primary folders, job and departments. Departments contains generic documents for each department such as forms, standards, etc.

    The 'job' folder contains several categories of jobs or projects, such as vehicles, engines, pumps, etc.

    Inside those are folders with the project name. Inside each project folder is a series of folders for different data types, such as solidworks, reports, proposals, documentation images, etc.

    File naming:

    File naming should be consistent, and I always start my own files with the date with year first, because I do not trust meta-data one single iota. I have had dates wiped out when a backup system kept a backup, but did not preserve the file creation / modify date on copy.

    After that it is the thing, then the version.

    So 09-06-10_widget_v01.sldprt

    version two should be exactly the same, with the number iterated up. There should never be a document named something_FINAL because you always end up with FINAL_FINAL_FINAL etc. :)
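
    A minimal Python sketch of that convention, assuming the date is two-digit year-month-day and the version is always two digits. make_name and next_version are hypothetical helpers, not anything Digital Domain used.

```python
import re
from datetime import date

# Assumed pattern for names like "09-06-10_widget_v01.sldprt":
# two-digit year-month-day, the thing, then a two-digit version.
NAME_RE = re.compile(r"^(\d{2}-\d{2}-\d{2})_(.+)_v(\d{2})(\.\w+)$")

def make_name(day, thing, version, ext):
    """Build a name such as 09-06-10_widget_v01.sldprt."""
    return "{}_{}_v{:02d}.{}".format(day.strftime("%y-%m-%d"), thing, version, ext)

def next_version(filename):
    """Return the same name with only the version number incremented."""
    m = NAME_RE.match(filename)
    if m is None:
        raise ValueError("name does not follow the convention: " + filename)
    stamp, thing, version, ext = m.groups()
    return "{}_{}_v{:02d}{}".format(stamp, thing, int(version) + 1, ext)

if __name__ == "__main__":
    first = make_name(date(2009, 6, 10), "widget", 1, "sldprt")
    print(first, "->", next_version(first))
```

    Anything that does not match the pattern (something_FINAL, for instance) raises an error instead of being silently accepted.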

    Now, as you probably know, the difficulty is enforcing a uniform standard when people are busy doing actual work. Things get sloppy, things get messy. You have to keep up after people, and policing stuff like this is not fun. At Digital Domain it was an urgent necessity for everyone to use the standard, and there was automated software that relied upon it. At the aerospace company, I gave up years ago trying to enforce a perfect policy. Now, people generally follow the example I set to a point where you can easily find things. When I first got to this company, all files were (seriously) piled into nearly a single folder. This was when the company was very small, but it was already a disaster and it was impossible to find anything. People were used to working on their own computer and did not have a concept of a shared file server, at least not in a modern sense.

    Now you can just swatch down the left pane in windows explorer and get what you want very quickly.

    This system is designed to use the left pane (lots of folders for organization) and people who were used to the Windows 3.1 way of double clicking through folders without the left pane had to change their (awful) habits. That was the biggest concession among the old school users.

    The trick is also not to over-do the nested folders. Just enough to keep it nice and tidy.

    Every once in a long while you run into a file that really wants to belong to several folders, and that's what shortcuts are for. Even if the shortcut gets broken you can look at the shortcut file to see what it originally pointed to, and you can probably find it that way.

    At home I use the same methodology to archive 30,000 photographs. I can find anything in an instant by expanding folder icons. When that fails, plain old windows search is able to turn up what I am looking for, in those rare instances.

    I have always been against anything that 'collects' your files into meta data, such as iTunes, or various photo editing programs. It's a big mess because one day that software won't be around and your files will be a mess.

    Even my MP3s are organized by genre/album/1.song.MP3. I just drag album folders or songs into Winamp and I am off and running as my own DJ. I don't use a media organize

    1. Re:file naming conventions and folders by PhantomHarlock · · Score: 1

      I should also add that this works for small to medium-sized companies. Large corporations in an enterprise environment must take on more intricate data management policies.

      Digital Domain had / has around 300 employees, and this company has fewer. DD at that size already had internal tools to manage and archive the files, and check for compliance with the structure.

      I also did not address the issue of compartmentalization and security for classified vs. non classified material. The government has its own IT security standards that you must adhere to when dealing with classified information.

    2. Re:file naming conventions and folders by smitty97 · · Score: 1

      The 'job' folder contains several categories of jobs or projects, such as vehicles, engines, pumps, etc.

      Inside those are folders with the project name. Inside each project folder is a series of folders for different data types, such as solidworks, reports, proposals, documentation images, etc.

      File naming:

      File naming should be consistent, and I always start my own files with the date with year first, because I do not trust meta-data one single iota. I have had dates wiped out when a backup system kept a backup, but did not preserve the file creation / modify date on copy.

      After that it is the thing, then the version.

      So 09-06-10_widget_v01.sldprt

      version two should be exactly the same, with the number iterated up. There should never be a document named something_FINAL because you always end up with FINAL_FINAL_FINAL etc. :)

      Because of the way SolidWorks looks for and uses referenced files, having the version number as part of the filename is BAD. How on earth do you update your assemblies when you've got to replace all the files you modify all the time? Either they don't get updated or you spend a lot of time with SW Explorer's Replace command.

      If you ever move to a PDM system (and you should, even the free Workgroup one that's part of SW Office Pro), it will treat foo_v01.sldprt and foo_v02.sldprt as different files altogether, not versions. In Workgroup PDM, the version is a custom property; in Enterprise PDM it's in SQL. We have "job folders" similar to yours, with Docs (MS Office, etc.), Drawings (all CAD), Photos, Correspondence, etc. We moved the SolidWorks stuff out to Workgroup PDM. It also takes care of write access issues when a few people are working on the same project: people can take "ownership" of a file they would like to change and check in a new version.

      Give it a shot: you can set up a workgroup vault on your local machine and play with the settings. After working with it for a while you'll wonder how you did without it.

      --
      mod me funny
  107. All CMS is crap, total crap by wsanders · · Score: 1

    Oh no, not another CMS.

    I've never seen a CMS that was anywhere near up to date.

    The only way to index more than a few dozen documents is to use Enterprise search.

    For the really cheap, you can install Google Desktop on the PC that holds the Enormous Shared Drive, and then let people log in via Remote Desktop or VNC and look stuff up. (Is there a Google Desktop API?)

    You eventually could have a lot of people making personal indexes of the Enormous Shared Drive with Google Desktop, which is going to cause problems that will motivate you to obtain a real enterprise search package.

    --
    Give a man a fish and you have fed him for today. Teach a man to fish, and he'll say "WHERE'S MY FISH, YOU IDIOT?"
    1. Re:All CMS is crap, total crap by Anonymous Coward · · Score: 0

      Shoot - if you were going to do something low end like that you would just load Windows Server 2008 and turn on the indexing component and let the client queries go directly to the indexer. This works sweet and gets rid of that silly Samba crap and solves the case problem immediately.

  108. Don't use anything from NextPage either... by Anonymous Coward · · Score: 0

    Oh man NextPage NXT sucks. Just stay away from it. Anything is better. It's consulting ware. You pay a ton of money for a mediocre product with mediocre support, and then a ton more money to pay their experts to set it all up and integrate it for you since it's so poorly documented.

    Their IIS plug-in also allowed unauthenticated users to shut down the NXT web site with a simple GET request. We accidentally shut down their support web site one Friday afternoon after trying a command that was listed in their own documentation on their support site from a web browser with no special access.

    Thank God I got a better job, so I don't ever have to work with that piece of crap ever again.

  109. Obviously by kitsunewarlock · · Score: 2, Funny

    Obviously throw them on the desktop. Once it fills, throw them into a New Folder. Once your desktop fills with Folders, throw those in My Documents. Repeat until your computer crashes.

    --
    Ginga no Rekshiya Mata Each page.
    1. Re:Obviously by Anonymous Coward · · Score: 0

      works for me, been using this method since win3.11 came out

  110. Find and Egrep by antirelic · · Score: 1

    Basic unix tools can do the trick. find (atime,ctime,etc) mixed with egrep, or just egrep with -R... all sorts of solutions, right at your command line.
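
    For anyone stuck without a Unix shell, here is a rough Python sketch of the same find-plus-egrep idea. find_grep is a hypothetical helper; it filters on modification time only (a simplification of find's atime/ctime options) and, like grep, is only useful on files that contain plain text.

```python
import os
import re
import time

def find_grep(root, pattern, max_age_days=None):
    """Yield (path, line_number, line) for lines matching a regex under root.

    If max_age_days is given, files modified longer ago than that are skipped.
    """
    regex = re.compile(pattern)
    cutoff = None if max_age_days is None else time.time() - max_age_days * 86400
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            try:
                if cutoff is not None and os.path.getmtime(path) < cutoff:
                    continue
                with open(path, "r", errors="ignore") as fh:
                    for number, line in enumerate(fh, 1):
                        if regex.search(line):
                            yield path, number, line.rstrip("\n")
            except OSError:
                continue  # unreadable file; skip it

if __name__ == "__main__":
    for hit in find_grep(".", r"widget", max_age_days=30):
        print(hit)
```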

    --
    20th century Marxism is not progress...
  111. Why can't you use a database? by damn_registrars · · Score: 1

    I'm pretty sure there are databases that can store and serve up documents based on criteria. Couldn't you set up a centralized web server with an SQL backend that hosts those files for you? You would be able to then keep track of who is using which document and when, and regulate who can do what with different documents as well. As a bonus you should be able to ditch SMB while you're at it and move to a more robust OS for your critical files. Centralizing those documents would also make it dramatically easier to back them up at regular intervals.

    --
    Damn_registrars has no butt-hole. Damn_registrars has no use for a butt-hole.
  112. Have you checked out.... by Anonymous Coward · · Score: 0

    Have you checked out IntraLinks? (www.intralinks.com)

  113. WAN Optimisation - Riverbed & Cisco WAAS by kava_kicks · · Score: 1

    This is not going to help you with your 'finding the right document' problem, but it is essential for your remote offices to be able to open (and save) those documents in a reasonable time. It will also have the added benefit of dramatically reducing your WAN traffic (think 50% reduction). When I initially trialled these, Riverbed was miles ahead of Cisco. That was 2 years ago, but they are still the only one with a remote client and a few other tricks. Well worth the investigation & money.

  114. SharePoint by rennerik · · Score: 1

    Yes, I know it's been mentioned before. Yes, I know it's Microsoft. But SharePoint is an excellent document management system. It supports clustering natively, load balancing, search, information rights management, web editing for most Office formats, and InfoPath web integration. Users can also save natively to SP via WebDAV through Office apps directly, or through Explorer. There's a whole crapload more that you may want to check out at the SP site.

    To get yourself organized and imported, there are .Net libraries available for you to natively access SP and manipulate the whole system via scripts. Importing and exporting files is a cinch using these APIs. There's also exposed web services via SOAP that let you do the same thing. And, in the end, there's the actual SQL backend that is very straight-forward so if you don't want to use the SOAP or SP .Net libraries, you can manipulate the database directly.

    So no, you are not locked in. And, the licensing cost is the most reasonable out of all the document management software out there.

  115. SharePoint by Anonymous Coward · · Score: 0

    Microsoft Office SharePoint Server 2007 with search capabilities would be a wonderful place to store all of this stuff.

  116. Document Controller by Anonymous Coward · · Score: 0

    Before you look for a technical solution, hire a Document Controller.

  117. Real Men Use by maz2331 · · Score: 1

    Real men use an old TI-99/4A machine with a cassette recorder, and files sent via RS-232 connections.

  118. Talk to a Large Lawfirm IT department by thinktech · · Score: 1

    Law firms are experts at managing millions of documents using document management software. If you want state-of-the-art document management, then the software that law firms use is what you're looking for.

    --
    What's up with this box everyone has to think inside of or outside of? Why does there have to be a box?
  119. Documentum bad by KhaymanUCSD · · Score: 1

    I'm on the IT Applications side of things, not operations so my experience with this has been more as a user than as an admin (though I've helped that group on a few things)...

    ...but we implemented Documentum and have found it to be slow, difficult to deal with and I've heard no end of horror stories about how hard it was to implement.

    In all honesty we had a properly set up sharepoint (tsk!) solution at another company and it pretty much ran itself and did the job we needed it to do. YMMV.

    --
    Kneel before Sig!
  120. Talk to a Library Systems person by Anonymous Coward · · Score: 0

    ask about digital repository

  121. Very simple setup by massons · · Score: 1

    Simple: use CVS. Documentation is centralized and decentralized. You have versioning, logs, comments, and on top of all this... it's free.

  122. Filing Cabinet by Anonymous Coward · · Score: 0

    I suggest you put documents in filing cabinets, lots of filing cabinets. You need a good indexing system, but the documents will be pretty safe then.

    Pretty easy really: people have been using this technology for hundreds of years, and it's pretty stable. You don't have to worry about magnetic fields wiping your drives, or dyes leaking out the edges of your DVDs/CDs, or file corruption, or power blackouts, or haxors getting in, or people deleting random stuff by accident.

  123. I've done this before. by Anonymous Coward · · Score: 1, Interesting

    I personally dealt with an issue like this at the Australian arm of a large international mining equipment manufacturer. I wrote the software solutions mentioned and went on to do my engineering honors project in the area. My first recommendation is: stay away from document management systems. They are bulky, inefficient and tend to lock you into "their way" of doing things. As soon as you want something different, you will find yourself stuck. This is a simple problem; don't make it too hard for yourself.

    My solution was multi-layered:
    1) Place exactly 1 person in charge.
    2) Enforce a naming convention. - Our CAD Drafters and Engineers (of which I did both) were notoriously bad at naming their documents correctly. Most of this was ignorance. Document your naming convention and make it well known.
    3) Write or come up with a standardized way of generating document numbers. In my current job as a software engineer I would recommend a simple, incremental numbered approach. Every document, every revision, simply gets a new number. Our engineers did not like this. So we went for a middle ground. Something like XXX-YYY-ZZ.eee Where XXX is the equipment type, YYY is the sub type, ZZ is the revision no, eee is the extension/file type.
    4) Standardize the way you store your documents. For instance, make a folder structure . C:\xxx\yyy\XXX-YYY-ZZ.eee
    5) Register ALL documents in a database with location, comments, purpose, revision, author name etc etc.
    6) Take the Draftsperson or the Engineer out of the archiving process. I wrote a utility that checks a single "to be archived" folder, fixes obvious mistakes such as using "_" or "." instead of "-" and so on, checks the database to make sure that the document has been registered and then drops the file into the file system. Make the archive read-only for everyone except the person in charge (and any utilities of course).
    7) Clean up your existing archive. This can be a semi-automated process. I wrote a utility to do this partially, but it just takes a lot of painstaking effort. With 70,000 documents this was a slow and painful process but it can be done.
    8) STICK TO IT. Any exception will erode the system over time making it useless.
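
    Steps 3, 4 and 6 could be sketched in Python roughly as follows. This is not the utility described above; normalize and archive_path are hypothetical helpers, and the exact shape of XXX, YYY and ZZ (three letters or digits, three letters or digits, two-digit revision) is an assumption. The database registration check from step 5 is left out.

```python
import re
from pathlib import PureWindowsPath

# Assumption: XXX and YYY are three letters or digits, ZZ is a two-digit
# revision, and eee is the file extension.
DOC_RE = re.compile(r"^([A-Z0-9]{3})-([A-Z0-9]{3})-(\d{2})\.([a-z0-9]+)$")

def normalize(name):
    """Fix "_" or "." used as separators; return XXX-YYY-ZZ.eee, or None."""
    stem, dot, ext = name.strip().rpartition(".")
    if not dot:
        return None  # no extension at all
    candidate = re.sub(r"[_.\s]+", "-", stem).upper() + "." + ext.lower()
    return candidate if DOC_RE.match(candidate) else None

def archive_path(name, root="C:\\"):
    """Map a valid document name to root/xxx/yyy/XXX-YYY-ZZ.eee (Windows style)."""
    m = DOC_RE.match(name)
    if m is None:
        raise ValueError("not a valid document number: " + name)
    return PureWindowsPath(root, m.group(1).lower(), m.group(2).lower(), name)

if __name__ == "__main__":
    fixed = normalize("pmp_012.03.DWG")
    print(fixed, "->", archive_path(fixed))
```

    Names that cannot be repaired come back as None, so they can be kicked back to the person in charge rather than guessed at.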

  124. Document Management Systems by anexkahn · · Score: 1

    There are a ton of document management systems out there. Our company uses http://www.opentext.com/ (look for DM). You can use Microsoft SharePoint as a document management system, but it is not really what it was designed for. DM will integrate with all the Microsoft applications. It will give you document numbers, version numbers, etc., and you can profile your emails as well if you want. We have had some performance problems for the remote locations, but it is still usable. I did a search for open source document management systems on Google and there are a ton out there if you don't feel like paying for something.

    --
    Curious about Storage and Virtualization? Check out
    1. Re:Document Management Systems by Shados · · Score: 1

      You can use Microsoft Share point as a document management system, but it is not really what it was designed for.

      Then please tell me what it was designed for, since a large portion of the default feature involve pure document management.

  125. Document Management System by Anonymous Coward · · Score: 0

    Look at a document management system. Interwoven makes a great one. Some things to consider:
    * Security
    * Version Control
    * Document History (Access, Changes, etc.)
    * Search Capability (Profile Search, Full Text Search, Date Search)

  126. Use Confluence by Dani+Filth · · Score: 1
  127. Microsoft Sharepoint handles documents well - by Anonymous Coward · · Score: 0

    Microsoft Sharepoint seems to handle lots of documents well. It includes document libraries, which are like folders, that you can store documents in. It also has a built-in search function, which is described as being able to search through multiple levels of documents and retrieve results. It's also not too expensive. I think there are some specific web parts, or plugins, that even help facilitate document storage and handling.

    The only downside that I can think of is that it requires knowledge of .Net as a framework, but that isn't so hard to learn - check it out, it might take you a long way!

  128. Oracle UCM by Everything+Else+Was · · Score: 1

    I've worked with Oracle UCM (formerly Stellent) for a few years now and would thoroughly recommend it. It's scalable into (at least) the 10s of billions of documents. A single repository for Doc Management, Records, Web Content Management, workflow, imaging. It comes with security, library services, metadata, and search OOTB. Using the WCM, you can make your documents available on an intranet, extranet or internet site, according to specified security policies.

    BTW... offices on satellites... that's so cool! ;-)

    --
    My other account has mod points!
    1. Re:Oracle UCM by profaneone · · Score: 1

      I totally agree (on the UCM and the satellites). My wife is the sole admin for her company's (a power utility) UCM; only a very small part of her responsibilities. My wife is a civil engineer not an EE or computer engineer and her department needed a document management solution years ago. Prior employees had evaluated and installed the system. The IS dept is only brought in when an upgrade is installed; the hardware is managed by IS after all. The system is so easy to use that additional departments keep putting in requests to have their documents added to the system due to word of mouth around the company. In addition to increased productivity, the company has saved hundreds of thousands of dollars in paper/printing.

  129. An Inhouse System by sasha328 · · Score: 1

    We're an old engineering company, and our products last decades, so we need to keep lots of records.
    Recently, we started scanning old documents (a warehouse full of them) to make room for expansion.
    It is a very tedious process, because we can't risk shredding the old files unless we know for sure that the scans are correct. Anyway, for storage, we decided to go for an in-house web-based system (someone developed it for us) that is quite basic, and does two important things for us:
    1- it references the file in its location, rather than storing the file in a database and copying it to the webserver
    2- it gives us the ability to change metadata (the document indexes) as we find errors in them

    Referencing a file in its "physical" location gives us two layers of access control: one through the database permissions, and the other through file system permissions. This is important for restricted files...

    Obviously, searching is the important part, and indexing is absolutely critical and the most time-consuming process.

    Someone suggested a Google appliance to us, but none of the scanned documents can be searched; they are all images.

    The actual application is a pretty basic concept (nice interface features, but the concept is simple):
    1- a database to hold the info
    2- a table per document type containing the metadata and the filename and filepath
    3- a web interface to search and re-search to narrow down the list.
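
    A minimal Python/SQLite sketch of that concept, minus the web interface. The table name, the columns and the helper functions are all made up for illustration; the real system's schema is not described here. The point is that only metadata and the file's location go into the database, never the file itself.

```python
import sqlite3

def open_index(db_path=":memory:"):
    """Create a table holding metadata plus the file's name and path."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS drawings (
               id INTEGER PRIMARY KEY,
               title TEXT NOT NULL,
               project TEXT,
               year INTEGER,
               filename TEXT NOT NULL,
               filepath TEXT NOT NULL
           )"""
    )
    return conn

def add_document(conn, title, project, year, filename, filepath):
    """Register one scanned file; the file itself stays where it is on disk."""
    conn.execute(
        "INSERT INTO drawings (title, project, year, filename, filepath) "
        "VALUES (?, ?, ?, ?, ?)",
        (title, project, year, filename, filepath),
    )
    conn.commit()

def search(conn, text=None, project=None):
    """Return (filepath, filename, title) rows; combine filters to narrow down."""
    sql = "SELECT filepath, filename, title FROM drawings WHERE 1=1"
    params = []
    if text is not None:
        sql += " AND title LIKE ?"
        params.append("%" + text + "%")
    if project is not None:
        sql += " AND project = ?"
        params.append(project)
    return conn.execute(sql + " ORDER BY id", params).fetchall()
```

    Searching by text first and then adding the project filter is the "search and re-search" step; correcting an index error is a plain UPDATE on the row.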

  130. Content Management Systems by BentonMiller · · Score: 1

    I'm sure it's been said by now, but you really should be looking at a content management system. There are several vendors out there that sell various types of document control systems; Pilgrim, Master Control, I'm sure Oracle has something that does that. There are also open source frameworks that you can develop in-house like Drupal. All of those are online document management systems. Users upload documents to them. File naming conventions can be enforced as well as directory structure etc. Many of them allow for document collaboration and approval. It's a complex problem, and a valuable solution will take some serious thought and time. I've heard some people use google documents, but for a company of your size I wouldn't recommend it. In any case, folders on network drives are NOT the answer.

  131. Two words... by roc97007 · · Score: 1

    Google appliance.

    --
    Oliver's law of assumed responsibility: If you're seen fixing it, you will be blamed for breaking it.
  132. Document Locator by Anonymous Coward · · Score: 0

    Check out http://www.columbiasoft.com

    document locator google it, this is the solution we use

    manages all files in a sql database

    organizes etc.

  133. Document Management Software by Anonymous Coward · · Score: 0

    There's a bunch of different document management solutions out there. I'm very unhappy with the one my company uses, so I'm not going to mention it, but if you do a search for document management on Google, I'm sure you'll come up with tons of stuff. There are probably open source solutions.

  134. We use ImageNow 6 by Nimey · · Score: 1

    I work at a midwestern public university in the USA, and we've been using this program for several years and a few versions. Backend can work on AIX, Linux, or Windows, and the frontend at least Windows (don't know if Macs or *nix are supported, we don't have many of those on users' desks). We probably have several gigs of imaged documents in this system, and it seems to work pretty well.

    You'll have to import all the documents into the system, of course. The company recommends certain tractor-feed scanners for this; lighter-duty ones are USB, heavier are SCSI. I think it also has a software printer emulator to let you dump e.g. Word documents into the system; how you organize things is up to you.

    --
    Hail Eris, full of mischief...

    E pluribus sanguinem
  135. It's all about Taxonomy and Metadata by TrekBody · · Score: 1

    Whatever the solution, you have to get staff to declare what it is on the front end. It's not all about the technology. I see some of the benefits of Sharepoint, but depending on your audience (tech-savvy or not) it may become a training issue. Prepare for change management.

    What I like about Sharepoint is the Office integration, the improvements over the last few years, document history (versions), and mostly, the ability to require metadata. If you have a taxonomy of topics, it will make it much easier to create a search appliance that can find what people are looking for. You may be forced to look at auto-classification if you can't get staff to do it, or hire knowledge managers (librarians) to properly catalogue. Trouble for us is getting to agreed-upon taxonomies and hierarchies across divisions (I'm in the knowledge management trenches here).

    A good way to start might be Sharepoint repositories, require a topic field, seed it with however many topics you can come up with, and leave an OTHER field so you can collect what you have not organized. If you analyze what comes into the OTHER topic, you may keep adding new topics.

    Find the logical buckets to start search before they think about searching too. Does your staff only care about 1 project at a time, break it up into project searches. Basically offer them one level of selection before they get to search - it may make things easier (if you are structured that way). They may look for something from a particular function - Marketing search vs. Operations search.

    Also, sharepoint can leverage active directory info, so you may be able to get some metadata automation (Docs from sales staff vs. R&D, etc.)

    Hope these points help. Contact me if you need more.

    --
    Jim - your name is Jim...
  136. Yet another document management system by Anonymous Coward · · Score: 0

    I work for a mid-tier medical company and we use Objective http://www.objective.com/.

    It has its limitations, but it indexes, searches and does version control. Oh, and the FDA know about it.

    No idea of the cost :)

  137. swish-e by ggpauly · · Score: 1

    I implemented swish-e, http://swish-e.org/ for a client with html and .pdf indexing (nightly) in 11 hours from a standing start (never used swish-e before).

    --
    Verbum caro factum est
  138. Alfresco by Anonymous Coward · · Score: 0

    Go with Alfresco... it can be a pain to set up, but it's a clustering champ.

  139. Go for the little guy by Anonymous Coward · · Score: 0

    A company with the guts to challenge the big guys, IBM and EMC, and usually wins: http://onbase.com

    Besides, their office has two slides. One for speed (metal) and one twisty (plastic, like a playground!).

    They also have a hosted version.

  140. A cool web application :D by zeekren · · Score: 1

    Hi there, I am one of the developers of this nice web tool, which in fact was designed to meet the requirements you describe. We are calling it anydata, but dunno if we'll need to change its name as it's a registered trademark; at least you see our goal ;)

    http://devel.anydata.tv/

    Try it out with Firefox if you don't want to see something ugly right now. It's a beta, but in less than 1 month you will see it complete. It looks like a file manager, a pretty well-known user interface for browsing documents and information. This system enables you to store files, bookmarks, text notes, contacts and soon PGP'ed passwords for secure sharing among system administrators.

    In short, it keeps the typical 'tree-browsing' schema of filesystems, and adds generating and showing previews of documents, tagging, automatic keyword gathering from documents and a search engine.

    By the way, it's GPL :D

    Anyone interested just send me an email to kenneth at gnun d-o-t net and I'll give you a testing user or whatever needed.


    Cheers!

    Kenneth

  141. The ultimate document management system by zerofoo · · Score: 1

    OK, so it is a bit hard to get your documents out once you put them in to this system, but man, does it tidy up a mess of documents.

    -ted

  142. Mediawiki with SemanticMediawiki / hire a li by Anonymous Coward · · Score: 0

    You could use mediawiki as a front end to your documents, possibly with the semantic mediawiki plugin.

    I'm serious! If all your documents have a URL, you can link to them from the wiki, and then build a comprehensive system of summaries, categorisation, and semantic data about the documents.

    But that's just one tool. There are many such tools. There's no magic bullet; you just need someone to organize all your data.. It sounds like you need something like a librarian, possibly you could hire one part time?

  143. figure out if there is metadata by Anonymous Coward · · Score: 0

    if there is decent metadata or the content is somehow indexable, you can try a digital asset management system, perhaps open source, to get some kind of organization and accessibility.

  144. Document Management System?? by bytethese · · Score: 1

    How about Desksite (formerly iManage) or PC Docs?

  145. Two paths by jocknerd · · Score: 1

    You could set up a Document Management System like Alfresco or god-forbid, Sharepoint. Or you could run OS X Server and let Spotlight index everything.

  146. Want it done right the first time? by Xadnem · · Score: 1

    I've got this car, and it doesn't run and it's got all these strange bits inside under this hood thingie. . . . Hire a librarian or someone with a degree in knowledge management who has experience in the corp world.

  147. First, you need a procedure, not a "Solution" by CAIMLAS · · Score: 1

    First, you're potentially dealing with more than one problem here you're trying to solve: slowness, and naming convention. I'm guessing they're somewhat related (large directory listings due to lack of organization), but there might be a deeper infrastructure issue that needs to be dealt with, too.

    As for organizing files, you need a naming convention for your project files, first and foremost. Throwing a bunch of disparate files at a CMS is going to do nothing but complicate things more (from a sane-management perspective).

    Data categorization is key. You need to figure out a way to organize it in a fashion which is both contextual to how people use it as well as how it relates to the other data (in, say, a project).

    For instance, you will want (at a minimum) the equivalent of user-level and group-level data shares. This would, in all likelihood, get kind of tricky with shifting working groups. For this there are multiple ways to use ACLs (as opposed to just user/group/all permissions) within Samba (with or without shackling the machine to a Windows domain/authentication server). ext3 and XFS both have the ability to use ACLs (XFS natively), last I checked. Ultimately, this would probably be better than just using user/group, as it would be more extensible.

    As for a Solution...

    Something to look into that is specific to Samba is the "veto files" directive for smb.conf. It is per-share. I am uncertain whether it supports regex (it didn't in early 2005 when I last used it); if it does, it could be very useful for enforcing a specific namespace (going forward).

    I would recommend "enforcing" namespace. While this is likely a self-created problem (ie you or your predecessor did not set things up properly in the first place), you really need to push to your users the importance of this. You need to tell them "organize your files, it'll make things faster" if there's any bitching.

    There was an article in LinuxMagazine a while ago about determining the age of data. Utilizing this in some sort of auto-sort script to move "old" data to a "pre$date" directory within the original messy directory might speed things up. Also, archiving (or at least moving it to an "old shit" directory) past, unused data is important. It eases the "human element" of data organization.
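
    A rough sketch of what such an auto-sort script might look like, in Python. This is not the method from the article; the function name, the one-level scan, and the use of modification time as the definition of "old" are all assumptions made for illustration:

```python
import shutil
import time
from pathlib import Path


def archive_old_files(directory, cutoff_label, max_age_days, now=None):
    """Move stale top-level files into directory/pre<cutoff_label>.

    "Stale" here means the modification time is more than max_age_days old.
    Returns the names of the files that were moved.
    """
    directory = Path(directory)
    now = time.time() if now is None else now
    threshold = now - max_age_days * 86400
    archive_dir = directory / f"pre{cutoff_label}"
    moved = []
    # sorted() takes a snapshot, so creating archive_dir mid-loop is safe.
    for entry in sorted(directory.iterdir()):
        if not entry.is_file():
            continue  # subdirectories (including the archive) are left alone
        if entry.stat().st_mtime < threshold:
            archive_dir.mkdir(exist_ok=True)
            shutil.move(str(entry), str(archive_dir / entry.name))
            moved.append(entry.name)
    return moved
```

    Run it against a copy of one messy directory first; on a share, modification times are not always a reliable sign of whether anyone still uses a file.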

    Projects should all have a reference number (because there is, in all certainty, hard paper associated with the projects, and sometimes you need to cross reference). Keeping this consistent is important. Use what works, keep it short/demarked so users don't avoid using them. I like each project folder to have the project number to relate to contract/etc. start (short) date (eg. 080112 for Jan 12th, '08) followed by a 2-3 digit number (depending on how many projects are started per day) followed by major revision. End result: something like "080112.01.a Jennings Construction" Or organize by client ID. Or something.
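
    To make that folder scheme concrete, here is a minimal sketch that builds such a name. The function and parameter names are hypothetical, and the two-digit sequence is just one of the options mentioned above:

```python
from datetime import date


def project_folder_name(start, sequence, revision, client):
    """Build a folder name like '080112.01.a Jennings Construction'.

    start    -- project start date (rendered as YYMMDD)
    sequence -- nth project started that day (two digits assumed)
    revision -- major revision letter
    client   -- human-readable client or project label
    """
    if not 1 <= sequence <= 99:
        raise ValueError("sequence must fit in two digits")
    return f"{start:%y%m%d}.{sequence:02d}.{revision} {client}"


print(project_folder_name(date(2008, 1, 12), 1, "a", "Jennings Construction"))
```

    Generating the name from a tool rather than typing it by hand is one way to keep it consistent across departments.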

    Requiring and/or encouraging project naming conventions through the managers (at the behest of your manager/CIO/whomever, or just pleading) might also be worth a try. One department out of 5 doing it would be better than none.

    IMO, once you've reached this step, you can consider putting it in a CMS to help perpetuate/encourage the organization. But remember that a CMS is not a panacea, and might even complicate things further (ie, instead of navigating to a file, -everyone- just searches the whole index, slowing things down further).

    --
    ~/ssh slashdot.org ssh: connect to host slashdot.org port 22: too many beers
  148. Alfresco by brassmaster · · Score: 1

    A document management system is a must for that many documents. Check out Alfresco. It's open source and as such isn't outrageously expensive like its competitors. If setup seems too daunting for you, check out tsgrp.com. Technology Services Group is a consulting firm in Chicago with experience working with Alfresco and may be able to make this transition easier for you.

  149. an idea by Anonymous Coward · · Score: 0

    Depending on what you need to store this might or might not be of help. Here where I work we use Atlassian's Confluence. We've created spaces for each team and then have pages for things like 'manuals', 'system documentation' or whatever and attach the files to those. The attachment can then be linked to the page.

  150. You need a Document Management System... by Derwood5555 · · Score: 1

    Or DMS. Commercial packages include Docs Open, and Soft Solutions.
    Open Source DMS = http://mydms.sourceforge.net/

  151. Manage Docs by Anonymous Coward · · Score: 0

    You could use Documentum. Not inexpensive. It can manage anywhere from 1,000s to 1,000,000,000 docs. Support for remote cache servers is available via several methods. Security, H/A, D/R, distributed docbases, and much more can address a very wide range of problems.

  152. An Aerospace company without process == FAIL by Platinumrat · · Score: 1

    It sounds like they're heading for an epic fail. Aerospace == Process + CMS. They will never survive the NTSB audits and safety Nazi without both. They will need to prove the Change trail for every nut/bolt/software path/data item/paper clip and who authorised/designed/checked/tested it for the rest of their natural lives. So if they don't have Process + CMS, they are screwed beyond belief. To me it sounds like a medium sized software house, that's decided to switch to Aerospace because it's cool or high tech or the marketing guys sold some product.

  153. This seems interesting: by diitante · · Score: 0

    Checkout http://www.dspace.org/ Cheers m

    --
    $ whatis msft msft: nothing appropriate
  154. Document/Content Management Systems by Anonymous Coward · · Score: 0

    I think you are in over your head with this issue. The short list of solutions will fail without clear backing from your executive management to provide "incentives" for users to help whatever new system you deploy be successful.

    The flexibility of your user community will be a huge factor in this solution. The rigor with which documents contain useful metadata and whether the documents are specifically organized or just stored anywhere in the current system(s) will be factors too.

    Here's the list of products that I'd start researching in order:
    - Xerox Docushare
    - Alfresco
    - MS-Sharepoint
    - Filenet ... something
    - EMC/Documentum

    There are other possibilities too, but if document versioning is required, be certain that capability is part of them at the start.

    The simplicity of Docushare is simply amazing when compared to **all the other solutions**. It is worth the first look for anyone dealing with CMS/DMS.
    I've deployed Docushare, Alfresco, Sharepoint, and Documentum. By far, users were happiest with Docushare and this was back in 2000. I can only imagine the progress that's been made since. It isn't the cheapest nor is it anywhere near the costs of the last two which usually require huge infrastructure and expensive per-user licenses.

    Sharepoint had so many issues that it was worthless as a DMS. Heck, searching didn't work. It does have some other interesting features that can be useful in an open, trusting environment, but these are not useful when record level security is a requirement. It's been about a year since I saw sharepoint. It appears cheaper than Docushare and could be a good fit for trivial needs.

    Like I said before, you are in over your head and need to hire a consultant to gather real requirements, learn your workflow, and help you select the best answer to trial in your environment. This isn't really the business that my company is in, but you can contact us at http://algoloma.com./ We aren't affiliated with any of the options listed above and these opinions are my own, not necessarily that of the company.

  155. I wrote a few articles about that by nbauman · · Score: 1
    I wrote a few articles about that for Law Office Computing magazine, so I'm very interested in these comments. It was a long time ago, and the software has changed, but the concepts are still the same.

    http://www.nasw.org/users/nbauman/txtsrch.htm

    http://www.nasw.org/users/nbauman/lawdb.htm

    http://www.nasw.org/users/nbauman/discover.htm

    They were imaging and indexing up to several million documents. During a civil suit, in discovery, companies on each side of the lawsuit have to disclose every relevant document to each other.

    Lawyers probably use the most flexible and all-encompassing systems, since they have to deal with every industry, every profession, everything. They also spend more money on their systems than most people can afford. They told me it costs them about $1 a page to thoroughly index big databases.

    Information scientists told me the best model of a document database was PubMed, which indexes virtually every significant published medical article. http://www.ncbi.nlm.nih.gov/sites/entrez?db=pubmed

    The big limitation of Google is that you can't search too well by date. Another limitation of text searches is that you can't search for concepts -- just words. Sometimes words (particularly names) match concepts very well, but if they don't, you've got a problem.

    Yeah, it would have been nice if you had set up coding and naming conventions at the beginning, so the original authors could have sorted them as you went along. It may be difficult or impossible to go back and re-code them after the fact. It could wind up costing $1 a document. OTOH, you could be lucky -- some industries have been using standardized filing schemes and standardized jargon since the days of slide rules and T-squares.

    There should be standard filing schemes and procedures throughout your industry, so your solutions may be industry-specific. There should be consultants that deal with your industry who would be happy to talk to you (for the prospect of maybe getting your business). There should be trade magazines in your industry that have covered the same issue for companies of your size. (Hell, if the price is right I'll write a roundup for them.) Or you might have a trade or professional association with some friendly people who have done it before. Trade and professional associations usually have a computer or information technology section, and if you're a member of the association, you can call up the members of the section.

  156. Or you could use the cloud by Anonymous Coward · · Score: 0

    You could have them hosted online, something like Google Docs.

  157. librarian by confused+one · · Score: 1

    everyone is talking about document management software and search appliances. You're going about it all wrong...

    Hire a document management staff.

    Librarians. Hot librarians.

  158. DAM and Extensis Portfolio + Filemaker by digitalcurator · · Score: 1

    Back in the '90s I helped create a media department for a large textbook publisher. One of the first projects was an asset library and tracking system. To keep this message brief: we first needed a naming convention. Look for a constant throughout your products; ours was ISBN numbers. That became the main identity of the product/project and its main digital folder. Every item or product was dropped in a subfolder such as images, design, text, etc. From there the main folders were always scanned by Portfolio, and it was told/programmed that the main descriptions should come from the folder names. This allowed anyone who knew the product ISBN to find details on the project. It also greatly minimized needless keyboarding of metadata onto the files. Portfolio also allows check-in and check-out (versioning) to stay abreast of any edits or updates. The whole metadata catalog would also be exported and brought into Filemaker for secondary backup. In short: look to a constant for your naming convention, keep it simple, look at ways to minimize keyboarding metadata, and go over the counter (those products are much easier to work with, you can experiment, and they are more than capable of handling 100K documents). Lastly, good luck, and if needed, look for help.

  159. Subversion by lars_boegild_thomsen · · Score: 1

    Why not use subversion? Files will be accessible using a subversion client (including log + history), as webdav (only current version) and through a standard browser (read-only).

  160. Document Locator from ColumbiaSoft by ASBands · · Score: 1

    The company I work for uses a system called Document Locator. It is a Windows-shell integrated document management system. Basically, imagine Subversion with extremely fine-grained control of repositories, folders and the like. It scales decently, too -- we have millions of documents spread across 25 major repositories, many of which include AutoCAD, Bentley Microstation, Smartplant 3D and other sizable files. The system is also fairly extensible, as we've built quite a few internal applications off of the DL system and there are plenty of third-party plug-ins available (a notable one being Brava, an application that allows adding QC and other markup to repository files). And if you don't want to be constrained to Windows, there is a web client available, which works decently. While it is not without its problems, the overall experience has been pretty good.

    Full disclosure: My company is ColumbiaSoft's largest customer and, as such, we know a good deal of the development team.

    --
    My UID is a prime number. Yeah, I planned that.
  161. Query Based Document Management Software by indytx · · Score: 1

    My last company relied on a program called isys to index and search documents and email. You don't have to worry about what a document is named, just the type of content you're looking for. This solution can save a lot of time, especially if your users are good at phrasing queries. On the other hand, I did not have to maintain it, so I have no idea how much administration time was devoted to keeping it working.

    --
    Make love, not reality television.
  162. I feel your pain by Anonymous Coward · · Score: 0

    I was on a team that implemented this for a very large aircraft engine company in the 90s (still cooking along today). I'll outline what we did (overkill for you, but the principles are the same and the techniques may well be borrowed).
    We had over 3 million drawings, mostly E size scanned at 200 DPI, in our spinning cache. Millions more were on optical jukebox (10 inch write-only platters). (That was when we were done; scanning was an early part of the project.)
      What we were moving to was not saving the drawings but the 3-D solid models. We standardized on one CAD/CAM software solution. Sometimes we used others for conversion and interfaces, but get one standard. We did have a long-standing drawing number and naming convention, but when we were looking for files we were now getting into new territory, and the naming was breaking down because people start by screwing around and then get discipline later (except they don't).
    We started by fixing what the process was rather than trying to fix the data first. We had a large network of Unix and Windows workstations and used AFS as the "official" file system where the final drawings were stored, because of its ability to create an abstract file structure with security and consistency. This would be your analogue to NAS. When files were issued, they became read-only and were stored under a serialized number, with the path in a database. (This is the data management system you don't have. I used to call it the bag and tag system: the programs come down to a database with a number of searchable fields and a pointer to a filename path with a unique serial number identifying that version of the file.) To get a copy of that file, you log on to the database and find what you want, and it copies that serialized file to your local path and renames it with the proper drawing/part number.
    We actually got into the drawing formats (the "frame" around the drawing where the drawing number goes) and turned that into all parameters. When you were going to save the files for sharing and adding to the formal process, the drawing format forced the proper drawing number and other official info (engineer, drafter, issued by, etc.); all were parameters. When you saved the file, it created the filename based on the drawing number and part.suffix. This took care of standard filenames (not paths yet). We then created a script (actually, I did) using Perl and Tcl/Tk that did all the legwork of a simple electronic sign-off system. We had access to the full electronic sign-off systems and found them too inflexible and in general a nightmare to use. Our engineers had good discipline WRT drawings and when to issue them, so our system made use of that with a few users in certain roles. An engineer could theoretically sign off on someone else's model, but no one would unless authorized. So we had several roles: when someone wanted to issue a drawing, they chose whom to notify by name and position, so they usually knew who needed it, and if that person was out with someone else covering, they could still pick it up and move it along without the bureaucracy.
    Once signed off, the files would be opened by script to get the metadata parameters from the drawing format, which were then put into the proper place.
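
    For anyone who wants to picture the "bag and tag" idea, here is a minimal sketch using Python's built-in sqlite3 module. It is not what we ran; the table layout, column names, and the /issued path prefix are assumptions made up for the example:

```python
import sqlite3


def open_catalog(path=":memory:"):
    """Open (or create) the catalog of issued files."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS issued ("
        " serial INTEGER PRIMARY KEY AUTOINCREMENT,"
        " drawing_number TEXT NOT NULL,"
        " revision TEXT NOT NULL,"
        " engineer TEXT,"
        " stored_path TEXT NOT NULL)"
    )
    return conn


def issue_file(conn, drawing_number, revision, engineer, store_root="/issued"):
    """Tag one issued version with a unique serial and record where it lives."""
    cur = conn.execute(
        "INSERT INTO issued (drawing_number, revision, engineer, stored_path)"
        " VALUES (?, ?, ?, '')",
        (drawing_number, revision, engineer),
    )
    serial = cur.lastrowid
    # The stored file is named after the serial, not the drawing number.
    stored_path = f"{store_root}/{serial:08d}"
    conn.execute(
        "UPDATE issued SET stored_path = ? WHERE serial = ?",
        (stored_path, serial),
    )
    conn.commit()
    return serial, stored_path


def find_versions(conn, drawing_number):
    """Look up every issued version of a drawing, oldest first."""
    rows = conn.execute(
        "SELECT serial, revision, stored_path FROM issued"
        " WHERE drawing_number = ? ORDER BY serial",
        (drawing_number,),
    )
    return rows.fetchall()
```

    The point of the design is that users search on the fields they know (drawing number, engineer) and never need to know the stored path; a checkout tool would copy the file at stored_path and rename it locally.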

    We also ran into a problem where even when a file was opened, looked at and closed, the file contents changed because metadata was automatically saved to show that the file was opened. We worked with the software company to stop that behavior. We then went to a POSIX checksum program to test if the checksum of the file on the system matched the checksum of the local file; no match means a change was made.
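
    The same check can be sketched in a few lines of Python. Note that this swaps in SHA-256 from the standard hashlib module rather than the POSIX checksum program we actually used, and the function names are made up for the example:

```python
import hashlib


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def has_changed(issued_path, local_path):
    """True when the local copy no longer matches the issued file."""
    return file_digest(issued_path) != file_digest(local_path)
```

    Storing the digest in the catalog at issue time would avoid re-reading the issued file on every comparison.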

    I'm way overdoing this, but I guess I'm saying get discipline by finding out where discipline already exists (there must be some somewhere) and hooking into that discipline by automating in software what is stored and where. Then start fixing what you have. Otherwise you're herding cats forever. Also, it starts to take care of itself because the hot stuff is getting used and revised, so order starts to come to old files because of the new discipline enforced in software.

  163. index them - other options by Anonymous Coward · · Score: 0

    ok, so everyone else said "google appliance" for a reason, but here's some solutions that do similar:

    Use Lucene to index your data ( by The Apache Foundation, so you know it's good ) - http://lucene.apache.org/java/docs/

    Use Droids ( to crawl your existing data ) - http://incubator.apache.org/droids/
    Use Tika ( to make your existing document formats into an index-able format - excel, word, powerpoint, gzip, bzip, zip, tar, mp3, xml, html, class, jar, odf, plain text, pdf, rtf - all supported by default. ) - http://lucene.apache.org/tika/
    use Solr ( high performance search server built using Lucene Java, with XML/HTTP and JSON/Python/Ruby APIs, hit highlighting, faceted search, caching, replication, and a web admin interface. ) - http://lucene.apache.org/solr/

    optionally you might also find Forrest useful. - "Apache Forrest™ software is a publishing framework that transforms input from various sources into a unified presentation in one or more output formats. " ( http://forrest.apache.org/ ) - it's designed to work with Solr and Droids. :-)

  164. Salesforce Content is another option by 0xbeefcake · · Score: 1

    One of your options is to use Salesforce Content, which is a very usable content & collaboration piece from salesforce.com. It's fully wired in to the rest of the force.com platform and CRM apps suite too, so if you're looking to build out more of your company's apps in the cloud, it's worth taking a look at it. http://www.salesforce.com/crm/marketing-automation/document-content-management/

  165. You're better off doing it yourself by stoicio · · Score: 1

    After looking at backup systems and maintaining libraries of data
    our company found that we needed something that fit our needs.
    We designed a system that worked and knuckled down to programming it.
    We now have a search-able database of documents and files with attributes
    as well as context from content for over 20 years of data and documents.
    We can pretty much find any file in less than 5 minutes.
    We could still make it better but we sure couldn't have done anything like
    it C.O.T.S., Google included.

    If Google failed tomorrow, where would your documents be then?

  166. Scripting by Anonymous Coward · · Score: 0

    If you're using some kind of *nix machine to host smb you could write a perl/bash script to find all files in/under a directory, parse the file name, make it all lower case, turn spaces to _, take whole words and add them to the meta data (for Mac OS X Spotlight or any indexing), and use an index server of some kind...
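
    Something along these lines, sketched in Python instead of perl/bash. The function names are invented for the example, and it only plans the renames rather than performing them, so you can review the list before touching a live share:

```python
import os
import re


def normalize_name(filename):
    """Lower-case a file name and turn runs of whitespace into '_'."""
    return re.sub(r"\s+", "_", filename.strip().lower())


def keywords(filename):
    """Whole words from the name (extension dropped), usable as index metadata."""
    stem = filename.rsplit(".", 1)[0] if "." in filename else filename
    return [word for word in re.split(r"[^a-z0-9]+", stem.lower()) if word]


def plan_renames(root):
    """Walk root and list (old_path, new_path) pairs without renaming anything."""
    plan = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            new_name = normalize_name(name)
            if new_name != name:
                plan.append(
                    (os.path.join(dirpath, name), os.path.join(dirpath, new_name))
                )
    return plan
```

    Before applying a plan, check it for collisions: two differently cased names in the same directory will normalize to the same target.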

  167. naming convention, DAM, archive by capsteve · · Score: 1

    establish a naming convention. come up with a few simple rules regarding:
    file names
    directory names
    customer names
    job/project names
    department names
    limit the number of total allowable characters in a file name, and publish and distribute your rules in an easy to follow cheatsheet. for example:
    all files for client "Smith Inc." reside in a directory named "SI"
    all files for Smith Inc for project "Widget X" reside in a subdirectory named "WX"
    all files for Smith Inc for project Widget X have a unique number generated by your accounting system
    all files generated by the sales department need to have "S" after the project number
    enforce using file name extensions for all file types
    so a powerpoint deck created by the sales department for a sales pitch to smith inc for Project X with an internal job number of 1234 would be named "SIWX1234_S.ppt".
    a well structured naming convention with simple but rigid rules will allow users to navigate a file system to find files and identify wrongly filed assets.
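
    the example rules above are simple enough to put in a small script. a sketch in python; the 32 character limit, the two-letter codes, and the function names are assumptions, not part of any real system:

```python
import re

MAX_NAME_LENGTH = 32  # assumed limit; pick whatever your rules say

# client code + project code + job number + optional department + extension
NAME_PATTERN = re.compile(r"[A-Z]{2}[A-Z]{2}\d+(_[A-Z])?\.[a-z0-9]+")


def asset_file_name(client_code, project_code, job_number, extension,
                    department_code=""):
    """Build a name such as 'SIWX1234_S.ppt' from its parts."""
    suffix = f"_{department_code}" if department_code else ""
    ext = extension.lstrip(".").lower()
    name = f"{client_code}{project_code}{job_number}{suffix}.{ext}"
    if len(name) > MAX_NAME_LENGTH:
        raise ValueError(f"{name!r} exceeds {MAX_NAME_LENGTH} characters")
    return name


def follows_convention(name):
    """True when a file name matches the pattern and the length limit."""
    return len(name) <= MAX_NAME_LENGTH and NAME_PATTERN.fullmatch(name) is not None


print(asset_file_name("SI", "WX", 1234, "ppt", "S"))
```

    a checker like follows_convention can be run over a share on a schedule to list the wrongly named files.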


    invest in a digital asset management system with a database backend.
    there are many DAM systems available both commercially and opensource.
    utilize one that has a web front end, so you can enforce consistency in end user experience (as opposed to a fat client). embed metadata into the files themselves in XML format thru the DAM if possible.


    based on the naming convention you've established and the DAM system you've deployed, you should be able to track when a file was created, modified, and last accessed. establish rules regarding when a file moves from disk to tape, and from online tape(in jukebox) to offline tape(out of jukebox), to cold storage(offsite).
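
    the archive rules can be as simple as a lookup on days since last access. a sketch in python; the thresholds below are placeholders invented for the example, not recommendations:

```python
from datetime import date

# (maximum idle days, tier) pairs, checked in order. Placeholder values.
TIER_RULES = [
    (90, "disk"),
    (365, "online tape"),
    (1825, "offline tape"),
]


def storage_tier(last_accessed, today):
    """Pick a storage tier from how long a file has gone unaccessed."""
    idle_days = (today - last_accessed).days
    for max_idle_days, tier in TIER_RULES:
        if idle_days <= max_idle_days:
            return tier
    return "cold storage"
```

    keeping the rules in one table makes it easy to publish them alongside the naming cheatsheet.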

    --
    three can keep a secret, if two are dead - benjamin franklin
  168. Bring out the Pitchforks and Rope by moxitek · · Score: 1

    I know that I'll probably get verbally lynched for saying this here, but MOSS 2007 enterprise search is a REALLY nice way of dealing with this. Since MOSS can index your file shares, all of your users can search for documents contextually using a simple web portal across multiple sites... I better leave before I'm hanging from the Slashdot tree.

  169. Use a Document Management System by Anonymous Coward · · Score: 0

    http://en.wikipedia.org/wiki/Document_management_system

    A DMS sounds like it is what you could use in this case.

  170. Simple is Best by Diagoras+of+Melos · · Score: 1

    I decommissioned a document management system at my client, a smallish law firm, because the system was too complicated, insecure, and expensive. Updating it to run w/ the latest version of MS-Office would have cost thousand$ just for the s/w. We replaced it with Google Search, and we defined a file hierarchy and naming convention for all documents created after the switchover. Client is very happy, their file access is more efficient, and they saved a bundle of money on administration, not to mention all the h/w and s/w they never bought.

    Obviously documents are the lifeblood of any law firm. These guys only have about 100,000 or so, less than the aerospace company in question, but the lesson applies. It's extremely unlikely the IT admin of the aerospace company has the resources to manage, much less install, a proprietary document management system.

    The ONLY reason to have a formal document management system with a database (like Microsoft SQL *ugh*) is to control access. But access control is something that really, really should be done through the directory. So unless you're NASA or another organization with many, many millions of documents and a legally mandated auditing requirement, there's no reason to make this more complicated than necessary. And even then....

    Of course, if we're talking about images with no searchable text, that's another story.

    --
    -- "The only thing that is ever new in the world is the history you do not know." -- Harry Truman
  171. Contract a librarian by Anonymous Coward · · Score: 1, Interesting

    I don't at all mean to be pat or facetious with such a short answer. But, seriously, you're asking the wrong crowd. Librarians have masters degrees in answering just the question you're asking and it goes far beyond just books. A couple of dozen hours of consulting contract with a good librarian can set you straight - whether you keep the samba store or you pony up for document management software. Because if you have a strategy for organizing your information and execute on it you will reap benefits that don't show up on any productivity spreadsheet. And a good librarian will tailor the system to how the people in your organization actually use the information. Get an internship program going with a library school to have someone remotely do the cleaning and maintenance every once in a while. Whole thing should be doable for a few grand.

  172. multiple points by mr100percent · · Score: 1

    You need to deal with this issue on multiple points

    1. Consider PDF with OCR. That way you can search within files for specific words
    2. File naming. Use a standard like date_headline.pdf
    3. Hire a library sciences major, as an earlier poster suggested. They spend years studying how to organize and retrieve.

  173. 8 digit index number by Anonymous Coward · · Score: 0

    Easy, just rename all the files with a 8 digit index number and provide an excel spreadsheet with the index number and a description of the file!

  174. Airbus... by Anonymous Coward · · Score: 0

    Airbus? Is that you?

  175. Use Permissions: User & Group: Company Structu by blavallee · · Score: 1

    Outside of shoring up your connectivity to the remote site, you should use the structure of your company to your advantage.

    It sounds like the wild west. You gave everyone full RW access to the fileserver.

    Build a file structure that mirrors the organization of the company and apply permissions appropriately.
    Map drives in the same fashion. An added advantage to this, you can split the files across separate Samba servers later with a minor map change.

    The finance department has no reason digging around in your design documents.
    The engineers don't have any reason to poke around in your sales collateral.
    Does everyone in the company need to be tempted to open "DOD_GPS_NOYB_47-090611.xls"?

    Getting every employee to adhere to a single naming convention is like herding cats. Delegate responsibility to the directors and managers to keep their areas on the server organized to their own needs. Then you just need to deal with the occasional outlaw.

    You may also want to deploy Samba servers to the local offices and back them up to a central server regularly. Use this for personal shares and anything that is primarily used ONLY in the local office.

    In most cases, I doubt that "the single person" working on Project X at Remote Site A needs to work off of a centralized copy of their document. Do you really need to share this document across your entire organization? Let the employee keep their file on the local office's share. Let an employee or a manager share it with the entire department. Let the director share it with sales.

    In the end, you may find one small part of the organization that REALLY needs a naming or numbering convention. You can address that when they approach you. For now, you need to stop everyone from treating the company share like their own desktop.

  176. There's technology for that by sribe · · Score: 1

    It's called a "database". You might want to look into it.

  177. Checkout Isys by Odyssey by LBook3 · · Score: 1

    As a PC user, I have found one of the best products to manage hundreds of thousands of documents (*.doc, *.txt, *.wpd, *.xls, *.ppt, and email, images, etc.) is Isys by Odyssey. It requires very little work on the part of the endusers. Just searching. For the IT person, it requires very little to be up-and-running. You can set up automatic indexing to run anytime, without restricting usage and searching. This can be done across all hard drives. I found this little company (and their software) about 15 years ago when I was still using DOS. They have, of course, developed their software to match all the Windows versions that have come out, and have Web versions also. I manage a huge library of both physical and digital documents - all that must be located within seconds. Without this software, I would not be able to perform this job in the high-level capacity that I currently do. Yes, Google is a great contender, but it has its limitations. Google desktop, for example, does not index all different types of software that the hundreds of users may have/use/need. I have found the Isys by Odyssey to not only be extremely fast, high quality, but they have great customer service, and their prices are reasonable. You can always start slow - with a low number of licenses, and work your way up, depending on the company's finances and needs. We have 2 licenses, where I work. I currently am the main end-user to the product, and people request documents or information from me, which I can find and email to them in an instant. It's worth the time to check them out. Their home web page is: http://www.isys-search.com/

  178. Use a content management system: e.g. IBM/FileNet by peterofoz · · Score: 1

    The content engines like IBM/FileNet are set up to manage millions of documents. Many also have the ability to add remote cache servers to improve local performance for repeat document access in satellite offices. Contact Dave at Softech-assoc.com if you need help.

  179. ECM by Anonymous Coward · · Score: 0

    KnowledgeTree or Alfresco. Open source and no charge.

  180. Two words by jevring · · Score: 1

    Search engine

    --
    Move sig!
  181. document management and remote office recall by Anonymous Coward · · Score: 0

    There are typically 2 approaches to this. One option for the files is to pull them into a document management tool like Enterprise Vault or Documentum, to name a couple. Those applications will help classify content and rein in administrative controls. As for enhancing the speed to remote offices, there are two options. Option A is something like Microsoft's DFS or the Andrew File System. These file systems spread data files to where they are needed. Some of the storage array vendors also have caching appliances or capabilities in their gear. In that case you'd have a smaller remote storage array that acts as a read-through cache to the central storage array where files are managed. But for CIFS traffic that gets pretty complicated because, at least with SMB 1.x, the connection is persistent. Option B for remote user performance enhancement might be to look at some packet-level de-duplication technology like Cisco WAAS or Riverbed. The WAAS device is really cool because it has a disk cache in it that holds frequently requested information (thus acting as a quasi file cache). The nice part about these things is you don't have to back them up or worry about managing the content.

  182. A document management system perhaps by Anonymous Coward · · Score: 0

    How about getting a real document management system, with version control, unique document numbers, and structured metadata?

    Kronodoc [www.kronodoc.com], Documentum, or something along those lines

  183. Suggestion - A proper Content Management System by NacMacFeegle · · Score: 2, Informative

    Some of the suggestions above say that you should just chuck everything haphazardly into a big pile and then use search engines to trawl the whole mess. I don't buy that. Instead, (like some others) I'd suggest a proper content management system such as the ones from http://www.alfresco.com/, http://www.interwoven.com/ or http://www.hummingbird.com/.

    The reason for this suggestion is that I know that these systems are being used by organisations which handle, as OP said, hundreds of thousands of documents and which have satellite offices (e.g. large multinational law firms). They provide several benefits, such as the possibility to structure projects, to have both project related documents and e-mails saved and indexed in the project folders, to search, and to keep proper document version chains (meaning that you can revert to older versions of documents if some klutz breaks a newer version).

    Of course, this means quite an investment, a learning curve for everyone at your company and, most likely, the hiring of an individual with experience of the chosen system.

  184. Users and Spotlight Server by namgge · · Score: 1

    Firstly, you can absolutely forget about any system that requires users to name documents in a way that is descriptive, consistent, unique or anything else that a sane person would do.

    Secondly, MacOS X Spotlight Server (as of version 10.5.7) doesn't work as one would expect/hope. Users' files stored on the server get indexed by the server but this index can only be read by users logged in to the server console (or via ssh), not clients that access the files by mounting them as shared volumes. If a client wishes to search the files, it must build its own index over the network. The workload on the server/network can cause severe performance issues until the clients have built their indexes, a process that will take hours and may take days to complete if you have a lot of files.

    Namgge

  185. Gina2 - web service by steve.decaux · · Score: 1

    Dark Green have just this week gone live with Gina2, a web solution for document archives.

    Have a look at http://www.gina2.net/ - the text is currently in German, but the English translation will be up there in the next couple of weeks.

    Dark Green are offering Gina2 as a hosted service for companies whose core business is not managing IT infrastructure.

  186. Two products by brentc3114 · · Score: 1

    I work for a company that stores terabytes of documents. There are two products that do this well: EMC's Documentum and Microsoft SharePoint. Pick your poison depending on whom you want to abuse you.

  187. 200,000 Resumes by Gob+Gob · · Score: 1

    I've written a recruitment app that has 200k resumes and other types of folder indexed in text.

    The files live on the disk in /TYPE/YEAR/MONTH and are converted to text and inserted into a MySQL database.

    They can be searched on name, date, record id, free text, type, etc, etc; or just browsed to on disk.

    The front end is PHP on MySQL.

    These were imported from a files on disk approach.

    It can scale with master slave replication, etc. Just keeping it simple helps.
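
    A rough sketch of that layout, for illustration only. The original is PHP on MySQL; this uses Python with the standard-library SQLite module as a stand-in, and the table, column, and function names are all made up for the example. It assumes the documents have already been converted to plain text.

```python
from __future__ import annotations

import sqlite3
from datetime import date
from pathlib import PurePosixPath


def storage_path(doc_type: str, received: date, filename: str) -> PurePosixPath:
    """Build the /TYPE/YEAR/MONTH location for a file."""
    return PurePosixPath(
        "/", doc_type.upper(), f"{received.year:04d}", f"{received.month:02d}", filename
    )


def open_index(db_path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the text index. SQLite stands in for MySQL here."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS documents ("
        " id INTEGER PRIMARY KEY,"
        " doc_type TEXT NOT NULL,"
        " received TEXT NOT NULL,"
        " path TEXT NOT NULL UNIQUE,"
        " body TEXT NOT NULL)"
    )
    return conn


def add_document(
    conn: sqlite3.Connection, doc_type: str, received: date, filename: str, text: str
) -> str:
    """Record where the file lives on disk alongside its extracted text."""
    path = str(storage_path(doc_type, received, filename))
    conn.execute(
        "INSERT INTO documents (doc_type, received, path, body) VALUES (?, ?, ?, ?)",
        (doc_type.upper(), received.isoformat(), path, text),
    )
    conn.commit()
    return path


def search(conn: sqlite3.Connection, phrase: str, doc_type: str | None = None) -> list[str]:
    """Naive free-text search (substring match), optionally narrowed by type."""
    sql = "SELECT path FROM documents WHERE body LIKE ?"
    params = [f"%{phrase}%"]
    if doc_type is not None:
        sql += " AND doc_type = ?"
        params.append(doc_type.upper())
    sql += " ORDER BY path"
    return [row[0] for row in conn.execute(sql, params)]
```

    A substring match like this is the simplest thing that works; at a few hundred thousand documents a real full-text index would be the obvious next step.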

  188. Google search appliance by Nefarious+Wheel · · Score: 1

    Go to Google's main page and look for business solutions. They have a scheme where they'll charge you x dollars to index y hundred thousand documents, and they throw in the tinware (a custom pre-configured rack of search hardware, very scalable) for you to plug into your LAN. All strictly inside your firewall. Set it up to crawl all your file shares and it won't matter whether you have a document management system or not. Most document management systems depend on keywords, taxonomies and special file name codes, all of which are decidedly old-hat. Index it and let 'em go search. The smallest version is kind of basic, but go up one level and they'll crawl PDFs, Word docs, pretty much anything with text in it, compressed or in source libraries or whatnot. They're pretty good. Not cheap, but then you're an aerospace firm...

    --
    Do not mock my vision of impractical footwear
  189. Google? No. CMS! by Elixon · · Score: 1

    Google is just a search engine. They need document management. :-) Correct me if I'm wrong, but isn't content management the thing they need?

    Import it into some CMS, sort it and make it available through a password-protected website. We did something like this for http://www.olympus-ims.com/ (but these are public documents). It contains thousands of documents (in a dozen languages); together with all the document revisions it comes to over a hundred thousand documents. Easy to search, easy to navigate, easy to manage.

    Simply: CMS is what you need. Do research.

    --
    Well, I've got to get back to work. When I stop rowing, the slave ship just goes in circles.
    1. Re:Google? No. CMS! by kdekorte · · Score: 1

      Correct!

      Tools that should fit the need include FileNet from IBM, and Documentum from EMC. I'm sure there are others, but I'm familiar with both of them.

      I've never really seen a good open source tool that does this.

      Document Management tools allow organizing, searching, tagging, access control and filesystem or web based access. And 100,000 documents is nothing for one of those systems.

    2. Re:Google? No. CMS! by Lord+Apathy · · Score: 1

      Wrong! Leave Documentum out of that list. Documentum is a piece of shit. You would be far better off piling all your documents in one directory and searching them using grep. Or, even better, printing them all out and tossing them around the office.

      I don't have a real solution for your question, I'm looking into this myself. But I know Documentum is not what you are looking for. Using Documentum is like using a CA product for, well anything.

      --

      Supporting World Peace Through Nuclear Pacification

  190. Convert your docs to MediaWiki by Anonymous Coward · · Score: 0

    *not knowing what format your docs are written in*
    - Write a script or scripts (Python, Perl, ... whatever) which go through the docs, convert them and upload them to a MediaWiki installation (http://www.mediawiki.org/wiki/API).
    - Categorize your docs based upon the directory names.
    - Teach your people how to write their documents in MediaWiki syntax.
    - Everything is web based, which makes less overhead on the network for remote offices, simplifies management.
    - MediaWiki is a controlled document system, with detailed history.
    - MediaWiki is FREE and has Wikipedia (and more) as a reference.
    - Check http://www.smetj.net/wiki/wikiinject for some ideas ...
    - http://openwetware.org/wiki/Converting_documents_to_mediawiki_markup
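
    A small sketch of the "categorize based upon the directory names" step, assuming the document body has already been converted to wiki markup by some other tool. The function names are invented for the example, and the upload through the MediaWiki API is deliberately left out.

```python
from __future__ import annotations

import re
from pathlib import PurePosixPath


def tidy(name: str) -> str:
    """Turn 'SPAR_&_rib-LAYOUT' into 'Spar Rib Layout'."""
    words = re.sub(r"[^0-9A-Za-z]+", " ", name).split()
    return " ".join(word.capitalize() for word in words)


def page_title(relative_path: str) -> str:
    """Derive a wiki page title from the file name, without its extension."""
    return tidy(PurePosixPath(relative_path).stem)


def categories(relative_path: str) -> list[str]:
    """One category per directory level above the file."""
    directories = PurePosixPath(relative_path).parent.parts
    return [tidy(d) for d in directories if tidy(d)]


def to_wikitext(relative_path: str, body: str) -> str:
    """Append [[Category:...]] tags, one per directory, to converted text."""
    tags = "\n".join(f"[[Category:{c}]]" for c in categories(relative_path))
    if not tags:
        return f"{body.rstrip()}\n"
    return f"{body.rstrip()}\n\n{tags}\n"
```

    Note that this normalisation flattens acronyms too ("NASA" becomes "Nasa"), so a real script would probably want a list of exceptions.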

  191. Check with the NTTC by JSC · · Score: 1

    Several years ago I worked for a NASA project called the National Technology Transfer Center. A big part of the job there is organizing and searching through tens of thousands of pages of research documents. They used a document oriented database at the time although they may have migrated to something else since then. You might want to contact them for advice.

    A friend of mine was the person primarily responsible for scanning in the documents. IIRC, the process involved OCR of the scans for key word search and indexing and then storing a compressible graphic image of the page - this got them around the problem of text databases not storing technical drawings, etc.

    --
    Time's fun when you're having flies. - Kermit the Frog
  192. DMXchange/DMVault by Anonymous Coward · · Score: 0

    Since you mention you are an aerospace company - are you managing CAD documents? Those can be a headache because of all the dependencies - Assembly, Parts, Drawings. CAD data can also have some very funky naming conventions - especially the older systems like CATIA V4 and CADDS 5.

    We have developed a distributed doc mgmt/vaulting system based on Open Source technologies (Apache, MySQL, Perl, etc...) called DMXchange - that we currently market and sell as a product with services. All of the source code is included and open.

    Given that you are looking to access the documents from many sites which are connected over a WAN - most of the client/server based approaches will not work very well. For more info see www.dmforge.com

  193. Asking for pain by jandersen · · Score: 1

    Samba shared over a VPN? Man, you are asking for no end of painful trouble. There are many good ways of sharing docs, but putting MS docs in a filesystem shared over a VPN is not one of them. A simple way to improve things would be to drop all the filesystem sharing and create some sort of searchable index on a web server. If you want more sophistication and have money to burn (who hasn't these days?), go and talk to Oracle, they have some very good software for this very purpose.

    I don't know why companies always do it this way - it is the worst possible way of organizing your documents. When you put them in a filesystem, people have to try to remember how to find the one they need; a directory is like a hierarchical database, badly implemented. Sharing it via a networked filesystem makes it even worse, because now you have a huge network overhead and the risk of undetectable corruption when the network stumbles. And the VPN means that your network traffic is something like 10 times as heavy because of the encryption.

  194. I know you are going to laugh by hesaigo999ca · · Score: 1

    The latest installment of Visual SourceSafe is pretty good. They improved the performance over the network, which used to be a killer on a domain spread across multiple cities (back in the VB6 days), and it is now a really good repository tool. I also used another one, but it lacked the history/detail section and could only keep a limited number of files, which is no good seeing as you have hundreds of thousands.

  195. Alfresco seems right by Anonymous Coward · · Score: 0

    for what you need I suggest Alfresco. It has indexing, and publishing options (CIFS, WebDAV, etc). And it is extensible.

    We are implementing it right now in our company.

  196. Smeadsoft by aapold · · Score: 1

    Smeadsoft might work for you. -- note: I don't work for them or any affiliate of theirs, and have no vested interest in them being used --

    I'm in the process of setting up one of their systems for document management, and it seems to be quite capable of that. It's not open source and it would involve some cash to set it up, but I think it's worth looking into if those two things don't eliminate it from consideration. (They also handle management of physical files, which is where they came from.) Thus far setup involves building a lot of framework and tags for the actual documents, and scanning a lot of physical files to be stored. There is this system of using large scanners with something called VRS, and putting barcode identifier sheets with stacks of documents.

    So for example you could have a large stack of papers, of which half belong to one category (or subcat or subsubetc), the others to a second. You put a barcode sheet (a blank paper save for one barcode) for the first category, then all those papers, then a barcode sheet for the next category, and so on. You load them into the scanner (obviously a high capacity one) and it reads them all and puts the scanned documents into the proper location in the database automatically.

    --
    "Waste not one watt!" - CZ
  197. ONBASE by Anonymous Coward · · Score: 0

    Onbase from Hyland

  198. Worldox by Anonymous Coward · · Score: 0

    I would go with Worldox. It allows remote branches to search documents across a WAN and provides security too. It does not use a SQL database (i.e. no expensive licensing).

  199. Content Addressable Storage by hicksw · · Score: 1

    Don't try to use the file name or directory structure. This is difficult to adapt or relocate as the namespace becomes distorted from its original content over time.

    Try this instead:

    Assign arbitrary file names.
    Adopt a directory structure derivable from those names, if you must.
    Build a database of several tables to link keywords, project names, authors, etc, to the arbitrary file names.
    Award small prizes for verified corrections to the database.

    See http://en.wikipedia.org/wiki/Content-addressable_storage for more information.
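
    One way to read the steps above, sketched in Python. Using a SHA-256 digest of the contents as the "arbitrary" name is an assumption taken from the linked article, not something the steps require; the two-level directory scheme, the table layout, and every function name are likewise just illustrative.

```python
from __future__ import annotations

import hashlib
import sqlite3
from pathlib import Path


def content_name(data: bytes) -> str:
    """The file's name is the SHA-256 digest of its contents."""
    return hashlib.sha256(data).hexdigest()


def derived_path(root: Path, name: str) -> Path:
    """Directory structure derivable from the name: root/ab/cd/abcd..."""
    return root / name[:2] / name[2:4] / name


def store(root: Path, data: bytes) -> str:
    """Write the bytes under their content-derived name and return that name."""
    name = content_name(data)
    target = derived_path(root, name)
    target.parent.mkdir(parents=True, exist_ok=True)
    if not target.exists():  # identical content is stored only once
        target.write_bytes(data)
    return name


def open_catalog(db_path: str = ":memory:") -> sqlite3.Connection:
    """Tables linking human-meaningful metadata to the stored names."""
    conn = sqlite3.connect(db_path)
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS files (
            name TEXT PRIMARY KEY,
            original_name TEXT,
            author TEXT,
            project TEXT
        );
        CREATE TABLE IF NOT EXISTS keywords (
            name TEXT NOT NULL REFERENCES files(name),
            keyword TEXT NOT NULL,
            PRIMARY KEY (name, keyword)
        );
        """
    )
    return conn


def catalog(conn, name, original_name, author, project, keywords) -> None:
    """Attach metadata and keywords to a stored file."""
    conn.execute(
        "INSERT OR REPLACE INTO files VALUES (?, ?, ?, ?)",
        (name, original_name, author, project),
    )
    conn.executemany(
        "INSERT OR IGNORE INTO keywords VALUES (?, ?)",
        [(name, keyword.lower()) for keyword in keywords],
    )
    conn.commit()


def find_by_keyword(conn, keyword: str) -> list[str]:
    """Return the stored names of all files tagged with the keyword."""
    rows = conn.execute(
        "SELECT name FROM keywords WHERE keyword = ? ORDER BY name",
        (keyword.lower(),),
    )
    return [row[0] for row in rows]
```

    Because the messy original file name is only a column in the catalog, it can be corrected later without moving anything on disk.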

  200. From someone who has been down this road by CompMD · · Score: 1

    Teamcenter. It freaking rules. Also, as evil as StarTeam is, it will do the job for you as well.

    I have been a user/admin of both Teamcenter and StarTeam.

  201. Hire a librarian! by Anonymous Coward · · Score: 0

    Hire a librarian. Seriously. Get someone in there with a degree in library science, and let them do their thing.

    Organizing a large collection of related documents would be right up their alley...

  202. Help with documents and files by Anonymous Coward · · Score: 0

    Look at www.Blinkedm.com
    It's easy to use, offers a lot of features and is far less expensive than a lot of the products out there.

  203. EMC Documentum by mu51c10rd · · Score: 1

    We use EMC's Documentum suite here to manage our large volumes of documents. Expensive, but works great...and integrates with Fax software, MS Exchange, etc.

  204. CamelCase by dna_(c)(tm)(r) · · Score: 1

    Similar to CamelCase. Limits the number of variations on the same name considerably (no: camelcase, Camelcase, Camel case, Cam El Case,...)

    Reminds me of the command 'passwd' in *nix, I always have to 'apropos password' to find the correct spelling. Why is it not 'password' or 'psswrd'? Arbitrarily dropping 1 vowel and 1 consonant is silly.

  205. Content Management by wuglas · · Score: 1

    What you need is a content management system. Such systems do more than store and find documents. They allow true document taxonomy management, records management for compliance and control, and many other features.
    I personally specialize in IBM Content Manager. It's great for companies like yours where you have distributed offices. You can keep your metadata at one central location but have the documents themselves stored at your remote locations, all while maintaining centralized control.
    Doug Hansknecht
    Certified IT Architect
    DougFromOhio@us.ibm.com

  206. three words... by Anonymous Coward · · Score: 0

    ...lotus notes teamrooms

  207. Universal online document viewer by crisgrey · · Score: 1

    To help you with the challenge of sharing documents with your remote sites, there are universal web-based document viewers on the market that you can use to embed document viewing capability into your intranet or web site. The documents can be of different file formats too, they don't all need to be PDF. Some options use Adobe Flash, so a plug-in needs to be downloaded by the end user, but other options do not. Adeptol and Vuzit are two examples, but if you search for "online document viewer" in Google you'll find a number of options.

  208. DMS vs. Repository by oneiros27 · · Score: 2

    I'm surprised that there were quite a few programs not mentioned on the DMS Wikipedia page -- People might consider them to be more repository software than DMS (or RMS), but some other ones to mention that would be useful for managing already existing documents:

    And if you're looking for librarians with an IT background, in the libraries they're called "Systems Librarians". You might also check out the oss4lib and code4lib communities.

    --
    Build it, and they will come^Hplain.
  209. Organization and Procedure by Edrick · · Score: 1

    It seems that the first response to a request like this is to suggest new technologies and programs to solve the problem. It sounds, though, like 95% of the problem is that there are no procedures and organization in place already so that files have a purpose or place to go. A good file storage policy with the appropriate instructions sent to the users could just as easily make this work going forward. I've seen collections of millions of files that were perfectly fine as they were organized by user, purpose, source, destination, etc...and then subdivided as needed...and users knew what the organization was and how to maintain it (to their own benefit as it means they can find their own stuff). You can also institute a more structured system where organization is already there for them to use, but it's your call. ALWAYS ALWAYS ALWAYS figure out how you want things to be organized first! What are the functions of these files, why are they saved, who created them, who accesses them? This will make the job of sorting the mess out easier.

  210. You forgot the top hierarchy level by linear+a · · Score: 1

    Species.

  211. WAN Optimization by Anonymous Coward · · Score: 0

    In terms of making the WAN experience less painful, you need to get some WAN optimization appliances in your network:

    http://en.wikipedia.org/wiki/WAN_optimization
    http://www.riverbed.com/

  212. solr by Anonymous Coward · · Score: 0

    I've found solr (http://lucene.apache.org/solr/) super easy to install and very effective.

  213. Solr by Anonymous Coward · · Score: 0

    I can't believe no one has suggested Solr yet! It's probably the most flexible and mature search product available and it's free (part of the Apache project)! Set up a solr server and hire a student or three to meta-tag your documents.

  214. IntraLinks or similar? by cloud0909 · · Score: 1

    Related question, has anyone used, or would recommend using IntraLinks to help manage a similar scenario?

  215. HIRE A REFERENCE LIBRARIAN by Anonymous Coward · · Score: 0

    HIRE A REFERENCE LIBRARIAN! Seriously. You have all sorts of ad-hoc suggestions here, none of which addresses the core issue: You have a metric shitload of written, unorganized data. There is a category of professionals who specialize in organizing, cataloging, abstracting and making written data available in easily-usable formats. Reference librarians. They even use IT extensively. Check ala.org for more.

    (Besides, some real-life reference librarians are hawt - just not where you live.)

  216. Knowledge Tree by Anonymous Coward · · Score: 0

    Use Knowledge Tree. Download a free version from http://www.knowledgetree.com/

  217. Then there's the obvious solution by Anonymous Coward · · Score: 0

    Here's a nickel, kid. Get yourself a real computer.

    The simple solution is to put them all into a GOOD filesystem that keeps journaling, metadata, and file indexing services, and simply have the person search the Index for the file they want. Then you don't have to deal with creating hierarchies.

  218. Lotus Notes by dogugotw · · Score: 1

    If you don't want to go the google appliance route, Notes works great, is cheap to set up, and simple to administer.
    One db.
    One form with a couple of fields
    One view
    Render to the web
    Write a simple agent that crawls your directory structure, snags the files and attaches each one to a Notes doc. Stuff in the directory/file name if you care.
    Let Notes build an index (and it can index damn near any file).
    Poof - done.
    Remove user's rights to leave crap in file directories and make 'em put new stuff into Notes and you have something that's maintainable without a ton of work.
    If you then want to get fancy, you can make users enter some meta data before they can save new docs.
    You can set up access control, etc, etc, etc.
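
    The crawling step is not Notes-specific. A hedged sketch of it in plain Python (not a Notes agent, and the `FileRecord` type and `crawl` function are made-up names) that walks a share and emits one record per file, carrying the directory and file name along so they can be stored as fields:

```python
from __future__ import annotations

import os
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class FileRecord:
    directory: str  # directory relative to the crawl root, "." for the root itself
    filename: str
    size_bytes: int


def crawl(root: Path) -> list[FileRecord]:
    """Walk the tree under root and return one record per file, in a stable order."""
    records = []
    for current, dirnames, filenames in os.walk(root):
        dirnames.sort()  # make os.walk visit subdirectories in sorted order
        relative = Path(current).relative_to(root).as_posix()
        for filename in sorted(filenames):
            size = (Path(current) / filename).stat().st_size
            records.append(FileRecord(relative, filename, size))
    return records
```

    Each record would then become one document in whatever store you pick, with the file itself as the attachment.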

    Documentum costs about a quarter mil just to get it in the door and a boat load of cash to make it useful. (at least it did in the late '90s).
    Notes server license a couple grand. If you need user authentication, it's around $150/client (ask your rep for prices because IBM is working tons of price schemes). If you don't need authentication, all you need is the server license.

  219. Re:SharePoint? Doesn't scale by slashqwerty · · Score: 1

    Where I work we wrote our own Document Management System that integrates with the rest of our systems. The integration has proven quite beneficial. Off-the-shelf systems can integrate but it generally doesn't work very well. Anyway, we were looking at using SharePoint as our back-end to get the indexing support and improved versioning. What we discovered is that SharePoint just doesn't scale very well. When you get into the hundreds of thousands of documents it has problems. When you get into the tens of millions it has major problems.

    Given that the submitter already needs to file 500,000 documents I question if SharePoint is feasible.

  220. google, really? by Anonymous Coward · · Score: 0

    i'm surprised so many folks here would jump to a goog appliance conclusion here. it's one of many search-only answers.

    part of the problem is text search. the other part is how why and when the docs ended up in these directories in the first place. a file directory alone does not a solution make. you need something that can hold and search your current directory based docs, but get you past this obviously inappropriate way of doing things.

    merge blog (journal what's going on and why one might care about a document) and wiki (stable organization of docs over time) and better-than-google search - and you have a solution. www.tractionsoftware.com has an approach for this. there may be others that satisfy as well.

  221. MOSS 2007 | EMC2 Documentum by Anonymous Coward · · Score: 0

    I think that Microsoft Office SharePoint Server 2007 is what you are looking for.
    (http://sharepoint.microsoft.com/Pages/Default.aspx)

    There is also a more robust application called Documentum by EMC2.
    (http://www.emc.com/products/category/subcategory/collaboration-and-document-management.htm)

    If you need a consultancy service to help you, please contact us: www.iteris.com.br

    Good luck!

  222. Office Evolve by DocumentGuy · · Score: 1

    Consider Office Evolve by Documatics. They have a system that organises your directories into projects, provides fully indexed searching of all your documents, caters for document generation from templates, keeps a complete history of all your documents, integrates with Outlook and manages workflow. It's in use at GE. We love it.

  223. RE: How To Manage 100s of 1000s of Documents by Anonymous Coward · · Score: 0

    OnBase from Hyland Software

  224. Hire a professional librarian. by Half+Balford · · Score: 1

    Not a student. This is not a summer job. Even if someone at your office has nothing else to do, they will not be able to do a better job than a pro.

  225. How To Manage Hundreds of Thousands of Documents? by Anonymous Coward · · Score: 0

    the real answer is of course to start to migrate to a model-driven environment based on ontology and UML/SysML

    word, excel, ppt etc were all designed to automate paper. in a truly digital world these will go the way of the computer-drawn blueprint (now replaced with 3-D models). information must go the same route.

    if you look at your customers (Lockheed Martin, Northrop Grumman etc) they are all on some path to move to this paradigm.

    Documents will simply become a proxy representation of a model, like a 2-D plot of a 3-D model.

    some day designs will be on source forge in UML, SysML and OWL and other sources. just like code.