Slashdot Mirror


Distributing Unix Knowledge Among Admins?

chadworthman asks: "I work in a server support role with 6 other sys admins. We are all responsible for 10 to 25 servers each (various flavours of Unix), mostly grouped by project. The person who is responsible for a server is called prime. We also identify a sys admin as secondary. This system is not working out well. Most sys admins are only familiar with the environments that they are prime for, and when a prime contact is not in the office or leaves the company, the rest of us try to figure out the environment. We are currently trying to figure out the best way to transfer knowledge of environments between sys admins. We have considered a plan that would involve partnering with another co-worker while you trade knowledge, then after a certain number of months, trade with someone else. I was wondering what other techniques for knowledge transfer between sys admins Slashdotters have encountered."
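The pairing plan in the question can be made systematic with a round-robin schedule, so that every admin partners with every other admin exactly once over the full cycle. Below is a minimal Python sketch using the standard circle method. It assumes one pairing per rotation period, and the admin names are hypothetical. With an odd-sized team like this one (seven admins), one person sits out each round.

```python
from typing import List, Optional, Tuple

Pair = Tuple[str, Optional[str]]


def pairing_rounds(admins: List[str]) -> List[List[Pair]]:
    """Round-robin knowledge-transfer pairings (circle method).

    Every admin is paired with every other admin exactly once. If the team
    size is odd, a None placeholder is added; whoever draws it sits out that
    round (their pair is returned as (name, None)).
    """
    people: List[Optional[str]] = list(admins)
    if len(people) % 2:
        people.append(None)
    n = len(people)
    rounds: List[List[Pair]] = []
    for _ in range(n - 1):
        pairs: List[Pair] = []
        for i in range(n // 2):
            a, b = people[i], people[n - 1 - i]
            if a is None:  # put the real admin first
                a, b = b, a
            pairs.append((a, b))
        rounds.append(pairs)
        # Keep position 0 fixed and rotate everyone else one place.
        people = [people[0], people[-1]] + people[1:-1]
    return rounds


if __name__ == "__main__":
    team = ["ann", "bob", "cat", "dan", "eve", "fay", "gus"]  # hypothetical names
    for period, pairs in enumerate(pairing_rounds(team), start=1):
        print(f"period {period}: {pairs}")
```

With seven admins this produces seven rotation periods. The "certain number of months" per period therefore determines how long one full cycle takes.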

6 of 56 comments

  1. standardize by tps12 · · Score: 2, Insightful

    You need to try to adapt each environment towards a single standard that everyone then becomes familiar with. Yes, this will sacrifice some features of each platform, but that is the price you pay for greater scalability and flexibility. This is the kind of thing that made different flavors of Windows so popular with sys admins, and it's high time the Unix world followed suit.

    --

    Karma: Good (despite my invention of the Karma: sig)
  2. Document by photon317 · · Score: 3, Insightful


    Set up a knowledgebase of system information, make it versionable, and perhaps make it commentable, blog-style. Have it publish to a departmental web server, and have everyone document the hell out of everything there. Things that go there:

    Inventories of systems and of software
    Licenses and whatnot
    Purchase info
    Common practices docs (disk layout procedures, installation procedures, patching procedures, downtime procedures, etc.)
    etc., etc.

    You get the idea. The company shouldn't rely on an employee's brain as part of its business plan. Document everything in such a way that if the whole staff went missing, a new staff of competent Unix professionals could take over and do something useful based on your web docs.
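One way to keep such a knowledgebase honest is to check it against the system inventory. Below is a minimal Python sketch that lists hosts with no page yet. It assumes a hypothetical layout: one plain-text page per host, named "<hostname>.txt", in a version-controlled directory.

```python
import tempfile
from pathlib import Path
from typing import Iterable, List


def undocumented_hosts(inventory: Iterable[str], docs_dir: Path) -> List[str]:
    """Return inventory hosts that have no page in docs_dir.

    Hypothetical layout: one page per host, named "<hostname>.txt".
    """
    documented = {page.stem for page in docs_dir.glob("*.txt")}
    return sorted(host for host in inventory if host not in documented)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        docs = Path(tmp)
        (docs / "web01.txt").write_text("disk layout, patch procedure, quirks\n")
        print(undocumented_hosts(["web01", "db01", "app02"], docs))
```

Running this against the real inventory on a schedule turns "document everything" into a checklist that shrinks over time.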

    --
    11*43+456^2
  3. Look! by itwerx · · Score: 3, Insightful

    It's a bird!
    It's a plane!
    It's a flock of binders!

    But seriously, documentation is key to anything like this. I know most sysadmins wail and moan and gnash their teeth at the very thought, but good documentation is almost as important as good backups!
    YMMV, but it might actually be worth picking somebody as the "doc-meister" to learn ALL of the systems and having the other admins submit config changes etc. to this person on an ongoing basis.
    This also helps prevent the common admin trick of printing out tons of scripts to fill the binder and saying "See, it's all documented, right here", even though it doesn't actually help anybody understand anything!
    This way, if the documentation lead can't understand something, you know a replacement admin won't either, and changes can be made before they're really needed.

  4. they're a team, right? by medcalf · · Score: 2, Insightful

    So make them act like a team. All admins are responsible for all servers. I am assuming that the group doesn't have a lot of time to document (most groups don't), but there are still practical ways to make it work with minimal time invested in advance:

    1. common file system layouts (for example, all users in /home, all apps in /apps, all admin-only stuff in /admin, or whatever standards you want to use)
    2. one person (team lead) owns all of the licenses, and keeps them up to date, as well as scheduling non-reactive work
    3. if you're not responsible for the applications on the system, then everyone should be able to handle any machine, since no specialized knowledge is needed
    4. of course, specialized knowledge is still needed, because some systems have quirks. Document the quirks only (not standard routines for the whole team) both on the machine (in /admin/local/README or whatever) and on your team webserver - if you don't have one, get one
    5. keep a change log for each machine, in /admin/local and on the webserver, that describes any changes that aren't in someone's home directory and which survive a reboot - who did them, when and why
    6. make sure that standing orders (that is to say, procedures to always be followed, like how to notify clients of an outage) are posted on each machine and on the website
    7. use a common root password, known by the team lead and his manager. Everyone else uses sudo su - (or some similar means) to get to root. Give them the root password if they need it (to reinstall a system, for example), then change it the next day. Ideally, set up a system so that each admin saves to a different history file, so that you can tell who did what if you need to (tracking down mysterious file disappearances and such). This isn't a tool for discipline, it's a tool for troubleshooting

    That should solve most of the problems.
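Item 5's per-machine change log (who, when and why) stays consistent if every admin uses the same small helper. Below is a minimal Python sketch. The file name CHANGELOG under /admin/local and the tab-separated format are hypothetical choices. The SUDO_USER lookup assumes changes are made through sudo, as item 7 suggests.

```python
import getpass
import os
import tempfile
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional

# Hypothetical location, following the /admin/local convention in items 4 and 5.
DEFAULT_LOG = Path("/admin/local/CHANGELOG")


def log_change(description: str, who: Optional[str] = None,
               logfile: Path = DEFAULT_LOG) -> str:
    """Append a 'when <TAB> who <TAB> why' line to the machine's change log."""
    # sudo sets SUDO_USER to the invoking admin, so entries name a person
    # rather than "root".
    who = who or os.environ.get("SUDO_USER") or getpass.getuser()
    when = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M:%SZ")
    why = " ".join(description.split())  # collapse tabs/newlines to spaces
    entry = f"{when}\t{who}\t{why}\n"
    logfile.parent.mkdir(parents=True, exist_ok=True)
    with logfile.open("a", encoding="utf-8") as fh:
        fh.write(entry)
    return entry


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        log = Path(tmp) / "admin" / "local" / "CHANGELOG"
        log_change("raised nofile limit for the app user", who="ann", logfile=log)
        print(log.read_text(encoding="utf-8"), end="")
```

A plain append-only text file keeps the log readable with ordinary tools (cat, grep) on any Unix flavor, and it is easy to copy to the team web server.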

    --
    -- Two men say they're Jesus. One of them must be wrong. - Dire Straits
    1. Re:they're a team, right? by Meleschi · · Score: 2, Insightful

      Come on now...

      In all honesty, that is a good idea. Common usernames/passwords and root passwords on similarly configured machines ease administration and make the passwords easier to memorize.

      I admin over 30 machines. Do you think I'm going to remember 60 different passwords? (One for the user, one for root, because we all know ssh/telnet should be allowed to log in directly as root, right?) Hell no. Even if I used my Palm, it would still be cumbersome. Instead, each class of server has its own username/password structure: the Linux boxes are of one type, and the "other" boxes are different. This leaves me with around 10 passwords total to memorize, with one or two of the less-used ones in a PGP-locked spreadsheet on my laptop.

      Memorizing 60 distinctly different login/passwords is almost impossible.

      Other things other posters have said also have merit. Have the same directory structure on as many servers as possible (not possible when you compare Windows to Unix, for example). Have the same set of troubleshooting tools available on the different platforms, too. GNU tools compile on almost any Unix flavor; use that to your advantage! There's no reason to remember the different switches for the various Unix versions of "df", for example, when you can install the GNU version and have the same command with the same switches do the same thing on all of your Unix servers. But please, leave your OS version there, just in case. =)
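Before relying on identical switches everywhere, it helps to confirm which hosts actually have the GNU version installed. Below is a minimal Python sketch. It assumes GNU tools identify themselves with "GNU" in their --version output (GNU coreutils' df prints a line such as "df (GNU coreutils) ..."), while vendor versions typically reject that flag. The candidate name gdf is an assumption, reflecting the common practice of installing GNU tools with a "g" prefix next to the OS version. Adjust the list for your hosts.

```python
import shutil
import subprocess
from typing import Iterable, Optional


def find_gnu_tool(candidates: Iterable[str]) -> Optional[str]:
    """Return the path of the first candidate that reports itself as GNU."""
    for name in candidates:
        path = shutil.which(name)
        if path is None:
            continue  # not installed / not on PATH
        try:
            result = subprocess.run([path, "--version"], capture_output=True,
                                    text=True, timeout=5)
        except (OSError, subprocess.TimeoutExpired):
            continue
        # Vendor versions usually fail on --version; GNU ones mention "GNU".
        if result.returncode == 0 and "GNU" in result.stdout:
            return path
    return None


if __name__ == "__main__":
    # Prefer a g-prefixed install, then fall back to plain df.
    print(find_gnu_tool(["gdf", "df"]) or "no GNU df found")
```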

      In addition, you need someone hell-bent on security. Have a couple of people install and set up Nessus, scan your network once a month or so, AND FIX THE HOLES. Every network/system has holes, and different servers will have different holes to patch. Only by actively looking for them will you find them. If a server cannot be patched for whatever reason, isolate it on the network with passwords/logins separate from the "secure" servers, and implement ACLs/ACIs to prevent that server from accessing any other server it doesn't absolutely need.

      Eternal vigilance. It is difficult to get to this point in a large network with 20+ specialized servers, but with a team that large, you should be able to do it...
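The isolation rule for unpatchable servers amounts to a default-deny allowlist. Below is a minimal Python sketch that expresses the policy as data, which makes it easy to review before translating it into real router or firewall ACLs. The host names are hypothetical. The check covers only connections originating from isolated hosts, as the comment describes.

```python
from typing import Dict, Set

# Hypothetical policy: each unpatchable host maps to the only destinations
# it genuinely needs. Anything not listed is denied.
ISOLATED: Dict[str, Set[str]] = {
    "legacy-erp": {"db01"},
    "old-fax": set(),
}


def is_allowed(src: str, dst: str,
               isolated: Dict[str, Set[str]] = ISOLATED) -> bool:
    """Default-deny check for traffic that originates on an isolated host."""
    if src in isolated:
        return dst in isolated[src]
    # Hosts that are not isolated fall under the normal network policy.
    return True


if __name__ == "__main__":
    for src, dst in [("legacy-erp", "db01"), ("legacy-erp", "web01"),
                     ("web01", "db01")]:
        print(src, "->", dst, "allowed" if is_allowed(src, dst) else "denied")
```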

      --
      Meep Meep!
  5. Re:Rotate your primes by metacosm · · Score: 3, Insightful

    The solution presented in the parent post is the correct one. Documentation is fine and dandy, but it doesn't come close to experience. The great thing about rotating primes is that the "true blue expert" is still around, but he is off learning something new too. Everyone puts their ego on hold and pitches in to help. This generates well-rounded techs who can handle a broader array of issues, both as a group and independently.
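Rotating primes can be written down as a simple schedule. Below is a minimal Python sketch that assumes a fixed rotation period (for example, a few months). It also assumes, as one reasonable reading of "the true blue expert is still around", that the outgoing prime stays on as secondary for the group they just handed over. Group and admin names are hypothetical.

```python
from typing import Dict, List, Tuple


def rotation(groups: List[str], admins: List[str],
             period: int) -> Dict[str, Tuple[str, str]]:
    """Map each server group to (prime, secondary) for a rotation period.

    Primes shift by one admin per period, so after len(admins) periods every
    admin has been prime for every group. The secondary is last period's
    prime, which keeps the outgoing expert on hand for questions.
    """
    n = len(admins)
    if n < 2:
        raise ValueError("need at least two admins to rotate")
    result: Dict[str, Tuple[str, str]] = {}
    for i, group in enumerate(groups):
        prime = admins[(i + period) % n]
        secondary = admins[(i + period - 1) % n]  # previous period's prime
        result[group] = (prime, secondary)
    return result


if __name__ == "__main__":
    groups = ["billing", "web", "mail"]  # hypothetical project groups
    admins = ["ann", "bob", "cat"]       # hypothetical admins
    for period in range(3):
        print(period, rotation(groups, admins, period))
```

Making the outgoing prime the new secondary means every handover has a built-in mentor, which matches the "everyone pitches in" spirit of the rotation.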