Distributing Unix Knowledge Among Admins?
chadworthman asks: "I work in a server support role with 6 other sys admins. We are all responsible for 10 to 25 servers each (various flavours of Unix), mostly grouped by project. The person who is responsible for a server is called prime. We also identify a sys admin as secondary. This system is not working out well. Most sys admins are only familiar with the environments that they are prime for, and when a prime contact is not in the office or leaves the company, the rest of us try to figure out the environment. We are currently trying to figure out the best way to transfer knowledge of environments between sys admins. We have considered a plan that would involve partnering with another co-worker while you trade knowledge, then after a certain number of months, trade with someone else. I was wondering what other techniques for knowledge transfer between sys admins Slashdotters have encountered."
You need to try to adapt each environment towards a single standard that everyone then becomes familiar with. Yes, this will sacrifice some features of each platform, but that is the price you pay for greater scalability and flexibility. This is the kind of thing that made different flavors of Windows so popular with sys admins, and it's high time the Unix world followed suit.
Karma: Good (despite my invention of the Karma: sig)
Documentation. It's what you need. Some standardization would probably help too.
Set up a knowledgebase of system information, make it versionable, and perhaps commentable blog-style. Make it publish to a departmental web server, and have everyone document the hell out of everything there. Things that go there:
Inventories of systems and of software
Licenses and whatnot
Purchase info
Common practices docs (disk layout procedures, installation procedures, patching procedures, downtime procedures, etc)...
etc...etc..
You get the idea. The company shouldn't be reliant on an employee's brain as part of their business plan - document everything in such a way that if the whole staff went missing, a new staff of competent unix professionals could take over and do something useful based on your web docs.
11*43+456^2
Every two months, the primes all rotate. After a year, you will all be experts on all systems.
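A rotation like this is easy to plan mechanically. The following is only a minimal sketch, assuming one server group per admin; the admin and group names are hypothetical and the length of a period is whatever the team picks.

```python
from collections import deque


def rotation_schedule(admins, groups, periods):
    """Return one {group: prime admin} mapping per rotation period."""
    if len(admins) != len(groups):
        raise ValueError("this sketch assumes one server group per admin")
    order = deque(admins)
    schedule = []
    for _ in range(periods):
        schedule.append(dict(zip(groups, order)))
        order.rotate(1)  # each admin moves on to the next group in the list
    return schedule


if __name__ == "__main__":
    admins = ["alice", "bob", "carol"]   # hypothetical names
    groups = ["web", "db", "mail"]       # hypothetical server groups
    for n, assignment in enumerate(rotation_schedule(admins, groups, 3)):
        print(f"period {n}: {assignment}")
```

With as many periods as there are groups, every admin ends up prime for every group exactly once.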
And the men who hold high places must be the ones who start
To mold a new reality... closer to the heart
It's a bird!
It's a plane!
It's a flock of binders!
But seriously, documentation is key to anything like this. I know most sysadmins wail and moan and gnash their teeth at the very thought, but good documentation is almost as important as good backups!
YMMV but it might actually be worth picking somebody as the "doc-meister" to learn ALL of the systems and have the other admins submit config changes etc. to this person on an on-going basis.
This also helps prevent the common admin trick of just printing out tons of scripts to fill the binder and saying "See, it's all documented, right here" - except it doesn't actually help anybody understand anything!
This way if the documentation lead can't understand it then you know a replacement admin won't either and changes can be made before it's really needed.
So make them act like a team. All admins are responsible for all servers. I am assuming that the group doesn't have a lot of time to document (most groups don't), but there are still practical ways to make it work, with minimal time taken in advance:
1. common file system layouts (for example, all users in /home, all apps in /apps, all admin-only stuff in /admin, or whatever standards you want to use)
2. one person (team lead) owns all of the licenses, and keeps them up to date, as well as scheduling non-reactive work
3. if you're not responsible for the applications on the system, then everyone should be able to handle any machine, since no specialized knowledge is needed
4. of course, specialized knowledge is still needed, because some systems have quirks. Document the quirks only (not standard routines for the whole team) both on the machine (in /admin/local/README or whatever) and on your team webserver - if you don't have one, get one
5. keep a change log for each machine, in /admin/local and on the webserver, that describes any changes that aren't in someone's home directory and which survive a reboot - who did them, when and why
6. make sure that standing orders (that is to say, procedures to always be followed, like how to notify clients of an outage) are posted on each machine and on the website
7. use a common root password, known by the team lead and his manager. everyone else uses sudo su - to get to root, or some similar means. give them the root password if they need it (reinstall system, for example), then change it the next day. ideally, set up a system so that each admin saves to a different history file, so that you can tell who did what if you need to (tracking down mysterious file disappearances and such) - this isn't a tool for discipline, it's a tool for troubleshooting
That should solve most of the problems.
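For the per-machine change log (who, when and why), something as small as the sketch below would do. It is an illustration only: the `CHANGELOG` file name and the pipe-separated line format are assumptions, not anything prescribed above, and a temporary directory stands in for /admin/local.

```python
import getpass
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def log_change(logfile, what, why, who=None, when=None):
    """Append one 'when | who | what | why' line to logfile and return it."""
    who = who or getpass.getuser()
    when = when or datetime.now(timezone.utc)
    entry = f"{when.strftime('%Y-%m-%d %H:%M')} | {who} | {what} | {why}\n"
    with Path(logfile).open("a", encoding="utf-8") as fh:
        fh.write(entry)
    return entry


if __name__ == "__main__":
    # A temporary directory stands in for /admin/local so this runs anywhere.
    with tempfile.TemporaryDirectory() as d:
        log = Path(d) / "CHANGELOG"  # hypothetical file name
        log_change(log, "added cron job for log rotation",
                   "/var was filling up", who="jdoe")
        print(log.read_text(encoding="utf-8"), end="")
```

Appending plain text lines keeps the log readable with nothing more than `cat`, which matters when the machine is half broken.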
-- Two men say they're Jesus. One of them must be wrong. - Dire Straits
1. Document what you've got. Make the documentation standard.
2. Move the 'primes' around every couple of months so you all get exposure.
3. Common install base. Make sure you can automagically install from scratch the OSes and applications (e.g. JumpStart on Solaris; HP-UX and AIX have their variants). If at any stage you need to type anything you've failed.
4. Read, digest and implement "The Practice of System and Network Administration" by Limoncelli and Hogan ISBN: 0201702711. This is a great book for any admin and for me is the K&R of its subject.
(various flavours of Unix)
That's where your problem lies. Pick one, and get everyone comfortable with that one.
Tarsnap: Online backups for the truly paranoid
Why not a Vulcan mind meld?
Should effectively transfer all the needed knowledge, and a little of each person's personality, but that might not be such a bad thing.
I know Solaris/HP-UX/Linux/*BSD and Win2K and Some Cisco. AND I happen to want a new job...
Hire me please........
Not to rain on your job-seeking parade but I know all of the above plus Netware 2.x through 6.x, NT all the way back to 3.50, AT&T Sys-V (not that anybody cares any more) and DG's AOS-VS and it STILL took me a year to find a new job. :(
:)
Admittedly I have been pretty picky, I probably could have gotten a job in a month or so if I just took the first thing that came along, but who wants to do tier-one tech support?!?
Thank god I've never been laid off...
(Well, okay, I'll admit it, I was once - but I was only 12 years old.)
In the company I work for I ported all the scripts to Linux and Sun from HP. Thus we have 'sameness'. You want to build a database? It does not matter what platform you are on, the command is the same. You want to create a new dev env? It does not matter what platform you are on, the script is the same on ALL platforms.
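One way to get that kind of 'sameness' is a thin wrapper that exposes a single task name and looks up the platform-specific command behind it. This is a hedged sketch rather than the setup described here: the task table is hypothetical, the Linux entry assumes an RPM-based distribution, and the keys are the values `platform.system()` is expected to report on each system.

```python
import platform

# Hypothetical task table: one task, one command line per platform.
PACKAGE_LIST_COMMANDS = {
    "Linux": ["rpm", "-qa"],   # assumes an RPM-based distribution
    "SunOS": ["pkginfo"],
    "HP-UX": ["swlist"],
}


def command_for(task_table, system=None):
    """Return the command line registered for this (or the given) platform."""
    system = system or platform.system()
    try:
        return task_table[system]
    except KeyError:
        raise NotImplementedError(
            f"no command registered for {system!r}") from None


if __name__ == "__main__":
    print(command_for(PACKAGE_LIST_COMMANDS, "HP-UX"))
```

The admin only ever learns the task name; the table is the one place where platform differences are written down.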
By creating a level of 'sameness' across all your platforms it will not matter whether the server is Sun, HP, Linux, BSD, whatever; the scripts will all be the same. Since you are talking about being an admin I'd suggest all scripts in perl or sh. The problem you may run into with perl is that perl rarely gets installed in the same place on all platforms. Thus a script that starts with #!/usr/bin/perl may not work, where #!/bin/sh will. Yes, and there are coding ways around the perl issue as well.
Granted you will have different machines that do different jobs; this is where documentation comes in. Make sure that all your stuff is documented. If someone sets up a server they need to be required to describe how this server was set up. Using the principle of sameness this cuts down on the need for lots of docs, and thus anyone can set up the server.
Shells.. standardize on a shell. Standard login shell. More important is the standardization of what shell people use. I go with tcsh, as I like it better than ksh, csh and sh, and it is available everywhere (Sun, HP, BSD, Linux, etc). It is also feature rich. You can standardize on any shell, but make sure it is everywhere you need it to be.
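The interpreter-location problem can be checked rather than guessed at. A minimal sketch, assuming the interpreter is somewhere on PATH or in a few extra directories you name yourself (the directories in the example are illustrative, not a claim about any particular platform):

```python
import os
import shutil


def find_interpreter(name, extra_dirs=()):
    """Return the full path of an executable called name, or None."""
    # Search the normal PATH first, then any site-specific directories.
    search = os.pathsep.join([os.environ.get("PATH", os.defpath), *extra_dirs])
    return shutil.which(name, path=search)


if __name__ == "__main__":
    for interp in ("sh", "perl"):
        where = find_interpreter(interp, extra_dirs=("/usr/local/bin", "/opt/bin"))
        print(f"{interp}: {where or 'not found'}")
```

Run on each box, this tells you quickly whether a hard-coded interpreter path in your scripts is going to hold up everywhere.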
Once you have standardized on a shell, use a standard login env. Thus when you login to your BSD box it feels like your Sun box, which feels like your Linux box, etc.
If people want to add to this have a process in place to make it happen.
Except for system tools like SAM on HP, and Red Hat's sysadmin tools, there is no reason that many other tasks cannot be done in scripts that are standard. You can even standardize on what a database server setup should include, what a web server setup should include, and have standards that are the same or different (I prefer sameness unless performance is an issue) for each flavor of UNIX.
Only 'flamers' flame!
The job market sucks BIGTIME right now. I am at a dot.com that may soon turn into a dot.bomb.
Hopefully I will find something before that happens. I have been looking hard for about 3 months now.
I could have had 3 or 4 positions for security IF I wanted contract work.
Any good suggestions where to look? The job boards are pretty much worthless. I appreciate any suggestions.
I have found this to be very effective.
Healthcare article at Kuro5hin
As mentioned by other posters, documentation and job rotation are good.
To make sure the documentation is up-to-date, accessible etc. you can make the support calls from the secondary admins to the "prime" non-paid. After a few support calls in the night, on your holiday etc. you will make sure that the documentation is up-to-date, accessible, people know where it is, etc.
Yes, I am evil :-)
Your company needs to do two things.
First, fire the manager who got you into this situation. If you've "been doing it this way for years," fire the manager who left this system in place. (If that manager just left and the new guy realizes there's a problem and that's why you're asking this, then obviously there's no action taken against him.)
I'm not being bloodthirsty here - anything short of this will leave people doubtful that upper management is serious that this is *the* biggest problem your company faces today, and people will continue to do what they've been doing for anything but the most trivial problems. Senior management needs to send an unambiguous signal that the status quo is unacceptable.
Second, rotate the primes and secondaries as others have suggested, but with a twist. Rotate the secondaries first, and their sole responsibility is to write a list of questions - a long list of questions - about everything that "surprises" them or that needs to be documented somewhere. (An example of the latter is "what are the partitions, what are their sizes, and how was this size determined?")
They turn over their questions to the primes who spend a few weeks documenting the answers while the secondaries cover for their old prime, and this documentation is provided to the next set of secondaries rotated in to ask questions. Lather, rinse, repeat.
By the third time around (maybe 3 months?) you'll have documentation that actually covers almost everything someone will need to get up to speed on the peculiarities of a particular project, and the primes can start rotating while the secondaries answer any remaining questions.
Finally, I'm deliberately putting the emphasis on the secondaries here because one of the classic problems with your old setup is that it can cause the secondaries' skills to stagnate if the prime handles all of the "hard" or "interesting" problems. You need to give the secondaries room to grow, even if it increases your turnover rate because they're competent enough to be hired as primes at other companies.
For every complex problem there is an answer that is clear, simple, and wrong. -- H L Mencken
Considering that 1/2 the readers here are probably admins themselves...
Damn... coffee just sprayed out of my nose, and all over my keyboard. I had no idea the +0 comments were so damned funny.
Slashdot is jumping the shark. I'm just driving the boat.
Often, when writing documentation we leave out things because we take them for granted or "well, it's just like that." Other people without that experience don't know that, and the gaps become apparent QUICK.
Documentation AND experience. One can't take up the slack for the other being weak.