System Admin's Unit of Production?
RailGunSally writes "I am a (strictly technical) member of a large *nix systems admin team at a Fortune 150. Our new IT Management Overlord is a hardcore bean-counter from hell. We in the trenches have been tasked with providing 'metrics' on absolutely everything from system utilization to paper clip recycling. Of course, measuring productivity is right up there at the top of the list. We're stumped as to a definition of the basic unit of productivity for a *nix admin. There is a school of thought in our group that holds that if the PHBs are simple enough to want to operate purely from pie charts and spreadsheets, then we should just graph some output from /dev/random and have done with it. I personally love the idea, but I feel the need for due diligence, so I put the question to the Slashdot community: How does one reasonably quantify admin productivity?"
of Jolt Cola consumed.
They say the first thing to go is your penis. Well, it's either that or your brain. I forget which...
How many tickets answered per day? Completed per day? /dev/random is probably the most elegant though
The best sys admins are the ones you never notice. If the productive workers in a company never see or need to talk to a sys admin it's been a productive day for the admins.
"How does one reasonably quantify admin productivity?""
If no one in the building but HR and your line report need to know your name, you're doing your job...
Other than that, it would be like a trash collector counting how many cans he emptied during the day or a wildfire firefighter how many burning bushes he chopped. If there weren't any fires or trash these people wouldn't be needed, would they?
You can't quantify SA productivity.
It's easy to quantify /my/ productivity as a support tech (at the U of CA) in number of tickets resolved per shift. But sysadmins have a number of duties which they are performing /continuously/, so how can you quantify that?
.:Semper Absurda:.
Since the real proof of actual productivity for network admins is negative: nothing goes wrong (no trouble tickets). Also, the PHB will get their wish: No one to pay is infinite productivity (measured as output per $ spent).
Unit of Productivity = 1 / (hours of down time)
They are paying you to keep bad things from happening.
Do uptime. Unless your team has serious problems, those numbers should always look good. If you do any sort of work in response to in-house or outside tech support requests, you can measure how long it takes to resolve issues.
You aren't building automobiles or painting teapots. You are a support function and not a line function.
You should have business plan objectives. These things are usually annual; there can be longer strategic objectives. If the person who set these things did it right, they should be measurable.
What I'm trying to say is, if you're banging your head against the wall trying to figure out how your performance should be measured, your higher up didn't set your objectives correctly.
This doesn't apply anywhere and everywhere. When the organization is in the business of IT itself, you might be measured differently since you'd then be contributing directly to the organization's core business. But from the description provided, it sounds like you're not.
The Banjo Players Must Die!
Systems Administration falls into several categories.
:)
Projects, Service requests, Patching, and user satisfaction are a few.
Once you have an idea of what you do, define some SLAs with your customers and the metrics are easy from there.
Now compare your defined SLAs to the following.
Metrics:
Time to ticket close?
Were the requesters satisfied?
Projects completed in the expected time?
Resource allocation is at what percentage?
Don't forget to measure your ongoing education and professional development. How much should you get, are you getting it?
Patch schedule being met?
Availability metrics.
Resource loads on the systems are easy and provide management nice graphs, plus they can be automated.
My systems roll all this information up and e-mail it for me.
While none of this is really important to us, the management teams operate almost entirely on this data. Take this as an opportunity. In some shops I've worked, management defines the metrics and they mostly are irrelevant. In your case it seems you have the rope to hang yourself so take care to present the data that is important and will help you meet your goals. As always, a good admin will automate the task but not tell anyone.
--russ
He is a *nix sysadmin... there are no regular patch THURSDAYS *OR* TUESDAYS!
What you need to do is contact some other F150 companies and ask their senior IT admins/CTOs how they measure productivity. I work for a major investment firm and we have metrics for everything we do (even though we're private) because of two primary reasons:
1. it's how you improve, and
2. it's what our competitors do too.
It's that simple.
pi=sigma{n:0-infinity}[(1/16)^n][(4/(8n+1))-(2/(8n +4))-(1/ (8n+5))-(1/(8n+6))]
Assume for a second you had a perfect server farm. It's always up, backups are made, users are added and removed, etc. While we are at it, assume you have a staff of, say, two admins per shift, 24x7. That's at least 8 admins, probably more to cover holidays, vacation, etc. In this case, their productivity is zero; they have nothing to do. In reality, they are working their tails off, and deserve a nice bonus. So tell the PHB that productivity is not what's important; problems are. It's uptime, transactions delivered, average delay on transactions, etc. Get the users to define what the 'requirements' are, and have the sysadmins deliver it. That is the measure of what is important.
RailGunSally wrote:
>We in the trenches have been tasked with providing 'metrics' on absolutely everything from system utilization to paper clip recycling.
This pretty much says it all; your manager wants you to do HIS job. Shouldn't he develop his own metrics? He can ask you for ideas but he should do the work himself. As for metrics, I'd suggest downtime percentages for each machine. If the services are up and running and the machines are online providing service then that should be metrics enough.
Codifex Maximus ~ In search of... a shorter sig.
The problem is the numbers don't look good. To quantify what you're looking for you'd want "number of hours spent idle", i.e., if a sysadmin did his job well and has everything running smoothly, how many hours does he have with nothing needing to be done?
Once any manager or other authority type sees that number, though, rather than seeing that you did a good job at keeping things reliable, they'll see you as lazy and assign work you shouldn't be doing (other people's jobs).
Really, just about anything other than data entry is hard to quantify in the computer field. Someone suggested trouble tickets, but there's a huge difference between a ticket that requires you to restart apache and one that requires you to strace half your system to debug, and raw ticket numbers don't tell you that.
On the same note, lines of code mean nothing to actual programming, nor do "functions per day" or anything similar as again, you can't quantify the effort required in an easy line vs hard line. Is it a simple debug print or core logic you had to scratch out on a whiteboard to keep sane?
Pain lasts, kid. Its how you know you're alive. Sometimes I think this growing up thing is just pain management-TheMaxx
They consider you only as good as your last mistake. The bosses don't want to know what goes on "under the hood". It just has to work. Anything less than 100% uptime is considered a failure in their eyes.
What?
Keep track of uptime. Are the systems only down for scheduled maintenance? If they are down outside of scheduled maintenance windows, what is the percentage? Was it hardware or software or a mix (old firmware with updated driver requiring newer firmware), etc...
Was the outage extended due to vendor timing? If so, maybe a stock of typical spare components should be maintained to shorten the window.
Typical maintenance like adding/deleting/unlocking user accounts, resetting passwords, printer maintenance, and disk admin should be a small part of an admin's day. The rest should be keeping an eye out on the real world, looking for potential problems like security vulnerabilities and patches, and planning the next updates / upgrades.
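The uptime questions above can be turned into numbers with very little code. This is only a minimal sketch: the `Outage` record, its field names, and the sample figures are made up for illustration, and a real shop would pull these from its own monitoring or ticket data.

```python
from dataclasses import dataclass


@dataclass
class Outage:
    hours: float      # how long the system was down
    scheduled: bool   # True if it fell inside a maintenance window
    cause: str        # e.g. "hardware", "software", "mixed"


def unscheduled_downtime_pct(outages, period_hours):
    """Percentage of the period lost to outages outside maintenance windows."""
    if period_hours <= 0:
        raise ValueError("period_hours must be positive")
    unscheduled = sum(o.hours for o in outages if not o.scheduled)
    return 100.0 * unscheduled / period_hours


def downtime_by_cause(outages):
    """Total unscheduled downtime hours, grouped by cause."""
    totals = {}
    for o in outages:
        if not o.scheduled:
            totals[o.cause] = totals.get(o.cause, 0.0) + o.hours
    return totals


# Hypothetical 30-day month (720 hours) with one planned window
# and three unplanned outages.
outages = [
    Outage(4.0, True, "maintenance"),
    Outage(1.5, False, "hardware"),
    Outage(0.5, False, "software"),
    Outage(1.0, False, "hardware"),
]
pct = unscheduled_downtime_pct(outages, period_hours=720)
by_cause = downtime_by_cause(outages)
```

Splitting by cause is what lets you argue for spares or firmware work rather than just reporting a single percentage.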
Tell the bean counters that their demands to quantify everything will only reduce uptime and complicate matters to where you spend more time doing paperwork than you do managing systems. If they can't understand that, it's time to go elsewhere. Be sure to tell the bean counter that they'll be lucky to find anyone talented to work under their regime.
I've seen systems that went from 20% loaded to always overloaded because of the number of *accounting* applications, programs and monitoring solutions that were *demanded* by the bean counters. After a user and business unit rebellion, the *fluff* was removed, as was the bean counter. This left the systems running in a state where the end users could do their work, and the business units had satisfied customers.
Who is general failure, and why is he reading my hard drive?
I wonder how much he'd have to post to get a bigger Christmas bonus.
[Fuck Beta]
o0t!
Hours Worked Fixing Problems divided by Hours Worked Doing Routine Work
The lower the number, the more efficient the sys admin is. A good sys admin doesn't have to do anything, because everything is already set up and working. If the admin is constantly fixing servers, bringing them up, restoring data from backups, etc., etc., etc., then he isn't doing his job. If the majority of his day is spent sleeping in his chair and responding to the occasional email and things are running smoothly, then you can't ask for anything more.
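As a rough sketch, the ratio above could be computed from timesheet totals like this. The function name and the sample hours are assumptions for illustration, not anything from the original post.

```python
def firefighting_ratio(hours_fixing, hours_routine):
    """Hours spent fixing problems divided by hours spent on routine work.

    Lower is better: a small value means most time goes to routine
    work rather than emergencies.
    """
    if hours_fixing < 0:
        raise ValueError("hours_fixing must not be negative")
    if hours_routine <= 0:
        raise ValueError("hours_routine must be positive")
    return hours_fixing / hours_routine


# Hypothetical week: 4 hours of fixes against 32 hours of routine work.
calm_week = firefighting_ratio(4, 32)
# Hypothetical bad week: 30 hours of fixes against 10 hours of routine work.
bad_week = firefighting_ratio(30, 10)
```

Note that the ratio is undefined when no routine hours are logged at all, which is why the sketch rejects that case instead of guessing.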
Love sees no species.
Simple answer is that you don't. Productivity in terms of IT and related fields has become a dirty little word, but more than that it is a business term, not a technical one. If you aren't a director or higher in title, and your duties don't include justifying expenses and planning resources for solutions, then it isn't really your realm to measure something like productivity. If this guy has an MBA or similar qualifications, it is he who should know how to measure productivity. But alas, the word productivity has become corrupted by half-assed business journalists trying to write articles about overall productivity and how your employees waste too much time on Facebook. If this guy just wants a number and gives you no guidelines as to how to come up with the number, then my guess is that he just wants to kiss up to the CEO by saying that "productivity" is up 40%, or he wants a number to justify laying off people. Either way, if he can't tell you how he reached his number, I would suggest getting your resume ready.
Also, ideally, a CTO wouldn't be asking those in the trenches how to measure productivity, but rather how to improve it. As someone in the trenches, you probably know where the snags are in efficiency, what software you would need to purchase to help smooth things along, or even where people are overworked or overlooked. This is the positive way to improve productivity. Basically he should be asking you what you need in order to get your job done, and he should get it for you (within reason, of course).
meep
This is a thoughtful and intelligent response. I would add to it though in the following way: What is the VALUE of each of these things? That is, instead of trying to measure efficiency, or "productivity", it would be more useful to the CIO to measure the actual value of what IT does. For example, if security is increased by X amount as a result of investing Y dollars, then the ROI of that investment is easily calculated and it can be compared with other investments. Note that to measure security, one must estimate (somehow) the expected future loss per unit of time. Thus, an increase in security means a lower level of expected loss per future time. The same can be done for other aspects of IT operations, including those you mention above (reliability, response time, etc.). In the end it must be measured against the value to the organization. Productivity is the wrong approach. Value is the right approach. - Cliff
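The ROI calculation described above might look something like this. It is a sketch under stated assumptions: expected loss is modeled as incidents per year times loss per incident, ROI is taken over a single year, and every figure is invented for illustration.

```python
def expected_annual_loss(incidents_per_year, loss_per_incident):
    """Expected loss per year: estimated frequency times estimated impact."""
    return incidents_per_year * loss_per_incident


def security_roi(loss_before, loss_after, investment):
    """One-year return on a security investment.

    The 'return' is the reduction in expected loss, net of what was spent.
    """
    if investment <= 0:
        raise ValueError("investment must be positive")
    return (loss_before - loss_after - investment) / investment


# Hypothetical estimates: one serious incident every 2 years before the
# investment, one every 8 years after it, at 200,000 per incident.
before = expected_annual_loss(0.5, 200_000)
after = expected_annual_loss(0.125, 200_000)
roi = security_roi(before, after, investment=25_000)
```

The hard part is the estimate of incident frequency, not the arithmetic; the code only makes explicit which guesses the final number depends on.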
1) Ask him what he wants to hear
2) Tell him what he wants to hear.
If you can't reasonably tell him what he wants to hear, tell him how much it will cost to produce what he wants to hear.
This is not a technical consideration. This is a political consideration. He already has an idea of how to cover his ass. Give him the asbestos he wants.
...no matter what your boss says. Just don't do it. It is management's responsibility to come up with metrics. If they can't do that, they're not qualified to hold their position, and frankly, I would tell them to their face. It might get you fired. But I've taken the "this is not my responsibility" tack before with some success. The reason this stuff happens is because workers allow it to happen, and if you don't stand your ground once in a while they will just keep shoving this type of crap at you.
Get back to work before they outsource you to a perl script.
Obama likes poor people so much, he wants to make more of them.
Books burned per hour?
Recursive: Adj. See Recursive.
I remember an old joke about a furnace repairman who comes to a home and looks at the furnace for about a minute and a half, listening to the rumbles and gurgles. He takes his hammer out and hits the furnace at one precise place. The furnace starts up and runs fine, as if it were brand new.
The bill was $200.
The homeowner asks why so much when all he did was hit it once with a hammer?
The repairman takes back the bill, and itemizes the bill still totaling $200.
Cost of hammering, $1
Knowing where to Hammer $199
Any idiot can muck about on a UNIX box. I worked at one Fortune 500 company where everybody in the dept had Double E's. Still, their main Solaris server crashed every 3-5 hours, every day, and had been doing so for months.
Took me a week to unscrew it and put everything back in order.
Me, I am a high school dropout with no GED and some non-technical college courses. Still, most of what I was doing was letting them do their work and not have to bother about broken systems. My value was on par with theirs, as it was time they didn't lose on their work.
Nevertheless, beancounters are stupid (also, beancounters are not accountants); they know the cost of everything and the value of nothing. If you really want to send their heads spinning, take the entire labor budget for each dept expressed as an hourly rate. Every time you work for a dept, internally charge that much for each hour you work on a project or ticket for them, or better still your company, and tell them that's how much it costs. Without sysadmins, nobody does anything but fight technical fires, and no work gets done.
Likely this joker found out that auto mechanics have a book to calculate how much to charge for each service and repair, with details on how long each job should take. This doesn't work because sysadmins are closer to being chefs or doctors than low-end auto mechanics.
Even so, people who own Jaguars, Ferrari, and Maserati don't take them to Jiffy Lube.
If they complain, tell them the story about that hammer (or better yet, use one on them).
Sorry about the writing. Robot fingers, you know? Cliff Steele in DOOM PATROL #23
>you just saved two full weeks of your live by passing 'that guy.'
What? Since when is going fast a time machine?* It just means you got somewhere quicker, and could start being an asshole to someone at your destination earlier.
* Ignoring special relativity.
Open Source Drum Kit, LPLC deve board - mjhdesigns.com
This pretty much says it all; your manager wants you to do HIS job. Shouldn't he develop his own metrics? He can ask you for ideas but he should do the work himself.
You are right, but you are also skating on thin ice here. Asking someone who has no clue what is happening to set metrics is just asking for trouble....
HA! I just wasted some of your bandwidth with a frivolous sig!
productivity = (SystemUsers) * (TimeReadingSlashdot) / (SystemDowntimeRate)
And while that is obviously a joke, it also happens to be pretty much correct.
Obviously if you increase SystemUsers while keeping all else constant, productivity goes up. If you reduce SystemDowntimeRate while keeping all else constant, productivity goes up. If you spend less time on general maintenance and random panic fixes (increase TimeReadingSlashdot) while keeping all else constant, productivity goes up.
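For what it's worth, the tongue-in-cheek formula and the three "all else constant" claims can be checked directly. The function below is just the formula as written; the input values and their units are arbitrary assumptions.

```python
def productivity(system_users, time_reading_slashdot, system_downtime_rate):
    """productivity = SystemUsers * TimeReadingSlashdot / SystemDowntimeRate"""
    if system_downtime_rate <= 0:
        raise ValueError("system_downtime_rate must be positive")
    return system_users * time_reading_slashdot / system_downtime_rate


# Arbitrary baseline: 100 users, 2 hours of slack, downtime rate of 0.5.
base = productivity(100, 2.0, 0.5)

# Change one input at a time, holding the others constant.
more_users = productivity(200, 2.0, 0.5)
more_slack = productivity(100, 4.0, 0.5)
less_downtime = productivity(100, 2.0, 0.25)
```

Each of the three variations comes out higher than the baseline, which is all the comment claims. The formula blows up as the downtime rate approaches zero, so the sketch rejects that input.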
-
- - You can't take something off the Internet! That's like trying to take pee out of a swimming pool.
... in my opinion, is to be as bored as possible. Everything which is done on a regular basis should be as automated as possible, and as much effort and resources thrown at avoiding potential problems as the finances and customers will allow (data backups, spare or redundant equipment, etc.).
Much of a "good" sysadmin's time should be spent doing regular, but occasional spot checks on the automation (which can also be greatly automated) to ensure everything is running as smoothly as possible.
Obviously, not all problems can be avoided, especially hardware failures, but if everything else is in place, even recovering a dead, but critical server can be fairly painless.
When an IT dept is feeling underfunded, they merely threaten to quit. The reaction is akin to a three year old being told mommy is leaving. Good IT people are worth their cost, and most are not so greedy. They stay because they want to, not because of the money. Beancounter bosses tend to pay for this sort of thing later, as nobody wants to work for them when word gets around. Soon they have ex-MickyD employees running the desktop support and the smell of French fries just won't leave the server room.
As was stated earlier, you cannot derive a metric for a negative event. You can only measure a sysadmin's stability factor in terms of (well) stability. Other metrics could be based on how long it takes new services to appear after they are requested, but there is no way to gauge the factors relating to them.
Management:
We would like to replace Oracle with MySQL. How long is it going to take?
All our servers are running Solaris 8, the current version is Solaris 10, please upgrade ASAP.
We want to move off Linux and on to HP-UX before next month.
As to the opinions of others....
Hi, welcome to IT, you must be new. Per-hour performance metrics are about the stupidest thing I have heard. We are not factory workers; we are information workers. One problem can generate 2000 helpdesk requests in 10 minutes and take less than that to fix. I am not going to write out two thousand emails to tell everybody that we had a fiber cut from the ISP and our mail server will be down. Sounds like you've been reading the latest management fad like Sigma 7 or some other such tripe.
Actually that's not quite how it works. The folks in accounting (non-beancounters) usually call up and ask if there is a problem, you give them an update on what's up, and they go back to whatever they were doing. If the dept needs more money next time around, they just explain what it's for in a meeting. If it's not ludicrous then they usually get it. If the manager gets the numbers wrong then that manager has to fix it some way. A good manager will pad it and use departmental luncheons, which tend to eat up the slack.
Sorry about the writing. Robot fingers, you know? Cliff Steele in DOOM PATROL #23
"How many tickets were not opened in the first place because things just work."
"Functionally stable": A euphemisim meaning your project/system is going nowhere and getting nothing.
And did you exchange a walk on part in the war for a lead role in a cage? - Pink Floyd.
I am an SA who became a bean counter. One of my primary motivations was that I saw f*ck-ups getting rewarded with less work and raises while hard-working SAs suffered with more work and dead end jobs.
I think management deserves to know what is good work and what isn't. If you leave it up to them, they are going to pick something like tickets resolved or customer satisfaction and you are going to see the a**-kissers move up while the hard-working straight-shooters get the shaft.
I think the metrics described here are good ones, but I'd change #4 to the ratio of load to capacity -- which is a measure of efficiency and good planning. Overall, a good SA should be able to maximize delivery of services. I'd also change #5 to security risk measured as ELV (expected loss value). I know a lot of security professionals who hate this and think it is meaningless, but so far none has given me any better metric to show management that security risks are actually getting better managed over time.
In short, think of what a good SA does for a company and propose metrics that reflect that. Do NOT leave it up to management like some have suggested. They are asking for your opinion as an expert. Step up and show that you are the expert by giving them an expert answer. Show them that you know the difference between a good job and a bad job.
... because a system administrator isn't producing anything, any more than a safety engineer is. They're there to preserve certain non-functional properties of the system. The appropriate measure is how much of the time the system meets or exceeds the service level agreed to, and what the cost is in staff hours to do that.
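One way to read the measure above is as two numbers reported side by side: the share of reporting intervals in which the agreed service level was met, and the staff hours it took. The sketch below assumes a "higher is better" service level such as monthly availability; the target and the sample data are hypothetical.

```python
def sla_compliance(measured_levels, target):
    """Fraction of intervals whose measured level met or exceeded the target."""
    if not measured_levels:
        raise ValueError("need at least one interval")
    met = sum(1 for level in measured_levels if level >= target)
    return met / len(measured_levels)


def sla_report(measured_levels, target, staff_hours):
    """Pair the compliance fraction with the staff-hour cost of achieving it."""
    return {
        "compliance": sla_compliance(measured_levels, target),
        "staff_hours": staff_hours,
    }


# Hypothetical availability percentages for four months against a
# 99.9% target, with 640 staff hours spent over the same period.
monthly_availability = [99.95, 99.99, 99.2, 100.0]
report = sla_report(monthly_availability, target=99.9, staff_hours=640)
```

Keeping the two figures separate, rather than dividing one by the other, avoids creating a single "productivity" number that can be gamed.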
Trying to turn it into a "productivity" measure will have the inevitable effect of maximizing whatever is being measured, whether it's LOC of scripts, service tickets closed per hour, or kumquats per fortnight.
Form a union or form a company. Go off site with the other sysadmin colleagues and either form or join a union or form a company.
If union: Form it legally. Then negotiate for better pay, recognition and benefits. Take legal actions as necessary to achieve your collective bargaining objectives. Use the "unreasonable" demands for undefined metrics as justification. Cite the beancounters by name.
The basic problem with current IT Sysadmins and Janitors (which is what they think you are) is that no one has the balls our grandparents had to join together to stand up to PHB's. Collective bargaining is a reasonable and effective strategy, if you let them divide you, they will win.
Are you really so naive as to believe that MBA classes are about how to measure things and how to "listen" to employees. ROFL. Grow up will ya. MBA's are about how to obtain and wield power to become personally wealthy. It's that simple.
Part of their strategy is hiding behind smoke screens and FUD, using HR and other gullible people types to deploy change. Measurement of things where the units are intrinsically undefinable is an old and effective technique. Play their game, their way, and you will lose.
If "form a company": This is a much better alternative (if you are a small group of sysadmins they need and can reach a practical agreement). Form your own services company. Then all quit on the same day and immediately offer your services (on reasonable terms) to the same company. The terms should offer no loss of continuity or support and an effective SLA, based on Your Metrics. This is quite legal and reasonable; it's how many companies start, and it's called free enterprise. Hopefully during the few days it takes to complete the negotiations, none of their systems fail catastrophically. But if so... why would you care? Somebody else's problem, right? Either you're worth paying for your services, or you're not. The real question is do you have the testosterone to really find out?
The only question any MBA cares about, is who dies with the most toys. Ethics is a course taken so they know what to avoid doing by accident.
The only options for working practitioners are collective bargaining (poor and outdated) or running your own services company (better imho, but takes real organization and balls).
So... have you got a pair?
What part of "Get Control", don't you guys understand?
There is no god; get over it already! Never exchange a walk on part in the war, for a lead role in a cage.
The metric should be 'number of times the sysadmin has to be consulted', and it should be driven as close to zero as possible.
I might get modded 'funny', or 'flamebait', but I'm serious.
Think about it. When is a sysadmin needed? When there is some kind of crisis. "I can't get to the internet", "I can't check my email", "My computer thinks I might have won a million dollars", "I lost that important project file". A good sysadmin will prevent these things from ever happening, and when they do happen will have them resolved quickly, without a lot of technobabble or attitude (like the SNL skit guy), and will fade into the woodwork. Ironically, the middle-of-the-road IT guys are often thought of as heroes by the staff they support. They might be thought of as the firefighters, but unfortunately, they are also often the pyromaniacs.
Other useful metrics:
If you don't already have a ticket support system, get one. It will generate useful metrics for you. Some useful things out of it would be:
- The AGE of the OLDEST OPEN SUPPORT TICKET. Proves you aren't dilly-dallying
- Number of Priority 1 Tickets opened per quarter (see above - should be as low as possible)
- Everything you do, you should open a ticket for. Upgrading that Linux box? Ticket it. Updating anti-virus definitions? Ticket it. From this you will get:
- Number of tickets open per day
- Number of proactive vs. reactive tickets (tickets you opened vs. tickets someone else opened). You should get credit for fixing things before they become an issue someone notices.
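Most ticket systems will report the numbers listed above on their own, but as a sketch of what is being counted, here is how they could be derived from an export. The record layout (`opened`, `closed`, `priority`, `opened_by_admin`) and the sample tickets are hypothetical, not any particular product's schema.

```python
from datetime import date

# Hypothetical export: closed is None while a ticket is still open.
tickets = [
    {"opened": date(2024, 1, 2), "closed": date(2024, 1, 3),
     "priority": 3, "opened_by_admin": True},
    {"opened": date(2024, 1, 10), "closed": None,
     "priority": 1, "opened_by_admin": False},
    {"opened": date(2024, 2, 1), "closed": None,
     "priority": 2, "opened_by_admin": True},
    {"opened": date(2024, 4, 5), "closed": date(2024, 4, 5),
     "priority": 1, "opened_by_admin": False},
]


def oldest_open_age_days(tickets, today):
    """Age in days of the oldest ticket that is still open (0 if none)."""
    open_dates = [t["opened"] for t in tickets if t["closed"] is None]
    if not open_dates:
        return 0
    return (today - min(open_dates)).days


def priority1_opened(tickets, start, end):
    """Number of Priority 1 tickets opened between start and end, inclusive."""
    return sum(1 for t in tickets
               if t["priority"] == 1 and start <= t["opened"] <= end)


def proactive_vs_reactive(tickets):
    """(tickets the admins opened themselves, tickets someone else opened)."""
    proactive = sum(1 for t in tickets if t["opened_by_admin"])
    return proactive, len(tickets) - proactive


age = oldest_open_age_days(tickets, today=date(2024, 4, 10))
p1_q1 = priority1_opened(tickets, date(2024, 1, 1), date(2024, 3, 31))
split = proactive_vs_reactive(tickets)
```

The proactive/reactive split only means something if admins really do ticket their own work, which is the point of the "ticket everything" advice.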
And if the bean counter needs some big numbers to justify things, just count up stuff that the logs on public boxes find. Seriously - have you ever looked at the stuff from logwatch? Just yesterday I had 2163 unique failed attempts to log in as root, not to mention all of the other assorted hackery it catches. "Number of successfully defended intrusion attempts" is a metric that will scare a bean counter enough that he won't take the liability of getting rid of you.
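If you want that number without eyeballing a logwatch report, something like the sketch below would do. It assumes OpenSSH-style "Failed password for root from ADDRESS port N" lines in the auth log and takes "unique" to mean distinct source addresses; the sample lines are made up, and other daemons or log formats would need a different pattern.

```python
import re

# Assumed format: OpenSSH sshd "Failed password" syslog messages.
FAILED_ROOT = re.compile(r"Failed password for root from (\S+) port \d+")


def failed_root_sources(lines):
    """Set of distinct source addresses with a failed root password login."""
    sources = set()
    for line in lines:
        match = FAILED_ROOT.search(line)
        if match:
            sources.add(match.group(1))
    return sources


# Made-up sample lines using documentation-only IP ranges.
sample = [
    "Jan 10 03:12:01 host sshd[411]: Failed password for root from 192.0.2.10 port 40022 ssh2",
    "Jan 10 03:12:05 host sshd[411]: Failed password for root from 192.0.2.10 port 40023 ssh2",
    "Jan 10 03:13:44 host sshd[502]: Failed password for root from 198.51.100.7 port 51888 ssh2",
    "Jan 10 03:14:02 host sshd[530]: Accepted publickey for alice from 203.0.113.5 port 50122 ssh2",
]
unique_sources = failed_root_sources(sample)
```

In practice you would feed it the lines of your own auth log instead of `sample`; the path to that log varies by system.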
Try to get him to understand that some deliverables are 'negative' deliverables. Uptime (lack of downtime) or security (lack of intrusions) are good examples. They are partly the expression of your due diligence, good practices, savoir faire, and flair. These will never be piechart-able. If he does not understand that or does not want to understand it, pack away and get working somewhere they deserve you better. A job is not just an exchange of money for work. You have to get some consideration and self-fulfillment out of it.
Part of the problem is that the sysadmin job is somewhat reactive (like the plumber who responds to problems), somewhat preventative (like the security guard keeping the bad guys out), and somewhat prescriptive (like the carpenter adding on another 20000 SF of building). Try to divide the general role into these different categories and come up with metrics for each. Coming up with a single metric will be nearly impossible because of the diversity of the responsibilities of the job.
Find other jobs that are similarly about "preventing the negative". How would you measure the security guard's efficacy?