System Admin's Unit of Production?
RailGunSally writes "I am a (strictly technical) member of a large *nix systems admin team at a Fortune 150. Our new IT Management Overlord is a hardcore bean-counter from hell. We in the trenches have been tasked with providing 'metrics' on absolutely everything from system utilization to paper clip recycling. Of course, measuring productivity is right up there at the top of the list. We're stumped as to a definition of the basic unit of productivity for a *nix admin. There is a school of thought in our group that holds that if the PHBs are simple enough to want to operate purely from pie charts and spreadsheets, then we should just graph some output from /dev/random and have done with it. I personally love the idea, but I feel the need for due diligence, so I put the question to the Slashdot community: How does one reasonably quantify admin productivity?"
How many tickets answered per day? Completed per day? /dev/random is probably the most elegant, though.
Keep track of uptime. Are the systems only down for scheduled maintenance? If they are down outside of scheduled maintenance windows, what is the percentage? Was it hardware or software or a mix (old firmware with updated driver requiring newer firmware), etc...
Was the outage extended due to vendor timing? If so, maybe a stock of typical spare components should be maintained to shorten the window.
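For what it's worth, the uptime bookkeeping described above is only a few lines of code. This is a rough sketch, not anybody's real tooling: the `Outage` record, its field names, and the sample numbers are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Outage:
    """One outage record. The fields are hypothetical, not from any real ticket system."""
    hours: float
    scheduled: bool   # True if it fell inside a planned maintenance window
    cause: str        # e.g. "hardware", "software", "mixed"


def unscheduled_downtime_pct(outages, period_hours):
    """Percentage of the reporting period lost to unscheduled outages."""
    if period_hours <= 0:
        raise ValueError("period_hours must be positive")
    unscheduled = sum(o.hours for o in outages if not o.scheduled)
    return 100.0 * unscheduled / period_hours


def unscheduled_hours_by_cause(outages):
    """Total unscheduled downtime hours, grouped by cause."""
    totals = {}
    for o in outages:
        if not o.scheduled:
            totals[o.cause] = totals.get(o.cause, 0.0) + o.hours
    return totals


# Made-up records for a 30-day month (720 hours).
sample = [
    Outage(hours=4.0, scheduled=True, cause="software"),
    Outage(hours=1.5, scheduled=False, cause="hardware"),
    Outage(hours=2.5, scheduled=False, cause="mixed"),
]
pct = unscheduled_downtime_pct(sample, period_hours=720)
by_cause = unscheduled_hours_by_cause(sample)
```

The scheduled window is deliberately left out of the percentage, since the question being asked is how often the systems are down outside planned maintenance.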
Typical maintenance like adding/deleting/unlocking user accounts, resetting passwords, printer maintenance, and disk admin should be a small part of an admin's day. The rest should be keeping an eye out on the real world, looking for potential problems like security vulnerabilities and patches, and planning the next updates / upgrades.
Tell the bean counters that their demands to quantify everything will only reduce uptime and complicate matters to where you spend more time doing paperwork than you do managing systems. If they can't understand that, it's time to go elsewhere. Be sure to tell the bean counter that they'll be lucky to find anyone talented to work under their regime.
I've seen systems that went from 20% loaded to always overloaded because of the number of *accounting* applications, programs and monitoring solutions that were *demanded* by the bean counters. After a user and business unit rebellion, the *fluff* was removed, as was the bean counter. This left the systems running in a state where the end users could do their work, and the business units had satisfied customers.
Who is general failure, and why is he reading my hard drive?
You can evaluate how many users the SA's systems serve, how many systems the SA maintains, and how much data throughput all these users/systems generate.
A confused Microsoft-SA running in circles around an Exchange server all day in order to serve 200 users is not "efficient" compared to a Linux-SA running an MTA which services 25,000 users (with better response times).
On the other hand, a non-skilled Linux-SA who is fiddling with a SAMBA server in order to maintain 200 users with Windows clients is not very "efficient" compared to a skilled Microsoft-SA with a well-configured AD.
Of course you can measure SA efficiency. And there is nothing bad about it. In most cases it is even a benefit for the *nix admins.
- Jesper
My security clearance is so high I have to kill myself if I remember I have it...
Hours Worked Fixing Problems divided by Hours Worked Doing Routine Work
The lower the number, the more efficient the sys admin is. A good sys admin doesn't have to do anything, because everything is already set up and working. If the admin is constantly fixing servers, bringing them up, restoring data from backups, etc., etc., etc., then he isn't doing his job. If the majority of his day is spent sleeping in his chair and responding to the occasional email and things are running smoothly, then you can't ask for anything more.
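If anyone wants to hand that ratio to a spreadsheet, here is a minimal sketch. The function name and the handling of the zero-routine-hours case are my own assumptions; the comment above only gives the division.

```python
def firefighting_ratio(hours_fixing, hours_routine):
    """Hours spent fixing problems divided by hours spent on routine work.

    Lower is better. The zero-routine-hours handling is an assumption:
    0.0 if nothing needed fixing either, infinity otherwise.
    """
    if hours_fixing < 0 or hours_routine < 0:
        raise ValueError("hours cannot be negative")
    if hours_routine == 0:
        return float("inf") if hours_fixing > 0 else 0.0
    return hours_fixing / hours_routine


# Made-up weeks: a calm one and a bad one.
calm_week = firefighting_ratio(hours_fixing=2, hours_routine=38)
bad_week = firefighting_ratio(hours_fixing=30, hours_routine=10)
```

With these invented numbers the calm week scores far lower than the bad week, which is the direction the comment above says a good admin should trend.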
Love sees no species.
This is a thoughtful and intelligent response. I would add to it though in the following way: What is the VALUE of each of these things? That is, instead of trying to measure efficiency, or "productivity", it would be more useful to the CIO to measure the actual value of what IT does. For example, if security is increased by X amount as a result of investing Y dollars, then the ROI of that investment is easily calculated and it can be compared with other investments. Note that to measure security, one must estimate (somehow) the expected future loss per unit of time. Thus, an increase in security means a lower level of expected loss per future time. The same can be done for other aspects of IT operations, including those you mention above (reliability, response time, etc.). In the end it must be measured against the value to the organization. Productivity is the wrong approach. Value is the right approach. - Cliff
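A back-of-the-envelope version of that ROI calculation might look like the following. Everything here is an assumption made for illustration: the function name, the fixed time horizon, the lack of any discounting, and the dollar figures. Estimating the expected loss rates is the hard part, and the code does nothing to help with that.

```python
def security_roi(loss_rate_before, loss_rate_after, horizon_years, investment):
    """Simple ROI of a security investment.

    loss_rate_* are estimated expected losses per year (before and after
    the investment); the avoided loss over the horizon is treated as the
    return. No discounting is applied.
    """
    if investment <= 0:
        raise ValueError("investment must be positive")
    if horizon_years < 0:
        raise ValueError("horizon_years cannot be negative")
    avoided_loss = (loss_rate_before - loss_rate_after) * horizon_years
    return (avoided_loss - investment) / investment


# Invented figures: expected loss drops from $100k/yr to $60k/yr
# after spending $80k, evaluated over 3 years.
roi = security_roi(100_000, 60_000, horizon_years=3, investment=80_000)
```

With these numbers the avoided loss is $120k against an $80k spend, so the ROI comes out to 0.5 (50%). The same function can be pointed at reliability or response time by swapping in the expected loss from outages or slow service.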
1) Ask him what he wants to hear
2) Tell him what he wants to hear.
If you can't reasonably tell him what he wants to hear, tell him how much it will cost to produce what he wants to hear.
This is not a technical consideration. This is a political consideration. He already has an idea of how to cover his ass. Give him the asbestos he wants.
...no matter what your boss says. Just don't do it. It is management's responsibility to come up with metrics. If they can't do that, they're not qualified to hold their position, and frankly, I would tell them to their face. It might get you fired. But I've taken the "this is not my responsibility" tack before with some success. The reason this stuff happens is because workers allow it to happen, and if you don't stand your ground once in a while they will just keep shoving this type of crap at you.
I remember an old joke about a furnace repairman who comes to a home and looks at the furnace for about a minute and a half, listening to the rumbles and gurgles. He takes his hammer out and hits the furnace at one precise place. The furnace starts up and runs fine, as if it were brand new.
The bill was $200.
The homeowner asks why so much when all he did was hit it once with a hammer?
The repairman takes back the bill and itemizes it, still totaling $200:
Cost of hammering: $1
Knowing where to hammer: $199
Any idiot can muck about on a UNIX box. I worked at one Fortune 500 company where everybody in the dept had Double E's. Still, their main Solaris server crashed every 3-5 hours, every day, and had been doing so for months.
Took me a week to unscrew it and put everything back in order.
Me, I am a high school dropout with no GED and some non-technical college courses. Still, most of what I was doing was letting them do their work and not have to bother about broken systems. My value was on par with theirs, as it was time they didn't lose on their work.
Nevertheless, beancounters are stupid (also, beancounters are not accountants); they know the cost of everything and the value of nothing. If you really want to send their heads spinning, take the entire labor budget for each dept expressed as an hourly unit. Every time you work for a dept, internally charge the company that much for each hour you work on a project or ticket for them, or better still for your company, and tell them that's how much it costs. Without sysadmins, nobody does anything but fight technical fires, and no work gets done.
Likely this joker found out that auto mechanics have a book to calculate how much to charge for each service and repair, with details on how long each job should take. This doesn't work because sysadmins are closer to being chefs or doctors than low-end auto mechanics.
Even so, people who own Jaguars, Ferraris, and Maseratis don't take them to Jiffy Lube.
If they complain, tell them the story about that hammer (or better yet, use one on them).
Sorry about the writing. Robot fingers, you know? Cliff Steele in DOOM PATROL #23
You can measure your productivity in the same way that doctors do: by measuring how much better you make life for your users/patients.
The medical unit is the QUALY, or "Quality Adjusted Life Year". So if you're a paramedic and you save the life of a 25-year-old who will live to 85, then that's 60 QUALYs. However, if their "quality of life" will be degraded, e.g. due to a disability, then you quantify that.
QUALYs are normally used to determine where to spend money, e.g. if this cancer drug costs $X and adds 3 weeks of semi-unconscious life, is that better than spending $X on drug Y?
How do you apply this? For each thing that you do, count how many people are affected. Then try to judge the "quality of (work) life" effect on them. E.g. being without email for a day = 30% impairment to quality of work life. So if an email outage affecting 100 people lasts half a day, that's 100 * 30% * 0.5 = 15 QUALdays. To measure your productivity you then need to judge what would have happened if you weren't there (i.e. outage lasts all week or no outage at all).
Presumably you already have some sort of issue-tracking system. I think you just need to extend this to send out an additional message: "now that this ticket is closed, please tell us how much your QUAL was impaired during the period of the problem". And the rest is a Perl script...
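Not the Perl script, but here is roughly what the arithmetic would look like in Python. The function names and the five-working-day "what if you weren't there" figure are assumptions made up for the example; the 100 people, 30% and half a day come from the email outage above.

```python
def qual_days(people_affected, impairment, days):
    """QUALdays lost: people * impairment fraction (0..1) * duration in days."""
    if not 0.0 <= impairment <= 1.0:
        raise ValueError("impairment must be a fraction between 0 and 1")
    if people_affected < 0 or days < 0:
        raise ValueError("people_affected and days cannot be negative")
    return people_affected * impairment * days


def qual_days_saved(people_affected, impairment, actual_days, counterfactual_days):
    """QUALdays the admin saved versus a guess at what would have happened without them."""
    lost_without_admin = qual_days(people_affected, impairment, counterfactual_days)
    lost_with_admin = qual_days(people_affected, impairment, actual_days)
    return lost_without_admin - lost_with_admin


# The email example: 100 people, 30% impairment, half a day.
lost = qual_days(100, 0.30, 0.5)

# Assumed counterfactual: with nobody around, the outage lasts a
# 5-working-day week instead of half a day.
saved = qual_days_saved(100, 0.30, actual_days=0.5, counterfactual_days=5)
```

The first call reproduces the 15 QUALdays from the example. Under the assumed week-long counterfactual, the admin gets credit for the difference between 150 and 15 QUALdays. The impairment figures would have to come from the ticket follow-up survey.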
productivity = (SystemUsers) * (TimeReadingSlashdot) / (SystemDowntimeRate)
And while that is obviously a joke, it also happens to be pretty much correct.
Obviously if you increase SystemUsers while keeping all else constant, productivity goes up. If you reduce SystemDowntimeRate while keeping all else constant, productivity goes up. If you spend less time on general maintenance and random panic fixes (increase TimeReadingSlashdot) while keeping all else constant, productivity goes up.
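For the spreadsheet crowd, the formula is trivial to code up. This is just a sketch: the units (hours per day for TimeReadingSlashdot, a fraction of time down for SystemDowntimeRate) and the sample values are my assumptions, since the formula above doesn't specify any.

```python
def productivity(system_users, time_reading_slashdot, system_downtime_rate):
    """productivity = SystemUsers * TimeReadingSlashdot / SystemDowntimeRate.

    A downtime rate of zero would divide by zero, so it is rejected here.
    """
    if system_downtime_rate <= 0:
        raise ValueError("system_downtime_rate must be positive")
    return system_users * time_reading_slashdot / system_downtime_rate


# Made-up baseline, then change one input at a time.
base = productivity(200, 2.0, 0.01)
more_users = productivity(400, 2.0, 0.01)
less_downtime = productivity(200, 2.0, 0.005)
more_slashdot = productivity(200, 3.0, 0.01)
```

Each of the three variants scores higher than the baseline, matching the three "all else constant" cases described above.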
You can't take something off the Internet! That's like trying to take pee out of a swimming pool.
When an IT dept is feeling underfunded, they merely threaten to quit. The reaction is akin to a three-year-old being told mommy is leaving. Good IT people are worth their cost, and most are not so greedy. They stay because they want to, not because of the money. Beancounter bosses tend to pay for this sort of thing later, as nobody wants to work for them when word gets around. Soon they have ex-Mickey D's employees running the desktop support and the smell of French fries just won't leave the server room.
As was stated earlier, you cannot derive a metric for a negative event. You can only measure a sysadmin's stability factor in terms of (well) stability. Other metrics could be created from how long it takes new services to appear after they are requested, but there is no way to gauge the factors relating to them.
Management:
We would like to replace Oracle with MySQL. How long is it going to take?
All our servers are running Solaris 8, the current version is Solaris 10, please upgrade ASAP.
We want to move off Linux and on to HP-UX before next month.
As to the opinions of others....
Hi, welcome to IT. You must be new. Per-hour performance metrics are about the stupidest thing I have heard of. We are not factory workers; we are information workers. One problem can generate 2000 helpdesk requests in 10 minutes and take less than that to fix. I am not going to write out two thousand emails to tell everybody that we had a fiber cut from the ISP and our mail server will be down. Sounds like you've been reading up on the latest management fad, like Sigma 7 or some other such tripe.
Actually, that's not quite how it works. The folks in accounting (non-beancounters) usually call up and ask if there is a problem, you give them an update on what's up, and they go back to whatever they were doing. If the dept needs more money next time around, they just explain what it's for in a meeting. If it's not ludicrous, then they usually get it. If the manager gets the numbers wrong, then that manager has to fix it some way. A good manager will pad it and use departmental luncheons, which tend to eat up the slack.
Sorry about the writing. Robot fingers, you know? Cliff Steele in DOOM PATROL #23
I have never been a SysAdmin but I have done IMR. In IMR, if you are really doing your job, then no one knows you're there but the bean counters, unless of course someone wants to add something new to the mix or there is an operator error. Of course it was more obvious when you regularly had to change the machine setups to run different products. Pretty much everywhere I did that there was a ticket system to make the bean counters happy, and the better you did your job the more creative you had to be to account for your time. If you didn't make the bean counters happy they would start working to get your job eliminated, and if you worked directly for production in this respect the departmental manager might try making you do production work when not doing maintenance or repairs.
Fortunately, at the place I worked the longest in that field, the plant manager was a former IMR technician, and he would give a dressing down to any bean counter or ignorant lower manager who messed with us. As he put it, "this plant is running very smoothly because of this IMR crew and they all know that the smoother things are running the more time they have to relax and plan future required interventions to keep themselves relaxed with free time to think." 'Course, we were not always thinking of such things while relaxing, and he knew that, but he knew that as long as we got rewarded with relaxing time at work, we would keep things running smoothly to preserve the relaxed atmosphere.
Personally, I think the SysAdmin who contributed the question here should check the obviously clueless "new IT Management Overlord"'s computer to make sure they are not violating copyright by downloading recordings; make sure there are no trojans infesting the system; and make sure they are not downloading large amounts of porn. It would be a shame if this "bean counter" had to account for anything like that.
Welcome to the real world. In corporate management, it appears that it is the job of the manager to order his subordinates to find out what his job is. Actual productivity is irrelevant. It is all a shell game to provide the semblance of useful work for the new feudal overlords.
Which is why I quit. SMEs have their problems, and they are not entirely free of the MBA Morons, but it's a damn sight better than any Fortune x00 company.
Mart"I know I will be modded down for this": where's the option '-1, Asking for it'?