Ideal, and Actual, IT Performance Metrics?
An anonymous reader writes "Recently it was revealed that our company measures IT performance by the time it takes to close trouble tickets. I consider IT's primary goal to be as transparent to the user as possible, thus this metric was rather troubling to me. Shouldn't we be focused on reducing calls, rather than simply closing them quickly?
My question is: How is your IT performance measured, and how do you think it should be measured?"
We usually try to measure how many libraries of congress we can get to the new blade server in under 5 minutes.
our best is 12.
Non impediti ratione cogitationis.
I thought IT got paid for the number of times they said 'No' to us during the day.
go figure.
They Live, We Sleep
Customer Satisfaction, and pro-active problem solving
I will not give in to the terrorists. I will not become fearful.
...by the number of callers left alive at the end of the day.
I think poster has a point.
A nice metric might be the count of tickets that are never opened.
An IT-department, IMHO, should be working on making itself obsolete.
...timeliness of the TSP reports
I think the focus should ultimately be on reducing calls. So, perhaps, you're doing really well if the average calls per week continues a downward trend each week.
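A rough sketch of how "a downward trend" could be checked, assuming you already have a list of call counts per week. The function name `weekly_trend` is made up for illustration, and it uses a plain least-squares slope rather than requiring a drop every single week, which is a looser reading of the idea above.

```python
def weekly_trend(calls_per_week):
    """Return the least-squares slope of calls per week.

    A negative value means call volume is falling on average;
    a positive value means it is rising.
    """
    n = len(calls_per_week)
    if n < 2:
        raise ValueError("need at least two weeks of data")
    mean_x = (n - 1) / 2                      # weeks are numbered 0..n-1
    mean_y = sum(calls_per_week) / n
    num = sum((x - mean_x) * (y - mean_y)
              for x, y in enumerate(calls_per_week))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den


if __name__ == "__main__":
    # Hypothetical data: calls logged in each of six consecutive weeks.
    print(weekly_trend([42, 40, 37, 38, 33, 30]))
```

A slope is less jumpy than a week-to-week comparison, so one bad week after a rollout does not flip the verdict.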
However, since many IT departments are actually split into different subdivisions, how can you measure the group that just takes calls, addresses issues, and closes tickets? It may be their ONLY job to close tickets/issues. They may have exceedingly little control over any underlying problems. So, to measure their performance, perhaps the number of issues closed is not entirely wrong. But managers of this group should be evaluated over time. Any recurring issues should be brought up as potential bugs or user training or just needing general improvement to the system, whatever that might mean.
...reduces jobs.
I consider IT's primary goal to be as transparent to the user as possible, thus this metric was rather troubling to me. Shouldn't we be focused on reducing calls, rather than simply closing them quickly?
Not for "stupid" users, the ones you see on a day-to-day basis. Now, this all depends on who you are giving support to, competent IT professionals or the day-to-day office worker. If you are giving them to fellow IT people, it should be a goal to be transparent. For the office worker the main job is productivity, that means fix the problem as soon as possible or tell them there is no problem and have a good day.
Taxation is legalized theft, no more, no less.
It's kinda difficult to measure how often something doesn't happen, unless you just track uptime. You'd need to do that on a per-workstation basis to get some idea how few calls come in. I don't think the speed of closed tickets should be the only measure. Customer satisfaction should also be tracked, both in terms of service calls and system reliability.
Well, here's a newsflash: you can't quantify most IT jobs. We are an ever-changing backbone to the business in most cases. Metrics are meaningless to us. If you'd like to have a way of evaluating IT then set goals. Salespeople use metrics as a way of increasing sales and ridding themselves of dead weight. As an IT person you can be on fire one week and dead the next and it's all the same. So to answer the OP, take a picture of your ass and give it to the person that originally put the thought in your head that we need to justify what we do or how quickly we do it.
But it's close. Of course, closed tickets are something a manager can measure. Needless to say, it measures nothing meaningful. For example, I tell a customer to reboot. Close the ticket. That takes little time and closes the ticket fast. In fact, I can improve my metrics by telling that same person to do this every 4 hours for several years. OR, I can get up, go to their desk, and solve the problem permanently. It takes longer, making my metrics look bad, but in reality-land (a land far, far away from management land), that person is doing productive work longer and more efficiently because the interruption and downtime have been removed.
Please do not read this sig. Thank you.
Showing up on time. It usually doesn't happen.
s/metrics/bullcrap
A good metric should be
1 - Enterprisy looking
2 - Easy to game by the interested parties
Your boss wants a number, give it to them quickly. It's all BS (or 99% of it at least. Don't agree? Do the job then) in the end.
So good metrics could be.
- Unplanned downtime
- Number of users, number of bytes used, etc (that plots a nice ascending graph, and ASCENDING IS GOOD, you can print that and put it on the wall)
If they stay on 'time to close the ticket', NEEDINFO and WORKSFORME are your friends.
how long until
I work in the support field, and my company looks at the number of steps in the ticket to resolve the issue, the overall time taken to resolve it, and the complexity of the issue. The goal is always to reduce the number of calls and steps, but call complexity must also be factored into the overall metric.
Amount of service calls resolved: h
Server/network downtime (in hours): d
Use formula "(s / h) + 2d"
Use resulting number to chart IT support performance, assuming that the network + server uptime and stability is more important than user inconvenience. You could decide that anything above a certain threshold is too much, or use it to compare personnel with each other.
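For what it's worth, here is a minimal sketch of that formula. The post never says what `s` is; this assumes it means service calls received, which is only a guess, and the function name `support_score` is invented for the example. Under that reading a lower number is better.

```python
def support_score(calls_received, calls_resolved, downtime_hours):
    """Compute (s / h) + 2d.

    s = calls_received  (ASSUMPTION: the original post leaves s undefined)
    h = calls_resolved
    d = downtime_hours  (server/network downtime, weighted double)
    """
    if calls_resolved <= 0:
        raise ValueError("calls_resolved must be positive")
    return calls_received / calls_resolved + 2 * downtime_hours


if __name__ == "__main__":
    # Hypothetical month: 120 calls in, 100 resolved, 3 hours of downtime.
    print(support_score(120, 100, 3))
```

The doubling of `d` is what encodes "uptime matters more than user inconvenience"; change the factor if your shop disagrees.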
Yet Another Tech Blog
(but so much more, including game and movie reviews)
http://yanteb.peasantoid.org
Any metric is, at best, indicative. You can spend all day designing a better metric and by the end, you're still not going to get anything better than, well, indicative.
As with all data analysis, make sure that whoever's using these numbers knows how bad they are. If we're dealing with reports and decisions, make sure that there's a short explanatory comment by somebody in the know about the degree to which you feel these numbers are representative (example: overall performance is improved, but averages are skewed by a large number of complicated bugs on New Product).
Oh, and if the people making decisions are MBAs unable to read a single short sentence, you're screwed either way. Then you just have to roll with it :)
"" How about taking the safety labels off everything, and let the stupidity-problem solve itself? """
Time to resolution is a perfectly acceptable metric for a help desk department. For services some of the important metrics are availability (including performance!), mean-time-between-failure, number of new releases, percent of successful releases, etc. For specific processes you should have metrics for example for how long the new employee on-boarding process takes, or how long it takes to bring additional capacity online, etc.
I recommend you talk to people who are experts in IT Business Service Management. If you're in the US, one of my previous employers (www.maryville.com) could help you.
you need to engineer easily fixable problems... up goes your rating... whee!!!
For example, for every fax successfully sent via the fax server without IT intervention, the IT department gets one point.
For every fax that needs IT intervention to be sent, the IT department loses one point.
For every person who becomes aware of a problem with the fax server, the IT department loses one point. No more "heroics". The goal is to be as invisible as possible to the end users.
And similar items for every other server/service that IT supports. If nothing else, it will show exactly where the problems really are.
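The scoring rules above are simple enough to sketch. This assumes you can get an event log of `(service, kind, count)` tuples out of your systems; the event names and `score_services` are hypothetical, not from any real tool.

```python
# Hypothetical event kinds mapped to the sign of the points they carry.
POINT_RULES = {
    "success": +1,               # job completed with no IT intervention
    "it_intervention": -1,       # IT had to step in to get it done
    "user_noticed_problem": -1,  # one point per person who became aware
}


def score_services(events):
    """Tally (points_gained, points_lost) per service.

    events: iterable of (service_name, kind, count) tuples.
    """
    totals = {}
    for service, kind, count in events:
        if kind not in POINT_RULES:
            raise ValueError(f"unknown event kind: {kind!r}")
        gained, lost = totals.get(service, (0, 0))
        if POINT_RULES[kind] > 0:
            gained += count
        else:
            lost += count
        totals[service] = (gained, lost)
    return totals


if __name__ == "__main__":
    log = [
        ("fax", "success", 100),
        ("fax", "it_intervention", 3),
        ("fax", "user_noticed_problem", 5),
    ]
    print(score_services(log))
```

Keeping gained and lost points separate, rather than netting them, makes it easier to see which services are quietly working and which ones are generating noise.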
In my department, we have an agreement with the rest of the company outlining the level of service that must be performed within a pre-determined amount of time, based on incident priority. With the right tools, it's fairly easy to track the percentage of incidents resolved within the terms of the SLA.
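As a sketch of what that tracking boils down to, assuming each incident record carries a priority and a time to resolve. The priority names and the targets in `SLA_TARGETS` are made-up placeholders; the real numbers would come from whatever your SLA says.

```python
from datetime import timedelta

# HYPOTHETICAL targets per priority; substitute the terms of your own SLA.
SLA_TARGETS = {
    "high": timedelta(hours=4),
    "medium": timedelta(days=1),
    "low": timedelta(days=5),
}


def sla_compliance(incidents):
    """Return the percentage of incidents resolved within their SLA target.

    incidents: iterable of (priority, time_to_resolve) where
    time_to_resolve is a timedelta. Returns None if there are no incidents.
    """
    incidents = list(incidents)
    if not incidents:
        return None
    met = sum(1 for priority, took in incidents
              if took <= SLA_TARGETS[priority])
    return 100.0 * met / len(incidents)


if __name__ == "__main__":
    sample = [
        ("high", timedelta(hours=2)),
        ("high", timedelta(hours=6)),   # missed the 4-hour target
        ("low", timedelta(days=3)),
    ]
    print(sla_compliance(sample))
```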
"Ask not what your country can do for you." --John F. Kennedy
I think that when the metric is to reduce the number of calls, the natural human tendency is to ignore calls, shift calls to other people, etc. to make it look like you're doing better when you're not.
So that's why most people look at your find versus fix ratio, the number of bugs you find versus the number you fix / the length of time it takes to fix them. It's not great to have zillions of issues, but you should always try to fix the issues as quickly as possible.
In the low-level government job I suffered through for 2 miserable years, IT performance was measured by presence in your chair. If you kept the chair at a satisfactory egg-hatching temperature, and never made your presence otherwise known, you were a star. If you did work, you were a source of trouble.
There's one metric that can capture everything:
Bits of Shannon entropy processed per hour.
My only political goal is to see to it that no political party achieves its goals.
Can you please explain what defines "IT performance" so that we know what we have to measure.
Clearly a single metric is unlikely to be the best way to measure performance. A weighted sum of various measurements sounds better and has the additional benefit of creating spirited discussions of what the weights should be. Brave souls even dabble with nonlinear functions of the measured items.
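A weighted sum is about three lines of code; the arguing over weights is the hard part. A minimal sketch, assuming every measurement has already been scaled to 0..1 with higher meaning better. The metric names and weights in the example are invented.

```python
def weighted_score(measurements, weights):
    """Weighted average of normalized measurements.

    measurements: {metric_name: value scaled to 0..1, higher is better}
    weights:      {metric_name: non-negative weight}
    """
    missing = set(weights) - set(measurements)
    if missing:
        raise KeyError(f"no measurement for: {sorted(missing)}")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(weights[name] * measurements[name]
               for name in weights) / total_weight


if __name__ == "__main__":
    # Hypothetical metrics and weights, purely for illustration.
    measured = {"uptime": 0.999, "survey": 0.82, "sla_met": 0.75}
    weights = {"uptime": 3, "survey": 2, "sla_met": 1}
    print(weighted_score(measured, weights))
```

Dividing by the total weight keeps the result in the same 0..1 range, so changing the weights does not change the scale of the final number.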
Shouldn't we be focused on reducing calls, rather than simply closing them quickly?
We should be focussed on both.
My question is: How is your IT performance measured, and how do you think it should be measured?
ITIL principles are a great starting point.
Examples are using Key Performance Indicators (KPIs) such as at the bottom of this page and this page.
If the company has an IM solution, such as IBM Lotus SameTime, you can measure gaps. Look for holes in user availability.
If I am available on work IM 8-5, then my workstation was up during that time. If I am on work IM 8-9:12 and 9:18-5, I probably had a six minute downtime.
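Here is a rough sketch of the gap-hunting idea, assuming you can export presence sessions as (online_from, online_until) pairs. Nothing here touches a real Sametime API; `presence_gaps` and `total_downtime` are hypothetical helpers working on plain datetimes. Obviously a gap can also mean a coffee break or a meeting, so treat the result as an upper bound on downtime.

```python
from datetime import datetime, timedelta


def presence_gaps(sessions, workday_start, workday_end):
    """Return the (start, end) holes in IM presence during the workday.

    sessions: list of (online_from, online_until) datetime pairs.
    """
    gaps = []
    cursor = workday_start            # earliest moment not yet accounted for
    for start, end in sorted(sessions):
        if start > cursor:
            gaps.append((cursor, min(start, workday_end)))
        cursor = max(cursor, end)
        if cursor >= workday_end:
            break
    if cursor < workday_end:
        gaps.append((cursor, workday_end))
    return gaps


def total_downtime(gaps):
    """Sum the length of all gaps as a timedelta."""
    return sum((end - start for start, end in gaps), timedelta())


if __name__ == "__main__":
    day = datetime(2024, 1, 8)        # arbitrary example date
    sessions = [
        (day.replace(hour=8), day.replace(hour=9, minute=12)),
        (day.replace(hour=9, minute=18), day.replace(hour=17)),
    ]
    gaps = presence_gaps(sessions, day.replace(hour=8), day.replace(hour=17))
    print(total_downtime(gaps))
```

With the 8-9:12 and 9:18-5 sessions from the example above, the only hole is the six minutes between 9:12 and 9:18.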
-- Support a free market in the field of government
Close the discussion. We have a winner.
Confucius say, "Find worm in apple - bad. Find half a worm - worse."
80% of Users are a PIA, 20% never call the help desk. Metrics can be construed in many different ways to represent a positive or negative interpretation.
For a level 1-2 support position, metrics should probably be focused on how efficiently tickets are resolved. Once you get into administrative functions and management, those positions should be the ones focusing on increasing stability and reducing the amount of tickets submitted in the first place. Hopefully your managers and administrators are being assessed for their success in those areas, just as the junior staff are being assessed for how efficiently they can process the issues.
At my former employer, customers would call the national helpdesk, who were rated by their time on a call. Let me tell you, the type of customer service you get from that environment is crap. They would have the customer reboot their machine, and if that didn't work, they would escalate the call to a state level operations center that could dispatch technicians (where I worked). They were, for the most part, useless. They made the customers angry, and really served no purpose other than a filter.
Management gets the behavior that it rewards, not necessarily the behavior that it pretends to ask for
Whenever I see a metric that measures quantity instead of quality, that tells me the manager gets a bonus. Hopefully, you're getting a piece of that bonus.
in other departments when IT introduces new systems to them.
No, it's not an easy metric to obtain there, but IT is not a simple discipline to categorise. As an example, ask someone to give a hard metric on how much useful information they know (to 2 decimal places).
Sounds pretty normal for a call center. At my last job management got excited if a case was open for over two weeks regardless if the issue was resolved or not. That's what I call great customer service!
Here is the problem... you are trying to assign arbitrary numbers to something that cannot be measured. These are numbers for accountants, they want one number to be able to show them where to cut cost. Problem is that there is no way to quantify how much money an IT department saves a company. Metrics have gotten out of control in this country. We are always measuring the cost and never measuring the value. How do you assign a number to a person who is not a number? How do you quantify the guy who spent all weekend fixing the server? How do you quantify the accrued knowledge of a human being? It is impossible to do. The accountants never ask questions like, "How would my quality of life be affected if I couldn't get effective tech support?", "How much money would the company lose if these computers and programs didn't exist?". You need to measure the man and his work as a whole, person to person.
Ideal:
I know about IT, having worked there for many years. In fact, I'm still working there. Keep up the good work, I know there's a lot of bullshit to put up with.
Actual:
I heard some buzzwords. When can we implement them in order to actualize our potential? Also, I'll need you to stay late and fix my computer. It's got some sort of virus or something I don't know.
(As you're fixing his machine, you see a note on his desk right next to the post-it with his passwords)
Hire grad student from India [x]
Get what's his name to train him. [ ]
Fire what's his name. [ ]
Synergize. [ ]
I think the average time taken to close a trouble ticket is important, but it's not the only factor you want to look at.
The primary purpose of issuing unique trouble ticket numbers is to provide an easy "one stop" tracking mechanism for the issue. A customer (or employee) should always be able to reference a ticket # to support staff, and in turn, they should be able to pull up a fairly comprehensive history of what's been done so far to resolve the issue.
If you push too hard for closing tickets quickly, you'll see a tendency for new tickets to get issued on things which should REALLY be continuations of an existing ticket, held open longer.
(EG. I call in complaining that my inkjet printer won't print yellow. A ticket is created and they tell me my color cartridge is clogged up, so put a new one in and I should be fine. Ticket is closed. I switch cartridges with a new one, and discover it STILL doesn't print yellow. I call in and a new ticket is made for what's really the same issue. I'm told how to run the printer through cleaning cycles, and instructed that I may have to do it "up to 10 times" to see results. Ticket closed. I get around to trying that the next day when I get time, and even after 10 or 15 attempts, no yellow is coming out. I call back in, only to have ANOTHER new ticket opened, and the tech wastes my time asking me if I "tried a new cartridge yet?" and I have to interrupt him in the middle of re-explaining how to do a cleaning cycle. Problem is eventually determined to require a replacement printer ... but should obviously have all been filed under one ticket.)
Boss: How many IT tickets did you close today?
IT Drone: Oh, Boss, I don't keep score.
Boss: Then how do you measure yourself with other IT workers?
IT Drone: By height.
In the IT company I work for, we measure it by the number of irate salespersons that survived inside a SL8500 with tape cleaning engaged.
It takes 40+ muscles to frown, but only four to extend your arm and bitchslap the motherfucker
We're doing a lot of social outreach, and measured by metrics like how many new members join through our outreach. We're still searching for the best metric to measure our progress in this realm. To that extent, we had to develop our own tool (!), available for free to others at http://www.sociafyq.com/ . Cheers, --Dave
So your IT department ends up with servers/services that are designed correctly (by the vendor), and when a user does something stupid ONLY THAT USER IS AFFECTED.
Other servers/services are "designed" by idiots and any user can cause problems for every other user just by doing something stupid.
So the monthly meeting shows that you have
+2,000 points (-0) for service A
+1,500 points (-0) for service B
+1,000 points (-200) for service C
+300 points (-800) for service D
Without knowing anything else about those systems, where would you probably start looking for improvements to be made?
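To make the "where would you start looking" question mechanical, here is a small sketch that ranks services by the share of their points that were lost. It reads each line above as (points gained, points lost), which is my interpretation of the notation, and `rank_by_loss` is a made-up name.

```python
def rank_by_loss(points):
    """Rank services from worst to best by lost / (gained + lost).

    points: {service_name: (points_gained, points_lost)}
    """
    def loss_share(item):
        gained, lost = item[1]
        total = gained + lost
        return lost / total if total else 0.0

    ranked = sorted(points.items(), key=loss_share, reverse=True)
    return [name for name, _ in ranked]


if __name__ == "__main__":
    # The monthly figures from the post, read as (gained, lost).
    monthly = {
        "A": (2000, 0),
        "B": (1500, 0),
        "C": (1000, 200),
        "D": (300, 800),
    }
    print(rank_by_loss(monthly))
```

Using the share of lost points rather than the raw count keeps a heavily used service from looking bad just because it handles more traffic.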
Metrics?
I'm a USian, and use Imperials, you insensitive clod!
Our competitors measure their performance by time to close tickets. They are consistently rated worst in support. We use surveys. Simple questions like: Was your problem resolved? Was it resolved promptly? We are consistently rated best in support.
more cowbell
Did you install the new cartridge right? (You are excused if you are a manager...)
I've had some experience with this. Here's what ends up happening.
You set up a system so that a ticket is assigned, the person assigned gets into the ticket and adds comments. Technically they've answered the ticket at that point. The thing that lots of people don't want to realize is that some tickets will go on for months on end.
Bouncing customers is a good way to keep them from calling back -- grandma is much more likely to phone up 'lil Tim for computer advice if she knows the hotline tech is going to bounce her to ten different places; where I work, we get a good bit of troubleshooting work because the customers hate calling the hotlines provided by the manufacturer. Sadly, annoying your customers is a good way to keep them from calling back, and as long as your product is good enough people will still pay-up. E.g. I'm screwed into Suddenlink where I live. After being promised $85.01 TV/Net, I got a $100.00 bill because of hidden fees. Guess what -- I'm screwed into paying, because the only alternative (Cox) was bought out by Suddenlink.
"Sorrow is better than laughter, for by sadness of face the heart is made glad." [Ecclesiastes 7:3]
We have a help desk ticketing system that automated issues get logged in. The on-call personnel will get pages. Also... other individuals in the company can make requests and log issues into the system to assign them to groups or individuals. The only metric really recorded is response time to respond to the client or automated event. The first concern is communicating early that the concern has been noticed and is/will be scheduled for work.
if the helpdesk clock doesn't start until they open the ticket. i get users all the time who say they spent an hour or more on the phone with the helpdesk. i ask them for future reference if the help desk technician cannot resolve their issue within 10-15 minutes, to open a ticket and escalate the issue.
it is inexcusable for a first level tech who clearly doesn't know how to fix the issue to waste our client's time fumbling around and not resolving the issue. this is why we have tiers of support - log a ticket and send it to a more experienced technician who will save the client time. what i would like the first level techs to do is track the tickets they could not resolve and later look them up and read the audit trail so they learn what the fix was.
http://shutupandreboot.net/Funny_Stuff.html
I will admit I am not sure what would make the best IT metric of service. However I can tell you without a shadow of a doubt what does NOT make a good metric, and how many tickets you close is one of them.
I think my organization must use that metric for evaluation. When I call, I get a ticket. Then they generate a ticket, that they created a ticket, and send me a ticket. Then nothing happens for a long time. Then after I get tired of waiting, I call. Another ticket is generated about the first ticket. Eventually someone will look at it, and say oh we are not responsible for that, would you like us to make a ticket to flag this problem? Great. In the end, months later, someone may or may not call you about the final ticket that is essentially about not being able to help you at all, and ask if you wish the ticket removed. Otherwise it is assumed that it has been taken care of after a while, thus satisfying all the rest of the tickets in some sort of orgasmic cascading ticket extravaganza. Then at year end they see they have closed 8 Billion tickets, congratulate each other on a job well done, pats on back and bonuses for everyone. Huzzah!
That has been my experience so far anyway.
When your only IT guy goes on vacation for a week+, measure on a scale from 1 to 10 how much he/she was missed.
that your organization has made your job measurable. It does not matter what they measure your performance by, as long as it is something tangible.
So, you get paid by how many tickets you managed to close in a month. Fine. So, you close as many as you can in a month, resulting in lower quality of each problem fix, resulting in more tickets posted and assigned to you, resulting in you having ensured that next month you have enough tickets as well.
This can go on indefinitely, or your wise superiors might decide to measure your work somehow else.
As the island of our knowledge grows, so does the shore of our ignorance.
"Shouldn't we be focused on reducing calls, rather than simply closing them quickly?"
You need to do both.
Customer service should be primary and just use SLA/Time metrics to make sure people aren't goofing off. People are more happy to have support from someone that is slow, communicates and is positive than someone who treats them like garbage and is quick.
.recount toggle
The role of the writer is not to say what we can all say, but what we are unable to say. -Anais Nin
Prior to my software company being bought out, my IT department was focused on "customer service." This means that everyone in the company is treated like a customer. I personally work in our software support department and this made utter sense to me.
Under the new company, our new IT works for itself, and primarily is concerned with closing calls as quickly as possible, without regard for the quality of the information or assistance. They are concerned with reducing their own call load, but they don't try very hard, and they don't offer a lot of value over that. Any good customer service department is concerned with closing calls, but they want to provide good quality service where each call is resolved as quickly as possible, but also as accurately as possible and leaving a good feeling with the customer. IT should be a resource utilized to make the company more efficient and reduce costs, not a bunch of yahoos who fix broken PCs and then disappear back under their rock when they are finished.
In customer service, quantitative metrics are used to judge the department trends as a whole, and can be important, but even more important are qualitative measures, like surveys and feedback, example cases, and periodic reviews of every rep, team leader and supervisor. Did the rep do "The Right Thing" (tm) and how many times did they do that, and are they approaching doing the right thing 100% of the time? If a rep provided the user with the right answer, but all they did was email a timid accountant a 5 page document on setting up .NET properly just so the user can properly export his reports to an email to his boss, and then the rep closed the case and never offered this less than technical person any real help, how service oriented is that, really?
Sometimes that means taking fewer cases per rep and leaving them open longer, if service improves dramatically.
"All great wisdom is contained in .signature files"
We don't use trouble tickets here. Our VPs are well enough aware that I am busy, as are the other 3 IT staffers trying to keep everybody moving forward. Most issues are resolved on a FIFO basis and each staffer has their own AO. Barring ISP issues most problems get taken care of the same day. Good infrastructure + good co-workers + VP support = Happy and Efficient IT people
Some metrics to consider:
If you are good, then the last one will justify your existence.
* = Might also prove cluelessness of user base
I don't say "no" any longer. I ask them what their budget is for accomplishing the task they want.
me: "How much do you have budgeted for this project"
them: "Budget? You mean it costs money? I thought you could do this for free"
me: "We can't do that for free" (laughing to myself the whole time) .... later they come back ...
them: "We have $400 for the project"
me: "Does that include the licensing? Does that include ongoing support? Does that include setup, training, and installation of new infrastructure needed to support your project?"
them: "Uh, no. What do you mean?"
me: "Well, when you want a project ... say for a new building, do you just present $400 and say can you build the building for that?"
them: "Well, no, we have professional architects design the building, then we have professional contractors bid on the project, then we included additional maintenance in the budget for the new building and .... "
me: "So, what you are saying is that you don't view IT as being professional"
them: "No no no no! That's not what I mean at all."
me: "So, how come you just expected us to do what you wanted without asking us what it would take to do it?"
them: "Because it is too expensive when I do ask that"
me: "It is more expensive to do things right. If you want to do it wrong, any non-professional can quote you a lower price. You can get a building and have it built a lot less expensive if you don't hire Architects and Contractors to design and build a building, and it will get built, but it will be missing things you probably want and need. But you know this, and that is why you trust those professionals."
them: "yes, but you are too expensive"
me: "Then the answer is no"
---
Sometimes it is just easier to say "NO". The sad fact is, people don't respect IT professionals AS professionals. We often don't deserve it either, but that is another topic.
Agent K: A *person* is smart. People are dumb, stupid, panicky animals, and you know it.
I think you're stretching things a bit.
"How do you quantify the guy who spent all weekend fixing the server?" You look at the number of times it's happened and you figure out how much it would cost to get that level of service agreement from an outside vendor.
The accountants are much more likely to be asking questions like "how would the business be affected if we outsourced IT at a cost of X, thereby allowing us to save Y in salaries, at a cost of Z in reduced productivity due to longer resolution times".
There are cases where it really doesn't make sense for a shop to handle their own IT. On the other hand, there are definitely cases where it does.
Our IT department apparently has graduated metrics based on issue priority. Tickets are prioritized based on the number of users affected. When we submit a high-priority ticket for an outage that blocks our entire software lab, the first thing they do is downgrade the priority. This happens in a matter of minutes. Then, after a few hours, they close the ticket without contacting the submitter.
In response, we then have everyone in the lab submit a ticket. Once enough tickets have been submitted, they send out a broadcast message saying that they are aware of the issue and working on it, and ask that we stop submitting new tickets.
Then, eventually, they fix the issue and close the tickets.
A week later, when the same unreliable service goes down again, we repeat the process.
The only thing I can figure is that they have metrics to meet for open time by ticket priority. When they downgrade the priority, the metrics allow them more time to not fix the problem.
Why are you here?
It ain't the money. A network engineer, yours truly -- I rake in about, what, 85 grand a year? You can't buy a decent sports car for that.
It ain't sex. Hey, being here won't get you laid. Oh, you're a dental hygienist? I'm a Cisco Certified Internetwork Expert. -Hello?!
What about fame? Our failures are known. Our successes...are not. That's the company motto. You save the world, they send you to some windowless office, give you a little lemonade and cookies, and show you your medal. You don't even get to take it home.
So it ain't money, it ain't sex, it ain't fame.
What is it?
I say we are all here in this room because we believe. We believe in technology, and we choose technology. We believe in right and wrong, and we choose right. Our cause is just. Our enemies...everywhere. They're all around us. Some scary stuff out there.
Which brings us here... to the server farm. You have all just stepped through the looking glass. What you see, what you hear -- nothing is what it seems.
(paraphrased from 'The Recruit')
We are big on SLAs. Department directors have to sign off on an SLA before IT will support their stuff. Actually this is how IT gets its budget.
For example, marketing comes to IT and asks for a service like sales tracking. After figuring out what they want we give them a quote with SLA and how much it will cost. After buildout there is a sign off and the service is available for use. To the users there is no concept of hardware or server. They just know if their stuff is working or not. I mean they are marketing people. Any problems that occur are tracked by our ticketing system, and it's just a matter of tracking resolution time, incident severity and number of incidents. All of this is defined in the SLA. Resolution time usually comes into play when looking at service availability, and in the incident review process for high or critical outages.
For our team individual performance usually comes down to how well we contribute to the team. My review is not that much different from a kindergarten report card. "Plays well with others" is now "Maintains positive relationships with external partners"
Stop asking to do stupid things like
- run an internet server without a firewall
- Set up accounts without passwords
- Use 1-off proprietary software when we've selected the best solution for everyone in the company. Too bad our selection costs 3x more than the other stuff.
- Bring a 64-way server up without a fail over, test, dev, and DR instances too.
- Bring a 32-way server up this week, when your project hasn't been approved yet. These things take about a month to get delivered and another month to get installed, configured, connected to the SAN and ready for applications
- Allow an outsourced vendor unlimited access to internal networks with 10,000+ servers without a corp-2-corp VPN in place.
- Send and accept unlimited sized emails without any virus and malware checks.
- Demand something fast because YOU didn't schedule and budget properly - MARKETING, this is for you.
- Run a machine that will be hacked easily and turned into a torrent, porn, music, VoIP server a few months after it gets placed onto the network.
Stupid metrics are part of the problem. When I worked for Gateway, they wanted your call average to be between 7 and 11 minutes. If you went above for the week/month, you were too slow and bad at your job. If you went below, you were probably just getting people off the phone without solving their problems.
That metric worked for most people, because they talk slow and have to look up every single issue.
For me, it was a killer. I was consistently getting 5-minute averages, even with that inevitable once-a-day 1-hour phone call. I got reprimanded twice about it before I gave up and quit. Almost every caller was happy with how I helped them. The others couldn't be helped, or I made a mistake. (I told a guy he could clean his keyboard, once... They had switched to keyboards that fall apart if you try to open them, apparently. In my defense, I had offered to send one, but the guy thought cleaning it would be a lot faster.)
Also note that a certain percentage of calls were recorded and reviewed, and I -never- got talked to about any of my calls. The only complaint I had was the keyboard guy. And yet I still got yelled at for short call times.
Again, stupid metrics are stupid. Call-time has nothing to do with customer satisfaction.
"If you make people think they're thinking, they'll love you; But if you really make them think, they'll hate you." - DM
Help desks are generally not measured by how many cases they close. Here are the normal metrics that help desks use for performance (this is at a 30,000-foot level, mind you):
1. Costs - how many butts in seats you have
2. volume - how many incidents you receive
3. Aging - how long does it take you to respond to incidents
4. Customer satisfaction - how happy are your customers.
If you can keep 1, 2, and 3 down, number 4 typically takes care of itself.
A good IT department will deploy a team of problem management specialists that will work problems instead of incidents. That is what keeps the calls from coming in, in the first place. And goes back to number 1 and 2 on the list.
If they're just monitoring how quickly tickets get closed and faster is always better, then your observation of the utility of the metric is spot on. Otherwise, productivity is a valid measure and how fast someone turns around work is important.
Corporate IT performance, and support/operations performance can include a timeliness measure, but the interpretation and use of the metric matter. So, for instance, the correct use of such a measure would be to see if a technician is turning work around quickly enough relative to others doing the same sort of work at a given level of quality. YOU CANNOT USE THIS MEASURE WITHOUT CORRESPONDING QUALITY MEASURE(S)... it becomes meaningless in the absence of those controls much to the OP's point.
Still, productivity measures are real and when you know, and control for, the factors that go along with that then you can make use of such a measure to effectively bring value to your organization.
Many lesser IT managers will, though, just look at 'how fast' or 'how many'. The last company I was with just looked at help desk tickets opened and closed, for instance, to judge help desk success. The numbers closed went up and the numbers opened dropped... therefore success right? Nope. This overly simplistic view, with no real quality measure, actually meant that it was so hard to get the help desk to do anything that users just stopped opening them and found 'out-of-band' ways of getting help. Those that were closed weren't necessarily complete, but again, users didn't get any value out of challenging so just gave up. So what our operations management was seeing as success was actually a more accurate measure of their failure!
A simple random sampling of tickets with automated 'how did we do?' surveys could have collected this information and provided some meaning to the productivity measures... as well as if people were closing things too fast or too slow while maintaining quality.
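A rough sketch of what that sampling could look like, in Python. The function names, the 10% sampling fraction, and the 1-5 score scale are all assumptions made for illustration; nothing here comes from a real help desk tool.

```python
import random


def sample_for_survey(closed_ticket_ids, fraction=0.1, seed=None):
    """Pick a random subset of closed tickets to get a 'how did we do?' survey.

    fraction and seed are illustrative parameters, not from any real system.
    """
    rng = random.Random(seed)
    ids = list(closed_ticket_ids)
    if not ids:
        return []
    # Always survey at least one ticket when any were closed.
    k = max(1, round(len(ids) * fraction))
    return rng.sample(ids, k)


def satisfaction_by_tech(responses):
    """Average survey score per technician.

    responses: iterable of (technician, score) pairs; a 1-5 scale is assumed.
    """
    totals = {}
    for tech, score in responses:
        running_sum, count = totals.get(tech, (0, 0))
        totals[tech] = (running_sum + score, count + 1)
    return {tech: s / n for tech, (s, n) in totals.items()}
```

Putting the per-technician averages next to tickets-closed counts is what gives the productivity number some meaning: a tech who closes a lot but scores low is probably closing too fast.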
I used to work at a smallish ISP -- about $8m revenue/year -- with an industry reputation for well above average customer support.
The way we measured worker productivity was a combination of a few metrics:
(1) Tickets Closed -- How many tickets were closed in a time period.
(2) Customer Satisfaction -- How satisfied the customers were with the employee's work.
(3) Difficulty -- How difficult the tickets were.
A manager would also randomly sample each of the employee's tickets and make sure that the "difficulty" claimed by the employee was reasonable.
These metrics together give you a rough estimate of output in a support environment. Tickets closed will be impacted by the difficulty of the tickets you accept. The guy who takes a long time to solve hard problems is just as valuable as the guy who solves many easy problems -- you just need to make sure you have the right mix of both. And for actual quality of work, nobody is more likely to discover and bring to your attention poor quality work than your customer.
You should not have closed the ticket until you confirmed the customer could print yellow. You should have "resolved" the ticket. Only the customer can "close" the ticket by confirming resolution was effective.
It's IT management from a holistic point of view.
SLAs are only one aspect of IT management.
There is no point measuring something unless you are going to do something with the information. Are your metrics getting better because things are getting better, or are you just getting better at fighting the same old problems? Are you measuring a metric because it's easy to measure or because the business needs that metric to be good?
Ultimately the idea is to get incidents themselves to zero because that means a smoothly running infrastructure operating exactly as the users and business expect it to. Not exactly possible, but at least it provides a direction to move in... And if your incident management system is any good, it'll tell you where the problems are, and where money should be spent to fix them. That may be user training, education on the portfolio of services that IT provides, or replacing a critical application that falls over every 10 minutes or is too slow, etc etc.
Are we talking general IT support or just Help Desk?
Personally, I work server administration. If my servers are offline without notification, that should be a ping. If I can keep my servers going without any outages that users can see, I'm doing my job.
Unfortunately, I've had managers who've only worked with regular office workers or with programmers. They don't quite "get" what I do, so they start demanding metrics. They're uncomfortable with just leaving me/us alone and doing our jobs.
Yes it's technically fudging it
If you really are trying to track the number of calls, then it isn't fudging anything. Tracking every call can be time consuming (especially if the tools you have to do such tracking were not designed for that level of detail tracking), but how else can you analyse the data if you don't collect the data? And yes, if you track your work in too deep a level of detail, you will decrease the amount of actual work done.
This problem is fundamental to the size of an organization. In a small organization, the policy makers are close enough to the people doing the work that they can see that the workers are being productive. The boss doesn't have to come up with metrics to measure productivity, he knows whether a worker is producing.
In a large organization with multiple levels of management, the supervisor of the workers can see their productivity, but he needs a way to show this productivity to his manager, so that his manager can show the productivity to his manager. The problem is that depending upon the "product" there may be no good metrics. But you need some way to know if your organization is producing what it ought to be producing, and you need data about production to be able to improve production.
In a large organization, middle managers, and even higher managers, start using metrics to justify budgets and the existence of departments. This is not a problem of metrics, this is a problem of people using the wrong metrics (the ones that look good) for the wrong reasons (justification as opposed to analysis).
If collecting data is seriously eroding actual productivity, then management should choose a different collection method or find alternate data that can be efficiently collected to allow the necessary analysis.
In any case, using the tool that management gives you to collect the data that management wants collected is not fudging. As a responsible employee, whether worker or lower management, you ought to point out to upper management that the tool is not the best tool, or that the tool is not collecting the right data, or even that the tool is collecting too much data. But if management wants you to track calls, then tracking calls is not fudging.
When I was involved in a Help Desk, we had two different tools. One for tracking problems; and one for capturing simple data about quickly resolved calls ( 1|2|3|... minute call about e-mail|central file service|central print service|local print service|... ).
Measure in happiness of all parties. If either side, the sending or receiving side are unhappy, they either leave or get fired. It's up to management to understand and establish the balance of limits and expectations to push.
Excellent points.
....
There's the IT pro that will be able to identify the actual root complaint of an unknowledgeable customer, fix the issue, and move on to the next case. And then there's the guy in [insert off-shore labor country] who costs 1/6th the wage who will run the customer around in circles for half an hour, make them irate, tell them the issue is due to an unsupported configuration, and then disconnect them trying to transfer them to another department. While the latter case has 'closed' a case in slightly more time than the former at 1/6th the wage cost, no issue has actually been resolved and the end user's productivity is a fraction of what it could have been with more appropriate funding in the IT staff.
There's also the interpretation of the metrics. Our IT group is measured in a number of ways. One of them is the uptime of the systems we support. We manage only the servers that host the applications of the end users (code developers). The applications are beyond our scope because they are constantly in flux by the devs, and everyone knows it would be an unrealistic hope for us to manage those applications. But if the devs push out corrupt or unstable code and their application is offline for half a day, it is reported as downtime at the end of the month. The reality is that we've met our obligations (and in fact, exceeded them). The machines were always online and stable. The code the devs put there sucked, but the infrastructure was perfectly stable. But the interpretation by management is that the application was offline so we failed.
From there you could debate the cost differential between purchasing new hardware or purchasing extended warranties and service agreements on outdated equipment, and how that impacts the level of service possible for an IT department. We spend thousands of dollars per year per unit to continue warranties on 5-year-old+ hardware, where we could instead spend that much and get brand new hardware that would be under warranty for free for several years to come. Accounting says we can't afford new hardware (that would be more stable, more reliable, more powerful, more manageable, more cost-effective), but we can spend even more on just the warranties purchased yearly for the old crap.
In the end, far too many IT departments are managed by people who have no clue about technical issues and who work from all-inclusive statements about the best-case scenarios in IT. The metrics they require you to provide (which take an appreciable percentage of your weekly man-hours to produce) will be misinterpreted (rarely in your favor) and be largely irrelevant to the actual function of your IT department. You can try to explain how the metrics actually give credence to your beliefs on how the funding for the department should be reallocated for the sake of efficiency, but
Unfortunately in the mind of the management and accounting teams, the alternative would be to allow the black magic voodoo in the basement to continue without absolute (and faulty) quantification. That can't be allowed. Those IT freaks would start sacrificing chickens and making bonfires out of bundles of cash.
"But we have to pass the bill so that you can find out what is in it,..." - Nancy Pelosi
Not so certain the metric is stupid; it measures something worthwhile among many other things that should be measured. Solving things quickly is a good thing, but reducing the number of problems is also a good thing. Uptime, reducing particular classes of calls, and probably many others are good things to measure. But I am not sure they are even performance measurements; they are there to give the IT team information so they can reduce the occurrence of problems, not so management can hammer them or base compensation on the metrics.
The metric is valid when looking at the model where you have INCIDENT MANAGEMENT versus PROBLEM MANAGEMENT.
That first line of call-in is about making sure the human caller gets to a human as quickly as possible. Within 15 minutes flipping that call should be done OR escalated to PROBLEM MANAGEMENT. The reasoning is that while you are talking with someone there is another caller trying to get a hold of someone.
Turn Around time is relevant to INCIDENT MANAGEMENT versus PROBLEM MANAGEMENT. The problem is when there is not a clear difference between incident and problem management groups.
Three metrics that are needed:
Caller Hold Time
Call Turn Over Time
Ticket Resolve Time
Hold time is the customer's experience in getting their problem addressed. Not necessarily resolved, but addressed.
Call Turn Over Time is key in hinting at the type of problems. If 90% of your calls are resolved in under 5 minutes, you more than likely have training issues. If 50% are resolved in the first 5 minutes and 25% are escalated to PROBLEM MANAGEMENT then you may have a process failure or technical issue.
Ticket resolve time shows the overall volume of trouble you have relative to the severity of the problems. Logging 1200 hours a week of SEV1 tickets tells of serious problems versus 1200 hours a week of SEV3 or 4 problems.
Mostly management uses those metrics for determining what areas need to be addressed. They are not performance metrics on their own, in fact useless for measuring performance. You would need at least the % of tickets escalated to even start determining performance.
This of course is under the assumption of a split between INCIDENT and PROBLEM management.
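For what it's worth, the turn-over and escalation percentages above are cheap to compute from a call log. Here is a minimal Python sketch; the record layout (`minutes`, `escalated`) and the function name are invented for the example, and the 5-minute cutoff is simply the figure used above.

```python
def incident_summary(calls, quick_minutes=5):
    """Summarize first-line calls.

    calls: list of dicts with 'minutes' (call turn-over time) and
    'escalated' (True if handed to problem management). Both field
    names are hypothetical.
    """
    total = len(calls)
    if total == 0:
        return {"quick_share": 0.0, "escalated_share": 0.0}
    # Calls resolved at the first line in under the cutoff.
    quick = sum(
        1 for c in calls if not c["escalated"] and c["minutes"] < quick_minutes
    )
    escalated = sum(1 for c in calls if c["escalated"])
    return {
        "quick_share": quick / total,
        "escalated_share": escalated / total,
    }
```

A quick share near 0.9 would point at the training issue described above, while something like 0.5 quick and 0.25 escalated would point at a process or technical problem. The thresholds themselves are judgment calls, not something the code decides.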
-=[ Who Is John Galt? ]=-
Five good metrics:
Money saved/earned through the use of IT.
Man-hours and/or dollars lost to IT issues
Man-hours and/or dollars lost per ticket
Man-hours and/or dollars lost caused by users not following policy/directions
Man-hours and/or dollars lost caused by (mis)management
There is no "-1 offended" or "-1 you don't agree with me" mod options for a reason.
Let's all set aside the PEBCAKB attitude that we all have (yes, myself included; when I hear the same group of people complaining about the same stuff, I want to throw the largest Compaq rack server I can find right at their heads).
All that being said, I think we must look at IT and MIS performance from two sides: one is uptime and the other is user interaction.
1. Uptime is easy: what percentage of the time is the system up or down? How often does it go down?
2. Every time there is an interaction between the IT dept. and the end users, it means that something went wrong. Now, instead of looking at tickets or phone calls, let's look at the % of problems solved in some reasonable amount of time, say the same day or within 24 hrs.
This is how I measure my personal success.
On a completely different side of this issue is the problem of underfunded, underpaid IT workers. And make no mistake, we are almost all underpaid. There is no business that can't benefit from listening to its IT workers. We usually find problems before they even exist, and then when we form a solution we are told "no, it is too expensive, wait till it crashes." Then we are told in a panic by someone who can barely turn on a computer, "I NEED IT WORKING NOW, what do you mean it will be down for 24 - 48 hours waiting for parts? Can't you just go to Best Buy or Walmart and get one...." OMFG, if this happens again I will scream...
Oh, and to all the people who run IT staffs: remember, if you do it right the first time you will not have the expense of doing it a second time....
We have had this problem as well at our company. We recently started using zendesk as a way to track our trouble tickets. This allows for easy web based tracking of what we work on and you can set up quality milestones and see how you measure against it.
We look for accuracy of solution, whether it was responded to in a timely manner (fixing things can take weeks sometimes), whether communication was kept updated, etc. The biggest thing we've noticed is that since we update our tickets and the system notifies the user, they feel more in the loop on what's being done and therefore are generally happy. I think most IT depts have problems keeping users informed throughout a process because we are too busy putting out fires and fixing things.
In the past, I worked as a contractor in the internal (for employees only) helpdesk for a very large and famous company. For much of the time I was there, the count of tickets resolved per week was the primary measure of performance and stated by management as the deciding factor in who stayed and who went in frequent RIFs.
I don't believe the emphasis on speed contributed to better service. It was fairly obvious that any ticket that didn't look like an "easy resolve" was treated like a hot potato by the vast majority of techs, which is wrong, but understandable, since we lived and died by the numbers (I was among those who "died", eventually).
When I worked at RSA Security several years ago, the primary metric for support calls was the customer satisfaction survey. They deliberately avoided paying much attention to time-to-close because they were very aware that measuring that leads to support techs playing games with the system and rushing customers to close a ticket rather than sending them away happy and problem resolved.
Anyone who loves or hates any language, platform, or manufacturer, doesn't know what they're talking about.
The more CYA, finger pointing (FP) and douche-baggery (DBAG) done between IT managers the better we must be doing.
All metrics are just a bastard-child of inter-management dynamics and are in themselves political tools used in executing CYA, FP and DBAG.
Most metrics are completely divorced from the real accomplishments that IT support people achieve every day. Nobody cares that the average time to resolution goal was exceeded by 38%. What needs to be recognized is when IT saves the business money and increases user productivity due to the individual actions of IT pros in the trenches.
But that simply is not possible when the accomplishments roll up to a management that is about reports, charts and projections and arcane metric tracking.
If a business service owner signs off then what is the problem? They are the ones getting fired when it all goes to shit.
Just make sure your change management board includes them, and finance as well. If you have a change management system you can even point to the change number and the requestor and say this guy caused N million dollars worth of bad press/whatever to the share price,
It isn't IT's job to say no, it's IT's job to explain the risks.
The problem is not just one of meeting metrics; even when there is a customer focus, there is still a trade-off between how thoroughly you fix a problem vs. how long you take (and how long your customer is without their computer/internet).
For example, in college I worked at my campus' ResNet department, and generally students would come down with problems only when issues got so bad that they couldn't start their computer. A lot of the time, this could be fixed in a few hours by running chkdsk /r from the recovery console. However, upon fixing the bluescreen, it became apparent that the computer was also loaded down with gobs of malware. The trouble ticket was "solved" within a few hours, but it was clear that there were more issues - do we return the laptop and check off an issue, or actually try to improve the student's computer?
Our general MO was to get the computer entirely virus/malware clean and updated, but from time to time when we encountered a new piece of malware, it would take us over a week to fix things, and there was also a fair bit of redundant effort ('well, we think we cleaned this, but let's run another Spybot scan, just to be sure') or people who just didn't know what they were doing. This MO was leading to far too great a focus on depth of service, and emphasising metrics actually helped us better our turnaround time, which in turn improved our reputation throughout campus, and only very rarely would we hold onto a computer for a week.
The trick is setting up a good balance between depth of service and turnaround time.
There is no metric to measure competent, responsible people. Managers should stop trying to do this.
Use your call log history.
If it cost X to do the same thing before then it should cost around X again.
We had a nice system at one place that records about 5 key fields of information. Based upon those 5 fields you could see a trend of how long it took to find a solution. It was all costing.
Funny thing was one field was the individual. A certain VP always had a problem that could be solved by going to his office and turning off the Caps Lock... even though he would always claim he never touched it and it must be a virus or something. Almost like the time his laptop would not boot anymore and the issue was related to the 100GB+ of porn on it. Damn viruses! he he he.
We found about 5% of the users consumed 95% of the resources.
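A number like that falls out of the call log fairly easily. A small Python sketch, assuming you can total support hours per user; the function name and the data layout are made up for illustration and have nothing to do with the system described above.

```python
def heavy_user_share(hours_by_user, top_fraction=0.05):
    """Share of total support hours consumed by the heaviest users.

    hours_by_user: dict mapping user -> total hours logged against them
    (a hypothetical layout). top_fraction is the slice of users to look
    at, e.g. 0.05 for the top 5%.
    """
    hours = sorted(hours_by_user.values(), reverse=True)
    total = sum(hours)
    if not hours or total == 0:
        return 0.0
    # Look at at least one user, even in a tiny organization.
    k = max(1, round(len(hours) * top_fraction))
    return sum(hours[:k]) / total
```

If the result is anywhere near what we saw, the cheapest fix is usually aimed at those few users rather than at the help desk.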
and take a page from sales: do customer satisfaction surveys... short, 5-10 questions tops, conducted via telephone by your helpdesk staff. People rarely have the guts to put someone (or their peers) down in person. They'd much rather do it anonymously or through email (preferably indirectly through a manager or two).
Have the helpdesk smile while they're on the phone (yes, you really can tell if someone is really smiling over the phone), make sure the questions are light and to your benefit, with no techno-babble, and a simple 4-choice value for each one. Order pizza afterwards and have a pow-wow with your survey callers and get a feel for how it went. Thank them, promise nothing, and submit your (guaranteed) high scores in a pretty PDF to your overlords.
???
Bonus.
body massage!
Customer Satisfaction
Mean / Average Time to Response
Mean / Average Time to Close
Mean / Average Number of Updates to Ticket
Servers per Headcount
Ticket Responses per Headcount
Ticket Closes per Headcount
Process Failures
Cost per Server
All of these are meant to make you look for a problem and solve it, whether it's a problem with a policy, process, procedure, or sometimes (but lastly considered) a staff member. Keep a continuous process improvement attitude and make sure you include the front line people on your CPI team.
When I worked for an unnamed company (they're big and you see them in every movie) they based our metrics off...
;)
-dispatch rate on calls (a decent metric to determine how much troubleshooting you are doing vs. just sending a part to maybe fix a problem)
-repeat dispatch rate (really determines if you are a good troubleshooter - if you don't get it right the first time, what good are ya?)
-cases closed per week (self explanatory)
-repeat call rate (when a person doesn't call you back directly even though they have your extension)
-customer satisfaction score (The survey you might get asked... there is only one question that counts--- if you give anything less than a 7 to the question "How satisfied are you with *company x*" you are hurting the tech)
I could go on, but these are the ones that are somewhat in a tech's control. I get the feeling the submitter works at the same company.
You cannot improve your performance if you cannot measure it. I think that the metric of "time to resolution" is a bad first try, but the direction of thought isn't bad - if you aren't trying to game the system, you generally want to resolve issues quickly.
I think that what you really want is the length of time all tickets were opened. That way closing a ticket after resolving one minor issue only to open another one for the next minor issue does not give you an advantage.
You certainly want to divide that by the size of the group you are supporting. And you probably want to penalize issues that are affecting multiple people.
So something like: sum over all tickets (ticket_open_time * #of_people_affected^1.2) / size_of_group [ the 1.2 is a random # I pulled out of my a** ]
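Spelled out as a Python sketch, so the arithmetic is unambiguous. The field names `open_hours` and `people_affected` are invented for the example, and the 1.2 exponent is the same arbitrary number as above.

```python
def weighted_open_time(tickets, group_size, exponent=1.2):
    """sum(ticket_open_time * people_affected ** exponent) / group_size.

    tickets: list of dicts with 'open_hours' and 'people_affected'
    (hypothetical field names). Lower is better.
    """
    if group_size <= 0:
        raise ValueError("group_size must be positive")
    total = sum(
        t["open_hours"] * t["people_affected"] ** exponent for t in tickets
    )
    return total / group_size
```

Because the sum runs over every ticket's open time, splitting one long ticket into several short ones covering the same hours leaves the score unchanged, and an exponent above 1 makes an outage hitting ten people count for more than ten separate one-person issues.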
m
Measure availability of systems: the time from when a fault is opened until the fault is closed, so that if a fault is not really resolved it gets re-raised. It gets more complex; we use this for traffic systems (traffic lights in London, UK (paper, PDF) and CCTV across London).
It is madness to measure for the sake of measuring.
I work for a company that evaluates programmer performance based on number of lines committed and total number of commits. I was actually reprimanded for not committing enough because, unlike my fellow programmers I didn't commit after every save. I have since learned better. Enter, enter, enter. Commit. Space, space, space. Commit...
Life in IT....
When things are going well.......
Business: "What the hell are we paying all these IT people for? I don't see them doing a damn thing. get rid of them.."
When things aren't going so well......
Business: "What are we paying all these IT people for? Why didn't they prevent this????"
"How do you quantify the guy who spent the weekend fixing the server?" You look at the number of times it's happened and you figure out how much it would cost to get that level of service agreement from an outside vendor.
The accountants are much more likely to be asking questions like "how would the business be affected if we outsourced IT at a cost of X, thereby allowing us to save Y in salaries, at a cost of Z in reduced productivity due to longer resolution times".
There are cases where it really doesn't make sense for a shop to handle their own IT. On the other hand, there are definitely cases where it does.
As you point out, the accountant is asking how much they could 'save' by switching vendors and having someone else manage the server. But the accountants never had the advisement of the guy who was in all weekend passed on to them from the guy's manager. The advisement was (and has been for months) that the server is a pile of outdated shit and needs to be replaced. The manager refuses to request a purchase of a new server from accounting because he knows accounting will say no, even though accounting will automatically renew the extended warranty and service contract on the old POS server without question every year. Those warranties and service contracts cost as much as a new server (every year) but that comes from a different column on the spreadsheet.
The accountant also doesn't know that the guy that was in all weekend (again) who makes $60k knows every cable in the entire building. He knows which switch has to be tickled every Thursday at 2pm so that it doesn't reboot itself (the same switch he's requested be replaced for 6 months). He knows that a well placed 6 pack of micro brews will get the maintenance crew to -actually- do the preventative maintenance on the emergency generator that is supposed to kick in when the power fails and prevents the servers from going offline. He knows another 20 such items the accountant doesn't have a convenient column on his spreadsheet for, so they have no 'value'. But certainly some kid with no experience and a 2 year IT degree will do just fine at $25k a year...
The point being that the accountants don't have a clue what is actually happening in the IT department. There's a good chance that even the manager of the IT department doesn't understand what's happening even if he knows the specifics. Their perceived dollar value of any particular facet of the department's function might have some merit. But collectively they don't often have a clue.
"But we have to pass the bill so that you can find out what is in it,..." - Nancy Pelosi
I can get up, go to their desk, and solve the problem permanently.
I have to admit, I am more than a little curious about whether this 'permanent solution' results in bloodshed.
Since when did smart computer literate people measure this? Last I checked it was from the computer illiterate higher ups that say 'do this' and expect it to be done yesterday. But I forgot to mention that they never told you what they wanted in the first place. Where I work, we have no budget and we are almost totally reactionary to problems because of all the nonsense work that we have to do. We even have "self turning wrenches", "mind reading toilet paper", and "self tightening lug nuts". The whole environment is 'I hate IT', 'IT doesn't know anything', 'It's IT's fault' yet everyone comes to us for answers and we deliver. This whole metrics idea was invented by people that didn't know anything and also didn't want to pay for what they have. Sorry about the rant.. It's just that no one but IT people realize the value of our skill and we are the only ones able to measure it. But we never have the authority or buying power to make the company we work for 'great' without proving why we need to spend a dime.
You should determine performance by how often you're yelled at. If you're not getting yelled at for something, ur doin' it wrong.
Here's the process they go through when they consider something like that, though:
"Let's see... that's likely 24/7/365 support, with a couple hours turnaround. We've got 20 servers at $3K/year for a service contract that does that. But I think we can get away with off-weekend and next-day support, so we can afford the $1800/server rate. Yeah, we can get rid of that guy!" ... 6 months later, the position is opened up again.
(In short, I'd like to see someone find a comparable service level to what a resident professional can provide, for even twice the cost. 90% of the time, you -still- have to troubleshoot the problem when you've got those contracts, because their 'technician' is someone who worked for 5 months during the dotcom boom as some IT Director's gopher.)
Same thing happens when a company dumps a professional for a "technician". Dump the professional and before you know it shit starts breaking due to entropy, and you've got to get someone competent back in there or the ship is sunk.
~/ssh slashdot.org ssh: connect to host slashdot.org port 22: too many beers
"Performance" can be viewed as creating value. But at what cost? In the IT world, cost and value are often measured in incommensurate units. Once you get a handle on cost you can start to tackle value. I recommend starting at http://www.usenix.org/event/lisa05/tech/couch.html
I worked for a large datacenter/hosting company for a while a few years back. One of the most tedious, lengthy troubleshooting processes is e-mail failures, and I sort of specialized in those, therefore I didn't close as many tickets as the other TS guys. On the other hand, once the problem was fixed the customer had no need to call back. Eventually I ended up leaving the company, partly over pay and partly over dissatisfaction with the job. Unfortunately, there was no scoring system that adequately measured my contribution to customer satisfaction, so the company wasn't totally pleased with my performance either.
Ultimately, the goal of Tech support is to collect data that can be used to correct problems upstream and prevent the customer from ever having to call tech support. That is a very lofty goal, and probably unreachable in reality, but it is useful as an ideal.
Customer problems caused by features or policies in the company's offering should definitely be corrected by the company. Work-arounds should be made available as soon as the problem is detected and handled, and that information should be shared with everyone.
These types of problems should be classified as to their importance, difficulty, and lapsed time. A numerical scale can be used to score these problems. If a customer calls back with the same problem, the ticket should be re-opened. This creates an incentive to close a problem completely rather than closing incompletely-solved tickets to rack up a higher closing rate. Since more than one tech may be working on a ticket over multiple shifts, time spent on the ticket ought to be credited, and the score distributed accordingly. Common problems ought to have a troubleshooting tree or decision table for testing and resolution. These tools could be made web-available so the customer can work their own problem or work cohesively with a tech. (Once a problem has been solved, it should not need to be solved again; only administered.)
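As a sketch of the "score distributed accordingly" part, here is one way the split could work in Python: each tech gets a share of the ticket's score proportional to the hours credited to them. The function name and the proportional rule are my assumptions; how the ticket score itself is derived from importance, difficulty, and lapsed time is left open.

```python
def distribute_score(ticket_score, hours_by_tech):
    """Split a ticket's score across techs in proportion to time credited.

    hours_by_tech: dict mapping tech -> hours spent on this ticket
    (a hypothetical layout).
    """
    total_hours = sum(hours_by_tech.values())
    if total_hours <= 0:
        # Nobody logged time, so nobody earns points on this ticket.
        return {tech: 0.0 for tech in hours_by_tech}
    return {
        tech: ticket_score * hours / total_hours
        for tech, hours in hours_by_tech.items()
    }
```

For example, a 10-point ticket worked for 3 hours on one shift and 1 hour on the next would credit 7.5 and 2.5 points. If the ticket is re-opened, the points could be withheld until it is closed for good, which keeps the incentive described above.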
Customer tutoring will always be important. This type of tech support should not be scored at all, since customer understanding will vary the closing time of the ticket.
I propose that this allows a program of incentives to get support techs to be working in the areas they are most effective. A good tutor with good understanding of the product and good language skills should be evaluated on the time spent tutoring, and the troubleshooters should be scored on the points they earn solving a variety of problems. Obviously, some techs are going to figure out how to "Work" the system so they get more points, so there ought to be a peer score applied to determine any bonuses.
The ultimate goal should be customer satisfaction with the process. (Dell? Quickbooks? Are you LISTENING?)
The first measure of output ought to be the customer's satisfaction. However, measuring progress requires a SYSTEM. I strongly suggest a system like Kepner-Tregoe. It works well for individuals and teams, progress is easily determined, and even management can analyze the results.
I recommend "The New Rational Manager" by Kepner and Tregoe ( http://www.kepner-tregoe.com/webstore/webstore-Pub-Software-PUB.cfm#RatMan ), and "The Thinker's Toolkit" by Morgan Jones ( http://www.amazon.com/Thinkers-Toolkit-Powerful-Techniques-Problem/dp/0812928083/ref=sr_1_3?ie=UTF8&s=books&qid=1245180924&sr=1-3 ).
"The mind works quicker than you think!"
If teams are being measured on how quickly they close tickets as the only metric, that is what they will target in their efforts. So you will end up with a bunch of dissatisfied customers who had their tickets closed without getting their problem solved. This metric is the biggest reason call center service is so lousy for so many companies.
If your company wants to measure turnaround time, a less-direct approach is better. Something like number of tickets closed in a month with separate categories for tickets based on whether they had to be escalated or not. In addition to this, you need to measure the number of repeat calls from customers. There's always a few cranks that call over every little thing, but if you have a large number of customers calling back within a few days of their ticket closing, problems aren't getting solved the first time. This is better than just "reducing calls". A large number of calls is more likely to indicate a problem in design or production of the product rather than a problem in the call center. But repeat calls, or
A customer satisfaction survey is also useful. In general you'll only get a self-selected response from the customers who are extremely happy or extremely dissatisfied, but even the ratio between the number of responses in those two groups tells you a lot. And if you get comments back from the customers, that's even better.
A large percentage (too large) of people spend time trying to game any system. This can range from legal activities like taking SAT prep classes or lobbying the government for favorable laws to illegal acts like adulterating toothpaste with ethylene glycol to reduce costs or shoplifting and returning merchandise for credit.
So think of ways someone could circumvent your metrics to boost their numbers without providing the desired customer service. For example, the repeat calls metric could be gamed if a call center operator doesn't notify (or blocks auto-notification) the customer that their first ticket has been closed. That could delay the customer calling back for status until the metric time had passed. You'd have to check the logs on the customer record to see if their email address was erased or if there was some other activity that shows a scam. Before you reward your "outstanding" employees, you need to do some cross-checking of the metrics to make sure they're real.
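As a rough sketch of the repeat-call measure discussed in this comment: count the closed tickets that were followed by another ticket from the same customer within a few days. The tuple layout, the function name, and the 3-day default window are assumptions for illustration only.

```python
from datetime import datetime, timedelta


def repeat_call_rate(tickets, window_days=3):
    """Fraction of tickets followed by a new ticket from the same customer
    within `window_days` of closing.

    tickets: list of (customer, opened, closed) tuples with datetime values.
    """
    if not tickets:
        return 0.0
    window = timedelta(days=window_days)
    repeats = 0
    for customer, _, closed in tickets:
        # Look for any later ticket from this customer opened inside the window.
        if any(other == customer and closed < opened <= closed + window
               for other, opened, _ in tickets):
            repeats += 1
    return repeats / len(tickets)
```

Note that this number is only as good as the ticket log behind it. The notification trick described above would push the call-back outside the window, so a cross-check against the customer record is still needed.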
We are the 198 proof..
Gah...bad edit in P2. Just ignore the fragment at the end, pls.
We are the 198 proof..
Years ago I learned that most managers are so remarkably ignorant of what good IT workers do (you know, preventative work that ensures users can do their jobs without interruption) that the only way to get ahead is to be a bad IT worker.
Meaning if you let all sorts of bad stuff happen and then rush in and be the savior of the day you will be rewarded with promotions and bonuses.
A few years ago, just before I left my last job, I demanded a job review, having gone 6 years without one. I got to sit through my review by the VP in charge of the division I supported and my direct IT boss from Denton, TX, only to be criticized for not socializing with some techs who came up from Denton to help with a move of the office 80 miles to a new city. I had elected to do my appointed tasks for the move: baby the servers and double check backups prior to taking them down, packing them up, and reinstalling them in the new site's server room.
Had I done the socializing I would have ignored my duty to the corporation but not been f*cked over during the review. If the servers had not come up I would likely have been heralded as a saint had I been able to resurrect them too.
It makes no sense but my advice is don't bother with looking at meaningful metrics unless it is to satisfy your own needs. Focus on the only metric management sees... Crisis frequency and crisis resolution... be the superhero!
You make many excellent points, yet I have to strongly disagree with this statement:
"everyone in the company is treated like a customer"
Unless you provide IT services to someone outside of your company, you're not working with customers, you're working with colleagues (simple test - how much is the person you're talking to paying for service?) This is a very different dynamic as a customer has less responsibility than a coworker - other people in your company have to follow policies and adhere to guidelines that customers don't. There's no real difference between someone asking for a new computer when their current system is perfectly fine and someone asking HR for an equal amount of money, yet the second request gets laughed out of the building. The real customer is the person who can make the decision whether to outsource IT or not.
Fixing this perception would get us a long way towards a better relationship between IT and non-IT parts of the business.
It's obviously better
Here, our performance is measured with this hierarchy:
First, if the number of calls exceeds total man-service hours times 10, call the situation "swamped" and ignore the rest. I.e. average call is expected to take 5 minutes with a little time for overhead.
Second is percentage of calls that are answered live (i.e. the users didn't have to leave a message and we had to get back to them) followed by the percentages of calls that are returned within: 30 minutes, 1 hour, 2 hours, 4 hours, and took longer than 4 hours.
Third is number of tickets still open at the end of the day.
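The hierarchy above could be written down roughly as follows. The function and field names are made up, and two details are assumptions because the comment does not spell them out: percentages are taken over all calls, and the call-back bands are exclusive rather than cumulative.

```python
from bisect import bisect_left

BAND_LIMITS = [30, 60, 120, 240]  # call-back limits in minutes
BAND_LABELS = ["within 30 min", "within 1 h", "within 2 h", "within 4 h", "over 4 h"]


def daily_report(staffed_hours, live_calls, callback_delays, open_tickets):
    """live_calls: calls answered live; callback_delays: minutes until each
    message was returned; open_tickets: tickets still open at end of day."""
    total = live_calls + len(callback_delays)
    # First: more than 10 calls per staffed hour means "swamped", ignore the rest.
    if total > staffed_hours * 10:
        return {"status": "swamped"}

    counts = [0] * len(BAND_LABELS)
    for delay in callback_delays:
        counts[bisect_left(BAND_LIMITS, delay)] += 1

    def pct(n):
        return 100.0 * n / total if total else 0.0

    return {
        "status": "normal",
        # Second: share answered live, then share returned in each band.
        "answered_live_pct": pct(live_calls),
        "callback_pct": {label: pct(n) for label, n in zip(BAND_LABELS, counts)},
        # Third: tickets still open at the end of the day.
        "open_tickets": open_tickets,
    }
```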
It's not just computers; clever people who know the system can figure out how to game it. The problem is humans; but more importantly, it's the naive reliance on "systems" to manage humans that is the real problem.
Proper management should know how to measure performance using their human intelligence to subjectively manage things; yes, this requires managers to think as well as know their job-- but replacing that with a written system is likely more stupid than the manager. Middle management seems bent on not doing their job or really understanding what its purpose is; HR doesn't help any of this, and neither do the lawyers (a "good" lawyer is often considered one who can 'hack' the legal system.)
We had a web page where the 10,000-ish users initiated their own ticket. Or, they send an email to help@blah.com. Then we'd create a ticket for them. No ticket, no work. Next!
I like the basketball scoreboard that the Cisco Systems support team uses. They should add a buzzer to it for when it's time to take a break. Or overtime.
When people get rewarded for all the red jelly beans they have and punished for having blue ones, they get out the cans of red paint.
$ spent/user all inclusive, and user satisfaction surveys.
You should be low on the $/user and a little above middling on satisfaction. If you're too high on satisfaction, you're likely overstaffed and could get the first number down.
That's how the business is going to look at it unless you're generating revenue. Then it's all about ROI.
the major advances in civilization are processes which all but wreck the societies in which they occur - A.N. White
Ultimately, no amount of metrics is going to save you from idiotic bosses who don't understand what the metrics actually measure, and who try to game the system. Couple of examples:
* Amount of time a customer spends on hold: you take the hit for an understaffed department.
* Length of Calls: you will be forced through a script that offloads the actual work to another department.
* Duration of time tickets are open: you'll get hit every time a customer leaves a ticket open, which is basically always.
* Duration of time that you work on a ticket: you'll be forced to again offload to a different department, or provide some hack that will break in about 2 days.
* Customer satisfaction as measured by surveys: damn near nobody replies to them. Not to mention that there's no standard for what is good and excellent. You'll get hit by the prick customer who thinks that debugging his app on the fly is par for the course.
* Customer satisfaction as measured by renewal of licenses: you'll be at the mercy of the account managers, and at the mercy of the overall economy.
And on and on. For every single metric that you come up with, I'll show you a real-life example of how it was abused by an incompetent/malicious front-line drone, manager or executive.
Here's the only thing that'll work: settle on a metric. Get everyone - drone, manager, executive to agree on what the shortcomings are and how the metric can be gamed. Then, when it comes to review, make sure that the spreadsheet is accompanied by a discussion on what the data means, how it came about and what the root causes behind it are.
Yes, it's - almost - a pipe dream. But as much as I've seen perfectly valid metrics being ruined, I've seen sucky metrics be used to their full capability in turning a department around.
The take away is that collecting metrics is the easy part. The hard part is what to do with them.
Those who can, do. Those who can't, sue.
I am blessed to work in the IT dept of a company that "gets" it. There are metrics thrown about, but at the end of the day the only metric that carries any weight is: Did everything that needed to get done, get done?
"He was a man who knew the cost of everything and the value of nothing."
The best metric in most organizations, where IT is in an operations support role, is quality of service. The easiest way to measure quality is via polling.
A previous position I worked had a policy, which was well supported by upper management, that REQUIRED the completion of a satisfaction survey at the end of every trouble call, as well as monthly satisfaction surveys.
- A typical help desk call was followed by a 30 second phone survey
- A desk side visit was followed by a ~1min web based survey
- A project was followed with an ~5 minute web survey
- A large project that involved a project manager was followed by a 15 minute interview with those people who helped define the scope of the project.
All surveys were issued and managed by a team outside of the IT structure (though they were IT management types).
The great thing about this system was that it was easy to determine who deserved accolades and who didn't do their jobs... at the help desk level, they fired about 70% of new hires within a month or so because the customers simply didn't like their personalities or they didn't communicate well, but if you stayed on you were paid well and treated exceptionally. At the desktop support level, similar story, and finally the monthly surveys allowed overall satisfaction to be gauged.
We IT folk were also surveyed about each other, and our customers... which I felt was the most important thing. A customer who we regularly rated as polite, patient, and somehow exceptional would often be rewarded by our management (appreciation lunches, software for their personal use, USB drives for personal use, etc.) ... so the customers were rarely unfair. My boss even gave a weekend at a local trendy hotel to one customer for simply giving valuable feedback in the form of a monthly survey comment; he wrote a page detailing how our support saved the company tens of thousands of dollars because IT showed him how to link data from an SQL database into Excel where he was able to better analyze it... this was done by a desktop guy who happened to work late at someone's desk and was walking through the building turning off lights. Needless to say, the desktop guy was handsomely rewarded for simply giving a damn.
I realize that it sounds idealistic, and in many ways it was. Sure, time was still an issue, as users would deduct for a slow response, but it was only part of the story. Most important was that everyone was motivated to be respectful because we all knew that any negative trends would be caught and questioned.
As I understand it, it was hardest to determine if we were too large as a department... sure the customer will be happy if they have their own dedicated IT person... but it's just not cost effective. I was not privy to how this was measured, but I imagine they had their ways.
Sometimes the best solution is to stop wasting time looking for an easy solution.
Some realistic metrics...
Repeat Issues..
Ineffective IT will see an issue that recurs and never fix the source of the issue, instead supporting and extending the issue rather than fixing it.. and if you think there are issues that cannot be resolved.. then you're not using the right tool or educating people correctly..
Resolutions that worked.
If someone comes to you with an issue and you get them to "try" 22 "things" and the problem still persists.. You have either not asked for the correct information.. not received the correct information.. or you do not understand the issue and are trying the pin-the-tail-on-the-donkey approach to support, which makes more issues than it solves.
PEBKAC Factor
User training issues need to be identified. Admin wants to see IT be more efficient.. educate users so IT staff doesn't have to explain to someone how to use the tools they need to know how to use to do their job properly.. Mechanics that would call to ask which way tightens and which way loosens should never pick up a wrench to work on a car.. why do we let people that don't understand there is a right click and a left click work on computers..
If you see a theme.. it's because there is.. reduced call volumes.. That is effective IT and it needs to be looked at differently than most anything else because it's a dynamic environment. Increased call volumes come from change and decreased call volumes come from people working effectively..
Who needs WiFi when we can have Packet Over Sheep! http://datacomm.org/PoS-InternetDraft.txt
An IT-department, IMHO, should be working on making itself obsolete.
I disagree.
As an inspirational aside, consider Peel's Principles about what an ethical police force is.
Police, at all times, should maintain a relationship with the public that gives reality to the historic tradition that the police are the public and the public are the police; the police being only members of the public who are paid to give full-time attention to duties which are incumbent upon every citizen in the interests of community welfare and existence.
I must've skipped the history lesson where all the other kids were told that the public is the police and vice versa, so I don't have much to say about it.
What I think is relevant is the idea of paying someone to pay full-time attention to a particular task even if others could do it. It takes time filling out order forms for more backup tapes and drives for the storage system. It takes time repairing the tape robot (or being the tape monkey). It takes time installing ssh-tunneled IRC daemons and mail servers and... whatever other tasks the IT department is tasked with.
Plus, I question the degree to which you can obviate the need for specialized knowledge. Even if it's a piece of cake installing ssh-tunneled IRC daemons, knowing that you want that rather than jabber daemons (or sending internal memos through MSN servers!) is not trivial.
Any department should work on (1) doing what it's there to do; and (2) doing it more effectively. But its tasks (presumably) need to be done. By dissolving the department, you only shift the tasks onto someone else. Is that really the smart(est) thing to do?
A nice metric might be the count of tickets that are never opened.
I agree with this, wholeheartedly! So does Robert Peel:
The test of police efficiency is the absence of crime and disorder, not the visible evidence of police action in dealing with it.
My users resent the fact that their bosses don't know how to measure their competence, just like my IT co-workers. Over time (it's taken me most of a decade) I've played on my customers' shared irritation at the PHBs and convinced them to conspire with me to game the system. They call me first, not the help desk. I fix their problem. Then, they open a ticket, I instantly assign it to myself, document, and close. Voila - super-fast closure.
OK, this is an overstatement. I mostly work the tickets I'm assigned. But in nearly every case where I get a chance (those "meet in the hallway" requests for help, mostly) I'll try to run the procedures I outlined in my first paragraph.
You should have a lot more to measure than just the time it takes to close a ticket.
Your metrics should be used to measure individual performance as well as pinpoint needed training--and have the added value of highlighting the needy/silly users, too.
Our company measured:
--time to respond (as in how busy we were; do we need to hire more people?)
--type of problem (as a measure of how long it should take tech to resolve)
--time to resolve (as in does the tech know his stuff)
--number of repeat issues (are we really fixing the problem, i.e., providing a permanent solution vs. masking the symptoms; do we need more training on this issue?)
--number of repeat calls from the same user (are we blowing him off? is user too stupid to understand the solution?)
Metrics, if used correctly, are a great tool for understanding an employee's strengths and weaknesses. Also for getting a handle on how well you are communicating with your users.
Hello, tech support:
http://www.prometheus-music.com/audio/techsupt.mp3
If you're a zombie and you know it, bite your friend!
Absolutely. The problem isn't a technical issue per se. The problem here is customer service, or the lack of it.
Regardless of the technical steps required to assist you with the issue, the ticket should have been left open. Closure of a ticket should only happen once the technician has received an e-mail or voice verification that the problem has been resolved to a satisfactory level.
An internal IT department should be run like a business within a business. Sending out surveys to all employees is a good measure of how well you are seen and felt throughout the company. If there are any problems, you work them out internally. If they reveal your department in a positive light, can the CFO and the rest of the bean counters really provide a counter argument to your relevancy?
Life is not for the lazy.
We have 5 different people, all of whom do things 5 different ways. On one end of the scale you have folks who address issues as they are brought to light, on the other end of the scale you have folks who work to resolve issues before they come to light. What's worked for me is looking over my trouble tickets to see if there is a pattern. Users having issues with an application? Ok, let's look at that further. Is it due to a troublesome application or lack of user training/understanding? If it's a troublesome app I look at getting the problem resolved. If it's something that I can duplicate I go to the vendor with it and ask them to resolve it w/o my having to purchase an upgrade if possible. Training has always been an issue, our hiring process says that users need to have an understanding of Windows XP and Office 2003 along with basic internet/email skills. It's right there, plain and simple. Often this part is ignored - they'll ask the user if they can use Windows/Office/etc and they always say yes. I end up kicking that back to HR asking them to define use - hell, my kid was moving the mouse around and randomly typing on the keyboard when she was 2. For many of our apps I've written basic training documentation, that seems to help. I also try to be proactive in regards to security. I check our AV logs daily and whenever a new patch is released for a product we use I throw it on the test box to see how it plays with what we were running. If it passes I'll apply it - not too hard to do. Write a patchlink script or just deploy it ASAP. Some of my co-workers wait until our monthly security report is due, then they scramble to get caught up. I've also worked to close a lot of security holes. Email is one - let's see....no reason for users to email .bat, .exe, etc...so I block them. Mailing lists are locked down to members only, everything else has to be approved. Earlier this year a greeting card link that contained Trojan.Vundo hit the mail system.
I saw the first one come through from an outside source, which it blocked. I then wrote a filter to reject the content of it. None of my 300+ users received it. The others? Many people ended up clicking on the link and ended up with downtime, a couple of places were so infected that we dropped their network connection until they cleaned up.
The folks above me have different metrics. My boss has a motto - "due diligence". When Vundo started taking over machines we had a conference call and had to report in. When I had to share my experience I said "what trojan? I blocked that thing at 5am, it's been quiet out here". Bad move....I was scolded for not doing my due diligence. He'd rather have us step in and work all night to clean up a mess than to prevent the mess in the first place. Yeah, show me how that works. While you guys were working 18 hour days cleaning up the mess I worked an 8 hour day then went out to a nice quiet dinner.
Basically, you could group the developers by level, something like junior, mid-level, and senior. After that you can use some metrics to get an average time for each level on basic and advanced tasks, and using that as a base you can create a table of hours for each kind of service. However, never forget that maintaining a system someone else wrote takes extra time to grasp all the programming logic, even with documentation. Have you ever heard of agile methodology? If not, it is a good place to start studying how long each person takes; Scrum is a good starting point. All senior developers have a sense of how much time each task will take, and by talking with them you can get the basis for your study.
If they are responsible as well for implementing or administrating the system generating the tickets, then arrival rates need to also be measured. You don't want the team creating easy to fix outages so that they can bias the metrics.
In general you need to try to work out what behaviors a given metric will tend to, or could, produce, and you need to combine the metric with elements that measure unwanted behaviors. You want the resulting metrics to be scale free, so that the metric cannot be gamed simply by changing some parameter.
Simple example: suppose you measure a maintenance team by how many new regressions they create (per month, say) in the maintained system. The team can get zero (which in this case is good) by never fixing any existing bugs. So as a metric this is useless. Instead, of course, you should be measuring regressions per bug fixed. This is scale free because you can measure over one month or one year and the size of the values will not change just because the period changed.
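A minimal sketch of that ratio, with a made-up function name. Returning None when nothing was fixed is an assumption about how to handle the degenerate case, so that a team cannot score a perfect zero by doing no work.

```python
def regressions_per_fix(regressions, bugs_fixed):
    """Regressions introduced per bug fixed over any period.

    Returns None when no bugs were fixed: the ratio is undefined then,
    and "zero regressions" should not count as a good result.
    """
    if bugs_fixed == 0:
        return None
    return regressions / bugs_fixed


# The same team measured over a month and over a year (illustrative numbers).
monthly = regressions_per_fix(3, 12)
yearly = regressions_per_fix(36, 144)
```

Both calls give 0.25, which is the scale-free property: the raw regression count grew twelvefold with the longer period, but the ratio did not move.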
Squirrel!
My company has multiple IT departments and all are allowed to come up with their own metrics, so I have seen a few. The most brilliant metric I have seen is to force all customers to attach a dollar value to every request (how much do you estimate the company will benefit from doing this). The items with the largest value are done first and then the values of completed items are added up to show how much money IT is making/saving for the company. Of course these numbers are very loosely based on reality to begin with, and then when customers figure out that larger numbers get their thing done first you can guess what happens. They never have problems getting funded, although someone might figure it out when they start making/saving more than the gross income of the company.
Performance == money OR at least it used to be that way. Seriously, there is so much wrong in "performance" in IT today that it isn't even funny! I myself miss the days when IT was for profit, a profit center with its own budget, autonomy, etc. like other business units in any company / corporation. It really changed when "kids" came to this business, all they wanted was eight to five, a paycheck, a manager's blessing for their existence, a carrot once or twice a year, you know? Some of them carried grades from economy schools, had degrees in statistics, even had courses in speaking and were able to convince the middle management that instead of positive budgets the paper metrics were the way to promotions, etc - the top management really didn't and doesn't have time to look at details so anything that looks good must be good?
Seen these "performance metrics", "performance reports", "performance evaluations" (by managers who once a year need to know what their subordinates do - very weird?), "performance statistics" (you know, statistics don't lie!), and so on over years - have seen the results, I'd give about (at most) five years to any organization / department which starts that way, then there will be reorganization, termination, whatever - seen that about 20 times in small and large corporations over 40 years in IT.
Amazingly, not so much in IT (excluding very few) but in other organizations which are still profit centers, they still are going strong?
Maybe ticketing systems should allow the instigator/customer to connect a new ticket to an old one, creating one long ticket. If you give a customer the run around, or don't bother trying to find out what they need, then all the wasted time gets added to your metric.
A good metric is counting the wtf per minute that the customer shouts on the telephone. Less is better.
At my company, IT Performance is not measured at all! :D
Well, that's why you don't have just one KPI. If the second KPI is customer satisfaction, one of two things happens:
1) You get 100's of complaints to your boss, getting you fired. Congrats, you win!
2) If the whole IT org does this, the CEO at ops staff gets complaints from every GM, resulting in a 20% budget cut of IT, specifically tech support. When IT looks at the worst offenders by complaints, you are one of the 20%. Congrats, you win!
The good metric for any service (IT or not) someone provides is:
"Full costs / Best external alternative full costs" (less than 1 is better)
Hey you forgot quality!!! We do it far much better than others!!!
Yeah, sure... I mean "external ALTERNATIVE full costs" that is: full detailed acceptance of current service level agreements (even currently non paid ones).
When an external provider agrees to do exactly the same service, the price tends to rise even higher than actual internal costs. Don't forget they are a business too, and they have to earn some money. Their commercial margin plus the change expenses should account for the gap between numerator and denominator.
An external service survey is not something to be scared of. We should be the ones to push for periodic checks to know how good or bad we are.
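For what it's worth, the ratio from this comment is a one-liner. The function and parameter names are hypothetical, the figures are invented, and treating the switching ("change") expenses as part of the external side is one reading of the comment.

```python
def cost_ratio(internal_full_cost, external_quote, switching_cost=0.0):
    """Internal full cost divided by the full cost of the best external
    alternative delivering the same service levels. Below 1 is better.

    external_quote is assumed to already include the provider's margin;
    switching_cost covers the one-off expense of changing provider.
    """
    return internal_full_cost / (external_quote + switching_cost)


# Invented figures: internal IT costs 900k; the best outside quote for the
# same SLAs is 1,000k plus 200k to make the change.
ratio = cost_ratio(900_000, 1_000_000, 200_000)
```

Here the ratio comes out at 0.75, meaning the internal department is the cheaper option under those assumed numbers.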
Yes, definitely. Furthermore, you should be focused on receiving calls about NEW issues all the time. Receiving calls on known issues over and over means the root causes aren't being fixed by IT. Service Desk managers mistakenly use "First Call Resolution Rate" as a metric - how many calls did the Service Desk resolve on the same call that the user registered the issue with. This is a false positive - if this is "nice and high" like so many misguided managers want, all it means is that you employ a bunch of robots who are answering the same question over and over and over, and IT still sucks.
>>>My question is: How is your IT performance measured, and how do you think it should be measured?"
Malcolm Fry has written and presented extensively on the common-sense metrics for an IT service desk to track for all IT operational processes.
Some paraphrased excerpts from memory for INCIDENT MGMT:
Total number of incidents - over time, broken down by day and time-of-day. Use to predict and manage workload and staff at baseline.
Mean elapsed time to achieve incident resolution or circumvention - Also for managing workload and staff, because if it takes longer, you need more staff. Notice you don't pick an arbitrary time for a call to last - it lasts however long it has to last to get the customer working again.
First call resolution - long touted as a "great" service desk metric if it is "nice and high", this number should be low and plateau low eventually if PROBLEM MANAGEMENT is doing its job - fixing root causes. The service desk should have to solve/dispatch NEW and UNKNOWN issues more often than being a robot, since that says IT is solving old and existing problems before new ones crop up - PROACTIVENESS. Correlate this metric with the Number of Incidents over time to see if IT fixing things allows you to ramp down Service Desk staff.
There are ways to use metrics to improve the org performance, but ignorant managers frequently use metrics for personal gain and not organizational gain. They should have their bonuses withheld automagically.
I have worked at a tier 1 place for over 2 years now, and I have to say you always want to cut down on return calls etc., but by definition it is not tier 1's job to do this. It is their job to gather information, document information, and try to resolve any issues they can within a short period of time, because tier 1 is the primary point of contact for pretty much the whole network. There simply have to be time limits in place to ensure work flow. Although they don't deploy one at my current job, I am a big fan of ACD systems as a metric for performance rather than closed tickets. If there is a recurring issue or problem, it is tier 2's job to resolve it. When this line in the hierarchy is crossed, the team cannot function at maximum efficiency. So my point is: if you can't fix something in the time you are allotted, it needs to be escalated; if not (exceeding time limits consistently), you are causing a huge obstacle for not only your team but the department's work flow as well.
Reduction of tickets is a nice idea... However, users can be idiots and will always generate stupid tickets such as: My printer is broke (printer actually says replace toner). My computer is locked (screen saver kicked in and requires user to unlock). The carpet needs cleaned (yes, I actually received that ticket). The light is off in my cubicle (that one too). Will you fix my personal computer from home? (um, NO!). The testing stations (computers by reception for tests) are not working (computers were at login screen for Windows). I have not received any email in an hour, is it broke? (No, you are just not that important today). On and on and on. If we want to reduce tickets, we would have to shoot all the Lusers.
Rather than a numeric metric calculated from automated systems, just send internal customer satisfaction surveys to employees. Have some numeric question ("Overall how satisfied are you...") and lots of opportunity for people to write things out ("What problems did you think IT could have addressed better?") Even if you don't employ the sophisticated techniques available to collect meaningful, accurate data, you will certainly learn more than just looking for a time metric.