Ideal, and Actual, IT Performance Metrics?
An anonymous reader writes "Recently it was revealed that our company measures IT performance by the time it takes to close trouble tickets. I consider IT's primary goal to be as transparent to the user as possible, thus this metric was rather troubling to me. Shouldn't we be focused on reducing calls, rather than simply closing them quickly?
My question is: How is your IT performance measured, and how do you think it should be measured?"
We usually try to measure how many libraries of congress we can get to the new blade server in under 5 minutes.
our best is 12.
Non impediti ratione cogitationus.
I thought IT got paid for the number of times they said 'No' to us during the day.
go figure.
They Live, We Sleep
Customer Satisfaction, and pro-active problem solving
I will not give in to the terrorists. I will not become fearful.
...by the number of callers left alive at the end of the day.
I think poster has a point.
A nice metric might be the count of tickets that are never opened.
An IT-department, IMHO, should be working on making itself obsolete.
I consider IT's primary goal to be as transparent to the user as possible, thus this metric was rather troubling to me. Shouldn't we be focused on reducing calls, rather than simply closing them quickly?
Not for "stupid" users, the ones you see on a day-to-day basis. Now, this all depends on who you are giving support to, competent IT professionals or the day-to-day office worker. If you are giving them to fellow IT people, it should be a goal to be transparent. For the office worker the main job is productivity, that means fix the problem as soon as possible or tell them there is no problem and have a good day.
Taxation is legalized theft, no more, no less.
But it's close. Of course, closed tickets are something a manager can measure. Needless to say, it measures nothing meaningful. For example, I tell a customer to reboot. Close the ticket. That takes little time and closes the ticket fast. In fact, I can improve my metrics by telling that same person to do this every 4 hours for several years. OR, I can get up, go to their desk, and solve the problem permanently. It takes longer, making my metrics look bad, but in reality-land (a land far, far away from management land), that person is doing productive work longer and more efficiently because the interruption and downtime have been removed.
Please do not read this sig. Thank you.
s/metrics/bullcrap
A good metric should be
1 - Enterprisy looking
2 - Easy to game by the interested parties
Your boss wants a number, give it to them quickly. It's all BS (or 99% of it at least. Don't agree? Do the job then) in the end.
So good metrics could be.
- Unplanned downtime
- Number of users, number of bytes used, etc (that plots a nice ascending graph, and ASCENDING IS GOOD, you can print that and put it on the wall)
If they stay on 'time to close the ticket', NEEDINFO and WORKSFORME are your friends.
how long until
Amount of service calls resolved: h
Server/network downtime (in hours): d
Use formula '(s / h) + 2d'
Use resulting number to chart IT support performance, assuming that the network + server uptime and stability is more important than user inconvenience. You could decide that anything above a certain threshold is too much, or use it to compare personnel with each other.
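For what it's worth, the formula above could be sketched roughly like this. Note that the post never says what s stands for; the code below assumes it means service calls received, and the threshold value is made up purely for illustration.

```python
def support_score(s: float, h: float, d: float) -> float:
    """Compute (s / h) + 2d as written in the post above.

    s: ASSUMPTION, not defined in the post; treated here as service calls received.
    h: service calls resolved.
    d: server/network downtime in hours.
    """
    if h <= 0:
        raise ValueError("h (calls resolved) must be positive")
    return (s / h) + 2 * d


# Hypothetical cut-off; the post only says "a certain threshold".
THRESHOLD = 10.0


def over_threshold(score: float, threshold: float = THRESHOLD) -> bool:
    """True when the score exceeds the chosen threshold (lower is better)."""
    return score > threshold
```

Because downtime is doubled, an hour of outage outweighs a modest backlog of unresolved calls, which matches the stated assumption that uptime matters more than user inconvenience.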
Yet Another Tech Blog
(but so much more, including game and movie reviews)
http://yanteb.peasantoid.org
For example, for every fax successfully sent via the fax server without IT intervention, the IT department gets one point.
For every fax that needs IT intervention to be sent, the IT department loses one point.
For every person who becomes aware of a problem with the fax server, the IT department loses one point. No more "heroics". The goal is to be as invisible as possible to the end users.
And similar items for every other server/service that IT supports. If nothing else, it will show exactly where the problems really are.
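A minimal sketch of that point scheme, assuming you can somehow count the three things per service. The class name, field names and sample numbers are all hypothetical, not taken from any real system.

```python
from dataclasses import dataclass


@dataclass
class ServiceTally:
    unattended_successes: int    # jobs that went through with no IT involvement
    it_interventions: int        # jobs that needed IT to step in
    users_aware_of_problem: int  # people who noticed a problem with the service


def service_points(t: ServiceTally) -> int:
    """+1 per unattended success, -1 per intervention, -1 per user who noticed."""
    return t.unattended_successes - t.it_interventions - t.users_aware_of_problem


# Made-up numbers, only to show how the worst service floats to the top.
tallies = {
    "fax": ServiceTally(950, 12, 30),
    "email": ServiceTally(400, 45, 200),
}
worst_first = sorted(tallies, key=lambda name: service_points(tallies[name]))
```

Sorting ascending puts the service with the fewest points first, which is the "where the problems really are" list.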
In my department, we have an agreement with the rest of the company outlining the level of service that must be performed within a pre-determined amount of time, based on incident priority. With the right tools, it's fairly easy to track the percentage of incidents resolved within the terms of the SLA.
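Tracking that percentage might look something like the sketch below. The priority names and target times are assumptions for illustration; real ones would come from whatever the SLA actually says.

```python
from datetime import timedelta

# Hypothetical targets per priority; the real values live in the signed SLA.
SLA_TARGETS = {
    "critical": timedelta(hours=4),
    "high": timedelta(hours=8),
    "normal": timedelta(days=3),
}


def sla_compliance(incidents):
    """Percent of incidents resolved within the target for their priority.

    incidents: iterable of (priority, time_to_resolve) tuples.
    Returns None when there are no incidents to judge.
    """
    incidents = list(incidents)
    if not incidents:
        return None
    met = sum(1 for priority, took in incidents if took <= SLA_TARGETS[priority])
    return 100.0 * met / len(incidents)
```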
"Ask not what your country can do for you." --John F. Kennedy
I think that when the metric is to reduce the number of calls, the natural human tendency is to ignore calls, shift calls to other people, etc. to make it look like you're doing better when you're not.
So that's why most people look at your find versus fix ratio, the number of bugs you find versus the number you fix / the length of time it takes to fix them. It's not great to have zillions of issues, but you should always try to fix the issues as quickly as possible.
In the low-level government job I suffered through for 2 miserable years, IT performance was measured by presence in your chair. If you kept the chair at a satisfactory egg-hatching temperature, and never made your presence otherwise known, you were a star. If you did work, you were a source of trouble.
There's one metric that can capture everything:
Bits of Shannon entropy processed per hour.
My only political goal is to see to it that no political party achieves its goals.
Yeah, that's pretty much it. Managers and executives can't handle anything that doesn't have a nice, neat, single number that tells them everything they need to know without having to actually know what's going on.
You can count calls, count time spent on calls, how long it takes between when a call is received and a tech is dispatched. You can count how many devices you have deployed in the field. All of these numbers tell you different things, and not one of them tells you much of anything by itself. Management needs to actually be in touch with the field and truly understand what's going on in their IT department, otherwise all those numbers are pretty meaningless.
I don't know about you, but my servers run on the power of cotton candy and happy thoughts. -Anonymous Coward
Shouldn't we be focused on reducing calls, rather than simply closing them quickly?
We should be focussed on both.
My question is: How is your IT performance measured, and how do you think it should be measured?
ITIL principles are a great starting point.
Examples are using Key Performance Indicators (KPIs) such as at the bottom of this page and this page.
At my former employer, customers would call the national helpdesk, who were rated by their time on a call. Let me tell you, the type of customer service you get from that environment is crap. They would have the customer reboot their machine, and if that didn't work, they would escalate the call to a state level operations center that could dispatch technicians (where I worked). They were, for the most part, useless. They made the customers angry, and really served no purpose other than a filter.
Management gets the behavior that it rewards, not necessarily the behavior that it pretends to ask for
I remember it as being 10% of the people (whiners) take up 90% of the support time.
Whenever I see a metric that measures quantity instead of quality, that tells me the manager gets a bonus. Hopefully, you're getting a piece of that bonus.
Problem is most PHBs CAN'T understand that.
They think that solving it faster = better, and if the number of calls goes down, you are doing something wrong.
We were unable to fight it at Comcast as the idiot CTO demanded it. So I instructed my guys to put in a ticket for EVERYTHING, and if it was an instant fix, open the ticket and close it in 1 minute.
Before I left my department got an award as our ticket accounting was 4X higher than the rest of the division and we had the lowest time to resolution. Problem is that Remedy sucks so it actually slowed down my guys having to enter all those useless tickets.
Yes it's technically fudging it, but the executive in charge was not listening to any of us so we skewed it to our favor.
Do not look at laser with remaining good eye.
Sounds pretty normal for a call center. At my last job management got excited if a case was open for over two weeks regardless if the issue was resolved or not. That's what I call great customer service!
Here is the problem... you are trying to assign arbitrary numbers to something that cannot be measured. These are numbers for accountants, they want one number to be able to show them where to cut cost. Problem is that there is no way to quantify how much money an IT department saves a company. Metrics have gotten out of control in this country. We are always measuring the cost and never measuring the value. How do you assign a number to a person who is not a number? How do you quantify the guy who spent all weekend fixing the server? How do you quantify the accrued knowledge of a human being? It's impossible to do. The accountants never ask questions like, "How would my quality of life be affected if I couldn't get effective tech support?", "How much money would the company lose if these computers and programs didn't exist?". You need to measure the man and his work as a whole, person to person.
Ideal:
I know about IT, having worked there for many years. In fact, I'm still working there. Keep up the good work, I know there's a lot of bullshit to put up with.
Actual:
I heard some buzzwords. When can we implement them in order to actualize our potential? Also, I'll need you to stay late and fix my computer. It's got some sort of virus or something I don't know.
(As you're fixing his machine, you see a note on his desk right next to the post-it with his passwords)
Hire grad student from India [x]
Get what's his name to train him. [ ]
Fire what's his name. [ ]
Synergize. [ ]
I think the average time taken to close a trouble ticket is important, but it's not the only factor you want to look at.
The primary purpose of issuing unique trouble ticket numbers is to provide an easy "one stop" tracking mechanism for the issue. A customer (or employee) should always be able to reference a ticket # to support staff, and in turn, they should be able to pull up a fairly comprehensive history of what's been done so far to resolve the issue.
If you push too hard for closing tickets quickly, you'll see a tendency for new tickets to get issued on things which should REALLY be continuations of an existing ticket, held open longer.
(EG. I call in complaining that my inkjet printer won't print yellow. A ticket is created and they tell me my color cartridge is clogged up, so put a new one in and I should be fine. Ticket is closed. I switch cartridges with a new one, and discover it STILL doesn't print yellow. I call in and a new ticket is made for what's really the same issue. I'm told how to run the printer through cleaning cycles, and instructed that I may have to do it "up to 10 times" to see results. Ticket closed. I get around to trying that the next day when I get time, and even after 10 or 15 attempts, no yellow is coming out. I call back in, only to have ANOTHER new ticket opened, and the tech wastes my time asking me if I "tried a new cartridge yet?" and I have to interrupt him in the middle of re-explaining how to do a cleaning cycle. Problem is eventually determined to require a replacement printer ... but should obviously have all been filed under one ticket.)
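One rough way to spot this pattern in ticket data is to count tickets opened by the same caller about the same item shortly after an earlier one was closed. This is only a sketch: the tuple layout and the 7-day window are assumptions, and a real system would need a better notion of "same issue".

```python
from datetime import date, timedelta


def count_repeat_tickets(tickets, window=timedelta(days=7)):
    """Count tickets that look like continuations of an earlier ticket.

    tickets: list of (caller, item, opened, closed) tuples using datetime.date.
    A ticket counts as a repeat when the same caller opens one for the same
    item within `window` of their previous ticket on that item being closed.
    """
    last_closed = {}
    repeats = 0
    for caller, item, opened, closed in sorted(tickets, key=lambda t: t[2]):
        key = (caller, item)
        previous = last_closed.get(key)
        if previous is not None and opened - previous <= window:
            repeats += 1
        last_closed[key] = closed
    return repeats
```

A high repeat count alongside a low average time-to-close is a hint that tickets are being closed early rather than resolved.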
Our competitors measure their performance by time to close tickets. They are consistently rated worst in support. We use surveys. Simple questions like: Was your problem resolved? Was it resolved promptly? We are consistently rated best in support.
more cowbell
Bouncing customers is a good way to keep them from calling back -- grandma is much more likely to phone up 'lil Tim for computer advice if she knows the hotline tech is going to bounce her to ten different places; where I work, we get a good bit of troubleshooting work because the customers hate calling the hotlines provided by the manufacturer. Sadly, annoying your customers is a good way to keep them from calling back, and as long as your product is good enough people will still pay-up. E.g. I'm screwed into Suddenlink where I live. After being promised $85.01 TV/Net, I got a $100.00 bill because of hidden fees. Guess what -- I'm screwed into paying, because the only alternative (Cox) was bought out by Suddenlink.
"Sorrow is better than laughter, for by sadness of face the heart is made glad." [Ecclesiastes 7:3]
that your organization has made your job measurable. It does not matter what they measure your performance by, as long as it is something tangible.
So, you get paid by how many tickets you managed to close in a month. Fine. So, you close as many as you can in a month, resulting in lower quality of each problem fix, resulting in more tickets posted and assigned to you, resulting in you having ensured that next month you have enough tickets as well.
This can go on indefinitely, or your wise superiors might decide to measure your work somehow else.
As the island of our knowledge grows, so does the shore of our ignorance.
Prior to my software company being bought out, my IT department was focused on "customer service." This means that everyone in the company is treated like a customer. I personally work in our software support department and this made utter sense to me.
Under the new company, our new IT works for itself, and primarily is concerned with closing calls as quickly as possible, without regard for the quality of the information or assistance. They are concerned with reducing their own call load, but they don't try very hard, and they don't offer a lot of value over that. Any good customer service department is concerned with closing calls, but they want to provide good quality service where each call is resolved as quickly as possible, but also as accurately as possible and leaving a good feeling with the customer. IT should be a resource utilized to make the company more efficient and reduce costs, not a bunch of yahoos who fix broken PCs and then disappear back under their rock when they are finished.
In customer service, quantitative metrics are used to judge the department trends as a whole, and can be important, but even more important are qualitative measures, like surveys and feedback, example cases, and periodic reviews of every rep, team leader and supervisor. Did the rep do "The Right Thing" (tm) and how many times did they do that, and are they approaching doing the right thing 100% of the time? If a rep provided the user with the right answer, but all they did was email a timid accountant a 5 page document on setting up .NET properly just so the user can properly export his reports to an email to his boss, and then the rep closed the case and offered this less-than-technical person no real help, how service oriented is that, really?
Sometimes that means taking fewer cases per rep and leaving them open longer, if service improves dramatically.
"All great wisdom is contained in .signature files"
I don't say "no" any longer. I ask them what their budget is for accomplishing the task they want.
me: "How much do you have budgeted for this project"
them: "Budget? You mean it costs money? I thought you could do this for free"
me: "We can't do that for free" (laughing to myself the whole time) .... later they come back ...
them: "We have $400 for the project"
me: "Does that include the licensing? Does that include ongoing support? Does that include setup, training, and installation of new infrastructure needed to support your project?"
them: "Uh, no. What do you mean?"
me: "Well, when you want a project ... say for a new building, do you just present $400 and say can you build the building for that?"
them: "Well, no, we have professional architects design the building, then we have professional contractors bid on the project, then we included additional maintenance in the budget for the new building and .... "
me: "So, what you are saying is that you don't view IT as being professional"
them: "No no no no! That's not what I mean at all."
me: "So, how come you just expected us to do what you wanted without asking us what it would take to do it?"
them: "Because it is too expensive when I do ask that"
me: "It is more expensive to do things right. If you want to do it wrong, any non-professional can quote you a lower price. You can get a building and have it built a lot less expensive if you don't hire Architects and Contractors to design and build a building, and it will get built, but it will be missing things you probably want and need. But you know this, and that is why you trust those professionals."
them: "yes, but you are too expensive"
me: "Then the answer is no"
---
Sometimes it is just easier to say "NO". The sad fact is, people don't respect IT professionals AS professionals. We often don't deserve it either, but that is another topic.
Agent K: A *person* is smart. People are dumb, stupid, panicky animals, and you know it.
I think you're stretching things a bit.
"How do you quantify the guy who spent all weekend fixing the server?" You look at the number of times it's happened and you figure out how much it would cost to get that level of service agreement from an outside vendor.
The accountants are much more likely to be asking questions like "how would the business be affected if we outsourced IT at a cost of X, thereby allowing us to save Y in salaries, at a cost of Z in reduced productivity due to longer resolution times".
There are cases where it really doesn't make sense for a shop to handle their own IT. On the other hand, there are definitely cases where it does.
We are big on SLAs. Department directors have to sign off on an SLA before IT will support their stuff. Actually this is how IT gets its budget.
For example, marketing comes to IT and asks for a service like sales tracking. After figuring out what they want we give them a quote with SLA and how much it will cost. After buildout there is a sign off and the service is available for use. To the users there is no concept of hardware or server. They just know if their stuff is working or not. I mean they are marketing people. Any problems that occur are tracked by our ticketing system, and it's just a matter of tracking resolution time, incident severity and number of incidents. All of this is defined in the SLA. Resolution time usually comes into play when looking at service availability, and in the incident review process for high or critical outages.
For our team individual performance usually comes down to how well we contribute to the team. My review is not that much different from a kindergarten report card. "Plays well with others" is now "Maintains positive relationships with external partners"
Stop asking to do stupid things like
- run an internet server without a firewall
- Set up accounts without passwords
- Use 1-off proprietary software when we've selected the best solution for everyone in the company. Too bad our selection costs 3x more than the other stuff.
- Bring a 64-way server up without a fail over, test, dev, and DR instances too.
- Bring a 32-way server up this week, when your project hasn't been approved yet. These things take about a month to get delivered and another month to get installed, configured, connected to the SAN and ready for applications
- Allow an outsourced vendor unlimited access to internal networks with 10,000+ servers without a corp-2-corp VPN in place.
- Send and accept unlimited sized emails without any virus and malware checks.
- Demand something fast because YOU didn't schedule and budget properly - MARKETING, this is for you.
- Run a machine that will be hacked easily and turned into a torrent, porn, music, VoIP server a few months after it gets placed onto the network.
Stupid metrics are part of the problem. When I worked for Gateway, they wanted your call average to be between 7 and 11 minutes. If you went above for the week/month, you were too slow and bad at your job. If you went below, you were probably just getting people off the phone without solving their problems.
That metric worked for most people, because they talk slow and have to look up every single issue.
For me, it was killer. I was consistently getting 5 minutes averages, even with that inevitable once-a-day 1-hour phone call. I got reprimanded twice about it before I gave up and quit. Almost every caller was happy with how I helped them. The others couldn't be helped, or I made a mistake. (I told a guy he could clean his keyboard, once... They had switched to keyboards that fall apart if you try to open them, apparently. In my defense, I had offered to send one, but the guy thought cleaning it would be a lot faster.)
Also note that a certain percentage of calls were recorded and reviewed, and I -never- got talked to about any of my calls. The only complaint I had was the keyboard guy. And yet I still got yelled at for short call times.
Again, stupid metrics are stupid. Call-time has nothing to do with customer satisfaction.
"If you make people think they're thinking, they'll love you; But if you really make them think, they'll hate you." - DM
It's IT management from a holistic point of view.
SLAs are only one aspect of IT management.
There is no point measuring something unless you are going to do something with the information. Are your metrics getting better because things are getting better, or are you just getting better at fighting the same old problems? Are you measuring a metric because it's easy to measure or because the business needs that metric to be good?
Ultimately the idea is to get incidents themselves to zero because that means a smoothly running infrastructure operating exactly as the users and business expect it to. Not exactly possible, but at least it provides a direction to move in... And if your incident management system is any good, it'll tell you where the problems are, and where money should be spent to fix them. That may be user training, education on the portfolio of services that IT provide, or replacing a critical application that falls over every 10 minutes or is too slow, etc etc.
Deleted
The metric is valid when looking at the model where you have INCIDENT MANAGEMENT versus PROBLEM MANAGEMENT.
That first line of call-in is about making sure the human caller gets to a human as quickly as possible. Within 15 minutes flipping that call should be done OR escalated to PROBLEM MANAGEMENT. The reasoning is that while you are talking with someone there is another caller trying to get hold of someone.
Turn Around time is relevant to INCIDENT MANAGEMENT versus PROBLEM MANAGEMENT. The problem is when there is not a clear difference between incident and problem management groups.
Three metrics that are needed:
Caller Hold Time
Call Turn Over Time
Ticket Resolve Time
Hold time is the customer's experience in getting their problem addressed. Not necessarily resolved, but addressed.
Call Turn Over Time is key in hinting at the type of problems. If 90% of your calls are resolved in under 5 minutes, you more than likely have training issues. If 50% are resolved in the first 5 minutes and 25% are escalated to PROBLEM MANAGEMENT then you may have a process failure or technical issue.
Ticket resolve time is the overall volume of trouble you have relative to the severity of the problem. Logging 1200 hours a week of SEV1 tickets tells of serious problems, versus 1200 hours a week of SEV3 or 4 problems.
Mostly management uses those metrics for determining what areas need to be addressed. They are not performance metrics on their own, in fact useless for measuring performance. You would need at least the % of tickets escalated to even start determining performance.
This of course is under the assumption of a split between INCIDENT and PROBLEM management.
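As a rough sketch of the call-level numbers described above (hold time, the share resolved in under 5 minutes, and the % escalated), assuming each call is logged with the fields shown. The record layout and names are hypothetical; ticket resolve time by severity is left out to keep it short.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Call:
    hold_minutes: float      # how long the caller waited to reach a human
    turnover_minutes: float  # time on the call until resolved or escalated
    escalated: bool          # handed off to problem management


def summarize(calls: Sequence[Call]) -> dict:
    """Average hold time, % resolved in under 5 minutes, and % escalated."""
    n = len(calls)
    if n == 0:
        raise ValueError("no calls to summarize")
    quick = sum(1 for c in calls if not c.escalated and c.turnover_minutes < 5)
    escalated = sum(1 for c in calls if c.escalated)
    return {
        "avg_hold_minutes": sum(c.hold_minutes for c in calls) / n,
        "pct_resolved_under_5_min": 100.0 * quick / n,
        "pct_escalated": 100.0 * escalated / n,
    }
```

Per the post, these figures point at where to look (training, process, technical trouble) rather than rating any individual.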
-=[ Who Is John Galt? ]=-
If a business service owner signs off then what is the problem? They are the ones getting fired when it all goes to shit.
Just make sure your change management board includes them, and finance as well. If you have a change management system you can even point to the change number and the requestor and say this guy caused N million dollars' worth of bad press/whatever to the share price.
It isn't IT's job to say no, it's IT's job to explain the risks.
Deleted
Years ago I learned that most managers are so remarkably ignorant of what good IT workers do, you know preventative work that ensures users can do their jobs without interruption, that the only way to get ahead is to be a bad IT worker.
Meaning if you let all sorts of bad stuff happen and then rush in and be the savior of the day you will be rewarded with promotions and bonuses.
A few years ago, just before I left my last job, I demanded a job review having gone 6 years without one. I got to sit through my review by the VP in charge of the division I supported and my direct IT boss from Denton Tx, only to be criticized for not socializing with some techs who came up from Denton to help with a move of the office 80 miles to a new city. I had elected to do my appointed tasks for the move, baby the servers and double check backups prior to taking them down, packing them up and reinstalling them in the new site's server room.
Had I done the socializing I would have ignored my duty to the corporation but not been f*cked over during the review. If the servers had not come up I would likely have been heralded as a saint had I been able to resurrect them too.
It makes no sense but my advice is don't bother with looking at meaningful metrics unless it is to satisfy your own needs. Focus on the only metric management sees... Crisis frequency and crisis resolution... be the superhero!
There are three types of companies.
Generic
Brands with no real value
Brands with a good reputation, worthy of trust.
A Generic company honestly should not give a crap about user satisfaction. Your goal should be to spend as little $/user as possible without resulting in lawsuits. You are selling crap and people are buying it because it is cheap. They won't care about customer satisfaction, otherwise they would have bought a brand name.
You correctly describe a no real value brand. Middle satisfaction is the goal.
But a real brand (like Apple) needs HIGH satisfaction. They want it as high as possible. They sell their product at a premium, using their reputation as the reason. They need people to say it is worth spending more for their product.
Similarly, the same thing works for a company that wishes to maintain a good reputation as being on the cutting edge. If you are a law firm, like say the one I work in, and you are trying to attract the best lawyers, it helps a TON to be known as someone with the best stuff and 'it just works.' (to copy a certain slogan). You can't do that if people go around saying nothing works and IT doesn't get back to you. You need IT to solve your problems ASAP, keeping all employees happy and not thinking "This wouldn't happen if I took that job at XXXX"
excitingthingstodo.blogspot.com