The Four Fallacies of IT Metrics

Posted by samzenpus on Wednesday December 14, 2011 @01:03PM from the more-or-less-than-the-sum-of-the-numbers dept.

snydeq writes "Advice Line's Bob Lewis discusses an all-too-familiar IT mistake: the use of incidents resolved per analyst per week as a metric for assessing help-desk performance. 'If you managed the help desk in question or worked on it as an analyst, would you resist the temptation to ask every friend you had in the business to call in on a regular basis with easy-to-fix problems? Maybe you would. I'm guessing that if you resisted the temptation, not only would you be the exception, but you'd be the exception most likely to be included in the next round of layoffs,' Lewis writes. 'The fact of the matter is it's a lot easier to get metrics wrong than right, and the damage done from getting them wrong usually exceeds the potential benefit from getting them right.' In other words, when it comes to IT metrics, you get what you measure — that's the risk you take."

Business planning by InsightIn140Bytes · 2011-12-14 13:03 · Score: 5, Insightful

It's bad business planning, but it's also the way any big-name Linux distro works. Something not working on your Red Hat Linux? No problem, call us! And that's how they make money. They make money on the promise of fixing problems, and that includes saying that their OS is broken.
1. Re:Business planning by fsckmnky · 2011-12-14 13:08 · Score: 5, Insightful
  
  SCO was famous for this. $5,000 minimum support contracts, with $1,000 per incident fees, whether they fixed it or not.
  
  "Thank you for calling SCO may I help you ?"
  
  "Yeah, my manufacturing plant just shut down because your kernel panic'd."
  
  "We're sorry to hear that, but you have the newest version, so there are no updates you can apply to resolve the issue. ($1,000 cha ching)"
2. Re:Business planning by AdamWill · 2011-12-14 13:21 · Score: 5, Insightful
  
  Support doesn't just mean 'fixing bugs'. It also means 'helping you set things up right', 'helping you optimize your configuration', 'helping you figure out what tool you need for the job at hand', and so on.
  Selling support does not require that the underlying product be broken.
3. Re:Business planning by Anonymous Coward · 2011-12-14 14:57 · Score: 5, Insightful
  
  Flipping burgers ain't bad if you own the restaurant.
4. Re:Business planning by Gription · 2011-12-14 17:41 · Score: 5, Funny
  
  I used to have two standard replies to the "It's broken" type of complaint.
  - "How can you tell? Is there an axe sticking out of it?"
  and
  - "How can you tell? Is it on fire?"
  
  One day I had this young kid come up to me saying, "My computer is broken," so of course I responded, "How can you tell? Is it on fire?"
  He looked a bit embarrassed and said, "Well it was smoking and made a buzzing sound but it has stopped now."
  His one day old computer's power supply had burned up in a spectacular fashion.
  
  (Still waiting to see an axe...)
5. Re:Business planning by FairAndHateful · 2011-12-14 17:44 · Score: 5, Interesting
  
  Support... ... also means 'helping you set things up right', 'helping you optimize your configuration', 'helping you figure out what tool you need for the job at hand', and so on.
  Worked at a support center... I was a "talk to them until they understand" guy, playing the long game... I figured while it might not take every time, if I got people to understand, they could get back to work and not break things for just a little bit longer. You know, it costs two people money if they have to talk to me while I help them.
  One of my coworkers got huge amounts of management praise for processing lots and lots of cases... My management was too dumb to run numbers on how many callbacks he had, that the rest of us were fixing...
  Yeah, sure I was spending too much time with each person, but half of my time was fixing this jerk's mistakes. There's probably some of that at every support center. It takes 10 minutes to fix a problem, but 5 minutes to get them to go away. You can look very busy by making them go away, if management isn't clever enough.
  I'm rather happy with my new position... I get to review other people. And I do it fairly.
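The callback problem described above can be made visible with very little arithmetic. What follows is only a minimal sketch, assuming a hypothetical ticket export in which each case records who closed it and, when the call is a callback, which earlier case it reopens. The `Case` record, its field names, and the "net" figure are all illustrative assumptions, not any real help-desk system's schema.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Case:
    """Hypothetical ticket record; field names are assumptions for illustration."""
    case_id: int
    analyst: str                         # analyst who closed this case
    reopened_from: Optional[int] = None  # earlier case this call is a callback for


def callback_adjusted(cases):
    """Return {analyst: (closed, callbacks_caused, net)} for a list of cases."""
    owner = {c.case_id: c.analyst for c in cases}
    closed = Counter(c.analyst for c in cases)
    # A callback is charged to whoever closed the ORIGINAL case,
    # not to the analyst who ends up cleaning it up.
    caused = Counter(
        owner[c.reopened_from]
        for c in cases
        if c.reopened_from is not None and c.reopened_from in owner
    )
    return {a: (closed[a], caused[a], closed[a] - caused[a]) for a in closed}


cases = [
    Case(1, "fast"), Case(2, "fast"), Case(3, "fast"), Case(4, "fast"),
    Case(5, "thorough", reopened_from=1),
    Case(6, "thorough", reopened_from=2),
    Case(7, "thorough"),
]
for analyst, (closed, caused, net) in sorted(callback_adjusted(cases).items()):
    print(f"{analyst}: closed={closed} callbacks_caused={caused} net={net}")
```

On raw case counts the "fast" analyst leads; once callbacks are charged back to the case that caused them, the order flips. This adjusted number can be gamed too, so it is a sanity check rather than a replacement metric.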
6. Re:Business planning by justsayin · 2011-12-15 00:53 · Score: 5, Insightful
  
  My father was an auto and large truck mechanic, pretty good one too. He had three questions you can ask to begin diags on pretty much any system with humans involved.
  
  1) Did it ever work?
  2) When did it quit?
  3) What have you done to it lately?
  
  Pretty much the foundation of my IT Career right there.
7. Re:Business planning by nahdude812 · 2011-12-15 02:55 · Score: 5, Interesting
  
  The major fallacy many big companies fall into is that some of these systems have been running flawlessly for years, because they hired a competent IT staff. They look at the price of those paychecks and shiver. Why are we paying so many high priced engineers when we've never had a problem, they think.
  So they reduce staff and start to rely on support contracts instead of on-site gurus. The gurus are still there to solve any oh-shit moments. But that back investment in good engineers has produced a stable infrastructure that runs with few problems for years. So they reduce staff more, pay for more support contracts, and eventually the system critical mass is greater than the engineers who can support it. It's no problem until it's a problem.
  Eventually something minor goes wrong, but nobody notices or if they do it's not really their field of expertise so they don't understand it's minor now but could escalate. When it does, something else goes wrong, and a cascade effect takes out more and more systems. With a full staff, you have enough guys that when the critical mass is reached, they can start defensive measures and get things back in working order in no time. With support staff only, things are going wrong faster than they can deal with it.
  "Call on our support contracts," shout the bosses! So now your on-site staff are all on hold instead of troubleshooting. When they get through to someone, they have to spend the first hour or two describing their infrastructure to the technician on the other end, who starts making random suggestions that maybe help, but probably don't.
  My anecdote on this front is a company I used to work for. It's a long read, but it demonstrates the failures at several levels which are the direct result of this kind of thinking. The Oracle transaction log disk was getting full. Some warnings came in, but disks running low on space was an everyday occurrence: we'd send an email to the person on record as being responsible for those servers, then go back to troubleshooting why the "Executive Dashboard" was responding a bit slowly that day (it's for the execs, so it's automatically high priority). Except that person was aboard an airplane on his way to help reduce staff in east Asia, and he'd be incommunicado for the next 19 hours or more.
  It seems like an innocent enough problem, it's just a log disk, the worst thing that could happen is we lose some logs, right? Whoops, transaction logs are pretty important for Oracle. The fact that the disk is filling up at all is itself an indicator that something bigger is wrong; this shouldn't happen. But critically once the disk does fill up, Oracle will enter read-only mode. Or it should. This time it doesn't, it shuts down. BOOM, offline. So down goes SAP. With SAP down, our entire business is offline. We can't take orders, we can't ship orders, we can't pay bills, we can't pay paychecks, the hourly workers whose shift is starting can't even clock in. Some buildings with tighter badge access can't even be entered unless someone inside opens an emergency door to let someone in.
  Once the transaction log disk was full, Oracle will no longer start up, it needs some space on the log disk to log startup-related transactions. Two hours on hold with Oracle Gold Pressed Latinum level support they finally get an engineer. Wow, this is something he's never seen before, Oracle should have gone into read-only mode before this happened! The only solution anyone can seem to think of is to get some bigger disks for the transaction logs, clone the data over to these new disks and give the startup another go. We have hot spares on a shelf, but nobody knows this. Finding disks requires a different support contract, they can have disks out to us tomorrow. Yeah, that's not going to cut it. Someone literally drives out to a distribution warehouse. Two more hours down (they actually send two different guys in different cars with instructions to take different routes in case one runs into traffic or gets in an acciden
  
  --
  Slay a dragon... over lunch!
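The story above turns on a "disk is low" warning that looked routine. One way to separate routine from urgent is to alert on how fast the disk is filling rather than on how full it is. This is a rough sketch only, assuming usage samples are already being collected somewhere; the function name, the sample format, and the straight-line projection are illustrative assumptions, not anything the original poster's company ran.

```python
def hours_until_full(samples, capacity_bytes):
    """Project when a disk fills, from (hours_elapsed, used_bytes) samples.

    samples must be ordered oldest first. Uses a straight line between the
    first and last sample, which is a deliberately crude assumption.
    Returns None when usage is flat or shrinking.
    """
    (t0, used0), (t1, used1) = samples[0], samples[-1]
    if t1 <= t0:
        raise ValueError("need at least two samples spanning some time")
    rate = (used1 - used0) / (t1 - t0)  # bytes per hour
    if rate <= 0:
        return None
    return (capacity_bytes - used1) / rate


# Hypothetical numbers: a 100 GB log disk that grew from 50 GB to 70 GB in 10 hours.
GB = 10**9
eta = hours_until_full([(0, 50 * GB), (10, 70 * GB)], 100 * GB)
print(f"projected hours until full: {eta}")
```

A projection shorter than the time it takes to reach the responsible person (19 hours in the anecdote) is a reasonable trigger for escalating to someone else.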
Any metric can be gamed by russotto · 2011-12-14 13:09 · Score: 5, Insightful

Losers realize this simple fact, instantly think of several ways to game the metric, then don't do it figuring that "obviously" the decisionmakers realize the metric is horribly broken. Then they get laid off. Winners spend hours, days, or weeks coming up with one way to game the metric, pat themselves on the back for being so clever, and do it. Then they get promoted, eventually to a position where they come up with metrics of their own.
1. Re:Any metric can be gamed by Hatta · 2011-12-14 13:12 · Score: 5, Insightful
  
  Unfortunately, this is true. Evil will always triumph because good is dumb.
  
  --
  Give me Classic Slashdot or give me death!
2. Re:Any metric can be gamed by AdamWill · 2011-12-14 13:23 · Score: 5, Insightful
  
  Yeah, this is pretty much the problem. Performance evaluation should really be done by crazy, high-tech methods such as you and your peers and manager sitting down and discussing what you've achieved, but that kind of thing is way too hard to stick into an Excel macro, after all...
  Another classic example: call centres which measure 'performance' mainly by the average call time metric. Which gives tech support workers all the incentive in the world to give out any piece of bogus advice that'll get the customer to hang up as fast as possible. Or just hang up on them, if the phone system isn't sophisticated enough to detect it.
3. Re:Any metric can be gamed by Ethanol-fueled · 2011-12-14 13:46 · Score: 5, Insightful
  
  As a widget-fixer, our analog is obviously how much shit we can fix in a given time frame. One of the biggest mistakes I've seen in multiple companies (ranging from laptop to medical device repair) is that the PHB keeps a board or chart showing how many widgets each tech fixed during a given timespan.
  
  Any idiot can see that the misguided sweatshop-style metrics cause the following problems:
  
  Cherry-picking - Techs choosing and even stashing away (!) the returns with the easiest and quickest problems to fix. It matters not that your expensive gadget has been sitting there for a month, there are numbers to be made and we'll get to yours when we want to regardless of the order they came in.
  
  Racing - When there are no "easy" ones to be cherry-picked, then the techs will race to fix your item. They will ignore problems and cut corners on others. Stripped screw hole? Super-glue the screw in. Low output? Game the settings so the tests will pass. Part shortage? Cannibalize and rob Peter to pay Paul in a hardware-sort of Ponzi-scheme.
  
  Status Quo and mediocrity - The top performers will become accustomed to the attaboys and will continue to produce slipshod repairs, even if there is a slowdown in work when they can do their job right. Meanwhile, the low performers will become used to it and feel no need to better their work.
  
  My idiot boss in the company I'm in now considered it and was shot down by every tech. In this company, due to the variety of products, one person could make tens of thousands of dollars with 1-2 days work while another tech working on a different product will have to spend more labor and overhead juggling external vendors and all the headaches it involves only to make a couple thousand dollars. Yeah.
  
  Fortunately, the consultants we brought in are smart. They listed generic milestones and a cheeky "100%" as the goal with the smiling disclaimer that it will probably never happen.
4. Re:Any metric can be gamed by inviolet · 2011-12-14 14:34 · Score: 5, Insightful
  
  Losers realize this simple fact, instantly think of several ways to game the metric, then don't do it figuring that "obviously" the decisionmakers realize the metric is horribly broken. Then they get laid off. Winners spend hours, days, or weeks coming up with one way to game the metric, pat themselves on the back for being so clever, and do it. Then they get promoted, eventually to a position where they come up with metrics of their own.
  It's not just IT. Our entire society has converted over to metrics. An easy example comes to mind: the stock market versus a company's quarterly performance. Another set of particularly nasty examples is found in our justice system: police officers evaluated by their number of citations, prosecutors by their number of convictions, prisons by their dollars per inmate per day.
  I get the financial impetus to switch to metrics. Where it used to be one skilled manager per 5-7 employees, it can now be one schmuck manager with an Excel spreadsheet overseeing 30 employees.
  I even get the psychological impetus. Numbers give us that all-important feeling of certainty, and at low cost too... while the traditional alternative requires legwork, mindwork, judgment, contemplation, and mistakes.
  But it's wrecking our society.
  
  --
  FATMOUSE + YOU = FATMOUSE
5. Re:Any metric can be gamed by DigiShaman · 2011-12-14 16:48 · Score: 5, Interesting
  
  I work for an MSP (Managed Service Provider). We account for time every 15 minutes. Inactive, internal department active, billable active, and non-billable active. All of this logging of time gets calculated out as metrics that define our bonus. So the outcome is pretty much as you've stated. But that's ok, we know how the metrics get calculated and thus we game the system of metrics without cheating our clients out of money. Naturally, it would be dishonest to do otherwise. But I'll be damned if I sit back and be judged and taken advantage of by some MBA who can't even interpret what those numbers are supposed to mean in the first place. They only need to know two things: is the work billable to the client, and how much. Clients are free to speak to a manager if they wish to contest the hours performed and/or the quality of work. The point is, we want their business, so it serves no purpose for us to lose clients.
  It will get worse, I hear. Rumor has it we will be timed every 5 minutes with a USB activity button. Sort of like a chess timer or some such. Also, our keyboards will be logged for activity and application fields will track mouse movement and other activity. It's absolutely nuts. At this rate, they'll need to hire me a secretary just to do the logging for me while I focus on actual work. Hey, now that's cost effective, right? I bet they didn't think of that, did they. Doh!
  
  --
  Life is not for the lazy.
Dilbert Minivan by tedgyz · 2011-12-14 13:14 · Score: 5, Funny

This problem was aptly portrayed in the classic Dilbert comic strip in 1995.
I'm going to code myself a minivan.

--
"No matter where you go, there you are." -- Buckaroo Banzai
ain't pretty. by Caerdwyn · 2011-12-14 13:17 · Score: 5, Insightful

Such metrics also disincentivize people from taking proactive steps to reduce the number of incoming tickets (i.e. making the system/environment more robust or your users more educated), and disincentivize managers from doing so, because fewer incoming tickets means fewer people needed to service them (thus reducing the size of the empire and the pay grade of the manager).
I've seen both "disincentives" in action. It ain't pretty.

--
Everybody gets what the majority deserves.
This makes me sad by multiben · 2011-12-14 13:22 · Score: 5, Interesting

Metrics are great for some things. For making sure that your employees are working they are terrible. I used to work in a metric free environment and there was a great team atmosphere. Then metrics came along and it all went to hell. Now everyone is so focussed on making their numbers look good that the whole organisation is suffering from a weird sense of internal competitiveness. People no longer collaborate on difficult problems because there is no measure within the metrics system to reflect that this occurred. People who used to be innovative are no longer so, because they are not rewarded for spending time innovating. It has achieved nothing good that I can see.
Re:Who does this? by wisnoskij · 2011-12-14 13:32 · Score: 5, Insightful

Probably the same people who consider the number of lines of code written per hour a good metric for evaluating their employees' productivity.

--
Troll is not a replacement for I disagree.
"That which gets measured gets fudged." by bfwebster · 2011-12-14 13:35 · Score: 5, Informative

The quote above is from Jerry Weinberg, and it is true.
There's an entire brilliant, short book about this problem: Measuring and Managing Performance in Organizations by Robert Austin (1996). It's actually a fairly rigorous, somewhat philosophical work, but it is pretty unrelenting in documenting that, indeed, trying to manage by metrics almost always introduces distortions, which in turn are almost always counter-productive. The problem isn't just with IT, it's with any type of effort that seeks to reward or punish based on metrics.
The only metrics that I've found actually useful in IT are those that are predictive -- for example, aiding to estimate the actual delivery date of a project under development. The metrics that seek to somehow measure "accomplishments to date" solely for the purpose of reward or punishment are always gamed and are almost always useless. ..bruce..

--
Bruce F. Webster (brucefwebster.com)
Re:My metrics are superior. by Ethanol-fueled · 2011-12-14 14:03 · Score: 5, Interesting

Good. Glad to see that some VP did the smart thing for once and cut the middle managers instead of the people who actually get the work done.
You get what you reward by perpenso · 2011-12-14 14:03 · Score: 5, Insightful

It's not just the losers. Talented and rational technicians and engineers bend to the rules of the system too. Basically you get what you incentivize, what you reward. If you reward people for complying with some metric then they will generally comply. It does not matter what everyone agrees is right, and it does not matter if management says quality is important. If the metric decides whether you get to keep your job or get that raise, then the metric is what the company gets, regardless of what the company asks for or whether the company's goals are actually advanced.
There is no such thing as bad statistics by msobkow · 2011-12-14 17:41 · Score: 5, Interesting

Be warned: my example is way off topic, but it's a pet statistic I keep track of.
There is no such thing as bad statistics, only bad layman statisticians who don't understand what the numbers actually measure.
Take lines of code, for example. Some people hate it because you can bloat the numbers by adding comments, neglecting to consider how useful those comments are for future maintenance, and thereby a useful application of a developer's time. If you use a consistent formatting style for two projects, you can get a fair grasp of their complexity from the line count, though that will gloss over details about how the code actually works.
The most interesting pattern I've noticed in line counts over the years is that the use of templates and other code abstraction facilities really hasn't decreased the size of code much at all, though it's improved readability, maintainability, and programmer API usability substantially. So line counts only give you an approximation of complexity with a language like Java, but do nothing to measure the quality of the code.
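To make the point about what a line count actually measures, here is a minimal sketch that reports code, comment, and blank lines separately instead of one total. It rests on a big simplification: only whole-line `//` comments are recognized, so block comments and trailing comments are counted as code. The function name and categories are illustrative assumptions, not a standard tool.

```python
def count_lines(source):
    """Split a Java-style source string into code, comment, and blank line counts.

    Simplifying assumption: a comment is a line whose first non-space
    characters are '//'. Block comments (/* ... */) are not handled.
    """
    counts = {"code": 0, "comment": 0, "blank": 0}
    for line in source.splitlines():
        stripped = line.strip()
        if not stripped:
            counts["blank"] += 1
        elif stripped.startswith("//"):
            counts["comment"] += 1
        else:
            counts["code"] += 1
    return counts


sample = "int x = 1;\n// explain why x starts at 1\n\nreturn x;"
print(count_lines(sample))
```

Keeping the three numbers apart shows whether a jump in "lines written" came from code or from comments, which is a different question from whether those comments were worth writing.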
One other thing I've found is that complex code looks fat and heavy from its sheer size, but often compiles to very reasonable executable size and runs rings around supposedly "tight" code that makes heavy use of dynamic techniques like introspection. As only one image of an executable is loaded by a reasonably competent OS, a fat binary does not mean a fat application at runtime.
Big code is only scary if it's not following recognizable patterns and is instead a mishmash of different developers' pet syntax, algorithms, style conventions, naming conventions, and even preferred APIs. If you manufacture it predictably, fat source code becomes a joy to maintain, enhance, and use.
But back to the core topic: help desk performance.
The only help desk stat I care about is a low number of customer complaint reports about the quality of information and assistance provided by the tech team. If it's my company and my budget, I'd rather hire more technicians to handle the load and produce happy customers in the end than save money by overworking the ones I have and burning them out over useless numbers like "calls handled per week."
In the end, if you care about your business, the only thing that truly matters are happy customers who want more services or products in the future, and who will gladly tell others about their good experiences in dealing with you.
There is no substitute for a good word-of-mouth reputation and repeat business. No one ever got fired for buying IBM, not because they're perfect, but because their people will go the extra mile to make things work.

--
I do not fail; I succeed at finding out what does not work.