Vendors Take Blame For Most Data Center Incidents
dcblogs writes "External forces who work on the customer's data center or supply equipment to it, including manufacturers, vendors, factory representatives, installers, integrators, and other third parties, were responsible for 50% to 60% of abnormal incidents reported in a data center, according to Uptime Institute, which has been collecting data since 1994. Over the last three years, Uptime found that 34% of the abnormal incidents in 2009 were attributed to operations staff, followed by 41% in 2010, and 40% last year. Some 5% to 8% of the incidents each year were tied to things like sabotage, outside fires, and other tenants in a shared facility. But when an abnormal incident leads to a major outage that causes a data center failure, internal staff gets the majority of blame. 'It's the design, manufacturing, installation processes that leave banana peels behind and the operators who slip and fall on them,' said Hank Seader, managing principal research and education at Uptime."
Let's remember that the first outage was caused by an external force also--a moth.
I think it's time to switch to Gamemaker. Have to face the music some day, yes?
I'm sure outside forces installing things are disruptive. But then are they the primary forces doing installations in general? And if that's the case, then it would be more appropriate to call them simply installation related issues... and that's both common and to be expected.
Install anything new and teething issues tend to crop up.
I've decided to stop wasting my time responding to AC trolls/sockpuppets... so if you want a response from me... log in.
They need to monitor and control their vendors/contracts/etc better.
---- Booth was a patriot ----
It sounds like this is just some kind of tool to show that "it's not our fault, really" -- but at the end of the day, aren't the internal staff responsible for managing the "outside forces" up to and including setting standards, supervision, etc?
Or is this one of those deals where so much is outsourced that it's easy for everyone to deny culpability?
Corporate America loves to outsource. Not because it's efficient or cheap, but because it provides someone to blame!
Outsource the network to one firm, the generator to another, the HVAC to a third. Hire temp contract lackeys to staff the place, and rent-a-cops to "guard" it. Then, when something goes wrong, blame them. If it's a big enough issue fire them and replace them with the next batch of people who won't be trained, won't care, and will eventually screw up.
This article isn't illuminating, it's simply restating the design parameters of the system!
80-90% of abnormal incidents caused by vendors were the previous vendor's fault.
When a company tries to get around minimum wage laws by hiring low-paid monkeys to do their design, manufacturing, and installation, they get exactly what they deserve.
My favorite is getting notifications that all our servers went offline. Now typically, that would be at the network (ISP) level. So come to find out later that the entire facility lost power. Apparently they performed an internally scheduled UPS test without letting us know beforehand. Well, they completed the test alright. It was a failure.
In that whole event, we ended up with dirty NTFS volumes that needed to have chkdsk run and one or two servers with a failed drive in their respective RAID5 arrays. Not happy!
Life is not for the lazy.
If you let them in your datacenter, it's your fault if anything goes wrong in there.
If your vendor botched a deployment or delivers a functionally useless product, it's your fault for buying into their marketing campaign and not understanding what you just got yourself into.
But mostly, I think the blame system was by design here...Hire someone else to do the job for everything possible. Fire them/drop contracts when they don't work for you, then file insurance claims to compensate (plus extra if you do it right) for the damages. The trick is to keep the damages rolling as expected--enough to keep insurance revenues up, but not enough so that your premiums adjust to make it unprofitable.
Abby Normal
That's why we pay service contracts. If covering your ass wasn't a part of business, we'd all be using free or OSS tools.
I don't know, but I have been taking these pills, my wife is happy, and they were recommended to me by the Uptime Institute as well, so this study must be close.
Contractors and subcontractors add middlemen and overhead.
Sometimes to the point where a sub may get a job with little to no documentation, or a job with poor documentation.
Or a sub may hit an issue and have to work through a lot of off-site middleman managers to get things fixed, or just be told to do as the documentation says and that another contract will be needed to fix it.
Rotating contracts lead to people with no knowledge of the site, and more errors as people get up to speed.
When a data center is working on a different company’s server than the one that they should be working on?
http://thedailywtf.com/Comments/Remotely-Incompetent.aspx
My experience working at a technical support center for a major OS software vendor is with outsourcing IT staff overseas. These so-called IT professionals were barely functional and created the vast majority of their own problems. I saw major outages in corporations in North America caused by missteps in trying to rectify what started as a minor issue. It was always annoying to deal with an IT admin and his issues when his manager was constantly whispering into his ear and monitoring the call. Very often I asked the tech to ask his manager to go for a coffee break :) as they were not helping the issue.
Problem is outsourcing to the lowest cost without qualifying the abilities of staff.
The headline said "blame", not "credit". I think the former is more relevant here.
Back in the prehistoric days a group of us were sitting in a bull-pen outside the datacenter. There were big windows on the datacenter wall so we could all ooh & ahh at the blinky lights on the servers and switches. Suddenly, my workstation froze - and when I (and every other person in the bullpen) yelled and looked up, we saw our network admin standing in the datacenter looking back at us with a "What?" look on his face. In his hand was the Ethernet cable he had just pulled out of a core switch...
Is this surprising? The vendors/contractors do more of the risky work. When it comes time for UPS maintenance, our vendor comes in to take the UPS offline and do the work. If they screw up when they bypass the UPS, they can take down the datacenter. Likewise, when it comes time to add a new disk tray to the storage system or replace a failed controller board, instead of having our staff do it (who may add one tray every year if that), we have the vendor do it, so there's more chance of him doing the wrong thing and bringing down our storage system -- but there's less chance of the vendor causing a problem than our own staff since the vendor's engineer does this twice a week.
Which is PERFECT if you don't want things to work correctly in the first place! Good products delivered efficiently become cheap.
:)~
It's about profit maximization. More fuckups = more billable hours and expenses to pass on to the customer.
Quality in a data center, or any facility for that matter, depends on controlling the processes within that facility. If vendors have signed on to working within the procedures developed by the data center operators, fine. There should be minimal problems. But if vendors are allowed on the property to do work not covered by these plans and controls, antics will ensue.
There is nothing inherently wrong with bringing in outside vendors. As long as their function has been planned for. And there is some means to hold those vendors to working within that plan. But all too often, data center managers overlook certain functions in their procedures. Like installing and commissioning new equipment, for example. So when these operations become necessary, people are brought in (or the task is handed to in-house techs) with insufficient directions on how to proceed. The difference between vendors and your own techs is that vendors come in familiar with their own equipment, but unfamiliar with data center processes. In-house technicians have the opposite problem. They know their way around the facility, but not so much the equipment. Either way, somebody is going to need training.
So, do you train your people on functions that they'll rarely have to perform? Or do you expect vendors to learn your processes when they may not return for months or years?
Have gnu, will travel.
But with the possible exception of a meteor strike, there's always someone to blame for a data center problem.
I always blame Anonymous Coward. He's the one that failed to order the meteor shields.
now we need to go OSS in diesel cars
Several years ago, I was working a support case with a major bank. Their remote storage mirroring between BFE, [Southwest State Here] and BFE, [Flyover Country State here] failed, and they wanted to know why. I obtain SAN switch logs from both fabrics and attempt to troubleshoot the issue. The logs revealed that the network ports dropped offline one by one, about 5-7 seconds apart, and then the problem hit the other switch. They came back online one-by-one about three minutes later. The ports were scattered all over the respective switches.
I inform the customer of my findings and am informed that there happened to be somebody working on the cabling in that exact same rack cabinet, but he swears he didn't touch anything having to do with these cables and that the problem MUST be within our hardware. I inform the customer that hardware or software issues do not spread to random ports within a switch, and then to a switch that has NOTHING in common besides a nearby rack cabinet, and ONLY affect a particular group of ports that are otherwise completely randomly spread throughout the switch. (We are talking good old-fashioned light loss here... not some esoteric failure that could be caused by software.)
The customer replies: "We'll be having a 'discussion' with that cabling contractor."
Just last year we had a team of painters come in and paint the inside of the data center. (I have no idea why.) One morning we get a call from the monitoring team saying a server went down. Shortly after, another server randomly went down. We go back to the computer room to figure out what was going on and immediately see the painters had covered a server rack with plastic sheets from head to toe. We quickly uncover the rack and hear every fan on all servers screaming for mercy. Later on, watching the security videos, we even saw them walking on top of the covered racks to paint.