Slashdot Mirror


Risk Management - A Cautionary Tale

Mr. Ghost writes "By now many people have heard about the fiasco and financial blunder Comair had over the 2004 Christmas holiday. An article on CIO provides a timeline of the decisions that led up to the system failure, which cost the division of Delta Air Lines $20 million. The article points out the need for proper risk management and what can occur when a risk analysis is not performed or is ignored. It goes on to mention that although this was a very public failure, this type of system failure can occur in other companies." From the article: "The prospect of replacing the ever-maturing crew management system was floated again the following year, with plans laid out to select a vendor in 2000. But that didn't happen. Over the next several years, Comair's corporate leadership was distracted by a sequence of tumultuous events..."

8 of 203 comments

  1. Re:Why didn't the CIO yell louder? by Anonymous Coward · · Score: 2, Informative

    I can't help but commiserate with the folks at Comair. Technology projects can be hard enough without having to deal with labor unions - which is really key to understanding Comair's problem. I was the project manager in the late 90's at TWA, hired to implement just a portion of what Comair is trying to replace. Scheduling systems hit at the heart of the pilots' work rules and they won't give up a single work rule without a fight. That was true even when the union was the instigator of the change. Even after the agreement on work rules, there were unique training issues, legal agreements, and of course egos. Pilots are a very confident class of people (great skill for flying planes) and that confidence is evident even when they are negotiating on things about which they have little knowledge. It is very hard to get an agreement on how to change the airline.
    My project was to implement a new scheduling system for the pilots. It eventually took a complete restart on the project and a little over 3 years. I had to do things as a project manager that I would have never dreamed would be part of a technology project. I gave speeches to the Union governing council and was part of the official negotiation team. One year of that project was used in just negotiations with the pilot union.
    I ended up both loving and hating that project. I even quit at one time, but came back after a few weeks. I was constantly frustrated by the lack of progress that was being made in negotiations, the feeling that I was the whipping boy, and the anger that was projected at me by some who thought they were being "forced" to make a change. Even near the end of the project, my boss commented that he was ready to give me a "real" project as soon as I wrapped that one up. No matter what happened, no one believed that it was all that tough. If you think it is tough to turn a company, try doing it with a union. On the other side of the spectrum, though, is that we ended up with a successful installation. We took a survey (2 months into the rollout) of the pilots and the union agreed that we got a 94% acceptance rating for that project. I also have to admit that I grew a lot on that project -- as a PM and as a person. I believe that a lot of my people, negotiating, finance and legal skills are all due to that project.
    So, given the struggle that it takes to get the unions to agree to even mutually beneficial change, the company is left in the position of trying to get old work rules to fit with modern technology. That is the wrong way to do it and they will find themselves starting over several times before they realize that, as painful as it might be, you have to update the rules if you want to update the technology. The PM and CIO have to learn to be salesman, negotiator and technocrat all at the same time.
    Overall, I feel sorry for the Comair CIO. Your project has about a zero percent chance of succeeding unless you have just the right business people tied into the project to pull it off.
    Darrell Hamilton, Strategic Director, LabCorp


    Blame the unions!!!
    Everything is their fault, right?

  2. Article text by daVinci1980 · · Score: 4, Informative

    Site is already sluggish.

    Bound To Fail
    The crash of a critical legacy system at Comair is a classic risk management mistake that cost the airline $20 million and badly damaged its reputation.
    BY STEPHANIE OVERBY

    When Eric Bardes joined the Comair IT department in 1997, one of the very first meetings he attended was called to address the replacement of an aging legacy system the regional airline utilized to manage flight crews. The application, from SBS International, was one of the oldest in the company (11 years old at the time), was written in Fortran (which no one at Comair was fluent in) and was the only system left that ran on the airline's old IBM AIX platform (all other applications ran on HP Unix).

    SBS came in to make a pitch for its new Maestro crew management software. One of the flight crew supervisors at the meeting had used Maestro, a first-generation Windows application, at a previous job. He found it clumsy, to put it kindly. "He said he wouldn't wish the application on his worst enemy," Bardes recalls. The existing crew management system wasn't exactly elegant, but all the business users had grown adept at operating it, and a great number of Comair's existing business processes had sprung from it. The consensus at the meeting was that if Comair was going to shoulder the expense of replacing the old crew management system, it should wait for a more satisfactory substitute to come along.

    And wait they did. The prospect of replacing the ever-maturing crew management system was floated again the following year, with plans laid out to select a vendor in 2000. But that didn't happen. Over the next several years, Comair's corporate leadership was distracted by a sequence of tumultuous events: managing the approach of Y2K, the purchase of the independent carrier by Delta in 2000, a pilot strike that grounded the airline in 2001, and finally, 9/11 and the ensuing downturn that ravaged the airline industry.

    A replacement system from Sabre Airline Solutions was finally approved last year, but the switch didn't happen soon enough. Over the holidays, the legacy system failed, bringing down the entire airline, canceling or delaying 3,900 flights, and stranding nearly 200,000 passengers. The network crash cost Comair and its parent company, Delta Air Lines, $20 million, damaged the airline's reputation and prompted an investigation by the Department of Transportation.

    Chances are, the whole mess could have been avoided if Comair or Delta had done a comprehensive analysis of the risk that this critical system posed to the airline's daily operations and had taken steps to mitigate that risk. But a look inside Comair reveals that senior executives there did not consider a replacement system an urgent priority, and IT did little to disrupt that sense of complacency. Though everyone seemed to know that there was a need to deal with the aging applications and architecture that supported the growing regional carrier--and the company even created a five-year strategic plan for just that purpose--a lack of urgency prevailed.

    After the acquisition by Delta, former employees say Comair IT executives didn't do the kind of thorough management analysis that might have persuaded the parent airline to invest in a replacement system before it was too late. Instead, Delta kept a lid on capital expenditures at Comair, with unfortunate consequences. The failure of the almost 20-year-old scheduling system not only saddled Delta with a plethora of customer service and financial headaches that the airline could ill afford but it also provides a cautionary tale for any company that thinks it can operate on its legacy systems for just...one...more...day.

    The five-year plan that wasn't
    Today, Cincinnati-based Comair is a regional airline that operates in 117 cities and carries about 30,000 passengers on 1,130 flights a day, with three or four crew members on each. But back in 1984, when Jim Dublikar joined the company as director of finance and risk management, Comair had

    --
    I currently have no clever signature witticism to add here.

  3. Re:risk management 101 by linuxbert · · Score: 3, Informative

    You perform a TRA - Threat and Risk Assessment. And you are quite right, it is a profession all of its own.

    For the do-it-yourselfers: http://www.cse-cst.gc.ca/en/publications/gov_pubs/itsg/itsg04.html Grab the PDF, and it will guide you through the process.

  4. Re:Why did this system fail? by Jayfar · · Score: 4, Informative

    The article conveniently leaves out the reason for the failure.

    No, the article conveniently explained that the sw had a limit of 32000 schedule changes per month. A severe winter storm necessitated enough changes to make the system fall over.
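
    A cap near 32,000 is consistent with a counter kept in a signed 16-bit integer, which tops out at 32,767. That storage type is an assumption here, not something stated above. A minimal Python sketch, with hypothetical function names, of what one change too many does to such a counter, next to a version that refuses instead of wrapping:

```python
import ctypes

INT16_MAX = 2**15 - 1  # 32767, the largest value a signed 16-bit integer can hold


def bump_int16(counter: int) -> int:
    """Increment a counter as if it were stored in a signed 16-bit slot.

    Hypothetical model of the legacy counter; the real storage type is assumed.
    """
    # ctypes.c_int16 performs no overflow checking: only the low 16 bits are kept.
    return ctypes.c_int16(counter + 1).value


def bump_checked(counter: int) -> int:
    """Increment the counter, but raise instead of wrapping past the maximum."""
    if counter >= INT16_MAX:
        raise OverflowError("schedule-change counter is full")
    return counter + 1
```

    In this model, bump_int16(32767) silently wraps to a negative number, while bump_checked(32767) raises an error the caller can handle.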

  5. Re:Yep by code_chick · · Score: 2, Informative

    Sorry - I mistakenly drifted into the IT section of Slashdot... You IT guys are all so threatened by real developers! (Since you're all just developer wannabes.) And a female developer - that's the scariest of all! I won't make this mistake again... I wouldn't want to subject you to crying in the fetal position.

  6. What if they won't listen? by swb · · Score: 3, Informative

    I work in a business that isn't defined by technology (at least not historically), and I don't think that management actually listens or comprehends when it comes to a lot of IT issues.

    When they do listen, they tend to reduce it to profit/loss and destroy the subtlety of the information and its meaning. CIOs who "push" issues, especially when they're expensive, tend to get canned as gadflies, big spenders or for not being "team players".

    When it comes to technology, managers often don't care and don't want to know, except when it costs money.

  7. Re:software decays by qwijibo · · Score: 2, Informative

    The problem can also occur because the original application is tested against the real system, not the documented API. So a bug fix to the underlying system can both correct a bug and create an application error.

    Throwaway systems are cost effective in the short term. That makes them popular with people who look at this quarter's stock price as both a goal and the duration of their attention.

  8. 32767 may be a lot, but it's not enough by Anonymous Coward · · Score: 1, Informative

    IT'S OBVIOUSLY NOT ENOUGH.

    The root cause was not a problem with too much data.

    The root cause was not addressing the problem of what would happen when an undefined amount of data was fed into the system.

    I'm personally getting sick of programmers blaming something in "the data" for their code puking on its shoes. No input no matter how insane should ever cause your system to fail to do what it was designed to do! If your code can't handle incorrect input, spit out an error and move on. Don't crash. Don't stop processing data. Keep working properly!!!!
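
    As a rough illustration of "spit out an error and move on", here is a minimal Python sketch. The function name, the string record format and the 32,000 capacity are all hypothetical, chosen only to echo this thread; it is not how the actual crew system worked. Bad or excess records are reported and skipped, and processing continues:

```python
from typing import Iterable, List, Tuple

MAX_CHANGES = 32_000  # hypothetical monthly capacity, echoing the limit discussed in this thread


def apply_changes(
    changes: Iterable[str], capacity: int = MAX_CHANGES
) -> Tuple[List[str], List[str]]:
    """Accept well-formed changes up to capacity; report the rest instead of crashing."""
    accepted: List[str] = []
    errors: List[str] = []
    for n, change in enumerate(changes, start=1):
        # Uncontrolled input: check the type and content before using it.
        if not isinstance(change, str) or not change.strip():
            errors.append(f"record {n}: empty or malformed, skipped")
            continue
        # Known limit: reject explicitly rather than overflowing.
        if len(accepted) >= capacity:
            errors.append(f"record {n}: capacity of {capacity} reached, rejected")
            continue
        accepted.append(change.strip())
    return accepted, errors
```

    The caller gets back both the accepted changes and a list of error messages, so an operator can see that the limit was hit while everything already accepted stays usable.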

    Data from an uncontrolled source can never be trusted. Anyone who depends on certain characteristics or amounts of uncontrolled data inputs is a f*****g idiot.

    And I'm not sorry at all if my standards are too high for you.