Slashdot Mirror


More Airline Outages Seen As Carriers Grapple With Aging Technology (reuters.com)

An anonymous reader writes: Airlines will likely suffer more disruptions like the one that grounded about 2,000 Delta flights this week because major carriers have not invested enough to overhaul reservations systems based on technology dating to the 1960s, airline industry and technology experts told Reuters. Airlines have spent heavily to introduce new features such as automated check-in kiosks, real-time luggage tracking and slick mobile apps. But they have avoided the steep cost of rebuilding their reservations systems from the ground up, former airline executives said. Scott Nason, former chief information officer at American Airlines Group Inc, said long-term investments in computer technology were a tough sell when he worked there. "Most airlines were on the verge of going out of business for many years, so investment of any kind had to have short pay-back periods," said Nason, who left American in 2009 and is now an independent consultant. The reservations systems of the biggest carriers mostly run on a specialized IBM operating system known as Transaction Processing Facility, or TPF. It was designed in the 1960s to process large numbers of transactions quickly and is still updated by IBM, which did a major rewrite of the operating system about a decade ago.

25 of 145 comments

  1. Dumb by geek · · Score: 5, Insightful

    "Most airlines were on the verge of going out of business for many years, so investment of any kind had to have short pay-back periods,"

    You really only see this type of thinking in the West. Most sensible companies know that when times are good, you build a war chest, when they are bad you invest the war chest to grow your business and be competitive. The problem wasn't that times were bad. You can always say times are bad. The problem was that they didn't make the best of things when times were good, and therefore deserve the cluster fuck situation they are in now.

    1. Re:Dumb by prograsm · · Score: 5, Insightful

      Agreed. Aging tech isn't the problem here, a complete inability to listen to or fund IT is the problem here. If they had a usable rolling backup system, it wouldn't matter how old everything is. If they had all brand new equipment and no functional load balancing system to compensate for outages that will always be a potential issue, they would still be offline for as long as it takes to fix everything. I have a hard time believing the words "off site redundancy" never came up in any IT budget meetings over the past half century, so their failures are 100% bad business decisions not IT issues. It would be no different if they had refused to budget for more fuel than exactly as much as predicted they would need. Blaming the aircraft rather than the person that made the stupid decision to run out of fuel wouldn't make sense. It only works here because people don't understand IT, and the people that chose to allow outages like these aren't willing to admit it so they will repeat them again.

    2. Re:Dumb by clodney · · Score: 4, Insightful

      From what I have read, this was not an obvious WTF moment. Delta apparently has a complete disaster recovery facility with duplicate hardware. But they had a single point of failure in their infrastructure, which caused them to lose power to the entire datacenter, and everything went down. That part might be a WTF. But once they got everything booted up again, they had to contend with trying to get a system restarted that simply wasn't designed to ever fail completely. So it took hours to get all the pieces back up and communicating again.

      Then there are the real world problems - flight A feeds into flight B, but flight A was late, meaning all those connections were missed and passengers have to be rebooked. And flight B can't fly anyway, because the plane is still sitting 500 miles away because the flight that would bring it to this airport was cancelled as well. And the flight crew that was supposed to bring flight B to this airport technically went on duty the moment they reached the airport, and now they have reached the max allowed hours in the day, so a new crew is needed. But that crew is in a different city...

      This incident will spawn some fascinating failure analyses, and no doubt people will get fired and lawsuits will be filed. And like most DR scenarios, it is way harder in real life than it seems in planning and exercises. I wouldn't be surprised if this causes a big project to deal with outages and restarts, so that this doesn't happen as easily next time.
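      The knock-on effect described above can be sketched roughly as follows. This is a toy illustration, not how any airline actually models it: the `Leg` class, the 45-minute turnaround, and the 14-hour duty limit are all made-up assumptions, and the same crew is assumed to fly every leg.

```python
from dataclasses import dataclass

@dataclass
class Leg:
    flight: str
    sched_dep: int  # scheduled departure, minutes after midnight
    sched_arr: int  # scheduled arrival, minutes after midnight

MIN_TURNAROUND = 45   # assumed minutes on the ground between legs
MAX_DUTY = 14 * 60    # assumed crew duty limit, in minutes

def propagate(legs, initial_delay, duty_start):
    """Push a delay down one aircraft's rotation and flag crew time-outs."""
    results = []
    ready = None  # earliest time the aircraft can depart again
    for i, leg in enumerate(legs):
        dep = leg.sched_dep + (initial_delay if i == 0 else 0)
        if ready is not None:
            dep = max(dep, ready)  # cannot leave before the plane is turned around
        delay = dep - leg.sched_dep
        arr = leg.sched_arr + delay
        crew_ok = (arr - duty_start) <= MAX_DUTY
        results.append((leg.flight, delay, crew_ok))
        ready = arr + MIN_TURNAROUND
    return results

# Hypothetical rotation: three legs flown by one aircraft and one crew.
rotation = [Leg("A", 480, 600), Leg("B", 660, 840), Leg("C", 900, 1260)]
for flight, delay, crew_ok in propagate(rotation, initial_delay=180, duty_start=420):
    print(flight, delay, "crew ok" if crew_ok else "crew timed out")
```

      Even in this toy version, the slack in the schedule absorbs only part of the delay, and the last leg loses its crew although the plane itself could still fly.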

    3. Re:Dumb by Zontar_Thing_From_Ve · · Score: 3, Insightful

      You really only see this type of thinking in the West.

      While there is certainly some component of that, it's not the major reason why things are as they are. Airlines in the USA are not owned or managed by the government. If it really came down to this, the US government might let them all go out of business and let new airlines be built out of the ashes. Switzerland did that. Plus, some airlines are actually owned by the government in the countries where they are based or the relationship is not all that independent. Air France, for example, is theoretically an independent company, but if they were going out of business the French government would surely step in and save them despite it violating EU laws to do so. The US airlines know that the government may not have their backs like they did the automobile industry. Plus, being publicly traded on the US stock market is probably for them a bad thing. This causes them to make decisions for short term profit to keep the stock price high. Finally, in the USA most customers who fly in coach only care about price and literally everything else is negotiable. They will make decisions on price alone. This puts pressure on the major US carriers to compete at perhaps unrealistic price levels with the smaller airlines, which no doubt reduces money available to spend to upgrade old computer systems.

    4. Re:Dumb by DerekLyons · · Score: 3, Insightful

      "Most airlines were on the verge of going out of business for many years, so investment of any kind had to have short pay-back periods,"

      You really only see this type of thinking in the West. Most sensible companies know that when times are good, you build a war chest, when they are bad you invest the war chest to grow your business and be competitive.

      You really only see this kind of simpleminded thinking in people who don't know what they're talking about - but who can repeat something they read somewhere like a parrot.
       

      The problem wasn't that times were bad. You can always say times are bad. The problem was that they didn't make the best of things when times were good, and therefore deserve the cluster fuck situation they are in now.

      The problem is, times have never really been good for the airlines for any extended period. An airline is a capital intensive business, and runs on paper thin margins. Historically, as soon as they get their head above water, they get pushed under again. The shift to jets in the 50's, the fuel shocks and deregulation in the 70's, recession in the early 80's, another recession in the early 90's, a race to the bottom in fares sparked by the rise of internet ticketing, the post 9/11 drop in business, the Great Recession of 2007... (just to hit the high spots) There's a reason why practically every major airline has ended up in bankruptcy court at least once.

    5. Re:Dumb by tlhIngan · · Score: 3, Informative

      But once they got everything booted up again, they had to contend with trying to get a system restarted that simply wasn't designed to ever fail completely. So it took hours to get all the pieces back up and communicating again.

      Well, mainframe computers have such excellent uptimes (you almost never reboot one) because everything is hot-swappable. CPU failure? Remove the CPU module, insert new one, and continue - all while powered up. The OS takes care of suspending the failed one and scheduling around it. Ditto all other components. Effectively, you should never reboot them.

      Of course, the thing is, when you eventually do reboot them, they take hours to boot all the way up as they perform comprehensive integrity checks (who knows why it was rebooted?).

      Then there are the real world problems - flight A feeds into flight B, but flight A was late, meaning all those connections were missed and passengers have to be rebooked. And flight B can't fly anyway, because the plane is still sitting 500 miles away because the flight that would bring it to this airport was cancelled as well. And the flight crew that was supposed to bring flight B to this airport technically went on duty the moment they reached the airport, and now they have reached the max allowed hours in the day, so a new crew is needed. But that crew is in a different city...

      This is, IMHO, the far bigger issue. Airlines are scheduled tight - if the plane's not flying, it's costing money. Ultra-low cost carriers have very tight schedules to ensure the planes are always in the air.

      Getting the crews and equipment all prepositioned in the right place and ready to fly is a delicate balance at the best of times and a complete nightmare when you have to start from scratch.

    6. Re:Dumb by RabidReindeer · · Score: 3, Insightful

      That's because in the West, the post-1980 business philosophy has been "Efficiency to the Max!"

      You are expected to give "110% percent". Everyone is running Big Data analytics. The bean-counters scour the numbers and do cherry-picking to get the big profits and lemon-dropping to discard the losers. JIT inventory. On-demand elastic clouds. And all in the context of the next quarter's earnings report.

      No sane general would commit troops without maintaining reserves for the inevitable unforeseen, but modern western businesses cannot stand a moment of wasteful "idle time" or resources sitting around unused.

      And so, when the inevitable happens - train wreck. There's no spare parts, no idle people to put to work, nothing. No reserve capacity.

      As they used to say back when computers were expensive objects of reverence, "Never before in human history has it been possible to screw up so badly on so large a scale so quickly". Such is the 2-edged sword of modern technology.

    7. Re:Dumb by Jeremiah+Cornelius · · Score: 5, Informative

      I worked on the first big web-enablement for AA's Sabre system, back in 97-99. Sabre was the key to inter-airline reservation scheduling. Travel agents used this as their main system, and some other folks around here may remember the gateway with CompuServe. eAAsy Sabre. LOL.

      It was unlike any dotcom experience I had around that time. Super legacy. Impossible to change anything - and grave uncertainty that changes were even possible!

      The Sabre core compute and data storage stack was built on a series of different mainframe and mid-range systems, back when instead of writing new business functions, you instead attached new business systems.

      The glut of stuff crossed vendors occasionally. Mostly IBM. Parts dated to the 70's and through the 90's. I never met anybody who had a "mastermind" view of how it all worked. Instead, lots of analysts with diagrams - mostly from vendors and "big five" firms. Any proposed change had to be run through an exercise that called on the various experts in different parts of the system. Most were not so much expert, as "acquiring some expertise". ;-)

      Our work became the basis for travel services like Expedia, and customer offerings by American Express Travel, etc.

      I'm sure that this may have changed only somewhat. Sabre was sold off, and became a core to Travelocity - who in turn were finally bought by Expedia, who consumed Sabre information. Behind it all, there are a 360 and some front-end processors ported to AS400 systems, I'm nearly certain.

      --
      "Flyin' in just a sweet place,
      Never been known to fail..."
    8. Re:Dumb by dgatwood · · Score: 3, Insightful

      From what I have read, this was not an obvious WTF moment. Delta apparently has a complete disaster recovery facility with duplicate hardware. But they had a single point of failure in their infrastructure, which caused them to lose power to the entire datacenter, and everything went down. That part might be a WTF.

      No, the WTF is not that the datacenter had a single point of failure. If their IT setup had been designed properly, that would have been a minor inconvenience. The WTF is that they didn't have at least three datacenters in geographically isolated locations with hot failover and regularly test the hot failover to ensure that it worked reliably and quickly in the event of a sudden, catastrophic loss of their primary datacenter.
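      To make the "regularly test the hot failover" part concrete, here is a minimal sketch of a drill. Everything in it is hypothetical: the `Datacenter` class, the site names, and the priority-order selection are assumptions for illustration, not a description of Delta's setup or any real failover product.

```python
class Datacenter:
    """Stand-in for a real site; `up` simulates whether it is reachable."""
    def __init__(self, name):
        self.name = name
        self.up = True

    def health_check(self):
        return self.up

def pick_active(sites):
    """Return the first site, in priority order, that passes its health check."""
    for site in sites:
        if site.health_check():
            return site
    raise RuntimeError("no healthy datacenter available")

def failover_drill(sites):
    """Take each site down in turn and record where traffic would land."""
    report = {}
    for victim in sites:
        victim.up = False
        try:
            report[victim.name] = pick_active(sites).name
        finally:
            victim.up = True  # always restore the site after this drill step
    return report

# Hypothetical site names, listed in priority order.
sites = [Datacenter("dc-east"), Datacenter("dc-central"), Datacenter("dc-west")]
print(failover_drill(sites))
```

      A real drill would also have to prove that the surviving site can carry the load and has current data, which is the hard part this sketch leaves out.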

      --

      Check out my sci-fi/humor trilogy at PatriotsBooks.

    9. Re:Dumb by im_thatoneguy · · Score: 3, Interesting

      The problem is that if all of your competitors are willing to go bankrupt every 15 years it's really hard to not go bankrupt before they do. It doesn't do you any good if they'll be going bankrupt in 8 years if you go bankrupt next year because you're 10% more expensive.

      We see that in my industry all the time. Lots of people undercutting sustainable rates. They inevitably go bankrupt but if you don't match prices you'll go bankrupt waiting for them to go first. And since they're offering products at under cost they can also appeal to investors with fat grosses and rapid growth.

      Imagine for instance you were trying to take on Amazon. Amazon hasn't really ever made money. But they can point to their rapid growth for long term investors. If you're an airline you probably won't see growth and it's hard to say "look we're losing a lot of money now but in 8 years when our competitors hit a hard time we'll make some money then until someone else comes along and promises to do what we do but cheaper and never go out of business." You see that with JetBlue and Virgin America. Jump into the industry with lots of investment. Offer a product at razor thin margins and capture a ton of market. But their business plan isn't tested to survive a big recession. So it's a gamble. They'll either do great or it'll reveal they were built on sand.

      When your competitors are playing with fire it means they capture all of the revenue when times are good leaving you nothing to save for the "bad times" and then you only prosper when the market is crappy anyway and they can write off their debt in bankruptcy. It's a lose lose.

    10. Re:Dumb by lgw · · Score: 4, Interesting

      No one credible would count duplicate equipment in the same data center to be any kind of DR plan at all. That's like confusing RAID with backup. And just like you don't have a backup unless you've tested it, you don't have a DR plan unless you've tested it.

      But a "disaster plan" needn't be limited to IT in any way. Air France had some sort of computing disaster recently, a similarly total outage, but they completed all their scheduled flights (not on schedule, but still). They had a disaster plan involving everyone behind a counter at an airport on the phone to a massive call center, where everything was verified "manually" from offline backup systems (and possibly print-outs). "Is Joe Slashdotter booked for flight 123?" "Give me a minute - yup, let him on the plane." Low tech, but it worked.

      --
      Socialism: a lie told by totalitarians and believed by fools.
    11. Re:Dumb by quetwo · · Score: 2

      The problem with your example of Amazon is that Amazon invests every penny they earn back into the business. Companies like Delta don't. So when there are bad times, Amazon will be much better poised to do well because they've diversified and built up their business to handle it. All it takes is a generator to malfunction and Delta could be out of business forever (yes, a bit of a stretch, but still).

      Delta has a virtual monopoly for a large swath of the nation. You have no choice but to fly Delta in the midwest and large portions of the south. In recent history, they were never /that/ bad off. In the last market crash, they decided to burn their cash on buying Northwest instead of modernizing their own systems or investing in their own infrastructure. In the last three years where they've been posting record profits they continued to do cost-cutting measures in all portions of their business, and move money out of the business by paying investors and the execs.

    12. Re:Dumb by swb · · Score: 2, Interesting

      Were the airlines really in that tough a shape for that long of a period of time?

      If they have only recently returned to profitability and actually experienced extended times of economic uncertainty, how do you explain Boeing outperforming the S&P 500 and gaining 8000% in value since 1978? Overseas sales explain some of it, but not all of it, and you would expect an extended depression in the American airline business to bring some decline in Boeing's business, but it's been continuous growth.

      And airports and crowds? Airports I've flown through for 20 years are only bigger and busier than they ever were. I don't remember a time when I thought the airport was too big or empty, either, it's been steady if not increasingly busier and more crowded. Airports all seem to expand, not contract.

      Overall, the aviation sector seems to have done nothing but grow. So how is it exactly that the airlines were truly losing money? I don't doubt they reported low stock prices or reported lower profits on paper, I just don't know that the industry truly shrank and lost money.

      I also remember schemes where airlines went through leveraged buy-outs and the new owners sold off air fleets and then leased them back to the airlines and had them make huge payments to consulting companies owned by the new buyers. I think the airlines got bled dry and then had to fight to acquire less parasitic management focused on the business rather than just sucking the capital out of them.

    13. Re:Dumb by SvnLyrBrto · · Score: 2

      I'm going to go out on a limb, and guess that the people responsible for that mainframe transitioned into the role from conventional rack & stack servers. While that sort of discipline is otherwise admirable; the whole reason that you give IBM the kind of money it takes for them to deliver and support a mainframe is to make that sort of discipline unnecessary. IBM was guaranteeing uptime of 20 *years* back in the S/390 days. And I'm pretty sure it's only gone up from there. Basically, the only time they should be down is if the whole building burns to the ground or is taken away to Oz in a tornado. And, though it was never my job to work on them, I've seen some IBM installations that I could very much believe would survive the rest of the building burning or being blown away.

      --
      Imagine all the people...
  2. Aging? by sexconker · · Score: 5, Insightful

    What's wrong with aging tech? If most airlines are on TPF and TPF works and TPF is still maintained by IBM, what's the problem with TPF?
    Something being old doesn't mean it's bad. Quite often, the reverse is true. The mainframe is still the king when it comes to reliability and transaction integrity, for example.

    1. Re:Aging? by geek · · Score: 2

      What's wrong with aging tech? If most airlines are on TPF and TPF works and TPF is still maintained by IBM, what's the problem with TPF?
      Something being old doesn't mean it's bad. Quite often, the reverse is true. The mainframe is still the king when it comes to reliability and transaction integrity, for example.

      It's not "mobile first, cloud first" as my boss would say .................

    2. Re:Aging? by rmullig2 · · Score: 2

      "What's wrong with aging tech?" The documentation isn't written in Hindi.

  3. Re:If it isn't broke... by LWATCDR · · Score: 5, Insightful

    The Delta outage was caused by a power outage. Seems like TPF is not the problem.
    Considering how well this 1960s tech seems to be working, replacing it and doing it better may not be all that easy.

    --
    See my blog http://ilovecookes.blogspot.com/ for light hearted technical information.
  4. That's greed for you by H3lldr0p · · Score: 3, Insightful

    Always short sighted and thinking tomorrow will be the same as today.

    What I'm afraid of is that this style of business / investment / management continues to infect the rest of the world. I can't wait until all of the stock markets are controlled by algorithmic trading, with the next quarter's number the sole goal.

  5. I call BS by wcrowe · · Score: 4, Insightful

    This is bullshit. Software does not "age" the same way that a car or a washing machine ages. The hardware can age, but the hardware can be replaced, and in this case we are talking about IBM software and hardware, which has a long-standing reputation for reliability and for maintaining backwards compatibility.

    I think the more likely story is that the interfaces to these systems are being compromised. That's why it's happening, first at one airline, then another. Someone, somewhere is fucking around with the airlines' reservation systems.

    I think these stories about "fires" and "aging" software are covering up for the fact that these systems are getting hacked. If people start to lose confidence in the systems they'll fly less or stop flying altogether.

    --
    Proverbs 21:19
  6. Re:If it isn't broke... by Jason+Levine · · Score: 2

    Age can be a problem in IT if your system was designed for 1960's travel habits/workload and now has to cope with 2016 travel habits/workload.

    --
    My sci-fi novel, Ghost Thief, is now available from Amazon.com.
  7. Re:is it that complex? by HBI · · Score: 2

    Sounds awfully much like the old e-mail form that used to get passed around every time someone had a solution for spam.

    Disclaimer: I used to work on what is essentially a middleware message processor for military use. It supported dozens of different inputs from myriad systems, some indeed "mainframey". The failing of the system wasn't the system itself - it was rock solid, UNIX based, and had decent hardware and procedures for maintaining maximal uptime in crappy environmental conditions. The ancillary systems that provided message feeds, on the other hand, weren't so reliable, and when they failed, guess who got blamed?

    I suspect airline reservation systems probably are in the same boat.

    --
    HBI's Law: Frequency of calling others Nazis is directly correlated with the likelihood of the accuser being Communist.
  8. self servingly lies by Lead+Butthead · · Score: 2

    Continuous complaints that times are bad, or that the union is rendering the business unprofitable, have never stopped their officers from drawing ever larger compensation packages, nor have they prevented their board from approving those compensation packages. To claim that there's no money to reinvest in company infrastructure is but a self-serving lie.

    --
    ELOI, ELOI, LAMA SABACHTHANI!?
  9. Re:is it that complex? by im_thatoneguy · · Score: 5, Informative

    Reservation system could be implemented in chapter 10 of your first programming book. It seems trivial thing?

    It's actually really really complex.

    It's not just a "reservation system" where you lock out a ticketed space for X seconds until someone completes a transaction. You really have to view it as "The Company" when you're talking about airlines. Let's say a pilot has been in the air too long due to a delayed departure in New York. He hits his max flight time for the next 24 hours but he was scheduled to fly from his destination to another leg. So now you need to replace the pilot. Which pilot? Well there is a plane coming into the airport around the same time as the NY flight. But is that pilot rated to fly that same aircraft? Ok he is, great. But because 30 of the 300 passengers are going to miss their connections now because of the delayed arrival they need to be moved to different flights. But those flights are maxed out. So you have to bump some passengers on a scheduled flight and move them to a later flight as well. Because the plane is getting in late it's also going to depart late. So you also need to either arrange all of the passengers at the next destination to be on different flights and set off a chain reaction or you need to pull in a different plane at the 2nd destination to short circuit the chain reaction. But where can you get a plane from for the cheapest? And how much will it cost to put people up in a hotel vs flying an extra crew in on overtime?

    This is all simple enough to calculate with like 1-2 planes. But when you have 1,000 aircraft and all of the seat assignments effectively being interdependent along with business interests (profit/loss of changes), customer service interests such as ticket class... and you have to stay up to date instantaneously with dozens of terminals all trying to do the same thing manually in addition to the automatic callbacks for unexpected events... it's a big engineering effort to not create some sort of automatic-trading style feedback loop that accidentally sets off a chain reaction that cancels every flight in the country.

    Every change has a cost. No human can orchestrate thousands of interdependent variables with millions of passengers manually. You have to have a central director system which instantaneously handles all of the callbacks and dependencies for a change throughout the entire graph.

    It's actually very cool when you stop and think about how well it does at keeping everything relatively straight.
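    For a sense of what "handles all of the callbacks and dependencies for a change throughout the entire graph" means at its very simplest, here is a toy sketch. The event names and the `depends_on_me` mapping are invented for illustration, and a real system would weigh costs and options at every node rather than just listing what is touched.

```python
from collections import deque

def affected_by(change, depends_on_me):
    """Breadth-first walk: everything that transitively depends on `change`."""
    seen = {change}
    queue = deque([change])
    order = []
    while queue:
        node = queue.popleft()
        for nxt in depends_on_me.get(node, []):
            if nxt not in seen:  # visit each downstream item only once
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order

# Hypothetical dependency graph: each key lists the things it knocks over.
graph = {
    "f1_late": ["crew_timeout", "missed_connections"],
    "crew_timeout": ["f2_late"],
    "missed_connections": ["rebook", "f2_late"],
    "f2_late": ["f3_late"],
}
print(affected_by("f1_late", graph))
```

    Finding what is affected is the easy part. Picking the cheapest way to fix each affected item, without the fixes setting off new changes of their own, is where the complexity described above comes from.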

  10. Ugh by sootman · · Score: 4, Informative

    Do reporters even read these stories as they are writing them?!? "Airlines will likely suffer more disruptions like the one that grounded about 2,000 Delta flights this week because major carriers have not invested enough to overhaul reservations systems based on technology dating to the 1960s... [TPF] is still updated by IBM, which did a major rewrite of the operating system about a decade ago."

    Big, complicated system, written by a big, experienced company, still maintained... Do they think we'd be better off if it were rewritten from the ground up as a Ruby on Rails app or something?

    Psst, I don't want to cause a panic, but I heard that large, important chunks of the Internet run UNIX, which also dates back to the '60s.

    --
    Dear Slashdot: next time you want to mess with the site, add a rich-text editor for comments.