Slashdot Mirror


IT Crash Causes British Airways To Cancel All Flights (cnbc.com)

An anonymous reader quotes CNBC: British Airways canceled all flights from London's Heathrow and Gatwick airports on Saturday as a global IT failure upended the travel plans of tens of thousands of people on a busy U.K. holiday weekend. The airline said it was suffering a "major IT systems failure" around the world. Chief executive Alex Cruz said "we believe the root cause was a power-supply issue and we have no evidence of any cyberattack." He said the crash had affected "all of our check-in and operational systems." BA operates hundreds of flights from the two London airports on a typical day -- and both are major hubs for worldwide travel. Several hours after problems began cropping up Saturday morning, BA suspended flights up to 6 p.m. because the two airports had become severely congested. The airline later scrapped flights from Heathrow and Gatwick for the rest of the day.

23 of 184 comments

  1. a power supply failure?? by Anonymous Coward · · Score: 5, Insightful

    So a power supply failure can bring down all operations on a global scale. Good to know that BA had outsourced part of their IT staff to India!!!

    1. Re:a power supply failure?? by dbIII · · Score: 5, Funny

      So a power supply failure can bring down all operations on a global scale. Good to know that BA had outsourced part of their IT staff to India!!!

      As another poster quoted "BA in 2016 made hundreds of dedicated and loyal IT staff redundant and outsourced the work to India".

  2. Somewhere, an IT guy is crying by whoever57 · · Score: 4, Insightful

    Somewhere, there is probably an IT guy who has been begging for the budget to upgrade some old machines, or move the services onto a cloud provider and was ignored.

    He's crying today, because this huge revenue loss could probably have been avoided with a small budget for newer hardware or more redundancy.

    --
    The real "Libtards" are the Libertarians!
    1. Re:Somewhere, an IT guy is crying by nadaou · · Score: 4, Insightful

      He's crying today, because this huge revenue loss could probably have been avoided with a small budget for newer hardware or more redundancy.

      And despite that s/he knows who will take the blame for it.

      --
      ~.~
      I'm a peripheral visionary.
    2. Re:Somewhere, an IT guy is crying by grasshoppa · · Score: 5, Insightful

      move the services onto a cloud provider

      "Cloud" service providers have no place in mission critical roles by virtue that the "Cloud" is a faster way of saying "abdicating responsibility". If you make millions of dollars a day on the back of your IT infrastructure, then the last thing you do is outsource the responsibility of said infrastructure to a 3rd party company which has different priorities than you do.

      Any IT manager making such a recommendation is a) lazy, b) useless and c) should be fired.

      --
      Mod me down with all of your hatred and your journey towards the dark side will be complete!
    3. Re:Somewhere, an IT guy is crying by AmiMoJo · · Score: 4, Insightful

      The only alternative is to spend vast amounts of money building your own redundant systems, which clearly BA were unwilling to do. Using cloud services makes perfect sense.

      Take Amazon's cloud services as an example. To get that kind of reliability, with systems distributed around the world for responsive operation and redundancy, you are going to need a large number of geographically distributed services and a team to look after them. A team that is available 24/7 with response times in minutes.

      And you will still have the same local problems, like internet connection reliability, and the same development problems. You don't have to waste time and effort administering your own servers either, dealing with mundane stuff like HDD failures or managing 30 different datacentre operators.

      Unless your company is willing to put a massive amount of effort into that stuff for some reason, it's dumb to even try.

      --
      const int one = 65536; (Silvermoon, Texture.cs)
      SJW, n: "Someone I don't like, and by the way I'm a fuckwit" - AC
    4. Re:Somewhere, an IT guy is crying by Dogtanian · · Score: 5, Informative

      Somewhere, there is probably an IT guy who has been begging for the budget to upgrade some old machines, or move the services onto a cloud provider and was ignored.

      On the contrary, that IT guy was probably made redundant in 2016. As the BBC article notes:-

      The GMB union says this meltdown could have been avoided if BA had not made hundreds of IT staff redundant and outsourced their jobs to India at the end of last year.[..]

      "BA in 2016 made hundreds of dedicated and loyal IT staff redundant and outsourced the work to India... many viewed the company's actions as just plain greedy."

      Let's hope BA continues to reap as many "savings" from that outsourcing as they did today. :-)

      He's crying today.

      Going by the likely response of the laid-off employees to the predicament of BA, I guess he *would* have tears coming out of his eyes.

      --
      "Slashdot - News and Chat Sites Deviant". (Click "homepage" link above for details).
    5. Re:Somewhere, an IT guy is crying by AK+Marc · · Score: 4, Funny

      Don't blame the IT workers. I had the same thing happen where I work. The middle manager, trying to look good, cut necessary costs. One power blip in the grid, and everything was dead because we had undersized UPSs everywhere, and they couldn't handle the load. He said "inrush current" thousands of times, but never knew what it meant.

    6. Re:Somewhere, an IT guy is crying by Kjella · · Score: 4, Insightful

      Unless your company is willing to put a massive amount of effort into that stuff for some reason, it's dumb to even try.

      IAG (the holding group of British Airways) have a market cap of 13 billion GBP, or about 17 billion USD; guesstimating by fleet size, BA is almost half of that. I'd understand if you were talking about a fly speck of a company, but an 8 billion dollar company can damn well run their own infrastructure without a cloud provider, with geographical distribution, 24/7 available teams and all that.

      --
      Live today, because you never know what tomorrow brings
    7. Re: Somewhere, an IT guy is crying by fluffernutter · · Score: 4, Insightful

      Usually the problem is when they go in house they want their administrators to work for $15 an hour, and then when they can't find good ones and the systems fail, they throw their hands up and go back to paying way more for cloud services than proper admins would have cost in the first place.

      --
      Laws are rules for the court, but merely a bottom bar to hit for life. Think beyond laws in your actions always.
  3. The major issue is... by Anonymous Coward · · Score: 5, Funny

    ...the outsourced IT guys from TCS in India need to fly to the UK to fix the 'power supply' issue but currently they are unable to book a flight on British Airways.....

    1. Re:The major issue is... by Anonymous Coward · · Score: 5, Insightful

      Funny, but the bigger issue is: is there anyone at Tata who was there the last time BA restarted their systems? At the bank I used to work at, we were replaced by contractors, and two years later when they restarted the zSystem, they found out the hard way that no one knew what to do.

  4. Piling up technical debt is utterly stupid by gweihir · · Score: 4, Insightful

    Of course, it requires more than the myopic 3-month planning that most MBAs are capable of at maximum. It also requires a real understanding of risk management and staying away from all short-term optimization. Otherwise, you end up at "save a million, lose a billion", as this seems to be a fine example of.

    Claiming this was a "power supply issue" is just lying by misdirection. The root cause is lack of redundancy, lack of resilience and lack of effective business continuity management. All things that cost money and that do not generate profit _unless_ something like this happens. In a healthy infrastructure, one (or even several) power supplies blowing up will not kill your ability to do business.

    Events like that are almost universally due to gross mismanagement and should not only result in termination but also prosecution of the "leadership" that allowed this to happen by not being prepared.
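A rough sketch of why redundancy pays off, assuming component failures are independent (an assumption that shared wiring or a common switchboard would break). The function name and the 99% figure are illustrative only, not numbers from BA:

```python
def parallel_availability(unit_availability: float, units: int) -> float:
    """Availability of `units` redundant components when any single one suffices.

    Assumes failures are independent; correlated failures (shared feed,
    shared switchgear) make the real figure worse than this estimate.
    """
    if not 0.0 <= unit_availability <= 1.0:
        raise ValueError("unit_availability must be between 0 and 1")
    if units < 1:
        raise ValueError("units must be at least 1")
    # The group is down only if every unit is down at the same time.
    return 1.0 - (1.0 - unit_availability) ** units


# Illustrative figure: one supply that is up 99% of the time, then two, then three.
for n in (1, 2, 3):
    print(n, parallel_availability(0.99, n))
```

Under the independence assumption, each extra unit multiplies the expected downtime by the failure probability of a single unit, which is why a second supply is cheap compared with the outage it prevents.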

    --
    Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
  5. Re:Other sources: IT outsourcing by gilgongo · · Score: 4, Informative

    It looks like BAE has recently replaced most of its IT workforce with south Asian contractors.

    OT: it's BA, not BAE. The latter is a different company concerned mainly with blowing up flying objects, along with people in them. Easy mistake to make though.

    --
    "And the meaning of words; when they cease to function; when will it start worrying you?"
  6. Re:Idiots in charge! by gweihir · · Score: 4, Insightful

    "We" (as in people that actually have a clue what they are doing) have indeed known that. But the decision-makers have no such understanding. While it is really tacky, I have had to explain catastrophe scenarios to customers that would have killed their company, and all that was needed was a failed software functionality update (which they wanted to do without a possibility to roll-back and no working plan for keeping business going any other way). The people making the decisions these days are bean-counters with zero understanding of risk-management or "visionaries" that have even less of an understanding about the reality of things. And, unfortunately, this often is aided by a corporate culture of "don't rock the boat" and people that warn of consequences get silenced.

    Expect more of these utterly pathetic failures.

    --
    Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
  7. Re:Other sources: IT outsourcing by gweihir · · Score: 5, Insightful

    "Power supply failure" does not take down a well-designed and well-maintained infrastructure. This is just a smokescreen to hide incompetence.

    --
    Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
  8. Re:Busy U.K. Holiday Weekend... by Ecuador · · Score: 5, Informative

    It is actually "(Late) Spring bank holiday". The UK has depoliticized and dereligionized most of their holidays (notable exceptions are Christmas and Easter), so there is a bunch of "bank holidays" around the year that fall on Mondays (to provide extended weekends). This particular holiday seems to have replaced "Whit Monday" (day after Pentecost), which was a moveable Christian holiday. So, as you should expect, it is not related to the US Memorial Day.
    The equivalent to the US Memorial Day for the UK (and Commonwealth nations) is the "Remembrance day" on November 11th (end date of WWI), which is not a bank holiday (so you normally go to work that day, usually wearing a poppy).

    --
    Violence is the last refuge of the incompetent. Polar Scope Align for iOS
  9. Re:Is anyone tracking causes for Airline outages? by __aaclcg7560 · · Score: 4, Interesting

    There are some where the IT infrastructure could not handle one specific system going down, and that is not a technical issue, but something else which usually is called "gross negligence".

    Technically, that's known as a single point of failure.

    https://en.wikipedia.org/wiki/Single_point_of_failure

    The term "gross negligence" doesn't come into play until a lawsuit is filed. Since no one died or was injured in this outage, a gross inconvenience doesn't rise to gross negligence.
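A minimal sketch of how a single point of failure could be spotted in an inventory, assuming each service lists its redundancy groups and stays up as long as one component in every group still works. The service and component names are hypothetical, not BA's actual architecture:

```python
from typing import Dict, List, Set


def single_points_of_failure(
    requirements: Dict[str, List[Set[str]]]
) -> Dict[str, Set[str]]:
    """Map each single point of failure to the services it would take down.

    `requirements[service]` is a list of redundancy groups; the service
    survives as long as at least one component in every group is working.
    """
    spofs: Dict[str, Set[str]] = {}
    for service, groups in requirements.items():
        for group in groups:
            if len(group) == 1:  # no alternative: losing it kills the service
                (component,) = group
                spofs.setdefault(component, set()).add(service)
    return spofs


# Hypothetical inventory for illustration only.
requirements = {
    "check-in": [{"power-feed-A"}, {"db-primary", "db-replica"}],
    "flight-ops": [{"power-feed-A"}, {"ops-server-1", "ops-server-2"}],
    "website": [{"power-feed-A", "power-feed-B"}, {"web-1", "web-2"}],
}
print(single_points_of_failure(requirements))
```

In this made-up inventory the only group without an alternative is the lone power feed, so it is reported against both services that depend on it.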

  10. Re:Manual backups by __aaclcg7560 · · Score: 4, Interesting

    If you're going to have people fall back on pen and paper, they need to be trained to use pen and paper. I worked at a restaurant when a power outage took down the ordering stations. The restaurant kept doing business until the power came back online an hour later, as sunlight through the large windows and emergency lighting illuminated the interior. The kitchen kept on cooking with gas-powered appliances and emergency lights. The wait staff struggled to calculate bills and make change with only one calculator in the entire building. Management added backup power to the ordering stations a week later.

  11. Maybe in bringing it back up they can... by h4x0t · · Score: 5, Funny

    .. find my fucking bag that they lost A WEEK AGO, the fucking fucks.

    *cough*

  12. Re:Backup plan.. by Anonymous Coward · · Score: 5, Interesting

    Actual BA passenger here, currently in Austin TX, and was due to fly to LHR on today's direct flight at 6pm central time. Just to highlight how catastrophic the failure is:

      - Heard about the outage this morning, and looked online for more information, very little actual info available. I logged into BA with my flight booking, and the page indicated that the flight was still fine. The system also had my email address and made the statement "we will contact you if there are any problems".

      - Based on this I assumed the flight was OK.

      - Turned up at the airport and the BA check-in is closed. There was a large crowd of unhappy people, a haggard team of BA staff behind the counter, but no one was moving and nothing was happening. After 20 minutes I went and told the BA manager that he had better tell the crowd what is happening before things get out of hand. Eventually, he did redeem himself by doing a walk-through and chatting with people and handing out a letter explaining that the flight was canceled.

      - Not only was the flight canceled, but their systems were unable to do any rescheduling. They asked us to leave the airport, find a hotel, contact them tomorrow, and ultimately seek reimbursement for expenses.

      - Disappointed, I wandered down to American Airlines (a Oneworld partner, with whom I am Sapphire) and had a chat with their staff. As if by magic, they somehow pulled my booking from the BA system and put me on some AA flights free of charge. Amazing.

    Not sure how much of it is staff incompetence, or the system is just completely fucked, but this mess is going to take days to resolve...as for me, I'm off in a few minutes, best of luck to the other BA passengers caught up in this mess!

  13. Obligatory Bastard Operator from Hell by UnknowingFool · · Score: 4, Funny

    "No the server isn't down. You must be using it wrong, idiot." *unplugs coffee maker, plugs server back in*

    --
    Well, there's spam egg sausage and spam, that's not got much spam in it.
  14. Either amateur hour or a lie by Murdoch5 · · Score: 5, Insightful

    Massive worldwide systems like this should always have at least two entire working deployments, one kept in a down state and one kept up and working. That way, if a problem happens, you just bring the second data center online and off you go.

    If a power supply issue could bring down your entire system, you didn't design it correctly, PERIOD! If your entire system hinges on a single power supply, you ALWAYS have a second one on an alternative supply. In fact, you'd have multiple supplies to each data center, from different providers, just to make sure power issues can't cause this type of failure.

    If the problem really comes down to a power supply, fire the IT department, fire the System Architects and start doing things properly.
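The two-deployment arrangement described above can be sketched as a small selection routine. This is a simplified illustration, assuming a health flag per site is already available from some monitoring check. The `Site` class, the `choose_active` function and the site names are hypothetical:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Site:
    name: str
    healthy: bool  # assumed to come from an external health check


def choose_active(sites: List[Site], current: Optional[str] = None) -> Optional[str]:
    """Keep the current site while it is healthy; otherwise fail over.

    Sites are tried in list order, so the list doubles as a priority order.
    Returns None when no site is healthy.
    """
    by_name = {site.name: site for site in sites}
    if current is not None and current in by_name and by_name[current].healthy:
        return current  # avoid needless switching while the active site works
    for site in sites:
        if site.healthy:
            return site.name
    return None


# The primary data center loses power, the standby is still fine.
sites = [Site("dc-primary", healthy=False), Site("dc-standby", healthy=True)]
print(choose_active(sites, current="dc-primary"))
```

The routine only picks a site. Actually bringing a cold standby online (data replication, DNS or routing changes, staff who have rehearsed the procedure) is the hard part and is not modelled here.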