Slashdot Mirror


British Airways IT Outage Caused By Contractor Who Accidentally Switched off Power (independent.ie)

An anonymous reader shares a report: A contractor doing maintenance work at a British Airways data centre inadvertently switched off the power supply, knocking out the airline's computer systems and leaving 75,000 people stranded last weekend, according to reports. A BA source told The Times the power supply unit that sparked the IT failure was working perfectly but was accidentally shut down by a worker.

15 of 262 comments

  1. Re:Am I in the Matrix? by Anonymous Coward · · Score: 2, Insightful

    The new article has more details.

  2. N+1 guess not by silas_moeckel · · Score: 3, Insightful

    So it was all running in a single DC with a single power bus? There's plenty of room at real datacenters; they need to stop running out of a closet somewhere.

    --
    No sir I dont like it.
  3. Re: LOL by haemish · · Score: 5, Insightful

    Right. It's not the poor guy that turned off the power supply. It's the shit-for-brains managers who wouldn't let the engineers put in redundant power supplies and hired cheap labour that had no clue how to architect for fault tolerance.

  4. Yeah, yeah... blame the contractor... by __aaclcg7560 · · Score: 5, Insightful

    This is human error because a contractor accidentally turned off a power supply that caused a world-wide outage? It should be operational error for allowing such a single point of failure to exist.

  5. not the contractor's fault by ooloorie · · Score: 4, Insightful

    When your business depends on your IT infrastructure like that, turning off the power to a single machine or data center shouldn't bring down your operation; that's just stupid and bad design. Good enterprise software provides resilience, automatic failover, and geographically distributed operations. Companies need to use that.

    And they should actually have tests every few months where they do shut down parts of their infrastructure randomly.
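
    A rough sketch of what such a shutdown drill could look like, in Python. Everything here (the `Site` class, `route_request`, `run_drill`) is hypothetical and only simulates the idea in memory; it is not how BA or any real failover product works.

```python
import random
from dataclasses import dataclass


@dataclass
class Site:
    """A hypothetical data centre that is either serving traffic or not."""
    name: str
    healthy: bool = True


def route_request(sites):
    """Return the name of the first healthy site, or None if all are down."""
    for site in sites:
        if site.healthy:
            return site.name
    return None


def run_drill(sites, rng):
    """Shut down one randomly chosen site, check service, then restore it.

    Returns (name of the site shut down, whether requests were still served).
    """
    victim = rng.choice(sites)
    victim.healthy = False
    try:
        served_by = route_request(sites)
        return victim.name, served_by is not None
    finally:
        victim.healthy = True  # always bring the site back after the drill


if __name__ == "__main__":
    rng = random.Random()
    redundant = [Site("primary"), Site("dr")]  # assumed two-site setup
    single = [Site("only-dc")]                 # assumed single point of failure
    print(run_drill(redundant, rng))  # second element is True: the other site takes over
    print(run_drill(single, rng))     # ('only-dc', False)
```

    With two sites the drill passes whichever one gets picked; with one site it fails every time, which is the whole point of running it before a contractor does it for you.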

  6. Re:Did they try... by sycodon · · Score: 5, Insightful

    Bullshit.

    Even a brand new IT graduate knows computers should be plugged into UPS devices that protect against this.

    Handling power outages is about as basic an IT task as they come. Basic Lock Out practices that prevent power from accidentally being turned off are also Server Maintenance 101.

    For this to actually have been the cause means their IT organization was run by rank amateurs.

    --
    When Fascism comes to America, it will call itself Anti-Fascism, and tell you to give up your guns.
  7. Re:How does one DR test in a 24/7 business? by silas_moeckel · · Score: 4, Insightful

    You do it in production because none of it should cause a massive failure. They bought a DR site and failed to test it. At some big shops I've worked at, the DR site was prod every other quarter.

    --
    No sir I dont like it.
  8. Re: LOL by thegarbz · · Score: 5, Insightful

    who wouldn't let the engineers put in redundant power supplies

    That's an interesting assumption. Have you seen anything even remotely indicating that the data centre didn't have redundant power? No amount of redundancy has ever withstood some numbnuts pushing a button. But I'm interested to see your knowledge of the detailed design of this datacentre.

    Hell, we had an outage on a 6kV dual-fed sub the other day thanks to someone in another substation working on the wrong circuit. He was testing intertrips to a completely different substation and applied some power to an intertrip signal. Realising he had hit the wrong circuit (A), he immediately moved to the one he was supposed to do (B). Both were in the wrong cubicle, so he successfully knocked out both redundant feeds to a 6kV sub and took down a portion of the chemical plant in the process.

    Not sure what's worse, managers who don't put in redundant power, or armchair engineers who just *assume* that they didn't because redundant power can't ever go out.

  9. Re:Did they try... by swb · · Score: 5, Insightful

    I think they also suffer from what I call "efficiency savings hoarding".

    If you have a process that requires 10 labor inputs to achieve and you buy a machine that reduces it to 5 labor inputs, your ongoing savings isn't really 5 labor inputs. You have to spend some of that labor savings in keeping the machine maintained and operational and investing in its replacement when it reaches end of life.
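
    To put numbers on it, here is a back-of-the-envelope sketch in Python. The 10 and 5 labor inputs are from the example above; the upkeep, replacement cost, and machine lifetime are made-up figures purely for illustration, and `net_savings` is a hypothetical helper.

```python
def net_savings(before, after, upkeep, replacement_cost, lifetime_years):
    """Ongoing savings per year, in labor inputs, after paying for the machine.

    upkeep is the yearly maintenance effort; the replacement cost is spread
    evenly over the machine's expected lifetime.
    """
    gross = before - after
    return gross - upkeep - replacement_cost / lifetime_years


# 10 -> 5 labor inputs is from the text; the rest are assumed values.
print(net_savings(before=10, after=5, upkeep=1,
                  replacement_cost=5, lifetime_years=5))  # 3.0
```

    So under those assumptions the headline saving of 5 is really 3 once the machine's own costs are counted.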

    When I started working for a company in 1993, they had some 40 secretarial positions whose workload was about half spent doing correspondence and scheduling meetings. In 2001, thanks to a widely deployed email/calendaring system, they had cut about 30 of those positions because internal meetings could be automatically planned via email and the bulk of internal correspondence shifted from paper memos to email.

    Yet when it came time to expand/replace the email system due to growth it was seen as a "cost". I actually got the project approved by arguing that the cost of the replacement was actually being paid for by the savings realized from fewer administrative staff -- they still had ample savings (the project was less than 1 administrative FTE). But the efficiency gain from the project wasn't free on an ongoing basis.

    Too many businesses gain efficiencies and savings from automation, but assume these are permanent gains whose maintenance incurs no costs.

    I have an existing client with a large, internally developed kind of ERP system that supports a couple of thousand remote workers. The system is aging out (software versions, resources, performance issues all identified by their own internal developer) and of course the owner is balking at investing in it without realizing that the "free money" from reduced in-office staff needed to process faxes, etc, needs to be applied to maintaining the system to keep achieving the savings.

  10. Re:Did they try... by jcr · · Score: 3, Insightful

    For this to actually have been the cause means their IT organization was run by rank amateurs.

    Given the duration of the outage, I'd say that's a fair conclusion.

    -jcr

    --
    The only title of honor that a tyrant can grant is "Enemy of the State."
  11. Re: Did they try... by LS1+Brains · · Score: 5, Insightful

    I've been in a few of these "career changing events" over the years. If I make a mistake, I step forward, take responsibility and fix the problem (if I can).

    As an IT Manager/Director, THANK YOU. Everyone screws up at some point, it's what you do after that really matters.

  12. Re: LOL by chispito · · Score: 3, Insightful

    Not sure what's worse, managers who don't put in redundant power, or armchair engineers who just *assume* that they didn't because redundant power can't ever go out.

    It isn't armchair engineering. The CEO should accept full responsibility because that's what it means to be at the top of the reporting chain when such a devastating preventable outage occurs. If he was misled by his direct reports, then he should fire them and take full responsibility for not firing them sooner. Maybe he resigns, maybe he doesn't--the point is that he must own the failure, whatever the logical conclusion.

    --
    The Daddy casts sleep on the Baby. The Baby resists!
  13. Re: Did they try... by jellomizer · · Score: 5, Insightful

    You mean for the Executive who didn't approve of the hot offsite failover solution?

    You know, the stuff that normal large organizations have to make sure their business can stay operational.

    --
    If something is so important that you feel the need to post it on the internet... It probably isn't that important.
  14. Re:Did they try... by ghoul · · Score: 4, Insightful

    Managers get paid to take the blame and the stress while workers get paid to do the work.

    --
    **Life is too short to be serious**
  15. Re: Did they try... by lactose99 · · Score: 3, Insightful

    "we didn't budget for that"

    "well does your budget include a multi-day downtime when the primary site goes offline?"

    "now how could the primary site possibly go offline?"

    Unfortunately I run into this far more than I should in this industry.

    --
    Fully licensed blockchain psychiatrist