Slashdot Mirror


BP Gulf of Mexico Rig Lacked Alarm Systems

DMandPenfold writes "BP's monitoring IT systems on the failed Deepwater Horizon oil rig relied too heavily on engineers following complex data for long periods of time, instead of providing automatic warning alerts. That is a key verdict of the Oil Spill Commission, the authority tasked by President Barack Obama to investigate the Gulf of Mexico disaster."

22 of 92 comments

  1. As opposed to... by toejam13 · · Score: 4, Interesting

    Three Mile Island, where the complaint was that there were too many alarms going off.

    1. Re:As opposed to... by Gruturo · · Score: 4, Insightful

      Three Mile Island, where the complaint was that there were too many alarms going off.

      Yeah, surprisingly, alarms have to be neither missing nor useless (by being irrelevant, hard to understand, going off for the wrong reasons, presenting the wrong scenario, not correlating causes, etc. etc. etc.).

      Who'd have thunk it.

      --

      Vacuum cleaners suck. Kings rule.
    2. Re:As opposed to... by ColdWetDog · · Score: 4, Insightful

      Truly amazing, indeed. Too lazy to look it up, but earlier reports had shown that Transocean (the rig owners, not BP like the stupid article mentions) had shut down many automatic warning systems because of too many false positives.

      It's not like we've never seen this sort of thing before ...

      "You are about to do something."

      CANCEL, or ALLOW?

      --
      Faster! Faster! Faster would be better!
    3. Re:As opposed to... by omglolbah · · Score: 5, Interesting

      Indeed. Alarm suppression is a complex thing to set up in many cases. I personally work in the business and know how much thought goes into the alarm handling of the plants operating in Norwegian waters.

      One example of a "simple" suppression case is that if Controller A goes down, you do not need to tell the operator that ALL signals on this controller are in "bad quality" or out of bounds. What you need to tell them is that the controller is down, and which systems are affected (which they will see on their displays as valves change color or somesuch; our system uses white asterisks and white color to indicate that something is 'dead').

      More complex cases are things like not throwing alarms for low flow rates in pipes where the valves are closed, or not throwing electrical alarms on equipment set to maintenance mode.
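A minimal sketch of the suppression rules described in the comment, assuming a flat list of tags that each know their controller, their alarm kind, and (hypothetically) a `valve_closed` and `maintenance_mode` flag. The `Tag` class, field names and tag names are made up for illustration; a real control system configures this in its alarm engine, not in a script.

```python
from dataclasses import dataclass

@dataclass
class Tag:
    name: str
    controller: str
    kind: str                       # e.g. "low_flow", "electrical", "pressure"
    in_alarm: bool
    valve_closed: bool = False      # hypothetical: valve on this line is closed
    maintenance_mode: bool = False  # hypothetical: equipment set to maintenance

def visible_alarms(tags, controllers_down):
    """Return only the alarm messages the operator actually needs to see."""
    shown = []
    # One alarm per dead controller, listing affected tags, instead of one per tag.
    for ctrl in sorted(controllers_down):
        affected = sorted(t.name for t in tags if t.controller == ctrl)
        shown.append(f"CONTROLLER DOWN: {ctrl} (affects {', '.join(affected)})")
    for t in tags:
        if not t.in_alarm or t.controller in controllers_down:
            continue  # no alarm, or already covered by the controller-down alarm
        if t.kind == "low_flow" and t.valve_closed:
            continue  # low flow is expected when the valve is closed
        if t.kind == "electrical" and t.maintenance_mode:
            continue  # equipment deliberately taken out for maintenance
        shown.append(f"ALARM: {t.name}")
    return shown
```

With controller A down, the operator gets one controller alarm plus only the genuinely unexplained alarms elsewhere.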

      Regardless of all this, there should be an alarm system that has priorities.

      Pri 1 alarms require IMMEDIATE attention, such as a dangerous triple-high alarm (HHH or 3H) of a tank, pressure or temperature, or a controller going down.
      Pri 2 would be alarms that could develop into Pri 1 if not handled within a few minutes (H/HH alarms, etc.).
      Pri 3 would be what we call "pre-alarms": things that could cause process upset or issues down the line, like a low flow of coolant even though the temperature of the equipment being cooled hasn't started rising yet, or a low level in a fuel tank.
      Pri 4 we usually assign to maintenance issues, like two redundant sensors having more than 0.5% deviation between them (but not enough to cause a real alarm): things that should be looked at, but within a day or so.
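The four priority levels could be sketched as a lookup from alarm kind to priority, plus the Pri 4 redundant-sensor check. The kind names are hypothetical, and measuring deviation relative to the mean of the two readings is an assumption (a plant might measure it against instrument span instead).

```python
# Priority scheme as described in the comment; kind names are hypothetical.
PRIORITY = {
    "HHH": 1,              # triple-high: immediate attention
    "controller_down": 1,
    "HH": 2,               # could develop into Pri 1 within a few minutes
    "H": 2,
    "pre_alarm": 3,        # e.g. low coolant flow before temperature starts rising
    "maintenance": 4,      # look at it within a day or so
}

def sensor_deviation_pct(a: float, b: float) -> float:
    """Deviation between two redundant sensors, as a percentage of their mean (assumed)."""
    mean = (a + b) / 2
    return abs(a - b) / abs(mean) * 100 if mean else 0.0

def classify_redundant_pair(a: float, b: float, limit_pct: float = 0.5):
    """Return 'maintenance' when redundant readings disagree by more than limit_pct."""
    return "maintenance" if sensor_deviation_pct(a, b) > limit_pct else None
```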

      Being able to filter alarms like this helps immensely during an emergency. This is an old system with a limited number of 'alarm groups' and 'priority levels', but it still works fairly well. Operators can see what happens even with several hundred alarms going off at the same time. On our simulator we did a fun test where we tripped 70% of the plant (about 18000 distinct 'tags' or I/O points went into bad quality and several thousand into alarm).
      The operators were able to stop the cascade failure and no pipe burst in the simulator :)
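Filtering a flood by priority can be sketched as counting alarms per priority and showing the most urgent ones first. This assumes alarms arrive as hypothetical `(priority, tag)` pairs; ordering by tag within a priority is only for determinism here, and a real console would more likely order by time.

```python
from collections import Counter

def summarize_flood(alarms, show=5):
    """alarms: list of (priority, tag) pairs, lower number = more urgent.
    Returns per-priority counts and the `show` most urgent alarms."""
    counts = Counter(p for p, _ in alarms)
    ordered = sorted(alarms, key=lambda a: (a[0], a[1]))
    return dict(sorted(counts.items())), ordered[:show]
```

During a trip of thousands of tags, the operator would look at the Pri 1 list first and only glance at the counts for the rest.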

      Shit -will- hit the fan. It is always nice to be able to filter it so that only the important shit actually hits the wall :p

    4. Re:As opposed to... by turbidostato · · Score: 2

      "Easier said than done."

      Of course, or there wouldn't be a business around it.

      "Always triggering the right alarm - and only the right alarm - amounts to creating a system that somehow knows exactly how to handle any situation, no matter how complex."

      Wrong. It amounts to getting rid of false assumptions, and of trying to sell a solution as the magic snake oil that will end each and every problem. Triggering the right alarm and only the right alarm is as easy as:

      1) Known situation: manage automatically.
      2) Unknown situation: raise a "human needed" alarm.
      3) Mine the raised "human needed" alarms for patterns, so that more and more situations can be managed by 1).
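The three-step loop could be sketched like this, assuming situations can be reduced to a hypothetical signature string and that known situations map to an automatic handler. Step 3 only surfaces recurring unknowns as candidates; in this sketch, writing the new handler is still left to engineers.

```python
from collections import Counter
from typing import Callable, Dict, List

class AlarmTriage:
    def __init__(self, playbook: Dict[str, Callable[[], str]]):
        self.playbook = playbook          # hypothetical: signature -> automatic handler
        self.unknown_log: List[str] = []

    def handle(self, situation: str) -> str:
        if situation in self.playbook:        # 1) known: manage automatically
            return self.playbook[situation]()
        self.unknown_log.append(situation)    # 2) unknown: raise "human needed"
        return f"HUMAN NEEDED: {situation}"

    def candidates_for_automation(self, min_count: int = 3) -> List[str]:
        # 3) recurring unknowns are candidates for a new playbook entry
        return [s for s, n in Counter(self.unknown_log).most_common() if n >= min_count]
```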

      It only takes the guts of some manager to say "No, sir, we don't know that much about managing these kinds of situations, so it will need a lot of clever, well-paid engineers that you will need to listen to as if God Himself were speaking" instead of "Of course it'll be done, sir, so that you can collect the bonuses".

    5. Re:As opposed to... by turbidostato · · Score: 2

      "Excuses that there were too many false positives just means that people needed to fix the false positives instead of ignoring or disabling them!"

      While I'm with your overall message, you seem to forget that for this to work, bonuses and penalties need to be aligned; when they are not, things like this are expected to happen.

      I.e.: I certainly should care about each and every raised alarm, and I'm even told to do so. *But* I'm not paid to take care of raised alarms as soon as I can; I'm paid to accomplish a different task (like bringing an oil rig into production) by the earliest date, and taking care of raised alarms gets in the way. Since the two tasks are irreconcilable, which one do you think a sensible person (or even a manager) should expect to suffer?

  2. Re:Seems a little unrelated by tomhudson · · Score: 3, Interesting
    And there was another near-disaster at one nuke plant because the button you had to press was back-lit by a bulb that, over time, had caused the plastic to expand to the point that the button COULDN'T be pressed, which they found out the hard way.

    Things will always fail in weird, unexpected ways - that's why you need humans in the loop.

  3. Re:Why do they even bother? by tomhudson · · Score: 5, Informative
    Here's one fact - the regulators screwed up. Blaming it on a lack of alarms is disingenuous at best, corrupt at worst.
    1. Regulators Failed to Address Risks in Oil Rig Fail-Safe Device
      http://www.nytimes.com/2010/06/21/us/21blowout.html?_r=1&pagewanted=all
    2. Spill report: It could happen again
      'Failure of management' and regulators given blame for disaster
      http://www.chron.com/disp/story.mpl/business/7367856.html
    3. Slick Operator
      How British oil giant BP used all the political muscle money can buy to fend off regulators and influence investigations into corporate neglect.
      http://www.newsweek.com/2010/05/07/slick-operator.html

    This wasn't a technical failure; it was a failure brought about by greed and corruption. The blow-out was only the symptom, and addressing the symptom isn't going to prevent similar incidents from happening again.

    We've seen this before - the mortgage disaster and bank bailouts, the savings and loan disaster, etc.

    Start by fixing campaign financing - private donations only, strict annual limit per capita, no 3rd party involvement, etc.

    -- Barbara

  4. When r they getting theirs? by hesaigo999ca · · Score: 2

    When will we get a governing body that can punish or fine companies for this, and actually enforce those fines or punishments? Seriously, we need to evolve to keep up with these types of companies that spit all over international laws (or the lack thereof).

    1. Re:When r they getting theirs? by ScrewMaster · · Score: 2

      When will we get a governing body that can punish or apply fines for this and enforce those fines or punishments

      Two words: regulatory capture.

      --
      The higher the technology, the sharper that two-edged sword.
  5. Re:Why do they even bother? by nomadic · · Score: 2

    Exactly; the private sector cannot be trusted to do things safer/more efficiently/better. This is exactly why strong government regulation, especially when it comes to environmental and health issues, is needed.

  6. Nagios by IceCreamGuy · · Score: 3, Funny

    Haven't they been on Nagios Exchange recently? check_catastrophe.pl has been out for like 3 years!

    check_catastrophe.pl -H blowout-preventer716.haliburton.com -w ANY_LEAKS -c ANY_FRIGGIN_LEAKS
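Joke aside, a Nagios plugin really is just a program that prints one status line and exits with 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN), with `-w`/`-c` thresholds. A minimal sketch of a hypothetical leak check in that style; the `--rate` flag stands in for actually querying the host, and nothing here is a real plugin from Nagios Exchange.

```python
import argparse

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

def check_leak(leak_rate, warn, crit):
    """Return (exit_code, status_line) for a hypothetical leak-rate reading."""
    if leak_rate is None:
        return UNKNOWN, "LEAK UNKNOWN: no reading"
    if leak_rate >= crit:
        return CRITICAL, f"LEAK CRITICAL: rate {leak_rate}"
    if leak_rate >= warn:
        return WARNING, f"LEAK WARNING: rate {leak_rate}"
    return OK, f"LEAK OK: rate {leak_rate}"

def main(argv=None):
    p = argparse.ArgumentParser(description="Hypothetical leak check")
    p.add_argument("-H", "--host", required=True)
    p.add_argument("-w", "--warning", type=float, required=True)
    p.add_argument("-c", "--critical", type=float, required=True)
    p.add_argument("--rate", type=float)  # stand-in for a real query to the host
    args = p.parse_args(argv)
    code, line = check_leak(args.rate, args.warning, args.critical)
    print(line)
    return code
```

Used as an actual plugin script, it would end with `sys.exit(main())` so Nagios sees the exit code.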

  7. Re:how much did that cost by hedwards · · Score: 3, Insightful

    I think everyone's familiar with the alarm-that-cried-wolf phenomenon, thanks to all the car alarms. Rarely do people even turn their heads when they hear a car alarm.

    Competent professionals don't do that. The problem with car alarms is that they aren't aimed at professionals, competent or otherwise; they're aimed at the general public, and the mechanism they use doesn't reliably indicate that anything is actually going on.

    Competent professionals, like the ones who are supposed to be running rigs, should know to check them out every time and not turn an alarm off without ascertaining that it is in fact false. Disabling an alarm should only be done when there are adequate contingency plans in place covering what happens if the condition occurs and how they would respond.

    I used to work security at a high rise, and we'd often have alarms turned off on portions of the building. It was the only way to ensure that, under certain circumstances, work wouldn't cause a false alarm. It was done in a controlled way, with plans in place to make sure that somebody was keeping an eye on the area while the work was being done, and that the alarms would be turned back on as soon as they could be.

    And every time that building had an alarm go off which wasn't a known cause, it was always investigated promptly. Alarms that go off repeatedly need to be fixed, not disabled.
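The controlled bypass described here (someone watching, a contingency plan, alarms coming back on) could be sketched as a bypass record that is refused without a watcher and plan and that expires on its own. The field names, tag name and fixed-duration expiry are assumptions for illustration, not any particular building or plant system.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class Bypass:
    tag: str
    reason: str
    watcher: str           # who keeps an eye on the area while it is bypassed
    contingency_plan: str  # how the real condition would be caught meanwhile
    expires: datetime

def request_bypass(tag, reason, watcher, contingency_plan, hours, now):
    """Refuse any bypass that lacks a watcher or a contingency plan."""
    if not watcher or not contingency_plan:
        raise ValueError("bypass refused: no watcher or contingency plan")
    return Bypass(tag, reason, watcher, contingency_plan, now + timedelta(hours=hours))

def active_bypasses(bypasses, now):
    """Expired bypasses drop out, so the alarm is back on without anyone remembering to."""
    return [b for b in bypasses if now < b.expires]
```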

  8. Re:how much did that cost by Rob+the+Bold · · Score: 4, Insightful

    I don't even want to know how much taxpayer money was pissed away for that "key verdict". Having worked with quite a few monitoring and alarm systems for years, I can tell you that most of the time "automatic alarms" get ignored, and in fact can cause worse problems when a real alarm does occur, because the operators tune them out. Seems like they completely missed the mark on this: the real problem is most likely where you would expect it, with the people running the system. Human error, I am sure!

    You don't even have to ignore the alarm that isn't there. But I don't think the "alert" that we're discussing is the big klaxon/flashing sign reading "OIL LEAK," or an oil pressure light with electrical tape over it. What the article indicates was missing was an automatic method of indicating that a failure was imminent. As far as the cost of determining this: learning from mistakes can be expensive. Not learning from mistakes is likely even more so.
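One simple form of "automatic method of indicating that a failure was imminent" is a sustained-deviation check, for example flow coming out of a well exceeding flow going in, which is a classic kick indicator in drilling. A minimal sketch, assuming evenly sampled hypothetical flow readings; the threshold and sample count are made-up parameters, and this is not what the Commission specifically recommended.

```python
def sustained_excess(flow_in, flow_out, threshold, samples):
    """Return the index at which flow_out has exceeded flow_in by more than
    `threshold` for `samples` consecutive readings, or None if it never does."""
    run = 0
    for i, (fi, fo) in enumerate(zip(flow_in, flow_out)):
        run = run + 1 if fo - fi > threshold else 0  # reset on any normal reading
        if run >= samples:
            return i
    return None
```

Requiring several consecutive readings is a crude way to trade a little delay for fewer false positives, which is exactly the tension the rest of this thread is about.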

    --
    I am not a crackpot.
  9. Re:Seems a little unrelated by dogsbreath · · Score: 2

    Actually, there were blowout preventers (BOPs) in a redundant configuration, but when control was lost the main one failed to operate and the backup's batteries were in too poor a condition to work. As with most disasters, there were a myriad of contributing factors. After looking at numerous reports (everyone is certainly trying to make sure their investigations are public), it looks like:

    1. Familiarity breeds contempt. Alarms were shut down or ignored, partly out of annoyance and partly because incorrect conclusions were drawn about the state the well was in, leading to a dangerous situation and disastrous consequences. This is not unlike pilots in poor visibility who distrust and ignore their instruments, leading to controlled flight into terrain.

    2. Money trumps safety. There was tremendous corporate pressure to bring the well in. In the oil production world, almost everything is done by contract, with the petroleum producer owning and operating very little of what is going on. Rigs, crews and services are all contracted to do certain jobs, and the competition is fierce. No one wants to be the company that couldn't do the task or was late getting it done. Consider: if some different decisions had been made and the well had been brought in safely, but say two or three months late and with several million more dollars spent, we would never have heard about anything, and some of the well contractors, including individuals such as the rig boss and contract engineers, might have been looking for work elsewhere.

    I'm interested to see if anything changes after all of the investigations, a la airline safety after a TSB investigation.

  10. Re:Why do they even bother? by omglolbah · · Score: 3, Insightful

    Have a peek at the Norwegian sector. We've been doing this shit since the 70s and try damn hard to not have another Alexander Kielland...
    http://en.wikipedia.org/wiki/Alexander_L._Kielland_(platform)

    The Norwegian petroleum oversight is something... The regulators are ruthless when it comes to compliance and, better yet, they are not directly controlled by politicians ;)

    The cost of one fuckup is too much to allow people to cut corners.

    I sure as hell don't in my job, and I do it for a living. When we have the option of doing it right or doing it fast, we pick right. Every time. I don't care if the customer is pissed at things being delayed. We do it -right-.

  11. Re:Apparently abusing engineers by omglolbah · · Score: 2

    Unfortunately, a single alarm configuration on a "tag" could cost anywhere from 10k to 100k dollars.

    The configuration isn't all that hard or time-consuming, but the testing of the system after the modification is brutal. At least here, where it has to be certified before it is allowed into operation ;)

  12. Re:A perl script? by omglolbah · · Score: 2

    Operator: "I can't do that. That has to be run through the PCDA office and certified by the technical staff first."

    Manager: "Ok, I'll submit the paperwork"

    PCDA: "This is a bad idea, let's fix it instead..."

    Or something like that is how it goes here :p
    If it even gets past the manager. Most of the time the technical staff handles the alarms without telling any 'manager'. The operator responsible for the shift has authority over day-to-day operation without any manager interference.

    You can't operate if non-techies have more control than the techies over tech questions. It has been tried and abandoned ;)
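The approval flow in this exchange can be sketched as a change request that cannot be applied until the required technical sign-offs exist, no matter who asked for it. The two required steps are a simplification of what the comment describes, and all names here are hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical sign-offs needed before an alarm change may be loaded.
REQUIRED = {"technical_review", "certification"}

@dataclass
class ChangeRequest:
    description: str
    requested_by: str
    approvals: set = field(default_factory=set)

def approve(cr: ChangeRequest, step: str) -> None:
    if step not in REQUIRED:
        raise ValueError(f"unknown approval step: {step}")
    cr.approvals.add(step)

def may_apply(cr: ChangeRequest) -> bool:
    """A manager's request alone is not enough; every technical sign-off must be present."""
    return REQUIRED <= cr.approvals
```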

  13. I know BP leased the rig, but come on by AGMW · · Score: 4, Interesting
    it was Transocean that owned and operated the rig, so perhaps the story could better be titled:

    Transocean Gulf of Mexico Rig, leased to BP, lacked Alarm Systems

    --
    Eclectic beats from Leeds, UK
    handmadehands.co.uk
  14. This means they learned nothing by magus_melchior · · Score: 4, Interesting

    They had this exact problem at Texas City: they didn't do maintenance on the systems, so a subsystem overfilled with volatile hydrocarbons with no alarms going off at all, and when one alert did sound in the monitoring area, they ignored it. They didn't invest the (relatively) small cost of installing a flare (to burn off the excess), so the excess hydrocarbons spilled out into the open. Cost-cutting and an incredibly cavalier approach to maintenance from the London management generated a fucking fuel-air bomb in Texas.

    This is one instance where the Brit management, when they changed to Hayward, should have told their investors to "fuck off-- er, give us a few years" and spent the necessary money to get their facilities up to snuff, or decommissioned the facilities that were too costly to maintain. Alas, the profit motive proved more powerful than basic empathy or responsibility.

    --
    "We are Microsoft. You shall be assimilated. Competition is futile."
  15. Re:Apparently abusing engineers by omglolbah · · Score: 2

    Doing the change: 3-4 hours of work.

    Organizing the update to the controller in the field?
    - Requires a look into what could be influenced by the change
    - Requires in some cases an 'offline' load of the controller which can only be done at a time of a maintenance downtime (once a year at most, sometimes every 2-4 years)

    Documentation:
    - Documentation of what functionality changes for operators
    - Update of system configuration diagrams
    - Update of various tag info in the plant documentation system

    Install:
    - A job package must be written detailing every change made to the system.
    - A test package must be written with a full test suite to check that nothing broke during the change. People make mistakes and this is important.

    Now... How much will all this cost?

    When I'm working on jobs like this the company I work for charges about 170 bucks an hour...

    4*3 hours (The change, verification and signoff, various overhead)
    5*2 hours (Field work, included travel time etc, x2 for 2 people)
    8*3 hours (documentation, x3 due to document controllers, various overhead)
    6*2 hours (job/test package)
    5*5 hours (testing)

    83 hours, 170 bucks an hour, 14110 USD.
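The arithmetic checks out; here it is as a small script using the hours, multipliers and 170 USD/hour rate from the comment, so the line items can be tweaked for a different job.

```python
RATE_USD = 170  # hourly rate quoted in the comment

# (hours, multiplier, what it covers), as listed in the comment
LINE_ITEMS = [
    (4, 3, "change, verification and sign-off, overhead"),
    (5, 2, "field work incl. travel, two people"),
    (8, 3, "documentation, document controllers, overhead"),
    (6, 2, "job/test package"),
    (5, 5, "testing"),
]

total_hours = sum(h * m for h, m, _ in LINE_ITEMS)
total_cost = total_hours * RATE_USD
print(f"{total_hours} hours, {total_cost} USD")  # 83 hours, 14110 USD
```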

    This is a fairly average estimate of what something would cost on -our- side of a very small change. If hardware is involved it rapidly skyrockets in cost.

    In addition, there are a myriad of people who need to check and verify the change on the -other- side of the fence, namely the owner and/or operator of the plant.

    All these time-consuming roadblocks are barriers against making changes that could breach safety. They look arcane and silly to quite a lot of people, but they are there for a reason.

    Most of the accidents where I work happen when someone makes a quick, tiny change, one that "won't cause any issues", except it turns out that it does.

    To see why small changes can have huge impacts have a look at this book: http://www.amazon.com/What-Went-Wrong-Histories-Disasters/dp/0884150275

    I realize it would be horribly boring reading for anyone not interested in it :p

  16. Re:how much did that cost by thegarbz · · Score: 2

    The problem is who is the competent professional who is working on alarms?

    Is it the maintenance team who is backlogged with bullshit alarms that go off under normal process conditions because someone decided that it would work to prevent some disaster which may occur?
    Is it the process / technical team who decided yet another alarm will be cheaper than re-designing the process to meet the safety guidelines?
    Is it the console operator who has gone mental at the alarm going off constantly in the middle of the night and has requested the bypass?
    Is it the control engineer who has approved the bypass for the same reasons without a process safety review?

    Ask different people what they do with alarms and you'll get different answers, even within the same discipline. We have two process safety engineers at our refinery with two distinctly different opinions. One thinks advance warning is god, requests a process alarm on everything, and wants every instrument to become a layer of protection. The other wishes it were 1960, when signals were pneumatic and adding an alarm to the operations console cost a frigging fortune, because back then we only had sane and highly critical alarms.

    The former is winning, in my opinion. An alarm goes off in our control room every 2 minutes. There is a list of standing alarms for each area on each console operator's screen, and some alarms even go unacknowledged. Many more are bypassed. But all for what? In reality, when something goes wrong the operator's screen is flooded with priority 1 critical alarms, and the operator can acknowledge maybe 1 or 2 before they just start running on instinct and training to bring things under control.
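An alarm every 2 minutes is 30 per hour, or 720 a day. A minimal sketch of measuring that load from alarm timestamps (in minutes since the start of an observation period) and counting "flood" windows. The flood definition of more than 10 alarms in a fixed 10-minute window is an assumption here, a figure commonly cited in alarm-management guidance, and is left as a parameter.

```python
from collections import Counter

def alarm_load(minutes, period_minutes, window=10, flood_threshold=10):
    """minutes: alarm times in minutes since the start of the observation period.
    Returns (average alarms per hour, number of fixed windows that count as floods)."""
    per_hour = len(minutes) / (period_minutes / 60)
    buckets = Counter(int(m // window) for m in minutes)  # alarms per fixed window
    floods = sum(1 for count in buckets.values() if count > flood_threshold)
    return per_hour, floods
```

A steady trickle of one alarm every 2 minutes never trips the flood count, yet it is still 30 alarms an hour that someone has to look at; an upset on top of that is what produces the wall of priority 1 alarms.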

    If you get to the stage where you are relying on an alarm you have lost. Relying on operator intervention for process safety is the absolute last resort.