Slashdot Mirror


Flawed Analysis, Failed Oversight: How Boeing, FAA Certified the Suspect 737 MAX Flight Control System (seattletimes.com)

In one of the most detailed descriptions yet of the relationship between Boeing and the Federal Aviation Administration during the 737 Max's certification process, the Seattle Times reports that the U.S. regulator delegated much of the safety assessment to Boeing and that the analysis the planemaker in turn delivered to the authorities had crucial flaws. 0x2A shares the report: Both Boeing and the FAA were informed of the specifics of this story and were asked for responses 11 days ago, before the second crash of a 737 MAX. [...] Several technical experts inside the FAA said October's Lion Air crash, where the MCAS (Maneuvering Characteristics Augmentation System) has been clearly implicated by investigators in Indonesia, is only the latest indicator that the agency's delegation of airplane certification has gone too far, and that it's inappropriate for Boeing employees to have so much authority over safety analyses of Boeing jets. "We need to make sure the FAA is much more engaged in failure assessments and the assumptions that go into them," said one FAA safety engineer. Going against a long Boeing tradition of giving the pilot complete control of the aircraft, the MAX's new MCAS automatic flight control system was designed to act in the background, without pilot input. It was needed because the MAX's much larger engines had to be placed farther forward on the wing, changing the airframe's aerodynamic lift. Designed to activate automatically only in the extreme flight situation of a high-speed stall, this extra kick downward of the nose would make the plane feel the same to a pilot as the older-model 737s.

Boeing engineers authorized to work on behalf of the FAA developed the System Safety Analysis for MCAS, a document which in turn was shared with foreign air-safety regulators in Europe, Canada and elsewhere in the world. The document, "developed to ensure the safe operation of the 737 MAX," concluded that the system complied with all applicable FAA regulations. Yet black box data retrieved after the Lion Air crash indicates that a single faulty sensor -- a vane on the outside of the fuselage that measures the plane's "angle of attack," the angle between the airflow and the wing -- triggered MCAS multiple times during the deadly flight, initiating a tug of war as the system repeatedly pushed the nose of the plane down and the pilots wrestled with the controls to pull it back up, before the final crash.

[...] On the Lion Air flight, when the MCAS pushed the jet's nose down, the captain pulled it back up, using thumb switches on the control column. Still operating under the false angle-of-attack reading, MCAS kicked in each time to swivel the horizontal tail and push the nose down again. The black box data released in the preliminary investigation report shows that after this cycle repeated 21 times, the plane's captain ceded control to the first officer. As MCAS pushed the nose down two or three times more, the first officer responded with only two short flicks of the thumb switches. At a limit of 2.5 degrees, two cycles of MCAS without correction would have been enough to reach the maximum nose-down effect. In the final seconds, the black box data shows the captain resumed control and pulled back up with high force. But it was too late. The plane dived into the sea at more than 500 miles per hour. [...] The former Boeing flight controls engineer who worked on the MAX's certification on behalf of the FAA said that whether a system on a jet can rely on one sensor input, or must have two, is driven by the failure classification in the system safety analysis. He said virtually all equipment on any commercial airplane, including the various sensors, is reliable enough to meet the "major failure" requirement, which is that the probability of a failure must be less than one in 100,000. Such systems are therefore typically allowed to rely on a single input sensor.

14 of 471 comments

  1. Regulatory capture at its worst by JoeyRox · · Score: 4, Informative

    Forget the revolving door between the aerospace industry and the FAA - Boeing took out the middleman by convincing the government to let it self-regulate, even on matters of extreme importance like the airworthiness certification of aircraft. It's a win-win: Boeing wins because they reduce R&D and materials costs in getting subpar designs certified that otherwise would be rejected. Politicians win because they get their healthy campaign donations. The only people who lose are the ones who screamed for their lives as their plane plummeted to the earth.

  2. Re:Questions for the system designers here by Nidi62 · · Score: 4, Informative

    If they had turned off the faulty system, it would have stayed off and they would have been fine. They didn't tell the system to stop. They just counteracted its instructions.

    Especially with the Lion Air crash, it's kind of hard to turn off a faulty system when you aren't aware of its existence. The crew on the final flight the night before got lucky: they had the same issue but went through a checklist that had the very fortunate but unintended side effect of disabling the system.

    --
    The only thing necessary for evil to triumph is for it to be pitted against a slightly greater evil
  3. VERY defective safety analysis! by Futurepower(R) · · Score: 5, Informative

    The safety analysis:

    "1) Understated the power of the new flight control system, which was designed to swivel the horizontal tail to push the nose of the plane down to avert a stall. When the planes later entered service, MCAS was capable of moving the tail more than four times farther than was stated in the initial safety analysis document."

    "2) Failed to account for how the system could reset itself each time a pilot responded, thereby missing the potential impact of the system repeatedly pushing the airplane's nose downward."

    "3) ...

    I think this is the most important story on Slashdot in a long time.

    The article linked by Slashdot is the best, deepest story in a long time: Flawed analysis, failed oversight: How Boeing, FAA certified the suspect 737 MAX flight control system.

  4. Re:Questions for the system designers here by michelcolman · · Score: 5, Informative

    They were flying by hand. The problem is that the MCAS system is designed to add extra control inputs to the pilot's inputs to make the airplane behave the same as older 737s. And at high angles of attack (or when it thinks the angle of attack is high) it pushes the nose down to prevent a deep stall which would otherwise be a serious risk on this variant of the 737 due to the placement of its engines.

    The part about single sensors being allowed if the chance of failure is less than 1 in 100000 is the biggest bullshit I've ever heard in my life as an airline pilot. If there are hundreds of parts that all have a 1 in 100000 chance of failure, that means failures will happen quite often, especially with thousands of planes flying around. And indeed failures do happen regularly, that's why airplanes normally have loads of redundancy. Airspeed, for example, is measured on most airplanes by three different pitot tubes that feed into three different air data computers that constantly compare their data. If one of them is different, it shuts down and throws a failure message.
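To put rough numbers on the argument above, here is a minimal sketch of how a 1 in 100,000 failure chance compounds across many parts and many aircraft. The part count and fleet size are hypothetical round numbers, the parts are assumed to fail independently, and the thread does not say what unit of exposure the 1 in 100,000 figure refers to, so the results are only illustrative.

```python
def prob_any_failure(p_single: float, n_parts: int) -> float:
    """Probability that at least one of n independent parts fails,
    given each fails with probability p_single per exposure."""
    return 1.0 - (1.0 - p_single) ** n_parts


P_SINGLE = 1e-5   # the "one in 100,000" figure quoted in the article
N_PARTS = 300     # hypothetical: "hundreds of parts" on one airplane
N_PLANES = 1000   # hypothetical: "thousands of planes" scaled down to a round number

# Chance that some part fails on one airplane during one exposure.
per_plane = prob_any_failure(P_SINGLE, N_PARTS)

# Chance that some part fails somewhere in the fleet during one exposure.
fleet_wide = prob_any_failure(P_SINGLE, N_PARTS * N_PLANES)

print(f"per airplane: {per_plane:.4%}")
print(f"fleet-wide:   {fleet_wide:.2%}")
```

Under these assumptions a failure somewhere on a single airplane is still rare (about 0.3%), but a failure somewhere in the fleet is close to certain (about 95%), which is the case for redundancy.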

    That's just one example of many. Airliners have two fuel pumps per tank, several isolated hydraulics systems, several electrical systems with equipment spread over many buses with fault monitoring on all of them, etcetera.

    If an engineer designs a plane so it overrides pilot inputs and pushes the nose down based on input from a single sensor, that engineer deserves to go to prison and be barred from practicing engineering for the rest of his life. In aviation, this kind of screw-up is simply unforgivable.

    Boeing is currently testing the common sense fix which crosschecks the angle of attack with other data (airspeed, inertial reference system, attitude,...), which is what they should have done from the beginning.

    By the way, Airbus made a similar mistake not that many years ago, which was fixed with extra procedures and software updates. One would have expected Boeing to learn from Airbus' mistakes...

  5. Re:wrestling with automatic systems by darkmeridian · · Score: 4, Informative

    There's an auto-trim cut-out switch that shuts off MCAS. The pilots on the Lion Air flight kept on manually adjusting the trim (correctly diagnosing the problem as an auto-trim issue) but didn't cut off the auto-trim system. The penultimate flight crew on the same Lion Air jet also experienced the same problem, but disabled auto-trim and landed.

    --
    A NYC lawyer blogs. http://www.chuangblog.com/
  6. Re: Auto pilot was off. by bobbied · · Score: 5, Informative

    Auto pilot has nothing to do with this.

    Except in the 737 MAX it kind of does. Both the autopilot and the stall avoidance system make their adjustments the same way: by changing the aircraft's trim, turning the jackscrew that moves the rear horizontal stabilizer. Also, I'm told that the same sensors used by the autopilot, or at least some of them, are used by the stall avoidance system.

    However, turning off the autopilot doesn't disengage the stall avoidance system. It keeps doing its thing regardless.

    The problem here is that the aircraft is pretty much flyable even with the system malfunctioning, IF you understand what's happening and how to counter it. IF you don't understand what's happening though, the aircraft becomes unstable and your natural tendency is likely to do the wrong thing.

    There are a number of non-intuitive things about flying that pilots must be trained to do. Stall/spin recovery is one of these things. When the nose of the aircraft falls through the horizon while you are trying to pull it up, the tendency is to pull harder, but when you are stalled, the right thing to do is push the nose down, stay coordinated and add power if you can. If you go with your natural tendency, and keep pulling, you are going to spin it eventually, which is MUCH worse. So, pilots practice this... A LOT... Fly into a stall, or nearly a stall, recognize how the aircraft feels as the AOA approaches the limit and then when it actually stalls, immediately do the right recovery... How do I know? It was one part of my check ride that I nearly failed when I went for my license. The Examiner said I spun on departure stalls (i.e. didn't stay in coordinated flight on stall so it broke one side first) but let me pass when he tried it twice and spun himself because the C150 was so badly rigged. Made me do 5 hours of departure stall and spin recovery training as a condition of granting my license.

    --
    "File to fit, pound to insert, paint to match" - Aircraft Maintenance 101
  7. Re:Now I am even more worried... by ceoyoyo · · Score: 4, Informative

    With two sensors, if they disagree, you scream and don't do anything. The human then has to decide what's going on. That scenario is fine (even desirable) for a supplemental system like the MCAS. It's very, very unlikely that both of the sensors would get stuck in the same position, although you'd want to make sure that doesn't happen if some twit leaves a protective cover on them or something.

    A really critical system, that can't be shut off, should have triple redundancy.
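The two redundancy schemes described in this comment can be sketched as follows. This is an illustration only, not how any real flight computer works: the function names and the 5-degree disagreement threshold are hypothetical. With two channels a disagreement can only be detected, so the automatic function stands down and alerts the crew; with three channels the odd one out can be outvoted.

```python
from statistics import median
from typing import List, Optional, Sequence, Tuple

DISAGREE_LIMIT_DEG = 5.0  # hypothetical disagreement threshold


def dual_channel(a: float, b: float,
                 limit: float = DISAGREE_LIMIT_DEG) -> Optional[float]:
    """Two sensors: return their average if they agree.

    None means "alert the crew and take no automatic action", because
    with only two readings there is no way to tell which one is wrong.
    """
    if abs(a - b) > limit:
        return None
    return (a + b) / 2.0


def triple_channel(readings: Sequence[float],
                   limit: float = DISAGREE_LIMIT_DEG) -> Tuple[float, List[int]]:
    """Three sensors: use the median and report outvoted channels.

    Returns (voted value, indices of channels that differ from the
    median by more than the limit).
    """
    if len(readings) != 3:
        raise ValueError("triple_channel expects exactly three readings")
    voted = median(readings)
    outvoted = [i for i, r in enumerate(readings) if abs(r - voted) > limit]
    return voted, outvoted


print(dual_channel(4.0, 5.0))             # channels agree
print(dual_channel(4.0, 22.0))            # channels disagree
print(triple_channel([4.0, 4.5, 22.0]))   # third channel is outvoted
```

The median is used for the three-channel vote because a single stuck or wildly wrong sensor cannot move it, whereas it would drag an average.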

  8. Re:Questions for the system designers here by thegarbz · · Score: 4, Informative

    A system designed to overcome aerodynamic flaws of larger engines is not a major failure scenario?

    Of course it is, but what is the safe action? Return control to the pilot when the system is designed to actively kick in to prevent the pilot stalling, or maintain control in the face of being wrong (which is what happened here)?

    What pilot would trim up 21 times before disconnecting autopilot and flying by hand while figuring out what is going on? This shows the pilot is way too reliant on computers rather than hand flying the plane.

    You made a dangerous assumption and misplaced your blame. Autopilot was not enabled here. Engaging autopilot disables MCAS and disabling autopilot enables MCAS, which comes back to the pilot training component of the Lion Air investigation. The system as designed is too complex to disable under stress.

    Who's at fault here, Boeing or the country with lax pilot regulation?

    With your second point specifically identified early in the investigation, Boeing. They are the ones who provide the pilot training materials for their planes.

  9. Re:Now I am even more worried... by Anonymous Coward · · Score: 2, Informative

    Technically, 1 failure out of 100k makes this a seven 9's system. That's on par with medical devices and aerospace systems. That's a very stable system in general.

    To put that into perspective, if you were trying to run a system with a seven 9's uptime that ran 24/7/365 you would only have an outage of about 36 mins over the course of a year. These are very stable and dependable systems.

    99.99999% ("seven nines") is only 3.16 seconds per year, not 36 mins. That is closer to 4 nines, which might be fine for Facebook, but not the plane that I'm getting on.
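The arithmetic behind this correction can be checked with a short sketch. It assumes a 365.25-day year and treats "N nines" as an unavailability of 10^-N. On that definition a rate of 1 in 100,000 is itself five nines rather than seven.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600  # 31,557,600 s, assuming a 365.25-day year


def downtime_seconds_per_year(nines: int) -> float:
    """Yearly downtime for an availability of N nines (e.g. 5 -> 99.999%)."""
    return SECONDS_PER_YEAR * 10.0 ** -nines


def nines_for_downtime(seconds_per_year: float) -> float:
    """Inverse: how many nines a given yearly downtime corresponds to."""
    return -math.log10(seconds_per_year / SECONDS_PER_YEAR)


for n in (4, 5, 7):
    secs = downtime_seconds_per_year(n)
    print(f"{n} nines: {secs:.2f} s/year ({secs / 60:.2f} min/year)")

# The 36 minutes per year quoted in the parent comment.
print(f"36 min/year is about {nines_for_downtime(36 * 60):.2f} nines")
```

Seven nines comes out at roughly 3.16 seconds per year, five nines at about 5.3 minutes, and four nines at about 52.6 minutes, so 36 minutes per year sits just above four nines.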

  10. Dept of Transport - OIG Report by ytene · · Score: 5, Informative

    On June 29th, 2011, the Department of Transportation's Office of Inspector General issued a detailed (23 page) audit report that examined the Federal Aviation Administration's approach to Risk Management.

    You can read the report directly here.

    This report, published in June 2011, documents in stark detail that the approach taken by the FAA - to significantly scale back oversight of aircraft manufacturers - represented significant risk, even if that activity were performed adequately.

    In more detail, the report explains how the FAA took the decision to delegate responsibility for the hiring of individuals to serve as "FAA engineers" - essentially the supposedly independent inspectors who are intended to be able to objectively assess the effectiveness of the design and modification procedures conducted by the company that hired them.

    If that wasn't bad enough, the report goes on to say that once the FAA had conducted initial inspections [the document quotes a 2 year time window of monitoring] it then stepped back from even an oversight role. In other words, there was no way that the FAA could have had any confidence that the modifications introduced with the 737 MAX aircraft were actually functional as claimed.

    If you read around this news story in search of more details, you might find a couple of other relevant pieces of information. Staggering pieces of information...

    One is that Boeing's design/development process broke down, so that when the "final" aircraft was reviewed / safety inspected by their in-house "FAA engineer", all the presented paperwork showed that the force imparted on the control column by MCAS was set at relatively low, original design levels. In truth the design had changed, to the extent that one of the pilots in the Lion Air flight incident had been attempting to fight the controls with over 100lbs of force - and had failed to overcome the aircraft's systems.

    Another is that the MCAS system, which turned out to be closely related to the problem, may have been basing decisions on input from a single, faulty angle-of-attack sensor.

    Whatever the causes of the two recent failures in terms of the operational characteristics of the two aircraft involved, I think the 2011 Inspector General's report clearly shows that both of these events were avoidable and could have been prevented had the FAA leadership performed their duties responsibly.

  11. Re:Now I am even more worried... by Shotgun · · Score: 3, Informative

    It doesn't have to cost extra. There are already other sensors that would give signals corresponding to an approaching stall condition that the computer could use to correlate.

    --
    Aah, change is good. -- Rafiki
    Yeah, but it ain't easy. -- Simba
  12. Re:Now I am even more worried... by Shotgun · · Score: 4, Informative

    You can't disable a primary flight control system suddenly. That's what the problem is here. They get data from 2 sensors to determine AOA, one gets anomalous readings but the system doesn't know that. There's no way to know with 2.

    Except, with an airplane, there is. There is GPS data. There is historical telemetry (and by historical, I mean the past ten seconds). There is engine speed data, altitude data, and airspeed data. All of this is already collected.

    If the AoA is increasing, you'd expect the altitude to start increasing, the plane to start slowing, the engine RPM to decrease due to the increased load. All of these would correlate with GPS telemetry. Having the lives of 150 people hang on the reliability of a potentiometer attached to a weather vane is incredibly stupid.
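One way to picture this kind of cross-check is the sketch below. It is not Boeing's fix and not how MCAS works. It uses the textbook approximation that in steady, wings-level flight the angle of attack is roughly pitch attitude minus flight path angle, and it ignores wind, bank and sideslip. All function names and the 5-degree tolerance are hypothetical.

```python
import math


def estimated_aoa_deg(pitch_deg: float,
                      vertical_speed_mps: float,
                      true_airspeed_mps: float) -> float:
    """Rough AoA estimate from data that does not involve the AoA vane.

    Flight path angle is asin(vertical speed / true airspeed);
    AoA is approximated as pitch minus flight path angle.
    """
    if true_airspeed_mps <= 0:
        raise ValueError("true airspeed must be positive")
    # Clamp the ratio so noisy inputs cannot push asin out of its domain.
    ratio = max(-1.0, min(1.0, vertical_speed_mps / true_airspeed_mps))
    flight_path_deg = math.degrees(math.asin(ratio))
    return pitch_deg - flight_path_deg


def vane_is_plausible(vane_aoa_deg: float,
                      pitch_deg: float,
                      vertical_speed_mps: float,
                      true_airspeed_mps: float,
                      tolerance_deg: float = 5.0) -> bool:
    """True if the vane reading is within tolerance of the independent estimate."""
    estimate = estimated_aoa_deg(pitch_deg, vertical_speed_mps, true_airspeed_mps)
    return abs(vane_aoa_deg - estimate) <= tolerance_deg


# Level flight at 5 degrees of pitch: a vane reading of 4 degrees is believable,
# a reading of 25 degrees is not.
print(vane_is_plausible(4.0, 5.0, 0.0, 120.0))
print(vane_is_plausible(25.0, 5.0, 0.0, 120.0))
```

A check like this cannot say what the true angle of attack is, but it can flag a vane reading that contradicts the rest of the data, at which point an automatic system could stand down and alert the crew.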

    --
    Aah, change is good. -- Rafiki
    Yeah, but it ain't easy. -- Simba
  13. Re:Questions for the system designers here by Anonymous Coward · · Score: 2, Informative

    If you actually RTFA, a runaway stabilizer does not behave the same way: it doesn't operate in sort of pulses. So it was just luck that the previous crew tried that procedure, even though it WASN'T a runaway stabilizer and didn't feel like one either.

  14. Re:This is going to be one of the biggest lawsuits by Beeftopia · · Score: 5, Informative

    The term is "regulatory capture", and it's been blamed for the Deepwater Horizon incident, and Wall Street's shenanigans.

    From that second link, "the process by which regulatory agencies eventually come to be dominated by the very industries they were charged with regulating. Regulatory capture happens when a regulatory agency, formed to act in the public's interest, eventually acts in ways that benefit the industry it is supposed to be regulating, rather than the public."