Flawed Analysis, Failed Oversight: How Boeing, FAA Certified the Suspect 737 MAX Flight Control System (seattletimes.com)
In one of the most detailed descriptions yet of the relationship between Boeing and the Federal Aviation Administration during the 737 Max's certification process, the Seattle Times reports that the U.S. regulator delegated much of the safety assessment to Boeing and that the analysis the planemaker in turn delivered to the authorities had crucial flaws. 0x2A shares the report: Both Boeing and the FAA were informed of the specifics of this story and were asked for responses 11 days ago, before the second crash of a 737 MAX. [...] Several technical experts inside the FAA said October's Lion Air crash, where the MCAS (Maneuvering Characteristics Augmentation System) has been clearly implicated by investigators in Indonesia, is only the latest indicator that the agency's delegation of airplane certification has gone too far, and that it's inappropriate for Boeing employees to have so much authority over safety analyses of Boeing jets. "We need to make sure the FAA is much more engaged in failure assessments and the assumptions that go into them," said one FAA safety engineer. Going against a long Boeing tradition of giving the pilot complete control of the aircraft, the MAX's new MCAS automatic flight control system was designed to act in the background, without pilot input. It was needed because the MAX's much larger engines had to be placed farther forward on the wing, changing the airframe's aerodynamic lift. Designed to activate automatically only in the extreme flight situation of a high-speed stall, this extra kick downward of the nose would make the plane feel the same to a pilot as the older-model 737s.
Boeing engineers authorized to work on behalf of the FAA developed the System Safety Analysis for MCAS, a document which in turn was shared with foreign air-safety regulators in Europe, Canada and elsewhere in the world. The document, "developed to ensure the safe operation of the 737 MAX," concluded that the system complied with all applicable FAA regulations. Yet black box data retrieved after the Lion Air crash indicates that a single faulty sensor -- a vane on the outside of the fuselage that measures the plane's "angle of attack," the angle between the airflow and the wing -- triggered MCAS multiple times during the deadly flight, initiating a tug of war as the system repeatedly pushed the nose of the plane down and the pilots wrestled with the controls to pull it back up, before the final crash.
[...] On the Lion Air flight, when the MCAS pushed the jet's nose down, the captain pulled it back up, using thumb switches on the control column. Still operating under the false angle-of-attack reading, MCAS kicked in each time to swivel the horizontal tail and push the nose down again. The black box data released in the preliminary investigation report shows that after this cycle repeated 21 times, the plane's captain ceded control to the first officer. As MCAS pushed the nose down two or three times more, the first officer responded with only two short flicks of the thumb switches. At a limit of 2.5 degrees, two cycles of MCAS without correction would have been enough to reach the maximum nose-down effect. In the final seconds, the black box data shows the captain resumed control and pulled back up with high force. But it was too late. The plane dived into the sea at more than 500 miles per hour. [...] The former Boeing flight controls engineer who worked on the MAX's certification on behalf of the FAA said that whether a system on a jet can rely on one sensor input, or must have two, is driven by the failure classification in the system safety analysis. He said virtually all equipment on any commercial airplane, including the various sensors, is reliable enough to meet the "major failure" requirement, which is that the probability of a failure must be less than one in 100,000. Such systems are therefore typically allowed to rely on a single input sensor.
This judgement is going to run into 10 digits.
> only two short flicks of the thumb switches
In the systems you design, typically how many times is the user expected to press the Stop Trying To Kill Us button before the system leaves off trying to do so?
The statement about using only one sensor is scary, especially for something that automatically adjusts the flight path, but even having two is scary. With two sensors, how does the software know which is right when they disagree? For true fault tolerance you need a minimum of three sensors.
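The two-versus-three point can be made concrete. With two sensors the software can detect that something is wrong but cannot tell which reading to trust; with three, mid-value selection outvotes a single bad sensor. A minimal sketch, assuming a fixed disagreement tolerance (the function names and all numbers are hypothetical, not taken from any avionics system):

```python
from statistics import median


def vote_two(a, b, tolerance):
    """Two sensors: a disagreement can be detected, but not attributed."""
    if abs(a - b) > tolerance:
        return None  # no trustworthy value; the caller must fall back or disengage
    return (a + b) / 2


def vote_three(a, b, c, tolerance):
    """Three sensors: mid-value select, plus the indices of sensors that stray from it."""
    readings = [a, b, c]
    mid = median(readings)
    suspects = [i for i, r in enumerate(readings) if abs(r - mid) > tolerance]
    return mid, suspects


# Sensor 0 is stuck about 20 degrees high (made-up numbers)
print(vote_two(22.5, 2.0, tolerance=5.0))         # None
print(vote_three(22.5, 2.0, 2.4, tolerance=5.0))  # (2.4, [0])
```

Mid-value selection only masks a single failure, and it assumes the sensors fail independently; two vanes that go wrong the same way at the same time would still fool it.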
This smells like collusion between Boeing and the US Government (FAA) to rush through certification as an anti-competitive move against the Airbus product that was ready for this market.
The resulting hundreds of dead is a testament to failed oversight and cost-cutting, lack of redundancy, and what appears to be basic lying to other air regulators.
Almost certainly this will come back to bite Boeing badly: first with lawsuits from the families of the dead, and second with sales of what many people would consider a flying death trap of a plane design. It will take a while for this taint to be forgotten, assuming that it is fixed, that redundant systems are installed on all planes, and that they pass more robust certification processes around the world.
Forget the revolving door between the aerospace industry and the FAA - Boeing took out the middleman by convincing the government to let it self-regulate, even on matters of extreme importance like the airworthiness certification of aircraft. It's a win-win: Boeing wins because they reduce R&D and materials costs in getting subpar designs certified that otherwise would be rejected. Politicians win because they get their healthy campaign donations. The only people who lose are the ones who screamed for their lives as their plane plummeted to the earth.
> Yet black box data retrieved after the Lion Air crash indicates that a single faulty sensor -- a vane on the outside of the fuselage that measures the plane's "angle of attack," the angle between the airflow and the wing -- triggered MCAS multiple times during the deadly flight, initiating a tug of war as the system repeatedly pushed the nose of the plane down and the pilots wrestled with the controls to pull it back up, before the final crash.
Jesus, what a nightmare. And, I'm sure, no way of turning off the MCAS even though it was clearly malfunctioning. That has to be the worst last moments for a pilot, ever.
I read in a different article that the reason for the airframe design has its roots in the way airports were designed decades ago. Before they had those mobile tunnels that connected between the terminal and the plane, passengers had to walk out to the plane and ascend on a portable stairway. To make boarding easier, the original 737 was designed to be lower to the ground, so there wouldn't be as many steps to board. That part of the 737 design was never changed, and it made the airframe changes for the Max very awkward to implement. Hence the necessity for something like the MCAS, and hence the current mess.
Oliver's law of assumed responsibility: If you're seen fixing it, you will be blamed for breaking it.
> "Going against a long Boeing tradition of giving the pilot complete control of the aircraft, the MAX's new MCAS automatic flight control system was designed to act in the background, without pilot input"
Or notify them either, it seems. Or be disabled when it erroneously kicks in over 20 times causing unexpected dives. Fuck everything about this system. Even if they fix it I'm not flying on any aircraft that has this.
> "this extra kick downward of the nose would make the plane feel the same to a pilot as the older-model 737s"
And that's also ridiculous. Because of the change in the engine configuration it is an aircraft that handles differently. "Compensating" so the pilot doesn't know the difference causes confusion, something you don't need when in charge of a passenger jet. Do they make 747s feel like you're flying a TriStar? Of course not.
"Wait. Something's happening. It's opening up! My God, it's full of apricots!"
Part of the problem is Boeing didn't want pilots to have to retrain and certify under a different type of aircraft.
So they've jiggled things around to make it look like it's just like any other 737, but it now has different flight characteristics.
So now Boeing has created a situation where they wanted this to appear seamless to the pilots, but that it apparently doesn't work and is anything but seamless to the pilots. They took something which wasn't fly by wire, and made it fly by wire.
What we're seeing now is a case where the FAA let Boeing decide there was no material difference for pilots, when there actually was ... in which case their attempt to not have to force pilots to re-certify in type has now potentially led to two crashes.
When the pilot is saying up, and the system is saying down ... bad things happen.
And clearly, despite Boeing saying it would fly exactly the same, it doesn't.
The safety analysis:
...
"1) Understated the power of the new flight control system, which was designed to swivel the horizontal tail to push the nose of the plane down to avert a stall. When the planes later entered service, MCAS was capable of moving the tail more than four times farther than was stated in the initial safety analysis document."
"2) Failed to account for how the system could reset itself each time a pilot responded, thereby missing the potential impact of the system repeatedly pushing the airplane's nose downward."
"3)
I think this is the most important story on Slashdot in a long time.
The article linked by Slashdot is the best, deepest story in a long time: Flawed analysis, failed oversight: How Boeing, FAA certified the suspect 737 MAX flight control system.
Just like how the FDA relies on the drug companies to run all the tests, submit supporting docs, etc.
Why not just use a stick pusher, like any other non-FBW aircraft with stall issues? Design it so it can be overridden with appropriate back force on the control wheels. Using trim for this is stupid, since with full down trim, you might not have enough elevator authority to recover quickly from a dive (i.e. even if the system is turned off, trim may have to be cranked back manually before the plane can recover).
This looks like criminal stupidity on the part of Boeing engineers.
Dude, autopilot was off. All auto systems that were in the manual were off.
Attitude is only one element of the aircraft's operation -- what about airspeed?
Surely if there was a large disparity between the aircraft's airspeed and its attitude (i.e., it is accelerating beyond 500 mph while the attitude sensor says it's in a steep climb), then the safety system ought to have recognized that there was a fault condition and triggered an alarm that would allow pilots to disable it with the simple flick of a switch.
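An airspeed cross-check of this kind can be sketched with angle of attack (what the MCAS vane actually measures) and a standard lift relation: in steady flight at load factor n, the share of the wing's maximum lift coefficient in use is n·(V_s/V)², where V_s is the 1 g stall speed at the current weight and configuration. If that share is small, the wing cannot be near the stall, so a vane claiming otherwise is suspect. A minimal sketch, assuming equivalent airspeed and a known stall speed and stall AoA; the function name, the 0.8 margin and the example numbers are hypothetical, and comparing an AoA fraction with a lift fraction is only a crude heuristic:

```python
def implausible_stall_reading(aoa_deg, stall_aoa_deg, airspeed_kt, stall_speed_1g_kt,
                              load_factor=1.0, margin=0.8):
    """Flag a vane that says 'near stall' while the airspeed says the wing can't be.

    With lift = load_factor * weight, the fraction of the maximum lift coefficient
    in use is load_factor * (stall_speed_1g / airspeed) ** 2.
    """
    cl_fraction = load_factor * (stall_speed_1g_kt / airspeed_kt) ** 2
    vane_near_stall = aoa_deg >= margin * stall_aoa_deg
    airspeed_near_stall = cl_fraction >= margin
    return vane_near_stall and not airspeed_near_stall


# Hypothetical numbers: 1 g stall speed 130 kt, stall AoA 14 degrees
print(implausible_stall_reading(20.0, 14.0, 300.0, 130.0))  # True: fast, yet "stalling"
print(implausible_stall_reading(13.0, 14.0, 135.0, 130.0))  # False: slow, genuinely near stall
```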
Sadly, it seems that this system was never designed to be disabled -- because it was part of the FBW system used to modify the apparent flight characteristics of the new Max8 model so that it would fly like an earlier 737. This was done (so I understand) solely to make the plane more attractive to airlines that didn't want the extra expense of having to get their pilots "rated" for a new aircraft type.
When it comes to the mighty dollar versus safety -- you *know* which one wins :-(
Meanwhile, some people are still saying "it's only a matter of time before a drone brings down an airliner". I wish they'd shut up and focus on the *real* risks that are *actually* claiming hundreds of lives in the aviation industry.
> This system is designed to detect when the pilot has seriously screwed up, pointing the nose way too high.
Not even close! The plane NATURALLY wants to stand on its tail at high power output. That's what moving the engines CAUSED. To compound the matter, the engine nacelle shape itself adds lift at certain AoA, which can exacerbate the problem until it's no longer recoverable. Put your RC plane near vertical and watch what happens... (Well, RC planes generally have a far higher thrust-to-weight ratio than real planes, so it's doubtful you can actually demonstrate the problem.)
"regulation" implies a neutral third party. The Credit Card Industry has PCI. Video Games have ESRB. Movies the MPA. None of those things are as immediately lethal as a busted airplane though.
But I wouldn't call it "regulatory capture" either, since Boeing were left to their own devices. They didn't have anything to capture.
No, what we have here is plain, good 'ole deregulation. These days regulation > deregulation is automatic in most people's minds. Between this, Flint, MI, and the 2008 crash, I hope folks are starting to change their minds in that regard.
Hi! I make Firefox Plug-ins. Check 'em out @ https://addons.mozilla.org/en-US/firefox/addon/youtube-mp3-podcaster/
This issue seems like something the pilots can work around if they know what is going on, which the U.S. pilots seem to.
Based on:
That's a lot of flights they have done with the plane, so it's not like the plane is inherently unsafe
You can't draw that conclusion from the data they presented you. This isn't like a plane that is hard to fly. What we are talking about here is pilots responding to a very specific instrumentation problem. The only relevant statistic for how well U.S. pilots can cope with this is how many times Southwest Cargo pilots have suddenly had MCAS fail on them and try to trim down the nose, and how many times in the face of this problem they successfully disabled the system and landed safely. The total number of hours in the air is entirely meaningless to what your pilots know or are capable of.
> so it's not like the plane is inherently unsafe
Indeed the plane is not inherently unsafe; however, it presents an incredible risk to crew and cargo when a very specific instrument failure occurs.
You could blame it on the Obama administration or the Trump administration, but it goes back a long time.
The Federal Aviation Act of 1958 was the original statute allowing FAA to delegate activities, as the agency thinks necessary, to approved private people employed by aircraft manufacturers. Although paid by the manufacturers, these designees act as surrogates for FAA in examining aircraft designs, production quality, and airworthiness. The FAA is responsible for overseeing the designees' work and determining whether the designs meet FAA requirements for safety.
> Often old and simpler is far better....
Right until you look at outcomes. You're speaking emotionally from a recent tragic incident. You're not speaking based on data. The airline industry (along with others such as the process and automotive industries) has had a long downward trend in safety incidents. One of the primary drivers of that has been taking control away from people. As a Boeing noses down to prevent a stall, a car somewhere in the world saves a driver thanks to forward crash avoidance. An operator who mistakenly lowers the level in a high-pressure separator is greeted by flashing alarms on his screen and a valve slamming shut in the field to prevent an explosion.
Humans make mistakes; giving them full control is not the answer. It's always worth remembering why this system was built, and how in the past pilots, through their own errors, have demolished plenty of planes by putting them into a stall.
Sidenote: the thing that is really missing here, and which goes against industry trends, is inherently safer design. A more stable plane is preferable to a plane that is only stable when a certain control system is active.
On June 29th, 2011, the Department of Transportation's Office of Inspector General issued a detailed (23-page) audit report that examined the Federal Aviation Administration's approach to risk management.
This report, published in June 2011, documents in stark detail that the approach taken by the FAA - to significantly scale back oversight of aircraft manufacturers - represented significant risk, even if that activity were performed adequately.
In more detail, the report explains how the FAA took the decision to delegate responsibility for the hiring of individuals to serve as "FAA engineers" - essentially the supposedly independent inspectors who are intended to be able to objectively assess the effectiveness of the design and modification procedures conducted by the company that hired them.
If that wasn't bad enough, the report goes on to say that once the FAA had conducted initial inspections [the document quotes a 2 year time window of monitoring] it then stepped back from even an oversight role. In other words, there was no way that the FAA could have had any confidence that the modifications introduced with the 737 MAX aircraft were actually functional as claimed.
If you read around this news story in search of more details, you might find a couple of other relevant pieces of information. Staggering pieces of information...
One is that Boeing's design/development process broke down, so that when the "final" aircraft was reviewed and safety-inspected by their in-house "FAA engineer", all the presented paperwork showed that the force imparted on the control column by MCAS was set at relatively low, original design levels. In truth the design had changed, to the extent that one of the pilots in the Lion Air incident had been attempting to fight the controls with over 100 lbs of force, and had failed to overcome the aircraft's systems.
Another is that the MCAS system, which turned out to be closely related to the problem, may have been basing its decisions on a single, faulty angle-of-attack sensor.
Whatever the causes of the two recent failures in terms of the operational characteristics of the two aircraft involved, I think the 2011 Inspector General's report clearly shows that both of these events were avoidable and could have been prevented had the FAA leadership performed their duties responsibly.
You are. This crash merely shows yet again that a badly trained pilot - and many of them are - will crash the aircraft as soon as something unexpected happens. The cycle repeated 21 bloody times, yet the pilot kept fighting the aircraft instead of executing the correct procedure for a runaway stabiliser (essentially flicking two switches and manually cranking the stabiliser to the correct position).
Bad pilots are a fact of life, hence the only way to protect passengers from pilots is more automation, not less.
"It's such a fine line between stupid and clever" -- David St. Hubbins, Spinal Tap
Yea, just so you know... My father was an avionics and radio mechanic for a major US airline for 38 years, including a decade-long stint keeping flight simulators running at their pilot training center (where I got to regularly go and "fly" the big sims), so I grew up around airplanes all my life. I also worked as an avionics engineer on a Navy fighter aircraft, and I've done some private flying on my own. I'm not an expert on Boeing's avionics or modern flight control systems, but I do have a few clues about how they work.
"File to fit, pound to insert, paint to match" - Aircraft Maintenance 101
Everything will be in the paper trail. That, and dead bodies, is a conviction.
If not in the USA, then in any other country willing to prosecute on behalf of its dead citizens.
Sure, but if said person is not IN said country and extradition treaties are not in existence, what does it matter? Not a whole lot.
I can go to Sealand and get a judgment, but who's going to enforce it? Who's going to honor the judgment in the USA? It's not like you can contact the local sheriff and get him to enforce a judgment from outside the country.
Also, don't forget there is a vast difference between civil judgments (i.e. money awards) and criminal charges.
I love software testing, but I quit and moved on because the field just doesn't get the respect/resources it needs.
Developers and managers are always trying to "rein in" the testing staff and make them stick to a stupid script --
written by the same developers that made the mistakes in the first place.
My most important bug discoveries were almost always the result of informal testing, or thinking about the test script
and "trying something" that wasn't on the script. Overnight "random monkey testing" with the automated test harness was
very effective at finding real world problems -- but invariably got a rebuke from some manager, "Why were you doing that?"
This sounds a lot like that, but with the added bureaucracy of Aerospace+gov't.
The development process then adapts to minimize bureaucracy, instead of maximizing safety.
So as I see it, one of two things happened:
1. There was a test engineer somewhere who thought about these failure modes before the first crash. He was ignored and didn't have the power to escalate the issue.
2. The tests were stupid and were run by stupid people.
There were enough red flags -- I think it was #1.
Test Engineers -- throw off your chains!
The safety of the world depends on you.
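The "random monkey testing" described above is easy to sketch. The toy below is a hypothetical trim limiter (not Boeing code) with a planted bug modeled on the reset problem in the safety-analysis critique quoted in this thread: a pilot trim input makes it forget how much automatic trim it has already applied. A scripted test with one big request passes; random event sequences, checked against an invariant the test tracks on its own, find the failure. LIMIT_DEG and every name here are made up for illustration.

```python
import random

LIMIT_DEG = 2.5  # hypothetical documented cap on total automatic nose-down trim


class ToyTrimLimiter:
    """Deliberately buggy toy, not real flight-control code."""

    def __init__(self):
        self.auto_since_reset = 0.0  # automatic trim the limiter believes it has applied

    def auto_nose_down(self, request_deg):
        """Apply as much of the request as the cap allows; return the amount applied."""
        applied = max(0.0, min(request_deg, LIMIT_DEG - self.auto_since_reset))
        self.auto_since_reset += applied
        return applied

    def pilot_nose_up(self):
        # Planted bug: any pilot trim input wipes the memory of earlier automatic trim
        self.auto_since_reset = 0.0


def scripted_test():
    """The 'script': one large automatic request, no pilot input."""
    limiter = ToyTrimLimiter()
    return limiter.auto_nose_down(5.0) <= LIMIT_DEG


def monkey_test(seed, runs=1000, length=10):
    """Random event sequences; return the first one that breaks the cap, else None."""
    rng = random.Random(seed)  # seeded so a failing run can be replayed exactly
    for _ in range(runs):
        limiter = ToyTrimLimiter()
        total_auto = 0.0  # the test's own bookkeeping of all automatic trim applied
        events = []
        for _ in range(length):
            if rng.random() < 0.5:
                request = round(rng.uniform(0.0, 3.0), 2)
                events.append(("auto", request))
                total_auto += limiter.auto_nose_down(request)
            else:
                events.append(("pilot", None))
                limiter.pilot_nose_up()
            if total_auto > LIMIT_DEG + 1e-9:  # small slack for float rounding
                return events
    return None


print(scripted_test())      # True: the scripted check passes
print(monkey_test(seed=1))  # a random sequence that exceeds the cap
```

The key design choice is that the invariant is computed by the test, independently of the code under test; a check that asked the limiter for its own counter would have been fooled by the same bug.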
There's a good chance the aftermath of this is going to bankrupt Boeing.
The evidence for gross engineering negligence is piling up, and they are not going to live through the results.
And my ex-wife likely was responsible for the OS that the plane was using. Certification is backwards. The company making the OS or plane or drug should not be paying for the certification. The buyers of the product need to group together to do it. When I did security certification at IBM, no one ever failed. Our customer was the maker of the product, so we couldn't fail them. We almost never asked the customer to make changes (and when we did, we never verified that they made the changes); the whole certification process was about getting the paperwork correct. For the OS certification it might actually be worse. The certifiers probably aren't very good programmers. Their tests consist of running automated code checkers and a subset of the tests the OS maker wrote. One really bad mistake my ex's team made was misunderstanding a processor errata spec on cache misses. A non-trivial percentage of the world's aircraft were nearly grounded because of that*. My ex's team had misread the errata, and the certification house had relied on her team's interpretation of the errata (or more likely had no clue what it meant).
Critical systems don't allow free(), so all non-stack memory will be in static locations. Someone was able to write a program to analyse the executable images to determine whether this particular cache miss would ever happen. It turned out that no production systems were affected. The scary part, though, is that changing the length of a single text string could trigger this problem.
On a side note, this story from the Seattle Times shows how important investigative reporting is to society. If the government ever gets serious about regulating private enterprise again, it will be due to stories like this, and the resulting public outrage. We are yet again in their debt.
> One current FAA safety engineer said that every time the pilots on the Lion Air flight reset the (trim) switches on their control columns to pull the nose back up, MCAS would (reset its 0 degree reference and) have kicked in again and “allowed new increments of 2.5 degrees.”
> “So once they pushed a couple of times, they were at full stop,” meaning at the full extent of the tail swivel, he said.
So, in summary, a system FAA-certified on the basis of being able to adjust nose-down trim by 0.6 degrees could actually (after a few cycles of the pilot correcting it a little bit with nose-up trim) command full nose-down trim, about 5 or 6 degrees of tailplane tilt.
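The compounding the engineer describes can be checked with a toy model. This is a sketch, not MCAS logic: the 2.5-degree increment comes from the quote, the 5-degree full-stop value is an assumption based on the "about 5 or 6 degrees" above, and the 1-degree pilot correction per cycle is made up. With the reference reset after every pilot input, partial corrections never catch up; with a cumulative cap, the same inputs walk the trim back out.

```python
MCAS_INCREMENT_DEG = 2.5  # per-activation limit quoted in the article
FULL_NOSE_DOWN_DEG = 5.0  # assumed stop, from the "about 5 or 6 degrees" estimate


def simulate(cycles, pilot_correction_deg, resets_reference=True):
    """Toy model: peak nose-down stabilizer offset (degrees) reached in each cycle."""
    offset = 0.0      # nose-down trim relative to where the pilots had it
    mcas_total = 0.0  # nose-down trim added since the system's reference point
    peaks = []
    for _ in range(cycles):
        if resets_reference:
            mcas_total = 0.0  # pilot trim input gave the system a fresh reference
        allowed = max(0.0, MCAS_INCREMENT_DEG - mcas_total)
        step = min(allowed, FULL_NOSE_DOWN_DEG - offset)
        offset += step
        mcas_total += step
        peaks.append(offset)
        # Pilot answers with a short nose-up trim input, never past the starting point
        offset -= min(pilot_correction_deg, offset)
    return peaks


print(simulate(4, pilot_correction_deg=1.0))                          # [2.5, 4.0, 5.0, 5.0]
print(simulate(4, pilot_correction_deg=1.0, resets_reference=False))  # [2.5, 1.5, 0.5, 0.0]
```

With no pilot correction at all, simulate(2, 0.0) reaches the assumed 5-degree stop in two cycles, consistent with the article's point that two uncorrected cycles were enough for maximum nose-down effect. For scale, 2.5 / 0.6 ≈ 4.2, in line with the "more than four times farther" finding quoted earlier in the thread.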
All of this relied on input from a single angle-of-attack sensor. Get this: the plane has two such sensors, one on each side, but MCAS only uses input from one of them!!! What the hell? If you use two of them, your software can check whether they diverge, disable the systems relying on that input, and warn the pilots. That is some criminally bad, cost-saving development judgement there.
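The compare-and-disable logic described above might look something like this. It is a hypothetical sketch: the 5-degree split, the 10-sample persistence and the class name are assumptions, not Boeing's values. Requiring the split to persist filters momentary gust differences between the two vanes, and latching keeps the automatic function off for the rest of the flight once a vane is considered failed.

```python
from dataclasses import dataclass


@dataclass
class AoaDisagreeMonitor:
    """Hypothetical two-vane monitor; thresholds are assumptions, not Boeing values."""
    threshold_deg: float = 5.0  # assumed allowable split between left and right vanes
    persist_samples: int = 10   # assumed samples a split must last before latching
    count: int = 0              # consecutive disagreeing samples seen so far
    latched: bool = False       # once True, the automatic function stays inhibited

    def update(self, left_deg: float, right_deg: float) -> bool:
        """Return True if the automatic function may act on this sample."""
        if self.latched:
            return False
        if abs(left_deg - right_deg) > self.threshold_deg:
            self.count += 1
            if self.count >= self.persist_samples:
                self.latched = True  # sustained split: treat as a sensor failure
            return False             # never act on a sample where the vanes disagree
        self.count = 0
        return True

    @property
    def crew_alert(self) -> bool:
        """Stand-in for an 'AOA DISAGREE' style cockpit warning."""
        return self.latched


monitor = AoaDisagreeMonitor()
flight = [(3.0, 3.4)] + [(22.0, 2.0)] * 12 + [(3.0, 3.0)]
permitted = [monitor.update(left, right) for left, right in flight]
print(permitted[0], monitor.crew_alert, permitted[-1])  # True True False
```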
Where are we going and why are we in a handbasket?
It isn't just the FAA; this is a problem with many if not MOST of the federal regulatory agencies....
Look at the FDA rosters, and you can easily see why we won't ever get sensible food regulations/recommendations that would actually help address obesity, etc., in the US.
Light travels faster than sound. This is why some people appear bright until you hear them speak.........
Boeing is at fault. They should have made it mandatory and presented it as a major system that could lead to a major, lethal problem in case of misunderstanding, failure or mishandling. Instead they made it an option, a "don't worry, not too important" case. They are the ones who knew the consequences, so they are the ones who should have insisted. But by the sound of it, it was passed off as a minor issue, or no real emphasis was put on it.
C. Sagan: The Demon-Haunted World:
http://www.amazon.com/gp/product/0345409469/
visit randi.org
One in 100,000 what? Seconds, minutes, hours, lifetimes?
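For reference, the FAA's system-safety guidance for transport airplanes (Advisory Circular 25.1309-1A) states these probability targets per flight hour, with "major" failure conditions expected to be no more likely than roughly 1 in 100,000 per flight hour. Under that reading, the unit matters as soon as you multiply by a fleet. A small worked sketch; the fleet size and utilization are made-up round numbers, and the last line assumes the two sensors fail independently, which common-mode causes can break:

```python
import math

MAJOR_LIMIT_PER_FH = 1e-5  # "one in 100,000", read as a rate per flight hour


def expected_events(rate_per_fh, hours):
    """Average number of occurrences over a given number of flight hours."""
    return rate_per_fh * hours


def prob_at_least_one(rate_per_fh, hours):
    """Chance of one or more occurrences, treating each flight hour as independent."""
    # 1 - (1 - p)**n, computed with log1p/expm1 to stay accurate for tiny p
    return -math.expm1(hours * math.log1p(-rate_per_fh))


# Made-up fleet: 300 aircraft flying 3,000 hours a year each
fleet_hours = 300 * 3_000
print(round(expected_events(MAJOR_LIMIT_PER_FH, fleet_hours), 1))  # 9.0 per year
# One aircraft over 10,000 flight hours
print(round(prob_at_least_one(MAJOR_LIMIT_PER_FH, 10_000), 3))     # 0.095

# Two independent sensors both failing in the same hour (independence assumed)
print(MAJOR_LIMIT_PER_FH ** 2)
```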
It is stupid to make something that can kill people rely on a single input sensor. I programmed experimental tests in nuclear reactors and we always had multiple inputs (thermocouples, flow sensors, etc.) and had sanity checks on the values to identify failed equipment.
Seems like Boeing's software could have taken more things into consideration than just the angle of attack? What about speed, altitude, rate of climb/descent, etc.
"Almost every wise saying has an opposite one, no less wise, to balance it." - George Santayana
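One way to use those extra signals, sketched under simplifying assumptions (wings level, still air, no sideslip): angle of attack is roughly pitch attitude minus flight-path angle, and the flight-path angle follows from vertical speed and true airspeed. The function names, the 5-degree tolerance and the example numbers are hypothetical; real air data systems blend far more than this.

```python
import math

FT_PER_MIN_PER_KNOT = 6076.12 / 60.0  # one knot expressed in feet per minute


def estimated_aoa_deg(pitch_deg, vertical_speed_fpm, true_airspeed_kt):
    """Rough AoA estimate: pitch attitude minus flight-path angle."""
    ratio = vertical_speed_fpm / (true_airspeed_kt * FT_PER_MIN_PER_KNOT)
    ratio = max(-1.0, min(1.0, ratio))  # guard asin against noisy inputs
    flight_path_deg = math.degrees(math.asin(ratio))
    return pitch_deg - flight_path_deg


def vane_is_suspect(vane_aoa_deg, pitch_deg, vertical_speed_fpm, true_airspeed_kt,
                    tolerance_deg=5.0):
    """True if the vane disagrees with the estimate by more than the tolerance."""
    estimate = estimated_aoa_deg(pitch_deg, vertical_speed_fpm, true_airspeed_kt)
    return abs(vane_aoa_deg - estimate) > tolerance_deg


# Climbing at 1,500 ft/min and 250 kt with 5 degrees nose-up
print(round(estimated_aoa_deg(5.0, 1500.0, 250.0), 1))  # 1.6
print(vane_is_suspect(21.6, 5.0, 1500.0, 250.0))        # True: vane reads ~20 degrees high
print(vane_is_suspect(2.0, 5.0, 1500.0, 250.0))         # False
```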
Boeing did all these dodgy hardware and software hacks just to avoid the time and cost of certifying a new type. This was a panicked rush to market to compete with the Airbus A320neo, which isn't crippled by stubby landing gear like the 737's, so its engines can be placed in an inherently aerodynamically stable position.
Because it wasn't a new type, FAA did not require that pilots be certified. And furthermore Boeing buried the details of how to fully return the plane to manual control, because that would conflict with the story they told the FAA about unchanged flight characteristics. Unfortunately for all involved, Max 8 really did have a new flight characteristic: falling out of the sky under computer control.
So yes, Boeing is going to pay out the biggest settlement in aviation history. There is just no way to escape culpability. And we have a huge indictment of Trumpist deregulation too: industry didn't win by weakening FAA oversight, rather it lost big league.
When all you have is a hammer, every problem starts to look like a thumb.
I'm not a big fan of MBAs, but this was a pretty long and complicated chain of errors. From what I gather: Boeing wanted to keep the 737's low ground clearance but needed to put bigger engines on to match the efficiency of the new A320s, which meant changing the aerodynamics. Boeing also wanted pilots to be able to do a simple difference training course, rather than have to recertify on a new aircraft, so they invented MCAS. The engineers must have figured that it was a supplemental system, and easy to turn off if it malfunctioned, so they chose to make it kick in aggressively rather than conservatively (either sensor says go, rather than both sensors say go). They also made it harder to turn off than the old system, probably by accident. Then Boeing decided not to mention the new system to pilots in that difference course, to avoid confusing them.
Lots of errors to go around. Some are definitely cost saving, but some are probably a result of not enough whole-system oversight. The decision to go based on one sensor is a bit mystifying. There are already two AoA sensors on the aircraft, and lots of other ways of cross checking them. In fact, Boeing is releasing a software update to add all that cross checking in, so it's not even a hardware limitation.
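The "either sensor" versus "both sensors" choice is a classic voting trade-off, usually called 1-out-of-2 (1oo2) versus 2-out-of-2 (2oo2). Whatever MCAS actually did (other comments here say it read only one vane), a sketch with a hypothetical per-demand error probability shows the trade: 1oo2 roughly doubles spurious activations but rarely misses a real event, while 2oo2 makes spurious activation very unlikely at the cost of more missed events. Independence between the two sensors is assumed.

```python
def p_spurious(p_false, logic):
    """Chance of an unwanted activation from two independent sensors.

    p_false: chance one sensor falsely reports a high angle of attack (hypothetical).
    logic: "1oo2" means either sensor can trigger; "2oo2" means both must agree.
    """
    if logic == "1oo2":
        return 1 - (1 - p_false) ** 2  # at least one sensor is wrong
    if logic == "2oo2":
        return p_false ** 2            # both are wrong at once
    raise ValueError(f"unknown voting logic: {logic}")


def p_missed(p_miss, logic):
    """Chance of not activating during a genuine high-AoA event."""
    if logic == "1oo2":
        return p_miss ** 2             # missed only if both sensors miss it
    if logic == "2oo2":
        return 1 - (1 - p_miss) ** 2   # missed if either sensor misses it
    raise ValueError(f"unknown voting logic: {logic}")


for logic in ("1oo2", "2oo2"):
    print(logic, p_spurious(1e-3, logic), p_missed(1e-3, logic))
```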
The 737 MAX isn't actually aerodynamically unstable in normal flight. Any airliner, including all the 737s, with the standard under-the-wing engines will have off-axis thrust that will add a bit of pitch up. The aircraft is designed to compensate for that in normal flight, but in a stall if the pilot gooses the engine it can make it impossible to recover. 737 pilots (including the older model) are trained NOT to increase throttle in a stall because of it. The MAX handles differently in that situation, so they added MCAS so the pilots wouldn't have to be trained in a new stall recovery procedure.
The term is "regulatory capture", and it's been blamed for the Deepwater Horizon incident, and Wall Street's shenanigans.
From that second link, "the process by which regulatory agencies eventually come to be dominated by the very industries they were charged with regulating. Regulatory capture happens when a regulatory agency, formed to act in the public's interest, eventually acts in ways that benefit the industry it is supposed to be regulating, rather than the public."
> This crash merely shows yet again that a badly trained pilot - and many of them are - will crash the aircraft as soon as something unexpected happens.
Your post merely shows that you are an idiot. The pilots were properly trained; however, MCAS and the means of disabling it are undocumented, and Boeing claimed that pilots did not need to be retrained for this version of the 737.