Failed Avionics a Possible Cause of BA038 Crash
Muhammar writes "As you may have heard by now, both engines of the Boeing 777 aircraft flight BA038 suddenly cut off without warning at very low altitude and low speed during autopilot-assisted landing at Heathrow. A prompt reaction of the pilots prevented the stall and saved all lives aboard. The crash landing short of the runway tore off the landing gear on impact, and the fuselage plowed a long, deep gouge in the grass. With the investigation ongoing, the available information points to an electronic control problem as the most likely cause of the sudden engine power loss."
In two other instances in large jets of engine failure by fuel starvation (Air Transat 236 and Air Canada 143), the failure of the engines was not simultaneous: one engine kept working for a few minutes longer than the other.
The fact that the engines responded the same way, at the same time, strongly suggests a single point of failure in an electronic flight control system.
Toronto-area transit rider? Rate your ride.
See my journal, I write things there
These OSes typically are not custom designed (although a few in older aircraft are). There are a few commercial RTOSes that are commonly used; they are specially marketed to the avionics industry as conforming to the DO-178B standard. The most common would probably be Integrity-178B sold by Green Hills Software and VxWorks 653 Platform sold by Wind River.
I follow several aviation forums regularly and this has obviously been the number one topic since it happened and thought I should share some interesting findings:
A report of an earlier software problem with the 777.
The interesting part:
"a second accelerometer then failing and the latent software anomaly allowing
the ADIRU to once more utilise the previously failed accelerometer
information with its high output values in its computations, resulting in
erroneous acceleration outputs into the flight control outputs but not the
navigation (ground speed, velocity, position, etc.) outputs."
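To make the quoted anomaly concrete, here is a minimal Python sketch of the behaviour one would expect instead: a sensor that has been declared failed stays excluded for good, even when a second sensor fails later. The class name, the sensor count and the use of a median are all assumptions for illustration, not the actual ADIRU design.

```python
from statistics import median


class AccelerometerSet:
    """Hypothetical sketch: tracks which accelerometers have been declared failed."""

    def __init__(self, count):
        self.count = count
        self.failed = set()  # latched: once a unit is failed it stays excluded

    def mark_failed(self, index):
        # A later failure only adds to the set; it never clears earlier entries.
        self.failed.add(index)

    def healthy_readings(self, readings):
        # Drop the readings of every unit that has ever been marked failed.
        return [r for i, r in enumerate(readings) if i not in self.failed]

    def output(self, readings):
        good = self.healthy_readings(readings)
        if not good:
            raise RuntimeError("no healthy accelerometers left")
        return median(good)
```

The bug described in the report amounts to the second failure causing the first failed unit to be trusted again; in this sketch that cannot happen because `failed` is only ever added to.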
Of the two current theories - i.e. a software issue or contaminated fuel - I'm more inclined to believe a software issue since as a precaution during landing both engines use separate fuel tanks and pumps without crossfeeding and it would be quite a coincidence if two such independent systems failed at the same time. An analysis of the fuel filters will probably reveal a lot. If it indeed turns out to be the computer that failed, it will be somewhat ironic that the first* such accident involves a Boeing even though many have considered the higher degree of automation in Airbus aircraft scary.
*) Neither the official investigation nor the conspiracy theory blames the computer for the A320 crash in Mulhouse-Habsheim but those who aren't familiar with the conspiracy theory immediately assume that the theory blames the computer since it was the first civilian fly-by-wire.
The first linked article is more-or-less gossip, and gives no reason to blame the avionics. Not to say that it wasn't, but we want some evidence. The second is a much more reasoned article, and gives a number of possibilities, including avionics but also a number of others, all of which are possible. My favourite is fuel contamination - but we shall see.
The simple "running out of fuel" hypothesis is very unlikely. All aircraft are supposed to carry reserves to divert to another airport (not far in this case) plus ninety minutes flying. While cheapo airlines might short-cut on this, I cannot imagine BA doing so. There is no indication that the aircraft had been "stacked" for any length of time, so it should have landed with two hours worth of fuel on board. There have been cases of aircraft being misfueled, but on a regular run between two sophisticated endpoints, this seems unlikely.
Consciousness is an illusion caused by an excess of self consciousness.
Don't think it is just funny... it might be truth. BA is a customer of SPARKAda (as listed in http://www.praxis-his.com/sparkada/customers.asp). I expect the software run by the aircraft is proved to be correct to its specification using that, which is a variant of Ada.
Otherwise, how else would it have been able to cope with the expansion of the airframe during flight? OK, it was not FBW as we know it today but remember this was an aircraft designed in the early 1960s. It used LOTS of technology only ever used before in military aircraft.
They probably do. This is the time to whip out An experimental evaluation of the assumption of independence in multiversion programming by Knight and Leveson. It's a 47-page paper, but here's the summary:
Of course, one would think there would be two types of redundancy: The software would be N-version programmed and there would be separate systems for each engine. The chances of two independent N-version-programmed programs failing at the same instant seem particularly low.
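For anyone who hasn't met N-version programming: several independently written versions compute the same result and a voter picks the majority answer. A minimal Python sketch of such a voter follows (the function name and the plain majority rule are assumptions for illustration, not how any particular avionics system does it).

```python
from collections import Counter


def majority_vote(outputs):
    """Return the value a strict majority of versions agree on, else None."""
    value, votes = Counter(outputs).most_common(1)[0]
    # Require more than half of the versions to agree before trusting the value.
    return value if votes > len(outputs) / 2 else None
```

With one faulty version out of three, `majority_vote([10, 10, 11])` gives 10. But if two versions share the same mistake, `majority_vote([11, 11, 10])` gives 11, the wrong answer, which is why the independence assumption questioned by Knight and Leveson matters so much.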
It's easy to jump to the it-must-be-the-computers conclusion because PCs are unreliable in everyday use compared to washing machines, cars or compact disk players. But until the accident investigators' report comes out there really isn't much evidence to base speculations upon; the problem could have been anything.
Just my $0.02
"Goodness me, how unlike the FBI to abuse the trust of the American public." -- The Onion
FADEC = Full Authority Digital Engine Control. On the Rolls Royce Trent 800 engine it's called an Electronic Engine Control System (EECS).
The article describes the EPR (Engine Pressure Ratio, a measure of the power output) as slowly decreasing in both engines at the same time. If that's true it doesn't sound like fuel starvation. One: the EPR would simply drop to zero, not tail off, and two: the engines are unlikely to both stop at the same time.
There was a 767 that ran out of fuel over the Atlantic some time ago, their salvation was that one engine ran for several minutes after the first one quit. In that case they were feeding off different tanks. I'm not a systems guy but I believe that's the normal way of doing things, because what's the point of having independent engine systems if the fuel source itself for the two engines isn't independent.
The 777 was the first twin to get ETOPS (Extended Twin Operations (or as some call it Engines Turn Or Passengers Swim)) to allow it to operate in situations where it might have to fly for two hours on one engine to get to the nearest airport. To get that certification the engine systems have been scrutinized by the FAA who are, shall we say, detail-oriented people.
Something as obvious as taking the fuel for both engines from the same tank is unlikely to be procedure on that plane.
Having said all that, maybe on landing the fuel system is configured differently than for cruise.
I just don't think it feels like a software thing. They tend to be catastrophic and weird and scary. I like the fuel contamination theory. It was coming from China right? Who knows what gets into fuel in China.
Equine Mammals Are Considerably Smaller
Maybe that's your current thinking, but it doesn't necessarily reflect reality. Turbine engines don't "switch into reverse". They do have thrust reversers, but that's a mechanical device that redirects the exhaust flow. They're typically activated in the "last stages of landing" i.e. after the plane is fully on the ground.
There is a set of interlocks, involving both weight being present on the landing gear and the wheels rotating, to prevent the reversers deploying in flight.
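As a rough illustration of that kind of interlock, here is a Python sketch in which deployment is only permitted when both conditions hold. The function name, the parameters and the 20-knot wheel speed threshold are made up for the example; the real logic and thresholds are not given in this thread.

```python
def reversers_may_deploy(weight_on_wheels, wheel_speed_kt, min_wheel_speed_kt=20.0):
    """Hypothetical interlock: allow deployment only with weight on the gear AND wheels spinning."""
    wheels_rotating = wheel_speed_kt >= min_wheel_speed_kt
    # Both independent conditions must be true; either one alone is not enough.
    return weight_on_wheels and wheels_rotating
```

In flight both inputs are false, so a single faulty sensor reading "on ground" would still not be enough to release the reversers.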
If a cell phone can do this much damage, why the hell am I allowed to bring one (several even) on a plane?! These days, a swiss army knife will maybe get you as far as row 6 before people dogpile you, and they are confiscated. But a plane has easily 50 cell phones on it at any given time. If the only thing between me and engine failure are passengers dutifully following crew member instructions, then we are all screwed. So I am going to respectfully suggest that you are mistaken, because the alternative seems ludicrous.
Yes it is likely. We are expected to believe that a single consumer grade device caused the simultaneous failure of both engines?
You're right that it's more likely than RF interference. But neither is likely at all.
A software glitch of this type (if that's what it was) has never happened in aviation history. Certainly not in the 10 year history of the 777, with more than 500 of them flying around the world, but not to any other type either.
Also, the engines didn't "fail". The engines were running both before and after the stall (and yes, the aircraft did stall, despite what the article summary says). "Failure" and "failure to respond" are two different things.
In some ways that's even more scary, because it rules out simple explanations like fuel exhaustion. It's one thing for engines to fail, quite another for them to simply ignore control inputs.
First, there were MANY credible witnesses that swore they saw a missile shoot into the sky before the explosion.
a) no, they were not credible, and
b) they by and large didn't claim they saw "a missile".
What they claimed is that they saw a "streak of light" or some variation thereof. Only a few people claimed they saw "a missile", and those people by and large are the people that made it onto the news. So it probably seemed like there were more of them than there were. The news outlets chose the most radical, attention whoring witnesses to put on the air.
But if you read the NTSB report, they break down the witness statements. Out of something like 2,000 witnesses, only a relatively small percentage (I'm remembering it being something like 25%) saw a "streak of light". Of that percentage, about half saw the light going up, half saw it going down. Some saw it going to the left, some going to the right. In other words, none of them had any idea what they were looking at.
This is pretty normal for witnesses to an airliner crash. Nobody's expecting to see what they're seeing, so their mind initially doesn't record things correctly. What the NTSB has to do is filter out the crud and see if there's anything that everybody agrees on. If there is, then they investigate that. In this case, a large enough percentage of people indicated they saw a flash of light, and that ended up supporting the mid-air explosion theory.
But the NTSB never gave any real credence to it being a missile. Neither did the FBI, for that matter. There was just never any evidence. The FBI had pretty much ruled out terrorism within 2 days of the accident.
Obviously you didn't check the website either or you'd know that the site doesn't indicate whether the plane was a 772 or 773, only that it was a 777, of which there are several different types. Other places on the net, including the news sites, say it was a 777-236ER, which is definitely a 772.
In case people are confused by people talking about a BA772 or a 773, these are standard designations. A Boeing 777-200 is referred to as a 772, the 777-300 is a 773, etc. Other common ones you'll find are things like 742 and 744 which designate 747-200s and 747-400s, respectively. Airbus planes also have similar designations.
- The autothrottle system commanded an increase in thrust from the engines which did not respond
- The autothrottle demanded further increases in thrust again with no results
- The PIC commanded an increase in thrust via movement of the throttles, with no result
- The aircraft slowed and subsequently lost height
http://www.aaib.dft.gov.uk/latest_news/accident__heathrow_17_january_2008___initial_report.cfm
For both engines to have not responded to either the autothrottle or manual throttle movements, we are looking at a software issue in either the FADEC or the EMC.
BA does not operate any 777-300 aircraft.
In this case, then, the quote needs to be properly attributed and sourced, which I neglected to do. Apologies. The quote comes from this thread, post #6 by a user named IADCA.
Each engine has its own separate EEC. Each EEC has full authority over engine operation. In the normal mode, the EEC sets thrust by controlling EPR based on thrust lever position. EPR is commanded by positioning the thrust levers either automatically with the autothrottles, or manually by the flight crew.
Engine flameout protection is provided for an auto-relight and rain/hail ingestion. The auto-relight function is activated whenever an engine is at or below idle with the FUEL CONTROL switch in RUN. When the EEC detects an engine flameout, the respective engine ignitors are activated.
Fuel is supplied by fuel pumps located in the fuel tanks. The fuel flows through a spar fuel valve located in the main tank. It then passes through the first stage engine fuel pump where additional pressure is added. It flows through a fuel/oil heat exchanger where it is preheated. A fuel filter removes contaminants. If the filter becomes clogged, the filter will be bypassed, passing fuel directly to the engine. In that case, an advisory EICAS message "ENG FUEL FILTER L/R" will be displayed.
When main tank fuel pump pressure is low, each engine can draw fuel from its corresponding main tank through a suction feed line that bypasses the pumps.
No - it shows that the specification did not define what should happen with out-of-range conditions. They use formal specification languages to define what they want the software to do, but it is precisely these sorts of unforeseen circumstances which show that the spec was wrong, and the code only did what was specified.
Trans-Atlantic flights are often 90 minutes of flying time from a suitable runway. Trans-Pacific flights can be 3 hours or more of flying time from a suitable runway. Needless to say, airliners cannot glide with no power for hours. Air Canada Flight 143 (see http://www.wadenelson.com/gimli.html) was estimated to have a glide ratio of 11:1 with both engines windmilling. So from 40,000 ft, the maximum glide distance would have been about 100km. Sink rate was estimated at 2000 ft/min, meaning with all engines out, you will be visiting some destination at sea level within about 20 minutes.
Indeed. If I'm piloting a turbine engine aircraft, I much prefer for the engines to just fail than for them to ignore my commands. Fly-by-wire is pretty cool until the engines ignore your commands and you have no way to shut the fuel off to them.
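The glide arithmetic is easy to check. Here is a small Python sketch, assuming still air, a constant 11:1 glide ratio all the way down, and the sink rate taken as 2000 ft per minute (which is what the 20-minute figure implies). The function names are just for illustration.

```python
FT_PER_KM = 3280.84  # feet in one kilometre


def glide_distance_km(altitude_ft, glide_ratio):
    """Still-air ground distance covered while descending altitude_ft."""
    return altitude_ft * glide_ratio / FT_PER_KM


def time_to_sea_level_min(altitude_ft, sink_rate_fpm):
    """Minutes until sea level at a constant sink rate in feet per minute."""
    return altitude_ft / sink_rate_fpm


print(round(glide_distance_km(40_000, 11)))   # ideal still-air distance in km
print(time_to_sea_level_min(40_000, 2_000))   # minutes of glide
```

The idealised still-air figure comes out at roughly 134 km, so "about 100km" is on the conservative side once you allow for wind, turns and manoeuvring. Either way it is nowhere near 90 minutes of flying.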
It is substantially different. (and integrity is different from integrity-178b also)
The 653 in the name is a reference to ARINC-653, which is an industry standard that specifies the API that the OS exposes to the user. (Integrity also supports this same API.)
I haven't used VxWorks 653, but I am very familiar with both Integrity and Integrity-178B, and there is no question that the latter is a LOT more reliable.
There may be a little bit of code reused in these platforms, but really the name is the same for marketing reasons. (kind of like how windows CE is completely different from the windows you run on your desktop)
Another data point to consider is that the failure was not transient. Normally if you introduce some noise into the channel then you lose some symbols here and there, or the clock even. But the higher level protocols take care of that. Pull the network cable, for example - your SSH session will stay alive for half a minute, until TCP timers run out. I am sure that in an airplane loss of a message will be first noticed and logged, then reported as a potential trouble, and if it continues then some other emergency action will be taken. But if the error ceases to be then the message gets through and you can continue using the controlled device.
Since the malfunction occurred quite far from the airport, and it did not fix itself after the aircraft moved away from a possible jammer location, then in my uneducated opinion the relevant controls just "wedged" somewhere, asking for a hard reset. It will take some Boeing engineers with the diagrams to find out where two independent engine control paths merge or at least get close to each other. And they still have the physical electronics of the airplane, most of it probably undamaged. On top of that they have every single bit from every single flight data recorder, and those are of improved type that record more parameters than usual.
In addition, if the two engines are identical (as they should be) and have the same firmware loaded into their controllers, then the same command sent to both engines could easily take them out at the same time. It could be a fairly complicated sequence, for example, but as long as both engines are operated by another computer (autopilot / autothrottle) then you can be fairly sure that the two engines would be as much in sync with each other as possible, and the "ping of death", so to say, would affect both.
It is indeed far more convenient to blame the pilots, regardless of the real cause. However in this case Boeing and BA and Rolls Royce have no such easy way out. The airplane was on autopilot when the error occurred. Pilots involved themselves only when they had to, after the failure was apparent. In addition, they have megabytes of data intact on all flight data recorders, and they won't be allowed to change even a single bit of that, since these companies are not the investigators.
No they didn't. The majority of the 777 code is written in Ada, and instead of separate implementations of the same spec, Boeing used the same code for all the redundant hardware systems. A little googling will give you the details.
It may not be just a software bug. It may be that the software cannot handle some unforeseen hardware state, as happened on the Malaysian Airlines incident a few months ago (that incident was a near-miss but did not result in a crash-- the problem was that the software was unable to handle properly bad data coming in from an accelerometer). Whether this counts as a "software bug" or a "hardware failure" I don't know....
You can prove that the software is bug free for any set of foreseen inputs. The question becomes whether there are unforeseen inputs which can cause problems. Suppose for example, that a sensor fails in an unexpected way-- for example shorting a circuit instead of breaking it, or by sending incorrect data to the computer. In essence you not only have to handle valid inputs from sensors, and normal sensor failures, but you also have to handle sensors which fail in unexpected ways, and you also have to handle every possible electrical fault as well. And then you *still* have to make some assumptions about the underlying communications between the remaining components.
Now, here is the real issue:
Software exists only to process information on underlying hardware. When you have failures in that hardware which cause the information to be corrupted, you cannot count on any results from the software. Hence your software can only be proven bug-free within a reasonably limited set of circumstances. Or, in simpler terms, garbage in, garbage out.
LedgerSMB: Open source Accounting/ERP
I think a single software glitch is unlikely to be the cause of the failure. However, best guess at the moment is that the engine issues were software initiated.
You can only mathematically prove that software is bug free given some basic assumptions about hardware performance. If those assumptions fail, then your bug-free software is now buggy because the hardware is buggy and it can't sort out valid from invalid information.
TFA mentions another avionics glitch where a failed accelerometer caused a near accident on a 777 in Australia. The software inappropriately responded to the failure because the failure condition wasn't foreseen.
Most likely the root cause is hardware-related, not software-related. For example, maybe water-based corrosion on some contacts somewhere where the seal was damaged, or a short circuit on some sensor somewhere else. The issue is that this may have triggered failure conditions that were not previously foreseen in the software design.
The 777 has an impressive safety record. However, in incidents where, say, water gets into circuitry, or some previously unforeseen failure situation arises, there will be problems.
As for the "first of its kind" remark-- this is not the first software initiated problem in the 777 if indeed that is the case. It *is* however, the first 777 crash ever. Which ought to make one a little less inclined to question previously unforeseen problems.
A nice idea, but commercial airliners have several characteristics that would make that unworkable. First off, in the landing configuration (flaps 30 and gear down), the descent angle would probably be close to 6 or 7 degrees rather than the normal 3 - leading to a descent rate of 2000 fpm or more. In a sailplane (with a very low moment of inertia around the lateral axis), when you command pitch up, the lag between your pulling back on the stick and the airplane rotating to a different angle of attack and increasing lift is almost zero - i.e. near instantaneous response in vertical speed to pitch commands. In a commercial jet, the moment of inertia is much greater, so it takes a few seconds for the plane to rotate to a different angle of attack and thus generate more lift. If you didn't time your flare perfectly, you would smash into the ground quite smartly.
Secondly, if you instead had the airliner attempt to land in a much 'cleaner' configuration with a better glide ratio, closer to 3 degrees, your landing speeds would probably be 50% faster, probably near 200 knots. The required landing distance is proportional to the square of the velocity, so you would need to double the size of existing runways. Not likely....
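The runway claim follows directly from the square law. A quick Python sketch, assuming landing distance scales purely with the square of touchdown speed (same deceleration, everything else held equal); the function name is just for illustration.

```python
def landing_distance_factor(speed_ratio):
    """Factor by which landing distance grows when touchdown speed is scaled by speed_ratio."""
    # Stopping distance at constant deceleration is proportional to speed squared.
    return speed_ratio ** 2


print(landing_distance_factor(1.5))  # 50% faster landing speed
```

A 50% faster landing gives a factor of 2.25, so "double the size of existing runways" is, if anything, slightly optimistic.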
Third, jet engines have relatively slow response characteristics, particularly from idle (much better than a decade or two ago, but they are still slow compared to piston powered engines); this caused several crashes back in the late 50's and early 60's - pilots would be doing idle thrust approaches, then circumstances called for a go around, and when they advanced the thrust levers, it took a good 10 seconds (or more... DC-9s particularly sucked in that area, from what I remember) for full thrust to be developed... and they didn't have 10 seconds to wait. So, it was decided that jets should approach in a 'thrusted-up' configuration; one where the engines were developing much more than idle thrust throughout the final approach - if go around was required, the time to full power was much smaller. But, to maintain such a 'thrusted-up' configuration, the approach slope had to be shallow (a good idea as I mentioned above), and the airplane had to have a very draggy configuration. The amount of extra lift at a given speed from flaps 15 to 30 is very small, but the additional drag is quite large... that's the reason airplanes take off with very small flap settings (typically 5 degrees), for maximum additional lift with little additional drag, but put out full flaps, with lots of drag, for landing, so the engines stay spooled up until about touchdown.
IAAAE (I Am An Aeronautical Engineer) and I take serious issue with that statement.
According to the Times today, there have been at least 2 reported computer 'glitches' on 777s in the last 3 years. One lowered the airspeed from 270 to 158 knots along with putting the a/c in a 3000'/min climb causing it to stall. The other caused an uncommanded lurch to the right.
There have been numerous other computer (software AND hardware) glitches and failures in many aircraft, some leading to accidents (remember the A320 landing in the woods?) but most detected and corrected by the pilots. A brief search of the AAIB database should show that.
Of course it stalled. It hit the ground short of the runway - the pilots were doing everything possible to get over the fence. After flaring the aircraft, it is usually lowered to the ground. By holding off till stall (at a few metres above the ground), they probably got an extra 20 or 30m of flight. This was probably enough to get the aircraft onto the tarmac where it stopped, easing the evacuation and recovery. It did not, however, stall during flight when the error began.
Is crushing a suspect's child's testicles illegal?
John Yoo: "No, [if] the President thinks he needs to do that."
I doubt the aircraft stalled: a large aircraft like a Boeing 777 will _not_ recover from a stall in 600 ft, and everyone would have been dead. If it stalled at all, it would have been just before touchdown while the crew were trying to arrest whatever sink rate they could before impact.
As for fuel exhaustion - that was ruled out very quickly - plenty of fuel leaked from at least one breached fuel tank. It's the first thing the investigators would have done - look in the tanks and see if there was fuel. That doesn't rule out fuel STARVATION though - you can have plenty of fuel on board, but something stopping it from reaching the engines.
Oolite: Elite-like game. For Mac, Linux and Windows
"Good luck implementing an OS with Ada."
http://www.adahome.com/articles/1998-07/nw_ghs.html
"Written in Ada, RT Secure is a real-time, pre-emptive multitasking microkernel optimized for mission-critical applications that require true hard real-time response."
I'm not going to change your sheets again, Mr. Hastings.