Software Bug Caused Qantas Airbus A330 To Nose-Dive

Posted by Unknown on Monday December 19, 2011 @05:12PM from the bugs-on-a-plane dept.

pdcull writes "According to Stuff.co.nz, the Australian Transport Safety Bureau found that a software bug was responsible for a Qantas Airbus A330 nose-diving twice while at cruising altitude, injuring 12 people seriously and causing 39 to be taken to the hospital. The event, which happened three years ago, was found to be caused by an airspeed sensor malfunction, linked to a bug in an algorithm which 'translated the sensors' data into actions, where the flight control computer could put the plane into a nosedive using bad data from just one sensor.' A software update was installed in November 2009, and the ATSB concluded that 'as a result of this redesign, passengers, crew and operators can be confident that the same type of accident will not reoccur.' I can't help wondering just how a piece of code, which presumably didn't test its input data for validity before acting on it, could become part of a modern jet's onboard software suite?"


Re:What about Google driverless car? by timeOday · 2011-12-19 17:19 · Score: 5, Interesting

But the best part is that once you fix a bug in an automated system, it's fixed forever, whereas a fresh new crop of novices hits the roads/skies every day.
There were people against airbags, too, because they killed some people who otherwise wouldn't have died. You work on fixing those things. But whether the system as a whole is worthwhile is judged on whether it saves more than it kills.
don't just wonder, learn by fche · 2011-12-19 17:22 · Score: 5, Interesting

"I can't help wondering just how could a piece of code, which presumable didn't test its' input data for validity before acting on it, become part of a modern jet's onboard software suit?""
How about reading the darned final report, conveniently linked in your own blurb? There was lots of validity checking. In fact, some of it was relatively recently changed, and that accidentally introduced this failure mode (the 1.2-second data spike holdover). (Also, how about someone spell-checking submissions?)
1. Re:don't just wonder, learn by inasity_rules · 2011-12-19 17:33 · Score: 5, Interesting
  
  Mod parent up. Anyhow, information from a sensor may be valid but inaccurate. I deal with these types of systems regularly (not in aircraft, but control systems in general), and it is sometimes impossible to tell without extra sensors. It's one thing to detect a "broken wire" fault, and a completely different thing to detect a 20% calibration fault, for example, so validity checking can only take you so far. It's actually impressive that the failure mode in this case caused so little damage.
  
  --
  I have determined that my sig is indeterminate.
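The "valid but inaccurate" distinction above can be sketched in a few lines. This is only an illustration, not anything from the A330 or from the ATSB report: the function names, the 0 to 400 knot plausibility window, the 5% disagreement threshold, and all the readings are made-up assumptions.

```python
import statistics

# Hypothetical plausibility window for an airspeed reading, in knots.
LOW_KNOTS = 0.0
HIGH_KNOTS = 400.0


def is_plausible(reading, low=LOW_KNOTS, high=HIGH_KNOTS):
    """Range check: catches gross faults such as a broken wire."""
    return low <= reading <= high


def disagrees_with_peers(reading, peers, max_rel_diff=0.05):
    """Cross-check against extra sensors: flags a reading that differs
    from the median of the other sensors by more than max_rel_diff."""
    reference = statistics.median(peers)
    return abs(reading - reference) > max_rel_diff * abs(reference)


if __name__ == "__main__":
    broken_wire = -999.0           # assumed out-of-range value from a dead sensor
    miscalibrated = 250.0 * 1.2    # 20% calibration fault on a true 250 knots
    other_sensors = [250.5, 249.0]

    print(is_plausible(broken_wire))                           # range check rejects it
    print(is_plausible(miscalibrated))                         # range check accepts it
    print(disagrees_with_peers(miscalibrated, other_sensors))  # peers expose it
```

The range check alone has no way to reject the miscalibrated value, which is the point made above: without extra sensors the 20% error looks like perfectly good data.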
2. Re:don't just wonder, learn by wvmarle · 2011-12-19 18:11 · Score: 4, Interesting
  
  Agreed, valid but inaccurate.
  Though such an airliner will have more than one air speed sensor, no? Relying on just one sensor for such a vital piece of information would be crazy. And that makes it even more surprising to me that a single air speed sensor malfunctioning could cause such a disaster. But then it's the same kind of issue that's been blamed for an Air France jet crashing into the ocean: malfunctioning sensors, in that case ice buildup or so iirc, and as all sensors were of the same design this caused all of them to fail.
  Another thing: I remember that when Airbus introduced their fly-by-wire aircraft, they stressed that one of the safety features to prevent problems caused by computer software/hardware bugs, was to have five different flight computer systems built and designed independently by five different companies, using different hardware. So that if one computer has an issue causing it to malfunction, the other four computers would be able to override this. And a majority of those computers should agree with one another before an airplane control action would be undertaken.
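The majority-agreement idea described above can be sketched as a simple voter. This is a toy under stated assumptions, not the actual Airbus architecture: the function name `majority_command`, the tolerance, and the sample outputs are all hypothetical.

```python
from typing import Optional, Sequence


def majority_command(outputs: Sequence[float],
                     tolerance: float = 0.5) -> Optional[float]:
    """Return a command only if a strict majority of the redundant
    channels agree with each other to within `tolerance`.

    Returns the mean of the agreeing channels, or None if no majority
    exists (the caller would then have to fall back to something else).
    """
    needed = len(outputs) // 2 + 1
    for candidate in outputs:
        agreeing = [o for o in outputs if abs(o - candidate) <= tolerance]
        if len(agreeing) >= needed:
            return sum(agreeing) / len(agreeing)
    return None


if __name__ == "__main__":
    # One channel out of five misbehaves: it is outvoted.
    print(majority_command([2.0, 2.1, 1.9, 2.0, -10.0]))
    # No three channels agree: no command is issued.
    print(majority_command([2.0, -10.0, 5.0, 9.0, 2.1]))
    # Common-mode failure: every channel is wrong in the same way,
    # so the vote happily passes the bad value through.
    print(majority_command([-10.0, -10.0, -10.0, -10.0, -10.0]))
```

The last case is the Air France style of problem mentioned above: voting protects against independent faults, not against identical sensors or identical inputs failing together.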
Re:What about Google driverless car? by Geldon · 2011-12-19 17:29 · Score: 5, Interesting

It's so interesting to see people's reaction to the whole driver-less car thing. It's incredible to see the kind of ethical thought-experiment that must necessarily go through everyone's mind when they come to this conclusion: How many lives must be saved before I will tolerate someone being brutally slain by a malfunctioning computer?
Every day, children are run down by drivers who are not paying attention, tired, drunk, or just plain don't have time to react. Since a driver-less car is incapable of being drunk, tired, or distracted, then it's a safe bet that they'll be much better at avoiding those accidents that can be avoided. But the reality is that the latter scenario (no time to react) would still lead to the deaths of many children (and others!).
At what point does it become "worth it"? When the driver-less car causes 1/10th as many fatalities? 1/100th? 1/1,000th? How many human deaths must be prevented by letting computers drive cars before we're willing to accept 1 single death by those same computers?
It's a real-life example of the "Trolley Problem"
http://en.wikipedia.org/wiki/Trolley_problem
Re:What about Google driverless car? by HeavyDDuty · 2011-12-19 17:32 · Score: 4, Interesting

nothing in software is ever free of bugs. just because it's a bug-fix doesn't preclude the bug-fix itself (or its side effects) from introducing new bugs, or from being an incomplete fix which just happens to pass whatever inadequate test was thrown at it.
Re:it's more complicated than that by holophrastic · 2011-12-19 17:37 · Score: 5, Interesting

yup. all the while forgetting that while the altimeter shows altitude, it rarely actually measures distance to the ground; it measures air pressure, and then assumes an awful lot.
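To make the "assumes an awful lot" part concrete, here is a rough sketch of how a pressure reading is usually turned into an altitude using the International Standard Atmosphere model for the lower atmosphere. The function name and the sample pressures are assumptions for illustration; a real altimeter is considerably more involved.

```python
# Standard-atmosphere constants (troposphere only).
ISA_SEA_LEVEL_PA = 101325.0   # assumed sea-level pressure, Pa
ISA_SCALE_M = 44330.77        # 288.15 K / 0.0065 K/m
ISA_EXPONENT = 0.190263       # R * L / (g * M) for the standard lapse rate


def indicated_altitude_m(static_pressure_pa, sea_level_pa=ISA_SEA_LEVEL_PA):
    """Convert measured static pressure to an indicated altitude in metres.

    Nothing here measures distance to the ground: the result depends
    entirely on the assumed sea-level pressure and temperature profile.
    """
    ratio = static_pressure_pa / sea_level_pa
    return ISA_SCALE_M * (1.0 - ratio ** ISA_EXPONENT)


if __name__ == "__main__":
    measured = 89875.0  # Pa, roughly the standard pressure at 1000 m
    # Same measurement, two different assumed sea-level pressures.
    print(indicated_altitude_m(measured))
    print(indicated_altitude_m(measured, sea_level_pa=100000.0))
```

The two printed values differ by roughly a hundred metres even though the sensor reading is identical, purely because of the assumed reference pressure.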
Re:What about Google driverless car? by thisnamestoolong · 2011-12-19 17:49 · Score: 4, Interesting

This is such a common fallacy -- we would expect an AI driver to be fucking perfect before we would ever call it "safe". Sure, they will have bugs, and people will die. But they will have nowhere near as many bugs as the meat computer that we have in our heads. Amazing as it is, the human brain is simply not meant for the types of tasks that we often apply it to, and as such, tens of thousands of people die on the road each year. Even if the adoption of driverless cars cut that down to 1% of the current death rate, people would still be screaming about the cars killing us. George Carlin was right; some people are really fuckin' stupid.

--
To the haters: You can't win. If you mod me down, I shall become more powerful than you could possibly imagine
Re:What about Google driverless car? by DigiShaman · 2011-12-19 19:00 · Score: 4, Interesting

I think you're looking at it all wrong. This has nothing to do with comparative death ratios. This has everything to do with liability. At the end of the day, people want a legitimate target to point their finger at, regardless of whether the injury or death could have been prevented. If people are allowed to take Google to court and render justice, then I'm sure this new automated driving technology would be ok in their minds. OTOH, if Google is given sanctuary from public lawsuits, hell no!

--
Life is not for the lazy.
I had the problem once by Anonymous Coward · 2011-12-19 19:33 · Score: 5, Interesting

Posting anon because I moderated.
I had a very similar problem once with firmware on a TI DSP. The symptom was that a Peltier element for controlling laser temperature would sometimes freak out and start burning so hot that the solder melted. After some debugging, it turned out that somewhere between the EEPROM holding the setpoint, and the AD converter, the setpoint value got corrupted.
The cause turned out to be a 32-bit variable that was un-initialized, but always set to 0 by the stack initialization code.
Only the first 16 bits were filled in because that was the value stored in the EEPROM. The programming bug was that the other 16 bits were left as is. More than 99% of the time, this was not a problem. But if a specific interrupt happened at exactly the wrong moment during initialization of the stack variable, that variable was filled with garbage from an interrupt register value. Since the calculations for the setpoint used the entire 32 bits (it was integer math) it came out with a ridiculously high setpoint.
Having had to debug that, I know how hard it can be if your bug depends on what is going on inside the CPU or related to interrupts.
There may only be a window of less than a microsecond for this bug to happen, so reproduction could be nigh on impossible.
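The half-written 32-bit variable described above can be imitated with plain integer masks. This is a simulation under assumptions, not the original DSP firmware: the function names and values are hypothetical, and it assumes "the first 16 bits" means the low half of the word.

```python
MASK32 = 0xFFFFFFFF


def load_setpoint_buggy(stale_word, eeprom16):
    """Overwrite only the low 16 bits of a 32-bit variable.

    The high 16 bits keep whatever happened to be in memory
    (`stale_word`), which is the bug.
    """
    return ((stale_word & 0xFFFF0000) | (eeprom16 & 0xFFFF)) & MASK32


def load_setpoint_fixed(stale_word, eeprom16):
    """Zero-extend the 16-bit EEPROM value, ignoring stale memory."""
    return eeprom16 & 0xFFFF


if __name__ == "__main__":
    setpoint = 0x0C80  # hypothetical 16-bit setpoint read from EEPROM

    # Usual case: the memory was zeroed beforehand, so the bug is invisible.
    print(hex(load_setpoint_buggy(0x00000000, setpoint)))
    # Rare case: an interrupt left garbage in the variable first.
    print(hex(load_setpoint_buggy(0x7F3A0000, setpoint)))
    # The fix gives the same answer regardless of what was there before.
    print(hex(load_setpoint_fixed(0x7F3A0000, setpoint)))
```

Both buggy calls run exactly the same code; only the leftover memory differs, which is why the fault depends on interrupt timing and is so hard to reproduce.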
Re:What about Google driverless car? by EdIII · 2011-12-19 19:36 · Score: 4, Interesting

It's not about choosing one or the other, but hybrid systems operating at the same time.
If you are going to compare quality, the human will win every time. We can give anecdotal evidence about how bad drivers are, but statistics show that driving is not so dangerous that we need to consider stopping it altogether. Really think about it for a second. During your average day, how many really bad drivers did you personally interact with that created a dangerous situation resulting in an accident? Pretty low huh? I would expect so, otherwise insurance would cost thousands and thousands per month, instead of per year.
Humans are not the inferior solution overall right now. Not by far.
It is also not because Google is imperfect. Specifically, it is because of the time required, and the complexity of shifting control from Google to the driver. Once such a system becomes normal to a driver, their attention is not going to be on the road, but on their interaction with other devices. You cannot reasonably expect a person to be in complete awareness, hands at 10-2, ready in a split second to take control. You would get too bored without immediate feedback, your mind would drift. This would be completely normal too.
This is not to say that the system itself might not be useful, but it would have to be under very controlled conditions excluding human drivers altogether. It could work, provided the shifting of control was at a controlled rate in relatively controlled conditions. Give the human being time to adapt and obtain situational awareness.
As cool as this sounds, it is just not ready to fully replace a human, unless it could perform at a human level or better. The dream of a car that can drive itself completely under all conditions is still some ways away.
The idea of changing carpool lanes over to high efficiency lanes where human control is not allowed seems like a more pragmatic approach that decreases the complexity and uncertainty that the Google system has to deal with. It has very high value as well since it can optimize traffic patterns far better than a human simply because it can cooperate with a much larger number of cars over greater distances. A human could never hope to do that with our inherent limitations.
That system could realize some serious fuel savings and increase productivity by essentially mimicking an airplane in auto pilot mode. The human is really just there to get the system to the point where it can safely transition in and out of a computer controlled lane. That will be extremely advantageous to overall traffic.
Re:What about Google driverless car? by Anonymous Coward · 2011-12-19 19:59 · Score: 4, Interesting

The Airbus will also change the throttle to the engines without moving the throttle levers, whereas the Boeing will move the levers to where the computer set the throttle. When the autopilot takes a crap and you put your hands on the throttle, you must remember that the controls are lying to you and act accordingly.
Re:What about Google driverless car? by andyn · 2011-12-19 20:57 · Score: 5, Interesting

Back in my Finnish Air Force days I talked to a captain who had flown the F-18C in his last three active flight years. He told me that when you're straight and level in the Hornet and peek over your shoulder, you'll probably see the ailerons swaying back and forth as the computer tries to keep the plane stable.
Re:This is why I like fuzzing by MartinSchou · 2011-12-19 23:20 · Score: 4, Interesting

In theory some permutation of the data should eventually resemble what you describe.
True ... but you may not ever have enough time to hit all the corner cases.
If it's a single 32-bit word, that can cause the issue, then yes, you can go through every single permutation fairly quickly. There are only 4,294,967,296 of them - nothing that a computer can't handle.
Suppose for a moment that the issue is caused, not by one single faulty piece of data, but by two right after each other. Essentially a 64-bit word causes the issue. Now we're looking at 18,446,744,073,709,551,616. Quite a bit more, but not impossible to test.
Now suppose that the first 64-bit word doesn't cause the fault on its own, but "simply" causes an instability in the software. That instability will be triggered by another specific 64-bit word. Now we're looking at 3.40282367 x 10^38 permutations.
Now, keep in mind that at this point, we're really looking at a fairly simple error triggered by two pieces of data. One sets it up, the other causes the fault.
Now let's make it slightly more complex.
The actual issue is caused by two different error conditions happening at once. If they are similar as above, we're now looking at, essentially, a 256-bit word. That's 1.15792089 x 10^77 permutations.
In comparison, the world's fastest supercomputer can do 10.51 petaflops, which is 10.51 x 10^15 operations per second, and it would take that computer 0.409 microseconds to go through all permutations in a 32-bit word. About 30 minutes for a 64-bit word. 10^15 years for a 128-bit word and 10^53 years for a 256-bit word.
Yes, you can test every single permutation, if the problem is small enough. But the problem with most software is that it really isn't small.
Even if we are only talking 32 bit words causing the issue, will it happen every time that single word is issued, or do you need specific conditions? How is that condition created? As soon as the issue becomes even slightly complex, it becomes essentially impossible to test for.
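The figures in this comment can be re-derived with a few lines of arithmetic. The sketch below makes the same generous assumption the comment does, namely that one test case costs a single floating-point operation on a 10.51 petaflops machine; the function names are made up for illustration.

```python
OPS_PER_SECOND = 10.51e15              # 10.51 petaflops, as quoted above
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def exhaustive_test_seconds(bits, ops_per_second=OPS_PER_SECOND):
    """Seconds needed to try every value of a `bits`-wide input,
    assuming (very optimistically) one test per operation."""
    return 2 ** bits / ops_per_second


def exhaustive_test_years(bits, ops_per_second=OPS_PER_SECOND):
    """Same as above, expressed in years."""
    return exhaustive_test_seconds(bits, ops_per_second) / SECONDS_PER_YEAR


if __name__ == "__main__":
    for bits in (32, 64, 128, 256):
        seconds = exhaustive_test_seconds(bits)
        print(f"{bits:3d} bits: {2 ** bits:.6e} permutations, "
              f"{seconds:.3e} s, {exhaustive_test_years(bits):.3e} years")
```

Each doubling of the input width squares the number of cases, which is why the jump from 64 to 128 bits moves the job from minutes to far longer than the age of the universe.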