Software Bug Caused Qantas Airbus A330 To Nose-Dive
pdcull writes "According to Stuff.co.nz, the Australian Transport Safety Bureau found that a software bug was responsible for a Qantas Airbus A330 nose-diving twice while at cruising altitude, injuring 12 people seriously and causing 39 to be taken to the hospital. The event, which happened three years ago, was found to be caused by an airspeed sensor malfunction, linked to a bug in an algorithm which 'translated the sensors' data into actions, where the flight control computer could put the plane into a nosedive using bad data from just one sensor.' A software update was installed in November 2009, and the ATSB concluded that 'as a result of this redesign, passengers, crew and operators can be confident that the same type of accident will not reoccur.' I can't help wondering just how a piece of code, which presumably didn't test its input data for validity before acting on it, could become part of a modern jet's onboard software suite?"
There were people against airbags, too, because they killed some people who otherwise wouldn't have died. You work on fixing those things. But whether the system as a whole is worthwhile is judged on whether it saves more than it kills.
"I can't help wondering just how could a piece of code, which presumable didn't test its' input data for validity before acting on it, become part of a modern jet's onboard software suit?""
How about reading the darned final report, conveniently linked in your own blurb? There was lots of validity checking. In fact, some of it was relatively recently changed, and that accidentally introduced this failure mode (the 1.2-second data spike holdover). (Also, how about someone spell-checking submissions?)
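For anyone wondering what validity checking across redundant sensors can look like in general, here is a minimal sketch of median voting over three readings. To be clear, this is not the A330's algorithm from the report; the function name, the threshold, and the readings are all made up for illustration.

```python
from statistics import median

def voted_value(readings, max_deviation):
    """Return the median of redundant sensor readings, plus the indices of
    readings that differ from that median by more than max_deviation."""
    mid = median(readings)
    outliers = [i for i, r in enumerate(readings) if abs(r - mid) > max_deviation]
    return mid, outliers

# Hypothetical angle-of-attack readings in degrees; sensor 0 is spiking.
value, suspects = voted_value([50.6, 2.1, 2.3], max_deviation=1.0)
print(value, suspects)
```

With three sensors, one wild reading cannot move the median, which is the property a scheme like this is after. The hard part, as this incident shows, is deciding what to do over time when a sensor keeps misbehaving.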
It's so interesting to see people's reaction to the whole driver-less car thing. It's incredible to see the kind of ethical thought-experiment that must necessarily go through everyone's mind when they come to this conclusion: How many lives must be saved before I will tolerate someone being brutally slain by a malfunctioning computer?
Every day, children are run down by drivers who are not paying attention, tired, drunk, or just plain don't have time to react. Since a driver-less car is incapable of being drunk, tired, or distracted, it's a safe bet that it will be much better at avoiding those accidents that can be avoided. But the reality is that the last scenario (no time to react) would still lead to the deaths of many children (and others!).
At what point does it become "worth it"? When the driver-less car causes 1/10th as many fatalities? 1/100th? 1/1,000th? How many human deaths must be prevented by letting computers drive cars before we're willing to accept 1 single death by those same computers?
It's a real-life example of the "Trolley Problem":
http://en.wikipedia.org/wiki/Trolley_problem
Yup. All the while forgetting that while the altimeter shows altitude, it rarely actually measures distance to the ground; it measures air pressure, and then assumes an awful lot.
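To make the "assumes an awful lot" part concrete, here is a rough sketch of how a pressure reading turns into an altitude under the International Standard Atmosphere model. It assumes the standard temperature profile and is only valid in the troposphere (below roughly 11 km); the function and parameter names are mine, not from any avionics system.

```python
T0 = 288.15          # ISA sea-level temperature, K
LAPSE = 0.0065       # ISA temperature lapse rate, K/m
EXPONENT = 0.190263  # R*L/(g*M) for the ISA troposphere

def pressure_altitude_m(pressure_hpa, reference_hpa=1013.25):
    """Altitude in metres implied by a static pressure, given an assumed
    reference (sea-level) pressure and the ISA temperature profile."""
    return (T0 / LAPSE) * (1.0 - (pressure_hpa / reference_hpa) ** EXPONENT)

# The same measured pressure gives different "altitudes" depending on the
# reference pressure that has been dialed in.
print(pressure_altitude_m(900.0))
print(pressure_altitude_m(900.0, reference_hpa=1000.0))
```

Nothing in that calculation knows where the ground is. If the real atmosphere differs from the model, or the reference pressure is set wrong, the displayed altitude is off accordingly.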
Posting anon because I moderated.
I had a very similar problem once with firmware on a TI DSP. The symptom was that a Peltier element for controlling laser temperature would sometimes freak out and start burning so hot that the solder melted. After some debugging, it turned out that somewhere between the EEPROM holding the setpoint and the A/D converter, the setpoint value got corrupted.
The cause turned out to be a 32-bit variable that was uninitialized, but always set to 0 by the stack initialization code.
Only the first 16 bits were filled in, because that was the value stored in the EEPROM. The programming bug was that the other 16 bits were left as is. More than 99% of the time, this was not a problem. But if a specific interrupt happened at exactly the wrong moment during initialization of the stack variable, that variable was filled with garbage from an interrupt register value. Since the calculations for the setpoint used the entire 32 bits (it was integer math), it came out with a ridiculously high setpoint.
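A small simulation of that failure mode, for illustration only. The real firmware was not Python, and the names and values here are hypothetical. It also assumes the "first 16 bits" are the low half of the word.

```python
def load_setpoint_buggy(eeprom_word, stale_slot):
    """Write only the low 16 bits; keep whatever was in the high 16 bits."""
    return (stale_slot & 0xFFFF0000) | (eeprom_word & 0xFFFF)

def load_setpoint_fixed(eeprom_word, stale_slot):
    """Overwrite all 32 bits, so stale contents cannot leak through."""
    return eeprom_word & 0xFFFF

EEPROM_SETPOINT = 0x0320  # hypothetical 16-bit setpoint read from EEPROM

# Usual case: the slot happened to be zeroed, so the bug stays hidden.
normal = load_setpoint_buggy(EEPROM_SETPOINT, stale_slot=0x00000000)

# Rare case: an interrupt left garbage in the slot just before the load.
corrupted = load_setpoint_buggy(EEPROM_SETPOINT, stale_slot=0x1A2B0000)
repaired = load_setpoint_fixed(EEPROM_SETPOINT, stale_slot=0x1A2B0000)

print(hex(normal), hex(corrupted), hex(repaired))
```

The buggy version is correct only as long as something else zeroes the slot first, which is why it passed nearly every time.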
Having had to debug that, I know how hard it can be if your bug depends on what is going on inside the CPU or related to interrupts.
There may only be a window of less than a microsecond for this bug to happen, so reproduction could be nigh on impossible.
Back in my Finnish Air Force days I talked to a captain who had flown the F-18C in his last three active flight years. He told me that when you're straight and level in the Hornet and peek over your shoulder, you will probably see the ailerons swaying back and forth as the computer tries to keep the plane stable.