Software Bug Caused Qantas Airbus A330 To Nose-Dive
pdcull writes "According to Stuff.co.nz, the Australian Transport Safety Bureau found that a software bug was responsible for a Qantas Airbus A330 nose-diving twice while at cruising altitude, injuring 12 people seriously and causing 39 to be taken to the hospital. The event, which happened three years ago, was found to be caused by an airspeed sensor malfunction, linked to a bug in an algorithm which 'translated the sensors' data into actions, where the flight control computer could put the plane into a nosedive using bad data from just one sensor.' A software update was installed in November 2009, and the ATSB concluded that 'as a result of this redesign, passengers, crew and operators can be confident that the same type of accident will not reoccur.' I can't help wondering just how a piece of code, which presumably didn't test its input data for validity before acting on it, could become part of a modern jet's onboard software suite?"
There were people against airbags, too, because they killed some people who otherwise wouldn't have died. You work on fixing those things. But whether the system as a whole is worthwhile is judged on whether it saves more than it kills.
"I can't help wondering just how could a piece of code, which presumable didn't test its' input data for validity before acting on it, become part of a modern jet's onboard software suit?"
How about reading the darned final report, conveniently linked in your own blurb? There was lots of validity checking. In fact, some of it was relatively recently changed, and that accidentally introduced this failure mode (the 1.2-second data spike holdover). (Also, how about someone spell-checking submissions?)
It's so interesting to see people's reaction to the whole driver-less car thing. It's incredible to see the kind of ethical thought-experiment that must necessarily go through everyone's mind when they come to this conclusion: How many lives must be saved before I will tolerate someone being brutally slain by a malfunctioning computer?
Every day, children are run down by drivers who are not paying attention, tired, drunk, or just plain don't have time to react. Since a driver-less car is incapable of being drunk, tired, or distracted, then it's a safe bet that they'll be much better at avoiding those accidents that can be avoided. But the reality is that the latter scenario (no time to react) would still lead to the deaths of many children (and others!).
At what point does it become "worth it"? When the driver-less car causes 1/10th as many fatalities? 1/100th? 1/1,000th? How many human deaths must be prevented by letting computers drive cars before we're willing to accept 1 single death by those same computers?
It's a real-life example of the "Trolley Problem"
http://en.wikipedia.org/wiki/Trolley_problem
Nothing in software is ever free of bugs. Just because it's a bug-fix doesn't preclude the bug-fix itself (or its side effects) from introducing new bugs, or from being an incomplete fix which just happens to pass whatever inadequate test was thrown at it.
Yup. All the while forgetting that while the altimeter shows altitude, it rarely actually measures distance to the ground; it measures air pressure, and then assumes an awful lot.
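To make the "assumes an awful lot" point concrete, here is a minimal sketch of how a barometric altitude can be derived from static pressure using the International Standard Atmosphere model. It is a simplified illustration, not any particular avionics implementation: the function name and the way the altimeter setting is applied are assumptions, and the formula only holds in the troposphere.

```python
# Simplified barometric altitude under the International Standard Atmosphere.
# Illustrative only: real altimeters and air data computers are more involved.

T0 = 288.15          # assumed sea-level temperature, K (ISA)
LAPSE = 0.0065       # assumed temperature lapse rate, K/m (ISA)
EXPONENT = 0.190263  # R*L/(g*M) for dry air (ISA)


def indicated_altitude_m(static_pressure_hpa, altimeter_setting_hpa=1013.25):
    """Altitude in metres implied by a static pressure reading.

    Nothing here measures distance to the ground: the result depends
    entirely on the assumed atmosphere and on the altimeter setting.
    """
    ratio = static_pressure_hpa / altimeter_setting_hpa
    return (T0 / LAPSE) * (1.0 - ratio ** EXPONENT)


# Same measured pressure, two different altimeter settings.
standard = indicated_altitude_m(900.0)
low_setting = indicated_altitude_m(900.0, altimeter_setting_hpa=1003.25)
print(f"standard setting: {standard:.0f} m, lower setting: {low_setting:.0f} m")
```

The same pressure reading yields altitudes that differ by several tens of metres depending on the setting alone, before any deviation of the real atmosphere from the standard model is considered.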
This is such a common fallacy -- we would expect an AI driver to be fucking perfect before we would ever call it "safe". Sure, they will have bugs, and people will die. But they will have nowhere near as many bugs as the meat computer that we have in our heads. Amazing as it is, the human brain is simply not meant for the types of tasks that we often apply it to, and as such, tens of thousands of people die on the road each year. Even if the adoption of driverless cars cut that down to 1% of the current death rate, people would still be screaming about the cars killing us. George Carlin was right; some people are really fuckin' stupid.
To the haters: You can't win. If you mod me down, I shall become more powerful than you could possibly imagine
I think you're looking at it all wrong. This has nothing to do with comparative death ratios. This has everything to do with liability. At the end of the day, people want a legitimate target to point their finger at, regardless of whether the injury or death could have been prevented. If people are allowed to take Google to court and render justice, then I'm sure this new automated driving technology would be ok in their minds. OTOH, if Google is given sanctuary from public lawsuits, hell no!
Life is not for the lazy.
Posting anon because I moderated.
I had a very similar problem once with firmware on a TI DSP. The symptom was that a Peltier element for controlling laser temperature would sometimes freak out and start burning so hot that the solder melted. After some debugging, it turned out that somewhere between the EEPROM holding the setpoint and the AD converter, the setpoint value got corrupted.
The cause turned out to be a 32-bit variable that was uninitialized, but always set to 0 by the stack initialization code.
Only the first 16 bits were filled in because that was the value stored in the EEPROM. The programming bug was that the other 16 bits were left as is. More than 99% of the time, this was not a problem. But if a specific interrupt happened at exactly the wrong moment during initialization of the stack variable, that variable was filled with garbage from an interrupt register value. Since the calculations for the setpoint used the entire 32 bits (it was integer math), it came out with a ridiculously high setpoint.
Having had to debug that, I know how hard it can be if your bug depends on what is going on inside the CPU or related to interrupts.
There may only be a window of less than a microsecond for this bug to happen, so reproduction could be nigh on impossible.
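A minimal sketch of the failure described above, simulated in Python rather than actual DSP firmware. The function names, the example values, and the assumption that the EEPROM value occupies the low 16 bits are all made up for illustration.

```python
# Simulation of a 32-bit variable where only 16 bits are ever written.
# Hypothetical names and values; not the original firmware.

def load_setpoint_buggy(eeprom_value_16, stale_word_32):
    """Write only the low 16 bits; the high 16 keep whatever was in memory."""
    return (stale_word_32 & 0xFFFF0000) | (eeprom_value_16 & 0xFFFF)


def load_setpoint_fixed(eeprom_value_16, stale_word_32):
    """Assign the whole 32-bit word, so stale high bits cannot survive."""
    return eeprom_value_16 & 0xFFFF


EEPROM_SETPOINT = 0x0C80  # hypothetical 16-bit setpoint read from EEPROM

# Usual case: the stack word happened to be zeroed, so the bug is invisible.
print(load_setpoint_buggy(EEPROM_SETPOINT, 0x00000000))

# Rare case: an interrupt left garbage in the word just before the load.
print(load_setpoint_buggy(EEPROM_SETPOINT, 0x7F3A1234))

# With the fix, the stale contents no longer matter.
print(load_setpoint_fixed(EEPROM_SETPOINT, 0x7F3A1234))
```

The buggy version gives the right answer whenever the stale word happens to be zero, which is why ordinary testing would be unlikely to catch it.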
It's not about choosing one or the other, but hybrid systems operating at the same time.
If you are going to compare quality, the human will win every time. We can give anecdotal evidence about how bad drivers are, but statistics show that driving is not so dangerous that we need to consider stopping it altogether. Really think about it for a second. During your average day, how many really bad drivers did you personally interact with that created a dangerous situation resulting in an accident? Pretty low huh? I would expect so, otherwise insurance would cost thousands and thousands per month, instead of per year.
Humans are not the inferior solution overall right now. Not by far.
Nor is it because Google is not perfect. Specifically, it is because of the time required, and the complexity of shifting control from Google to the driver. Once such a system becomes normal to a driver, their attention is not going to be on the road, but on their interaction with other devices. You cannot reasonably expect a person to be in complete awareness, hands at 10-2, ready in a split second to take control. You would get too bored without immediate feedback, and your mind would drift. This would be completely normal too.
This is not to say that the system itself might not be useful, but it would have to be under very controlled conditions excluding human drivers altogether. It could work, provided the shifting of control was at a controlled rate in relatively controlled conditions. Give the human being time to adapt and obtain situational awareness.
As cool as this sounds, it is just not ready to fully replace a human, unless it could perform at a human level or better. The dream of a car that can drive itself completely under all conditions is still some ways away.
The idea of changing carpool lanes over to high efficiency lanes where human control is not allowed seems like a more pragmatic approach that decreases the complexity and uncertainty that the Google system has to deal with. It has very high value as well since it can optimize traffic patterns far better than a human simply because it can cooperate with a much larger number of cars over greater distances. A human could never hope to do that with our inherent limitations.
That system could realize some serious fuel savings and increase productivity by essentially mimicking an airplane in autopilot mode. The human is really just there to get the system to the point where it can safely transition in and out of a computer-controlled lane. That will be extremely advantageous to overall traffic.
The Airbus will also change the throttle to the engines without moving the throttle levers, whereas the Boeing will move the levers to where the computer set the throttle. When the autopilot takes a crap and you put your hands on the throttle, you must remember that the controls are lying to you and act accordingly.
Back in my Finnish Air Force days I talked to a captain who had flown the F-18C in his last three active flight years. He told me that when you're straight and level in the Hornet and peek over your shoulder, you'll probably see the ailerons swaying back and forth as the computer tries to keep the plane stable.
True ... but you may not ever have enough time to hit all the corner cases.
If it's a single 32-bit word that can cause the issue, then yes, you can go through every single permutation fairly quickly. There are only 4,294,967,296 of them - nothing that a computer can't handle.
Suppose for a moment that the issue is caused, not by one single faulty piece of data, but by two right after each other. Essentially a 64-bit word causes the issue. Now we're looking at 18,446,744,073,709,551,616. Quite a bit more, but not impossible to test.
Now suppose that the first 64-bit word doesn't cause the fault on its own, but "simply" causes an instability in the software. That instability will be triggered by another specific 64-bit word. Now we're looking at 3.40282367 x 10^38 permutations.
Now, keep in mind that at this point, we're really looking at a fairly simple error triggered by two pieces of data. One sets it up, the other causes the fault.
Now let's make it slightly more complex.
The actual issue is caused by two different error conditions happening at once. If they are similar as above, we're now looking at, essentially, a 256-bit word. That's 1.15792089 x 10^77 permutations.
In comparison, the world's fastest supercomputer can do 10.51 petaflops, which is 10.51 x 10^15 operations per second. Assuming one permutation tested per operation, it would take that computer 0.409 microseconds to go through all permutations in a 32-bit word, about 30 minutes for a 64-bit word, 10^15 years for a 128-bit word, and 10^53 years for a 256-bit word.
Yes, you can test every single permutation, if the problem is small enough. But the problem with most software is that it really isn't small.
Even if we are only talking about 32-bit words causing the issue, will it happen every time that single word is issued, or do you need specific conditions? How is that condition created? As soon as the issue becomes even slightly complex, it becomes essentially impossible to test for.
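The arithmetic in the comment above can be checked with a few lines of Python. This is a back-of-the-envelope sketch that assumes, generously, one input tested per floating-point operation at the quoted 10.51 petaflops; the constant and function names are illustrative.

```python
# Rough time to exhaustively test every value of an n-bit input.
# Assumption: one input tested per floating-point operation.

TESTS_PER_SECOND = 10.51e15             # 10.51 petaflops, as quoted above
SECONDS_PER_YEAR = 365.25 * 24 * 3600   # Julian year


def exhaustive_seconds(bits, rate=TESTS_PER_SECOND):
    """Seconds needed to try all 2**bits inputs at the given rate."""
    return 2 ** bits / rate


for bits in (32, 64, 128, 256):
    seconds = exhaustive_seconds(bits)
    years = seconds / SECONDS_PER_YEAR
    print(f"{bits:3d} bits: {2 ** bits:.3e} inputs, "
          f"{seconds:.3e} s, {years:.3e} years")
```

Every doubling of the input width squares the number of cases, which is why the figures go from well under a second to far beyond any practical time scale in just three steps.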