Bad Code May Have Crashed Schiaparelli Mars Lander (nature.com)
cadogan west writes "In accordance with the longstanding tradition of bad software wrecking space probes (see Mariner 1), it appears a coding bug crashed the ESA's latest attempt to land on Mars." Nature reports:
Thrusters, designed to decelerate the craft for 30 seconds until it was metres off the ground, engaged for only around 3 seconds before they were commanded to switch off, because the lander's computer thought it was on the ground. The lander even switched on its suite of instruments, ready to record Mars's weather and electrical field, although they did not collect data...
The most likely culprit is a flaw in the craft's software or a problem in merging the data coming from different sensors, which may have led the craft to believe it was lower in altitude than it really was, says Andrea Accomazzo, ESA's head of solar and planetary missions. Accomazzo says that this is a hunch; he is reluctant to diagnose the fault before a full post-mortem has been carried out... But software glitches should be easier to fix than a fundamental problem with the landing hardware, which ESA scientists say seems to have passed its test with flying colours.
Working in a company that makes automotive electronics, I can say that any problem without an obvious hardware assembly cause becomes defined as a software problem.
Faulty sensor causing false readings that cause the software to detect that the craft is on the ground? That's the software's fault for not detecting that the sensor was faulty and using magic as a backup method to get the right result.
If the people writing the simulation are too close to the people writing the control software, I can see this happening.
When I worked on this stuff (and I did, including a Mars probe) we had three independent teams on different sides of the building, each with their own set of requirements, design, code and tests. Not only that, but the development environments and languages were different to avoid common mode bugs. Fun times; I have no idea how things are done today.
Usually when that sort of thing happens, it's not because the programmer did something obviously wrong. It's usually because the programmer had two (or more) competing scenarios to design for. He tried to design something which would split the difference, and ended up erring too much to one side.
Lufthansa flight 2904 is a good example. The plane had to land in an expected crosswind on a wet runway. A crosswind landing requires landing with the plane's orientation misaligned from the runway. The plane is pointed into the crosswind, so it is actually landing diagonally; when it touches down it has to quickly yaw so it's aligned with the runway (so the wheels are pointed in the right direction). To do this, it lands on one gear first, pivots on that gear to point the nose at the end of the runway, then drops the second gear, then the nose gear.
The A320's flight computer was programmed to avoid the disastrous scenario of a thrust reverser deploying in mid-air. It prohibited deployment of the thrust reversers unless both main landing gear had 6.3 tons of force each on them. Full deployment of the spoilers (which disrupt lift to plant the plane firmly on the ground) was prohibited unless the 6.3-ton criterion was met or the wheels were spinning faster than 72 knots.
Unfortunately, in flight 2904's case, the crosswind landing maneuver placed most of the initial force on a single landing gear, so the thrust reversers didn't deploy. The wet runway caused hydroplaning, which kept the wheels from spinning up past 72 knots, so the spoilers failed to deploy either, hindering the pilots from getting the second landing gear down. By the time the above criteria were met and the plane began slowing down, it was well past the halfway point of the runway, and it ended up going off the end. Design criteria selected to prevent one type of accident inadvertently caused another.
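To make the trap concrete, here is a rough sketch of the interlock rules as described above. It is not Airbus's actual logic. The names (GearState, reversers_allowed, spoilers_allowed) and the sample sensor values are invented for illustration; only the 6.3-ton and 72-knot thresholds come from the description.

```python
from dataclasses import dataclass

# Thresholds as stated in the comment above.
LOAD_THRESHOLD_TONS = 6.3
WHEEL_SPEED_THRESHOLD_KT = 72.0


@dataclass
class GearState:
    """Hypothetical snapshot of the sensor inputs the interlock looks at."""
    left_load_tons: float
    right_load_tons: float
    wheel_speed_kt: float


def weight_on_both_gear(state: GearState) -> bool:
    # Both main gear must each carry at least the threshold load.
    return (state.left_load_tons >= LOAD_THRESHOLD_TONS
            and state.right_load_tons >= LOAD_THRESHOLD_TONS)


def reversers_allowed(state: GearState) -> bool:
    # Thrust reversers: weight criterion only.
    return weight_on_both_gear(state)


def spoilers_allowed(state: GearState) -> bool:
    # Spoilers: weight criterion OR wheels spinning fast enough.
    return (weight_on_both_gear(state)
            or state.wheel_speed_kt > WHEEL_SPEED_THRESHOLD_KT)


# Made-up numbers for a one-gear touchdown on a wet runway:
# one strut loaded, the other barely touching, wheels hydroplaning.
one_gear_wet = GearState(left_load_tons=0.5, right_load_tons=9.0,
                         wheel_speed_kt=40.0)

# Made-up numbers for a normal touchdown with both gear planted.
both_gear_dry = GearState(left_load_tons=8.0, right_load_tons=8.0,
                          wheel_speed_kt=120.0)

for name, state in [("one gear, wet", one_gear_wet),
                    ("both gear, dry", both_gear_dry)]:
    print(name, "reversers:", reversers_allowed(state),
          "spoilers:", spoilers_allowed(state))
```

Each rule is reasonable on its own. The trouble is that the one-gear, wet-runway case fails every condition at once, so the plane is on the ground while the logic still treats it as airborne.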