ESA: European Mars Lander Crash Caused By 1-Second Glitch (space.com)
An anonymous reader quotes a report from Space.com: The European Space Agency (ESA) on Nov. 23 said its Schiaparelli lander's crash landing on Mars on Oct. 19 followed an unexplained saturation of its inertial measurement unit (IMU), which delivered bad data to the lander's computer and forced a premature release of its parachute. Polluted by the IMU data, the lander's computer apparently thought it had either already landed or was just about to land. The parachute system was released, the braking thrusters were fired only briefly and the on-ground systems were activated. Instead of being on the ground, Schiaparelli was still 2.3 miles (3.7 kilometers) above the Mars surface. It crashed, but not before delivering what ESA officials say is a wealth of data on entry into the Mars atmosphere, the functioning and release of the heat shield and the deployment of the parachute -- all of which went according to plan. In its Nov. 23 statement, ESA said the saturation reading from Schiaparelli's inertial measurement unit lasted only a second but was enough to play havoc with the navigation system. ESA said the sequence of events "has been clearly reproduced in computer simulations of the control system's response to the erroneous information." ESA's director of human spaceflight and robotic exploration, David Parker, said in a statement that ExoMars teams are still sifting through the voluminous data harvest from the Schiaparelli mission, and that an external, independent board of inquiry, now being created, would release a final report in early 2017.
Man, if I had a nickel for every time some kind of sensory saturation forced a premature release...
I've calculated my velocity with such exquisite precision that I have no idea where I am.
They're blaming lag?
https://en.wikipedia.org/wiki/...
How in hell did they test their Kalman filter to allow such bad data to reach the decision logic? (I assume they used one.)
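For anyone wondering what "keeping bad data away from the decision logic" might look like, here is a minimal sketch of innovation gating on a scalar Kalman filter. It is only an illustration, not ESA's design: the class name `GatedScalarKalman`, the constant-state model, and every noise and gate value are assumptions made up for the example. A real descent filter would at least carry velocity in the state.

```python
from dataclasses import dataclass


@dataclass
class GatedScalarKalman:
    """Scalar Kalman filter that rejects implausible measurements.

    Hypothetical example: all default values are illustrative only.
    """
    x: float                 # state estimate, e.g. altitude in metres
    p: float                 # variance of the estimate
    q: float = 1.0           # process noise variance (assumed)
    r: float = 25.0          # measurement noise variance (assumed)
    gate_sigma: float = 3.0  # reject innovations beyond this many sigmas

    def step(self, z: float) -> bool:
        """Process one measurement; return True if it was accepted."""
        # Predict: constant-state model, so only the uncertainty grows.
        self.p += self.q
        innovation = z - self.x
        s = self.p + self.r  # innovation variance
        # Gate: skip the update if the measurement is wildly unexpected.
        if innovation * innovation > (self.gate_sigma ** 2) * s:
            return False
        # Update with the standard scalar Kalman gain.
        k = self.p / s
        self.x += k * innovation
        self.p *= (1.0 - k)
        return True


kf = GatedScalarKalman(x=3700.0, p=100.0)
accepted_good = kf.step(3695.0)  # plausible reading, gets folded in
accepted_bad = kf.step(-50.0)    # absurd reading, gets rejected
```

The catch is that a gate only says "ignore this sample". If the sensor stays saturated for a full second, the estimate coasts on the prediction the whole time, so the model behind the prediction has to be good enough to survive that.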
Government cannot make man richer, but it can make him poorer. - Ludwig von Mises
Overflows and bad data problems happened to ESA before.
"Obligatory" Dark Star reference.
When the altitude stops changing for a whole second the filter is going to have to be a long one! And that ain't desirable for responsive control.
The real question is how could the sensory processor have overloaded in the first place? My money is on simple code bloat. I.e., they used a bunch of generic libraries that use further libraries that use further libraries that use further libraries that use further libraries that use further libraries ...
So they didn't correlate the IMU data with ranging radar or even barometric altitude information so as to avoid this?
I know weight and volume are at a premium on such craft, but a barometric sensor (even one capable of operating in Mars's rarefied atmosphere) is the size of a thumbnail and weighs just a fraction of a gram.
Sigh!
Should've used metric seconds.
systemd is Roko's Basilisk.
> ...control software spat an Ada stack trace over a line...
Eh, no. The failure of the INS's control software caused the INS to send diagnostic data (rather than sensor data) to the control systems, which then did what they _thought_ they were being commanded to do.
None of the code in the system was modified in flight.
"[T]he erroneous information generated an estimated altitude that was negative," ESA said.
Which resulted in an actual altitude that was negative.
I wonder whether making the source code of these probes available to the public for vetting would help spot bugs like these? I am also curious whether releasing the code would be problematic for any reason?
Jumpstart the tartan drive.
And then, what? Should it take an "instantaneous" decision about what to do next?
So say they're doing some kind of weighted average of an altitude computation from the inertial navigation unit and an altitude computation from the Doppler radar altimeter.
They should have some code in there saying: If these two values that we're averaging are wildly off from each other, let's not take the average. Instead, let's go into some exception handling code which uses some kind of heuristic (and a little time perhaps) to determine which of the two instruments should become the solely trusted source of the altitude value.
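Roughly, in code, the idea might look like the sketch below. Everything in it is hypothetical: the name `AltitudeFusion`, the 50/50 weighting, the 500 m disagreement threshold, and the fallback heuristic (trust whichever instrument is closer to the last accepted estimate, or the radar if there is no history yet) are assumptions for illustration, not anything known about Schiaparelli's actual software.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AltitudeFusion:
    """Blend two altitude sources unless they disagree wildly.

    Hypothetical example: weights and thresholds are illustrative only.
    """
    imu_weight: float = 0.5            # assumed weighting of the IMU value
    max_disagreement_m: float = 500.0  # assumed plausibility threshold
    last_estimate_m: Optional[float] = None

    def update(self, imu_alt_m: float, radar_alt_m: float) -> float:
        if abs(imu_alt_m - radar_alt_m) <= self.max_disagreement_m:
            # Normal case: the sources agree, so take the weighted average.
            estimate = (self.imu_weight * imu_alt_m
                        + (1.0 - self.imu_weight) * radar_alt_m)
        elif self.last_estimate_m is None:
            # No history to judge by: fall back to the radar (an assumption).
            estimate = radar_alt_m
        else:
            # Exception path: trust the source closest to the last estimate.
            last = self.last_estimate_m
            estimate = min((imu_alt_m, radar_alt_m),
                           key=lambda alt: abs(alt - last))
        self.last_estimate_m = estimate
        return estimate


fusion = AltitudeFusion()
first = fusion.update(4000.0, 3900.0)   # sources agree, values are averaged
second = fusion.update(-200.0, 3700.0)  # IMU value is absurd, radar is used
```

A real system would presumably want more than one previous estimate to judge by, plus a way to readmit a sensor once it recovers, but even this much keeps a negative altitude from being averaged in.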
Sounds like a lack of hazard analysis / fault tree analysis and/or fault-tolerant design in the design process.
Where are we going and why are we in a handbasket?