Bad Code May Have Crashed Schiaparelli Mars Lander (nature.com)

Posted by EditorDavid on Saturday October 29, 2016 @11:34AM from the bug-hunt dept.

cadogan west writes "In accordance with the longstanding tradition of bad software wrecking space probes (See Mariner 1), it appears a coding bug crashed the ESA's latest attempt to land on Mars." Nature reports: Thrusters, designed to decelerate the craft for 30 seconds until it was metres off the ground, engaged for only around 3 seconds before they were commanded to switch off, because the lander's computer thought it was on the ground. The lander even switched on its suite of instruments, ready to record Mars's weather and electrical field, although they did not collect data...

The most likely culprit is a flaw in the craft's software or a problem in merging the data coming from different sensors, which may have led the craft to believe it was lower in altitude than it really was, says Andrea Accomazzo, ESA's head of solar and planetary missions. Accomazzo says that this is a hunch; he is reluctant to diagnose the fault before a full post-mortem has been carried out... But software glitches should be easier to fix than a fundamental problem with the landing hardware, which ESA scientists say seems to have passed its test with flying colours.

QA by bradgoodman · 2016-10-29 13:10 · Score: 4, Informative

I've been in organizations that had pretty light SQA departments. I used to say that the "really" good shops had 1-to-3 ratios - 1 engineer doing QA for every 3 doing implementation. When I started working for more "mission critical" stuff - that ratio went even higher.
I know people that work in companies that design chips. Those manufacturing cycles are MUCH longer and more expensive - you can't just recompile when you test and find a bug. Thus, their QA is probably more like 10 people doing simulation (behavioral, thermal, timing, power, emissions, RF susceptibility, etc.) before a design is even fabricated.
I would imagine that in Space Exploration - this would go even higher - given the time and expense of these missions. The point is - saying "it's just software" doesn't help you here. Software is *very* complex, and with the intricacies of advanced logic and the variability of factors, its complexity probably dwarfs that of the hardware components in this day and age.
1. Re:QA by ShakaUVM · 2016-10-29 15:04 · Score: 3, Informative
  
  >I would imagine that in Space Exploration - this would go even higher - given the time and expense of these missions.
  It is. Well, at least it is at JPL - I've gone through their coding standards and testing process for spaceflight, and it's extremely intensive.
  I watched a video on their standards before, and without rewatching it I don't know if this is the same one, but it looks pretty good skimming through it.
  https://www.youtube.com/watch?...
  I'd be really interested in seeing someone go through the process and finding out where it went wrong.
2. Re:QA by johannesg · 2016-10-29 20:20 · Score: 5, Informative
  
  I work for a company that writes those simulations. Generally a simulation consists of a CPU emulator that runs the onboard software, and a whole bunch of models for each aspect of the spacecraft and environment: the orbit model, the communication model, various instrument models, etc. These systems are generally set up to allow gradual replacement of each model with real hardware as it becomes available, so the software development is already underway long before the spacecraft hardware has even been built. Each model is a hard real-time program (to allow drop-in replacement of hardware), and has extensive capabilities for error injection in order to simulate things like flipped bits, broken communication channels, broken sensors, etc.
  I don't know what happened on Schiaparelli and they weren't a customer of ours anyway, but a scenario where a sensor breaks and sends bogus information could and should have been tested for during development.
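  To make the error-injection idea concrete, here is a toy sketch of what such a test might look like. Everything in it is hypothetical (the class, the function names, the thresholds, and the altitude numbers are made up for illustration); it is not a real simulator API and not a reconstruction of anything on Schiaparelli. It only shows a sensor model with an injected fault and two touchdown checks, one that trusts the reading and one that sanity-checks it first.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# All names and numbers below are hypothetical, chosen only for illustration.


@dataclass
class AltimeterModel:
    """Toy sensor model with an optional injected fault."""
    true_altitude_m: float
    # The fault maps the true value to whatever bogus value the sensor reports.
    fault: Optional[Callable[[float], float]] = None

    def read(self) -> float:
        if self.fault is not None:
            return self.fault(self.true_altitude_m)
        return self.true_altitude_m


def naive_on_ground(reading_m: float) -> bool:
    """Trusts whatever the sensor says."""
    return reading_m <= 0.0


def checked_on_ground(reading_m: float, last_good_m: float,
                      max_step_m: float = 50.0) -> bool:
    """Rejects negative readings and implausible jumps from the last good value."""
    plausible = reading_m >= 0.0 and abs(reading_m - last_good_m) <= max_step_m
    return plausible and reading_m <= 0.5


# Error injection: the craft is "really" at 2000 m (arbitrary value),
# but the faulty sensor reports -100 m.
sensor = AltimeterModel(true_altitude_m=2000.0, fault=lambda _true: -100.0)
reading = sensor.read()

naive_result = naive_on_ground(reading)  # premature "touchdown"
checked_result = checked_on_ground(reading, last_good_m=2010.0)  # reading rejected
```

  The point of a test like this is not the particular check, which is far too simple for real flight software, but that the fault is injected at the model boundary, so the same onboard logic can be exercised against healthy and broken sensors without changing it.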
  I'm not sure what the software engineer:QA ratio is - most of that happens internally by the spacecraft people. You run into their QA people everywhere though, while I have yet to personally meet my first flight software engineer.
  Oh, and back in the day I wrote the very first software-only environment for testing flight software on the ground. Up until then, the test environment used real hardware for the flight computer, thus requiring an expensive second set of flight computers just for doing the onboard software development. I hacked together a proof of concept that showed that you _could_ in fact model and simulate the flight computer as well, leading to a substantial cost saving on space projects since...
  The flight computer _simulator_ generally speaking runs on Linux. I'm not sure what the models use these days, but I have seen IRIX and Sun systems around for this purpose. As for the flight computer itself, VxWorks is not an uncommon choice of OS, and the on-board CPU is usually something like ERC32 or Leon - both are radiation-hardened SPARCs.
Re:easier to fix? by Anonymous Coward · 2016-10-29 14:00 · Score: 2, Informative

As a manufacturing engineer I can tell you from experience that even in tightly regulated industries, instances of the print not matching the part are more common than you would think, even on parts produced for decades. When you are talking about one-offs that just self-destructed on another planet, and you cannot compare the as-produced part to the print, it becomes exceedingly difficult to account for last-minute design changes.