Slashdot Mirror


Bad Code May Have Crashed Schiaparelli Mars Lander (nature.com)

cadogan west writes "In accordance with the longstanding tradition of bad software wrecking space probes (See Mariner 1), it appears a coding bug crashed the ESA's latest attempt to land on Mars." Nature reports: Thrusters, designed to decelerate the craft for 30 seconds until it was metres off the ground, engaged for only around 3 seconds before they were commanded to switch off, because the lander's computer thought it was on the ground. The lander even switched on its suite of instruments, ready to record Mars's weather and electrical field, although they did not collect data...

The most likely culprit is a flaw in the craft's software or a problem in merging the data coming from different sensors, which may have led the craft to believe it was lower in altitude than it really was, says Andrea Accomazzo, ESA's head of solar and planetary missions. Accomazzo says that this is a hunch; he is reluctant to diagnose the fault before a full post-mortem has been carried out... But software glitches should be easier to fix than a fundamental problem with the landing hardware, which ESA scientists say seems to have passed its test with flying colours.

10 of 163 comments

  1. Mark my words by phantomfive · · Score: 5, Funny

    This wouldn't have happened if they'd used imperial not metric!
    New age hippie liberal airheads. If it's not a hogshead, it's not fresh!

    --
    "First they came for the slanderers and i said nothing."
    1. Re:Mark my words by michelcolman · · Score: 4, Funny

      Now whenever an ESA scientist wants to talk about the Schiaparelli crater, you can ask "which one?" to shut him up.

  2. Martians by meglon · · Score: 4, Funny

    They're still unwilling to concede that their defenses against the Martians' OBDS (Orbital Bombardment Defense System) are inadequate.

    --
    Fascism: An authoritarian and nationalistic right-wing system of government and social organization. See also: NAZI's
  3. Re:easier to fix? by plover · · Score: 4, Funny

    How are you going to issue a software patch to the pile of rubble on another planet? This is not a situation where you can ship the product without testing and fix it in firmware later!

    It's Agile. The product owner will raise this issue as a priority in the backlog, they'll fix it in this sprint, and it will ship in the next release.

    --
    John
  4. QA by bradgoodman · · Score: 4, Informative

    I've been in organizations that had pretty light SQA departments. I used to say that the "really" good shops had 1-to-3 ratios - 1 engineer doing QA for every 3 doing implementation. When I started working on more "mission critical" stuff - that ratio went even higher.

    I know people that work in companies that design chips. Those manufacturing cycles are MUCH longer and more expensive - you can't just recompile when you test and find a bug. Thus, their QA is probably more like 10 people doing simulation (behavioral, thermal, timing, power, emissions, RF susceptibility, etc.) before a design is even fabricated.

    I would imagine that in Space Exploration - this would go even higher - given the time and expense of these missions. The point is - saying "it's just software" doesn't help you here. Software is *very* complex, and the intricacies of advanced logic and the variability of factors probably dwarf those of the hardware components in this day and age.

    1. Re:QA by ShakaUVM · · Score: 3, Informative

      >I would imagine that in Space Exploration - this would go even higher - given the time and expense of these missions.

      It is. Well, at least it is at JPL - I've gone through their coding standards and testing process for spaceflight, and it's extremely intensive.

      I watched a video on their standards before, and without rewatching it I don't know if this is the same one, but it looks pretty good skimming through it.

      https://www.youtube.com/watch?...

      I'd be really interested in seeing someone go through the process and find out where it went wrong.

    2. Re:QA by johannesg · · Score: 5, Informative

      I work for a company that writes those simulations. Generally a simulation consists of a CPU emulator that runs the onboard software, and a whole bunch of models for each aspect of the spacecraft and environment: the orbit model, the communication model, various instrument models, etc. These systems are generally set up to allow gradual replacement of each model with real hardware as it becomes available, so the software development is already underway long before the spacecraft hardware has even been built. Each model is a hard real-time program (to allow drop-in replacement of hardware), and has extensive capabilities for error injection in order to simulate things like flipped bits, broken communication channels, broken sensors, etc.

      I don't know what happened on Schiaparelli and they weren't a customer of ours anyway, but a scenario where a sensor breaks and sends bogus information could and should have been tested for during development.
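
      To make the error-injection idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration (the AltimeterModel class, the stuck_at and flip_bit faults, the 1 m touchdown threshold, the 50 m plausibility limit); it is not how any real flight software or simulator is written. It only shows a sensor model with an injectable fault, and a toy descent loop that either trusts every sample or discards implausible drops.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class AltimeterModel:
    """Hypothetical sensor model: reports the true altitude unless a fault is injected."""
    fault: Optional[Callable[[float], float]] = None

    def read(self, true_altitude_m: float) -> float:
        if self.fault is not None:
            return self.fault(true_altitude_m)
        return true_altitude_m


def stuck_at(value_m: float) -> Callable[[float], float]:
    """Fault: the sensor reports a constant value regardless of the truth."""
    return lambda _true_altitude_m: value_m


def flip_bit(bit: int) -> Callable[[float], float]:
    """Fault: flip one bit of the altitude encoded as integer centimetres."""
    def fault(true_altitude_m: float) -> float:
        raw_cm = int(round(true_altitude_m * 100))
        return (raw_cm ^ (1 << bit)) / 100.0
    return fault


def run_descent(fault=None, inject_at_m=None, guard=True,
                start_m=3000.0, step_m=10.0, max_drop_m=50.0):
    """Step the true altitude down to zero.

    Returns the TRUE altitude at which the toy software declares touchdown,
    or None if it never does. With guard=True, a sample that drops more than
    max_drop_m below the last accepted sample is discarded as implausible.
    """
    sensor = AltimeterModel()
    true_alt = start_m
    last_valid = sensor.read(true_alt)
    while true_alt > 0.0:
        true_alt -= step_m
        if inject_at_m is not None and true_alt <= inject_at_m:
            sensor.fault = fault  # error injection point
        reading = sensor.read(true_alt)
        if guard and last_valid - reading > max_drop_m:
            continue  # implausible jump between samples: ignore it
        last_valid = reading
        if reading <= 1.0:  # toy touchdown criterion
            return true_alt
    return None


if __name__ == "__main__":
    print("nominal:", run_descent())
    print("stuck at 0, no guard:", run_descent(stuck_at(0.0), 2000.0, guard=False))
    print("stuck at 0, guarded:", run_descent(stuck_at(0.0), 2000.0, guard=True))
```

      In this toy setup the unguarded loop declares touchdown while still 2000 m up, which is exactly the kind of scenario a fault-injection campaign is meant to surface. The guarded loop never declares touchdown at all once the sensor is stuck, so a real design would still need a fallback.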

      I'm not sure what the software engineer:QA ratio is - most of that happens internally by the spacecraft people. You run into their QA people everywhere though, while I have yet to personally meet my first flight software engineer.

      Oh, and back in the day I wrote the very first software-only environment for testing flight software on the ground. Up until then, the test environment used real hardware for the flight computer, thus requiring an expensive second set of flight computers just for doing the onboard software development. I hacked together a proof of concept that showed that you _could_ in fact model and simulate the flight computer as well, leading to a substantial cost saving on space projects since...

      The flight computer _simulator_ generally speaking runs on Linux. I'm not sure what the models use these days, but I have seen IRIX and Sun systems around for this purpose. As for the flight computer itself, VxWorks is not an uncommon choice of OS, and the on-board CPU is usually something like ERC32 or Leon - both are radiation-hardened SPARCs.

  5. Re:There is no bad code. by m00sh · · Score: 3, Insightful

    Testing on another planet is not that easy, though.

    Yes, test it all in production.

    Since testing is sooooooooo hard.

    Landing is the most complicated part and Beagle and others have failed exactly here. There should be 100x or even more code for unit and integration testing than the actual code itself for the landing code. And those tests should run through every possible combination of failure points and bad sensor readings.

    There is no way it thinks it has landed with that many sensor inputs. It is simply code that is not put through a good enough testing system.
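
    As a rough sketch of what "run through every combination" could look like, the Python below sweeps every fault mode of every sensor against a toy touchdown decision. The three sensor names, the three fault modes, and the 2-of-3 voting rule are all made up for illustration; nothing here reflects the actual lander's sensors or logic.

```python
from itertools import product

# Hypothetical sensor set and fault modes, chosen only for illustration.
SENSORS = ("altimeter", "velocity", "contact")
FAULT_MODES = ("ok", "stuck_landed", "stuck_airborne")


def sensor_vote(mode: str, truly_landed: bool) -> bool:
    """What one sensor claims, given its fault mode and the real state."""
    if mode == "stuck_landed":
        return True
    if mode == "stuck_airborne":
        return False
    return truly_landed  # healthy sensor reports the truth


def declares_landed(votes) -> bool:
    """Toy decision under test: at least 2 of 3 sensors must agree."""
    return sum(votes) >= 2


def sweep():
    """Try every fault combination in both real states; collect wrong decisions."""
    failures = []
    for modes in product(FAULT_MODES, repeat=len(SENSORS)):
        for truly_landed in (False, True):
            votes = [sensor_vote(mode, truly_landed) for mode in modes]
            if declares_landed(votes) != truly_landed:
                failures.append((dict(zip(SENSORS, modes)), truly_landed))
    return failures


if __name__ == "__main__":
    for modes, truly_landed in sweep():
        print("wrong decision with", modes, "when truly_landed =", truly_landed)
```

    With this voting rule every wrong decision in the sweep involves at least two faulty sensors, so a single bad sensor cannot by itself convince the toy logic that it has landed. The sweep grows exponentially with the number of sensors and fault modes, which is part of why the test code ends up much larger than the code under test.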

  6. Re:sounds familiar by Solandri · · Score: 4, Interesting

    Usually when that sort of thing happens, it's not because the programmer did something obviously wrong. It's usually because the programmer had two (or more) competing scenarios to design for. He tried to design something which would split the difference, and ended up erring too much to one side.

    Lufthansa flight 2904 is a good example. The plane had to land in an expected crosswind on a wet runway. A crosswind landing requires landing with the plane's orientation misaligned from the runway. The plane is pointed into the crosswind, so is actually landing diagonally, then when it hits the ground it has to quickly yaw so it's aligned with the runway (so the wheels are pointed in the right direction). The way this is done is it lands on one gear first, pivots around on that gear to point the nose at the end of the runway, then drops down the second gear, then the nose gear.

    The A320's flight computer was programmed to avoid the disastrous scenario of a thrust reverser deploying in mid-air. It prohibited deployment of the thrust reversers unless both rear landing gear had 6.3 tons of force each on them. Full deployment of the spoilers (which disrupt lift to plant the plane firmly on the ground) was prohibited unless the 6.3-ton criterion was met or the wheels were spinning faster than 72 knots.

    Unfortunately, in flight 2904's case, the crosswind landing maneuver placed most of the initial force on a single landing gear, so the thrust reversers didn't deploy. The wet runway caused hydroplaning, so the wheels did not spin up and the spoilers failed to deploy, hindering the pilots from getting the second landing gear down. By the time the above criteria were met and the plane began slowing down, it was well past the halfway point of the runway, and ended up going off the end. Design criteria selected to prevent one type of accident inadvertently caused another.
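
    The two interlocks described above can be written down as a small Python sketch. The 6.3 ton and 72 knot thresholds come from the description above; the function names and the sample touchdown numbers are assumptions for illustration, not the actual A320 implementation or recorded flight data.

```python
MIN_STRUT_LOAD_TONS = 6.3     # required load on EACH main gear strut
MIN_WHEEL_SPEED_KNOTS = 72.0  # alternative criterion, spoilers only


def both_struts_loaded(left_tons: float, right_tons: float) -> bool:
    """True only when both main gear struts carry the minimum load."""
    return left_tons >= MIN_STRUT_LOAD_TONS and right_tons >= MIN_STRUT_LOAD_TONS


def reversers_allowed(left_tons: float, right_tons: float) -> bool:
    """Thrust reversers: weight on both struts is the only accepted evidence."""
    return both_struts_loaded(left_tons, right_tons)


def spoilers_allowed(left_tons: float, right_tons: float,
                     wheel_speed_knots: float) -> bool:
    """Spoilers: weight on both struts OR wheels spinning fast enough."""
    return (both_struts_loaded(left_tons, right_tons)
            or wheel_speed_knots > MIN_WHEEL_SPEED_KNOTS)


if __name__ == "__main__":
    # Illustrative one-gear touchdown on a wet runway: one strut loaded,
    # the other barely touching, wheels hydroplaning instead of spinning up.
    left, right, wheel_speed = 9.0, 1.0, 40.0
    print("reversers:", reversers_allowed(left, right))
    print("spoilers:", spoilers_allowed(left, right, wheel_speed))
```

    Each rule is reasonable in isolation. The trouble only shows up when a one-gear touchdown and hydroplaning happen together, so that neither piece of "on the ground" evidence is available even though the plane is on the runway.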

  7. Re:Considering the decline in code quality... by wonkey_monkey · · Score: 4, Funny

    everything is going to hell.

    No, everything is calling hell() as a function.

    --
    systemd is Roko's Basilisk.