Slashdot Mirror


Pluto Probe Back To Normal, Cause of Snafu Found

Tablizer writes: NASA has provided an update to the problem with the New Horizons probe that will fly by Pluto next week. "The investigation into the anomaly that caused New Horizons to enter 'safe mode' on July 4 has concluded that no hardware or software fault occurred on the spacecraft. The underlying cause of the incident was a hard-to-detect timing flaw in the spacecraft command sequence that occurred during an operation to prepare for the close flyby. No similar operations are planned for the remainder of the Pluto encounter."

5 of 80 comments

  1. No hardware or software fault? by tomhath · · Score: 4, Insightful

    The underlying cause of the incident was a hard-to-detect timing flaw in the spacecraft command sequence that occurred during an operation to prepare for the close flyby.

    So a "flaw" in the command sequence isn't a software fault? Sure sounds like one to me. Glad to hear the craft is functioning again though.

    1. Re:No hardware or software fault? by Anonymous Coward · · Score: 4, Insightful

      There's a gap between "flawless" and "faulty" whose length, as it so happens, is remarkably similar to the distance that New Horizons has travelled so far.

    2. Re:No hardware or software fault? by Anonymous Coward · · Score: 0, Insightful

      Software that doesn't do what it's intended to do is faulty. It doesn't matter if it's due to a race condition that the programmer didn't expect (apparently what caused the probe's issue) or whether the programmer made a mistake (the pointless example of a shell script that deletes files). NASA didn't intend to put the probe into safe mode with those commands. The shell script writer in GGP's post didn't intend to delete all the files.

      Now you go practice reading what a software fault is (hint: there's an old saying "the computer only does what you tell it to do, not what you want it to do").

    3. Re:No hardware or software fault? by Will.Woodhull · · Score: 3, Insightful

      I'm guessing it was an unanticipated race condition. Everything works correctly, everything passes all tests, but for some extremely rare constellation of input values software module "B" is able to complete its calculations and report its results before "A" can (which has a probability of occurrence so low that it rounds to zero), and that screws the pooch. If the probability of this happening again approaches zero, it would be fair for NASA to say there was no error in the programming, but instead an unexpected glitch in operations that is unlikely to ever recur.
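
      A minimal sketch of the kind of race guessed at above, not the actual New Horizons fault (NASA only described a timing flaw in the command sequence). The module names "A" and "B", the durations, and the consumer that assumes A reports first are all hypothetical. Each module is correct on its own; only the arrival order changes the outcome.

```python
def run_sequence(duration_a, duration_b):
    """Start hypothetical modules A and B at t=0 and return the order
    in which their results arrive (earliest finish first)."""
    events = [(duration_a, "A"), (duration_b, "B")]
    return [name for _, name in sorted(events)]


def consume(order):
    """Hypothetical consumer with a hidden assumption: A's result is
    already available when B's result arrives."""
    a_ready = False
    for name in order:
        if name == "A":
            a_ready = True
        elif name == "B" and not a_ready:
            # B finished first: the assumption is violated, so bail out.
            return "safe mode"
    return "nominal"


# Typical inputs: A is much faster than B, so every test passes.
print(consume(run_sequence(1.0, 5.0)))
# Rare inputs: A is slowed just enough that B reports first.
print(consume(run_sequence(5.0, 4.9)))
```

      Neither function contains a bug when read in isolation, which is why this class of problem tends to pass every test until the unusual timing actually shows up.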

      You can never test for every possible corner condition. More than that, in probably every real world situation, the longer the time since the last hard reboot, the more likely it is that the software will encounter some corner conditions. That Pluto bird has been running for quite a while.

      --
      Will
  2. Re:I would have done dry run of entire sequence by bobbied · · Score: 5, Insightful

    There are just some things simulations cannot find and rare "race conditions" are on that list. Of course, it all depends on how much fidelity you build into your simulation. However, at some point you have to say "Enough! If we spend any more on simulation and test we could just build and launch multiple spacecraft." So you accept the risks and move on. Race conditions are pretty hard to find in the first place, especially if they are not deterministic and only hit you every so often.
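
    A rough back-of-the-envelope sketch of why testing runs out of steam, assuming (and this is a simplification) that each simulated run is independent and trips the race with the same small probability p. The function names and the numbers are illustrative only. The chance of never seeing the bug in n runs is (1 - p)^n.

```python
import math


def miss_probability(p, runs):
    """Chance that `runs` independent test runs all miss a bug that
    shows up with probability p per run (assumed model)."""
    return (1.0 - p) ** runs


def runs_needed(p, confidence=0.95):
    """Smallest number of runs that catches the bug at least once
    with the given confidence, under the same independence assumption."""
    if not (0.0 < p < 1.0 and 0.0 < confidence < 1.0):
        raise ValueError("p and confidence must be strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p))


# A one-in-a-million race needs on the order of three million runs
# before you have a 95% chance of having seen it even once.
print(runs_needed(1e-6))
print(miss_probability(1e-6, 100_000))
```

    Under that model, a hundred thousand clean dry runs would still miss a one-in-a-million race about nine times out of ten.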

    --
    "File to fit, pound to insert, paint to match" - Aircraft Maintenance 101