Mars Polar Lander Had Fatal Design Flaw
GSearle writes, "Spacedaily.com reports on a design flaw that may have caused the Mars Polar Lander to cut its engines immediately upon firing and plummet 1800 meters to the surface. The problem lies with a sensor that detects when the lander has reached the ground. The sensor may have been triggered prematurely when the lander's legs locked into position."
Undoubtedly, some of the testing is done on computer, but most of it is probably done on actual hardware. However, the problem here isn't one of testing, it's one of communication. The article points out that the deployment testing group knew of the problem, but the descent control group did not account for it.
Standard engineering practice is to gather the staff involved and hold a brainstorming session known as an FMEA (Failure Mode and Effects Analysis). At this session, the idea is to identify all the possible ways in which something can go wrong, and determine the outcome if each were to happen. The most critical items are given the highest priority to ensure that they do not occur. Surely the consequences of sensor failure were identified at the session.
Since the second group did not account for the possibility of the sensor being prematurely activated, they must not have been informed of the results of the testing by the first group. What they need to work on is their inter-group communications, not their hardware testing.
Of course, I'm crediting the design group with following reasonable procedures. They may not have done it exactly like that, but it's a pretty standard approach in industry. Every part on an automobile goes through such an analysis (even down to the headlight switch), so it just stands to reason that a multi-million dollar one-off space probe should...
- - - -
I re-read the original article and have some observations on how I would have made the software more fault-tolerant.
The lander had 3 legs, each with a simple switch. If any one of the switches read "closed", the engines would be shut down (presumably to keep the lander from cooking itself with blow-back and/or flipping over). This was a simplification of earlier landers, which used a radar altimeter to tell the engines when to shut off.
The first change I would have made would be to design the software to scan the switches repeatedly, rather than just once. Before the landing sequence begins, scan all the switches 5 (or more) times. They should all read open each time; if any reads closed, disregard further input from that sensor. (If all 3 failed closed, you'd be SOL and would have to rely on a back-up system.) Then, after the legs deploy and the engines start firing, poll the switches at a rate of, say, 10 times per second. Instead of shutting down the engines as soon as any 1 switch reads closed, only shut down the engines after the same switch reads closed for 3 consecutive pollings. (Even better would be to require 3 consecutive closed readings from 2 sensors, assuming that you did not find 2 failures earlier.)
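For anyone who wants to see the polling scheme above spelled out, here is a rough Python sketch. It is only an illustration of the idea in this comment, not the lander's actual flight software; the class name, the constants, and the convention that True means "closed" are all made up for the example.

```python
from dataclasses import dataclass, field

PRECHECK_SCANS = 5   # "5 (or more)" scans before the landing sequence
CONFIRM_READS = 3    # consecutive "closed" readings needed from one switch
MIN_SENSORS = 2      # the "even better" variant: two switches must agree


@dataclass
class TouchdownDetector:
    """Hypothetical debounced touchdown detector for a 3-legged lander."""
    num_legs: int = 3
    trusted: list = field(default_factory=list)
    streak: list = field(default_factory=list)

    def __post_init__(self):
        self.trusted = [True] * self.num_legs   # switch still believed healthy
        self.streak = [0] * self.num_legs       # consecutive closed readings

    def precheck(self, scans):
        """Scan before descent; any switch that reads closed is distrusted.

        `scans` is a sequence of per-leg readings (True = closed).
        Returns the number of switches still trusted.
        """
        for reading in scans:
            for leg, closed in enumerate(reading):
                if closed:
                    self.trusted[leg] = False
        return sum(self.trusted)

    def poll(self, reading):
        """Feed one polling cycle; return True once touchdown is confirmed."""
        confirmed = 0
        for leg, closed in enumerate(reading):
            if not self.trusted[leg]:
                continue  # ignore switches that failed the precheck
            self.streak[leg] = self.streak[leg] + 1 if closed else 0
            if self.streak[leg] >= CONFIRM_READS:
                confirmed += 1
        # If only one switch survived the precheck, one must be enough.
        required = min(MIN_SENSORS, sum(self.trusted))
        # With no trusted switches, never confirm: a back-up system decides.
        return required > 0 and confirmed >= required
```

With this logic, a single closed reading (such as a jolt when the legs lock into position) resets to zero on the next open reading and never reaches the shutdown threshold.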
I think that the logic should be to keep firing the engines until you are SURE it is safe to stop; I think this is a safer failure mode than to risk having the engines shut down prematurely. This approach leaves you vulnerable to blow-back damage from the exhaust and/or flipping over after touchdown if the switch system fails completely.
To combat these failure modes, you would need additional sensors to detect blow-back and tipping. One possible sensor to detect blow-back could be a series of strips made of a metal with a sufficiently low melting point as to melt under prolonged exposure to exhaust gases; tipping could be detected via a mechanism similar to a common mercury switch or an aircraft artificial horizon. Also, the fact that you have a limited fuel supply limits the amount of damage you can suffer from not shutting down in time. Even if all your cut-off systems fail, the engine will stop firing when it runs out of fuel. While this would be a bad failure, it would probably not result in a total loss. Even if some of the more sensitive instruments got cooked or the lander flipped over, you could still probably get SOME usable data back. A semi-functional lander is much better than a smoking crater!
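The layered cut-off idea above could be sketched roughly like this in Python. Everything here is hypothetical: the function name, the tilt limit, and the order of the checks are assumptions for illustration, and a real design would need to decide whether a tilt reading should be honored before touchdown at all.

```python
from enum import Enum


class Cutoff(Enum):
    NONE = "keep firing"
    TOUCHDOWN = "touchdown confirmed by leg switches"
    BLOWBACK = "blow-back strip melted"
    TIPPING = "tilt limit exceeded"
    FUEL_OUT = "fuel exhausted"


MAX_TILT_DEG = 30.0  # made-up limit, purely for the example


def cutoff_reason(touchdown_confirmed, strips_intact, tilt_deg, fuel_kg):
    """Return why the engines should stop, or Cutoff.NONE to keep firing.

    touchdown_confirmed: result of the debounced leg-switch logic
    strips_intact: one bool per low-melting-point strip (False = melted)
    tilt_deg: tilt from vertical reported by the tilt sensor
    fuel_kg: remaining fuel
    """
    if fuel_kg <= 0:
        return Cutoff.FUEL_OUT      # the engine stops regardless of software
    if touchdown_confirmed:
        return Cutoff.TOUCHDOWN     # primary cut-off path
    if not all(strips_intact):
        return Cutoff.BLOWBACK      # back-up: exhaust is cooking the lander
    if abs(tilt_deg) > MAX_TILT_DEG:
        return Cutoff.TIPPING       # back-up: the lander is going over
    return Cutoff.NONE              # default: keep firing until sure
```

The default branch is the point of the whole argument: absent a positive reason to stop, the engines keep firing.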