Spirit Sends Debug Information to Earth
gfilion writes "NASA has released a press release that says: 'Shortly before noon, controllers were surprised to receive a relay of data from Spirit via the Mars Odyssey orbiter. Spirit sent 73 megabits at a rate of 128 kilobits per second.'" They've been having communications troubles with Spirit since Wednesday, so it's good to hear from it again, even if the data is just filler.
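For a sense of scale, here is a quick back-of-the-envelope check of how long that relay would have taken at the quoted rate. It is only a sketch: it assumes decimal prefixes (1 megabit = 1,000,000 bits) and ignores any protocol overhead, neither of which the press release specifies. The function name is made up for illustration.

```python
# Rough transfer-time estimate from the two figures quoted in the press release.
# Assumption: decimal prefixes and no protocol overhead (not stated in the source).
def transfer_seconds(megabits: float, kilobits_per_second: float) -> float:
    bits = megabits * 1_000_000
    bits_per_second = kilobits_per_second * 1_000
    return bits / bits_per_second

seconds = transfer_seconds(73, 128)
print(f"{seconds:.0f} s, about {seconds / 60:.1f} minutes")
# prints "570 s, about 9.5 minutes"
```

So the session would have lasted somewhere around nine or ten minutes under those assumptions.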
Most likely it's not a protocol that involves a lot of ACK'ing [e.g. huge packets with FECs]
Tom
Someday, I'll have a real sig.
Well, you know, what's interesting about that is:
1. you'd have to increase the complexity of the device even more, exposing it to a higher risk of failure statistically
2. you'd need more complicated software and hardware that would require more time and effort (money & delays)
3. the hardware would need more power (limited batteries and solar panel capacity)
4. the system would be heavier and bigger (costs are measured in grams, iirc).
While you have a valid point, the constraints of this design give very strong tradeoffs among safety, feasibility, and cash flow (and I'm sure there are others, but I'm not a rocket scientist). I'd imagine that some time was spent on redundant systems, but the adage of "Why have one when you can have two at twice the price?" only works when your budget can support the extra price of man-hours and cash.
I'd argue that where you work has unlimited available power, and if you need more, you can ask your power company for more. You have the money to spend on an X-thousand-dollar server that's been pre-fabbed by whatever company you like. If you need more, you get more drop-shipped to you within days. NASA had to build these little buggers from the ground up.
<RANT>
You know, if you take your philosophy of simply duplicating the entire machine, there is a backup. It's called "Opportunity." It lands tomorrow.
I highly resent the fact that you've called some of the greatest engineers of our time "retarded." If you can't understand the problem (I certainly don't, but I do understand the concept of tradeoffs in design) you have no right to speak on the issue. Of course, this is slashdot. Everyone can mouth off about everything. Never mind.
</RANT>
~MCH.
Michael C. Hollinger
If there is a COTS (commercial off-the-shelf) real-time operating system available that meets the system requirements, why go to the risk and expense of writing your own from scratch? Do you expect NASA to fabricate every component in the spacecraft?
Mea navis aericumbens anguillis abundat
Is it quite possible that NASA engineers simply have not mastered the art and science of designing hardware and software operable in the harshest of environments?
While I would never claim that NASA is perfect, I think you underestimate both the engineering challenge of putting a rover on Mars and the impact of more conservative, get-it-right policies.
Interplanetary missions are the hardest of all because the engineers never get to actually test the whole device under realistic conditions. Although they can test and analyze each subsystem under a variety of simulated or near-realistic conditions, they have no way of building a test rover, putting it in interplanetary space for months, having it aerobrake into a thin atmosphere, parachute in a thin atmosphere, crash-land at high speed, and then operate all its mechanical parts under dusty low-g conditions.
Second, get-it-right == conservatism == greater cost == fewer missions == less experience. The last thing NASA should do is spend more money, take more time, and do fewer missions. The only way we will really learn how to operate in space is to go into space. I'm not saying that better engineering won't help, only that more experience (unfettered by excessive conservatism) is a crucial part of learning to operate on other planets.
Two wrongs don't make a right, but three lefts do.
To have some actual technical discussion on a site that is supposed to be filled with nerds, instead of the same tired jokes about martians.
The more you know, the less you understand.
Since Spirit is rebooting sixty times per day, a problem that started when an electric motor moving its spectrometer "conked out", one thinks first of a hardware failure, possibly leading to software corruption.
I don't know the boot sequence of Spirit, but in most battery-powered embedded systems with which I am familiar, an elaborate state machine design is made to ensure that, when the boot sequence is complete, the system has sufficient power to perform any task that may be requested of it. Since the power supply is limited, an unexpectedly heavy load on the primary supply could cause the supply voltage to the microcomputer to fall below its specified lower limit, leading to a system reset.
Now imagine that there is a hardware failure associated with some process that runs during the boot sequence--a voltage regulator turn-on, a heating system initialization, an electric motor activation, whatever--that results in excessive current drain. When this part of the boot sequence is reached, the supply voltage falls, and the microcomputer resets. This disables the problem-causing hardware, unloading the power supply. When the supply voltage recovers, the microcomputer reboots (either automatically, with a power-on reset, via a watchdog timer, or via some other means) and, when the critical part of the boot sequence is reached, the supply voltage falls again. The system is now in a continuous loop, in which it can remain indefinitely. (Or at least 60 times per day....)
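To make the loop concrete, here is a toy simulation of it. This is a sketch under made-up assumptions, not a model of Spirit: the step names, currents, supply voltage, source resistance, and brown-out threshold are all hypothetical, and the supply is reduced to an ideal source with a series resistance.

```python
from dataclasses import dataclass

@dataclass
class BootStep:
    name: str
    current_amps: float  # hypothetical extra load this step switches on

def loaded_voltage(open_circuit_v: float, source_ohms: float, load_amps: float) -> float:
    # Ideal source behind a series resistance: V = Voc - I * R
    return open_circuit_v - source_ohms * load_amps

def simulate_boots(steps, open_circuit_v, source_ohms, brownout_v, max_attempts):
    """Return a list of (attempt, failed_step_name_or_None) tuples."""
    history = []
    for attempt in range(1, max_attempts + 1):
        load = 0.0          # a reset switches every load off again
        failed_at = None
        for step in steps:
            load += step.current_amps
            if loaded_voltage(open_circuit_v, source_ohms, load) < brownout_v:
                failed_at = step.name  # supply sagged: the processor resets here
                break
        history.append((attempt, failed_at))
        if failed_at is None:
            break           # boot sequence completed, no more retries needed
    return history

# All numbers below are invented for illustration only.
healthy = [BootStep("cpu", 0.5), BootStep("radio", 1.0), BootStep("motor", 0.5)]
faulty = [BootStep("cpu", 0.5), BootStep("radio", 1.0), BootStep("motor", 2.0)]

print(simulate_boots(healthy, 5.0, 0.2, 4.5, max_attempts=3))
print(simulate_boots(faulty, 5.0, 0.2, 4.5, max_attempts=3))
```

With the healthy loads the first attempt completes. With the faulty motor every attempt dies at the same step, because nothing about the fault changes between resets, which is the indefinite loop described above.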
Note that this situation can also arise due to a defect in the power supply--if the output impedance of the power supply has risen for some reason, its output voltage under lightly loaded conditions can be acceptable, but it may not be able to supply heavier loads.
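A worked example of the output-impedance point, again with invented numbers: treating the supply as an ideal source behind a series resistance, a light load barely notices a rise in that resistance while a heavy load pulls the output below any reasonable limit.

```python
# Simple series-resistance model of a supply; all values are hypothetical.
def output_voltage(open_circuit_v: float, output_ohms: float, load_amps: float) -> float:
    return open_circuit_v - output_ohms * load_amps

def max_load_amps(open_circuit_v: float, output_ohms: float, min_v: float) -> float:
    # Largest current the supply can deliver before dropping below min_v
    return (open_circuit_v - min_v) / output_ohms

for ohms in (0.1, 1.0):  # normal vs. degraded output impedance
    light = output_voltage(5.0, ohms, 0.1)
    heavy = output_voltage(5.0, ohms, 2.0)
    limit = max_load_amps(5.0, ohms, 4.5)
    print(f"R={ohms} ohm: light load {light:.2f} V, heavy load {heavy:.2f} V, "
          f"max load {limit:.1f} A")
```

In this sketch the degraded supply still looks fine at 0.1 A, so a lightly loaded measurement alone would not reveal the fault.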
One expects the Spirit power supply to be complex, with separate regulators for the microcomputer, radio transceiver, and electric motors, so looking for common circuits and systems would be the first thing to do when troubleshooting for this type of failure. Looking for system conditions that can cause a system reset would be another; the JPL people have lived with their systems for years now, and would have had many design reviews to identify possible system failure scenarios--I'm not telling them anything new here. I understand that the system telemetry received yesterday indicates that the power supply is within specification, so that seems to eliminate that possibility.
The second alternative is a soft memory failure of some kind, either caused by a supply failure as the parent suggests or perhaps by a radiation event of some kind.
Note that these problems can be multi-disciplinary; for example, the problem could be caused by some vibration when a motor runs that loosens a broken connection created by a chemical reaction to something on the surface (to take an extreme example).