Kepler Recovers After 144-Hour "Glitch"
coondoggie writes "There was likely a pretty big sigh of relief at NASA's Ames Research Center this week as the group's star satellite Kepler recovered from a glitch that took it offline for 144 hours. According to NASA the glitch happened March 14, right after the spacecraft issued a network interface card (NIC) reset command to implement a computer program update. During the reset, the NIC sent invalid reaction wheel data to the flight software, which caused the spacecraft to enter safe mode, NASA stated."
You need safe mode with networking, not just plain old "Safe Mode", guys!
Here's to the crazy ones
Requirement #1 for any computer-controlled thing you send into space: a dirt-dumb mode, tested until its lever falls off, which ensures that, if the thing is mechanically able, it can find your signal so you can reprogram it from the nuts up.
From what I've read, NASA does some pretty thorough planning with their spacecraft software in terms of being able to recover from faults. (Leave the units issues for another thread, eh?) I'm always impressed with how they have multiple fallback points that can usually dig them out of almost any hole that bad programming, bad planning, or a stray cosmic ray can drop them into.
Look up the Mars rovers and their flash memory filling up. It was amazing that they were able to recover from that at all, given the crippling effect the programming oversight had on the system. (Those, IIRC, had to drop down three levels of safe mode before they were able to work with NASA.) When you're millions of miles away you can't just send a tech out to press the Reset button.
And they not only have to get it back into a controllable state; it also has to be able to stay in that state for anywhere from minutes to days, due to the time required for communication and analysis. If there's a fault in the solar panel positioning system, your craft has to stay functional long enough to collect useful data, transmit it, wait for it to make it to Earth, wait for it to be analyzed, and wait for a command to fix the problem, OR it has to be able to at least patch the problem on its own while waiting for a proper fix. Amazing stuff, really. It's not A.I. by any means, but it's definitely robust.
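For anyone curious, the "multiple fallback levels" idea can be sketched roughly like this. It's a toy Python illustration, not anything resembling NASA's actual flight software: the mode names, the number of levels, and the `next_mode` and `run` helpers are all made up for the example.

```python
from enum import IntEnum


class Mode(IntEnum):
    # Hypothetical levels; a higher number means a dumber, more conservative mode.
    NOMINAL = 0
    SAFE = 1
    SAFE_MINIMAL = 2
    BEACON = 3  # last resort: just keep listening for the ground


def next_mode(mode, healthy):
    """Hold the current mode if the health check passed, else drop one level."""
    if healthy:
        return mode
    # Never fall past the deepest level.
    return Mode(min(mode + 1, Mode.BEACON))


def run(health_checks, start=Mode.NOMINAL):
    """Apply a sequence of health-check results and return the mode history."""
    mode = start
    history = [mode]
    for healthy in health_checks:
        mode = next_mode(mode, healthy)
        history.append(mode)
    return history


if __name__ == "__main__":
    # Three failed checks in a row walk the craft down to the last-resort mode.
    for m in run([True, False, False, False]):
        print(m.name)
```

Note that the craft in this sketch never climbs back up on its own. It only holds or drops, which is the point: it sits in whatever level still works until somebody on the ground tells it otherwise.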
I work for the Department of Redundancy Department.
Imagine if it were only capable of uploading 16-colour 640x400 imagery.
Mind the frickin' laser...