Curiosity Rover On Standby As NASA Addresses Computer Glitch
alancronin writes "NASA's Mars rover Curiosity has been temporarily put into 'safe mode,' as scientists monitoring from Earth try to fix a computer glitch, the US space agency said. Scientists switched to a backup computer Thursday so that they could troubleshoot the problem, said to be linked to a glitch in the original computer's flash memory. 'We switched computers to get to a standard state from which to begin restoring routine operations,' said Richard Cook of NASA's Jet Propulsion Laboratory, the project manager for the Mars Science Laboratory Project, which built and operates Curiosity."
Are we talking about a temporary issue that can be resolved by re-flashing the memory in question, or is one of the cells damaged in some unrecoverable way? Either way there are solutions, but the latter is far more serious.
Who else has a feeling that someone fitted in a module backwards?
Either that, or a dead cell or two.
Nobody who has read TFA has that feeling. Curiosity has been running since Aug. 6, 2012 on your putative "backwards module".
Sig Battery depleted. Reverting to safe mode.
This may be of some interest http://www.cpushack.com/space-craft-cpu.html
"NOOOO! I should've selected Safe mode WITH Networking!"
Nobody who has read TFA has that feeling.
You could actually explain it to him rather than choosing to go all holier than thou. Here, I'll do it for you.
Who else has a feeling that someone fitted in a module backwards? Either that, or a dead cell or two.
The A-side flux capacitor was somehow depolarized, perhaps by a cosmic ray impact event. They're hoping to fix it by reinitializing the quantum warp matrix.
I once had to fix a server some 6000 km away due to a corrupted disk. Doing pdisk and modifying fstab over ssh and then a reboot. You just check and recheck to make sure you did it right and just hope you get a ping a few minutes later.
Can't imagine how these guys feel. 45 min ping and it isn't like they could ask someone to go turn it off and on again.
Good luck to the guys working on this.
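For anyone wondering where a figure like "45 min ping" comes from, here is a rough back-of-the-envelope sketch. The two distances are approximate textbook values for the Earth-Mars range (my assumption, not from TFA), and it ignores relay hops and processing time entirely:

```python
# Round-trip light time between Earth and Mars.
# Distances below are approximate and only for illustration.
AU_KM = 149_597_870.7      # one astronomical unit in km
C_KM_S = 299_792.458       # speed of light in km/s


def round_trip_minutes(distance_au: float) -> float:
    """Return the round-trip signal time in minutes for a given distance."""
    one_way_seconds = distance_au * AU_KM / C_KM_S
    return 2 * one_way_seconds / 60


if __name__ == "__main__":
    # Roughly the closest and farthest Earth-Mars separations (assumed values).
    for label, distance_au in (("closest", 0.37), ("farthest", 2.67)):
        minutes = round_trip_minutes(distance_au)
        print(f"{label}: about {minutes:.0f} min round trip")
```

So the worst case is in the mid-40s of minutes and the best case is only a handful. Either way, every "did that command work?" costs you a long wait.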
The Galileo Jupiter atmosphere probe actually had a parachute-related part put on backward. It almost ruined the mission. They got lucky and the shaking from atmospheric drag eventually shook the high-altitude parachute off the bad lock barely in time before it could have damaged the probe.
Doesn't hurt to ask, although knowing more about the hardware may allow you to give more specific advice, such as "part X could be put in backward and still mostly work without early detection according to simulation Y."
Check out the official rover press kit for a summary of the computer design (http://mars.jpl.nasa.gov/msl/news/pdfs/MSLLanding.pdf). Page 42 in particular:
"Curiosity has redundant main computers, or rover compute elements. Of this “A” and “B” pair, it uses one at a time, with the spare held in cold backup. Thus, at a given time, the rover is operating from either its “A” side or its “B” side. Most rover devices can be controlled by either side; a few components, such as the navigation camera, have side-specific redundancy themselves. The computer inside the rover -- whichever side is active -- also serves as the main computer for the rest of the Mars Science Laboratory spacecraft during the flight from Earth and arrival at Mars. In case the active computer resets for any reason during the critical minutes of entry, descent and landing, a software feature called “second chance” has been designed to enable the other side to promptly take control, and in most cases, finish the landing with a bare-bones version of entry, descent and landing instructions.
Each rover compute element contains a radiation-hardened central processor with PowerPC 750 architecture: a BAE RAD 750. This processor operates at up to 200 megahertz speed, compared with 20 megahertz speed of the single RAD6000 central processor in each of the Mars rovers Spirit and Opportunity. Each of Curiosity's redundant computers has 2 gigabytes of flash memory (about eight times as much as Spirit or Opportunity), 256 megabytes of dynamic random access memory and 256 kilobytes of electrically erasable programmable read-only memory.
The Mars Science Laboratory flight software monitors the status and health of the spacecraft during all phases of the mission, checks for the presence of commands to execute, performs communication functions and controls spacecraft activities. The spacecraft was launched with software adequate to serve for the landing and for operations on the surface of Mars, as well as during the flight from Earth to Mars. The months after launch were used, as planned, to develop and test improved flight software versions. One upgraded version was sent to the spacecraft in May 2012 and installed onto its computers in May and June. This version includes improvements for entry, descent and landing. Another was sent to the spacecraft in June and will be installed on the rover's computers a few days after landing, with improvements for driving the rover and using its robotic arm."
And according to a release they issued after landing, both computers receive the same updates and are running the same software (not a version or two behind, as others have suggested): http://mars.jpl.nasa.gov/news/whatsnew/index.cfm?FuseAction=ShowNews&NewsID=1305
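To make the "A side / B side with cold backup" idea from the press kit concrete, here is a toy sketch. It has nothing to do with the actual flight software; the class names, the `healthy` flag, and the `swap()` method are all made up for illustration:

```python
from dataclasses import dataclass


@dataclass
class ComputeElement:
    """One side of a redundant pair (hypothetical model, not flight code)."""
    name: str
    healthy: bool = True
    powered: bool = False


class RedundantPair:
    """Exactly one side runs at a time; the spare stays powered off."""

    def __init__(self) -> None:
        self.sides = {
            "A": ComputeElement("A", powered=True),
            "B": ComputeElement("B"),
        }
        self.active = "A"

    def swap(self) -> str:
        """Power down the active side and bring up the spare."""
        spare = "B" if self.active == "A" else "A"
        if not self.sides[spare].healthy:
            raise RuntimeError("no healthy spare to switch to")
        self.sides[self.active].powered = False
        self.sides[spare].powered = True
        self.active = spare
        return spare


if __name__ == "__main__":
    pair = RedundantPair()
    pair.sides["A"].healthy = False   # e.g. a flash memory problem on side A
    print("now running on side", pair.swap())
```

The point is only that the swap is a deliberate, explicit step, which matches the summary: they switched to the backup to get to a known state before troubleshooting the original side.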
OK, let's assume a cosmic ray corrupted some random block of flash memory... so what? Why should that lead to failure to upload anything or enter sleep mode?
Pretty much any fault, error, or out-of-bounds reading with any part of the rover causes it to stop whatever it is doing and wait for ground control to check it out and decide what to do. If the fault is with the computer itself, it makes sense to gracefully enter safe mode. It probably was a cosmic ray flipping a random bit, but you can't assume that when designing your fault handler.
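A minimal sketch of that "any anomaly means stop and wait" policy, assuming made-up telemetry names and limits (none of these numbers come from NASA):

```python
from enum import Enum


class Mode(Enum):
    NOMINAL = "nominal"
    SAFE = "safe"


# Hypothetical limits, chosen only to illustrate the idea.
LIMITS = {
    "battery_volts": (24.0, 33.0),
    "cpu_temp_c": (-40.0, 50.0),
}


def check_telemetry(readings, limits=LIMITS):
    """Return a list of fault descriptions; an empty list means all clear."""
    faults = []
    for name, (low, high) in limits.items():
        value = readings.get(name)
        if value is None:
            faults.append(f"{name}: missing reading")
        elif not (low <= value <= high):
            faults.append(f"{name}: {value} outside [{low}, {high}]")
    return faults


def next_mode(current, readings):
    """Any fault drops to SAFE; SAFE never clears itself."""
    if current is Mode.SAFE:
        return Mode.SAFE, []          # only ground control may leave safe mode
    faults = check_telemetry(readings)
    return (Mode.SAFE if faults else Mode.NOMINAL), faults


if __name__ == "__main__":
    mode, faults = next_mode(Mode.NOMINAL, {"battery_volts": 28.0, "cpu_temp_c": 80.0})
    print(mode, faults)
```

Note that the handler does not try to guess the cause. A missing reading and a wildly wrong one get the same conservative response, and nothing on board decides it is safe to resume.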
If it were any old PC app, this would be perfectly acceptable behavior. However, for ultra-expensive spacefaring things I would expect it to be designed to still try and be useful even if the southbridge caught fire.
See, I think you have that backwards. If it were a PC app it would be appropriate to just assume the error was insignificant or more likely not bother checking in the first place. If it's a more serious problem then eventually the app or OS might crash, the user will reboot, and if that doesn't work reinstall, and if not that then they'll just go get some new hardware.
For a multi-billion-dollar rover on another planet, you don't want to just wait and see what happens. Any anomaly at all should be cause for cautious, deliberate action. Heck, the whole project is run that way.
The rover was designed with a lot of redundancy and flexibility so that it can be useful even in the face of more serious problems, and if that turns out to be the case they'll find a way to make the rover as useful as possible. Missing a couple of nights' worth of downloads and delaying some activities in order to take the time to make sure they're maximizing the rover's future potential is an easy tradeoff.