Upgrading Software From 350 Million Miles Away
CWmike writes "Picture doing a remote software upgrade. Now picture doing it when the machine you're upgrading is a robotic rover sitting 350 million miles away, on the surface of Mars. That's what a team of programmers and engineers at NASA are dealing with as they get ready to download a new version of the flight software on the Mars rover Curiosity, which landed safely on the Red Planet earlier this week. 'We need to take a whole series of steps to make that software active. You have to imagine that if something goes wrong with this, it could be the last time you hear from the rover,' said Steve Scandore, a senior flight software engineer at NASA's Jet Propulsion Laboratory. 'It has to work,' he told Computerworld. 'You don't want to be known as the guy doing the last activity on the rover before you lose contact.'"
The spacecraft TRAVELLED 350 million miles to get there, but as of tonight, Mars is only about 157.5 million miles from Earth.
NASA doing a software upgrade is not big news. This is going to be phenomenally safe. Much scarier doing software upgrades on millions of unknown hardware configurations globally than on one totally locked down platform no matter what distance or cost is involved.
Exactly. That's how it's done in the telecomms world (infrastructure, not terminals). Typically the new software is given three attempts to boot, and if it doesn't acknowledge that it's fully booted after three attempts, the bootloader falls back to the previous version of the software. Of course, things get trickier if you need to update the bootloader, but those should be very rare situations. However, they in turn can be handled in a similar way (typically there's a 3-stage boot: first a ROM bootstrap, then your bootloader, then the OS, which is the part you'll want to change).
Also FatPhil on SoylentNews, id 863
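A rough sketch of the three-attempt fallback described above, for anyone who hasn't seen one. The names (BootState, select_image, boots_ok) are made up for illustration, and the previous image is simply assumed to be good; this is not how any particular bootloader, let alone the rover's, implements it.

```python
from dataclasses import dataclass

MAX_BOOT_ATTEMPTS = 3  # the "three attempts" from the comment above


@dataclass
class BootState:
    """Hypothetical persistent boot record kept by the bootloader."""
    active: str        # image to try first (the new software)
    fallback: str      # previous version, assumed known-good
    attempts: int = 0  # boot attempts made for the active image


def select_image(state, boots_ok):
    """Return the image that ends up running.

    boots_ok(image) stands in for "the image acknowledged a full boot".
    """
    while state.attempts < MAX_BOOT_ATTEMPTS:
        state.attempts += 1
        if boots_ok(state.active):
            state.attempts = 0  # acknowledged: clear the counter
            return state.active
    # No acknowledgement after three tries: revert to the previous version.
    state.active, state.fallback = state.fallback, state.active
    state.attempts = 0
    return state.active


# Example: the new image "v2" never acknowledges, so "v1" is selected.
state = BootState(active="v2", fallback="v1")
print(select_image(state, lambda image: image == "v1"))
```

In a real device the attempt counter has to live somewhere that survives a reset, which is the part this sketch glosses over.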
Computers: The two identical on-board rover computers, called "Rover Compute Element" (RCE), contain radiation hardened memory to tolerate the extreme radiation from space and to safeguard against power-off cycles. Each computer's memory includes 256 kB of EEPROM, 256 MB of DRAM, and 2 GB of flash memory.[22] This compares to 3 MB of EEPROM, 128 MB of DRAM, and 256 MB of flash memory used in the Mars Exploration Rovers.[23]
The RCE computers use the RAD750 CPU, which is a successor to the RAD6000 CPU used in the Mars Exploration Rovers.[24][25] The RAD750 CPU is capable of up to 400 MIPS, while the RAD6000 CPU is capable of up to 35 MIPS.[26][27] Of the two on-board computers, one is configured as backup, and will take over in the event of problems with the main computer.[22]
http://en.wikipedia.org/wiki/Curiosity_rover#Specifications
Data transfer speeds between Curiosity and each orbiter may reach 2 Mbit/s and 256 kbit/s, respectively, but each orbiter is only able to communicate with Curiosity for about eight minutes per day
When you have little bandwidth, better get it right the first time.
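To put numbers on "little bandwidth", here is a back-of-the-envelope upper bound using only the figures quoted above. It assumes the link runs at its peak rate for the whole eight-minute pass, decimal units (1 Mbit = 1,000,000 bits), and no protocol overhead, so real throughput would be lower.

```python
PASS_SECONDS = 8 * 60  # "about eight minutes per day" per orbiter


def megabytes_per_pass(bits_per_second, seconds=PASS_SECONDS):
    """Upper bound on data moved in one pass, in decimal megabytes."""
    return bits_per_second * seconds / 8 / 1_000_000


fast = megabytes_per_pass(2_000_000)  # the 2 Mbit/s link
slow = megabytes_per_pass(256_000)    # the 256 kbit/s link
print(fast, slow)
```

That works out to roughly 120 MB per day at best on the faster link and about 15 MB on the slower one.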
The point of the exercise is to replace the no longer needed flight software with software it can use to better perform its tasks while on Mars.
We don't believe in radical loony monotheistic religions from the middle east -- we're Christians.
99% of brickings are the result of people doing stuff that the manufacturer did not intend for you to do, on devices where important design details were hidden for commercial reasons.
This is unlikely (one would hope) to be the case here.
Get a 10-foot 4X4 piece of lumber. Drop it flat on the ground. Walk from one end to the other like a balance beam. I'll bet you can do it. I'll bet you can do it blindfolded, walking backward. I'll bet you can do it reciting the alphabet backward. I'll bet you could do it drunk.
Take that same 4X4, suspend it 20 stories in the air between a couple of cranes. Put a bunch of razor sharp, rotating propellers on the ground beneath it. Intersperse the propellers with oil drillbits pointed up, not down for once. Have a bunch of trained turkey vultures flying around to watch you fall. Take your wife, kids and your momma, put a gun in their mouths while the Joker cackles that when you fall, he's gonna blow their heads off. Bring in the television cameras and monitors so the whole World can watch and you can watch them watch. Have some intern read the tweets and comments sections about your plight over the loudspeakers.
Now, there are a few ice-blooded "Licensed to Kill" Double-O men who could keep it together and walk that beam under that kind of pressure. Mary Lou Retton and Nadia could, no doubt. I seriously doubt I could.
Is it a big deal to do a software upgrade under such tightly controlled conditions? Not really. But try doing that software upgrade when billions of dollars and your career are on the line, with the whole world watching. The guy who screws that up is gonna be a punchline and a byword for a few decades, a real Wilson if you've read that book. :-) You'll be known as the guy who screwed up Mars.
Tell me there wouldn't be maybe one or two drops of sweat on the keyboard...
He put his boots up on the table and made a face. "The sig," he smirked. "You can waste your life in search of the sig."
why the NASA engineers want to take such a risk
Similar to some devices here on Earth, the rover should have an automatic revert solution. For instance, a non-updatable software running on a separate processor detects specific conditions (like no signal from Earth for a while) and flashes back the updatable software to its original version when that condition occurs.
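A minimal sketch of that kind of revert-on-silence logic. Everything here is hypothetical (the RevertWatchdog class, the three-day timeout, the image names); it only illustrates the idea of a fixed supervisor that flashes back the original version after too long without contact.

```python
class RevertWatchdog:
    """Hypothetical non-updatable supervisor that reverts after prolonged silence."""

    def __init__(self, timeout_s, known_good):
        self.timeout_s = timeout_s    # how long silence is tolerated
        self.known_good = known_good  # original software version
        self.last_contact = 0.0       # time of the last valid signal

    def contact(self, now):
        """Record that a valid signal from Earth arrived at time `now`."""
        self.last_contact = now

    def check(self, now, running):
        """Return the image that should be running at time `now`."""
        if now - self.last_contact > self.timeout_s:
            return self.known_good  # silence too long: flash back
        return running


# Assumed three-day timeout, times in seconds.
dog = RevertWatchdog(timeout_s=3 * 86400, known_good="v1")
dog.contact(now=0)
print(dog.check(now=86400, running="v2"))      # one day of silence
print(dog.check(now=4 * 86400, running="v2"))  # four days of silence
```

The first check leaves the updated image alone; the second falls outside the timeout and returns the original.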
Such things tend to be present, but how many times have they tested the automatic revert in actual conditions? An alternative codepath is always a risk.
Updating the software can have great advantages. Only a slightly more reliable connection would allow vast amounts of more science to be done. Adapting the algorithms for autonomous functions such as simple navigation or sample processing also makes a great difference when your lag time for a single command is measured in terms of minutes and you don't even have that level of "real-time" access most of the time.
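For a sense of how many minutes that lag is, here is the light-travel time for the 157.5 million mile Earth-Mars distance quoted earlier in the thread. The speed of light is rounded to 186,282 miles per second, and this ignores any relay or processing delay.

```python
MILES_PER_LIGHT_SECOND = 186_282  # speed of light, rounded, in miles per second


def one_way_delay_minutes(distance_miles):
    """Light-travel time over the given distance, in minutes."""
    return distance_miles / MILES_PER_LIGHT_SECOND / 60


delay = one_way_delay_minutes(157.5e6)
print(round(delay, 1), round(2 * delay, 1))
```

That comes to about 14 minutes one way, or about 28 minutes before you could see the result of a command.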
99% of brickings are the result of people doing stuff that the manufacturer did not intend for you to do
In that case, that should happen with deep space probes quite a lot.
Ezekiel 23:20
Given that these are radiation-hardened specs, those are fabulous! You can't just get your latest Core i7 and expect it to work correctly once it escapes the protection of Earth's magnetosphere. Also, heat dissipation is much trickier when you don't have air to work with (space) or cannot afford to replace air filters for the cooling systems (Mars).
Tomorrow is another day...
And then that board becomes a single point of failure.
3 computers and a supervisor? That's already 4 components.
If you want to handle t arbitrary node failures, then you need at least 3t+1 nodes in total. Whether you call the nodes computers or supervisor boards doesn't change that fact. If you have t failures among 3t or fewer total nodes, then the failures can happen in a way that causes the functional units to receive such inconsistent information that they are unable to do anything meaningful. It is a case of Byzantine agreement.
Any system designed to handle failures of one third or more components is making assumptions about how the failed components behave. If the failed components behave differently than the assumption, it takes even fewer failures to break the entire system.
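The 3t+1 bound above is easy to tabulate. The function names are made up for illustration, and this assumes the classic setting where failed nodes can behave arbitrarily and messages are not signed.

```python
def min_nodes(t):
    """Smallest total node count that tolerates t arbitrary (Byzantine) failures."""
    return 3 * t + 1


def max_faults(n):
    """Largest number of arbitrary failures that n nodes can tolerate."""
    return (n - 1) // 3


for n in (3, 4, 7):
    print(n, max_faults(n))
```

So three nodes tolerate no arbitrary failures at all, four tolerate one, and you need seven before you can tolerate two.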
Do you care about the security of your wireless mouse?
They do indeed have systems like that. If you're interested, it's worth looking into how they dealt with the Sol 18 Anomaly on Spirit. Of particular note is the "Shutdown Dammit" command that they used to override everything else the rover was doing so it would stop wasting battery overnight.
Seeing as they were able to update the software on a device that wouldn't even finish booting, I imagine the procedures for doing it on a functioning device are pretty robust, even if they're still nailbiting.
Probably concerned that their virus software is now out of date after the long journey.
jsut athnoer menagiensls ltitle psrhae for you to dcoede. Why do we wtsae our tmie dnoig tihs?