Upgrading Software From 350 Million Miles Away
CWmike writes "Picture doing a remote software upgrade. Now picture doing it when the machine you're upgrading is a robotic rover sitting 350 million miles away, on the surface of Mars. That's what a team of programmers and engineers at NASA are dealing with as they get ready to download a new version of the flight software on the Mars rover Curiosity, which landed safely on the Red Planet earlier this week. 'We need to take a whole series of steps to make that software active. You have to imagine that if something goes wrong with this, it could be the last time you hear from the rover,' said Steve Scandore, a senior flight software engineer at NASA's Jet Propulsion Laboratory. 'It has to work,' he told Computerworld. 'You don't want to be known as the guy doing the last activity on the rover before you lose contact.'"
It is a difficult task. While NASA has done a lot better than most of us programmers ever have, they have made mistakes in updating from Earth to Mars before.
http://en.wikipedia.org/wiki/Mars_Global_Surveyor#Loss_of_contact
http://lkml.org/lkml/2005/8/20/95
The spacecraft TRAVELLED 350 million miles to get there, but as of tonight, Mars is only about 157.5 million miles from Earth.
Working in remote smart metering we have a similar problem, where you can brick meters if the signal drops at the wrong place, or firmware doesn't fit the hardware right.
NASA doing a software upgrade is not big news. This is going to be phenomenally safe. Much scarier doing software upgrades on millions of unknown hardware configurations globally than on one totally locked down platform no matter what distance or cost is involved.
By pressing F8 at the "Starting Windows 95" message, and then choosing Safe Mode from the Windows 95 start-up menu.
Following these steps will gain you ultimate FAME and FAILURE - for updating the Mars software!!!
So what's their problem? Just tell a sysadmin to fix it.
Those who can make you believe absurdities can make you commit atrocities. - Voltaire
Thank you so much Mr. Wowsers for giving NASA this great idea. I suspect, given the genius of the thought, you will be contacted for employment shortly.
This, and also having a full replica of the whole rover on Earth to double check that any software updates won't screw the whole operation. But I can't imagine they are not doing these already :?
I hope there's a really, really good reason why they need to update the software at all.
Well, zero-day exploits.. and Wikileaks... and anonymous not forgiving or forgetting... and Duqu/Flame/Mahdi...
(grin)
Questions raise, answers kill. Raise questions to stay alive.
Exactly. That's how it's done in the telecomms world (infrastructure, not terminals). Typically the new software is given three attempts to boot, and if it doesn't acknowledge that it's fully booted after three attempts, the bootloader falls back to the previous version of the software. Of course, things get trickier if you need to update the bootloader, but those should be very rare situations. However, they in turn can be handled in a similar way (typically there's a 3-stage boot: first a ROM bootstrap, then your bootloader, then the OS, which is the part you'll want to change).
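To make the three-attempts-then-fall-back idea concrete, here is a minimal sketch in Python. It is only an illustration: the `state` record, its field names and the image names are made up for the example, and a real bootloader would keep this state in non-volatile memory and be written in something much lower level.

```python
MAX_ATTEMPTS = 3  # the comment above describes three boot attempts


def select_image(state):
    """Pick the image the bootloader should start on this boot.

    `state` is a hypothetical persistent record with three fields:
      pending  - name of the newly installed image, or None
      previous - name of the last known-good image
      attempts - how many times `pending` has been started so far
    """
    if state["pending"] is None:
        return state["previous"]
    if state["attempts"] >= MAX_ATTEMPTS:
        # The new image never acknowledged a full boot: abandon it.
        state["pending"] = None
        state["attempts"] = 0
        return state["previous"]
    state["attempts"] += 1
    return state["pending"]


def confirm_boot(state):
    """Called by the new software once it is fully up and running."""
    if state["pending"] is not None:
        state["previous"] = state["pending"]
        state["pending"] = None
        state["attempts"] = 0


# A new image that never confirms gets three tries, then the old one returns.
failing = {"pending": "v2", "previous": "v1", "attempts": 0}
boots = [select_image(failing) for _ in range(4)]

# A new image that confirms on its first boot becomes the known-good one.
working = {"pending": "v2", "previous": "v1", "attempts": 0}
first_boot = select_image(working)
confirm_boot(working)
next_boot = select_image(working)
```

The key design point is that the new software has to actively say "I'm up"; silence counts as failure, so a hung or crashing image can never lock out the old one.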
Also FatPhil on SoylentNews, id 863
Of course not! They do it just for the lulz!
More seriously, for space systems and embedded systems in general, due to resource constraints on-board, you usually cannot fit all the functionality you would like to in one software image. So you keep only what is necessary for the first mission, and then you replace the obsolete ones with the next thing you want to do.
As a simplified example, when you launch a satellite, you will need it to deploy its solar arrays quickly (and do many initialization checks). When that is done, you could imagine changing this part of the software with something else...
Also, they might have had time planning constraints on the project, and needed to launch with a simpler first version of the software, while finalizing the second one. That does happen.
Computers: The two identical on-board rover computers, called "Rover Compute Element" (RCE), contain radiation hardened memory to tolerate the extreme radiation from space and to safeguard against power-off cycles. Each computer's memory includes 256 kB of EEPROM, 256 MB of DRAM, and 2 GB of flash memory.[22] This compares to 3 MB of EEPROM, 128 MB of DRAM, and 256 MB of flash memory used in the Mars Exploration Rovers.[23]
The RCE computers use the RAD750 CPU, which is a successor to the RAD6000 CPU used in the Mars Exploration Rovers.[24][25] The RAD750 CPU is capable of up to 400 MIPS, while the RAD6000 CPU is capable of up to 35 MIPS.[26][27] Of the two on-board computers, one is configured as backup, and will take over in the event of problems with the main computer.[22]
http://en.wikipedia.org/wiki/Curiosity_rover#Specifications
Data transfer speeds between Curiosity and each orbiter may reach 2 Mbit/s and 256 kbit/s, respectively, but each orbiter is only able to communicate with Curiosity for about eight minutes per day.
When you have little bandwidth, better get it right the first time.
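For a feel of how little that is, here is a rough back-of-envelope calculation in Python using only the figures quoted above (2 Mbit/s and 256 kbit/s, about eight minutes per orbiter per day). It assumes the peak rate is held for the whole pass and ignores protocol overhead and retransmissions, so real numbers would be lower.

```python
PASS_SECONDS = 8 * 60  # "about eight minutes per day" per orbiter


def megabytes_per_pass(rate_bits_per_s, seconds=PASS_SECONDS):
    """Upper bound on data moved in one pass, in megabytes (10**6 bytes)."""
    return rate_bits_per_s * seconds / 8 / 1e6


fast_link_mb = megabytes_per_pass(2_000_000)  # the 2 Mbit/s orbiter
slow_link_mb = megabytes_per_pass(256_000)    # the 256 kbit/s orbiter

print(f"fast link: {fast_link_mb:.2f} MB per pass")
print(f"slow link: {slow_link_mb:.2f} MB per pass")
```

So the best case is on the order of a hundred megabytes a day through the fast orbiter and roughly fifteen through the slow one.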
The point of the exercise is to replace the no longer needed flight software with software it can use to better perform its tasks while on Mars.
We don't believe in radical loony monotheistic religions from the middle east -- we're Christians.
Not only am I absolutely sure they've got more than one copy of critical data in flash, but they have two identical and redundant computers on board
http://en.wikipedia.org/wiki/Curiosity_rover#Specifications
From http://marsprogram.jpl.nasa.gov/msl/mission/rover/brains/
The rover has two "computer brains", one of which is normally asleep. In case of problems the other computer brain can be awakened to take over control and continue the mission.
Specialist Mac support for creative pros, Melbourne
sudo apt-get update mars
No keyboard found. Press to continue.
It will sit there forever: "Are you sure you want to update? Yes/No"
Get a 10-foot 4X4 piece of lumber. Drop it flat on the ground. Walk from one end to the other like a balance beam. I'll bet you can do it. I'll bet you can do it blindfolded, walking backward. I'll bet you can do it reciting the alphabet backward. I'll bet you could do it drunk.
Take that same 4X4, suspend it 20 stories in the air between a couple of cranes. Put a bunch of razor sharp, rotating propellers on the ground beneath it. Intersperse the propellers with oil drillbits pointed up, not down for once. Have a bunch of trained turkey vultures flying around to watch you fall. Take your wife, kids and your momma, put a gun in their mouths while the Joker cackles that when you fall, he's gonna blow their heads off. Bring in the television cameras and monitors so the whole World can watch and you can watch them watch. Have some intern read the tweets and comments sections about your plight over the loudspeakers.
Now, there are a few ice-blooded "Licensed to Kill" Double-O men who could keep it together and walk that beam under that kind of pressure. Mary Lou Retton and Nadia could, no doubt. I seriously doubt I could.
Is it a big deal to do a software upgrade under such tightly controlled conditions? Not really. But try doing that software upgrade when billions of dollars and your career are on the line, with the whole world watching. The guy who screws that up is gonna be a punchline and a byword for a few decades, a real Wilson if you've read that book. :-) You'll be known as the guy who screwed up Mars.
Tell me there wouldn't be maybe one or two drops of sweat on the keyboard...
He put his boots up on the table and made a face. "The sig," he smirked. "You can waste your life in search of the sig."
That's always a risk if you have two computers for redundancy. To completely solve that problem, you need four computers. But the algorithms for coordinating in such a scenario are complicated. So it might be safer to rely on systems being able to use the proper computer, with just two present. If you had a 3 out of 4 setup with the four computers running identical software, it only takes one software bug to bring down the system.
Do you care about the security of your wireless mouse?
"If you had a 3 out of 4 setup with the four computers running identical software, it only takes one software bug to bring down the system."
Not at all. You have a separate "supervisor" board that moderates among the computers. In a case like that, you only need 3 for Damned Good Redundancy, not 4.
But I expect that NASA has good reason to have faith in the reliability of their dual machine.
Given that these are radiation-hardened specs, they are fabulous! You can't just get your latest Core i7 and expect it to work correctly once it escapes the protection of Earth's magnetosphere. Also, heat dissipation is much trickier when you don't have air to work with (space) or cannot afford to replace air filters for the cooling systems (Mars).
Tomorrow is another day...
If you follow "Scott Maxwell" on Google Plus, there are some great snippets about the landing and software. See: https://plus.google.com/u/0/112648317373638762082/posts
The purpose of existence is to make money.
The radiation this thing emits is NOTHING compared to the solar and cosmic radiation it would experience both in transit and on Mars. Putting everything in a metal box only helps so much, you still need specifically designed electronics which can handle the odd bit of radiation without dying. Even with a thick metal box you can't run an i7 on Mars, or not for very long at least. Your standard DDR3 isn't going to work either, or your standard EEPROM.
The other thing to remember is that although this project is extremely important, they're still not going to throw more capabilities in than they need, because that is more that can go wrong. For a remote sensing platform, the amount of EEPROM isn't that important - you just need enough to hold your communication protocols, some basic reaction-to-obstacle algorithms and the motor control code. You aren't going to be pulling massive libraries in. The emphasis is on making it as simple as possible, so that there is less chance for bugs to creep in. Those extra MIPS will come in handy for the navigation and onboard image processing, and the flash for storing interesting info until you can upload, so those are what they have upgraded the most.
Help I am stuck in a signature factory!
They are bound to have a copy of Curiosity here on Earth, surely? So they should be able to thoroughly test the process first. Ok, it is not Mars and there might be issues specific to transmitting that data over such distances... but still. I'd be really surprised if this hasn't been thoroughly tried and tested.
They're running VxWorks and they do have a backup computer. First the backup is flashed and verified, then the primary is flashed and verified.
And then that board becomes a single point of failure.
3 computers and a supervisor? That's already 4 components.
If you want to handle t arbitrary node failures, then you need at least 3t+1 nodes in total. Whether you call the nodes computers or supervisor boards doesn't change that fact. If you have t failures among 3t or fewer total nodes, then the failures can happen in a way that causes the functional units to receive such inconsistent information that they are unable to do anything meaningful. It is a case of Byzantine agreement.
Any system designed to handle failures of one third or more components is making assumptions about how the failed components behave. If the failed components behave differently than the assumption, it takes even fewer failures to break the entire system.
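The 3t+1 bound is easy to turn into numbers. A tiny Python sketch follows; the function names are just for illustration, and it only encodes the classic bound for arbitrary (Byzantine) failures without message signing, not any actual agreement protocol.

```python
def min_nodes(t):
    """Fewest nodes needed to tolerate t arbitrary (Byzantine) failures."""
    return 3 * t + 1


def max_faults(n):
    """Most arbitrary failures that n nodes can tolerate under that bound."""
    return (n - 1) // 3


for n in (2, 3, 4, 7):
    print(f"{n} nodes tolerate {max_faults(n)} arbitrary failure(s)")
```

Under this bound two or three nodes tolerate zero arbitrary failures, and four is the smallest count that tolerates one. That is the reasoning behind counting "3 computers and a supervisor" as four components.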
Do you care about the security of your wireless mouse?
The RAD750 is quite limited in power; but has the advantage of being comparatively close to 'just going down to newegg and buying a motherboard' by the standards of projects that go into space and shop at mil/aero contractors... The price is still up in the "If you have to ask, don't ask" range; but doing a very-low-volume DIY would likely be worse still...
Not really. That might have been true 10 years ago.
No.
All I'm saying is: you can bet the hardware is in a well-shielded heavy metal box, and today all it takes is about 1/4 of a cubic inch to squeeze in another GB of RAM or flash.
I wonder why they didn't think about that. A nice thick, heavy metal box. Easy! Perhaps you should go and work for NASA?
Let's ignore the earth's magnetosphere for the moment and make some massive assumptions.
The pressure on the ground is about 10^5 Pa. That means there's about 10^4 kg of stuff above every square meter to absorb radiation from space. That equates to 10 m of water, 1.25 m of steel or about 90 cm of lead. Quite a lot.
Mars is about 1.5 AU from the sun, so it receives about 0.4 times the radiation.
The Martian atmosphere is about 600 Pa, by comparison.
Radiation hardening is a very well established field. Using some degree of shielding is just one of the many techniques in use. On Mars, it is simply not enough on its own.
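The shielding arithmetic above can be redone in a few lines of Python. The pressures (10^5 Pa and 600 Pa) come from the comment; the surface gravities and material densities are typical textbook values that I am assuming, not figures from the thread, and the whole thing treats shielding as nothing more than mass per square meter.

```python
G_EARTH = 9.81  # m/s^2
G_MARS = 3.71   # m/s^2 (assumed value, not from the thread)

# Approximate densities in kg/m^3 (assumed typical values)
DENSITY = {"water": 1000.0, "steel": 7850.0, "lead": 11340.0}


def column_mass(pressure_pa, gravity):
    """Mass of atmosphere above one square meter, in kg/m^2."""
    return pressure_pa / gravity


def equivalent_thickness_m(column_kg_m2, material):
    """Thickness of `material` with the same mass per square meter."""
    return column_kg_m2 / DENSITY[material]


earth_column = column_mass(1e5, G_EARTH)
mars_column = column_mass(600.0, G_MARS)

for name, column in (("Earth", earth_column), ("Mars", mars_column)):
    layers = ", ".join(
        f"{equivalent_thickness_m(column, m):.2f} m {m}" for m in DENSITY
    )
    print(f"{name}: {column:.0f} kg/m^2 ~ {layers}")

print(f"Mars/Earth column ratio: {mars_column / earth_column:.3f}")
```

With these assumptions the Martian atmosphere works out to under 2% of Earth's mass per square meter, so even with the sun further away the rover gets very little of the free shielding we enjoy on the ground.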
It is very, very difficult to make a rad-hard processor, and then very thoroughly test it. You can't just keep shrinking the feature size, because as it goes down, the effect of radiation increases. Not only that, but as the amount of crystal per transistor shrinks, the chance of unrecoverable lattice damage increases, due to the lack of redundancy.
There are faster Rad-hardened DSPs, but those are, well, DSPs and only actually really fast for DSP like tasks.
There are also almost certainly faster ones available now. But it's been in transit for a year, and they certainly weren't building it with a brand-new untested processor for which they had to write all the software on the way after they launched it.
So, given the constraints, it's a pretty great CPU to have on board.
SJW n. One who posts facts.
Probably concerned that their virus software is now out of date after the long journey.
jsut athnoer menagiensls ltitle psrhae for you to dcoede. Why do we wtsae our tmie dnoig tihs?
Yeah, the freakin summary is potty as usual.
They aren't upgrading the flight software. They are replacing the flight software with driving around and exploring software.