Upgrading Software From 350 Million Miles Away
CWmike writes "Picture doing a remote software upgrade. Now picture doing it when the machine you're upgrading is a robotic rover sitting 350 million miles away, on the surface of Mars. That's what a team of programmers and engineers at NASA are dealing with as they get ready to download a new version of the flight software on the Mars rover Curiosity, which landed safely on the Red Planet earlier this week. 'We need to take a whole series of steps to make that software active. You have to imagine that if something goes wrong with this, it could be the last time you hear from the rover,' said Steve Scandore, a senior flight software engineer at NASA's Jet Propulsion Laboratory. 'It has to work,' he told Computerworld. 'You don't want to be known as the guy doing the last activity on the rover before you lose contact.'"
It is a difficult task. While NASA has done a lot better than most of us programmers ever have, they have made mistakes in updating from Earth to Mars before.
http://en.wikipedia.org/wiki/Mars_Global_Surveyor#Loss_of_contact
http://lkml.org/lkml/2005/8/20/95
The spacecraft TRAVELLED 350 million miles to get there, but as of tonight, Mars is only about 157.5 million miles from Earth.
Working in remote smart metering we have a similar problem, where you can brick meters if the signal drops at the wrong place, or firmware doesn't fit the hardware right.
NASA doing a software upgrade is not big news. This is going to be phenomenally safe. Much scarier doing software upgrades on millions of unknown hardware configurations globally than on one totally locked down platform no matter what distance or cost is involved.
For such expensive projects, would it not make sense to have two EPROMs, one containing the original known working system and one for the new one? If the new version fails, the machine can fall back to the older version, and switch between the two if more OS upgrades are planned. If they have watchdog timers on board to keep the rover going, surely they could do a similar setup for the OS?
Take Nobody's Word For It.
I hope there's a really, really good reason why they need to update the software at all.
By pressing F8 at the "Starting Windows 95" message, and then choosing Safe Mode from the Windows 95 start-up menu.
Following these steps will gain you ultimate FAME and FAILURE - for updating the Mars software!!!
So what's their problem? Just tell a sysadmin to fix it.
Those who can make you believe absurdities can make you commit atrocities. - Voltaire
Maybe I'm missing something, but unless this update is going to make it sprout wings, why does it need flight software when it's already landed?
I can't even get to that stage, it keeps giving me a keyboard error - did no-one stick one on Curiosity?
Also FatPhil on SoylentNews, id 863
Why, the life of a Mars Rover engineer is always intense.
Not that I want to start a flamewar, but does anybody know what OS they use? Does anybody know the exact hardware specs of what they send in a Mars rover? :-)
I know that many satellites carry around _reaheally_ old hardware and I'm really curious.
Did they program it from scratch or is it some already existing project?
Why are they updating the "flight software"? I thought they were done with the flying bit?
Imagine how far it would have been if they had measured it in kilometers instead!
Whoaw!
. ;)
It has an arm, doesn't it? So it can push its own reset button and go into the BIOS if need be.
sudo apt-get update mars
I found it quite funny (a 1995 pc wiz kid telling how you should do it), but the redundant "FAME and FAILURE" line kind of ruined it.
There is this global media obsession with referring to Mars as "The Red Planet". It is really irritating.
Mars has a name, just like all the other planets in our solar system: Its name is "Mars". So use it, and respect the planet and its name.
It's so irritating and "media lovvie". Also, the planet is not really "red" at all. It's brown. It belongs in exactly the same category as media types referring to scientists as "boffins". It's RUDE and DISRESPECTFUL.
I wish the media would shed this ridiculous obsession with ignoring the name of the planet MARS.
"Absorbing your worst..."
No keyboard found. Press to continue.
No keyboard found. Press <F1> to continue.
(correcting for HTML... preview? What preview? Oh, that preview...)
which language do they use to tell the rover where to drive? Surely, it has to be Logo
It will sit there forever: "Are you sure you want to update? Yes/No"
Get a 10-foot 4X4 piece of lumber. Drop it flat on the ground. Walk from one end to the other like a balance beam. I'll bet you can do it. I'll bet you can do it blindfolded, walking backward. I'll bet you can do it reciting the alphabet backward. I'll bet you could do it drunk.
Take that same 4X4, suspend it 20 stories in the air between a couple of cranes. Put a bunch of razor sharp, rotating propellers on the ground beneath it. Intersperse the propellers with oil drillbits pointed up, not down for once. Have a bunch of trained turkey vultures flying around to watch you fall. Take your wife, kids and your momma, put a gun in their mouths while the Joker cackles that when you fall, he's gonna blow their heads off. Bring in the television cameras and monitors so the whole World can watch and you can watch them watch. Have some intern read the tweets and comments sections about your plight over the loudspeakers.
Now, there are a few ice-blooded "Licensed to Kill" Double-O men who could keep it together and walk that beam under that kind of pressure. Mary Lou Retton and Nadia could, no doubt. I seriously doubt I could.
Is it a big deal to do a software upgrade under such tightly controlled conditions? Not really. But try doing that software upgrade when billions of dollars and your career is on the line, with the whole world watching. The guy who screws that up is gonna be a punchline and a byword for a few decades, a real Wilson if you've read that book. :-) You'll be known as the guy who screwed up Mars.
Tell me there wouldn't be maybe one or two drops of sweat on the keyboard...
He put his boots up on the table and made a face. "The sig," he smirked. "You can waste your life in search of the sig."
I just got back from Mars on Tuesday. The tacos suck, the clubs are dead and the girls all wear sunscreen with shitty tans... I'm so done with it... rather be in NY on a Wed night...
What we really need to know is why it didn't need flight software BEFORE now?! Obviously it isn't really on Mars... if 'Mars' even exists. Lizards all the way down I tell you! LIZARDS!!
Python coder | PyQt Applications | Writer
"Flight" as in "fight-or-flight response". You know, in case Curiosity encounters Martian life which think it's delicious ... or at least interesting enough to study and take apart.
Those people at NASA think of everything ...
from the controls and everything should be fine
I don't feel I could begin to appreciate the issues these rocket surgeons deal with, but if it were my project, there would be two rovers, the guinea pig in Dalton, Ohio (there should be a penalty for bricking the test rover) and the one that gets the exact same script that succeeded in Dalton. Human hands should never directly touch a mission critical system.
If you follow "Scott Maxwell" in google plus, there are some great snippets about the landing and software. See: https://plus.google.com/u/0/112648317373638762082/posts
The purpose of existence is to make money.
If the software upload doesn't work, there are plenty of tools to help NASA fix it.
Some that come to mind are the (in)famous My Clean PC. If they had been smart
and purchased the extended warranty at the checkout, Geek Squad could help, too.
The headline should be "OMG! WE ARE TEH BUZY SO FAST!!!!shift-1"
The reason I say this is that it NEVER covers the fact that, in all likelihood, the programmers MUST have a Development Environment, a Quality Assurance Environment, a Staging Environment and an Acceptance Testing Environment. Is it agile? Is it waterfall? WTF is the IT? and WTF is the I.T.?
Hell, if you truly want to be technical and have a fully fleshed-out story you would say "In addition to the n flops uber computer simulators that introduce transmission failures and other physical environmental factors... We have the original prototype to exact specifications on the ground, in the labs here..."
Can a HaX0r hijack the uberWifi signal on mars and attack the aliens living there? If we divide by zero can the solar collectors and internal power source create an uncontrolled fusion? That's what I would like to know!!!
The original article itself does not cover "How does one prevent bricking 350m mile away equipment."
They are bound to have a copy of Curiosity here on Earth, surely? So they should be able to thoroughly test the process first. Ok, it is not Mars and there might be issues specific to transmitting that data over such distances... but still. I'd be really surprised if this hasn't been thoroughly tried and tested.
But the technologies used in some botnets are a good starting point.
That'd be, call home and try to pull anything you need to do the upgrade.
The orbiter relay should be doing the same, first.
Sent as ripples into the electromagnetic field. No single photon has been harmed in the process.
It takes 3-5 years to field test this stuff. It takes years of preparation after the final decision on what hardware to use before you get to launch the thing and, after that, to get it to Mars. You are looking at the best of the best, proven technology hardware available for this sort of radiation tolerance at the moment they had the last opportunity to make design changes.
One does not simply fly to Mars.
I was promised a flying car. Where is my flying car?
a new version of the flight software on the Mars rover Curiosity
Is anybody else thinking that any change to the flight software is now a few days too late?
The use of tag "ota" is technically wrong.
Here is how SW is managed on a spacecraft: you have a 'golden' image, residing on a physically separate PROM, which is write-protected in HW. This image is tested on the ground, before launch, and cannot be changed ever. Then you setup a HW watchdog that resets to the golden image if you don't hear from Earth every N days.
One or more operational SW images are stored in a separate EEPROM (or Flash) and you upgrade one of those at a time. Before booting up the upgraded image you verify the load.
If done correctly, the worst that can happen if you botch the upgrade is that you lose a few days waiting for the watchdog.
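The scheme described above can be sketched roughly as follows. This is only an illustration of the idea, not how any real spacecraft does it: the `Image` class, the `select_boot_image` function, the use of CRC32 as the load check and the "newest image first" ordering are all assumptions made up for this example.

```python
import zlib
from dataclasses import dataclass


@dataclass
class Image:
    name: str
    payload: bytes
    expected_crc: int  # CRC32 recorded when the image was uploaded (assumed check)


def is_intact(image: Image) -> bool:
    """Verify the stored image against the checksum recorded at upload."""
    return zlib.crc32(image.payload) == image.expected_crc


def select_boot_image(golden: Image, operational: list[Image],
                      days_since_contact: float, watchdog_days: float) -> Image:
    """Pick the image to boot.

    The write-protected golden image is used when the watchdog has expired
    (no word from Earth for too long) or when no operational image verifies.
    """
    if days_since_contact >= watchdog_days:
        return golden
    for image in operational:  # newest first, by convention in this sketch
        if is_intact(image):
            return image
    return golden


golden = Image("golden", b"known-good", zlib.crc32(b"known-good"))
old = Image("r9", b"old build", zlib.crc32(b"old build"))
botched = Image("r10", b"corrupted upload", zlib.crc32(b"new build"))

# The botched upload fails verification, so the older operational image boots.
print(select_boot_image(golden, [botched, old], days_since_contact=1, watchdog_days=5).name)
```

With a botched upload the worst case in this sketch is falling back to an older image, or to the golden one once the watchdog expires, which matches the "lose a few days" outcome described above.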
Probably concerned that their virus software is now out of date after the long journey.
jsut athnoer menagiensls ltitle psrhae for you to dcoede. Why do we wtsae our tmie dnoig tihs?
Butt sex requires a lot of lubrication, right? Lubrication. Lubruh... Chupuh... Chupacabra 's the, the goat killer of Mexican folklore. Folklore is stories from the past that are often fictionalized. Fictionalized to heighten drama. Drama students! Students at colleges usually have bicycles! Bi, bian, binary. It's binary code! If people don't wear jackets they could get cold. A cold is caused by a virus. A viru- a computer virus! We could make a computer virus and send it to their ships to disable their computers!
"Let's go find some Turian and beat the shit out of him
I once worked on simulation software for a new satellite that could be patched on-orbit (an orbiting satellite might as well be on Mars -- if you break it, it's going to stay broken). One of the main purposes of the software simulator, which ran the actual flight code that was on the bird, was to test new patches before they were pushed to the vehicle (and the vehicle itself did some validation of the patch after the upload was complete before applying it). Of course, hardware-in-the-loop testing using a duplicate test satellite on the ground was also done as a final step. In addition to a software simulator, I'm sure NASA has a duplicate rover or two in their labs for testing. The amount of testing done on these programs would drive you insane.
I mean, the lag is going to be on par with SSH in to a terrestrial server with my AT&T service and cell phone.
Slashdot's rate-of-post filter: Preventing you from posting too many great ideas at once.
here on slashdot. So many genius level ideas and suggestions.
Upgrade the flight software all you want. The rover is on the ground and doesn't intend to fly.
People complain of 300ms of latency here on earth with their ISP. I have heard it takes 14 MINUTES for a signal round trip. That's 840 seconds, or 840,000ms of latency. So you are not exactly programming on the fly.
The worst part would be that presumably there is some pretty robust simulated debuggery on earth before anything gets transmitted. However, once you have finally tested, confirmed, compiled, packaged etc... and pressed the send button, you likely have to wait an eternal, excruciating 14 minutes before you know if your code actually worked, or if you just broke several billion dollars' worth of project...
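For anyone who wants to check the latency figure, here is a quick back-of-the-envelope calculation. It assumes the 157.5 million mile Earth-Mars distance quoted earlier in this thread and a signal travelling at the speed of light; the function name is just made up for the example.

```python
SPEED_OF_LIGHT_KM_S = 299_792.458  # exact, by definition of the metre
KM_PER_MILE = 1.609344             # exact, international mile


def one_way_light_time_s(distance_miles: float) -> float:
    """Seconds for a radio signal to cover the given distance."""
    return distance_miles * KM_PER_MILE / SPEED_OF_LIGHT_KM_S


distance = 157.5e6  # miles, the figure quoted earlier in this thread
one_way = one_way_light_time_s(distance)
print(f"one way: {one_way / 60:.1f} min, round trip: {2 * one_way / 60:.1f} min")
# prints "one way: 14.1 min, round trip: 28.2 min"
```

On that distance figure, 14 minutes looks closer to the one-way delay, so waiting for a response to a command would take roughly twice that.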
The lander OS upgrade system should include a failsafe mechanism where if the "user" doesn't confirm the new settings within a certain amount of time then the system reverts to the previous settings/OS/software.
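A minimal sketch of that confirm-or-revert idea, assuming a simple timer measured in seconds. The `UpgradeState` class and its fields are hypothetical names for illustration, not anything NASA has described.

```python
from dataclasses import dataclass


@dataclass
class UpgradeState:
    previous: str          # version that was running before the upgrade
    candidate: str         # newly activated version
    activated_at: float    # time the candidate was switched on, in seconds
    confirm_window: float  # how long to wait for a confirmation, in seconds
    confirmed: bool = False

    def confirm(self, now: float) -> bool:
        """Accept the candidate; ignored once the window has closed."""
        if now - self.activated_at <= self.confirm_window:
            self.confirmed = True
        return self.confirmed

    def active_version(self, now: float) -> str:
        """Version that should be running at time `now`."""
        if self.confirmed:
            return self.candidate
        if now - self.activated_at > self.confirm_window:
            return self.previous  # nobody confirmed in time: revert
        return self.candidate


state = UpgradeState(previous="R9", candidate="R10",
                     activated_at=0.0, confirm_window=3600.0)
print(state.active_version(now=1800.0))  # still inside the window
print(state.active_version(now=7200.0))  # window expired without confirmation
```

The window would have to be comfortably longer than the signal round trip plus the time the operators need to react, otherwise a perfectly good upgrade gets reverted.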
I love how everyone here is like, "Y'know, they really should have a backup software solution on the rover" or "If I was doing this, I would do this, that, and the other thing, and they're stupid for not doing that".
An awful lot of assumptions being made about people who are probably the very top of their game. I'm going to give NASA the benefit of a doubt here: I think they wouldn't do the upgrade unless it was very beneficial, and I'd bet they're doing it in a way that has layers upon layers of safeguards.
It's better to vote for what you want and not get it than to vote for what you don't want and get it.
- E. Debs
That NASA has learned from the experience of upgrading Sojourner to WinAMP 0.92...
who killed the cat. It will have been that darn NASA engineer who killed Curiosity. ;-)
I like my spaghetti with source.
You would want a deeply-embedded, simple HW module listening in on the raw radio link for a special code, and it then initiates flashing of the main module.
If this is well-done, no matter what king kong fuckup happens on the main processor(s) you can always have the little tough guy rip it a new asshole.
We do stuff like this in the auto environment.
(-1: Post disagrees with my already-settled worldview) is not a valid mod option.
What kind of protocol can be used for transmissions like that?
And, if anything goes wrong, and Curiosity throws up, eh, an exception, how can it fall back to a sane state? Someone further up this discussion suggested a mechanism where losing contact with base control for a certain period would trigger a revert to the previous version. But losing contact may have totally different reasons.
Does anyone know how they do this stuff? Are they actually programming Curiosity in Python?
Damn, guys! If it ain't broke, don't fix it!
If it _is_ broke .. this is a hell of a time to find out about it. How about some more details, eh?
I doubt they are running a remote firmware update. I bet they are just uploading Python scripts, and if it fails, no worries, Curiosity will receive a new program update. Hopefully they are just blowing up media hype. I wish NASA would be more scientific when talking to the public. We are not all idiots; it's just that 75% of the population won't understand it. It's a shame really. Maybe the rest of the public can learn more if they are not talked down to.
It's the drivers for new devices and operations programs that are more likely to have bugs. Plus they may learn more useful ways of operating things during the years they operate these probes.
I recall the 2004 Mars Opportunity computer nearly died about a month into its 2003 operation. The memory management for the then "new fangled" flash-drive wasn't freeing memory correctly. Opportunity had gone into safe-mode and rebooted about 30 times in a row. But JPL engineers managed to patch the driver and Opportunity is still working 9 years later.
Twenty-one years ago, the Deep Space 1 probe was controlled by an autonomous spacecraft control system called "Remote Agent". This was a Lisp program running aboard the spacecraft, 100,000,000 miles away from Earth. During the flight, they remotely debugged and fixed a race condition in the code that had not shown up during ground testing. This saved the day, and the Remote Agent was subsequently named "NASA Software of the Year". One of the developers said, "Having a read-eval-print loop running on the spacecraft proved invaluable in finding and fixing the problem."
What do you think: Conservative, or Liberal Programming? (lol)
Formal Analysis of the Remote Agent Before and After Flight
Lisp was also used for the Mars Pathfinder mission, although in that case it was not running aboard the spacecraft.
For those who wondered, they do have a reset, and it works most of the time. There was a lot of reprogramming done on the Deep Space 1 mission, and a few times there was a bug that hadn't shown up in the sandbox duplicate that they have in a lab, but they sent a reset - I think once it took a couple of days to finally identify a star that would let it reorient DS1's dish to earth - but there are safe modes it can fall back to. I'm sure that the same's true of Curiosity, that unexpected situations come up (I mean, that's what exploring's all about), and you have to rethink how to do what you need, and we have to do it for Curiosity, given the state of our AIs....
And yes, I do know what I'm talking about: I know Steve, the long-haired controller, personally, and a mailing list I'm on saw a lot of posts by him back then, and some for Curiosity.
mark
LANDesk must lag like a bitch when trying to do updates from that far away.
You have to watch the news conferences on the web (10 am pacific every day)-- they have many of the real engineers and scientists answering questions in a pretty good Q&A with reporters. *way* better than your average press conference. What happens in an article is that you have a reporter with little technical background working from a press release or some short summary, and they're trying to dumb it down to fit into a short article written at 5th grade level.
Sending new software to missions after they leave earth is pretty standard, particularly for things with a long cruise phase. For MSL, they had EDL software with the control loops to get safely to the ground dominating things, and now are dumping the software that they don't need so they can use the space for code that will be useful on the ground. Something that's important to remember (and other posters have mentioned) is that you pay a lot for every gram of mass you send to another planet, so you can't go packing in a lot of extra stuff, and if you can dump something you don't need (like the landing software) to make room for something that's more useful (like driving around software) you do.
Another thing to remember (that's also already been noted) is that missions like this have technology freeze *long* before launch, so that you can ensure that everything will play together and you can test everything really thoroughly (every time there's a change you go through a lot of retesting, and it involves hardware so it's more work than just typing "make test").
The spacecraft TRAVELLED 350 million miles to get there, but as of tonight, Mars is only about 157.5 million miles from Earth.
Kilometres, miles they are all the same to NASA, especially when dealing with Mars.
I thought the last guy to be working with the Mars Rover before contact was lost was Howard Wolowitz from "Big Bang Theory"?
If you're building a spacecraft that's going to live 350 million miles away, wouldn't you have redundant EVERYTHING on it? I.E. the entire command and control system should be duplicated. That way you update the standby system and have some predefined self check the thing can do after the update's done, and if it doesn't pass the self check, then that system stays in standby mode so the operation of the system as a whole isn't impacted. You'd also probably have some sort of OOB access to the failed system via the primary system so you can go in and try to repair it.
"Installing surface software on @MarsCuriosity takes longer than on my laptop, but doesn't remind me to restart every 15 min when done. #MSL" - Bobak F, via Twitter
http://www.stickyminds.com/BetterSoftware/magazine.asp?fn=cifea&id=121 :
"Cumulative Usage
Resource Exhaustion
The cumulative usage of software tends to create more and more intentionally stored data. If storage resources are not managed carefully, this stored data causes file systems to fill up or free memory to be depleted, a problem known as resource exhaustion.
A dramatic example of resource exhaustion occurred on NASA's Spirit rover, which stopped communicating with Earth on January 21, 2004, after having landed on Mars just seventeen days earlier. Suspecting a problem with the flash memory, JPL engineers commanded the rover to boot up without reading the flash, and then deleted hundreds of unneeded files on the flash memory, which quickly addressed the problem. [11] The rover has now been running for more than five years, well surpassing its longevity design goal of ninety days of operation..."
Ant(Dude) @ Quality Foraged Links (AQFL.net) & The Ant Farm (antfarm.ma.cx / antfarm.home.dhs.org).
Any updates or changes should be tried on an exact duplicate here. Screw up and it's no big deal, it's here. Reminds me of a dumbass that wanted to update a machine across the US. I told him to do the one in the next room, configured exactly the same. But NOOOOO! We made him go out there and fix it. Closest airport was over 100 miles away.
Cause of failure
On November 10, 1999, the Mars Climate Orbiter Mishap Investigation Board released a Phase I report, detailing the suspected issues encountered with the loss of the spacecraft. Previously, on September 8, 1999, Trajectory Correction Maneuver-4 was computed and then executed on September 15, 1999. It was intended to place the spacecraft at an optimal position for an orbital insertion maneuver that would bring the spacecraft around Mars at an altitude of 226 kilometers on September 23, 1999. However, during the week between TCM-4 and the orbital insertion maneuver, the navigation team indicated the altitude may be much lower than intended at 150 to 170 kilometers. Twenty-four hours prior to orbital insertion, calculations placed the orbiter at an altitude of 110 kilometers; 80 kilometers is the minimum altitude that Mars Climate Orbiter was thought to be capable of surviving during this maneuver. Final calculations placed the spacecraft in a trajectory that would have taken the orbiter within 57 kilometers of the surface where the spacecraft likely disintegrated because of atmospheric stresses. The primary cause of this discrepancy was engineering error. Specifically, the flight system software on the Mars Climate Orbiter was written to take thrust instructions using the metric unit newtons (N), while the software on the ground that generated those instructions used the Imperial measure pound-force (lbf). This error has since been known as the metric mixup and has been carefully avoided in all missions since by NASA
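The mixup described above is the kind of bug that carrying the unit along with the number is meant to catch. Here is a small sketch of that idea; the `Force` class and `flight_software_thrust_command` function are invented for illustration and have nothing to do with the actual Mars Climate Orbiter code.

```python
from dataclasses import dataclass

NEWTONS_PER_LBF = 4.4482216152605  # exact, from the definition of pound-force


@dataclass(frozen=True)
class Force:
    """A force stored internally in newtons, whatever unit it arrived in."""
    newtons: float

    @classmethod
    def from_newtons(cls, value: float) -> "Force":
        return cls(value)

    @classmethod
    def from_lbf(cls, value: float) -> "Force":
        return cls(value * NEWTONS_PER_LBF)


def flight_software_thrust_command(force: Force) -> float:
    """Hypothetical flight-side entry point: accepts only unit-tagged values."""
    if not isinstance(force, Force):
        raise TypeError("thrust must be a Force, not a bare number")
    return force.newtons


# The ground software works in pound-force, but the conversion is explicit.
ground_value = Force.from_lbf(10.0)
print(flight_software_thrust_command(ground_value))  # value in newtons

# A bare number with no unit attached is rejected instead of being misread.
try:
    flight_software_thrust_command(10.0)
except TypeError as exc:
    print("rejected:", exc)
```

A bare 10.0 handed across the interface would be off by a factor of about 4.45, which is why the sketch refuses anything that does not say what unit it is in.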
The new software image is already on Curiosity's local 4GB flash file system. They just need to send the commands to reboot from the new image. According to the Chief Software Engineer during the press conference Fri. morning, they uploaded the R10 image back in June while still in cruise mode.
It is likely all they need to do is change a few boot-loader parameters and reboot to the new image. If it doesn't work, it probably will safemode back to the previous image. They also have a completely independent backup computer that can probably unbrick its twin if something goes sideways.
This kind of stuff is only dangerous when the goal is to prevent end-users from easily reflashing their mobile devices.
NASA had been doing "spectacular landings" and "terrifying software upgrades" their entire existence; not to detract from the awesomeness of it all, the recent spin-offs are just a publicity stunt! -- doesn't it strike you how all this got suddenly so-o-o-o-o-o terrifying and spectacular just about the time of NASA's budget cuts and the NASA's declaration of the fight for "hearts and minds" of its fellow american citizens? This is all fine and cool, of course, but /. should know better duh...
On an LCD panel 550 million km away:
UPGRADE FAILED
RESEND FILE
Amazing how people think things are so much easier and simpler when they've never done them before, isn't it?