Slashdot Mirror


Upgrading Software From 350 Million Miles Away

CWmike writes "Picture doing a remote software upgrade. Now picture doing it when the machine you're upgrading is a robotic rover sitting 350 million miles away, on the surface of Mars. That's what a team of programmers and engineers at NASA are dealing with as they get ready to download a new version of the flight software on the Mars rover Curiosity, which landed safely on the Red Planet earlier this week. 'We need to take a whole series of steps to make that software active. You have to imagine that if something goes wrong with this, it could be the last time you hear from the rover,' said Steve Scandore, a senior flight software engineer at NASA's Jet Propulsion Laboratory. 'It has to work,' he told Computerworld. 'You don't want to be known as the guy doing the last activity on the rover before you lose contact.'"

228 comments

  1. And NASA has made mistakes with this before... by YesIAmAScript · · Score: 4, Interesting

    It is a difficult task. While NASA has done a lot better than most of us programmers ever have, they have made mistakes in updating from Earth to Mars before.

    http://en.wikipedia.org/wiki/Mars_Global_Surveyor#Loss_of_contact

    --
    http://lkml.org/lkml/2005/8/20/95
    1. Re:And NASA has made mistakes with this before... by Taco+Cowboy · · Score: 4, Interesting

      That is why I do not understand why the NASA engineers want to take such a risk

      Unless it is a totally fatal software bug - that is, if they do not upgrade the software, the Curiosity rover is going to be bricked - I do not think a regular software upgrade is worth the danger of bricking the rover, which is, as TFA has stated, 350 million miles away
       

      --
      Muchas Gracias, Señor Edward Snowden !
    2. Re:And NASA has made mistakes with this before... by Anonymous Coward · · Score: 0

      I presume it is because they planned on it from the beginning to allow them more time to work on the flight software while the craft was in transit.

    3. Re:And NASA has made mistakes with this before... by Anonymous Coward · · Score: 5, Informative

      99% of brickings are the result of people doing stuff that the manufacturer did not intend for you to do, on devices where important design details were hidden for commercial reasons.

      This is unlikely (one would hope) to be the case here.

    4. Re:And NASA has made mistakes with this before... by hcs_$reboot · · Score: 4, Interesting

      why the NASA engineers want to take such a risk

      Similar to some devices here on Earth, the rover should have an automatic revert solution. For instance, a non-updatable software running on a separate processor detects specific conditions (like no signal from Earth for a while) and flashes back the updatable software to its original version when that condition occurs.

      --
      Slashdot, fix the reply notifications... You won't get away with it...
    5. Re:And NASA has made mistakes with this before... by Jane+Q.+Public · · Score: 4, Interesting

      "I do not think taking the risk of bricking the rover for a regular software upgrade is worth the danger of bricking the rover..."

      I guess it all depends on (A) what the perceived value of the upgrade is, versus (B) the perceived risk.

      It's probably a safe bet that they learned from the Surveyor issue, and built in better tests and safeguards. I imagine -- although I don't really know -- that they have implemented something like the "rolling upgrades" that are common now, which allow processes to be replaced on the fly one at a time, without reboot, and with a failsafe revert that runs at a higher level than any of those processes if anything goes wrong.

      It isn't like Windows, in which just about every time you install or upgrade something you have to make all the changes then "reboot". They get done one at a time, and they are tested individually after they are made.

      It sounds complicated but conceptually it's pretty simple: you have a top-layer monitor program that accepts commands to replace lower-level processes. All it needs to be pretty "fail-safe" is to wait for a specified period of time for an "okay" signal from Ground Control. If it doesn't receive one in the specified time, it automatically reverts the process back to the old version. It's a little more involved than that, but that's the idea.
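
      A minimal sketch of that idea, purely for illustration. The `Monitor` class, its method names, the process names and the time units are all made up here; nothing below is taken from the actual flight software.

```python
from dataclasses import dataclass, field


@dataclass
class Monitor:
    """Hypothetical top-level monitor that swaps one process at a time."""
    versions: dict = field(default_factory=dict)  # process name -> active version
    pending: dict = field(default_factory=dict)   # process name -> (old version, deadline)

    def replace(self, name, new_version, now, ack_window):
        """Activate new_version but remember the old one until ground confirms."""
        self.pending[name] = (self.versions.get(name), now + ack_window)
        self.versions[name] = new_version

    def acknowledge(self, name):
        """Ground control sent its 'okay': keep the new version."""
        self.pending.pop(name, None)

    def tick(self, now):
        """Revert every replacement whose acknowledgement window has expired."""
        for name, (old_version, deadline) in list(self.pending.items()):
            if now >= deadline:
                self.versions[name] = old_version
                del self.pending[name]


monitor = Monitor(versions={"nav": "v1", "camera": "v1"})
monitor.replace("nav", "v2", now=0, ack_window=100)
monitor.replace("camera", "v2", now=0, ack_window=100)
monitor.acknowledge("nav")   # the 'okay' arrived in time for nav only
monitor.tick(now=150)        # camera's window has expired, so it is reverted
print(monitor.versions)      # {'nav': 'v2', 'camera': 'v1'}
```

      The point of the design is that the revert path needs no input at all: silence from the ground is treated as failure.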

      Lots of software does that now. A lot has improved since 1996.

    6. Re:And NASA has made mistakes with this before... by cnettel · · Score: 5, Insightful

      why the NASA engineers want to take such a risk

      Similar to some devices here on Earth, the rover should have an automatic revert solution. For instance, a non-updatable software running on a separate processor detects specific conditions (like no signal from Earth for a while) and flashes back the updatable software to its original version when that condition occurs.

      Such things tend to be present, but how many times have they tested the automatic revert in actual conditions? An alternative codepath is always a risk.

      Updating the software can have great advantages. Only a slightly more reliable connection would allow vast amounts of more science to be done. Adapting the algorithms for autonomous functions such as simple navigation or sample processing also makes a great difference when your lag time for a single command is measured in terms of minutes and you don't even have that level of "real-time" access most of the time.

    7. Re:And NASA has made mistakes with this before... by Jane+Q.+Public · · Score: 1

      Haha, I wrote pretty much the same thing, at about the same time. See below.

    8. Re:And NASA has made mistakes with this before... by Solandri · · Score: 1
    9. Re:And NASA has made mistakes with this before... by kasperd · · Score: 3, Informative

      they have made mistakes in updating from Earth to Mars before.

      Sounds like it was not just a software update gone wrong but rather some mechanical problem which they were trying to work around. It was nothing like the usual bricking problem, where a firmware update overwrites code which is needed to perform future firmware updates.

      The rovers have several mechanisms to make it safer to update firmware remotely. But ultimately a combination of multiple unfortunate events can still lead to the loss of a rover. And one of those events may have been human error. From the description it sounds like mechanical problems with the solar panel, combined with two cases of human error in coordination of updates, another case of human error trying to correct the previous human errors, an unfortunate condition triggering a latent problem introduced by previous errors, and finally ending up in a position causing the battery to overheat, and loss of power being the ultimate reason it was impossible to adjust the previous mistakes.

      --

      Do you care about the security of your wireless mouse?
    10. Re:And NASA has made mistakes with this before... by gagol · · Score: 4, Informative

      That is probably why a team of 100 software engineers issues about 1000 commands per day for the rover. My guess is a lot of the work is triple checking everything before they upload an update. There is just no room for error in this situation.

      --
      Tomorrow is another day...
    11. Re:And NASA has made mistakes with this before... by gagol · · Score: 2

      For those wondering where the numbers come from, just read the article!

      --
      Tomorrow is another day...
    12. Re:And NASA has made mistakes with this before... by K.+S.+Kyosuke · · Score: 5, Funny

      99% of brickings are the result of people doing stuff that the manufacturer did not intend for you to do

      In that case, that should happen with deep space probes quite a lot.

      --
      Ezekiel 23:20
    13. Re:And NASA has made mistakes with this before... by TenDollarMan · · Score: 2

      Yeah, but maybe the new JellyBean will be totally awesome!

    14. Re:And NASA has made mistakes with this before... by Anonymous Coward · · Score: 0

      I was referring more to trying to restrict you from upgrading / modifying stuff rather than people trying to "hack" stuff or squeeze extra functionality out of something that they have the full schematics / source code of.

    15. Re:And NASA has made mistakes with this before... by Anonymous Coward · · Score: 2, Interesting

      Unbelievable, this is so stupid...
      WHY NOT INCLUDE SECOND BIOS? or whatever fuck they are using? if it's so precious and easily broken, why not use back up hardware? It's not like it would add another half kilo of weight???? Risk is TOO BIG not to do that. A few grams => problem solved.

    16. Re:And NASA has made mistakes with this before... by Anonymous Coward · · Score: 1

      It's not about updating the virus scanner or patching leaks.
      The rover had software loaded for landing, now it's getting software for exploring Mars.
      It would be a waste of resources to have both loaded at once since they are never required simultaneously.
      A 4Gb SD card at Best Buy may be cheap, but memory that can tolerate the temperatures, radiation and other hazards of space exploration for 3 years is a little pricier.
      While the risk of losing communication with the rover is there, I'm pretty sure there is a fallback for when the update fails.

    17. Re:And NASA has made mistakes with this before... by somersault · · Score: 1

      I think most of us thought it. That probably means that NASA thought it too. Unless they were really against doing such a thing to save space/weight, but I think a few extra grams and square inches to have a recovery partition is definitely worth it, considering bricking the thing means you just wasted several billion dollars..

      --
      which is totally what she said
    18. Re:And NASA has made mistakes with this before... by Anonymous Coward · · Score: 0

      I assume there is lots of stuff in there that is related to the landing process and post-landing startup that they don't need (i.e. they can clear space), there is plenty of stuff related to the landing site that they can now upload (e.g., position information, local time, etc.), and that they've been working on the software and improving/implementing things they needed all during the cruise phase, but didn't mess with uploading it before landing because landing was the focus. I also know they had a big batch of "post-landing" instrument data collection events that were pre-programmed in order to limit the time they'd have to spend developing each day's commands (e.g., the colour panorama that came down yesterday was a "canned" routine that didn't depend on knowing the orientation of the rover, which is why Mt. Sharp was cut off at the top).

      Basically they've done the initial tasks, and now it's time to clean out the unnecessary fluff and go to a different mode. They have fallbacks if it doesn't work properly (such as two duplicate computers).

    19. Re:And NASA has made mistakes with this before... by Sean+Hederman · · Score: 4, Interesting

      First off, shielded hardware is NOT a few grams. A second system adds a significant amount of weight. Each gram added to the rover is several hundred kilos more propellant required. In any case, they DID add a second system, which will take over in the event of an emergency. However, even then, an update is quite perilous, because you could theoretically brick the one system, and if something else goes wrong, you now have no backup.

    20. Re:And NASA has made mistakes with this before... by Anonymous Coward · · Score: 0

      There are two computers, so I am pretty sure that there are two BIOS, as it is usually the case in any spacecraft.

    21. Re:And NASA has made mistakes with this before... by wmac1 · · Score: 2

      There are 2 separate computers on the board. Perhaps they upgrade one of them and after it worked correctly they transfer control to it and upgrade the other one?

    22. Re:And NASA has made mistakes with this before... by Psicopatico · · Score: 1

      It sounds complicated but conceptually it's pretty simple: you have a top-layer monitor program that accepts commands to replace lower-level processes.

      Sorry to nitpick, but you got it backwards: "lower level" means more close to the bare-metal and "higher level" means more close to the user. So:

      It sounds complicated but conceptually it's pretty simple: you have a low-layer monitor program that accepts commands to replace higher-level processes.

      FTFY

      Besides this, I believe your post is 100% correct.

      --
      Mastering the English language is fucking easy: all you have to do is to put an f* word in every fucking sentence.
    23. Re:And NASA has made mistakes with this before... by Anonymous Coward · · Score: 0

      But it's patch tuesday.

    24. Re:And NASA has made mistakes with this before... by Confusador · · Score: 5, Interesting

      They do indeed have systems like that, if you're interested it's worth looking into how they dealt with the Sol 18 Anomaly on Spirit. Of particular note is the "Shutdown Dammit" command that they used to override everything else the rover was doing so it would stop wasting battery overnight.

      Seeing as they were able to update the software on a device that wouldn't even finish booting, I imagine the procedures for doing it on a functioning device are pretty robust, even if they're still nailbiting.

    25. Re:And NASA has made mistakes with this before... by coofercat · · Score: 1

      I imagine it's a lot easier to change the software than it is to change the hardware. I have no idea what kit the rover has in it, but since my phone camera used to take bad pictures until a software update came along, I should think NASA probably want to upgrade the software in their cameras in preference to biking a new camera out to Mars. I seem to remember that the drill may contaminate samples with teflon or something - that being the case, I'm sure they've got a fancy filter that can remove most of the "noise" from the contamination in the traces they get from the samples the drill gets. I'll bet they'd rather do that on the rover itself than try to do it all back home after some of the data has been lost.

      I also doubt they have one software system to rule them all. You can bet "we're doing a software update" is actually to update subsystems, which could be revived if the update failed for some reason (albeit with a tedious/expensive hand-crafted additional update procedure). I'll bet the core system used to manage all this stuff gets updates very rarely, but the peripherals could conceivably get them quite regularly.

    26. Re:And NASA has made mistakes with this before... by necro81 · · Score: 2, Informative

      In some cases, the software loaded on the device is not suited to the task the engineers want it to do. TFA mentions that the software on the device now is geared towards interplanetary cruise, EDL, and some very basic on-the-surface tasks. If they actually want the rover to do what they've sent it there to do, they need to perform the upgrade. Why not have the entire suite of mission software on the rover when it launches? Perhaps they hadn't gotten around to coding/testing the on-the-surface software yet. Probably the limiting factor is the program storage space on the rover. According to this JPL website:

      The computer contains special memory to tolerate the extreme radiation environment from space and to safeguard against power-off cycles so the programs and data will remain and will not accidentally erase when the rover shuts down at night. On-board memory includes 256MB of DRAM and 2 GB of Flash Memory both with error detection and correction and 256kB of EEPROM

      Think you'd be able to code everything the rover is ever meant to do, in a single unchanging program image, into just a few hundred kB?

      In other cases, upgraded software provides new capabilities that weren't envisioned during the original design. Spirit and Opportunity, for instance, were given lots of new capabilities over their mission life: like the ability to autonomously navigate based on Simultaneous Localization and Mapping (SLAM) using the various cameras. These are capabilities that were just in development in academia when the rovers were originally programmed, but became proven during the MER mission. As a result of having that autonomous navigation capability, Spirit and Opportunity were able to travel much greater distances than they would have if every single wheel revolution needed to be commanded from Earth.

    27. Re:And NASA has made mistakes with this before... by Anonymous Coward · · Score: 0

      Each gram added to the rover is several hundred kilos more propellant required.

      I know what you're getting at, but I think your math might be a bit off there. Curiosity weighs approximately 900 kilos. If there's a linear relationship between weight and propellant required and you need 300 kilos of propellant per gram of weight you'd need 270,000,000 kilos of propellant to get it there in the first place.

    28. Re:And NASA has made mistakes with this before... by Anonymous Coward · · Score: 2, Funny

      pff worst case scenario : they send him over to mars to jtag the rover by hand...

    29. Re:And NASA has made mistakes with this before... by Anonymous Coward · · Score: 0

      Provided an update has no impact on the critical Earth command reception link, I would use something like a built-in watchdog timer, started automatically when initiating a software update, and deactivated only by detecting an "Everything went OK" message from Earth within a predefined period (taking into account the current Earth-Mars distance for round-trip delay). When it expires, it automatically reverts to a known-good software revision.
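
      To put a number on "taking into account the current Earth-Mars distance", here is a rough sketch of how such a timeout could be sized. The function names and the one-hour ground margin are assumptions of mine; the 157.5 million mile distance is the figure quoted elsewhere in this discussion.

```python
MILES_TO_METERS = 1609.344       # international mile, exact
SPEED_OF_LIGHT = 299_792_458.0   # metres per second, exact


def one_way_light_time(distance_miles):
    """Seconds a radio signal needs to cover the given distance."""
    return distance_miles * MILES_TO_METERS / SPEED_OF_LIGHT


def watchdog_timeout(distance_miles, ground_margin_s):
    """Round-trip signal delay plus a (hypothetical) allowance for the
    ground team to check telemetry and send the confirmation."""
    return 2 * one_way_light_time(distance_miles) + ground_margin_s


distance = 157.5e6  # miles
print(one_way_light_time(distance) / 60)                       # one-way delay, minutes
print(watchdog_timeout(distance, ground_margin_s=3600) / 60)   # timeout, minutes
```

      At that distance the one-way delay comes out at roughly 14 minutes, so the round trip alone eats close to half an hour before any human reaction time is added.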

    30. Re:And NASA has made mistakes with this before... by mcgrew · · Score: 1

      They've upgraded software on the other two rovers, as well as probes even farther away. I doubt there's any reasonable chance they'll brick it.

    31. Re:And NASA has made mistakes with this before... by datapharmer · · Score: 1

      as long as it isn't HP writing the installer I'm ok with it... Installing printer... error... rolling back... installing.. error... ad infinitum

      --
      Get a web developer
    32. Re:And NASA has made mistakes with this before... by fisted · · Score: 4, Informative

      It's not a linear relationship since you need additional propellant to move the additional propellant you needed for the extra payload

    33. Re:And NASA has made mistakes with this before... by fisted · · Score: 1

      > > It sounds complicated but conceptually it's pretty simple: you have a top-layer monitor program that accepts commands to replace lower-level processes.
      > Sorry to nitpick, but you got it backwards: "lower level" means more close to the bare-metal and "higher level" means more close to the user. So:
      Yeah. When talking about programming languages. His choice of words is perfectly fine.
      Next time, when nitpicking, at least do it right. You'll look less like a moron then, too

    34. Re:And NASA has made mistakes with this before... by Frans+Faase · · Score: 3, Informative

      If you would inform yourself, you would know that we are not talking about a general PC with 4 GB of memory here, but about a much smaller (but reliable and radiation-hardened) PowerPC-compatible system with limited RAM. The reason that they planned this update is that they want to remove the flight software for the trip to Mars and replace it with the software needed to drive and control the rover. It is true that they spent the time that the spacecraft was flying to Mars improving the software. That would be more than logical to do. Please note that the software for the Spirit and Opportunity rovers has also been updated several times. It would not surprise me if, once they know the Curiosity rover better, they perform another software update.

    35. Re:And NASA has made mistakes with this before... by supertall · · Score: 1

      Launch windows to Mars only occur every two years or so. Talk about a deadline. Sometimes the software isn't quite done when launch time comes. (Worked flight software for the doomed Mars '98 and much more successful Stardust).

    36. Re:And NASA has made mistakes with this before... by Bigby · · Score: 4, Insightful

      I think it is safe to assume that they purposely bricked the rover (or test rover) before the mission. And made sure it played out as the GP stated. And that they did this many different ways.

    37. Re:And NASA has made mistakes with this before... by Frans+Faase · · Score: 2

      It is not such a big risk and it has been done many times before with all kinds of spacecraft. And you should also realize that many safety precautions have been built into the system. It is definitely not like doing an OS update on a PC. I presume that in case something goes wrong, the rover will get into some kind of safe mode sooner or later, allowing communication to be established again. Safe mode communication is at a very slow speed and it could take some time to establish contact again, but in many cases it has been possible to revive spacecraft from safe mode. Please note that the Spirit and Opportunity rovers have had several software updates and have also gone into safe mode multiple times because of software and hardware errors.

    38. Re:And NASA has made mistakes with this before... by pingbak · · Score: 1

      No, the likelihood of getting bricked is really small, although the likelihood of misaligned or damaged equipment failure is much greater.

      "Bricking" is really small because there is always a known, good image that preceded the update. In the case of a failure, these spacecraft go into a "safe hold" mode (there are actually several different safe hold levels). The lowest safe hold level ensures that the operator always has access to a low-level monitor. This monitor allows the operator to select which image is booted, so there's always a way to get back to a known, good state.

      The operator can really brick the vehicle if instruments and antennae get misaligned, but that's a cascade failure (multiple things have to go wrong) that could be fixed by a higher safe hold level.

      There's a lot of redundancy on these vehicles.

    39. Re:And NASA has made mistakes with this before... by dtml-try+MyNick · · Score: 1

      That, and I also imagine there are separate systems for the rovers main controls and for the "work"-tasks it has to do over there.
      Since they issue about 1000 commands *each day!* it seems to me that those commands go to a sort of sandboxed environment on the rover to ensure that a relatively "simple" command like "focus camera C on that rock to the right" can never cause major malfunctions to the main system on the rover itself.

      --
      Life starts at the end of your comfort zone.
    40. Re:And NASA has made mistakes with this before... by Anonymous Coward · · Score: 0

      That is why I do not understand why the NASA engineers want to take such a risk

      Because there is only so much RAM and flash on the rover, and they probably don't have enough space to run code that is optimized for the flight phase and the ground phase at the same time. I'm betting they don't have any of the driving routines loaded currently.

      We're not talking about a regular PC or smartphone/tablet here. First off, it was designed in 2004, so try to remember back to what the typical machine specs were back then. Next, you can't use COTS components: you have to use hardened equipment to deal with the radiation and some extreme temperatures; COTS only generally runs from 0-40C or so, and space is -270C. You also have to design in redundancy, so while the RAM or flash may have X million transistors, and on Earth you can use all of them, I would say at least half on the rover are used for redundancy, because Kingston can't send a tech to Mars to replace a DIMM when the radiation fries it.

      TL;DR: there's only so much you can run on one of these rovers, and they're swapping out routines that they don't need any more for ones that will be used going forward.

    41. Re:And NASA has made mistakes with this before... by Mr.CRC · · Score: 1

      But isn't the low gain antenna omnidirectional, so that there will always be a link available to the MRO or Odyssey satellites?

    42. Re:And NASA has made mistakes with this before... by Ruie · · Score: 2

      I think it is safe to assume that they purposely bricked the rover (or test rover) before the mission. And made sure it played out as the GP stated. And that they did this many different ways.

      Ideally - yes. In practice, they have limited funds and lots of deadlines.

      If they had lots of time to debug it, there would be no need to upload new software.

    43. Re:And NASA has made mistakes with this before... by DrXym · · Score: 4, Funny

      Similar to some devices here on Earth, the rover should have an automatic revert solution.

      It does. Scientists put a small switch in at the back which you hold down while powering it up and it will reset itself.

    44. Re:And NASA has made mistakes with this before... by Jane+Q.+Public · · Score: 1

      "Yeah. When talking about programming languages. His choice of words is perfectly fine."

      It depends on your perspective I guess. Conceptually, a "higher level" or "meta" process is considered to be "above" those that it supervises. On a hardware level, though, I think they do it the other way around ("Ring 0" on Intel, for example.) But regardless, from an abstract point of view, the process is still "above" the others.

    45. Re:And NASA has made mistakes with this before... by houghi · · Score: 1

      That is why I do not understand why the NASA engineers want to take such a risk

      Access to Facebook?

      --
      Don't fight for your country, if your country does not fight for you.
    46. Re:And NASA has made mistakes with this before... by jpmorgan · · Score: 3, Informative

      No, it follows from the Tsiolkovsky rocket equation, and it is linear. The amount of fuel required is exponential in the delta-V required, but linear in the payload mass. m_1 = m_0 e^{- \Delta v / v_e}
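
      Rearranging that equation, the propellant burned is m_0 - m_1 = m_1 (e^{\Delta v / v_e} - 1), which is a fixed multiple of the final mass. A quick numeric check, as a sketch only: the delta-v and exhaust velocity below are round numbers I picked for illustration, not mission figures, and a single burn with no staging or tank mass is assumed. The 900 kg is the rover mass quoted upthread.

```python
import math


def propellant_mass(final_mass_kg, delta_v, exhaust_velocity):
    """Propellant for one burn, from m_1 = m_0 * exp(-delta_v / v_e)."""
    return final_mass_kg * (math.exp(delta_v / exhaust_velocity) - 1.0)


DELTA_V = 6000.0  # m/s, assumed for illustration
V_E = 4400.0      # m/s, assumed for illustration

base = propellant_mass(900.0, DELTA_V, V_E)
one_gram_more = propellant_mass(900.001, DELTA_V, V_E)

print(one_gram_more - base)                           # extra propellant (kg) for one extra gram
print(propellant_mass(1800.0, DELTA_V, V_E) / base)   # ratio when the final mass doubles
```

      Under these assumptions an extra gram costs a few grams of propellant, not hundreds of kilos, and doubling the final mass doubles the propellant. The "propellant to move the propellant" effect is real, but it is already folded into the exponential factor.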

    47. Re:And NASA has made mistakes with this before... by Anonymous Coward · · Score: 1

      Updating the software can have great advantages. Only a slightly more reliable connection would allow vast amounts of more science to be done. Adapting the algorithms for autonomous functions such as simple navigation or sample processing also makes a great difference when your lag time for a single command is measured in terms of minutes and you don't even have that level of "real-time" access most of the time.

      Yup, that happened with the Galileo probe. The high gain antenna couldn't be deployed because its lubricants were dry, so they had to rewrite the software to make the probe compress the images in JPEG before sending them with the low gain antenna (a lot less bandwidth). Some information was lost, but it worked.

    48. Re:And NASA has made mistakes with this before... by KamuZ · · Score: 1

      They explained yesterday in the NASA News Update, the department chief in charge for the upgrades explained it as follows

      * Boot the rover with the Primary Computer to test the upgrade (no flashing)
      * If all works, flash the software in the Primary Computer and test
      * Boot the rover with the Backup Computer and test the upgrade (no flashing)
      * If all works, flash the software in the Backup Computer and test

      This is going to take about 4 days.
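
      The sequence above can be sketched roughly as follows. This is only my reading of the four steps; the function name, the two callables and the image name are hypothetical stand-ins, not anything NASA described.

```python
def staged_upgrade(computers, new_image, trial_boot, flash_and_test):
    """Walk the computers in order and stop at the first failed step.

    trial_boot(computer, image) -> bool      runs the image without flashing
    flash_and_test(computer, image) -> bool  writes the image, then re-tests
    Both callables are hypothetical placeholders for the real procedures.
    """
    log = []
    for computer in computers:
        if not trial_boot(computer, new_image):
            log.append((computer, "trial boot failed, nothing flashed"))
            return False, log
        log.append((computer, "trial boot ok"))
        if not flash_and_test(computer, new_image):
            log.append((computer, "flash test failed"))
            return False, log
        log.append((computer, "flashed and tested"))
    return True, log


# Simulated run in which the backup computer fails its trial boot.
ok, log = staged_upgrade(
    ["primary", "backup"],
    "surface-image",
    trial_boot=lambda computer, image: computer == "primary",
    flash_and_test=lambda computer, image: True,
)
print(ok)
for entry in log:
    print(entry)
```

      Because each computer is tried before it is flashed, and the second is not touched until the first has passed, a bad image should never end up on both at once.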

    49. Re:And NASA has made mistakes with this before... by uninformedLuddite · · Score: 2

      Why do I always get an erection when I read these sort of comments?

      --
      The new right fascists are bilingual. They speak English and Bullshit.
    50. Re:And NASA has made mistakes with this before... by RockDoctor · · Score: 2

      99% of brickings are the result of people doing stuff that the manufacturer did not intend for you to do

      In that case, that should happen with deep space probes quite a lot.

      ... or it would do, if the manufacturers and the users weren't the same group. Or, for the likes of NASA, the manufacturers of the flight hardware computers and the manufacturers of the flight software weren't two groups of the same organisation, both of whom would take equal accountability for a failure like this. (And probably work in the same building complex, if not the same office block.)

      Which is the norm for deep space devices, wherever they come from.

      Before someone points it out, they do buy in the radiation-hardened processors from an outside supplier. But that is one of the reasons that they are very conservative about which processors they use, to only use processors whose design they understand in great depth.
      In a related recent thread there was discussion about why the imagers on MSL used a 2MPixel sensor. One of the points that didn't get stressed much in that was that all of the imagers on MSL use the same sensor, whether it be a science-imaging sensor, a hazard-hunting imager, or a long-range viewing sensor. They all use the same sensor, because it is a sensor that the JPL imaging team understand well.
      As a different example of the same logic, they chose to not build a zoom lens for the long-range sensor (twice ; they started and stopped the zoom lens programme twice) because they couldn't find a way to build a zoom lens without using "wet" lubrication, and they don't have confidence in the behaviour of wet lubricants under Mars surface conditions, and couldn't afford the weight and power to heat the lens to non-Mars surface conditions. (Which kind-of begs the question of couldn't they put multiple fixed-focal length lenses onto a turret ; but maybe a turret would also have required "wet" lubrication?)

      --
      Birds are not dinosaur descendants;birds are dinosaurs, for all useful meanings of "birds", "are" and "dinosaurs"
    51. Re:And NASA has made mistakes with this before... by Anonymous Coward · · Score: 0

      i think nope... they only upgrade the flight module into the driving module..

    52. Re:And NASA has made mistakes with this before... by webmistressrachel · · Score: 1

      It's Microsoft you have to thank for that kludge of a package management / installer system, HP's drivers work fine once finally installed.

      The LaserJet hardware is pretty amazing, too. I still have loads of ancient ones slaving away in bondage in clients' offices, still printing crisp, beautiful invoices, ah invoices...

      --
      This tagline was transcoded to result in at least one smirk. If you experience failure to smirk, please consult your Gen
  2. Actually... only 157 million miles away by ronhip · · Score: 5, Informative

    The spacecraft TRAVELLED 350 million miles to get there, but as of tonight, Mars is only about 157.5 million miles from Earth.

    1. Re:Actually... only 157 million miles away by Anonymous Coward · · Score: 5, Funny

      Forgot something and noticed halfway? Happens to me all the time...

    2. Re:Actually... only 157 million miles away by qbitslayer · · Score: 0

      Give or take a million.

    3. Re:Actually... only 157 million miles away by TubeSteak · · Score: 4, Funny

      Good news everyone!
      NASA will only have to wait half as long to find out if their software upgrade worked!

      --
      [Fuck Beta]
      o0t!
    4. Re:Actually... only 157 million miles away by toygeek · · Score: 1

      Oh is that all?

    5. Re:Actually... only 157 million miles away by K.+S.+Kyosuke · · Score: 1

      The space highways are curved a lot.

      --
      Ezekiel 23:20
    6. Re:Actually... only 157 million miles away by TenDollarMan · · Score: 3, Funny

      Curiosity made the Mars run in 1.82543347 × 10^-5 Parsecs

    7. Re:Actually... only 157 million miles away by Dave+Whiteside · · Score: 1

      1.684 AU or
      156,537,715 miles
      see
      http://www.fourmilab.ch/cgi-bin/uncgi/Solar/

      --
      who where what when now?
    8. Re:Actually... only 157 million miles away by RaceProUK · · Score: 2

      Good news everyone! NASA will only have to wait half as long to find out if their software upgrade worked!

      Now read that in Farnsworth-voice...

      --
      No colour or religion ever stopped the bullet from a gun
    9. Re:Actually... only 157 million miles away by MachineShedFred · · Score: 1

      If they do it right, it shouldn't matter if it's 157 feet, or 157 million miles.

      Are we supposed to believe that they haven't tested and hardened this before lighting the fuse?

      --
      Slashdot still doesn't support Unicode after it was added to the HTML standard in 1997.
    10. Re:Actually... only 157 million miles away by Bardez · · Score: 1

      Now let's see it do the Kessel Run

      --
      Perception is the thin dividing line between reality and fiction.
    11. Re:Actually... only 157 million miles away by pingbak · · Score: 1

      Gravity slingshots and curvature. It's very inefficient to travel in a straight line from Earth to Mars.

    12. Re:Actually... only 157 million miles away by tehcyder · · Score: 1

      The spacecraft TRAVELLED 350 million miles to get there, but as of tonight, Mars is only about 157.5 million miles from Earth.

      That's what happens when you rely on a cheap TomTom sat nav.

      --
      To have a right to do a thing is not at all the same as to be right in doing it
    13. Re:Actually... only 157 million miles away by ajlitt · · Score: 1

      Slashdot: home of the armchair engineer.

    14. Re:Actually... only 157 million miles away by gstrickler · · Score: 1

      More to the point, Mars is never more than 2AU + ~38M mi =~225M mi from earth, and that only when it's directly on the opposite side of the sun.

      --
      make imaginary.friends COUNT=100 VISIBLE=false
    15. Re:Actually... only 157 million miles away by stepho-wrs · · Score: 1

      In that worst case you are assuming that the craft takes a direct path. But there's a big yellow object in the way...

      Wouldn't it have taken a nice curved path from Earth's orbit to Mars's orbit? Worst case would be if Earth and Mars were at the opposite ends of a huge ellipse and the craft had to ride along the ellipse instead of cutting through the middle. Assuming this ellipse has a 2 AU + 38M mi diameter, multiply by 3 to get a (very) rough estimate of the worst case travel distance.

    16. Re:Actually... only 157 million miles away by gstrickler · · Score: 1

      It's not about the path the craft travels, it's the current distance. Radio waves travel in an essentially straight line, so current distance is what affects the transmission propagation time.

      The exception would be if we can't transmit through the sun, in which case, when Mars is opposite the sun, the transmission would either have to be delayed until we can transmit, or relayed off a satellite such as STEREO.
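
      As a rough illustration of that point, the propagation delay follows directly from the straight-line distance. This is a back-of-envelope sketch: the two distances are simply the figures quoted in this thread (157.5 million miles now, about 225 million miles at most), and the function name is made up for the example.

```python
# Back-of-envelope signal delay; distances are the figures quoted in this thread.
SPEED_OF_LIGHT_KM_S = 299_792.458   # exact, by definition of the metre
KM_PER_MILE = 1.609344              # exact, international mile


def one_way_delay_minutes(distance_miles: float) -> float:
    """Light-travel time in minutes over a straight-line distance in miles."""
    distance_km = distance_miles * KM_PER_MILE
    return distance_km / SPEED_OF_LIGHT_KM_S / 60.0


current = one_way_delay_minutes(157.5e6)   # distance quoted upthread
worst = one_way_delay_minutes(225e6)       # rough maximum quoted above

print(f"current: {current:.1f} min one way, {2 * current:.1f} min round trip")
print(f"maximum: {worst:.1f} min one way, {2 * worst:.1f} min round trip")
```

      With those inputs the one-way delay comes out at roughly 14 minutes today and roughly 20 minutes at the quoted maximum, before any relay or scheduling overhead.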

      --
      make imaginary.friends COUNT=100 VISIBLE=false
    17. Re:Actually... only 157 million miles away by rjr162 · · Score: 1

      Good news everyone!
      NASA will only have to wait half as long to find out if their software upgrade worked!

      Now read that in Farnsworth-voice...

      lol that's exactly how I automatically read it the first time!

    18. Re:Actually... only 157 million miles away by stepho-wrs · · Score: 1

      Point taken. I'd mixed up travel distance of the craft vs travel distance of the signal.

  3. Not the same cost to get wrong, but by Anonymous Coward · · Score: 2, Interesting

    Working in remote smart metering we have a similar problem, where you can brick meters if the signal drops at the wrong place, or firmware doesn't fit the hardware right.

  4. Wow by undulato · · Score: 5, Insightful

    NASA doing a software upgrade is not big news. This is going to be phenomenally safe. Much scarier doing software upgrades on millions of unknown hardware configurations globally than on one totally locked down platform no matter what distance or cost is involved.

    1. Re:Wow by AchilleTalon · · Score: 1

      Agreed, NASA has done a complete upgrade on the previous rover. This isn't new stuff; it has been tested and the procedure is well known. Well, yes, someone may do a stupid thing at the wrong time; however, the main difference is the speed of transfer and the delay between transmission and confirmation that everything went fine. The environment is well controlled and I do not doubt there are fallback mechanisms in place. So, I'm sorry, on this one I am not really impressed by the NASA team.

      --
      Achille Talon
      Hop!
    2. Re:Wow by Anonymous Coward · · Score: 1

      I won't deny the difficulty of the unknown hardware configuration, but the challenges are definitely still there, communication reliability for example. Also, you can always fix a failed upgrade. NASA doesn't get that luxury.

    3. Re:Wow by Anonymous Coward · · Score: 0

      Why do you assume that if there is a communications problem, the update will fail and it can't be fixed?
      I would find it rather surprising if the update protocol was designed like that!

    4. Re:Wow by darkfeline · · Score: 1

      Not if you're remotely managing a server via SSH whose physical machine is located at some godforsaken place far away. Sure, if all hell breaks loose then someone can go fix it, but it sure as hell ain't gonna be me. If sysadmins can set up failsafes to keep updates from going wrong just to avoid trekking down the street, I'm sure NASA can set up failsafes to avoid permanently losing an extremely expensive piece of hardware millions of miles away.

    5. Re:Wow by Stuarticus · · Score: 1

      Apple Fanboi in the house! (sorry)

      --
      If you think someone isn't free to have a different definition of "freedom" you may be a tyrant.
    6. Re:Wow by arth1 · · Score: 4, Interesting

      That reminds me... I have sometimes wondered what security protocols NASA (and their Russian counterparts) have in place for their probes, from today's back to those of the 1970s, when security wasn't nearly as advanced as it is today.

      Is it possible that someone with a large directional backyard antenna can hack some of the probes? To be remembered as the man who killed Voyager 2 might be attractive for some people.
      And who's to say that this hasn't already happened? There are non-responding probes out there, with no evidence for why they failed.

    7. Re:Wow by Bigby · · Score: 1

      I'm impressed that they found budget dollars for proper testing. Maybe they estimated the cost of failure to be $2.5b. I wish I could do that at work.

    8. Re:Wow by bitingduck · · Score: 1

      Most of the budget for big missions like this (and even small missions, really) goes into testing, verification, and documentation. The cost of the stuff often seems incidental compared to the cost of all the testing to make sure it's going to work and documenting it.

  5. Failsafe by Wowsers · · Score: 1, Insightful

    For such expensive projects, would it not make sense to have two EPROMs, one containing the original known working system, and one for the new one? If the new version fails, the machine can fall back to the older version, and switch between the two if more OS upgrades are planned. If they have watchdog timers on board to keep the rover going, surely they could use a similar setup for the OS?

    --
    Take Nobody's Word For It.
    1. Re:Failsafe by Anonymous Coward · · Score: 3, Funny

      Thank you so much Mr. Wowsers for giving NASA this great idea. I suspect, given the genius of the thought, you will be contacted for employment shortly.

    2. Re:Failsafe by zachie · · Score: 2

      This, and also having a full replica of the whole rover on Earth to double check that any software updates won't screw the whole operation. But I can't imagine they are not doing these already :?

    3. Re:Failsafe by fatphil · · Score: 5, Informative

      Exactly. That's how it's done in the telecoms world (infrastructure, not terminals). Typically the new software is given three attempts to boot, and if it doesn't acknowledge that it's fully booted after three attempts, the bootloader falls back to the previous version of the software. Of course, things get trickier if you need to update the bootloader, but those should be very rare situations. However, they in turn can be handled a similar way (typically there's a 3-stage boot, the initial being a ROM bootstrap, then your bootloader, then the OS which you'll want to change).
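
      The three-attempts-then-fall-back scheme described above can be sketched in a few lines. This is only an illustration of the idea, not how any particular bootloader (or the rover) does it: the state dictionary, the function names and the limit of 3 are all assumptions, and a real bootloader would keep the counter in non-volatile storage.

```python
# Hypothetical A/B boot selection; names and the limit of 3 are illustrative.
MAX_ATTEMPTS = 3


def select_image(state: dict) -> str:
    """Pick the image to boot and update the attempt counter.

    state keys (assumed for this sketch):
      'new_confirmed' - True once the new software has reported a full boot
      'attempts'      - boots of the new image tried without confirmation
    """
    if state["new_confirmed"]:
        return "new"
    if state["attempts"] >= MAX_ATTEMPTS:
        return "previous"          # give up on the unconfirmed new image
    state["attempts"] += 1         # would be persisted before jumping to the image
    return "new"


def confirm_boot(state: dict) -> None:
    """Called by the new software once it considers itself fully booted."""
    state["new_confirmed"] = True
    state["attempts"] = 0


# Simulate a new image that never manages to confirm its boot.
state = {"new_confirmed": False, "attempts": 0}
history = [select_image(state) for _ in range(5)]
print(history)
```

      In the simulated run the new image is tried three times and every later boot selects the previous image; calling confirm_boot() at any point before that would pin the new image instead.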

      --
      Also FatPhil on SoylentNews, id 863
    4. Re:Failsafe by Anonymous Coward · · Score: 5, Informative

      Computers: The two identical on-board rover computers, called "Rover Compute Element" (RCE), contain radiation hardened memory to tolerate the extreme radiation from space and to safeguard against power-off cycles. Each computer's memory includes 256 kB of EEPROM, 256 MB of DRAM, and 2 GB of flash memory.[22] This compares to 3 MB of EEPROM, 128 MB of DRAM, and 256 MB of flash memory used in the Mars Exploration Rovers.[23]
      The RCE computers use the RAD750 CPU, which is a successor to the RAD6000 CPU used in the Mars Exploration Rovers.[24][25] The RAD750 CPU is capable of up to 400 MIPS, while the RAD6000 CPU is capable of up to 35 MIPS.[26][27] Of the two on-board computers, one is configured as backup, and will take over in the event of problems with the main computer.[22]

      http://en.wikipedia.org/wiki/Curiosity_rover#Specifications

      Data transfer speeds between Curiosity and each orbiter may reach 2 Mbit/s and 256 kbit/s, respectively, but each orbiter is only able to communicate with Curiosity for about eight minutes per day

      When you have little bandwidth, better get it right the first time.
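
      To put a number on "little bandwidth", here is a quick upper bound on how much data fits in one pass, using only the rates and the eight-minute window from the quote above. It assumes the peak rate is sustained for the whole pass and ignores protocol overhead, so real figures would be lower; the function name is invented for the example.

```python
PASS_SECONDS = 8 * 60  # "about eight minutes per day" per orbiter


def megabytes_per_pass(rate_bits_per_s: float, seconds: float = PASS_SECONDS) -> float:
    """Upper bound on data moved in one pass, in megabytes (10**6 bytes)."""
    return rate_bits_per_s * seconds / 8 / 1e6


fast = megabytes_per_pass(2_000_000)   # 2 Mbit/s link
slow = megabytes_per_pass(256_000)     # 256 kbit/s link
print(f"{fast:.1f} MB and {slow:.2f} MB per pass at peak rate")
```

      That works out to at most 120 MB per pass on the faster link and about 15 MB on the slower one.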

    5. Re:Failsafe by Sollord · · Score: 1

      The rover has two computers, one a fully redundant backup, and I'd hope they didn't build a system that requires both systems to be upgraded at the same time...

    6. Re:Failsafe by PhunkySchtuff · · Score: 2

      Not only am I absolutely sure they've got more than one copy of critical data in flash, but they have two identical and redundant computers on board
      http://en.wikipedia.org/wiki/Curiosity_rover#Specifications

      From http://marsprogram.jpl.nasa.gov/msl/mission/rover/brains/

      The rover has two "computer brains", one of which is normally asleep. In case of problems the other computer brain can be awakened to take over control and continue the mission.

    7. Re:Failsafe by wvmarle · · Score: 1

      According to the linked article, they have two computers on board.

      They're currently testing the computers to see everything works as intended, then upgrade the main computer, and if that goes fine upgrade the backup computer. Also the new software has been uploaded in transit, so at the moment they have both software systems (the landing system and the surface work system) on their craft.

      What is not clear from the article, is how independent these computers are. E.g. what would happen if the upgrade fails partially, with the main computer trying to take over the craft, while the backup computer is still on the original program.

    8. Re:Failsafe by Jane+Q.+Public · · Score: 1

      It's extremely unlikely they will do anything even remotely resembling a "reboot". Instead they will carefully replace one process at a time, with no restarting.

    9. Re:Failsafe by Jane+Q.+Public · · Score: 1

      In all honesty, except for the MIPS figure, that seems like pretty lame hardware for something of this importance.

      But I'll bet that it's misleading: the majority of the functions probably aren't performed directly in the CPU and main memory, but by sub-modules running off of PLAs.

    10. Re:Failsafe by kasperd · · Score: 2

      What is not clear from the article, is how independent these computers are. E.g. what would happen if the upgrade fails partially, with the main computer trying to take over the craft, while the backup computer is still on the original program.

      That's always a risk if you have two computers for redundancy. To completely solve that problem, you need four computers. But the algorithms for coordinating in such a scenario are complicated. So it might be safer to rely on systems being able to use the proper computer, with just two present. If you had a 3 out of 4 setup with the four computers running identical software, it only takes one software bug to bring down the system.

      --

      Do you care about the security of your wireless mouse?
    11. Re:Failsafe by Jane+Q.+Public · · Score: 2

      "If you had a 3 out of 4 setup with the four computers running identical software, it only takes one software bug to bring down the system."

      Not at all. You have a separate "supervisor" board that moderates among the computers. In a case like that, you only need 3 for Damned Good Redundancy, not 4.

      But I expect that NASA has good reason to have faith in the reliability of their dual machine.

    12. Re:Failsafe by gagol · · Score: 5, Insightful

      Given it is radiation hardened specs, those are fabulous! You can't just get your latest Core i7 and expect it to work correctly once it escapes the protection of Earth's magnetosphere. Also, heat dissipation is much trickier when you don't have air to work with (space) or cannot afford to replace air filters for the cooling systems (Mars).

      --
      Tomorrow is another day...
    13. Re:Failsafe by Jane+Q.+Public · · Score: 1

      "Given it is radiation hardened specs, those are fabulous!"

      Not really. That might have been true 10 years ago.

      Hey... they had to radiation-harden this thing against ITSELF.

      All I'm saying is: you can bet the hardware is in a well-shielded heavy metal box, and today all it takes is about 1/4 of a cubic inch to squeeze in another GB of RAM or flash.

      So I suspect that they are using a system that is a bit more "distributed" (conceptually) than your everyday PC.

    14. Re:Failsafe by Anonymous Coward · · Score: 0

      What are you talking about?

      No, it is not a bit more "distributed". The nodes are as specified and nothing more. And there is no point "squeezing" in anything else if it is not required.

      More silicon means more power, more heat, more weight, more volume, and more transistors that can go wrong.

    15. Re:Failsafe by fatphil · · Score: 1

      That depends on what OS they're running. And whether they need to change anything in that OS itself. And whether they think they can trust the current state of the system. If the reason you're patching the software is because there's a bug which means you can't trust the state of the system, such as a scribbler, then the last thing you want to do is to attempt to continue running in that state (even dumping your state for later debugging is dangerous - you can no longer trust the data that is in the flash driver); you must start from scratch, i.e. a reboot.

      --
      Also FatPhil on SoylentNews, id 863
    16. Re:Failsafe by jkflying · · Score: 4, Informative

      The radiation this thing emits is NOTHING compared to the solar and cosmic radiation it would experience both in transit and on Mars. Putting everything in a metal box only helps so much, you still need specifically designed electronics which can handle the odd bit of radiation without dying. Even with a thick metal box you can't run an i7 on Mars, or not for very long at least. Your standard DDR3 isn't going to work either, or your standard EEPROM.

      The other thing to remember is that although this project is extremely important, they're still not going to throw more capabilities in than they need, because that is more that can go wrong. For a remote sensing platform, the amount of EEPROM isn't that important - you just need enough to hold your communication protocols, some basic reaction-to-obstacle algorithms and the motor control code. You aren't going to be pulling massive libraries in. The emphasis is on making it as simple as possible, so that there is less chance for bugs to creep in. Those extra MIPS will come in handy for the navigation and onboard image processing, and the flash for storing interesting info until you can upload, so those are what they have upgraded the most.

      --
      Help I am stuck in a signature factory!
    17. Re:Failsafe by CubeSat+developer · · Score: 1

      I think the problem here is not the risk of bricking the rover. It has two on-board computers in a redundant configuration, so if the update fails, the watchdog is going to switch to the secondary computer. Also, the integrity of the update is most likely protected by a checksum.

      The real risk is that a valid on-board software could send unexpected commands to the other subsystems (that is what happened on MGS and many other failed spacecraft). For example, it could instruct the rover to drive off a cliff ;-)
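
      A minimal sketch of the integrity check mentioned above, assuming a plain CRC-32 (the actual scheme used on the rover is not stated in this thread). The image bytes and the function name are stand-ins. Note that it illustrates the limitation as well: a checksum only shows the bits arrived intact, it says nothing about whether the software behaves correctly.

```python
import zlib


def verify_image(image: bytes, expected_crc: int) -> bool:
    """Return True if the received image matches the CRC-32 sent with it."""
    return zlib.crc32(image) == expected_crc


# Stand-in payload; a real image would be the uploaded flight software.
image = b"surface-ops flight software (stand-in bytes)"
crc = zlib.crc32(image)            # computed on the ground before upload

# Flip a single bit to simulate corruption in transit.
corrupted = bytearray(image)
corrupted[0] ^= 0x01

ok = verify_image(image, crc)
bad = verify_image(bytes(corrupted), crc)
print(ok, bad)
```

      The intact copy passes and the copy with one flipped bit is rejected, but a perfectly transmitted image containing a logic bug would pass just as happily.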

    18. Re:Failsafe by ourlovecanlastforeve · · Score: 2

      They're running VxWorks and they do have a backup computer. First the backup is flashed and verified, then the primary is flashed and verified.

    19. Re:Failsafe by kasperd · · Score: 5, Interesting

      You have a separate "supervisor" board that moderates among the computers.

      And then that board becomes a single point of failure.

      In a case like that, you only need 3 for Damned Good Redundancy

      3 computers and a supervisor? That's already 4 components.

      If you want to handle t arbitrary node failures, then you need at least 3t+1 nodes in total. Whether you call the nodes computers or supervisor boards doesn't change that fact. If you have t failures among 3t or fewer total nodes, then the failures can happen in a way that cause the functional units to receive so inconsistent information, that they are unable to do anything meaningful. It is a case of Byzantine agreement.

      Any system designed to handle failures of one third or more components is making assumptions about how the failed components behave. If the failed components behave differently than the assumption, it takes even fewer failures to break the entire system.
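
      The 3t+1 bound above is easy to tabulate. The two helper names below are made up for the illustration; the arithmetic is just the bound as stated, applied in both directions.

```python
def min_nodes(t: int) -> int:
    """Fewest nodes needed to tolerate t arbitrary (Byzantine) failures."""
    return 3 * t + 1


def max_faults(n: int) -> int:
    """Most arbitrary failures that n nodes can tolerate under the same bound."""
    return (n - 1) // 3 if n > 0 else 0


for n in range(1, 8):
    print(f"{n} nodes tolerate {max_faults(n)} arbitrary failure(s)")
```

      Under this bound two or three nodes tolerate no arbitrary failures at all, four tolerate one, and seven tolerate two, which is why a 2- or 3-computer design has to rely on assumptions about how a failed unit behaves.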

      --

      Do you care about the security of your wireless mouse?
    20. Re:Failsafe by sageres · · Score: 1

      Used to work for Hubble... These guys were running Solaris 10 on their main computer....

    21. Re:Failsafe by fatphil · · Score: 1

      I had presumed it would be VxWorks, as I know they've used it in plenty of previous projects, and it certainly is one of the most capable embedded OSes in existence. By 'verified', do you mean just verifying an HMAC? Why 2 flashes - why not just a bank switch?

      --
      Also FatPhil on SoylentNews, id 863
    22. Re:Failsafe by Brian+Feldman · · Score: 1

      If by computer you mean some kind of ground system....

      --
      Brian Fundakowski Feldman
    23. Re:Failsafe by fuzzyfuzzyfungus · · Score: 2

      The RAD750 is quite limited in power; but has the advantage of being comparatively close to 'just going down to newegg and buying a motherboard' by the standards of projects that go into space and shop at mil/aero contractors... The price is still up in the "If you have to ask, don't ask" range; but doing a very-low-volume DIY would likely be worse still...

    24. Re:Failsafe by serviscope_minor · · Score: 3, Informative

      Not really. That might have been true 10 years ago.

      No.

      All I'm saying is: you can bet the hardware is in a well-shielded heavy metal box, and today all it takes is about 1/4 of a cubic inch to squeeze in another GB of RAM or flash.

      I wonder why they didn't think about that. A nice thick, heavy metal box. Easy! Perhaps you should go and work for NASA?

      Let's ignore the earth's magnetosphere for the moment and make some massive assumptions.

      The pressure on the ground is about 10^5 Pa. That means there's about 10^4 kg of stuff above each square metre to absorb radiation from space. That equates to 10 m of water, 1.25 m of steel or about 90 cm of lead. Quite a lot.

      Mars is about 1.5 AU from the sun, so receives about 0.4 times the radiation.

      The atmosphere is about 600 Pa, by comparison.

      Radiation hardening is a very well established field. Using some degree of shielding is just one of the many techniques in use. On Mars, it is simply not enough on its own.

      It is very, very difficult to make a rad-hard processor, and then very thoroughly test it. You can't just keep shrinking the feature size, because as it goes down, the effect of radiation increases. Not only that but as the amount of crystal per transistor shrinks, the chance of unrecoverable lattice damage increases, due to the lack of redundancy.

      There are faster Rad-hardened DSPs, but those are, well, DSPs and only actually really fast for DSP like tasks.

      There also are almost certainly faster ones available now. But it's been in transit for a year, and they certainly weren't building it with a brand-new untested processor for which they had to write all the software on the way after they launched it.

      So, given the constraints, it's a pretty great CPU to have on board.
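
      The back-of-envelope shielding numbers earlier in this comment can be reproduced as follows. The densities and surface gravities are rounded values assumed for this sketch, and treating mass per unit area as a stand-in for shielding is the same "massive assumption" the comment itself flags.

```python
# Back-of-envelope shielding estimate; densities and gravities are rounded assumptions.
G_EARTH = 9.81   # m/s^2
G_MARS = 3.71    # m/s^2
DENSITY_KG_M3 = {"water": 1000.0, "steel": 8000.0, "lead": 11340.0}


def column_mass(surface_pressure_pa: float, gravity: float) -> float:
    """Mass of atmosphere above one square metre, in kg/m^2 (pressure = weight / area)."""
    return surface_pressure_pa / gravity


def equivalent_thickness_m(mass_per_m2: float) -> dict:
    """Thickness of each material that has the same mass per unit area."""
    return {name: mass_per_m2 / rho for name, rho in DENSITY_KG_M3.items()}


earth = column_mass(1.0e5, G_EARTH)
mars = column_mass(600.0, G_MARS)
sunlight_ratio = 1 / 1.5 ** 2   # inverse-square law at 1.5 AU

print(f"Earth: {earth:.0f} kg/m^2 ->",
      {k: round(v, 2) for k, v in equivalent_thickness_m(earth).items()})
print(f"Mars:  {mars:.0f} kg/m^2 ->",
      {k: round(v, 3) for k, v in equivalent_thickness_m(mars).items()})
print(f"Solar flux at Mars relative to Earth: {sunlight_ratio:.2f}")
```

      On these assumptions the Martian atmosphere provides only a few hundredths of Earth's column mass, roughly the equivalent of 16 cm of water instead of about 10 m.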

      --
      SJW n. One who posts facts.
    25. Re:Failsafe by Anonymous Coward · · Score: 0

      It was true ten years ago, which is coincidentally when the rover was originally designed. For things like this, they are hesitant to change technology half way through the process, even if that process takes a decade.

    26. Re:Failsafe by tlhIngan · · Score: 1

      They're running vxworks and they do have a backup computer. First the backup is flashed and verified, then the primary is flashed and verified.

      Don't forget they also have a backup system right there - NASA actually built TWO Curiosity rovers - only one is over there, but there's another one right here on the ground. (Likewise, they had three last time - Spirit and Opportunity went to Mars while one stayed behind).

      It's used for testing, naturally (especially when the rover gets stuck so they can test various fixes without incurring the various latencies in sending back a command and waiting for the results), so they would've tested the entire update procedure on the ground first (several times) to ensure they don't have any issues.

    27. Re:Failsafe by Anonymous Coward · · Score: 0

      When you have little bandwidth, better get it right the first time.

      It's not even the bandwidth. They're currently getting over 9600 bps, which I've used over serial console on Suns over modem links.

      The big thing is latency: at the roughly 157 million miles quoted elsewhere in this thread, a one-way trip for radio / light waves is about 14 minutes, so ping times are about 28 minutes.

    28. Re:Failsafe by Anonymous Coward · · Score: 0

      Thank you so much Mr. Wowsers for giving NASA this great idea. I suspect, given the genius of the thought, you will be contacted for employment shortly.

      I wonder if it would be some sort of patent infringement if I took that idea and suggested it to Cisco and HP as a method for safely upgrading their routers and switches. Two slots for firmware would be awesome....hell--they could even have a few different versions of the boot configuration sitting around...

    29. Re:Failsafe by Jane+Q.+Public · · Score: 1

      "Putting everything in a metal box only helps so much, you still need specifically designed electronics which can handle the odd bit of radiation without dying..."

      I'm aware of that. My point was that even radiation-hardened, a little more RAM and a little more flash wouldn't take up very much room inside the box or weigh hardly anything. Therefore: that probably wasn't the reason they didn't include it. They felt they didn't need it.

    30. Re:Failsafe by Jane+Q.+Public · · Score: 1

      "I wonder why they didn't think about that. A nice thick, heavy metal box. Easy! Perhaps you should go and work for NASA?"

      I'm not stupid. Maybe I could have worded it better, but you missed my point. I wasn't talking about a box primarily for radiation shielding. My point was that a little more RAM or a little more flash -- even radiation-hardened -- would take only a very tiny bit of additional room inside the box, and weigh hardly anything.

      "So, given the constraints, it's a pretty great CPU to have on board."

      And I didn't say anything at all about the CPU. If anything, I gave it a sort of left-handed compliment about the MIPS rating.

    31. Re:Failsafe by Jane+Q.+Public · · Score: 1

      "3 computers and a supervisor? That's already 4 components."

      I meant it in an abstract sense, not physically. If you actually built a "supervisor board" it would indeed be a single point of failure. But you can't get around that entirely, unless you have all 4 machines monitoring each other, and "voting" on the results, which would cripple your computing power.

      Unless you built special, additional circuitry into each one to do that, of course, in which case you have 4 different "supervisor boards". But you still have them. So the basic concept is no different. But you have just added greatly to your cost, and I assert that you would not have enough additional security to warrant 4 machines rather than just 3.

      "If you have t failures among 3t or fewer total nodes, then the failures can happen in a way that cause the functional units to receive so inconsistent information, that they are unable to do anything meaningful."

      I think that's a rather large assumption. What do you mean by "information" and "failure"?

      It depends on the design. What you do in a case like THIS is simply compare the outputs (relatively simple circuitry), and if one disagrees with the others it is simply shut down until the reason can be determined. It's a replaceable redundant system, not a system in which "failed" nodes continue to communicate with one another. And again: ALL you need to do in this situation is compare the outputs. Nothing else matters.
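
      The compare-the-outputs scheme described above might look like this in miniature. It is a sketch under the comment's own assumptions (a trusted comparator, and failed units that simply get shut down rather than misbehaving in arbitrary ways); the unit names and the vote() function are hypothetical.

```python
from collections import Counter


def vote(outputs: dict) -> tuple:
    """Compare outputs from redundant units.

    Returns (agreed_value, units_to_shut_down). agreed_value is None when
    no value is reported by a strict majority of the units.
    """
    counts = Counter(outputs.values())
    value, n = counts.most_common(1)[0]
    if n * 2 <= len(outputs):
        return None, []            # no strict majority: cannot tell who is wrong
    dissenters = sorted(name for name, out in outputs.items() if out != value)
    return value, dissenters


print(vote({"A": 42, "B": 42, "C": 41}))   # C disagrees with the majority
print(vote({"A": 1, "B": 2, "C": 3}))      # no agreement at all
```

      With three units, one dissenter is outvoted and flagged for shutdown; when all three disagree the comparator has no basis for picking a winner, which is the situation the 3t+1 argument elsewhere in this thread is about.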

    32. Re:Failsafe by bitingduck · · Score: 1

      Watching the press conference this morning, they do the upgrade pretty much the same way I do OS upgrades at home, but it takes about 4 days. They have two computers (I use two drives, but I can go to the store to get replacement bits) and they do the upgrade stepwise, one computer at a time, and test the software on one computer (without letting it take over in a full boot) before they then boot into it and do a checkout then install it on the other. They did an upgrade a couple months ago during cruise to update the software for EDL, too.

    33. Re:Failsafe by Miamicanes · · Score: 1

      I'm kind of surprised that they don't go a step further, sandwich a pair of microSD card guts together inside a small box of lead (not much, maybe a gram's worth), and hang them from a secondary SPI interface & use them as a last-ditch garbage dump for all the stuff they've captured, but don't have the bandwidth to relay back to earth & don't have enough hardened flash to store indefinitely. That way, if the rover got itself into a position someday where it permanently lost its mobility, but could still broadcast to the satellite orbiting Mars, it could still make itself useful by spending the rest of its life uploading the secondary data that got skipped the first time around. If the non-hardened flash failed... well... it failed. But if it mostly worked, with maybe a few bit errors that something like Reed-Solomon could fix (and less intense algorithms could at least detect), it would be almost like a free bonus and "Plan B". Or, if it collected a LOT of potentially useful data it couldn't uplink to the satellite for power reasons, and they were planning to send another rover someday, they could add a relay station to the next rover & eject it so it landed close enough to the first to wake up the first rover, download its data, then uplink it to the satellite during periods when the new rover was inactive & the satellite would otherwise just be listening to silence from the ground.

    34. Re:Failsafe by mcgrew · · Score: 1

      That depends on what OS they're running

      It's certainly not running Windows. In non-retarded operating systems you don't have to reboot for system changes unless you're changing the operating system's kernel, and I really doubt that's what they're changing. I just don't see them doing a reboot unless the whole system crashes, which also is pretty unlikely.

    35. Re:Failsafe by Yoda222 · · Score: 1

      I'm not stupid. Maybe I could have worded it better, but you missed my point. I wasn't talking about a box primarily for radiation shielding. My point was that a little more RAM or a little more flash -- even radiation-hardened -- would take only a very tiny bit of additional room inside the box, and weigh hardly anything.

      That's true for everything. Add an instrument, it's only 5% of the weight of the other instruments together, or add some fancy stuff, you will be able to do [something cool] and it takes only a little more space, ...

      At some point, you have to stop. Yes, RAM doesn't take a lot of space or weight, but it takes some. 256 MB of DRAM is a relatively good amount of RAM for a space robot (if it's used only for the on-board computer, not for payload-related stuff).

      By comparison, the "new" PF (it's based on an evolution of a previous design, but is also supposed to be used in the future...) at Thales Alenia Space for LEO/MEO constellations (Globalstar second generation, and the upcoming O3b) has 4 MB of RAM for one CPU. OK, it does a lot less than what you expect from a Mars rover (it's a small and cheap platform). I don't have the numbers, but I think that the Astrium/TAS new GEO Alphabus is closer to 4 than to 256 (even in log2). OK, these are all examples of commercial and mostly French spacecraft, but still.

      NASA software engineers are lucky: at least they get to use the upload procedure in the real world. When I was at TAS I spent a few weeks validating the "in orbit, complete flight software uploading procedure", knowing that it would probably never be used...

    36. Re:Failsafe by Jane+Q.+Public · · Score: 1

      "At some point, you have to stop."

      I am aware of this, too. But you're really reaching here. A very tiny weight and volume of more RAM or more flash really would not make a significant difference... any more difference than molding that Morse code into the wheels.

      Therefore, I conclude as I did at first: they didn't feel it was important.

      At first glance, it still seems lame compared to your typical home machine... but I didn't claim that it WAS lame.

  6. hmm by strack · · Score: 0

    I hope there's a really, really good reason why they need to update the software at all

    1. Re:hmm by c0lo · · Score: 2

      I hope there's a really, really good reason why they need to update the software at all

      Well, zero-day exploits.. and Wikileaks... and anonymous not forgiving or forgetting... and Duqu/Flame/Mahdi...

      (grin)

      --
      Questions raise, answers kill. Raise questions to stay alive.
    2. Re:hmm by hey_popey · · Score: 4, Insightful

      Of course, not! They do it just for the lulz!
      More seriously, for space systems and embedded systems in general, due to resource constraints on-board, you usually cannot fit all the functionality you would like to in one software image. So you keep only what is necessary for the first phase of the mission, and then you replace the obsolete parts with the next thing you want to do.
      As a simplified example, when you launch a satellite, you will need it to deploy its solar arrays quickly (and do many initialization checks). When that is done, you could imagine changing this part of the software with something else...

      Also, they might have had time planning constraints on the project, and needed to launch with a simpler first version of the software, while finalizing the second one. That does happen.

    3. Re:hmm by Anonymous Coward · · Score: 1

      Yeah. That's probably too much to ask from an organization that successfully sent a 4 ton spacecraft into space for a 9 month voyage over half a billion kilometers with a mission to lower a nuclear powered, car sized rover on the surface of Mars by entering the atmosphere with a rocket crane to lower it.

      Damn cowboys, just sending out rovers willy-nilly. They've already sent up 2 rovers not long ago, I bet they didn't even have a good reason to send this one up at all. Yeah, they'll be updating the shit out of it for no reason, don't worry about that.

  7. Hold F8, Boot to Safemode - which lacks networking by DontScotty · · Score: 2

    By pressing F8 at the "Starting Windows 95" message, and then choosing Safe Mode from the Windows 95 start-up menu.

    Following these steps will gain you ultimate FAME and FAILURE - for updating the Mars software!!!

  8. Oblig. by AliasMarlowe · · Score: 4, Funny

    So what's their problem? Just tell a sysadmin to fix it.

    --
    Those who can make you believe absurdities can make you commit atrocities. - Voltaire
  9. it can fly? by wjh31 · · Score: 0

    maybe im missing something, but unless this update is going to make it sprout wings, why does it need flight software when it's already landed

    1. Re:it can fly? by Bonobo_Unknown · · Score: 5, Informative

      The point of the exercise is to replace the no longer needed flight software with software it can use to better perform its tasks while on Mars.

      --
      We don't believe in radical loony monotheistic religions from the middle east -- we're Christians.
    2. Re:it can fly? by darkfeline · · Score: 1

      Didn't you know? NASA uses python. https://xkcd.com/353/

    3. Re:it can fly? by Jane+Q.+Public · · Score: 1

      Whitespace and the Red Planet would probably not get along.

  10. Re:Hold F8, Boot to Safemode - which lacks network by fatphil · · Score: 1

    I can't even get to that stage, it keeps giving me a keyboard error - did no-one stick one on Curiosity?

    --
    Also FatPhil on SoylentNews, id 863
  11. This Is Intense! by qbitslayer · · Score: 0

    Why, the life of a Mars Rover engineer is always intense.

  12. Os? by Anonymous Coward · · Score: 0

    Not that I want to start a flamewar, but does anybody know what OS they use? Does anybody know the exact hardware specs?
    I know that many satellites carry around _reaheally_ old hardware and I'm really curious :-) what they send in a Mars rover.
    Did they program it from scratch or is it some already existing project?

    1. Re:Os? by Anonymous Coward · · Score: 0

      VxWorks, on RAD750 processors, which are basically radiation-hardened PowerPC G3. There are plenty more details; JFGI.

    2. Re:Os? by Anonymous Coward · · Score: 1
  13. "flight software"? by Barnett · · Score: 1, Informative

    Why are they updating the "flight software"? I thought they were done with the flying bit?

    1. Re:"flight software"? by Anonymous Coward · · Score: 0

      Why are they updating the "flight software"? I thought they were done with the flying bit?

      Did you try reading the article?

      "Michael Watkins, a mission systems manager at JPL, said during a press conference today that a team of programmers are getting ready to upgrade Curiosity's software from a program optimized for landing to one optimized for working on the planet's surface."

      They can't help it if journalists summarize using inaccurate language.

      How the !@#$ does this get modded informative??? Ignorant is the new informative on slashdolt.

    2. Re:"flight software"? by Barnett · · Score: 1

      Calm down, this is Slashdot. Nobody reads TFA.

    3. Re:"flight software"? by Anonymous Coward · · Score: 0

      It's the software they use to fly back to Earth.

    4. Re:"flight software"? by sunking2 · · Score: 1

      While it's a rover at this point, NASA considers it a spacecraft. The term "flight" is equivalent to "production". It means it's on the actual hardware and has passed the appropriate certifications.

  14. Imagine if it had been in kilometers by G3ckoG33k · · Score: 1

    Imagine how far it would have been if they had measured it in kilometers instead!

    Whoaw!

    .
    .
    -
    .
    .
    . ;)

    1. Re:Imagine if it had been in kilometers by fa2k · · Score: 1

      For those distances I just read "miles" as "kilometers". A factor of 1.6 doesn't really make a huge difference for a casual understanding.

    2. Re:Imagine if it had been in kilometers by Anonymous Coward · · Score: 0

      Especially since they got the distance wrong anyway.

    3. Re:Imagine if it had been in kilometers by zippthorne · · Score: 1

      Astronomers use cgs for some reason, so you're both wrong...

      --
      Can you be Even More Awesome?!
  15. big deal by Anonymous Coward · · Score: 0

    It has an arm, doesn't it? So it can push its own reset button and go into the BIOS if need be.

  16. Should have gone with Debian.. by Anonymous Coward · · Score: 2, Funny

    sudo apt-get update mars

    1. Re:Should have gone with Debian.. by Anonymous Coward · · Score: 1

      Is that in the universe or multiverse repository?

  17. Re:Hold F8, Boot to Safemode - which lacks network by Anonymous Coward · · Score: 0

    I found it quite funny (a 1995 pc wiz kid telling how you should do it), but the redundant "FAME and FAILURE" line kind of ruined it.

  18. Stop Calling Mars "The Red Planet"! by Fleetie · · Score: 0, Flamebait

    There is this global media obsession with referring to Mars as "The Red Planet". It is really irritating.

    Mars has a name, just like all the other planets in our solar system: Its name is "Mars". So use it, and respect the planet and its name.

    It's so irritating and "media lovvie". Also, the planet is not really "red" at all. It's brown. It belongs in exactly the same category as media types referring to scientists as "boffins". It's RUDE and DISRESPECTFUL.

    I wish the media would shed this ridiculous obsession with ignoring the name of the planet MARS.

    --
    "Absorbing your worst..."
    1. Re:Stop Calling Mars "The Red Planet"! by Anonymous Coward · · Score: 1, Funny

      i spoke to mars and it assured me it wasn't offended

      but it was happy for you to be offended on its behalf

    2. Re:Stop Calling Mars "The Red Planet"! by Anonymous Coward · · Score: 0

      My acquaintance from the Middle Kingdom mentioned something similar when discussing the amount of filthy lucre we could liberate from the Sunshine State. Apparently there are hidden deposits of black gold to be found.

    3. Re:Stop Calling Mars "The Red Planet"! by Anonymous Coward · · Score: 0

      I really doubt our native "brown bear proletariat" will ever allow "sucking the life-blood" of "mother earth..."

  19. Re:Hold F8, Boot to Safemode - which lacks network by wvmarle · · Score: 2

    No keyboard found. Press to continue.

  20. Re:Hold F8, Boot to Safemode - which lacks network by wvmarle · · Score: 1

    No keyboard found. Press <F1> to continue.

    (correcting for HTML... preview? What preview? Oh, that preview...)

  21. Yes but by Anonymous Coward · · Score: 1

    which language do they use to tell the rover where to drive? Surely, it has to be Logo

  22. Software upgrades.... by disi · · Score: 2

    It will sit there forever: "Are you sure you want to update? Yes/No"

    1. Re:Software upgrades.... by lxs · · Score: 2

      Have you tried turning it off and on again?

  23. Pressure changes things by jeko · · Score: 5, Interesting

    Get a 10-foot 4X4 piece of lumber. Drop it flat on the ground. Walk from one end to the other like a balance beam. I'll bet you can do it. I'll bet you can do it blindfolded, walking backward. I'll bet you can do it reciting the alphabet backward. I'll bet you could do it drunk.

    Take that same 4X4, suspend it 20 stories in the air between a couple of cranes. Put a bunch of razor sharp, rotating propellers on the ground beneath it. Intersperse the propellers with oil drillbits pointed up, not down for once. Have a bunch of trained turkey vultures flying around to watch you fall. Take your wife, kids and your momma, put a gun in their mouths while the Joker cackles that when you fall, he's gonna blow their heads off. Bring in the television cameras and monitors so the whole World can watch and you can watch them watch. Have some intern read the tweets and comments sections about your plight over the loudspeakers.

    Now, there are a few ice-blooded "Licensed to Kill" Double-O men who could keep it together and walk that beam under that kind of pressure. Mary Lou Retton and Nadia could, no doubt. I seriously doubt I could.

    Is it a big deal to do a software upgrade under such tightly controlled conditions? Not really. But try doing that software upgrade when billions of dollars and your career are on the line, with the whole world watching. The guy who screws that up is gonna be a punchline and a byword for a few decades, a real Wilson if you've read that book. :-) You'll be known as the guy who screwed up Mars.

    Tell me there wouldn't be maybe one or two drops of sweat on the keyboard...

     

    --
    He put his boots up on the table and made a face. "The sig," he smirked. "You can waste your life in search of the sig."
    1. Re:Pressure changes things by Jane+Q.+Public · · Score: 0

      It isn't anything like that at all.

      It's more like: the 4X4 is still on the ground, and there are no drills or propellers. It's just that you have to do it E-X-T-R-E-M-E-L-Y S-L-O-W-L-Y, with long pauses between each step. And if you do fall off, despite all the care taken, you lose your house and your car.

    2. Re:Pressure changes things by jkflying · · Score: 1

      And all your colleagues lose their last few years of work.

      --
      Help I am stuck in a signature factory!
    3. Re:Pressure changes things by Anonymous Coward · · Score: 1

      You should write children's stories.

    4. Re:Pressure changes things by fuzzyfuzzyfungus · · Score: 2

      And hell hath no fury like epic nerd-rage...

      If the firmware guys brick this thing, they'll probably be found in either the decompression test chamber with their eyeballs boiling off, or floating in the old hydrazine tank out back.

    5. Re:Pressure changes things by Anonymous Coward · · Score: 0

      The guy who screws that up is gonna be a punchline and a byword for a few decades ;

      I would seriously hope the code and plan are not reviewed by only one guy, but by about 10.

    6. Re:Pressure changes things by tehcyder · · Score: 1

      LOL you have a worryingly vivid imagination.

      --
      To have a right to do a thing is not at all the same as to be right in doing it
    7. Re:Pressure changes things by LateArthurDent · · Score: 1

      Get a 10-foot 4X4 piece of lumber. Drop it flat on the ground. Walk from one end to the other like a balance beam. I'll bet you can do it. I'll bet you can do it blindfolded, walking backward. I'll bet you can do it reciting the alphabet backward. I'll bet you could do it drunk...Take that same 4X4, suspend it 20 stories in the air between a couple of cranes...

      Dude, I've recently had a remarkably similar experience. You don't need anything anywhere near your amount of complexity and added pressure.

      I was hiking with a group of friends. River level was a bit higher than normal, and you couldn't cross it through the normal stones used as part of the trail. However, there was a tree trunk across the river, and we decided we'd walk through that. That trunk was most certainly not an alternate people-approved path. It was pretty high over the river (I'd say about 10 ft), and there were some nasty rocks below. On the other hand, it was pretty thick. We tested our ability to walk it by walking the part of the trunk that was over land on our side. Piece of cake, plenty of surface area, it wasn't hard to maintain balance at all. I could walk it at my normal walking pace. So we decided it was safe to cross.

      I climbed on top of the land section, walked forward easily... as soon as I was above the river, I was having trouble balancing myself. Started walking really slowly, and actually almost fell about 2/3rds of the way through. As soon as I was past the river and the trunk was on top of solid land again, no more problems balancing myself. It's incredible what the psychological effect is on such things, and I don't even have a fear of heights, I'm a freaking skydiver. I didn't really feel much fear, but simply had a much harder time doing a simple task once actual danger was involved.

  24. Aww come on, 350 million...please by Anonymous Coward · · Score: 0

    I just got back from Mars on Tuesday: the tacos suck, the clubs are dead and the girls all wear sunscreen with shitty tans... I'm so done with it... rather be in NY on a Wed night...

  25. Wrong Question by mutube · · Score: 1

    What we really need to know is why it didn't need flight software BEFORE now?! Obviously it isn't really on Mars... if 'Mars' even exists. Lizards all the way down I tell you! LIZARDS!!

  26. Re:A little late? by MSojka · · Score: 1

    "...as they get ready to download a new version of the flight software on the Mars rover Curiosity..."

    Flight software? She flying back too?

    "Flight" as in "fight-or-flight response". You know, in case Curiosity encounters Martian life which thinks it's delicious ... or at least interesting enough to study and take apart.

    Those people at NASA think of everything ...

  27. Just keep Wolowitz away by TheHonch · · Score: 1

    from the controls and everything should be fine

  28. Risk mitigation by WinstonWolfIT · · Score: 1

    I don't feel I could begin to appreciate the issues these rocket surgeons deal with, but if it were my project, there would be two rovers, the guinea pig in Dalton, Ohio (there should be a penalty for bricking the test rover) and the one that gets the exact same script that succeeded in Dalton. Human hands should never directly touch a mission critical system.

    1. Re:Risk mitigation by Anonymous Coward · · Score: 0

      There are two rovers, one on earth for testing, etc.

    2. Re:Risk mitigation by biodata · · Score: 1

      I thought the point the poster was trying to make was that there should be two rovers ON EARTH - one that they try stuff out on (the dev rover), and one that they only do stuff to that actually gets done to the one on Mars (the test rover). That way they can hopefully control for the effects of doing and undoing changes, and they will always have one system here that is in the same state as the one on Mars, except for while the one on Mars is being updated. Engineers often like to set things up that way for supporting important systems.

      --
      Korma: Good
    3. Re:Risk mitigation by khallow · · Score: 1

      If it were my project, there'd be hundreds of rovers.

  29. There's some good related stories here by Grindalf · · Score: 2, Insightful

    If you follow "Scott Maxwell" in google plus, there are some great snippets about the landing and software. See: https://plus.google.com/u/0/112648317373638762082/posts

    --
    The purpose of existence is to make money.
  30. What's the problem.... by Anonymous Coward · · Score: 0

    If the software upload doesn't work, there are plenty of tools to help NASA fix it.
    Some that come to mind are the (in)famous My Clean PC. If they had been smart
    and purchased the extended warranty at the checkout, Geek Squad could help, too.

  31. Sensationalism at it's WORST! by Anonymous Coward · · Score: 0

    The headline should be "OMG! WE ARE TEH BUZY SO FAST!!!!shift-1"

    The reason I say this is because it NEVER covers the fact that in all likelihood the programmers MUST have a Development Environment, Quality Assurance Environment, Staging Environment and Acceptance Testing Environment. Is it agile? Is it waterfall? WTF is the IT? and WTF is the I.T.?

    Hell if you truly want to be technical and have a full fleshed out story you would say "In addition to the n flops uber computer simulators that introduce transmission failures and other physical environmental factors... We have the original prototype to exact specifications on the ground, in the labs here..."

    Can a HaX0r hijack the uberWifi signal on mars and attack the aliens living there? If we divide by zero can the solar collectors and internal power source create an uncontrolled fusion? That's what I would like to know!!!

    The original article itself does not cover "How does one prevent bricking 350m mile away equipment."

    1. Re:Sensationalism at it's WORST! by Anonymous Coward · · Score: 0

      it's means it is

  32. Should be easy enough by symes · · Score: 2

    They are bound to have a copy of Curiosity here on Earth, surely? So they should be able to thoroughly test the process first. Ok, it is not Mars and there might be issues specific to transmitting that data over such distances... but still. I'd be really surprised if this hasn't been thoroughly tried and tested.

    1. Re:Should be easy enough by ledow · · Score: 1

      More than that, if you design the system properly it would never be a problem.

      Watchdog timers on everything - on the hardware coming up, on the communications with Earth, etc. If you don't get a response from the timers in X seconds/minutes/days, then completely revert to the previous version of the software and try again.

      So if you upgrade the software and break the radio, in a day or so of not being able to talk to Earth, the machine should notice and revert back to the previous software. If you break the upgrade completely, the watchdog timers for, e.g. OS-level monitoring, sensor control, radio-to-Earth, etc. will eventually trigger and then you can revert.

      And *don't* let it remove a previous version of the software - just keep updates which automatically fall back to prior updates or the original mission software when they fail.

      The biggest problem you have is not the software update, it's purely corruption of the hardware, which you can't do much to combat if it happens. But even then, I'd expect the boot sequence to start with something so minimal that it's capable of, say, checking the whole of RAM and avoiding anything that's a bit dodgy (e.g. Linux BadRAM-patch-style) and reverting to *literally* just something that shouts for help from the radio if it can't.
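
      A minimal sketch of that "never delete, always fall back" idea, in Python purely for illustration. Nothing here reflects Curiosity's real flight software: the image names, the tick-based timeout and the `has_contact` probes are all made-up assumptions standing in for "the watchdog was fed before its deadline".

```python
from typing import Callable, List, Tuple

# (name, has_contact) pairs, oldest first. has_contact(tick) is a hypothetical
# probe that returns True once the image has heard from Earth.
Image = Tuple[str, Callable[[int], bool]]


def boot_with_fallback(images: List[Image], timeout_ticks: int) -> str:
    """Boot the newest image; on watchdog expiry, revert to the one before it."""
    for name, has_contact in reversed(images):
        for tick in range(timeout_ticks):
            if has_contact(tick):      # watchdog fed in time: keep this image
                return name
        # Watchdog expired without contact: fall through to the older image.
    return "beacon"                    # nothing worked: just shout for help


images = [
    ("mission-v1", lambda tick: tick >= 2),   # original software, slow but works
    ("update-v2", lambda tick: False),        # update that broke the radio
]
print(boot_with_fallback(images, timeout_ticks=5))   # mission-v1
```

      The older images stay in the list, so a bad update only costs the time spent waiting for the timeout.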

    2. Re:Should be easy enough by codman1 · · Score: 1

      yes, there is a working mirror image of the rover, all bells and whistles, that they use to test and retest everything before uploading any new software. so i don't know what the fuss is about. think it's a headline generator to keep people interested till it starts doing the real sciency stuff. like using that laser to graffiti names on rocks :D

    3. Re:Should be easy enough by Bigby · · Score: 1

      If the minimum transmission rate is X, you can be sure that they are testing X/10 transmission speeds on the copy on earth. Also, they are testing partial transmissions and corrupted transmissions.
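
      As a rough illustration of what a partial/corrupted transmission test could look like, here is a toy Python sketch. It is an assumption-laden stand-in, not the real uplink protocol: the packet layout, the 4-byte chunk size and the function names are invented, and real deep-space links use far stronger error-correcting codes than a plain CRC32.

```python
import zlib

CHUNK = 4  # tiny chunk size so the example is easy to follow


def packetize(image: bytes):
    """Split an image into (index, payload, crc32) packets."""
    return [(i // CHUNK, image[i:i + CHUNK], zlib.crc32(image[i:i + CHUNK]))
            for i in range(0, len(image), CHUNK)]


def missing_or_corrupt(packets, expected_count):
    """Return the packet indices the receiver must ask to have resent."""
    good = {idx for idx, payload, crc in packets if zlib.crc32(payload) == crc}
    return [i for i in range(expected_count) if i not in good]


image = b"surface-software"            # 16 bytes -> 4 packets
packets = packetize(image)
received = list(packets)
del received[2]                         # partial transmission: packet 2 lost
idx, payload, crc = received[1]
received[1] = (idx, b"XXXX", crc)       # corrupted transmission: packet 1 damaged
print(missing_or_corrupt(received, len(packets)))   # [1, 2]
```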

  33. Well, it's not really the same by aglider · · Score: 1

    But the technologies used in some botnets are a good starting point.
    That'd be, call home and try to pull anything you need to do the upgrade.
    The orbiter relay should be doing the same, first.

    --
    Sent as ripples into the electromagnetic field. No single photon has been harmed in the process.
  34. tried any of this in the field? by dutchwhizzman · · Score: 1

    It takes 3-5 years to field test this stuff. It takes years of preparation after the final decision of what hardware to use before you get to launch the thing and after that, to get it to Mars. You are looking at the best of the best, proven technology hardware available for this sort of radiation tolerance at the moment they had the last opportunity to make design changes.

    One does not simply fly to Mars.

    --
    I was promised a flying car. Where is my flying car?
    1. Re:tried any of this in the field? by Jane+Q.+Public · · Score: 1

      "... One does not simply fly to Mars."

      I am aware of this also. I mentioned that a lot had changed since 1996... when the last two rovers were designed. Slice it however you like, there is still an 8-10 year difference.

  35. Terminology? by Guppy06 · · Score: 1

    a new version of the flight software on the Mars rover Curiosity

    Is anybody else thinking that any changes to the flight software are now a few days too late?

    1. Re:Terminology? by the+eric+conspiracy · · Score: 2

      Yeah, the freakin summary is potty as usual.

      They aren't upgrading the flight software. They are replacing the flight software with driving around and exploring software.

  36. OTA by ziviani · · Score: 1

    The use of tag "ota" is technically wrong.

    1. Re:OTA by Anonymous Coward · · Score: 0

      OTNV (near vacuum)?
      OIS (interplanetary space)?
      It's technically OTA for a brief part of the journey from NASA's DSN antenna until it leaves the atmosphere. And perhaps again if we call the Martian atmosphere "air?"

  37. How to prevent bricking by Anonymous Coward · · Score: 0

    Here is how SW is managed on a spacecraft: you have a 'golden' image, residing on a physically separate PROM, which is write-protected in HW. This image is tested on the ground, before launch, and cannot be changed ever. Then you setup a HW watchdog that resets to the golden image if you don't hear from Earth every N days.

    One or more operational SW images are stored in a separate EEPROM (or Flash) and you upgrade one of those at a time. Before booting up the upgraded image you verify the load.

    If done correctly, the worst that can happen if you botch the upgrade is that you lose a few days waiting for the watchdog.
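
    A toy Python sketch of that boot decision, under loud assumptions: the slot layout, the function names and the use of SHA-256 are my own illustration (real hardware would more likely use a CRC or similar), and the byte strings merely stand in for PROM and EEPROM contents.

```python
import hashlib

GOLDEN = b"golden-image"   # stands in for the write-protected PROM contents


def verify_load(image: bytes, expected_sha256: str) -> bool:
    """Check an uploaded image against the digest sent alongside it."""
    return hashlib.sha256(image).hexdigest() == expected_sha256


def select_boot_image(slots, watchdog_expired: bool) -> bytes:
    """slots: list of (image_bytes, expected_sha256), preferred slot first."""
    if watchdog_expired:                # no word from Earth for N days
        return GOLDEN
    for image, digest in slots:
        if verify_load(image, digest):  # only boot a load that verifies
            return image
    return GOLDEN                       # every operational slot is bad


new = b"ops-v2"
old = b"ops-v1"
slots = [
    (new[:-1] + b"?", hashlib.sha256(new).hexdigest()),  # botched upload
    (old, hashlib.sha256(old).hexdigest()),              # previous good image
]
print(select_boot_image(slots, watchdog_expired=False))  # b'ops-v1'
```

    The golden image is never written to by any code path, which is the whole point: the worst case is a delay, not a brick.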

  38. virus protection by gsgriffin · · Score: 5, Funny

    Probably concerned that their virus software is now out of date after the long journey.

    --
    jsut athnoer menagiensls ltitle psrhae for you to dcoede. Why do we wtsae our tmie dnoig tihs?
    1. Re:virus protection by mcgrew · · Score: 1

      Virus software? I doubt Curiosity is running Windows, and nobody at NASA is going to be fooled by a trojan. Well, unless they let Wolowitz loose...

  39. Unless This Happens... by sfhock · · Score: 1

    Butt sex requires a lot of lubrication, right? Lubrication. Lubruh... Chupuh... Chupacabra's the, the goat killer of Mexican folklore. Folklore is stories from the past that are often fictionalized. Fictionalized to heighten drama. Drama students! Students at colleges usually have bicycles! Bi, bian, binary. It's binary code! If people don't wear jackets they could get cold. A cold is caused by a virus. A viru- a computer virus! We could make a computer virus and send it to their ships to disable their computers!

    --
    "Let's go find some Turian and beat the shit out of him ... That always cheers you up!!"
  40. If you love testing, work in aerospace by jcadam · · Score: 1

    I once worked on simulation software for a new satellite that could be patched on-orbit (an orbiting satellite might as well be on Mars -- if you break it, it's going to stay broken). One of the main purposes of the software simulator, which ran the actual flight code that was on the bird, was to test new patches before they were pushed to the vehicle (and the vehicle itself did some validation of the patch after the upload was complete before applying it). Of course, hardware-in-the-loop testing using a duplicate test satellite on the ground was also done as a final step. In addition to a software simulator, I'm sure NASA has a duplicate rover or two in their labs for testing. The amount of testing done on these programs would drive you insane.

  41. Can't they just SSH into it? by scorp1us · · Score: 1

    I mean, the lag is going to be on par with SSH in to a terrestrial server with my AT&T service and cell phone.

    --
    Slashdot's rate-of-post filter: Preventing you from posting too many great ideas at once.
  42. NASA should go on a hiring spree by Anonymous Coward · · Score: 0

    here on slashdot. So many genius level ideas and suggestions.

  43. no worries by Anonymous Coward · · Score: 0

    Upgrade the flight software all you want. The rover is on the ground and doesn't intend to fly.

  44. Latency! by DarthVain · · Score: 1

    People complain of 300ms of latency here on earth with their ISP. I have heard it takes 14 MINUTES for a signal round trip. That's 840 seconds, or 840,000ms of latency. So you are not exactly programming on the fly.

    The worst part would be that presumably there is some pretty robust simulated debuggery on earth before anything gets transmitted. However, once you have finally tested, confirmed, compiled, packaged etc... and pressed the send button, you have to wait a likely eternal, excruciating 14 minutes before you know if your code actually worked, or if you just broke several billion dollars' worth of project...
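
    For anyone who wants to sanity-check the numbers, the light-time arithmetic is easy to reproduce. This is just a back-of-the-envelope Python sketch: it takes the headline's 350 million miles and the 14 minutes quoted above at face value, and the function names are my own.

```python
SPEED_OF_LIGHT_M_S = 299_792_458   # exact, by definition of the metre
METRES_PER_MILE = 1609.344         # exact, international mile


def one_way_delay_s(distance_miles: float) -> float:
    """Seconds for a radio signal to cover the given distance once."""
    return distance_miles * METRES_PER_MILE / SPEED_OF_LIGHT_M_S


def distance_for_delay_miles(delay_s: float) -> float:
    """Miles a radio signal covers in the given number of seconds."""
    return delay_s * SPEED_OF_LIGHT_M_S / METRES_PER_MILE


# One-way minutes for the headline's 350 million miles.
print(round(one_way_delay_s(350e6) / 60, 1))               # 31.3
# Millions of miles that light covers in 14 minutes (840 s).
print(round(distance_for_delay_miles(14 * 60) / 1e6, 1))   # 156.5
```

    So 14 minutes and 350 million miles cannot both describe the same one-way trip; 14 minutes of light travel corresponds to roughly 156 million miles. Either way, it is seconds-to-minutes of waiting, not milliseconds.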

  45. It's easy by Anonymous Coward · · Score: 0

    The lander OS upgrade system should include a failsafe mechanism where if the "user" doesn't confirm the new settings within a certain amount of time then the system reverts to the previous settings/OS/software.

  46. This is NASA, give them a break by TheSpoom · · Score: 1

    I love how everyone here is like, "Y'know, they really should have a backup software solution on the rover" or "If I was doing this, I would do this, that, and the other thing, and they're stupid for not doing that".

    An awful lot of assumptions being made about people who are probably the very top of their game. I'm going to give NASA the benefit of a doubt here: I think they wouldn't do the upgrade unless it was very beneficial, and I'd bet they're doing it in a way that has layers upon layers of safeguards.

    --
    It's better to vote for what you want and not get it than to vote for what you don't want and get it.
    - E. Debs
    1. Re:This is NASA, give them a break by R3d+M3rcury · · Score: 1

      I love how everyone here is like, "Y'know, they really should have a backup software solution on the rover" or "If I was doing this, I would do this, that, and the other thing, and they're stupid for not doing that".

      Well, some of it is the story.

      Is it possible to brick the rover? I'm sure it's possible. A number of bad things would have to happen for this to occur, most of which have probably been figured out and designed around. But they call them "unknown unknowns" for a reason.

      This, of course, increases the drama, which is important for a news story to appeal to the masses. You have to have that dramatic element. So you add emphasis to the unlikely possibilities and downplay the odds of such a thing occurring.

      Also, it's better to predict doom and gloom than success because if the doom-and-gloom occurs, you're amazingly prescient and you can say, "See! I told you this was a bad idea!" If it doesn't, your predictions of doom-and-gloom will be easily forgotten among the euphoria of success. Worst case scenario, you can point out how "lucky" they were that everything worked out for the best.

    2. Re:This is NASA, give them a break by TheSpoom · · Score: 1

      Is it possible to brick the rover? I'm sure it's possible. A number of bad things would have to happen for this to occur, most of which have probably been figured out and designed around. But they call them "unknown unknowns" for a reason.

      It's actually already happened. I'll still give the engineers behind it the benefit of a doubt. I bet they learned a lot from last time.

      Also, it's better to predict doom and gloom than success because if the doom-and-gloom occurs, you're amazingly prescient and you can say, "See! I told you this was a bad idea!" If it doesn't, your predictions of doom-and-gloom will be easily forgotten among the euphoria of success. Worst case scenario, you can point out how "lucky" they were that everything worked out for the best.

      Yeah, that sort of political game rubs me the wrong way. I try to avoid it.

      --
      It's better to vote for what you want and not get it than to vote for what you don't want and get it.
      - E. Debs
  47. I think we should all hope... by Brewster+Jennings · · Score: 1

    That NASA has learned from the experience of upgrading Sojourner to WinAMP 0.92...

    1. Re:I think we should all hope... by DrXym · · Score: 1

      They didn't account for the upgrade putting AOL icons all over their desktop.

    2. Re:I think we should all hope... by Brewster+Jennings · · Score: 1

      "Or causing Sojourner to develop an inexplicable liking for Smash Mouth," he replied, hoping desperately that inane pop culture references from 1997 would trick people into thinking it was funny and give him a 5, since, after all, it worked for Family Guy.

      Quietly, he clung to hopes of a Pity 5 through a desperate use of internal dialogue and a clumsy attempt to break the fourth wall.

  48. So, actually, it was NOT Curiosity ... by zapyon · · Score: 1

    who killed the cat. It will have been that darn NASA engineer who killed Curiosity. ;-)

    --
    I like my spaghetti with source.
  49. Data had such a thing by Impy+the+Impiuos+Imp · · Score: 1

    You would want a deeply-embedded, simple HW module listening in on the raw radio link for a special code, and it then initiates flashing of the main module.

    If this is well-done, no matter what king kong fuckup happens on the main processor(s) you can always have the little tough guy rip it a new asshole.

    We do stuff like this in the auto environment.

    --
    (-1: Post disagrees with my already-settled worldview) is not a valid mod option.
  50. Protocol? Fallback? by zapyon · · Score: 1

    What kind of protocol can be used for transmissions like that?

    And, if anything goes wrong, and Curiosity throws up, eh, an exception, how can it fall back to a sane state? Someone further up this discussion suggested a mechanism where losing contact with base control for a certain period would trigger a revert to the previous version. But losing contact may have totally different reasons.

    Does anyone know how they do this stuff? Are they actually programming Curiosity in Python?

    --
    I like my spaghetti with source.
  51. Now They Find A Bug? by Toad-san · · Score: 1

    Damn, guys! If it ain't broke, don't fix it!

    If it _is_ broke .. this is a hell of a time to find out about it. How about some more details, eh?

  52. I am willing to bet they are not updating firmware by jerryjnormandin · · Score: 1

    I doubt they are running a remote firmware update. I bet they are just uploading python scripts, and if it fails, no worries Curiosity will receive a new program update. Hopefully they are blowing up media hype. I wish NASA would be more scientific when talking to the public. we are not all idiots, just 75% of the population won't understand it. It's a shame really. Maybe the rest of the public can learn more if they are not talked down to.

  53. core OS (realtime VX-UNIX) is 30 years old by peter303 · · Score: 1

    It's the drivers for new devices and operations programs that are more likely to have bugs. Plus they may learn more useful ways of operating things during the years they operate these probes.

    I recall the 2004 Mars Opportunity computer nearly died about a month into its 2003 operation. The memory management for the then "newfangled" flash drive wasn't freeing memory correctly. Opportunity had gone into safe mode and rebooted about 30 times in a row. But JPL engineers managed to patch the driver and Opportunity is still working 9 years later.

  54. use Lisp (worked before @ 100 million miles) by cstacy · · Score: 1

    Twenty-one years ago, the Deep Space 1 probe was controlled by an autonomous spacecraft control system called "Remote Agent". This was a Lisp program running aboard the spacecraft, 100,000,000 miles away from Earth. During the flight, they remotely debugged and fixed a race condition in the code that had not shown up during ground testing. This saved the day, and the Remote Agent was subsequently named "NASA Software of the Year". One of the developers said, "Having a read-eval-print loop running on the spacecraft proved invaluable in finding and fixing the problem."

    What do you think: Conservative, or Liberal Programming? (lol)
    Formal Analysis of the Remote Agent Before and After Flight

    Lisp was also used for the Mars Pathfinder mission, although in that case it was not running aboard the spacecraft.

  55. there is a reset by whitroth · · Score: 1

    For those who wondered, they do have a reset, and it works most of the time. There was a lot of reprogramming done on the Deep Space 1 mission, and a few times there was a bug that hadn't shown up in the sandbox duplicate that they have in a lab, but they sent a reset (I think once it took a couple of days to finally identify a star that would let it reorient DS1's dish to Earth), and there are safe modes it can fall back to. I'm sure that the same's true of Curiosity: unexpected situations come up (I mean, that's what exploring's all about), and you have to rethink how to do what you need, and we have to do it for Curiosity, given the state of our AIs....

    And yes, I do know what I'm talking about: I know Steve, the long-haired controller, personally, and a mailing list I'm on saw a lot of posts by him back then, and some for Curiosity.

                          mark

  56. wow... by nighthawk243 · · Score: 1

    LANDesk must lag like a bitch when trying to do updates from that far away.

  57. Re:I am willing to bet they are not updating firmw by bitingduck · · Score: 1

    You have to watch the news conferences on the web (10 am Pacific every day)-- they have many of the real engineers and scientists answering questions in a pretty good Q&A with reporters. *way* better than your average press conference. What happens in an article is that you have a reporter with little technical background working from a press release or some short summary, and they're trying to fit it into a short article written at 5th grade level.

    Sending new software to missions after they leave earth is pretty standard, particularly for things with a long cruise phase. For MSL, they had EDL software with the control loops to get safely to the ground dominating things, and now are dumping the software that they don't need so they can use the space for code that will be useful on the ground. Something that's important to remember (and other posters have mentioned) is that you pay a lot for every gram of mass you send to another planet, so you can't go packing in a lot of extra stuff, and if you can dump something you don't need (like the landing software) to make room for something that's more useful (like driving around software) you do.

    Another thing to remember (that's also already been noted) is that missions like this have technology freeze *long* before launch, so that you can ensure that everything will play together and you can test everything really thoroughly (every time there's a change you go through a lot of retesting, and it involves hardware so it's more work than just typing "make test").

  58. Km, miles... by Roger+W+Moore · · Score: 1

    The spacecraft TRAVELLED 350 million miles to get there, but as of tonight, Mars is only about 157.5 million miles from Earth.

    Kilometres, miles they are all the same to NASA, especially when dealing with Mars.

  59. Upgrading Rover by Anonymous Coward · · Score: 0

    I thought the last guy to be working with the Mars Rover before contact was lost was Howard Wolowitz from "Big Bang Theory"?

  60. Stupid question, but... by Vrtigo1 · · Score: 1

    If you're building a spacecraft that's going to live 350 million miles away, wouldn't you have redundant EVERYTHING on it? I.e., the entire command and control system should be duplicated. That way you update the standby system and have some predefined self check the thing can do after the update's done, and if it doesn't pass the self check, then that system stays in standby mode so the operation of the system as a whole isn't impacted. You'd also probably have some sort of OOB access to the failed system via the primary system so you can go in and try to repair it.
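
    To make the "update the standby, self check, then switch" idea concrete, here is a small Python sketch. It assumes the self check is nothing more than a SHA-256 comparison against a digest known on the ground; the Computer class and the function names are hypothetical and say nothing about how the rover actually does it.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Computer:
    name: str
    image: bytes  # software image currently installed


def self_check(image: bytes, expected_sha256: str) -> bool:
    """Assumed self check: the installed image must match the expected digest."""
    return hashlib.sha256(image).hexdigest() == expected_sha256


def update_standby(active: Computer, standby: Computer,
                   new_image: bytes, expected_sha256: str):
    """Install new_image on the standby only; return (active, standby) afterwards.

    Roles are swapped only if the standby passes its self check. Otherwise the
    original active computer keeps running and the standby stays in standby.
    """
    standby.image = new_image
    if self_check(standby.image, expected_sha256):
        return standby, active
    return active, standby
```

    Note that the active computer is never written to here, so a bad upload can only ever damage the side that was not in use.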

  61. Bobak by Altanar · · Score: 1

    "Installing surface software on @MarsCuriosity takes longer than on my laptop, but doesn't remind me to restart every 15 min when done. #MSL" - Bobak F, via Twitter

  62. NASA's Spirit Rover: Resource Exhaustion Article by antdude · · Score: 1

    http://www.stickyminds.com/BetterSoftware/magazine.asp?fn=cifea&id=121 :

    "Cumulative Usage

    Resource Exhaustion
    The cumulative usage of software tends to create more and more intentionally stored data. If storage resources are not managed carefully, this stored data causes file systems to fill up or free memory to be depleted, a problem known as resource exhaustion.

    A dramatic example of resource exhaustion occurred on NASA's Spirit rover, which stopped communicating with Earth on January 21, 2004, after having landed on Mars just seventeen days earlier. Suspecting a problem with the flash memory, JPL engineers commanded the rover to boot up without reading the flash, and then deleted hundreds of unneeded files on the flash memory, which quickly addressed the problem. [11] The rover has now been running for more than five years, well surpassing its longevity design goal of ninety days of operation..."

    --
    Ant(Dude) @ Quality Foraged Links (AQFL.net) & The Ant Farm (antfarm.ma.cx / antfarm.home.dhs.org).
  63. Did you try it on the machine in the next room? by ebvwfbw · · Score: 1

    Any updates or changes should be tried on an exact duplicate here. Screw up and it's no big deal, it's here. Reminds me of a dumbass that wanted to update a machine across the US. I told him to do the one in the next room, configured exactly the same. But NOOOOO! We made him go out there and fix it. Closest airport was over 100 miles away.

  64. Remember this? by Anonymous Coward · · Score: 0

    Cause of failure

    On November 10, 1999, the Mars Climate Orbiter Mishap Investigation Board released a Phase I report, detailing the suspected issues encountered with the loss of the spacecraft. Previously, on September 8, 1999, Trajectory Correction Maneuver-4 was computed and then executed on September 15, 1999. It was intended to place the spacecraft at an optimal position for an orbital insertion maneuver that would bring the spacecraft around Mars at an altitude of 226 kilometers on September 23, 1999. However, during the week between TCM-4 and the orbital insertion maneuver, the navigation team indicated the altitude may be much lower than intended at 150 to 170 kilometers. Twenty-four hours prior to orbital insertion, calculations placed the orbiter at an altitude of 110 kilometers; 80 kilometers is the minimum altitude that Mars Climate Orbiter was thought to be capable of surviving during this maneuver. Final calculations placed the spacecraft in a trajectory that would have taken the orbiter within 57 kilometers of the surface where the spacecraft likely disintegrated because of atmospheric stresses. The primary cause of this discrepancy was engineering error. Specifically, the flight system software on the Mars Climate Orbiter was written to take thrust instructions using the metric unit newtons (N), while the software on the ground that generated those instructions used the Imperial measure pound-force (lbf). This error has since been known as the metric mixup and has been carefully avoided in all missions since by NASA.
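
    For a sense of scale, here is the size of that unit mismatch as a short Python sketch. It follows the description quoted above (ground software emitting pound-force, flight software expecting newtons); the function names are made up for illustration, and the only hard fact used is the standard conversion 1 lbf = 4.4482216152605 N.

```python
LBF_TO_NEWTON = 4.4482216152605  # newtons per pound-force (standard conversion)


def lbf_to_newtons(force_lbf: float) -> float:
    """Convert a force from pound-force to newtons."""
    return force_lbf * LBF_TO_NEWTON


def underestimate_factor(ground_value_lbf: float) -> float:
    """How many times too small a value looks when lbf is misread as newtons."""
    actual_newtons = lbf_to_newtons(ground_value_lbf)
    assumed_newtons = ground_value_lbf  # same number, wrong unit assumed
    return actual_newtons / assumed_newtons
```

    Whatever the input, the misread value comes out roughly 4.45 times too small, which is a large error to accumulate over months of trajectory corrections.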

  65. Hey get the story right people by metaforest · · Score: 1

    The new software image is already on Curiosity's local 4GB flash file system. They just need to send the commands to reboot from the new image. According to the Chief Software Engineer during the press conference Fri. morning, they uploaded the R10 image back in June while still in cruise mode.

    It is likely all they need to do is change a few boot-loader parameters and reboot to the new image. If it doesn't work, it probably will safemode back to the previous image. They also have a completely independent backup computer that can probably unbrick its twin if something goes sideways.
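
    Here is roughly what I mean by "change a few boot-loader parameters", as a Python sketch. This is a guess at the general pattern, not the actual flight boot loader: the slot names, the retry limit of 3, and the slot_boots_ok callback are all assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Callable

MAX_BOOT_ATTEMPTS = 3  # hypothetical retry limit before giving up on a slot


@dataclass
class BootConfig:
    active_slot: str    # image the boot loader tries first
    fallback_slot: str  # known-good image to revert to
    attempts: int = 0   # failed or in-progress boots of the active slot


def boot(config: BootConfig, slot_boots_ok: Callable[[str], bool]) -> str:
    """Try the active slot a few times; revert to the fallback if it never works.

    Returns the name of the slot that ends up selected.
    """
    while config.attempts < MAX_BOOT_ATTEMPTS:
        config.attempts += 1
        if slot_boots_ok(config.active_slot):
            config.attempts = 0
            return config.active_slot
    # Active slot never came up healthy: swap the slots and start over.
    config.active_slot, config.fallback_slot = (
        config.fallback_slot, config.active_slot)
    config.attempts = 0
    return config.active_slot
```

    The old image is never erased, so switching back is just flipping the same parameter again.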

    This kind of stuff is only dangerous when the goal is to prevent end-users from easily reflashing their mobile devices.

  66. For Christ's sake people... by gmyuriy · · Score: 1

    NASA has been doing "spectacular landings" and "terrifying software upgrades" its entire existence; not to detract from the awesomeness of it all, the recent spin-offs are just a publicity stunt! -- doesn't it strike you how all this got suddenly so-o-o-o-o-o terrifying and spectacular just about the time of NASA's budget cuts and NASA's declaration of the fight for "hearts and minds" of its fellow American citizens? This is all fine and cool, of course, but /. should know better duh...

  67. Reminds me of a Laserjet 4250n by Anonymous Coward · · Score: 0

    On an LCD panel 550 million km away:

    UPGRADE FAILED
    RESEND FILE

  68. Cool Story Bro. :-) by jeko · · Score: 1

    Amazing how people think things are so much easier and simpler when they've never done them before, isn't it?

    --
    He put his boots up on the table and made a face. "The sig," he smirked. "You can waste your life in search of the sig."