Slashdot Mirror


Mars Rover Spirit Back Online

Skyshadow writes "Just in time for the arrival of its twin, the Spirit Mars Rover is back in working order. Programmers at the JPL have traced the problem to the rover's flash RAM, which it uses to maintain its filesystems. They are using a ramdisk in the rover's RAM to bypass the bad flash memory, and are working on a workaround for the bad flash. Good news, but the rover is still potentially weeks away from full operational status."

20 of 386 comments

  1. Not "online" at all... by |>>? · · Score: 0, Informative
    The rover status has been updated from critical to serious. Peter Theisinger stated:
    "We made good progress overnight and the rover has been upgraded from critical to serious. We have a working hypothesis we are pursuing that is consistent with many of the observables and consistent with operations that we performed on the vehicle last night. It involves the flash memory on the vehicle and the software used to communicate with that memory."

    You can read all about it at Spaceflight Now, where you can continue to follow the status of both Spirit and Opportunity (which is currently hours away from landing).
    --
    |>>? ..EBCDIC for Onno..
  2. The Full Story by DrunkenTerror · · Score: 5, Informative

    Here is the link to the real story. The one given in the /. article is getting pushed down spaceflight's page.

  3. Re:Monday morning quarterback: RTOS tradeoffs by G4from128k · · Score: 3, Informative

    If I was sending an embedded control computer to another planet, I would have chosen an OS with memory protection, not VxWorks.

    Actually, they might have protected memory if they use VxWorks AE RTOS/Tornado Tools 3.0. Spirit uses VxWorks, but I don't know what version they used or when they had to commit to a particular version of VxWorks.

    Also, as the article mentions, memory protection adds overhead and can affect real-time performance. Hard real-time software cannot afford to have a complex layered structure and lots of conditional code that adds unpredictable delays. For that reason, many really real-time applications run very close to the hardware (for better or for worse.)

    --
    Two wrongs don't make a right, but three lefts do.
  4. Re:Huh? Flash? by DougHalfWay+AroundTh · · Score: 2, Informative

    They don't. See DoD Bids, about 3/4 of the way down the page.

    Title: Rad Hard Flash Technology Abstract: The highest density radiation hardened non-volatile (NV) memory currently available is a 256 kbit EEPROM based on SONOS technology. One of the major limitations in developing rad hard NV memory has been the cost in bringing up the NV technology in a dedicated rad hard process facility, especially when weighed against the limited market size. One way to bring radiation hardening to an advanced electronic product on a cost-effective basis is to leverage the commercial product by applying the hardening to the commercial fab instead of bringing the commercial technology to the rad hard fab. NV flash memory technology is popular in the commercial marketplace, with densities up to 256 Mbit in production. Unfortunately, flash memory is not available, at any density, in total dose rad hard versions. And, most commercial flash memories are so soft that impractical amounts of shielding are required to survive even moderate radiation environments. This effort will be the first step in developing rad hard flash technology at a commercial fab. Rad hard flash technology will be a near-term solution to the problem of high density NV memory for space applications. It will enable the development of rad hard flash memories and embedded NV memory for rad hard ASICs.

    Flash...the weakest link...

  5. You think that's neat by chazR · · Score: 5, Informative

    Here's a rant by a JPL guy about appropriate technologies for software on deep space probes. He recounts one story of a failed probe "100 million dollars, and 100 million miles away".

    They fixed it. The fact there was a lisp REPL running on the spacecraft helped.

    That's cool:

    (unwind-protect
        (progn (do-science) (talk-to-earth))
      (wait-in-repl-for-earth))
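
    For readers who don't read Lisp, roughly the same control flow can be sketched with Python's try/finally. This is only an illustration: the three function names are hypothetical stand-ins for the forms in the snippet above, and the simulated fault is an assumption made to show the cleanup path.

```python
calls = []  # records which steps actually ran


def do_science():
    calls.append("do-science")
    raise RuntimeError("instrument fault")  # simulated failure (assumption)


def talk_to_earth():
    calls.append("talk-to-earth")


def wait_in_repl_for_earth():
    calls.append("wait-in-repl-for-earth")


def mission():
    try:
        do_science()
        talk_to_earth()
    finally:
        # Runs whether or not the body raised, like unwind-protect's cleanup form.
        wait_in_repl_for_earth()


try:
    mission()
except RuntimeError:
    pass  # the error still propagates after the cleanup has run

print(calls)  # ['do-science', 'wait-in-repl-for-earth']
```

    The point is that the "wait for Earth" step is reached even when the science step blows up.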

  6. Re:Where is the redundancy? by fermion · · Score: 2, Informative
    An increased number of components means increased complexity. Increased complexity means increased cost to maintain reliability. Cost increases much more than linearly. For non-human missions, extra components are not justified.

    Using redundant low reliability components is the cheap office solution, not the space exploration solution.

    --
    "She's a scientist and a lesbian. She's not going to let it slide." Orphan Black
  7. Re:Information on the MER hardware. by georgewilliamherbert · · Score: 2, Informative
    I've been unable to find any hard information on the design of the MER memory systems. If anyone can point me to a technical brief I'd be very happy.

    RAD6000 6U Compact PCI page at BAE Systems.

    It's not great, but there are more detailed links around the BAE website.

    It doesn't list how the FLASH is connected; that's not a standard built-in on the RAD6000 computer. I would guess, hung off the FPGA interface device, but I don't know that for sure.

  8. Re:Checksums by Anonymous Coward · · Score: 5, Informative

    I'm watching NASA TV at the moment and they're explaining possibilities now. At the moment, they only have a very broad explanation of what's going wrong. However, the newest knowledge is:

    There are two separate flash memories on Spirit. At the moment, the software can apparently still read part of the flash memories, since some of the operational software kept in flash RAM seems to come up before the system reboots.

    The system reboots no matter which flash memory is being accessed; it shows the same bug both ways. So the flash RAM itself looks to be OK, but the interface between the flash RAM and the software looks to be causing the resets.

    Even if there were more backup flash RAMs, it looks like they'd still have this problem. Perhaps many, all on different controllers, or even an entire backup computer, would have prevented this. At 100 watts total power available for the rover, an entire extra computer may be a bit much to fit. But then sending two rovers would also negate such problems, and that's just what they've done.

    It seems most likely at the moment, according to NASA, that the family of components that are involved with the hardware addressing of the flash memories looks to be where the problem is.

  9. Radiation hardened Flash by andygrace · · Score: 5, Informative

    There is a big difference between standard flash and radiation hardened flash. In fact we are designing a project with one of these VME bus units as a storage array.

  10. Re:The epitome of remote administration by JumboMessiah · · Score: 2, Informative

    Put this in /etc/sysctl.conf

    kernel.panic = 120

    That will tell the kernel to reboot itself 2 minutes after a panic. It has saved me in the past :).

  11. Nasa TV by Nucleon500 · · Score: 3, Informative

    If you don't get it on cable, you can watch NASA TV here.

  12. Re:The epitome of remote administration by Anonymous Coward · · Score: 1, Informative

    I'm not the grandparent poster, but RILOE uses a separate admin system that's secured. We personally keep ours on a separate LAN that can only be reached by a VPN for even more security. It's a pretty neat system though.

  13. Re: Technically... by MachDelta · · Score: 2, Informative
    I hate to be a nitpick, but the exact quote (with context) is:
    Andropov: Excuse me, but I think I know how to fix this.
    Watts: Move it! You don't know the components!
    Andropov: [annoyed] Components. American components, Russian Components, ALL MADE IN TAIWAN!!!

    Oh, and he has another quote I liked too:
    Lev Andropov: This is how we fix things on Russian space station!
    [hits panel with tool]

    But maybe I just like it because that's how I tend to fix things too ;)
  14. Re:Cosmic rays... by mnmn · · Score: 2, Informative

    Rockets have blasted off into space since Sputnik 1, and with all the communication satellites, we know a lot about high-radiation electronics. We've had solar flares corrupting electronic equipment for decades, and ASIC companies have entire lines of chips for high-radiation resistance, partly for military applications.

    So I think the rover's electronics are well protected from at least the Sun's radiation. I think Mars is about 1.5 AU from the Sun, so by the inverse-square law it should receive a bit less than half of the radiation per square inch the Earth gets, but I strongly feel I could be wrong there. Martian dust getting into the compartments IMHO can be a more likely reason.
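
    As a rough check on the distance argument, the inverse-square law can be evaluated directly. This is a minimal sketch, assuming a mean Mars-Sun distance of about 1.52 AU (a figure supplied here, not taken from the post); `relative_flux` is a hypothetical helper name, and the result covers only sunlight-like radiation that falls off with distance from the Sun, not cosmic rays.

```python
def relative_flux(distance_au: float) -> float:
    """Solar flux relative to what arrives at 1 AU (inverse-square law)."""
    if distance_au <= 0:
        raise ValueError("distance must be positive")
    return 1.0 / distance_au ** 2


MARS_MEAN_DISTANCE_AU = 1.52  # assumed approximate mean distance from the Sun

print(round(relative_flux(MARS_MEAN_DISTANCE_AU), 2))  # 0.43
```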

    If electronics break on Mars, I'd put the highest chance on the initial impact on landing. Besides that, it's just sitting on barren land, under full solar radiation, exposed to some dust but in close-to-vacuum. It's a simpler environment than we would have to deal with when, say, sending a rover to a planet like Earth, where it must be able to swim and walk through the forests.

    --
    "Give orange me give eat orange me eat orange give me eat orange give me you." -Nim Chimpsky
  15. Re:Software / Hardware Breakthrough? by Saeed+al-Sahaf · · Score: 3, Informative

    A lot of the components in this craft came from my former employer, www.InterPoint.com, who laid off half their staff a few years ago (I was one of those). Little boxes the size of a pack of cards, hand built. Really amazing stuff.

    --
    "Who are in control, they are not in control of anything - they don't even control themselves!" - Glen Beck
  16. Re:God Speed Opportunity by Anonymous Coward · · Score: 2, Informative

    Well Done NASA.. bringing space to us is the next best thing to taking us to space.

    0508 GMT (12:08 a.m. EST)
    A good signal is still being received! Unlike the Spirit landing where signal was lost immediately after touchdown, Opportunity continues to talk to Earth.

    0506 GMT (12:06 a.m. EST)
    After a short loss of signal from the rover, a strong signal is now being received as Opportunity arrives on Mars!

    0505 GMT (12:05 a.m. EST)
    BOUNCING ON MARS! Mission Control has received a signal of Opportunity bouncing on the surface of Mars.

  17. It's simple actually by HarveyBirdman · · Score: 2, Informative
    I've actually consulted on this for another group inside my company. You don't wait for a cosmic ray to change a programming bit in an FPGA.

    You have two or more running in parallel. While one is running, the next reloads from ROM. When it's loaded and synchronized, you switch to it, and load the next one. You do that in series, over and over, so you're only using any particular FPGA for a couple of seconds at a time, and their configurations are constantly being refreshed. It's a very simple idea that can be done now.
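
    The rotation described above can be sketched as a simple round-robin schedule. This is only an illustration under stated assumptions: `rotation_schedule` is a hypothetical name, devices are identified by index, and the hand-over is treated as instantaneous, whereas real hardware would switch only once the reloaded device is configured and synchronized.

```python
from itertools import islice


def rotation_schedule(num_fpgas: int):
    """Yield (active, reloading) index pairs for a round-robin refresh."""
    if num_fpgas < 2:
        raise ValueError("need at least two devices: one runs while another reloads")
    active = 0
    while True:
        reloading = (active + 1) % num_fpgas
        yield active, reloading
        # Hand over to the freshly reloaded device; the old one is next in
        # line to be refreshed from ROM.
        active = reloading


print(list(islice(rotation_schedule(3), 4)))
# [(0, 1), (1, 2), (2, 0), (0, 1)]
```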

    --
    --- Ban humanity.
  18. Re:We learn from our mistakes... by Detritus · · Score: 2, Informative

    The Space Shuttle does not use 486s. It uses IBM AP-101s, which are architecturally similar to the IBM 360/370 series of computers. See the Second Generation Computers FAQ.

    --
    Mea navis aericumbens anguillis abundat
  19. Re:Monday morning quarterback by morgue-ann · · Score: 2, Informative

    I will never understand why Linux and NetBSD are currently looked down upon in the embedded corporations currently

    Because they're fucking HUGE.

    The uClinux kernel for 68k, which is more compact than SPARClite but maybe less so than x86, is 512K.

    That's a stripped-down kernel with no MMU support and the special uClibc C standard library designed to take less space.

    I'm working on a digital camera with 512K of flash and 8MB of SDRAM. That flash is divided into 7 64K sectors and 8+16+16+32K little sectors. We use the upper 64K for sensor calibration data and the lower 64K for a boot block that can be locked so you can recover a camera with a bad firmware load.

    That leaves 384K for everything else. Our kernel is Precise/MQX from ARC International and it's 30K!

    Oh, and the RAM is needed for image processing and buffering movie frames on their way out to NAND flash, so your piggy kernels can't have it.

    While I'd like some things from uClinux and busybox and NetBSD, I have to be very selective. I'm presently porting elf2flt to the Metaware tools for ARC so we can dynamically load code resources. We'll also get a real log facility and monitor soon and maybe someday the Almquist shell.

    At least MQX and the Metaware tools are reasonably cheap and we get kernel and library sources (and ARC CPU hackable RTL instead of a giant impenetrable lump like ARM). I've heard nothing but irritation with WindRiver's high pricing and closed-IP attitude.

  20. Re:Monday morning quarterback: RTOS tradeoffs by AaronW · · Score: 2, Informative

    I can tell you that AE is in many ways WORSE than the standard VxWorks. It has a lot more bugs and is quite a bit slower. Think of regular VxWorks with memory protection hacked in, not designed in from the ground up.

    As a VxWorks programmer for the last 5 years, I can honestly say VxWorks is a PoS that is losing market share at a tremendous rate to the likes of embedded Linux and QNX. Wind River decided to spend tons of money buying add-in products like Routerware instead of improving their RTOS. It was a huge waste of money and now they're paying for it. They're losing money hand over fist and have had a lot of layoffs lately. They were good at one time, but they have fallen far behind the curve now in embedded RTOS design, especially for complex systems.

    VxWorks comes with support for a FAT flash file system, a completely broken malloc implementation, an ancient BSD TCP/IP stack, poor RT support, no memory protection, and no way to clean up after a task that dies. Not only that, it usually costs a fortune, but I've heard they're willing to sell it very cheap now because they're desperate.

    I looked into embedded Linux for our next generation hardware and software and Timesys appears to have a very nice solution with hard real-time support. The kernel is fully preemptable using semaphores instead of spinlocks and has priority inversion support. They also offer resource reservations, so I can say "I want this task to be guaranteed 5.73ms of execution time every 9.8ms" where after 5.73ms the task either gives up the CPU entirely, or else changes to a non-RT priority to not starve other tasks. It's really quite clever. Not only that, unlike RT-Linux there isn't a separate API for RT vs non-RT tasks. Monta Vista Linux is soft real-time. It cannot guarantee context switching time, nor does it deal with priority inversion. In RT, priority inversion can be a major problem (see the first Mars rover for an example).

    For an example of priority inversion, say you have 3 tasks: a low priority, a medium priority, and a high priority one. The low priority task acquires a mutex semaphore to protect a critical section and starts processing. It is interrupted by the medium priority task. Meanwhile, the high priority task unblocks and attempts to grab the mutex. The high priority task then stays blocked until the medium priority task itself blocks or finishes, because only then can the low priority task run again and release the semaphore. A common solution is priority inheritance. With priority inheritance, as soon as the high priority task attempts to acquire the mutex semaphore, the low priority task has its priority bumped to that of the high priority task until it releases the semaphore. In this way, the low priority task will interrupt the medium priority task so that the high priority task won't have to wait as long.
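
    The scenario above can be sketched as a toy tick-based simulation. Everything here is an assumption made for illustration: the `Task` class, the arrival times and work amounts, and the single-CPU scheduler are hypothetical, and this is not how VxWorks or any real RTOS implements its scheduler.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    priority: int       # larger number = more urgent
    arrival: int        # tick at which the task becomes ready
    work: int           # ticks of CPU time still needed
    needs_mutex: bool   # True if all of its work is inside the critical section
    boosted: int = 0    # inherited priority, 0 when none
    finished_at: int = -1

    def effective_priority(self) -> int:
        return max(self.priority, self.boosted)


def simulate(inherit: bool) -> dict:
    """Run the three-task scenario and return each task's finish tick."""
    tasks = [
        Task("low", 1, arrival=0, work=3, needs_mutex=True),
        Task("medium", 2, arrival=1, work=5, needs_mutex=False),
        Task("high", 3, arrival=2, work=1, needs_mutex=True),
    ]
    owner = None  # task currently holding the mutex
    tick = 0
    while any(t.work > 0 for t in tasks):
        ready = [t for t in tasks if t.arrival <= tick and t.work > 0]
        runnable = []
        for t in ready:
            waiting = t.needs_mutex and owner is not None and owner is not t
            if waiting:
                if inherit:
                    # Priority inheritance: lend the waiter's priority to the holder.
                    owner.boosted = max(owner.boosted, t.priority)
            else:
                runnable.append(t)
        if not runnable:
            tick += 1
            continue
        current = max(runnable, key=Task.effective_priority)
        if current.needs_mutex:
            owner = current  # acquire (or keep) the mutex
        current.work -= 1
        tick += 1
        if current.work == 0:
            current.finished_at = tick
            if owner is current:
                owner = None         # release the mutex
                current.boosted = 0  # drop any inherited priority
    return {t.name: t.finished_at for t in tasks}


print(simulate(inherit=False))  # {'low': 8, 'medium': 6, 'high': 9}
print(simulate(inherit=True))   # {'low': 4, 'medium': 9, 'high': 5}
```

    Without inheritance the high priority task finishes last, behind the medium priority one; with inheritance it finishes right after the low priority task leaves its critical section.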

    QNX is also a very good alternative. Very fast context switching and extremely robust memory protection. I think with QNX you can even buy a license suitable for use in medical devices (i.e. you absolutely cannot afford to have the OS crash for any reason).

    I've heard rumours that Wind River is dropping AE since nobody is using it. After our experience, I pity whoever tries it.

    Also, unless you get the source to VxWorks, which usually costs a lot of $$$, debugging is a complete nightmare, especially when you hit a bug in the Wind River code (and there's a lot of them). Hell, they couldn't even implement malloc right!

    Wind River is coming out with version 6 of VxWorks, but it is basically an enhanced version of AE. I'm not holding my breath.

    -Aaron

    --
    This post is encrypted twice with ROT-13. Documenting or attempting to crack this encryption is illegal.