Slashdot Mirror


Mars Rover Spirit Back Online

Skyshadow writes "Just in time for the arrival of its twin, the Spirit Mars Rover is back in working order. Programmers at the JPL have traced the problem to the rover's flash RAM, which it uses to maintain its filesystems. They are using a ramdisk in the rover's RAM to bypass the bad flash memory, and are working on a workaround for the bad flash. Good news, but the rover is still potentially weeks away from full operational status."

36 of 386 comments

  1. They found the problem by Anonymous Coward · · Score: 5, Funny

    They signed up for Mars Online with 3000 free hours. What they didn't realize was that the free 3000 hours only applied to the first month of service. Once they paid their MOL bill, they got hooked back up. All the probe's friends on Mars use MOL!

  2. Weeks away? by adrianbaugh · · Score: 5, Funny

    They should boot faster, using Linux. Then they'd only be ten seconds away :-)

    --
    "'I pass the test,' she said. 'I will diminish, and go into the West, and remain Galadriel.'"
    - JRR Tolkien.
  3. You mite listen to Jimmy, But you can't hear Jimmy by niko9 · · Score: 5, Funny

    /riff/Move over Rover, let the ramdisk take over!/riff/

    Wonder where they got their flash RAM from?

  4. Warranty by DarkHelmet · · Score: 5, Funny
    They are using a ramdisk in the rover's RAM to bypass the bad flash memory, and are working on a workaround for the bad flash.

    I think they should return the bad flash part to where they got it and exchange it for a new part... although getting the memory back to the store by the 30 day warranty might be a little difficult.

    I hope they bought the extended warranty.

    --
    /^[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}$/i
    1. Re:Warranty by Albinoman · · Score: 5, Funny

      The real question is: Can they get their flash RAM supplier to pay for shipping?

    2. Re:Warranty by questamor · · Score: 4, Insightful

      Out of curiosity, is there any difference between the flash RAM on Spirit and the stuff we have here? I didn't know about any radiation-hardened flash RAM... or even whether there's any difference between the physical chips themselves in CF, SD, Memory Sticks, etc.

      The NASA report mentioned that the problem seems to revolve around the software that accesses the flash RAM. It could be filesystem corruption, or a physical problem with the flash RAM itself, or even a broken interface to the flash RAM. At the moment, it's about the equivalent of having a machine a thousand miles away and just seeing that a certain drive won't mount. Finding out whether there's a problem with the SCSI card it's connected to, or the drive itself, or a filesystem corruption, or a head crash... that comes in the next few weeks.

  5. heh... /. was right! by Smitty825 · · Score: 5, Interesting

    During all of the "Spirit is broken" columns, I kept reading /. comments saying that it was likely a memory error due to the non-consistent errors...I guess a million monkeys with a typewriter can be correct :-)

    --

    Doh!
    1. Re:heh... /. was right! by AndroidCat · · Score: 4, Funny

      I thought I got it rather spot on. :^P (I guess that makes me the millionth monkey?)

      --
      One line blog. I hear that they're called Twitters now.
    2. Re:heh... /. was right! by cubicledrone · · Score: 4, Insightful

      Amazing, isn't it? Writing comments correctly debugging an $800 million spacecraft on another planet without even looking at it, and most programmers still can't rent a fuckin' job.

      Now let's all sing the company song...

      --
      Business isn't willing to pay for products, innovation and careers, so we get brands, mortgage commercials and layoffs.
    3. Re:heh... /. was right! by More+Trouble · · Score: 4, Funny

      Now let's all sing the company song...

      "Oh, say can you see..."

      :w

    4. Re:heh... /. was right! by argStyopa · · Score: 4, Funny

      How hard is that, really?

      Thousands of /. posters solve all the world's problems in a few snide lines of comment, despite rarely leaving their little veal-fattening pens or even RTFA. Fixing a software glitch a few million miles away is child's play in THIS neighborhood, my friend.

      --
      -Styopa
  6. The epitome of remote administration by Faust7 · · Score: 4, Interesting

    Engineers guessed that Spirit's troubles were in its Flash memory and set about sending the rover a complex series of instructions to see if they could get it to bypass the corrupted memory. Theisinger said engineers sent Spirit a command just before its daily "waking up," telling it to shut down and restart in what is known as "cripple mode," using RAM instead of Flash for its start-up instructions.

    Some people may take this sort of thing for granted, but I for one find it remarkable that we can essentially reboot and perhaps even fix a system that is on a whole other planet.

    Just wait until we have Interplanetary, Interstellar, Intergalactic Remote Desktop. I'm only half-joking.

    1. Re:The epitome of remote administration by Daychilde · · Score: 5, Funny

      It's all good until tech support says, "So... Do you have a boot disk?" :-)

      --
      A cheerful little bird is sitting here singing.
    2. Re:The epitome of remote administration by blincoln · · Score: 4, Interesting

      It's all good until tech support says, "So... Do you have a boot disk?" :-)

      You joke, but newer servers can do this remotely too.

      We have a bunch of Compaq servers at work, and one of the really cool features of the remote administration software is that you can send a virtual floppy image to the machine from anywhere in the world that can open a web browser connection to the server's remote administration board.

      A few months ago one of our servers in Denver died, and I had to boot it up in Windows 2000's command prompt only safe mode... but the local admin password had never been written down. I was able to make virtual floppy images of a tool that resets the local admin password, send them over the wire, and boot off of them from the remote administration system.

      Okay, it's not fixing a super-expensive robot on another planet, but I thought it was pretty cool.

      --
      "...always new atoms but always doing the same dance, remembering what the dance was yesterday." -Richard Feynman
  7. So basically... by cperciva · · Score: 4, Funny

    If I understand this properly, they've got a damaged filesystem on the flash RAM. Not really a big problem, you just have to send someone over to the console to boot it up in single-user mode and run fsck. ... oh yeah, sending someone over to the console is a little bit difficult here. :)

  8. 2 years ago, back at NASA R&D... by Dark+Lord+Seth · · Score: 5, Funny

    Engineer 1: Ho-hum.. Little bit of ... whatever it is, 'ere... Hand me that thingamajig, will you?
    Engineer 2: Yah, sure... Hey, remember that employee last month who got laid off within a week?
    Engineer 1: Who? Vincent?
    Engineer 2: Yeah, Vinnie... With the Italian accent?
    Engineer 1: Yeah, him. What about the guy?
    Engineer 2: Well, he has this offer on cheap RAM we just CAN'T resist!
    Engineer 1: Really now? But-
    Engineer 2: Look, our budget is already comparable to social welfare. We need to save some loot.
    Engineer 1: Fair enough, buy the crap and hand me the other twisty-turny thingy over there? I need to screw on this name tag reading... "Spirit"?
    Engineer 2: Look, it's either that or my wife's name.

  9. Monday morning quarterback by GGardner · · Score: 5, Insightful

    If I was sending an embedded control computer to another planet, I would have chosen an OS with memory protection, not VxWorks. VxWorks is like DOS, and early versions of Windows, where one pointer problem in one task can corrupt the whole system. Sure, we don't know that's the problem now, but it would be nice to know for sure that it wasn't.

  10. Static Discharge? by seven+of+five · · Score: 5, Interesting

    Is there a chance that the problem could've been caused by electrostatic discharge? Rover bounces on rubber airbags on sand, bags fold up, Rover rolls off, Rover touches rock - zap!??

  11. Cosmic rays... by bc90021 · · Score: 4, Interesting

    ...will apparently cause one out of every trillion bits on Earth to flip randomly... I guess with less of an atmosphere, it is a bigger problem on Mars! ;)

    1. Re:Cosmic rays... by shadowmatter · · Score: 5, Interesting

      Funny you mention that. I'm taking a class on design of digital systems at my university, and my professor works for JPL. He helps design the control systems onboard space vehicles such as the Mars rover. Anyway, a majority of the class grade is based on an end-of-the-quarter project, which we complete in groups of 2 to 4. On Wednesday he expressed interest in a group developing some sort of redundancy for FPGAs that would be suitable in spacecraft. You see, on Mars, you're not shielded from huge doses of radiation as you are on earth. A healthy dose of radiation bombardment could easily reprogram an FPGA chip on the surface of Mars; ASICs chips are used to overcome this problem.

      Maybe he was gung-ho about anti-radiation redundancy because he already knew the likely problem of the Spirit. Who knows?

      - sm
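
A minimal sketch of the kind of redundancy being discussed here, assuming the simplest scheme (triple modular redundancy with a bitwise majority vote). Nothing in the thread says this is what the rovers or the class project actually use; the names `majority_vote` and `flip_bit` are made up for illustration.

```python
def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise majority of three copies of a word.

    Each output bit takes the value that at least two of the three copies agree on.
    """
    return (a & b) | (a & c) | (b & c)


def flip_bit(word: int, bit: int) -> int:
    """Simulate a radiation-induced single-bit upset by toggling one bit."""
    return word ^ (1 << bit)


# Three copies of the same word; one of them takes a hit.
original = 0b10110010
copies = [original, flip_bit(original, 5), original]

# The vote masks the upset as long as no bit position is wrong in two copies.
recovered = majority_vote(*copies)
```

The vote only helps while upsets in different copies land on different bit positions; two copies flipped at the same position would outvote the good one.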

  12. Software / Hardware Breakthrough? by Saeed+al-Sahaf · · Score: 4, Insightful
    This is remarkable, and a testament to good software / hardware integration. It is true that I think this money could have been better spent elsewhere in terms of our understanding of the universe, but still, these types of projects and the hardships that come with them teach miles of experience in remote software / hardware problems.

    I do seriously wonder if these types of projects will tell us anything more than esoteric wonders of Mars, but from a strictly engineering standpoint, perhaps it's worth it after all.

    --
    "Who are in control, they are not in control of anything - they don't even control themselves!" - Glen Beck
  13. The Full Story by DrunkenTerror · · Score: 5, Informative

    Here is the link to the real story. The one given in the /. article is getting pushed down spaceflight's page.

  14. Nice by Omega1045 · · Score: 4, Interesting

    I have a friend who works in the field. Space travel hoses electronics bad. Triple redundancy and over-engineering are the name of the game. This is nice to hear. I would imagine that something went wrong in transit or on landing, but they can keep going.

    --

    Great ideas often receive violent opposition from mediocre minds. - Albert Einstein

  15. Steal SOME by MajorDick · · Score: 5, Funny

    I mean, like, Beagle isn't using its flash RAM anymore, just go and jack some off it. While you're at it, TAG the Beagle with some PRO-US graffiti :) hell, maybe it's got nicer rims too.

    Seriously, can you imagine the first manned expedition seeing the Beagle jacked up, tagged, up on little Martian cinderblocks? All that, and we already got a head start on building Martian cities.

  16. Information on the MER hardware. by elrond1999 · · Score: 5, Interesting

    I've been unable to find any hard information on the design of the MER memory systems. If anyone can point me to a technical brief I'd be very happy.

    From what I've pieced together, the MER system is something like this:

    One RAD6000 PowerPC CPU.
    Connected, probably via CompactPCI, to 128 MB of ECC SDRAM.
    256 MB of flash. No info on what make of flash, but likely Intel since they are the biggest. There was some info from the press conference that there are actually two flash chips and that the flight software is redundantly stored on each. So does this mean that there is actually 128 MB of redundant flash? Also, it was said that they had problems even with the redundancy; could they possibly have overwritten something? We all know that even a redundant RAID does not stop filesystem corruption.

    No information on how the flash is connected (parallel or serial?) or how the redundancy works.

    Btw, I guess flash is rather radiation hard, since it requires 10-20 V to erase or write.
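
A hedged sketch of the point about redundant copies, since nobody here knows how the MER flash redundancy actually works. It assumes a made-up record layout (a CRC-32 followed by the image) and hypothetical helpers `store`, `load`, and `load_redundant`. It shows a second bank rescuing a bit error, and also why redundancy does not stop corruption that software writes "correctly" to both banks.

```python
import zlib
from typing import List, Optional


def store(image: bytes) -> bytes:
    """Build a record: 4-byte big-endian CRC-32 followed by the image (assumed layout)."""
    return zlib.crc32(image).to_bytes(4, "big") + image


def load(record: bytes) -> Optional[bytes]:
    """Return the image if its CRC-32 matches the stored value, else None."""
    if len(record) < 4:
        return None
    stored_crc = int.from_bytes(record[:4], "big")
    image = record[4:]
    return image if zlib.crc32(image) == stored_crc else None


def load_redundant(banks: List[bytes]) -> Optional[bytes]:
    """Try each bank in order and return the first image that passes its check."""
    for record in banks:
        image = load(record)
        if image is not None:
            return image
    return None


software = b"flight software image"

# Bank A suffers a single flipped bit; bank B is intact.
bank_a = bytearray(store(software))
bank_a[10] ^= 0x01
bank_b = store(software)
recovered = load_redundant([bytes(bank_a), bank_b])

# Logical corruption: software wrote the wrong data, with a valid checksum, to both banks.
wrong = store(b"overwritten by a bug")
logically_corrupt = load_redundant([wrong, wrong])
```

In the second case both banks pass their checks and the wrong data loads happily, which is the RAID point made above.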

  17. Re:Where is the redundancy? by cperciva · · Score: 4, Insightful

    It's not just a $5 flash ROM. If they wanted control redundancy, they would need extra flash RAM, RAM, ROM, CPU, motherboard, arbitration hardware, and arbitration software.

    Also keep in mind that this isn't a $5 flash ROM chip. When you consider the hostile environment, the testing, the power, and the fuel required to get everything to Mars, that flash ROM probably cost at least fifty thousand dollars.

  18. Salute the Helpdesk by Papa+Legba · · Score: 5, Funny

    I have had some tough calls in my time, but I have never had to walk a robot 283 million miles away through brain surgery. Man, I am glad I did not get that call. This is going to blow their call averages all to hell. I raise a cup of Joe to you, Rover Help Desk man.

    --
    Papa Legba come and open the gate
  19. last photo from Spirit by djupedal · · Score: 5, Funny

    This is the last image received prior to the recent issues with Spirit...

  20. Re:Monday morning quarterback: RTOS tradeoffs by GGardner · · Score: 4, Insightful
    memory protection adds overhead and can affect real-time performance

    This is the conventional wisdom, and in my experience, this particular nugget causes more embedded and real time software projects to fail than any other.

    First off, on a modern PowerPC processor, memory protection (that is, without virtual memory support) can be implemented very cheaply. If you can do it just with the IBAT/DBAT registers, it should be a constant-time overhead, which is good enough for hard real time. Oddly enough, I can't find a single reference on the net that measures the cost of memory protection alone on a modern CPU. Anyone? Anyone?

    Secondly, though the rover certainly may have some software components that have hard-real time requirements, that doesn't mean that every single line of code does. Typically, less than 1 percent of the code in a real time system is hard real time. In that case, you can run the real-time code in ISRs, or perhaps in a dual-mode system, like RT-Linux, or in high-priority kernel threads (as with QNX). In any of these situations, you can run all the rest of the code in protected memory space.

  21. Re:Somebody here on Slashdot nailed it... by prockcore · · Score: 5, Funny

    I remember in the last thread about the rover, someone opined that it was bad memory, then proceeded to give a half dozen reasons why. Totally nailed it.

    Yeah, in the future NASA should just submit an Ask Slashdot whenever something goes wrong..

  22. You think that's neat by chazR · · Score: 5, Informative

    Here's a rant by a JPL guy about appropriate technologies for software on deep space probes. He recounts one story of a failed probe "100 million dollars, and 100 million miles away".

    They fixed it. The fact that there was a Lisp REPL running on the spacecraft helped.

    That's cool:

    (unwind-protect
        (progn (do-science) (talk-to-earth))
      (wait-in-repl-for-earth))
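
For anyone who doesn't read Lisp: `unwind-protect` runs its cleanup form no matter how the protected form exits, so the probe always ends up waiting in the REPL. The closest Python analogue is `try`/`finally`. A rough sketch follows; the function names just mirror the ones in the snippet above and are stand-ins, and the simulated fault is invented for the demo.

```python
log = []


def do_science():
    log.append("science")
    # Simulate something going wrong mid-mission.
    raise RuntimeError("instrument fault")


def talk_to_earth():
    log.append("talk")


def wait_in_repl_for_earth():
    log.append("repl")


def mission():
    try:
        do_science()
        talk_to_earth()
    finally:
        # Runs whether the body finished normally or raised,
        # like the cleanup form of unwind-protect.
        wait_in_repl_for_earth()


try:
    mission()
except RuntimeError:
    # The error still propagates after the cleanup has run.
    pass
```

After the fault, `talk_to_earth` is skipped but the wait-in-REPL step still runs, which is the whole point of the idiom.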

  23. that line from armageddon comes to mind... by MoFoQ · · Score: 5, Funny

    where the Russian cosmonaut says "American components, Russian components. They're all made in Taiwan!"

  24. Re:Checksums by Anonymous Coward · · Score: 5, Informative

    I'm watching NASA TV at the moment and they're explaining possibilities now. At the moment, they only have a very broad explanation of what's going wrong. However, the newest knowledge is:

    There are two separate flash memories on Spirit. At the moment, part of the picture is that the software can read at least part of the flash memories, since some of the operational software that is kept in flash RAM seems to be coming up before the system reboots.

    The system is rebooting no matter which flash memory is being accessed; it has the same bug both ways, so the flash RAM itself looks to be OK, but the interface between the flash RAM and the software looks to be causing resets.

    Even if there were more backup flash RAMs, it looks like they'd still have this problem. Perhaps many of them, all on different controllers, or even an entire backup computer, would have prevented this. At 100 watts total power available for the rover, an entire extra computer may be a bit much to fit. But then sending two rovers would also negate problems, and that's just what they've done.

    It seems most likely at the moment, according to NASA, that the family of components that are involved with the hardware addressing of the flash memories looks to be where the problem is.

  25. As someone else said by Viadd · · Score: 5, Funny

    The Spirit is willing, but the flash is weak.
    (Posted by Jane Slee and John Stracke in separate usenet postings.)

  26. Radiation hardened Flash by andygrace · · Score: 5, Informative

    There is a big difference between standard flash and radiation-hardened flash. In fact, we are designing a project with one of these VME bus units as a storage array.

  27. We learn from our mistakes... by Chordonblue · · Score: 4, Interesting

    So... I wonder if they'll consider validating MRAM more quickly if Flash is found to be more error prone.

    You know how NASA works. The Space Shuttle running on 486's and whatnot. I understand the science behind that reasoning, as sad as a 66 MHz processor seems to us geeks nowadays, but I wonder if MRAM will prove more flexible and stable for future space missions.

    --
    "...Well, there's egg and bacon; egg sausage and bacon; egg and spam; egg bacon and spam; egg bacon sausage and spam..."