Slashdot Mirror


Mars Rover Spirit Back Online

Skyshadow writes "Just in time for the arrival of its twin, the Spirit Mars Rover is back in working order. Programmers at the JPL have traced the problem to the rover's flash RAM, which it uses to maintain its filesystems. They are using a ramdisk in the rover's RAM to bypass the bad flash memory, and are working on a workaround for the bad flash. Good news, but the rover is still potentially weeks away from full operational status."

87 of 386 comments

  1. They found the problem by Anonymous Coward · · Score: 5, Funny

    They signed up for Mars Online with 3000 free hours. What they didn't realize was that the free 3000 hours only applied to the first month of service. Once they paid their MOL bill, they got hooked back up. All the probe's friends on Mars use MOL!

  2. Weeks away? by adrianbaugh · · Score: 5, Funny

    They should boot faster, using linux. Then they'd only be ten seconds away :-)

    --
    "'I pass the test,' she said. 'I will diminish, and go into the West, and remain Galadriel.'"
    - JRR Tolkien.
  3. You mite listen to Jimmy, But you can't hear Jimmy by niko9 · · Score: 5, Funny

    /riff/Move over Rover, let the ramdisk take over!/riff/

    Wonder where they got their flash RAM from?

  4. Warranty by DarkHelmet · · Score: 5, Funny
    They are using a ramdisk in the rover's RAM to bypass the bad flash memory, and are working on a workaround for the bad flash.

    I think they should return the bad flash part to where they got it and exchange it for a new part... although getting the memory back to the store by the 30 day warranty might be a little difficult.

    I hope they bought the extended warranty.

    --
    /^[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}$/i
    1. Re:Warranty by Albinoman · · Score: 5, Funny

      The real question is: Can they get their flash RAM supplier to pay for shipping?

    2. Re:Warranty by evilnissan · · Score: 3, Funny

      Now NASA just has to wait for Tigerdirect.com to send a replacement, or get store credit.

      --
      This Sig for rent.
    3. Re:Warranty by questamor · · Score: 4, Insightful

      Out of curiosity, is there any difference between the flash RAM on Spirit and the stuff we have here? I didn't know about any radiation-hardened flash RAM... or even whether there's any difference between the physical chips themselves in CF, SD, Memory Sticks, etc.

      The NASA report mentioned the problem seems to revolve around the software that accesses the flash RAM. It could be filesystem corruption, or a physical problem with the flash RAM itself, or even a broken interface to the flash RAM. At the moment, it's about the equivalent of having a machine a thousand miles away and just seeing that a certain drive won't mount. Finding out whether there's a problem with the SCSI card it's connected to, or the drive itself, or filesystem corruption, or a head crash... that comes in the next few weeks.

  5. Maybe just maybe... by TheMadPenguin · · Score: 2, Funny

    it was their AOL bill that wasn't paid? hmmmm...

    --
    Linux with kernel panic...
    MadPenguin.org
  6. heh... /. was right! by Smitty825 · · Score: 5, Interesting

    During all of the "Spirit is broken" columns, I kept reading /. comments saying that it was likely a memory error because the errors were inconsistent... I guess a million monkeys with typewriters can be correct :-)

    --

    Doh!
    1. Re:heh... /. was right! by AndroidCat · · Score: 4, Funny

      I thought I got it rather spot on. :^P (I guess that makes me the millionth monkey?)

      --
      One line blog. I hear that they're called Twitters now.
    2. Re:heh... /. was right! by cubicledrone · · Score: 4, Insightful

      Amazing, isn't it? Writing comments correctly debugging an $800 million spacecraft on another planet without even looking at it, and most programmers still can't rent a fuckin' job.

      Now let's all sing the company song...

      --
      Business isn't willing to pay for products, innovation and careers, so we get brands, mortgage commercials and layoffs.
    3. Re:heh... /. was right! by anubi · · Score: 2, Interesting
      Yeh.. hats off to you guys!

      I was barking up the tree of a spike on the power supply, for the exact same reasons.

      I have had power supplies or bypassing go bad due to the increasing ESR of aging capacitors, and by golly they come up with the damnedest intermittent failures you would ever want to see. They will have you debugging every process in the system until you put a storage oscilloscope on the power supply line and watch it like a hawk.

      --
      "Prove all things; hold fast that which is good." [KJV: I Thessalonians 5:21]

    4. Re:heh... /. was right! by More+Trouble · · Score: 4, Funny

      Now let's all sing the company song...

      "Oh, say can you see..."

      :w

    5. Re:heh... /. was right! by argStyopa · · Score: 4, Funny

      How hard is that, really?

      Thousands of /. posters solve all the world's problems in a few snide lines of comment, despite rarely leaving their little veal-fattening pens or even RTFA. Fixing a software glitch a few million miles away is child's play in THIS neighborhood, my friend.

      --
      -Styopa
    6. Re:heh... /. was right! by HermanAB · · Score: 2, Insightful

      Company song? "You load 16 tons, what do you get? A little bit older and deeper in debt..."?

      --
      Oh well, what the hell...
  7. The epitome of remote administration by Faust7 · · Score: 4, Interesting

    Engineers guessed that Spirit's troubles were in its Flash memory and set about sending the rover a complex series of instructions to see if they could get it to bypass the corrupted memory. Theisinger said engineers sent Spirit a command just before its daily "waking up," telling it to shut down and restart in what is known as "cripple mode," using RAM instead of Flash for its start-up instructions.

    Some people may take this sort of thing for granted, but I for one find it remarkable that we can essentially reboot and perhaps even fix a system that is on a whole other planet.

    Just wait until we have Interplanetary, Interstellar, Intergalactic Remote Desktop. I'm only half-joking.

    1. Re:The epitome of remote administration by Daychilde · · Score: 5, Funny

      It's all good until tech support says, "So... Do you have a boot disk?" :-)

      --
      A cheerful little bird is sitting here singing.
    2. Re:The epitome of remote administration by blincoln · · Score: 4, Interesting

      It's all good until tech support says, "So... Do you have a boot disk?" :-)

      You joke, but newer servers can do this remotely too.

      We have a bunch of Compaq servers at work, and one of the really cool features of the remote administration software is that you can send a virtual floppy image to the machine from anywhere in the world that can open a web browser connection to the server's remote administration board.

      A few months ago one of our servers in Denver died, and I had to boot it up in Windows 2000's command prompt only safe mode... but the local admin password had never been written down. I was able to make virtual floppy images of a tool that resets the local admin password, send them over the wire, and boot off of them from the remote administration system.

      Okay, it's not fixing a super-expensive robot on another planet, but I thought it was pretty cool.

      --
      "...always new atoms but always doing the same dance, remembering what the dance was yesterday." -Richard Feynman
    3. Re:The epitome of remote administration by JumboMessiah · · Score: 2, Informative

      Put this in /etc/sysctl.conf

      kernel.panic = 120

      That will tell the kernel to reboot itself 2 minutes after a panic. It has saved me in the past before :).
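
      A minimal sketch of checking that setting from Python, assuming a Linux system that exposes the live value at /proc/sys/kernel/panic. The helper names read_panic_timeout and describe are made up for illustration:

```python
from pathlib import Path


def read_panic_timeout(path="/proc/sys/kernel/panic"):
    """Return the kernel.panic value as an int, or None if it can't be read."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


def describe(timeout):
    """Turn the raw kernel.panic value into a short description."""
    if timeout is None:
        return "kernel.panic not readable on this system"
    if timeout == 0:
        return "no automatic reboot after a panic"
    if timeout < 0:
        return "reboot immediately after a panic"
    return f"reboot {timeout} seconds after a panic"


print(describe(read_panic_timeout()))
```

      The path is a parameter so the same function can be pointed at a test file instead of the real /proc entry.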

    4. Re:The epitome of remote administration by Daychilde · · Score: 2, Funny

      Of course I joke... heh. Mostly because of my past jobs working tech support, and the guy sitting next to me one time who tried to get a customer to type "a colon space setup dot exe" for about 5 minutes until I was off my call, heard what he was doing, and slapped him silly.

      Well, okay, I didn't slap him, but I wanted to. Badly. :-)

      But on your response -- that works. I mean, if you're doing something that you could just about do on another planet, it should count. Maybe not so glorious, but still. :-)

      --
      A cheerful little bird is sitting here singing.
    5. Re:The epitome of remote administration by Psychotext · · Score: 3, Interesting

      Oh god... I really, really hope you have a superb firewall & username & password blocking that machine off from the world. Did I read you right: you didn't have the admin password, so you used a tool over the remote administration to hack past it?

      Mmmm... hackalicious. :-)

      (I've actually used a similar remote kvm system with lights out boards but until you write it down it just doesn't sound that risky!)

      --
      People that believe in their opinions don't post AC.
    6. Re:The epitome of remote administration by Catbeller · · Score: 2, Funny

      I can only smile as I recall the heady days of the PC revolution. All the ancient Big Iron sealed away in a hermetically sealed room, and all the expensive and unapproachable priesthood that tended and worshipped the Iron, would be sent packing. The whole shebang replaced by inexpensive PCs controlled by the user, O glorious day! All the expense and complexity, gone!

      Snicker. Meet the new iron, same as the old iron.

    7. Re:The epitome of remote administration by IgnoramusMaximus · · Score: 2, Insightful
      All the expense and complexity, gone!

      I assume you are being really, really sarcastic. In reality the PCs multiplied complexity and expense by orders of magnitude. And only now, after decades of chaos and misery, are all the turd-brained suckers/managers who were responsible for this snake-oil sales bonanza, which created the likes of Microsoft, retreating to the only sane method of enterprise computing: centralized storage and processing. After billions of dollars wasted and con-men made very rich, the circle is now complete. Of course the managerdiots are now seeing this old idea as "new" after someone smart relabeled those old concepts with new sales tags like "thin client" or "data warehousing" etc. They would never have allowed someone to hint that they have been taken for a ride and that the "priesthood", unlike them, actually had a clue. It is a sad testament to the depths of human stupidity.

      PCs are the greatest thing ever for game players, home computer users and many other applications like engineering or science. They make no sense on administrative workers' desks, yet those presently constitute something like 80% of business PC deployment and come with hordes of MCSEs without whom they would come to a grinding halt within days.

  8. So basically... by cperciva · · Score: 4, Funny

    If I understand this properly, they've got a damaged filesystem on the flash RAM. Not really a big problem, you just have to send someone over to the console to boot it up in single-user mode and run fsck. ... oh yeah, sending someone over to the console is a little bit difficult here. :)

  9. Where is the redundancy? by MWChapel · · Score: 3, Interesting

    Shouldn't they have like 5 Flash RAMs? Really, they shouldn't have one of anything. In my computer if my BIOS fries, I pop open the box and replace it. If it fries on Mars, obviously I kiss my megamillion-dollar project goodbye, all for a $5 Flash ROM.

    1. Re:Where is the redundancy? by cperciva · · Score: 4, Insightful

      It's not just a $5 flash ROM. If they wanted control redundancy, they would need extra flash RAM, RAM, ROM, CPU, motherboard, arbitration hardware, and arbitration software.

      Also keep in mind that this isn't a $5 flash ROM chip. When you consider the hostile environment, the testing, the power, and the fuel required to get everything to Mars, that flash ROM probably cost at least fifty thousand dollars.

    2. Re:Where is the redundancy? by fermion · · Score: 2, Informative
      Increased number of components means increased complexity. Increased complexity means increased cost to maintain reliability. Cost increases much more than linearly. For non-human missions, extra components are not justified.

      Using redundant low reliability components is the cheap office solution, not the space exploration solution.

      --
      "She's a scientist and a lesbian. She's not going to let it slide." Orphan Black
    3. Re:Where is the redundancy? by DuranDuran · · Score: 2, Funny

      > but really should have triple redundant systems.

      I laugh at your puny triple redundant systems!! They should have QUADRUPLE redundant systems!!!

      --
      "You can justify anything by putting it in quotes, adding a famous name and making it a sig" - Albert Einstein
    4. Re:Where is the redundancy? by Hawkxor · · Score: 2, Interesting

      First of all: obviously they've thought of that. Adding to what others have said in their responses tearing the parent apart, I'd like to mention that the problem is probably not due to a general defect in the RAM card. It probably has to do with the conditions on Mars, the landing, etc - in which case the same problem would be affecting all of the (even redundant) Flash RAM cards: so it really is amazing that they got this working at all.

    5. Re:Where is the redundancy? by grozzie2 · · Score: 2, Insightful
      Really, they shouldn't have one of anything

      I think you folks all missed the point completely. They have full dual redundancy on EVERYTHING in the MER program. The computer systems are not the only issue; there are little issues like landing in one piece, etc. To that end, they built 2 full systems, packaged them on 2 different rockets, and fired them off a month apart from each other. This gave full dual redundancy to every system and every component, from the initial launch igniters to every bit of hardware that landed on the surface. Then to maximize the redundancy, they set them to land on different halves of the planet: serious physical isolation of component set A and component set B.

      If one complete set of hardware arrives on the surface and returns scientific data, the mission is considered a success. The real difference of this program is that they went with dual redundancy in everything, from launchers to arrival, not just separate systems mounted on the same physical hardware host.

      This type of full redundancy does make a lot of sense when you consider that the highest-risk portion of the mission is the entry and landing phase, followed closely by the launch phase. Dual redundant systems mounted on the same rover platform may well give better chances of success during the surface portion of the mission, but they leave a huge single point of failure during launch, and another one during entry and landing.

      Take a peek over at NASA TV, and you will see what real mission redundancy is all about: second lander about to enter the Martian atmosphere.

  10. 2 years ago, back at NASA R&D... by Dark+Lord+Seth · · Score: 5, Funny

    Engineer 1: Ho-hum.. Little bit of ... whatever it is, 'ere... Hand me that thingamajig, will you?
    Engineer 2: Yah, sure... Hey, remember that employee last month who got laid off within a week?
    Engineer 1: Who? Vincent?
    Engineer 2: Yeah, Vinnie... With the Italian accent?
    Engineer 1: Yeah, him. What about the guy?
    Engineer 2: Well, he has this offer on cheap RAM we just CAN'T resist!
    Engineer 1: Really now? But-
    Engineer 2: Look, our budget is already comparable to social welfare. We need to save some loot.
    Engineer 1: Fair enough, buy the crap and hand me the other twisty-turny thingy over there? I need to screw on this name tag reading... "Spirit"?
    Engineer 2: Look, it's either that or my wife's name.

    1. Re:2 years ago, back at NASA R&D... by CrazyJoel · · Score: 2, Insightful

      isn't the budget for social welfare something like 300 billion dollars?

      If only we spent so much on NASA. They only get 12 billion.

      --

      Such is the infinite Grace of Popeye.
  11. Monday morning quarterback by GGardner · · Score: 5, Insightful

    If I was sending an embedded control computer to another planet, I would have chosen an OS with memory protection, not VxWorks. VxWorks is like DOS, and early versions of Windows, where one pointer problem in one task can corrupt the whole system. Sure, we don't know that's the problem now, but it would be nice to know for sure that it wasn't.

    1. Re:Monday morning quarterback by morgue-ann · · Score: 2, Informative

      I will never understand why Linux and NetBSD are currently looked down upon in the embedded corporations

      Because they're fucking HUGE.

      The uCLinux kernel for 68k which is more compact than SPARClite, but maybe less so than x86, is 512K.

      That's a stripped-down kernel with no MMU support and the special uClib C standard library designed to take less space.

      I'm working on a digital camera with 512K of flash and 8MB of SDRAM. That flash is divided into 7 64K sectors and 8+8+16+32K little sectors. We use the upper 64K for sensor calibration data and the lower 64K for a boot block that can be locked so you can recover a camera with a bad firmware load.

      That leaves 384K for everything else. Our kernel is Precise/MQX from ARC International and it's 30K!

      Oh, and the RAM is needed for image processing and buffering movie frames on their way out to NAND flash, so your piggy kernels can't have it.

      While I'd like some things from uCLinux and busybox and netBSD, I have to be very selective. I'm presently porting elf2flt to the Metaware tools for ARC so we can dynamically load code resources. We'll also get a real log facility and monitor soon and maybe someday the Almquist shell.

      At least MQX and the Metaware tools are reasonably cheap and we get kernel and library sources (and ARC CPU hackable RTL instead of a giant impenetrable lump like ARM). I've heard nothing but irritation with WindRiver's high pricing and closed-IP attitude.
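
      The flash budget in this comment can be checked with a few lines of arithmetic. This is only a sketch using the sizes quoted in the comment; the variable names are invented for illustration:

```python
KIB = 1024

total_flash = 512 * KIB   # total flash quoted in the comment
calibration = 64 * KIB    # upper 64K: sensor calibration data
boot_block = 64 * KIB     # lower 64K: lockable boot block
kernel = 30 * KIB         # kernel size quoted in the comment

# Space left once the two reserved 64K regions are set aside.
available = total_flash - calibration - boot_block
after_kernel = available - kernel

print(available // KIB, "KiB for everything else")
print(after_kernel // KIB, "KiB left after the kernel")
```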

  12. Static Discharge? by seven+of+five · · Score: 5, Interesting

    Is there a chance that the problem could've been caused by electrostatic discharge? Rover bounces on rubber airbags on sand, bags fold up, Rover rolls off, Rover touches rock - zap!??

    1. Re:Static Discharge? by juglugs · · Score: 2, Insightful

      Doubt it

      I'd hope that the RAM is in a shielded box given the amount of radiation it's getting from the sun and the rest of space.

      Could be Soft Errors caused by Alpha particles though - depends on the technology used in the flash - unlikely, but possible...

      --
      This sig is in Spanish when you're not looking....
    2. Re:Static Discharge? by myowntrueself · · Score: 2, Interesting

      I had been wondering.

      The sequence of events that led up to this was, IIRC,

      1. Rover extends arm ready to take a grinder to a rock.

      2. Contact with Rover lost due to bad weather in Australia.

      3. Rover bad.

      So it had just moved part of its structure closer to the rock just before this happened.

      --
      In the free world the media isn't government run; the government is media run.
  13. Cosmic rays... by bc90021 · · Score: 4, Interesting

    ...will apparently cause one out of every trillion bits on Earth to flip randomly... I guess with less of an atmosphere, it is a bigger problem on Mars! ;)

    1. Re:Cosmic rays... by shadowmatter · · Score: 5, Interesting

      Funny you mention that. I'm taking a class on design of digital systems at my university, and my professor works for JPL. He helps design the control systems onboard space vehicles such as the Mars rover. Anyway, a majority of the class grade is based on an end-of-the-quarter project, which we complete in groups of 2 to 4. On Wednesday he expressed interest in a group developing some sort of redundancy for FPGAs that would be suitable in spacecraft. You see, on Mars, you're not shielded from huge doses of radiation as you are on Earth. A healthy dose of radiation bombardment could easily reprogram an FPGA chip on the surface of Mars; ASIC chips are used to overcome this problem.

      Maybe he was gung-ho about anti-radiation redundancy because he already knew the likely problem of the Spirit. Who knows?

      - sm
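
      One common form of redundancy against flipped bits is triple modular redundancy (TMR): keep three copies and take a bitwise majority vote. The comment does not say which scheme the professor had in mind, so the sketch below is an assumption chosen for illustration, and majority_vote is a hypothetical helper:

```python
def majority_vote(a, b, c):
    """Bitwise 2-of-3 majority of three integer words.

    Each output bit is 1 only if at least two of the inputs have a 1 there,
    so a flip in any single copy is outvoted by the other two.
    """
    return (a & b) | (a & c) | (b & c)


stored = 0b10110010
copy_a = stored
copy_b = stored ^ 0b00001000  # simulate one bit flipped by radiation
copy_c = stored

recovered = majority_vote(copy_a, copy_b, copy_c)
print(bin(recovered), recovered == stored)
```

      The vote only masks errors that hit one copy per bit position; two copies flipped at the same bit would win the vote.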

    2. Re:Cosmic rays... by mnmn · · Score: 2, Informative

      Rockets have blasted off into space since Sputnik 1, and with all the communication satellites, we know a lot about high-radiation electronics. We've had solar flares corrupting electronic equipment for decades and ASIC companies have entire lines of chips for high-radiation resistance, partly for military applications.

      So I think the rover's electronics are well protected from at least the Sun's radiation. I think Mars is about 1.5 AU from the Sun, so it should receive less than half the radiation per square inch the Earth gets, but I strongly feel I could be wrong there. Martian dust getting into the compartments IMHO can be a more likely reason.

      If electronics break on Mars, I'd put the highest chance on the initial impact on landing. Besides that, it's just sitting on barren land, under full solar radiation, exposed to some dust but in close-to-vacuum. It's a simpler environment to deal with compared to, say, sending a rover to a planet like Earth where it must be able to swim and walk through the forests.
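
      The inverse-square reasoning in this comment can be checked numerically. A minimal sketch, assuming a mean Sun-Mars distance of about 1.52 AU (a figure supplied here, not taken from the comment) and ignoring atmosphere and magnetic-field effects entirely:

```python
def relative_flux(distance_au):
    """Solar flux relative to the flux at 1 AU, by the inverse-square law."""
    return 1.0 / distance_au ** 2


MARS_MEAN_DISTANCE_AU = 1.52  # assumed mean orbital distance of Mars

print(f"Flux at Mars relative to Earth: {relative_flux(MARS_MEAN_DISTANCE_AU):.2f}")
```

      This covers only sunlight-like radiation that spreads out from the Sun; galactic cosmic rays do not follow this scaling.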

      --
      "Give orange me give eat orange me eat orange give me eat orange give me you." -Nim Chimpsky
  14. Software / Hardware Breakthrough? by Saeed+al-Sahaf · · Score: 4, Insightful
    This is remarkable, and a testament to good software / hardware integration. It is true that I think this money could have been better spent elsewhere in terms of our understanding of the universe, but still, these types of projects and the hardships that come with them teach miles of experience in remote software / hardware problems.

    I do seriously wonder if these types of projects will tell us anything more than esoteric wonders of Mars, but from a strictly engineering standpoint, perhaps it's worth it after all.

    --
    "Who are in control, they are not in control of anything - they don't even control themselves!" - Glen Beck
    1. Re:Software / Hardware Breakthrough? by Saeed+al-Sahaf · · Score: 3, Informative

      A lot of the components in this craft came from my former employer, www.InterPoint.com, who laid off half their staff a few years ago (I was one of those). Little boxes the size of a pack of cards, hand built. Really amazing stuff.

      --
      "Who are in control, they are not in control of anything - they don't even control themselves!" - Glen Beck
  15. The Full Story by DrunkenTerror · · Score: 5, Informative

    Here is the link to the real story. The one given in the /. article is getting pushed down spaceflight's page.

  16. Nice by Omega1045 · · Score: 4, Interesting

    I have a friend who works in the field. Space travel hoses electronics bad. Triple redundancy and over-engineering is the name of the game. This is nice to hear. I would imagine that something went wrong in transit or on landing, but they can keep going.

    --

    Great ideas often receive violent opposition from mediocre minds. - Albert Einstein

  17. flash ram is known to fail on writes after a while by Anonymous Coward · · Score: 2, Interesting

    I know a lot of ppl are using flash ram in smaller computers for booting linux or what not. Well if they are writing their logs and other things to that flash be aware that you can only write to it so many times before it fails.

    Was NASA writing to that flash or just reading? A ram drive in flash sounds like it will access/write thousands of times a ?minute? This should wear it out quickly.
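
    How quickly a write limit would be reached depends on numbers this comment does not give. A rough sketch, assuming a rated endurance of 100,000 erase/write cycles per block and no wear leveling; both assumptions are illustrative and say nothing about Spirit's actual flash:

```python
def days_until_worn_out(endurance_cycles, writes_per_minute):
    """Days until one flash block reaches its rated erase/write limit,
    assuming every write lands on that same block (no wear leveling)."""
    minutes = endurance_cycles / writes_per_minute
    return minutes / (60 * 24)


ENDURANCE_CYCLES = 100_000  # assumed rating, not a figure from NASA

for rate in (1, 10, 100):
    days = days_until_worn_out(ENDURANCE_CYCLES, rate)
    print(f"{rate:>3} writes/minute -> about {days:.1f} days")
```

    Spreading writes across many blocks stretches these figures roughly in proportion to the number of blocks in rotation.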

  18. Re:Monday morning quarterback: RTOS tradeoffs by G4from128k · · Score: 3, Informative

    If I was sending an embedded control computer to another planet, I would have chosen an OS with memory protection, not VxWorks.

    Actually, they might have protected memory if they use VxWorks AE RTOS/Tornado Tools 3.0. Spirit uses VxWorks, but I don't know what version they used or when they had to commit to a particular version of VxWorks.

    Also, as the article mentions, memory protection adds overhead and can affect real-time performance. Hard real-time software cannot afford to have a complex layered structure and lots of conditional code that adds unpredictable delays. For that reason, many really real-time applications run very close to the hardware (for better or for worse.)

    --
    Two wrongs don't make a right, but three lefts do.
  19. Steal SOME by MajorDick · · Score: 5, Funny

    I mean like Beagle isn't using its flash RAM anymore, just go and jack some off it. While you're at it TAG the Beagle with some PRO-US graffiti :) hell maybe it's got nicer rims too

    Seriously, can you imagine the first manned expedition seeing the Beagle jacked up, tagged, up on little Martian cinderblocks? All that and we already got a head start on building Martian cities

  20. Re:Not "online" at all... by Mr.+Darl+McBride · · Score: 3, Insightful
    You can read all about it at: Spaceflight Now - where you can continue to follow the status of both spirit and opportunity

    Nicely karma-whored. That's the link from the article. :)

  21. Information on the MER hardware. by elrond1999 · · Score: 5, Interesting

    I've been unable to find any hard information on the design of the MER memory systems. If anyone can point me to a technical brief I'd be very happy.

    From what I've pieced together the MER system is something like this:

    One RAD6000 PowerPC CPU.
    Connected, probably via CompactPCI, to 128 MB of ECC SDRAM.
    256 MB of flash. No info on what make of flash, but likely Intel since they are the biggest. There was some info from the press conference that there are actually two flash chips and that the flight software is redundantly stored on each. So does this mean that there is actually 128 MB of redundant flash? Also it was said that they had problems even with the redundancy; could they possibly have overwritten something? We all know that even a redundant RAID does not stop filesystem corruption.

    No information on how the flash is connected (parallel / serial?) or how the redundancy works.

    Btw, I guess flash is rather radiation hard since it requires 10 - 20V to erase / write.

    1. Re:Information on the MER hardware. by georgewilliamherbert · · Score: 2, Informative
      I've been unable to find any hard information on the design of the MER memory systems. If anyone can point me to a technical brief I'd be very happy.

      RAD6000 6U Compact PCI page at BAE Systems.

      It's not great, but there are more detailed links around the BAE website.

      It doesn't list how the FLASH is connected; that's not a standard built-in on the RAD6000 computer. I would guess, hung off the FPGA interface device, but I don't know that for sure.

  22. Re:Huh? Flash? by DougHalfWay+AroundTh · · Score: 2, Informative

    They don't. See DoD Bids About 3/4 of the way down the page.

    Title: Rad Hard Flash Technology

    Abstract: The highest density radiation hardened non-volatile (NV) memory currently available is a 256 kbit EEPROM based on SONOS technology. One of the major limitations in developing rad hard NV memory has been the cost in bringing up the NV technology in a dedicated rad hard process facility, especially when weighed against the limited market size. One way to bring radiation hardening to an advanced electronic product on a cost-effective basis is to leverage the commercial product by applying the hardening to the commercial fab instead of bringing the commercial technology to the rad hard fab. NV flash memory technology is popular in the commercial marketplace, with densities up to 256 Mbit in production. Unfortunately, flash memory is not available, at any density, in total dose rad hard versions. And, most commercial flash memories are so soft that impractical amounts of shielding are required to survive even moderate radiation environments. This effort will be the first step in developing rad hard flash technology at a commercial fab. Rad hard flash technology will be a near-term solution to the problem of high density NV memory for space applications. It will enable the development of rad hard flash memories and embedded NV memory for rad hard ASICs.

    Flash...the weakest link...

  23. Wrong colours again... by daina · · Score: 2, Funny
    NASA should never have used a Sony WHITE memory stick with built-in DRM. That rover probably took a picture of something that looked a little too much like a Disney character, and - bam - total shutdown!

    They should stick with purple next time.

  24. It's a good thing the Spirit had an F8 key by michaelmalak · · Score: 3, Funny

    ...and it's amazing NASA could press it at the right time from 124 million miles away (1.3 AU). Although I wonder how many times NASA did have to press it before they got the timing right -- we only know about the success :-)

  25. Salute the Helpdesk by Papa+Legba · · Score: 5, Funny

    I have had some tough calls in my time but I have never had to walk a robot 283 million miles away through brain surgery. Man I am glad I did not get that call. This is going to blow their call averages all to hell. I raise a cup of Joe to you, Rover Help Desk man.

    --
    Papa Legba come and open the gate
  26. last photo from Spirit by djupedal · · Score: 5, Funny

    This is the last image received prior to the recent issues with Spirit...

  27. Re:Monday morning quarterback: RTOS tradeoffs by GGardner · · Score: 4, Insightful
    memory protection adds overhead and can affect real-time performance

    This is the conventional wisdom, and in my experience, this particular nugget causes more embedded and real time software projects to fail than any other.

    First off, on a modern PowerPC processor, memory protection (that is, without virtual memory support) can be implemented very cheaply. If you can do it just with the IBAT/DBAT registers, it should be a constant-time overhead, which is good enough for hard-real time. Oddly enough, I can't find a single reference on the net that measures the cost of memory protection alone on a modern CPU. Anyone? Anyone?

    Secondly, though the rover certainly may have some software components that have hard-real time requirements, that doesn't mean that every single line of code does. Typically, less than 1 percent of the code in a real time system is hard real time. In that case, you can run the real-time code in ISRs, or perhaps in a dual-mode system, like RT-Linux, or in high-priority kernel threads (as with QNX). In any of these situations, you can run all the rest of the code in protected memory space.

  28. Re:Somebody here on Slashdot nailed it... by prockcore · · Score: 5, Funny

    I remember in the last thread about the rover, someone opined that it was bad memory, then proceeded to give a half dozen reasons why. Totally nailed it.

    Yeah, in the future NASA should just submit an Ask Slashdot whenever something goes wrong..

  29. Opportunity by loconet · · Score: 3, Interesting

    Opportunity is fast approaching the red planet. It should be an interesting night at JPL. Excellent work guys, good luck.

    --
    [alk]
  30. You think that's neat by chazR · · Score: 5, Informative

    Here's a rant by a JPL guy about appropriate technologies for software on deep space probes. He recounts one story of a failed probe "100 million dollars, and 100 million miles away".

    They fixed it. The fact there was a lisp REPL running on the spacecraft helped.

    That's cool:

    (unwind-protect
        (progn (do-science) (talk-to-earth))
      (wait-in-repl-for-earth))

    1. Re:You think that's neat by be-fan · · Score: 3, Interesting

      This is a bit OT, but I need to rant:

      A quote from his site: "It is incredibly frustrating watching all this happen... I can't even say the word Lisp without cementing my reputation as a crazy lunatic who thinks Lisp is the Answer to Everything"

      I feel his pain. I was introduced to Lisp not too long ago, and within a short time, a Lisp-derived language (Dylan) became my favorite. I also found that many of the features I loved from Python were very Lisp-y in nature. Now, I see Java and C# either neglecting all the knowledge garnered from the Lisp-family of languages, or reinventing it --- badly. The features in C# 2.0 have either been in Lisp for decades (lambdas, closures) or are not necessary in Lisp (iterators, enumerators --- which, btw, are theoretically not necessary in C# 2.0 either because of lambdas and closures!) This new "Xen" (or X#) language Microsoft Research is pushing takes a great idea (extending the language to fit the problem domain) that has been a part of Lisp for decades, and chops it off at the knees. Instead of having proper macros, so you can extend the language to fit *your* problem domain, they hack support for a single problem domain (back-end business programming) into the language itself!

      That said, the Lisp community is to blame as well. Part of the reason people stop listening the moment somebody says Lisp is that the Lisp community is *so* rabid and *so* unyielding. Especially some high-profile members who are highly respected within the community despite the fact that they are completely obnoxious and lack any human sense of manners.

      --
      A deep unwavering belief is a sure sign you're missing something...
    2. Re:You think that's neat by be-fan · · Score: 2, Interesting

      Lambdas are just an aesthetics/keystrokes issue.
      --- :: jaw drops :: :: gets on knees ::

      I bow down to your ignorance, oh mighty King of the Clueless!

      Seriously, though, please research lambdas. They don't just save typing. They are *everything*. All of computation can be described just with lambdas of a single parameter. Everything else is just syntactic sugar. If you ease one restriction of the lambda calculus (no side-effects), lambdas can do procedural code, functional code, and even object-oriented programming. That's why I said iterators, enumerators, etc, are not necessary in C# 2.0, because it has proper lambdas. All of those can be implemented very easily on top of lambdas.
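
      To make the iterator claim concrete, here is a minimal sketch in Python rather than C# or Lisp. It assumes nothing beyond closures with mutable captured state; the names make_range_iterator and for_each are made up for illustration and are not from any library.

```python
def make_range_iterator(start, stop):
    """Return a closure that hands out start..stop-1, then None when exhausted."""
    current = start

    def next_value():
        nonlocal current          # the closure owns and mutates this state
        if current >= stop:
            return None           # sentinel: nothing left to hand out
        value = current
        current += 1
        return value

    return next_value


def for_each(next_value, action):
    """Drive any closure-based iterator until it signals exhaustion."""
    item = next_value()
    while item is not None:
        action(item)
        item = next_value()


collected = []
for_each(make_range_iterator(0, 3), collected.append)
print(collected)
```

      No iterator protocol or language support is involved here: the "iterator" is just a function that remembers where it was, which is the side-effect relaxation mentioned above.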

      and start focusing on really useful subjects - like formal verification
      ---
      You can do formal verification in Lisp. Look up ACL2 (first link in Google search for that term). I'd still say that Haskell or Clean are a bit better for such purposes, but mainly because they are designed from the beginning to be comfortable to program without side-effects, while Lisp was not.

      and strong typing (which go very well together).
      ---
      C'mon. You're making this too easy. Lisp has a strong type system. All type errors are caught, unless you disable type-checking in your compiler. Maybe you mean *static* typing instead?

      That's probably because the only people who think Lisp Is The Answer to Everything are a little bit insane.
      ---
      Um, wasn't that my point? It's the "Lisp is the Answer to Everything" people that make it harder for normal people to push Lisp to areas where it would be really useful.

      --
      A deep unwavering belief is a sure sign you're missing something...
  31. that line from armageddon comes to mind... by MoFoQ · · Score: 5, Funny

    where the russian cosmonaut says "American components, Russian components. They're all made in Taiwan!"

  32. Re:Somebody here on Slashdot nailed it... by jobugeek · · Score: 2, Funny
    Why, so everyone can start their response with

    IANANE (I am not a NASA engineer), but.....

    --
    I'm not drunk, I just have a speech impediment. And a stomach virus. And an inner ear infection.
  33. Re:Checksums by Anonymous Coward · · Score: 5, Informative

    I'm watching NASA TV at the moment and they're explaining possibilities now. So far, they only have a very broad explanation of what's going wrong. However, the latest is:

    There are two separate flash memories on Spirit. At the moment, the software can apparently read at least part of the flash memories, since some of the operational software kept in flash RAM seems to be coming up before the system reboots.

    The system is rebooting no matter which flash memory is being accessed, it has the same bug both ways, so the flash ram itself looks to be OK, but the interface between the flash ram and the software looks to be causing resets.

    Even if there were more backup flash RAMs, it looks like they'd still have this problem. Perhaps many flash memories, all on different controllers, or even an entire backup computer, would have prevented this. At 100 watts total power available for the rover, an entire extra computer may be a bit much to fit. But then sending two rovers also mitigates the problem, and that's just what they've done.

    It seems most likely at the moment, according to NASA, that the family of components that are involved with the hardware addressing of the flash memories looks to be where the problem is.

  34. Re:Monday morning quarterback: RTOS tradeoffs by anubi · · Score: 2, Interesting
    From where I sit, I think they did damn good.

    I don't know that much about VxWorks, but I heard that one of its main assets is having a very small, tight multitasking kernel.

    They were able to regain the system, despite loss of a major computational component. Remotely. Through a debug link. That sure says a helluva lot for the robustness of the OS and how they configured it.

    Good job, JPL.

    --
    "Prove all things; hold fast that which is good." [KJV: I Thessalonians 5:21]

  35. As someone else said by Viadd · · Score: 5, Funny

    The Spirit is willing, but the flash is weak.
    (Posted by Jane Slee and John Stracke in separate usenet postings.)

  36. Thank God by mtfbwy · · Score: 2, Funny

    They didn't use Windows CE. Remember the diplomat months back that got locked in his 7 series BMW because of a computer crash? :)

  37. Radiation hardened Flash by andygrace · · Score: 5, Informative

    There is a big difference between standard flash and radiation hardened flash. In fact we are designing a project with one of these VME bus units as a storage array.

  38. Relative positions of Earth and Mars by LouisvilleDebugger · · Score: 3, Insightful

    The present series of orbiters/landers (Nozomi, Mars Express, Spirit, Opportunity) were launched at such a time as to take advantage of the most optimal Mars-Earth configuration for something like 60,000 years. I believe the bottom line is that it was a time you could get the most science there for the least cost of launch.

    Shame on my fellow American who said we should strip Beagle 2 and leave it up on cinderblocks. If Beagle is ever discovered to have soft landed, I would think the only proper thing to do would be to restore whatever's wrong with it, and let it complete its mission. (HAL, V'Ger, anyone?) Given the discussion of things like the effects of radiation exposure on electronics, you'd just have to be interested to know what a 50-or-150-year-old "dead" lander might be able to wake up and do.

    If Spirit's problems aren't resolved, the Mars Scorecard should at least reflect that Beagle was the less expensive failure.

    (Disclaimer: I visited England for the first time last year, and falling in love with the whole place doesn't begin to describe it. R.I.P. Beagle 2. *sniff*)

  39. We learn from our mistakes... by Chordonblue · · Score: 4, Interesting

    So... I wonder if they'll consider validating MRAM more quickly if Flash is found to be more error prone.

    You know how NASA works. The Space Shuttle running on 486's and whatnot. I understand the science behind that reasoning, as sad as a 66 MHz processor seems to us geeks nowadays, but I wonder if MRAM will prove more flexible and stable for future space missions.

    --
    "...Well, there's egg and bacon; egg sausage and bacon; egg and spam; egg bacon and spam; egg bacon sausage and spam..."
    1. Re:We learn from our mistakes... by Detritus · · Score: 2, Informative

      The Space Shuttle does not use 486s. It uses IBM AP-101s, which are architecturally similar to the IBM 360/370 series of computers. See the Second Generation Computers FAQ.

      --
      Mea navis aericumbens anguillis abundat
  40. Remote nonsense by fm6 · · Score: 2, Insightful
    Just wait until we have Interplanetary, Interstellar, Intergalactic Remote Desktop. I'm only half-joking.
    No you're not. All these Mars glitches are exactly why real space exploration entails sending an actual carbon-based unit, not a glorified laptop.

    Consider that an interstellar probe will take years to receive updated instructions. By which time, any fix will probably be irrelevant. Plus if they're more than 30 light-years away (practically next door by galactic standards) the guy who sent out the instructions probably won't live long enough to find out if they worked!

  41. Re:Follow the status? by the_mad_poster · · Score: 2, Funny

    This is like a reality tv show, I love Nasa Tv!

    With the exception that this is actually real...

    --
    Alito: A vote for Alito is a punch in the eye to put that bitch back in her place!
  42. Solution for Next Time by wildsurf · · Score: 2, Interesting

    Flash RAID array.

    (Can this even be done?)

    --
    Weeks of coding saves hours of planning.
  43. Nasa TV by Nucleon500 · · Score: 3, Informative

    If you don't get it on cable, you can watch NASA TV here.

  44. Re:The mission is not yet out of danger by Aardpig · · Score: 3, Funny

    Opportunity will most likely have the same problem since they are twin brothers and had an identical build process.

    I quote from my post a couple of days ago:

    Parent: So even if Spirit gives up the ghost, her kin can carry on the flame (albeit in a less interesting location).

    Me: Not if the problem is due to a design fault. That's the drawback of sending multiple identical probes: if one is intrinsically fucked, they all are.

    I now bask, contented, in the glow of my own brilliance....

    --
    Tubal-Cain smokes the white owl.
  45. Radiation by panxerox · · Score: 2, Interesting

    With all the radiation and very high energy particles zipping through the spacecraft on its way there, I'm surprised any computerized spacecraft get anywhere intact.

    --
    "It's so convenient to have a system where everyone is a criminal" - A. Hitler
  46. Re:Follow the status? by Basehart · · Score: 2, Funny

    A little too much grandstanding though.

    I've noticed that a few people stand facing the cameras a lot, gesticulating wildly as if talking about something important.

    I also saw one guy go from reading a magazine and sipping a martini to furiously typing away at a keyboard as the camera panned across the room!

  47. Re: Technically... by MachDelta · · Score: 2, Informative
    I hate to be a nitpick, but the exact quote (with context) is:
    Andropov: Excuse me, but I think I know how to fix this.
    Watts: Move it! You don't know the components!
    Andropov: [annoyed] Components. American components, Russian Components, ALL MADE IN TAIWAN!!!

    Oh, and he has another quote I liked too:
    Lev Andropov: This is how we fix things on Russian space station!
    [hits panel with tool]

    But maybe I just like it because that's how I tend to fix things too ;)
  48. Re:Follow the status? by datan · · Score: 2, Insightful

    just wondering something. when they say 'currently' do they mean now or light-time ago? eg. they confirmed cruise stage separation less than a minute after it "happened"

  49. Re:God Speed Opportunity by Anonymous Coward · · Score: 2, Informative

    Well Done NASA.. bringing space to us is the next best thing to taking us to space.

    0508 GMT (12:08 a.m. EST)
    A good signal is still being received! Unlike the Spirit landing where signal was lost immediately after touchdown, Opportunity continues to talk to Earth.

    0506 GMT (12:06 a.m. EST)
    After a short loss of signal from the rover, a strong signal is now being received as Opportunity arrives on Mars!

    0505 GMT (12:05 a.m. EST)
    BOUNCING ON MARS! Mission Control has received a signal of Opportunity bouncing on the surface of Mars.

  50. We need open source rover software by HangingChad · · Score: 3, Interesting
    I'd put the /. community up against NASA any day. Instead of trying to be so secret about everything, open the software up to the community and let the collective propose solutions to some of these issues. Hey, it's our tax dollars developing all this stuff, why can't we play too?

    Besides, robot exploration software would be handy right here. It would be neat to be able to send a research bot out in the deserts, deep oceans and jungle canopies of the world. Machines can go where we can't.

    Individually you can be damn annoying sometimes, but I'm constantly amazed and delighted by the collective intelligence of the /. pack.

    --
    That's our life, the big wheel of shit. - The Fat Man, Blue Tango Salvage
  51. It's simple actually by HarveyBirdman · · Score: 2, Informative
    I've actually consulted on this for another group inside my company. You don't wait for a cosmic ray to change a programming bit in an FPGA.

    You have two or more running in parallel. While one is running, the next reloads from ROM. When it's loaded and synchronized, you switch to it, and load the next one. You do that in series, over and over, so you're only using any particular FPGA for a couple of seconds at a time, and their configurations are constantly being refreshed. It's a very simple idea that can be done now.
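
    The rotation described above can be sketched as a toy simulation. This is only an illustration of the scheduling idea, not flight or vendor code: the Fpga class, run_rotation, and the slot length are all hypothetical, and the reload is assumed to finish exactly at the end of each slot.

```python
class Fpga:
    """Hypothetical stand-in for one device; tracks the age of its configuration."""

    def __init__(self, name):
        self.name = name
        self.seconds_since_reload = 0.0

    def reload_from_rom(self):
        # Assumption: a reload replaces the whole configuration, clearing any upsets.
        self.seconds_since_reload = 0.0


def run_rotation(fpgas, slot_seconds, slots):
    """Return (active name, configuration age at handoff) for each slot."""
    for fpga in fpgas:
        fpga.reload_from_rom()
    active = 0
    history = []
    for _ in range(slots):
        standby = (active + 1) % len(fpgas)
        # One slot passes: the active device runs while the standby reloads.
        for fpga in fpgas:
            fpga.seconds_since_reload += slot_seconds
        fpgas[standby].reload_from_rom()
        history.append((fpgas[active].name, fpgas[active].seconds_since_reload))
        active = standby          # switch to the freshly loaded device
    return history


history = run_rotation([Fpga("A"), Fpga("B"), Fpga("C")], slot_seconds=2.0, slots=6)
print(history)
```

    In this model the configuration that is actually in use is never older than one slot, no matter how many devices are in the ring.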

    --
    --- Ban humanity.
  52. Cut it out! by Dun+Malg · · Score: 3, Insightful

    OK, you dorks (you know who you are) need to stop postulating about the memory failures having to do with static electricity, martian dust, or lack of redundancy. This is JPL and (the one case of metric vs. standard aside) they thought of all the obvious stuff during the design stage. Do you really think they're slapping their foreheads and saying "the dust! we forgot about the dust!" over in the design lab? Get real, people.

    --
    If a job's not worth doing, it's not worth doing right.
  53. Hear! Hear! I couldn't say it any better! by Teancum · · Score: 3, Interesting

    I was going to mod this one up, but I decided to give this reply some more emphasis by actually replying with some thoughtful encouraging words instead.

    It would be nice to be able to have some folks at JPL throw down the source code and engineering schematics and say to the geek/space/engineering community at large "We have a problem here and could use your suggestions to see if we can get this fixed."

    This (the Mars missions) is obviously a big hit, as measured by replies on Slashdot, the number of hits on the website at JPL, stories in mainstream media, and other reasonable metrics to gauge popularity of a project. I'm sure that there are several geeks out there that wouldn't mind digging into the source code.

    The only reason I could see the engineers not wanting to do that is to avoid opening themselves up to obvious scrutiny for poor engineering and coding. (Whadda you mean the global variable named temp is the only variable. We also have temp2, temp3, and temp4. What do the numbers in those mean? You can get it from context, can't you?) That and some people just aren't used to allowing others into their "domain".

    Being 100% funded by public money should also be further reason for why this should be opened up. I also totally agree.

  54. Re:Monday morning quarterback: RTOS tradeoffs by AaronW · · Score: 3, Insightful

    As someone who has programmed VxWorks (including AE) for several years, I can say AE is a buggy piece of crap. We moved to AE for our project and eventually had to dump it since it was so buggy and slow. Also, as far as flash filesystems go, VxWorks ONLY SUPPORTS FAT, and not even FAT32, so it isn't a very robust filesystem. Not only that, because it's FAT there is no wear-leveling support. I believe there also isn't the equivalent of chkdsk either. I also imagine that it can't handle faults in the filesystem (as if anything ever could deal with faults in a FAT filesystem very well).

    With VxWorks you can often get away without any filesystem because all the code is linked together in one big monolithic file. Separate tasks are not separate files (although you can have loadable object files).

    Yes, AE does provide memory protection domains, but it still doesn't clean up after a task dies. Sure, you can free the memory, but not open files, semaphores, pipes, or other things. Malloc in AE is improved over the braindead implementation in standard VxWorks, but it still has a long way to go. For example, it can't free up open file descriptors, semaphores, or other items associated with a task because a task usually isn't associated with it. So if you have a task that acquired a semaphore and dies, that semaphore will never be released.

    Hell, Wind River couldn't even get malloc right! Their malloc has got to be the worst implementation I've ever seen! They place free blocks in sorted order (smallest to largest) in a linked list after attempting to combine a new free block with neighboring free blocks. The next time you allocate, it walks the entire linked list until it finds a block large enough! In our case we wound up with tens or even hundreds of thousands of small blocks causing our watchdog timer to kick in because malloc became impossibly slow. AE improves this to use a tree instead of a list, but it still fragments. I ripped out the Wind River implementation and replaced it with Doug Lea's dlmalloc and all our malloc problems were solved, and the fragmentation went from tens of thousands of fragments to only a few dozen.
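
    A rough model of why that free list gets slow, written in Python for brevity. This is not Wind River's code; it only mimics the behaviour described above (a size-sorted list walked from the small end), and the block counts are invented for illustration. The bisect call stands in for any ordered structure with logarithmic search, such as the tree AE is said to use.

```python
import bisect


def list_walk_steps(sorted_sizes, request):
    """Count nodes visited when walking a size-sorted free list from the small end."""
    steps = 0
    for size in sorted_sizes:
        steps += 1
        if size >= request:
            break                 # first block large enough
    return steps


def ordered_search_index(sorted_sizes, request):
    """Find the same block by binary search over the sorted sizes."""
    return bisect.bisect_left(sorted_sizes, request)


# Hypothetical fragmented heap: many tiny free blocks and one large one.
free_sizes = sorted([16] * 50000 + [4096])

steps = list_walk_steps(free_sizes, 1024)
index = ordered_search_index(free_sizes, 1024)
print(steps, index)
```

    Every allocation larger than the fragments has to step over all of them first, so the cost of one malloc grows with the number of small free blocks, which matches the watchdog trouble described above.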

    For an RTOS being pushed for networking it isn't very good there either. It comes with an ancient BSD TCP/IP stack. If you have a device and want to see if it runs VxWorks, just run nmap against it. If it says TCP sequence number guessing is trivial, you can bet it's probably running VxWorks.

    In today's world, VxWorks doesn't cut it any more. Any complex project should choose a real OS like QNX or even embedded Linux over VxWorks. For realtime, Linux usually isn't very good, but Timesys appears to have solved that problem nicely.

    VxWorks isn't even that good at realtime. Usually you can't get any better resolution than half the system tick rate (usually 10ms), so you can't get better than 20ms of resolution in many cases.

    I've also heard many rumours that Wind River is dropping AE, or at least not pushing it. We're not the only ones to have been burned by it. I've heard of only one other company that used it, and they were also burned. I think it was a startup that went out of business.

    In VxWorks, all tasks share the same memory space. Think of every "task" as really a thread and you get the idea. In other words, if a "task" dies, the only way to clean up the system is to reboot.

    Also, VxWorks doesn't scale. The more tasks you have, the slower it runs (i.e. no O(1) scheduler). And with the shared memory, the more complex the code, the harder it is to debug and develop a stable system.

    QNX would have been a much better solution. In QNX, the core OS is very small, and if a task dies it can easily be restarted. In QNX, everything is a task with memory protection. The TCP/IP stack is separate from the core OS, for example, as are all the other drivers. If a driver crashes, it won't take the OS with it. Context switching in QNX is also very fast, faster than VxWorks even though memory protection is involved.

    -Aaron

    --
    This post is encrypted twice with ROT-13. Documenting or attempting to crack this encryption is illegal.
  55. Re:Monday morning quarterback: RTOS tradeoffs by AaronW · · Score: 2, Informative

    I can tell you that AE is in many ways WORSE than the standard VxWorks. It has a lot more bugs and is quite a bit slower. Think of regular VxWorks with memory protection hacked in, not designed in from the ground up.

    As a VxWorks programmer for the last 5 years, I can honestly say VxWorks is a PoS that is losing market share at a tremendous rate to the likes of embedded Linux and QNX. Wind River decided to spend tons of money buying add-in products like Routerware instead of improving their RTOS. It was a huge waste of money and now they're paying for it. They're losing money hand over fist and have had a lot of layoffs lately. They were good at one time, but they have fallen far behind the curve now in embedded RTOS design, especially for complex systems.

    VxWorks comes with support for a FAT flash file system, a completely broken malloc implementation, an ancient BSD TCP/IP stack, poor RT support, no memory protection, and no way to clean up after a task that dies. Not only that, it usually costs a fortune, but I've heard they're willing to sell it very cheap now because they're desperate.

    I looked into embedded Linux for our next generation hardware and software and Timesys appears to have a very nice solution with hard real-time support. The kernel is fully preemptable using semaphores instead of spinlocks and has priority inversion support. They also offer resource reservations, so I can say "I want this task to be guaranteed 5.73ms of execution time every 9.8ms" where after 5.73ms the task either gives up the CPU entirely, or else changes to a non-RT priority to not starve other tasks. It's really quite clever. Not only that, unlike RT-Linux there isn't a separate API for RT vs non-RT tasks. Monta Vista Linux is soft real-time. It cannot guarantee context switching time, nor does it deal with priority inversion. In RT, priority inversion can be a major problem (see the first Mars rover for an example).
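
    The reservation example (5.73ms of every 9.8ms) boils down to a CPU share, and a scheduler can only grant reservations while the shares fit. The sketch below is not the Timesys API; utilization and admit are hypothetical helpers, and a simple total-utilization bound on a single CPU is assumed as the admission rule.

```python
def utilization(reservations):
    """Sum of budget/period over (budget_ms, period_ms) reservations."""
    return sum(budget / period for budget, period in reservations)


def admit(existing, new):
    """Assumed rule: accept a new reservation only if total CPU share stays <= 100%."""
    return utilization(existing + [new]) <= 1.0


first = (5.73, 9.8)               # the reservation from the text
share = utilization([first])
print(round(share, 4))

print(admit([first], (4.0, 10.0)))   # fits alongside the first reservation
print(admit([first], (5.0, 10.0)))   # would overcommit the CPU
```

    So that one task is asking for a bit under 60 percent of the processor, which leaves room for roughly 40 percent in other guaranteed work.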

    For an example of priority inversion, say you have three tasks: low priority, medium priority, and high priority. The low priority task acquires a mutex semaphore to protect a critical section and starts processing. It is interrupted by the medium priority task. Meanwhile, the high priority task unblocks and attempts to grab the mutex. The high priority task will block until the medium priority task blocks so that the low priority task can release the semaphore. A common solution is priority inheritance. With priority inheritance, as soon as the high priority task attempts to acquire the mutex semaphore, the low priority task has its priority bumped to that of the high priority task until it releases the semaphore. In this way, the low priority task will interrupt the medium priority task so that the high priority task won't have to wait as long.
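
    The three-task scenario can be played out in a small tick-based simulation. This is a toy model, not VxWorks or Timesys code: the Task class, the simulate function, the release times, and the work lengths are all made up, and taking or releasing the mutex is assumed to cost no time.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    priority: int
    release: int                  # tick at which the task becomes ready
    steps: list                   # sequence of "lock", "work", "unlock"
    pc: int = 0
    inherited: int = 0            # priority borrowed from a blocked waiter
    finished_at: int = None


def simulate(inheritance):
    tasks = [
        Task("low", 1, 0, ["lock", "work", "work", "work", "unlock"]),
        Task("medium", 2, 1, ["work"] * 5),
        Task("high", 3, 2, ["lock", "work", "unlock"]),
    ]
    owner = None                  # current holder of the single mutex
    tick = 0
    while any(t.finished_at is None for t in tasks) and tick < 1000:
        blocked = set()           # tasks that failed to take the mutex this tick
        while True:
            ready = [t for t in tasks if t.release <= tick
                     and t.finished_at is None and t.name not in blocked]
            if not ready:
                break             # idle tick
            task = max(ready, key=lambda t: max(t.priority, t.inherited))
            if task.steps[task.pc] == "lock":
                if owner is None:
                    owner = task
                    task.pc += 1
                    continue      # lock is free: take it and keep scheduling
                blocked.add(task.name)
                if inheritance:   # boost the holder to the waiter's priority
                    owner.inherited = max(owner.inherited, task.priority)
                continue
            task.pc += 1          # one tick of work
            if task.pc < len(task.steps) and task.steps[task.pc] == "unlock":
                owner = None      # leave the critical section, drop the boost
                task.inherited = 0
                task.pc += 1
            if task.pc == len(task.steps):
                task.finished_at = tick + 1
            break
        tick += 1
    return {t.name: t.finished_at for t in tasks}


print(simulate(inheritance=False))
print(simulate(inheritance=True))
```

    Without inheritance the high priority task has to sit through all of the medium task's work before the low task can finish its critical section; with inheritance it only waits for the remainder of the critical section itself.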

    QNX is also a very good alternative. Very fast context switching and extremely robust memory protection. I think with QNX you can even buy a license suitable for use in medical devices (i.e. you absolutely cannot afford to have the OS crash for any reason).

    I've heard rumours that Wind River is dropping AE since nobody is using it. After our experience, I pity whoever tries it.

    Also, unless you get the source to VxWorks, which usually costs a lot of $$$, debugging is a complete nightmare, especially when you hit a bug in the Wind River code (and there's a lot of them). Hell, they couldn't even implement malloc right!

    Wind River is coming out with version 6 of VxWorks, but it is basically an enhanced version of AE. I'm not holding my breath.

    -Aaron

    --
    This post is encrypted twice with ROT-13. Documenting or attempting to crack this encryption is illegal.