Slashdot Mirror


Spirit Sends Debug Information to Earth

gfilion writes "NASA has issued a press release that says: 'Shortly before noon, controllers were surprised to receive a relay of data from Spirit via the Mars Odyssey orbiter. Spirit sent 73 megabits at a rate of 128 kilobits per second.'" They've been having communications troubles with Spirit since Wednesday, so it's good to hear from it again, even if the data is just filler.

27 of 477 comments

  1. You know what they say by Smallpond · · Score: 4, Interesting

    A diagnostic is what runs when nothing else will.

  2. Can low-power corrupt memory? by corebreech · · Score: 5, Interesting

    I watched the press conference on NASA-TV, and they talked about how the thing wouldn't go to sleep at night, which got me wondering about the low-power question. Obviously they have the rover power off when power gets to a certain level, but what if that level is slightly off?

    In other words, if the onboard CPU has enough power and continues to run but the memory doesn't have enough power, doesn't that cause all kinds of wackiness?

    They keep talking about the data pointing to simultaneous faults. Well, as programmers we know these are the very worst kinds of bugs to deal with, but with code as (I'm assuming) well written as theirs, doesn't that point to a memory problem? I mean, the thing is working flat-out beautifully one moment, and then the next moment it goes tits up.

    The other question I had concerned this motor they had turned on but which didn't complete its sequence. When they command the motor to do something, do they tell it to run for some interval of time, or do they tell it to achieve a specific position? I was thinking that if it's the latter, and then if it gets stuck somehow, this could create the low power situation as the motor just grinds away.
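    One way to see the difference being asked about: a position command with a time budget gives up on a jammed actuator instead of grinding away until the power runs down. This is a hypothetical Python model (`SimulatedMotor`, `move_to`, the obstruction and the tick budget are all illustrative assumptions), not how Spirit's motors are actually commanded.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class SimulatedMotor:
    """Hypothetical actuator model: moves `step` units per tick unless obstructed."""
    position: float = 0.0
    step: float = 1.0
    # Hard stop the shaft cannot push past (positive direction only, for simplicity).
    obstruction_at: Optional[float] = None

    def tick(self, direction: int) -> None:
        nxt = self.position + direction * self.step
        if self.obstruction_at is not None and nxt > self.obstruction_at:
            return  # jammed: no motion, though real hardware would still draw current
        self.position = nxt


def move_to(motor: SimulatedMotor, target: float, max_ticks: int,
            tolerance: float = 0.5) -> Tuple[bool, int]:
    """Drive toward `target`, but give up after `max_ticks` instead of grinding forever."""
    for ticks_used in range(max_ticks):
        if abs(target - motor.position) <= tolerance:
            return True, ticks_used
        motor.tick(1 if target > motor.position else -1)
    return abs(target - motor.position) <= tolerance, max_ticks
```

    Bounding the move by `max_ticks` caps the time, and therefore the energy, a stuck motor can consume while still commanding a position rather than a duration.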

    1. Re:Can low-power corrupt memory? by ultrasound · · Score: 5, Interesting

      Generally there are low-voltage (LV) detection circuits inside and/or connected to the microprocessor that detect that power is fading and wrap things up, terminating any writes in an orderly fashion if possible. Generally any power-down is going to be very slow (on the order of tens to hundreds of milliseconds or more) because of capacitor storage in the power supply. The LV device gives enough warning that power is fading that the remaining processor time is more than ample to shut things down gracefully.
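      The parent's "what if that level is slightly off?" worry is commonly handled with hysteresis: the detector trips at one voltage and only re-arms at a higher one, so noise near the threshold cannot bounce the system in and out of shutdown. A minimal Python sketch; the class name and thresholds are hypothetical.

```python
class BrownoutDetector:
    """Hypothetical low-voltage detector with hysteresis.

    Trips when the supply falls below `trip_v` and re-arms only after the
    supply has recovered above `rearm_v`.
    """

    def __init__(self, trip_v: float, rearm_v: float) -> None:
        if rearm_v <= trip_v:
            raise ValueError("rearm_v must be above trip_v")
        self.trip_v = trip_v
        self.rearm_v = rearm_v
        self.low = False

    def sample(self, volts: float) -> bool:
        """Feed one voltage sample; return True while in the low-power state."""
        if not self.low and volts < self.trip_v:
            self.low = True    # begin orderly shutdown: finish writes, refuse new ones
        elif self.low and volts > self.rearm_v:
            self.low = False   # supply has clearly recovered
        return self.low
```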

      Obviously, with volatile RAM without battery backup we shouldn't need to care about the state of the RAM on power-down, as it is only temporary storage and will be re-initialised on power-up. Generally the storage components will have wider operating tolerances than the microprocessor, so it is very unlikely that the RAM will get corrupted during the power-down procedure.

      With non-volatile hardware such as battery-backed RAM, flash, EEPROM, FRAM, etc., we have a problem, because these contain NV config data and firmware that must be consistent. And with some, such as flash, the write times can be very long, possibly longer than the power-down time. In this case the general philosophy is to write the bytes, and the very last step is to update the checksum and set a valid-data flag. This means that at worst the device boots up, knows it's got some dodgy code or data on its hands, and hopefully handles it gracefully.
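      The write ordering described above can be sketched as a toy Python model. The record layout, the use of CRC-32 (`zlib.crc32`) as the checksum, and the extra step of clearing the valid flag before rewriting are illustrative assumptions, not any particular device's format.

```python
import zlib
from typing import Optional


class NvRecord:
    """Toy model of one non-volatile record: payload, CRC-32 and a valid flag."""

    def __init__(self) -> None:
        self.data = b""
        self.crc = 0
        self.valid = False


def write_record(rec: NvRecord, payload: bytes,
                 power_fails_after: Optional[int] = None) -> None:
    """Write `payload`, committing the checksum and valid flag last.

    `power_fails_after` simulates a brown-out after that many completed steps.
    """
    steps = [
        lambda: setattr(rec, "valid", False),              # invalidate before touching data
        lambda: setattr(rec, "data", payload),             # slow flash write of the bytes
        lambda: setattr(rec, "crc", zlib.crc32(payload)),  # update the checksum
        lambda: setattr(rec, "valid", True),               # very last step: mark valid
    ]
    for i, step in enumerate(steps):
        if power_fails_after is not None and i >= power_fails_after:
            return  # power gone: the remaining steps never run
        step()


def read_record(rec: NvRecord) -> Optional[bytes]:
    """Boot-time check: return the payload only if the flag and CRC both agree."""
    if rec.valid and zlib.crc32(rec.data) == rec.crc:
        return rec.data
    return None  # dodgy data: the caller falls back to defaults
```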

      With something like Spirit, I would guess that some form of multiple redundancy is used: multiple firmware images, with a switchable bootloader so that a new image or dataset can be uploaded to an area that is offline, and the switch is made only once all of the checksums/message hashes are confirmed. Hardware watchdogs would be running so that if the worst happens and it hangs, it can always boot an alternate image. I would also expect a backup OTP PROM image that is guaranteed never to change and known to work.
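      A minimal sketch of the boot-time selection guessed at here, assuming hypothetical names (`Image`, `choose_boot_image`) and SHA-256 as the message hash; nothing in it is known about Spirit's actual bootloader.

```python
import hashlib
from dataclasses import dataclass
from typing import List


@dataclass
class Image:
    """One firmware image slot (hypothetical layout)."""
    name: str
    blob: bytes
    expected_sha256: str  # digest recorded when the upload was verified

    def is_valid(self) -> bool:
        return hashlib.sha256(self.blob).hexdigest() == self.expected_sha256


def choose_boot_image(banks: List[Image], otp: Image, failed: List[str]) -> Image:
    """Return the first bank image that verifies and has not already been reset
    by the watchdog; otherwise fall back to the never-changing OTP PROM image."""
    for img in banks:
        if img.is_valid() and img.name not in failed:
            return img
    return otp
```

      The `failed` list stands in for whatever record a watchdog reset would leave behind for the bootloader to consult.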

  3. CNN article by pvt_medic · · Score: 4, Interesting

    CNN has an article with some updates. Apparently the engineers have been having all sorts of fun with the thing; here's a quick excerpt: "Cautioning that they will need more time to understand what went wrong, project engineers said they have determined that Spirit has rebooted or tried to reboot itself more than 60 times a day since the failure."

    --
    30% Troll, 50% Underrated, 10% Interesting
    Score:5, Troll
  4. Re:Linux Cost Tax Payers at least $410M...nothing by nut · · Score: 0, Interesting

    Hmm... I hate to admit it, but this is probably a fair comment.
    Failure rates of RTOSes are a known metric. If they used commercial hardware and commercially used software, did they check the numbers? I would be surprised if Linux beat out QNX as the most reliable embedded operating system.

    --
    Never trust a man in a blue trench coat, Never drive a car when you're dead
  5. Wind river by hool5400 · · Score: 2, Interesting

    Maybe Wind River will not be so quick to brag now :)

    --

    Remember, it takes 42 muscles to frown and only 4 to pull the trigger of a sniper rifle.
    1. Re:Wind river by gnalre · · Score: 3, Interesting

      While I agree with the post in general, the one thing I do like about Wind River is some of its debugging tools. It is hard to see how we could get along without WindView, for instance.

      I have been porting some VxWorks applications to Windows recently (don't ask), and the lack of a tool like that is killing me.

      Any suggestions for a tool like WindView that works on Windows would be gratefully accepted.

      --
      Choose your allies carefully, it is highly unlikely you will be held accountable for the actions of your enemies
    2. Re:Wind river by AaronW · · Score: 2, Interesting

      I hate to follow up to my own post, but I heard on NPR that the problem is in the Flash memory.

      Usually in VxWorks everything is compiled and linked into a single binary image (i.e. no filesystem). For flash, the only built-in file system is FAT, not even FAT32, which makes the flash much more critical. FAT itself is not very robust.

      Ideally they would have at least 2 copies of the image in flash and switch to the secondary if the primary fails a CRC or other validation test. Also, it should have been designed to survive a flash chip dying. I can see that it would be easy for the flash to get corrupted due to the radiation. Apparently they're working around it by bypassing the flash and just using RAM.

      I would have thought that the flash subsystem would be designed with some fairly healthy ECC (i.e. handling bit errors at the same time at least one flash chip is dead). Maybe go so far as 3 flash banks with ECC for triple redundancy.
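      The triple-redundancy half of that suggestion amounts to a bitwise 2-of-3 vote across the three banks; the ECC half (e.g. a Hamming code within each bank) is not shown. A minimal Python sketch with a hypothetical function name:

```python
def majority_vote(a: bytes, b: bytes, c: bytes) -> bytes:
    """Bitwise 2-of-3 majority across three copies of the same flash region.

    A bit flip (e.g. from radiation) in any single copy is outvoted by the other two.
    """
    if not (len(a) == len(b) == len(c)):
        raise ValueError("copies must be the same length")
    return bytes((x & y) | (x & z) | (y & z) for x, y, z in zip(a, b, c))
```

      Note that the vote only protects against errors that hit different bits in different copies; a bit flipped the same way in two banks wins the vote.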

      It would be interesting to see how the computer of the Mars rover was designed.

      --
      This post is encrypted twice with ROT-13. Documenting or attempting to crack this encryption is illegal.
  6. mars dvd message by xk · · Score: 5, Interesting

    Has anyone cracked this yet?

    -bk.

  7. Ive been telling you... by TitanOfire · · Score: 2, Interesting

    How many times do I have to say it? Robots just don't work for shit. Why don't we just send up some of those hyper-intelligent monkeys that we sent to the moon? I mean, seriously, it would cost a lot less. And then they'd make movies; how cool would it be to see another movie about a chimp doing what a human could do a billion times better?

  8. Re:400 million and only one CPU by 0x0d0a · · Score: 5, Interesting

    NASA systems that involve human life are highly redundant. I remember a lecture by a NASA engineer about systems on the Shuttle. There are *seven* redundant computers which calculate data. That data requires identical answers from four to be accepted.

    On Spirit, power is an issue. More CPUs == more power drain.

    Furthermore, I remember folks initially speculating that something was wrong with the power system. I stopped following it, but I read that this transmission was composed of power subsystem diagnostic data. It could be a response requested earlier that it didn't have enough juice to send, in which case more CPUs would actually have exacerbated the problem. :-)
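    The acceptance rule described for the Shuttle computers (identical answers from at least four of them) is a quorum vote. A minimal Python sketch, assuming answers can be compared for exact equality; the function name is hypothetical.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def quorum_vote(answers: Sequence[Hashable], quorum: int) -> Optional[Hashable]:
    """Accept a result only if at least `quorum` computers report the identical value.

    With a quorum larger than half the voters, at most one value can reach it,
    so checking only the most common answer is enough.
    """
    if not answers:
        return None
    value, count = Counter(answers).most_common(1)[0]
    return value if count >= quorum else None
```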

  9. Suspiciously good pics of landing site from orbit by hazee · · Score: 3, Interesting

    Is it just me, or has anyone else been very puzzled by the pics that NASA released of Spirit's landing site? These were supposedly taken by the Mars Orbiter Camera on the Mars Global Surveyor.

    I thought that the best cameras in orbit around Mars were those on the European Mars Express, with a top resolution of 12 metres/pixel, and yet here the Spirit lander, about 2 metres across, is spread across about 10 pixels.

    Something's not right...

  10. Re:Linux Cost Tax Payers at least $410M...nothing by 0x0d0a · · Score: 2, Interesting

    How often is a bug the fault of an RTOS, and how often introduced by the coders working on a particular project?

  11. Re:Suspiciously good pics of landing site from orb by AndroidCat · · Score: 5, Interesting
    They're trying a new technique. From this article:
    The MOC image of the Spirit lander and its landing site was acquired using a new technique that was pioneered by the MGS project in 2003. Called "cPROTO" (for Pitch and Roll Only Targeted Observation with planetary motion compensation), the approach allows MOC, which normally takes pictures 1.5 meters (5 feet) per pixel to 12 meters (40 feet) per pixel, to acquire images with a higher resolution. By pitching the MGS spacecraft at a rate faster than it orbits around Mars, and moving it in a way that compensates for the rotation of the planet, MOC is able to obtain images with a down-track resolution of about 50 cm/pixel (~20 inches/pixel), although the cross-track resolution remains ~1.5 m/pixel (5 ft/pixel). These images have a better signal-to-noise ratio than typical 1.5 m/pixel MOC images, as well. This technique allows the lander and other details not normally visible in a full-resolution MOC image to be seen.
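    Plugging the quoted resolutions into the roughly 2-metre lander size from the parent comment shows why cPROTO makes the lander detectable at all. A small Python calculation; the 2 m figure is the parent's estimate and the function name is illustrative.

```python
def pixels_across(object_m: float, metres_per_pixel: float) -> float:
    """Number of pixels an object of the given size spans at a given ground resolution."""
    return object_m / metres_per_pixel


LANDER_M = 2.0  # rough lander size quoted in the parent comment

for label, res in [("standard MOC, best case", 1.5),
                   ("cPROTO down-track", 0.5),
                   ("cPROTO cross-track", 1.5),
                   ("12 m/pixel", 12.0)]:
    print(f"{label:24s} {pixels_across(LANDER_M, res):5.2f} px")
```

    At 12 m/pixel the lander covers about a sixth of a pixel; with cPROTO it spans about 4 pixels down-track by about 1.3 across.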
    --
    One line blog. I hear that they're called Twitters now.
  12. Re:Spirit rebooting 60 times a day by pongo000 · · Score: 4, Interesting

    Something like 2/3 of NASA's recent missions have failed in some way or another. Is it possible that NASA engineers simply have not mastered the art and science of designing hardware and software operable in the harshest of environments?

    In some ways, there is an air of arrogance in everything NASA does, from their press conferences to their marketing agreements. We have dead shuttle astronauts being transformed into "national heroes," even though their demise wasn't the result of any heroic sacrifices on their part, but rather a materials and systems failure scenario that NASA failed to handle properly. We have Spirit as the "little train that could," sending back waves of photographs of rocks that NASA engineers have actually named. Does the naming of rocks somehow bring NASA's mission closer to the unwashed masses who relate better to Beanie Babies than to the stark facts of reality?

    Harsh as it sounds, NASA is reaping what they sow: a string of hardware and software failures that is serving as a backdrop to newly mandated initiatives by Bush to send miners to the moon and astronauts to Mars. Yet NASA can't even seem to get a remote-control buggy to work correctly. The mind just reels at the catastrophes that await us between now and 2015 should NASA continue down this road of inept management and hardware/software designs insufficiently tested against the harsh environs of space. As geeks, we owe it not only to ourselves but to the non-geek public to recognize these failures as serious shortcomings in the NASA culture. We must resist the temptation to blindly set NASA on a pedestal in the name of scientific achievement without first critically analyzing their failures.

  13. Re:Tell the rover by brokencomputer · · Score: 2, Interesting

    I have compiled some important quotes regarding the issue.

    * NASA's Spirit rover communicated with Earth in a signal detected by NASA's Deep Space Network antenna complex near Madrid, Spain, at 12:34 Universal Time (4:34 a.m. PST) this morning. The transmissions came during a communication window about 90 minutes after Spirit woke up for the morning on Mars. The signal lasted for 10 minutes at a data rate of 10 bits per second. Mission controllers at NASA's Jet Propulsion Laboratory, Pasadena, Calif., plan to send commands to Spirit seeking additional data from the spacecraft during the subsequent few hours. [11]
    * The flight team for NASA's Spirit received actual data from the rover in another communication session that began at 13:26 Universal Time (5:26 a.m. PST) and lasted 20 minutes at a data rate of 120 bits per second. [12]
    * Shortly before noon, controllers were surprised to receive a relay of data from Spirit via the Mars Odyssey orbiter. Spirit sent 73 megabits at a rate of 128 kilobits per second.
    * At a news briefing, Pete Theisinger said, "The software is in X-band fault mode. We surmise it got there because of some problem with the high-gain antenna pointing, and that is why the second high-gain antenna pass on Wednesday did not work. It gives us a little bit of a telltale for what is going on with the processor now. But as I pointed out to you, the flight software is not functioning normally. The two times we have gone and communicated with the system, we have gotten different flight software behaviors. Therefore we do not have assurance the next time we go and ask for it we will get either one of those two behaviors or perhaps a third behavior." Later Theisinger said that Spirit is in "critical condition" and stated that "We do not know to what extent we can restore functionality to the system because we don't know what's broke. We don't know what started this chain of events. I think, personally, that is a sequence of things. And we don't know, therefore, the consequences of that. I think it is difficult, at this very preliminary stage, to assume that we did not have some type of hardware event that caused this to start. Therefore, we don't know to what extent we can work around that hardware event and to what extent we can get the software to ignore that hardware event, if that is what we eventually have to do."
    * An anomaly team has been formed, completely separate from the Opportunity team. They will be working a schedule that will look like 0500 Mars Time to about 1500 Mars Time.
    * At the press conference, Theisinger said that Spirit "has been in a processor reset loop of some type, mostly since Wednesday, we believe, where the processor wakes up, loads the flight software, uncovers a condition that would cause it to reset. But the processor doesn't do that immediately. It waits for a period of time - at the beginning of the day it waits for 15 minutes twice and then for the rest of the day it waits for an hour - and then it resets and comes back up." He added that Spirit's central computer has rebooted itself more than 60 times over the past two days. Theisinger also noted that "The indications we have on two occasions is that the thing that causes the reset is not always perceived to be the same."
    * At the press conference, two computer animations of Spirit's landing were released. Also released was an image of Spirit's landing site taken by the Mars Orbiter Camera on the Mars Global Surveyor.
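    The wait-then-reset pattern Theisinger describes can be written as a schedule generator. A minimal Python sketch, assuming each wait starts right after the previous reset and ignoring boot time; the function name and the length of the awake period are hypothetical parameters, and it is not meant to reproduce the reported reset count.

```python
from typing import List


def reset_times(awake_minutes: float, short_wait: float = 15.0,
                short_count: int = 2, long_wait: float = 60.0) -> List[float]:
    """Minutes after wake-up at which the processor resets under the described pattern:
    `short_count` waits of `short_wait` minutes, then `long_wait` minutes each time.

    Assumes each wait starts right after the previous reset and boot time is zero.
    """
    if short_wait <= 0 or long_wait <= 0:
        raise ValueError("waits must be positive")
    times: List[float] = []
    t, resets = 0.0, 0
    while True:
        t += short_wait if resets < short_count else long_wait
        if t > awake_minutes:
            return times
        times.append(t)
        resets += 1
```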

  14. Re:Improving NASA: Get-it-right vs. get experience by pongo000 · · Score: 3, Interesting

    The last thing NASA should do is spend more money, take more time, and do fewer missions. The only way we will really learn how to operate in space is to go into space.

    This approach gives NASA the public exposure it needs to continue its work, but space is a very expensive testing ground. Where's the rush to get into space? It's not as if we're trying to capture fleeting moments of time. It seems ludicrous to me that NASA is on a 15-year timetable; given the vastness of time in a cosmological sense, shouldn't NASA be considering 100-year or 1000-year timetables?

    The problem is that we as humans have a 70-year lifespan, and desire to see the fruits of our labor now. Plus, there wouldn't be much of a political boost for a president to unveil NASA's new 1000-year colonization plan.

    True scientific discovery is being tainted by political short-term gains. I have great respect for the scientific and engineering knowledge of many NASA principals, but I also believe many of them are selling out by playing the political game and adopting a false "can-do" attitude instead of pushing for more responsible scientific inquiry that might be more time-consuming in the long run, but will greatly benefit future generations of scientists.

  15. How hard is it? by Bombula · · Score: 1, Interesting
    No offense to the army of programmers reading Slashdot, but how hard is it to get the software in these RC buggies to work properly? Obviously I'm aware that the environment is harsh on the hardware (then again, cold is good for computers, no?), but it just seems like NASA can't write code that isn't buggy for love or money - buckets and buckets of money.

    Here's an idea: how about adapting the software that runs cars? That is surely tried and tested stuff, very robust, and almost never crashes. Those systems are of comparable complexity, and like the rovers they have very limited variable input (unlike a home computer).

    Maybe instead of spending hundreds of millions of dollars on kleenex-frail gear and reinventing the wheel at every opportunity, how about spending all those millions on more fuel to send up heavier, more robust vehicles? Didn't that work for Apollo?

    I mean, these vehicles that are supposed to run for months in a dust pit are built in a clean room for god's sake...

    --
    A-Bomb
  16. Re:should NASA let Wind River write the code? by Loki_1929 · · Score: 2, Interesting

    "Do you expect NASA to fabricate every component in the spacecraft?"

    If we gave them a budget? Yes.

    Nasa's fiscal year 2003 budget: $15.1 Billion.
    DoD's fiscal year 2003 budget: $396.1 Billion.

    The DoD's budget does not include emergency supplementals, such as the $40 billion supplemental in '02, or the $87 billion supplemental requested in '03.

    --
    -- "Government is the great fiction through which everybody endeavors to live at the expense of everybody else."
  17. Closed source project... by jasno · · Score: 4, Interesting

    Is there any reason the code, schematics and CAD designs aren't available for public viewing? It's a publicly funded project, and I don't think JPL has to worry about trade secrets.

    If JPL would give us more information, I bet they'd have 50% of the entire engineering brainpower on the planet checking for races, inversions, memory leaks, hardware design flaws, etc.

    If there was ever a project that could benefit from so many eyeballs, it's space exploration. There are thousands of some of the most talented engineers on the planet who would jump at the chance to contribute to something like this.

    --

    http://www.masturbateforpeace.com/
  18. Re:Spirit rebooting 60 times a day by qqtortqq · · Score: 2, Interesting

    Occasionally you see an ATM with a Windows message box on the screen describing some error. It's easier getting Brink's out to the ATM to reset it than getting NASA out to Mars.

  19. unmanned probes by tgibbs · · Score: 2, Interesting

    Despite what seems to have become a widely held belief that we can learn as much from automated probes as from manned missions, it doesn't seem to have worked out that well in practice. Viking had a set of experiments that was supposed to definitively detect whether life was present. But when some of the experiments came out positive, they ended up being rejected, because researchers at home came up with nonbiological explanations. Unfortunately, there was nobody on site to do a follow-up experiment to really answer the question. Now we've had a long string of failed probes.

    Perhaps all Spirit really needs is somebody to give it a little kick.

  20. Re:400 million and only one CPU by Jon+Abbott · · Score: 2, Interesting
    1. you'd have to increase the complexity of the device even more, exposing it to a higher risk of failure statistically
    While that statement is correct for adding components in series, it is not correct when applied to adding components for redundancy (i.e. in parallel). Adding another CPU in parallel increases the redundancy, and therefore decreases the risk of failure statistically. Here is the math for both types:

    Series: If I add two components in series to a system, with reliability of R_1 and R_2, respectively, the overall system reliability is:

    R_series = R_1 * R_2
    To demonstrate this with real numbers, let's assume the values of R_1 = 0.95 and R_2 = 0.90. R_series would equal 0.95 * 0.90 = 0.855, or 85.5%. So, adding components in series makes reliability worse than the original reliability of either of the two components.

    Parallel: On the other hand, if I add two components in parallel, with reliability of R_1 and R_2, respectively, the overall system reliability is:

    R_parallel = 1-(1-R_1)*(1-R_2)
    Using the same values for R_1 and R_2 as above, the value of R_parallel would be 1-(1-0.95)*(1-0.90) = 0.995, or 99.5%. Redundant systems such as this are a good thing, because the overall chance of system failure can often be greatly reduced.
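    The two formulas generalize to any number of components. A minimal Python sketch; the function names are illustrative.

```python
from math import prod
from typing import Sequence


def series_reliability(rs: Sequence[float]) -> float:
    """Every component must work: R = R_1 * R_2 * ... * R_n."""
    return prod(rs)


def parallel_reliability(rs: Sequence[float]) -> float:
    """Any one component suffices: R = 1 - (1 - R_1) * (1 - R_2) * ... * (1 - R_n)."""
    return 1 - prod(1 - r for r in rs)
```

    Both formulas assume the components fail independently. A common-mode fault, such as the same software bug running on two redundant CPUs, can take out both at once, so the parallel figure is optimistic in that case.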

    Of course, the value of redundancy must be balanced against the overall cost of the system, which can be measured in money, man-hours, and weight. Most introductory courses in engineering management explain these tradeoffs in good detail and show how to maximize a project's reliability while minimizing the overall system cost.

    One of the most fascinating engineering management issues with Spirit and Opportunity is that the number of man-hours dedicated to both rovers is very limited, and now that Spirit is failing, fewer people will be available to make sure that Opportunity is going to land and operate successfully. The added cost of a second CPU or extra RAM on the rovers may well have already paid for itself, for that very reason. A lack of man-hours devoted to Opportunity could spell as much doom to the project as a design flaw, but ultimately both cost money to fix. It all boils down to: "faster, cheaper, better -- pick any two."
  21. Re:No BSOD Jokes, Please by wash23 · · Score: 4, Interesting

    You know, it occurs to me that maybe instead of having an interactive rover with a billion complicated subsystems and spectrometers and cameras... it might be a good idea to launch a package full of smaller autonomous devices carrying different instrumentation. So you'd have a base that lands on Mars, opens up (like the rover bases do) and releases 20 or 30 "dumb robots" on treads or big balloon tires (I'm thinking each the size of a big R/C car), some of which would have cameras, the rest instrumentation of whatever sort. All of the little slaves would move around randomly or according to some simple program (either mechanical or software) and relay collected information to the base, which would transmit it to Earth. Some of the camera bots would be designed to just move as far as possible and take as many pictures as possible; others would just do instrumental analyses of whatever they happen to bump into or land on. You wouldn't know exactly what the instruments were looking at, but you'd probably be able to collect a sizable amount of data on a particular landing region: know what minerals are present, etc. You wouldn't know that pyramid-shaped rock 12B contains olivine, but you'd know olivine was present.

  22. Re:Spirit rebooting 60 times a day by RayBender · · Score: 3, Interesting
    Would you happen to know if they have any redundancy in the system? A spare CPU might be useful right about now...

    --
    Human genome = 3 billion base pairs = 6 Gbit. Windows + Office = 20 Gbit. Which is more impressive?
  23. It would be nice... by oliverk · · Score: 2, Interesting

    ...if a little more of the information was given to the public. There are a lot of very bright, very interested and very talented engineers that would love to contribute to the solution. Some aspects would need to be kept out of the public hands (lest, of course, some kid in the Bronx go joy-riding in Spirit using just some RadioShack spare parts). But the lion's share of the problem could be posted up for the best (dare I say it?) open-source solution to an engineering problem.

    Bugzilla for NASA. I guess that's the best way to describe what I'm thinking.

    --
    ---- Please be nice in case my Slashdot karma ~= my real life karma.
  24. Re:No BSOD Jokes, Please by wash23 · · Score: 3, Interesting

    Good points. I'm sure NASA has thought of these sorts of things too; I have no idea where to read about them though if they have. It's sort of an interesting tradeoff to consider though; careful, directed examination of specific features of interest with really complicated instruments, or brute force "random" sampling with simpler ones.