Spirit Rover Communications Error
cybrthng writes "Through yesterday's press release and the current NASA briefing there is news that they are having communications errors contacting Spirit. Is she lost, or is it something akin to the Pathfinder failures that happened? Or did little green people claim an expensive Tonka truck toy?"
The newsflash I heard over the radio at lunch quoted someone (didn't catch a name) as saying that, at present, Spirit wasn't even relaying telemetry data -- so for the time being they've no way to even tell what happened, let alone how to fix it.
:)
I hope that they track it down and can fix it easily; Spirit was one of the coolest things going in recent weeks, and was providing a welcome break from all the election primary coverage.
I also really hope we don't degenerate into a `hah, you laughed at Beagle, now it's your turn` style flamewar. Hell, I'll actually settle for one or the other.
From the press release: "similar events occurred several times during the Mars Pathfinder mission." So a friendly "Don't Panic."
All is not lost; they mention that similar problems came up during the Pathfinder mission. Listening to the press conference, they said that the Spirit rover did respond to a "give me a beep" signal. So the engineer said it looks more like a memory fault or software fault, rather than a more serious power fault.
The BBC is reporting it here.
/* This sig is disabled. Press CTRL-W to enable. Thankyou */
There are some more details in the BBC article here. Apparently the radio is working fine but it seems to be transmitting random data. Anyone know where the 'reset' button is?
I really hope that this isn't the end of Spirit.
It's been reported that a signal was sent to Spirit this morning to try and figure out whether it was in fault mode or not, and preliminary results suggest that Spirit is in fault mode. This is preliminary data and was announced halfway through the news conference.
There is as of yet no reliable information as to what the state of Spirit is.
Here
DrkBr
I watched the press conference and have read extensively about the communications system. I am also the person featured in yesterday's Slashdot article about imagery and my postprocessing/color correction/stereo anaglyph creation from it.
By now they have probably rebooted it (forced it through safe mode to clear any software fault; space vehicles never really go all the way "down"), so if it's still happening I would say it's either a hardware fault or corruption of essential software or data in (putatively) nonvolatile memory (not unreasonable in high-rad environments).
Not impossible, but relatively unlikely with deep-space grade hardware. It'd require a double fault to create a detectable error, and more than that to create an undetectable one.
If they haven't forced it through safe mode, then they're not too worried and are more interested in characterizing the problem than getting on with the scientific mission. Which is a good or a bad thing depending on which sort of information is more valuable. I'm sure the guys in the software group have their bias.
They've had one day, and much of that was spent thinking that thunderstorms/atmospheric vapor near Canberra and dish tracking problems were causing the communications errors. It's important to get some idea of the problem before you go shoving things into safe modes because you may make things worse (if it's a power bus fault, for instance).
That tone is still unconfirmed; they are not positive they have received it (it came in only 2.5 hours ago and processing the data sets takes time; NASA has not confirmed that they are sure they got a 7.2 tone).
But I agree it is likely the rover is reporting it is faulted, even if it is not a sure thing yet.
Don't forget they were severely constrained on budget in the last administration. There were countless stories on how they managed to build something that went to Mars for something like 14 million. That's amazing when you look at the cost of a shuttle mission to Earth orbit. They're not going to constrain spending to a bare minimum when human life is at stake.
Two roads diverged in a wood, and I - I took the one the bus load of girls just went down.
That last paragraph should be shot.
They have only had one day to troubleshoot. Much of that day was spent assuming atmospheric water vapor and dish tracking problems caused errors in sending command traffic. It's important to get some idea of the problem before you go shoving things into safe modes, or else you might make things worse.
Unfortunately, the landing site of Opportunity is less interesting scientifically than Spirit's landing site, Gusev Crater. A documentary on PBS describes the scientists' discussion of the landing sites; Meridiani Planum, Opportunity's site, was chosen because it was considered a safer target than Gusev Crater: fewer sharp pointy rocks and craters to fall onto or into. NOVA: Mars Dead or Alive. I really hope this is just a temporary glitch that can be solved quickly.
Stay tuned ...
chongo (was here)
Spirit status updates are here: http://spaceflightnow.com/mars/mera/statustextonly.html
"To confine our attention to terrestrial matters would be to limit the human spirit." -Stephen Hawking
Spirit runs an operating system called VxWorks, by Wind River.
Marvin packed an Uranium Pew-36 Explosive Space Modulator
/pedantry
/geekmode
The software running onboard the MER rovers is not written in Java. Not even a little bit. Sun's posters and propaganda at last year's JavaOne seemed to deliberately give that false impression. There is plenty of Java running on the ground, though, for both planning activities and processing the downlinked data.
Personally I prefer a good side kick. It leaves you with space to hop out of the way, and if done right will get you several items and some spare change
--Keeping the flame wars alive, one post at a time
No. The rover's architecture allows for it to correct single bit errors, but a garbled packet will simply be rejected by the rover.
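To make the "correct single bit errors, reject garbled packets" idea concrete, here is a minimal sketch. It is not the rover's actual coding scheme (which isn't described in this thread); it assumes a textbook Hamming(7,4) code for single-bit correction plus a CRC-32 over the packet so that anything worse gets thrown away. The names send and receive are made up for illustration.

```python
import zlib

def hamming74_encode(nibble):
    """Encode 4 data bits (list of 0/1) into a 7-bit Hamming codeword."""
    d1, d2, d3, d4 = nibble
    p1 = d1 ^ d2 ^ d4          # parity over positions 1,3,5,7
    p2 = d1 ^ d3 ^ d4          # parity over positions 2,3,6,7
    p3 = d2 ^ d3 ^ d4          # parity over positions 4,5,6,7
    return [p1, p2, d1, p3, d2, d3, d4]

def hamming74_decode(word):
    """Return (data_bits, syndrome); a nonzero syndrome is the bit position fixed."""
    w = list(word)
    s1 = w[0] ^ w[2] ^ w[4] ^ w[6]
    s2 = w[1] ^ w[2] ^ w[5] ^ w[6]
    s3 = w[3] ^ w[4] ^ w[5] ^ w[6]
    syndrome = s1 + 2 * s2 + 4 * s3
    if syndrome:
        w[syndrome - 1] ^= 1   # flip the bit the syndrome points at
    return [w[2], w[4], w[5], w[6]], syndrome

def send(payload: bytes):
    """Hypothetical framing: payload + CRC-32, each nibble Hamming-encoded."""
    framed = payload + zlib.crc32(payload).to_bytes(4, "big")
    words = []
    for byte in framed:
        bits = [(byte >> i) & 1 for i in range(7, -1, -1)]
        words.append(hamming74_encode(bits[:4]))
        words.append(hamming74_encode(bits[4:]))
    return words

def receive(words):
    """Correct single-bit errors per codeword; return None if the CRC still fails."""
    bits = []
    for w in words:
        data, _ = hamming74_decode(w)
        bits.extend(data)
    framed = bytes(
        int("".join(map(str, bits[i:i + 8])), 2) for i in range(0, len(bits), 8)
    )
    payload, crc = framed[:-4], framed[-4:]
    if zlib.crc32(payload).to_bytes(4, "big") != crc:
        return None            # garbled beyond repair: reject the packet
    return payload
```

Flip one bit in any codeword and receive still hands back the payload; flip two bits in the same codeword and the Hamming step "corrects" the wrong bit, the CRC catches it, and the packet is dropped.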
It's a Bagel.
I meant a synchronization problem between the physical transmitter unit and the main avionics system.
When it comes to clocks, it is somewhat complicated. The rover keeps a clock, and usually finds Earth by locating the sun in the sky. It has a set of Keplerian/rotational elements for both Earth around the Sun and the MGS/Odyssey around Mars, and thus knows when they rise and set in the sky. This tells it when to transmit and where to point the antenna.
Full duplex communications are possible on X-band, so transmitting and receiving do not need to be synchronized. Blocks of data are sent with error correction codes; as they arrive intact, messages are sent telling the rover to delete them. Retransmits can also be requested if the data is particularly interesting and missing (but often aren't, as witnessed by the number of empty portions of images).
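The "keep it until the ground says delete" scheme can be sketched roughly like this. The class and function names (DownlinkBuffer, downlink_pass) are invented for illustration and say nothing about the real flight software; the sketch only shows that a block is freed once it is confirmed, and that a lost block stays onboard until someone asks for it again.

```python
class DownlinkBuffer:
    """Hypothetical onboard store: blocks stay until the ground confirms them."""

    def __init__(self):
        self._blocks = {}
        self._next_id = 0

    def store(self, data: bytes) -> int:
        block_id = self._next_id
        self._blocks[block_id] = data
        self._next_id += 1
        return block_id

    def transmit(self, block_id):
        # Sending does not free anything; the block may need to go out again.
        return block_id, self._blocks[block_id]

    def acknowledge(self, block_id):
        """Ground received the block intact, so the memory can be reclaimed."""
        self._blocks.pop(block_id, None)

    def pending(self):
        return sorted(self._blocks)


def downlink_pass(buffer, lost_ids):
    """Send every pending block once; blocks in lost_ids never reach the ground."""
    received = {}
    for block_id in buffer.pending():
        bid, data = buffer.transmit(block_id)
        if bid not in lost_ids:
            received[bid] = data
            buffer.acknowledge(bid)   # stands in for the "delete" message
    return received
```

If the ground never bothers to request the missing block again, you get exactly the holes in the images mentioned above.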
UHF is usually just used to offload additional data from the rover during the night to the satellites. The delays are short and the protocols are thus more conventional.
There's much more detail about this here.
Apparently, Tidbinbilla is one of only 3 stations tracking Spirit from Earth. If it's out, they have to wait until Spirit is visible from over the horizon at another station before they can communicate.
"The plural of anecdote is not data" -- Bruce Schneier
I'm sure this was meant in jest, but a very real incident like this destroyed a very expensive mission once. The first launch of the Ariane 5 blew up less than a minute into the flight. The cause was later determined to be an uncaught exception which shut down both flight control computers, leading to a big boom. (I think it was the flight control computers; at least I know it was an uncaught exception that shut down an entire doubly-redundant computer system.)
Mod down posts with a "Free Mac Mini/iPod" sig, they're spam!
The OS on the Spirit lander was created by Wind River - details are here. As with any OS designed for deep-space uses there are multiple redundancies on virtually every aspect of the system.
Nothing runs Windows in this respect. The rover runs custom code on the VxWorks RTOS for the RAD6000 chip they use, and the mission control systems use Java to run a live version of Maestro.
Also, the chip they use, a radiation-hardened RAD6000 CPU, comes from the days before Java was even thought of. Read up on the facts first.
NASA has experience with uploading new software (including the OS) to deployed spacecraft to correct defects.
On Tuesday, I talked with some of the project scientists for a TechTV interview that's running next week on Screen Savers. One of the many things I learned from them was that they upload new software, and patches, and all that stuff with surprising frequency and ease.
The thing that really blew my mind was, in order to make their launch date, they just coded enough commands to get the thing there, and sent all the software to drive around and research stuff after the landing while the spacecraft was in transit.
I really hope they solve this current problem, and get the mission back on track. They are SUPER cool people at JPL who are working on this.
The exception was caught properly; unfortunately the action on catching the exception was to shut down the system. This made sense when the software was designed, because the signals were impossible for Ariane 4 to produce, and therefore there was obviously something seriously wrong. If you don't know what's wrong, then the best thing to do is get out of the loop and let your backup take over, and if it's something which is local to you, then the flight can continue. If it's the sensors, well, you're already doomed. Ariane 5 is bigger, so can produce more oomph in the sensors, so the signals were actual instead of bogus, but it still didn't know what to do with them.
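A toy version of that failure mode, for anyone who wants to see it in code. This is only an illustration of "caught, but the handler's policy is to shut down", not the real flight code; the widely reported trigger was a floating-point value that no longer fit in a 16-bit integer, and the function names and the alternative "saturate" policy here are made up.

```python
INT16_MIN, INT16_MAX = -32768, 32767

class OperandError(Exception):
    """Raised when a sensor value does not fit the 16-bit field."""

def to_int16(value: float) -> int:
    result = int(value)
    if not INT16_MIN <= result <= INT16_MAX:
        raise OperandError(f"{value} does not fit in 16 bits")
    return result

def guidance_step(reading: float, on_overflow: str = "shutdown"):
    """One unit processing one reading; the exception IS caught either way."""
    try:
        return ("ok", to_int16(reading))
    except OperandError:
        if on_overflow == "shutdown":
            # Design assumption: this value is impossible, so the unit is broken.
            return ("shutdown", None)
        # Hypothetical alternative policy: clamp the value and keep flying.
        clamped = INT16_MAX if reading > 0 else INT16_MIN
        return ("degraded", clamped)

def redundant_pair(reading: float, on_overflow: str = "shutdown"):
    """Primary and backup run identical code on the same real-world signal."""
    return guidance_step(reading, on_overflow), guidance_step(reading, on_overflow)
```

The point of redundant_pair is that a backup only helps against faults local to one unit. When the input itself is legitimately out of the assumed range, both copies reach the same handler and both bow out.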
Pathfinder, in its 1997 landing (04JUL1997), suffered a series of unexplained system failures. David Wilner, CTO of Wind River Systems, the creators of VxWorks, the realtime embedded system kernel, talked to the IEEE Real-Time Systems Symposium at a later date, explaining how they solved software bugs in the system.
This article explains how they solved the problem: by including the debug code with the OS. I remember reading about this on /. some time ago. A detailed account can be read here by Glenn Reeves (JPL Mars Flight SE).
Wind River Systems is supplying the OS for the current mission. Let's see how long it takes them to work this one out :)
links:
www.kohala.com/start/papers.others/pathfinder.html
research.microsoft.com/~mbj/Mars_Pathfinder/Authoritative_Account.html
peterrenshaw ~ Another Scrappy Startup
Ten seconds on Google will show you that I write HTML just fine. I didn't notice that my message setting wouldn't handle carriage returns typed into the text gracefully.
--BahdKo
Err, did you read the link?
Priority inversion can be protected against; the mutex can be created in one of two modes, priority-inversion safe or not priority-inversion safe. Unfortunately they forgot to turn the priority inversion protection on. Programming error, plain and simple.
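For those who haven't run into priority inversion before, here is a small tick-by-tick simulation of it. It is a toy single-CPU scheduler with made-up task names and workloads, not VxWorks and not the Pathfinder task set; the inherit flag stands in for the "inversion safe" mutex option.

```python
def simulate(inherit: bool) -> int:
    """Return the tick at which the high-priority task finishes its work."""
    # name -> base priority, release tick, remaining work, whether it uses the mutex
    tasks = {
        "low":    {"prio": 1, "release": 0, "work": 3, "mutex": True},
        "medium": {"prio": 2, "release": 1, "work": 5, "mutex": False},
        "high":   {"prio": 3, "release": 1, "work": 1, "mutex": True},
    }
    owner = None   # task currently holding the mutex
    tick = 0
    while tasks["high"]["work"] > 0:
        ready = [n for n, t in tasks.items()
                 if t["release"] <= tick and t["work"] > 0]
        # A mutex user can only run if the mutex is free or already its own.
        runnable = [n for n in ready
                    if not tasks[n]["mutex"] or owner in (None, n)]

        def effective(name):
            prio = tasks[name]["prio"]
            if inherit and name == owner:
                # Owner borrows the priority of the highest blocked waiter.
                waiting = [tasks[w]["prio"] for w in ready
                           if tasks[w]["mutex"] and w != owner]
                prio = max([prio] + waiting)
            return prio

        current = max(runnable, key=effective)
        if tasks[current]["mutex"]:
            owner = current              # held for the task's whole run
        tasks[current]["work"] -= 1
        if tasks[current]["work"] == 0 and owner == current:
            owner = None                 # released when the task completes
        tick += 1
    return tick
```

With inherit=False the medium task, which never touches the mutex, preempts the low task that is holding it, so the high task waits behind both. With inherit=True the low task is temporarily boosted, finishes its critical section, and the high task gets in well ahead of the medium one.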
Choose your allies carefully, it is highly unlikely you will be held accountable for the actions of your enemies
Safe mode is called safe mode because it's not supposed to make anything worse. If it does, someone's got some 'splainin' to do, loocee.