Spirit Rover Communications Error
cybrthng writes "Through yesterday's press release and the current NASA briefing there is news that they are having communications errors contacting Spirit. Is she lost, or is it something akin to the Pathfinder failures that happened? Or did little green people claim an expensive Tonka truck toy?"
The newsflash I heard over the radio at lunch quoted someone (didn't catch a name) as saying that, at present, Spirit wasn't even relaying telemetry data -- so for the time being they've no way to even tell what happened, let alone how to fix it.
:)
I hope that they track it down and can fix it easily; Spirit was one of the coolest things going in recent weeks, and was providing a welcome break from all the election primary coverage.
I also really hope we don't degenerate into a `hah, you laughed at Beagle, now it's your turn` style flamewar. Hell, I'll actually settle for one or the other.
From the press release: similar events occurred several times during the Mars Pathfinder mission. So, a friendly "Don't Panic."
All is not lost; they mention that similar problems came up during the Pathfinder mission. Listening to the press conference, they said that the Spirit rover did respond to a "give me a beep" signal. So the engineer said it looks more like a memory fault or software fault, rather than a more serious power fault.
The BBC is reporting it here.
/* This sig is disabled. Press CTRL-W to enable. Thankyou */
There are some more details in the BBC article here. Apparently the radio is working fine but it seems to be transmitting random data. Anyone know where the 'reset' button is?
I really hope that this isn't the end of Spirit.
It's been reported that a signal was sent to Spirit this morning to try to figure out whether it was in fault mode or not, and preliminary results suggest that Spirit is in fault mode. This is preliminary data and was announced halfway through the news conference.
There is as of yet no reliable information as to what the state of Spirit is.
DrkBr
I watched the press conference and have read extensively about the communications system. I am also the person featured in yesterday's Slashdot article about imagery and my postprocessing/color correction/stereo anaglyph creation from it.
By now they have probably rebooted it (forced it through safe mode to clear any software fault; space vehicles never really go all the way "down"), so if it's still happening I would say it's either a hardware fault or corruption of essential software or data in (putatively) nonvolatile memory (not unreasonable in high-rad environments).
Not impossible, but relatively unlikely with deep-space grade hardware. It'd require a double fault to create a detectable error, and more than that to create an undetectable one.
If they haven't forced it through safe mode, then they're not too worried and are more interested in characterizing the problem than getting on with the scientific mission. Which is a good or a bad thing depending on which sort of information is more valuable. I'm sure the guys in the software group have their bias.
They've had one day, and much of that was spent thinking the problem was because of thunderstorms/atmospheric vapor near Canberra and dish tracking problems were causing communications errors. It's important to get some idea of the problem before you go shoving things into safe modes because you may make things worse (if it's a power bus fault, for instance).
That tone is still unconfirmed-- they are not positive they have received it (it came in only 2.5 hours ago and processing the data sets takes time.. NASA has not confirmed that they are sure they got a 7.2 tone).
But I agree it is likely the rover is reporting it is faulted, even if it is not a sure thing yet.
Don't forget they were severely constrained on budget in the last administration. There were countless stories on how they managed to build something that went to Mars for something like 14 million. That's amazing when you look at the cost of a shuttle mission to Earth orbit. They're not going to constrain spending to a bare minimum when human life is at stake.
Two roads diverged in a wood, and I - I took the one the bus load of girls just went down.
That last paragraph should be shot.
They have only had one day to troubleshoot. Much of that day was spent assuming atmospheric water vapor and dish tracking problems caused errors in sending command traffic. It's important to get some idea of the problem before you go shoving things into safe modes-- else you might make things worse.
Unfortunately, the landing site of Opportunity is less interesting scientifically than Spirit's landing site, Gusev Crater. A documentary on PBS (NOVA: Mars Dead or Alive) describes the scientists' discussion of the landing sites; Meridiani Planum, Opportunity's site, was chosen because it was considered a safer target than Gusev Crater: fewer sharp pointy rocks and craters to fall onto or into. I really hope this is just a temporary glitch that can be solved quickly.
Stay tuned ...
chongo (was here)
Spirit status updates are here: http://spaceflightnow.com/mars/mera/statustextonly.html
"To confine our attention to terrestrial matters would be to limit the human spirit." -Stephen Hawking
Spirit runs an operating system called VxWorks, by Wind River.
Marvin packed an Uranium Pew-36 Explosive Space Modulator
/pedantry
/geekmode
Prior to the error, transmission from Earth was spotty, so some commands may have been lost. If the sequence of commands that was actually received by the rover didn't make sense, then the software might think this was due to an error in its OWN functioning and might go into safe mode as a result.
THURSDAY, JANUARY 22, 2004
1810 GMT (1:10 p.m. EST)
Here is project manager Pete Theisinger's briefing to reporters in the last hour, describing what has happened:
"At yesterday's press conference, we reported to you that we had had some communications issues with the rover, which we thought at the time was due to weather at the Canberra station and (Deep Space Network) configuration issues.
"We now know we have had a very serious anomaly on the vehicle, and our ability to determine exactly what has happened has been limited by our inability to receive telemetry from the vehicle, basically the last 12 hours or so.
"Let me kind of describe what the sequence of events have been.
"Yesterday afternoon, local solar time on Mars, actually about 1 o'clock, we sent to the vehicle at a command rate of 31.25 bits per second a sequence. We activated that sequence by command and we received a beacon response that indicated that the vehicle had received that sequence and that it was activating that sequence.
"After that time, a scheduled high-gain antenna pass at 2 o'clock in the afternoon, roughly, local solar time on Mars, did not occur.
"The 4:30 p.m. afternoon Mars Odyssey afternoon pass did not occur in the sense there was no indication by Odyssey that they received a UHF transmission.
"Last night, we had about a 1:30-2 a.m. Mars Global Surveyor pass and it was anomalous in the sense that Mars Global Surveyor believes it saw UHF transmission in its receiver telemetry but there was no data in the packets and the period of time that it believed it saw UHF telemetry was very, very short -- about two-and-a-half minutes compared to 12- or 13-minute overflight.
"The 4 a.m. Odyssey pass received no data, and this morning we did not have a direct-to-Earth link session -- we did not receive data on the normal direct-to-Earth session, nor did we receive data on what would have been a fault session at 11 a.m., which is where the spacecraft has entered fault mode, knows that, and chooses to communicate with us at a different time.
"The team has been meeting this morning and through the night working on a set of postulated fault scenarios. There is no one single fault that explains all the observables -- that we know of at the present time that we can conceive.
"We have been working on fault scenarios, we have been developing to-do lists. We have run yesterday's sequences through the test-bed (on Earth) with no anomalous results. So that is kind of our current state of knowledge."
At the end of the news conference, mission manager Jennifer Trosper came into the room and delivered an update to deputy project manager Richard Cook sitting at the briefing desk.
"If the spacecraft believes it's in a fault mode, its command rate should be 7.8 bits per second. We sent a beep today, this morning, about the time that we came down here to talk to you at 7.8. We sent a command that says if you get this send us a beep. And I'm told from Richard that Jennifer came down here to tell us that they think they got it," Theisinger said.
"That would tell us that the spacecraft thinks it's in the fault side of the tree somehow for some reason. That would mean that we've got positive power, some elements of the software is working, once again the X-band system is working, the SSPA, the multi-space transponder, all that stuff is working so that would be more information -- good news. We need to confirm that. Data off the DSN sometimes needs double-checking. We'll let you know if that's for sure."
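For a sense of scale on the two command rates quoted in the briefing, here is a rough back-of-envelope sketch. The 31.25 and 7.8 bits-per-second figures come from Theisinger's remarks; the 1000-bit command size is purely an assumption picked for illustration, and framing overhead and light-travel time are ignored.

```python
# Back-of-envelope: time to uplink one command at the two quoted rates.
NOMINAL_BPS = 31.25   # normal command rate (from the briefing)
FAULT_BPS = 7.8       # fault-mode command rate (from the briefing)


def uplink_seconds(bits: int, rate_bps: float) -> float:
    """Seconds needed to send `bits` at `rate_bps`, ignoring framing and light time."""
    return bits / rate_bps


COMMAND_BITS = 1000   # hypothetical command size, chosen only for illustration

nominal = uplink_seconds(COMMAND_BITS, NOMINAL_BPS)
fault = uplink_seconds(COMMAND_BITS, FAULT_BPS)
print(f"nominal: {nominal:.0f} s, fault mode: {fault:.0f} s, ratio: {fault / nominal:.1f}x")
```

So whatever the real command sizes are, talking to a rover in fault mode takes roughly four times as long per command as talking to a healthy one.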
The software running onboard the MER rovers is not written in java. Not even a little bit. Sun's posters and propaganda at last year's JavaOne seemed to deliberately give that false impression. There is plenty of Java running on the ground, though, for both planning activities and processing the downlinked data.
Personally I prefer a good side kick. It leaves you with space to hop out of the way, and if done right will get you several items and some spare change
--Keeping the flame wars alive, one post at a time
No. The rover's architecture allows for it to correct single bit errors, but a garbled packet will simply be rejected by the rover.
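To illustrate the "fix one flipped bit, reject anything worse" behaviour, here is a minimal sketch using a textbook extended Hamming (8,4) code. This is a stand-in chosen for illustration only; nothing in this thread says which coding scheme the rover actually uses, and the function names are made up.

```python
def encode(nibble):
    """Encode 4 data bits (list of 0/1) into an 8-bit extended Hamming codeword."""
    d1, d2, d3, d4 = nibble
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p3 = d2 ^ d3 ^ d4
    word = [p1, p2, d1, p3, d2, d3, d4]   # positions 1..7
    overall = 0
    for bit in word:
        overall ^= bit
    return word + [overall]               # position 8: parity over the whole word


def decode(word):
    """Return the 4 data bits, fixing one flipped bit; return None to reject the word."""
    bits = list(word)
    syndrome = 0
    for pos in range(1, 8):
        if bits[pos - 1]:
            syndrome ^= pos               # XOR of the positions of all set bits
    overall = 0
    for bit in bits:
        overall ^= bit
    if syndrome and overall == 0:
        return None                       # two flips: detectable, not correctable
    if syndrome:
        bits[syndrome - 1] ^= 1           # one flip: syndrome is the bad position
    return [bits[2], bits[4], bits[5], bits[6]]


data = [1, 0, 1, 1]
sent = encode(data)

one_flip = sent.copy()
one_flip[4] ^= 1                          # a single bit error in transit

two_flips = sent.copy()
two_flips[1] ^= 1
two_flips[6] ^= 1                         # a more badly garbled word

print(decode(one_flip) == data)           # True: the error was corrected
print(decode(two_flips))                  # None: the word is rejected
```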
It's a Bagel.
I meant a synchronization problem between the physical transmitter unit and the main avionics system.
When it comes to clocks, it is somewhat complicated. The rover keeps a clock, and usually finds Earth by locating the sun in the sky. It has a set of Keplerian/rotational elements for both Earth around the Sun and the MGS/Odyssey around Mars, and thus knows when they rise and set in the sky. This tells it when to transmit and where to point the antenna.
Full duplex communications are possible on X-band, so transmitting and receiving do not need to be synchronized. Blocks of data are sent with error correction codes-- as they arrive intact, messages are sent telling the rover to delete them. Retransmits can also be requested if the data is particularly interesting and missing (but often aren't, as witnessed by the number of empty portions of images).
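The keep-until-told-to-delete scheme described above can be sketched roughly like this. It is a toy model under stated assumptions: `DownlinkBuffer` and `ground_receive` are hypothetical names, and a plain CRC-32 check (detection only) stands in for whatever error correction coding the real link uses.

```python
import zlib


class DownlinkBuffer:
    """Rover-side store (hypothetical): blocks stay here until the ground says delete."""

    def __init__(self):
        self._blocks = {}

    def store(self, block_id, payload):
        self._blocks[block_id] = payload

    def transmit(self, block_id):
        """Build a frame: (id, payload, checksum). The block is NOT freed yet."""
        payload = self._blocks[block_id]
        return block_id, payload, zlib.crc32(payload)

    def delete(self, block_id):
        self._blocks.pop(block_id, None)

    def pending(self):
        return sorted(self._blocks)


def ground_receive(frame):
    """Ground side: True if the block arrived intact, so a delete can be sent."""
    _block_id, payload, checksum = frame
    return zlib.crc32(payload) == checksum


rover = DownlinkBuffer()
rover.store(1, b"image tile A")
rover.store(2, b"image tile B")

for block_id in rover.pending():
    frame = rover.transmit(block_id)
    if block_id == 2:
        # Simulate noise on the link by altering the payload in transit.
        frame = (frame[0], b"image tile X", frame[2])
    if ground_receive(frame):
        rover.delete(block_id)   # ground confirmed receipt, rover may free the block

print(rover.pending())           # block 2 is still held and could be retransmitted
```

The point of the design is that the rover never frees memory on its own guess; a damaged block costs nothing but storage until someone decides whether it is worth asking for again.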
UHF is usually just used to offload additional data from the rover during the night to the satellites. The delays are short and the protocols are thus more conventional.
There's much more detail about this here.
Apparently, Tidbinbilla is one of only 3 stations tracking Spirit from Earth. If it's out, they have to wait until Spirit is visible from over the horizon at another station before they can communicate.
"The plural of anecdote is not data" -- Bruce Schneier
Last picture broadcast has been released
I'm sure this was meant in jest, but a very real incident like this destroyed a very expensive mission once. The first launch of the Ariane 5 blew up about a minute into the launch. The reason was later determined to be because of an uncaught exception which shut down both flight control computers, leading to a big boom. (I think it was the flight control computers; at least I know it was an uncaught exception that shut down an entire doubly-redundant computer system.)
Mod down posts with a "Free Mac Mini/iPod" sig, they're spam!
Here are some of the best Mars editorial cartoons. The repository in general is good, but the Mars collection has been terrific!
The OS on the Spirit lander was created by Wind River - details are here. As with any OS designed for deep-space uses there are multiple redundancies on virtually every aspect of the system.
Nothing runs Windows in this respect. The rover runs custom code on the VxWorks RTOS for the RAD6000 chip they use, and the mission control systems use Java to run a live version of Maestro.
Also, the chip they use, a radiation-hardened RAD6000 CPU, comes from the days before Java was even thought of. Read up on the facts first.
grap brp iuuutz flazzig! ...king tune that translator in properly! Ah that's better.
Many wheeled people of Earth, we have met with your ambassador. We are happy to see that you are mechanical beings much like ourselves - the organicist theories have at last been disproved.
We see that your planet has been overrun by plant--life. We will come to your aid immediately and destroy all carbon based auto-duplicative infections. Please stand by for more information.
NASA has experience with uploading new software (including the OS) to deployed spacecraft to correct defects.
On Tuesday, I talked with some of the project scientists for a TechTV interview that's running next week on Screen Savers. One of the many things I learned from them was that they upload new software, and patches, and all that stuff with surprising frequency and ease.
The thing that really blew my mind was, in order to make their launch date, they just coded enough commands to get the thing there, and sent all the software to drive around and research stuff after the landing while the spacecraft was in transit.
I really hope they solve this current problem, and get the mission back on track. They are SUPER cool people at JPL who are working on this.
The exception was caught properly; unfortunately, the action on catching the exception was to shut down the system. This made sense when the software was designed, because the signals were impossible for Ariane 4 to produce, and therefore there was obviously something seriously wrong. If you don't know what's wrong, then the best thing to do is get out of the loop and let your backup take over, and if it's something which is local to you, then the flight can continue. If it's the sensors, well, you're already doomed. Ariane 5 is bigger, so it can produce more oomf in the sensors, so the signals were actual instead of bogus, but it still didn't know what to do with them.
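A minimal sketch of that "catch it, then shut down" policy, in Python rather than the Ada the flight software was written in. As far as I recall from the inquiry report, the failing operation was a conversion of a 64-bit floating point value to a 16-bit signed integer; the function names and the sample values below are made up for illustration.

```python
INT16_MIN, INT16_MAX = -32768, 32767


class OperandError(Exception):
    """Raised when a value does not fit the target integer type."""


def to_int16(value: float) -> int:
    """Range-checked conversion: refuse values a 16-bit signed integer cannot hold."""
    result = int(value)
    if not INT16_MIN <= result <= INT16_MAX:
        raise OperandError(f"{value} does not fit in 16 bits")
    return result


def guidance_step(sensor_value: float) -> str:
    """Policy from the comment: any conversion failure means 'I am broken', so shut down."""
    try:
        to_int16(sensor_value)
    except OperandError:
        return "shutdown"        # hand over to the backup, which runs the same code
    return "ok"


print(guidance_step(20000.0))    # a value the old rocket could produce
print(guidance_step(50000.0))    # a legitimate but larger value from the new rocket
```

The handler is doing exactly what it was told to do; the trouble is that the backup unit sees the same value and makes the same decision.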
Pathfinder, in its 1997 landing (04JUL1997), suffered a series of unexplained system failures. David Wilner, CTO of Wind River Systems, the creators of the VxWorks real-time embedded system kernel, talked to the IEEE Real-Time Systems Symposium at a later date, explaining how they solved software bugs in the system.
This article explains how they solved the problem - by including the debug code with the OS. I remember reading about this on /. some time ago. A detailed account can be read here by Glenn Reeves (JPL Mars Flight SE).
Wind River Systems is supplying the OS for the current mission. Let's see how long it takes them to work this one out :)
links:
www.kohala.com/start/papers.others/pathfinder.html
research.microsoft.com/~mbj/Mars_Pathfinder/Authoritative_Account.html
peterrenshaw ~ Another Scrappy Startup
Ten seconds on Google will show you that I write HTML just fine. I didn't notice that my message setting wouldn't handle carriage returns typed into the text gracefully.
--BahdKo
Err, did you read the link?
Priority inversion can be protected against; however, the mutex can be created in one of two modes: priority-inversion safe and non-priority-inversion safe. Unfortunately, they forgot to turn the priority inversion protection on. Programming error, plain and simple.
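For anyone who hasn't run into priority inversion before, here is a toy tick-based scheduler that shows the effect of that one switch. It is a sketch, not VxWorks: the three tasks, their priorities, and their run times are all invented, and "inheritance" here just means the mutex holder temporarily runs at the priority of the highest task waiting on it.

```python
from dataclasses import dataclass


@dataclass(eq=False)
class Task:
    name: str
    priority: int        # bigger number = more important
    arrival: int         # tick at which the task becomes ready
    work: int            # ticks of CPU still needed
    needs_mutex: bool    # True if all of its work happens while holding the mutex
    done_at: int = -1


def simulate(inherit: bool) -> dict:
    """Run the made-up task set and return the tick at which each task finished."""
    tasks = [
        Task("low", 1, 0, 3, True),
        Task("medium", 2, 1, 5, False),
        Task("high", 3, 1, 1, True),
    ]
    owner = None         # task currently holding the mutex
    tick = 0
    while any(t.work > 0 for t in tasks):
        ready = [t for t in tasks if t.arrival <= tick and t.work > 0]
        waiters = [t for t in ready
                   if t.needs_mutex and owner is not None and owner is not t]
        runnable = [t for t in ready if t not in waiters]
        if not runnable:
            tick += 1    # nothing can run this tick
            continue

        def effective(t):
            # With inheritance on, the holder borrows the top waiter's priority.
            if inherit and t is owner and waiters:
                return max(t.priority, max(w.priority for w in waiters))
            return t.priority

        current = max(runnable, key=effective)
        if current.needs_mutex and owner is None:
            owner = current
        current.work -= 1
        tick += 1
        if current.work == 0:
            current.done_at = tick
            if owner is current:
                owner = None   # release the mutex on completion
    return {t.name: t.done_at for t in tasks}


print("inheritance off:", simulate(inherit=False))
print("inheritance on: ", simulate(inherit=True))
```

With protection off, the medium task starves the low task that holds the mutex, so the high task finishes last (tick 9 in this setup). With protection on, the low task gets out of the way quickly and the high task is done by tick 4.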
Choose your allies carefully, it is highly unlikely you will be held accountable for the actions of your enemies
Safe mode is called safe mode because it's not supposed to make anything worse. If it does, someone's got some 'splainin' to do, loocee.
Actually, no, it was a beep five minutes long. The rover's still quite alive, but is unable to send scientific data for an undetermined reason. However, before anyone comes to any conclusions, the beep was in response to a command, not an error, indicating that they can still contact the lander. Now they just gotta figure out what's causing the transmission problem. Full story here.