Spirit Sends Debug Information to Earth
gfilion writes "NASA has released a press release that says: 'Shortly before noon, controllers were surprised to receive a relay of data from Spirit via the Mars Odyssey orbiter. Spirit sent 73 megabits at a rate of 128 kilobits per second.'" They've been having communications troubles with Spirit since Wednesday, so it's good to hear from it again, even if the data is just filler.
128 kbit/sec! Quite a bit up from the earlier 100 bit/sec. Too bad Mars is too far from the next CO to qualify for DSL.
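For a sense of scale, here is a rough back-of-the-envelope sketch of how long the 73 megabits would take at each rate. It assumes decimal prefixes (1 megabit = 1,000,000 bits) and ignores framing overhead and relay delays; the function name is just for illustration.

```python
def transfer_seconds(total_bits: float, rate_bits_per_s: float) -> float:
    """Time to move total_bits over a link running at rate_bits_per_s."""
    return total_bits / rate_bits_per_s

MEGABIT = 1_000_000  # assumption: decimal prefixes
KILOBIT = 1_000

fast = transfer_seconds(73 * MEGABIT, 128 * KILOBIT)  # the relayed session
slow = transfer_seconds(73 * MEGABIT, 100)            # the earlier 100 bit/sec rate

print(f"{fast / 60:.1f} minutes at 128 kbit/s")
print(f"{slow / 86400:.1f} days at 100 bit/s")
```

So the whole relay fits in under ten minutes, where the same data at the low rate would take more than a week.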
(first post?)
---- join dshield.org Distributed Intrusion Detec
A diagnostic is what runs when nothing else will.
...but the ping times suck. Can you imagine playing Quake over that kind of link?
Honey, I shrunk the Cygwin
Spirit sent 73 megabits at a rate of 128 kilobits per second.
:)
Pretty damn scary that that's faster than most pr0n downloads via Kazaa...
I watched the press conference on NASA-TV and they talked about how the thing wouldn't go to sleep at night and so it got me to wondering about the low power question. Obviously they have the rover power off when power gets to a certain level, but what if that level is slightly off?
In other words, if the onboard CPU has enough power and continues to run but the memory doesn't have enough power, doesn't that cause all kinds of wackiness?
They keep talking about the data pointing to simultaneous faults... well, as programmers we know these are the very worst kinds of bugs to deal with, but with something as (I'm assuming) well written as their code, doesn't that point to a memory problem? I mean, the thing is working flat-out beautifully one moment, and then the next moment it goes tits up.
The other question I had concerned this motor they had turned on but which didn't complete its sequence. When they command the motor to do something, do they tell it to run for some interval of time, or do they tell it to achieve a specific position? I was thinking that if it's the latter, and then if it gets stuck somehow, this could create the low power situation as the motor just grinds away.
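To make the "run for a time" versus "reach a position" distinction concrete, here is a minimal sketch of a position command guarded by a step budget, so that a jammed motor cannot grind away indefinitely. Everything here (`move_to`, `FakeMotor`, the step counts) is hypothetical and says nothing about how the rover's motor control is actually written.

```python
def move_to(target, read_position, step_motor, max_steps):
    """Step toward target; give up after max_steps so a jam cannot drain power."""
    for steps in range(max_steps):
        if read_position() >= target:
            return True, steps
        step_motor()
    return read_position() >= target, max_steps


class FakeMotor:
    """Toy motor that stops advancing once it reaches jam_at (if set)."""

    def __init__(self, jam_at=None):
        self.position = 0
        self.jam_at = jam_at
        self.energy_used = 0  # one unit per commanded step, moving or not

    def step(self):
        self.energy_used += 1
        if self.jam_at is None or self.position < self.jam_at:
            self.position += 1

    def read(self):
        return self.position


healthy = FakeMotor()
ok_healthy, steps_healthy = move_to(10, healthy.read, healthy.step, max_steps=50)

jammed = FakeMotor(jam_at=4)
ok_jammed, steps_jammed = move_to(10, jammed.read, jammed.step, max_steps=50)
```

Without the `max_steps` guard, the jammed case would loop forever, which is the power-drain scenario described above.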
Is this truly the only Earth I can live on?
CNN is reporting that Spirit is rebooting itself 60 times a day. NASA suspects a hardware fault that is causing the processor to detect trouble and automatically reboot.
Two wrongs don't make a right, but three lefts do.
CNN has an article with some updates. Apparently the engineers have been having all sorts of fun with the thing. Here's a quick excerpt: "Cautioning that they will need more time to understand what went wrong, project engineers said they have determined that Spirit has rebooted or tried to reboot itself more than 60 times a day since the failure."
30% Troll, 50% Underrated, 10% Interesting
Score:5, Troll
Only a couple of frames were fillers of random values. Most of the frames were engineering data. No actual scientific data came down, though.
Still, it's a good sign that it's still able to talk.
You might want to check your facts before you spew. While the ground system is heavy on Linux according to the article you referenced, the actual OS on the rover itself is VxWorks from Wind River.
http://www.windriver.com/news/press/20040105.html
This space for rent.
Spirit Sends Debug Information to Earth
A Fatal Exception 0E has occurred at 0028:C0231810 in VXD VMM(0D) + 00001810
Cool!
I've noticed that everyone who is for abortion has already been born - Ronald Reagan
128 kbps over 35 million miles... looks like we'll need another benchmark to replace the station wagon full of DAT tapes
one better than mcleodeight
Did the "filler data" look anything like this?
The Slashdot Paradox: "100% Overrated"
Doesn't Spirit's twin, Opportunity, start its landing tomorrow?
It's probably some bizarre licensing issue for the OS causing it to shut down as it's detected that NASA are trying to run two copies at the same time.
Kind of like Beagle 2's problems caused by the transmissions being intercepted by the RIAA as they file a lawsuit against Colin Pillinger for offering illegal music downloads from Mars.
Fortunately, the cause of the blackout has been located and will be corrected soon.
Pesky Martians! :-)
-------
Warning: Slashdot may contain traces of nuts.
It appears that while editing the crontab of the rover to send spam, the script-kiddie accidentally added a shutdown -r 24m . "Having the rover send spam was a great idea! When people ping the X-Originating IP, they'll surely timeout!!"
Has anyone cracked this yet?
-bk.
"I can't swim.. I CaN'T sWim ... I cannot swim... I can't swim.. I can't swim.. I can't swim.. I can't swim.. I can't swim.. I can't swim.. sdf@#$@#$@#$
This is my sig.
rover: 128kbps
most mp3's: 128kbps
COINCIDENCE?
i think not.
...European, constantly rebooting, battery draining overlords. Now we know Beagle 2 was not lost but was in transit to Gusev crater. It took a little time to silently creep up behind spirit. If we had a high-enough resolution camera we would see that damn dog continuously poking at the rover, pressing our reset button.
Cheers to the European engineers who caught us with our pants down and jeers to the American engineers who thought our little rover needed an external reset button for some reason.
Some of us engineers work with RTOSes all the time, not just for fun-and-dandy projects, but for multi-million dollar outcomes. Consensus is that Linux is not good enough. QNX, VRTX, VxWorks etc. are still the preferred choices, but everyone admits that Linux is getting there. Most of us don't hang out on Slashdot, yet many Linux zealots do: you don't get a good opinion here.
Extreme Remote Debugging
Sigs are bad for your health.
You remember Apollo 13, the one with Tom Hanks? Where the astronauts believe that their transmission is being watched by viewers on Earth, but in fact all the TV networks refused the transmission, stating that NASA made flights to the Moon as exciting as trips to Pittsburgh (or something of this kind)?
This is what is happening, people: the new thing in reality TV - our own Mars Rover - The Ultimate Survivor. Opportunity will be landing today, so the audience should be able to vote for which rover is going to be kicked off the show.
The Drama, The Excitement, The Unknown, The Sex... oh, wait!
You can't handle the truth.
NASA systems that involve human life are highly redundant. I remember a lecture by a NASA engineer about systems on the Shuttle. There are *seven* redundant computers which calculate data. An answer is accepted only when four of them agree on it.
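The voting idea can be sketched in a few lines. This is only an illustration of quorum voting using the numbers recalled above (seven computers, four matching answers); the function name and inputs are made up, and it is not a description of the actual Shuttle flight software.

```python
from collections import Counter

def vote(answers, quorum):
    """Return the most common answer if at least `quorum` computers agree, else None."""
    if not answers:
        return None
    value, count = Counter(answers).most_common(1)[0]
    return value if count >= quorum else None

# Seven hypothetical computers, two of them producing bad results.
accepted = vote([42, 42, 42, 42, 41, 40, 42], quorum=4)

# No value reaches four matching answers, so nothing is accepted.
rejected = vote([1, 2, 3, 1, 2, 3, 4], quorum=4)
```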
:-)
On Spirit, power is an issue. More CPUs == more power drain.
Furthermore, I remember the folks initially speculating that something was wrong with the power system. I stopped following it, but it said that this transmission was composed of power subsystem diagnostic data. Could be it's a response requested earlier that it didn't have enough juice to send, in which case more CPUs would have actually exacerbated the problem.
May we never see th
Well, you know, what's interesting about that is:
1. you'd have to increase the complexity of the device even more, exposing it to a higher risk of failure statistically
2. you'd need more complicated software and hardware that would require more time and effort (money & delays)
3. the hardware would need more power (limited batteries and solar panel capacity)
4. the system would be heavier and bigger (costs are measured in grams, iirc).
While you have a valid point, the constraints of this design force very strong tradeoffs among safety, feasibility, and cash flow (and I'm sure there are others, but I'm not a rocket scientist). I'd imagine that some time was spent on redundant systems, but the adage of "Why have one when you can have two at twice the price?" only works when your budget can support the extra price of man-hours and cash.
I'd argue that where you work has unlimited available power, and if you need more, you can ask your power company for more. You have the money to spend on an X-thousand-dollar server that's been pre-fabbed by whatever company you like. If you need more, you get more drop-shipped to you within days. NASA had to build these little buggers from the ground up.
<RANT>
You know, if you take your philosophy of simply duplicating the entire machine, there is a backup. It's called "Opportunity." It lands tomorrow.
I highly resent the fact that you've called some of the greatest engineers of our time "retarded." If you can't understand the problem (I certainly don't, but I do understand the concept of tradeoffs in design) you have no right to speak on the issue. Of course, this is slashdot. Everyone can mouth off about everything. Nevermind.
</RANT>
~MCH.
Michael C. Hollinger
One line blog. I hear that they're called Twitters now.
Fill data is typically transmitted when the telemetry multiplexer does not have any engineering or science data to send. Due to the way synchronous communications links work, something is always being transmitted, even if there is no "real data" available.
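A toy model of that behaviour, for anyone who wants to see it spelled out: the link asks for a frame on every slot, and the multiplexer hands back a fill frame whenever nothing is queued. The class name, frame size, and fill pattern are all invented for the example and are not taken from any real spacecraft telemetry format.

```python
from collections import deque

FRAME_SIZE = 8
FILL_FRAME = b"\x55" * FRAME_SIZE  # hypothetical fill pattern

class TelemetryMux:
    """Always has something to send: real frames first, fill otherwise."""

    def __init__(self):
        self.queue = deque()

    def submit(self, frame: bytes) -> None:
        self.queue.append(frame)

    def next_frame(self) -> bytes:
        # The synchronous link never idles, so an empty queue yields fill data.
        return self.queue.popleft() if self.queue else FILL_FRAME

mux = TelemetryMux()
mux.submit(b"ENGDATA1")
sent = [mux.next_frame() for _ in range(3)]
```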
Mea navis aericumbens anguillis abundat
I wouldn't brag. I've been programming VxWorks for several years now and all I can say is it's a piece of crap for a complex system.
VxWorks does not provide any memory protection (well, AE does, but it's so buggy nobody uses it).
If a task dies, it does not clean up after it. All memory is global, i.e. any task can overwrite memory for any other task.
Wind River couldn't even implement a decent malloc implementation. I had to replace it with Doug Lea's DLMalloc code (which glibc's malloc is based off of). It fragments horribly, and becomes increasingly slower the more free blocks exist.
Just by replacing malloc, I brought the time down on our box from 50 minutes to under 3 minutes and went from tens of thousands of fragments to a couple of dozen.
If you want a reliable embedded system with a lot of complexity, go with QNX or perhaps a good embedded Linux (I like Timesys Linux myself - good realtime support).
At least with QNX if there's a problem in a task, it's much easier to isolate it and not kill the entire system. As it is on the product I'm working on, if a task dies about the only way to recover is to reboot. Also, VxWorks has piss-poor built-in debugging support. Sometimes you can get a stack trace. Tracing the heap is virtually impossible (and because it's a global memory pool, you don't even know what blocks were allocated by what task or even how much memory each task has allocated). In the product I'm working on I added such support to find memory leaks and detect memory corruption.
VxWorks AE does provide memory protection. We tried to use it, but it was so buggy and slow we had to drop it and go back to standard VxWorks.
VxWorks hasn't really changed in the last few years and Wind River is losing customers like crazy to the better alternatives. They're hemorrhaging money at an astronomical rate and quickly losing market share to the likes of QNX and Linux.
Even the realtime performance of VxWorks isn't that great. The finest granularity for a reliable timer is 1/2 the system tick rate (often no more than 20ms resolution).
VxWorks doesn't have a shell as such either. The commands you type in are functions with parameters to those functions. You can do things like my_global = global_a + 7
or
my_func(&my_global, 3)
on the command line, but it's not at all like a traditional command line.
Most real-time Linux implementations aren't all that great either, from my research into it. Most don't deal with priority inversion, or require a completely separate set of APIs for RT tasks (e.g. RT Linux). I found Timesys Linux to solve most of these issues and it looks like our next generation will be based on either Timesys Linux or QNX.
-Aaron
This post is encrypted twice with ROT-13. Documenting or attempting to crack this encryption is illegal.
If there is a COTS (commercial off-the-shelf) real-time operating system available that meets the system requirements, why go to the risk and expense of writing your own from scratch? Do you expect NASA to fabricate every component in the spacecraft?
Mea navis aericumbens anguillis abundat
Is it quite possible that NASA engineers simply have not mastered the art and science of designing hardware and software operable in the harshest of environments?
While I would never claim that NASA is perfect, I think you underestimate both the engineering challenge of putting a rover on Mars and the impact of more conservative, get-it-right policies.
Interplanetary missions are the hardest of all because the engineers never get to actually test the whole device under realistic conditions. Although they can test and analyze each subsystem under a variety of simulated or near-realistic conditions, they have no way of building a test rover, putting it in interplanetary space for months, having it aerobrake into a thin atmosphere, parachute through that thin atmosphere, crash-land at high speed, and then operate all its mechanical parts under dusty, low-G conditions.
Second, get-it-right == conservatism == greater cost == fewer missions == less experience. The last thing NASA should do is spend more money, take more time, and do fewer missions. The only way we will really learn how to operate in space is to go into space. I'm not saying that better engineering won't help, only that more experience (unfettered by excessive conservatism) is a crucial part of learning to operate on other planets.
Two wrongs don't make a right, but three lefts do.
To have some actual technical discussion on a site that is supposed to be filled with nerds, instead of the same tired jokes about martians.
The more you know, the less you understand.
Since Spirit is rebooting sixty times per day, a problem that started when an electric motor moving its spectrometer "conked out", one thinks first of a hardware failure, possibly leading to software corruption.
I don't know the boot sequence of Spirit, but in most battery-powered embedded systems with which I am familiar, an elaborate state machine design is made to ensure that, when the boot sequence is complete, the system has sufficient power to perform any task that may be requested of it. Since the power supply is limited, an unexpectedly heavy load on the primary supply could cause the supply voltage to the microcomputer to fall below its specified lower limit, leading to a system reset.
Now imagine that there is a hardware failure associated with some process that runs during the boot sequence--a voltage regulator turn-on, a heating system initialization, an electric motor activation, whatever--that results in excessive current drain. When this part of the boot sequence is reached, the supply voltage falls, and the microcomputer resets. This disables the problem-causing hardware, unloading the power supply. When the supply voltage recovers, the microcomputer reboots (either automatically, with a power-on reset, via a watchdog timer, or via some other means) and, when the critical part of the boot sequence is reached, the supply voltage falls again. The system is now in a continuous loop, in which it can remain indefinitely. (Or at least 60 times per day....)
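Here is a small simulation of that loop, as a sketch only. The stage names, currents, and supply limit are invented numbers chosen to show the behaviour; nothing here reflects Spirit's real boot sequence or power budget.

```python
def boot_once(stages, limit_amps):
    """Switch stages on in order; return the stage that browns out, or None."""
    total = 0.0
    for name, amps in stages:
        total += amps
        if total > limit_amps:
            return name  # supply sags here and the processor resets
    return None

def run(stages, limit_amps, max_attempts):
    """Retry booting after each reset, as a watchdog or power-on reset would."""
    resets = []
    for _ in range(max_attempts):
        culprit = boot_once(stages, limit_amps)
        if culprit is None:
            return True, resets
        resets.append(culprit)  # reset unloads the supply, then we try again
    return False, resets

# Hypothetical loads in amps; the faulty case has a motor drawing far too much.
healthy = [("cpu", 0.5), ("regulator", 0.3), ("motor", 0.4)]
faulty = [("cpu", 0.5), ("regulator", 0.3), ("motor", 3.0)]

booted_ok, resets_ok = run(healthy, limit_amps=2.0, max_attempts=5)
booted_bad, resets_bad = run(faulty, limit_amps=2.0, max_attempts=5)
```

The faulty case fails at the same stage on every attempt, which is what makes the reset history itself a useful diagnostic.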
Note that this situation can also arise due to a defect in the power supply--if the output impedance of the power supply has risen for some reason, its output voltage under lightly loaded conditions can be acceptable, but it may not be able to supply heavier loads.
One expects the Spirit power supply to be complex, with separate regulators for the microcomputer, radio transceiver, and electric motors, so looking for common circuits and systems would be the first thing to do when troubleshooting for this type of failure. Looking for system conditions that can cause a system reset would be another; the JPL people have lived with their systems for years now, and would have had many design reviews to identify possible system failure scenarios--I'm not telling them anything new here. I understand that the system telemetry received yesterday indicates that the power supply is within specification, so that seems to eliminate that possibility.
The second alternative is a soft memory failure of some kind, either caused by a supply failure as the parent suggests or perhaps by a radiation event of some kind.
Note that these problems can be multi-disciplinary; for example, the problem could be caused by some vibration when a motor runs that loosens a broken connection created by a chemical reaction to something on the surface (to take an extreme example).
How about red screen of death jokes?
Is there any reason the code, schematics and CAD designs aren't available for public viewing? It's a publicly funded project, and I don't think JPL has to worry about trade secrets.
If JPL would give us more information, I bet they'd have 50% of the entire engineering brainpower on the planet checking for races, inversions, memory leaks, hardware design flaws, etc.
If there was ever a project that could benefit from so many eyeballs, it's space exploration. There are thousands of some of the most talented engineers on the planet who would jump at the chance to contribute to something like this.
http://www.masturbateforpeace.com/
On Earth at least, picking bits off of radio links usually involves an adaptive threshold and a clock that syncs to the clock of the sender. Sending too many 1's or 0's in a row can interfere with that because there aren't any "bit edges" on the signal. Sending random data ensures all patterns are equally likely and your adaptive filter stays happy for when you have real data to send. Otherwise you'll miss the first part while you re-establish the threshold and sync to the signal.
My guess is the NASA rover's link follows a similar principle, though it's probably using some pretty damn fancy techniques to get the data from that far. Oh, and missing the first part of the data would really suck for them since a retransmit would take 20 minutes.
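The point about long runs of identical bits can be shown with a toy additive scrambler: XOR the data with a pseudo-random bit stream from a small shift register, and even an all-zeros payload goes out with plenty of transitions. This is a generic sketch with an arbitrarily chosen 7-bit register and seed; I have no idea what coding the rover's link actually uses.

```python
def lfsr_stream(n, seed=0x7F):
    """n pseudo-random bits from a 7-bit shift register (feedback from bits 6 and 5)."""
    state = seed & 0x7F
    bits = []
    for _ in range(n):
        fb = ((state >> 6) ^ (state >> 5)) & 1
        state = ((state << 1) | fb) & 0x7F
        bits.append(fb)
    return bits

def scramble(bits, seed=0x7F):
    """XOR with the LFSR stream; applying it twice restores the original."""
    return [b ^ k for b, k in zip(bits, lfsr_stream(len(bits), seed))]

def longest_run(bits):
    """Length of the longest stretch of identical consecutive bits."""
    best = run = 0
    prev = None
    for b in bits:
        run = run + 1 if b == prev else 1
        best = max(best, run)
        prev = b
    return best

data = [0] * 64                 # worst case: no bit edges at all
on_air = scramble(data)         # what the receiver's clock recovery sees
recovered = scramble(on_air)    # same operation undoes the scrambling
```

The receiver needs the same seed and has to stay in step with the sender, which is one more reason to keep the link busy with fill data rather than letting it go quiet.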
You know, it occurs to me that maybe instead of having an interactive rover with a billion complicated subsystems and spectrometers and cameras... it might be a good idea to launch a package full of smaller autonomous devices carrying different instrumentation... So you'd have a base that lands on Mars, opens up (like the rover bases do) and releases 20 or 30 "dumb robots" on treads or big balloon tires (I'm thinking each the size of a big R/C car), some of which would have cameras, the rest instrumentation of whatever sort... All of the little slaves would move around randomly or according to some simple program (either mechanical or software) and relay collected information to the base, which would transmit it to Earth... Some of the camera bots would be designed to just move as far as possible and take as many pictures as possible... others would just do instrumental analyses of whatever they happen to bump into or land on... You wouldn't know exactly what the instruments were looking at but you'd probably be able to collect a sizable amount of data on a particular landing region; know what minerals are present, etc. You wouldn't know that pyramid-shaped rock 12B contains olivine but you'd know olivine was present.