10 Years In, Mars Rover Opportunity Suffers From Flash Memory Degradation
astroengine writes Mars Exploration Rover Opportunity has been exploring the Martian surface for over a decade — that's an amazing ten years longer than the 3-month primary mission it began in January 2004. But with its great successes, inevitable age-related issues have surfaced and mission engineers are being challenged by an increasingly troubling bout of "amnesia" triggered by the rover's flash memory. "The problems started off fairly benign, but now they've become more serious — much like an illness, the symptoms were mild, but now with the progression of time things have become more serious," Mars Exploration Rover Project Manager John Callas, of NASA's Jet Propulsion Laboratory in Pasadena, Calif., told Discovery News.
Memory bristles
Like Scottish thistles
Make operation tough
Plus the interplanetary stuff
Burma Shave
Get thee glass eyes, and, like a scurvy politician, seem to see things thou dost not.--King Lear
But to claim it under warranty, you have to return it to the manufacturer
It is time to start building out the Martian rover maintenance infrastructure so these guys can be towed in for repairs and upgrades.
the growth in cynicism and rebellion has not been without cause
At least they have identified a fix. But it surely won't be too long before more of the flash memory banks start exhibiting similar behaviour.
Still, 44x longer lifespan than originally planned == win in anyone's books.
Would it have cost much to ship it with a RAID array of flash drives?
If only they had over-engineered it last, this never would have happened!
http://mars.nasa.gov/mer/mission/status_opportunityAll.html
I don't know that one could expect similar behavior from the other banks on a similar schedule. This is fairly old technology in terms of design and software, so I don't think they're doing any sort of automatic wear leveling, for instance. It's probably "manually leveled" if at all. For all we know, bank 7 was used the most and it's worn out. Or, it's taking more total ionizing dose (TID) because of the physical location on the card. Or, it's just a process variation when making the flash chips themselves. They were probably fabricated in 2000, most likely at Micron, since for a 2003 launch, the computer was probably assembled by early 2002, if not earlier.
Or, the software is not optimized for "space flight use" but, rather, for "consumer camera memory card", which has a different read/write/erase pattern and error tolerance.
http://spinroot.com/gerard/pdf/25MC.pdf describes an improved file manager under development, but also describes the existing flash architecture.
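For anyone wondering what "automatic wear leveling" would look like next to the manual approach speculated about above, here is a minimal sketch. It is not the MER flight software: the `Bank` and `WearLeveler` names, the seven-bank layout, and the erase limit are all assumptions made up for illustration. The idea is simply to send each erase to the healthy bank that has seen the fewest erase cycles so far.

```python
from dataclasses import dataclass


@dataclass
class Bank:
    """One flash bank (hypothetical model, not the MER hardware)."""
    index: int
    erase_count: int = 0
    bad: bool = False


class WearLeveler:
    """Send each new erase/write to the healthy bank with the fewest erases."""

    def __init__(self, n_banks: int, erase_limit: int):
        self.banks = [Bank(i) for i in range(n_banks)]
        self.erase_limit = erase_limit  # assumed endurance, illustrative only

    def next_bank(self) -> Bank:
        healthy = [b for b in self.banks if not b.bad]
        if not healthy:
            raise RuntimeError("no healthy flash banks left")
        # min() returns the first match among ties, so ties go to the lowest index.
        return min(healthy, key=lambda b: b.erase_count)

    def erase_and_write(self) -> int:
        bank = self.next_bank()
        bank.erase_count += 1
        if bank.erase_count >= self.erase_limit:
            bank.bad = True  # retire the bank once it hits the assumed limit
        return bank.index


leveler = WearLeveler(n_banks=7, erase_limit=1000)
used = [leveler.erase_and_write() for _ in range(14)]
print(used)  # the writes rotate through all seven banks in turn
```

With this policy no single bank absorbs most of the erases, which is the failure mode guessed at above. It does nothing about radiation dose or process variation.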
If it was long-known that long-duration, low-intensity heat would revive failed flash, why did these rovers leave without the ability to do so?
And why am I not able now to buy flash memory that will heat itself to 800 degrees and heal itself?
And why isn't flash memory sold in ceramic housings that can stand me baking them in an oven for a few days to fix failed flash manually?
I'd like to buy hardware that works, or that can be repaired. That's not flash.
if the issue turned out to be mould.
Or at least the failing flash isn't the reason the problem is serious. Software bugs involving how the failed flash is handled are the problems, causing infinite loops and automatic reboots.
Let's see, a Slashdot post from two years ago (about 9 years AFTER the rovers were launched) cited this: "It's still a long way from commercialization, but if it works on a small scale"
from the post that that post cites:
"The entire memory chip would need heating for hours at around 250 °C"
And the technique was proposed in a paper being presented at an IEEE conference in December 2012, where they didn't even have an actual chip designed using your 800C technique, just speculation that it might be possible. In fact, the IEEE Spectrum article quotes Macronix's guy:
"Lue says Macronix intends to capitalize on the self-healing flash breakthrough, but he would not give details about how and when. He was more forthcoming about when the flash industry should have worked in this technology. “It took a leap of imagination to jump into a completely different regime - very high temperature and in a very short time,” says Lue. “Afterward, we realized that there was no new physics principle invented here, and we could have done this 10 years ago.”"
Let us count the reasons why we might not want this on a rover being designed and built in 2000-2002
"intends to capitalize, no details on how and when" (in 2012)
"could have done this 10 years ago" (so, the *idea* was possible in 2002, about the time the MER rovers were being shipped to the Cape for launch)
Yeah, I think they should have used that in the rovers built in early 2000s..
You *do* know that spacecraft designers are conservative when it comes to new technology? MER carries a RAD6000 VME card CPU, which is a radiation-hardened IBM POWER-architecture processor, a relative of the PowerPC in the early TiVo boxes.
The memory, as little as it is, on the Voyager spacecraft must be of a different sort. Launched in the late 1970s, the electronics are still functioning, although with a few issues. That'll soon be four times longer than the rover.
That, I tell friends, is why I'm happy to drive a 30+ year old car. It has issues, but the hardware it's built from is inherently more long-lived than that in today's cars. A crank-up window just keeps working. One driven by an electric motor doesn't.
So, does that mean that NASA needs to go back to the plated wire memory and tape systems like the Honeywell systems that ran the Viking and Voyager systems for decades on Mars and in space?
There is another probe in orbit around Mars. Can't Opportunity transmit its data in a live stream to that, so the orbiter can use its flash to store and later re-transmit the data?
This is a clear case of not running TRIM.
I propose a space probe called "Operator".
The craft will be a communications relay. Its function will be to receive and transmit between Earth and other probes that may have previously lost long-range communications abilities. This may enable us to reestablish communications with those probes, and it would serve as a backup communications point for probes sent in the future.
They pass over the rover twice a day, and yes, they DO use them to send the data from the rover to the spacecraft. Virtually all science data from Mars comes via MEX or MRO relays these days. The direct-to-Earth (DTE) link at X-band is used only for commanding the rovers and getting a small amount of telemetry. It only runs at 8 kbps, while you can get Mbps through the UHF relay links (and the orbiter DTE link).
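To put those rates in perspective, here is a rough back-of-the-envelope calculation. The 8 kbps figure is the one quoted above; the 1 Mbps relay rate and the 100-megabit data volume are assumed round numbers picked for the example, not mission figures.

```python
def transfer_seconds(data_bits: float, rate_bps: float) -> float:
    """Time to move data_bits over a link running at rate_bps, ignoring overhead."""
    return data_bits / rate_bps


DATA_BITS = 100e6      # assumed: 100 megabits of science data
DTE_RATE_BPS = 8e3     # direct-to-Earth X-band rate quoted above
RELAY_RATE_BPS = 1e6   # assumed: 1 Mbps through the UHF relay

dte_hours = transfer_seconds(DATA_BITS, DTE_RATE_BPS) / 3600
relay_minutes = transfer_seconds(DATA_BITS, RELAY_RATE_BPS) / 60

print(f"DTE:   {dte_hours:.1f} hours")
print(f"Relay: {relay_minutes:.1f} minutes")
```

Under these assumptions the relay moves the same data 125 times faster, which is why the DTE link is reserved for commanding and a trickle of telemetry.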
The problem is that the software in the rover writes the data to flash, and the "sending of the data" to the relay comes out of flash as it streams to the radio. A flash error causes a system reset, which has the side effect of clearing volatile RAM, even though the power to the RAM isn't lost, so it *could* be preserved.
This can be changed, and that is in progress. The various references say a new software upload is due in January, probably waiting for folks to come back from the holidays before they beat on the testbed.
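As a rough illustration of the kind of change described above, here is a sketch in which a flash write error falls back to a RAM buffer instead of forcing a reset. It is a toy model: `FlashError`, `FlakyFlash`, and `Recorder` are hypothetical names invented for this example, and nothing here reflects how the actual flight software is structured.

```python
class FlashError(Exception):
    """Raised when a write to a flash bank fails."""


class FlakyFlash:
    """Toy flash device in which some banks always fail (hypothetical)."""

    def __init__(self, n_banks: int, bad_banks: set):
        self.n_banks = n_banks
        self.bad_banks = bad_banks
        self.stored = {i: [] for i in range(n_banks)}

    def write(self, bank: int, record: bytes) -> None:
        if bank in self.bad_banks:
            raise FlashError(f"write failed on bank {bank}")
        self.stored[bank].append(record)


class Recorder:
    """Store records in flash, falling back to RAM rather than resetting."""

    def __init__(self, flash: FlakyFlash):
        self.flash = flash
        self.masked = set()    # banks we have stopped using
        self.ram_buffer = []   # volatile: lost if power is cut

    def store(self, record: bytes) -> str:
        # Try each unmasked bank at most once, so a bad bank cannot
        # trap us in an endless retry loop.
        for bank in range(self.flash.n_banks):
            if bank in self.masked:
                continue
            try:
                self.flash.write(bank, record)
                return f"flash:{bank}"
            except FlashError:
                self.masked.add(bank)
        # No usable flash: keep the record in RAM instead of rebooting.
        self.ram_buffer.append(record)
        return "ram"


recorder = Recorder(FlakyFlash(n_banks=2, bad_banks={0}))
print(recorder.store(b"image-1"))  # bank 0 fails and is masked, bank 1 accepts
print(recorder.store(b"image-2"))  # bank 0 is skipped on later writes
```

The trade-off is that anything left in the RAM buffer has to be relayed before the next power cycle, so this buys continued operation rather than durable storage.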
No, the article says that you either need low-intensity, long duration heat (which has apparently long been known), or high-intensity, short-duration:
We are still buying flash that we can't fix because of the packaging. We're still shipping this unfixable flash in mission-critical applications. When does it get fixed?
Let me guess: they used OCZ flash memory?
The Christian Right is Neither (Christian nor right). See: Matthew 23, Matthew 25, Ezekiel 16:48-50
Dave, my mind is going. I can feel it. I can feel it. My mind is going. There is no question about it. I can feel it. I can feel it. I can feel it. I'm a... fraid.
"Win treats sysadmins better than users. Mac treats users better than sysadmins. Linux treats everyone like sysadmins."
So, Asimov was actually right with the artificial brain degradation :)
12 years ago there were no smartphones, no tablets, and no flash laptops, due to the expense of flash. Even iPods had micro-disks. There were just a few cameras and MP3 players with very limited memory. Those devices, or at least their chips, were upgraded long ago.
Happens to us all, Opportunity.
Now I'll just fire up my Steampunk Mars Exploratron and off we go!
-- Tigger warning: This post may contain tiggers! --
NO ROM BASIC
SYSTEM HALTED
Sanity Failure
**** NASA BASIC ****
1024000 BYTES FREE
FATAL: Internal Stack Failure, System Halted
RXNT@A\DRRNR
If good science is still available after a decade (Opportunity) or many decades (Voyager), then at least light components like flash, and electronics in general, should be designed with a good degree of redundancy. Otherwise, if the probe has a limited mission and has accomplished it, there is nothing wrong with abandoning it and focusing money and talent on new missions. Would the engineers working on attempts to fix Opportunity be more useful working on the newer Curiosity mission? My gut feeling is that making existing missions last longer is much more cost-effective than launching new ones. But I am not a space scientist. The point is that mission planning should have a clear focus one way or the other.
This is an interesting event. Failure of the flash memory can only really be overcome by either replacing it or having a secondary flash that's on standby, syncing up periodically so that it has much less wear on it, so you can extend the mission by switching over to the backup/secondary flash memory. However, this would add precious ounces to the payload, thereby requiring more fuel, etc.
Awk! Pieces of eight. Pieces of eight. Pieces of seven... ERROR: General Protection Fault. [Paroty Error.]
you just feed data through the watch hand using gravity to transmit morse code to someone whom you have a close emotional bond with. no problem
Hah, FlashZheimer :D
But then again, who does?
This makes you realize what an amazing piece of engineering this hardware is.
Just try to remember what smartphone you were using 12 years ago. What game console did you have? What were the specs of your main computer? Few things we have HERE last that long.