Spirit 'Will Be Perfect Again'
G. Holst writes "NASA technicians are preparing to wipe Spirit's flash memory clean of science and engineering files that have stymied its software. The fix, likely to be made Friday, could completely restore Spirit. "I think it will be perfect again," says the Mission Manager. Chalk this one up for Earth!" There are numerous stories about Spirit and Mars: one describes being careful with rm -rf. Reader Tablizer sends in an interesting site: "I discovered Bill Momsen's website where he describes his experiences working on the first successful photographic mission to another planet: Mariner IV to Mars."
One has to wonder: is Opportunity going to run into the same problems as Spirit? They are "identical" robots, after all. Have steps been put in place to prevent the second rover from "getting full"? I certainly hope we don't let this happen again, as they might not be as lucky in recovering it.
Apparently it was simply too many files, and the FS ran out of inodes. Remember that they're constrained to a 256MB file system. It wouldn't surprise me if they used an 8-bit or 16-bit number for the inode count. (Ah, the joys of Vx(Doesn't)Works.)
On another note, does anyone know exactly what they're deleting here? While I understand that they need to get this mission underway, is there a chance they could lose valuable mission or navigational information?
Yes, actually it seems to be a filesystem bug... I mean, every OS has a reasonably stable filesystem, so I am really surprised they messed this up! I wouldn't mind if it was an obscure kernel race condition or something, but a filesystem!!!
Even if the memory handling is shitty, I wonder how it could have caused so much havoc. How could it have sent Spirit into the reset loop? It seems like some bad error handling code was also in play here (just guessing; the details aren't public, to my knowledge).
Another thing that surprised me: had the flash been broken, all data would have had to be uploaded before the rover went to sleep. Every modern PC can continue to refresh its DRAM while sleeping. Why can't Spirit? Maybe a feature to consider on future missions?
"It's too bad that stupidity isn't painful." - Anton LaVey
The same reason why your hard drive is cluttered with old unused files.
Why delete, when you still have room on the flash and you *just* might need that file later...
Of course, they then found out that their filesystem handler borks out well before the flash is actually filled up, and that almost brought the whole show to an end... Software QA testing failure in my book, but they seem to be recovering from the fumble pretty well...
OK, OK, OK... chill out, everyone... ;) In fact, the whole system that they are using on the rover has flown quite a few times (VxWorks running on rad-hardened PowerPCs with a VME bus for its backbone).
VxWorks is not that bad (I use it on almost a daily basis). Every single OS has its problems. Before we all go and start calling VxWorks or Spirit's software a crappy piece of code, you have to understand what goes into writing space-qualified software.
This is not something you hack together over the weekend. In fact, something you wrote for a space system over the weekend would be tested over a period of months and possibly even years, depending on the criticality of the code. We're talking life-critical system testing here. That means testing all code paths, for you code heads out there.
That said, even when the rubber hits the road, there are always unexpected situations: something that you didn't anticipate, a bug that made its way through under circumstance x. Hands up, everyone here who has written a complex bug-free system right out of the gates. Anyone who just lifted their hand does not understand what a complex system is, or what a bug is. Though stuff that flies tends to be pretty darn close to bug-free.
We are dealing with many complex unknowns when we land something on another planet.
VxWorks is actually very popular with the space program. It's not perfect but neither is Linux (though someday it will be right
Trust me, the software running on the rover is not crappy. The fact that they can bring it back to life like they did says a lot.
To answer your question, there was probably a watchdog timer that caused it to go into a reset loop.
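To make the watchdog guess concrete, here is a toy sketch in Python. Nothing in it comes from the rover's actual software: the class, the function names, and the tick counts are all made up for illustration. It assumes startup does some work per stored file without kicking the watchdog, so once there are enough files, every boot times out and the machine resets again.

```python
from dataclasses import dataclass


@dataclass
class Watchdog:
    """Hypothetical countdown timer that forces a reset if never kicked."""
    timeout_ticks: int
    remaining: int = 0

    def __post_init__(self):
        self.remaining = self.timeout_ticks

    def kick(self):
        # Healthy software calls this regularly to restart the countdown.
        self.remaining = self.timeout_ticks

    def tick(self):
        # Advance one tick; return True once the countdown has expired.
        self.remaining -= 1
        return self.remaining <= 0


def boot_once(file_count, ticks_per_file, timeout_ticks):
    """Simulate one boot; return True if startup finishes before a reset."""
    dog = Watchdog(timeout_ticks)
    # Assumed startup step: touch every stored file without kicking the dog.
    for _ in range(file_count * ticks_per_file):
        if dog.tick():
            return False  # watchdog fired: forced reset
    dog.kick()
    return True


def count_resets(file_count, ticks_per_file=1, timeout_ticks=100, max_boots=5):
    """Count forced resets before a boot succeeds, giving up after max_boots."""
    resets = 0
    for _ in range(max_boots):
        if boot_once(file_count, ticks_per_file, timeout_ticks):
            return resets
        resets += 1
    return resets
```

In this model the files survive each reset, so `count_resets(500)` fails on every boot, while `count_resets(0)` (flash wiped) boots cleanly on the first try.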
Yes, modern PCs have all of these whiz-bang features, but let me ask you this... would you want to be on an airplane whose fly-by-wire system was controlled by your PC? No, probably not.
Systems that fly and are life-critical (yes, there is no one on it, but space systems are held to that standard) cannot have a bunch of whiz-bang features on board. The more you add, the more potential for failures. So you try to mitigate your risks as much as possible. You can't go out there to simply tweak the chip that failed because it got zapped by radiation as it was heading over to Mars.
Which is a reminder to always test the boundary conditions, no matter how ridiculous they may seem. If it is possible to have that many files, then the regression test scripts should generate that many files during testing.
At least it's fixable.
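A rough sketch of what such a boundary test might look like, in Python on an ordinary desktop filesystem. The function names are made up, and `count_entries` is only a stand-in for whatever code is really under test; the point is just that the script creates exactly as many files as the limit you care about.

```python
import os
import tempfile


def count_entries(directory):
    """Stand-in for the code under test: it only counts directory entries."""
    return sum(1 for _ in os.scandir(directory))


def run_file_count_boundary_test(max_files):
    """Create exactly max_files empty files and check they are all seen."""
    with tempfile.TemporaryDirectory() as tmp:
        for i in range(max_files):
            # Zero-padded names keep the files distinct and easy to inspect.
            with open(os.path.join(tmp, f"f{i:06d}.dat"), "w"):
                pass
        return count_entries(tmp) == max_files
```

Calling `run_file_count_boundary_test(1000)` is cheap on a PC; on the real target you would raise the count to the largest number of files the system could ever accumulate, plus one.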
You can only drink 30 or 40 glasses of beer a day, no matter how rich you are.
-- Colonel Adolphus Busch
The memory is not faulty. It is a bug in the filesystem software. The memory isn't full, but there are more files than the rover can handle. They were basically letting everything pile up, so the rover had eighteen days' worth of files (and pre-landing files on top of that). With the other rover, they are deleting the files after they are received on Earth.
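Here is a toy model in Python of how that can happen: a store with plenty of free bytes that still refuses new files because its file table is full. The class, the exception, and the 1000-file limit are invented for illustration; only the 256MB figure comes from this thread, and none of it reflects how the rover's filesystem is really written.

```python
class FileTableFull(Exception):
    """Raised when the file table has no free slots, whatever the free bytes."""


class ToyFlashFS:
    """Hypothetical store with separate limits on bytes and on file count."""

    def __init__(self, capacity_bytes, max_files):
        self.capacity_bytes = capacity_bytes
        self.max_files = max_files
        self.files = {}  # file name -> size in bytes

    def used_bytes(self):
        return sum(self.files.values())

    def create(self, name, size):
        # The slot check comes first: a full table fails even with free space.
        if name not in self.files and len(self.files) >= self.max_files:
            raise FileTableFull(f"no free slot for {name!r}")
        new_total = self.used_bytes() - self.files.get(name, 0) + size
        if new_total > self.capacity_bytes:
            raise OSError("out of space")
        self.files[name] = size

    def delete(self, name):
        # Deleting frees both the bytes and the table slot.
        self.files.pop(name, None)


def fill_until_error(fs, file_size):
    """Create files until the store refuses one; return how many were created."""
    created = 0
    try:
        while True:
            fs.create(f"obs{created:05d}.dat", file_size)
            created += 1
    except (FileTableFull, OSError):
        return created
```

With `ToyFlashFS(256 * 1024 * 1024, max_files=1000)` and 1 KiB files, `fill_until_error` stops at 1000 files with roughly 1 MB used. Deleting a file frees a slot again, which is the same idea as clearing files off the other rover once they have been received.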
Tim
Omnia vestra castrorum habetur nobis.
That's false reasoning.
1. No practical software is bug-free.
2. Testing is never complete.
3. People make mistakes, even during testing.
4. Spirit broke down.
It makes sense, when building a robust system, to do rigorous testing AND have the memory protection.
VxWorks obviously has a brilliant team of brainwashers^Wsalesmen because they've convinced you that you don't need a feature they don't offer. Perfect!