Writing Code for Spacecraft


Posted by michael on Saturday November 20, 2004 @06:37AM from the carried-the-bits-uphill-one-by-one-in-the-snow dept.

CowboyRobot writes "In an article subtitled 'And you think *your* operating system needs to be reliable,' Queue has an interview with the developer of the OS that runs on the Mars Rovers. Mike Deliman, chief engineer of operating systems at Wind River Systems, has quotes like 'Writing the code for spacecraft is no harder than for any other realtime life- or mission-critical application. The thing that is hard is debugging a problem from another planet.' and 'The operating system and kernel fit in less than 2 megabytes; the rest of the code, plus data space, eventually exceeded 30 megabytes.'"

George Neville-Neil by cpghost · 2004-11-20 06:45 · Score: 4, Informative

The interviewer George Neville-Neil co-authored "The Design and Implementation of the FreeBSD Operating System" with Marshall Kirk McKusick.

--
cpghost at Cordula's Web.

Will they quit using FAT? by EqualSlash · 2004-11-20 07:05 · Score: 4, Informative

Remember some time ago Spirit was continuously rebooting due to a flash memory problem? The use of the FAT file system in the embedded system was partly responsible for the mess.

The problem, Denise said, was in the file system the rover used. In DOS, a directory structure is actually stored as a file. As that directory tree grows, the directory file grows, as well. The Achilles' heel, Denise said, was that deleting files from the directory tree does not reduce the size of the directory file. Instead, deleted files are represented within the directory by special characters, which tell the OS that the files can be replaced with new data.
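To make the quoted mechanism concrete, here is a rough sketch of a FAT-style directory that never shrinks. The 32-byte entry size and the 0xE5 "deleted" marker are standard FAT details; the `Directory` class, its method names, and the file names are made up for illustration and have nothing to do with the rover's actual flight software.

```python
ENTRY_SIZE = 32      # size of one FAT directory entry, in bytes
DELETED_MARK = 0xE5  # FAT marks a deleted entry by overwriting its first name byte


class Directory:
    """Toy FAT-style directory: a flat list of fixed-size entries (hypothetical)."""

    def __init__(self):
        self.entries = []

    @staticmethod
    def _raw_name(name):
        # Simplification: pad or truncate to 11 bytes, ignoring real 8.3 rules.
        return name.encode("ascii")[:11].ljust(11, b" ")

    def create(self, name):
        entry = bytearray(ENTRY_SIZE)
        entry[:11] = self._raw_name(name)
        # Reuse the first deleted slot if there is one; otherwise grow.
        for i, old in enumerate(self.entries):
            if old[0] == DELETED_MARK:
                self.entries[i] = entry
                return
        self.entries.append(entry)

    def delete(self, name):
        raw = self._raw_name(name)
        for entry in self.entries:
            if bytes(entry[:11]) == raw:
                entry[0] = DELETED_MARK  # mark the slot; the list never shrinks
                return True
        return False

    def size_bytes(self):
        return len(self.entries) * ENTRY_SIZE

    def live_files(self):
        return sum(1 for e in self.entries if e[0] != DELETED_MARK)


d = Directory()
for i in range(1000):
    d.create(f"F{i:04d}")
for i in range(1000):
    d.delete(f"F{i:04d}")
size_after_delete = d.size_bytes()   # still 1000 entries' worth of bytes
live_after_delete = d.live_files()   # but no live files
```

The directory keeps its peak size even with zero live files, so anything that mirrors the whole directory in RAM pays for every file that ever existed at once, not for the files that exist now.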

By itself, the cancerous file might not have been an issue. Combined with a "feature" of a third-party piece of software used by the onboard Wind River embedded OS, however, the glitch proved nearly fatal.

According to Denise, the Spirit rover contains 256 Mbytes of flash memory, a nonvolatile memory that can be written and rewritten thousands of times. The rover also contains 128 Mbytes of DRAM, 96 Mbytes of which are used for data, such as buffering image files in preparation for transmitting them to Earth. The other 32 Mbytes are used for code storage. An additional 11 Mbytes of EEPROM memory are used for additional program code storage.

The undisclosed software vendor required that data stored in flash memory be mirrored in RAM. Since the rover's flash memory was twice the size of the system RAM, a crash was almost inevitable, Denise said.

Moving an actuator, for example, generates a large number of tiny data files. After the rover rebooted, the OS's heap memory would be a hair's breadth away from a crash, as the system RAM would be nearly full, Denise said. Adding another data file would generate a memory allocation command to a nonexistent memory address, prompting a fatal error.

Source: DOS Glitch Nearly Killed Mars Rover

BTW, there is another interview of Mike Deliman I read some time ago in PCWorld.

Re:Efficiency by Brett+Buck · 2004-11-20 07:55 · Score: 5, Informative

> "The operating system and kernel fit in less than 2
> megabytes; the rest of the code, plus data space,
> eventually exceeded 30 megabytes." This should be used as
> the example for efficient coding

You've GOT to be kidding, right? 2 meg of OS code? That's ULTRABLOAT compared to most spacecraft. In fact, for the vast majority of the space age, that would have exceeded the resources of the computer by several orders of magnitude.

I've done this kind of programming for a living (for 10 years, moved up to controls design) but the last system I programmed for has 372k of memory, total. That includes data, code, OS, everything. Runs at 432 KIPS. And it performs what is probably one of the most complex in-flight autonomous control operations ever.

Most are even more restrictive. For example, 8K of PROM, 1k of volatile memory, and 28 WORDS of non-volatile memory. This is more than adequate for most applications, if you do it right.

Many spacecraft OS's are more akin to this:

hardware interrupt
external electronics power up processor.
external electronics set PC = 80hex
run
{execute all the code}
halt
power down

Once every 1/4 second, for 15 years.
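For a feel of what that cycle adds up to, here is a small simulation of the wake, run, halt pattern above. It is only a sketch: the task functions, the dictionaries standing in for volatile and non-volatile memory, and the use of a 365.25-day year are all assumptions for illustration, not anything from a real flight system.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # assumption: Julian year


def total_wakeups(rate_hz, years):
    """How many power-up/run/halt cycles a mission of this length performs."""
    return int(rate_hz * years * SECONDS_PER_YEAR)


def run_cycle(nonvolatile, tasks):
    """One wakeup: start from the fixed entry point, run every task once, halt."""
    volatile = {}  # assumption: volatile memory is lost at each power-down
    for task in tasks:
        task(nonvolatile, volatile)
    # Returning models 'halt' followed by 'power down'.


def count_wakeup(nonvolatile, volatile):
    # Hypothetical task: the only state that survives lives in non-volatile words.
    nonvolatile["wakeups"] = nonvolatile.get("wakeups", 0) + 1


def sample_sensor(nonvolatile, volatile):
    # Hypothetical task: scratch data that is gone after this cycle.
    volatile["reading"] = 42


nonvolatile = {}
for _ in range(8):  # two seconds' worth of cycles at 4 Hz
    run_cycle(nonvolatile, [count_wakeup, sample_sensor])

mission_cycles = total_wakeups(rate_hz=4, years=15)
```

At 4 Hz for 15 years that comes to roughly 1.9 billion cold starts, each one beginning from the same entry point with only the non-volatile words carried over.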

The project I am currently working on uses VxWorks (and so we were quite interested in the Mars Rover problem) and it's so bloated with unnecessary features it's absurd. This is not a Windows box, it's a spacecraft processor.

I can't argue with the 30 meg of data space. Using the memory as a data recorder would be quite useful and a good picture takes a lot of space. But it's alarming to me that you could figure out how to waste maybe 4-5 meg on code. If you started with a bare home-brew OS, I would guess (and I get paid for this sort of guess) that you could do the entire flight code in 512K, with maybe 8k of data space, excluding the science data.

Only recently have space-qualified rad-hard processors with this kind of capability become available. Until then, if you said you needed 2 meg for the OS alone, you would have gotten fired on the spot and referred to mental health professionals. The availability of these processors enabled people to use high-level languages with tremendous overhead (like C++). And this was only done for employee retention purposes during the bubble. For years it was done at the assembler or even machine level. It's still not at all uncommon to do, and we've done MANY flight code patches with only a processor handbook and an engineering paper pad, setting individual bits one by one.

Brett

Re:Out of curiousity by The+Vulture · 2004-11-20 09:20 · Score: 4, Informative

The problem is that technology moves too quickly for it to get "NASA certified". When you send something up in space where making changes to it will be difficult, you need something that is known to be robust and reliable, that has several years of testing.

Last I read (maybe a year ago?), NASA still used 386 and 486 chips because they didn't generate a lot of heat (compared to today's machines) and could be made to withstand higher than normal forces (through extra padding on the device, I imagine). They were more resilient to the issues you might see in space than newer processors.

Simply put, if they put the latest CPU with tons of RAM in there, and it fails, how are they going to fix it?

-- Joe

Re:Out of curiousity by GileadGreene · 2004-11-20 10:21 · Score: 4, Informative

Shielding does not protect against single-event upsets (particle-induced bit flips), it only provides some mitigation against total ionizing dose (which causes long term cumulative degradation as a result of drift in transistor operating parameters). There are design techniques and fabrication processes that can reduce the likelihood that a circuit will suffer upsets, but it's still standard practice to provide either redundant memory, or error detection and correction coding. In the case of MER they had 3 physically separate PROMs carrying identical copies of the flight software, and the RAM was (IIRC) protected by an EDAC code implemented in a rad-hard FPGA.
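As a rough illustration of the two mitigations mentioned above, here is a sketch of bitwise majority voting across three copies of an image, plus a textbook Hamming(7,4) single-error-correcting code. Both are assumptions chosen for illustration: nothing here says how MER actually selected among its PROM copies or which EDAC code its FPGA implemented, and all function names are made up.

```python
def majority_vote(a, b, c):
    """Bitwise 2-of-3 vote across three equal-length copies of an image."""
    return bytes((x & y) | (x & z) | (y & z) for x, y, z in zip(a, b, c))


def hamming74_encode(nibble):
    """Encode 4 data bits into 7 bits (parity at positions 1, 2 and 4)."""
    code = [0] * 8  # index 0 unused so list indices match bit positions 1..7
    code[3], code[5], code[6], code[7] = [(nibble >> i) & 1 for i in range(4)]
    code[1] = code[3] ^ code[5] ^ code[7]
    code[2] = code[3] ^ code[6] ^ code[7]
    code[4] = code[5] ^ code[6] ^ code[7]
    return code[1:]


def hamming74_decode(bits):
    """Return (nibble, syndrome); a nonzero syndrome is the corrected position."""
    code = [0] + list(bits)
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos  # XOR of set-bit positions is 0 for a valid codeword
    if syndrome:
        code[syndrome] ^= 1  # flip the single upset bit back
    data = [code[3], code[5], code[6], code[7]]
    return sum(bit << i for i, bit in enumerate(data)), syndrome


# Three copies of a tiny "flight image"; one copy takes a single-bit upset.
image = bytes([0x12, 0x34, 0x56])
upset = bytes([0x12, 0x34 ^ 0x08, 0x56])
recovered = majority_vote(image, upset, image)

# One memory word takes a bit flip at position 6 and is corrected on read.
word = hamming74_encode(0b1011)
word[5] ^= 1  # list index 5 is bit position 6
value, position = hamming74_decode(word)
```

Voting needs three full copies but tolerates any damage confined to one copy per bit; the Hamming code costs only 3 extra bits per 4 but corrects just one flipped bit per word, which is why scrubbing memory before errors accumulate matters.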

Re:Out of curiousity by arnasobr · 2004-11-20 10:26 · Score: 4, Informative

Feature size. The smaller the feature (think gate level), the higher the chance it will be ruined by random radiation exposure. And that's the one-sentence summary of the "Radiation Effects on Microelectronics" class I took about 7 years ago.

Smaller memory capacity for a given surface area implies larger feature size.

By the way, the class I took was 1-on-1 with Prof. Stephen McGuire at Cornell. Extremely cool guy.