Writing Code for Spacecraft

hmm... by opqdonut · 2004-11-20 06:39 · Score: 2, Interesting

I wonder whether they will be releasing the source. It could be an interesting read.

--
yes > /dev/dsp

Re:hmm... by Anonymous Coward · 2004-11-20 06:48 · Score: 2, Informative

WindRiver ROTS (real time operating system) is painful to work with. Their debugging environment is a nightmare and the cost of development and deployment is almost 3x that of an embedded linux. My little company just finished doing a trade study of the various ROTS kernels available and yes, theirs might be more reliable, but at a huge cost. Furthermore, performance-wise, it just isn't up to snuff vs. say MercuryOS on a single CPU, let alone a multi-CPU system.
As to releasing their source code? From Wind River? ROTFL!
1. 2. 3. 4. Profit???? (For a quick mod up)
Re:hmm... by grub · 2004-11-20 06:59 · Score: 2, Funny

and the cost of development and deployment is almost 3x that of an embedded linux

When a spacecraft millions of kilometers from Earth packs it in I'm sure a project leader at NASA would be happy they saved 2/3 of the price on a relatively small ticket item.

--
Trolling is a art,
Re:hmm... by Infinityis · 2004-11-20 07:08 · Score: 5, Funny

Not gonna happen, for one big reason. I could just see the Slashdot headline:

Mars Rover HaX0r3d and OS replaced with Linux.

Shortly thereafter, Micro$oft claims that they can enforce patent infringement on Mars...
Re:hmm... by Richthofen80 · 2004-11-20 07:23 · Score: 3, Insightful

theirs might be more reliable, but at a huge cost.

Probably not as big a cost as losing a Mars rover because your OS wasn't reliable enough.

--
Reason, free market capitalism, and individualism
Re:hmm... by ScrewMaster · 2004-11-20 07:32 · Score: 1

Don't you mean ROTS: "real operating time system"?

--
The higher the technology, the sharper that two-edged sword.
Re:hmm... by The+Vulture · 2004-11-20 09:12 · Score: 3, Interesting

Yes, and seeing as I'm currently working with embedded Linux, I can honestly say that it's a pain. (Note: I must preface this by saying that I am using Linux 2.4.18 for MIPS and my company is not using any sort of real-time extensions, just the bare 2.4.18 tree).

You get what you pay for... I've used VxWorks for a few years now, and while it does have its share of problems, and while they are sometimes difficult to deal with, it is a great platform for development. You get much better control of the system as opposed to Linux (the main problem with using Linux in an embedded environment is the user-to-kernel relationship; VxWorks solves it neatly by getting rid of it, since everything is in kernel space). This works out very nicely for MIPS processors, which I deal with most of the time. Threading (or tasks, as VxWorks calls them) is much better than in Linux: you can at least somewhat guarantee when your tasks run, unlike with the default Linux scheduler.

I am very interested in trying QNX out, to see how it compares to VxWorks, one of these days.

-- Joe
Re:hmm... by grozzie2 · 2004-11-20 13:46 · Score: 1

If you look back, they almost did lose a rover to the _reliability_ of Wind River. It was something about having too many files in a directory or some such silliness; it took the rover offline and they almost lost it. Hmmmmmmm.
Re:hmm... by Tablizer · 2004-11-20 17:49 · Score: 3, Funny

[public source code] Not gonna happen, for one big reason. I could just see the Slashdot headline: Mars Rover HaX0r3d and OS replaced with Linux.

More likely: "Mars Rover Draws Goatse In Sand"

--
Table-ized A.I.
Re:hmm... by Goth+Biker+Babe · 2004-11-21 01:26 · Score: 1

Actually that was a priority inversion issue which can happen with any operating system when the code isn't designed properly. It was because of the flexibility of VxWorks when a debug build is used that allowed them to patch the software and fix the problem.
Re:hmm... by Goth+Biker+Babe · 2004-11-21 01:47 · Score: 1

Firstly Linux isn't an ROTS (sic). It's not predictable, and using pthreads, say, for thread control and IPC leaves much to be desired when compared to VxWorks. For example, in pthreads you can't try to take a semaphore and have the call return either when the semaphore is taken or when a specified timeout expires. You have to sleep and poll.
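For anyone who hasn't used it, the "take with a timeout" behaviour described here looks roughly like the sketch below. It only illustrates the semantics, using Python's `threading.Semaphore` as a stand-in; it is not VxWorks `semTake` or pthreads code, and the helper name `take_with_timeout` is made up for the example. For what it's worth, POSIX does specify `sem_timedwait`, though whether a given 2004-era embedded toolchain shipped it is a separate question.

```python
import threading

def take_with_timeout(sem, timeout_s):
    """Block until the semaphore is taken or timeout_s elapses.

    Returns True if the semaphore was taken, False on timeout.
    The caller never has to write a sleep-and-poll loop.
    """
    return sem.acquire(timeout=timeout_s)

sem = threading.Semaphore(0)   # starts empty, so any take has to wait

# Case 1: nobody gives the semaphore, so the take gives up after 50 ms.
timed_out = not take_with_timeout(sem, 0.05)

# Case 2: another thread gives the semaphore 50 ms from now,
# well inside the 2 s timeout, so the take succeeds.
threading.Timer(0.05, sem.release).start()
taken = take_with_timeout(sem, 2.0)

print("first take timed out:", timed_out)
print("second take succeeded:", taken)
```

The point of the blocking form is that the waiting task consumes no CPU and wakes as soon as the semaphore is given, instead of at the next polling interval.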

Secondly, yes, Tornado is rather dated, but I don't believe the debugging environment is any worse than, say, ddd and gdbserver. In fact I generally find it quicker to develop in Tornado than for Linux on embedded devices.

We write software components that can plug together. Each component has a configuration file that an in-house make-configuration utility uses to set up the make system. That utility will also generate Tornado project files. So: pull the components you need out of CVS, build a Tornado project, fire up Tornado, and develop, downloading objects via an ICE/JTAG or Ethernet to run and debug on the hardware.

Finally, Wind River are now moving with the times. The new tools are Eclipse based and they are improving their POSIX compatibility in the OS so that you can compile the same code for Linux or for VxWorks if you need hard real time.

As the company I work for uses several operating systems, including VxWorks and Linux, Wind River's embrace of open source is allowing us to standardise on tools and use the appropriate operating system for the appropriate device.
Re:hmm... by Anonymous Coward · 2004-11-21 04:56 · Score: 0

You're dealing with realtime problems on a plain-jane 2.4 kernel, complaining, and comparing it with an OS that does everything in real time and kernel space?
WTF?
Re:hmm... by Anonymous Coward · 2004-11-21 05:01 · Score: 1, Informative

Please, please, please read 'Linux Kernel Development' by Robert Love.
As far as locks go in pthreads(?).. WTF?
You can lock critical sections in pthreads without using constructs like semaphores, which are crappy anyway (read Stevens.. again, or maybe for the first time), by using a little imagination.
Re:hmm... by Anonymous Coward · 2004-11-21 10:18 · Score: 0

Yeah, I'm looking forward to Linux so that I can achieve some kernel/user separation so the damn thing stays up for more than 5 secs once a haywire task starts painting memory.......
Re:hmm... by Anonymous Coward · 2004-11-21 10:57 · Score: 0

Goth,

you're dead-on in one sense - Mars Pathfinder (MPF) did suffer the inversion problem, when being operated well beyond the tested limits. It was working so well, they decided to bump the transmission rate up to a rate never tested, while running a science package. The combination brought the inversion to light.

MER-A, or Spirit, did have a problem related to file systems. Part of the problem is that several user-configurable parameters were left at defaults, which in the end precipitated the problem on the 18th day of surface operations on Mars.

And the flexibility - is a combination of OS capability, hardware capability, and the foresight and wisdom of the engineers at JPL who programmed the systems to be able to accept patches and run maintenance scripts. :-)

-Coward who knows
Re:hmm... by Goth+Biker+Babe · 2004-11-21 22:46 · Score: 1

This is cross-platform code being developed. Why should I have to resort to Linux wizardry just because it's not complete?

hard to imagine.. by Chuck+Bucket · 2004-11-20 06:41 · Score: 2, Interesting

all software has bugs, what happens when 1/2 thru the trip they have an update? who installs remotely, and I guess having a sysop reboot is out of the question...

CBB

--
free ipod and free gmail!

Re:hard to imagine.. by brilinux · 2004-11-20 06:43 · Score: 3, Informative

Actually, if I remember correctly, there was a problem with one of the rovers, and they had to re-flash it from millions of km away. I am not sure whether they had a backup copy of the OS on the rover that would facilitate the re-flashing, or whether there was some patch that was transmitted, but I remember them talking about it on the news.
Re:hard to imagine.. by Coneasfast · 2004-11-20 06:50 · Score: 2, Funny

I guess having a sysop reboot is out of the question...

Oh shit, I forgot to rerun 'lilo' before rebooting!

--
Marge, get me your address book, 4 beers, and my conversation hat.
Re:hard to imagine.. by Cylix · 2004-11-20 06:50 · Score: 2, Interesting

They had a section of the flash memory go bad... so they patched in a workaround for those sectors, if I remember correctly.

--
"You should always go to other people's funerals; otherwise, they won't come to yours." -- Yogi Berra
Re:hard to imagine.. by Vardamir · 2004-11-20 06:58 · Score: 5, Interesting

Yes, here is an email my OS prof sent our class on the subject:

Subject: What really happened on Mars Rover Pathfinder

The Mars Pathfinder mission was widely proclaimed as "flawless" in the early
days after its July 4th, 1997 landing on the Martian surface. Successes
included its unconventional "landing" -- bouncing onto the Martian surface
surrounded by airbags, deploying the Sojourner rover, and gathering and
transmitting voluminous data back to Earth, including the panoramic pictures
that were such a hit on the Web. But a few days into the mission, not long
after Pathfinder started gathering meteorological data, the spacecraft began
experiencing total system resets, each resulting in losses of data. The
press reported these failures in terms such as "software glitches" and "the
computer was trying to do too many things at once".

This week at the IEEE Real-Time Systems Symposium I heard a fascinating
keynote address by David Wilner, Chief Technical Officer of Wind River
Systems. Wind River makes VxWorks, the real-time embedded systems kernel
that was used in the Mars Pathfinder mission. In his talk, he explained in
detail the actual software problems that caused the total system resets of
the Pathfinder spacecraft, how they were diagnosed, and how they were
solved. I wanted to share his story with each of you.

VxWorks provides preemptive priority scheduling of threads. Tasks on the
Pathfinder spacecraft were executed as threads with priorities that were
assigned in the usual manner reflecting the relative urgency of these tasks.

Pathfinder contained an "information bus", which you can think of as a
shared memory area used for passing information between different components
of the spacecraft. A bus management task ran frequently with high priority
to move certain kinds of data in and out of the information bus. Access to
the bus was synchronized with mutual exclusion locks (mutexes).

The meteorological data gathering task ran as an infrequent, low priority
thread, and used the information bus to publish its data. When publishing
its data, it would acquire a mutex, do writes to the bus, and release the
mutex. If an interrupt caused the information bus thread to be scheduled
while this mutex was held, and if the information bus thread then attempted
to acquire this same mutex in order to retrieve published data, this would
cause it to block on the mutex, waiting until the meteorological thread
released the mutex before it could continue. The spacecraft also contained
a communications task that ran with medium priority.

Most of the time this combination worked fine. However, very infrequently
it was possible for an interrupt to occur that caused the (medium priority)
communications task to be scheduled during the short interval while the
(high priority) information bus thread was blocked waiting for the (low
priority) meteorological data thread. In this case, the long-running
communications task, having higher priority than the meteorological task,
would prevent it from running, consequently preventing the blocked
information bus task from running. After some time had passed, a watchdog
timer would go off, notice that the data bus task had not been executed for
some time, conclude that something had gone drastically wrong, and initiate
a total system reset.

This scenario is a classic case of priority inversion.
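The failure sequence described above can be reproduced in a toy model. The sketch below is not VxWorks or flight code: it assumes one CPU, three fixed-priority tasks named after the ones in the story, a single mutex, one-tick time slices, and a made-up `WATCHDOG_LIMIT`. All names and numbers are illustrative assumptions.

```python
WATCHDOG_LIMIT = 5  # hypothetical: max ticks the bus task may wait before a reset

def simulate(comms_work):
    """Run a toy single-CPU, fixed-priority schedule; return (outcome, trace)."""
    # Higher "prio" wins. "release" is the tick at which the task becomes ready.
    tasks = {
        "met":   {"prio": 1, "release": 0, "work": 3, "uses_mutex": True},
        "bus":   {"prio": 3, "release": 1, "work": 1, "uses_mutex": True},
        "comms": {"prio": 2, "release": 2, "work": comms_work, "uses_mutex": False},
    }
    owner = None       # task currently holding the bus mutex
    bus_waiting = 0    # consecutive ticks the bus task was ready but did not run
    trace = []         # which task ran on each tick
    for tick in range(100):
        ready = [n for n, t in tasks.items()
                 if t["release"] <= tick and t["work"] > 0]
        if not ready:
            return "ok", trace
        # A task that needs the mutex is blocked while another task holds it.
        runnable = [n for n in ready
                    if not tasks[n]["uses_mutex"] or owner in (None, n)]
        current = max(runnable, key=lambda n: tasks[n]["prio"])
        task = tasks[current]
        if task["uses_mutex"]:
            owner = current          # take (or keep) the mutex
        task["work"] -= 1
        if task["uses_mutex"] and task["work"] == 0:
            owner = None             # finished with the bus: release the mutex
        trace.append(current)
        if "bus" in ready and current != "bus":
            bus_waiting += 1
        else:
            bus_waiting = 0
        if bus_waiting > WATCHDOG_LIMIT:
            return "watchdog reset", trace
    return "ok", trace

for work in (2, 10):
    outcome, trace = simulate(work)
    print(f"comms_work={work}: {outcome}: {' '.join(trace)}")
```

With a short communications burst the low-priority task gets the CPU back in time and the bus task runs; with a long one, the medium-priority task starves the mutex holder and the watchdog fires, which is the inversion in miniature.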

HOW WAS THIS DEBUGGED?

VxWorks can be run in a mode where it records a total trace of all
interesting system events, including context switches, uses of
synchronization objects, and interrupts. After the failure, JPL engineers
spent hours and hours running the system on the exact spacecraft replica in
their lab with tracing turned on, attempting to replicate the precise
conditions under which they believed that the reset occurred. Early in the
morning, after all but one engineer had gone
Re:hard to imagine.. by cortana · 2004-11-20 07:28 · Score: 1

Shoulda used Grub :)
Re:hard to imagine.. by Anonymous Coward · 2004-11-20 08:18 · Score: 0

Fortunately they were lucky enough to reproduce the glitch on earth; unfortunately the problem was a design mistake (not enabling priority inheritance).
It's so 'real', so 'understandable', the image of software engineers dismissing a spurious error as a 'hardware glitch'! Let's talk about software QA!

AC by slashdot still blocking my ISP's Inktomy server.
Re:hard to imagine.. by AaronW · 2004-11-20 08:28 · Score: 4, Interesting

As someone who's worked with VxWorks for the last several years I'm surprised they didn't turn on priority inheritance to begin with for the semaphore. As a rule, we usually turn on priority inheritance for our mutex semaphores.

Other problems in the Mars Pathfinder were related to using the VxWorks filesystem. VxWorks basically only supports FAT on top of flash. For flash, FAT is a poor choice since some areas of the disk like the root directory and FAT tables will quickly wear out. Also, I don't think VxWorks has much support for working around bad sections of flash.

As far as VxWorks memory allocation support goes, in an ideal world one would statically allocate all memory, but oftentimes things are not ideal. In the product I work on, we have to have dynamic memory allocation, since depending on how the product is being used at the time, different data structures are required with no way of knowing beforehand how many of a particular type are needed, and this changes dynamically. For a simple device, or one with enough memory, it's easy to statically allocate everything.

In our case, we statically allocate memory where we can, but in many cases we cannot. For example, I have to maintain a data structure keeping track of all of the network gateways connected to an output interface. We can have many thousands of gateways and thousands of output interfaces. There could be anything between one and thousands of gateways on an interface. In this case, I use static arrays for information on each gateway and each output interface, but must use dynamic data structures to list all the gateways connected to an output interface. It would be prohibitive to allocate storage for 30,000 gateways with 30,000 interfaces! I also can't use a linked list of gateways per interface since it doesn't scale, a linked list having access time O(n).
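To put a number on "prohibitive" and show the general shape of the split between static and dynamic storage, here is a rough sketch. The post doesn't say which dynamic structure was actually used, so the per-interface hash set below is purely an assumption, as are all the names (`gateways_on`, `attach`, `detach`, `is_attached`).

```python
MAX_GATEWAYS = 30_000
MAX_INTERFACES = 30_000

# A dense gateway-by-interface table would need this many cells.
dense_cells = MAX_GATEWAYS * MAX_INTERFACES

# Fixed-size tables, allocated once up front ("static" in the post's sense).
gateway_info = [None] * MAX_GATEWAYS
interface_info = [None] * MAX_INTERFACES

# Dynamic part: only interfaces that actually have gateways get an entry.
gateways_on = {}   # interface index -> set of gateway indices

def attach(iface, gw):
    """Record that gateway gw is reachable through interface iface."""
    gateways_on.setdefault(iface, set()).add(gw)

def detach(iface, gw):
    """Remove the association; drop the set once the interface is empty."""
    members = gateways_on.get(iface)
    if members is None:
        return
    members.discard(gw)
    if not members:
        del gateways_on[iface]

def is_attached(iface, gw):
    """Average O(1) membership test, versus O(n) for walking a linked list."""
    return gw in gateways_on.get(iface, ())

attach(7, 100)
attach(7, 101)
attach(8, 100)
detach(7, 101)

print("cells in a dense table:", dense_cells)
print("gateway 100 on interface 7:", is_attached(7, 100))
print("gateway 101 on interface 7:", is_attached(7, 101))
```

Memory then grows with the number of actual gateway/interface pairs rather than with the product of the two maximums, at the cost of needing a heap.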

Also, we use third party libraries that perform dynamic memory allocation and it would be prohibitive to change that.

By replacing Wind River's malloc code with Doug Lea's code we eliminated fragmentation problems and saw our startup time drop from 50 minutes to 3 minutes. Doug Lea's malloc code is the basis of malloc in glibc and is very efficient. We also added support for tracing heap memory allocations to keep track of which task allocated a block and where it was allocated. This alone helped tremendously in tracking down a number of memory leaks since we can just walk the heap and see exactly where all the memory is being allocated. This is a sorely missing feature in VxWorks.
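The allocation tracing idea is simple enough to model in a few lines. This is only a sketch of the bookkeeping, not the actual patch to the allocator: the class `TracedHeap`, its methods, and the task and file names are all invented for illustration, and block handles are just integers rather than real addresses.

```python
from collections import Counter
import itertools

class TracedHeap:
    """Toy model: remembers who allocated each live block, and where."""

    def __init__(self):
        self._live = {}                   # block handle -> (task, site, size)
        self._handles = itertools.count(1)

    def alloc(self, size, task, site):
        """Record an allocation made by `task` at call site `site`."""
        block = next(self._handles)
        self._live[block] = (task, site, size)
        return block

    def free(self, block):
        """Forget a block; anything never freed stays visible to walk()."""
        del self._live[block]

    def walk(self):
        """Bytes still allocated, grouped by (task, site), largest first."""
        totals = Counter()
        for task, site, size in self._live.values():
            totals[(task, site)] += size
        return totals.most_common()

heap = TracedHeap()
a = heap.alloc(64, task="tNet", site="route.c:120")
b = heap.alloc(128, task="tNet", site="route.c:120")   # never freed: the "leak"
c = heap.alloc(32, task="tShell", site="cmd.c:45")
heap.free(a)
heap.free(c)

for (task, site), nbytes in heap.walk():
    print(f"{task} at {site}: {nbytes} bytes still allocated")
```

Walking the live set and grouping by task and call site is what turns "memory is going somewhere" into "this task leaks at this line".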

The lack of memory protection is another major problem for complex tasks. We have a bug where random memory locations get corrupted, and we've spent weeks trying to track down the cause without any luck.

Needless to say, all new projects where I work will not run on VxWorks. All of the chip vendors we're looking at are either dropping support for it or have already dropped it and are focusing on Linux.

BTW, this is one feature I would *REALLY* love to see added to Linux. The company I'm working for is looking at writing our next generation platform on top of an embedded Linux. We have not yet decided which one to use, but want something 2.6 based.

With priority inheritance, if a mutex is held by a low priority task and a high priority task tries to grab it, the low priority task is automatically boosted to the highest priority task that has attempted to acquire the semaphore. When the semaphore is released, the low priority task's priority is restored.
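The boost-and-restore rule in that paragraph can be sketched as plain bookkeeping. This is a simplified model, not the VxWorks or Linux implementation: the classes `Task` and `InheritanceMutex` are hypothetical, a higher number means a higher priority (VxWorks numbers priorities the other way round), and no real blocking or scheduling takes place.

```python
class Task:
    def __init__(self, name, priority):
        self.name = name
        self.base_priority = priority   # priority assigned by the designer
        self.priority = priority        # effective priority seen by the scheduler


class InheritanceMutex:
    """Bookkeeping only: a real kernel would also block and reschedule tasks."""

    def __init__(self):
        self.owner = None
        self.waiters = []

    def acquire(self, task):
        """Return True if task now owns the mutex, False if it has to wait."""
        if self.owner is None:
            self.owner = task
            return True
        self.waiters.append(task)
        # Boost the owner to the highest priority among the tasks waiting on it.
        self.owner.priority = max(self.owner.priority, task.priority)
        return False

    def release(self, task):
        if task is not self.owner:
            raise RuntimeError("only the owner may release the mutex")
        # Simplification: assumes the task holds no other inheritance mutex.
        task.priority = task.base_priority
        self.owner = None
        if self.waiters:
            # Hand the mutex to the highest-priority waiter.
            self.owner = max(self.waiters, key=lambda t: t.priority)
            self.waiters.remove(self.owner)


met = Task("met", priority=1)
bus = Task("bus", priority=3)
mutex = InheritanceMutex()

mutex.acquire(met)             # uncontended: met owns the mutex
got_it = mutex.acquire(bus)    # contended: bus waits, met is boosted
boosted = met.priority
mutex.release(met)             # met drops back, bus becomes the owner

print(got_it, boosted, met.priority, mutex.owner.name)
```

While boosted, the low-priority holder can no longer be preempted by a medium-priority task, so it finishes its critical section and the high-priority waiter gets the mutex promptly.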

Some other nice features are interrupt scheduling and better priority based message passing support (which may already be present, I'm still looking into this).

Finally, one very useful feature would be the ability to guarantee a real-time thread a certain percentage of the CPU, with the option of placing a hard limit if it tries to exceed that or temporarily lowering its priority to non-realtime so as to not starve no

--
This post is encrypted twice with ROT-13. Documenting or attempting to crack this encryption is illegal.
Re:hard to imagine.. by StefanoB · 2004-11-20 09:06 · Score: 0

Have you looked into RTAI? It's a kernel patch (available even for the 2.6 series) that makes it real-time.

Greets,

Stefano
Re:hard to imagine.. by GileadGreene · 2004-11-20 10:13 · Score: 3, Interesting

You are confusing Mars Pathfinder (a 1997 mission, which suffered a priority inversion problem) with Mars Exploration Rover (a 2003-2004 mission which suffered a file allocation issue in Flash memory, and the subject of TFA). Both used VxWorks, though.
Re:hard to imagine.. by Inthewire · 2004-11-20 15:46 · Score: 2

There's this article that covers what happened - it's fairly short, easy to read, and is linked in the story submission.

--

Writers imply. Readers infer.
Re:hard to imagine.. by DerekLyons · 2004-11-20 16:43 · Score: 1

all software has bugs,
But not all software has significant bugs. What matters most is how much effort you are willing to put into testing, proving the code correct, etc...
Re:hard to imagine.. by Anonymous Coward · 2004-11-21 06:06 · Score: 0

You obviously either never bought or never used WindView with VxWorks then; with it, operations such as memPartAlloc and memPartFree can easily be logged and stepped through graphically.
Re:hard to imagine.. by Anonymous Coward · 2004-11-21 10:50 · Score: 0

A small comment:
The priority inversion mentioned above did not surface until the lander was operated at more than twice the data transmission rate it had been tested at in the lab, while the ASI/MET meteorological package was running at the same time.

The file system problem with Spirit happened on Sol 18, after running for 18 "days" (Sols) on-planet, recording data. It had been tested for up to 15 days in the lab.

Lesson learned: Test what you fly, Fly what you test.

-Coward who knows...
Re:hard to imagine.. by Anonymous Coward · 2004-11-21 11:01 · Score: 0

There was no file system on MPF (Mars Pathfinder).

There are file systems on each of the MER Rovers.

MPF was a lander that carried a rover. Commands and data were buffered in RAM on MPF.
Re:hard to imagine.. by Chuck+Bucket · 2004-11-21 17:58 · Score: 1

That's exactly what I was thinking! When I reboot my server, if I screw anything up with Lilo (or if the kernel doesn't boot) I have to haul the old monitor out of the garage and hook it up just so I can choose my old kernel to get back in.

CB

--
free ipod and free gmail!
Re:hard to imagine.. by cortana · 2004-11-23 10:38 · Score: 1

Use "lilo -R". Or enter the 20th century with a device known as a "serial" port. :)

Efficiency by Maxim+Kovalenko · 2004-11-20 06:44 · Score: 2, Funny

"The operating system and kernel fit in less than 2 megabytes; the rest of the code, plus data space, eventually exceeded 30 megabytes." This should be used as the example for efficient coding

--
Requiem

Re:Efficiency by Omicron32 · 2004-11-20 06:47 · Score: 3, Interesting

That's all well and good, but don't forget that this kernel only has to interface with one set of hardware.

Things like the Linux kernel have to know about hundreds or thousands of different devices, which is why it's so big.
Re:Efficiency by Anonymous Coward · 2004-11-20 06:47 · Score: 0

It could have been done smaller with DOS.
Re:Efficiency by UncleScrooge · 2004-11-20 06:53 · Score: 3, Interesting

I can get Linux on a 1.44 MB floppy and run a system from it. 2 megs ain't that hard.

--
Slashdot 1|0 Productivity
Re:Efficiency by Armchair+Dissident · 2004-11-20 07:06 · Score: 4, Interesting

I used to write embedded applications using OS-9 (NOT MacOS 9) on 68000-based systems as a sub-contractor for Nuclear Electric (nuclear power stations company in the UK before it became BNFL). Our development system - complete with OS/Kernel and compilers - had only about a meg of memory; the final embedded systems often only had 512K if we were lucky.

Okay, so this was some 14 years ago - but it was doing a lot of work. 2 megabytes is a lot of memory! There's a phenomenal amount of code and data that can be stored in 2 meg. Maybe it's good by current standards, but - personally - I would suggest that current standards are a bad place to start from.

--

The ways of gods are mysteriously indistinguishable from chance.
Re:Efficiency by Maxim+Kovalenko · 2004-11-20 07:19 · Score: 1

Oh very true....but there are still a whole hell of a lot of programmers who could take lessons from this.

--
Requiem
Re:Efficiency by Rattencremesuppe · 2004-11-20 07:41 · Score: 1

I'm currently writing an application for an MSP430 microcontroller which has 60K flash and 2K (yes, 2048 BYTES) RAM.
(it doesn't have to land on Mars, though ;)))
Re:Efficiency by Infinityis · 2004-11-20 07:42 · Score: 0

Well now, that all depends on how you define efficient. Some people would say efficient means compact code...others might say efficient code is written quickly. I mean, an efficient worker does a lot of work in a little time, would not the same standard apply for a software/OS developer?
Re:Efficiency by CarlDenny · 2004-11-20 07:46 · Score: 2, Insightful

Just to clarify, VxWorks runs on a hell of a lot of hardware, dozens of CPUs across all the major families, thousands of device drivers.

Now, any particular instance of the kernel gets compiled for a specific processor, and only includes the drivers it needs. Which does save on some space. But a lot of that extra space comes from things like a dynamic loader, graphics packages, local shells (usually in multiple flavors), and a host of other applications that are "standard."

The thing that saves *that* space is the local WDB debugging agent. It lets you offload almost all of the bells and whistles to another machine, which does the object loading, provides your shell, does whatever debugging you need, then sends simple instructions to the agent to carry them out. That generally dramatically increases the interface capabilities without increasing footprint.
Re:Efficiency by Brett+Buck · 2004-11-20 07:55 · Score: 5, Informative

> "The operating system and kernel fit in less than 2
> megabytes; the rest of the code, plus data space,
> eventually exceeded 30 megabytes." This should be used as
> the example for efficient coding

You've GOT to be kidding, right? 2 meg of OS code? That's ULTRABLOAT compared to most spacecraft. In fact, for the vast majority of the space age, that would have exceeded the resources of the computer by several orders of magnitude.

I've done this kind of programming for a living (for 10 years, moved up to controls design) but the last system I programmed for has 372k of memory, total. That includes data, code, OS, everything. Runs at 432 KIPS. And it performs what is probably one of the most complex in-flight autonomous control operations ever.

Most are even more restrictive. For example, 8K of PROM, 1k of volatile memory, and 28 WORDS of non-volatile memory. This is more than adequate for most applications, if you do it right.

Many spacecraft OS's are more akin to this:

hardware interrupt
external electronics power up processor.
external electronics set PC = 80hex
run
{execute all the code}
halt
power down

Once every 1/4 of a second for 15 years.

The project I am currently working on uses VxWorks (and so we were quite interested in the Mars Rover problem) and it's so bloated with unnecessary features it's absurd. This is not a Windows box, it's a spacecraft processor.

I can't argue with the 30 meg of data space. Using the memory as a data recorder would be quite useful and a good picture takes a lot of space. But it's alarming to me that you could figure out how to waste maybe 4-5 meg on code. If you started with a bare home-brew OS, I would guess (and I get paid for this sort of guess) that you could do the entire flight code in 512K, with maybe 8k of data space, excluding the science data.

Only recently have space-qualified rad-hard processors with this kind of capability become available. Until then, if you said you needed 2 meg for the OS alone, you would have gotten fired on the spot and referred to mental health professionals. The availability of these processors enabled people to use high-level languages with tremendous overhead (like C++). And this was only done for employee retention purposes during the bubble. For years it was done at the assembler or even machine level. It's still not at all uncommon to do, and we've done MANY flight code patches, with only a processor handbook, an engineering paper pad, and by setting individual bits one-by-one.

Brett
Re:Efficiency by JamesP · 2004-11-20 07:56 · Score: 1

Pur-lease...

I wrote stuff for the PIC microcontroller 16F84. That's 4k of code and 68 bytes of RAM (yes, 68 bytes - an SMS message can be bigger)

--
how long until /. fixes commenting on Chrome?
Re:Efficiency by Anonymous Coward · 2004-11-20 08:27 · Score: 0

It's still not at all uncommon to do, and we've done MANY flight code patches, with only a processor handbook, an engineering paper pad, and by setting individual bits one-by-one.

Yes, that's how hardware-control-level software works: not an extra push/pop. I love this programming level!

AC by slashdot still blocking my ISP's Inktomy server IP.
Re:Efficiency by Anonymous Coward · 2004-11-20 08:46 · Score: 0

As I understand it, the rovers have image recognition software that allows them to visually navigate and autonomously avoid hazards using their cameras. That's a little more complex than the simple orbital mechanics and instrument control code that most probes run.
Implementing a functioning artificial vision and autonomous decision-making system in a small fraction of the space that a Java "Hello World" program takes is still pretty impressive in my book.
Maybe now that the hardware supports the luxury of 3 megs of code, trying to write all of the functionality in assembler would "get you fired", because it is an excessive use of expensive developer time and QA resources.
Re:Efficiency by Ecyrd · 2004-11-20 09:49 · Score: 1

Yup. I did SW testing on the ENVISAT-1 satellite, and we had a 16-bit CPU with 64k of RAM for use (including code, data, heap and patch store). Code was in Ada (very tight code for a language with lots of features), and the OS was a custom operating system called "ASTRES", which took 1.5 kbytes (you could compile out all the features you didn't need). Did co-operative multi-tasking and memory management, which was pretty impressive...

Note: This was in 1997... I believe that 32-bit space-hardened CPUs didn't become available until after that time. And they're still a rare breed. The more transistors you have, the more likely it is that one of them develops a fatal flaw...
Re:Efficiency by Anonymous Coward · 2004-11-20 13:09 · Score: 0

Seems like you've pointed out an important programming tradeoff: size vs. stability.

Ideally, a high-level language would be easier to prove correct, ensuring that critical applications will run properly under all circumstances. This may have been a design choice that resulted in a cost of more overhead based on the existence of processors capable of handling that overhead.

I agree that 2 MB of kernel sounds pretty high for spacecraft though.

Everything is a design choice on these missions with several alternatives and various justifications. It would be interesting to see what was going on here. Slashdotters, check out the NASA technical report server http://ntrs.nasa.gov/ for some downloadable journal articles describing mission specifications and other research.
Re:Efficiency by DerekLyons · 2004-11-20 16:41 · Score: 1

Okay, so this was some 14 years ago - but it was doing a lot of work. 2 megabytes is a lot of memory! There's a phenomenal amount of code and data that can be stored in 2 meg.
Agreed. Take the FCS MK 98/2, which controls the Navy's Trident II missiles and performs the prelaunch guidance calculations. It takes about 20 mins to calculate a launch package (24 missiles x 8 warheads ea) from a standing start, and controls the launch sequence in real time. (Including assembling a complex data preload for the guidance systems, partly from stored data and partly calculated as-needed.) It takes about 8 minutes from pushing the 'Strategic Launch' mode button to first missile away, and a bird leaves the boat about every 40 seconds thereafter.

And it accomplishes all this with just over 1 meg of memory and clocks out (roughly) at a little over 5 MHz.
Re:Efficiency by Anonymous Coward · 2004-11-20 16:57 · Score: 0

Oh yeah? Well I wrote stuff for the attiny11! That's 1KB of flash and 32B ram.

I think at some point all the dick-waving isn't going to impress anyone...

BTW if you ever think of using the "f84" hobbyist-friendly chip that you read about in a million places on the net, try the f628a instead. It's better by every possible metric, pin compatible, and cheaper.
Re:Efficiency by Anonymous Coward · 2004-11-21 05:14 · Score: 0

No. Not really, when you consider that the memory management (page tracking) code in the Linux kernel consumes about 1 MB of memory on pretty standard desktop machines these days.
Re:Efficiency by Anonymous Coward · 2004-11-21 06:38 · Score: 0

I've done this kind of programming for a living (for 10 years, moved up to controls design) but the last system I programmed for has 372k of memory, total. That includes data, code, OS, everything. Runs at 432 KIPS.

Uphill. Both ways.
Re:Efficiency by Anonymous Coward · 2004-11-21 10:41 · Score: 0

probably one of the most complex in-flight

Name of craft?

VxWorks - the follow-on versions based on the Pathfinder code - ran in Deep Space One, Stardust, and many others - DS1 is the first satellite / deep space probe to ever run AutoNav and AutoTelem. Stardust can run with AutoNav using either star sensors or inertial data. Both craft were deliberately twisted off-course, then released to run under AutoNav - both craft corrected course perfectly without assistance.

The MER Rovers have 128MB on-board; the OS and applications are limited to 32MB total, the rest is used for data steering. The bulk of the code is written in C++, most of the OS is written in C and assembly.

-Coward who knows

Summary of OS code by boingyzain · 2004-11-20 06:45 · Score: 4, Funny

while (1 = 1) { Dig(); Picture(); }

Re:Summary of OS code by caramelcarrot · 2004-11-20 06:50 · Score: 5, Funny

c:\rover\code\main.cpp(3) : error C2106: '=' : left operand must be l-value

Not quite bug free yet.
Re:Summary of OS code by oexeo · 2004-11-20 06:55 · Score: 1

Why not just:

while(true)
{
    Dig(); Picture();
}
Re:Summary of OS code by zeath · 2004-11-20 06:56 · Score: 5, Funny

roveros.c: 1: non-lvalue in assignment
make: *** [roveros] Error 1
I'm sorry, your rover is lost in space. Insert $1 billion and press any key to try again.
Re:Summary of OS code by Bananenrepublik · 2004-11-20 07:32 · Score: 1

So 1-value is not enough?
Re:Summary of OS code by boingyzain · 2004-11-20 07:58 · Score: 1

What can I say... I was still half asleep. Not that I'm a very good coder anyway.
Re:Summary of OS code by hey · 2004-11-20 08:01 · Score: 1

Why not just:

for (;;)
{
Dig(); Picture();
}
Re:Summary of OS code by oexeo · 2004-11-20 09:18 · Score: 1

OK, how about:

for(;;Dig() && Picture());;
Re:Summary of OS code by sahonen · 2004-11-20 09:37 · Score: 4, Funny

10 DIG 20 PICTURE 30 GOTO 10

--
Make me a friend and I'll mod you up
Re:Summary of OS code by oexeo · 2004-11-20 09:45 · Score: 1

$ echo "10 DIG 20 PICTURE 30 GOTO 10" | wc -m
29
$ echo "for(;;Dig() && Picture());;" | wc -m
28

I'm still ahead by one char.
Re:Summary of OS code by hazem · 2004-11-20 10:15 · Score: 1

$ echo "1 DIG 2 PICTURE 3 GOTO 1" | wc -m
25

Of course you still have some kind of carriage returns/line breaks.
Re:Summary of OS code by hazem · 2004-11-20 10:24 · Score: 1

It's rather frightening that I could think of this:

$ echo "1 DIG:PICTURE:RUN"|wc -m
18
Re:Summary of OS code by oexeo · 2004-11-20 10:53 · Score: 1

$ echo "for(;;Dig(),Picture());" | wc -m
24

That's as close as I can get.
Re:Summary of OS code by Ice_Balrog · 2004-11-20 11:27 · Score: 1

That's l, as in lambda. See http://en.wikipedia.org/wiki/Lvalue.

--
#include "sig.h"
Re:Summary of OS code by multipartmixed · 2004-11-20 12:50 · Score: 1

The comma-operator is good, but you can save a character by making Dig() an argument to Picture(). Just ignore the compiler warnings.

Note: will only work as expected on platforms where the callER cleans up the stack (cdecl). "Pascal"-style calling convention (common in m$ crap) where the callEE cleans up the stack will probably run out of stack eventually.

--

Do daemons dream of electric sleep()?
Re:Summary of OS code by sahonen · 2004-11-20 17:36 · Score: 1

I wasn't really going for less chars, I was just trying to exploit the inherent humor value in using BASIC to control a spacecraft.

--
Make me a friend and I'll mod you up
Re:Summary of OS code by niteice · 2004-11-21 01:11 · Score: 1

Using Visual C++, I see... Who uses Windows on a spacecraft anyway?

--
ROMANES EUNT DOMUS

George Neville-Neil by cpghost · 2004-11-20 06:45 · Score: 4, Informative

The interviewer, George Neville-Neil, co-authored "The Design and Implementation of the FreeBSD Operating System" with Marshall Kirk McKusick.

--
cpghost at Cordula's Web.

Too bad about their compiler/assembler line... by Anonymous Coward · 2004-11-20 06:45 · Score: 0

Too bad about their compiler/assembler line: it is not half as reliable as their Mars rover software...

In outer space... by Anonymous Coward · 2004-11-20 06:45 · Score: 0, Funny

...rover codes you!

Reinventing the wheel. by Anonymous Coward · 2004-11-20 06:49 · Score: 5, Funny

Should have just used WinCE, with a few of the productivity apps cut out. Adding a copy of pocket Auto-route, with some Martian JPEGS would have helped navigation as well.

Carmack by mfh · 2004-11-20 06:49 · Score: 3, Interesting

I would like to think that this article embodies the reasons that John Carmack got into space program development to begin with.

In the beginning he got into 3d game applications for a similar reason. The cutting edge is always the very outer area of human development, and Carmack makes a good example of a programmer who has taken aim at the edge of what is known to programmers. Maybe Mr. Carmack would care to comment?

Much like how Id Software develops engines, spacecraft programming is new and innovative, although the difference is that spacecraft systems have no room for error.

--
The dangers of knowledge trigger emotional distress in human beings.

Re:Carmack by Infinityis · 2004-11-20 07:31 · Score: 0

I was hoping for a less abstract reason, like an upcoming game, such as The Sims: Space Station or The Sims: Mars Rover.

Guess I'll have to scratch another one off my Christmas wish list...
Re:Carmack by Tablizer · 2004-11-20 12:21 · Score: 1

I would like to think that this article embodies the reasons that John Carmack got into space program development to begin with....In the beginning he got into 3d game applications for a similar reason. The cutting edge is always the very outer area of human development,

Space software is probably very conservative due to the cost of errors. I would think that a more forgiving domain would be the best place to test and play with new ideas.

--
Table-ized A.I.
Re:Carmack by Anonymous Coward · 2004-11-20 15:00 · Score: 0

Why don't you suck up some more to the guy, eh? Maybe he'll even notice you.

"Space craft programming" isn't something new. It's been with us since the beginning of the space age. "Outer area of human development"? Uh, yeah.

Re:just imagine ... by Anonymous Coward · 2004-11-20 06:51 · Score: 0

a beowulf cluster of morons who think this joke is funny...

Errr... by Anonymous Coward · 2004-11-20 06:51 · Score: 0

a beowulf cluster of rovers
Don't you mean a convoy of rovers? :P

Kevin

Wait a minute? by Billly+Gates · 2004-11-20 06:52 · Score: 3, Insightful

Wasn't the OS aboard the rover loaded with problems? Go read the past news from last February here on Slashdot.

VxWorks does not even offer memory protection, and the RAM can get fragmented. Not to sound trollish, but I would pick something like QNX or NetBSD for any critical app or embedded device.

It's amazing the engineers fixed it and got it to work reliably, but a more mission-critical operating system would have been a better choice.

--
http://saveie6.com/

Re:Wait a minute? by cpghost · 2004-11-20 07:11 · Score: 2, Interesting

I would pick something like QNX or NetBSD for any critical app

Okay, let's turn NetBSD into a real-time OS. Add some "hardening" features like watchdogs etc. Hmm... what should we call it? Perhaps: SpaceBSD?

--
cpghost at Cordula's Web.
Re:Wait a minute? by Dominic_Mazzoni · 2004-11-20 07:12 · Score: 1

VxWorks does not even offer memory protection, and the RAM can get fragmented. Not to sound trollish, but I would pick something like QNX or NetBSD for any critical app or embedded device.

I think QNX is a valid alternative. But is NetBSD hard-real-time?
Re:Wait a minute? by neonstz · 2004-11-20 07:20 · Score: 4, Insightful

VxWorks does not even offer memory protection, and the RAM can get fragmented.

Dynamically allocating memory is usually a big no-no in real time systems.
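
A minimal sketch of the usual alternative, assuming a pool of fixed-size blocks reserved at link time instead of a general-purpose heap. The names `pool_init`, `pool_alloc`, `pool_free` and the sizes are made up for illustration; this is not VxWorks or rover code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sizes, chosen only for illustration. */
#define BLOCK_SIZE  64
#define BLOCK_COUNT 8

/* A free block stores the link to the next free block in its own bytes. */
typedef union block {
    union block *next;                  /* valid only while the block is free */
    unsigned char payload[BLOCK_SIZE];  /* what the caller gets to use */
} block_t;

static block_t pool[BLOCK_COUNT];       /* all memory is reserved up front */
static block_t *free_list;

/* Thread every block onto the free list. Call once at startup. */
void pool_init(void)
{
    free_list = NULL;
    for (size_t i = 0; i < BLOCK_COUNT; i++) {
        pool[i].next = free_list;
        free_list = &pool[i];
    }
}

/* O(1) allocation: pop the head of the free list, or NULL if exhausted. */
void *pool_alloc(void)
{
    block_t *b = free_list;
    if (b != NULL)
        free_list = b->next;
    return b;
}

/* O(1) release: push the block back. Only pointers from pool_alloc are valid. */
void pool_free(void *p)
{
    block_t *b = p;
    assert(b >= pool && b < pool + BLOCK_COUNT);
    b->next = free_list;
    free_list = b;
}
```

Every block is the same size, so the pool cannot fragment, and both operations take constant time. The sketch is not thread-safe; a real system would guard the free list or give each task its own pool.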
Re:Wait a minute? by bhima · 2004-11-20 07:41 · Score: 1

NetBSD is not Hard Real Time. But most applications don't need true Real Time behavior. I use it at work for a couple of projects and find it more satisfactory than VxWorks, Linux or (god forbid) Windows XP Embedded.
Also Dynamic Memory allocation makes for ... "Interesting" testing "Opportunities". That's not to say I've never done it, only that I sort of wish I hadn't.

--
Nothing in the world is more dangerous than sincere ignorance and conscientious stupidity.
Re:Wait a minute? by RAMMS+EIN · 2004-11-20 07:41 · Score: 3, Insightful

``VxWorks does not even offer memory protection, and the RAM can get fragmented.''

Why would you even want memory protection in a system like this? Memory protection is great to prevent crappy apps on your PC from doing too much damage, but in a system like the Rover it's pure overhead.

As for RAM getting fragmented, it all depends on how you program it. Often, you don't even need memory allocation, so you won't have any problem with fragmentation.

--
Please correct me if I got my facts wrong.
Re:Wait a minute? by Dominic_Mazzoni · 2004-11-20 10:55 · Score: 1

NetBSD is not Hard Real Time. But most applications don't need true Real Time behavior.

Um, yeah, but we're talking about spacecraft here. I think that qualifies as an application that needs true Real Time behavior.
Re:Wait a minute? by bhima · 2004-11-20 16:45 · Score: 1

I'd wager that less than 15~20% of their needs are hard real-time. But hey, what do I know? I develop medical devices, not spaceships.

--
Nothing in the world is more dangerous than sincere ignorance and conscientious stupidity.
Re:Wait a minute? by Anonymous Coward · 2004-11-30 05:27 · Score: 0

You're pretty funny.

Your LJ info states you dislike liars and folks who post what amount to trolls... and this is an obvious troll. (BTW, you misspelled liars - "liers" - in your journal.)

You post your LJ web address to places like /., with trolls based on a complete lack of understanding of the system (you must not have read the articles you "mention" above), and then post how upset you are that you had to make your LJ "private" because of all the activity you created.

Nice try. Get better bait, troll elsewhere.

And take a few more English courses; your usage and diction are very poor, not to mention the preponderance of spelling errors on your journal.

2MB Kernel by Anonymous Coward · 2004-11-20 06:54 · Score: 0

My linux kernel comes in at 1.7 meg and that's a fairly large kernel from what I've seen.

Re:2MB Kernel by Anonymous Coward · 2004-11-20 08:33 · Score: 0

...add to that the space your kernel modules take up.
Re:2MB Kernel by lintux · 2004-11-20 09:33 · Score: 1

But that file's probably called bzImage or vmlinuz. And do you know what the z means? Right, compressed. :-)

vmlinux files are 3-4MBytes (2.6) AFAIK. And, as the other poster pointed out, that doesn't include the modules.
Re:2MB Kernel by KiloByte · 2004-11-20 11:03 · Score: 1

Eh? Try ls -l /boot/ ...

You would have to statically compile in quite a lot of drivers to get to 3MB.

--
The creatures outside looked from Alt-Right to Antifa; but already it was impossible to say which was which.
Re:2MB Kernel by lintux · 2004-11-20 21:25 · Score: 1

Well, I have it over 3MB on at least two 2.6 machines and I see a 2.9MB vmlinux on one other. It's not that I put everything into modules when I get the chance, maybe that's why you have them smaller?

I hope they had the foresight to add spam-blocking by Infinityis · 2004-11-20 07:02 · Score: 0, Funny

From Mr.Marvin
Olympus Mons Coast.

DEAR SIR/MADAM,

I AM HAPPY TO WRITE AND SEND THIS MESSAGE TO YOU.
AND I STRONGLY BELIEVE THAT THIS MESSAGE WOULD COME TO YOU AS A SURPRISE BUT I HOPE YOU WILL CONSIDER IT AS A CALL FROM A FAMILY IN DARE NEED AND GIVE IT URGENT CONSIDERATION. MY NAME IS MR marvin, A CITIZEN OF MARS AND THE SON OF LATE DR. FIDELIS GUBWANO WHO BEFORE HIS DEATH WAS THE MANAGER OF MARTIAN FINANCIAL TRUST CORPORATION (M.F.T.C). UPON HIS DEATH HE $60,000,000 (SIXTY MILLION U.S. DOLLARS) IN A THE OLYMPUS MONS BRANCH OF THE MARTIAN PLANETARY BANKING SYSTEM. I BELIEVE YOU TO BE AN HONEST AND TRUSTWORTY CITIZEN AND CAPABLE OF ASSISTING ME IN REMOVING THE MONEY FROM THIS ACCOUNT.

compilation error found by circletimessquare · 2004-11-20 07:03 · Score: 3, Funny

#include
int main() {
printf("Hello World!\n");
return 0;
}

marsrover.c: 3: You are no longer on the planet Earth.

--
intellectual property law is philosophically incoherent. it is your moral duty to ignore it or sabotage it

Re:compilation error found by Infinityis · 2004-11-20 07:35 · Score: 0

I dunno, I think that program is actually best suited to the task of interplanetary exploration.

Also, you need a new compiler. The real reason for that error is because you need to follow up #include with a filename.
Re:compilation error found by Anonymous Coward · 2004-11-20 08:28 · Score: 0

you need to follow up #include with a filename.

Slashbug.
Re:compilation error found by Tablizer · 2004-11-20 11:49 · Score: 1

Um, Mars is also a "world". Remember "explore strange new worlds"?

--
Table-ized A.I.
Re:compilation error found by wastingtape · 2004-11-20 12:20 · Score: 1

Reminds me of the part in Doom3 where the scientist is like "we've been having earthquakes... erm.. marsquakes..."

Will they quit using FAT? by EqualSlash · 2004-11-20 07:05 · Score: 4, Informative

Remember, some time ago Spirit was continuously rebooting due to a flash memory problem. The usage of the FAT file system in the embedded system was partly responsible for the mess.

The problem, Denise said, was in the file system the rover used. In DOS, a directory structure is actually stored as a file. As that directory tree grows, the directory file grows, as well. The Achilles' heel, Denise said, was that deleting files from the directory tree does not reduce the size of the directory file. Instead, deleted files are represented within the directory by special characters, which tell the OS that the files can be replaced with new data.

By itself, the cancerous file might not have been an issue. Combined with a "feature" of a third-party piece of software used by the onboard Wind River embedded OS, however, the glitch proved nearly fatal.

According to Denise, the Spirit rover contains 256 Mbytes of flash memory, a nonvolatile memory that can be written and rewritten thousands of times. The rover also contains 128 Mbytes of DRAM, 96 Mbytes of which are used for data, such as buffering image files in preparation for transmitting them to Earth. The other 32 Mbytes are used for code storage. An additional 11 Mbytes of EEPROM memory are used for additional program code storage.

The undisclosed software vendor required that data stored in flash memory be mirrored in RAM. Since the rover's flash memory was twice the size of the system RAM, a crash was almost inevitable, Denise said.

Moving an actuator, for example, generates a large number of tiny data files. After the rover rebooted, the OS's heap memory would be a hair's breadth away from a crash, as the system RAM would be nearly full, Denise said. Adding another data file would generate a memory allocation command to a nonexistent memory address, prompting a fatal error.

Source: DOS Glitch Nearly Killed Mars Rover

BTW, there is another interview with Mike Deliman that I read some time ago in PCWorld.

Re:Will they quit using FAT? by xhispage · 2004-11-20 07:09 · Score: 1, Interesting

Well, beta testing, anyone? :-D
Re:Will they quit using FAT? by Anonymous Coward · 2004-11-20 16:27 · Score: 0

The undisclosed software vendor...
I think we all know who we're talking about here. Bad design and the marketing power to keep their name "undisclosed"?
Re:Will they quit using FAT? by Anonymous Coward · 2004-11-21 00:39 · Score: 0

Mike Deliman disagrees with your long quote. He said the problem is that they're caching FS data structures in RAM, and they just tried to cache too much. They hadn't tested it with as many files as they had, so the cache grew too big.

Note that most UNIX filesystems (like ext2, FFS, and UFS) all have the same problem with directories. However, FAT has fixed-size entries, so the first deleted file's entry will be overwritten by the very next file created, so a FAT directory will not grow until all of its entries are filled up. A UNIX directory can get fragmented and grow even if there's lots of space left in it.
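
A toy sketch of that FAT behaviour, assuming a directory is nothing more than an array of fixed-size slots. The struct and function names are made up, and the directory is far smaller than a real one; the two marker bytes (0x00 for a never-used entry, 0xE5 for a deleted one, both in the first byte of the name) are the ones FAT uses.

```c
#include <assert.h>
#include <string.h>

#define SLOT_COUNT   4      /* hypothetical tiny directory */
#define NAME_LEN     11     /* 8.3 name stored without the dot */
#define MARK_UNUSED  0x00   /* first name byte: entry never used */
#define MARK_DELETED 0xE5   /* first name byte: entry deleted, reusable */

typedef struct {
    unsigned char name[NAME_LEN];
} dir_slot_t;

typedef struct {
    dir_slot_t slots[SLOT_COUNT];
} toy_dir_t;

/* Start with every slot marked as never used. */
void dir_init(toy_dir_t *d)
{
    memset(d, 0, sizeof *d);
}

/* Take the FIRST free or deleted slot. Returns its index, or -1 if full. */
int dir_create(toy_dir_t *d, const char *name)
{
    for (int i = 0; i < SLOT_COUNT; i++) {
        unsigned char first = d->slots[i].name[0];
        if (first == MARK_UNUSED || first == MARK_DELETED) {
            size_t n = strlen(name);
            if (n > NAME_LEN)
                n = NAME_LEN;
            memset(d->slots[i].name, ' ', NAME_LEN);  /* space-padded name */
            memcpy(d->slots[i].name, name, n);
            return i;
        }
    }
    return -1;
}

/* Deleting only flips the marker; the directory itself never shrinks. */
void dir_delete(toy_dir_t *d, int slot)
{
    assert(slot >= 0 && slot < SLOT_COUNT);
    d->slots[slot].name[0] = MARK_DELETED;
}
```

Creating A and B, deleting A, then creating C puts C in slot 0, which is the reuse described above. The directory only has to grow once every slot is live, but it also never gets smaller, which is the part the quoted article complains about.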


Other options being considered by Dominic_Mazzoni · 2004-11-20 07:08 · Score: 5, Insightful

For those who are wondering, JPL is very aware of the shortcomings of VxWorks and has seriously considered other alternatives for every mission. Keep in mind that the choice of OS has to be made years before launch, so at the time the OS for the 2004 Mars Rovers was decided on, many options that are possibilities today were not contenders. Also keep in mind that in spite of many shortcomings, VxWorks is a known quantity. JPL has been working with it for years and had a lot of in-house expertise with it.

There are a few groups at JPL that have been actively experimenting with other options, including RTLinux and a few different variants of hard-real-time Java (basically Java with explicit memory management and no garbage collection).

Re:Other options being considered by GooberToo · 2004-11-20 10:12 · Score: 1

I recently started my first project with VxWorks about 30 days ago. I can honestly say that I'm not the least bit impressed.
Re:Other options being considered by sexylicious · 2004-11-20 11:31 · Score: 2, Informative

Yeah, when I was doing RT stuff at my former employer we made a pretty unanimous decision to not even get close to WR's stuff. Not that it couldn't do the job. They had some funky licensing thing that interfered with how we wanted to use the code. We ended up looking at a Linux variant that had some tweaks to the tasking algorithm that fit perfectly with what we wanted. I think we ended up actually going in-house because one of the engineers we had programmed some code for an earlier project and we found out that the code did the EXACT thing we wanted. But I am pretty sure we would have gone with Linux if we hadn't gotten the in-house stuff. (The in-house code wasn't considered at first because the project was classified and all the source for it was locked away. It wasn't until that other engineer told us that we already had something in house that looked like it would work that we found out about it.)

Huh, its easy.. by adeyadey · 2004-11-20 07:13 · Score: 5, Funny

you are in a red rocky landscape..

GO NORTH..

you are in a red rocky landscape..

DIG.

ok. you see some red sand.
it is getting dark.

GO NORTH..

you were eaten by a grue.

--
"You lied to me! There is a Swansea!"

Re:Huh, its easy.. by Obasan · 2004-11-20 09:05 · Score: 0

Suddenly the dungeon collapses.

You die.
Re:Huh, its easy.. by Anonymous Coward · 2004-11-20 11:51 · Score: 0

you were eaten by a grue.

Elmer's Grue? Me so hoeney!
Re:Huh, its easy.. by Anonymous Coward · 2004-11-20 16:29 · Score: 0

Elmer's Grue? Me so hoeney!
Idiot.

Article Slashdotted? by emiddlec · 2004-11-20 07:14 · Score: 1

I can't get through to acmqueue.com. Can someone post an alternate link to the article?

Re:Article Slashdotted? by elFarto+the+2nd · 2004-11-20 07:20 · Score: 1

http://www.mirrordot.org/ is your friend.
Regards
elFarto
Re:Article Slashdotted? by emiddlec · 2004-11-20 07:28 · Score: 1

Many thanks.

Other (older) articles that I found if anyone is interested:
No Life on Mars, But Many Bugs
Three Minutes With Mike Deliman
Out-of-memory problem caused Mars rover's glitch
MarsNews.com :: NewsWire :: Mars Exploration Rovers :: Archives
Red Rover's master coder
Re:Article Slashdotted? by Infinityis · 2004-11-20 07:39 · Score: 0

It's not slashdotted yet, it's just hosted from Mars is all...
Re:Article Slashdotted? by emiddlec · 2004-11-20 07:48 · Score: 1

MirrorDot question -- is it mirroring all four pages of this article? The links for pages 2-4 lead to the acm site.

Today's /. footer quote by edittard · 2004-11-20 07:14 · Score: 0

"Examinations are formidable even to the best prepared, for even the greatest fool may ask more the the wisest man can answer. -- C.C. Colton "

Of course, a wise man knows the difference between "the" and "than".

--
At the bottom of the /. main page it says 'Yesterday's News'. Well they got that right.

On the other hand.. by Anonymous Coward · 2004-11-20 07:14 · Score: 0

..if only Wind River spent half the time they use on VxWorks on pSOS as well the world would be a much better place for a lot of people.

Don't hold your breath though.

my satellite debugging experience by nil5 · 2004-11-20 07:25 · Score: 5, Interesting

I worked on a satellite mission where we had some trouble. Due to an error the satellite wound up pointing 16 degrees away from the sun in a higher-than-expected orbit of 443 miles (714 kilometers) above Earth.

The misalignment meant the spacecraft was unable to look directly at the sun's center to record the amount of radiation streaming toward Earth. To accurately measure sunlight, the darn thing needed to be pointed to within a quarter of a degree of dead center.

It took about four and a half months to fix that problem, due to uplink difficulties. Ground controllers first had to slow the spacecraft's spin in order to transmit a series of software "patches" and then gradually speed it up to see how well the commands worked.

Then things were fixed.

Moral of the story: it is a tough job indeed!

Re:my satellite debugging experience by Tablizer · 2004-11-20 11:56 · Score: 0, Offtopic

I worked on a satellite mission where we had some trouble. Due to an error the satellite wound up pointing 16 degrees away from the sun in a higher-than-expected orbit of 443 miles (714 kilometers) above Earth.

Nil, perhaps you should have posted anonymously. Now we will all avoid anyone named "Nil" on our space teams. BTW, do you have a brother named "Null"?

--
Table-ized A.I.
Re:my satellite debugging experience by DAldredge · 2004-11-21 09:07 · Score: 1

Was that the Active Cavity Radiometer Irradiance Monitor satellite?

there is only one by Foktip · 2004-11-20 07:43 · Score: 0, Troll

Do the Debian.

Re:there is only one by Anonymous Coward · 2004-11-20 07:47 · Score: 0

i thought for sure they'd use windows.

mutex's always cause trouble by hey · 2004-11-20 07:52 · Score: 3, Interesting

In my experience mutexes, semaphores, etc. always cause trouble. There is nearly always another way to write things.

And you'll never ever see me coding an infinite wait for a mutex. That's just asking for trouble.

Bad: in Windows, FindNextChangeNotification() requires those IPC operations and it always gives me grief.

Good: The Linux File Activity Monitor (FAM). Lets you open and read a pipe of actions. Nice!
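
A rough sketch of the "no infinite wait" idea, assuming POSIX threads and a bounded retry loop around `pthread_mutex_trylock`. The helper name `lock_with_budget` and its parameters are made up for illustration.

```c
#define _POSIX_C_SOURCE 200809L  /* for nanosleep under strict C modes */

#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Try to take the mutex at most `attempts` times, sleeping `pause_ms`
 * milliseconds after each failed try. Returns 0 once the lock is held,
 * ETIMEDOUT if the budget runs out, or whatever other error code
 * pthread_mutex_trylock reports. */
int lock_with_budget(pthread_mutex_t *m, int attempts, long pause_ms)
{
    assert(attempts > 0 && pause_ms >= 0);

    struct timespec pause;
    pause.tv_sec  = pause_ms / 1000;
    pause.tv_nsec = (pause_ms % 1000) * 1000000L;

    for (int i = 0; i < attempts; i++) {
        int rc = pthread_mutex_trylock(m);
        if (rc == 0)
            return 0;            /* got the lock */
        if (rc != EBUSY)
            return rc;           /* a real error, not contention */
        nanosleep(&pause, NULL); /* back off before the next try */
    }
    return ETIMEDOUT;            /* caller decides how to recover */
}
```

The caller has to handle the ETIMEDOUT path (log it, reset the subsystem, whatever fits), which is the whole point: a stuck lock becomes a visible error instead of a silent hang. POSIX also defines `pthread_mutex_timedlock`, which takes an absolute deadline, on platforms that provide it.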

Re:mutex's always cause trouble by Anonymous Coward · 2004-11-20 11:02 · Score: 2, Insightful

In my experience, it is engineers who do not understand how to correctly employ mutexes and semaphores that always cause trouble.
Re:mutex's always cause trouble by faragon · 2004-11-20 11:41 · Score: 1

You hit the right key. If your code is well designed, the only source of bugs is the OS. Anyway, every RTOS has lethal bugs: even the ones that claim priority inversion detection/correction almost always lie. I'm used to working with RTOSes, and you have to be both extremely paranoid in your design and thorough in your test plan.

Another point: beware of working with the "latest and supercool -put your favourite RTOS name here- OS version", or your life can turn into a nightmare. If there are lives or expensive parts depending on the success of your software, try to be both accurate/professional and extremely meticulous (Bonaparte to Josephine: dress me slowly because I'm in a hurry).
Re:mutex's always cause trouble by Anonymous Coward · 2004-11-21 05:08 · Score: 0

No they don't.
Look, locking can be problematic, but the mistake here is a classic one in OS design: you have an occasional contention deadlock.

In the end I'd rather have my critical data saved and a deadlock aborted by timeout than have my data corrupted and truly bizarre things break out, like a naked dancing girl in Antarctica.

Marketing crap by jeffmock · 2004-11-20 07:54 · Score: 3, Insightful

Okay, I've got to call foul on this WindRiver marketing ploy. They're trading on the last days of being able to get away with saying that something mystical and special and super-high quality is going on behind the walls of trade secret and proprietary software.

I used VxWorks on a reasonably large project several years ago. It's a fine piece of work, but nothing special; it's nowhere close to the quality of a recent Linux kernel.

About half-way through our project we developed a need for a local filesystem on our box. We bought a FAT filesystem add-on from wind river that was annoyingly poor quality, lots of bizarre little problems, memory leaks, and of course no source to look at. In the end we didn't use it, we put together our own filesystem from freely available sources.

When I read the articles about vxworks filesystem problems nearly borking the entire Mars rover mission I laughed and laughed. I'm sure that it was the same crappy code (although I don't really know for sure).

For me it's a case study on why you shouldn't use closed source software: you can't evaluate the quality of the code on the other side of the trade-secret barrier, and you wind up trusting things like glossy brochures.

jeff

Re:Marketing crap by Anonymous Coward · 2004-11-20 09:52 · Score: 2, Interesting

Well said! And ditto.

I do embedded software for a living as well, and run like heck away from any project involving WindRiver.

WindRiver is great for those people who don't know what they are doing in the embedded space. And it's useful as a red flag for telling one as such.

But for people who actually know what they are doing, and who actually do understand OS's, Linux solutions are a far better choice. The time-to-market is absolutely unbeatable; as well as all the choices that one has in order to get a product out. Plus the reliability is also the best.

Sorry if that sounds like a troll; it's not meant to be. It's just my own first hand experience in this space.
Re:Marketing crap by Anonymous Coward · 2004-11-20 20:56 · Score: 1, Informative

For me it's a case study on why you shouldn't use closed source software, you can't evaluate the quality of the code

BS. How much of the Linux kernel have you read in detail to determine its quality? I agree that access to code allows you to FIX things faster or on your own, but you can't evaluate the quality of any large piece of commercial software by looking at it.
Also, FWIW, VxWorks still kicks Linux's ass on context switch times. For a really responsive system (think line rate packet switching etc.) Linux is not even an option. Look at QNX, VxWorks etc.
Re:Marketing crap by Anonymous Coward · 2004-11-30 05:00 · Score: 0

Jeff,

in case you didn't actually read the article, I was interviewed for that article on _my_own_time_, after I'd left Wind River to work directly for NASA.

It's not a marketing glossy, WIND wasn't involved.

-Mike

Open source spaceware by relaxrelax · 2004-11-20 07:56 · Score: 5, Insightful

If that was open source, there are so many space nerds who are programmers that flaws of that magnitude would never get by the army of testers.

Many would help out simply because hey it's the *space program* and that's good enough for them. Others would want their name listed next to some obscure bug fix on a NASA site; it's good for the ego or your CV.

Simply put, even a binary distribution of that code would allow unlimited free testing for crashes. Why wouldn't NASA do it?

Because there are still people in Washington who think code mysteriously gets damaged by being public, even if such code isn't modifiable by the public who reads it.

This is evidence of advanced cluelessness in Washington, and maybe independent anti-free-source advocates (spelled M-i-c-r-o-s-o-f-t) are the cause.

But I've learned not to bash. Never explain by Microsoft malice what could be explained by stupidity. Such as using DOS on a space thing...

--
Microsoft is pure dog-ma. FreeBSD is pure cat-ma.

Re:Open source spaceware by GileadGreene · 2004-11-20 08:46 · Score: 2, Informative

Or perhaps because NASA doesn't own the code. WindRiver does.
Re:Open source spaceware by Gogo+Dodo · 2004-11-20 09:25 · Score: 2, Insightful

Uhhh... and exactly how are you going to allow people to test "spaceware"? Last I checked, nobody owns their own satellite system. You just don't dump some satellite code onto your PC and "test" it.
Open Source is great and all, but it's hardly the answer to everything.
Re:Open source spaceware by Anonymous Coward · 2004-11-20 09:31 · Score: 0

This is evidence of advanced cluelessness in Washington, and maybe independent anti-free-source advocates (spelled M-i-c-r-o-s-o-f-t) are the cause.
But I've learned not to bash. Never explain by Microsoft malice what could be explained by stupidity. Such as using DOS on a space thing...
Sounds like you forgot what you "learned". Seeing how we're talking about VxWorks, I fail to see how this is Microsoft's fault. I'm amazed how some people take an opportunity to turn any non-open source project into a bash Microsoft thing.
Re:Open source spaceware by ragnar · 2004-11-20 09:46 · Score: 2, Informative

I agree about opening the source, but for entirely different reasons. It would be an ideal teaching aid in a real time CS course or for enthusiasts. Although it might be possible to contribute bug fixes, I wouldn't count on it. From what I've read and seen concerning the open source projects, they tend to gather contributors for features much more readily than for bug fixes, especially the variety that are very hard to reproduce or require formal proof along with the fix.

--
-- Solaris Central - http://w
Re:Open source spaceware by GileadGreene · 2004-11-20 10:25 · Score: 1

There are several open source RTOS's out there, if you want to provide some kind of educational aid. Off the top of my head, I can think of RTEMS, eCos, and RT-Linux. I've seen several real-time courses use the MicroC OS, but I don't recall if it is open source or not. The odds of WindRiver (NASA doesn't own the code) open sourcing VxWorks are pretty minimal I would imagine.
Re:Open source spaceware by GileadGreene · 2004-11-20 10:28 · Score: 1

It's probably worth adding that from the TFA it seems that VxWorks code is shared between different spacecraft programs. So the folks who can test on real satellites are sharing their patches, fixes, and features.
Re:Open source spaceware by johannesg · 2004-11-20 12:01 · Score: 4, Interesting

You just don't dump some satellite code onto your PC and "test" it.
Sure you can. We make that kind of software. The reason you won't ever see it as open source is because the various instruments on the spacecraft are covered by confidentiality agreements (or worse, in case of military hardware). And as hardware goes it is typically rather obscure stuff, requiring significant domain knowledge as well to emulate correctly.
Another issue is that these systems are rather CPU-intensive - we have a 16-CPU box for the spacecraft instruments plus a dedicated PC to emulate the flight computer itself. But you could run it on simpler hardware if you are willing to run at less than realtime speed.
Interestingly, the closest we ever get to seeing the actual flight software is binary images of it. While that is a lot closer than most slashdotters are likely to get, it is still far removed from being able to do something useful with it.
Of course the other good reason why this isn't going to be open source is because of price. For details you should really contact a salesperson, but let me give you a clue: (raises little finger to mouth) "Mwuhahaha!" ;-)
Re:Open source spaceware by Anonymous Coward · 2004-11-20 12:02 · Score: 0

Why would you need a satellite setup to test how the OS and applications run on a device? It's not like NASA could even run remote tests on their rovers, as they weren't in space at the time of the software being written and tested.
Re:Open source spaceware by johannesg · 2004-11-20 12:05 · Score: 1

"Hi, I just wanted to let you know that last night I checked in a patch for the space shuttle that will let it make an extra loop around the moon to drop off some supplies for a buddy of mine who is stuck there for a few weeks. Hope you don't mind!"
Re:Open source spaceware by Gogo+Dodo · 2004-11-20 12:54 · Score: 1

Your products sound like cool stuff.

It just adds to my point that Open Source of space software just isn't really viable. Your 1 MILLION DOLLARS! ;-) software package isn't going to be readily available to your average OSS hacker.

I suppose I should rephrase my statement... You just don't dump some satellite code onto your average OSS hacker's PC and "test" it.
Re:Open source spaceware by DerekLyons · 2004-11-20 16:30 · Score: 2, Insightful

If that was open source, there are so many space nerds who are programmers that flaws of that magnitude would never get by the army of testers.
Almost certainly not, as none of that army of geeks would have the specialized hardware that the Rovers use.
Many would help out simply because hey it's the *space program* and that's good enough for them.
Few would accomplish anything, as few would bother to study, and learn, and analyze the structure of the program.
Re:Open source spaceware by NaDrew · 2004-11-20 16:35 · Score: 1

"Hi, I just wanted to let you know that last night I checked in a patch for the space shuttle that will let it make an extra loop around the moon to drop off some supplies for a buddy of mine who is stuck there for a few weeks. Hope you don't mind!"
You mean this guy?

--
Vista:XPSP2::ME:98SE

Contiki by Anonymous Coward · 2004-11-20 08:05 · Score: 0

Contiki - multitasking kernel, TCP/IP stack, GUI, themeable window system, web server, web browser, etc. Runs in 40k RAM (yes, only 40960 bytes!). That's efficient coding.

Re:Contiki by Anonymous Coward · 2004-11-21 06:03 · Score: 0

Don't I know it! I just bought an ethernet adapter for my Commodore 64 and when time permits (work crunch) I'll install the Contiki it came with.

Debugging in space: a case for dynamic systems. by voodoo1man · 2004-11-20 08:09 · Score: 4, Interesting

In 1998-2001, the JPL successfully flew the Deep Space 1 spacecraft. One of the systems on board was the Remote Agent, a fully autonomous spacecraft control and guidance system. The software was written entirely in Common Lisp, and parts were verified in SPIN (there is an interesting paper written on the verification process, along with an informal account by one of the designers), which yielded the detection of several unforeseen race conditions. The parts that were not verified were thought to be thread-safe, but unfortunately this proved mistaken as a race condition occurred in-flight. With the help of the Read-Eval-Print Loop and other Lisp debugging facilities, the bug was tracked down and fixed in less than a day, and Remote Agent went on to win NASA's Software of the Year Award.

Perhaps not surprisingly for anyone who has heard about the management at NASA, C++ was selected for the successors to the Remote Agent on the grounds that it is supposed to be more reliable (this despite the fact that the Remote Agent was originally to be developed in C++, an effort that was abandoned after a year of failure). This caused more than a few people to be upset (including a very personal account by one of the aforementioned designers). Clearly the debugging facilities of Common Lisp are far superior to those of static systems like C++, something which is very useful in diagnosing unexpected error conditions in spacecraft software (read the first question on p. 3 of the interview to see what pains the JPL staff went through to adapt similar, ad-hoc methods to VxWorks). It's also clear from this interview (question: "How is application programming done for a spacecraft?" Answer: "Much the same as for anything else: software requirements are written, with specifications and test plans, then the software is written and tested, problems are fixed, and eventually it's sent off to do its job.") that NASA has in no way tried to adopt formal verification methods for its software, preferring instead to rely on the "tried and true" (at failing, maybe) poke-and-test development "methods."

Formal verification methods that eliminate bugs before critical software is deployed, together with deployment on a system with advanced debugging facilities, are a clear win for spacecraft software and should be adopted as the standard model of development. Unfortunately, as in many other software development enterprises, inertia keeps outdated, inadequate systems going despite their strong correlation with failure.

--

In the great CONS chain of life, you can either be the CAR or be in the CDR.

Re:Debugging in space: a case for dynamic systems. by GileadGreene · 2004-11-20 08:58 · Score: 3, Interesting

NASA has had an active formal methods/formal verification program for a number of years, located at NASA Langley. They mostly do research, but have worked on a few practical applications, mostly in the shuttle program. Additionally, JPL recently (2003) set up the JPL Laboratory for Reliable Software, which is chartered to look into formal verification among other things. The lead technologist in the LaRS is none other than Gerard Holzmann, the man behind SPIN.
Having said all of that, I'll agree that formal verification at NASA is in its infancy, and is facing an uphill battle for acceptance (witness how long the Langley group has been trying to push formal methods). It'll be interesting to see what happens with JPL's LaRS.
Re:Debugging in space: a case for dynamic systems. by Tablizer · 2004-11-20 12:08 · Score: 1

The software was written entirely in Common Lisp, and parts were verified in SPIN....mistaken as a race condition occurred in-flight. With the help of the Read-Eval-Print Loop and other Lisp debugging facilities, the bug was tracked down and fixed in less than a day, and Remote Agent went on to win NASA's Software of the Year Award.....Perhaps not surprisingly for anyone who has heard about the management at NASA, C++ was selected for the successors to the Remote Agent on the grounds that it is supposed to be more reliable (this despite the fact that the Remote Agent was originally to be developed in C++, an effort that was abandoned after a year of failure).

Face it, Lisp is the Rodney Dangerfield of languages: No Respect. Always was and always will be.

--
Table-ized A.I.

Re:Chinese Threat: Keep the Source Code Secret! by Performaman · 2004-11-20 08:24 · Score: 0

Explain to me how the source code for a computer designed to operate a slow-moving, 4 or 6 wheeled vehicle used to take pictures and to sample temperature, radiation and other scientific data could be adapted for use on an aircraft with a cruising speed of about 84 miles per hour.
Also, China already has its own UAV. "China's armed forces have operated the Chang Hong (CH-1) long-range, air-launched autonomous reconnaissance drone since the 1980s. China developed the CH-1 by reverse-engineering US Firebee reconnaissance drones recovered during the Vietnam War. An upgraded version of the system was displayed at the 2000 Zhuhai air show and is being offered for export. A PRC aviation periodical reported that the CH-1 can carry a TV, daylight still, or infrared camera." (from http://www.globalsecurity.org/military/world/china/uav.htm)

--

I have gas, but my car uses petrol.

Re:Microsoft. Where do you want to go today? by syynnapse · 2004-11-20 08:30 · Score: 1

well, it's software running on several million dollars worth of hardware that is in no way easy to troubleshoot. rebooting probably takes many hours considering the delay associated with transmission to mars.

I remember running windows 95, and if my pc cost a few million dollars i wouldn't want a copy of win95 within 100ft of it.

--

System.out.println(syynnapse.getSig());

Re:Microsoft. Where do you want to go today? by syynnapse · 2004-11-20 08:35 · Score: 1

to further answer:

Writing the code for spacecraft is no harder than for any other realtime life- or mission-critical application. The thing that is hard is debugging a problem from another planet: you can't put your hands on the malfunctioning system to see what's going on; you must use intuition and experience.

--

System.out.println(syynnapse.getSig());

Microsoft Windows by Anonymous Coward · 2004-11-20 08:42 · Score: 1, Funny

Hands down for any Mission Critical application.

Out of curiousity by FunkSoulBrother · 2004-11-20 08:50 · Score: 2, Interesting

Why, in the 21st century, is it necessary to fit something like the Mars rover code in 2MB of memory? If something like a Gameboy Advance or a PDA can hold 64MB-a couple gigs, what is holding NASA back, with their gigantic budget and all?

I can't imagine it would be the cost of the memory... I mean I know it costs much much more to make chips to a very strict specification, but if you are already producing so few units, isn't your cost of production going to be extraordinarily high whether you are making 64KB chips or 2MB or even 64MB?

This is not to say that I don't have admiration for fitting all that code in such a small space, but is there a reason they feel the need to do so?

Re:Out of curiousity by Anonymous Coward · 2004-11-20 09:08 · Score: 1, Informative

why, in the 21st century, is it necessary to fit something like the Mars rover code in 2MB of memory? If something like a Gameboy Advance or a PDA can hold 64MB-a couple gigs, what is holding NASA back, with their gigantic budget and all?

One thing: radiation. It's cheaper to take up simpler, purpose-designed and fabricated, bulkier chips that don't get upset when a particle hits them than it is to send up the latest and smallest chips, super-sensitive to radiation but oh so fast, and add lead shielding that doubles only as dead weight.
Re:Out of curiousity by The+Vulture · 2004-11-20 09:20 · Score: 4, Informative

The problem is that technology moves too quickly for it to get "NASA certified". When you send something up in space where making changes to it will be difficult, you need something that is known to be robust and reliable, that has several years of testing.

Last I read (maybe a year ago?), NASA still used 386 and 486 chips because they didn't generate a lot of heat (compared to today's machines) and could be made to withstand higher than normal forces (through extra padding on the device I imagine). They were more resilient to the issues you might see in space than newer processors.

Simply put, if they put the latest CPU with tons of RAM in there, and it fails, how are they going to fix it?

-- Joe
Re:Out of curiousity by GileadGreene · 2004-11-20 10:21 · Score: 4, Informative

Shielding does not protect against single-event upsets (particle-induced bit flips), it only provides some mitigation against total ionizing dose (which causes long term cumulative degradation as a result of drift in transistor operating parameters). There are design techniques and fabrication processes that can reduce the likelihood that a circuit will suffer upsets, but it's still standard practice to provide either redundant memory, or error detection and correction coding. In the case of MER they had 3 physically separate PROMs carrying identical copies of the flight software, and the RAM was (IIRC) protected by an EDAC code implemented in a rad-hard FPGA.
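To make the "three identical copies" idea concrete, here is a minimal Python sketch of bitwise majority voting across three images. This is only one common way to use triple redundancy; nothing above says how MER actually chose among its PROM copies, and the function name and sample data are made up for illustration.

```python
def majority_vote(a: bytes, b: bytes, c: bytes) -> bytes:
    """Return the bitwise majority of three equally sized images.

    Each output bit takes the value held by at least two of the three
    copies, so a flipped bit in any single copy is outvoted.
    """
    if not (len(a) == len(b) == len(c)):
        raise ValueError("all three copies must have the same length")
    return bytes((x & y) | (x & z) | (y & z) for x, y, z in zip(a, b, c))


# Hypothetical demonstration: corrupt each copy in a different bit.
golden = b"flight software image"
copy_a = bytearray(golden)
copy_a[0] ^= 0x01
copy_b = bytearray(golden)
copy_b[3] ^= 0x80
copy_c = bytearray(golden)
copy_c[0] ^= 0x10
recovered = majority_vote(bytes(copy_a), bytes(copy_b), bytes(copy_c))
```

The vote fails only when the same bit is wrong in two copies at once, which is why physically separate devices help.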
Re:Out of curiousity by arnasobr · 2004-11-20 10:26 · Score: 4, Informative

Feature size. The smaller the feature (think gate level), the higher the chance it will be ruined by random radiation exposure. And that's the one-sentence summary of the "Radiation Effects on Microelectronics" class I took about 7 years ago.

Smaller memory capacity for a given surface area implies larger feature size.

By the way, the class I took was 1-on-1 with Prof. Stephen McGuire at Cornell. Extremely cool guy.
Re:Out of curiousity by grozzie2 · 2004-11-20 22:45 · Score: 3, Insightful

This just illustrates why /. folks are typically not actually involved in spacecraft design and deployment. If you were, you would know the real reason for this, and wouldn't ask the question (which is not a dumb question btw).
In the real world, once you get up in the vicinity of the Van Allen belt, you get into hard radiation. If you use typical modern high density chips, with 0.15 micron die spacing, a single particle will short/damage half a dozen traces on the chip on a single impact. If you use really old stuff, with 5 micron die spacing (and higher), a particle will be too small to get multiple traces in a single impact. You may still get a single bit flip, but ECC will catch that, and you can deal with it. In the former case of a high density die, the failure would end up being catastrophic when a particle impacts the chip. There are practical limits to the size of die that can be mounted on a carrier, and the trace density defines the capacity of that die. Yes, it's possible to cram 32 meg of RAM into that space, but it won't last but a few minutes in a hard radiation environment. Take that same silicon wafer, using 5 micron traces, and it'll last years exposed to the same environment, but it'll only have 1 meg of usable RAM locations due to the decrease in density. You can't just throw more of them on, because then power consumption becomes the issue. In overly simplified terms, the chip is going to use power relative to its surface area; it matters not if it's got 1 or 32 meg of addressable locations in that area. Clock frequency is the other major contributor to power consumption, hence it's not uncommon at all to see space hardware measured in kHz rather than MHz and GHz like most folks are used to, and there are damn good reasons to leave it that way.
An all up spacecraft platform has hard limits on physical size (constrained by the physical limits of the launcher), and hard limits on total mass, determined by the launch vehicle capability to the final trajectory required. The final design will budget a portion of its mass allowance to power generation, and that power is in turn budgeted to various systems. The folks doing the controllers will have a hard limit on power consumption, another on volume, and a third on mass. Working within those limits, they have to design and deploy a system that is expected to have 99.999999% reliability, operating in conditions more extreme than it's possible to actually simulate on earth.
It's a shame, but there is one thing they don't seem to teach in computer science courses anymore. Out here in the real world, reality gets in the way of all the theory. Moore's law may well say chips will get faster, and density higher as time goes on, but it becomes irrelevant when other limiting factors get in the way. Until gamma particles start to shrink, or we come up with an effective way of making sure they don't hit the electronics, 10 year old and older stuff is going to remain 'state of the art' for use in space. Die density and ability to shield are hard limitations, can't get past them, and you won't see more modern equipment going into the reaches of space till those limitations are overcome. That's not likely to happen in the foreseeable future, the research in that area is all 'nuclear research' and that's all out of vogue these days, gonna take a couple more generations or a severely critical power shortage to change that.
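For anyone wondering how ECC "catches" a single bit flip, here is a small Python sketch using the textbook Hamming(7,4) code. It is an illustration only: real spacecraft EDAC typically uses wider codes, and nothing here is taken from actual flight hardware. The function names are made up.

```python
from typing import Tuple


def hamming74_encode(nibble: int) -> int:
    """Encode 4 data bits into a 7-bit codeword (bit i = position i+1)."""
    d1, d2, d3, d4 = [(nibble >> i) & 1 for i in range(4)]
    p1 = d1 ^ d2 ^ d4   # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4   # covers positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4   # covers positions 4, 5, 6, 7
    bits = [p1, p2, d1, p3, d2, d3, d4]
    return sum(bit << i for i, bit in enumerate(bits))


def hamming74_decode(code: int) -> Tuple[int, int]:
    """Return (data nibble, syndrome); syndrome 0 means no error seen.

    A nonzero syndrome is the 1-based position of the flipped bit,
    which is corrected before the data bits are extracted.
    """
    bits = [(code >> i) & 1 for i in range(7)]
    syndrome = 0
    for position in range(1, 8):
        if bits[position - 1]:
            syndrome ^= position
    if syndrome:
        bits[syndrome - 1] ^= 1   # undo the single-bit upset
    d1, d2, d3, d4 = bits[2], bits[4], bits[5], bits[6]
    return (d1 | (d2 << 1) | (d3 << 2) | (d4 << 3)), syndrome
```

This code corrects any one flipped bit per 7-bit word. Two flips in the same word get silently miscorrected, which is the multi-trace failure case described above.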
Re:Out of curiousity by Anonymous Coward · 2004-11-30 05:05 · Score: 0

The subject of Deep-Space computers is a complex one at best. When it comes to 32-bit computers, until one year ago, the RAD6000 was the most rad-hardened 32-bit CPU available by a factor of 1000 rads; the Rad6K can suffer a 1MRAD "hit" and not suffer SEU whereas the closest competitor could only handle 1KRAD - and that was a MIPS based CPU, NOT an X86 family chip.

If you can afford to wrap your space hardware with lead, use a PC; otherwise you use hardware that's resistant and well-known.

-Coward Who Knows

Spacecraft by sheetsda · 2004-11-20 08:56 · Score: 3, Funny

Writing Code for Spacecraft

My first thought was "Spacecraft? is that a new Starcraft clone I hadn't heard about?". It was then I realized I've been hanging out on the Game Programming Wiki too much lately.

Re:Chinese Threat: Keep the Source Code Secret! by Anonymous Coward · 2004-11-20 09:08 · Score: 0

YHBT ;)

Re:Chinese Threat: Keep the Source Code Secret! by Sj0 · 2004-11-20 09:37 · Score: 1

I'd point out how stupid your argument is, but I don't think I really have to. It speaks for itself.

--
It's been a long time.

Hell yes! by devphil · 2004-11-20 09:38 · Score: 4, Insightful

Why would you even want memory protection in a system like this? Memory protection is great to prevent crappy apps on your PC from doing too much damage, but in a system like the Rover it's pure overhead.

Exactly!

The problem is that most /.ers are used to thinking of an OS as something that needs to run any arbitrary program under any arbitrary conditions and survive any arbitrary crash in those programs.

For a Rover, none of those are true. They know exactly what code is going to be run. They know exactly where it's going to sit in memory. And they test it. (This is the part that /.ers can't quite understand.) They test these programs far more rigorously than any bog-standard x86 Linux OSS program ever gets tested. Those programs have their problems, but they will be mistakes in logic (metric/imperial conversions, or thread priority inversions), not segfaults because of derefing a null pointer.

I wonder how many undergrad CS degree programs still teach correctness proofs? Not "yeah, I ran it lots of times and it didn't crash," but "I ran it 100,000 times with 100,000 different inputs, all random, and it didn't crash, but while it was running I also sat down and mathematically proved the code is correct."

Embedded programming is just plain different than "normal" programming. It's usually a mistake to try to generalize from one to the other.

(All that said, the next version of VxWorks is advertised to optionally support a "traditional Unix" process model, and I think protected memory boundaries are one of the features. In case your embedded app needs to run arbitrary third-party software which probably doesn't get stress-tested at JPL :-), you can turn all that stuff on and live with the overhead.)

--
You cannot apply a technological solution to a sociological problem. (Edwards' Law)

Re:Hell yes! by murdocj · 2004-11-20 12:12 · Score: 1

Not "yeah, I ran it lots of times and it didn't crash," but "I ran it 100,000 times with 100,000 different inputs, all random, and it didn't crash, but while it was running I also sat down and mathematically proved the code is correct."

In our work we have to use a component supplied by what is essentially a parent company. One high level developer/manager is very proud of the fact that he runs tests with random input. The component often still has serious, basic problems when we get it. I'm not convinced that random input testing does much to find bugs.
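A rough illustration of why purely random inputs can miss a bug, as a Python sketch. The function below is a made-up stand-in that mimics the wraparound an absolute-value routine typically shows on 32-bit two's-complement hardware; it is not taken from any real component, and the 100,000 figure is just the number quoted above.

```python
INT32_MIN = -2**31
INT32_MAX = 2**31 - 1


def buggy_abs32(x: int) -> int:
    """Absolute value with the result wrapped to signed 32 bits.

    Correct for every input except INT32_MIN, whose magnitude does not
    fit in 32 bits and wraps back to a negative number.
    """
    result = -x if x < 0 else x
    return (result + 2**31) % 2**32 - 2**31


def miss_probability(trials: int, bad_inputs: int = 1,
                     space: int = 2**32) -> float:
    """Chance that uniformly random trials never hit a bad input."""
    return (1 - bad_inputs / space) ** trials


# Five hand-picked boundary values find the bug immediately.
boundary_values = [INT32_MIN, -1, 0, 1, INT32_MAX]
failures = [x for x in boundary_values if buggy_abs32(x) < 0]

# 100,000 uniformly random 32-bit inputs almost certainly do not.
p_miss = miss_probability(100_000)
```

With one bad input out of 2**32, `p_miss` comes out above 0.9999, so the random run would nearly always report "no crash" while a handful of boundary cases (or a proof) would not.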
Re:Hell yes! by devphil · 2004-11-20 18:49 · Score: 1

I mostly agree with you, but was trying to make a rhetorical point. :-) As you say, proper testing does more than just spew bits at the input pipe.

--
You cannot apply a technological solution to a sociological problem. (Edwards' Law)
Re:Hell yes! by Inthewire · 2004-11-30 03:18 · Score: 1

RYS::Dildos

--

Writers imply. Readers infer.

And on top of all that... by devphil · 2004-11-20 09:42 · Score: 2, Informative

...the memory inside the Gameboy Advance and whatnot isn't radiation-hardened.

The grandparent poster needs to RTFA, and note what had to be done to protect circuits from Marvin the Martian's cosmic rays. The chips get physically bigger (sometimes a lot bigger), and that builds up quickly.

--
You cannot apply a technological solution to a sociological problem. (Edwards' Law)

Similar, though terrestrial, problems by renehollan · 2004-11-20 09:52 · Score: 5, Interesting

I'll leave out the names to protect the guilty.

About five years ago, I worked for a major test equipment manufacturer who was contracted to deliver a test system for POTS lines (which could eventually do ADSL prequalification) to a national telco in a major European country. The idea was to test every POTS line in the system (millions of them) every night to detect early signs of degradation so repair crews could be dispatched before dialtone was completely lost.

As you can imagine, this involved a distributed system of test heads in each central office, networked back to a central command and control site. The system worked well, but had one flaw: downloading new firmware to the test heads was fraught with problems, and often led to the test head "locking up", even though a backup copy of firmware was always present, along with a hardware watchdog timer (though it was possible to lock out the watchdog interrupt, particularly when reprogramming flash, so it was a less than perfect watchdog). In these situations, one had to dispatch a "truck roll" to the affected central office, and replace EPROMs by hand.

Needless to say, the customer was pissed. More worrying was that, unless we fixed the software download problem (which we were unable to reproduce in the lab), we'd be paying for truck rolls all over the country. This was a not insignificant amount of money.

Management frittered away time, instead of authorizing a root cause analysis, by requesting tweaks to TCP/IP operating parameters, and testing to see if the problem was getting better or worse. This did not prove illuminating, time was wasted, and the customer was getting royally angry.

Finally, a small team of us were permitted to undertake a root cause analysis to find and fix the problem: the engineer responsible for the embedded flash file system, the telecom engineer on the control side, and I: responsible for the embedded O/S, and TCP/IP stack (inherited from the supplier of the embedded O/S). We wanted a month. We got two weeks. Remember, deploying experimental software to live COs requires so many layers of approval, it isn't funny, and we were worried that would be our biggest bottleneck.

Finally, the controller telecom engineer was able to reproduce the problem, by attempting to download software from our controllers to deployed equipment in a single central office (getting permission was a feat in itself -- while there was little danger of affecting telephone service, this was a live CO).

The problem was clear: the data network was slow (9600 b/s over an X.25 PVC, carrying PPP-encapsulated TCP/IP), resulting in the use of large MTUs to minimize packetizing overhead (latency wasn't an issue - throughput was). Because of the way the controller's TCP/IP stack worked, it misestimated the packet/ack round trip time: it used a one byte payload for the first packet, and full MTUs after that. The resulting packet ACK timeout and retransmissions exposed an inconsistency between controller and embedded TCP/IP stacks that caused the embedded system to lock up.
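Some back-of-the-envelope numbers show how far off a round-trip estimate taken from that first tiny packet can be. This is a Python sketch under stated assumptions: the 9600 b/s rate is from the story, but the 1500-byte MTU and 40-byte TCP/IP header are guesses, and PPP/X.25 framing and propagation delay are ignored.

```python
LINK_BPS = 9600       # link rate given above
HEADER_BYTES = 40     # assumption: 20-byte IP + 20-byte TCP header, no options
MTU_BYTES = 1500      # assumption: the MTU is only described as "large"


def wire_seconds(packet_bytes: int, link_bps: int = LINK_BPS) -> float:
    """Time to clock one packet onto the link (serialization delay only)."""
    return packet_bytes * 8 / link_bps


def round_trip_seconds(data_packet_bytes: int) -> float:
    """One data packet out plus one bare ACK back."""
    return wire_seconds(data_packet_bytes) + wire_seconds(HEADER_BYTES)


first_rtt = round_trip_seconds(HEADER_BYTES + 1)   # one-byte payload
full_rtt = round_trip_seconds(MTU_BYTES)           # full-MTU packet

print(f"first sample: {first_rtt:.3f} s, full MTU: {full_rtt:.3f} s, "
      f"ratio: {full_rtt / first_rtt:.1f}x")
```

Under these assumptions a full-size packet alone occupies the link for 1.25 s, more than ten times the whole round trip of the one-byte probe. A retransmission timer seeded from the probe would fire long before the ACK for a full packet could arrive.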

Great. Now, how to fix it?

The fix wasn't a big deal (I implemented a fix in the embedded TCP/IP code since we didn't have source to the controller TCP/IP stack), but deploying it was: remember we couldn't download the code successfully, and we didn't want to pay for a truck roll.

At this point, I proposed something daring: download a small patch, in as few packets as possible (we could send three full MTUs safely), which would patch the existing code in place, which would be good enough to reliably download a complete replacement.
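For readers who have not seen this kind of bootstrap patch, here is a minimal Python sketch of the general shape: a tiny, checksummed list of (offset, bytes) edits that is fully validated before anything is written. The record layout, the CRC choice, the size budget and all names are hypothetical and are not what was actually shipped.

```python
import struct
import zlib

# Assumed budget: three packets of 1460 payload bytes each (hypothetical).
MAX_PATCH_BYTES = 3 * 1460


def pack_patch(records, max_bytes=MAX_PATCH_BYTES) -> bytes:
    """Serialize (offset, data) records behind a CRC-32 of the body."""
    body = b"".join(struct.pack(">IH", offset, len(data)) + data
                    for offset, data in records)
    blob = struct.pack(">I", zlib.crc32(body)) + body
    if len(blob) > max_bytes:
        raise ValueError("patch does not fit in the packet budget")
    return blob


def apply_patch(image: bytearray, blob: bytes) -> None:
    """Validate the whole patch first, then modify the image in place."""
    if len(blob) < 4:
        raise ValueError("patch too short")
    (expected_crc,) = struct.unpack_from(">I", blob, 0)
    body = blob[4:]
    if zlib.crc32(body) != expected_crc:
        raise ValueError("patch checksum mismatch")
    edits = []
    pos = 0
    while pos < len(body):
        if pos + 6 > len(body):
            raise ValueError("truncated record header")
        offset, length = struct.unpack_from(">IH", body, pos)
        pos += 6
        data = body[pos:pos + length]
        if len(data) != length or offset + length > len(image):
            raise ValueError("record out of range")
        edits.append((offset, data))
        pos += length
    # Nothing has been written until every record has checked out.
    for offset, data in edits:
        image[offset:offset + len(data)] = data
```

Validating everything before the first write matters here: a half-applied patch on a box you cannot reach is exactly the truck roll you were trying to avoid.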

The thought of "self-modifying code" freaked management out to no end: it went against every rule in the book. But all three of us stood our ground: the only other alternative was a truck roll to each central office in the country. Reluctantly, we were allowed to proceed with that fix.

At this point, we had about ten days left. I had managed to get approval to pipeline the dev and tes

--
You could've hired me.

Re:Similar, though terrestrial, problems by forkazoo · 2004-11-20 13:38 · Score: 1

At least they gave you a favorable writeup. They could have put something ambiguous, like, "Was involved with the software update problem." :)
Re:Similar, though terrestrial, problems by anubi · 2004-11-20 20:16 · Score: 1

I love reading posts like yours. There is nothing like the "tales from the trenches" to illustrate troubleshooting techniques.
Unfortunately, your post illustrates a problem that we techies are really ill-equipped to solve - trying to get all the bits of the "and-gate" formed by all the layers of hierarchical management to all go to "1" at the same time.
I found there was nothing more frustrating in the world than knowing how to do something, knowing the inevitable result of continuing on the present course, and being unable to do a damn thing about it. It's like being the only one on an express train who knows the bridge ahead is washed out, while the people in charge just dismiss me as a lunatic holding up their show. Yeh, cute little Dilbert cartoons and poems about running the train help, but having the knowledge of the inevitable outcome is painful. I have often heard others lament about the burden of knowledge as well.
Often, I have wondered if ignorance is indeed, bliss.
Yeh, and that "review". I can read your post and know what you did. It makes perfect sense to me. Only someone who knows what they are doing could author such a post. I get so frustrated with these "big-picture" men who seem to have no idea of the criticality of the 'little stuff' and think of it as of little significance. Just one little thing, like you had the skills to find, can and will bring an entire company down, just like the malfunction of one little gene will bring an entire living organism down.
Those are nice memories of a technical accomplishment requiring a full understanding in order to solve the puzzle - and come out with a 100% solution. I only wish others besides other techies in a tech group had an appreciation for the skills involved. This was more than a touchdown - stuff like this is what saves a company, and the jobs of all the other people employed in it.
A helluva lot more important than a touchdown in any football game in my book.

--
"Prove all things; hold fast that which is good." [KJV: I Thessalonians 5:21]
Re:Similar, though terrestrial, problems by renehollan · 2004-11-20 20:34 · Score: 1

I've had asshole bosses too: I remember one idiot who asked if I could develop communication system over serial and "ethernet" hardware.
No brainer: TCP/IP (already supported) over Ethernet or PPP (serial). I quoted 2-3 weeks to implement the application layer stuff over that.
Idiot insisted that "for an embedded system", TCP/IP was "too fat" of a footprint: replace it with a home grown solution: we "only" had 128 MB RAM, "after all".
My protests that anything I could do in three weeks would be unlikely to be "leaner" than an embedded TCP/IP implementation fell on deaf ears. I was asked to implement a TCP/IP look-alike in three weeks from scratch.
Well, it took six, (admittedly for a functional, but half-assed job) and I got shat on royally for "taking too long".
So, I too, have been in dev hell.
Giving my resignation to that asshole was a most pleasant experience. The $7k or so relo expense payback was not so much fun, but easily offset by the signing bonus for my next job.
My advice to those with idiots for managers is to keep your nose to the ground, and look for a better opportunity. Don't quit until you get something better lined up.
This guy was a real piece of work: two days after I started, he started sending me anti-American email (being a Canadian firm in Canada, this was fair game). Since my son is an American citizen, I was not amused.

--
You could've hired me.
Re:Similar, though terrestrial, problems by renehollan · 2004-11-20 20:52 · Score: 1

I think a bigger problem is that we give greater credence to bullshit counterarguments than they deserve.
For all our apparent risk-taking and bravado, I think that technical people are ultra-conservative: we're just better at evaluating technical risks than most.
So, when someone counters with a "what if" that challenges a theory of a problem that we have, we give it disproportionate attention: anything possible and risky must be disproven before we can proceed.
The right thing to do, of course, is to take the counter-arguments to our theories in stride, and disprove them by carefully crafted experiment. That's the scientific method.
Alas, the world does not work that way: merely accepting the possibility that one's theory may have a flaw is an acceptance of defeat. Game over. You don't get to run that experiment. At least that's the way it is nine times out of ten.
Even daring to speak up against such odds is folly: leadership is born not of technical acumen but of those who have amassed sufficient power to discharge a challenge on the weakness of the slightest uncertainty instead of the merits of an objective test. It is also a precarious position: buttressed more by luck than skill. I don't know if "leaders" sleep peacefully because they can shrug off uncertainties or because they are ignorantly oblivious of them.
The best one can do is to be truthful without angering the "gods" too much and to satisfy one's self with the somewhat smug knowledge that fire was tamed by an engineer willing to risk getting burned and not a manager afraid of a blister.

--
You could've hired me.
Re:Similar, though terrestrial, problems by ballpoint · 2004-11-21 04:41 · Score: 2, Funny

Are you telling me that the company you work(ed) for was partly responsible for ONE OF THE MOST ANNOYING THINGS I EVER SUFFERED FROM ?

Some years ago, I started being woken up haphazardly by the phone ringing. The day of the month was random, the day of the week was random, the time of the night was random between 2 and 5 AM but it sure freaked me, and my wife, out.

Calls to the telco had no effect. They tested (or at least pretended to) the line and said: "Oh no Sir, everything is fine!".

I ended up connecting a digital storage oscilloscope on the wires and leaving it running overnight, tweaking the thresholds until I was finally able to capture the overvoltage pulses that were causing my phones to ring.

Armed with the proof I managed to get through to a technically competent person in the telco with some authority (imagine the immenseness of that accomplishment !).

It ended up with the telco contacting the manufacturer of my (telco certified) phone switch to work out a correction, that manufacturer sending a technician turning up to solder a few resistors and a cap in place and charging me the equivalent of $100 for something I could easily have done myself.

<shouting time="again&again">F*CK F*CK F*CK</shouting>

--
Flourescent (adj): smelling like ground wheat.
Re:Similar, though terrestrial, problems by renehollan · 2004-11-22 04:06 · Score: 1

Unlikely.
The automated tests were not designed to make the phone ring, unless it was way too sensitive.
Of course, that might have been the case, if you had a defective phone.

--
You could've hired me.
Re:Similar, though terrestrial, problems by ballpoint · 2004-11-22 21:25 · Score: 1

The telco confirmed that they were automated line tests. They were performed at a voltage below standard ringing voltage, but way above what you would see normally off-hook and with a different frequency.

My phone switch wasn't defective, but just reacted badly to these new-fangled signals that didn't exist yet when the switch was certified by the telco.

--
Flourescent (adj): smelling like ground wheat.
Re:Similar, though terrestrial, problems by renehollan · 2004-11-23 05:45 · Score: 1

My phone switch wasn't defective, but just reacted badly to these new-fangled signals that didn't exist yet when the switch was certified by the telco.
"Certified by the telco" probably just means that it was deemed acceptable to connect to their network, i.e. that it would do no harm, not that the telco would guarantee that it would work as you expect -- that would be between you and the manufacturer.
Does the telco in question sell or lease their own phones, or offer an operational (and not just interconnect) certification? If so, using anything else is likely at your own risk.
There should, at least, be minimum and maximum ringing levels (frequency, cadence, duration, and level) at which the ringer should ring, and specs for acceptable non-ringing signals.
I know that the equipment we provided stayed well within the specs documented by the telco in question -- "ring tap" was a problem we took seriously.
Of course, I have no way of knowing if those specs were "sloppy" or enforced on certification of third-party equipment, leading to your problem. If the problem you encountered were widespread, we certainly would have heard of it.
I suspect that your equipment was just outside the design parameters that were followed (that's always a risk when third-party stuff is involved). From your description of the fix, it sounds like it lacked a "snubbing network" around an electronic ringer that it should have had.

--
You could've hired me.

Re:Chinese Threat: Keep the Source Code Secret! by wo1verin3 · 2004-11-20 10:00 · Score: 1

Put it on SourceForge and watch what branches appear :)

hmm... - Release and hold a contest by UnapprovedThought · 2004-11-20 10:45 · Score: 1

If they release [part of?] the source then they should also release their test cases, and then award cash prizes to whoever is first to find and confirm input datasets that result in a new bug, by posting it to an online forum. Of course, this would probably be most useful for the next rover design, and it may require extra setup work that makes the effort more expensive than doing it yourself (in the short term). But if even one major bug is found this way, I think the effort could easily pay for itself. Surely a metric unit conversion error would be spotted easily this way.

Of course, this is in an Ideal World where the OS is not platform-specific and could be run under Linux (similar to how an instance of Linux can run under itself to allow quicker testing of kernel patches).

From then on, outgoing communications to the rover would probably need to be encrypted :) but it is probably just as well, as long as they don't give out the key, the communications frequencies, the exact location, etc.

FAT on flash by freakmaster · 2004-11-20 10:57 · Score: 1

A coworker of mine is running VxWorks and is using FAT on flash memory. He tells me that he's actually running another filesystem layer underneath FAT which uses some algorithm to spread out the updates more evenly, so that one particular section of the flash memory is not worn out much faster than the rest of the memory.

Unfortunately I don't know any of the details offhand.
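
For what it's worth, the general idea (usually called wear leveling) can be sketched in a few lines. This is only a toy model under stated assumptions, not the layer described above: the class name `WearLevelingMap`, the "pick the least-erased free block" policy, and the block counts are all made up for illustration.

```python
class WearLevelingMap:
    """Toy flash translation layer: each write of a logical block lands on a
    different physical block, so erases are spread across the device."""

    def __init__(self, physical_blocks):
        self.erase_counts = [0] * physical_blocks  # erases seen by each physical block
        self.free = set(range(physical_blocks))    # physical blocks holding no live data
        self.mapping = {}                          # logical block -> physical block
        self.data = {}                             # physical block -> payload

    def write(self, logical, payload):
        if not self.free:
            raise RuntimeError("no free physical block")
        # Hypothetical policy: use the least-worn free block (lowest index on ties).
        target = min(self.free, key=lambda b: (self.erase_counts[b], b))
        self.free.remove(target)
        self.data[target] = payload
        old = self.mapping.get(logical)
        self.mapping[logical] = target
        if old is not None:
            # The stale copy is erased and returned to the free pool.
            del self.data[old]
            self.erase_counts[old] += 1
            self.free.add(old)

    def read(self, logical):
        return self.data[self.mapping[logical]]


# Rewrite the same logical block 100 times on a 4-block device.
flash = WearLevelingMap(physical_blocks=4)
for i in range(100):
    flash.write(0, f"version {i}")
```

Without the remapping, all of those rewrites would hit one physical block; with it, the erases end up spread almost evenly over the four blocks, while the FAT layer on top still sees a single logical block 0.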

Ditto by wowbagger · 2004-11-20 15:17 · Score: 2, Insightful

I as well have had the misfortune to pick WindRiver as the core OS for my project, and have had no end of problems.

Part of the problem in my case was that VxWorks is for smaller embedded systems, which my project is NOT. I need fast disk storage, I need graphics, I need networking, I need things that VxWorks just doesn't provide very well.

Were I able to change one decision about the design of my project, I would have gone with Linux instead.

WRS *used* to have something to offer, in that they provided a real-time OS and hardware driver bundles (board support packages in WRS-speak). However, they no longer provide great value in that area - Linux has far better hardware support, and for any reasonably complex project will scale down as well as VxWorks will scale up.

--
www.eFax.com are spammers

Sig reply (Similar, though terrestrial, problems) by NaDrew · 2004-11-20 16:44 · Score: 1

You cite two /. articles in the "Publications" section of your resume. What kind of response has this received in interviews?

--
Vista:XPSP2::ME:98SE

Oh, another planet is nothing... by Anonymous Coward · 2004-11-20 16:53 · Score: 0

They should try debugging from my QA lab. That'll give them a run for their money.

NASA may use another OS when... by Anonymous Coward · 2004-11-20 18:05 · Score: 1, Interesting

NASA may consider using a new OS after it has finished V&V in house and by an independent testing company (per NASA procedures) and it has flown in space successfully. An order of magnitude estimate is ten times the development cost.

VxWorks is a well-known OS with lots of experienced users. Priority inversion is a known problem; just set the SEM_INVERSION_SAFE flag in semMCreate() to fix it.
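
To make the priority inversion point concrete, here is a toy tick-by-tick model of what priority inheritance (the behaviour the SEM_INVERSION_SAFE flag asks for) changes. It is a sketch, not VxWorks code: the `simulate` function, the three tasks, their arrival times and work amounts are all hypothetical, and in this model a larger number means a more urgent task.

```python
def simulate(inheritance):
    """Return the tick at which the high-priority task finishes."""
    # Hypothetical workload: "low" grabs the mutex first, "high" needs it later,
    # and "medium" is pure CPU work that never touches the mutex.
    tasks = {
        "low":    {"prio": 1, "arrive": 0, "work": 3, "needs_mutex": True},
        "medium": {"prio": 2, "arrive": 1, "work": 5, "needs_mutex": False},
        "high":   {"prio": 3, "arrive": 1, "work": 1, "needs_mutex": True},
    }
    owner = None  # name of the task currently holding the mutex
    tick = 0
    while tasks["high"]["work"] > 0:
        ready = [n for n, t in tasks.items()
                 if t["arrive"] <= tick and t["work"] > 0]
        # A task that needs the mutex while another task owns it is blocked.
        blocked = [n for n in ready
                   if tasks[n]["needs_mutex"] and owner not in (None, n)]
        runnable = [n for n in ready if n not in blocked]

        def effective(name):
            prio = tasks[name]["prio"]
            if inheritance and name == owner:
                # The owner temporarily runs at the priority of its most
                # urgent waiter.
                prio = max([prio] + [tasks[b]["prio"] for b in blocked])
            return prio

        current = max(runnable, key=effective)
        if tasks[current]["needs_mutex"]:
            owner = current  # held for the task's whole run in this model
        tasks[current]["work"] -= 1
        if tasks[current]["work"] == 0 and owner == current:
            owner = None     # released when the task finishes
        tick += 1
    return tick


without_fix = simulate(inheritance=False)
with_fix = simulate(inheritance=True)
```

In this model the high-priority task finishes at tick 9 without inheritance, because the medium task starves the mutex holder, and at tick 4 with it.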

Besides making the OS, Wind River also sells Tornado(R) and other tools for developing, debugging, and testing embedded realtime code running in the target computer. Anyone who has ever done embedded and realtime code knows good tools are mandatory with any complex system.

VxWorks runs in a flat memory space. There is no segment protection, but code gets extensively reviewed and tested, so bad pointers are not a problem. Preventing memory fragmentation requires good design (the solutions are well known) and more reviews and testing.
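
One of the commonly cited designs for avoiding fragmentation is a pool of fixed-size blocks carved out of a single buffer at startup. The sketch below is a toy model only; the class name `FixedBlockPool` and the sizes are hypothetical, and nothing here is claimed to be what any particular flight project does.

```python
class FixedBlockPool:
    """Pool of equally sized blocks carved from one buffer allocated up front.
    Every block is the same size, so freeing in any order cannot fragment it."""

    def __init__(self, block_size, block_count):
        self.block_size = block_size
        self.block_count = block_count
        self.buffer = bytearray(block_size * block_count)
        self.free_list = list(range(block_count))  # indices of free blocks

    def alloc(self):
        """Return a free block index, or None if the pool is exhausted."""
        if not self.free_list:
            return None
        return self.free_list.pop()

    def free(self, index):
        if not 0 <= index < self.block_count:
            raise ValueError("block index out of range")
        if index in self.free_list:
            raise ValueError("double free")
        self.free_list.append(index)

    def view(self, index):
        """Writable view of the bytes belonging to one block."""
        start = index * self.block_size
        return memoryview(self.buffer)[start:start + self.block_size]


pool = FixedBlockPool(block_size=32, block_count=4)
blocks = [pool.alloc() for _ in range(4)]
exhausted = pool.alloc()           # pool is empty at this point
for index in (blocks[2], blocks[0], blocks[3], blocks[1]):
    pool.free(index)               # free in scrambled order
refilled = [pool.alloc() for _ in range(4)]
```

The trade-off is wasted space when requests are smaller than the block size, in exchange for allocation that either succeeds or fails predictably no matter how long the system has been running.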

The last time I priced a run time license (most satellites need two licenses), it was noise ($400) compared to the labor required to build a spacecraft.

I am a VxWorks user in the space business.

_Richard

I don't remember a C Language Interpreter by Anonymous Coward · 2004-11-20 18:14 · Score: 0

I don't think VxWorks has or had a C language interpreter. VxWorks is in transition from a tcl shell to a Java-based shell. Prior to tcl, VxWorks had some type of shell script language. C code is hard enough to parse in batch mode; I would not want to try to write a runtime interpreter.

Can anyone confirm this, or fill in what I don't remember?

_Richard

Re:I don't remember a C Language Interpreter by Anonymous Coward · 2004-11-30 04:21 · Score: 0

Confirmed: the target-side shell has a built-in C expression interpreter. It is not full-featured, but it provides a very powerful tool that can call just about any function that's been loaded to the target (C or ASM) and is not declared static, and if you know the right tricks you can call the statics as well.

-Coward who knows

Huh? by devphil · 2004-11-20 18:54 · Score: 3, Insightful

I need fast disk storage, I need graphics, I need networking, I need things that VxWorks just doesn't provide very well.

"...and even though I chose the wrong tool for the job, it's still the tool's fault for not doing everything I need."

--
You cannot apply a technological solution to a sociological problem. (Edwards' Law)

Re:Huh? by wowbagger · 2004-11-21 02:23 · Score: 3, Informative

It's called:

"WindRiver portrayed their tool as being able to do those things, thus I made the wrong decision based upon the false claims of the manufacturer."

You see, WRS would have you believe that VxWorks has a reasonable disk subsystem, even though they have no option of using DMA for the data transfers, a fact they conveniently don't make available.

WRS had a port of XFree available for VxWorks. However, they did not release the source for it, and they stopped supporting it, and thus it fell behind in support for the video chips now in use. Of course, they did not inform developers of their impending decision to drop support until it was too late.

WRS has a TCP/IP stack. However, they did NOT have support for DHCP, nor DNS, and on certain platforms their stack has gross errors (e.g. packets being shifted by one byte so that when they reach the application they are corrupted).

WRS claims to have board support packages so that you don't have to develop them. They don't mention that they don't support half the hardware on most boards (e.g. they don't enable the cache on XScale processors, halving the speed of the processor).

WRS claimed they would support development under Linux as a host OS "within a couple of months" - that was back in 1998. They started supporting development under Linux this year - and then not very well.

Yes, I chose the wrong tool for the job - because WRS did not correctly represent their tool's capabilities and there was no other way to evaluate the capabilities of the tool.

--
www.eFax.com are spammers

Re:Sig reply (Similar, though terrestrial, problem by renehollan · 2004-11-20 20:38 · Score: 1

You cite two /. articles in the "Publications" section of your resume. What kind of response has this received in interviews?

Honestly, it never came up.

Look, a resume proves you can "talk the talk".

An interview is your opportunity to prove that you can "walk the walk" as well.

--
You could've hired me.

Windows has it beat! by Anonymous Coward · 2004-11-21 01:06 · Score: 0

My Win2k machine has a 1,702,800 byte NTOSKRNL.EXE, and that's not compressed. Using NTFS compression it gets down to 1,286,144 bytes.

The first smiley :-) by Anonymous Coward · 2004-11-21 02:03 · Score: 0

A few links away from the article is the first bboard post that proposed the smiley as joke marker:

19-Sep-82 11:44 Scott E Fahlman :-)
From: Scott E Fahlman

I propose that the following character sequence for joke markers: :-)

Read it sideways. Actually, it is probably more economical to mark
things that are NOT jokes, given current trends. For this, use :-(

but a crash shut down Spirit for two weeks! by peter303 · 2004-11-21 06:47 · Score: 3, Interesting

It was around the first or second month of operation this year, but Spirit was unusable for a couple of weeks due to an OS failure. The symptom was that Spirit tried to reboot itself about 20 times in a row, a default practice if something drastic happens. It was traced (according to the rumor mill) to flash memory overflow. Supposedly the VxWorks file management system improperly updated its flash memory free-inode list, so the memory appeared to run out of space.

The nice thing about software is that JPL was able to upload a patch and get both rovers working properly again. Similarly, they reconfigured the Galileo mission to bypass the broken high-gain antenna and use the hundred-times-slower low-gain antenna with software patches, and achieved most of the mission objectives.

An important point by m3talsling3r · 2004-11-22 05:54 · Score: 2, Insightful

I hope no one overlooked the "radiation hardening" part of the article. This is something the average person, and even a lot of techs I talk to, don't realize is important. Speed is not the only variable in the equation. I'd much rather have a chip that doesn't fall to pieces on me while I'm flying through space. In fact I think it's time for us normal people to get used to thinking about quality again. We are soon going to be forced into harsh elements where we must be able to depend, absolutely, on the hardware being reliable. It's time we start getting used to the performance loss some might have because of it; or get ready to ditch thin again.

--
My sig is as boring as you...

RTFA by Anonymous Coward · 2004-11-22 09:17 · Score: 1, Insightful

If you had read the article, you would have discovered that JPL had full source code to VxWorks. The article belabors the fact that the folks at WindRiver went out of their way to make sure that JPL could compile the entire system from scratch.

I'm as fervent a WindRiver basher as the next guy. But at least bash them for things they are *guilty* of. Sheesh!

204 comments