Stress-Testing Software For Deep Space
kenekaplan writes "NASA has used VxWorks for several deep space missions, including Sojourner, Spirit, Opportunity and the Mars Reconnaissance Orbiter. When the space agency's Jet Propulsion Laboratory (JPL) needs to run stress tests or simulations for upgrades and fixes to the OS, Wind River's Mike Deliman gets the call. In a recent interview, Deliman, a senior member of the technical staff at Wind River, which is owned by Intel, gave a peek at the legacy technology under Curiosity's hood and recalled the emergency call he got when an earlier Mars mission hit a software snag after liftoff."
While I buy that the landing systems need an RTOS, I doubt Curiosity does. Image processing that happens with "precision"? Do x86 processors not process images precisely enough? I get the idea of being hardened to radiation but it was my understanding we have newer processors that fit the bill on this. The rest of this seems like a rationalization for using old hardware. However, as an engineer for the government it's possible I'm just old and embittered.
"recalled the emergency call he got when an earlier Mars mission hit a software snag after liftoff."
From TFA:
Back when the Spirit rover landed on Mars in 2004, it experienced file system problems. I got a call on landing day while I was in Southern California. I fired up my laptop and worked with three groups who were dealing with a variety of time zones: California, Japan and Mars. Since I had a RAD6000 system on my desk running simulations, by the end of the first week we had figured it out and were able to fix it.
The last thing I would want to do is program mission-critical systems. Thank G*d my programming mistakes are hidden in the mire of a thousand other programmers' mistakes and never make it to the front page of /.
Sent from my ENIAC
At least one instrument running VxWorks has been flying on the ISS since 2001. I'd be surprised if it were the only one.
If you can survive eight hours, you can survive *ANYTHING*...
With my long experience with VxWorks, this doesn't surprise me. VxWorks is not the most robust RTOS; think of it as a multitasking MS-DOS. The version they used has no memory protection between processes, and I have found numerous areas of VxWorks to be badly implemented or downright buggy. Up through version 5.3 the malloc() implementation was absolutely horrid and suffered from severe fragmentation and performance problems. On the platform I was working with, I replaced the VxWorks implementation with Doug Lea's malloc (on which glibc's allocator is based), and our startup time dropped from an hour to 3 minutes. I was also able to easily add instrumentation so we could quickly find memory leaks or heap corruption in the field, something not possible with Wind River's implementation. After reading about the problems with the filesystem, I looked at the Wind River filesystem code. It was rather ugly: they map FAT on top of flash memory (not the best choice), and corner cases such as a full filesystem were not well handled.
Their TCP/IP stack sucked as well. If you can drop to the T-shell through a security exploit, you totally own the box (e.g. Huawei's poor security record).
VxWorks is fine for simple applications, but for very complex applications it sucks. At least the 5.x series does not clean up after a task if it crashes, because it does not keep track of which resources a task uses. A task is basically just a thread of execution, and all memory is a shared global pool. At the time it did have one useful feature that Linux lacked: priority-inheritance mutexes. These are a requirement for proper real-time performance, and I believe they are now included in Linux.
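In POSIX terms, priority inheritance is requested through a mutex attribute. The sketch below assumes a POSIX system that supports the optional PTHREAD_PRIO_INHERIT protocol (glibc on Linux implements these mutexes on top of the kernel's PI futexes); the helper name init_pi_mutex is hypothetical, and this is not how VxWorks itself exposes the feature.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <assert.h>

/* Hypothetical helper: initialise *m as a priority-inheritance mutex.
 * Returns 0 on success, or the error number from the failing pthread call. */
int init_pi_mutex(pthread_mutex_t *m)
{
    assert(m != NULL);
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    if (rc != 0)
        return rc;

    /* While a thread holds a PTHREAD_PRIO_INHERIT mutex, it runs at the
     * priority of the highest-priority thread blocked on that mutex, so a
     * medium-priority thread cannot starve the holder (priority inversion). */
    rc = pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT);
    if (rc == 0)
        rc = pthread_mutex_init(m, &attr);

    pthread_mutexattr_destroy(&attr);  /* the mutex keeps its own copy */
    return rc;
}
```

The mutex is then locked and unlocked with the usual pthread_mutex_lock()/pthread_mutex_unlock(). Actually observing the priority boost requires real-time scheduling policies such as SCHED_FIFO, which usually need elevated privileges.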
This post is encrypted twice with ROT-13. Documenting or attempting to crack this encryption is illegal.
My PVR also runs VxWorks. Given that it still crashes randomly now and again, I hope they have a better version for space probes.
The other big player in space RTOSes: RTEMS.
Free, open source, rtems.org.
It has all the same problems as VxWorks: no process memory isolation (because space flight hardware usually doesn't have the hardware to support it)...
One thing that VxWorks has that RTEMS doesn't, and I wish it did, is dynamic loading and linking of applications. You're basically back in the 1960s days of monolithic images, without even overlay loaders.
VxWorks is (or was?) in newer Linksys routers, which are much less stable than the older Linux-based versions.
http://en.wikipedia.org/wiki/VxWorks#Networking_and_communication_components
They start planning these missions years and years ahead. It is not uncommon to have decided on a hardware platform five years before launch. Since there's a lot at stake for these bigger missions to succeed, they usually don't take risks by putting stuff up there that hasn't proven itself. There may be some evolution, like a higher clock rate or more memory, but a new processor architecture gets tried first on missions that have redundancy, lower cost or less exposure, preferably a combination of those.
I once discussed some technology that might have gone into an instrument on a weather/climate satellite with the principal investigator of the then-current mission, who had also been named PI of the next one. This was around 2007. They had to choose the technology then so they could work on plans and get funding around now. Once they get their funding, it will still be three to five years before it goes up. Back then, because of the reliability demands they had for the sensor and the relatively unproven state of CMOS sensors for photon capture (commonly used in consumer digital cameras by 2007), they chose to go with the previous solution, the one in the current instrument. That means they will probably launch an instrument with a pre-CMOS sensor around 2015, because that was the best option available to them at decision time.
Unless we change the way we "go to space" in a radical way, I don't see the latest and greatest tech making it into missions like this. It's up there, sure it is, but only a handful of people know it is, and they don't want their precious black-ops budget exposed or taken away from them. Once the statistics from the successes and failures (failing in secret "testing missions" once in a while is allowed) reach a rating that makes it commercially viable to sell the tech for civilian use, and the state of technology used for espionage and military purposes is such that doing so poses no tactical threat, more modern tech will be used for missions like this.
I was promised a flying car. Where is my flying car?
Up through version 5.3 the malloc() implementation was absolutely horrid and suffered from severe fragmentation and performance problems.
I talked to one of Curiosity's software engineers the day it landed... he mentioned that one of their coding rules was: no malloc() allowed.
I don't care if it's 90,000 hectares. That lake was not my doing.
While I agree with most of the sentiments on the 5.x VxWorks versions, it has to be said that VxWorks is now at version 6.9 and is a much improved beast, with a far better IP stack, support for 'proper' processes, etc. That said, this comes at the cost of dropping or modifying a lot of APIs, which makes upgrades difficult, and to be honest, once you've finished with it, it looks so much like Linux that you wonder why you spent $50,000 on a developer seat.
Slightly off thread: I always wondered why Intel bought Wind River. One of the issues we have is that finding someone who knows the OS well is difficult, because there is no way of getting exposure to it unless you have a lot of money.
I can't help thinking Intel has missed a trick here. With the rise of the embedded hobbyist and things like the Raspberry Pi, a new generation of engineers is learning; however, their experience is based around ARM and Linux, further marginalising Intel in the embedded world, which is likely to be the big growth area in the future.
If Intel were smart, they would create their own hobbyist board based around an embedded Core Duo or the like and provide a free version of VxWorks to run on it. It wouldn't need some of the high-end features, but it would provide early exposure to the OS as well as raise Intel's profile in the embedded space.
Just a thought....
Didn't they use to do Linux distros back in the day?
Yes, I know, off topic, but just asking...
No malloc()? Interesting. I worked on a project at NG and we had the same policy: everything was on the stack or global. We had the chance to run MontaVista embedded Linux, but someone higher up decided to go with "tried and true" VxWorks. I agree with a poster above about retraining costs and all that adding up, but if embedded Linux became standard at big companies, I don't think it would take too long to make up the costs of retraining and everything else that goes with it.
An old PowerPC can fly a spaceship to Mars, execute a difficult landing and now semi-autonomously drive a robot across the surface of a planet 30 million miles away, yet it's not up to the job of writing documents using the latest word processors. What's wrong with this picture?
I find the most revealing part of the interview that he publicly acknowledges his customers working on secret designs for space.
I'm sure those customers will deny any such project exists.
To Terminate, or not to Terminate, that's the question - SCSIROB
That is a good policy if you can do it, but in this case it was impossible. We had to use some third-party software which used malloc and realloc extensively. To make matters worse, for a long time we could only get obfuscated code to support the network processor we were using, so it was impossible to make changes to it. We also had to make use of it because of the dynamic nature of the software. In our case it really wasn't feasible to avoid malloc. Replacing Wind River's malloc had some huge advantages. Fragmentation with the VxWorks malloc was horrible, to the point where there were many tens of thousands of memory fragments. VxWorks kept its free blocks in a linked list sorted from smallest to largest, and due to the extensive dynamic reallocs, this linked list turned into a huge bottleneck.
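To see why a size-sorted free list turns into a bottleneck, here is a minimal sketch using hypothetical structure and function names (it is not Wind River's code). Because the list is ordered by size, the first block large enough is also the best fit, but reaching it means walking past every smaller fragment.

```c
#include <stddef.h>
#include <assert.h>

/* Hypothetical free-list node: one per free fragment, kept sorted by size
 * from smallest to largest, as described for the VxWorks 5.x allocator. */
struct free_block {
    size_t size;
    struct free_block *next;
};

/* Link an array of nodes (already in ascending size order) into a list. */
void link_blocks(struct free_block *nodes, size_t count)
{
    for (size_t i = 0; i < count; i++)
        nodes[i].next = (i + 1 < count) ? &nodes[i + 1] : NULL;
}

/* Return the first block with size >= request. Since the list is sorted by
 * size, this is also the best fit. *steps receives the number of nodes
 * examined, i.e. the cost of the search. */
struct free_block *find_fit(struct free_block *head, size_t request,
                            size_t *steps)
{
    assert(steps != NULL);
    size_t visited = 0;
    for (struct free_block *b = head; b != NULL; b = b->next) {
        visited++;
        if (b->size >= request) {
            *steps = visited;
            return b;
        }
    }
    *steps = visited;   /* walked the whole list without finding a fit */
    return NULL;
}
```

With 10,000 small fragments ahead of one large block, a large request visits all 10,000 nodes, and re-inserting a freed block in sorted order needs a similar walk. Doug Lea's malloc avoids most of this by keeping free chunks in bins indexed by size.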
Replacing the code with Doug Lea's malloc eliminated the fragmentation problem completely. Including the task ID and the calling function's program counter in each allocated block made it trivial to find memory leaks and to keep track of how much memory and how many blocks were allocated per task, or even per function.
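A minimal sketch of that kind of instrumentation, assuming a wrapper around the system malloc() rather than a modified Doug Lea allocator. Each block carries a hidden header with an owner ID and call site, and live blocks stay on a list so outstanding memory can be totalled per task. The task ID is a plain int supplied by the caller (a stand-in for an RTOS task ID), and the call site is recorded with `__func__` instead of the caller's program counter, which would need a compiler-specific builtin such as GCC's `__builtin_return_address`. All names are hypothetical.

```c
#include <stdlib.h>
#include <stddef.h>
#include <stdint.h>
#include <assert.h>

/* Hidden header placed in front of every payload. */
typedef union alloc_hdr {
    struct {
        union alloc_hdr *prev, *next; /* list of live (not yet freed) blocks */
        size_t size;                  /* bytes requested by the caller */
        int task_id;                  /* owning task (hypothetical ID) */
        const char *site;             /* allocating function */
    } h;
    max_align_t align;                /* keeps the payload suitably aligned */
} alloc_hdr;

static alloc_hdr *live = NULL;        /* not thread-safe: add a lock in real use */

void *tracked_malloc(size_t size, int task_id, const char *site)
{
    if (size > SIZE_MAX - sizeof(alloc_hdr))
        return NULL;                  /* header + payload would overflow */
    alloc_hdr *hdr = malloc(sizeof *hdr + size);
    if (hdr == NULL)
        return NULL;
    hdr->h.size = size;
    hdr->h.task_id = task_id;
    hdr->h.site = site;
    hdr->h.prev = NULL;
    hdr->h.next = live;               /* push onto the live list */
    if (live != NULL)
        live->h.prev = hdr;
    live = hdr;
    return hdr + 1;                   /* payload starts right after header */
}

void tracked_free(void *p)
{
    if (p == NULL)
        return;
    assert(live != NULL);             /* a tracked block must be on the list */
    alloc_hdr *hdr = (alloc_hdr *)p - 1;
    if (hdr->h.prev != NULL)
        hdr->h.prev->h.next = hdr->h.next;
    else
        live = hdr->h.next;
    if (hdr->h.next != NULL)
        hdr->h.next->h.prev = hdr->h.prev;
    free(hdr);
}

/* Outstanding bytes (and, via *blocks, block count) for one task. Anything
 * still listed at shutdown, or growing over time, is a leak candidate. */
size_t bytes_held_by(int task_id, size_t *blocks)
{
    size_t bytes = 0, n = 0;
    for (alloc_hdr *hdr = live; hdr != NULL; hdr = hdr->h.next) {
        if (hdr->h.task_id == task_id) {
            bytes += hdr->h.size;
            n++;
        }
    }
    if (blocks != NULL)
        *blocks = n;
    return bytes;
}

/* Convenience macro that records the calling function automatically. */
#define TRACKED_MALLOC(size, task) tracked_malloc((size), (task), __func__)
```

The same list walk can group by `site` instead of `task_id` to get per-function totals.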
There really was no good reason why VxWorks was chosen, since there were no hard real-time requirements. The product (a router and broadband remote access server) was a mess, since each box had to include a Sun UltraSPARC computer running Solaris (we required big-endian), where most of the software ran. Solaris was an even worse choice. Writing STREAMS drivers for Solaris was a nightmare compared to Linux drivers, especially when trying to tie into the TCP/IP stack. Not only that, Solaris was quite slow. Give me Linux any day.
The great thing about writing applications in Linux user space is that you can use tools like Valgrind to catch many of these memory leaks, uninitialized variables, etc.
You probably shouldn't be using malloc() on an embedded system like that anyway. Statically allocate everything. That way you know exactly how much memory will be consumed at any time and can budget appropriately. It also reduces the chance of a bug malloc()ing all your memory or running out of stack space.
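Where some dynamic behavior can't be avoided, one common compromise with "statically allocate everything" is a fixed-block pool whose storage is itself static. This is a hypothetical sketch (the names and the 32 x 64-byte budget are made up), not code from any project mentioned in this thread.

```c
#include <stddef.h>
#include <assert.h>

#define POOL_BLOCKS     32   /* hypothetical budget: number of blocks */
#define POOL_BLOCK_SIZE 64   /* hypothetical block size in bytes */

/* All storage is reserved statically; nothing ever comes from the heap. */
typedef union pool_block {
    union pool_block *next;               /* used only while the block is free */
    unsigned char bytes[POOL_BLOCK_SIZE];
    max_align_t align;                    /* keeps each block suitably aligned */
} pool_block;

static pool_block pool[POOL_BLOCKS];
static pool_block *free_list = NULL;

/* Thread every block onto the free list; call once at startup. */
void pool_init(void)
{
    for (size_t i = 0; i < POOL_BLOCKS; i++)
        pool[i].next = (i + 1 < POOL_BLOCKS) ? &pool[i + 1] : NULL;
    free_list = &pool[0];
}

/* O(1): pop the head of the free list, or NULL when the budget is spent. */
void *pool_alloc(void)
{
    pool_block *b = free_list;
    if (b != NULL)
        free_list = b->next;
    return b;
}

/* O(1): push the block back onto the free list. */
void pool_free(void *p)
{
    pool_block *b = p;
    assert(b >= pool && b < pool + POOL_BLOCKS);  /* must come from this pool */
    b->next = free_list;
    free_list = b;
}
```

Both operations take constant time no matter how many blocks are in use, and there is no fragmentation to search through. The trade-off is that every request must fit in POOL_BLOCK_SIZE and the total is capped at POOL_BLOCKS, budgeted up front.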
VxWorks claims to support memory protection; chances are the CPU they are using lacks an MMU to support it.
const int one = 65536; (Silvermoon, Texture.cs)
SJW, n: "Someone I don't like, and by the way I'm a fuckwit" - AC
If your objective requires an RTOS, you're probably not going to malloc(). There are edge cases, but we've successfully banished them. We don't use VxWorks, thank god, but we do use a real memory machine instead of a virtual memory machine. Getting young programmers to understand that is challenging, and getting CS grads, of all fucking people, to program for a real memory machine is just fucking impossible. We make them managers instead.
I wonder what the CPU and memory load graphs would look like for a probe versus some standard desktop applications. Might explain a lot.
Priority inheritance can be handled through PI futexes. Issues due to lock contention can be handled by the kernel via FUTEX_LOCK_PI.
Get a web developer
Your desktop word processing software also didn't have a licensing cost in the hundreds of thousands of dollars...
I would imagine that landing a spaceship takes a lot more CPU than reformatting some text and drawing a blinking cursor.
malloc() is non-deterministic. A request for a block of contiguous free bytes may have to search a fragmented memory map before it can be satisfied. The duration of the search depends on the algorithm and on the amount of fragmentation relative to the size of the request. It is worse if the allocator must rearrange memory to accommodate the request. Thus, malloc() is typically avoided in time-critical code on a real-time operating system.
Why do you think that? Landing a space ship can be done using analogue electronics as a control system in the 60s. That means it was simple and light enough, even with an analogue design, to make it into space. The feedback loops don't have to run at more than a few kHz, and the amount of processing per loop is very low. More intensive than blinking a cursor, yes, but is it more intensive than reformatting text? Perhaps not.
A PID controller is, say, 10 arithmetic operations per evaluation and only has to be evaluated at the rate of the feedback loops. No, it's not very much processing to control a spaceship landing.
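To put the "say, 10 arithmetic operations" figure in concrete terms, here is a minimal discrete PID step. The struct, names, gains and loop period are illustrative assumptions; a real descent controller would add things this sketch omits, such as integral windup limits, derivative filtering and actuator saturation.

```c
#include <assert.h>

/* Hypothetical discrete PID controller state. */
typedef struct {
    double kp, ki, kd;   /* proportional, integral, derivative gains */
    double dt;           /* loop period in seconds (e.g. 0.001 for 1 kHz) */
    double integral;     /* running sum of error * dt */
    double prev_error;   /* error from the previous step */
} pid_ctrl;

/* One evaluation of the loop: 10 floating-point operations in total. */
double pid_step(pid_ctrl *c, double setpoint, double measured)
{
    assert(c->dt > 0.0);
    double error = setpoint - measured;                     /* 1 sub */
    c->integral += error * c->dt;                           /* 1 mul, 1 add */
    double derivative = (error - c->prev_error) / c->dt;    /* 1 sub, 1 div */
    c->prev_error = error;
    return c->kp * error + c->ki * c->integral
         + c->kd * derivative;                              /* 3 mul, 2 add */
}
```

At a few kHz, that is on the order of tens of thousands of floating-point operations per second per control axis.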
"Landing a space ship can be done using analogue electronics as a control system in the 60s"
I don't remember any system in the 60s where a sky crane had to hover in place, lower a lander down, release it and then fly off. Or navigate using image recognition. If you know otherwise, fill me in.
"more intensive than reformatting text?"
Oh please. Reformatting text algorithms were running on 8 bit home computers in the 70s!
Oh please. Reformatting text algorithms were running on 8 bit home computers in the 70s!
I'm sure that explains why we still don't have hyphenation in web browsers and justified text sucks. Hey, browser guys! This one has a clue! You have to use 8 bit home computers!
Or navigate using image recognition.
Well, you can call anything image recognition these days, like what university students do in their robot-fighting competitions (based on maybe an 8-bit luminosity sensor) or what any laser-based mouse does to detect movement across a surface. The devil is in the details, isn't it?
Your desktop word processing software also didn't have a licensing cost in the hundreds of thousands of dollars
It would if you were the only customer and it was only going to run on one computer. Do you have any idea how many programmers MS has and what it costs for salaries and other overhead?
Free Martian Whores!
You were going okay until here. You can't rearrange memory, malloc returns pointers, and there isn't any callback to ask for that pointer back to move it to another location.
Byte-compiled languages like Java can rearrange memory, but you call new, not malloc, so I know you weren't talking about them. Garbage collection is a much bigger problem, especially if you think about mixing Java and real-time operations. C/C++ in real time means following best practices, but for Java, get a different Java: http://en.wikipedia.org/wiki/Real_time_Java.