Houston, We Have a Software Problem

Posted by chrisd on Sunday September 8, 2002 @11:08AM from the nasa.sourceforge.net dept.

An anonymous reader writes "The computer system that launches the Space Shuttle is an old, but important, computer system. It is built from mid 70's technology and features SSI chips like 7400's...which are getting hard to find. It has 64k of memory and no room to repair any software bugs. NASA started the CLCS project in 1996 which uses state of the art computer languages, OO methodologies, and hardware. Everything that you could actually hire people off the street for. However, NASA is in a budget crunch with the Space Station cost overruns. It is looking to trim costs to keep the Space Station going. There are stories about CLCS getting cancelled here and these guys say it's already cancelled."

13 of 319 comments

7400s hard to find? by Istealmymusic · 2002-09-08 11:27 · Score: 5, Informative

I don't know about everyone else, but when I was a kid I got a Radio Shack 300-in-1 electronic project kit for my birthday which came with a dozen or so 7400 chips. When I plugged one in backwards I just went down to my local Radio Shack and picked up a new 74LS00, which they had plenty of in stock all the time.
Certainly the 7400 series as a whole is still widespread and used in hobbyist kits; I'm not that old. Maybe the original 7400 is becoming obsolete, being replaced with the 74LS (low-power Schottky) or CMOS chips? If so, it shouldn't be too difficult to replace the TTL logic with CMOS logic, given a few adjustments in voltage levels, or they could use chips that are compatible with both TTL and CMOS logic.
Of course, the 5400 series SSIs (small-scale integrated circuits) are preferred over the 7400s for industrial purposes, and as a plus they are completely backwards compatible. Why isn't NASA using those?
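To make the "adjustments in voltage levels" point concrete, here is a rough sketch of the check you'd do before swapping one logic family for another. The numbers are approximate worst-case figures as commonly quoted in datasheets for a supply around 5 V; they are assumptions for illustration, not values for any specific NASA part, and the sketch ignores fan-out and drive current entirely.

```python
# Approximate worst-case logic levels in volts near a 5 V supply.
# Illustrative figures only; always check the datasheet of the real part.
FAMILIES = {
    "TTL": {"VOH": 2.4, "VOL": 0.4, "VIH": 2.0, "VIL": 0.8},
    "HC": {"VOH": 4.4, "VOL": 0.1, "VIH": 3.15, "VIL": 1.35},   # plain CMOS
    "HCT": {"VOH": 4.4, "VOL": 0.1, "VIH": 2.0, "VIL": 0.8},    # TTL-compatible CMOS
}


def can_drive(driver, receiver):
    """True if the driver's guaranteed output levels satisfy the receiver's input thresholds."""
    d, r = FAMILIES[driver], FAMILIES[receiver]
    high_ok = d["VOH"] >= r["VIH"]  # a driven high must read as high
    low_ok = d["VOL"] <= r["VIL"]   # a driven low must read as low
    return high_ok and low_ok


if __name__ == "__main__":
    for drv in FAMILIES:
        for rcv in FAMILIES:
            print(f"{drv:>3} -> {rcv:<3}: {'ok' if can_drive(drv, rcv) else 'needs level fix'}")
```

With these figures a TTL output is not guaranteed to register as a high on a plain CMOS input, which is why the TTL-compatible CMOS variants exist.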

--
"The lesson to be learned is not to take the comments on slashdot too literally." --Vinnie Falco, BearShare
1. Re:7400s hard to find? by mikewas · 2002-09-08 11:55 · Score: 5, Interesting
  
  The 54 series parts were like the 74 series, but in a hermetically sealed case, 100% tested over a wider temperature range, and burned in to remove infant failures. For this application they used space qualified components: the same as 54 series parts, but with more stringent tests, and the chips are also evaluated for radiation resistance. Any change in the design or production process and the 54 & space qualified chips must be requalified. What can happen is that a chip is produced to be functionally the same, but using smaller geometries, and now is more susceptible to ESD and radiation.
  CMOS chips, because of their high impedances, are notorious for ESD and rad sensitivity so they won't do.
  With the reduction in military, aerospace, and space spending many manufacturers have dropped the 54 series and space qualified components. They haven't made any attempts to add replacements in their product lines.
  When a part is dropped, the manufacturer usually informs the industry of their intent. You're given a date & price for a final order. The theory is that you can buy a lifetime supply of these parts. Industry isn't likely to buy any more than they need to complete existing contracts plus a few spares; there's no guarantee that you'll get any more contracts to build items requiring these parts, so these purchases will cut into your profits. Government procurement may buy additional components, but lacks funding to really buy large quantities.
  An opportunity is presented, and it will be taken advantage of. A distributor might buy some additional parts -- since the distributor has several customers buying a particular part from him, his risk of being stuck with an unsellable component is small.
  After the final production run, the chip manufacturers will sell the documentation, tooling, and rights to make a chip. There are small manufacturers who buy these, as well as the out of date machinery to produce these parts. They can then make small production runs, sometimes under a hundred components, for a price. In addition, they might buy untested dice or wafers from the last production run. The untested & unpackaged components are very cheap, so it's more affordable & less risky to buy and store these than the completed components.
  So it is still possible to get the parts needed -- at a price!
  
  --
  
  "Glory is fleeting, but obscurity is forever." --Napoleon Bonaparte
Re:Why not simulate it? by rodgerd · 2002-09-08 11:28 · Score: 5, Insightful

Auditing the emulator and the host OS would be a problem - the code they've currently got has a very low rate of bugs, and has been extensively audited. NASA knows everything from the hardware up, exactly what the failure rate is and so forth.

Now, imagine you take modern commodity hardware (which changes periodically - look at how often Intel silently release new steppings of their CPUs). You're not going to have a guarantee of consistency there. You're going to have to boot an OS off it - and even the simplest RTOSes are still much, much bigger than the whole platform currently. Then you need an emulator. Then you need the system. And the only problem you've solved with all that work is the unavailability of the old hardware - you still have an old machine language on a tiny platform which can't be easily extended for new functionality.
Hey, O'Keefe, look what I found on SourceForge... by Boss,+Pointy+Haired · 2002-09-08 11:35 · Score: 5, Funny

What?

"shuttle_launcher_0_1"

Excellent. That'll save a few dollars. What's the development status?

"1 - Planning, sir"

Ah.
A Simple Solution by NeuroManson · 2002-09-08 11:37 · Score: 5, Funny

(1) Print up 50,000 numbered authenticity certificates...

(2) Break down the old mainframes until you have roughly 50,000 pieces...

(3) Sell it on eBay (or other auction sites) as space memorabilia, mention that the computer the parts came from were responsible for guiding the Apollo missions to the moon, etc and so on... The machines are SO obsolete now that the only way they could pose a security risk is by sending them back in time...

(4) Profit!

(5) Buy a nice little beowulf cluster, hire 20 Linux geeks and feed each of them $50 in dew and pizza in exchange for setting up the system...

(6) Use remaining funds to pay the Russian space agency to have a little "airlock accident" for that Nsync guy...

--
Just because you can mod me down, doesn't mean you're right. Shoes for industry!
Oh come ON guys!!! by nettdata · 2002-09-08 11:48 · Score: 5, Funny

It's not like this is rocket science!

Oh, wait....

--

$0.02 (CDN)
More shuttle development? by timeOday · 2002-09-08 11:54 · Score: 5, Insightful

The code in the Shuttle's launch system is old? The entire Space Shuttle is old. I'll bet a lot of slashdotters don't even remember the Columbia's maiden voyage.
I'm not one to replace things that are working fine, but as I understand it, newer designs could be a whole lot cheaper to operate. So I wonder if pouring more into the Space Shuttle program is the best thing to do.
I'm not saying "let's throw out the space shuttle" but it bothers me that there's apparently nothing in the works with a decent shot at replacing it any time soon. It seems the field of space exploration is becoming antiquated.
Re:Why not simulate it? by io333 · 2002-09-08 11:54 · Score: 5, Interesting

There comes a time in every product's lifetime when it's time to start over.

Exactly. And that includes the shuttle. It has never lived up to what it was envisioned to be and it is only going to become more costly and more failure prone in the future as every bit of hardware on that pig is already showing signs of fatigue.

There are many launch systems that cost far less per pound to throw things into orbit. The reasons we still have those monstrosities flying are political only, not technological or scientific.

Sure this is flamebait. (Gosh, getting rid of the old karma system is so LIBERATING!) But if we can discuss how some little bits of hardware in the shuttle are past their time, why can't we discuss the big bit?
Re:Why not simulate it? by WasterDave · 2002-09-08 12:06 · Score: 5, Insightful

This is a very pertinent point that appears to have been lost on the initiators (and now burger flippers) of the replacement-launch-thingy project.

What they have, right there, is one spectacularly reliable piece of software. I suspect it's significantly more bug free than even the microcode in a modern processor, let alone the companion chips, bios, operating system, and virtual machine for some god awful p-code language (not that I'm naming names here).

The question that should have been asked is "how can we make a sustainable process for making extremely reliable control computers?". How to go about cutting custom silicon, tiny OSes etc. How to save the happy tax payer hundreds of millions of dollars by reselling these services to people making nuclear power stations, heart pacemakers etc. instead of going shopping for big Sun boxes.

Oh well, reality strikes again.

Dave

--
I write a blog now, you should be afraid.
Space Computing: Some Numbers by aebrain · 2002-09-08 12:32 · Score: 5, Informative

From an article in the Sydney Morning Herald.
Only 58 centimetres square and weighing 50 kilograms, the tiny FedSat satellite is packed with five scientific experiments and all of the instruments required to communicate with Earth during its anticipated three-year life. At the heart of the satellite is a 10MHz ERC-32 processor - a SPARC-based 32-bit RISC processor developed for high-reliability space applications.

The ERC-32 sacrifices processing power for durability and reliability. It uses three chips to process a modest 10 million instructions per second and two million floating-point operations per second - less than 1 per cent of a Pentium 4's capabilities.

The pay-off is reliability: the ERC-32 uses concurrent error-detection to correct more than 95 per cent of errors.

Power-hungry microprocessors such as the Pentium 4, which runs a standard office PC bought off the shelf today, would be an intolerable burden on the solar-powered satellite. The ERC-32 consumes less than 2.25 watts at 5.5 volts.

Designed to survive extreme radiation bursts from solar flares, the ERC-32 can tolerate radiation doses up to 50,000 rad. This is 100 times the lethal dose for humans.

...A team of Australian programmers developed FedSat's onboard software, building on work done in Britain. It is written in Ada-95, a programming language designed for embedded systems and safety-critical software. All it has to work with is 16MB of RAM, 2MB of flash memory for storing the program, a 128K boot PROM and 320MB of DRAM in place of a hard disk that would never survive the launch process. All essential data is stored in three physically different locations.

The software is built in a similar way - lots of internal checks, tell-me-thrice memory, soft-failure-bit-flip-correcting daemons etc. In this case, lives aren't at stake, but the people doing the programming are used to situations where they are.
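For anyone wondering what "tell-me-thrice" looks like in practice, here is a minimal sketch of the idea: keep three copies, take a majority vote on every read, and rewrite whichever copy disagrees. This is an illustration only, not FedSat's actual code (which is in Ada-95); the names `vote`, `read_and_repair` and the dict-based "stores" are all made up for the example.

```python
from collections import Counter


def vote(copies):
    """Return the value held by a strict majority of the copies, or raise if there is none."""
    value, count = Counter(copies).most_common(1)[0]
    if count * 2 <= len(copies):
        raise ValueError("no majority among copies")
    return value


def read_and_repair(stores, key):
    """Read `key` from every store, vote, and overwrite any copy that lost the vote."""
    good = vote([store[key] for store in stores])
    for store in stores:
        if store[key] != good:
            store[key] = good  # repair the corrupted copy in place
    return good


if __name__ == "__main__":
    # Three hypothetical, physically separate stores holding the same value.
    stores = [{"mode": 7}, {"mode": 7}, {"mode": 7}]
    stores[1]["mode"] ^= 0x4  # simulate a single bit flip in one copy
    print(read_and_repair(stores, "mode"), stores)
```

A single corrupted copy gets outvoted and repaired; if all three disagree the read fails loudly rather than guessing.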

--
Zoe Brain - Rocket Scientist
1. Re:Space Computing: Some Numbers by aebrain · 2002-09-08 17:37 · Score: 5, Informative
  
  The context was that of software for an unmanned microsatellite, not the shuttle.
  
  Crewed spacecraft have an even more strict set of rules attached to the software development process. Have a look at some of the articles on DO-178B, the software development standard for avionics. Similar issues apply, but even more so.
  
  Look, people - not Geniuses - just normal, everyday programmers - have been making software you can bet your life on for a long time now. We know how to do it even more cheaply than the normal buggy commercial work (though testing is radically expensive and blows out the total cost). There's no need, and no excuse, for BSDs and security problems. None. You just have to have the right tools, the right training, and the right attitude. If you like, the Right Stuff. Here's a quote from that article:
  It's strictly an 8-to-5 kind of place -- there are late nights, but they're the exception. The programmers are intense, but low-key. Many of them have put in years of work either for IBM (which owned the shuttle group until 1994), or directly on the shuttle software. They're adults, with spouses and kids and lives beyond their remarkable software program.
  
  That's the culture: the on-board shuttle group produces grown-up software, and the way they do it is by being grown-ups. It may not be sexy, it may not be a coding ego-trip -- but it is the future of software. When you're ready to take the next step -- when you have to write perfect software instead of software that's just good enough -- then it's time to grow up.
  People like myself look upon any work over about 7 hours a day more than twice a month as signs that "I personally screwed up", because I'm the guy who sets the schedule, not some PHB. We have lives. We have kids. We have hobbies. And the stuff we do is hard, the systems do a lot more than most commercial apps, and with far fewer memory and CPU resources. It's both incredible fun "boldly going.." and all that, but also a crushing responsibility when we do safety-critical work. People's lives depend on us doing the best possible job we can.
  One area I disagree with in the "Right Stuff" article is that the work doesn't involve creativity. This is balderdash - we're doing stuff no-one has ever done before under really tight resource constraints. To get a reliable architecture often requires significant smarts and lateral thinking. Anyone can make a complex solution to a complex problem; the really good guys and gals make solutions so drop-dead simple, obviously-correct and efficient that it's miraculous how much such simple, obvious and readable code actually accomplishes.
  
  Looking at the general world of InfoTech, we see that most programmers out there would rather write the winning entry for the "Obfuscated C" contest than make some software that gets us around the solar system. And that people who make reliable software hit the unemployment queue on project completion, while those making buggy stuff have jobs-for-life in maintenance. Of course, they often have 80-hour weeks too, and are driven by PHBs who know b* all, and can't even take pride in the product, so there is some justice.
  
  --
  Zoe Brain - Rocket Scientist
Re:port the software? ... try hardware! by rodgerd · 2002-09-08 16:10 · Score: 5, Interesting

Replacing it can be harder. I used to work in newspaper publishing; the core editorial systems of one employer were old ATEX J11 systems with a proprietary, tightly integrated OS and application suite. Over time, various aspects of the system were offloaded to more modern systems (eg, PostScript output and integration with graphics from desktop systems had dedicated AIX systems, imagesetters driven by PostScript RIPs, dumb terminals run from dedicated I/O boards replaced with terminal emulators on the desktop).

Despite all this tweaking, the crufty old systems stayed in place. Why? Well, on each of these old boxes, we could support 25-30 journos and the systems just worked, grinding out newspapers day after day.

People kept talking about replacing them, not least because we had to train up operators and engineers on them every time new staff came in, parts were hard to come by (the standards-not-compatible SCSI and ethernet interfaces were picky about what they talked to, and the filesystem could only address 600 MB of disk per system), and they used huge amounts of power and floor space.

For the three years I worked there and in the three years since, no-one has been able to deliver an editorial system that just works. When vendors rolled their rigged demos in, they crashed. The major vendors like CyberGraphics and ATEX couldn't point to successful implementations of their new systems producing a decent number of newspapers on the basis of more than one edition per day.

Would it have been nice to have a Unix or Windows based system? Sure. Reduced overheads and training burdens, able to buy the latest and greatest hardware, and so on. But no-one could actually deliver something that worked better than the crufty old J11 systems.

NASA are probably in a similar bind; it's a very familiar problem: old systems developed by tight, focused, skilled teams and developed over the years are very, very hard to replace.
Re:It has 64k of memory by henley · 2002-09-08 20:17 · Score: 5, Interesting

You don't mean the kind that looks like jillions of tiny tires (or black donuts) intersecting with the wires of a chain-link fence, are you?

Yes, he does mean Core Memory, and yes, the AP-101 as flown in the Shuttle from mid-70s through to mid-90s did indeed use Core memory.
Indeed, the upgrade to the AP-101s with (I think) static-column RAM took so long because Core memory has the lovely property of retaining information even when the power dies - a key factor, sadly, in the ability to retrieve information from Challenger's onboard computers after the 1986 crash. Another key factor is that Core memory is remarkably resilient to bit-flipping caused by cosmic rays and other radiation (events known as "SEUs" or "Single Event Upsets").
All of which meant that it was a major project just to replace that memory with more modern RAM. And it's not just a couple of sticks of SDRAM either - most of the space savings you'd expect from replacing bulky core with nice compact RAM chips are taken up with additional hardware to a) provide sufficient power support to retain memory in the event of main power failure b) continually scan through memory doing parity checks to detect and correct for SEUs...
Don't diss Core, man...
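A note on the "scan through memory" part: plain parity can only detect a flipped bit, so correcting one takes an error-correcting code. Below is a toy sketch of a memory scrubber using Hamming(7,4), chosen purely because it is the smallest code that corrects a single-bit error. It is an assumption for illustration and says nothing about what the AP-101 upgrade actually used; all function names are made up.

```python
DATA_POSITIONS = (3, 5, 6, 7)   # codeword positions (1-based) that carry data bits
PARITY_POSITIONS = (1, 2, 4)    # positions that carry parity bits


def syndrome(word):
    """XOR of the positions of all set bits; 0 for a valid codeword."""
    s = 0
    for pos in range(1, 8):
        if (word >> (pos - 1)) & 1:
            s ^= pos
    return s


def encode(nibble):
    """Encode a 4-bit value as a 7-bit Hamming codeword (bit pos-1 holds position pos)."""
    word = 0
    for i, pos in enumerate(DATA_POSITIONS):
        if (nibble >> i) & 1:
            word |= 1 << (pos - 1)
    syn = syndrome(word)
    for p in PARITY_POSITIONS:
        if syn & p:
            word |= 1 << (p - 1)  # set the parity bit that cancels this syndrome bit
    return word


def decode(word):
    """Extract the 4 data bits from a (corrected) codeword."""
    return sum(((word >> (pos - 1)) & 1) << i for i, pos in enumerate(DATA_POSITIONS))


def scrub(memory):
    """Walk every word, fix single-bit flips in place, and return how many were repaired."""
    repaired = 0
    for addr, word in enumerate(memory):
        s = syndrome(word)
        if s:  # a non-zero syndrome is the position of the flipped bit
            memory[addr] = word ^ (1 << (s - 1))
            repaired += 1
    return repaired


if __name__ == "__main__":
    memory = [encode(n) for n in range(16)]
    memory[5] ^= 1 << 2  # simulate an SEU flipping one bit
    print(scrub(memory), [decode(w) for w in memory])
```

The scrubber has to run often enough that a second upset is unlikely to hit the same word before the first is repaired, since this code can only fix one flipped bit per word.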

--
I'd rather have a bottle in front of me than a frontal lobotomy