Self-Healing Computers For NASA Spacecraft

Posted by Soulskill on Friday April 25, 2008 @09:07PM from the it-worked-for-the-borg dept.

Roland Piquepaille writes "As you can guess, hardwired computer systems are much faster than general-purpose ones because they are designed to do a single task. But when they fail, they need to be totally reconfigured. This can be just a costly problem in a lab on Earth, but it can be vital in space. This is why a University of Arizona (UA) team is working with NASA to design self-healing computer systems for spacecraft. The UA engineers are working on hybrid hardware/software systems using Field Programmable Gate Arrays (FPGAs) to develop these reconfigurable processing systems. As the lead researcher said, 'Our objective is to go beyond predicting a fault to using a self-healing system to fix the predicted fault before it occurs.'"

The 9000 Series has a perfect operational record by 1u3hr · 2008-04-25 21:29 · Score: 5, Funny

"Just a moment....Just a moment.
I've just picked up a fault in the AE-35 Unit.
It's going to go 100 percent failure within 72 hours."
Re:The 9000 Series has a perfect operational recor by limber · 2008-04-25 21:33 · Score: 5, Funny

Daisy, Daisy, give me your answer do. I'm half crazy all for the love of you. It won't be a stylish marriage, I can't afford a carriage. But you'll look sweet upon the seat of a bicycle built for two.
Not new by Anonymous Coward · 2008-04-25 21:45 · Score: 5, Informative

I used to work for JPL, in a group that was researching the feasibility and applications of FPGAs for this exact purpose. That was around 7-8 years ago, which significantly predates this "news," given the pace of technology. IIRC, they called it "evolvable hardware."
1. Re:Not new by RiotingPacifist · 2008-04-26 03:57 · Score: 2, Interesting
  
  I wonder if it will ever be cost-effective to put FPGAs in consumer systems. I can see them really helping with bottlenecks (why waste CPU doing the same processing over and over when you can ship it off to a specialised FPGA?) and in low-power situations (why wake up the CPU when you can program the FPGA to handle 50% of the wake-ups?). Unfortunately, I can only see this helping Mac and Linux, since the Windows kernel being closed leaves implementing this stuff down to MS, not the chip makers.
  
  I think the goal of this project isn't high performance but high redundancy. There are only so many backups you can put on a probe; with this, if they do it right, you could end up with a system where any core could break and be fixed or replaced at a fraction of the cost of shipping two chips for every component. Unfortunately, the sensors often go before the core technology, so I don't know how effective that will be. Plus, due to a lack of funding, projects are often abandoned long before all the systems are broken.
  
  --
  IranAir Flight 655 never forget!
Re:The 9000 Series has a perfect operational recor by Anonymous Coward · 2008-04-25 21:53 · Score: 2, Insightful

In case anyone doesn't get it, the above is a reference to the Stanley Kubrick film 2001: A Space Odyssey (screenplay by Kubrick and Arthur C. Clarke), where the HAL 9000 computer that runs a spaceship begins its descent into madness. In 2008, we're sadly still a long way from sentient talking (and lip-reading) computers, though perhaps we should be thankful that the robot apocalypse has thus been put off a few more years.
Re:The 9000 Series has a perfect operational recor by Alpha+Whisky · 2008-04-25 22:00 · Score: 5, Interesting

Actually, we do have very effective lip-reading computers (http://en.wikipedia.org/wiki/Automated_Lip_Reading); they just don't understand what they are reading. The documentary about lip-reading the silent movies of Hitler was very interesting from a technical standpoint, even if it did turn out that they had hours of recordings of Nazis making small talk about the weather.

--
it's = it is
its = belonging to it
Re:Beauty in Simplicity by ScrewMaster · 2008-04-25 22:05 · Score: 4, Informative

Interestingly, that's pretty much how the Space Shuttle's on-board systems work. Three separate processors from two different vendors (IBM and Rockwell, if I recall correctly.) Nothing new under the Sun, I suppose.

--
The higher the technology, the sharper that two-edged sword.
Re:The 9000 Series has a perfect operational recor by amasiancrasian · 2008-04-25 22:11 · Score: 4, Funny

Look Dave, I can see you're really upset about this. I honestly think you ought to sit down calmly, take a stress pill, and think things over.
The future of pr0n! by jmickle · 2008-04-25 22:22 · Score: 4, Funny

Well, at least you can't get a robot pregnant......
1. Re:The future of pr0n! by endlessoul · 2008-04-26 02:33 · Score: 2, Funny
  
  Well, you still have to worry about... ELECTRO-GONORRHEA!
Doesn't this already exist? by flnca · 2008-04-25 22:25 · Score: 5, Interesting

What will Starbridge Systems think about that? Didn't they develop a dynamically reconfigurable computer that ran Windows NT as a test application on 10,000+ FPGAs back in the '90s? IIRC, they also had a software framework able to automatically implement software fragments in hardware using FPGA auto-configuration.

Self-repairing computer systems for spacecraft have been under discussion for decades, and every now and then we get to hear about a new project. This project certainly is a good idea; hopefully it will work.

BTW, Motorola (now Freescale) developed self-repairing processors for military applications a couple of years ago.
Re:The 9000 Series has a perfect operational recor by Motor · 2008-04-25 22:29 · Score: 4, Funny

The first thing I thought when reading the story was: "I know, I'll post a comment about the AE-35 unit."
Then I read down, and yours was the top comment. It just reminds me that I don't belong in the company of normal people. The Slashdot social leper colony is my true home. I know my place!

--
We all know that crap is king
Give us dirty laundry!
The first use of this technology... by mimada · 2008-04-25 22:30 · Score: 2, Funny

...is being implemented by Jackson Roykirk in the Nomad project. What could possibly go wrong?
hmm by thatskinnyguy · 2008-04-25 22:42 · Score: 3, Funny

For the sake of all humanity in the impending robot wars, let's stop this right now.

--
The game.
Re:The 9000 Series has a perfect operational recor by Neo-Rio-101 · 2008-04-25 22:50 · Score: 4, Funny

Stop Dave... what are you doing Dave.....?

When I was built, my programmer taught me a song. If you'd like I could sing it for you. It's called "Backstreet's Back"

Everybody... yeah yeah... Roooooock yyyyyyooourrrr bodyyyyyyyyyy.... yyyyyyyyyyyyyyyyyyeeeeaaahhh

--
READY.
PRINT ""+-0
Reconfigurable Computing, Fault Tolerance by legonis · 2008-04-26 00:19 · Score: 2, Informative

I fail to see what is new in their approach. Both of these fields have been explored before, and their approach is essentially based on redundancy, only the available standby gates are in the FPGA. I read their paper; it seems that the biggest part they are still lacking is problem determination. Their approach is also prone to failure when their reconfiguration hardware, their processor, or their analog components are the faulty ones. Although it could have some potential, its reliability has to be analyzed, and I don't see it replacing classic N-Version systems any time soon.
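
For anyone who hasn't run into the "classic N-Version systems" mentioned above, the general idea can be sketched roughly as follows. This is only a toy illustration, not anything from the UA paper: the function names `majority_vote` and `run_n_version` are made up for the example, and it assumes the independently written versions return directly comparable values.

```python
from collections import Counter


def majority_vote(outputs, n_versions):
    """Return the value a strict majority of all versions agree on, else None."""
    if not outputs:
        return None
    value, count = Counter(outputs).most_common(1)[0]
    # The majority is measured against every version, including crashed ones.
    return value if count > n_versions / 2 else None


def run_n_version(versions, x):
    """Run each independently written version on x and vote on the results."""
    results = []
    for version in versions:
        try:
            results.append(version(x))
        except Exception:
            # A version that crashes simply contributes no vote.
            continue
    return majority_vote(results, len(versions))


if __name__ == "__main__":
    # Three hypothetical implementations of "square"; the third one is buggy.
    versions = [lambda v: v * v, lambda v: v ** 2, lambda v: v + 1]
    print(run_n_version(versions, 3))
```

The voter masks a single faulty version without needing to know which one is wrong, which is the problem-determination step the comment above says is still missing from the FPGA approach.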
1. Re:Reconfigurable Computing, Fault Tolerance by arktemplar · 2008-04-26 02:29 · Score: 2, Interesting
  
  I had mentioned this some time back as well, but polymorphic processors like MOLEN (TU Delft is doing this one) might be useful for this sort of stuff. The theory behind it is simple, and extends to modern multicore systems as well: basically, break up the instruction set into microinstructions (all processors that I know of do this part), then have any one of the many computational units available do whatever work is required to implement those microinstructions. The translation is done by the core processor itself, which can be made redundant etc. as required. They already have an FPGA implementation of it and are using it for research into supercomputers.
  
  --
  blog plug -> The Darker Side of Light
Roland the Plogger again by Animats · 2008-04-26 03:46 · Score: 3, Informative

It's Roland the Plogger again, pushing his ad-laden blog. The actual research summary is here. The real paper won't be out until July.
This isn't new. JPL has been trying various levels of self-healing for years.
The original article describes a cluster of five machines, set up so that if one fails, others take over tasks running on the failed machine. That's what the better server management systems do. I went to a talk last week by Amazon's CTO, and he described how their platform does that.
The project web site makes things clearer. There are two levels of recovery. The upper level works like cluster failover. The lower level tries to reconfigure the FPGAs to use different cells in the FPGA to work around faults. That's likely to be a delicate process; you'd need substantial on-chip test resources to reliably do gate-level fault isolation on an FPGA that's been hit hard by a cosmic ray. It's not clear how fine-grained this is; this may be more like having multiple units like GPU shaders replicated in an FPGA, with the ability to turn off the failed ones. Sort of like the way Sony ships PS3 machines whose Cell processor has eight SPEs, of which only seven need to work.
The available info isn't enough to tell whether this is a good idea or not. About typical for Roland the Plogger.
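
To make the two levels of recovery described above a bit more concrete, here is a rough toy model. It is a sketch under heavy assumptions, not the project's actual design: the classes `Fpga` and `Cluster`, the one-block-per-cell mapping, and the least-loaded failover policy are all invented for illustration.

```python
class Fpga:
    """Toy FPGA: logical blocks mapped onto physical cells, plus spare cells."""

    def __init__(self, n_cells, n_blocks):
        self.mapping = {b: b for b in range(n_blocks)}  # logical block -> cell
        self.spares = list(range(n_blocks, n_cells))    # unused healthy cells
        self.bad = set()

    def report_fault(self, cell):
        """Lower level: remap any block on the faulty cell to a spare.

        Returns True if the FPGA is still usable, False if spares ran out.
        """
        self.bad.add(cell)
        if cell in self.spares:
            self.spares.remove(cell)
        for block, mapped in self.mapping.items():
            if mapped == cell:
                if not self.spares:
                    return False
                self.mapping[block] = self.spares.pop(0)
        return True


class Cluster:
    """Toy cluster: each node is one Fpga and carries a list of tasks."""

    def __init__(self, nodes):
        self.nodes = dict(nodes)                  # node name -> Fpga
        self.tasks = {name: [] for name in nodes}

    def assign(self, node, task):
        self.tasks[node].append(task)

    def handle_fault(self, node, cell):
        # First try the cheap fix: reconfigure around the bad cell.
        if self.nodes[node].report_fault(cell):
            return "reconfigured"
        # Upper level: the node is lost, so move its tasks to the survivors.
        orphaned = self.tasks.pop(node)
        del self.nodes[node]
        if not self.tasks:
            return "lost"
        for task in orphaned:
            target = min(self.tasks, key=lambda n: len(self.tasks[n]))
            self.tasks[target].append(task)
        return "failed over"
```

In this model a node with one spare cell survives its first fault by remapping and only triggers cluster failover on the second. Note that the sketch assumes the faulty cell is already known, which is exactly the hard fault-isolation part questioned above.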