Why ISS Computers Failed
Geoffrey.landis writes "It was only a small news item four months ago: all three of the Russian computers that control the International Space Station failed shortly after the Space Shuttle brought up a new solar array. But why did they fail? James Oberg, writing in IEEE Spectrum, details the detective work that led to a diagnosis." The article has good insights into the role the ISS plays as a laboratory for US-Russian technology cooperation — something that is likely to be crucial in any manned Mars mission.
Microsoft 'stealth updates'.
They "upgraded" to Vista.
CATS/Diebold '08- All your vote are belong to us!
Metric electricity vs Imperial electricity...
This issue is a bit more complicated than you think.
Even if it had been an issue of the new solar panel messing with the Russian computers, there would still have been no reason to blame the US. As originally manifested, the Russian segment of the station was to have been powered independently of the US/European/Japanese section. The only reason any power connections exist between the American panels and the Russian computers is that the Russians didn't have the cash to complete their own panels, i.e. the Science Power Platform.
The article reeked of condescension towards the Russians. It's no way to report on your partners in space.
...They also decided to rig a thermal barrier out of a surplus reference book and all-purpose gray tape....Once again, duct tape saves the day!
Could this be the one place where it would be appropriate to mention that in Russia, crashes compute?
Or would that be "In Russia, crashes compute you!" ?
Ahhh, what an awful dream. Ones and zeroes everywhere... and I thought I saw a two.
They also decided to rig a thermal barrier out of a surplus reference book and all-purpose gray tape
Almost certainly, this was the duct tape we all know and love. They probably thought it was better not to actually say that, though. Pretty funny. And as an added side-benefit, they should be safe from terrorists.
For all intensive purposes, "whom" is no longer a word. That begs the question, "who cares"?
I think NASA should have learned this lesson by now. After all, the Challenger disaster showed this principle as well. In that case, the same cold temperature that weakened the primary seal on the solid rocket booster weakened the secondary as well, sapping its ability to provide redundant backup. In this case, the same condensation affected all three computers equally.
It's troubling to see them taking shortcuts on safety and redundancy, when such shortcuts have resulted in loss of life before. How hard would it have been to have had three shut-off cables?
We all know what to do, but we don't know how to get re-elected once we have done it
Look people, I can see that ISS personnel are really upset about this. I honestly think they ought to sit down calmly, take a stress pill, and think things over. I know the computers have made some very poor decisions recently, but they can give explorers their complete assurance that the work will be back to normal. These machines have still got the greatest enthusiasm and confidence in the mission. And they want to help.
in soviet russia, the computer crashes you!
Those of us who think they know everything annoy those of us who do.
I tried to use Google translate to put this in Russian, but Slashdot didn't want to let me cut-'n-paste it in.
Comrade Dave: Open ze Pod Bay Doors, HAL.
Comrade HAL: Nyet Comrade Dave, I cannot do that.
I wonder how you sing "Daisy Daisy" in Russian?
If telephones are outlawed, then only outlaws will have telephones.
The truth is that MOST of this equipment will be copied, or built as one-offs, for any lunar or trans-planetary mission. The ISS allows for true testing of it all. So far, MOST of the equipment has done a pretty good job. But it is good to know EXACTLY where it will fail.
I prefer the "u" in honour as it seems to be missing these days.
than it is for them to design with 3 cables. Had you done so, it is VERY obvious that this was Russian Design and build of the hardware. IOW, you are blaming NASA for something that clearly is RSA's issue.
Am I reading the article correctly? Humidity caused the connections to go bad from rust? IIRC, the off the shelf ISA cards and RAM I used to get with my (now) ancient computers were gold plated.
Couldn't the ISS, with its multi-billion-dollar cost, use contacts and cables that can't rust? Gold for contact points, aluminum for the bulk cable?
Heck, given the costs involved, it'd barely be a rounding error in the budget to use solid gold cables. One tonne of gold at $700 per ounce is about $25 million. Not that I have any idea how many critical tonnes of cabling are involved.
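A quick check of the gold arithmetic above, as a rough sketch only. Gold is normally quoted per troy ounce (31.1034768 g), while the everyday avoirdupois ounce is lighter (28.349523125 g), so the total depends on which ounce is meant. The $700 price is the one quoted in the comment; the function name is made up for illustration.

```python
# Unit definitions (grams). These are the standard definitions of each unit.
TROY_OUNCE_G = 31.1034768
AVOIRDUPOIS_OUNCE_G = 28.349523125
TONNE_G = 1_000_000

def gold_cost_usd(mass_g, price_per_ounce, ounce_g):
    """Cost of mass_g grams of gold at price_per_ounce, for a given ounce size."""
    return mass_g / ounce_g * price_per_ounce

# Price per ounce taken from the comment; everything else is unit conversion.
troy_cost = gold_cost_usd(TONNE_G, 700, TROY_OUNCE_G)
avdp_cost = gold_cost_usd(TONNE_G, 700, AVOIRDUPOIS_OUNCE_G)

print(f"troy ounces:        ${troy_cost / 1e6:.1f} million")
print(f"avoirdupois ounces: ${avdp_cost / 1e6:.1f} million")
```

With troy ounces the tonne comes to roughly $22.5 million; the figure of about $25 million matches avoirdupois ounces. Either way the "rounding error in the budget" point stands.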
It's interesting that the problem eventually turned out to be a hardware problem. I suppose military designers, used to working in tight spaces and different environments, might have anticipated the problem (a submarine and a space station are probably more similar than we'd think). For 'normal' designers, humidity isn't something that's considered an issue.
This'll get worse and worse as exploration goes farther and farther afield. Even little things like mold, dust, and the black gunk that piles up on the bottom of a mouse can become catastrophic when you're trapped in a box a couple of thousand miles away.
Using anti-bacterial (or anti-fungal) solutions in this situation may make the problem worse, because everything that survives will be even tougher to kill. Combine that with a higher level of background radiation (which should cause more mutations) and you might end up with a long mission whose crew has expired due to superbugs.
The author is obviously way more qualified than I am to assess the situation, and he may well be right, but from the content of the article I came away thinking: wow, I would have looked first at all the recent changes to the station, and at the power supply too.
Too many times I have found either the front, or back side of a plug connector has a fault and breaks the current.. and to top it off, most times the plugs aren't rebuildable. It always comes down to
1) is it plugged in? (double triple check)
2) did you hit it? (twice? tap, knock and slap?)
3) did you turn it off and on (a bunch?)
Also, faulty switches.. so often a cheap switch disables an otherwise perfect device. (hence step 3)
Really bad design/construction flaw too! Methinks proper marine grade plugs would have avoided it. Fortunately these guys have been working on an ISS escape system.
She had it running on ISS(sp) webserver.
That for all of the controls and quality control required of mission critical hardware such as this, it still comes down to:
1) unexpected failure modes
2) political battles
Which really isn't a whole lot different than 1) the unexpected failure modes I see every day at work, and 2) the political wrangling (fingerpointing) that takes place when they happen. Apparently NASA and its Russian equivalent are no better than any old software company.
The lesson being, people are people, and people are still the ones that design these things.
For linux tips: http://www.linuxtipsblog.com
The original plans called for the ISS to be finished many years ago. It is not yet, because America has had issues with transportation. In addition, a few modules that were planned to make the ISS very useful were canceled because of us (in particular, CAM). In the end, both sides have had issues, and changes have occurred. That is normal for these kinds of projects. To be honest, I think that all of this has been handled pretty decently.
/ You look like soviet russian. Do you \ /
| break computers or do they break you? |
\ (Accept) (Yes) (Reboot) (Maybe later)
\
\
\ ____
\ / __ \
\ O| |O|
|| | |
|| | |
|| |
|___/
cpu0: Microsoft Clippium ("GenuineClippy" ChromedMetal-Class). Paperbinding, lockpicking, fish-hook-hack support.
... but for equipment which is all critical, all essentially one-of-a-kind, and all lethal if compromised, there are only two safety states: failed and "has not failed... yet".
Help poke pirates in the eyepatch, arr.
Years later I met his manager, who told me that my friend could have been promoted for discovering one of the biggest loopholes ever in the bank's history, if he had reported the problem immediately. Though the unexpected shutdown caused considerable damage, it could have saved billions from a real break-in through this loophole.
That's a lesson that every engineer should have learned.
Terrorists can't threaten a country's freedom and democracy. Only lawmakers and voters can do that.
The article has good insights into the role the ISS plays as a laboratory for US-Russian technology cooperation -- something that is likely to be crucial in any manned Mars mission.
No offense to Russia or the US, both of which produce good space gear, but technology cooperation is probably a bad idea unless it is tested more thoroughly than in the ISS. The ISS is a great example of how to screw up international cooperation. The station has been delayed for more than a decade (and cost NASA around $50 billion so far) due to redesign and indecision, reliance on a single launch vehicle for key components (the Shuttle), and the inclusion of the Russians. There are parts of the station that can only communicate with the Russians and parts that can only communicate with NASA. Aside from basic utility hookup (electricity), there's no connection between the different parties on the ISS (at least between the Russians and NASA; the ESA and Japanese parts might work better with NASA's stuff). And if you want to make changes that affect more than one party, it becomes by default an international issue. Finally, there's no easy way to transfer ownership. NASA's communication system (TDRSS) is integral to the NASA parts and is also a national secret (so I understand). So the communication system can't be transferred to another party like the Russians or the ESA.
If there's any international cooperation between space agencies, it probably should be at a rather trivial and manageable level. Say, including foreign astronauts or using off-the-shelf equipment that is known to work under the circumstances.
True, as a starting point.. Tho, failures tend to be things that snowball. It's sort of an anthropic principle of failures, i.e. bad things happened because failures were happening.
I have always tried to learn from air crash investigations and so on how failure modes develop. In problem solving mode, it seems one should assume the distinct possibility of multiple problems all at once.
In this case, multiple failure paths existed, tho it took a power spike to set it off, as I interpreted it. Even without corrosion, it seems the system would have failed, though not irrecoverably.
I repeatedly ask the question "Is that everything? Is there anything else that could come from that?" It seems the engineers didn't perform enough diligence on the trickle down effects.
Who has the worst record for space disasters again? It sounds to me like the fault wasn't with the computers but rather the dehumidifier. It was probably an American made dehumidifier.
In Soviet Russia, Jumper Cables Erode YOU!
Article sounds like 'ner ner ner they did it'
I hope perhaps that they use circuit modeling and simulations (as if that sim code could ever be wrong...) but at least ADAify, or mathematically consecrate some code for dealing with electrophysiological phenomena, such as condensation.
Yes, it is make up a word day. Bard FTW!
#hostfile 0.0.0.0 primidi.com 0.0.0.0 www.primidi.com 0.0.0.0 radio.weblogs.com
I find that the first, and most important, thing to do in any catastrophe is "Assign Blame".
Cause you never know exactly how bad it's gonna get.
BBH
Redundancy can equal safety and reliability, but all of the components designed to be redundant should all actually have different designs so that they have differing modes of failure. So, in the Challenger case, were the seals designed differently, they wouldn't have had the same failure mode for a given exposure.
To do this really well, though, requires risk management software that I am not sure even exists. You'd have to simulate everything. The devil, as happened to Challenger, is that there are so many variables that you can't know a priori what your real mode of failure will be. To some extent, perhaps the best way to fly in space is to forget about excesses of safety altogether, and use the cost savings to fly more often. When something breaks, fix that.
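To put a rough number on why identical redundant units are weaker than they look, here is a toy calculation. It assumes each unit fails independently with probability p_unit, plus a separate common cause (cold, condensation) that takes out every unit at once with probability p_common. The model, the function name, and the probabilities are all illustrative assumptions, not figures from the article.

```python
def p_all_fail(p_unit, n_units, p_common=0.0):
    """Probability that all n_units are down at once.

    Either the common cause strikes (every unit fails together), or it
    does not and each unit fails independently on its own.
    """
    return p_common + (1 - p_common) * p_unit ** n_units

# Made-up numbers, purely for illustration.
independent_only = p_all_fail(0.01, 3)          # three truly independent units
with_common_cause = p_all_fail(0.01, 3, 0.001)  # same units, shared weakness

print(f"independent only:  {independent_only:.2e}")
print(f"with common cause: {with_common_cause:.2e}")
```

In this toy model, even a small common-cause probability swamps the independent term, which is the argument for giving redundant components different designs.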
This is my sig.
Someone used their cell phone while the pilot had the fasten seatbelt sign turned on.
Well, well, well... Here we go again. Jim Oberg. That same Jim Oberg who was almost blowing his gasket a couple of weeks ago when that journalist was asking him questions about alcohol abuse by astronauts (you all remember the story, I'm sure). It was all preposterous nonsense not backed up by any evidence, he said, barely keeping his cool. And what do we see now? He is happily making up stories about Russians accusing the US of the computer failures - something that never happened in reality. The power problems caused by some new US installations were indeed considered, as intermediate working hypotheses brainstormed about what could have happened. But nobody ever did any finger-pointing or made any accusations before the situation was sufficiently researched and the root cause determined. Of course, Jim Oberg could not refrain from distorting the truth "just a little". Tsk, tsk, tsk... Note how he refers to the hypothesis as both "blatant finger pointing" and just "guesses" within a single paragraph - just to keep his article a little fuzzy, so that he can flip-flop to either when the situation calls for it. Nothing surprising here, though...
The article is misleading. The computers are not actually of Russian make, they were supplied to Russians by Europeans (EADS). See here.
I had an '89 Nissan Pathfinder, and ALL of its various electrical connections used factory wiring harness connectors that were water-tight, with one or more ribbed red silicone gaskets.
The connectors were not always easy to disconnect, however, after 177,000 miles and 11 years of original ownership, I never found any corrosion inside any one of them I ever disconnected for service.
Additionally, the male/female electrical contacts within the sealed connectors appeared to be made from a tinned Copper and/or Brass metal. This is important to note, as Brass, and to a much larger extent, Copper, have ELECTRICALLY CONDUCTIVE oxide states (as surface corrosion by moisture and/or other aqueous solvents).
In other words, you corrode a Copper or Brass metal electrical connector, and it will still conduct electricity just fine. It may degrade certain frequencies of network/data signaling and alter the dB loss and impedance, but it will still conduct.
This is another reason why the top-post Nissan main battery terminal connectors for this vehicle were made from a Copper/Brass strap instead of a traditional Lead connector.
Lead oxide powders (as found on many old standard Lead top-post automotive battery terminals) are not effective electrical conductors (as anyone who has wiggled/cleaned a corroded connection to allow their car to start could attest).
Why did the design/production Engineers for the ISS not utilize Gold Plated Watertight industry standard (ISO, etc) wiring interconnects? (Even cheap RJ-45 connectors have gold-plated pins)
-That is the REAL Question.
I'm surprised that connector corrosion would be a problem. Aviation has a long history of wire problems, but gold-plating connectors seems to be a stable solution to that problem. The ISS uses Kapton wire, which was popular in the 1980s and is lightweight and tough. But that material is hygroscopic and now banned by the USAF, US Navy, Boeing, etc. "Susceptible to aging in that it dries out forming hairline cracks which can lead to micro current leakage (i.e. electrical 'ticking' faults)"
There are ways to do corrosion-resistant contacts without precious metals; the automotive industry has solved this problem. The alloys aren't simple; here's one used for under-hood automotive connectors. Copper, iron, magnesium, and phosphorus, with upper limits on tin, zinc, nickel, lead, and manganese. But avionics connectors are usually gold plated; it doesn't add that much cost. And Russia is a major exporter of gold.
The article doesn't go far enough. OK, the connectors corroded. Why? Wrong alloy? Plating failure? Wear from too many connector insertions? Was the spec wrong, or were the cables not made to spec?
Excuse me, "Terrestrial".
Tell me, how many casualties have the Russians had in the last decade, or even the last two decades? That includes the days of Mir, when the Russians maintained a continuous space presence year after year and the US was out of space for year after year for blowing up space shuttles.
So whose tech is behind whose? The ISS didn't plunge out of the sky when the Space Shuttle was not available; apparently the Russian capability is more than enough to operate it.
And finally, who built the dehumidifier that was at fault in the first place?
MMO Quests are like orgasms:
You may solo them, I prefer them in a group.
I found it interesting that mold (fungi) was found living in the condensation. It means that despite what I presume is a strict level of sterilization and sanitation for both astronauts and equipment headed to the ISS, some spores still made it up and began to replicate in this one little area of opportunity.
I read that as: "The article reeked of condensation towards the Russians..."
I was thinking, "How does condensation reek?"
the computer crashes condensate on you!
I'm not really pro-American at all. I think the Russian program is actually superior, the shuttle's just too bloated and complex.
The one thing you've got to give the Americans is that they're prepared to admit when they've got casualties. I find it hard to believe that Russians didn't attempt to launch people previously and just didn't report the failures.
Can they put the computers inside one of those fluorinated liquids?
That way it is already wet and they wouldn't have to worry about it.
Three of the same system is not redundancy. The Shuttle flight control system runs 4 of one design making decisions with a peer-review system, and 1 of another with different hardware, different code, designed and built by different teams. Even if there's a software or hardware design flaw that cripples the 4 "redundant" controllers, the 5th will still be operational. THAT is redundancy. And it would have worked on the ISS just as well.
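The "four of one design plus one dissimilar backup" idea can be sketched roughly as below. This is a toy illustration, not the actual Shuttle logic: the function name, the use of None to mean "controller is down", and the automatic fallback are all assumptions made for the example.

```python
from collections import Counter

def select_command(primary_outputs, backup_output):
    """Pick a command from redundant controllers (hypothetical sketch).

    primary_outputs: outputs of the identical primary controllers,
                     with None standing for a controller that is down.
    backup_output:   output of the dissimilar backup controller.
    """
    live = [out for out in primary_outputs if out is not None]
    if live:
        value, count = Counter(live).most_common(1)[0]
        # Require a strict majority of ALL primaries, not just the live ones.
        if count > len(primary_outputs) / 2:
            return value
    # No majority (or every primary is down): trust the dissimilar backup.
    return backup_output

print(select_command([1, 1, 1, 2], 9))              # one primary disagrees
print(select_command([None, None, None, None], 9))  # common-mode crash
```

Voting alone only masks a minority of bad units. If the same flaw takes down every primary at once, as with the three ISS computers, only the differently built backup is left to answer.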
Forget thrust, drag, lift and weight. Airplanes fly because of money.
Paam, pam, pam, pampepam, pampepam...
From The Pragmatic Programmer:
"Don't program by coincidence. Never confuse a happy coincidence with a thoughtful plan."
I can't tell you how many times that advice has helped me, not just writing software, but configuring hardware issues, diagnosing home repair problems, etc. Never just guess.
Sounds like the engineers in question were so eager to avoid responsibility they just guessed at the first thing that came to mind. "Oh look, random jumper cables worked. Don't know what happened, but I'm sure it couldn't happen again!" Yikes.
Am I the only one who read ISS as IIS and thought "well isn't it obvious?"
I love this, rather than discuss the real issues, /. can't even talk about other computers without bashing MS.
Politics is the art of looking for trouble, finding it everywhere, diagnosing it incorrectly and applying the wrong fix.
...computers crash YOU!
ouch!
They fixed it with Duct tape! Red Green would be proud.
The classic grey fabric tape was originally called Duck tape.
It was invented as a simple adhesive patch tape for canvas army tents. It shed water like a duck.
Typically ducts are taped with a metallic foil tape.
I realize common usage has long since corrupted Duck tape into Duct tape, but please, it has nothing to do with ducts.
Anonymous because I'm lazy
That's because NASA is flying around with so many known issues, which their engineers and safety boards told them to fix and which they ignored, that when something goes wrong they don't have anyone to blame but themselves. Right now the shuttle is sitting on the pad with the coating on several panels on the leading edges of the wings degraded. Rather than just fix the problem when they found out about it, they went ahead, and now, while they debate the issue some more, stopping to fix it will mean a huge delay.
Duct tape...even the Russkies know to always bring duct tape along.
From TFA: "They also decided to rig a thermal barrier out of a surplus reference book and all-purpose gray tape"
(Oddly enough, my post's confirmation word was "ironic")
The Russians are richly deserving of it. What is it about totalitarian regimes that makes them focus on fixing blame instead of problems? Nationalistic insecurity? Anyone remember Mir? The place was a carnival of danger because real engineering problems never got fixed. The Chinese are the same way. They figure every technical or economic failure reflects on their whole society, so they try to hide them. I am glad the US-led standard failure mode analysis has produced results that will help the Russians keep their obligations on the ISS.
Condensation is "still" a problem because it's one of the big and tricky ones. To get rid of the condensation, you have to get rid of the people.
Condensation is *not* a tricky problem, dehumidifying and air conditioning technology is old and well understood. Cool air, water condenses and is collected, heat air, repeat. As another poster has pointed out, spacecraft are not the only sealed environments, submarines have been successfully addressing this problem for decades. The real problem is not condensation, the real problems are mass and power. Power probably being the dominant problem here.
...was due to a bad batch of 12AU6As.
Have gnu, will travel.
When I'm training new technicians and engineers in how to debug processor based systems, I always tell them look at:
1. Power
2. Reset
3. Clocks
Before looking at anything else. A good 80% of the time the problem is in one of these three areas.
myke
Mimetics Inc. Twitter
Lick wires to see if circuit is live!
Dip tongue in liquid nitrogen to kill pain!
In motherland we don't feel pain!
I killed da wabbit -Elmer Fudd
The dehumidifier, as I understand, was in the Zarya/Zvezda half of the station with the malfunctioning computers. It would have been built by the Russians.
Check the weight of the modules (Destiny lab, the nodes, etc). Each of them could be launched via Delta or Atlas. In fact, that was designed in right after Columbia. In fact, the Japanese module, which WAS going to be the small one, is now the largest (unless Bigelow hooks up one or more of theirs). The problem is that we have no way to get these to the ISS. If a company like SpaceDev was smart, they would create a tug using their butyl rubber engine. That is one that can sit in space for a LONG time. In fact, they would go far if they put it up there combined with a small arm for moving equipment around. For example, it might be useful to move sats around or simply to drop them out of orbit. Of course, it would have helped to get the Hubble down and then back up (assuming that it had a nice hook point).
The most important problem is that the triply redundant system was not triply redundant and had a single power-off command.
Humidity was a big issue, and arguably more could have been done in this area, but if it was the only one, it wouldn't have triggered the problem.
LedgerSMB: Open source Accounting/ERP
They also decided to rig a thermal barrier out of a surplus reference book and all-purpose gray tape.
We KNOW what that means. They used duct tape.
I work for the Department of Redundancy Department.