Slashdot Mirror


Space Station BSOD

Lostman writes: "CNN has an article that details a computer glitch that has occurred at the international space station. The problem disrupted all communication from the command computers on the station. Although NASA knows that this was because an onboard server had crashed, the cause of this was not immediately known." See also space.com, the BBC, or NASA's status update. NASA is using Windows for most of their computing functions, as mentioned here.

40 of 254 comments

  1. Don't worry! by Eg0r · · Score: 3
    They probably use one of them $10 million, computer-controlled robot arms to press the ISS mainframe's reset button from Earth.

    Oh... wait a sec! :-)

    ---

    --
    "Hasta la victoria siempre!" El Comandante
  2. Here's the link you want. by leoc · · Score: 4
    The Canadian company who built this new robotic arm is using a space-hardened 386/387 system with all custom software, including the operating system.

    There is no mention of what OS the thinkpad in the picture is running. For all we know that might be the "server" they are talking about... http://www.mdrobotics.ca/rws.htm

    The web site runs linux, though... :)

    --
    STFU about slashdot bias.
  3. Windows, it's worse than that! by Dino · · Score: 3

    I interviewed at Boeing for doing Space Station networking work.....here's the surprising part, the Space Station is all run off of 386s!!! They do most of the low level programming in assembly to squeeze out as much performance as possible.

    It totally blew my mind. This was about 14 months ago.
    ---------------------------

    --
    That's not what I meant.
  4. Re:What really happened by Dr.Dubious+DDQ · · Score: 5
    [...]shake the ISS around until the US system thought it was out of control and went into what is called Free Drift Mode.

    Great...so the ISS is really a giant pinball machine with one of the flippers locked up, so we need to get it to go "TILT" and shut down so we can reset it? :-)


    ---
  5. Re:Let's play "Bet Your Life" by IntlHarvester · · Score: 3

    Netware 3.12

    Yeah, memory protection is for wusses.

    Seriously, tho, in a former life as a network guy in the early 90s, I saw far more NetWare ABENDs than I've seen NT Bluescreens. It was generally OK for file+print, but if you tried to run any slightly non-standard NLM (AppleShare, OS/2 namespace, backup software, Btrieve, CD-ROM drivers, etc.) you had to keep your fingers crossed. I guess that goes to show if you keep a product in maintenance for 10 years or more, anything can become rock stable.
    --

    --
    Business. Numbers. Money. People. Computer World.
  6. Re:Windows bashing by IntlHarvester · · Score: 3

    XFree86 drivers run as root and have full access to your system's memory. Poorly coded user space X drivers could easily crash your system.

    NT servers don't use the Nvidia drivers and aren't expected to do things like optimize video playback. They generally run a rather generic unaccelerated SVGA driver. I've seen lots of bluescreens on servers, and none of them that I recall could be traced to the video drivers. There's the usual SCSI and NIC driver issues that could crash any OS, and for a long time in the NT 4.0 series, there was some issue in NTFS.SYS that caused systems to fall over.

    I'll accept that it's somewhat stupid to have a mandatory GUI on a server, but I don't think this is the stability issue that the NT-haters club makes it out to be. NT has/had plenty of larger reliability problems.
    --

    --
    Business. Numbers. Money. People. Computer World.
  7. Re:Deep link by DHartung · · Score: 3

    sllort asks:
    Now what do you guys make of this?

    ... This would have been much easier with some bootable media that could run Windows. (Or if Shep was not indoctrinated by that "other" operating system).


    According to this Expedition One crew debriefing, Shep answered a provocative question thus:

    Ops LAN
    ? Was the service pack distribution system easy to follow?
    Shep: Yes. No problems.
    Sergei: I'd like to have a little more explanation of what is in the service pack.
    Shep & Sergei: That way we would have known if it was really critical to load the new version or not.
    ? Was the desktop configuration (SSC Client, SSC File Server) easy to navigate? Any suggestions on how to improve the desktop layout?
    o Shep (joking): Go to a Mac OS.


    This fits with the wording: Shep is a Mac user. The log is tweaking him for being less technical because he uses a Mac. It's unclear if this section of the log was written by one of the cosmonauts, or possibly Shep tweaking himself. But he's known to have a real sense of humor.
    ----
    lake effect weblog

    --
    lake effect weblog
    {Network engineer in Chicago--looking for work!}
  8. Official reports of mundane activity by Webmonger · · Score: 4

    Man, it is really bizarre to see a press release about an organization cold booting into safe mode. The way they write it up, you'd think it was rocket science. . .

    1. Re:Official reports of mundane activity by Cranston+Snord · · Score: 3

      What really gets me is the following quote...

      The computers were running, but were unable to access data in their memory banks because of the downed server.

      Danger Will Robinson! Danger Will Robinson! Memory banks unreachable!

      --
      And now for something completely different...a man with three buttocks.
  9. In other news by overshoot · · Score: 4

    Of course, the fact that NASA had just installed a bunch of critical hotfixes from Microsoft's FunLove-infected update site is purely coincidental.

    --
    Lacking <sarcasm> tags, /. substitutes moderation as "Troll."
  10. Re:Odd by boarder · · Score: 3

    That is not what happened at all. The IBM thinkpads are just INTERFACES for the control system. They don't actually control things. They just allow the astronauts to see what is going on in the station and send commands. All of the actual control (autonomous and commanded) is done by other machines: three Command and Control Multiplexer/Demultiplexers (not running Windows).

    --
    IANAL, but I play one on /.
  11. hmm. by boarder · · Score: 3
    Yes, I saw the humor, but I didn't think it was relevant or correct.

    In this case, the problem was not with the interface software OR interface computer (thinkpad) but with the core system (they were still not sure whether it was software or hardware last I checked). Not only that, but the software of the Thinkpad was not provided by a "monolith^H^H^H^Hpoly" unless you consider Sun Solaris a monopoly.

    I guess I always did think of HAL as an OS and not an interface. That is an interesting revelation to me, but that still doesn't change the fact that the interface didn't cause the problem and the fact that the interface wasn't supplied by a monopoly.

    --
    IANAL, but I play one on /.
  12. What really happened, and FUD even by /.ers by boarder · · Score: 5
    First off, Windows almost definitely did not cause the crash; /. personnel are the only people saying that. It was a hardware failure in all likelihood, occurring in the US control module (probably in the Command and Control MDMs). I can't believe the kind of reporting going on here; it reads like a M$ FUD press release. Blue Screen of Death my ass!?!

    What really happened is the US control module computers stopped responding to any inputs from the ground. They weren't able to control the station or tell it to shutdown or anything. Their plan to fix it (last I heard) was to have the Russian control module move and shake the ISS around until the US system thought it was out of control and went into what is called Free Drift Mode. In this mode, it can be completely controlled by the Russian module and we can debug the system and bring it back online.

    --
    IANAL, but I play one on /.
  13. Send sysadmins into space by p3d0 · · Score: 3

    IIRC, the stated reason for using Windows is that astronauts (who are not necessarily computer experts) can manage it. Well, is it worth the risk?

    Wouldn't it be better to use whatever system is best for the job, and send a computer guy up there to maintain it?

    (Yes, I admit it, I'm only suggesting this because it increases my chances of getting into space from zero to negligible.)
    --

    --
    Patrick Doyle
    I mod down every jackass who puts his moderation policy in his sig. Oh, wait a sec....
  14. Re:Bad form, Slashdot... by anticypher · · Score: 5

    There's *nothing* in the CNN article ... implying that Windows is the reason for the server crash

    Micro~1.oft spent a lot of time, energy and money to ensure that their OSes were dominant on the ISS. They have spent millions of $$$ just to place a few hundred copies on the ISS, in the space flight centre, and in the russian control centres. The reason for this massive cost was to use the ISS as a giant marketing tool, and they even created a whole marketing campaign around it.

    Windoze is not the only OS on the ISS, but it is dominant. There are some *nixes running critical communication processes, such as the main link from the station to ground points, and these have not had many problems at all.

    When the M$ servers started crashing, the whole micr~1.oft in space campaign was put on hold. If you read the logs created by the station crew, they are pretty upset having to spend entire days trying to fix micr~1.oft problems. NASA has a direct line into the best and brightest engineers at M$, but even they are clueless as to why certain processes hang, why backups fail to happen, why entire directories are blown away with no trace, or why new patches cause driver conflicts.

    Since the Register article highlighting the ISS problems in the logs, micr~1.oft has been putting pressure on NASA to redact all mention of micr~1.oft. Certainly someone has been archiving copies of the logs since they appeared, so they can diff them later and see when NASA bows to micr~1.oft pressure.

    As you noticed, none of the mainstream reporting now mentions micr~1.oft by name; that is due to a pressure campaign by one of the largest advertising budgets in the US. But when the logs are posted for these events, you will notice a great many references to the machines running micr~1.oft, even if the name of the OS is redacted out. If you do a little research, you will see these machines are running either DOS or windoze.

    the AC

    --
    Hemos is like...sci-fi fans;he thinks technology is cool, but he hasn't bothered to understand the science it's based on
  15. Re:Two problems with your example. by sconeu · · Score: 4

    Dude, I was referring to the Yorktown discussion thread. I never said it BSOD'ed. I said crashed. There's a difference.

    Here's the article about the Yorktown.

    I used to work for a defense contractor, so I know how these things should be tested. You don't just test on good inputs, you test with bad ones. That's why I said that the app crashing was unacceptable. However, nothing should ever cause an OS to crash, especially in a military environment.

    It doesn't have to be a BSOD, it could be some other failure mode, which is what appeared to happen to the Yorktown.

    --
    General Relativity: Space-time tells matter where to go; Matter tells space-time what shape to be.
  16. Happy Funtime Conspiracy Corner by RollingThunder · · Score: 5

    Not that I believe this at all, but it occurred to me and I figure it's amusing enough to share.

    CNN:
    A delay in the departure of Endeavour could mean a delay in the launch of space tourist Dennis Tito aboard a Russian Soyuz craft. Tito was scheduled to lift off on Saturday, but that mission would have to be delayed if the computer problem is not corrected, NASA spokesman Doug Peterson told Reuters.

    "Sorry, Dennis. That darn computer system crashed again, we just can't let ya launch right now. We figure it'll be fixed by... oh... October." <sotto voce: Frank, have you finished the bluescreen plan for Friday yet?>

  17. Re:Deep link by jbridge21 · · Score: 5

    It is specifically Solaris x86 running on a laptop.
    -----

  18. Re:Bad form, Slashdot... by hardcode · · Score: 5

    Try http://www.theregister.co.uk/content/2/18540.html to find out NASA's rebuttal of that Register story. Seems it's not only /. that froths at the mouth at the thought of bashing IBM and Microsoft.

    hc

  19. They're probably using Coax (10Base-2) by criticalrealist · · Score: 3
    In addition to Microsoft Windows NT, they're probably using Coax, or 10Base-2, also known as thinnet. They probably have BNC connectors on the backs of the NICs. The logs say they fixed their network problem by jiggling the cables. That's an indication of 10Base-2 if I ever saw one. The logs said they had to cold boot. This is frequently the case after a coax network crash.

    Coax would have the advantage of plenty of shielding from electromagnetic interference. Otherwise, no advantage.

    If you're reading this NASA, here's some advice. Buy some little metal doohickeys for the back of each networked computer. These doohickeys fit around a coax cable, can be screwed into the back of a power supply, and cost about 5 cents. In my experience, using these helps stabilize the cables a lot, and you get more uptime that way.

    --
    I am not a lawyer.
  20. Odd by rjamestaylor · · Score: 4
    Let's get this straight: a space station built with international cooperation has a computer error threatening to cut off communication with earth-based command-control? The computer is an IBM Thinkpad? The year is 2001?

    That's a space odyssey, er, oddity.

    Open the pod-bay doors, HAL.

    I'm sorry, Dave, I'm afraid I can't do that

    And the software in question is provided by a huge monolith^H^H^H^Hpoly...
    --
    -- @rjamestaylor on Ello
  21. Re:Assembly language on ISS by revbob · · Score: 5
    You were either misinformed or you misunderstood what your interviewer said.

    Real time software for mission critical systems is written in Ada. That's a no-brainer. If there is any assembler, it's tiny, of severely limited scope, and meticulously tested. In fact, having worked with some very low level networking code for ISS (in Ada), I doubt there's any assembler in there at all.

    As to the 386's, they're rad hardened and known reliable. And, unlike the home computer I bought a couple of months ago that's state of the art, whether I need state of the art or not, the jobs these CPUs had to do simply didn't require anything faster than a 386, even given a hefty allowance of spare cycles and memory for future growth.

    We bought what we needed (in space, rad hardening is not optional) and we didn't buy what we didn't need. That's not $400 hammers, that's the definition of responsible stewardship of the public's money.

  22. Re:Scary stuff? by tibbettsatmit · · Score: 3

    It is not particularly scary. Software systems don't benefit from redundancy in the same way that hardware systems do. Most software bugs are systemic (i.e., an uncommon code path that just doesn't work). So redundant software systems (even ones that are multiple separate "clean room" implementations) frequently go down at the same time when in the same operating environment. For more information check out the work of Nancy Leveson and the other people in her group.
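    A minimal Python sketch of the point above, assuming a simple majority voter (triple modular redundancy, a technique the comment does not name): an independent, hardware-style fault in one replica gets outvoted, but a systemic bug shared by three identical software copies takes all of them down at once. The names `buggy_controller`, `hw_faulty_controller` and `vote` are hypothetical, and the sketch uses identical copies rather than separate clean-room implementations.

```python
from collections import Counter
from typing import Callable, Optional, Sequence


def buggy_controller(reading: int) -> int:
    """Hypothetical control law with a systemic bug on one rarely seen input."""
    if reading == 13:  # the uncommon code path that "just doesn't work"
        raise RuntimeError("unhandled case")
    return reading * 2


def hw_faulty_controller(reading: int) -> int:
    """Same software on a replica with a random hardware fault (off by one)."""
    return buggy_controller(reading) + 1


def vote(replicas: Sequence[Callable[[int], int]], reading: int) -> Optional[int]:
    """Majority vote over replicas; a replica that crashes casts no vote."""
    results = []
    for replica in replicas:
        try:
            results.append(replica(reading))
        except RuntimeError:
            pass  # a crashed replica simply drops out of the vote
    if not results:
        return None  # every replica failed together
    value, count = Counter(results).most_common(1)[0]
    return value if count > len(replicas) // 2 else None


identical = [buggy_controller] * 3
mixed = [buggy_controller, buggy_controller, hw_faulty_controller]

print(vote(mixed, 7))       # 14: the independent hardware fault is outvoted
print(vote(identical, 13))  # None: the shared software bug takes out all three
```

    The voter masks faults only when they are independent; three copies that share the same bug fail on exactly the same input, which is the correlated-failure problem the comment describes.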

  23. Re:Hard up for stories? by b0r1s · · Score: 3

    ONE server went down... the THREE you speak of were clients, which of course are useless because of it.

    --
    Mooniacs for iOS and Android
  24. Re:Assembly language on ISS by boing+boing · · Score: 4

    I just want to contradict one point you made: "in space, rad hardening is not optional".

    That is incorrect.

    Microprocessors (electronics in general also) have a wide variety of radiation response out of the box. For instance, the AMD K6 is known to be pretty bad for single event latch-up and not very usable. On the other hand, the PC603 actually is not too bad right off a commercial foundry line.

    With this in mind, there are also a number of ways to mitigate radiation effects, including latch-up protection circuits, EDAC, redundancy, cold sparing, etc. These methods can reduce the number of effects that propagate to the subsystem or system level.

    Radiation hardening in many instances can also succeed in preventing effects from reaching the system level, but there are a number of penalties to pay: schedule is often the biggest (as you know, many rad hard processors are very old), then cost (this stuff isn't cheap since it is boutique), performance (many rad hard processors can't run at the speed of their commercial brothers because of layout changes, extra resistance, etc.), and often power and size as well.

    Now we are presented with two paths: 1) radiation harden a processor, 2) measure the rad effects of a commercial processor and mitigate them with extra circuitry (which has its own extra liabilities in cost, power, size, but typically are much lower).

    In some instances rad hard is the right choice (in human flight missions it tends to be a good choice, but not always), and in others commercial products with some workarounds are best.

    Simplifying the issue to "rad hardening is not optional" is wrong...it is optional, but if you say "radiation effects must be dealt with", then I agree with you.

  25. Bad form, Slashdot... by Abcd1234 · · Score: 5

    I'm no fan of Windows... frankly, I use Linux whenever I get the chance. And it's great that Slashdot is evangelical about my favorite OS. But that's no excuse for bad reporting. There's *nothing* in the CNN article (or any of the others, for that matter) implying that Windows is the reason for the server crash. Implying that it is related (with the little tagline "NASA is using Windows for most of their computing functions"... why add this, except to add sensationalism to the article?), is just bad, bad form. If any other publication did this, I'm sure people here would be complaining about poor journalism, bias, etc., etc., et al, ad nauseam. Frankly, I think that little line should be removed, and the post should be allowed to stand on its own. Please, don't put these little editorial comments into the stories. There's no need. All it does is damage Slashdot's (already shaky) credibility.

    1. Re:Bad form, Slashdot... by Abcd1234 · · Score: 5
      I totally agree. Slashdot posts stories with the author's opinion thrown in. However, an opinion is one thing... warping the facts, implying something that's not true... that's entirely another. The comment (and the title of the story) implies that Windows was the reason for the crash... however, not even NASA knows why the crash occurred. Now, if we'd had a confirmation that, yes, Windows caused the problem, and then we had a little MS bashing comment in the story, well, so be it. Or if the title of the story was "Severe server crash on ISS", and the comment was something like "I wonder if Windows had anything to do with it...", that'd be fine, too. But this isn't the case... the author tried to imply causation when there is no proof of it. That's irresponsible.

      Now, I've been around Slashdot for a long time, as well... like you, before the Andover buy-out. But that doesn't mean I'm not going to be objective. The author fscked up here. I'm not saying /. should praise M$... frankly, M$ has absolutely NOTHING to do with it. I simply think that Slashdot should try to report *true*, *accurate* stories. Is that so much to ask? A little journalistic integrity (I know, I know... naive... :)

    2. Re:Bad form, Slashdot... by cavemanf16 · · Score: 5
      ...I'm sure people here would be complaining about poor journalism, bias, etc...

      This isn't a journalism site, it's a bulletin board system. Jon Katz is the only one who really writes stories of his own, each time. Most of the rest of the stories are just links to other sites. So yes, that's why slashdot evangelizes about Linux 24/7 and bashes Microsoft. Sure, we all realize that NASA didn't just pick Windows to run space shuttle operations just cause it was easy to use. I'm sure plenty of considerations went into how well it would work versus other OS's. But it's still fun to discuss whether they made the best choice possible, which is what slashdot is so popular for. Discussion.

  26. Re:Operating Systems by Tyrannosaurus · · Score: 3
    One can only fear what happens when they upgrade to one of the new Microsoft lease-based licenses, so that when their link goes down and they can't contact Microsoft's license server, the entire space station shuts down :)

    The worst part is that whenever they upgrade a piece of hardware, they have to re-register with Microsoft. Since their comm is no longer working, they have to use Morse Code by blinking a flashlight out the window.

    ---

    --

    ---
    Gort! Klatu Barata Nikto!
  27. Was it even Microsoft? by micromoog · · Score: 3
    Everyone seems to be jumping to the conclusion that this is somehow Microsoft's fault. Where's the article that even says the systems were running NT/2000? If that is known, is there anything stating that the problem was caused by an OS defect?

    I mean really, people. Sure, we've all had bad M$ experiences, but blame the NASA engineers for a poorly designed redundancy, and let them blame their supplier.

  28. would it kill them?? by mr.ska · · Score: 4
    Geez, would it kill CNN, or any other American news feed, to mention that the robotic arm is known as the "Canadarm2", because it was designed and built by Canadians? We may be 1/10th the size of The United States Of America, but for crying out loud, you're allowed to mention that OTHER nations are contributing to the station. Especially when the contribution is the feature that will allow the station to be built over the next 5 years!

    While they're at it, maybe add the fact that the Canadarm2 is the big brother of the Canadarm that each space shuttle has. Maybe that it has 2 "hands", one on each end, that will allow it to "inchworm" its way along the outside of the station. Perhaps mention that Canadian Chris Hadfield, the first Canadian spacewalker (as of this mission), is the one who installed the arm??

    You'd think every American news editor has a spark plug up their GI orifice that gives them a shock anytime they allow "Canada" to get into print. Sheesh.

    Mr. Ska

    I slit a sheet
    A sheet I slit

    --

    Mr. Ska

  29. Re:Sig by Random+Utinni · · Score: 3

    Well, AFAIK, it's "Klaatu, Barata, N..." ergh. Necktie... Nickel... it's definitely an 'N' word.

    Hmmm... "Klaatu, Barata, N<cough>" There you go. Works like a charm... : )

  30. Re:Windows bashing by rabtech · · Score: 3

    The password change is a well-known bug in the Novell client that they refuse to fix. Novell has suspended pretty much all work on their client software. Netware is dying, jump now while you can.

    Your HP situation highlights 99% of Windows 2000 BSODs: faulty drivers. If you only use HCL-approved hardware and signed drivers, you aren't going to get any BSODs, unless you have faulty hardware.

    I believe that the ISS is using NT4.0, in which case I'm not surprised. While somewhat stable, it pales in comparison to Windows 2000.
    -------
    -- russ

    "You want people to think logically? ACK! Turn in your UID, you traitor!"

    --
    Natural != (nontoxic || beneficial)
  31. Crashed computers don't use Windows by ec_hack · · Score: 5

    The ISS computers that have been crashing (the MDMs) don't use Windows. The MDMs and other embedded computer systems are based on Intel 386 chips. If they have a kernel, it is probably VxWorks or other commercial RTOS. AFAIK, the only ISS computers that use Windows are some of the laptops, however, some use the Intel version of Solaris.

    Why 386 chips? Because they have been tested and been found to be relatively radiation tolerant. More current chips are likely to be subject to more radiation-induced faults due to smaller transistor size.

    1. Re:Crashed computers don't use Windows by pavonis · · Score: 5

      For gods' sakes, someone with some karma mod this thing up. /.'s reaction to this story, in the complete absence of the relevant facts, was kind of distressing- so many instant Windoze bashers popping up, the usual modding-separating-wheat-from-chaff system failed completely. The only systems aboard ISS running Windows that I am aware of are some of the laptops, which are not the sole interfaces to any critical system, and servers for some relatively minor tasks, like e-mail I believe.

      I assume this choice was made for the sake of simplicity. I don't agree with running windows at all, but so far as I know they're being fairly sensible about it. Those referring to NASA decisions that 'everything would run windows', or massive M$ marketing campaigns, please provide some sort of reference if at all possible...

      Side note: there are other means of communication with ground, even if Endeavour weren't parked there. They just switched to the shuttle as the simplest thing. If all else fails, amateur radio should always be usable...

      Repeat of question I posed in an earlier article: Apart from simple answers like 'More testing' and 'be more careful', do any of you have suggestions for how NASA's software might be made more robust? Of late software problems have caused more trouble than hardware, which seems odd.

  32. Scary stuff? by imipak · · Score: 3
    It sounds really rather scary to me. Apart from the fact that three redundant computers going down at once just should NOT happen - if Endeavour hadn't happened to be docked, they'd have no voice/data uplink /at all/.

    As far as I can see, wouldn't that put the crew into a really hairy position? Without support from the ground, they'd have no way to know how to try diagnosing / fixing the problem. And if they couldn't get it going... well, perhaps they'd all just goof off for a while, like when the boss takes a day off sick ;) ... but wouldn't they have serious problems, say, preparing for the next shuttle or Soyuz docking?
    --
    If the good lord had meant me to live in Los Angeles

  33. Re:Window Cleaning? by JediTrainer · · Score: 3

    NASA is using Windows for most of their computing functions,

    In that case forget it. I'm not setting foot on that death trap! I think I'd rather take my chances on Mir! Oh wait, too late....


    Personally, I'd still rather take my chances on Mir!

    --

    You can accomplish anything you set your mind to. The impossible just takes a little longer.
  34. But here's the twist... by Spamalamadingdong · · Score: 5
    As an orbiting object decreases speed, it falls in its orbital path.
    Which is correct as far as it goes (it only applies to single-impulse velocity changes). However, after losing speed the object falls into a lower orbit (it no longer has the velocity to maintain its original orbit), and the trade of potential energy for kinetic energy increases the orbital speed.

    Total energy/mass of an object in orbit is 1/2 v^2 - GM(earth)/r; you get a circular orbit when the kinetic energy is equal to half the (negative) potential energy, i.e. v = sqrt(GM(earth)/r). The total energy of an object in an orbit (as opposed to an escape trajectory) is always negative.
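    A quick numeric check of the formulas above, in Python. GM_EARTH and R_EARTH are standard values for Earth's gravitational parameter and mean radius; the 400 km and 350 km altitudes are illustrative assumptions, not figures from the thread. The sketch shows the circular speed rising as the orbit drops, and the total energy per unit mass staying negative and equal to -GM/(2r).

```python
import math

GM_EARTH = 3.986004418e14  # m^3/s^2, Earth's standard gravitational parameter
R_EARTH = 6.371e6          # m, mean Earth radius


def circular_speed(r: float) -> float:
    """Speed of a circular orbit of radius r, measured from Earth's centre."""
    return math.sqrt(GM_EARTH / r)


def specific_energy(v: float, r: float) -> float:
    """Total energy per unit mass: kinetic 1/2 v^2 plus potential -GM/r."""
    return 0.5 * v**2 - GM_EARTH / r


# Illustrative altitudes only: a drag-induced decay from 400 km to 350 km
for altitude_km in (400, 350):
    r = R_EARTH + altitude_km * 1e3
    v = circular_speed(r)
    e = specific_energy(v, r)
    print(f"{altitude_km} km: v = {v / 1e3:.3f} km/s, E = {e / 1e6:.2f} MJ/kg")
```

    The lower orbit has the higher speed but the more negative total energy: drag removes energy, and the object ends up moving faster.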
    --
    spam spam spam spam spam spam
    No one expects the Spammish Repetition!

  35. Deep link by sllort · · Score: 5

    The link that specifically mentions Windows, for those of you wondering, is here.

    Now what do you guys make of this?

    "Used the startup disk in the onboard software suite, but could not find a particular file while hunting around with DOS. This would have been much easier with some bootable media (CD-ROM?) that could run Windows. (Or if Shep was not indoctrinated by that "other" operating system). We may need an emergency boot capability again. After 5+ attempts, finally got the hard drive to take an image off the ghost CD. One of the Autoloader floppies went down, but SSC 2 is now running normally. ( 3+ hours troubleshooting). "

    Guesses? Bets?