Space Station BSOD
Lostman writes: "CNN has an article that details a computer glitch that has occurred at the International Space Station. The problem disrupted all communication from the command computers on the station. Although NASA knows that this was because an onboard server had crashed, the cause of the crash was not immediately known." See also space.com, the BBC, or NASA's status update. NASA is using Windows for most of their computing functions, as mentioned here.
Oh... wait a sec! :-)
---
"Hasta la victoria siempre!" El Comandante
There is no mention of what OS the ThinkPad in the picture is running. For all we know that might be the "server" they are talking about... http://www.mdrobotics.ca/rws.htm
The web site runs linux, though... :)
STFU about slashdot bias.
I interviewed at Boeing for doing Space Station networking work.....here's the surprising part, the Space Station is all run off of 386s!!! They do most of the low level programming in assembly to squeeze out as much performance as possible.
It totally blew my mind. This was about 14 months ago.
---------------------------
That's not what I meant.
Great...so the ISS is really a giant pinball machine with one of the flippers locked up, so we need to get it to go "TILT" and shut down so we can reset it? :-)
---
Hacker Public Radio is our Friend
Netware 3.12
Yeah, memory protection is for wusses.
Seriously, tho, in a former life as a network guy in the early 90s, I saw far more NetWare ABENDs than I've seen NT bluescreens. It was generally OK for file and print, but if you tried to run any slightly non-standard NLM (AppleShare, OS/2 namespace, backup software, Btrieve, CD-ROM drivers, etc.), you had to keep your fingers crossed. I guess that goes to show that if you keep a product in maintenance for 10 years or more, anything can become rock stable.
--
Business. Numbers. Money. People. Computer World.
XFree86 drivers run as root and have full access to your system's memory. Poorly coded user-space X drivers could easily crash your system.
NT servers don't use the Nvidia drivers and aren't expected to do things like optimize video playback. They generally run a rather generic unaccelerated SVGA driver. I've seen lots of bluescreens on servers, and none of them that I recall could be traced to the video drivers. There are the usual SCSI and NIC driver issues that could crash any OS, and for a long time in the NT 4.0 series, there was some issue in NTFS.SYS that caused systems to fall over.
I'll accept that it's somewhat stupid to have a mandatory GUI on a server, but I don't think this is the stability issue that the NT-haters club makes it out to be. NT has/had plenty of larger reliability problems.
sllort asks:
Now what do you guys make of this?
... This would have been much easier with some bootable media that could run Windows. (Or if Shep was not indoctrinated by that "other" operating system).
According to this Expedition One crew debriefing, Shep answered a provocative question thus:
Ops LAN
? Was the service pack distribution system easy to follow?
Shep: Yes. No problems.
Sergei: I'd like to have a little more explanation of what is in the service pack.
Shep & Sergei: That way we would have known if it was really critical to load the new version or not.
? Was the desktop configuration (SSC Client, SSC File Server) easy to navigate? Any suggestions on how to improve the desktop layout?
Shep (joking): Go to a Mac OS.
This fits with the wording: Shep is a Mac user. The log is tweaking him for being less technical because he uses a Mac. It's unclear if this section of the log was written by one of the cosmonauts, or possibly Shep tweaking himself. But he's known to have a real sense of humor.
----
lake effect weblog
{Network engineer in Chicago--looking for work!}
Man, it is really bizarre to see a press release about an organization cold booting into safe mode. The way they write it up, you'd think it was rocket science. . .
Of course, the fact that NASA had just installed a bunch of critical hotfixes from Microsoft's FunLove-infected update site is purely coincidental.
Lacking <sarcasm> tags,
That is not what happened at all. The IBM ThinkPads are just INTERFACES for the control system. They don't actually control things. They just allow the astronauts to see what is going on in the station and send commands. All of the actual control (autonomous and commanded) is done by other machines: three Command and Control Multiplexor/Demultiplexors (not running Windows).
IANAL, but I play one on
In this case, the problem was not with the interface software OR the interface computer (ThinkPad) but with the core system (last I checked, they were still not sure whether it was software or hardware). Not only that, but the software on the ThinkPad was not provided by a "monolith^H^H^H^Hpoly" unless you consider Sun Solaris a monopoly.
I guess I always did think of HAL as an OS and not an interface. That is an interesting revelation to me, but that still doesn't change the fact that the interface didn't cause the problem and the fact that the interface wasn't supplied by a monopoly.
What really happened is the US control module computers stopped responding to any inputs from the ground. They weren't able to control the station or tell it to shutdown or anything. Their plan to fix it (last I heard) was to have the Russian control module move and shake the ISS around until the US system thought it was out of control and went into what is called Free Drift Mode. In this mode, it can be completely controlled by the Russian module and we can debug the system and bring it back online.
IIRC, the stated reason for using Windows is that astronauts (who are not necessarily computer experts) can manage it. Well, is it worth the risk?
Wouldn't it be better to use whatever system is best for the job, and send a computer guy up there to maintain it?
(Yes, I admit it, I'm only suggesting this because it increases my chances of getting into space from zero to negligible.)
--
Patrick Doyle
I mod down every jackass who puts his moderation policy in his sig. Oh, wait a sec....
There's *nothing* in the CNN article ... implying that Windows is the reason for the server crash
Micro~1.oft spent a lot of time, energy and money to ensure that their OSes were dominant on the ISS. They have spent millions of $$$ just to place a few hundred copies on the ISS, in the space flight centre, and in the Russian control centres. The reason for this massive cost was to use the ISS as a giant marketing tool, and they even created a whole marketing campaign around it.
Windoze is not the only OS on the ISS, but it is dominant. There are some *nixes running critical communication processes, such as the main link from the station to ground points, and these have not had many problems at all.
When the M$ servers started crashing, the whole micr~1.oft in space campaign was put on hold. If you read the logs created by the station crew, they are pretty upset having to spend entire days trying to fix micr~1.oft problems. NASA has a direct line into the best and brightest engineers at M$, but even they are clueless as to why certain processes hang, why backups fail to happen, why entire directories are blown away with no trace, or why new patches cause driver conflicts.
Since the Register article highlighting the ISS problems in the logs, micr~1.oft has been putting pressure on NASA to redact all mention of micr~1.oft. Certainly someone has been archiving copies of the logs since they appeared, so they can diff them later and see when NASA bows to micr~1.oft pressure.
As you noticed, none of the mainstream reporting now mentions micr~1.oft by name; that is due to a pressure campaign by one of the largest advertising budgets in the US. But when the logs are posted for these events, you will notice a great many references to the machines running micr~1.oft, even if the name of the OS is redacted out. If you do a little research, you will see these machines are running either DoS or windoze.
the AC
Hemos is like... sci-fi fans; he thinks technology is cool, but he hasn't bothered to understand the science it's based on
Dude, I was referring to the Yorktown discussion thread. I never said it BSOD'ed. I said crashed. There's a difference.
Here's the article about the Yorktown.
I used to work for a defense contractor, so I know how these things should be tested. You don't just test on good inputs, you test with bad ones. That's why I said that the app crashing was unacceptable. However, nothing should ever cause an OS to crash, especially in a military environment.
It doesn't have to be a BSOD, it could be some other failure mode, which is what appeared to happen to the Yorktown.
General Relativity: Space-time tells matter where to go; Matter tells space-time what shape to be.
Not that I believe this at all, but it occurred to me and I figure it's amusing enough to share.
"Sorry, Dennis. That darn computer system crashed again, we just can't let ya launch right now. We figure it'll be fixed by... oh... October." <sotto voce: Frank, have you finished the bluescreen plan for Friday yet?>
It is specifically Solaris x86 running on a laptop.
-----
Try http://www.theregister.co.uk/content/2/18540.html to find out NASA's rebuttal of that Register story. Seems it's not only /. that froths at the mouth at the thought of bashing IBM and Microsoft.
hc
Coax would have the advantage of plenty of shielding from electromagnetic interference. Otherwise, no advantage.
If you're reading this NASA, here's some advice. Buy some little metal doohickeys for the back of each networked computer. These doohickeys fit around a coax cable, can be screwed into the back of a power supply, and cost about 5 cents. In my experience, using these helps stabilize the cables a lot, and you get more uptime that way.
I am not a lawyer.
That's a space odyssey, er, oddity.
And the software in question is provided by a huge monolith^H^H^H^Hpoly...-- @rjamestaylor on Ello
Real time software for mission critical systems is written in Ada. That's a no-brainer. If there is any assembler, it's tiny, of severely limited scope, and meticulously tested. In fact, having worked with some very low level networking code for ISS (in Ada), I doubt there's any assembler in there at all.
As to the 386's, they're rad hardened and known reliable. And, unlike the home computer I bought a couple of months ago that's state of the art, whether I need state of the art or not, the jobs these CPUs had to do simply didn't require anything faster than a 386, even given a hefty allowance of spare cycles and memory for future growth.
We bought what we needed (in space, rad hardening is not optional) and we didn't buy what we didn't need. That's not $400 hammers, that's the definition of responsible stewardship of the public's money.
It is not particularly scary. Software systems don't benefit from redundancy in the same way that hardware systems do. Most software bugs are systemic (i.e., an uncommon code path that just doesn't work). So redundant software systems (even ones built as multiple separate "clean room" implementations) frequently go down at the same time when they run in the same operating environment. For more information, check out the work of Nancy Leveson and the other people in her group.
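To make the correlated-failure point concrete, here is a toy Python sketch (not taken from any real flight software): three hypothetical versions of a trivial routine that should return `x * 2` are combined with a majority voter. When their bugs sit on different inputs, voting masks each one; when two versions share the same bug on an uncommon code path (negative inputs, chosen arbitrarily here), the voter confidently returns the wrong answer. All function names and fault patterns are invented for illustration.

```python
from collections import Counter
from typing import Callable, Optional, Sequence

Version = Callable[[int], int]

def majority_vote(outputs: Sequence[int]) -> Optional[int]:
    """Return the value reported by a strict majority of versions, or None."""
    value, count = Counter(outputs).most_common(1)[0]
    return value if count * 2 > len(outputs) else None

def independent_versions() -> list[Version]:
    # Hypothetical versions whose bugs are on *different* inputs.
    return [
        lambda x: -1 if x == 7 else x * 2,   # wrong only for x == 7
        lambda x: -1 if x == 13 else x * 2,  # wrong only for x == 13
        lambda x: x * 2,                     # correct
    ]

def correlated_versions() -> list[Version]:
    # Two versions share the same systemic bug on an uncommon path.
    return [
        lambda x: 0 if x < 0 else x * 2,
        lambda x: 0 if x < 0 else x * 2,
        lambda x: x * 2,                     # the only correct one, outvoted
    ]

def voted(versions: Sequence[Version], x: int) -> Optional[int]:
    """Run every version on the same input and vote on the results."""
    return majority_vote([f(x) for f in versions])

if __name__ == "__main__":
    for x in (7, 13, -5):
        print(x, voted(independent_versions(), x), voted(correlated_versions(), x))
```

The last line printed is `-5 -10 0`: the independent set still outvotes its faults, while the correlated set returns 0, which is wrong. Identical redundant copies would behave like the correlated case on every input.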
ONE server went down... the THREE you speak of were clients, which of course are useless because of it.
Mooniacs for iOS and Android
I just want to contradict one point you made: "in space, rad hardening is not optional".
That is incorrect.
Microprocessors (electronics in general also) have a wide variety of radiation response out of the box. For instance, the AMD K6 is known to be pretty bad for single event latch-up and not very usable. On the other hand, the PC603 actually is not too bad right off a commercial foundry line.
With this in mind, there are also a number of ways to mitigate radiation effects, including latch-up protection circuits, EDAC, redundancy, cold sparing, etc. These methods can reduce the number of effects that propagate to the subsystem or system level.
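As an illustration of the EDAC idea, here is a minimal Python sketch assuming the classic Hamming(7,4) single-error-correcting code. Practical EDAC memories usually protect wider words, often with SEC-DED variants, so treat this only as a picture of the mechanism: a bit flipped by a particle strike is located by the syndrome and flipped back before the data reaches software. The function names are hypothetical.

```python
def hamming74_encode(data: int) -> int:
    """Encode a 4-bit value into a 7-bit Hamming(7,4) codeword.

    Bit positions are numbered 1..7; positions 1, 2 and 4 hold parity,
    positions 3, 5, 6 and 7 hold data bits d0..d3.
    """
    d = [(data >> i) & 1 for i in range(4)]
    code = [0] * 8  # index 0 unused so list indices match bit positions
    code[3], code[5], code[6], code[7] = d
    code[1] = code[3] ^ code[5] ^ code[7]  # covers positions 1, 3, 5, 7
    code[2] = code[3] ^ code[6] ^ code[7]  # covers positions 2, 3, 6, 7
    code[4] = code[5] ^ code[6] ^ code[7]  # covers positions 4, 5, 6, 7
    return sum(code[p] << (p - 1) for p in range(1, 8))

def hamming74_decode(word: int) -> tuple[int, int]:
    """Correct up to one flipped bit; return (data, error_position or 0)."""
    code = [0] + [(word >> (p - 1)) & 1 for p in range(1, 8)]
    # XOR of the positions of all set bits is 0 for a valid codeword and
    # equals the position of the bad bit after a single flip.
    syndrome = 0
    for p in range(1, 8):
        if code[p]:
            syndrome ^= p
    if syndrome:
        code[syndrome] ^= 1  # flip the upset bit back
    data = code[3] | (code[5] << 1) | (code[6] << 2) | (code[7] << 3)
    return data, syndrome

if __name__ == "__main__":
    word = hamming74_encode(0b1011)
    upset = word ^ (1 << 4)  # simulate a single-event upset at bit position 5
    print(hamming74_decode(upset))  # -> (11, 5)
```

A double-bit upset in the same word would be miscorrected by this code, which is one reason real designs add an extra parity bit (SEC-DED) and periodically scrub memory.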
Radiation hardening in many instances can also succeed in preventing effects from reaching the system level, but there are a number of penalties to pay: schedule is often the biggest (as you know, many rad-hard processors are very old), then cost (this stuff isn't cheap, since it is boutique), performance (many rad-hard processors can't run at the speed of their commercial brothers because of layout changes, extra resistance, etc.), and often power and size as well.
Now we are presented with two paths: 1) radiation harden a processor, 2) measure the rad effects of a commercial processor and mitigate them with extra circuitry (which has its own extra liabilities in cost, power, size, but typically are much lower).
In some instances rad hard is the right choice (in human flight missions it tends to be a good choice, but not always), and in others commercial parts with some workarounds are best.
Simplifying the issue to "rad hardening is not optional" is wrong...it is optional, but if you say "radiation effects must be dealt with", then I agree with you.
I'm no fan of Windows... frankly, I use Linux whenever I get the chance. And it's great that Slashdot is evangelical about my favorite OS. But that's no excuse for bad reporting. There's *nothing* in the CNN article (or any of the others, for that matter) implying that Windows is the reason for the server crash. Implying that it is related (with the little tagline "NASA is using Windows for most of their computing functions"... why add this, except to add sensationalism to the article?) is just bad, bad form. If any other publication did this, I'm sure people here would be complaining about poor journalism, bias, etc., ad nauseam. Frankly, I think that little line should be removed, and the post should be allowed to stand on its own. Please, don't put these little editorial comments into the stories. There's no need. All it does is damage Slashdot's (already shaky) credibility.
science is a religion
The worst part is that whenever they upgrade a piece of hardware, they have to re-register with Microsoft. Since their comm is no longer working, they have to use Morse Code by blinking a flashlight out the window.
---
Gort! Klatu Barata Nikto!
I mean really, people. Sure, we've all had bad M$ experiences, but blame the NASA engineers for a poorly designed redundancy, and let them blame their supplier.
While they're at it, maybe add the fact that the Canadarm2 is the big brother of the Canadarm that each space shuttle has. Maybe that it has 2 "hands", one on each end, that will allow it to "inchworm" its way along the outside of the station. Perhaps mention that Canadian Chris Hadfield, the first Canadian spacewalker (as of this mission) is the one who installed the arm??
You'd think every American news editor has a spark plug up their GI orifice that gives them a shock anytime they allow "Canada" to get into print. Sheesh.
Mr. Ska
I slit a sheet
A sheet I slit
Mr. Ska
Well, AFAIK, it's "Klaatu, Barata, N..." ergh. Necktie... Nickel... it's definitely an 'N' word.
Hmmm... "Klaatu, Barata, N<cough>" There you go. Works like a charm... : )
The password change is a well-known bug in the Novell client that they refuse to fix. Novell has suspended pretty much all work on their client software. NetWare is dying, jump now while you can.
Your HP situation highlights 99% of Windows 2000 BSODs: faulty drivers. If you only use HCL-approved hardware and signed drivers, you aren't going to get any BSODs, unless you have faulty hardware.
I believe that the ISS is using NT4.0, in which case I'm not surprised. While somewhat stable, it pales in comparison to Windows 2000.
-------
-- russ
"You want people to think logically? ACK! Turn in your UID, you traitor!"
Natural != (nontoxic || beneficial)
The ISS computers that have been crashing (the MDMs) don't use Windows. The MDMs and other embedded computer systems are based on Intel 386 chips. If they have a kernel, it is probably VxWorks or another commercial RTOS. AFAIK, the only ISS computers that use Windows are some of the laptops; however, some laptops use the Intel version of Solaris.
Why 386 chips? Because they have been tested and been found to be relatively radiation tolerant. More current chips are likely to be subject to more radiation-induced faults due to smaller transistor size.
As far as I can see, wouldn't that put the crew into a really hairy position? Without support from the ground, they'd have no way to know how to try diagnosing / fixing the problem. And if they couldn't get it going... well, perhaps they'd all just goof off for a while, like when the boss takes a day off sick ;) ... but wouldn't they have serious problems, say, preparing for the next shuttle or Soyuz docking?
--
If the good lord had meant me to live in Los Angeles
NASA is using Windows for most of their computing functions,
In that case forget it. I'm not setting foot on that death trap! I think I'd rather take my chances on Mir! Oh wait, too late....
Personally, I'd still rather take my chances on Mir!
You can accomplish anything you set your mind to. The impossible just takes a little longer.
Total energy/mass of an object in orbit is 1/2 v^2 - GM(earth)/r; you get a circular orbit when the kinetic energy is equal to half the magnitude of the (negative) potential energy, i.e. v = sqrt(GM(earth)/r). The total energy of an object in an orbit (as opposed to an escape trajectory) is always negative.
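A quick numeric check of that formula, as a Python sketch: the gravitational parameter and mean Earth radius are standard values, and the 400 km altitude is an assumed, ISS-like figure rather than anything stated in the thread. The circular-orbit speed comes out near 7.7 km/s, and the specific energy equals -GM/(2r), negative as claimed.

```python
import math

GM_EARTH = 3.986004418e14  # m^3/s^2, Earth's standard gravitational parameter
R_EARTH = 6.371e6          # m, mean Earth radius

def circular_orbit(altitude_m: float) -> tuple[float, float]:
    """Return (speed in m/s, specific orbital energy in J/kg) for a circular orbit."""
    r = R_EARTH + altitude_m
    v = math.sqrt(GM_EARTH / r)          # kinetic energy = -(potential energy) / 2
    energy = 0.5 * v**2 - GM_EARTH / r   # works out to -GM / (2r), always negative
    return v, energy

if __name__ == "__main__":
    v, e = circular_orbit(400e3)  # assumed ISS-like altitude of 400 km
    print(f"v = {v:.0f} m/s, E = {e:.3e} J/kg")
```

Raising the speed above sqrt(2 GM/r) would push the energy to zero or above, which is the escape-trajectory case the comment contrasts with.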
--
spam spam spam spam spam spam
No one expects the Spammish Repetition!
Scientists restrict study to entire physical universe; creationist
The link that specifically mentions Windows, for those of you wondering, is here.
Now what do you guys make of this?
"Used the startup disk in the onboard software suite, but could not find a particular file while hunting around with DOS. This would have been much easier with some bootable media (CD-ROM?) that could run Windows. (Or if Shep was not indoctrinated by that "other" operating system). We may need an emergency boot capability again. After 5+ attempts, finally got the hard drive to take an image off the ghost CD. One of the Autoloader floppies went down, but SSC 2 is now running normally. ( 3+ hours troubleshooting). "
Guesses? Bets?