Computer Date Glitch May Limit Next Shuttle Launch
n3hat writes "Reuters reports that the next Space Shuttle mission may have to be deferred if it gets too close to the New Year because the onboard computers do not handle the changing of the date in the same way as the ground computers. From the article: '"The shuttle computers were never envisioned to fly through a year-end changeover," space shuttle program manager Wayne Hale told a briefing. The problem, according to Hale, is that the shuttle's computers do not reset to day one, as ground-based systems that support shuttle navigation do. Instead, after December 31, the 365th day of the year, shuttle computers figure January 1 is just day 366."
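A minimal sketch of the mismatch Hale describes, assuming a 365-day year and a mission that launches on day 360. The function names and the launch day are made up for illustration; nothing here reflects the actual flight or ground software.

```python
DAYS_IN_YEAR = 365  # assumption: a non-leap year


def onboard_day(launch_day, days_elapsed):
    """Shuttle-style counter (as described in the summary): keeps counting up."""
    return launch_day + days_elapsed


def ground_day(launch_day, days_elapsed, days_in_year=DAYS_IN_YEAR):
    """Ground-style counter: wraps back to day 1 after the last day of the year."""
    return (launch_day + days_elapsed - 1) % days_in_year + 1


if __name__ == "__main__":
    # Hypothetical launch on day 360, followed through the year-end changeover.
    for elapsed in range(4, 8):
        print(elapsed, onboard_day(360, elapsed), ground_day(360, elapsed))
```

On the sixth day of this hypothetical mission the onboard counter reads 366 while the ground counter reads 1, which is the disagreement the article is about.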
Is there a reason these aren't built on standard parts and operating systems? If they ran their shuttles on something like Debian stable it would be a rock solid platform and probably end up saving them lots of money. Or am I missing something here?
Oh, shit! You mean we're not supposed to be following intergalactic star dates?? No wonder those programs I wrote have so many date bugs...
The shuttle runs on three modified IBM 360 systems. We're pushing 35, almost 40-year-old systems here.
Do you know how many eligible 35-year-old computer bachelors there are out there? I'll tell you: none. Of course the shuttle computers can't get a date.
Granted, the work they do is very impressive and the process is very exacting. But come on...they haven't been able to fix a simple year rollover event in 30 years?!?
From the Fast Company article:
I would say that requiring a reboot every year on December 31 is a pretty huge error. In this case, it is forcing NASA to launch earlier than they otherwise would wish. And this isn't the first time this kind of bug has caused trouble. The New Scientist has a similar article that goes into more detail:
The end-of-year rollover depends on the leap year and leap second (if any), and has traditionally been a source of problems.
Mea navis aericumbens anguillis abundat
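To illustrate the leap-year point above: a running day count can only be mapped back to a calendar year if the length of each year is known. The sketch below uses Python's standard `calendar.isleap`; the `normalize` helper and the example years are assumptions for illustration, and leap seconds are not modeled at all.

```python
import calendar


def days_in_year(year):
    """366 in a leap year, otherwise 365."""
    return 366 if calendar.isleap(year) else 365


def normalize(year, running_day):
    """Turn a running day count (which may exceed the year length)
    into a (year, day_of_year) pair."""
    while running_day > days_in_year(year):
        running_day -= days_in_year(year)
        year += 1
    return year, running_day


if __name__ == "__main__":
    print(normalize(2006, 366))  # non-leap year: rolls into the next year
    print(normalize(2008, 366))  # leap year: day 366 is a real day
```

The same raw value, 366, means January 1 of the following year in one case and December 31 in the other, which is part of why the rollover is error-prone.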
Nah, everyone knows geeks are useless at dates because they never get any. Predictable failure, that one.
"I've got more toys than Teruhisa Kitahara."
I work with military navigation software, and that is sorta remotely applicable to this. Here's my thoughts:
You people with your "WTF NASA SUXORS THIS IS EASY FIX!!!11!!1!one!!" need to stop and think for a second. This is a space application that carries HUMAN BEINGS! Think about how hard it will be to get this "easy fix" qualified, proven, documented, etc. It's not an easy task. A formal qualification test on the systems I work on (military land- and air-, but not space-based navigation software) can take months, and require all sorts of tests and documentation. Anything that isn't formally tested (i.e. run in a van, on a plane, etc) must be shown to not fail in any way; all exceptions handled, no bad data can cause an undesirable state, etc. I would hate to see the type of scrutiny that the Shuttle software goes through (although I could probably call somebody in our Space division across the street and find out).
Second, I don't know exact specifics, but based on the information provided, I think this "glitch" will have to do with the date/time difference between ground stations and the Shuttle computers. Things like message time stamping between the Earth and the Shuttle, etc, will be wrong, and things could be garbled or just dropped altogether. The navigation systems themselves should not be terribly impacted since the date will just roll to the next day. Inertial instrument samples will continue to flow in and be correctly time stamped, be it the 366th or 400th or 500th day.
There are only 10 kinds of people in this world... those who understand binary and those who don't
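A rough sketch of the distinction drawn in the comment above, under assumed conventions: timestamps are (day, seconds-into-day) pairs, and a message is accepted only if its stamp is within a tolerance of the receiver's clock. Both conventions and all names are hypothetical, not taken from any real navigation or telemetry system.

```python
SECONDS_PER_DAY = 86_400


def to_seconds(day, seconds_into_day):
    """Flatten a (day, seconds) stamp into seconds on that clock's own timeline."""
    return day * SECONDS_PER_DAY + seconds_into_day


def interval(stamp_a, stamp_b):
    """Elapsed seconds between two stamps taken from the SAME clock."""
    return to_seconds(*stamp_b) - to_seconds(*stamp_a)


def stamps_agree(onboard_stamp, ground_stamp, tolerance_s=1.0):
    """Cross-clock check: do two stamps refer to (nearly) the same instant?"""
    return abs(to_seconds(*onboard_stamp) - to_seconds(*ground_stamp)) <= tolerance_s


if __name__ == "__main__":
    # Two onboard samples straddling the changeover: the interval is still right.
    print(interval((365, 86_399.0), (366, 1.0)))
    # The same instant stamped by each side: day 366 onboard, day 1 on the ground.
    print(stamps_agree((366, 1.0), (1, 1.0)))
```

Differences taken on the onboard clock alone survive the changeover, while any comparison across the two clocks is off by a full year's worth of days.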
Perhaps the fact that the shuttle was being developed at the same time as Ada might have something to do with it. Or do you recommend using a not even fully designed, coded, and tested language for controlling the most complex piece of equipment that man has ever built?
I prefer the "u" in honour as it seems to be missing these days.
Imagine you are a member of the shuttle design team and you can make a choice (for the next 20 years) to either know for sure that you're with the kids at home on X-mas and New Year .... or you can suggest a software feature that could result in your New Year's Eve being spoiled down the road because you have to spend days in a dumb control room. Hey, what would you do??
And I still remember, when I was a kid, that we had that Apollo flight during X-mas. I think it was the one that would for the first time go behind the moon. Someone in the control room that year made it into an important enough person on the Shuttle program so that this WOULD NEVER HAPPEN AGAIN. :-)
Browsers shouldn't have a back button!! It's all about going forward...
They put computers in bra straps now? Sheesh, I was just getting used to the old ones, and now this?
... and then they built the supercollider.
>capability of missiles (and other defence systems) to handle war through a year-end changeover?
That's why every new year all the soldiers climb out the trenches, swap chocolates, cigarettes etc, and shake hands before climbing back in and resuming war the next day.
I want a list of atrocities done in your name - Recoil
Actually, the estimated failure rate for the shuttle program was 1 in 35, though the shuttles themselves may have been designed to withstand 100 launch/landing cycles*. This was a bit of an issue when the 25th mission resulted in a failure (since most of the population does not understand statistics).
And, for the record, there have been 117 launches, according to wiki, which I will take as accurate enough for this discussion (far less than 200).
*yes, IWAAE (I was an aerospace engineer) working for NASA, and was involved with shuttle payloads and structural reliability analyses.
Is it just my observation, or are there way too many stupid people in the world?
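For the statistics point above, a back-of-the-envelope sketch. It assumes every mission fails independently with the same probability of 1 in 35, which is a simplification made here for illustration and not a claim about how NASA's reliability analyses were done.

```python
def p_at_least_one_failure(missions, p_fail=1 / 35):
    """Chance of one or more failures in `missions` independent flights."""
    return 1 - (1 - p_fail) ** missions


def expected_failures(missions, p_fail=1 / 35):
    """Average number of failures over `missions` independent flights."""
    return missions * p_fail


if __name__ == "__main__":
    print(round(p_at_least_one_failure(25), 3))   # by the 25th mission
    print(round(p_at_least_one_failure(117), 3))  # over the 117 launches cited
    print(round(expected_failures(117), 2))
```

Under these assumptions a failure by the 25th flight is roughly a coin flip (about 52%), so it is not evidence that the 1-in-35 estimate was wrong.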
So, what if, oh, say, the CO2 scrubbers need to work differently depending on how many days the mission has been run. So, they keep track of the first day number, and the current day number. The amount of CO2 scrubbing then is varied based on elapsed days.
^^and here's the key -- it's something you don't know about^^
Now, you make your little 5-second fix, and send seven astronauts into space.
New Year's Eve rolls around, and suddenly the mission started on day 360 and it's now day 1. Holy crap, says the scrubber, we have to scrub as though it's a 359-day mission, instead of a lousy 6.
Scrubbers go into overtime, and break. (Or, scrubber math is done in eight bits, and they think the shuttle's still on the ground and not ready to launch for another ~100 days due to integer roll-over. Or any other set of unforeseen possibilities.)
Next, astronauts die of CO2 poisoning because the scrubber subsystem has been compromised.
Great fix, mister five-second-coder.
Do daemons dream of electric sleep()?
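The hypothetical above can be written out in a few lines. The scrubber logic is the commenter's invention, and the function names here are made up too; the sketch only shows how an elapsed-days calculation reacts when the day counter is reset underneath it.

```python
def elapsed_days_naive(start_day, current_day):
    """Elapsed days as a plain subtraction; assumes the counter never resets."""
    return current_day - start_day


def elapsed_days_wrapped(start_day, current_day, days_in_year=365):
    """Reset-aware variant; only valid for missions shorter than one year
    and only if days_in_year is right for the year in question."""
    return (current_day - start_day) % days_in_year


if __name__ == "__main__":
    print(elapsed_days_naive(360, 366))   # running counter: a 6-day mission
    print(elapsed_days_naive(360, 1))     # after a "quick fix" resets to day 1
    print(elapsed_days_wrapped(360, 1))   # reset-aware arithmetic
    print(360 & 0xFF)                     # day 360 squeezed into eight bits
```

With the running counter the subtraction gives 6. After a reset to day 1 the same subtraction gives -359, the "359-day mission" of the story, and day 360 truncated to eight bits becomes 104, the "~100 days" in the parenthetical. Every consumer of the day number would need the reset-aware version, which is exactly the part a five-second fix does not check.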
I thought five 9's was the standard for uptime/availability. And you don't achieve that by having 1 server that has five 9's, you achieve that by having a group of servers, so when one goes down, your service is still up.
Space shuttle's a little different:
http://www.fastcompany.com/online/06/writestuff.html
Here we're talking *six* 9s of *bug-free code* (1 error in 420,000 lines of code in the previous version). Not uptime -- bugs. Mistakes. For the simple reason that if you make a mistake on the Shuttle, people die.
You won't get a Java implementation that bug-free without a crack team of developers working for decades, by which point Java will be just as outdated as the Shuttle code is now.
Remember -- if it ain't broke, don't fix it!
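The two kinds of "nines" in this exchange can be put side by side. The sketch assumes servers fail independently (for the availability figure) and treats "1 error in 420,000 lines" as a per-line defect fraction; both are simplifications made here, and the function names are illustrative.

```python
import math


def group_availability(single_availability, servers):
    """Availability of a redundant group that is up if ANY one server is up,
    assuming independent failures."""
    return 1 - (1 - single_availability) ** servers


def nines(defect_fraction):
    """Number of nines on a log scale: 0.001 -> 3.0, 0.00001 -> 5.0."""
    return -math.log10(defect_fraction)


if __name__ == "__main__":
    # Three servers at 99% each, as an example of availability via redundancy.
    print(group_availability(0.99, 3))
    # One error in 420,000 lines, expressed as nines of error-free lines.
    print(round(nines(1 / 420_000), 2))
```

Three 99% servers reach about six nines of availability as a group, whereas one error in 420,000 lines works out to about 5.6 nines of error-free code, close to the "six 9s" quoted. Redundancy helps only the first number; every redundant computer runs the same bug.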
Hundreds of comments and not a single one mentions that NASA is a CMMI Level 5 organization. For those that don't know (and apparently, that's a lot of you), CMMI, aka Capability Maturity Model Integration, is a software ENGINEERING methodology for developing processes and technologies around IT systems. It is a very in-depth methodology for developing software and comes about as close to "engineering" as you can get in software development.
Here is a list of participants in this program.
And here is a general overview of what CMMI is.
And just to put it into perspective, when I was last working with CMMI, there were only 3 companies certified at level 5: NASA, Motorola, and another one I can't remember. I am sure that has changed but nonetheless, it's a big deal and shows a serious effort to do things in a controlled, measurable, testable way.
I only bring this up to counter the ridiculous "solutions" that some have proposed on this site.
"I can fix that in 3 lines of code".
Well, great. That might work at YOUR company. But please don't do that at NASA. Despite what many think here, NASA is a top-notch software development house. And I would expect nothing less given what is at stake.