Computer Date Glitch May Limit Next Shuttle Launch
n3hat writes "Reuters reports that the next Space Shuttle mission may have to be deferred if it gets too close to the New Year because the onboard computers do not handle the changing of the date in the same way as the ground computers. From the article: '"The shuttle computers were never envisioned to fly through a year-end changeover," space shuttle program manager Wayne Hale told a briefing. The problem, according to Hale, is that the shuttle's computers do not reset to day one, as ground-based systems that support shuttle navigation do. Instead, after December 31, the 365th day of the year, shuttle computers figure January 1 is just day 366.'"
*ducks and runs for cover*
Seriously though, they never "envisioned" a mission occurring over the end of the year? Let me guess: a defense (space) contractor designed the systems.
Your idea of rock solid and their idea are a little different. They probably have one of the most bug-free pieces of software in existence. It's tailored to do what it needs to do, nothing more, nothing less, and it does it perfectly.
Bug free except for the rollover to a new year...
Standard parts don't like being bombarded with radiation. Standard operating systems aren't fault-tolerant.
It probably has to do with the mismatch between systems, not a lack of the engineers' or astronauts' ability to count on their piggies and toes. Their current configuration doesn't have a middleware layer that accounts for any possible differences. In other words, while the shuttle continues on thinking it's the 366th day, the ground control systems might get confused (e.g. "Hey, there's no such thing as a 366th day") and their programs may crash (no pun intended) as a result.
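To make the "middleware layer" idea concrete, here is a minimal sketch of a translation step between a running day count and a calendar day-of-year. The function name `normalize_day` and the example year 2006 are assumptions for illustration only; nothing here is taken from the actual shuttle or ground software.

```python
import calendar

def normalize_day(year, day_of_year):
    """Map a running day count onto a calendar (year, day_of_year) pair.

    Hypothetical helper: a counter that keeps going past December 31
    (day 366, 367, ... in a non-leap year) is folded into the next year.
    """
    if day_of_year < 1:
        raise ValueError("day_of_year must be >= 1")
    while True:
        days_in_year = 366 if calendar.isleap(year) else 365
        if day_of_year <= days_in_year:
            return year, day_of_year
        day_of_year -= days_in_year
        year += 1

# Assumed example year (non-leap): "day 366" becomes day 1 of the next year.
print(normalize_day(2006, 365))  # (2006, 365)
print(normalize_day(2006, 366))  # (2007, 1)
```

The catch is that a layer like this has to sit in front of every consumer of the date, which is exactly the kind of change that needs requalifying.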
Could it simply be that the date is a hard concept? You've got months with uneven numbers of days in them, including one month that can have an extra day added to it based on a somewhat complex rule (every 4 years, except if the year is divisible by 100, UNLESS that year also happens to be divisible by 400). Calculating how many days there are between now and some future date, without using magic numbers? Heck, even software in the '90s couldn't get it right that there was a Feb 29, 2000.
Every date math equation I've seen has all sorts of weird magic numbers in it where it isn't clear how those numbers were obtained. This may work just fine in day-to-day computations, but oddball bugs in date calculations can lead to some very weird errors. Look at the C library sometime for the date functions. It's quite impressive.
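For reference, the leap-year rule described above fits in one line, and the day arithmetic can be left to a standard library rather than hand-rolled magic numbers. This is only a sketch in Python; the helper names `is_leap` and `day_of_year` are made up for the example.

```python
import calendar
from datetime import date

def is_leap(year):
    """Gregorian rule: every 4 years, except centuries, unless divisible by 400."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)

def day_of_year(year, month, day):
    """1-based day number within the year, computed by the library."""
    return (date(year, month, day) - date(year, 1, 1)).days + 1

# The hand-written rule agrees with the standard library for a wide range of years.
assert all(is_leap(y) == calendar.isleap(y) for y in range(1600, 2401))

print(is_leap(1900), is_leap(2000))   # False True
print(day_of_year(2000, 12, 31))      # 366
print((date(2007, 1, 1) - date(2006, 12, 31)).days)  # 1
```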
Perhaps when the shuttles were designed, the inability to schedule across the new year was acceptable to avoid introducing odd bugs in the program to keep the software provably correct. Ground systems, which can be repaired in the middle of a mission easily, can be a little less bug-free, since a miscalculation won't cause the Earth to suddenly veer off course.
I work with military navigation software, and that is sorta remotely applicable to this. Here are my thoughts:
You people with your "WTF NASA SUXORS THIS IS EASY FIX!!!11!!1!one!!" need to stop and think for a second. This is a space application that carries HUMAN BEINGS! Think about how hard it will be to get this "easy fix" qualified, proven, documented, etc. It's not an easy task. A formal qualification test on the systems I work on (military land- and air-, but not space-based navigation software) can take months, and require all sorts of tests and documentation. Anything that isn't formally tested (i.e. run in a van, on a plane, etc) must be shown to not fail in any way; all exceptions handled, no bad data can cause an undesirable state, etc. I would hate to see the type of scrutiny that the Shuttle software goes through (although I could probably call somebody in our Space division across the street and find out).
Second, I don't know exact specifics, but based on the information provided, I think this "glitch" will have to do with the date/time difference between ground stations and the Shuttle computers. Things like message time stamping between the Earth and the Shuttle, etc, will be wrong, and things could be garbled or just dropped altogether. The navigation systems themselves should not be terribly impacted since the date will just roll to the next day. Inertial instrument samples will continue to flow in and be correctly time stamped, be it the 366th or 400th or 500th day.
There are only 10 kinds of people in this world... those who understand binary and those who don't
I would say that requiring a reboot every year on December 31 is a pretty huge error.
I wouldn't. When you're designing something like Shuttle software that has to work absolutely flawlessly 100% of the time, you don't put in any frills. And on something that is only ever in space for 10-15 consecutive days at most, year-end handling is most certainly a frill. (If you are a professional software developer, it ought to be obvious just how many things could break by adding a feature like that. If the original design calls for a monotonically increasing day number, for example, there's very likely to be some code that relies on that, so you have to go through the entire system, checking everything that even touches the day counter to ensure it can handle a reset from 365 to 1--and then check everything that uses those routines, and so on and so on.)
I suspect this is routine to NASA, and the reporter just blew it out of proportion. After all, Windows can handle end-of-year rollover, so if the Shuttle can't then it's broken, right?
You know, people like you give programmers a bad rep. You just dive in for the fix without knowing the cause - and on top of that add a few bugs that are even harder to iron out if you happen to be the only person knowing that code segment.
TFA carefully does NOT say that anything actually will fail, but that something might fail. Thank you, Fallon: your link (http://www.fastcompany.com/magazine/06/writestuff.html) is a good explanation. (However, the "on-board shuttle group" is actually called the "on-board systems group").
It's like this: A clock rollover (such as at midnight or the last day of the month or year) always sets something back to zero. That resetting is a risk: Is there something somewhere that doesn't take the rollover into account? It may be an obvious bug, or not so obvious - what if the problem is dynamic? For example, what if system A sends some data and rolls over, and system B rolls over and receives the data? Then it looks like stale data, but isn't. How do you test for dynamic conditions like this?
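Here is a toy version of that "looks stale, but isn't" case, sketched in Python. Everything in it is hypothetical (the function names, the one-day freshness window, the 365-day year); it only illustrates the kind of check that breaks when the day number resets.

```python
def naive_is_fresh(stamp_day, now_day, max_age_days=1):
    """Freshness check that silently assumes day numbers never go backwards."""
    age = now_day - stamp_day
    return 0 <= age <= max_age_days

def rollover_aware_is_fresh(stamp_day, now_day, days_in_year=365, max_age_days=1):
    """Same check, but the age is computed modulo the length of the year."""
    age = (now_day - stamp_day) % days_in_year
    return age <= max_age_days

# Data stamped on day 365 arrives after the receiver has rolled over to day 1.
print(naive_is_fresh(365, 1))           # False: good data is rejected
print(rollover_aware_is_fresh(365, 1))  # True
```

Note that the modular version has its own blind spot: data that is exactly a year old looks brand new. Proving that neither case can ever bite is the expensive part.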
Dodging this bullet is far, far cheaper than testing for it.
The only time I know of that a shuttle flight software bug affected a flight was uh...STS 2 or 3 or thereabouts. The shuttle often flies an updated load on one or two of its computers before the load is installed on all of them. On this mission, a new load on one GPC dumped (crashed) at T -9 seconds or so, causing everything to shut down automatically. The shuttle launched a day or two later, after the new load was rolled back.
Funny thing was, the same bug had occurred in the training simulators before launch, but was written off as a lack of fidelity of the simulator itself, not a bug in the flight software.
After that, the astronauts really began to appreciate running the real GPCs with the real flight software in the simulators.
PS: Although I work at NASA, this message is my own expression, and not that of NASA or my employer. I am a programmer only, not anyone with any kind of authority or insight except for my experiences here.
Pavlov wouldn't be so famous if he'd used a can opener instead of a bell.
So, what if, oh, say, the CO2 scrubbers need to work differently depending on how many days the mission has been running? So, they keep track of the first day number and the current day number. The amount of CO2 scrubbing is then varied based on elapsed days.
^^and here's the key -- it's something you don't know about^^
Now, you make your little 5-second fix, and send seven astronauts into space.
New Year's Eve rolls around, and suddenly the mission started on day 360 and it's now day 1. Holy crap, says the scrubber, we have to scrub as though it's a 359-day mission, instead of a lousy 6.
Scrubbers go into overtime, and break. (Or, scrubber math is done in eight bits, and they think the shuttle's still on the ground and not ready to launch for another ~100 days due to integer roll-over. Or any other set of unforeseen possibilities.)
Next, astronauts die of CO2 poisoning because the scrubber subsystem has been compromised.
Great fix, mister five-second-coder.
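The arithmetic in that made-up scrubber scenario can be checked in a few lines of Python. The function names are invented for the example, and the 8-bit case assumes signed two's-complement math; none of this reflects real shuttle code.

```python
def elapsed_days(start_day, current_day):
    """Elapsed mission days, assuming the day counter only ever increases."""
    return current_day - start_day

def to_int8(value):
    """Wrap an integer into signed 8-bit two's-complement range (-128..127)."""
    return (value + 128) % 256 - 128

# Monotonic counter (no year-end reset): launch on day 360, now "day 366".
print(elapsed_days(360, 366))          # 6

# After the "five-second fix" the counter resets to day 1 at New Year.
print(elapsed_days(360, 1))            # -359
print(to_int8(elapsed_days(360, 1)))   # -103
```

The wrapped result of -103 is where the "not ready to launch for another ~100 days" reading comes from.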
Do daemons dream of electric sleep()?