Slashdot Mirror


Computer Date Glitch May Limit Next Shuttle Launch

n3hat writes "Reuters reports that the next Space Shuttle mission may have to be deferred if it gets too close to the New Year because the onboard computers do not handle the changing of the date in the same way as the ground computers. From the article: '"The shuttle computers were never envisioned to fly through a year-end changeover," space shuttle program manager Wayne Hale told a briefing. The problem, according to Hale, is that the shuttle's computers do not reset to day one, as ground-based systems that support shuttle navigation do. Instead, after December 31, the 365th day of the year, shuttle computers figure January 1 is just day 366.'"

21 of 354 comments

  1. Well.... by SuperBanana · · Score: 3, Insightful
    ...I guess it -is- rocket science.

    *ducks and runs for cover*

    Seriously though- they never "envisioned" a mission occurring over the end-of-year? Let me guess: a defense (space) contractor designed the systems.

  2. Re:wtf? by schnikies79 · · Score: 4, Insightful

    your idea of rock solid and their idea are a little different. they have probably one of the most bug-free pieces of software in existence. it's tailored to do what it needs to do, nothing more, nothing less and it does it perfectly.

    --
    Gone!
  3. Re:wtf? by Anonymous Coward · · Score: 4, Insightful

    Bug free except for the rollover to a new year...

  4. Re:wtf? by Schraegstrichpunkt · · Score: 4, Insightful
    Is there a reason these aren't built on standard parts and operating systems?

    Standard parts don't like being bombarded with radiation. Standard operating systems aren't fault-tolerant.

  5. Re:Uhm...and? by plaxion · · Score: 3, Insightful

    It probably has to do with the mismatch between systems, not a lack of the engineers' or astronauts' ability to count on their piggies and toes. Their current configuration doesn't have a middleware layer that accounts for any possible differences. In other words, while the shuttle continues on thinking it's the 366th day, the ground control systems might get confused (e.g. "Hey, there's no such thing as a 366th day") and their programs may crash (no pun intended) as a result.
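A minimal sketch of the kind of middleware layer described above, assuming the shuttle side reports a launch year plus a day counter that keeps running past December 31. The function names and the idea of passing a launch year are illustrative assumptions, not anything taken from the actual flight or ground software.

```python
def is_leap_year(year: int) -> bool:
    """Gregorian leap-year rule."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


def days_in_year(year: int) -> int:
    return 366 if is_leap_year(year) else 365


def normalize_day(launch_year: int, running_day: int):
    """Turn a day counter that never resets into (year, day_of_year).

    running_day is 1-based and counted from January 1 of launch_year
    (hypothetical convention chosen for this sketch).
    """
    if running_day < 1:
        raise ValueError("running_day must be >= 1")
    year, day = launch_year, running_day
    # Peel off whole years until the day fits inside the current year.
    while day > days_in_year(year):
        day -= days_in_year(year)
        year += 1
    return year, day
```

With 2006 as an example non-leap year, `normalize_day(2006, 366)` gives `(2007, 1)`, which is the value a ground system that resets at year end would expect.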

  6. Re:wtf? by lnjasdpppun · · Score: 2, Insightful

    And leap years? And anything else neither you nor I have thought of?

    I suspect they don't do a 5-second fix up because it's a space shuttle and they do far more testing and documentation for their code than any other project in existence.

  7. Re:wtf? by Xipher · · Score: 2, Insightful

    On top of that, it's a realtime system: none of this "get it done when I want to"; it's "get it done by this deadline, or people DIE!"

    --
    I don't know everything.
  8. Re:wtf? by Anonymous Coward · · Score: 1, Insightful

    i would much rather nasa use outdated yet 99% bug free tech than current buggy tech....there is little room for error in space travel.....as we all know (rip)

  9. Could it be... dates are hard? by tlhIngan · · Score: 3, Insightful

    Could it simply be that the date is a hard concept? You've got months with uneven number of days in them, including one month that can have an extra day added to it based on a somewhat complex concept (every 4 years, except if it's divisible by 100, UNLESS that year also happens to be divisible by 400). Calculating how many days there are between now and some future date, without using magic numbers? Heck, even software in the 90's couldn't get it right that there was a Feb 29, 2000.
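The leap-year rule described above is short to write down, which is part of why it is easy to get subtly wrong. A small sketch, with function names chosen here for illustration, assuming Gregorian dates and a start year no later than the end year:

```python
def is_leap_year(year: int) -> bool:
    """Every 4th year, except century years, unless divisible by 400."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


def days_in_year(year: int) -> int:
    return 366 if is_leap_year(year) else 365


def days_between(start_year: int, start_day: int,
                 end_year: int, end_day: int) -> int:
    """Days from one (year, day_of_year) to another; days are 1-based.

    Assumes start_year <= end_year.
    """
    # Sum the full length of every year crossed, then adjust by the offsets.
    whole_years = sum(days_in_year(y) for y in range(start_year, end_year))
    return whole_years + end_day - start_day
```

Here 2000 counts as a leap year and 1900 does not, and the only constants left are the ones in the rule itself.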

    Every date math equation I've seen has all sorts of weird magic numbers in them where it isn't clear how those numbers were obtained. This may work just fine in day to day computations, but oddball bugs in date calculations can lead to some very weird errors. Look at the C library sometime for the date functions. It's quite impressive.

    Perhaps when the shuttles were designed, the inability to schedule across the new year was acceptable to avoid introducing odd bugs in the program to keep the software provably correct. Ground systems, which can be repaired in the middle of a mission easily, can be a little less bug-free, since a miscalculation won't cause the Earth to suddenly veer off course.

  10. Re:Just goes to show by Alizarin+Erythrosin · · Score: 2, Insightful
    That's barely ten lines of code no matter what language you want it in...

    Do you even remember why the Y2K thing happened? People saved space back in the day by using a 2 digit year. Hell, in the 1970's, people were using a one digit year to save even more memory and storage space. The Space Shuttle uses very old technology for its computer systems (read: 1970's level technology), and doesn't have much memory. That extra 10 lines of code could make it oversized.

    Additionally, making a change to space (or even military) software requires a shitload of paperwork and testing. It's a wild guess, but it would probably take almost a year to get that "ten lines of code" into the Shuttle and cost more money than it's worth to just not have the birds up in space at an end-of-year scenario.
    --
    There are only 10 kinds of people in this world... those who understand binary and those who don't
  11. Re:wtf? by Agelmar · · Score: 2, Insightful

    Actually, if you want to be correct, it was built for the Government. There's a difference - rather than building a piece of crap using underpaid (government) labor, we paid top dollar so that it could get subcontracted out multiple levels, while still winding up with the same crap.

  12. Hold up, everybody by Alizarin+Erythrosin · · Score: 4, Insightful

    I work with military navigation software, and that is sorta remotely applicable to this. Here are my thoughts:

    You people with your "WTF NASA SUXORS THIS IS EASY FIX!!!11!!1!one!!" need to stop and think for a second. This is a space application that carries HUMAN BEINGS! Think about how hard it will be to get this "easy fix" qualified, proven, documented, etc. It's not an easy task. A formal qualification test on the systems I work on (military land- and air-, but not space-based navigation software) can take months, and require all sorts of tests and documentation. Anything that isn't formally tested (i.e. run in a van, on a plane, etc) must be shown to not fail in any way; all exceptions handled, no bad data can cause an undesirable state, etc. I would hate to see the type of scrutiny that the Shuttle software goes through (although I could probably call somebody in our Space division across the street and find out).

    Second, I don't know exact specifics, but based on the information provided, I think this "glitch" will have to do with the date/time difference between ground stations and the Shuttle computers. Things like message time stamping between the Earth and the Shuttle, etc, will be wrong, and things could be garbled or just dropped altogether. The navigation systems themselves should not be terribly impacted since the date will just roll to the next day. Inertial instrument samples will continue to flow in and be correctly time stamped, be it the 366th or 400th or 500th day.

    --
    There are only 10 kinds of people in this world... those who understand binary and those who don't
  13. Re:lame by Anonymous Coward · · Score: 2, Insightful

    Microsoft can't get a lot of things right. On the other hand, I sure hope Microsoft's software isn't anywhere near a space shuttle, that would be a disaster waiting to happen.

  14. At least they know it's a problem... by jesterzog · · Score: 2, Insightful

    ...which is more than many software development processes would reveal. Chances are that this known restriction is on a check-list which every shuttle mission has to be checked against, and the list would exist precisely because the software development and verification process is so solid and conservative.

    As a professional software developer, I have heard on countless occasions about how the Space Shuttle software development process is so incredible, and how all other developers should try to live up to their high standards.

    Opinions vary, but I don't think I'd ever recommend working to the same standards, unless the customer actually had good reason to require it. (NASA does.) Even aside from your own code, doing it properly would require an extensive understanding of any and every third party library and system the code interacts with, which could add orders of magnitude to the development time and cost, even if it's open source and open hardware. I don't like hacks and yucky untested code any more than most people, but at some point it can just make sense to avoid extensive and pedantic formal development processes in favour of just getting it to work.

    A lot of development processes (perhaps most) wouldn't have stopped the shuttle launching, even if this were reported as a bug. Chances are that it'd be forgotten about (if not fixed straight away), and someone would stumble on it again accidentally. Many bugs aren't even reported until someone's stumbled on them at least once. This is fine in most situations. Once it becomes a problem again, you can go and look it up, quickly find out everything that's known about it from before, apply any known workarounds, and spend time to fix it if necessary. The point, though, is that many systems wouldn't be sure to keep you informed about the restriction in a way that actively prevents someone stumbling on it later.

    I still agree that it seems a little strange that this problem wasn't fixed ages ago. Realistically, though, the Shuttle was never expected to fly this long. It sounds a lot like a compromise that was made in the earlier days when computers were more limited, probably even more so for the restricted range of systems that are certified to work under such conditions. Any update is likely to be very expensive and time consuming, simply because the software development and verification process is so solid and reliable.

    I would say that requiring a reboot every year on December 31 is a pretty huge error. In this case, it is forcing NASA to launch earlier than they otherwise would wish.

    From the article you quoted, it sounds more like they dropped a spacewalk (for Hubble maintenance, probably not safety-critical) so they could return sooner and avoid encountering the bug. To me it sounds like they did what they should have done, with safety as a priority.

    Launching spacecraft is an industry in which the stakeholders usually prepare for possible or likely delays. NASA has to delay launches all the time for all sorts of reasons. I'm not sure why a possible software problem would be treated any differently. If the problem is with the managers dangerously forcing early launches, NASA should really be fixing their managers as a priority over fixing a known bug with a known workaround. Weighing it out, it's probably a lot cheaper, easier and safer to simply delay the occasional launch for a few more days, especially given that the Shuttle's remaining days are limited. Why risk the safety of future launches by making changes that will soon become obsolete?

    Anyway, those are just my own thoughts. I don't work in software development where the process is quite so strict, but I'm sure they know what they're doing when they don't fix something like this.

  15. That's pronounced "design decision" by achurch · · Score: 3, Insightful

    I would say that requiring a reboot every year on December 31 is a pretty huge error.

    I wouldn't. When you're designing something like Shuttle software that has to work absolutely flawlessly 100% of the time, you don't put in any frills. And on something that is only ever in space for 10-15 consecutive days at most, year-end handling is most certainly a frill. (If you are a professional software developer, it ought to be obvious just how many things could break by adding a feature like that. If the original design calls for a monotonically increasing day number, for example, there's very likely to be some code that relies on that, so you have to go through the entire system, checking everything that even touches the day counter to ensure it can handle a reset from 365 to 1--and then check everything that uses those routines, and so on and so on.)
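To make the "code that relies on a monotonically increasing day number" point concrete, here is a hypothetical filter that keeps only samples whose day stamp is not older than the last one accepted. It is an invented illustration of the failure mode, not a description of any real Shuttle routine.

```python
def accepted_samples(day_stamps):
    """Keep samples whose day number never goes backwards.

    This silently assumes the day counter only ever increases.
    """
    kept = []
    last = None
    for day in day_stamps:
        if last is None or day >= last:
            kept.append(day)
            last = day
        # Anything with an earlier day number is treated as out of order
        # and dropped without complaint.
    return kept
```

With a counter that runs on (363, 364, 365, 366, 367) every sample is kept. If the counter resets (363, 364, 365, 1, 2), everything after the reset is discarded, and nothing in this routine would flag it.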

    I suspect this is routine to NASA, and the reporter just blew it out of proportion. After all, Windows can handle end-of-year rollover, so if the Shuttle can't then it's broken, right?

  16. Re:wtf? by Splab · · Score: 5, Insightful

    You know, people like you give programmers a bad rep. You just dive in for the fix without knowing the cause - and on top of that add a few bugs that are even harder to iron out if you happen to be the only person knowing that code segment.

  17. Re:How Many Times? by mindwhip · · Score: 2, Insightful

    The real question should be why wasn't the (presumably) newer hardware on the ground specified to be compatible with the legacy hardware (i.e. the shuttle)....

    --
    [The Universe] has gone offline.
  18. A [sorta] Rocket Scientist Replies... by JetScootr · · Score: 3, Insightful

    TFA carefully does NOT say that anything actually will fail, but that something might fail. Thank you, Fallon: your link (http://www.fastcompany.com/magazine/06/writestuff.html) is a good explanation. (However, the "on-board shuttle group" is actually called the "on-board systems group").
    It's like this: A clock rollover (such as at midnight or the last day of the month or year) always sets something back to zero. That resetting is a risk: Is there something somewhere that doesn't take the rollover into account? It may be an obvious bug, or not so obvious - what if the problem is dynamic? For example, what if system A sends some data and rolls over, and system B rolls over and receives the data? Then it looks like stale data, but isn't. How do you test for dynamic conditions like this?
    Dodging this bullet is far, far cheaper than testing for it.
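The sender/receiver case above can be sketched as a freshness check on day-of-year stamps. Both functions and the one-day age limit are assumptions made up for this illustration; the year length is passed in rather than derived.

```python
def is_fresh_naive(msg_day: int, now_day: int, max_age_days: int = 1) -> bool:
    """Fresh if the message is between 0 and max_age_days old."""
    age = now_day - msg_day
    return 0 <= age <= max_age_days


def is_fresh_rollover_aware(msg_day: int, now_day: int,
                            days_in_year: int = 365,
                            max_age_days: int = 1) -> bool:
    """Same check, but with the age computed modulo the year length."""
    age = (now_day - msg_day) % days_in_year
    return age <= max_age_days
```

A message stamped on day 365 and received on day 1 is rejected by the naive check (its computed age is -364) and accepted by the rollover-aware one (age 1). The modular version has its own blind spot, since it cannot tell a one-day-old message from one that is a year and a day old, which is the sort of dynamic condition that is hard to test exhaustively.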
    The only time I know of that a shuttle flight software bug affected a flight was uh...STS 2 or 3 or thereabouts. The shuttle often flies an updated load on one or two of its computers before the load is installed on all of them. On this mission, a new load on one GPC dumped (crashed) at T -9 seconds or so, causing everything to shut down automatically. The shuttle launched a day or two later, after the new load was rolled back.
    Funny thing was, the same bug had occurred in the training simulators before launch, but was written off as a lack of fidelity of the simulator itself, not a bug in the flight software.
    After that, the astronauts really began to appreciate running the real GPCs with the real flight software in the simulators.
    PS: Although I work at NASA, this message is my own expression, and not that of NASA or my employer. I am a programmer only, not anyone with any kind of authority or insight except for my experiences here.

    --
    Pavlov wouldn't be so famous if he'd used a can opener instead of a bell.
  19. You're clearly not a very good SA by multipartmixed · · Score: 4, Insightful

    So, what if, oh, say, the CO2 scrubbers need to work differently depending on how many days the mission has been run. So, they keep track of the first day number, and the current day number. The amount of CO2 scrubbing then is varied based on elapsed days.

    ^^and here's the key -- it's something you don't know about^^

    Now, you make your little 5-second fix, and send seven astronauts into space.

    New Year's Eve rolls around, and suddenly the mission started on day 360 and it's now day 1. Holy crap, says the scrubber, we have to scrub as though it's a 359-day mission, instead of a lousy 6.

    Scrubbers go into overtime, and break. (Or, scrubber math is done in eight bits, and they think the shuttle's still on the ground and not ready to launch for another ~100 days due to integer roll-over. Or any other set of unforeseen possibilities.)

    Next, astronauts die of CO2 poisoning because the scrubber subsystem has been compromised.

    Great fix, mister five-second-coder.
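The arithmetic in this hypothetical scrubber scenario can be checked in a few lines. The helper names are invented for the sketch, and the eight-bit case assumes signed two's-complement wraparound.

```python
def elapsed_days(start_day: int, current_day: int) -> int:
    """Mission days elapsed, assuming the day counter never resets."""
    return current_day - start_day


def to_int8(value: int) -> int:
    """Wrap an integer into the signed 8-bit range (-128..127)."""
    value &= 0xFF
    return value - 256 if value >= 128 else value


# Counter keeps running past year end: launch on day 360, now day 366.
running = elapsed_days(360, 366)

# Counter reset to 1 at New Year: same mission, very different answer.
after_reset = elapsed_days(360, 1)

# The same subtraction if the result is squeezed into eight signed bits.
after_reset_8bit = to_int8(after_reset)
```

The running counter gives 6 elapsed days. After a reset the difference is -359, the 359-day figure from the story once the sign is ignored, and in eight signed bits it wraps to -103, which lines up with the "~100 days before launch" reading.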

    --

    Do daemons dream of electric sleep()?
  20. Re:How Many Times? by Amouth · · Score: 2, Insightful

    I have the feeling that the shuttle is far more than the 5 9's .. if you think about it.. if the airline industry followed 5 9's they would have 1-2 crashes every day.. and that would be within range..
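The "five nines" estimate is simple arithmetic once a flight volume is assumed. The daily flight counts used below are round illustrative figures picked for this sketch, not sourced statistics.

```python
def expected_failures_per_day(flights_per_day: int, nines: int) -> float:
    """Expected failures per day at a reliability of `nines` nines.

    Five nines means 99.999% success, i.e. 1 failure per 100,000 flights.
    """
    flights_per_failure = 10 ** nines
    return flights_per_day / flights_per_failure


# Hypothetical daily flight volumes, for illustration only.
low_estimate = expected_failures_per_day(100_000, 5)
high_estimate = expected_failures_per_day(150_000, 5)
```

Under those assumed volumes the result is between 1.0 and 1.5 failures per day, the same order of magnitude as the "1-2 crashes every day" figure.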

    --
    '...if only "Jumping to a Conclusion" was an event in the Olympics.'
  21. GP is right. NASA programmers screwed up. by Vellmont · · Score: 2, Insightful

    What you've failed to realize is that the flaw isn't so much that they decided to not do a rollover, it's that the ground computers do a rollover, and the shuttle computers don't.

    --
    AccountKiller