Slashdot Mirror


Long Uptime Makes Boeing 787 Lose Electrical Power

jones_supa writes: A dangerous software glitch has been found in the Boeing 787 Dreamliner. If the plane is left turned on for 248 days, it will enter a failsafe mode that will lead to the plane losing all of its power, according to a new directive from the US Federal Aviation Administration. If the bug is triggered, all the Generator Control Units will shut off, leaving the plane without power, and the control of the plane will be lost. Boeing is working on a software upgrade that will address the problem, the FAA says. The company is said to have found the problem during laboratory testing of the plane, and thankfully there are no reports of it being triggered in the field.

178 of 250 comments

  1. Oh come on. by fisted · · Score: 1

    You should see what happens after -2147483648 days of upt-- oh wait.

    1. Re:Oh come on. by IndigoZulu · · Score: 5, Interesting

      It could be the overflow of a counter of 10ms intervals. There are 86400 seconds per day, so 8640000 10ms intervals per day ... 2147483648 / 8640000 = 248.55
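
The arithmetic in this comment can be checked with a short sketch. It assumes, as the comment does, a counter that starts at zero and ticks 100 times per second; the function name `days_until_overflow` is made up for illustration, and nothing here is confirmed as the actual 787 implementation.

```python
SECONDS_PER_DAY = 86_400

def days_until_overflow(bits: int, signed: bool, ticks_per_second: int = 100) -> float:
    """Days a counter starting at zero can run before it exceeds its maximum value."""
    # A signed N-bit counter holds values up to 2**(N-1) - 1, so tick number
    # 2**(N-1) is the one that overflows; an unsigned counter lasts until tick 2**N.
    overflow_tick = 2 ** (bits - 1) if signed else 2 ** bits
    return overflow_tick / (ticks_per_second * SECONDS_PER_DAY)

print(round(days_until_overflow(32, signed=True), 2))   # signed 32-bit, 10 ms ticks
print(round(days_until_overflow(32, signed=False), 2))  # unsigned 32-bit, 10 ms ticks
```

Under those assumptions the signed case comes out at about 248.55 days and the unsigned case at about 497.10 days.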

    2. Re:Oh come on. by Mirar · · Score: 2

      Oh, so they can make it fine for 497.10 days by changing the type to unsigned!

    3. Re:Oh come on. by SJHillman · · Score: 3, Informative

      Which is apparently what Windows does:

      https://www.ctm-it.com/it-supp...

      You'd think they would have learned since Windows 95/98 did the same thing.

      https://support.microsoft.com/...

      But hey, at least it goes 10 times as long now.

    4. Re:Oh come on. by Anonymous Coward · · Score: 1

      The least risk change might be to have the system refuse to consider a takeoff when you're above half of that limit and then require a power cycle at that point, while on the ground. If you change to say I64, then everything that interacts with that time must be updated and validated. That may of course be the long term solution, though it would seem slightly riskier than just forcing the repower while the plane is on the ground. There could be other valid reasons to repower it as well and again that may be done in a maintenance check anyway.

    5. Re:Oh come on. by jones_supa · · Score: 2

      I am not completely familiar with the matter, but I remember hearing that using signed types in some situations can be a better choice, even when the value would normally be used to represent only a non-negative value. It could make overflows more obvious and calculating deltas might be easier? If someone actually knows about this stuff, feel free to chime in.

    6. Re:Oh come on. by Anonymous Coward · · Score: 1

      Still pretty scary that a simple counter like that can cause a chain of events that chucks off the power completely. How can this be possible?

    7. Re:Oh come on. by fisted · · Score: 3, Informative

      In C, overflowing a signed integer type is undefined behaviour; unsigned types wrap around to zero in a defined manner.
      Of course, either is often undesired, but the latter at least doesn't allow basically anything to happen.
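
Python integers never overflow, so the sketch below can only emulate the defined half of that statement: unsigned arithmetic in C is performed modulo 2**N. The helper name `u32_add` is hypothetical. The signed case is left out on purpose, because undefined behaviour means the C standard places no requirement on the result at all.

```python
U32_MASK = 0xFFFFFFFF  # 2**32 - 1, the largest 32-bit unsigned value

def u32_add(a: int, b: int) -> int:
    """Add two values the way a 32-bit unsigned type does in C: modulo 2**32."""
    return (a + b) & U32_MASK

print(u32_add(U32_MASK, 1))  # one past the maximum wraps to 0
```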

    8. Re:Oh come on. by Anonymous Coward · · Score: 1

      Which makes you wonder whether there's a maintenance check that would force a power cycle before you get to the errant condition. So it's the functional equivalent of dividing by zero shutting the power off when there's a denominator check on the line above.

    9. Re:Oh come on. by plopez · · Score: 2

      It doesn't matter what country programmers come from, in my experience too many programmers have no clue about reality outside of their cube. They are building software for things they do not understand. I am going to rant about this in another thread so I will leave it at that for now.

      --
      putting the 'B' in LGBTQ+
    10. Re:Oh come on. by fuzzyfuzzyfungus · · Score: 2

      Man, if only we could afford to use 64 bit values for things. I realize that transistors are simply too expensive right now; but perhaps, in the future, the miracles of science will make this possible...

    11. Re:Oh come on. by dunkelfalke · · Score: 4, Funny

      And this is why C should never be used for mission critical software.

      --
      "It's such a fine line between stupid and clever" -- David St. Hubbins, Spinal Tap
    12. Re:Oh come on. by terrab0t · · Score: 1

      That was my guess. As soon as I read the problem I thought of the bug in the Patriot missile software that the US ran into during the first Gulf War back in 1991.

      In that case it was even worse. From the page I linked:

      They told the Army that the Patriots suffered a 20% targeting inaccuracy after continuous operation for 8 hours.

    13. Re:Oh come on. by catchblue22 · · Score: 1

      Still pretty scary that a simple counter like that can cause a chain of events that chucks off the power completely. How can this be possible?

      Yeah. Imagine if it happened on final approach.

      --
      This and no other is the root from which a tyrant springs; when first he appears as a protector - Plato (423 to 327 BC)
    14. Re: Oh come on. by RightwingNutjob · · Score: 1

      Not like Ada, where you can fuck up much easier by choosing the wrong floating point type for your altitude indicator.

    15. Re:Oh come on. by delt0r · · Score: 1

      In both C and C++ just about everything in the spec has the words "undefined behaviour".

      --
      If information wants to be free, why does my internet connection cost so much?
    16. Re:Oh come on. by stooo · · Score: 1

      >> ... How can this be possible?

      It's called common mode failure. You have multiple identical redundant computers running the same software. All of them have the same bugs. Boom.
      an example here : http://www.around.com/ariane.h...

      --
      aaaaaaa
    17. Re:Oh come on. by CauseBy · · Score: 1

      I didn't know that. Is that considered a feature or a bug? Why not just define it to wrap around to the min int value?

    18. Re:Oh come on. by fisted · · Score: 1

      I'm not sure if one could think of it in terms of 'feature' or 'bug', but if anything, i'd go for feature. There are at least 3 major ways to represent negative numbers (1s complement, 2s complement, sign-magnitude), and overflowing the representation of INT_MAX doesn't necessarily give anything close to INT_MIN (e.g. sign-magnitude would roll over to negative zero).

      So in order to be as widely adaptable as possible, C can't assume a particular way of how negative numbers are represented, so defining the consequences of overflowing is intentionally omitted
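
As a rough illustration of why no single answer fits all three encodings, the sketch below decodes the same bit pattern under each of them. It assumes a 32-bit word and assumes that incrementing INT_MAX simply carries into the sign bit (pattern 0x80000000); real hardware could also trap instead. The function name `decode` is invented for this example, and results are returned as strings so that negative zero stays visible.

```python
def decode(bits: int, width: int, scheme: str) -> str:
    """Interpret a `width`-bit pattern as a signed integer under the named scheme."""
    sign = bits >> (width - 1)
    low = bits & ((1 << (width - 1)) - 1)  # every bit except the sign bit
    if sign == 0:
        return str(low)                    # non-negative values agree in all schemes
    if scheme == "twos_complement":
        return str(bits - (1 << width))
    if scheme == "ones_complement":
        value = bits - ((1 << width) - 1)
        return "-0" if value == 0 else str(value)
    if scheme == "sign_magnitude":
        return "-" + str(low)
    raise ValueError(f"unknown scheme: {scheme}")

for scheme in ("twos_complement", "ones_complement", "sign_magnitude"):
    print(scheme, decode(0x80000000, 32, scheme))
```

The same pattern reads as -2147483648, -2147483647, or -0 depending on the encoding, which matches the sign-magnitude case described above.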

    19. Re:Oh come on. by fisted · · Score: 1

      That being said, I think it wouldn't hurt making it implementation-defined behaviour instead (which isn't much better from a portability point of view, but at least requires implementations to document their choice)

    20. Re:Oh come on. by shutdown+-p+now · · Score: 1

      The main reason why people recommend it is because of what happens if you mix signed and unsigned. If they are of the same size (e.g. signed int and unsigned int), then according to the spec, the result will be unsigned. So you divide, say, -2 by 1u, and get something very unexpected. If you always use the same signedness, then you can dodge this problem, and in general you do want to represent negative numbers every now and then, hence the default is signed.

      In practice it doesn't work so well simply because so much of the language and the standard library uses unsigned anyway. For example, sizeof is unsigned, and so is strlen(), and in C++, size() on all the standard container types, including string. So if you want to write C or C++, you have to deal with signed/unsigned mismatch anyway.
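
The "-2 by 1u" surprise can be emulated outside of C. This is only a sketch: it assumes a 32-bit `int`, and the helper names `to_unsigned` and `mixed_divide` are hypothetical. It mimics the rule that the signed operand is converted to unsigned (reduced modulo 2**32) before the division happens.

```python
UINT_BITS = 32  # assumption: int and unsigned int are 32 bits wide

def to_unsigned(value: int, bits: int = UINT_BITS) -> int:
    """Mimic C's signed-to-unsigned conversion: reduce the value modulo 2**bits."""
    return value % (1 << bits)

def mixed_divide(signed_lhs: int, unsigned_rhs: int) -> int:
    """Emulate `signed_lhs / unsigned_rhs` in C when both operands are int-sized."""
    return to_unsigned(signed_lhs) // unsigned_rhs

print(mixed_divide(-2, 1))  # 4294967294 rather than -2
```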

    21. Re:Oh come on. by david_thornley · · Score: 1

      You have to understand the history. C was designed as a machine-independent system implementation language, which meant that it had to have as good performance as possible for commonly used things like integers, and it ran on a much wider range of processors than you'd expect to run into nowadays. The processors could have ones' complement, two's complement, or signed magnitude for negative integer values. They could be designed to halt execution and raise some sort of signal on integer overflow, or designed to ignore it. Machine-addressable units of memory could range from one bit to 60 bits. Given the variety in what processors would do, any specific behavior would kill performance for processors that didn't match the behavior, so they left it as "undefined".

      There were other sorts of incompletely specified behaviors. "Implementation-defined" usually referred to fairly minor differences, such as how long an "int" was. Unspecified behavior usually referred to cases where there would be a few obvious choices, such as order of evaluation of function parameters. Whether or not it was a good idea, C generally labeled more complex potential incompatibilities as undefined. Personally, I'd like to see less "undefined behavior", substituting "implementation-defined" or "unspecified" as much as possible.

      --
      "When you have eliminated the unacceptable, whatever is left, however improbable, must be the truthiness" - Holmes
  2. Have you tried turning it off and on again? by Anonymous Coward · · Score: 5, Funny

    Finally!

    IT support advice that's useful!

    1. Re:Have you tried turning it off and on again? by rjniland · · Score: 4, Interesting

      Yes, but perform a clean systems shut down BEFORE turning off power.

      I was on an airliner once that crashed at the gate, prior to departure.

      Ground power was disconnected before they had spun up the APU. Lights out. Lights on. ... Several minutes later we get an announcement that we'd have to wait for a backup plane, which took 45 minutes to arrange.

      They were unable to reboot the airliner.
      Robust systems design wasn't a phrase that came to mind.

  3. This is Boeing Tech Support by mikeabbott420 · · Score: 4, Funny

    "have you tried turning it off and then back on?"

    --
    This program was made possible by a grant from the Ultra-Humanite, and viewers like you.
    1. Re:This is Boeing Tech Support by sphealey · · Score: 1

      The first time I was on a new plane where the pilots did that at the gate to "fix a computer glitch" (~1998) I was utterly terrified.

      sPh

    2. Re:This is Boeing Tech Support by fahrbot-bot · · Score: 1

      "have you tried turning it off and then back on?"

      • Customer: How do I do that?
      • Tech Support: Use the big red switch at the back of the fuselage, just under the elevator. Flip it to 0/Off, count to 10 and flip it back to 1/On.

      True story: Back in the early 1980s, I actually had a long-distance phone call with someone in which I was the "tech support" part of the above conversation. ... Me: "Are you sitting in front of the PC? Lean to your right... See that big red switch at the back of the case? ..."

      --
      It must have been something you assimilated. . . .
    3. Re:This is Boeing Tech Support by minstrelmike · · Score: 1

      How about CNTL-ALT-DEL?

      Yup. Reboot the plane every time it's at the gate.

    4. Re:This is Boeing Tech Support by jcdr · · Score: 1

      I remember that in 1996 the pilot of an FBW aircraft said that one computer had displayed an error, that they had restarted the computer and the error was not displayed again, so all was nominal and we could go. He didn't detail the error displayed. Maybe it was minor, maybe not. The 6-hour flight went without any problem.

    5. Re:This is Boeing Tech Support by plopez · · Score: 1

      "Thank you for calling Boeing tech support. Did you know most of your questions can be answered online in our FAQ section? Simply go to www.boeingcares.com/customercare/support/FAQ. If this is a ground problem press 1, if it is a maintenance question please press 2, if this is about the galley hotbox recall press 3, for in-flight problems press 4"

      *beep*
      "You have selected in-flight problem. For engine fires press 1, for structural failure please press 2, for fuel system faults please check 3, for all other in-flight challenges please press 4."

      *beep*
      "You have selected other. Your call is very important to us; you will be transferred to the next available customer care representative. Did you know most of your questions can be answered online in our FAQ section? Simply go to www.boeingcares.com/customercare/support/FAQ. Due to unusually high call volumes your expected hold time is 15 minutes. Please hold..."

      --
      putting the 'B' in LGBTQ+
    6. Re:This is Boeing Tech Support by fuzzyfuzzyfungus · · Score: 3, Funny

      NTSB investigators reported the cause of the crash as 'Controlled reboot into terrain'.

    7. Re:This is Boeing Tech Support by daveime · · Score: 1

      Cannot find CNTL key, please suggest alternative.

  4. Very unlikely to be triggered in the field by Brandano · · Score: 2, Informative

    A commercial plane will most probably go through several maintenance events and checks during that sort of time frame, where cycling the power is part of the procedure.

    1. Re:Very unlikely to be triggered in the field by hawguy · · Score: 4, Insightful

      A commercial plane will most probably go through several maintenance events and checks during that sort of time frame, where cycling the power is part of the procedure.

      It's very reassuring to know that it probably won't happen.

    2. Re:Very unlikely to be triggered in the field by antiperimetaparalogo · · Score: 1

      A commercial plane will most probably go through several maintenance events and checks during that sort of time frame, where cycling the power is part of the procedure.

      Yes, but when you have people taking pride for their desktop's uptime... well, better safe than sorry!

      --
      Antisthenes: "Wisdom begins by examining the words/names." - excuse my English, i am (slightly...) better with my Greek!
    3. Re:Very unlikely to be triggered in the field by compro01 · · Score: 2

      You will probably not be struck by lightning, but I can't guarantee that it won't happen.

      Actually, when talking about airliners, getting struck by lightning is a fairly common occurrence. A typical airliner experiences a lightning strike about once a year.

      --
      upon the advice of my lawyer, i have no sig at this time
    4. Re:Very unlikely to be triggered in the field by confused+one · · Score: 5, Interesting

      If it ever happened on a plane, then it means that the maintenance was intentionally skipped. If they reach 248 days of continuous operation then a number of significant maintenance cycles have been skipped (some 23-25 inspection / maintenance cycles that generally require shutting down the electrical system). The generators in question are attached to the engines. The engines have an overhaul schedule that is shorter than 248 days of continuous operation. If they managed to reach this point, then the major maintenance cycles have been skipped and the engines are long overdue for a tear down inspection and overhaul. Any plane that could reach this point, 248 days of continuous operation while missing all of the required maintenance, is not a plane (or an airline, for that matter) that anyone should be flying on.

    5. Re:Very unlikely to be triggered in the field by Mirar · · Score: 1

      Waiting 248 days on the tarmac before flight... Improbable. I hope.

    6. Re:Very unlikely to be triggered in the field by kthreadd · · Score: 2

      If it ever happened on a plane, then it means that the maintenance was intentionally skipped.

      And that would of course never happen.

    7. Re:Very unlikely to be triggered in the field by Anonymous Coward · · Score: 1

      no he means intentionally skipped 25 times.

      This is going from "Airline that has some dodgy fucking maintenance crew" to "Airline that just fired the ground support staff" and you won't be able to get on the plane because the FAA will deregister them.

      idiot.

    8. Re: Very unlikely to be triggered in the field by JWW · · Score: 1

      Yes, but if your desktop fails it doesn't fall out of the sky.... most of the time

    9. Re:Very unlikely to be triggered in the field by sphealey · · Score: 1

      The entire world isn't the US/Japan/EU. While most airlines outside that region who operate 787s run tight operations (Ethiopian for example is often mentioned as very well-run with a strong safety culture), there are a few who do not.

      That said, in the few instances where less organized airlines have managed to acquire 787s they are probably being shut down 2-3 times/week, much less every 9 months.

      sPh

    10. Re:Very unlikely to be triggered in the field by Anonymous Coward · · Score: 2, Funny

      You must not fly United.

    11. Re:Very unlikely to be triggered in the field by mrchaotica · · Score: 1

      Hey, it could be possible on planes flying the Tripoli - Mogadishu - Kabul route!

      --

      "[Regarding the 'cloud,'] ownership was what made America different than Russia." -- Woz

    12. Re:Very unlikely to be triggered in the field by plopez · · Score: 1

      Is that a cold boot or a warm boot?

      --
      putting the 'B' in LGBTQ+
    13. Re:Very unlikely to be triggered in the field by JWSmythe · · Score: 1

      That's what I was thinking. I didn't look it up, but I'd be pretty sure that the maintenance interval is shorter than 5,952 hours.

      --
      Serious? Seriousness is well above my pay grade.
    14. Re: Very unlikely to be triggered in the field by Sloppy · · Score: 1

      In an alternate timeline, Keith Moon found the 21st century to be full of challenges.

      --
      As copyright owner of this comment, I authorize everyone to defeat any technological measure which limits access to it.
    15. Re:Very unlikely to be triggered in the field by Rich0 · · Score: 1

      Sure, but if somebody did operate an airliner in this manner, I imagine many other components would be failing, creating numerous hazardous conditions.

      Maintenance schedules on big things like airliners aren't just created arbitrarily. If the manual says to inspect the turbine blades every n hours then somebody probably did a study that shows that at x% of n hours you start to get measurable deterioration. If they could make the intervals longer they would - it would be a major selling point for the plane.

      Sure, this software bug should be fixed, but in general if you're going to allow companies to ignore the manufacturer's guidelines, then you can't really hold the manufacturer responsible for failure.

    16. Re:Very unlikely to be triggered in the field by Idarubicin · · Score: 1

      A commercial plane will most probably go through several maintenance events and checks during that sort of time frame, where cycling the power is part of the procedure.

      It's very reassuring to know that it probably won't happen.

      As other posters have noted, 248 days of operation means skipping twenty-plus maintenance and inspection cycles, plus missing one or more engine overhauls. That sucker's going to fall out of the sky due to a hardware problem before the software error gets the chance.

      Even in the absence of regular, scheduled, required maintenance, there will be hardware failures due to stuff wearing out, with sufficient frequency to force reboots at less-than-eight-month intervals. Honestly, the FAA is going to ground any airline that was so lax as to get within six months of tripping over this bug.

      That's not to say that this bug is a good or acceptable thing, nor that something like it couldn't have much more serious effects. But this particular error is a non-issue from a real-life consequences standpoint.

      --
      ~Idarubicin
    17. Re:Very unlikely to be triggered in the field by thegarbz · · Score: 1

      If it ever happened on a plane, then it means that the maintenance was intentionally skipped. If they reach 248 days of continuous operation then a number of significant maintenance cycles have been skipped (some 23-25 inspection / maintenance cycles that generally require shutting down the electrical system). The generators in question are attached to the engines. The engines have an overhaul schedule that is shorter than 248 days of continuous operation. If they managed to reach this point, then the major maintenance cycles have been skipped and the engines are long overdue for a tear down inspection and overhaul. Any plane that could reach this point, 248 days of continuous operation while missing all of the required maintenance, is not a plane (or an airline, for that matter) that anyone should be flying on.

      Are you trying to say that to get to this point required maintenance would need to be skipped?

    18. Re:Very unlikely to be triggered in the field by hawguy · · Score: 2

      If it ever happened on a plane, then it means that the maintenance was intentionally skipped. If they reach 248 days of continuous operation then a number of significant maintenance cycles have been skipped (some 23-25 inspection / maintenance cycles that generally require shutting down the electrical system). The generators in question are attached to the engines. The engines have an overhaul schedule that is shorter than 248 days of continuous operation. If they managed to reach this point, then the major maintenance cycles have been skipped and the engines are long overdue for a tear down inspection and overhaul. Any plane that could reach this point, 248 days of continuous operation while missing all of the required maintenance, is not a plane (or an airline, for that matter) that anyone should be flying on.

      You would think that if this situation was unlikely to ever happen in practice that the FAA wouldn't have deemed it necessary to issue an AD requiring that the GCUs be power cycled at intervals no longer than 120 days. You'd think they'd already be aware of required maintenance intervals that require powercycling the GCUs, and they waived the usual comment period before issuing the AD due to the perceived imminent danger.

    19. Re:Very unlikely to be triggered in the field by pem · · Score: 1

      Meh. Those will get shot down well before 248 days are up.

    20. Re:Very unlikely to be triggered in the field by StikyPad · · Score: 1

      My locality posts speed limit signs in residential areas despite the fact that there are statewide speed limits of 25MPH in residential areas, and despite the fact that drivers are required to know this to pass the driving test.

      Redundant != pointless or worthless. In both cases, it reduces the operator's ability to say "I had no idea!"

    21. Re:Very unlikely to be triggered in the field by tverbeek · · Score: 1

      "most probably".

      Relax.

      --
      http://alternatives.rzero.com/
  5. Lesson Here by TechNeilogy · · Score: 1

    Always, always, always do the math on counters and give yourself orders of magnitude of space. Figured this out the hard way once (fortunately not in a situation where safety was a concern).

    --
    "The wisdom of the Patriarchs was that they *knew* they were fools." --Master Foo
    1. Re:Lesson Here by fisted · · Score: 1

      If you did the math, you don't need excess space. If you need excess space, you're just shifting the day of failure into the future. Yes, perhaps far enough, but still.

    2. Re:Lesson Here by Megane · · Score: 2

      Also, use the difference of the current time minus the start time, instead of computing the end time and using a simple less than/greater than comparison. This properly handles wraparounds, and only has a problem with differences more than half of the full range. (so don't keep comparing the time after it's ended!)

      --
      #naabhaprzrag, #sverubfr-000, #agi-fcbafberq, negvpyr[pynff*=' negvpyr-ary-'] { qvfcynl: abar !vzcbegnag; }
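
The "current time minus start time" approach can be sketched as follows. The sketch assumes a free-running 32-bit unsigned tick counter, emulated here with a mask because Python integers do not wrap; the names `elapsed_ticks` and `has_expired` are made up for illustration.

```python
U32_MASK = 0xFFFFFFFF  # emulate a 32-bit unsigned tick counter

def elapsed_ticks(now: int, start: int) -> int:
    """Ticks since `start`, still correct if the counter wrapped once in between."""
    return (now - start) & U32_MASK

def has_expired(now: int, start: int, timeout: int) -> bool:
    """True once at least `timeout` ticks have passed since `start`."""
    return elapsed_ticks(now, start) >= timeout

# The counter wrapped between the two readings: 0xFFFFFF00 -> 0x00000010.
print(elapsed_ticks(0x00000010, 0xFFFFFF00))     # 272 ticks, despite now < start
print(has_expired(0x00000010, 0xFFFFFF00, 500))  # False, only 272 ticks so far
```

With an unsigned difference like this, the result holds as long as the true elapsed time is shorter than the full counter range. The half-range limit mentioned in the comment applies when the difference is interpreted as a signed value.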
    3. Re:Lesson Here by tompaulco · · Score: 1

      Always, always, always do the math on counters and give yourself orders of magnitude of space. Figured this out the hard way once (fortunately not in a situation where safety was a concern).

      As far as I am concerned, there are three valid quantities in programming. Zero, one and unlimited.

      --
      If you are not allowed to question your government then the government has answered your question.
    4. Re:Lesson Here by HornWumpus · · Score: 1

      Good luck using a float as a counter. It won't overflow, but will eventually stop counting.

      The trick is knowing what you are doing. Which means erasing that 'three valid quantities' thinking.

      --
      John McAfee 'It was like that time I hired that Bangkok prostitute; to do my taxes, while I fucked my accountant'
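
The float-as-counter problem can be reproduced in a few lines. This is a sketch assuming an IEEE 754 single-precision counter, emulated by round-tripping through `struct` because Python's own `float` is a double; the helper names are hypothetical.

```python
import struct

def as_float32(x: float) -> float:
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def float32_increment(x: float) -> float:
    """Add 1.0 and store the result back into a single-precision variable."""
    return as_float32(x + 1.0)

stuck = float(2 ** 24)                    # 16777216.0
print(float32_increment(stuck) == stuck)  # True: the increment is rounded away
print(2.0 ** 53 + 1.0 == 2.0 ** 53)       # True: doubles stall too, just much later
```

At 100 ticks per second, a single-precision counter would reach 2**24 in a little under two days.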
    5. Re:Lesson Here by dgtangman · · Score: 1

      Good idea in principle, not always helpful in practice. I diagnosed an application failure on a UNIX system some years back that resulted from using the system's "time since last boot" function as a real-time clock with greater than one-second precision. We discovered that in order to prevent the terrible things that would happen if the 32-bit signed counter of 0.01-second intervals ever overflowed, the UNIX vendor had programmed the reported time to stop changing when it reached 2^31-1. Since the system provided no other interface that provided elapsed time with greater than one-second precision, we ultimately had to tell those of our customers with systems from that vendor to be sure to reboot their servers at least every six months.

    6. Re:Lesson Here by ttucker · · Score: 1

      If you did the math, you don't need excess space. If you need excess space, you're just shifting the day of failure into the future. Yes, perhaps far enough, but still.

      What math would you do to determine exactly how high a counter should count?

      Would using a 64-bit long on a millisecond counter be lazy programming?

    7. Re:Lesson Here by Anne+Thwacks · · Score: 1
      The correct answer is: During pre-flight ground checks, detect all counters at imminent risk of overflowing*, and flag requirement for corrective action at next maintenance. Probably should be checked at all routine services as well.

      * "imminent risk of overflowing" probably means less than four routine maintenance intervals remaining, but consult the requirements document for more detail.

      This is aerospace, not gaming.

      --
      Sent from my ASR33 using ASCII
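
A pre-flight check along those lines might look like the sketch below. Everything specific in it is an assumption for illustration: the function name `needs_reset`, the 10-day maintenance interval, and the 100 Hz signed 32-bit counter. Only the "four intervals remaining" margin comes from the comment above.

```python
INT32_MAX = 2 ** 31 - 1
TICKS_PER_DAY = 100 * 86_400  # assumption: 100 Hz counter

def needs_reset(counter: int, ticks_per_interval: int,
                intervals_margin: int = 4, max_value: int = INT32_MAX) -> bool:
    """Flag a counter that would overflow within `intervals_margin` maintenance intervals."""
    remaining = max_value - counter
    return remaining < intervals_margin * ticks_per_interval

interval = 10 * TICKS_PER_DAY  # hypothetical 10-day maintenance interval
print(needs_reset(200 * TICKS_PER_DAY, interval))  # False: more than 40 days left
print(needs_reset(210 * TICKS_PER_DAY, interval))  # True: fewer than 40 days left
```
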
    8. Re:Lesson Here by ChrisMaple · · Score: 1

      63 bits for a nanosecond counter gives 292 years.

      --
      Contribute to civilization: ari.aynrand.org/donate
    9. Re:Lesson Here by TechNeilogy · · Score: 1

      Trust me, someone somewhere will leave it running for 293 years; it's what users do.

      --
      "The wisdom of the Patriarchs was that they *knew* they were fools." --Master Foo
    10. Re:Lesson Here by petervandervos · · Score: 1

      If you did the math, you don't need excess space. If you need excess space, you're just shifting the day of failure into the future. Yes, perhaps far enough, but still.

      What math would you do to determine exactly how high a counter should count? Would using a 64-bit long on a millisecond counter be lazy programming?

      Yes, that is lazy programming.

      You should not determine a duration by subtracting two points in time from each other. You should call a function that can handle timer overflows.

    11. Re:Lesson Here by ttucker · · Score: 1

      A 64 bit signed long counter will merrily count milliseconds for roughly 292,000 millennia.
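
For anyone who wants to rerun the numbers in this sub-thread, here is a small sketch. It assumes a counter with 63 value bits (a signed 64-bit integer) and a Julian year of 365.25 days; the function name `years_until_overflow` is made up for illustration.

```python
SECONDS_PER_YEAR = 365.25 * 86_400  # Julian year

def years_until_overflow(value_bits: int, ticks_per_second: int) -> float:
    """Years a counter with `value_bits` usable bits lasts at the given tick rate."""
    return 2 ** value_bits / ticks_per_second / SECONDS_PER_YEAR

print(round(years_until_overflow(63, 1_000_000_000)))  # nanosecond ticks
print(round(years_until_overflow(63, 1_000)))          # millisecond ticks
```

Under those assumptions the nanosecond counter lasts about 292 years and the millisecond counter about 292 million years.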

    12. Re:Lesson Here by ttucker · · Score: 1

      The correct answer is: write the stuff in a language that is safe in the first place.

    13. Re:Lesson Here by ttucker · · Score: 1

      63 bits for a nanosecond counter gives 292 years.

      My post was not about nanoseconds, it was about milliseconds.

    14. Re:Lesson Here by petervandervos · · Score: 1

      Yes, but it is still lazy.
      And you have to adjust a lot of variables to become long. All temp vars that hold a timestamp. If you miss a single one, you're screwed.

    15. Re:Lesson Here by ttucker · · Score: 1

      And you have to adjust a lot of variables to become long. All temp vars that hold a timestamp. If you miss a single one, you're screwed.

      Yes, the program would have to be implemented without error, to not have an error... that is a tautology. Pragmatically, use a statically typed language, and do not change anything, use the correct type while implementing the program the first time.

      What would a non-lazy programmer use instead? An arbitrary precision int or something? Can you think of any downsides to that approach?

    16. Re:Lesson Here by HornWumpus · · Score: 1

      Global replace 'long' and 'int' with the 'unlimited size int type name' and report back just how badly your system now runs.

      --
      John McAfee 'It was like that time I hired that Bangkok prostitute; to do my taxes, while I fucked my accountant'
    17. Re:Lesson Here by petervandervos · · Score: 1
      A non lazy programmer shouldn't subtract two timestamps from each other to get a duration but uses a (self written) function that can handle overflows.

      A programmer that doesn't have total control over the whole system (in most cases) should reboot the system after a fixed amount of time. Even your iPhone or Android phone can not handle half a year uptime (at least all phones I have seen).

      Yes, the program would have to be implemented without error, to not have an error... that is a tautology.

      Sorry, programming for more than eh.. 35 years but never managed to write a non trivial program that doesn't have errors in it. But that's not really a problem as long as the program can recover from it.

      One hint: if you see a solution to a problem and think "that is an easy way to solve it", don't use it. It will always come back and haunt you (like using a long for a timer).

    18. Re:Lesson Here by ttucker · · Score: 1

      A non lazy programmer shouldn't subtract two timestamps from each other to get a duration but uses a (self written) function that can handle overflows.

      I am not sure who you are even talking to. My response was in response to a smart ass comment made by a user named fisted, where he basically said that someone was a moron for suggesting counters that will run for orders of magnitude longer (ie. tens of thousands of millennia) are a pretty OK idea.

      Nobody mentioned calculating duration besides you (in a perfectly sensible way, I might add). This is a smart answer to the question that it is an answer to, but a really kind of silly answer to a question that it is not an answer to.

    19. Re:Lesson Here by petervandervos · · Score: 1

      Would using a 64-bit long on a millisecond counter be lazy programming?

      I am not sure who you are even talking to.

      Ah, that would be my mistake. As a non-native English speaker I sometimes miss 'irony'.

      Thanks for the conversation.

  6. Centiseconds in signed 32bit int by Roceh · · Score: 1

    A signed integer overflow for timing - scary...

    1. Re:Centiseconds in signed 32bit int by wonkey_monkey · · Score: 1

      That's why I always use unsigned integers like a boss.

      --
      systemd is Roko's Basilisk.
  7. Control unit runs at 100 Hz? by photonic · · Score: 5, Insightful

    I guess this might be due to a 32-bit signed integer being incremented at 100 Hz: 2^31 / 24 / 3600 / 100 = 248.5 days.

    --
    karma police: arrest this man, he talks in maths; he buzzes like a fridge, he's like a detuned radio. [radiohead]
    1. Re:Control unit runs at 100 Hz? by Megane · · Score: 1

      At least that's better than Windows 98 crashing after 7 weeks! (because 1ms instead of 10ms)

      --
      #naabhaprzrag, #sverubfr-000, #agi-fcbafberq, negvpyr[pynff*=' negvpyr-ary-'] { qvfcynl: abar !vzcbegnag; }
    2. Re:Control unit runs at 100 Hz? by Anonymous Coward · · Score: 2, Funny

      I call BS. No Windows 98 machine could possibly stay up for 7 weeks, so this was a non-issue.

    3. Re:Control unit runs at 100 Hz? by Anonymous Coward · · Score: 1

      Which actually caused a shutdown of the Voice Switch at the FAA's Los Angeles Center in 2004 when maintenance failed to reset the system on schedule to avoid this bug.

      http://it.slashdot.org/story/0...

    4. Re:Control unit runs at 100 Hz? by bosef1 · · Score: 2

      That makes a lot of sense. A lot of aviation power systems run with 400 Hz AC current (the higher frequency lets them use smaller transformers). They could be dividing down the power signal to 100 Hz, and using that to increment a counter.

      The other option is that many operating systems use 10 ms = 100 Hz for their internal interrupt timers. So it could just be a counter that is being incremented every interrupt cycle, and doesn't care what frequency of electricity is being used.
      (cf. the jiffy http://en.wikipedia.org/wiki/Jiffy_(time) )

    5. Re:Control unit runs at 100 Hz? by TheRealHocusLocus · · Score: 5, Funny

      I guess this might be due to a 32-bit signed integer being incremented at 100 Hz: 2^31 / 24 / 3600 / 100 = 248.5 days.

      Yes, the moment the big bird would shut down was correctly prognosticated by the Connecticut Yankee in King Arthur's Court. While testing a crowbar circuit he ran out of time and came to while munching on phattened feasant at Medieval Times, in a daze of King Arthur. He noticed an unused carrion bit, and realized that birds of prayer who managed the King's affairs were hard-sinewed to pluck quills for signing and always discarded the carrion bit. He caught the underflow was heralded by the people and befriended by the King, who set him to work hacking the Code of Chivalry and cracking the Y1K problem. In that time there were only punch cards and knights on horseback only had a resolution of 1 bit, so tournaments were long the fields were full of snakes, to avoid spooking the horses the knights would dismount and cleave them with sword, leaving half-adders strewn about. It was Pendragon who had built the famous Round Table with 12 seats, two complete I Chings, where Arthur and the knights would drop in and punch out binary sums in a rudimentary form of patty-cake, which inspired the mechanical circular adder of later years. The Yankee's refinement was a 13th chair left unoccupied to mark the betrayal of Judas, and also to serve as a carrion bit.

      There is a great deal more about gum-powder and 99 cent gamut of Steampunk-driven micro commerce, a Debian release called 'Guinevere' and a whole lotta Lancelot, but time is fun when you're having flies.

      --
      <blink>down the rabbit hole</blink>
  8. Failsafe by ISoldat53 · · Score: 1

    How is losing power in an airplane a safe mode?

    1. Re:Failsafe by antiperimetaparalogo · · Score: 1

      How is losing power in an airplane a safe mode?

      In the same way as cutting off the power is a safe mode for any machine? But i guess that for a plane it's better to do it while on the ground...

      --
      Antisthenes: "Wisdom begins by examining the words/names." - excuse my English, i am (slightly...) better with my Greek!
    2. Re:Failsafe by confused+one · · Score: 1

      It's a failsafe mode for the controller and generator. There are four (4) of them. There is more than enough redundancy.

    3. Re:Failsafe by X0563511 · · Score: 2

      ... not when they would all have nearly the exact same runtime - they would all hit the failsafe at around the same time.

      Not that this should ever happen in the air - as others have said, if the thing manages to run for this long, someone hasn't been doing maintenance.

      --
      For large sets, this will be our guide even unto death, for the LORD will work for each type of data it is applied to...
    4. Re:Failsafe by stooo · · Score: 1

      Yeah, except that there may be other bugs in this piece of software which could trigger a common mode failure.

      --
      aaaaaaa
  9. If Boeing believed in software QA.... by Bomarc · · Score: 2

    For all of the QA at Boeing, they don't believe in software QA. Take a look at their job openings some time: In years of searching, I've seen only one software QA position, and it wasn't dealing with aircraft. Any such search results will return developers that are to write their own tests against the spec. Developers are not Testers.... and I'll ask: How many more such bugs are out there?

    I know of two other software "bugs" ... that can be attributed to a lack of QA. How many people will die due to a bad management decision on the part of Boeing?
    Disclosure: Yes, I'm a software QA / Test professional.

    1. Re:If Boeing believed in software QA.... by binarylarry · · Score: 1

      Because these people are normally called TEST PILOTS. ;)

      --
      Mod me down, my New Earth Global Warmingist friends!
    2. Re:If Boeing believed in software QA.... by Anonymous Coward · · Score: 2, Informative

      The Primary Flight Computer software for the 777 was written in England by GEC. Indeed the hardware for the PFC was designed and built by GEC.

      I was on the software QA team for the PFC code. There were tens of us working three shifts 24 hours per day devising tests of the PFC against its requirement spec. There were even more doing unit tests on all the Ada code.

      That is perhaps why you don't see Boeing advertising for QA engineers. They outsource the hardware and software.

    3. Re:If Boeing believed in software QA.... by fisted · · Score: 1

      I suspect now would be the best time to apply at Boeing :-)

    4. Re:If Boeing believed in software QA.... by Feral+Nerd · · Score: 1

      For all of the QA at Boeing, they don't believe in software QA. Take a look at their job openings some time: In years of searching, I've seen only one software QA position, and it wasn't dealing with aircraft. Any such search results will return developers that are to write their own tests against the spec. Developers are not Testers.... and I'll ask: How many more such bugs are out there? I know of two other software "bugs" ... that can be attributed to a lack of QA. How many people will die due to a bad management decision on the part of Boeing? Disclosure: Yes, I'm a software QA / Test professional.

      The worst part is that when software bugs are finally discovered they are not fixed, because it takes too much time and is too expensive (even though the physical update process is essentially no different from re-flashing/updating the firmware/software in a consumer-grade digital device). I'd argue that you could cut the red security tape, reduce costs and install updates quicker if you massively increased the software QA work being done. Apparently Boeing disagrees. I dunno about Airbus, they might be just as bad, but for some reason it's Boeing planes that seem to top the list of software-related bloopers we get in our sector.

      Another good example is American Airlines, who replaced 35 pounds of on-board paper documentation with iPads, only to have massive delays when the damn app they were using forced pilots to return to the gate to get a wifi connection. I'm not sure about the wisdom of using garden-variety consumer-level tablets for this, but the idea in itself is a good one: pilots are probably way quicker at looking stuff up on a tablet of some description than rifling through 35 pounds of paper documents. You'd think issues like that could be fixed with a combination of proper software/hardware QA, adding whatever iOS/Android/Linux/Windows/WhateverOS tablet the pilots are using for their docs to the pre-flight checklist, and having each aircraft carry two of the devices.

      Perhaps the thing to do would be to create a quality rating/stamp for "aviation certified" hardened tablets? ... But knowing the aviation industry, such devices would be updated once over their lifespan (at production time), because getting an update certified takes 8 months of wading through a quagmire of red tape, it would cost several hundreds of thousands of dollars to get an update vetted, and it would cost thousands of dollars to have it installed by a duly certified and highly trained aviation safety professional, even though he'd essentially just be doing the same thing the rest of us do when we update our iPad, Galaxy Tab, etc....

    5. Re:If Boeing believed in software QA.... by Anonymous Coward · · Score: 2, Insightful

      Actually I took my work there testing the 777 software very seriously.

      On at least two occasions I escalated what I thought was a problem in the specification all the way back to Boeing. One of them turned out to be a "real-world" issue in the spec.

      I believe the rest of the team took the same attitude. We used to talk about that a lot.

      At the end of the day what you are asking for is impossible. The spec we worked to was a stack of paper 2 yards high when printed out. How many QA engineers know enough about flight dynamics to question if any of it is correct or not?
       

    6. Re:If Boeing believed in software QA.... by Joe_Dragon · · Score: 1

      The maps and other info get updated quite a bit.

    7. Re:If Boeing believed in software QA.... by drinkypoo · · Score: 1

      Wasn't it Boeing QA that discovered this flaw in the first place?

      After the code was already released, and is already being used in the field. As opposed to before release, as part of a responsible code review.

      --
      "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
    8. Re:If Boeing believed in software QA.... by NicBenjamin · · Score: 1

      Remember that time back in the 90s when a Marine Corps plane on maneuvers knocked down a Cable Car in the Italian Alps, killing 20 people? That was partly because when they planned the maneuver they used charts that were 6 months old, and the cable car line was less than 6 months old. If they'd had iPads, or any other electronic chart equipment that automatically updated itself, it wouldn't have happened.

      Civil aviation changes just as frequently. Which approach each airport wants you to use (to avoid colliding with the rest of the aircraft), which airspace you should fly in only in extreme circumstances (perhaps due to Russian-allied rebels' indiscriminate use of BUK missile launchers), etc. all change quite frequently, and you need the data for every airport the plane could possibly be sent to because there's no way in hell the airline's gonna buy a nine-figure aircraft and only use it at three airports.

      Thus the electronic equipment, which may not work due to computer issues some small percent of the time, but when it does work will always be updated properly. Whereas with paper you get 100% uptime, but are guaranteed less than 100% accuracy. After pilots get used to the equipment it also has fewer user-fuck-up issues. The Marines I mentioned, for example, actually had the right charts, in a sealed envelope, in the fucking cockpit, but they hadn't opened the damn thing because they figured it was a waste of time having nothing to do with them.

    9. Re:If Boeing believed in software QA.... by plopez · · Score: 1

      Testing is about checking compliance to a spec. QA is a *much* broader topic. E.g. reviewing the spec to ensure it was well written and there are no gaps. Unfortunately many people do not understand the difference.

      --
      putting the 'B' in LGBTQ+
    10. Re:If Boeing believed in software QA.... by Required+Snark · · Score: 5, Informative
      You have no idea what you are talking about. All FAA certified aircraft software has to conform to the DO-178B / DO-178C standard. The standard imposes design, testing, process and documentation standards that are extremely demanding.

      QC isn't just a department or a step in the release process, it is built into the full life cycle of the software. Safety is the goal, and the requirement for good practice starts at the beginning of the process, with the requirement documents.

      For example, there are five levels of error severity defined from A to E. E has no impact on safety and A is catastrophic, where a crash could occur. The level of software test and validation depends on the severity level.

      The number of objectives to be satisfied (eventually with independence) is determined by the software level A-E. The phrase "with independence" refers to a separation of responsibilities where the objectivity of the verification and validation processes is ensured by virtue of their "independence" from the software development team. For objectives that must be satisfied with independence, the person verifying the item (such as a requirement or source code) may not be the person who authored the item and this separation must be clearly documented. In some cases, an automated tool may be equivalent to independence. However, the tool itself must then be qualified if it substitutes for human review.

      Your inability to find a "QC" position is because you don't know the structure of aerospace software development and have no idea of the job titles or terminology used to describe the standards used. You are projecting your lack of knowledge into an inconceivable lapse of competence on the part of Boeing and the FAA. In what universe would there be no software safety requirements for the civilian aircraft industry? All you have shown is that you are ignorant and have a basic lack of common sense.

      --
      Why is Snark Required?
    11. Re:If Boeing believed in software QA.... by Bomarc · · Score: 1

      You have no idea what you are talking about.

      Then you have never looked for a software tester / QA position at Boeing.

      For example, if you search Boeing jobs for QA on 5/2/2015 you will see 15 jobs -- none are software-specific QA, two of them are in software fields ... including Cloud Architect 4 and a Software Release Engineer

      If you search for test you will see 97 (Adjusted search for only IT); and a typical job posting (most of the "Software Engineer" postings) will have something like:
      Other duties may include:
      -- Develops software verification plans, test procedures and test environments, executing the test procedures and documenting test results to ensure software system requirements are met;


      They may "conform to the DO-178B / DO-178C standard" ... but my point is the person performing the test is NOT a software QA professional, rather is the developer of the software.

      Full disclosure: There currently are a few QA/test positions open -- including one that is a subsidiary of Boeing.

    12. Re:If Boeing believed in software QA.... by matfud · · Score: 1

      In safety critical systems software tends to be designed to shut down if anything unexpected is encountered. It follows from the concept of "do no harm".

      In some situations that is obviously not the best of ideas. There is nothing to say that the plane can not continue flying. Even if it requires shutting down the flight computers and deploying the RAT.
       

    13. Re:If Boeing believed in software QA.... by matfud · · Score: 1

      As per the previous slashdot post about the huge amount of paper needed on aircraft and how replacing it with an ipad caused problems.
      Very little of that huge stack of paper and how to handle the paper is related to pre/post flight check lists. The majority is exceptions. If there is a problem then the pilots are expected to dig through that to find a remediation (if they do not already know).

    14. Re:If Boeing believed in software QA.... by Malenx · · Score: 1

      I think you're reading too far into that requirement. Every single developer job posting that I've read since college has included that line. We have to be able to test and verify our own code before we pass it up to QA for further verification. On top of that, Boeing might likely take a very heavy testing approach such as TDD for some of their software applications, in which knowing how to write automated tests is critical.

  10. History repeating itself? by Anonymous Coward · · Score: 1

    248 days kind of sounds similar to this 497 days issue. In fact, it could be the same issue if they are using a signed 32bit integer.

    https://www.ibm.com/developerw...

    1. Re:History repeating itself? by photonic · · Score: 1

      [Mod parent informative] Indeed, this seems exactly the same issue. For Boeing, it might either be a signed integer being incremented at 100 Hz, or an unsigned one at 200 Hz.
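      Working backwards from the observed uptime gives the tick rate each hypothesis implies. A quick sketch, where `implied_tick_hz` is a hypothetical helper and the 32-bit width is an assumption:

```python
SECONDS_PER_DAY = 24 * 3600


def implied_tick_hz(bits: int, signed: bool, days: float) -> float:
    """Tick rate at which a counter of this width overflows after `days`."""
    max_ticks = 2 ** (bits - 1) if signed else 2 ** bits
    return max_ticks / (days * SECONDS_PER_DAY)


print(round(implied_tick_hz(32, True, 248.55), 1))    # signed:   100.0 Hz
print(round(implied_tick_hz(32, False, 248.55), 1))   # unsigned: 200.0 Hz
print(round(implied_tick_hz(32, False, 497.1), 1))    # the 497-day case: 100.0 Hz
```

      So the uptime figure alone cannot tell the two Boeing hypotheses apart.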

      --
      karma police: arrest this man, he talks in maths; he buzzes like a fridge, he's like a detuned radio. [radiohead]
  11. Re: Maybe they should have used Rust. by EmeraldBot · · Score: 1

    Using a language that hasn't even reached a stable release in an environment where the tiniest mistake kills hundreds of people?

    --
    "Set a man a fire, he'll be warm for the rest of the night. Set a man afire, he'll be warm for the rest of his life."
  12. Just turn it off by Murdoch5 · · Score: 1

    Don't they ever switch the planes off? If all you have to do is reboot the system once every 200 days, then just reboot it.

  13. Graceful degradation by thisisauniqueid · · Score: 2

    The plane's control systems should have several levels of degraded-mode operation, so if one system stops working, the plane still hobbles along the best it can without the non-working system. Google's self-driving cars have something like 7 layers of nested failure modes, each with slightly degraded functions relative to the next higher level. It's almost impossible to trigger enough failures to completely shut the system down, which is a good thing if you're traveling at highway speeds. It's very concerning that a company like Boeing didn't catch this before product release, but even more concerning that they didn't design the system to be resilient against this sort of failure.

    1. Re:Graceful degradation by photonic · · Score: 1

      Indeed, they would need some mechanism like this, which is implemented using several heterogeneous processes. Triple hardware redundancy is useless if they all have a common mode software bug. Same thing happened to the first flight of Ariane 5, where both inertial reference units failed within milliseconds of each other.

      --
      karma police: arrest this man, he talks in maths; he buzzes like a fridge, he's like a detuned radio. [radiohead]
    2. Re:Graceful degradation by Anonymous Coward · · Score: 1

      The 787 is resilient against this sort of failure. Avionics and some flight surfaces will function with DC battery backup and even if that were to fail the ram-air turbine automatically deploys when DC power fails.

    3. Re: Graceful degradation by Anonymous Coward · · Score: 1

      They do, in a sense. If the generators fail off, the plane switches to battery power until the Ram Air Turbine (RAT) automatically deploys, providing essential power until the generators and/or APU can be reset.

    4. Re:Graceful degradation by X0563511 · · Score: 1

      It is designed such that this would never be an issue. Why? Because you have to skip several critical maintenance periods to hit it. Imagine if you, somehow, kept your car engine running for two years. Ignoring the logistics of this, doing so means you cannot have changed your oil etc.

      Now, if it was on the order of 11 hours, that would be more of a concern.

      --
      For large sets, this will be our guide even unto death, for the LORD will work for each type of data it is applied to...
  14. Give a whole new meaning to ... by quax · · Score: 1

    ... "failsafe".

  15. It is probably a non-issue. by 140Mandak262Jamuna · · Score: 5, Funny

    The company is said to have found the problem during laboratory testing of the plane, and thankfully there are no reports of it being triggered on the field.

    The spokesman continued, "The battery would have caught fire long before that integer overflow."

    --
    sed -e 's/Chuck Norris/Rajnikant/g' joke > fact
    1. Re:It is probably a non-issue. by DeBattell · · Score: 1

      Well that's reassuring.

    2. Re:It is probably a non-issue. by ChrisMaple · · Score: 1

      "Don't worry, we'll never hit the seven month software failure because the battery fails in 6 months"

      "We bought new batteries, they last 3 years now."

      "Are there any job openings in countries without extradition treaties?"

      --
      Contribute to civilization: ari.aynrand.org/donate
  16. Re:Maybe they should have used Rust. by Jeremi · · Score: 1

    What mechanism does Rust use to prevent 32-bit counter overflows?

    --
    I don't care if it's 90,000 hectares. That lake was not my doing.
  17. What idiot doesn't know what "failsafe"means? by johnnys · · Score: 1

    So first they say " left turned on for 248 days, it will enter a failsafe mode" then they say "all the Generator Control Units will shut off, leaving the plane without power, and the control of the plane will be lost."

    That is NOT fail "SAFE". That is fail "EVERYBODY DEAD".

    --
    Sometimes the "writing on the wall" is blood spatter...
    1. Re:What idiot doesn't know what "failsafe"means? by Anonymous Coward · · Score: 2, Insightful

      If you actually read the AD it will say "We are issuing this AD to prevent loss of all AC electrical power, which could result in loss of control of the airplane."

      COULD lose control, not WILL. The 787 has at least 3 additional backup systems against this sort of failure, the APU, DC battery backup, and Ram Air Turbine.

    2. Re:What idiot doesn't know what "failsafe"means? by Danielsen · · Score: 1

      Often a Fail Operational system consists of several fail safe systems in parallel.
      It is then important that the systems don't have common cause faults.

      "This AD was prompted by the determination that a Model 787 airplane that has been powered continuously for 248 days can lose all alternating current (AC) electrical power due to the generator control units (GCUs) simultaneously going into failsafe mode"

      So the problem is that the same software is running in both GCUs, and they have been powered up at the same time.

    3. Re:What idiot doesn't know what "failsafe"means? by NicBenjamin · · Score: 1

      Pilots lose control of planes all the time, just as drivers do. The question is a) how long are you out of control and how hard is it to get control back, and b) how likely is the scenario in the first place?

      In this case the answer seems to be a) not very long and not very hard as there are backup generators allowing you to reset the computers, and b) not bloody likely because you have to skip quite a few (as in dozens) routine maintenance cycles on these generators to get to 248 days without a restart.

      There's a reason the damn things have been flying for 3.5 years and nobody noticed the problem.

  18. Re:3 shifts? by Anonymous Coward · · Score: 3, Informative

    The reason for the three shifts was that we were using actual PFC computers connected to hardware that could simulate all the inputs and read all the outputs.

    That hardware was a big complicated rack of electronics and there were maybe 8 or 10 such units in a lab.

    As such, to optimize use of the facilities it was necessary to have three shifts 24 hours per day. This went on for a year or more.

    Very good planning in fact.

    Now I could tell you stories of the real corners cut to meet the schedule. But that's a complicated story.

     

  19. Re:queue the.. by jones_supa · · Score: 4, Informative

    As a sidenote, there exists a somewhat famous bug in Windows 95 and 98 (later patched) that caused these operating systems to stop functioning after 49.7 days of uptime.

  20. Halting Problem by ChromaticDragon · · Score: 1

    What a profound demonstration of the Halting Problem.

    1. Re:Halting Problem by Danielsen · · Score: 1

      What a profound demonstration of the Halting Problem.

      No...
      The halting problem is the problem of determining whether a program will ever finish.
      In safety critical software you have to prove that the system's 'Worst Case Execution Time' is less than the safe process time. (Google WCET and AbsInt)
      In this case the program is not coming to a HALT, it is still running its loop at 100 Hz or whatever is generating the overflow.

      I have seen this on an industrial control system, where a faulty C++ timer class was used to monitor timeouts on a CAN bus. When the system had been online for a month, all nodes failed simultaneously with communication timeout due to an integer wraparound.

      Often timers are also used in conjunction with alarms, e.g. stop the engine if the lubrication pressure is lower than 2 bars for 2 secs.
      Or disconnect generator power if ground fault current is higher than 500 mA for 1sec.
      A fault in a timer software block would basically fire all alarms at the same time...
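      A delayed alarm of the kind described above could be sketched like this. It assumes a free-running 32-bit millisecond counter; `DelayedAlarm` and its fields are names made up for the example, not taken from any real control system.

```python
MASK32 = 0xFFFFFFFF   # assumed 32-bit free-running millisecond counter


class DelayedAlarm:
    """Trips only after the fault condition has held for hold_ms."""

    def __init__(self, hold_ms: int) -> None:
        self.hold_ms = hold_ms
        self.fault_since = None   # counter value when the fault first appeared

    def update(self, fault_present: bool, now_ms: int) -> bool:
        if not fault_present:
            self.fault_since = None
            return False
        if self.fault_since is None:
            self.fault_since = now_ms
        # Modular subtraction keeps the elapsed time correct across a wrap.
        elapsed = (now_ms - self.fault_since) & MASK32
        return elapsed >= self.hold_ms


# Low lubrication pressure must persist for 2 s before the engine is stopped.
low_oil_pressure = DelayedAlarm(hold_ms=2000)
t0 = MASK32 - 500                     # fault appears shortly before the wrap
print(low_oil_pressure.update(True, t0))                      # False
print(low_oil_pressure.update(True, (t0 + 1000) & MASK32))    # False
print(low_oil_pressure.update(True, (t0 + 2000) & MASK32))    # True
```

      If every alarm in the system shares one elapsed-time helper and that helper mishandles the wrap, they can all misbehave in the same instant, which matches the simultaneous failures described above.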

  21. Enough of this by confused+one · · Score: 5, Informative

    This story is being way overblown. Yes, it's a bug. Yes, it should be fixed. However...

    248 days of continuous operation is well past the scheduled major maintenance for the aircraft. By this point, a 787 would have to go through many minor maintenance cycles which would have required shutting down the electrical system. In addition, loss of all 4 generators would not result in a loss of vehicle because there are batteries, an APU (a backup generator) and Ram Air Turbines (RATs), generators that deploy from the wing if the APU won't start. To have to rely on any of these would not make for a good day for the pilots; but, they would certainly provide the necessary power to safely land the aircraft at the nearest airport. They might even be able to continue on and finish their flight if they successfully reset the generators.

    This is not the OMG Planes Are Going to Fall From The Sky! event the media is making it out to be.

    1. Re:Enough of this by PPH · · Score: 2

      This is not the OMG Planes Are Going to Fall From The Sky!

      No. This is a "What the f* were you goofballs thinking when you wrote this code? And if this is all the better you can do, what other gotchas are hiding in there?"

      --
      Have gnu, will travel.
    2. Re:Enough of this by NicBenjamin · · Score: 2

      Dude, this is a for-profit company, not a research university. It's not written by people whose entire job is to prove to the world they write the most robustest code ever designed with zero bugs. If it doesn't kill people or delay flights it doesn't cost them money and nobody, except computer geeks, gives a shit.

      In this case the Dreamliner's designed to have all the relevant systems turned off for routine maintenance once every two weeks. Which means if they go more than 248 days without being restarted the airline has skipped several dozen (25 or 26 according to another slashdotter) routine maintenance cycles, which is likely a much bigger problem than the pilot needing to a) restart the computers mid-flight, or b) glide to an emergency landing.

      Given that there've been something on the order of 400 plane-years of actual flight performance, and nobody noticed this bug until now, the software design seems to be about right. Not perfect, but if the planes are even being given 10% of the maintenance the specs call for this bug is a non-issue.

      OTOH, the problems with various batteries were dumb engineering. Altho those also seem to be solved.

    3. Re:Enough of this by ArylAkamov · · Score: 2

      and Ram Air Turbines (RATs), generators that deploy from the wing if the APU won't start.

      Holy shit. That is cool, though looking at the pictures I can't stop laughing at how comical it looks.

      http://en.wikipedia.org/wiki/R...

    4. Re:Enough of this by joe_frisch · · Score: 2

      Even though this bug isn't a direct threat, it could interact with other future software changes. If it is a counter overflow there is a risk that the counter would run at a higher rate in some future version where more functionality is needed. If 248 days went to 2.48 days, it might not be caught in testing, but could (rarely) happen in real life.

    5. Re:Enough of this by ChrisMaple · · Score: 1

      Never, never program for me. You're taking somebody's word that 248 / 14 > 24.

      --
      Contribute to civilization: ari.aynrand.org/donate
    6. Re:Enough of this by ray-auch · · Score: 2

      Bingo.

      If this was only spotted recently in "lab testing" (and why was it being tested now, and not before flight... what prompted the testing...) then it was known / not documented that overflow of this counter would cause shutdown. Some future revision could easily be to increase the precision, at the expense of range, or persist the counter across reboots, and that might not be considered a problem because the system was thought to handle the counter overflowing because no one documented that it didn't.

      That is why I think the AD is there - to ensure this issue is known when this software is messed with in future.

    7. Re:Enough of this by NicBenjamin · · Score: 1

      Ahh engineers. Such strong fans of extreme precision. I could've sworn I put an at least in there.

      Given the tenuous nature of the data (a guy on slashdot told me ain't gonna hold up in Court), and that I wasn't sure I'd remembered the numbers exactly right, but I did know they were high enough that the 248 thing would not come up in the real world, going with a high-end ballpark estimate seemed sensible. Worst-case scenario somebody does the math and says "Dude, you said this is fine because 14 days is less than 248, but your actual evidence is that it's 9.92 or 9.53 days, which rounds down to one week." That guy is you.

      Which changes the math, and if we were using this thread to actually design an aircraft it would change everything, but it does not actually affect the conclusion.

      BTW, I did remember the numbers wrong. It was 23-25, so I'm assuming he thinks they're on an overhaul schedule that is almost exactly every 10 days:

      If it ever happened on a plane, then it means that the maintenance was intentionally skipped. If they reach 248 days of continuous operation then a number of significant maintenance cycles have been skipped (some 23-25 inspection / maintenance cycles that generally require shutting down the electrical system). The generators in question are attached to the engines. The engines have an overhaul schedule that is shorter than 248 days of continuous operation. If they managed to reach this point, then the major maintenance cycles have been skipped and the engines are long overdue for a tear down inspection and overhaul. Any plane which could reach this point, 248 days of continuous operation missing all of the required maintenance; this is not a plane (or an airline for that matter) which anyone should be flying on.

      And no, I'm neither a programmer nor an engineer, so I won't be doing any of that kind of work for you.

    8. Re:Enough of this by tlhIngan · · Score: 1

      No. This is a "What the f* were you goofballs thinking when you wrote this code? And if this is all the better you can do, what other gotchas are hiding in there?"

      Well, most of the case would be that they didn't realize it might be an issue.

      Early Linux suffered from this issue a lot - device drivers could not be counted on to survive if jiffies overflowed. Modern day Linux implements a bunch of helpers (time_after() and friends) to compare jiffies against an elapsed time in a way that handles overflows, and it starts the jiffies counter 5 minutes before overflowing so it overflows early and bugs are detected.
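
      A rough sketch of the kind of overflow-safe comparison described above, in Python with masking to emulate a 32-bit tick counter. The kernel's real helpers are C macros; the names to_signed32 and naive_after and the HZ value below are illustrative assumptions, not kernel code.

```python
MASK32 = 0xFFFFFFFF  # emulate an unsigned 32-bit tick counter


def to_signed32(x):
    """Reinterpret the low 32 bits of x as a signed 32-bit integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def time_after(a, b):
    """True if tick a is later than tick b, even across a wrap.

    Only valid while the two ticks are less than 2**31 apart.
    """
    return to_signed32(b - a) < 0


def naive_after(a, b):
    """The buggy version: a plain comparison that breaks at the wrap."""
    return a > b


HZ = 100                                        # assumed tick rate
before_wrap = MASK32 - 5 * HZ                   # about 5 s before overflow
after_wrap = (before_wrap + 10 * HZ) & MASK32   # 10 s later, counter wrapped

print(time_after(after_wrap, before_wrap))      # wrap-safe comparison
print(naive_after(after_wrap, before_wrap))     # naive comparison
```

      The trick is to subtract first and look at the sign of the difference, rather than comparing the two raw counter values.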

      Of course, in this case, it was discovered in a lab setting - not only is it unlikely to happen in the real world (no, making a change to cause the roll over early will not happen as it turns working code into an untested state), but it also relied on someone pretty much leaving the equipment on the whole period then noticing it died.

      I don't know about you, but finding out the reason why something died 250 days later is difficult, and it probably was only discovered accidentally because someone left it set up at their desk the whole time and forgot about it.

      Hell, it's probably a given the bug exists in plenty of other things as well, just they're normally cycled long before it's a problem and no one actually ran it long enough to test.

    9. Re:Enough of this by PPH · · Score: 1

      There's the principle of lessons learned. And not rewriting everything from scratch. This problem has been addressed and solved in numerous RTOSs and libraries. And even if these could not be used, simple things like overflows, underflows and other sorts of out of range variables are supposed to be caught by the sorts of rigorous analysis avionics s/w is supposed to be subject to. That this was caught in a lab test (and this far after the system went into service) is problematic as well. The complexity of most software (particularly real-time apps) rules out being able to cover all combinations of use cases by overall system tests.

      --
      Have gnu, will travel.
    10. Re:Enough of this by shutdown+-p+now · · Score: 1

      You don't need to be writing software for airplanes to understand the notion of an overflowing counter, and why you'd want to use a 64-bit int for it just in case.
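
      For a sense of scale, a back-of-the-envelope sketch. The 100 ticks per second rate is only the 10 ms guess made elsewhere in this thread, not a confirmed detail of the 787 software, and years_until_overflow is a made-up helper name.

```python
SECONDS_PER_YEAR = 365.25 * 86_400


def years_until_overflow(bits, ticks_per_second):
    """Years until a signed counter of the given width overflows."""
    return 2 ** (bits - 1) / ticks_per_second / SECONDS_PER_YEAR


ASSUMED_HZ = 100  # assumption: one tick every 10 ms

y32 = years_until_overflow(32, ASSUMED_HZ)
y64 = years_until_overflow(64, ASSUMED_HZ)
print(f"32-bit signed: {y32:.2f} years")
print(f"64-bit signed: {y64:.3g} years")
```

      At that rate the 32-bit counter lasts well under a year, while the 64-bit one lasts on the order of billions of years.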

  22. I've been reading The Strain Trilogy by Hohlraum · · Score: 1

    I kind of did a double take when I saw the title. The book/series starts out with a brand new Boeing 777 losing power on the runway. :)

  23. Re:3 shifts? by tompaulco · · Score: 1

    I have seen it happen all too often, an unrealistic development schedule is made to get the contract and as shit rolls down the schedule, QA takes the brunt of any deadline problems. It's one thing if you're developing software for insurance companies, it's another when it's aerospace.

    Until recently I was the software QA Director for a software company. I completely agree with your assessment. My company used to pad in about a week for QA. I kept telling them that it might take us a week to QA, but if we find any issues, then it will have to go back to development, and I can't speak for how long that would take. They really didn't like how I couldn't give them a solid date, but how could I speak for how long it would take another department to fix something?
    At any rate, all of that was moot as development literally never got the project to me until the actual date it was due to the customer, and it would be broken and not meet the specs. I did what I could to try to get development on track so we could deliver a quality product to our customer, but in the end, my company grew tired of my efforts and fired the whole QA team, so now the product just goes straight to the customer, bugs and all. And late.

    --
    If you are not allowed to question your government then the government has answered your question.
  24. Failsafe mode? by elgatozorbas · · Score: 1

    Sounds very safe.

    1. Re:Failsafe mode? by PPH · · Score: 1

      Don't worry. The 787 can always fail over to battery power ...... Umm, oh, oh.

      --
      Have gnu, will travel.
  25. and when it boots, by roc97007 · · Score: 1

    ...does it display the Windows 95 splash screen?

    --
    Oliver's law of assumed responsibility: If you're seen fixing it, you will be blamed for breaking it.
  26. Damn kids! by Lawrence_Bird · · Score: 1

    How many times do I have to tell you to shut the plane the f off before you go to bed? What do you think, I'm made of money?!?

  27. That is what the FAA is proposing in their AD by Anonymous Coward · · Score: 1

    That is the interim solution proposed by the FAA!

  28. Re:Maybe they should have used Rust. by drinkypoo · · Score: 1

    Maybe they should have used Rust.

    They can't use rust, because they build with a minimum of Ferrous materials. They have to wait for the fork, AlOx.

    --
    "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
  29. Re:queue the.. by plopez · · Score: 1

    Yeah, I don't have my 'back of the envelope' calculations in front of me but I think I worked that out to be a 'timeGetTime' rollover bug. I wonder if the same thing is happening in this case, i.e. a timer rollover bug.

    --
    putting the 'B' in LGBTQ+
  30. failsafe by unjedai · · Score: 1

    failsafe: I don't think that means what you think it means.

  31. Re:Keeps Living Up To It's Name by tompaulco · · Score: 1

    The Boeing Screamliner -- the proud product of innovative Project Management in a Globalized Economy

    I thought the name was the Dreamliner? Yup. It is, according to the Boeing website. No mention of it being called the screamliner. But you are right about it living up to it's name. No accidents or injuries have occurred on a 787 in all of it's years of service.

    --
    If you are not allowed to question your government then the government has answered your question.
  32. Please reinstall by www.sorehands.com · · Score: 1

    Please format the drive and re-install Windows.

  33. Long uptime... by Guy+From+V · · Score: 1

    ...some witty remark about airplanes and downtime.

  34. Real situation? by JBMcB · · Score: 1

    Would this ever happen in normal operation? I would think that every few hundred hours of flight time the plane would be pulled out of service for maintenance where everything would be shut down for a couple of days.

    --
    My Other Computer Is A Data General Nova III.
    1. Re:Real situation? by Z00L00K · · Score: 1

      I agree here - the maintenance is probably interrupting the uptime of the system. Any airline that has an uptime of 248 days on its aircraft is likely to suffer other problems with its vessels as well, not only software glitches but also general wear issues.

      --
      If builders built buildings the way programmers wrote programs, then the first woodpecker would destroy civilization.
  35. Re:queue the.. by fuzzyfuzzyfungus · · Score: 1

    Cue the "If they'd chosen Windows, it would be impossible for this bug to occur" jokes...

    Those have mostly been unfair since the NT-derived era; but, in the spirit of the joke, there was a bug in win95 and 98 that would cause the system to crash after 49.7 days of uptime. It remained undiscovered for years.
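
    The 49.7-day figure is what you get from a millisecond counter held in an unsigned 32-bit value. A quick sketch of the arithmetic; counter_reading is a hypothetical helper that just models the wraparound:

```python
MS_PER_DAY = 24 * 60 * 60 * 1000
WRAP = 2 ** 32  # distinct values in an unsigned 32-bit counter


def days_to_wrap():
    """Days until a 32-bit millisecond counter rolls over to zero."""
    return WRAP / MS_PER_DAY


def counter_reading(true_uptime_ms):
    """What the 32-bit counter shows for a given real uptime."""
    return true_uptime_ms % WRAP


ONE_HOUR_MS = 60 * 60 * 1000
print(round(days_to_wrap(), 1))
print(counter_reading(WRAP + ONE_HOUR_MS) == ONE_HOUR_MS)
```

    One hour past the rollover, the counter reads the same as it did one hour after boot, which is why code comparing raw tick values misbehaves at that point.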

  36. Re:queue the.. by fuzzyfuzzyfungus · · Score: 1

    I certainly have no useful information to add to the speculation about cause, but that is what would worry me. Having to reboot a system every 248 days or less is a nuisance, but not a terribly big one (especially since the system is connected to a giant mass of moving parts governed by comparatively strict regulations concerning maintenance, so it probably gets taken to the shop fairly frequently anyway). However, if there is some value incrementing its way up that eventually causes the system to crash, I'd want to be very sure that there is absolutely nothing else that might modify that value in a way that causes it to grow faster than expected.

  37. Re:queue the.. by dunkelfalke · · Score: 5, Informative

    Only theoretical, though. Windows 9x would crash long before reaching this uptime.

    --
    "It's such a fine line between stupid and clever" -- David St. Hubbins, Spinal Tap
  38. Re:Maybe they should have used Rust. by ChrisMaple · · Score: 1

    Any language with polymorphism should do, although polycarbonate is better than polystyrene.

    --
    Contribute to civilization: ari.aynrand.org/donate
  39. They should run OpenVMS by thedavidcathey · · Score: 2

    OpenVMS sites have had many systems up for several years without rebooting. Their equivalent of the "ps" utility had to be fixed one time because systems were exceeding 9999 days uptime.

  40. Re:queue the.. by HiThere · · Score: 1

    Actually, MSWindNT wasn't that stable, but I've heard that the recent releases actually are pretty stable. I'll never be able to test though, since I won't agree to the EULA.

    Also, I never experienced any real problems even with an unmodified MSWind95. The problems started when you installed additional software or hardware. (Yes, the 49.7 days bug existed, but it doesn't exist in the final version of MSWind95. I've got a machine that's running that, and has been up for years. It doesn't get much use, but there are a couple of abandoned programs that I can't export data from.)

    --

    I think we've pushed this "anyone can grow up to be president" thing too far.
  41. Re:queue the.. by roc97007 · · Score: 2

    "(psshsquawk)This is the Captain speaking, we are cruising at 30,000 feet, have a bit of a tail wind and will be in San Francisco a little ahead of schedule. ...Ummm... Ah.... I'm putting the seatbelt sign on now. Please return to your seats as we reboot the airplane.(pssshsquawk)"

    --
    Oliver's law of assumed responsibility: If you're seen fixing it, you will be blamed for breaking it.
  42. Re:queue the.. by roc97007 · · Score: 1

    Only theoretical, though. Windows 9x would crash long before reaching this uptime.

    Well, in fairness, only if you tried to do something with it.

    --
    Oliver's law of assumed responsibility: If you're seen fixing it, you will be blamed for breaking it.
  43. Re:queue the.. by roc97007 · · Score: 1

    You're right. In a similar vein, I ride a Harley, and conversations always lead back to "it's not leaking oil, it's marking its territory!" Har. Har. Yes, Harleys used to leak oil. They were famous for it at one time. But they don't now, any more than any other motor vehicle does.

    Similarly, in all the years I've been using Windows 7, I've yet to have a hang or bluescreen, and I don't reboot my machine unless absolutely necessary. But people still make jokes about the Windows 49.7 day issue. Just goes to show, it takes a LONG time to live down a tremendous goof.

    --
    Oliver's law of assumed responsibility: If you're seen fixing it, you will be blamed for breaking it.
  44. Re:Keeps Living Up To It's Name by ChrisMaple · · Score: 1

    No accidents or injuries have occurred on a 787 in all of it's [sic] years of service.

    All three of them.

    --
    Contribute to civilization: ari.aynrand.org/donate
  45. Re:queue the.. by catchblue22 · · Score: 1

    I remember supporting an office with win95 and Access. I had tech support conversations that almost went like this:

    Him: My computer just crashed.

    Me: So what did you do then?

    Him: I rebooted it.

    Me: Well there's your problem. Reboot the computer again. Then tap the computer gently and pray to the god of your choice and reboot a third time...

    Him: ...Thanks. That worked.

    --
    This and no other is the root from which a tyrant springs; when first he appears as a protector - Plato (423 to 327 BC)
  46. Re:Keeps Living Up To It's Name by Ethanol · · Score: 2

    All three of them.

    Hey, 248 days is five dog-years.

  47. Re:queue the.. by Aereus · · Score: 1

    This is where I then call Bullshit on anyone actually getting Windows 95 or 98 to run for 49 days. My average uptime before bluescreen was around 2 days...

  48. Re:Maybe they should have used Rust. by TheRealHocusLocus · · Score: 2

    This is a prime example of why we need to use the Rust programming language ... blazingly ... eliminates data races ... guaranteed memory ... threads ... greatest minds ... the great ... the superb ... the glorious ... the mightiest ... Git ... Hub ... ... properly ... where it's at ... what we need ... It's what [the world] need[s] now.

    Oh yeah? Sheeeit.
    Pump it up! (endorsed by M.I.A.).

    Ericsson Calling!
    Speak the Erlang now (Seattle boys say Wha? Penguin Girls say Wha-What [x2]

    Use Erlang Erlang Erlang, Ga la ga la ga la Land ga Lang ga Lang
    Con-currency get you down?
    Stack em flat, get down get down
    Too late you down D-down D-down D-down
    Ta na ta na ta na Ta na ta na ta

    Bench mark a-blaze Erlang a lang a lang lang
    Eager evaluation Erlang a lang a lang lang
    Single assignment Erlang a lang a lang lang
    Dynamic typing Erlang a lang a lang lang

    Who the hell is huntin' you?
    Distributed, fault-tolerant,
    In the BMW
    How the hell they find you?
    hot swapping,
    Feds gonna get you
    non-stop applications
    Pull the strings on the hood
    soft-real-time
    concurrency explicit
    message passing, Erlang a lang a lang lang
    Nah explicit locks Erlang a lang a lang lang
    open source Erlang a lang a lang lang.

    CHORUS:
    fib(1) -> 1; % If 1, then return 1, otherwise (note the semicolon ; meaning 'else')
    fib(2) -> 1; % If 2, then return 1, otherwise
    fib(N) -> fib(N - 2) + fib(N - 1).

    Needs some work though.
    An AIRPLANE would make a good sandbox. The price of failure is so high no one will make a mistake.

    --
    <blink>down the rabbit hole</blink>
  49. Can the clock be changed? by RubberDogBone · · Score: 1

    Where I work, we currently tell one of our PCs that it is February because a software license expired on March 1 and nobody will pay to renew it while we work on getting a replacement up to speed. Meanwhile the old expired version runs fine thinking it's February.

    So what would happen if somebody told the plane today's date was 248 days forward of today? Or for fun, five minutes less than that. While it was in flight.

    I'm assuming there are safeguards to prevent this but what if nobody ever considered that there could be a need to prevent changing the plane's clock? What if this was left exposed? Somebody from Boeing please tell me this clock was well protected and there is no way a virus could get into the plane, look for parameters like "wheels up" "seat belt sign off" and execute a clock change. It would be a magnificent disaster where not even the data recorders would capture what happened, if all power is cut off and all systems drop dead.

    --
    Sig for hire.
  50. Failsafe? by PhunkySchtuff · · Score: 1

    ... If the plane is left turned on for 248 days, it will enter a failsafe mode...

    You keep using that word. I do not think it means what you think it means.
    http://en.wikipedia.org/wiki/F...

    1. Re:Failsafe? by PPH · · Score: 1

      failsafe mode

      Well, it is for a single generator. The power source is removed from the system so that no subsequent failures can damage the aircraft. Problem is: This applies to a single generator, not the entire aircraft. Aircraft power systems are designed so that an alternate source can take over for the failed one. But if they all go off line together, not so safe.

      --
      Have gnu, will travel.
  51. Re:queue the.. by Anonymous Coward · · Score: 1

    It's still an issue in some modern OSes.

    All the TCP/IP ports that are in a TIME_WAIT status are not closed after 497 days from system startup in Windows Vista, in Windows 7, in Windows Server 2008 and in Windows Server 2008 R2.

    https://support.microsoft.com/en-us/kb/2553549

  52. Re:queue the.. by angel'o'sphere · · Score: 1

    I had a win 95 and later a win 98 system; both were very stable, mainly used for development and only for occasional gaming, like Descent or Settlers or Warcraft (not WoW, the RTS).

    The win 98 one only crashed when it was time to replace the processor fan.

    I was impressed at that time about MS ...

    --
    Cost free eBook I read (by iBook/Kobo/Amazon/ObookO/Gutenberg etc.): "The Green Odyssey" by Philip Jose Farmer.
  53. This was fixed in the 2.0 Linux kernel! by verifine · · Score: 1

    Do the math. If you have a 32-bit unsigned binary counter and you increment it 200 times a second, guess what - it will overflow in 248 days. Coincidence? I think not!

    I had one of those early Linux kernels running on a machine I mostly used as a server. I did run Netscape on it, displaying the web content on another Linux machine. Both machines ran on UPSes, being located in the third world (Los Angeles.) I was excited about seeing 500 days uptime, but one morning the uptime measured in hours. What? Netscape was still running, so I knew the machine hadn't rebooted. Linux then (and probably now) ran with 100 interrupts/second for task switching, NTP and other goodness. A good friend explained that I had what was probably a very rare item - the uptime counter had overflowed.

    I'm tellin' ya, it's a simple counter overflow. WTF uses 32-bit counters for uptime any more? Answer: Boeing.
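
    Doing the math as suggested, as a small sketch. The tick rates and signedness below are just the combinations guessed at in this thread; nothing here is taken from Boeing's code, and days_until_overflow is an illustrative name.

```python
SECONDS_PER_DAY = 86_400


def days_until_overflow(bits, ticks_per_second, signed):
    """Days of uptime before a tick counter of this shape overflows."""
    usable_bits = bits - 1 if signed else bits
    return 2 ** usable_bits / ticks_per_second / SECONDS_PER_DAY


for hz in (100, 200):
    for signed in (True, False):
        kind = "signed" if signed else "unsigned"
        days = days_until_overflow(32, hz, signed)
        print(f"{hz:>3} Hz, 32-bit {kind:<8}: {days:7.2f} days")
```

    Both an unsigned counter at 200 ticks per second and a signed counter at 100 ticks per second land on about 248.55 days, so the figure alone does not say which one is in play. The unsigned 100 Hz case gives about 497 days, which matches the Linux uptime rollover described above.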

  54. Re:Maybe they should have used Rust. by stooo · · Score: 1

    >> What mechanism does Rust use to prevent 32-bit counter overflows?
    With "Rusted", the computer falls apart before reaching the integer overflow, so the overflow cannot happen.

    --
    aaaaaaa
  55. Re:queue the.. by Puppet+Master · · Score: 1

    Only theoretical, though. Windows 9x would crash long before reaching this uptime.

    Well, in fairness, only if you tried to do something with it.

    Like.. Fly a plane?

    --
    The day Microsoft creates a product that doesn't suck, it will be known as the Microsoft Vaccuum Cleaner!
  56. Re:queue the.. by toddestan · · Score: 1

    I've gotten a Windows 95 machine up to the 49.7 day limit. The key was the machine was hooked up to some special scanner that didn't get used that much, so the computer spent 99% of its time idling at the desktop. Once I realized it was getting close I figured out when exactly it was going to hit the limit so I could witness what would happen. Which turned out to be nothing, until I clicked the mouse and it BSOD'd.

    I've also managed to get a Vista machine up to the 497 day limit. In that case the computer was still running okay other than the networking being hosed.

  57. Re: queue the.. by Talderas · · Score: 1

    "Release the landing gear Hal."

    "I can't do that Steven."

    --
    "Lack of speed can be overcome. In the worst case by patience." --Znork
  58. If by cwsumner · · Score: 1

    "If Engineers built buildings the way Programmers write programs, the first woodpecker that came along would destroy civilization!"

    Seriously. Check for errors and do something reasonable about them. Calling the GPF vector is not reasonable! 8-(

  59. This reminds me my last workplace... by lagi · · Score: 1

    We had an issue with the Apache server crashing after X amount of time.
    The solution (by lead developer)? A cronjob that restarts the server every X hours.

  60. WTF? by wcrowe · · Score: 1

    What I'm hearing here is not a story about a potential software bug. I'm hearing about a serious design problem. An airplane should not be so reliant on software that it shuts down if the software is not working.

    I was in Naval aviation. The 1960s-era A-7s I worked on for most of my career had redundant systems. There was even an air-stream-driven generator that could be deployed in the event of engine failure that would not only supply electrical power, but provide a minimum amount of hydraulic power to critical systems so that the plane had a chance of safely landing.

    I can't believe we're designing aircraft that can carry hundreds of people yet lack redundant systems and can literally fall out of the sky due to a simple software glitch. Have I read this wrong? Are they exaggerating the danger in this article?

    --
    Proverbs 21:19
  61. Re:queue the.. by sbaker · · Score: 1

    The real question is how long the reboot time is relative to the glide duration from 30,000 feet?

    --
    www.sjbaker.org