Slashdot Mirror


Long Uptime Makes Boeing 787 Lose Electrical Power

jones_supa writes: A dangerous software glitch has been found in the Boeing 787 Dreamliner. If the plane is left turned on for 248 days, it will enter a failsafe mode that will lead to the plane losing all of its power, according to a new directive from the US Federal Aviation Administration. If the bug is triggered, all the Generator Control Units will shut off, leaving the plane without power, and control of the plane will be lost. Boeing is working on a software upgrade that will address the problem, the FAA says. The company is said to have found the problem during laboratory testing of the plane, and thankfully there are no reports of it being triggered in the field.

250 comments

  1. queue the.. by Anonymous Coward · · Score: 0

    'it must be running windows' jokes..

    like...

    'it must be running windows.... owait, windows doesn't stay up that long'

    1. Re:queue the.. by Anonymous Coward · · Score: 0

      *Cue

    2. Re:queue the.. by jones_supa · · Score: 4, Informative

      As a sidenote, there exists a somewhat famous bug in Windows 95 and 98 (later patched) that caused these operating systems to stop functioning after 49.7 days of uptime.

    3. Re:queue the.. by Anonymous Coward · · Score: 0

      Both work. There's gonna be so many of them, they'll have to line up and wait their turn.

    4. Re:queue the.. by Anonymous Coward · · Score: 0

      What blows my mind is that somehow that French vowel soup is always properly spelled...

    5. Re:queue the.. by plopez · · Score: 1

      Yeah, I don't have my 'back of the envelope' calculations in front of me, but I think I worked that out to be a 'timeGetTime' rollover bug. I wonder if the same thing is happening in this case, i.e. a timer rollover bug.

      --
      putting the 'B' in LGBTQ+
    6. Re:queue the.. by fuzzyfuzzyfungus · · Score: 1

      Cue the "If they'd chosen Windows, it would be impossible for this bug to occur" jokes...

      Those have mostly been unfair since the NT-derived era; but, in the spirit of the joke, there was a bug in win95 and 98 that would cause the system to crash after 49.7 days of uptime. It remained undiscovered for years.

    7. Re:queue the.. by fuzzyfuzzyfungus · · Score: 1

      I certainly have no useful information to add to the speculation about cause; but that is what would worry me. Having to reboot a system every 248 days or less is a nuisance, but not a terribly big one (especially since the system is connected to a giant mass of moving parts governed by comparatively strict regulations concerning maintenance, so it probably gets taken to the shop fairly frequently anyway). However, if there is some value incrementing its way up that eventually causes the system to crash, I'd want to be very sure that there is absolutely nothing else that might modify that value in a way that causes it to grow faster than expected.

    8. Re:queue the.. by dunkelfalke · · Score: 5, Informative

      Only theoretical, though. Windows 9x would crash long before reaching this uptime.

      --
      "It's such a fine line between stupid and clever" -- David St. Hubbins, Spinal Tap
    9. Re:queue the.. by HiThere · · Score: 1

      Actually, MSWindNT wasn't that stable, but I've heard that the recent releases actually are pretty stable. I'll never be able to test though, since I won't agree to the EULA.

      Also, I never experienced any real problems even with an unmodified MSWind95. The problems started when you installed additional software or hardware. (Yes, the 49.7 days bug existed, but it doesn't exist in the final version of MSWind95. I've got a machine that's running that, and has been up for years. It doesn't get much use, but there are a couple of abandoned programs that I can't export data from.)

      --

      I think we've pushed this "anyone can grow up to be president" thing too far.
    10. Re:queue the.. by roc97007 · · Score: 2

      "(psshsquawk)This is the Captain speaking, we are cruising at 30,000 feet, have a bit of a tail wind and will be in San Francisco a little ahead of schedule. ...Ummm... Ah.... I'm putting the seatbelt sign on now. Please return to your seats as we reboot the airplane.(pssshsquawk)"

      --
      Oliver's law of assumed responsibility: If you're seen fixing it, you will be blamed for breaking it.
    11. Re:queue the.. by roc97007 · · Score: 1

      Only theoretical, though. Windows 9x would crash long before reaching this uptime.

      Well, in fairness, only if you tried to do something with it.

      --
      Oliver's law of assumed responsibility: If you're seen fixing it, you will be blamed for breaking it.
    12. Re:queue the.. by roc97007 · · Score: 1

      You're right. In a similar vein, I ride a Harley, and conversations always lead back to "it's not leaking oil, it's marking its territory!" Har. Har. Yes, Harleys used to leak oil. They were famous for it at one time. But they don't now, any more than any other motor vehicle does.

      Similarly, in all the years I've been using Windows 7, I've yet to have a hang or bluescreen, and I don't reboot my machine unless absolutely necessary. But people still make jokes about the Windows 49.7 day issue. Just goes to show, it takes a LONG time to live down a tremendous goof.

      --
      Oliver's law of assumed responsibility: If you're seen fixing it, you will be blamed for breaking it.
    13. Re:queue the.. by catchblue22 · · Score: 1

      I remember supporting an office with win95 and Access. I had tech support conversations that almost went like this:

      Him: My computer just crashed.

      Me: So what did you do then?

      Him: I rebooted it.

      Me: Well there's your problem. Reboot the computer again. Then tap the computer gently and pray to the god of your choice and reboot a third time...

      Him: ...Thanks. That worked.

      --
      This and no other is the root from which a tyrant springs; when first he appears as a protector - Plato (423 to 327 BC)
    14. Re:queue the.. by Aereus · · Score: 1

      This is where I then call Bullshit on anyone actually getting Windows 95 or 98 to run for 49 days. My average uptime before bluescreen was around 2 days...

    15. Re:queue the.. by Anonymous Coward · · Score: 1

      It's still an issue in some modern OSes

      All the TCP/IP ports that are in a TIME_WAIT status are not closed after 497 days from system startup in Windows Vista, in Windows 7, in Windows Server 2008 and in Windows Server 2008 R2

      https://support.microsoft.com/en-us/kb/2553549

    16. Re:queue the.. by angel'o'sphere · · Score: 1

      I had a win 95 and later a win 98 system, both were very stable, mainly used for development and only for occasional gaming, like Descent or Settlers or Warcraft (not WoW, the RTS).

      The win 98 one only crashed when it was time to replace the processor fan.

      I was impressed at that time about MS ...

      --
      Cost free eBook I read (by iBook/Kobo/Amazon/ObookO/Gutenberg etc.): "The Green Odyssey" by Philip Jose Farmer.
    17. Re: queue the.. by Anonymous Coward · · Score: 0

      This is a fully automated airplane. The cockpit is unmanned. There is nothing to worry about... Worry about... Worry about...

    18. Re:queue the.. by Puppet+Master · · Score: 1

      Only theoretical, though. Windows 9x would crash long before reaching this uptime.

      Well, in fairness, only if you tried to do something with it.

      Like.. Fly a plane?

      --
      The day Microsoft creates a product that doesn't suck, it will be known as the Microsoft Vaccuum Cleaner!
    19. Re:queue the.. by Anonymous Coward · · Score: 0

      I can't remember now whether it was IIS 3 or IIS 4 but one of those two versions would stop incrementing the time in their W3C logs after 40 days of uptime. Until Microsoft acknowledged the issue and released a hot fix we had "at" tasks restart IIS on hundreds of servers on the 1st of each month. You may laugh but "long uptime" issues are disturbingly common, even today.

    20. Re:queue the.. by toddestan · · Score: 1

      I've gotten a Windows 95 machine up to the 49.7 day limit. The key was that the machine was hooked up to some special scanner that didn't get used that much, so the computer spent 99% of its time idling at the desktop. Once I realized it was getting close I figured out when exactly it was going to hit the limit so I could witness what would happen. Which turned out to be nothing, until I clicked the mouse and it BSOD'd.

      I've also managed to get a Vista machine up to the 497 day limit. In that case the computer was still running okay other than the networking being hosed.

    21. Re: queue the.. by Talderas · · Score: 1

      "Release the landing gear Hal."

      "I can't do that Steven."

      --
      "Lack of speed can be overcome. In the worst case by patience." --Znork
    22. Re:queue the.. by Anonymous Coward · · Score: 0

      Having not learned from previous experience good ol Messysoft still had rollover issues in IIS 7.0:
      FIX: The IIS 7.0 performance counters stop updating after a Windows Vista system or a Windows Server 2008 system runs continuously for 30 to 45 days
      https://support.microsoft.com/en-us/kb/957448

    23. Re:queue the.. by sbaker · · Score: 1

      The real question is how long the reboot time is relative to the glide duration from 30,000 feet?

      --
      www.sjbaker.org
  2. Oh come on. by fisted · · Score: 1

    You should see what happens after -2147483648 days of upt-- oh wait.

    1. Re:Oh come on. by IndigoZulu · · Score: 5, Interesting

      It could be the overflow of a counter of 10ms intervals. There are 86400 seconds per day, so 8640000 10ms intervals per day ... 2147483648 / 8640000 = 248.55
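
The division above extends to the other uptime limits quoted in this thread. What follows is a minimal Python sketch, not anything taken from Boeing or Microsoft code: the helper name `days_until_overflow` and its parameters are invented for illustration, and it assumes nothing more than a counter that ticks at a fixed rate.

```python
SECONDS_PER_DAY = 86_400


def days_until_overflow(bits: int, ticks_per_second: int, signed: bool) -> float:
    """Days until a fixed-rate tick counter exhausts its non-negative range.

    Hypothetical helper, for illustration only.
    """
    # A signed type gives up one bit to the sign.
    usable_bits = bits - 1 if signed else bits
    return 2 ** usable_bits / (ticks_per_second * SECONDS_PER_DAY)


# 32-bit signed counter of 10 ms ticks (the guess in the comment above).
signed_10ms = days_until_overflow(32, 100, signed=True)
# The same counter declared unsigned.
unsigned_10ms = days_until_overflow(32, 100, signed=False)
# 32-bit unsigned counter of 1 ms ticks.
unsigned_1ms = days_until_overflow(32, 1000, signed=False)

print(f"{signed_10ms:.2f} {unsigned_10ms:.2f} {unsigned_1ms:.2f}")
```

The three results come out at roughly 248.55, 497.10 and 49.71 days, which line up with the 248-day, 497-day and 49.7-day figures mentioned elsewhere in the discussion.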

    2. Re:Oh come on. by Anonymous Coward · · Score: 0

      Notice
      I will never fly in a plane that doesn't FLY-BY-WIRE.

    3. Re:Oh come on. by Anonymous Coward · · Score: 0

      Yeah, if I'm remembering right, 248 days is roughly the rollover point for a 32-bit signed integer counter measured in 100ths of a second, although it doesn't seem likely a plane would be left running for that long without the power being cycled during maintenance at some point.

    4. Re:Oh come on. by Mirar · · Score: 2

      Oh, so they can make it fine for 497.10 days by changing the type to unsigned!

    5. Re:Oh come on. by SJHillman · · Score: 3, Informative

      Which is apparently what Windows does:

      https://www.ctm-it.com/it-supp...

      You'd think they would have learned since Windows 95/98 did the same thing.

      https://support.microsoft.com/...

      But hey, at least it goes 10 times as long now.

    6. Re:Oh come on. by Anonymous Coward · · Score: 1

      The least risky change might be to have the system refuse to consider a takeoff when you're above half of that limit and then require a power cycle at that point, while on the ground. If you change to, say, I64, then everything that interacts with that time must be updated and validated. That may of course be the long-term solution, though it would seem slightly riskier than just forcing the repower while the plane is on the ground. There could be other valid reasons to repower it as well, and again that may be done in a maintenance check anyway.
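
The "refuse takeoff past half the limit" idea can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the counter width, the 10 ms tick rate, and the names `GUARD_TICKS` and `power_cycle_required` are hypothetical and do not describe the actual 787 software.

```python
COUNTER_LIMIT_TICKS = 2 ** 31            # assumed: signed 32-bit counter
TICKS_PER_SECOND = 100                   # assumed: 10 ms ticks
GUARD_TICKS = COUNTER_LIMIT_TICKS // 2   # half the range, roughly 124 days


def power_cycle_required(uptime_ticks: int, on_ground: bool) -> bool:
    """Demand a power cycle once half the counter range is used.

    The demand is only raised on the ground, so the guard itself can
    never be the thing that takes power away in flight.
    """
    return on_ground and uptime_ticks >= GUARD_TICKS


guard_days = GUARD_TICKS / (TICKS_PER_SECOND * 86_400)
print(f"guard trips after about {guard_days:.0f} days of uptime")
```

Because the check only reads the counter, nothing that consumes the time value has to change, which is the low-risk property the comment is after.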

    7. Re:Oh come on. by jones_supa · · Score: 2

      I am not completely familiar with the matter, but I remember hearing that using signed types in some situations can be a better choice, even when the value would normally be used to represent only a non-negative value. It could make overflows more obvious and calculating deltas might be easier? If someone actually knows about this stuff, feel free to chime in.

    8. Re:Oh come on. by Anonymous Coward · · Score: 1

      Still pretty scary that a simple counter like that can cause a chain of events that chucks off the power completely. How can this be possible?

    9. Re:Oh come on. by fisted · · Score: 3, Informative

      In C, overflowing a signed integer type is undefined behaviour; unsigned types wrap around to zero in a defined manner.
      Of course, either is often undesired, but the latter at least doesn't allow basically anything to happen.

    10. Re:Oh come on. by Anonymous Coward · · Score: 0

      Still pretty scary that a simple counter like that can cause a chain of events that chucks off the power completely. How can this be possible?

      Because software engineering is taken as a joke.
      The tools are shit, the methodologies are shit. Most programmers are sourced in countries that are... (well the less said the better). Why are companies allowed to thrust alpha/beta software onto the market and wash their hands of any problems (i.e. "we absolve ourselves of all responsibility for this software if used in your systems" in all software licenses)?
      The entire software industry, starting with Microsoft and Google, should be tarred and feathered and burned at the stake. Then educate a new generation of software developers who know what the fuck they're doing. And implement civil and criminal liabilities for buggy software that puts lives at risk.

    11. Re:Oh come on. by Anonymous Coward · · Score: 1

      Which makes you wonder whether there's a maintenance check that would force a power cycle before you get to the errant condition. So it's the functional equivalent of "dividing by zero shuts the power off" when there's a denominator check on the line above.

    12. Re:Oh come on. by plopez · · Score: 2

      It doesn't matter what country programmers come from, in my experience too many programmers have no clue about reality outside of their cube. They are building software for things they do not understand. I am going to rant about this in another thread so I will leave it at that for now.

      --
      putting the 'B' in LGBTQ+
    13. Re: Oh come on. by Anonymous Coward · · Score: 0

      I'm sorry -- what?

    14. Re:Oh come on. by fuzzyfuzzyfungus · · Score: 2

      Man, if only we could afford to use 64 bit values for things. I realize that transistors are simply too expensive right now; but perhaps, in the future, the miracles of science will make this possible...

    15. Re:Oh come on. by dunkelfalke · · Score: 4, Funny

      And this is why C should never be used for mission critical software.

      --
      "It's such a fine line between stupid and clever" -- David St. Hubbins, Spinal Tap
    16. Re:Oh come on. by terrab0t · · Score: 1

      That was my guess. As soon as I read the problem I thought of the bug in the Patriot missile software that the US ran into during the first Gulf War back in 1991.

      In that case it was even worse. From the page I linked:

      They told the Army that the Patriots suffered a 20% targeting inaccuracy after continuous operation for 8 hours.

    17. Re:Oh come on. by catchblue22 · · Score: 1

      Still pretty scary that a simple counter like that can cause a chain of events that chucks off the power completely. How can this be possible?

      Yeah. Imagine if it happened on final approach.

      --
      This and no other is the root from which a tyrant springs; when first he appears as a protector - Plato (423 to 327 BC)
    18. Re: Oh come on. by RightwingNutjob · · Score: 1

      Not like Ada, where you can fuck up much easier by choosing the wrong floating point type for your altitude indicator.

    19. Re:Oh come on. by Anonymous Coward · · Score: 0

      And this is why C should never be used for mission critical software.

      So what should be used instead? A "safe" language that throws an exception or aborts instead? You're still stuck with a thread/process/whatever doing something unexpected that might not be recoverable. At best you'll get a nice error screen for a few milliseconds before the power goes out anyway. The real problem here is the unwarranted dependency of the power-generation code on the absolute time. "Safe" languages can't help with poor algorithm choices. Maybe they did it to keep the frequency right. But in cases like that, it is _always_ coded up in terms of delta-times, not absolutes, for exactly this reason. Someone wasn't thinking, and neither were several others when the code review came around.
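
The delta-time point can be made concrete with a small Python model of a wrapping 32-bit tick counter. This is only a sketch under stated assumptions: the counter width and the names `elapsed_ticks`, `absolute_deadline_passed` and `delta_deadline_passed` are hypothetical, chosen to contrast the two styles, and say nothing about how the real Generator Control Unit code is written.

```python
MASK32 = 0xFFFFFFFF  # models a 32-bit unsigned tick counter


def elapsed_ticks(start: int, now: int) -> int:
    """Ticks elapsed between two readings of a wrapping 32-bit counter.

    Correct as long as the true interval is shorter than 2**32 ticks.
    """
    return (now - start) & MASK32


def absolute_deadline_passed(now: int, deadline: int) -> bool:
    """Fragile style: compares raw counter values, so it breaks across a wrap."""
    return now >= deadline


def delta_deadline_passed(start: int, now: int, timeout: int) -> bool:
    """Robust style: compares an elapsed interval against the timeout."""
    return elapsed_ticks(start, now) >= timeout


start = 0xFFFFFFF0                    # 16 ticks before the counter wraps
timeout = 0x20                        # 32 ticks
deadline = (start + timeout) & MASK32  # wraps around to 0x10
now = 0xFFFFFFF8                      # only 8 ticks after start

print(absolute_deadline_passed(now, deadline))    # fires early: True
print(delta_deadline_passed(start, now, timeout))  # still waiting: False
```

With the absolute comparison the timeout appears to expire 24 ticks early because the deadline has wrapped past zero; the delta version gives the right answer on both sides of the wrap.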

    20. Re:Oh come on. by Anonymous Coward · · Score: 0

      We found the butthurt C programmer. :)

    21. Re:Oh come on. by Anonymous Coward · · Score: 0

      I don't think this has much to do with C per se. It sounds more like a simple logic error of suddenly having negative uptimes where the software doesn't expect them. That would also happen with any language where signed overflow is defined to wrap around. It would also happen with any language where such an overflow triggers an exception (since the programmers obviously didn't expect it to happen, it's highly unlikely they would handle it correctly).

      The amazing thing is that they didn't (properly) use some static analysis tool to diagnose and fix this. Finding possible arithmetic overflows is really not airplane science. Oh well, more business for Airbus, I guess. They are known to do proper program analysis.
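
The "negative uptime" scenario is easy to reproduce in a language-neutral way. The Python sketch below models a language whose signed 32-bit arithmetic is defined to wrap around in two's complement; `wrapping_increment` and `generator_healthy` are hypothetical names, and the sanity check is an invented stand-in for whatever logic the real software might contain.

```python
INT32_MAX = 2 ** 31 - 1


def wrapping_increment(ticks: int) -> int:
    """Add one tick and reduce the result to 32-bit two's complement."""
    return (ticks + 1 + 2 ** 31) % 2 ** 32 - 2 ** 31


def generator_healthy(uptime_ticks: int) -> bool:
    """Invented sanity check that silently assumes uptime is never negative."""
    return uptime_ticks >= 0


before = INT32_MAX
after = wrapping_increment(before)

print(before, generator_healthy(before))
print(after, generator_healthy(after))
```

One tick past `INT32_MAX` the counter becomes -2147483648 and the check flips from healthy to unhealthy, with no undefined behaviour and no exception involved. The bug is in the assumption, not in the language.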

    22. Re:Oh come on. by delt0r · · Score: 1

      In both C and C++ just about everything in the spec has the words "undefined behaviour".

      --
      If information wants to be free, why does my internet connection cost so much?
    23. Re:Oh come on. by stooo · · Score: 1

      >> ... How can this be possible?

      It's called common mode failure. You have multiple identical redundant computers running the same software. All of them have the same bugs. Boom.
      an example here : http://www.around.com/ariane.h...

      --
      aaaaaaa
    24. Re:Oh come on. by Anonymous Coward · · Score: 0

      A similar problem happened on SCO Open Server that brought the license manager daemon into an infinite loop after 248 days of uptime.
      http://www.linuxmisc.com/20-sco-unix/4e68e6bb799bcdc8.htm

    25. Re:Oh come on. by CauseBy · · Score: 1

      I didn't know that. Is that considered a feature or a bug? Why not just define it to wrap around to the min int value?

    26. Re:Oh come on. by Anonymous Coward · · Score: 0

      And in this application resulting in, wait for it... wait for it... The Blue Screen of Death! Badabing! Thank you very much I'll be here all week. Enjoy the veal!

    27. Re:Oh come on. by fisted · · Score: 1

      I'm not sure if one could think of it in terms of 'feature' or 'bug', but if anything, i'd go for feature. There are at least 3 major ways to represent negative numbers (1s complement, 2s complement, sign-magnitude), and overflowing the representation of INT_MAX doesn't necessarily give anything close to INT_MIN (e.g. sign-magnitude would roll over to negative zero).

      So in order to be as widely adaptable as possible, C can't assume a particular way of how negative numbers are represented, so defining the consequences of overflowing is intentionally omitted
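
To see why "one past INT_MAX" has no single natural answer, the Python sketch below decodes the same 32-bit pattern under the three representations named above. It assumes a plain binary increment of the `INT_MAX` bit pattern, which is itself a simplification (real hardware might trap instead), and the decoder function names are made up for this example.

```python
BITS = 32
SIGN_BIT = 1 << (BITS - 1)
MASK = (1 << BITS) - 1


def as_twos_complement(pattern: int) -> int:
    return pattern - (1 << BITS) if pattern & SIGN_BIT else pattern


def as_ones_complement(pattern: int) -> int:
    # A negative value is the bitwise complement of its magnitude.
    return -(~pattern & MASK) if pattern & SIGN_BIT else pattern


def as_sign_magnitude(pattern: int) -> int:
    magnitude = pattern & (SIGN_BIT - 1)
    # Python has no integer negative zero, so "-0" decodes to plain 0.
    return -magnitude if pattern & SIGN_BIT else magnitude


int_max_pattern = SIGN_BIT - 1                 # 0x7FFFFFFF
after_int_max = (int_max_pattern + 1) & MASK   # 0x80000000

print(as_twos_complement(after_int_max))
print(as_ones_complement(after_int_max))
print(as_sign_magnitude(after_int_max))
```

The one pattern reads as -2147483648, -2147483647 or (negative) zero depending on the representation, which is the portability problem the comment describes.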

    28. Re:Oh come on. by fisted · · Score: 1

      That being said, I think it wouldn't hurt making it implementation-defined behaviour instead (which isn't much better from a portability point of view, but at least requires implementations to document their choice)

    29. Re:Oh come on. by shutdown+-p+now · · Score: 1

      The main reason why people recommend it is because of what happens if you mix signed and unsigned. If they are of the same size (e.g. signed int and unsigned int), then according to the spec, the result will be unsigned. So you divide, say, -2 by 1u, and get something very unexpected. If you always use the same signedness, then you can dodge this problem, and in general you do want to represent negative numbers every now and then, hence the default is signed.

      In practice it doesn't work so well simply because so much of the language and the standard library uses unsigned anyway. For example, sizeof is unsigned, and so is strlen(), and in C++, size() on all the standard container types, including string. So if you want to write C or C++, you have to deal with signed/unsigned mismatch anyway.
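
The `-2 / 1u` surprise can be imitated outside C. The Python sketch below mimics the conversion C applies when a signed int meets an unsigned int of the same width; it assumes a 32-bit `int`, and the helper names `to_unsigned` and `c_style_mixed_divide` are invented for illustration rather than taken from any library.

```python
UINT_BITS = 32  # assumption: int and unsigned int are both 32 bits wide


def to_unsigned(value: int) -> int:
    """Mimic C's signed-to-unsigned conversion (reduction modulo 2**32)."""
    return value % (1 << UINT_BITS)


def c_style_mixed_divide(signed_operand: int, unsigned_operand: int) -> int:
    """Mimic `signed / unsigned` in C: the signed operand is converted first."""
    return to_unsigned(signed_operand) // unsigned_operand


result = c_style_mixed_divide(-2, 1)
print(result)  # 4294967294 rather than -2
```

The division itself is harmless; the damage is done one step earlier, when -2 is reinterpreted as a very large unsigned value.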

    30. Re:Oh come on. by david_thornley · · Score: 1

      You have to understand the history. C was designed as a machine-independent system implementation language, which meant that it had to have as good performance as possible for commonly used things like integers, and it ran on a much wider range of processors than you'd expect to run into nowadays. The processors could have ones' complement, two's complement, or sign-magnitude for negative integer values. They could be designed to halt execution and raise some sort of signal on integer overflow, or designed to ignore it. Machine-addressable units of memory could range from one bit to 60 bits. Given the variety in what processors would do, any specific behavior would kill performance for processors that didn't match the behavior, so they left it as "undefined".

      There were other sorts of incompletely specified behaviors. "Implementation-defined" usually referred to fairly minor differences, such as how long an "int" was. Unspecified behavior usually referred to cases where there would be a few obvious choices, such as order of evaluation of function parameters. Whether or not it was a good idea, C generally labeled more complex potential incompatibilities as undefined. Personally, I'd like to see less "undefined behavior", substituting "implementation-defined" or "unspecified" as much as possible.

      --
      "When you have eliminated the unacceptable, whatever is left, however improbable, must be the truthiness" - Holmes
    31. Re:Oh come on. by Anonymous Coward · · Score: 0

      True. Nasal demons on an airplane are a Very Bad Thing.

  3. Have you tried turning it off and on again? by Anonymous Coward · · Score: 5, Funny

    Finally!

    IT support advice that's useful!

    1. Re:Have you tried turning it off and on again? by rjniland · · Score: 4, Interesting

      Yes, but perform a clean systems shut down BEFORE turning off power.

      I was on an airliner once that crashed at the gate, prior to departure.

      Ground power was disconnected before they had spun up the APU. Lights out. Lights on. ... Several minutes later we get an announcement that we'd have to wait for a backup plane, which took 45 minutes to arrange.

      They were unable to reboot the airliner.
      Robust systems design wasn't a phrase that came to mind.

  4. This is Boeing Tech Support by mikeabbott420 · · Score: 4, Funny

    "have you tried turning it off and then back on?"

    --
    This program was made possible by a grant from the Ultra-Humanite, and viewers like you.
    1. Re:This is Boeing Tech Support by Anonymous Coward · · Score: 0

      I know of a non-functioning printer that required two calls. First because it was not plugged in, second because it had no paper.

    2. Re:This is Boeing Tech Support by Anonymous Coward · · Score: 0

      How about CNTL-ALT-DEL?

    3. Re:This is Boeing Tech Support by sphealey · · Score: 1

      The first time I was on a new plane where the pilots did that at the gate to "fix a computer glitch" (~1998) I was utterly terrified.

      sPh

    4. Re:This is Boeing Tech Support by fahrbot-bot · · Score: 1

      "have you tried turning it off and then back on?"

      • Customer: How do I do that?
      • Tech Support: Use the big red switch at the back of the fuselage, just under the elevator. Flip it to 0/Off, count to 10 and flip it back to 1/On.

      True story: Back in the early 1980s, I actually had a long-distance phone call with someone in which I was the "tech support" part of the above conversation. ... Me: "Are you sitting in front of the PC? Lean to your right... See that big red switch at the back of the case? ..."

      --
      It must have been something you assimilated. . . .
    5. Re:This is Boeing Tech Support by minstrelmike · · Score: 1

      How about CNTL-ALT-DEL?

      Yup. Reboot the plane every time it's at the gate.

    6. Re:This is Boeing Tech Support by jcdr · · Score: 1

      I remember that in 1996 the pilot of a FBW aircraft said that one computer had displayed an error, that they had restarted the computer and the error was not displayed again, so all was nominal and we could go. He didn't detail the error displayed. Maybe it was minor, maybe not. The 6-hour flight went without any problem.

    7. Re:This is Boeing Tech Support by plopez · · Score: 1

      "Thank you for calling Boeing tech support. Did you know most of your questions can be answered online in our FAQ section? Simply go to www.boeingcares.com/customercare/support/FAQ. If this is a ground problem press 1, if it is a maintenance question please press 2, if this is about the galley hotbox recall press 3, for in-flight problems press 4"

      *beep*
      "You have selected in-flight problem. For engine fires press 1, for structural failure please press 2, for fuel system faults please check 3, for all other in-flight challenges please press 4."

      *beep*
      "You have selected other. Your call is very important to us; you will be transferred to the next available customer care representative. Did you know most of your questions can be answered online in our FAQ section? Simply go to www.boeingcares.com/customercare/support/FAQ. Due to unusually high call volumes your expected hold time is 15 minutes. Please hold..."

      --
      putting the 'B' in LGBTQ+
    8. Re:This is Boeing Tech Support by Anonymous Coward · · Score: 0

      That's nothing.

      I developed the software for a general aviation telephone/data system back in the early 90s. When we took it up for our first test flight, they didn't bother to wire it up to a separate circuit breaker. So, while we were flying, when we needed to reboot the test system, the pilot would turn the master circuit breaker off and back on, which killed everything (including the engines).

      Scared the shit out of me, but apparently it was no big deal.

    9. Re:This is Boeing Tech Support by fuzzyfuzzyfungus · · Score: 3, Funny

      NTSB investigators reported the cause of the crash as "Controlled reboot into terrain".

    10. Re:This is Boeing Tech Support by Anonymous Coward · · Score: 0

      The first time I was on a new plane where the pilots did that at the gate to "fix a computer glitch" (~1998) I was utterly terrified.

      sPh

      How did this happen? Big airplanes at the gate have their engines turned off and run on the APU for the avionics, etc. They just start the engines at pushback, so if they had to restart the APU, that's something you won't even notice unless you're at the back of the airplane. And even if you realise that, why would you be terrified? The airplane was at the gate, for God's sake.

    11. Re:This is Boeing Tech Support by daveime · · Score: 1

      Cannot find CNTL key, please suggest alternative.

  5. Very unlikely to be triggered in the field by Brandano · · Score: 2, Informative

    A commercial plane will most probably go through several maintenance events and checks during that sort of time frame, where cycling the power is part of the procedure.

    1. Re:Very unlikely to be triggered in the field by hawguy · · Score: 4, Insightful

      A commercial plane will most probably go through several maintenance events and checks during that sort of time frame, where cycling the power is part of the procedure.

      It's very reassuring to know that it probably won't happen.

    2. Re:Very unlikely to be triggered in the field by antiperimetaparalogo · · Score: 1

      A commercial plane will most probably go through several maintenance events and checks during that sort of time frame, where cycling the power is part of the procedure.

      Yes, but when you have people taking pride for their desktop's uptime... well, better safe than sorry!

      --
      Antisthenes: "Wisdom begins by examining the words/names." - excuse my English, i am (slightly...) better with my Greek!
    3. Re:Very unlikely to be triggered in the field by Anonymous Coward · · Score: 0

      You will probably not be struck by lightning, but I can't guarantee that it won't happen. You'd better stay in the basement for the foreseeable future.

      Only marketing would promise that something can't happen. For people who have morals it is important to be truthful.

    4. Re:Very unlikely to be triggered in the field by compro01 · · Score: 2

      You will probably not be struck by lightning, but I can't guarantee that it won't happen.

      Actually, when talking about airliners, getting struck by lightning is a fairly common occurrence. A typical airliner experiences a lightning strike about once a year.

      --
      upon the advice of my lawyer, i have no sig at this time
    5. Re:Very unlikely to be triggered in the field by confused+one · · Score: 5, Interesting

      If it ever happened on a plane, then it means that the maintenance was intentionally skipped. If they reach 248 days of continuous operation then a number of significant maintenance cycles have been skipped (some 23-25 inspection / maintenance cycles that generally require shutting down the electrical system). The generators in question are attached to the engines. The engines have an overhaul schedule that is shorter than 248 days of continuous operation. If they managed to reach this point, then the major maintenance cycles have been skipped and the engines are long overdue for a tear-down inspection and overhaul. Any plane that could reach this point, 248 days of continuous operation while missing all of the required maintenance, is not a plane (or an airline, for that matter) that anyone should be flying on.

    6. Re:Very unlikely to be triggered in the field by Mirar · · Score: 1

      Waiting 248 days on the tarmac before flight... Improbable. I hope.

    7. Re:Very unlikely to be triggered in the field by kthreadd · · Score: 2

      If it ever happened on a plane, then it means that the maintenance was intentionally skipped.

      And that would of course never happen.

    8. Re:Very unlikely to be triggered in the field by Anonymous Coward · · Score: 0

      +1

      The probability of getting struck by lightning is significantly different between being on the ground at resting/walking pace most of the time where thunderstorms might happen every couple of months (depending where you are) and flying *directly through the lightning cloud* of a thunderstorm every couple of days.

    9. Re:Very unlikely to be triggered in the field by Anonymous Coward · · Score: 1

      no he means intentionally skipped 25 times.

      This is going from "Airline that has some dodgy fucking maintenance crew" to "Airline that just fired the ground support staff" and you won't be able to get on the plane because the FAA will deregister them.

      idiot.

    10. Re: Very unlikely to be triggered in the field by JWW · · Score: 1

      Yes, but if your desktop fails it doesn't fall out of the sky.... most of the time

    11. Re:Very unlikely to be triggered in the field by sphealey · · Score: 1

      The entire world isn't the US/Japan/EU. While most airlines outside that region who operate 787s run tight operations (Ethiopian for example is often mentioned as very well-run with a strong safety culture), there are a few who do not.

      That said, in the few instances where less organized airlines have managed to acquire 787s they are probably being shut down 2-3 times/week, much less every 9 months.

      sPh

    12. Re:Very unlikely to be triggered in the field by Anonymous Coward · · Score: 2, Funny

      You must not fly United.

    13. Re:Very unlikely to be triggered in the field by Anonymous Coward · · Score: 0

      Even if I am standing out in the middle of a flat open field during a thunderstorm, there is a slim chance that I won't get hit. And if I do get hit, it is still possible that I could survive. The chances of that happening may be minuscule, but they exist.

      If the plane doesn't end up going through maintenance, there is a one-hundred percent chance of everyone on board dying. And I don't trust the airlines enough to not skimp on maintenance as much as possible.

    14. Re:Very unlikely to be triggered in the field by mrchaotica · · Score: 1

      Hey, it could be possible on planes flying the Tripoli - Mogadishu - Kabul route!

      --

      "[Regarding the 'cloud,'] ownership was what made America different than Russia." -- Woz

    15. Re:Very unlikely to be triggered in the field by plopez · · Score: 1

      Is that a cold boot or a warm boot?

      --
      putting the 'B' in LGBTQ+
    16. Re:Very unlikely to be triggered in the field by JWSmythe · · Score: 1

      That's what I was thinking. I didn't look it up, but I'd be pretty sure that the maintenance interval is shorter than 5,952 hours.

      --
      Serious? Seriousness is well above my pay grade.
    17. Re:Very unlikely to be triggered in the field by Anonymous Coward · · Score: 0

      Every aspect of life is reassuring like that.

    18. Re: Very unlikely to be triggered in the field by Sloppy · · Score: 1

      In an alternate timeline, Keith Moon found the 21st century to be full of challenges.

      --
      As copyright owner of this comment, I authorize everyone to defeat any technological measure which limits access to it.
    19. Re:Very unlikely to be triggered in the field by Anonymous Coward · · Score: 0

      Air India would just publicly ridicule to get more concessions from Boeing. So it's win-win, Boeing gets another sale because of the incompetence of their customer.

    20. Re: Very unlikely to be triggered in the field by Anonymous Coward · · Score: 0

      Actually it's the Tripoli Benghazi Tunis Dubai Mogadishu Istanbul Kabul route.

    21. Re:Very unlikely to be triggered in the field by Rich0 · · Score: 1

      Sure, but if somebody did operate an airliner in this manner, I imagine many other components would be failing, creating numerous hazardous conditions.

      Maintenance schedules on big things like airliners aren't just created arbitrarily. If the manual says to inspect the turbine blades every n hours then somebody probably did a study that shows that at x% of n hours you start to get measurable deterioration. If they could make the intervals longer they would - it would be a major selling point for the plane.

      Sure, this software bug should be fixed, but in general if you're going to allow companies to ignore the manufacturer's guidelines, then you can't really hold the manufacturer responsible for failure.

    22. Re:Very unlikely to be triggered in the field by Idarubicin · · Score: 1

      A commercial plane will most probably undergo through several maintenance events and checks during that sort of time frame, where cycling the power is part of the procedure.

      It's very reassuring to know that it probably won't happen.

      As other posters have noted, 248 days of operation means skipping twenty-plus maintenance and inspection cycles, plus missing one or more engine overhauls. That sucker's going to fall out of the sky due to a hardware problem before the software error gets the chance.

      Even in the absence of regular, scheduled, required maintenance, there will be hardware failures due to stuff wearing out, with sufficient frequency to force reboots at less-than-eight-month intervals. Honestly, the FAA is going to ground any airline that was so lax as to get within six months of tripping over this bug.

      That's not to say that this bug is a good or acceptable thing, nor that something like it couldn't have much more serious effects. But this particular error is a non-issue from a real-life consequences standpoint.

      --
      ~Idarubicin
    23. Re:Very unlikely to be triggered in the field by thegarbz · · Score: 1

      If it ever happened on a plane, then it means that the maintenance was intentionally skipped. If they reach 248 days of continuous operation then a number of significant maintenance cycles have been skipped (some 23-25 inspection / maintenance cycles that generally require shutting down the electrical system). The generators in question are attached to the engines. The engines have a overhaul schedule that is shorter than 248 days of continuous operation. If they managed to reach this point, then the major maintenance cycles have been skipped and the engines are long overdue for a tear down inspection and overhaul. Any plane which could reach this point, 248 days of continuous operation missing all of the required maintenance; this is not a plane (or an airline for that matter) which anyone should be flying on.

      Are you trying to say that to get to this point required maintenance would need to be skipped?

    24. Re:Very unlikely to be triggered in the field by hawguy · · Score: 2

      If it ever happened on a plane, then it means that the maintenance was intentionally skipped. If they reach 248 days of continuous operation then a number of significant maintenance cycles have been skipped (some 23-25 inspection / maintenance cycles that generally require shutting down the electrical system). The generators in question are attached to the engines. The engines have a overhaul schedule that is shorter than 248 days of continuous operation. If they managed to reach this point, then the major maintenance cycles have been skipped and the engines are long overdue for a tear down inspection and overhaul. Any plane which could reach this point, 248 days of continuous operation missing all of the required maintenance; this is not a plane (or an airline for that matter) which anyone should be flying on.

      You would think that if this situation was unlikely to ever happen in practice that the FAA wouldn't have deemed it necessary to issue an AD requiring that the GCUs be power cycled at intervals no longer than 120 days. You'd think they'd already be aware of required maintenance intervals that require powercycling the GCUs, and they waived the usual comment period before issuing the AD due to the perceived imminent danger.

    25. Re:Very unlikely to be triggered in the field by pem · · Score: 1

      Meh. Those will get shot down well before 248 days are up.

    26. Re:Very unlikely to be triggered in the field by Anonymous Coward · · Score: 0

      What if they manage to do the engine maintenance without shutting everything down? I know plenty of computer people who grumble when they have to reboot for anything - even a kernel replacement.

    27. Re:Very unlikely to be triggered in the field by StikyPad · · Score: 1

      My locality posts speed limit signs in residential areas despite the fact that there are statewide speed limits of 25MPH in residential areas, and despite the fact that drivers are required to know this to pass the driving test.

      Redundant != pointless or worthless. In both cases, it reduces the operator's ability to say "I had no idea!"

    28. Re:Very unlikely to be triggered in the field by tverbeek · · Score: 1

      "most probably".

      Relax.

      --
      http://alternatives.rzero.com/
  6. These problems keep resurfacing by Anonymous Coward · · Score: 0

    It's troubling that, with all the focus there has been on this new battery backup, we continue to see problems crop up with its reliability and safety. I think it clearly reflects a lack of detailed testing, and maybe someone at the FAA needs to say that something is still not right here. Then you have people like Elon Musk touting lithium-ion technology for home energy backup, and with all the lithium battery recalls for notebook PCs you have to ask yourself whether a far larger storage system for a home or an aircraft is proper and safe. At this point in time I would not want a lithium battery with the capacity of a home backup system in my house. At some point maybe they will be proven safe enough, but as with the 787, I don't want to be the guinea pig that finds out.

    1. Re:These problems keep resurfacing by Anonymous Coward · · Score: 0

      This has nothing to do with battery backups, but may have something to do with you being retarded.

  7. Lesson Here by TechNeilogy · · Score: 1

    Always, always, always do the math on counters and give yourself orders of magnitude of space. Figured this out the hard way once (fortunately not in a situation where safety was a concern).

    --
    "The wisdom of the Patriarchs was that they *knew* they were fools." --Master Foo
    1. Re:Lesson Here by fisted · · Score: 1

      If you did the math, you don't need excess space. If you need excess space, you're just shifting the day of failure into the future. Yes, perhaps far enough, but still.

    2. Re:Lesson Here by Megane · · Score: 2

      Also, use the difference of the current time minus the start time, instead of computing the end time and using a simple less than/greater than comparison. This properly handles wraparounds, and only has a problem with differences more than half of the full range. (so don't keep comparing the time after it's ended!)
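      A minimal sketch of this idea in Python, assuming a 32-bit unsigned tick counter (the names `elapsed_ticks`, `has_expired` and `has_expired_naive` are made up for illustration). This unsigned variant stays correct for any interval shorter than one full wrap; the half-range limit applies when the difference is interpreted as a signed value.

```python
MASK32 = 0xFFFFFFFF  # model a 32-bit unsigned tick counter (assumption)


def elapsed_ticks(now: int, start: int) -> int:
    """Ticks elapsed from start to now, correct across one wraparound."""
    return (now - start) & MASK32


def has_expired(now: int, start: int, timeout: int) -> bool:
    """True once at least `timeout` ticks have passed since `start`."""
    return elapsed_ticks(now, start) >= timeout


def has_expired_naive(now: int, start: int, timeout: int) -> bool:
    """Fragile version: precompute the end time and compare directly."""
    end = (start + timeout) & MASK32
    return now >= end


# A timer started 16 ticks before the counter wraps, with a 32-tick timeout.
start = 0xFFFFFFF0
print(has_expired(0xFFFFFFF8, start, 0x20))        # 8 ticks in
print(has_expired_naive(0xFFFFFFF8, start, 0x20))  # 8 ticks in, end time wrapped
print(has_expired(0x10, start, 0x20))              # 32 ticks in, after the wrap
```

      Eight ticks in, the difference-based check reports "not expired" while the naive comparison already fires, because its precomputed end time wrapped to a small number.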

      --
      #naabhaprzrag, #sverubfr-000, #agi-fcbafberq, negvpyr[pynff*=' negvpyr-ary-'] { qvfcynl: abar !vzcbegnag; }
    3. Re:Lesson Here by tompaulco · · Score: 1

      Always, always, always do the math on counters and give yourself orders of magnitude of space. Figured this out the hard way once (fortunately not in a situation where safety was a concern).

      As far as I am concerned, there are three valid quantities in programming. Zero, one and unlimited.

      --
      If you are not allowed to question your government then the government has answered your question.
    4. Re:Lesson Here by HornWumpus · · Score: 1

      Good luck using a float as a counter. It won't overflow, but will eventually stop counting.

      The trick is knowing what you are doing. Which means erasing that 'three valid quantities' thinking.
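      The float remark can be checked directly. The sketch below, with hypothetical helper names, emulates 32-bit floats with `struct` and shows that adding 1.0 stops changing the value at 2^24 (and at 2^53 for 64-bit floats). At an assumed 100 Hz tick rate, a 32-bit float counter would stall after roughly 46.6 hours.

```python
import struct


def to_float32(x: float) -> float:
    """Round a Python float (64-bit) to the nearest 32-bit float value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def float32_increment(x: float) -> float:
    """Add 1.0 with the result rounded back to 32-bit precision."""
    return to_float32(x + 1.0)


LIMIT32 = float(2 ** 24)  # 16,777,216: above this, float32 spacing exceeds 1
LIMIT64 = float(2 ** 53)  # same limit for 64-bit floats

# The counter no longer advances once it reaches the limit.
stuck32 = float32_increment(LIMIT32) == LIMIT32
stuck64 = (LIMIT64 + 1.0) == LIMIT64

# Hours until a 100 Hz float32 counter stalls (the tick rate is an assumption).
hours_to_stall = LIMIT32 / 100 / 3600
```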

      --
      John McAfee 'It was like that time I hired that Bangkok prostitute; to do my taxes, while I fucked my accountant'
    5. Re:Lesson Here by dgtangman · · Score: 1

      Good idea in principle, not always helpful in practice. I diagnosed an application failure on a UNIX system some years back that resulted from using the system's "time since last boot" function as a real-time clock with greater than one-second precision. We discovered that in order to prevent the terrible things that would happen if the 32-bit signed counter of 0.01-second intervals ever overflowed, the UNIX vendor had programmed the reported time to stop changing when it reached 2^31-1. Since the system provided no other interface that provided elapsed time with greater than one-second precision, we ultimately had to tell those of our customers with systems from that vendor to be sure to reboot their servers at least every six months.

    6. Re:Lesson Here by ttucker · · Score: 1

      If you did the math, you don't need excess space. If you need excess space, you're just shifting the day of failure into the future. Yes, perhaps far enough, but still.

      What math would you do to determine exactly how high a counter should count?

      Would using a 64-bit long on a millisecond counter be lazy programming?

    7. Re:Lesson Here by Anne+Thwacks · · Score: 1
      The correct answer is: During pre-flight ground checks, detect all counters at imminent risk of overflowing*, and flag requirement for corrective action at next maintenance. Probably should be checked at all routine services as well.

      * "imminent risk of overflowing" probably means less than four routine maintenance intervals remaining, but consult the requirements document for more detail.
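      A rough sketch of such a check, in Python for readability. Everything here is hypothetical: the `Counter` record, the field names, and the four-interval margin taken from the footnote above. A real system would follow its requirements document.

```python
from dataclasses import dataclass


@dataclass
class Counter:
    name: str
    value: int       # current tick count
    max_value: int   # largest value the counter can hold
    tick_hz: float   # ticks per second


def seconds_until_overflow(c: Counter) -> float:
    """Time left before the counter reaches its maximum value."""
    return (c.max_value - c.value) / c.tick_hz


def counters_at_risk(counters, maintenance_interval_s, intervals_margin=4):
    """Names of counters that overflow within `intervals_margin` intervals."""
    threshold = intervals_margin * maintenance_interval_s
    return [c.name for c in counters if seconds_until_overflow(c) < threshold]


DAY = 24 * 3600
INT32_MAX = 2 ** 31 - 1
counters = [
    Counter("fresh", 0, INT32_MAX, 100.0),                        # ~248 days left
    Counter("nearly_full", INT32_MAX - 10 * DAY * 100, INT32_MAX, 100.0),  # 10 days left
]
flagged = counters_at_risk(counters, maintenance_interval_s=7 * DAY)
```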

      This is aerospace, not gaming.

      --
      Sent from my ASR33 using ASCII
    8. Re:Lesson Here by Anonymous Coward · · Score: 0

      Would using a 64-bit long on a millisecond counter be lazy programming?

      Linus recommended a 256-bit time-of-day counter after the early death of the 32-bit time-of-day counter caused the Linux kernel to morph into Microsoft Windows 98. When Linus saw the boot screen the next morning he nearly flipped a penguin.

    9. Re:Lesson Here by ChrisMaple · · Score: 1

      63 bits for a nanosecond counter gives 292 years.
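      The arithmetic is easy to reproduce. A small sketch (the function name is made up, and a year is taken as 365.25 days) that converts a counter width and tick rate into years of range:

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def years_until_overflow(value_bits: int, ticks_per_second: float) -> float:
    """Years before a counter with `value_bits` value bits overflows."""
    return (2 ** value_bits) / ticks_per_second / SECONDS_PER_YEAR


# 63 value bits is what a signed 64-bit integer offers.
ns_years = years_until_overflow(63, 1e9)  # nanosecond ticks: roughly 292 years
ms_years = years_until_overflow(63, 1e3)  # millisecond ticks: roughly 292 million years
```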

      --
      Contribute to civilization: ari.aynrand.org/donate
    10. Re:Lesson Here by TechNeilogy · · Score: 1

      Trust me, someone somewhere will leave it running for 293 years; it's what users do.

      --
      "The wisdom of the Patriarchs was that they *knew* they were fools." --Master Foo
    11. Re:Lesson Here by Anonymous Coward · · Score: 0

      Good luck using a float as a counter. It won't overflow, but will eventually stop counting.

      The trick is knowing what you are doing. Which means erasing that 'three valid quantities' thinking.

      If you were "knowing what you are doing" then you would know that there are possibilities such as variable length quantities which easily and reasonably efficiently allow coding of arbitrary length integers. This allows "unlimited" without any loss of accuracy.

    12. Re:Lesson Here by petervandervos · · Score: 1

      If you did the math, you don't need excess space. If you need excess space, you're just shifting the day of failure into the future. Yes, perhaps far enough, but still.

      What math would you do to determine exactly how high a counter should count? Would using a 64-bit long on a millisecond counter be lazy programming?

      Yes, that is lazy programming.

      You should not determine a duration by subtracting two points in time from each other. You should call a function that can handle timer overflows.

    13. Re:Lesson Here by ttucker · · Score: 1

      A 64-bit signed long counter will merrily count milliseconds for roughly 292,000 millennia.

    14. Re:Lesson Here by ttucker · · Score: 1

      The correct answer is: write the stuff in a language that is safe in the first place.

    15. Re:Lesson Here by ttucker · · Score: 1

      63 bits for a nanosecond counter gives 292 years.

      My post was not about nanoseconds, it was about milliseconds.

    16. Re:Lesson Here by petervandervos · · Score: 1

      Yes, but it is still lazy.
      And you have to adjust a lot of variables to become long. All temp vars that hold a timestamp. If you miss a single one, you're screwed.

    17. Re:Lesson Here by ttucker · · Score: 1

      And you have to adjust a lot of variables to become long. All temp vars that hold a timestamp. If you miss a single one, you're screwed.

      Yes, the program would have to be implemented without error, to not have an error... that is a tautology. Pragmatically, use a statically typed language, and do not change anything, use the correct type while implementing the program the first time.

      What would a non-lazy programmer use instead? An arbitrary precision int or something? Can you think of any downsides to that approach?

    18. Re:Lesson Here by HornWumpus · · Score: 1

      Global replace 'long' and 'int' with the 'unlimited size int type name' and report back just how badly your system now runs.

      --
      John McAfee 'It was like that time I hired that Bangkok prostitute; to do my taxes, while I fucked my accountant'
    19. Re:Lesson Here by petervandervos · · Score: 1
      A non-lazy programmer doesn't subtract two timestamps from each other to get a duration but uses a (self-written) function that can handle overflows.

      A programmer that doesn't have total control over the whole system (in most cases) should reboot the system after a fixed amount of time. Even your iPhone or Android phone can not handle half a year uptime (at least all phones I have seen).

      Yes, the program would have to be implemented without error, to not have an error... that is a tautology.

      Sorry, programming for more than eh.. 35 years but never managed to write a non trivial program that doesn't have errors in it. But that's not really a problem as long as the program can recover from it.

      One hint: if you see a solution to a problem and think "that is an easy way to solve it", don't use it. It will always come back and haunt you (like using a long for a timer).

    20. Re:Lesson Here by ttucker · · Score: 1

      A non-lazy programmer doesn't subtract two timestamps from each other to get a duration but uses a (self-written) function that can handle overflows.

      I am not sure who you are even talking to. My response was in response to a smart ass comment made by a user named fisted, where he basically said that someone was a moron for suggesting counters that will run for orders of magnitude longer (ie. tens of thousands of millennia) are a pretty OK idea.

      Nobody mentioned calculating duration besides you (in a perfectly sensible way, I might add). This is a smart answer to the question that it is an answer to, but a really kind of silly answer to a question that it is not an answer to.

    21. Re:Lesson Here by petervandervos · · Score: 1

      Would using a 64-bit long on a millisecond counter be lazy programming?

      I am not sure who you are even talking to.

      Ah, that would be my mistake. As a non-native English speaker I sometimes miss 'irony'.

      Thanks for the conversation.

  8. Good thing it runs on a Windows O/S by Anonymous Coward · · Score: 0

    Since it runs on Windows O/S we don't have to worry about it reaching that long of an up-time except in perfect laboratory conditions.

  9. Slashdot? Quick? by Anonymous Coward · · Score: 0

    Wow, Slashdot is early for once. You beat numerous national civil regulators. I learnt this before even our HAAMC did....

  10. Centiseconds in signed 32bit int by Roceh · · Score: 1

    A signed integer overflow for timing - scary...

    1. Re:Centiseconds in signed 32bit int by wonkey_monkey · · Score: 1

      That's why I always use unsigned integers like a boss.

      --
      systemd is Roko's Basilisk.
  11. Control unit runs at 100 Hz? by photonic · · Score: 5, Insightful

    I guess this might be due to a 32-bit signed integer being incremented at 100 Hz: 2^31 / 24 / 3600 / 100 = 248.5 days.
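      That estimate is simple to check. The sketch below assumes the same thing the comment does (a signed 32-bit counter ticking at 100 Hz, which the FAA directive does not confirm) and uses `ctypes.c_int32` to imitate two's-complement wraparound; `next_tick` is a made-up helper name.

```python
import ctypes

TICK_HZ = 100            # assumed 10 ms tick, as in the estimate above
INT32_MAX = 2 ** 31 - 1  # largest value of a signed 32-bit integer

# Days of continuous uptime before the counter passes INT32_MAX.
days_to_overflow = (INT32_MAX + 1) / TICK_HZ / 3600 / 24


def next_tick(counter: int) -> int:
    """Increment with 32-bit two's-complement wraparound."""
    return ctypes.c_int32(counter + 1).value


print(round(days_to_overflow, 2))  # about 248.55
print(next_tick(INT32_MAX))        # wraps to a large negative number
```

      Any code that assumes the tick count only ever grows would misbehave at that point, which fits the reported 248-day figure.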

    --
    karma police: arrest this man, he talks in maths; he buzzes like a fridge, he's like a detuned radio. [radiohead]
    1. Re:Control unit runs at 100 Hz? by Megane · · Score: 1

      At least that's better than Windows 98 crashing after 7 weeks! (because 1ms instead of 10ms)

      --
      #naabhaprzrag, #sverubfr-000, #agi-fcbafberq, negvpyr[pynff*=' negvpyr-ary-'] { qvfcynl: abar !vzcbegnag; }
    2. Re:Control unit runs at 100 Hz? by Anonymous Coward · · Score: 0

      Thanks for that, I was trying to work out the significance.

    3. Re:Control unit runs at 100 Hz? by Anonymous Coward · · Score: 2, Funny

      I call BS. No Windows 98 machine could possibly stay up for 7 weeks, so this was a non-issue.

    4. Re:Control unit runs at 100 Hz? by Anonymous Coward · · Score: 1

      Which actually caused a shutdown of the Voice Switch at the FAA's Los Angeles Center in 2004 when maintenance failed to reset the system on schedule to avoid this bug.

      http://it.slashdot.org/story/0...

    5. Re:Control unit runs at 100 Hz? by Anonymous Coward · · Score: 0

      Well, if Windows 98 stayed up for 7 weeks, it is a clear sign that the computer wasn't even used ;).

    6. Re:Control unit runs at 100 Hz? by bosef1 · · Score: 2

      That makes a lot of sense. A lot of aviation power systems run with 400 Hz AC current (the higher frequency lets them use smaller transformers). They could be dividing down the power signal to 100 Hz, and using that to increment a counter.

      The other option is that many operating systems use 10 ms = 100 Hz for their internal interrupt timers. So it could just be a counter that is being incremented every interrupt cycle, and doesn't care what frequency of electricity is being used.
      (cf. the jiffy http://en.wikipedia.org/wiki/Jiffy_(time) )

    7. Re:Control unit runs at 100 Hz? by TheRealHocusLocus · · Score: 5, Funny

      I guess this might be due to a 32-bit signed integer being incremented at 100 Hz: 2^31 / 24 / 3600 / 100 = 248.5 days.

      Yes, the moment the big bird would shut down was correctly prognosticated by the Connecticut Yankee in King Arthur's Court. While testing a crowbar circuit he ran out of time and came to while munching on phattened feasant at Medieval Times, in a daze of King Arthur. He noticed an unused carrion bit, and realized that birds of prayer who managed the King's affairs were hard-sinewed to pluck quills for signing and always discarded the carrion bit. He caught the underflow was heralded by the people and befriended by the King, who set him to work hacking the Code of Chivalry and cracking the Y1K problem. In that time there were only punch cards and knights on horseback only had a resolution of 1 bit, so tournaments were long the fields were full of snakes, to avoid spooking the horses the knights would dismount and cleave them with sword, leaving half-adders strewn about. It was Pendragon who had built the famous Round Table with 12 seats, two complete I Chings, where Arthur and the knights would drop in and punch out binary sums in a rudimentary form of patty-cake, which inspired the mechanical circular adder of later years. The Yankee's refinement was a 13th chair left unoccupied to mark the betrayal of Judas, and also to serve as a carrion bit.

      There is a great deal more about gum-powder and 99 cent gamut of Steampunk-driven micro commerce, a Debian release called 'Guinevere' and a whole lotta Lancelot, but time is fun when you're having flies.

      --
      <blink>down the rabbit hole</blink>
  12. Failsafe by ISoldat53 · · Score: 1

    How is losing power in an airplane a safe mode?

    1. Re:Failsafe by antiperimetaparalogo · · Score: 1

      How is losing power in an airplane a safe mode?

      In the same way as cutting off the power is a safe mode for any machine? But i guess that for a plane it's better to do it while on the ground...

      --
      Antisthenes: "Wisdom begins by examining the words/names." - excuse my English, i am (slightly...) better with my Greek!
    2. Re:Failsafe by confused+one · · Score: 1

      It's a failsafe mode for the controller and generator. There are four (4) of them. There is more than enough redundancy.

    3. Re:Failsafe by X0563511 · · Score: 2

      ... not when they would all have nearly the exact same runtime - they would all hit the failsafe at around the same time.

      Not that this should ever happen in the air - as others have said, if the thing manages to run for this long, someone hasn't been doing maintenance.

      --
      For large sets, this will be our guide even unto death, for the LORD will work for each type of data it is applied to...
    4. Re:Failsafe by stooo · · Score: 1

      Yeah, except that there may be other bugs in this piece of software which could trigger a common mode failure.

      --
      aaaaaaa
  13. If Boeing believed in software QA.... by Bomarc · · Score: 2

    For all of the QA at Boeing, they don't believe in software QA. Take a look at their job openings some time: in years of searching, I've seen only one software QA position, and it wasn't dealing with aircraft. Any such search results will return developers that are to write their own tests against the spec. Developers are not testers... and I'll ask: how many more such bugs are out there?

    I know of two other software "bugs" ... that can be attributed to a lack of QA. How many people will die due to a bad management decision on the part of Boeing?
    Disclosure: Yes, I'm a software QA / Test professional.

    1. Re:If Boeing believed in software QA.... by Anonymous Coward · · Score: 0

      Wasn't it Boeing QA that discovered this flaw in the first place?

      dom

    2. Re:If Boeing believed in software QA.... by binarylarry · · Score: 1

      Because these people are normally called TEST PILOTS. ;)

      --
      Mod me down, my New Earth Global Warmingist friends!
    3. Re:If Boeing believed in software QA.... by Anonymous Coward · · Score: 2, Informative

      The Primary Flight Computer software for the 777 was written in England by GEC. Indeed the hardware for the PFC was designed and built by GEC.

      I was on the software QA team for the PFC code. There were tens of us working three shifts 24 hours per day devising tests of the PFC against its requirement spec. There were even more doing unit tests on all the Ada code.

      That is perhaps why you don't see Boeing advertising for QA engineers. They outsource the hardware and software.

    4. Re:If Boeing believed in software QA.... by fisted · · Score: 1

      I suspect now would be the best time to apply at Boeing :-)

    5. Re:If Boeing believed in software QA.... by Feral+Nerd · · Score: 1

      For all of the QA at Boeing, they don't believe in software QA. Take a look at their job openings some time: in years of searching, I've seen only one software QA position, and it wasn't dealing with aircraft. Any such search results will return developers that are to write their own tests against the spec. Developers are not testers... and I'll ask: how many more such bugs are out there? I know of two other software "bugs" ... that can be attributed to a lack of QA. How many people will die due to a bad management decision on the part of Boeing? Disclosure: Yes, I'm a software QA / Test professional.

      The worst part is that when software bugs are finally discovered, they are not fixed because it takes too much time and is too expensive (even though the physical update process is essentially no different from re-flashing the firmware in a consumer-grade digital device). I'd argue that you could cut the red security tape, reduce costs and install updates quicker if you massively increased the software QA work being done. Apparently Boeing disagrees. I dunno about Airbus, they might be just as bad, but for some reason it's Boeing planes that seem to top the list of software-related bloopers we get in our sector.

      Another good example is American Airlines, who replaced 35 pounds of on-board paper documentation with iPads, only to have massive delays when the damn app they were using forced pilots to return to the gate to get a wifi connection. I'm not sure about the wisdom of using garden-variety consumer-level tablets for this, but the idea in itself is a good one: pilots are probably way quicker at looking stuff up on a tablet of some description than rifling through 35 pounds of paper documents. You'd think issues like that could be fixed with a combination of proper software/hardware QA, adding whatever iOS/Android/Linux/Windows/WhateverOS tablet the pilots are using for their docs to the pre-flight checklist, and having each aircraft carry two of the devices.

      Perhaps the thing to do would be to create a quality rating/stamp for "aviation certified" hardened tablets? But knowing the aviation industry, such devices would be updated once over their lifespan (at production time), because getting an update certified takes 8 months of wading through a quagmire of red tape, it would cost several hundred thousand dollars to get an update vetted, and it would cost thousands of dollars to have it installed by a duly certified and highly trained aviation safety professional, even though he'd essentially just be doing the same thing the rest of us do when we update our iPad, Galaxy Tab, etc.

    6. Re:If Boeing believed in software QA.... by Anonymous Coward · · Score: 0

      Bomarc, I couldn't agree more. This is a frightening, dangerous failure by Boeing management. Here's hoping that the FAA's mandated workaround is actually performed until the "bug" is fixed. Counters inevitably overflow. How could this be missed in aviation QA? Utterly, utterly reprehensible...

    7. Re:If Boeing believed in software QA.... by Anonymous Coward · · Score: 0

      There were tens of us working three shifts 24 hours per day devising tests of the PFC against its requirement spec.

      Ah, the old, "Well, this wasn't in the spec..."

      QA is always about checking against the spec. It's rarely about checking against the real world. This schoolboy counter overflow error illustrates the resultant problem.

      Fortunately for those in the planes, it was spotted before anyone died. Unfortunately for those wishing to advance the state of QA art, it was spotted before anyone died, so it won't cause much of a change in best practice.

    8. Re:If Boeing believed in software QA.... by Anonymous Coward · · Score: 0

      In the parlance of the aviation software development processes, you would be part of the verification effort. You really have to read the DO-178 document to understand that they are advertising for your position when they are advertising for a "Tester". QA in that process is just checking that everything has been done according to plan; they are not supposed to look at content in a critical fashion from a software engineering perspective.
      Yes, I am a software engineer who is part of the verification process on a Generator Control Unit for aircraft and rotorcraft. I don't work for whoever developed the software for that unit. I would bet this was thought of as a "feature", not a bug, and not a problem with the coding or testing. I would think that the requirements say this is what should be done when that counter overflows.

    9. Re:If Boeing believed in software QA.... by Anonymous Coward · · Score: 0

      Who use fly-by-wire planes where software is likely your weakest link. So it's apparent this company doesn't care about the QA engineering aspect, and neither do you, since you are probably just as ignorant as a software engineer is of mechanical laws. You're both idiots.

    10. Re:If Boeing believed in software QA.... by Anonymous Coward · · Score: 2, Insightful

      Actually I took my work there testing the 777 software very seriously.

      On at least two occasions I escalated what I thought was a problem in the specification all the way back to Boeing. One of them turned out to be a "real-world" issue in the spec.

      I believe the rest of the team took the same attitude. We used to talk about that a lot.

      At the end of the day what you are asking for is impossible. The spec we worked to was a stack of paper 2 yards high when printed out. How many QA engineers know enough about flight dynamics to question if any of it is correct or not?
       

    11. Re:If Boeing believed in software QA.... by Joe_Dragon · · Score: 1

      The maps and other info get updated quite a bit.

    12. Re:If Boeing believed in software QA.... by drinkypoo · · Score: 1

      Wasn't it Boeing QA that discovered this flaw in the first place?

      After the code was already released, and is already being used in the field. As opposed to before release, as part of a responsible code review.

      --
      "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
    13. Re:If Boeing believed in software QA.... by NicBenjamin · · Score: 1

      Remember that time back in the 90s when a Marine Corps plane on maneuvers knocked down a cable car in the Italian Alps, killing 20 people? That was partly because when they planned the maneuver they used charts that were 6 months old, and the cable car line was less than 6 months old. If they'd had iPads, or any other electronic chart equipment that automatically updated itself, it wouldn't have happened.

      Civil aviation changes just as frequently. Which approach each airport wants you to use (to avoid colliding with the rest of the aircraft), which airspace you should fly in only in extreme circumstances (perhaps due to Russian-allied rebels' indiscriminate use of BUK missile launchers), etc. all change quite frequently, and you need the data for every airport the plane could possibly be sent to, because there's no way in hell the airline's gonna buy a nine-figure aircraft and only use it at three airports.

      Thus the electronic equipment, which may not work due to computer issues some small percent of the time, but when it does work will always be updated properly. Whereas with paper you get 100% uptime, but are guaranteed less then 100% accuracy. After pilots get used to the equipment it also has fewer user-fuck-up issues. The Marines I mentioned, for example, actually had the right charts, in a sealed envelope, in the fucking cockpit, but they hadn't opened the damn thing because they figured it was a waste of time having nothing to do with them.

    14. Re:If Boeing believed in software QA.... by plopez · · Score: 1

      Testing is about checking compliance to a spec. QA is a *much* broader topic, e.g. reviewing the spec to ensure it was well written and there are no gaps. Unfortunately many people do not understand the difference.

      --
      putting the 'B' in LGBTQ+
    15. Re:If Boeing believed in software QA.... by Anonymous Coward · · Score: 0

      Not unless you're Indian, or fresh out of college and are willing to work for cheap. My advice: avoid at all costs.

    16. Re:If Boeing believed in software QA.... by Required+Snark · · Score: 5, Informative

      You have no idea what you are talking about. All FAA certified aircraft software has to conform to the DO-178B / DO-178C standard. The standard imposes design, testing, process and documentation standards that are extremely demanding.

      QC isn't just a department or a step in the release process, it is built into the full life cycle of the software. Safety is the goal, and the requirement for good practice starts at the beginning of the process, with the requirement documents.

      For example, there are five levels of error severity defined from A to E. E has no impact on safety and A is catastrophic, where a crash could occur. The level of software test and validation depends on the severity level.

      The number of objectives to be satisfied (eventually with independence) is determined by the software level A-E. The phrase "with independence" refers to a separation of responsibilities where the objectivity of the verification and validation processes is ensured by virtue of their "independence" from the software development team. For objectives that must be satisfied with independence, the person verifying the item (such as a requirement or source code) may not be the person who authored the item and this separation must be clearly documented. In some cases, an automated tool may be equivalent to independence. However, the tool itself must then be qualified if it substitutes for human review.

      Your inability to find a "QC" position is because you don't know the structure of aerospace software development and have no idea of the job titles or terminology used to describe the standards used. You are projecting your lack of knowledge onto an inconceivable lapse of competence on the part of Boeing and the FAA. In what universe would there be no software safety requirements for the civilian aircraft industry? All you have shown is that you are ignorant and have a basic lack of common sense.

      --
      Why is Snark Required?
    17. Re:If Boeing believed in software QA.... by Bomarc · · Score: 1

      You have no idea what you are talking about.

      Then you have never looked for a software tester / QA position at Boeing.

      For example, if you search Boeing jobs for QA on 5/2/2015 you will see 15 jobs -- none are software-specific QA; two of them are in software fields ... including Cloud Architect 4 and a Software Release Engineer.

      If you search for test you will see 97 (Adjusted search for only IT); and a typical job posting (most of the "Software Engineer" postings) will have something like:
      Other duties may include:
      -- Develops software verification plans, test procedures and test environments, executing the test procedures and documenting test results to ensure software system requirements are met;


      They may "conform to the DO-178B / DO-178C standard" ... but my point is the person performing the test is NOT a software QA professional, rather is the developer of the software.

      Full disclosure: There currently are a few QA/test positions open -- including one that is a subsidiary of Boeing.

    18. Re:If Boeing believed in software QA.... by matfud · · Score: 1

      In safety critical systems software tends to be designed to shut down if anything unexpected is encountered. It follows from the concept of "do no harm".

      In some situations that is obviously not the best of ideas. There is nothing to say that the plane cannot continue flying, even if it requires shutting down the flight computers and deploying the RAT.
       

    19. Re:If Boeing believed in software QA.... by matfud · · Score: 1

      As per the previous Slashdot post about the huge amount of paper needed on aircraft and how replacing it with an iPad caused problems.
      Very little of that huge stack of paper, and of the rules for handling it, is related to pre/post flight checklists. The majority is exceptions. If there is a problem then the pilots are expected to dig through that to find a remediation (if they do not already know it).

    20. Re:If Boeing believed in software QA.... by Malenx · · Score: 1

      I think you're reading too much into that requirement. Every single developer job posting that I've read since college has included that line. We have to be able to test and verify our own code before we pass it up to QA for further verification. On top of that, Boeing may well take a very heavy testing approach such as TDD for some of their software applications, in which knowing how to write automated tests is critical.

  14. History repeating itself? by Anonymous Coward · · Score: 1

    248 days sounds a lot like this 497-day issue. In fact, it could be the same issue if they are using a signed 32-bit integer.

    https://www.ibm.com/developerw...

    1. Re:History repeating itself? by photonic · · Score: 1

      [Mod parent informative] Indeed, this seems exactly the same issue. For Boeing, it might either be a signed integer being incremented at 100 Hz, or an unsigned one at 200 Hz.
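
      For anyone who wants to check the arithmetic, here is a rough sketch. The counter widths and tick rates are just the guesses from this thread, not anything published by Boeing or the FAA, and the function name is made up for the example.

```python
# Days until a free-running tick counter overflows.
# Widths and rates below are guesses from this thread, not Boeing/FAA figures.
SECONDS_PER_DAY = 86_400


def days_until_overflow(usable_bits: int, ticks_per_second: int) -> float:
    """Days for a counter starting at 0 to exhaust `usable_bits` bits."""
    return 2 ** usable_bits / ticks_per_second / SECONDS_PER_DAY


signed_100hz = days_until_overflow(31, 100)    # signed 32-bit at 100 Hz
unsigned_200hz = days_until_overflow(32, 200)  # unsigned 32-bit at 200 Hz
unsigned_100hz = days_until_overflow(32, 100)  # the 497-day case

print(f"{signed_100hz:.2f} {unsigned_200hz:.2f} {unsigned_100hz:.2f}")
```

      Both of the first two cases come out at about 248.55 days, which is why the number alone can't tell you which one it is.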

      --
      karma police: arrest this man, he talks in maths; he buzzes like a fridge, he's like a detuned radio. [radiohead]
  15. Keeps Living Up To It's Name by Anonymous Coward · · Score: 0

    The Boeing Screamliner -- the proud product of innovative Project Management in a Globalized Economy

    1. Re:Keeps Living Up To It's Name by tompaulco · · Score: 1

      The Boeing Screamliner -- the proud product of innovative Project Management in a Globalized Economy

      I thought the name was the Dreamliner? Yup. It is, according to the Boeing website. No mention of it being called the screamliner. But you are right about it living up to it's name. No accidents or injuries have occurred on a 787 in all of it's years of service.

      --
      If you are not allowed to question your government then the government has answered your question.
    2. Re:Keeps Living Up To It's Name by ChrisMaple · · Score: 1

      No accidents or injuries have occurred on a 787 in all of it's [sic] years of service.

      All three of them.

      --
      Contribute to civilization: ari.aynrand.org/donate
    3. Re:Keeps Living Up To It's Name by Ethanol · · Score: 2

      All three of them.

      Hey, 248 days is five dog-years.

    4. Re:Keeps Living Up To It's Name by Anonymous Coward · · Score: 0

      Have you forgotten the whole thing about it being grounded sooner after launch than any other airliner ever launched? And for several months? And that the cause for the battery fire is still unknown and the "solution" is just to put it in a more fireproof container? In addition to screamliner it has also earned the nickname firebird. Among pilots and ATC it's known as "sparky". No joke! It's used the same way as e.g. the 777 is called "bigfoot" because of its distinct landing gear. And referring to "all it's years of service" (to quote you and your poor grammar) is just dumb when it's such a new aircraft and the gold standard is the A340 which soon turns 25 with no fatalities.

  16. Re:Maybe they should have used Rust. by Anonymous Coward · · Score: 0

    Except that the problem here was an integer overflow problem with a constantly incrementing time counter. All a "better" language could really do would be to either abort when the overflow happened, or automatically support some kind of bignums, at a significant performance reduction throughout the entire program. From the sound of the description in TFA, the first one was basically exactly what was already happening.
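
    To make the options concrete, here is a small sketch of the three behaviours for a signed 32-bit counter. It is only an illustration: the function names are invented, Python is emulating the 32-bit wraparound by hand, and none of this is meant to describe what the GCU code actually does.

```python
INT32_MIN = -2**31
INT32_MAX = 2**31 - 1


def wrapping_add(value: int, step: int = 1) -> int:
    """Silent two's-complement wraparound of a signed 32-bit value."""
    return (value + step + 2**31) % 2**32 - 2**31


def checked_add(value: int, step: int = 1) -> int:
    """Abort (raise) instead of wrapping when the result leaves the 32-bit range."""
    result = value + step
    if not INT32_MIN <= result <= INT32_MAX:
        raise OverflowError("32-bit tick counter overflowed")
    return result


def bignum_add(value: int, step: int = 1) -> int:
    """Arbitrary precision: Python ints simply keep growing."""
    return value + step


print(wrapping_add(INT32_MAX))  # wraps to the most negative value
print(bignum_add(INT32_MAX))    # keeps counting past 2**31 - 1
```

    The checked version turns a wrong answer into a stop, which is the "abort" case: safer than garbage timestamps, but still a shutdown.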

  17. Re: Maybe they should have used Rust. by EmeraldBot · · Score: 1

    Using a language that hasn't even reached a stable release in an environment where the tiniest mistake kills hundreds of people?

    --
    "Set a man a fire, he'll be warm for the rest of the night. Set a man afire, he'll be warm for the rest of his life."
  18. Just turn if off by Murdoch5 · · Score: 1

    Don't they ever switch the planes off? If all you have to do is reboot the system once every 200 days, then just reboot it.

    1. Re:Just turn if off by Anonymous Coward · · Score: 0

      If keeping it switched on for 248 days is enough to cause a software crash instead of an error or a mere warning, people tend to wonder what else is enough to cause a software crash. Look beyond your nose please.

  19. Common bug... by Anonymous Coward · · Score: 0

    SunOS had this bug. Solaris had this bug. Linux had this bug. NT had this bug. HP-UX had this bug. Oracle had this bug.

    Some of them were 248 days. Some were double that. One way or another, this issue has hit countless platforms over the years.

  20. Must be running Microsoft by Anonymous Coward · · Score: 0

    Trolling...

  21. Graceful degradation by thisisauniqueid · · Score: 2

    The plane's control systems should have several levels of degraded-mode operation, so if one system stops working, the plane still hobbles along the best it can without the non-working system. Google's self-driving cars have something like 7 layers of nested failure modes, each with slightly degraded functions relative to the next higher level. It's almost impossible to trigger enough failures to completely shut the system down, which is a good thing if you're traveling at highway speeds. It's very concerning that a company like Boeing didn't catch this before product release, but even more concerning that they didn't design the system to be resilient against this sort of failure.

    1. Re:Graceful degradation by photonic · · Score: 1

      Indeed, they would need some mechanism like this, which is implemented using several heterogeneous processes. Triple hardware redundancy is useless if they all have a common mode software bug. The same thing happened on the first flight of Ariane 5, where both inertial reference systems, running identical software, failed within milliseconds of each other.

      --
      karma police: arrest this man, he talks in maths; he buzzes like a fridge, he's like a detuned radio. [radiohead]
    2. Re:Graceful degradation by Anonymous Coward · · Score: 1

      The 787 is resilient against this sort of failure. Avionics and some flight surfaces will function with DC battery backup and even if that were to fail the ram-air turbine automatically deploys when DC power fails.

    3. Re: Graceful degradation by Anonymous Coward · · Score: 1

      They do, in a sense. If the generators fail off, the plane switches to battery power until the Ram Air Turbine (RAT) automatically deploys, providing essential power until the generators and/or APU can be reset.

    4. Re:Graceful degradation by X0563511 · · Score: 1

      It is designed such that this would never be an issue. Why? Because you have to skip several critical maintenance periods to hit it. Imagine if you, somehow, kept your car engine running for two years. Ignoring the logistics of this, doing so means you cannot have changed your oil etc.

      Now, if it was on the order of 11 hours, that would be more of a concern.

      --
      For large sets, this will be our guide even unto death, for the LORD will work for each type of data it is applied to...
    5. Re: Graceful degradation by Anonymous Coward · · Score: 0

      KIRK: Scotty, ...what's left?
      SCOTT (on intercom): Just the batteries, sir. I can have auxiliary power in a few minutes.
      KIRK: We don't have a few minutes!

  22. Give a whole new meaning to ... by quax · · Score: 1

    ... "failsafe".

    1. Re:Give a whole new meaning to ... by Anonymous Coward · · Score: 0

      Wait until it does its automatic update in mid flight.

  23. 3 shifts? by Anonymous Coward · · Score: 0

    There were tens of us working three shifts 24 hours per day...

    Round the clock? That sounds like piss-poor planning - as in QA was an afterthought.

    I have seen it happen all too often: an unrealistic development schedule is made to get the contract and as shit rolls down the schedule, QA takes the brunt of any deadline problems. It's one thing if you're developing software for insurance companies, it's another when it's aerospace.

    1. Re:3 shifts? by Anonymous Coward · · Score: 3, Informative

      The reason for the three shifts was that we were using actual PFC computers connected to hardware that could simulate all the inputs and read all the outputs.

      That hardware was a big complicated rack of electronics and there were maybe 8 or 10 such units in a lab.

      As such, to optimize use of the facilities it was necessary to have three shifts 24 hours per day. This went on for a year or more.

      Very good planning in fact.

      Now I could tell you stories of the real corners cut to meet the schedule. But that's a complicated story.

       

    2. Re:3 shifts? by tompaulco · · Score: 1

      I have seen it happen all too often: an unrealistic development schedule is made to get the contract and as shit rolls down the schedule, QA takes the brunt of any deadline problems. It's one thing if you're developing software for insurance companies, it's another when it's aerospace.

      Until recently I was the software QA Director for a software company. I completely agree with your assessment. My company used to pad in about a week for QA. I kept telling them that it might take us a week to QA, but if we found any issues, then it would have to go back to development, and I couldn't speak for how long that would take. They really didn't like how I couldn't give them a solid date, but how could I speak for how long it would take another department to fix something?
      At any rate, all of that was moot as development literally never got the project to me until the actual date it was due to the customer, and it would be broken and not meet the specs. I did what I could to try to get development on track so we could deliver a quality product to our customer, but in the end, my company grew tired of my efforts and fired the whole QA team, so now the product just goes straight to the customer, bugs and all. And late.

      --
      If you are not allowed to question your government then the government has answered your question.
  24. It is probably a non-issue. by 140Mandak262Jamuna · · Score: 5, Funny

    The company is said to have found the problem during laboratory testing of the plane, and thankfully there are no reports of it being triggered on the field.

    The spokesman continued, "The battery would have caught fire long before that integer overflow."

    --
    sed -e 's/Chuck Norris/Rajnikant/g' joke > fact
    1. Re:It is probably a non-issue. by DeBattell · · Score: 1

      Well that's reassuring.

    2. Re:It is probably a non-issue. by Anonymous Coward · · Score: 0

      I can see how that would be unsettling, but if you have an uptime limit in your system, in this case the battery, your other systems really don't need to be designed to run 'forever.' In this case there are physical limits on how long this stuff can be on, so you don't need to design overly complex logical systems that can run indefinitely. Doing so would only provide more opportunities for bugs.

    3. Re:It is probably a non-issue. by Anonymous Coward · · Score: 0

      I believe that is supposed to be the spokeshole, not spokesman.

    4. Re:It is probably a non-issue. by ChrisMaple · · Score: 1

      "Don't worry, we'll never hit the seven month software failure because the battery fails in 6 months"

      "We bought new batteries, they last 3 years now."

      "Are there any job openings in countries without extradition treaties?"

      --
      Contribute to civilization: ari.aynrand.org/donate
    5. Re:It is probably a non-issue. by Anonymous Coward · · Score: 0

      That's why safety critical software is certified for specific hardware only and the hardware is certified for specific software. The regulations really suck when you want to fix a simple broken part, but they exist specifically for things like you mentioned. There're reasons behind all the red-tape madness.

  25. Re:Maybe they should have used Rust. by Jeremi · · Score: 1

    What mechanism does Rust use to prevent 32-bit counter overflows?

    --
    I don't care if it's 90,000 hectares. That lake was not my doing.
  26. What idiot doesn't know what "failsafe"means? by johnnys · · Score: 1

    So first they say " left turned on for 248 days, it will enter a failsafe mode" then they say "all the Generator Control Units will shut off, leaving the plane without power, and the control of the plane will be lost."

    That is NOT fail "SAFE". That is fail "EVERYBODY DEAD".

    --
    Sometimes the "writing on the wall" is blood spatter...
    1. Re:What idiot doesn't know what "failsafe"means? by Anonymous Coward · · Score: 2, Insightful

      If you actually read the AD it will say "We are issuing this AD to prevent loss of all AC electrical power, which could result in loss of control of the airplane."

      COULD lose control, not WILL. The 787 has at least 3 additional backup systems against this sort of failure, the APU, DC battery backup, and Ram Air Turbine.

    2. Re:What idiot doesn't know what "failsafe"means? by Danielsen · · Score: 1

      Often a Fail Operational system consists of several fail safe systems in parallel.
      It is then important that the systems don't have common-cause faults.

      "This AD was prompted by the determination that a Model 787 airplane that has been powered continuously for 248 days can lose all alternating current (AC) electrical power due to the generator control units (GCUs) simultaneously going into failsafe mode"

      So the problem is that the same software is running in both GCUs, and they have been powered up at the same time.
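
      A toy model of why "powered up at the same time" matters, assuming (as guessed elsewhere in the thread) a signed 32-bit counter ticking at 100 Hz. The names are made up, and staggering the power-on times is only there to show the effect; it is not something the AD proposes.

```python
# Assumed values, taken from guesses in this thread rather than from the AD.
TICKS_PER_SECOND = 100
OVERFLOW_TICKS = 2**31
SECONDS_PER_DAY = 86_400


def failsafe_days(power_on_days):
    """Day on which each unit's counter overflows, given the day it was powered on."""
    lifetime = OVERFLOW_TICKS / TICKS_PER_SECOND / SECONDS_PER_DAY
    return [start + lifetime for start in power_on_days]


def all_fail_together(power_on_days, tolerance_days=0.0):
    """True if every unit hits the overflow within `tolerance_days` of the others."""
    days = failsafe_days(power_on_days)
    return max(days) - min(days) <= tolerance_days


same_time = [0.0, 0.0, 0.0, 0.0]   # all units powered on together
staggered = [0.0, 1.0, 2.0, 3.0]   # hypothetical: one day apart

print(all_fail_together(same_time), all_fail_together(staggered))
```

      Identical software plus identical start times means identical failure times, so the redundancy buys nothing against this particular fault.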

    3. Re:What idiot doesn't know what "failsafe"means? by Anonymous Coward · · Score: 0

      Remember the battery packs catching fire in the Dreamliners? That was the backup power...

    4. Re:What idiot doesn't know what "failsafe"means? by NicBenjamin · · Score: 1

      Pilots lose control of planes all the time, just as drivers do. The question is a) how long are you out of control and how hard is it to get control back, and b) how likely is the scenario in the first place?

      In this case the answer seems to be a) not very long and not very hard as there are backup generators allowing you to reset the computers, and b) not bloody likely because you have to skip quite a few (as in dozens) routine maintenance cycles on these generators to get to 248 days without a restart.

      There's a reason the damn things have been flying for 3.5 years and nobody noticed the problem.

  27. Re:Maybe they should have used Rust. by Anonymous Coward · · Score: 0

    Nice one lol

  28. Halting Problem by ChromaticDragon · · Score: 1

    What a profound demonstration of the Halting Problem.

    1. Re:Halting Problem by Danielsen · · Score: 1

      What a profound demonstration of the Halting Problem.

      No...
      The halting problem is the problem of deciding whether a program will ever finish.
      In safety critical software you have to prove that the system's 'Worst Case Execution Time' is less than the safe process time. (Google WCET and AbsInt)
      In this case the program is not coming to a HALT; it is still running its loop at 100 Hz, or whatever rate is generating the overflow.

      I have seen this on an industrial control system, where a faulty C++ timer class was used to monitor timeouts on a CAN bus. When the system had been online for a month, all nodes failed simultaneously with communication timeout due to an integer wraparound.

      Often timers are also used in conjunction with alarms, e.g. stop the engine if the lubrication pressure is lower than 2 bars for 2 secs.
      Or disconnect generator power if ground fault current is higher than 500 mA for 1 sec.
      A fault in a timer software block would basically fire all alarms at the same time...
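
      A minimal sketch of how that kind of timer goes wrong and the usual fix, assuming a free-running unsigned 32-bit tick counter at 100 Hz. This is not the code from the system I saw; the names are invented for the example.

```python
MASK32 = 0xFFFFFFFF  # unsigned 32-bit tick counter


def elapsed_ticks(now: int, start: int) -> int:
    """Ticks between two counter readings; correct across one wraparound."""
    return (now - start) & MASK32


def naive_timed_out(now: int, start: int, timeout: int) -> bool:
    """Buggy: compares against an absolute deadline that may itself have wrapped."""
    deadline = (start + timeout) & MASK32
    return now >= deadline


def safe_timed_out(now: int, start: int, timeout: int) -> bool:
    """Wrap-safe: compares the elapsed interval, not absolute tick values."""
    return elapsed_ticks(now, start) >= timeout


start = 0xFFFFFFF0             # condition first seen 16 ticks before the wrap
timeout = 200                  # 2 seconds at an assumed 100 Hz
now = (start + 5) & MASK32     # only 5 ticks later

print(naive_timed_out(now, start, timeout))  # alarm fires far too early
print(safe_timed_out(now, start, timeout))   # still waiting, as it should be
```

      Every timer started near the wrap point misbehaves in the naive version, which matches the "all alarms at once" symptom.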

    2. Re:Halting Problem by Anonymous Coward · · Score: 0

      What a profound demonstration of the Halting Problem.

      The halting problem only applies to unconstrained systems. If any variation of the halting problem applies to your code you've already failed as most do.

  29. Enough of this by confused+one · · Score: 5, Informative

    This story is being way overblown. Yes, it's a bug. Yes, it should be fixed. However...

    248 days of continuous operation is well past the scheduled major maintenance for the aircraft. By this point, a 787 would have to go through many minor maintenance cycles which would have required shutting down the electrical system. In addition, loss of all 4 generators would not result in a loss of vehicle because there are batteries, an APU (a backup generator) and Ram Air Turbines (RATs), generators that deploy from the wing if the APU won't start. To have to rely on any of these would not make for a good day for the pilots; but, they would certainly provide the necessary power to safely land the aircraft at the nearest airport. They might even be able to continue on and finish their flight if they successfully reset the generators.

    This is not the OMG Planes Are Going to Fall From The Sky! event the media is making it out to be.

    1. Re:Enough of this by PPH · · Score: 2

      This is not the OMG Planes Are Going to Fall From The Sky!

      No. This is a "What the f* were you goofballs thinking when you wrote this code? And if this is all the better you can do, what other gotchas are hiding in there?"

      --
      Have gnu, will travel.
    2. Re:Enough of this by NicBenjamin · · Score: 2

      Dude, this is a for-profit company, not a research university. It's not written by people whose entire job is to prove to the world they write the most robustest code ever designed with zero bugs. If it doesn't kill people or delay flights it doesn't cost them money and nobody, except computer geeks, gives a shit.

      In this case the Dreamliner's designed to have all the relevant systems turned off for routine maintenance once every two weeks. Which means if they go more than 248 days without being restarted the airline has skipped several dozen (25 or 26 according to another slashdotter) routine maintenance cycles, which is likely a much bigger problem than the pilot needing to a) restart the computers mid-flight, or b) glide to an emergency landing.

      Given that there've been something on the order of 400 plane-years of actual flight performance, and nobody noticed this bug until now, the software design seems to be about right. Not perfect, but if the planes are even being given 10% of the maintenance the specs call for this bug is a non-issue.

      OTOH, the problems with various batteries were dumb engineering. Altho those also seem to be solved.

    3. Re:Enough of this by Anonymous Coward · · Score: 0

      Plenty but I was derided and called a liar last time. Just safe to say that we were told the communication equipment we did would never be switched off so we made damn sure tick wraps were safe, however the dickwads that wrote the DO178B components...

    4. Re:Enough of this by ArylAkamov · · Score: 2

      and Ram Air Turbines (RATs), generators that deploy from the wing if the APU won't start.

      Holy shit. That is cool, though looking at the pictures I can't stop laughing at how comical it looks.

      http://en.wikipedia.org/wiki/R...

    5. Re:Enough of this by Anonymous Coward · · Score: 0

      I am no expert on testing or safety. But I think this viewpoint of "if it appears to work for n years, then it's safe to run this code for n+1 years" overlooks a number of advances in computer science as well as some aspects of human nature. Computer science has made great progress in formal verification (both in industry and academia) such as formally verified operating system kernels, compilers, and ASIC design. I don't see any reason that significant parts of aircraft logic could not be formally verified. There has also been progress in identifying "safer" subsets of languages such as MISRA C used in the automobile industry. Modern bug testing such as fuzz testing and genetic algorithm based testing should be able to identify many bugs like this automatically. Also, humans have the perverse abilities to find odd, creative, or unorthodox workarounds that could trigger catastrophic bugs like this. For example, software and hardware upgrades and backwards compatibility.

      Since the Boeing team failed to catch bugs like this, it puts into question the process and tools that were used to create such an aircraft.

        - Connelly Barnes

    6. Re:Enough of this by joe_frisch · · Score: 2

      Even though this bug isn't a direct threat, it could interact with other future software changes. If it is a counter overflow there is a risk that the counter would run at a higher rate in some future version where more functionality is needed. If 248 days went to 2.48 days, it might not be caught in testing, but could (rarely) happen in real life.

    7. Re:Enough of this by Anonymous Coward · · Score: 0

      This is not the OMG Planes Are Going to Fall From The Sky! event the media is making it out to be.

      This is an acute bug during a zombie apocalypse, when those people that can stay up in the air will stay in the air until their Boeing goes bong.

    8. Re:Enough of this by ChrisMaple · · Score: 1

      Never, never program for me. You're taking somebody's word that 248 / 14 > 24.
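
      The check is a couple of lines of arithmetic. The 14-day interval and the 25 or 26 cycles are the figures claimed upthread; the function names are just for this example.

```python
UPTIME_LIMIT_DAYS = 248


def cycles_skipped(interval_days: float) -> float:
    """Maintenance cycles that fit into the uptime limit at a given interval."""
    return UPTIME_LIMIT_DAYS / interval_days


def implied_interval_days(cycles: float) -> float:
    """Maintenance interval implied by a claimed number of skipped cycles."""
    return UPTIME_LIMIT_DAYS / cycles


print(round(cycles_skipped(14), 1))         # cycles at a two-week interval
print(round(implied_interval_days(25), 2))  # interval implied by 25 cycles
print(round(implied_interval_days(26), 2))  # interval implied by 26 cycles
```

      A two-week interval gives fewer than 18 cycles, so 25 or 26 cycles would need an interval of under 10 days.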

      --
      Contribute to civilization: ari.aynrand.org/donate
    9. Re:Enough of this by ray-auch · · Score: 2

      Bingo.

      If this was only spotted recently in "lab testing" (and why was it being tested now, and not before flight... what prompted the testing...) then it was known / not documented that overflow of this counter would cause shutdown. Some future revision could easily be to increase the precision, at the expense of range, or persist the counter across reboots, and that might not be considered a problem because the system was thought to handle the counter overflowing because no one documented that it didn't.

      That is why I think the AD is there - to ensure this issue is known when this software is messed with in future.

    10. Re:Enough of this by NicBenjamin · · Score: 1

      Ahh engineers. Such strong fans of extreme precision. I could've sworn I put an at least in there.

      Given the tenuous nature of the data (a guy on slashdot told me ain't gonna hold up in Court), and that I wasn't sure I'd remembered the numbers exactly right, but I did know they were high enough that the 248 thing would not come up in the real world, going with a high-end ballpark estimate seemed sensible. Worst-case scenario somebody does the math and says "Dude, you said this is fine because 14 days is less than 248, but your actual evidence is that it's 9.92 or 9.53 days, which rounds down to one week." That guy is you.

      Which changes the math, and if we were using this thread to actually design an aircraft it would change everything, but it does not actually affect the conclusion.

      BTW, I did remember the numbers wrong. It was 23-25, so I'm assuming he thinks they're on an overhaul schedule that is almost exactly every 10 days:

      If it ever happened on a plane, then it means that the maintenance was intentionally skipped. If they reach 248 days of continuous operation then a number of significant maintenance cycles have been skipped (some 23-25 inspection / maintenance cycles that generally require shutting down the electrical system). The generators in question are attached to the engines. The engines have a overhaul schedule that is shorter than 248 days of continuous operation. If they managed to reach this point, then the major maintenance cycles have been skipped and the engines are long overdue for a tear down inspection and overhaul. Any plane which could reach this point, 248 days of continuous operation missing all of the required maintenance; this is not a plane (or an airline for that matter) which anyone should be flying on.

      And no, I'm neither a programmer nor an engineer, so I won't be doing any of that kind of work for you.

    11. Re:Enough of this by Anonymous Coward · · Score: 0

      Given that there've been something on the order of 400 plane-years of actual flight performance, and nobody noticed this bug until now, the software design seems to be about right.

      I don't think you are paying attention. The computer design means that four separate "redundant" systems have a single failure mode in that they each run the same identical software.

    12. Re:Enough of this by Anonymous Coward · · Score: 0

      It's not written by people whose entire job is to prove to the world they write the most robustest code ever designed with zero bugs. If it doesn't kill people or delay flights it doesn't cost them money

      For life critical systems, yes, you try to write the most robust code ever designed with zero bugs. It doesn't matter that this bug won't cause a failure in real life. If it was documented as such, then fine. But it wasn't, it was a bug meaning the developers and testers aren't properly thinking about these types of failures. Now you have to wonder/audit what other overflow bugs might exist and will any of those cause real issues?

      This is a PR hit to Boeing. Such things are very hard to quantify, but it has impacted them financially.

    13. Re:Enough of this by Anonymous Coward · · Score: 0

      Because software with unknown bugs is never fielded? I've fixed my share of bugs from released fielded avionics software that the customer never knew about and never would have found out about.

      Why wasn't it caught before?
      If this had requirements it almost certainly was for a reasonable amount of time. As others have noted an aircraft having continuous power for more than a few weeks never happens. If there were requirements, then it would have been tested to ensure proper functionality to that spec prior to release. More likely that this particular code wasn't driven by a specific requirement so it wasn't specifically tested. I'd guess it was caught because of some other change happening in a related area triggering re-testing. Perhaps the test prompted some simulation that caused the counter to increment fast enough to actually trip the overflow, somebody noticed that the thing crashed and started investigating.

      I could imagine a conversation the software engineer might have had with the system requirements engineer when this was originally developed: "Do I need to worry about this being powered up more than a month?" "No, that will never ever happen."

    14. Re:Enough of this by tlhIngan · · Score: 1

      No. This is a "What the f* were you goofballs thinking when you wrote this code? And if this is all the better you can do, what other gotchas are hiding in there?"

      Well, most likely they simply didn't realize it might be an issue.

      Early Linux suffered from this issue a lot - device drivers could not be counted on to survive if jiffies overflowed. Modern day Linux implements a bunch of utilities to compare jiffies with an elapsed time (that handles overflows), as well as starting the jiffies counter 5 minutes before overflowing so it overflows early and bugs are detected.

      Of course, in this case, it was discovered in a lab setting - not only is it unlikely to happen in the real world (no, making a change to cause the roll over early will not happen as it turns working code into an untested state), but it also relied on someone pretty much leaving the equipment on the whole period then noticing it died.

      I don't know about you, but finding out the reason why something died 250 days later is difficult and probably only was discovered accidentally because someone left it set up at their desk the whole time and forgot about it.

      Hell, it's probably a given the bug exists in plenty of other things as well, just they're normally cycled long before it's a problem and no one actually ran it long enough to test.
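
      The overflow-safe comparison described above can be sketched roughly as follows. This is only an illustration in Python that models a 32-bit unsigned tick counter with a mask; the names `ticks_after`, `naive_after`, `HZ` and `START_OFFSET_S` are made up for the example and are not taken from any kernel or avionics code. The counter is deliberately started shortly before its wrap point so the wrap is hit within minutes rather than months.

```python
MASK32 = 0xFFFFFFFF  # model a 32-bit unsigned tick counter

HZ = 100               # hypothetical tick rate (ticks per second)
START_OFFSET_S = 300   # hypothetical: start this many seconds before the wrap


def naive_after(a: int, b: int) -> bool:
    """Plain comparison; gives wrong answers once the counter has wrapped."""
    return a > b


def ticks_after(a: int, b: int) -> bool:
    """True if tick value a is later than b.

    Works across a wrap as long as the two values are less than
    2**31 ticks apart, because only the wrapped difference is examined.
    """
    diff = (a - b) & MASK32
    return 0 < diff < 0x80000000


# Counter value at power-up, just short of the wrap point.
start = (-START_OFFSET_S * HZ) & MASK32

# A timeout set 10 minutes after power-up lands past the wrap.
deadline = (start + 600 * HZ) & MASK32

one_minute_in = (start + 60 * HZ) & MASK32
eleven_minutes_in = (start + 660 * HZ) & MASK32

# One minute in, the deadline has NOT passed, but the naive check says it has.
print(naive_after(one_minute_in, deadline))      # True  (wrong)
print(ticks_after(one_minute_in, deadline))      # False (right)
print(ticks_after(eleven_minutes_in, deadline))  # True  (right)
```

      Starting the counter near the wrap point is what turns a once-in-many-months failure into something a normal test run will trip over.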

    15. Re:Enough of this by PPH · · Score: 1

      There's the principle of lessons learned. And not rewriting everything from scratch. This problem has been addressed and solved in numerous RTOSs and libraries. And even if these could not be used, simple things like overflows, underflows and other sorts of out of range variables are supposed to be caught by the sorts of rigorous analysis avionics s/w is supposed to be subject to. That this was caught in a lab test (and this far after the system went into service) is problematic as well. The complexity of most software (particularly real-time apps) rules out being able to cover all combinations of use cases by overall system tests.

      --
      Have gnu, will travel.
    16. Re:Enough of this by shutdown+-p+now · · Score: 1

      You don't need to be writing software for airplanes to understand the notion of an overflowing counter, and why you'd want to use a 64-bit int for it just in case.

  30. I've been reading The Strain Trilogy by Hohlraum · · Score: 1

    I kind of did a double take when I saw the title. The book/series starts out with a brand new Boeing 777 losing power on the runway. :)

    1. Re:I've been reading The Strain Trilogy by Anonymous Coward · · Score: 0

      Jeez and I thought the product placement and (likely paid) apologists for Boeing on this thread were bad. Let's go ahead and plug a completely unrelated book while we're at it. Christ almighty.

  31. Failsafe mode? by elgatozorbas · · Score: 1

    Sounds very safe.

    1. Re:Failsafe mode? by PPH · · Score: 1

      Don't worry. The 787 can always fail over to battery power ...... Umm, oh, oh.

      --
      Have gnu, will travel.
  32. and when it boots, by roc97007 · · Score: 1

    ...does it display the Windows 95 splash screen?

    --
    Oliver's law of assumed responsibility: If you're seen fixing it, you will be blamed for breaking it.
    1. Re:and when it boots, by Anonymous Coward · · Score: 0

      No, it only displays that when the system stops over water.

  33. Damn kids! by Lawrence_Bird · · Score: 1

    How many times do I have to tell you to shut the plane the f off before you go to bed? What do you think, I'm made of money?!?

  34. That is what the FAA is proposing in their AD by Anonymous Coward · · Score: 1

    That is the interim solution proposed by the FAA!

  35. third party issues? by Anonymous Coward · · Score: 0

    They must have old EqualLogic firmware. http://www.vcrumbs.com/2015/02/12/dell-equallogic-ps6210x-controller-failures/

  36. Re:Maybe they should have used Rust. by drinkypoo · · Score: 1

    Maybe they should have used Rust.

    They can't use rust, because they build with a minimum of Ferrous materials. They have to wait for the fork, AlOx.

    --
    "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
  37. failsafe by unjedai · · Score: 1

    failsafe: I don't think that means what you think it means.

  38. Please reinstall by www.sorehands.com · · Score: 1

    Please format the drive and reinstall Windows.

  39. Long uptime... by Guy+From+V · · Score: 1

    ...some witty remark about airplanes and downtime.

  40. Real situation? by JBMcB · · Score: 1

    Would this ever happen in normal operation? I would think that every few hundred hours of flight time the plane would be pulled out of service for maintenance where everything would be shut down for a couple of days.

    --
    My Other Computer Is A Data General Nova III.
    1. Re:Real situation? by Z00L00K · · Score: 1

      I agree here - the maintenance is probably interrupting the uptime of the system. Any airline that keeps its aircraft powered up for 248 days straight is likely to suffer other problems with its vessels as well: not only software glitches but also general wear issues.

      --
      If builders built buildings the way programmers wrote programs, then the first woodpecker would destroy civilization.
  41. Re:Maybe they should have used Rust. by ChrisMaple · · Score: 1

    Any language with polymorphism should do, although polycarbonate is better than polystyrene.

    --
    Contribute to civilization: ari.aynrand.org/donate
  42. They should run OpenVMS by thedavidcathey · · Score: 2

    OpenVMS systems have had many systems up for several years without rebooting. Their equivalent of the "ps" utility had to be fixed one time because systems were exceeding 9999 days uptime.

  43. The 'Diversity-liner' strikes back by Anonymous Coward · · Score: 0

    Well, we WHITES did try to warn you about the 'diversity' bullshit that is destroying every white country on Earth. More and more failures of technology as THIRD WORLD parasites, who can't make their own shithole countries work, are flooding into WHITE countries, and being given jobs over WHITE people. One can only hope that the first plane to crash because of this bullshit is full of Left wing shits who think that 'diversity' is just wonderful.

  44. Re:Maybe they should have used Rust. by TheRealHocusLocus · · Score: 2

    This is a prime example of why we need to use the Rust programming language ... blazingly ... eliminates data races ... guaranteed memory ... threads ... greatest minds ... the great ... the superb ... the glorious ... the mightiest ... Git ... Hub ... ... properly ... where it's at ... what we need ... It's what [the world] need[s] now.

    Oh yeah? Sheeeit.
    Pump it up! (endorsed by M.I.A.).

    Ericsson Calling!
    Speak the Erlang now (Seattle boys say Wha? Penguin Girls say Wha-What) [x2]

    Use Erlang Erlang Erlang, Ga la ga la ga la Land ga Lang ga Lang
    Con-currency get you down?
    Stack em flat, get down get down
    Too late you down D-down D-down D-down
    Ta na ta na ta na Ta na ta na ta

    Bench mark a-blaze Erlang a lang a lang lang
    Eager evaluation Erlang a lang a lang lang
    Single assignment Erlang a lang a lang lang
    Dynamic typing Erlang a lang a lang lang

    Who the hell is huntin' you?
    Distributed, fault-tolerant,
    In the BMW
    How the hell they find you?
    hot swapping,
    Feds gonna get you
    non-stop applications
    Pull the strings on the hood
    soft-real-time
    concurrency explicit
    message passing, Erlang a lang a lang lang
    Nah explicit locks Erlang a lang a lang lang
    open source Erlang a lang a lang lang.

    CHORUS:
    fib(1) -> 1; % If 1, then return 1, otherwise (note the semicolon ; meaning 'else')
    fib(2) -> 1; % If 2, then return 1, otherwise
    fib(N) -> fib(N - 2) + fib(N - 1).

    Needs some work though.
    An AIRPLANE would make a good sandbox. The price of failure is so high no one will make a mistake.

    --
    <blink>down the rabbit hole</blink>
  45. Can the clock be changed? by RubberDogBone · · Score: 1

    Where I work, we currently tell one of our PCs that it is February because a software license expired on March 1 and nobody will pay to renew it while we work on getting a replacement up to speed. Meanwhile the old expired version runs fine thinking it's February.

    So what would happen if somebody told the plane today's date was 248 days forward of today? Or for fun, five minutes less than that. While it was in flight.

    I'm assuming there are safeguards to prevent this but what if nobody ever considered that there could be a need to prevent changing the plane's clock? What if this was left exposed? Somebody from Boeing please tell me this clock was well protected and there is no way a virus could get into the plane, look for parameters like "wheels up" "seat belt sign off" and execute a clock change. It would be a magnificent disaster where not even the data recorders would capture what happened, if all power is cut off and all systems drop dead.

    --
    Sig for hire.
  46. Failsafe? by PhunkySchtuff · · Score: 1

    ... If the plane is left turned on for 248 days, it will enter a failsafe mode...

    You keep using that word. I do not think it means what you think it means.
    http://en.wikipedia.org/wiki/F...

    1. Re:Failsafe? by PPH · · Score: 1

      failsafe mode

      Well, it is for a single generator. The power source is removed from the system so that no subsequent failures can damage the aircraft. Problem is: This applies to a single generator, not the entire aircraft. Aircraft power systems are designed so that an alternate source can take over for the failed one. But if they all go off line together, not so safe.

      --
      Have gnu, will travel.
  47. This was fixed in the 2.0 Linux kernel! by verifine · · Score: 1

    Do the math. If you have a 32-bit unsigned binary counter and you increment it 200 times a second, guess what - it will overflow in 248 days. Coincidence? I think not!

    I had one of those early Linux kernels running on a machine I mostly used as a server. I did run Netscape on it, displaying the web content on another Linux machine. Both machines ran on UPSes, being located in the third world (Los Angeles.) I was excited about seeing 500 days uptime, but one morning the uptime measured in hours. What? Netscape was still running, so I knew the machine hadn't rebooted. Linux then (and probably now) ran with 100 interrupts/second for task switching, NTP and other goodness. A good friend explained that I had what was probably a very rare item - the uptime counter had overflowed.

    I'm tellin' ya, it's a simple counter overflow. WTF uses 32-bit counters for uptime any more? Answer: Boeing.
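
    For anyone who wants to check the arithmetic, here is a small sketch in Python. The function name `days_until_rollover` and the choice of tick rates are just for illustration; nothing here is taken from Boeing's code, and the actual counter width and tick rate in the GCU are an assumption based on the 248-day figure.

```python
SECONDS_PER_DAY = 86_400


def days_until_rollover(bits: int, ticks_per_second: int) -> float:
    """Days until a counter with `bits` usable bits wraps at the given tick rate."""
    return (2 ** bits) / ticks_per_second / SECONDS_PER_DAY


# (label, usable bits, ticks per second) -- illustrative combinations only
cases = [
    ("unsigned 32-bit @ 200 Hz", 32, 200),
    ("signed 32-bit   @ 100 Hz", 31, 100),
    ("unsigned 32-bit @ 100 Hz", 32, 100),
    ("unsigned 32-bit @ 1 kHz ", 32, 1000),
]

for label, bits, hz in cases:
    print(f"{label}: {days_until_rollover(bits, hz):.1f} days")

# The same 100 Hz tick in a 64-bit counter, expressed in years.
years_64 = days_until_rollover(64, 100) / 365.25
print(f"unsigned 64-bit @ 100 Hz: {years_64:.2e} years")
```

    The first two cases both come out at about 248.6 days, the third at about 497.1 days, and the millisecond case at about 49.7 days. A 64-bit counter at 100 Hz lasts on the order of billions of years.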

  48. they allowed an air traffic comms system which by Anonymous Coward · · Score: 0

    They allowed an air traffic communications system based on Windows (which replaced UNIX) that required rebooting every 30 days. A new tech came in and saw the note to reboot the two computers, but since they were running fine he didn't, and a couple of weeks later LAX lost comms with all air traffic.

    So I would not be surprised if a reboot was allowed. I doubt Boeing would accept it but in a pinch it would probably get the FAA off their back and the planes stay in the sky.

    google "lax communications windows unix reboot" if you don't believe.

  49. 2^32 by Anonymous Coward · · Score: 0

    Win 98 had an issue crashing every 49.7 days.
    The Dreamliner appears to have the same kind of integer overflow, just with a coarser tick: 248 days matches 2^31 hundredths of a second (a signed 32-bit counter), versus 2^32 milliseconds for Windows.

  50. Re:Maybe they should have used Rust. by stooo · · Score: 1

    >> What mechanism does Rust use to prevent 32-bit counter overflows?
    With "Rusted", the computer falls apart before reaching the integer overflow, so the overflow cannot happen.

    --
    aaaaaaa
  51. If by cwsumner · · Score: 1

    "If Engineers built buildings the way Programmers write programs, the first woodpecker that came along would destroy civilization!"

    Seriously. Check for errors and do something reasonable about them. Calling the GPF vector is not reasonable! 8-(

  52. This reminds me my last workplace... by lagi · · Score: 1

    We had an issue with the Apache server crashing after X amount of time.
    The solution (by lead developer)? A cronjob that restarts the server every X hours.

  53. WTF? by wcrowe · · Score: 1

    What I'm hearing here is not a story about a potential software bug. I'm hearing about a serious design problem. An airplane should not be so reliant on software that it shuts down if the software is not working.

    I was in Naval aviation. The 1960s-era A-7s I worked on for most of my career had redundant systems. There was even an air-stream-driven generator that could be deployed in the event of engine failure that would not only supply electrical power, but provide a minimum amount of hydraulic power to critical systems so that the plane had a chance of safely landing.

    I can't believe we're designing aircraft that can carry hundreds of people, yet lack redundant systems and can literally fall out of the sky due to a simple software glitch. Have I read this wrong? Are they exaggerating the danger in this article?

    --
    Proverbs 21:19