Long Uptime Makes Boeing 787 Lose Electrical Power
jones_supa writes: A dangerous software glitch has been found in the Boeing 787 Dreamliner. If the plane is left turned on for 248 days, it will enter a failsafe mode that will lead to the plane losing all of its power, according to a new directive from the US Federal Aviation Administration. If the bug is triggered, all the Generator Control Units will shut off, leaving the plane without power, and the control of the plane will be lost. Boeing is working on a software upgrade that will address the problem, the FAA says. The company is said to have found the problem during laboratory testing of the plane, and thankfully there are no reports of it being triggered in the field.
Finally!
IT support advice that's useful!
"have you tried turning it off and then back on?"
This program was made possible by a grant from the Ultra-Humanite, and viewers like you.
A commercial plane will most probably undergo several maintenance events and checks during that sort of time frame, where cycling the power is part of the procedure.
I guess this might be due to a 32-bit signed integer being incremented at 100 Hz: 2^31 / 24 / 3600 / 100 = 248.5 days.
karma police: arrest this man, he talks in maths; he buzzes like a fridge, he's like a detuned radio. [radiohead]
It could be the overflow of a counter of 10ms intervals. There are 86400 seconds per day, so 8640000 10ms intervals per day ...
2147483648 / 8640000 = 248.55
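To make the arithmetic above easy to re-check, here is a small Python sketch. The 100 Hz tick rate and the signed 32-bit counter are the commenters' guesses, not anything confirmed by the FAA directive, and the function name `days_until_overflow` is made up for illustration.

```python
SECONDS_PER_DAY = 24 * 3600  # 86400

def days_until_overflow(bits: int, signed: bool, ticks_per_second: float) -> float:
    """Days before a counter of the given width runs out of range.

    Assumes the counter starts at zero and is incremented once per tick.
    A signed counter has one bit fewer available for the magnitude.
    """
    max_count = 2 ** (bits - 1 if signed else bits)
    return max_count / ticks_per_second / SECONDS_PER_DAY

# Hypothetical scenario from the thread: signed 32-bit counter, 10 ms ticks.
print(round(days_until_overflow(32, True, 100), 2))  # 248.55
```

Changing `signed`, `bits`, or the tick rate shows how sensitive the figure is to those assumptions.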
For all of the QA at Boeing; they don't believe in software QA. Take a look at their job openings sometime: in years of searching, I've seen only one software QA position, and it wasn't dealing with aircraft. Any such search returns developer positions where the developers are expected to write their own tests against the spec. Developers are not testers... and I'll ask: how many more such bugs are out there?
I know of two other software "bugs" that can be attributed to a lack of QA. How many people will die due to a bad management decision on the part of Boeing?
Disclosure: Yes, I'm a software QA / Test professional.
Also, use the difference of the current time minus the start time, instead of computing the end time and using a simple less than/greater than comparison. This properly handles wraparounds, and only has a problem with differences more than half of the full range. (so don't keep comparing the time after it's ended!)
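A rough sketch of that advice in Python, simulating a 32-bit unsigned tick counter with a mask. The function names and the tick values are invented for illustration; this is one way to read the suggestion, not code from any real avionics system.

```python
MASK32 = 0xFFFFFFFF  # simulate a 32-bit unsigned tick counter

def elapsed(now: int, start: int) -> int:
    """Ticks from start to now, computed modulo 2**32 so wraparound is harmless."""
    return (now - start) & MASK32

def timed_out_safe(now: int, start: int, timeout: int) -> bool:
    # Compare the elapsed difference against the timeout.
    return elapsed(now, start) >= timeout

def timed_out_naive(now: int, start: int, timeout: int) -> bool:
    # Precompute the end time; it can wrap around to a small number.
    deadline = (start + timeout) & MASK32
    return now >= deadline

start = 0xFFFFFFF0   # counter is 16 ticks away from wrapping
timeout = 0x20       # wait 32 ticks
now = 0xFFFFFFF8     # only 8 ticks have passed

print(timed_out_naive(now, start, timeout))  # True  (fires early: deadline wrapped to 0x10)
print(timed_out_safe(now, start, timeout))   # False (8 ticks elapsed, 32 needed)
print(timed_out_safe(0x10, start, timeout))  # True  (counter wrapped, 32 ticks elapsed)
```

The caveat in the comment still applies: once the real elapsed time exceeds what the difference can represent, the result is wrong again, so stop checking a timer after it has fired.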
The plane's control systems should have several levels of degraded-mode operation, so if one system stops working, the plane still hobbles along the best it can without the non-working system. Google's self-driving cars have something like 7 layers of nested failure modes, each with slightly degraded functions relative to the next higher level. It's almost impossible to trigger enough failures to completely shut the system down, which is a good thing if you're traveling at highway speeds. It's very concerning that a company like Boeing didn't catch this before product release, but even more concerning that they didn't design the system to be resilient against this sort of failure.
The company is said to have found the problem during laboratory testing of the plane, and thankfully there are no reports of it being triggered on the field.
The spokesman continued, "The battery would have caught fire long before that integer overflow."
sed -e 's/Chuck Norris/Rajnikant/g' joke > fact
Oh, so they can make it fine for 497.10 days by changing the type to unsigned!
The reason for the three shifts was that we were using actual PFC computers connected to hardware that could simulate all the inputs and read all the outputs.
That hardware was a big complicated rack of electronics and there were maybe 8 or 10 such units in a lab.
As such, to optimize use of the facilities it was necessary to have three shifts 24 hours per day. This went on for a year or more.
Very good planning in fact.
Now I could tell you stories of the real corners cut to meet the schedule. But that's a complicated story.
Which is apparently what Windows does:
https://www.ctm-it.com/it-supp...
You'd think they would have learned since Windows 95/98 did the same thing.
https://support.microsoft.com/...
But hey, at least it goes 10 times as long now.
As a sidenote, there exists a somewhat famous bug in Windows 95 and 98 (later patched) that caused these operating systems to stop functioning after 49.7 days of uptime.
... not when they would all have nearly the exact same runtime - they would all hit the failsafe at around the same time.
Not that this should ever happen in the air - as others have said, if the thing manages to run for this long, someone hasn't been doing maintenance.
For large sets, this will be our guide even unto death, for the LORD will work for each type of data it is applied to...
If you actually read the AD it will say "We are issuing this AD to prevent loss of all AC electrical power, which could result in loss of control of the airplane."
COULD lose control, not WILL. The 787 has at least 3 additional backup systems against this sort of failure, the APU, DC battery backup, and Ram Air Turbine.
This story is being way overblown. Yes, it's a bug. Yes, it should be fixed. However...
248 days of continuous operation is well past the scheduled major maintenance for the aircraft. By this point, a 787 would have to go through many minor maintenance cycles which would have required shutting down the electrical system. In addition, loss of all 4 generators would not result in a loss of vehicle because there are batteries, an APU (a backup generator) and Ram Air Turbines (RATs), generators that deploy from the wing if the APU won't start. To have to rely on any of these would not make for a good day for the pilots; but, they would certainly provide the necessary power to safely land the aircraft at the nearest airport. They might even be able to continue on and finish their flight if they successfully reset the generators.
This is not the OMG Planes Are Going to Fall From The Sky! event the media is making it out to be.
I am not completely familiar with the matter, but I remember hearing that using signed types in some situations can be a better choice, even when the value would normally be used to represent only a non-negative value. It could make overflows more obvious and calculating deltas might be easier? If someone actually knows about this stuff, feel free to chime in.
In C, overflowing a signed integer type is undefined behaviour; unsigned types wrap around to zero in a defined manner.
Of course, either is often undesired, but the latter at least doesn't allow basically anything to happen.
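For anyone who wants to see the two outcomes side by side, here is a Python simulation of 32-bit wraparound. Note the big assumption: the signed result below is only what two's-complement wrapping would produce. Real C makes no such promise for signed overflow, since the behaviour is undefined. The helper names are made up.

```python
def wrap_u32(x: int) -> int:
    """Reduce x modulo 2**32, as C does for unsigned 32-bit arithmetic."""
    return x & 0xFFFFFFFF

def wrap_i32(x: int) -> int:
    """Two's-complement wrap into the signed 32-bit range.

    This only mimics what commonly happens on real hardware; it is NOT
    guaranteed by the C standard for signed types.
    """
    return ((x + 2**31) & 0xFFFFFFFF) - 2**31

INT32_MAX = 2**31 - 1
UINT32_MAX = 2**32 - 1

print(wrap_u32(UINT32_MAX + 1))  # 0
print(wrap_i32(INT32_MAX + 1))   # -2147483648
```

Either way the counter stops meaning "time since power-on", which is the point made above: both results are usually undesired.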
CLI paste? paste.pr0.tips!
It doesn't matter what country programmers come from, in my experience too many programmers have no clue about reality outside of their cube. They are building software for things they do not understand. I am going to rant about this in another thread so I will leave it at that for now.
putting the 'B' in LGBTQ+
Man, if only we could afford to use 64 bit values for things. I realize that transistors are simply too expensive right now; but perhaps, in the future, the miracles of science will make this possible...
Only theoretical, though. Windows 9x would crash long before reaching this uptime.
"It's such a fine line between stupid and clever" -- David St. Hubbins, Spinal Tap
Many OpenVMS systems have been up for several years without rebooting. Their equivalent of the "ps" utility had to be fixed one time because systems were exceeding 9999 days of uptime.
And this is why C should never be used for mission critical software.
"(psshsquawk)This is the Captain speaking, we are cruising at 30,000 feet, have a bit of a tail wind and will be in San Francisco a little ahead of schedule. ...Ummm... Ah.... I'm putting the seatbelt sign on now. Please return to your seats as we reboot the airplane.(pssshsquawk)"
Oliver's law of assumed responsibility: If you're seen fixing it, you will be blamed for breaking it.
All three of them.
Hey, 248 days is five dog-years.
This is a prime example of why we need to use the Rust programming language ... blazingly ... eliminates data races ... guaranteed memory ... threads ... greatest minds ... the great ... the superb ... the glorious ... the mightiest ... Git ... Hub ... ... properly ... where it's at ... what we need ... It's what [the world] need[s] now.
Oh yeah? Sheeeit.
Pump it up! (endorsed by M.I.A.).
Ericsson Calling!
Speak the Erlang now (Seattle boys say Wha? Penguin Girls say Wha-What) [x2]
Use Erlang Erlang Erlang, Ga la ga la ga la Land ga Lang ga Lang
Con-currency get you down?
Stack em flat, get down get down
Too late you down D-down D-down D-down
Ta na ta na ta na Ta na ta na ta
Bench mark a-blaze Erlang a lang a lang lang
Eager evaluation Erlang a lang a lang lang
Single assignment Erlang a lang a lang lang
Dynamic typing Erlang a lang a lang lang
Who the hell is huntin' you?
Distributed, fault-tolerant,
In the BMW
How the hell they find you?
hot swapping,
Feds gonna get you
non-stop applications
Pull the strings on the hood
soft-real-time
concurrency explicit
message passing, Erlang a lang a lang lang
Nah explicit locks Erlang a lang a lang lang
open source Erlang a lang a lang lang.
CHORUS:
fib(1) -> 1; % If 1, then return 1, otherwise (note the semicolon ; meaning 'else')
fib(2) -> 1; % If 2, then return 1, otherwise
fib(N) -> fib(N - 2) + fib(N - 1).
Needs some work though.
An AIRPLANE would make a good sandbox. The price of failure is so high no one will make a mistake.
<blink>down the rabbit hole</blink>