Long Uptime Makes Boeing 787 Lose Electrical Power
jones_supa writes: A dangerous software glitch has been found in the Boeing 787 Dreamliner. If the plane is left turned on for 248 days, it will enter a failsafe mode that will lead to the plane losing all of its power, according to a new directive from the US Federal Aviation Administration. If the bug is triggered, all the Generator Control Units will shut off, leaving the plane without power, and the control of the plane will be lost. Boeing is working on a software upgrade that will address the problems, the FAA says. The company is said to have found the problem during laboratory testing of the plane, and thankfully there are no reports of it being triggered on the field.
You should see what happens after -2147483648 days of upt-- oh wait.
CLI paste? paste.pr0.tips!
Finally!
IT support advice that's useful!
"have you tried turning it off and then back on?"
This program was made possible by a grant from the Ultra-Humanite, and viewers like you.
A commercial plane will most probably go through several maintenance events and checks during that sort of time frame, where cycling the power is part of the procedure.
Always, always, always do the math on counters and give yourself orders of magnitude of space. Figured this out the hard way once (fortunately not in a situation where safety was a concern).
"The wisdom of the Patriarchs was that they *knew* they were fools." --Master Foo
A signed integer overflow for timing - scary...
I guess this might be due to a 32-bit signed integer being incremented at 100 Hz: 2^31 / 24 / 3600 / 100 = 248.5 days.
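To sanity-check that arithmetic, here is a minimal Python sketch. The 100 Hz tick rate and the signed 32-bit counter are the guess made above, not something the FAA directive confirms, and `days_until_overflow` is a made-up helper name.

```python
SECONDS_PER_DAY = 24 * 3600


def days_until_overflow(value_bits: int, ticks_per_second: int) -> float:
    """Days until a counter with `value_bits` usable bits wraps at a fixed tick rate."""
    return 2 ** value_bits / ticks_per_second / SECONDS_PER_DAY


# A signed 32-bit integer has 31 value bits; 100 Hz is an assumed tick rate.
signed_100hz = days_until_overflow(31, 100)
print(f"{signed_100hz:.2f} days")  # prints "248.55 days"
```

Under the same assumptions, `days_until_overflow(32, 100)` gives about 497 days and `days_until_overflow(32, 1000)` gives about 49.7 days, which match the other uptime limits quoted in this thread.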
karma police: arrest this man, he talks in maths; he buzzes like a fridge, he's like a detuned radio. [radiohead]
How is losing power in an airplane a safe mode?
For all of the QA at Boeing, they don't believe in software QA. Take a look at their job openings some time: in years of searching, I've seen only one software QA position, and it wasn't dealing with aircraft. Any such search results will return developers that are to write their own tests against the spec. Developers are not Testers.... and I'll ask: How many more such bugs are out there?
... that can be attributed to a lack of QA. How many people will die due to a bad management decision on the part of Boeing?
I know of two other software "bugs"
Disclosure: Yes, I'm a software QA / Test professional.
248 days kind of sounds similar to this 497 days issue. In fact, it could be the same issue if they are using a signed 32bit integer.
https://www.ibm.com/developerw...
Using a language that hasn't even reached a stable release in an environment where the tiniest mistake kills hundreds of people?
"Set a man a fire, he'll be warm for the rest of the night. Set a man afire, he'll be warm for the rest of his life."
Don't they ever switch the planes off? If all you have to do is reboot the system once every 200 days, then just reboot it.
The plane's control systems should have several levels of degraded-mode operation, so if one system stops working, the plane still hobbles along the best it can without the non-working system. Google's self-driving cars have something like 7 layers of nested failure modes, each with slightly degraded functions relative to the next higher level. It's almost impossible to trigger enough failures to completely shut the system down, which is a good thing if you're traveling at highway speeds. It's very concerning that a company like Boeing didn't catch this before product release, but even more concerning that they didn't design the system to be resilient against this sort of failure.
... "failsafe".
The company is said to have found the problem during laboratory testing of the plane, and thankfully there are no reports of it being triggered on the field.
The spokesman continued, "The battery would have caught fire long before that integer overflow."
sed -e 's/Chuck Norris/Rajnikant/g' joke > fact
What mechanism does Rust use to prevent 32-bit counter overflows?
I don't care if it's 90,000 hectares. That lake was not my doing.
So first they say "left turned on for 248 days, it will enter a failsafe mode" then they say "all the Generator Control Units will shut off, leaving the plane without power, and the control of the plane will be lost."
That is NOT fail "SAFE". That is fail "EVERYBODY DEAD".
Sometimes the "writing on the wall" is blood spatter...
The reason for the three shifts was that we were using actual PFC computers connected to hardware that could simulate all the inputs and read all the outputs.
That hardware was a big complicated rack of electronics and there were maybe 8 or 10 such units in a lab.
As such, to optimize use of the facilities it was necessary to have three shifts 24 hours per day. This went on for a year or more.
Very good planning in fact.
Now I could tell you stories of the real corners cut to meet the schedule. But that's a complicated story.
As a sidenote, there exists a somewhat famous bug in Windows 95 and 98 (later patched) that caused these operating systems to stop functioning after 49.7 days of uptime.
What a profound demonstration of the Halting Problem.
This story is being way overblown. Yes, it's a bug. Yes, it should be fixed. However...
248 days of continuous operation is well past the scheduled major maintenance for the aircraft. By this point, a 787 would have to go through many minor maintenance cycles which would have required shutting down the electrical system. In addition, loss of all 4 generators would not result in a loss of vehicle because there are batteries, an APU (a backup generator) and Ram Air Turbines (RATs), generators that deploy from the wing if the APU won't start. To have to rely on any of these would not make for a good day for the pilots; but, they would certainly provide the necessary power to safely land the aircraft at the nearest airport. They might even be able to continue on and finish their flight if they successfully reset the generators.
This is not the OMG Planes Are Going to Fall From The Sky! event the media is making it out to be.
I kind of did a double take when I saw the title. The book/series starts out with a brand new Boeing 777 losing power on the runway. :)
I have seen it happen all too often: an unrealistic development schedule is made to get the contract, and as shit rolls down the schedule, QA takes the brunt of any deadline problems. It's one thing if you're developing software for insurance companies, it's another when it's aerospace.
Until recently I was the software QA Director for a software company. I completely agree with your assessment. My company used to pad in about a week for QA. I kept telling them that it might take us a week to QA, but if we find any issues, then it will have to go back to development, and I can't speak for how long that would take. They really didn't like how I couldn't give them a solid date, but how could I speak for how long it would take another department to fix something?
At any rate, all of that was moot as development literally never got the project to me until the actual date it was due to the customer, and it would be broken and not meet the specs. I did what I could to try to get development on track so we could deliver a quality product to our customer, but in the end, my company grew tired of my efforts and fired the whole QA team, so now the product just goes straight to the customer, bugs and all. And late.
If you are not allowed to question your government then the government has answered your question.
Sounds very safe.
Oliver's law of assumed responsibility: If you're seen fixing it, you will be blamed for breaking it.
How many times do I have to tell you to shut the plane the f off before you go to bed? What do you think, I'm made of money?!?
That is the interim solution proposed by the FAA!
Maybe they should have used Rust.
They can't use rust, because they build with a minimum of Ferrous materials. They have to wait for the fork, AlOx.
"You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
Yeah, I don't have my 'back of the envelope' calculations in front of me but I think I worked that out to be a 'timeGetTime' rollover bug. I wonder if the same thing is happening in this case, i.e. a timer rollover bug.
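For anyone who hasn't been bitten by a timer rollover, here is a rough Python sketch of the usual way to survive one. It assumes a millisecond timer that wraps at 32 bits, and `elapsed_ms` is a hypothetical helper. It is not how Windows or Boeing actually handle it.

```python
MASK32 = 0xFFFFFFFF  # assumption: the timer is an unsigned 32-bit millisecond count


def elapsed_ms(start: int, now: int) -> int:
    """Elapsed time between two timer readings, correct across a single wrap."""
    # Subtracting modulo 2**32 gives the true interval even if `now` has wrapped.
    return (now - start) & MASK32


start = 0xFFFFFF00             # reading taken 256 ms before the timer rolls over
now = (start + 1000) & MASK32  # reading taken 1000 ms later; it has wrapped to 744

print(now - start)             # naive difference is a huge negative number
print(elapsed_ms(start, now))  # prints 1000
```

The trick only works when the real interval is shorter than one full wrap period. Code that compares raw timestamps instead of differences is what usually breaks at rollover.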
putting the 'B' in LGBTQ+
failsafe: I don't think that means what you think it means.
The Boeing Screamliner -- the proud product of innovative Project Management in a Globalized Economy
I thought the name was the Dreamliner? Yup. It is, according to the Boeing website. No mention of it being called the screamliner. But you are right about it living up to its name. No accidents or injuries have occurred on a 787 in all of its years of service.
Please format the drive and reinstall Windows.
Fight Spammers!
...some witty remark about airplanes and downtime.
Would this ever happen in normal operation? I would think that every few hundred hours of flight time the plane would be pulled out of service for maintenance where everything would be shut down for a couple of days.
My Other Computer Is A Data General Nova III.
Cue the "If they'd chosen Windows, it would be impossible for this bug to occur" jokes...
Those have mostly been unfair since the NT-derived era; but, in the spirit of the joke, there was a bug in win95 and 98 that would cause the system to crash after 49.7 days of uptime. It remained undiscovered for years.
I certainly have no useful information to add to the speculation about cause, but that is what would worry me. Having to reboot a system every 248 days or less is a nuisance, but not a terribly big one (especially since the system is connected to a giant mass of moving parts governed by comparatively strict regulations concerning maintenance, so it probably gets taken to the shop fairly frequently anyway). However, if there is some value incrementing its way up that eventually causes the system to crash, I'd want to be very sure that there is absolutely nothing else that might modify that value in a way that causes it to grow faster than expected.
Only theoretical, though. Windows 9x would crash long before reaching this uptime.
"It's such a fine line between stupid and clever" -- David St. Hubbins, Spinal Tap
Any language with polymorphism should do, although polycarbonate is better than polystyrene.
Contribute to civilization: ari.aynrand.org/donate
Many OpenVMS systems have been up for several years without rebooting. Their equivalent of the "ps" utility had to be fixed one time because systems were exceeding 9999 days uptime.
Actually, MSWindNT wasn't that stable, but I've heard that the recent releases actually are pretty stable. I'll never be able to test though, since I won't agree to the EULA.
Also, I never experienced any real problems even with an unmodified MSWind95. The problems started when you installed additional software or hardware. (Yes, the 49.7 days bug existed, but it doesn't exist in the final version of MSWind95. I've got a machine that's running that, and has been up for years. It doesn't get much use, but there are a couple of abandoned programs that I can't export data from.)
I think we've pushed this "anyone can grow up to be president" thing too far.
"(psshsquawk)This is the Captain speaking, we are cruising at 30,000 feet, have a bit of a tail wind and will be in San Francisco a little ahead of schedule. ...Ummm... Ah.... I'm putting the seatbelt sign on now. Please return to your seats as we reboot the airplane.(pssshsquawk)"
Only theoretical, though. Windows 9x would crash long before reaching this uptime.
Well, in fairness, only if you tried to do something with it.
You're right. In a similar vein, I ride a Harley, and conversations always lead back to "it's not leaking oil, it's marking its territory!" Har. Har. Yes, Harleys used to leak oil. They were famous for it at one time. But they don't now, any more than any other motor vehicle does.
Similarly, in all the years I've been using Windows 7, I've yet to have a hang or bluescreen, and I don't reboot my machine unless absolutely necessary. But people still make jokes about the Windows 49.7 day issue. Just goes to show, it takes a LONG time to live down a tremendous goof.
All three of them.
I remember supporting an office with win95 and Access. I had tech support conversations that almost went like this:
Him: My computer just crashed.
Me: So what did you do then?
Him: I rebooted it.
Me: Well there's your problem. Reboot the computer again. Then tap the computer gently and pray to the god of your choice and reboot a third time...
Him: ...Thanks. That worked.
This and no other is the root from which a tyrant springs; when first he appears as a protector - Plato (c. 428 to 348 BC)
All three of them.
Hey, 248 days is five dog-years.
This is where I then call Bullshit on anyone actually getting Windows 95 or 98 to run for 49 days. My average uptime before bluescreen was around 2 days...
This is a prime example of why we need to use the Rust programming language ... blazingly ... eliminates data races ... guaranteed memory ... threads ... greatest minds ... the great ... the superb ... the glorious ... the mightiest ... Git ... Hub ... ... properly ... where it's at ... what we need ... It's what [the world] need[s] now.
Oh yeah? Sheeeit.
Pump it up! (endorsed by M.I.A.).
Ericsson Calling!
Speak the Erlang now (Seattle boys say Wha? Penguin Girls say Wha-What [x2]
Use Erlang Erlang Erlang, Ga la ga la ga la Land ga Lang ga Lang
Con-currency get you down?
Stack em flat, get down get down
Too late you down D-down D-down D-down
Ta na ta na ta na Ta na ta na ta
Bench mark a-blaze Erlang a lang a lang lang
Eager evaluation Erlang a lang a lang lang
Single assignment Erlang a lang a lang lang
Dynamic typing Erlang a lang a lang lang
Who the hell is huntin' you?
Distributed, fault-tolerant,
In the BMW
How the hell they find you?
hot swapping,
Feds gonna get you
non-stop applications
Pull the strings on the hood
soft-real-time
concurrency explicit
message passing, Erlang a lang a lang lang
Nah explicit locks Erlang a lang a lang lang
open source Erlang a lang a lang lang.
CHORUS:
fib(1) -> 1; % If 1, then return 1, otherwise (note the semicolon ; meaning 'else')
fib(2) -> 1; % If 2, then return 1, otherwise
fib(N) -> fib(N - 2) + fib(N - 1).
Needs some work though.
An AIRPLANE would make a good sandbox. The price of failure is so high no one will make a mistake.
<blink>down the rabbit hole</blink>
Where I work, we currently tell one of our PCs that it is February because a software license expired on March 1 and nobody will pay to renew it while we work on getting a replacement up to speed. Meanwhile the old expired version runs fine thinking it's February.
So what would happen if somebody told the plane today's date was 248 days forward of today? Or for fun, five minutes less than that. While it was in flight.
I'm assuming there are safeguards to prevent this but what if nobody ever considered that there could be a need to prevent changing the plane's clock? What if this was left exposed? Somebody from Boeing please tell me this clock was well protected and there is no way a virus could get into the plane, look for parameters like "wheels up" "seat belt sign off" and execute a clock change. It would be a magnificent disaster where not even the data recorders would capture what happened, if all power is cut off and all systems drop dead.
Sig for hire.
... If the plane is left turned on for 248 days, it will enter a failsafe mode...
You keep using that word. I do not think it means what you think it means.
http://en.wikipedia.org/wiki/F...
Specialist Mac support for creative pros, Melbourne
It's still an issue in some modern OSes
All the TCP/IP ports that are in a TIME_WAIT status are not closed after 497 days from system startup in Windows Vista, in Windows 7, in Windows Server 2008 and in Windows Server 2008 R2
https://support.microsoft.com/en-us/kb/2553549
I had a win 95 and later a win 98 system, both were very stable, mainly used for development and only for occasional gaming, like Descent or Settlers or Warcraft (not WoW, the RTS).
The win 98 one only crashed when it was time to replace the processor fan.
I was impressed at that time about MS ...
Cost free eBook I read (by iBook/Kobo/Amazon/ObookO/Gutenberg etc.): "The Green Odyssey" by Philip Jose Farmer.
Do the math. If you have a 32-bit unsigned binary counter and you increment it 200 times a second, guess what - it will overflow in 248 days. Coincidence? I think not!
I had one of those early Linux kernels running on a machine I mostly used as a server. I did run Netscape on it, displaying the web content on another Linux machine. Both machines ran on UPSes, being located in the third world (Los Angeles.) I was excited about seeing 500 days uptime, but one morning the uptime measured in hours. What? Netscape was still running, so I knew the machine hadn't rebooted. Linux then (and probably now) ran with 100 interrupts/second for task switching, NTP and other goodness. A good friend explained that I had what was probably a very rare item - the uptime counter had overflowed.
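That symptom (an uptime that suddenly reads in hours) is what a wrapped tick counter would produce. A rough Python sketch, assuming the uptime was derived from an unsigned 32-bit tick counter incremented 100 times a second. The counter width is an assumption, and the helper name and the 498-day figure are made up for illustration.

```python
TICK_HZ = 100               # timer interrupts per second, as described above
COUNTER_MODULUS = 2 ** 32   # assumption: ticks are stored in an unsigned 32-bit counter
SECONDS_PER_DAY = 24 * 3600


def displayed_uptime_hours(actual_days: float) -> float:
    """Uptime in hours as reported from a tick counter that silently wraps."""
    ticks = int(actual_days * SECONDS_PER_DAY * TICK_HZ)
    wrapped_ticks = ticks % COUNTER_MODULUS  # what the counter actually holds
    return wrapped_ticks / TICK_HZ / 3600


# The counter wraps at roughly 497.1 days, so a machine that has really been up
# for 498 days reports an uptime of under a day.
print(f"{displayed_uptime_hours(498):.1f} hours")
```

Nothing rebooted in this model. The machine keeps running and only the number it reports starts over, which fits Netscape still being up.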
I'm tellin' ya, it's a simple counter overflow. WTF uses 32-bit counters for uptime any more? Answer: Boeing.
>> What mechanism does Rust use to prevent 32-bit counter overflows?
With "Rusted", the computer falls apart before reaching the integer overflow, so the overflow cannot happen.
Only theoretical, though. Windows 9x would crash long before reaching this uptime.
Well, in fairness, only if you tried to do something with it.
Like.. Fly a plane?
The day Microsoft creates a product that doesn't suck, it will be known as the Microsoft Vacuum Cleaner!
I've gotten a Windows 95 machine up to the 49.7 day limit. The key was that the machine was hooked up to a special scanner that didn't get used much, so the computer spent 99% of its time idling at the desktop. Once I realized it was getting close, I figured out exactly when it was going to hit the limit so I could witness what would happen. Which turned out to be nothing, until I clicked the mouse and it BSOD'd.
I've also managed to get a Vista machine up to the 497 day limit. In that case the computer was still running okay other than the networking being hosed.
"Release the landing gear Hal."
"I can't do that Steven."
"Lack of speed can be overcome. In the worst case by patience." --Znork
"If Engineers built buildings the way Programmers write programs, the first woodpecker that came along would destroy civilization!"
Seriously. Check for errors and do something reasonable about them. Calling the GPF vector is not reasonable! 8-(
We had an issue with the Apache server crashing after X amount of time.
The solution (by lead developer)? A cronjob that restarts the server every X hours.
What I'm hearing here is not a story about a potential software bug. I'm hearing about a serious design problem. An airplane should not be so reliant on software that it shuts down if the software is not working.
I was in Naval aviation. The 1960s-era A-7s I worked on for most of my career had redundant systems. There was even an air-stream-driven generator that could be deployed in the event of engine failure that would not only supply electrical power, but provide a minimum amount of hydraulic power to critical systems so that the plane had a chance of safely landing.
I can't believe we're designing aircraft that can carry hundreds of people yet lack redundant systems and can literally fall out of the sky due to a simple software glitch. Have I read this wrong? Are they exaggerating the danger in this article?
Proverbs 21:19
The real question is how long the reboot time is relative to the glide duration from 30,000 feet?
www.sjbaker.org