Long Uptime Makes Boeing 787 Lose Electrical Power
jones_supa writes: A dangerous software glitch has been found in the Boeing 787 Dreamliner. If the plane is left turned on for 248 days, it will enter a failsafe mode that will lead to the plane losing all of its power, according to a new directive from the US Federal Aviation Administration. If the bug is triggered, all the Generator Control Units will shut off, leaving the plane without power, and the control of the plane will be lost. Boeing is working on a software upgrade that will address the problems, the FAA says. The company is said to have found the problem during laboratory testing of the plane, and thankfully there are no reports of it being triggered in the field.
'It must be running Windows' jokes...
like...
'It must be running Windows... oh wait, Windows doesn't stay up that long.'
You should see what happens after -2147483648 days of upt-- oh wait.
Finally!
IT support advice that's useful!
"have you tried turning it off and then back on?"
This program was made possible by a grant from the Ultra-Humanite, and viewers like you.
A commercial plane will most likely undergo several maintenance events and checks in that time frame, and cycling the power is part of the procedure.
It's troubling that, with as much focus as this new battery backup has received, we continue to see problems creep up with its reliability and safety. I think it clearly represents a lack of detailed focus on testing, and maybe someone at the FAA needs to say something is still not right here. Then you have people like Elon Musk touting lithium-ion technology for home energy backup, and with all the lithium battery recalls for notebook PCs you have to ask yourself whether a far larger storage system, for a home or an aircraft, is proper and safe. At this point in time I would not want a lithium battery with the capacity of a home backup system in my house. At some point maybe they will be proven safe enough, but as with the 787, I don't want to be the guinea pig that finds out.
Always, always, always do the math on counters and give yourself orders of magnitude of space. Figured this out the hard way once (fortunately not in a situation where safety was a concern).
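"Doing the math" with orders of magnitude of headroom can be written down as a small sizing helper. This is a sketch under stated assumptions: the 100 Hz tick, 30-year service life, and 1000x margin in the example are hypothetical requirements, and `min_signed_bits`/`machine_width` are illustrative names, not from any real codebase.

```python
import math

SECONDS_PER_YEAR = 365.25 * 86_400

def min_signed_bits(hz: float, lifetime_years: float, margin: float = 1000.0) -> int:
    """Smallest signed width whose positive range covers `lifetime_years`
    of ticks at `hz`, multiplied by a safety `margin`."""
    ticks_needed = hz * lifetime_years * SECONDS_PER_YEAR * margin
    # A signed N-bit counter holds at most 2**(N-1) - 1, so we need
    # 2**(N-1) > ticks_needed, i.e. N - 1 = floor(log2(ticks_needed)) + 1.
    return math.floor(math.log2(ticks_needed)) + 2

def machine_width(bits: int) -> int:
    """Round a required bit count up to a common integer size."""
    for width in (8, 16, 32, 64):
        if bits <= width:
            return width
    raise ValueError(f"{bits} bits will not fit in a 64-bit integer")

# Hypothetical requirement: 100 Hz tick, 30-year service life, 1000x headroom.
bits = min_signed_bits(100, 30)
print(bits, machine_width(bits))  # 48 64
```

With no margin at all, 248 days at 100 Hz still just fits in 32 signed bits, and 249 days does not, which is the whole story of this bug.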
"The wisdom of the Patriarchs was that they *knew* they were fools." --Master Foo
Since it runs on Windows O/S we don't have to worry about it reaching that long an uptime except in perfect laboratory conditions.
Wow, Slashdot is early for once. You beat numerous national civil regulators. I learnt this before even our HAAMC did....
A signed integer overflow for timing - scary...
I guess this might be due to a 32-bit signed integer being incremented at 100 Hz: 2^31 / 24 / 3600 / 100 = 248.5 days.
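That arithmetic is easy to check. The 100 Hz rate and signed 32-bit width are this commenter's hypothesis, not confirmed details of the 787's Generator Control Unit software; `days_until_overflow` is just an illustrative helper.

```python
SECONDS_PER_DAY = 24 * 3600

def days_until_overflow(hz: float, bits: int = 32, signed: bool = True) -> float:
    """Days of continuous counting at `hz` ticks per second before an
    N-bit counter runs past its largest value."""
    # A signed counter spends one bit on the sign, halving its positive range.
    ticks = 2 ** (bits - 1) if signed else 2 ** bits
    return ticks / hz / SECONDS_PER_DAY

print(f"{days_until_overflow(100):.2f}")  # 248.55
```

Passing `signed=False` gives about 497.10 days, which is why unsigned 32-bit counters at 100 Hz show up as "497-day" bugs instead.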
karma police: arrest this man, he talks in maths; he buzzes like a fridge, he's like a detuned radio. [radiohead]
How is losing power in an airplane a safe mode?
For all of the QA at Boeing, they don't believe in software QA. Take a look at their job openings some time: in years of searching, I've seen only one software QA position, and it wasn't dealing with aircraft. Any such search returns developer positions whose holders are expected to write their own tests against the spec. Developers are not testers... and I'll ask: how many more such bugs are out there?
... that can be attributed to a lack of QA. How many people will die due to a bad management decision on the part of Boeing?
I know of two other software "bugs"
Disclosure: Yes, I'm a software QA / Test professional.
248 days sounds kind of similar to this 497-day issue. In fact, it could be the same issue if they are using a signed 32-bit integer.
https://www.ibm.com/developerw...
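The difference between 248 and 497 days is what happens at the top of the range. A minimal sketch, assuming two's-complement hardware; in C, signed overflow is formally undefined behavior, so this shows what the bits typically do rather than what the language guarantees.

```python
INT32_MAX = 2**31 - 1
UINT32_MAX = 2**32 - 1

def as_uint32(x: int) -> int:
    """Keep only the low 32 bits, as an unsigned 32-bit register would."""
    return x & 0xFFFF_FFFF

def as_int32(x: int) -> int:
    """Reinterpret the low 32 bits as a two's-complement signed value."""
    x = as_uint32(x)
    return x - 2**32 if x > INT32_MAX else x

# One tick past the maximum of each type:
print(as_int32(INT32_MAX + 1))    # -2147483648 (signed: goes negative, ~248.5 days at 100 Hz)
print(as_uint32(UINT32_MAX + 1))  # 0 (unsigned: wraps to zero, ~497 days at 100 Hz)
```

A negative uptime is the kind of "impossible" value that can send a sanity check straight into a failsafe path.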
The Boeing Screamliner -- the proud product of innovative Project Management in a Globalized Economy
Except that the problem here was an integer overflow problem with a constantly incrementing time counter. All a "better" language could really do would be to either abort when the overflow happened, or automatically support some kind of bignum, at a significant performance cost throughout the entire program. From the sound of the description in TFA, the first one was basically exactly what was already happening.
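The two options can be sketched side by side. `CounterOverflow` is a hypothetical exception type; the checked version refuses to wrap (the "abort" option), while Python's own ints behave like the bignum option and simply keep growing.

```python
INT32_MAX = 2**31 - 1

class CounterOverflow(Exception):
    """Hypothetical error raised instead of silently wrapping."""

def checked_increment(value: int, limit: int = INT32_MAX) -> int:
    """'Abort' option: return value + 1, or raise if that would pass `limit`."""
    if value >= limit:
        raise CounterOverflow(f"counter would exceed {limit}")
    return value + 1

def bignum_increment(value: int) -> int:
    """'Bignum' option: Python ints are arbitrary precision, so this never
    overflows; the price is that large values are no longer fixed-size."""
    return value + 1

print(bignum_increment(INT32_MAX))  # 2147483648
try:
    checked_increment(INT32_MAX)
except CounterOverflow as exc:
    print("refused:", exc)  # refused: counter would exceed 2147483647
```

Raising only moves the problem: the caller still has to decide what a refused increment should mean.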
Using a language that hasn't even reached a stable release in an environment where the tiniest mistake kills hundreds of people?
"Set a man a fire, he'll be warm for the rest of the night. Set a man afire, he'll be warm for the rest of his life."
Don't they ever switch the planes off? If all you have to do is reboot the system once every 200 days, then just reboot it.
SunOS had this bug. Solaris had this bug. Linux had this bug. NT had this bug. HP-UX had this bug. Oracle had this bug.
Some of them were 248 days. Some were double that. One way or another, this issue has hit countless platforms over the years.
Trolling...
The plane's control systems should have several levels of degraded-mode operation, so if one system stops working, the plane still hobbles along the best it can without the non-working system. Google's self-driving cars have something like 7 layers of nested failure modes, each with slightly degraded functions relative to the next higher level. It's almost impossible to trigger enough failures to completely shut the system down, which is a good thing if you're traveling at highway speeds. It's very concerning that a company like Boeing didn't catch this before product release, but even more concerning that they didn't design the system to be resilient against this sort of failure.
... "failsafe".
There were tens of us working three shifts 24 hours per day...
Round the clock? That sounds like piss-poor planning, as in QA was an afterthought.
I have seen it happen all too often: an unrealistic development schedule is made to get the contract, and as shit rolls down the schedule, QA takes the brunt of any deadline problems. It's one thing if you're developing software for insurance companies; it's another when it's aerospace.
The company is said to have found the problem during laboratory testing of the plane, and thankfully there are no reports of it being triggered in the field.
The spokesman continued, "The battery would have caught fire long before that integer overflow."
sed -e 's/Chuck Norris/Rajnikant/g' joke > fact
What mechanism does Rust use to prevent 32-bit counter overflows?
I don't care if it's 90,000 hectares. That lake was not my doing.
So first they say "left turned on for 248 days, it will enter a failsafe mode," then they say "all the Generator Control Units will shut off, leaving the plane without power, and the control of the plane will be lost."
That is NOT fail "SAFE". That is fail "EVERYBODY DEAD".
Sometimes the "writing on the wall" is blood spatter...
Nice one lol
What a profound demonstration of the Halting Problem.
This story is being way overblown. Yes, it's a bug. Yes, it should be fixed. However...
248 days of continuous operation is well past the scheduled major maintenance for the aircraft. By this point, a 787 would have to go through many minor maintenance cycles which would have required shutting down the electrical system. In addition, loss of all 4 generators would not result in a loss of vehicle because there are batteries, an APU (a backup generator) and Ram Air Turbines (RATs), generators that deploy from the wing if the APU won't start. To have to rely on any of these would not make for a good day for the pilots; but, they would certainly provide the necessary power to safely land the aircraft at the nearest airport. They might even be able to continue on and finish their flight if they successfully reset the generators.
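The layered backup described here amounts to trying an ordered list of sources until one works. Everything in this sketch is hypothetical: the source names, their ordering, and the availability checks are illustrative assumptions, not how the 787's electrical system actually arbitrates between sources.

```python
from collections.abc import Callable, Sequence

# Each source is a name plus a function reporting whether it can supply power now.
PowerSource = tuple[str, Callable[[], bool]]

def select_power_source(sources: Sequence[PowerSource]) -> str:
    """Return the first available source, falling back down the list."""
    for name, is_available in sources:
        if is_available():
            return name
    raise RuntimeError("no power source available")

# Scenario from the comment: main generators offline and the APU won't start.
sources = [
    ("main generators", lambda: False),
    ("APU generator", lambda: False),
    ("ram air turbine", lambda: True),
    ("battery", lambda: True),
]
print(select_power_source(sources))  # ram air turbine
```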
This is not the OMG Planes Are Going to Fall From The Sky! event the media is making it out to be.
I kind of did a double take when I saw the title. The book/series starts out with a brand new Boeing 777 losing power on the runway. :)
Sounds very safe.
Oliver's law of assumed responsibility: If you're seen fixing it, you will be blamed for breaking it.
How many times do I have to tell you to shut the plane the f off before you go to bed? What do you think, I'm made of money?!?
That is the interim solution proposed by the FAA!
They must have old EqualLogic firmware. http://www.vcrumbs.com/2015/02/12/dell-equallogic-ps6210x-controller-failures/
Maybe they should have used Rust.
They can't use Rust, because they build with a minimum of ferrous materials. They have to wait for the fork, AlOx.
"You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
failsafe: I don't think that means what you think it means.
Please format the drive and reinstall Windows.
Fight Spammers!
...some witty remark about airplanes and downtime.
Would this ever happen in normal operation? I would think that every few hundred hours of flight time the plane would be pulled out of service for maintenance where everything would be shut down for a couple of days.
My Other Computer Is A Data General Nova III.
Any language with polymorphism should do, although polycarbonate is better than polystyrene.
OpenVMS has had many systems up for several years without rebooting. Its equivalent of the "ps" utility had to be fixed once because systems were exceeding 9999 days of uptime.
Well, we WHITES did try to warn you about the 'diversity' bullshit that is destroying every white country on Earth. More and more failures of technology as THIRD WORLD parasites, who can't make their own shithole countries work, are flooding into WHITE countries, and being given jobs over WHITE people. One can only hope that the first plane to crash because of this bullshit is full of Left wing shits who think that 'diversity' is just wonderful.
This is a prime example of why we need to use the Rust programming language ... blazingly ... eliminates data races ... guaranteed memory ... threads ... greatest minds ... the great ... the superb ... the glorious ... the mightiest ... Git ... Hub ... ... properly ... where it's at ... what we need ... It's what [the world] need[s] now.
Oh yeah? Sheeeit.
Pump it up! (endorsed by M.I.A.).
Ericsson Calling!
Speak the Erlang now (Seattle boys say Wha? Penguin Girls say Wha-What) [x2]
Use Erlang Erlang Erlang, Ga la ga la ga la Land ga Lang ga Lang
Con-currency get you down?
Stack em flat, get down get down
Too late you down D-down D-down D-down
Ta na ta na ta na Ta na ta na ta
Bench mark a-blaze Erlang a lang a lang lang
Eager evaluation Erlang a lang a lang lang
Single assignment Erlang a lang a lang lang
Dynamic typing Erlang a lang a lang lang
Who the hell is huntin' you?
Distributed, fault-tolerant,
In the BMW
How the hell they find you?
hot swapping,
Feds gonna get you
non-stop applications
Pull the strings on the hood
soft-real-time
concurrency explicit
message passing, Erlang a lang a lang lang
Nah explicit locks Erlang a lang a lang lang
open source Erlang a lang a lang lang.
CHORUS:
fib(1) -> 1; % If 1, then return 1, otherwise (note the semicolon ; meaning 'else')
fib(2) -> 1; % If 2, then return 1, otherwise
fib(N) -> fib(N - 2) + fib(N - 1).
Needs some work though.
An AIRPLANE would make a good sandbox. The price of failure is so high no one will make a mistake.
<blink>down the rabbit hole</blink>
Where I work, we currently tell one of our PCs that it is February because a software license expired on March 1 and nobody will pay to renew it while we work on getting a replacement up to speed. Meanwhile the old expired version runs fine thinking it's February.
So what would happen if somebody told the plane today's date was 248 days forward of today? Or for fun, five minutes less than that. While it was in flight.
I'm assuming there are safeguards to prevent this but what if nobody ever considered that there could be a need to prevent changing the plane's clock? What if this was left exposed? Somebody from Boeing please tell me this clock was well protected and there is no way a virus could get into the plane, look for parameters like "wheels up" "seat belt sign off" and execute a clock change. It would be a magnificent disaster where not even the data recorders would capture what happened, if all power is cut off and all systems drop dead.
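If the fault is the free-running tick counter that other commenters suspect rather than a calendar date, setting the date forward would not by itself bring it closer to overflow; that is why elapsed time is normally measured from a monotonic clock. We don't know how the 787's software keeps time, so this is only a sketch of the general idea, using Python's `time.monotonic()`, which the standard library documents as unaffected by system clock updates. `UptimeTimer` is a hypothetical name.

```python
import time

class UptimeTimer:
    """Measures elapsed time from a monotonic clock, so changing the
    system's wall-clock date does not change the result."""

    def __init__(self) -> None:
        self._start = time.monotonic()

    def elapsed_seconds(self) -> float:
        # time.monotonic() cannot go backwards and ignores wall-clock changes.
        return time.monotonic() - self._start

timer = UptimeTimer()
time.sleep(0.1)
print(f"{timer.elapsed_seconds():.1f} s")
```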
Sig for hire.
... If the plane is left turned on for 248 days, it will enter a failsafe mode...
You keep using that word. I do not think it means what you think it means.
http://en.wikipedia.org/wiki/F...
Do the math. If you have a 32-bit unsigned binary counter and you increment it 200 times a second, guess what - it will overflow in 248 days. Coincidence? I think not!
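This explanation and the signed-counter-at-100 Hz guess elsewhere in the thread are arithmetically indistinguishable: halving the range and halving the tick rate cancel out. Exact fractions make the point without floating-point noise; both tick rates are guesses, not published details, and `wrap_days` is an illustrative helper.

```python
from fractions import Fraction

SECONDS_PER_DAY = 86_400

def wrap_days(hz: int, bits: int = 32, signed: bool = False) -> Fraction:
    """Exact number of days before an N-bit counter at `hz` overflows."""
    ticks = 2 ** (bits - 1) if signed else 2 ** bits
    return Fraction(ticks, hz * SECONDS_PER_DAY)

unsigned_200hz = wrap_days(200)
signed_100hz = wrap_days(100, signed=True)
print(unsigned_200hz == signed_100hz)  # True
print(f"{float(unsigned_200hz):.2f}")  # 248.55
```

So the 248-day figure alone can't tell you which design Boeing used; only the source code can.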
I had one of those early Linux kernels running on a machine I mostly used as a server. I did run Netscape on it, displaying the web content on another Linux machine. Both machines ran on UPSes, being located in the third world (Los Angeles). I was excited about seeing 500 days uptime, but one morning the uptime measured in hours. What? Netscape was still running, so I knew the machine hadn't rebooted. Linux then (and probably now) ran with 100 interrupts/second for task switching, NTP and other goodness. A good friend explained that I had what was probably a very rare item: the uptime counter had overflowed.
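"Measured in hours" is exactly what a wrapped counter produces. A sketch assuming, as the comment describes, a 100 Hz tick and an unsigned 32-bit counter that a tool divides straight into an uptime; `reported_uptime_hours` is a hypothetical helper.

```python
SECONDS_PER_DAY = 86_400
HZ = 100          # ticks per second, as the comment describes
WRAP = 2 ** 32    # assumption: an unsigned 32-bit tick counter

def reported_uptime_hours(true_uptime_days: int) -> float:
    """Uptime a tool would show if it divides the wrapped tick count by HZ."""
    ticks = (true_uptime_days * SECONDS_PER_DAY * HZ) % WRAP
    return ticks / HZ / 3600

print(f"{reported_uptime_hours(500):.1f}")  # 69.5
```

At 500 real days the counter has wrapped once (after about 497.1 days), leaving only about 69.5 hours on the clock.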
I'm tellin' ya, it's a simple counter overflow. WTF uses 32-bit counters for uptime any more? Answer: Boeing.
They allowed an air traffic communications system based on Windows (which replaced UNIX) that required rebooting every 30 days. A new tech came in and saw the note to reboot the two computers, but since they were running fine he didn't, and a couple of weeks later LAX lost communications with all air traffic.
So I would not be surprised if a reboot was allowed. I doubt Boeing would accept it but in a pinch it would probably get the FAA off their back and the planes stay in the sky.
Google "lax communications windows unix reboot" if you don't believe me.
Win 98 had an issue crashing every 49.7 days.
The Dreamliner has the same kind of integer overflow, just with a period five times longer: 2^31 hundredths of a second (a signed 32-bit counter at 100 Hz) for the Dreamliner, versus 2^32 milliseconds for Windows.
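The usual defense against this class of bug, short of moving to a 64-bit counter, is to never compare raw tick values, only differences taken modulo 2^32. A minimal sketch; the 32-bit width matches a millisecond counter like Windows' `GetTickCount`, and `elapsed_ticks` is a hypothetical helper name.

```python
MASK32 = 0xFFFF_FFFF

def elapsed_ticks(start: int, now: int) -> int:
    """Ticks between two readings of a wrapping 32-bit counter.

    Modular subtraction gives the right answer even if the counter wrapped
    between the readings, as long as less than one full period (2**32 ticks)
    separates them.
    """
    return (now - start) & MASK32

print(elapsed_ticks(100, 250))                  # 150
print(elapsed_ticks(0xFFFF_FFF0, 0x0000_0010))  # 32, across the wrap
```

The catch is the "less than one full period" condition: anything that needs to measure intervals longer than 49.7 days (or 248 days) still needs a wider counter.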
>> What mechanism does Rust use to prevent 32-bit counter overflows?
With "Rusted", the computer falls apart before reaching the integer overflow, so the overflow cannot happen.
"If Engineers built buildings the way Programmers write programs, the first woodpecker that came along would destroy civilization!"
Seriously. Check for errors and do something reasonable about them. Calling the GPF vector is not reasonable! 8-(
We had an issue with the Apache server crashing after X amount of time.
The solution (by lead developer)? A cronjob that restarts the server every X hours.
What I'm hearing here is not a story about a potential software bug. I'm hearing about a serious design problem. An airplane should not be so reliant on software that it shuts down if the software is not working.
I was in Naval aviation. The 1960s-era A-7s I worked on for most of my career had redundant systems. There was even an airstream-driven generator that could be deployed in the event of engine failure; it would not only supply electrical power but also provide a minimum amount of hydraulic power to critical systems so that the plane had a chance of landing safely.
I can't believe we're designing aircraft that can carry hundreds of people, yet lack redundant systems and can literally fall out of the sky due to a simple software glitch. Have I read this wrong? Are they exaggerating the danger in this article?
Proverbs 21:19