In-Flight Reboot?
steelem writes "The Washington Post is running a story about how the F-22 Raptor's software requires in-flight reboots. Apparently the 2 million line software project is 93% done. Knowing most projects I've been on, it'll stay that way for another few years."
I've said it a hundred times and I will say it again. Software is getting way too complex for humans to manage while developing bug-free code.
Life is not for the lazy.
Even 36 seconds per reboot is too much, and it would be totally unacceptable if it were, say, the navigation computer on a 737 with a hundred civilians on board.
What makes you think that it takes 36 seconds to reboot their systems? That's an average time spent per flight -- we don't know how many times the systems are crashing per flight.
Also note that this covers all their computer systems, not just the actual flight control. Some systems are obviously more important than others; it probably doesn't matter if the target identification system fails for a few seconds.
Tarsnap: Online backups for the truly paranoid
The article doesn't say that it takes 36 seconds to reboot the computers. It says 36 seconds per flight are spent rebooting the avionics. It doesn't say how long each reboot takes. The total reboot time per flight could have been reduced by quicker reboots, fewer reboots, or both.
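The distinction can be made concrete with a little arithmetic: the per-flight figure is roughly the number of reboots multiplied by the average length of each one, so the same total can hide very different failure rates. A minimal sketch, assuming every reboot takes the same time; the article gives neither figure, so all the numbers here are hypothetical:

```python
def reboot_time_per_flight(reboots_per_flight: int, seconds_per_reboot: float) -> float:
    """Total seconds per flight spent rebooting, assuming equal-length reboots."""
    return reboots_per_flight * seconds_per_reboot


def combinations_for_total(total_seconds: int, max_reboots: int = 36) -> list[tuple[int, float]]:
    """List (reboots, seconds each) pairs that produce the same per-flight total.

    Only whole-second reboot durations are listed, to keep the output short.
    """
    pairs = []
    for n in range(1, max_reboots + 1):
        if total_seconds % n == 0:
            pairs.append((n, total_seconds / n))
    return pairs


if __name__ == "__main__":
    # Hypothetical: 36 s per flight could be one slow reboot or many quick ones.
    for n, t in combinations_for_total(36):
        print(f"{n:2d} reboot(s) x {t:4.1f} s = {reboot_time_per_flight(n, t):.0f} s")
```

One 36-second reboot and thirty-six 1-second reboots produce the same headline number, which is why the figure alone says little about how often things crash.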
By reboot, I'm thinking they mean from "press button" until "I can use again."
That means running the program and getting all necessary information from the hardware so that pilots can make decisions from it.
The BIOS is insignificant in this case.
"Some systems are obviously more important than others; it probably doesn't matter if the target identification system fails for a few seconds." Unless you're on the wrong end of the target ID system. We have enough 'friendly fire' problems already (although who cares how 'friendly' it is when you're dead?). I don't care what OS it's using, it needs to be fixed.
Please consider having Slashdot do a quick search for duplicate stories, especially over the last 2-3 weeks. Even if this is done at the submitter level, they could avoid this. I have no doubt that most submitters would prefer to avoid this.
Likewise, when reviewing a submission, run the same search so that you can see what the user saw.
BTW, this is not really a problem with just /., but more indicative of the problem that stories keep getting retold on the same news. Sad, really.
I prefer the "u" in honour as it seems to be missing these days.
What's funny is I always thought the guys writing this sort of software were uber-coders who never had this sort of problem. Throw an extra few hundred million dollars at the coding effort, and I just assumed this sort of problem went away. It's worrying, though: isn't code that ever needs to be rebooted fundamentally flawed? Can you ever really fix that sort of code, or are we just waiting for the day when another untested edge case comes along mid-flight and an F-22 falls out of the sky? Even one error of this sort seems like impending doom to me.
Good enough isn't. Stable code can be written. It merely takes talented engineers, design time to conceptualize and architect the product up front before coding it, giving QA what they need to test, and a commitment to FIXING the issues that QA identifies.
I'm curious -- do you do development? Have you ever worked on a 2 million line program? No offense, but anyone who uses the word "merely" in a paragraph like that strikes me as someone with a tenuous grip on reality.
I am a senior engineer at a very big company. Applications I have written are in use by literally millions of people. And I'm scared stiff by the idea of writing the kind of software that powers the F-22. Software of this scale is the single most complicated project humanity has ever undertaken, and to belittle the efforts of the engineers involved by suggesting that they don't know what they're doing or aren't following responsible development guidelines shows a serious lack of understanding. I promise you, the software on the F-22 has been subjected to more rigorous QA than anything you or I have ever touched, but that still doesn't make it easy.
Humans aren't perfect, and as long as that continues to be the case, writing a multi-million line chunk of software will always be a ridiculously expensive and difficult proposition with no guarantee of success.
ZFS: because love is never having to say fsck
Second, I have seen this coming for about 10 years now. In the 70s and 80s I worked with digital control systems. Not avionics, but similar. In those days the systems were expected to work right, every time, for years at a time. 2 years between system restarts was considered "acceptable". If a system did fail, the manufacturer was expected to get its collective butt out to the site, figure out why, and issue a (solid!) fix pronto.
In the last 5 years, I have repeatedly been on brand-new airplanes at the gate when the pilot comes on and says "we are having a little problem with the system - don't be alarmed if the lights go off" followed by what is clearly a "reboot" of the airplane! When the fsk did it become acceptable to fix problems in avionics by rebooting the airplane?
And if the system designers really think the Microsoft Rebooting Disease is an acceptable way to handle system faults, how long before one of those faults occurs in the air?
I guess I am just old and crusty, expecting life-critical systems to work to spec 100.0% of the time.
sPh
IMNSHO, it's basically common knowledge that these things CAN NOT be flown without computers regulating all the doohickeys. We're not talking about Cessnas (sorry if I spelled that wrong), we're talking about extremely complex jets flying at high speeds.
Granted, some things (ejector seats, cupholders, maybe even bomb-dropping apparatus) don't need computer control, but all those wing flaps and engines, etc. do, at least in a vehicle this complex.
Ron Paul 2012
That's a training issue. Pilots need to learn that "cannot identify target" means *wait*, not *shoot now*.
But has the pilot of that unidentified target, who might be foe, learned that he's not supposed to shoot the guy 'cause his system is rebooting?
There ain't no rules here; we're trying to accomplish something.
I think where people get thrown is that they see houses and cars and bridges and think, "If we can build those, why can't we build software? Programmers must be lazy."
Well, is every 2x4 in a house exactly the same length? Are all the boards perfectly flush? A crooked door in a house will usually cause no problems, but the equivalent flaw in a piece of software can cause a crash. Even computer hardware is never perfect. Does every 2.0 GHz processor run at EXACTLY 2.0 GHz? Not even close, but they are good enough. The problem with software is that it has to be perfect to work perfectly, and people aren't perfect.
The beauty of the F-22 system is that the developers realize this, and they designed the system knowing there would be flaws and that the software would crash. When some of the software crashes, the jet keeps right on going, which is the sign of ultimate stability.
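The fault-isolation idea can be illustrated with a toy simulation: each subsystem runs independently, and a supervisor restarts only the one that crashes while the others keep working. This is a sketch under stated assumptions, not how the F-22 avionics are actually built; the `Subsystem` class, `supervise` function, and subsystem names are hypothetical, and restarts here are instantaneous, whereas real ones take the seconds the article is counting.

```python
class SubsystemCrash(Exception):
    """Raised by a hypothetical subsystem when it fails."""


class Subsystem:
    """Toy subsystem that fails on the ticks listed in `fail_on`."""

    def __init__(self, name: str, fail_on: tuple[int, ...] = ()):
        self.name = name
        self.fail_on = set(fail_on)
        self.restarts = 0
        self.ticks_ok = 0

    def step(self, tick: int) -> None:
        if tick in self.fail_on:
            raise SubsystemCrash(f"{self.name} crashed at tick {tick}")
        self.ticks_ok += 1

    def restart(self) -> None:
        # Stand-in for reloading software and re-syncing with sensors.
        self.restarts += 1


def supervise(subsystems: list[Subsystem], ticks: int) -> list[str]:
    """Run every subsystem each tick; a crash restarts only the failed one."""
    log = []
    for tick in range(ticks):
        for sub in subsystems:
            try:
                sub.step(tick)
            except SubsystemCrash as err:
                # The crash is contained: log it, restart this subsystem,
                # and carry on with the rest.
                log.append(str(err))
                sub.restart()
    return log


if __name__ == "__main__":
    flight_control = Subsystem("flight_control")
    target_id = Subsystem("target_id", fail_on=(3, 7))
    for line in supervise([flight_control, target_id], ticks=10):
        print(line)
    print(flight_control.name, "ran", flight_control.ticks_ok, "ticks")
```

The design choice is containment: a crash in the target ID stand-in never propagates to the flight control stand-in, which keeps running on every tick.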
I've just re-re-read the article, and I can't find any mention that the software on board was Windows based.
Yes, you're all very droll, but the Microsoft bashing seems a little knee-jerk. It's insanely complicated to write software like this (as a few other posters have said, and I'm posting only because I have no mod points for them).
I doubt these errors are OS-based at all. Real-time systems like this are built on top of extremely well-tested embedded OSes. They reboot because they're writing pretty close to the bare metal, and mistakes are punished hard. Best practices are applied (interminable code reviews, fascist levels of regression testing, ungodly coding style standards), but not always followed, and even best practices don't always work.
I'd like to see a gradual shift to languages which enforce best practices (i.e. not C and assembly). Meantime, these pilots are pretty damn brave. But it's probably not Microsoft's fault, this time.
Go build me a pyramid. Without any modern machines. In the middle of the desert.
With ten thousand workers to help, a government that doesn't give a crap about death tolls or reasonable working conditions, and enough funding to bankrupt an empire, I'm sure I could manage.
The pyramids were gigantic, backbreaking undertakings, but I maintain my stance that software is the most complicated endeavor undertaken by mankind.
The vast majority of downed pilots (80% or more?) never saw the attack coming; they were taken by surprise. The most successful aces avoided dogfights: they would try to surprise someone, and if they couldn't, they would disengage and look for someone else. Your account sounds like a romanticised story, or an aberration from the earliest days of the war. WW1 pilots looked at battle the same way pilots do today: give the other guy a chance and you may die, leaving your wife a widow and your children fatherless.
Rather than the monolithic system which we all secretly love (which allegedly produces Blue Screens of Death when things go squiffy, although my own XP Home system has been thundering on with nary a problem for quite a while now), you build systems which can tolerate components restarting themselves. I don't care if you're RMS writing the purest code with GNU/Ada for the EFF Air Force, you're not going to write something that will never fail. Better to design and build an overall system which can tolerate minor interruptions, especially if you are going to be flying into a war zone.
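One common way to build a system that tolerates components restarting is a heartbeat watchdog: each component periodically reports that it is alive, and anything that goes quiet for too long gets restarted. This catches hangs as well as outright crashes. A minimal sketch, assuming a hypothetical `Watchdog` class with an injectable clock (so it can be exercised without real waiting) and made-up component names:

```python
import time
from typing import Callable


class Watchdog:
    """Restart any component whose last heartbeat is older than `timeout` seconds."""

    def __init__(self, timeout: float, restart: Callable[[str], None],
                 clock: Callable[[], float] = time.monotonic):
        self.timeout = timeout
        self.restart = restart
        self.clock = clock
        self.last_beat: dict[str, float] = {}

    def heartbeat(self, component: str) -> None:
        """Called by a healthy component to say 'still alive'."""
        self.last_beat[component] = self.clock()

    def check(self) -> list[str]:
        """Restart stale components and return their names."""
        now = self.clock()
        stale = [name for name, t in self.last_beat.items() if now - t > self.timeout]
        for name in stale:
            self.restart(name)
            # Give the restarted component a fresh deadline to come back up.
            self.last_beat[name] = now
        return stale


class FakeClock:
    """Manually advanced clock for demonstrations and tests."""

    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now


if __name__ == "__main__":
    clock = FakeClock()
    restarted: list[str] = []
    dog = Watchdog(timeout=2.0, restart=restarted.append, clock=clock)
    dog.heartbeat("radar")
    dog.heartbeat("nav")
    clock.now = 1.5
    dog.heartbeat("nav")  # nav is still alive; radar has gone quiet
    clock.now = 3.0
    print(dog.check())  # only radar has missed its deadline
```

In a real system the restart callback would power-cycle or reload the component, and the watchdog itself would typically be backed by hardware so that it cannot hang along with everything else.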
In any case (I worked on some of the stuff on the fringes of the F22 program a long, long time ago), there are a bunch of computers in the air vehicle; it's an airborne network. Saying "oh my god, I can't believe the plane is rebooting" is disingenuous (aside from the many Windows jokes). It's akin to "I had to power-cycle the printer twice today -- I can't believe the network stayed up for the 35 seconds it took the Lexmark to come back to life!"
Rebooting a subsystem computer works quite well in robotics too, which further leads into the concept of many small robots rather than one large beast screaming "Danger Will Robinson".
Cthulhu Barata Nikto
The article stated that the reboots were for subsystems, not the fly-by-wire systems or the navigational system. The main problems have been in the sensor-weapon integration. This is one reason why the plane is not yet in full-scale production.
Cole's Axiom: The sum of the intelligence on the planet is a constant. The population is growing.