Database Glitch Grounds American/US Airways
An anonymous reader writes "According to numerous news sources, all American Airlines and US Airways flights were grounded for two or three hours this morning. Both problems were caused by a computer glitch in the systems hosted by EDS. Quote: The operating system that drives the airline's flight plans went down."
How in the world can they state that as singular? Surely they have a backup of some sort. Especially with all the supposed "increased security" around air travel, you are telling me that one system crash can knock out half of the major airlines? That's ridiculous. Have they not learned about redundancy?
I'm guessing the last thing you want to hear on a plane now is the pilot saying, "What do you mean, fatal exception error?"
>_ Why don't they switch to Linux?
Sorry, have to rant wherever I see EDS mentioned.
EDS, in cahoots with the UK government, have wasted millions of pounds of taxpayers' money on failed IT projects. Notable ones include the Inland Revenue (UK IRS), Child Support Agency (£50M over budget and still not working) and an email and directory service for the NHS (withdrew at the last minute, allowing C&W to steal it at a much inflated price).
The blame cannot be laid entirely at the door of EDS, though: the government has been guilty of sloppy auditing and, worst of all, a willingness to hand over extra money whenever EDS has come around with the begging bowl.
For all intensive porpoises your a bunch of rediculous loosers
NEVER open Windows in an airplane!
The following entities were NOT mentioned in the article you're linking to:
(1) American Airlines,
(2) US Airways,
(3) EDS.
So, what the hell are you talking about?
Why did you link to this article?
(I know, I know, because nobody will read it anyway)
This is undoubtedly a problem with Sabre, which EDS runs on behalf of Sabre Holdings. Both American Airlines and US Airways use Sabre for much of their operations.
Sabre started its life as an American Airlines internal system (SABER, slight spelling difference), running on a rare operating system (PARS, later called ACP and currently TPF) on IBM mainframes. In the last few years Sabre completed a lengthy migration to HP Unix on Non-Stop (i.e. ex-Tandem) hardware. The mainframe systems were rock solid, but software talent was hard to come by, so they decided the time had come to switch.
Sorry, no Microsoft to blame here!
At about 4:30 a.m., the outsourced SysAdmin was setting up to do routine patches to Windows 2003 server nodes. But just before, he decided to check his e-mail with Outlook and he opened an important message from his system administrator advising him that his e-mail would be de-activated if he didn't open the important attachment. I think we all know what happened after that...
I might know what I'm talkin' about, but then again, this is Slashdot...
The systems that run the aircraft and the navigational and communication systems really are redundant. It's the law. It also means that usually there are two different ways to do something, not just the same thing repeated twice.
Example 1 - The pilot and co-pilot can't eat the same meal. That way, only one of them can get food poisoning.
Example 2 - The hydraulic system fails and the wheels won't go down. There's a hand crank.
Example 3 - The communication systems at every tower I have worked at have two separate backbones. There are two of absolutely everything. If that fails, there are emergency radios under the desk. If the emergency radios don't work ... We used to joke that the controllers would climb to the top of the tower and wave fire extinguishers to warn the planes away. (I think it was a joke.)
Example 4 - You can't fly very far over open water in a single engine aircraft.
It used to be frustrating working on systems older than I was but we never had to worry about surprises.
Of course all of this redundancy is very expensive. You spend the money where people's lives are at stake. On the other hand, if the worst problem is that some planes will be late, perhaps you don't spend the big bucks.
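For the software-minded, the "two different ways to do something" idea can be sketched in a few lines of Python. This is only an illustration, not how any avionics or tower system is actually built: `first_working`, `hydraulic_extend` and `hand_crank` are made-up names, and the hydraulic failure is simulated.

```python
from typing import Callable, Sequence, Tuple


def first_working(methods: Sequence[Tuple[str, Callable[[], bool]]]) -> str:
    """Try each independent method in order; return the name of the first that succeeds."""
    for name, attempt in methods:
        try:
            if attempt():
                return name
        except Exception:
            # This method failed outright; move on to a *different* mechanism.
            continue
    raise RuntimeError("all methods failed")


def hydraulic_extend() -> bool:
    # Hypothetical primary path: simulate a loss of hydraulic pressure.
    raise OSError("hydraulic pressure lost")


def hand_crank() -> bool:
    # Hypothetical backup path: purely mechanical, shares nothing with the primary.
    return True


gear_methods = [("hydraulics", hydraulic_extend), ("hand crank", hand_crank)]
print(first_working(gear_methods))  # prints "hand crank"
```

The design point is that the fallback does not depend on whatever broke the primary. Calling `hydraulic_extend` twice would be "the same thing repeated twice" and would fail both times.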
There is a line of code that raised the problem but is commented in Punjabi, I think it says "fuck this $3/hour job".
Second, this failure isn't in the Sabre reservations system, it's in some ancillary product, so who knows? Maybe they have no intention of switching it to Unix.
Third, he didn't say so, but the migration isn't just to Unix. It's also migration to MySQL! (Hahahahahahahaha. Then again, coming from TPF, coded in assembly language for 4Kword pages, and a hierarchical database, that might seem pretty advanced.) Sabre had to fund a MySQL port to 64 bits, and a new "stored procedures" feature.
When they said "operating system", they meant "operations system" - not the OS.
See this quote from one of the articles:
Wagner said a database malfunctioned that "basically runs every aspect of our client operations -- aircraft dispatch, crew scheduling (and) reporting weight, passenger load, balance."
This system is hosted by EDS, who only said it was a "systems issue".
So there's no evidence it was an OS problem. It could have been anything - OS, Oracle/DB2/SQL Server database, application code, upgrade, whatever.
Nothing to conclude here except that somebody screwed up - and even that isn't certain - could have been a bad memory board someplace, who knows.
Whether they had a backup is almost beside the point, since a "backup" might have taken three hours to bring up when you're dealing with a production system like this. "Failover" is what you want, and what they should have had, but if something got screwed up there, it could still have been three hours.
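To make the backup/failover distinction concrete, here is a minimal Python sketch. `Node` and `FailoverClient` are hypothetical names for illustration and have nothing to do with whatever EDS actually runs. The point is only that a hot standby is already up and takes over on the first failed request, whereas a backup has to be restored before it can serve anything.

```python
class Node:
    """A stand-in for one database server (hypothetical, for illustration only)."""

    def __init__(self, name: str, healthy: bool = True) -> None:
        self.name = name
        self.healthy = healthy

    def query(self, request: str) -> str:
        if not self.healthy:
            raise ConnectionError(f"{self.name} is down")
        return f"{self.name} handled {request}"


class FailoverClient:
    """Sends requests to the active node and promotes the standby if it fails."""

    def __init__(self, primary: Node, standby: Node) -> None:
        self.active = primary
        self.standby = standby

    def query(self, request: str) -> str:
        try:
            return self.active.query(request)
        except ConnectionError:
            # Swap roles and retry once. If the standby is also down,
            # the ConnectionError propagates to the caller.
            self.active, self.standby = self.standby, self.active
            return self.active.query(request)


client = FailoverClient(Node("primary", healthy=False), Node("standby"))
print(client.query("crew schedule"))  # prints "standby handled crew schedule"
```

If the standby is broken too, or the promotion logic itself is what failed, you are back to a manual recovery, which is how a system with failover can still be down for three hours.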
Shouldn't have happened, but crap like this happens all the time because nobody can do their damn jobs.
Richard Steven Hack - This sig is TOO GODDAMN SHORT TO DO ANYTHING USEFUL WITH! MORONS!