Database Glitch Grounds American/US Airways
An anonymous reader writes "According to numerous news sources, all American Airlines and US Airways flights were grounded for two or three hours this morning. Both problems were caused by a computer glitch in the systems hosted by EDS. Quote: The operating system that drives the airline's flight plans went down."
Well, isn't that some great news? That makes me feel 20x better about taking my gf to the airport this morning. Fortunately she wasn't flying U.S. Airways or American Airlines.
She is absolutely frightened of flying and something of a computer nerd. I can't wait to talk to her and tell her the scary news.
YOU'RE WINNER !
Another lame blog
How in the world can they state that as singular? Surely they have a backup of some sort. Especially with all the supposed "increased security" around air travel, you are telling me that one system crash can knock out half of the major airlines? That's ridiculous. Have they not learned about redundancy?
Any word as to what operating system they were running?
Forecast for tomorrow: A few sprinklings of genius with a chance of DOOM!
EDS is by no means a Windows shop. They work extensively with "big iron" mainframes. In fact, they recently got the contract to handle the database of terrorist information that'll be used at airports. Likely this will be hosted on a 390 or something... Windows can't handle that kind of I/O.
Airport BSOD
I'm guessing the last thing you want to hear on a plane now is the pilot saying, "What do you mean, fatal exception error?"
>_ Why don't they switch to Linux?
... am I glad I'm flying Delta next Saturday :-) :-) :-)
Don't be so sure...
"A door is what a dog is perpetually on the wrong side of" - Ogden Nash
What reason would they have for not giving even the smallest of hints as to the nature of this glitch?
The PATRIOT Act?
Blue screen of life. Because US Air cancelled the flight and we were forced to fly on a competent airline.
Yep, they just don't make computers like they used to. I've seen computers run over by cars and still working (they only needed a couple of plastic case supports replaced), laptops dropped down 12 stories ...
My experience with EDS is that the problem was most likely operator error. These people and CSC are the absolute bottom of the barrel as far as outsourced data centres go. Yes, IBM GS costs more, but there's a good reason for that! I'd sooner use Accenture than EDS, and that's saying something.
Sorry, have to rant where I see EDS mentioned.
EDS, in cahoots with the UK government, have wasted millions of pounds of taxpayers' money on failed IT projects. Notable ones include the Inland Revenue (the UK's IRS), the Child Support Agency (£50M over budget and still not working) and an email and directory service for the NHS (EDS withdrew at the last minute, allowing C&W to steal it at a much inflated price).
Though the blame cannot be laid entirely at the door of EDS, the government has been guilty of sloppy auditing and, worst of all, a willingness to hand over extra money whenever EDS has come around with the begging bowl.
For all intensive porpoises your a bunch of rediculous loosers
You know he's going to convince them not to switch to linux. First he's going to get on a plane...oh wait.
Only one of the articles mentioned said anything about an "operating system." The rest called it a system problem. That does not necessarily mean an OS, or anything related to it. I think KATU's reporter jumped to a conclusion.
Mod point free since 2001
Isn't it stated somewhere that a certain OS, which is forever fair game in this community, should not be used for 'Mission Critical' situations?
Sometimes I wish I was a plumber, then I'd know how to deal with other people's shit.
Microsoft Bob. Now, where do I go to collect my bonus air miles?
NEVER open Windows in an airplane!
... would be a hand-crafted real time kernel, written in assembler, running on an IBM 360 mainframe - isn't that still what drives critical aviation systems?
Yet another example of EDS shoddiness. Why anyone would give money to EDS for anything is beyond me; they deliver inferior service at outrageous prices. M
They aren't telling the whole story.
I come from Solaris/Veritas/Oracle and Redhat/Oracle RAC environments. One single system going down cannot take out the service. Database HA is somewhat complicated and expensive, but it's not rocket science, regardless of platform.
I find it very difficult to believe that they would have any single points of failure in a system of that importance. Blaming MS is the easy way out.
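For illustration, here is a minimal Python sketch of the "one system going down cannot take out the service" point, assuming a primary and a hot standby that can both answer queries. The node names, the `make_node` and `query_with_failover` helpers and the up/down flag are all hypothetical; this is not how Sabre, EDS or any particular database does it, just the shape of client-side failover:

```python
class NodeDown(Exception):
    """Raised when a (hypothetical) replica cannot serve a request."""


def make_node(name, up=True):
    """Build a fake replica; up=False simulates a crashed machine."""
    def query(key):
        if not up:
            raise NodeDown(name)
        return f"{name}:{key}"
    return query


def query_with_failover(nodes, key):
    """Try each replica in order and return the first successful answer."""
    failed = []
    for node in nodes:
        try:
            return node(key)
        except NodeDown as exc:
            failed.append(str(exc))  # note the dead node, move on to the next
    raise RuntimeError(f"all replicas down: {failed}")


primary = make_node("primary", up=False)  # the box that "went down"
standby = make_node("standby")
print(query_with_failover([primary, standby], "AA100"))  # standby:AA100
```

With only the primary in the list, the same call raises `RuntimeError`, which is exactly the single point of failure people in this thread are complaining about.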
The following entities were NOT mentioned in the article you're linking to:
(1) American Airlines,
(2) US Airways,
(3) EDS.
So, what the hell are you talking about?
Why did you link to this article?
(I know, I know, because nobody will read it anyway)
Sounds like a troll. The article quoted by the parent is about a small regional airline (Atlantic Coast Airlines) that's doing its IT work internally. The article doesn't mention EDS at all. Moreover, browsing EDS's site, you can see that the solution they implemented for Continental Airlines is UNIX-based.
This is undoubtedly a problem with Sabre, which EDS runs on behalf of Sabre Holdings. Both American Airlines and US Airways use Sabre for much of their operations.
Sabre started its life as an American Airlines internal system (SABER, slight spelling difference), running on a rare operating system (PARS, later called ACP and currently TPF) on IBM mainframes. In the last few years Sabre completed a lengthy migration to HP Unix on Non-Stop (i.e. ex-Tandem) hardware. The mainframe systems were rock solid, but software talent was hard to come by, so they decided the time had come to switch.
Sorry, no Microsoft to blame here!
Well, I guess IT Does Matter after all....
Any attempt to retrieve information from them (flight data, schedules, FOIA requests) will result in total, immediate, and irreversible loss of data!
Yeah, right.
Not only that! It's Windows 98.
How ya like dat?
You are such a nitpicker.
"It looks like you are flying an aeroplane. Would you like help?" YES!
At about 4:30 a.m., the outsourced SysAdmin was setting up to do routine patches to Windows 2003 server nodes. But just before, he decided to check his e-mail with Outlook and he opened an important message from his system administrator advising him that his e-mail would be de-activated if he didn't open the important attachment. I think we all know what happened after that...
I might know what I'm talkin' about, but then again, this is Slashdot...
... were any human beings killed or injured? ... were any human beings in danger of being killed or injured?
It's hard to tell from the sketchy news stories, but it looks like AA and US Airways *do* have a backup plan and *are* executing it. The backup plan is a ground stop for 2-3 hours while they sort things out.
If you want them to have a backup plan that provides full service with no interruptions, then you'd have to pay a ticket price high enough to fund it.
It seems that computer failures are not very graceful. In a large business if an employee or even the chairman of the board is sick, the business still runs. However, failure of the central computer means no one knows how to make anything run.
Perhaps the efficiencies of a computerized business offset the cost of short downtimes, and the business is able to grow to such complexity that it isn't worth running without the computer. A 2 or 3 hour stoppage once in a blue moon (that was last month, and it looked big) might not be worth working around.
All the same I'm hesitant to let computer failures stand in the way of normality. Major infrastructure may be interrupted by nature but it can be scary for it to be stopped by computer problems. Who knows how long the system will be down? Who knows how much damage to information went unnoticed? Who knows what errors still exist?
Increasing computerization causes increasing paranoia. Guard yourself prophylactically? Ask hard questions before entering relationships with big business? Insist on financial compensation against computer delays?
Computer systems need to be built with more safeguards (redundancy, logging, checkpoints, backups), isolation of failure, data accessibility during failure (example: Windows safe mode) even for end users, etc.
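The "logging, checkpoints" safeguard can be sketched in a few lines of Python. This is a toy that assumes an in-memory log and snapshot (the `CheckpointedStore` class and its methods are made up for illustration): each write is logged before it is applied, a checkpoint snapshots the state, and recovery restores the snapshot and replays only the log entries written after it, so a crash loses nothing that made it into the log.

```python
import json


class CheckpointedStore:
    """Toy key-value store with a write-ahead log and checkpoints.

    Everything lives in memory here; a real system would keep the log
    and the snapshot on durable storage, ideally somewhere else.
    """

    def __init__(self):
        self.state = {}
        self.log = []                # append-only record of every write
        self.checkpoint = ("{}", 0)  # (serialized snapshot, log length at snapshot)

    def put(self, key, value):
        self.log.append((key, value))  # log the change first...
        self.state[key] = value        # ...then apply it

    def take_checkpoint(self):
        self.checkpoint = (json.dumps(self.state), len(self.log))

    def recover(self):
        """Rebuild state from the last snapshot plus the log entries after it."""
        snapshot, position = self.checkpoint
        state = json.loads(snapshot)
        for key, value in self.log[position:]:
            state[key] = value
        self.state = state
        return state


store = CheckpointedStore()
store.put("AA100", "crew A")
store.take_checkpoint()
store.put("US200", "crew B")  # written after the checkpoint
store.state = {}              # simulate a crash wiping in-memory state
print(store.recover())        # {'AA100': 'crew A', 'US200': 'crew B'}
```

The checkpoint keeps recovery time short (only the tail of the log is replayed), while the log keeps recovery complete.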
Know your pads. One time pad: good for cryptography. Two timing pad: where to take your mistress.
Anyone know whether these airlines are using Linux in any [mission critical] way? BTW, what would you consider as the most important mission critical system? Could it be in Banking, Hospitals, Airline systems, Educational, Nuclear....? What is a mission critical system anyway?
The system in question is most likely "AirFlite", a Unix-based system hosted by a joint venture between Sabre and EDS.
Around here we studied it for one major airline in the EU. We wanted a "backup system" in case the main system went down. Total cost, without maintenance: about *3 whole days* of traffic "benefits"... Yes, that much. Right now the project is still being discussed, but most of us think it is dead in the water. Instead, the "older" and "less powerful" development system will be used in case of a breakdown.
Redundancy is OK, as long as it is not bleeding you dry.
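That trade-off can be put as a back-of-the-envelope Python calculation. It reuses the poster's "3 whole days" figure and assumes that revenue accrues evenly around the clock, that every hour of outage loses exactly one hour of revenue, and that outages last 3 hours like this one; like the poster's figure, it ignores maintenance. The function name is made up:

```python
def outages_to_break_even(backup_cost_days, outage_hours):
    """Number of outages a backup must prevent to pay for itself.

    Simplifications: revenue is earned evenly, 24 hours a day, and each
    hour of downtime loses exactly one hour of revenue.
    """
    backup_cost_hours = backup_cost_days * 24
    return backup_cost_hours / outage_hours


# "3 whole days" of revenue vs. a 3-hour outage like this one
print(outages_to_break_even(3, 3))  # 24.0
```

Under those assumptions the backup only pays off after roughly two dozen outages of this size.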
C. Sagan: The Demon-Haunted World:
http://www.amazon.com/gp/product/0345409469/
visit randi.org
The systems that run the aircraft and the navigational and communication systems really are redundant. It's the law. It also means that usually there are two different ways to do something not just the same thing repeated twice.
... We used to joke that the controllers would climb to the top of the tower and wave fire extinguishers to warn the planes away. (I think it was a joke.)
Example 1 - The pilot and co-pilot can't eat the same meal. That way, only one of them can get food poisoning.
Example 2 - The hydraulic system fails and the wheels won't go down. There's a hand crank.
Example 3 - The communication systems at every tower I have worked at have two separate backbones. There are two of absolutely everything. If that fails, there are emergency radios under the desk. If the emergency radios don't work
Example 4 - You can't fly very far over open water in a single engine aircraft.
It used to be frustrating working on systems older than I was but we never had to worry about surprises.
Of course all of this redundancy is very expensive. You spend the money where people's lives are at stake. On the other hand, if the worst problem is that some planes will be late, perhaps you don't spend the big bucks.
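The "two different ways, not the same thing repeated twice" point (Example 2 above) can be shown with a toy Python model. The functions are hypothetical stand-ins, not avionics code: two copies of the same hydraulic path share one dependency and fail together, while the hand crank does not depend on hydraulic pressure at all.

```python
def hydraulic_gear(pressure_ok):
    """Hypothetical primary path: needs hydraulic pressure."""
    if not pressure_ok:
        raise RuntimeError("hydraulics failed")
    return "gear down (hydraulic)"


def hand_crank(pressure_ok):
    """Hypothetical backup path: ignores hydraulic pressure on purpose,
    because it is a mechanically independent way of doing the same job."""
    return "gear down (hand crank)"


def working_paths(paths, pressure_ok):
    """Collect the result of every path that still works."""
    results = []
    for path in paths:
        try:
            results.append(path(pressure_ok))
        except RuntimeError:
            pass  # this path shares the failed dependency
    return results


same_thing_twice = [hydraulic_gear, hydraulic_gear]
two_different_ways = [hydraulic_gear, hand_crank]
print(working_paths(same_thing_twice, pressure_ok=False))    # []
print(working_paths(two_different_ways, pressure_ok=False))  # ['gear down (hand crank)']
```

Duplicating the same mechanism protects against one unit breaking, but only a different mechanism protects against the shared cause.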
Regarding his pay, Ridge has got to have one of the most stressful, time consuming, and important jobs in the country, and as such I for one do not think that $175K is nearly enough. Corporate officers frequently make more than that, so why would anybody willingly subject himself to a much more stressful and dangerous job for less money? I've got to side with Ridge.
Sorry, I think this is happening to a number of airlines: http://www.ctv.ca/servlet/ArticleNews/story/CTVNews/1091237095342_4/?hub=TopStories
Probably just the CIA moving them all onto some big CIA super-computer.
-- http://thegirlorthecar.com funny dating game for guys
On a more serious note: a few weeks ago (two, I think) we also had a software problem in the Netherlands (Schiphol Airport) that caused the monitors displaying departures/arrivals to go blank, only to start working again 2 hours later.
This also caused great delays, as people had to ask airport personnel where to go.
Was it something like this? http://www.cs.utk.edu/~shuford/terminal/airport_bsod.jpg
Kid, you don't know what in the hell you're talking about. EDS is a massive, global company that does hundreds of millions in sales a year. They handle systems for thousands of large companies. To say that EDS uses Windows is fucking moronic. A company as large as EDS, and doing as many diverse things as they do, probably has every OS ever invented running somewhere in the organization. Sheesh. Slashdot needs some age requirements for posting.
Even though this sounds dire, I have a feeling that this does nothing to compromise airline safety.
From the sounds of it, the flight planning system went down. This is a ground-only system, often a terminal next to the ticket check-in counter. Its purpose is to file flight plans, check weather and airport conditions, etc. It is not an onboard system, so it would not likely have decreased passenger safety.
The reason the FAA got involved was that AA decided to ground the planes, most likely because the pilots couldn't file flight plans electronically. Filing flight plans the old way would have delayed things more and caused more headaches than just waiting out the system outage.
However, when any business runs and depends on a particular piece of software to generate revenue and to provide a service, I would be more inclined to host such a system on something like a mainframe or at least a big Unix server.
Contrary to popular belief, life is not a bitch. It is far far worse.
"Hotard said the problem was purely technological."
Oh, what a relief!
That reminds me of the Seinfeld episode "The Betrayal" where the gang goes to India and George finds out Jerry slept with Nina:
JERRY: Alright, I admit it. I slept with Nina, but that's all.
GEORGE: (Outraged) "That's all"?! That's everything! I don't know what all the rest of it is for anyway!
BSOL? Used in a sentence: If you wanted to fly USAIR you'll BSOL?
Never confuse volume with power.
According to this, EDS runs UNIX-based systems.
Nice try, though.
There is a line of code that raised the problem but is commented in Punjabi, I think it says "fuck this $3/hour job".
Folks, not only does the link say nothing about Windows, but AMR's flight res system (SABRE) is located in Tulsa in a silo and is absolutely not Windows (IIRC, IBM mainframes). Now, it is more likely not their flight system but some intermediary system. While that is likely to be Windows (based on past history), there is so far no confirmation of that. In addition, historically, AMR did not run Windows, as it was too expensive and too prone to crashes. Of course, that was when R. Crandall was there, which was a while ago.
I prefer the "u" in honour as it seems to be missing these days.
For my money, I see so many "Fuck Whoever" and "GNAA" posts when I read at -1 that I only bother when I'm moderating. Their First Amendment rights a) don't apply to a privately owned board, and b) don't mean I have to wade through the crap they spew to see the good stuff.
I'm completely off-topic here, but it bugs me to see the whole "the First Amendment guarantees an audience" argument.
"Mission Accomplished" -- George W. Bush May 1, 2003
That is not a BSOD.
That is simply a Fatal Exception Error.
With a FEE, there is still a chance that the OS may recover ... after you hit the key to continue about 20-30 times.
A BSOD does not offer the ability to hit any key to continue, but rather dumps your RAM to disk and then stares at you!
This is Sabre. EDS was contracted to run the data center. The shop has been mostly TPF/VM/MVS (z/OS) for some time, but they announced a couple of years back that they were going to shift most of the work onto servers (not sure what operating system). I do know that they were going from around 20(?) mainframes (these are now the size of large refrigerators, and IBM likes calling them servers) to about 4,000 servers. I am not sure how far they have gotten on this. As for this knocking out a couple of major carriers, I'm not surprised. Most of the domestic carriers are handled in 3 data processing centers, and lately those centers have gone downhill. I know that at Galileo, they outsourced to IBM, who are doing a very lackluster job of running it. So stand by for more of these types of outages.
Every time I see one of these articles, I read the fine print just to be sure it wasn't one of my bugs that brought the system down. Looks like I'm probably safe this time.
Second, this failure isn't in the Sabre reservations system, it's in some ancillary product, so who knows? Maybe they have no intention of switching it to Unix.
Third, he didn't say so, but the migration isn't just to Unix. It's also migration to MySQL! (Hahahahahahahaha. Then again, coming from TPF, coded in assembly language for 4Kword pages, and a hierarchical database, that might seem pretty advanced.) Sabre had to fund a MySQL port to 64 bits, and a new "stored procedures" feature.
... only to have DIGITAL purchased by Compaq, which was later melded with HP. ... to have Tru64 (nee DIGITAL Unix) cannibalized into a new version of HPUX.
:(
Lovely.
Sorry, we can't blame HP-UX either. The Sabre system was converted to Tandem Non-Stop hardware running Unix. This is only an HP issue because Compaq bought Tandem before HP bought Compaq. Doncha love the computer business?
Hmm... what's going on here?
Mainframes will be the death of us (Due to a lack of talent)! The last major delay in the UK air traffic system was their NAS mainframe going down after a failed patch, and taking three hours (on a Friday morning!) to get back in service. (you can google for NATS being grilled by parliament)
In the future, I would want to not be isolated from my friends in the Space Station.
"Customers won't necessarily miss their connections," he said, "because everything was stopped."
This, ladies and gentlemen, is a flight plan. Now how the hell you gonna die because some FAA form can't get filled out right? All it was was a paperwork requirement. Planes still fly, pilots still know how to land them rubber side down.
First, he ain't starving.
Second, he'll get a cushy retirement package.
Third, hello, this is public service, isn't a higher calling more important than $?
Fourth, he'll get $millions to write a book.
and Fifth, of course, I'm sure Halliburton will come up with a place for him when everything is said and done!
(OK, that was a low blow--but seriously, he'll be fielding all kinds of high dollar job offers when he gets out; former cabinet members don't drive Hyundais)
Repetition does not transform a lie into the truth. - FDR
When they said "operating system", they meant "operations system" - not the OS.
See this quote from one of the articles:
Wagner said a database malfunctioned that "basically runs every aspect of our client operations -- aircraft dispatch, crew scheduling (and) reporting weight, passenger load, balance."
This system is hosted by EDS, who only said it was a "systems issue".
So there's no evidence it was an OS problem. It could have been anything - OS, Oracle/DB2/SQL Server database, application code, upgrade, whatever.
Nothing to conclude here except that somebody screwed up - and even that isn't certain - could have been a bad memory board someplace, who knows.
Not having a backup is even irrelevant, since the "backup" might have taken three hours to bring up, when you're dealing with a production system like this. "Failover" is what you want, and they should have had, but if something got screwed there, it could still have been three hours.
Shouldn't have happened, but crap like this happens all the time because nobody can do their damn jobs.
Richard Steven Hack - This sig is TOO GODDAMN SHORT TO DO ANYTHING USEFUL WITH! MORONS!
Their computer system went down for at least *3* hours in Minneapolis, shutting down the entire terminal. They couldn't check flight plans, ticket information, scheduling, logistics, etc. No planes in, no planes out.
You'd think they'd have redundancy and backups, but they probably don't. That requires some planning beyond the immediate need of the company and, even if it's more profitable to invest in backups, long term planning simply isn't considered as much.
This happens to my University all the time. The power goes out in one building for a few hours and services across the entire University are disrupted completely. This building happens to house most of the license servers for important software, but no one would _ever_ think of putting a backup license server in another building _just in case_. No, that'd be thinking ahead.
Sabre is a multitude of software products, for lack of a better definition. They include RES, DECS, TIM, BMAS and a couple of others that I can't remember.
:)
All Sabre applications are text mode, no GUI whatsoever... think CLI from hell, with no command history if you fat finger an entry.
The system that went down was probably DECS (Dispatch Environment Control System), which is the system used by both American and USAir for generating flight plans, load planning, weight and balance, and various other flight operations functions.
RES is the Reservations system, which covers the spectrum from building reservations and selling tickets, to customer checkin, boarding and god knows what else. IIRC, it will even do car rentals and hotels.
TIM is also called Timatic. It's used for accessing information from the US State Department regarding international travel to any country, from any country in the world. It covers entry and exit requirements, documentation, and pretty much anything you could want to know.
I don't remember what BMAS stands for, but it is a lost bag tracking and reporting system. When AA or US loses your luggage, this is what they use to find it.
Sabre is used by a whole variety of airlines and travel agencies, and is customised in modules to each particular user's needs.
Now you are probably wondering how I know all this... I work for a major airline that uses a majority of the systems listed above, with the exception of the Dispatch system. We were not affected by whatever snafu took down that portion of Sabre.
The airline's backend systems will continue to run on either old Tandem mainframes or port to new IBM mainframes (not running Linux, as of yet). Most of the airline's new IT investments are at the airport end.
Unfortunately, the Windows-everywhere trend seems to be winning here. My airport is going to a common-use terminal system, and it's Win2K based. All but one of the big common use vendors are selling Windows-based equipment. Northwest's CUSS (common use self service) terminals are Win2K based as well.
When I asked our vendor, who specializes in smaller airports, whether his company was doing any Linux development, he replied that nope, since most of the systems will never be on a public internet, it was easier and cheaper to get Windows developers. No security concerns without the Internet, and 2K/XP/2K3 have become much more stable than older Windows platforms (his company still has older installations overseas that run NT 4 based systems, all due for an upgrade).
Life is hard, and the world is cruel
Nope. FOS is on TPF, and odds are TPF didn't tank; the database got corrupted. You can duplicate the hardware and make the software non-stop, but if some dork loads a bad module that corrupts the data, you're down.
From what I saw in the reports, it was Flight Operations (FOS). TPF-based. It's unlikely it was a TPF or hardware failure. One report mentioned database corruption, which means some programmer's going to be in deep dodo in the AM. I'll know in the morning; one of my guys used to work on FOS.
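The point that duplicated hardware does not help when bad code corrupts the data is easy to demonstrate with a toy mirrored store in Python. The class, the key name and the "bad module" are hypothetical: the mirror faithfully copies the bad write, so both copies end up wrong at the same moment.

```python
class MirroredStore:
    """Toy synchronous mirror: every write lands on both copies."""

    def __init__(self):
        self.primary = {}
        self.mirror = {}

    def write(self, key, value):
        self.primary[key] = value
        self.mirror[key] = value  # redundancy: the same data in a second place


def bad_module(store):
    """Hypothetical faulty code push: writes an impossible weight."""
    store.write("AA100/takeoff_weight_kg", -1)


store = MirroredStore()
store.write("AA100/takeoff_weight_kg", 70000)
bad_module(store)
print(store.primary["AA100/takeoff_weight_kg"])  # -1
print(store.mirror["AA100/takeoff_weight_kg"])   # -1, the mirror copied the damage
```

Getting back from this takes a point-in-time copy from before the bad write (or validation that rejects it in the first place), not another mirror.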
I'm not sure how true this is for modern two engine planes, but I have heard that some of those little two engine prop planes are twice as dangerous as one engine planes because they need both engines for safe flight, and they are therefore twice as likely to have an engine failure and crash.
I don't really care if it is true, it makes a good story anyway :P
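The arithmetic behind the story is easy to check in Python, assuming engine failures are independent and using a made-up per-flight failure probability `p` purely for illustration. A twin that needs both engines is in trouble if either one fails, which for small `p` is about twice the single-engine risk; a twin that can fly on one engine is only in trouble if both fail, which is far less likely:

```python
def p_any_fails(p, engines):
    """Probability that at least one of `engines` independent engines fails."""
    return 1 - (1 - p) ** engines


def p_all_fail(p, engines):
    """Probability that every one of `engines` independent engines fails."""
    return p ** engines


p = 0.001  # made-up per-flight failure probability, for illustration only
print(p_any_fails(p, 1))  # about 0.001: the single-engine plane
print(p_any_fails(p, 2))  # about 0.002: a twin that needs BOTH engines
print(p_all_fail(p, 2))   # about 1e-06: a twin that can fly on either engine
```

Real engine failures are not fully independent (shared fuel, shared maintenance), so treat these numbers as the shape of the argument rather than a safety figure.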
EDS has a major US government contract to ... upgrade many/most of the US Marine Corps computers (desktops AND servers). The contract is in very big trouble, plagued with huge cost overruns and failure to supply the equipment in a timely manner. EDS keeps popping up in the news here, and the news is rarely good. I think the company really went downhill after H. Ross Perot sold it off. Too bad, really.
Please use the full acronym, or its full name: "Uniting and Strengthening America by Providing Appropriate Tools Required to Intercept and Obstruct Terrorism".
The "USAPATRIOT" Act has nothing to do with patriotism, so calling it the "PATRIOT Act" is misleading.
(Considering how the Act is being misused these days, even using its full name is somewhat misleading (How is copyright infringement "terrorism"?).)
Personally, I pronounce it "the you sap at riot act" to avoid confusion.
Other pronunciations are "the US ap uh TRY ot act" and (as Jar-Jar) "the YOUsa pah TR-R-RE-E-E at act".
Those who sacrifice security to condemn liberty deserve to repeat history or something. - Benjamin Santayana