Slashdot Mirror


Database Glitch Grounds American/US Airways

An anonymous reader writes "According to numerous news sources, all American Airlines and US Airways flights were grounded for two or three hours this morning. Both problems were caused by a computer glitch in the systems hosted by EDS. Quote: The operating system that drives the airline's flight plans went down."

28 of 274 comments

  1. Operating System (singular) by Hypharse · · Score: 5, Insightful
    "The operating system that drives the airline's flight plans went down."

    How in the world can they state that as singular? Surely they have a backup of some sort. Especially with all the supposed "increased security" around air flight, you are telling me that one system crash can knock out half of the major airlines? That's ridiculous. Have they not learned about redundancy?

    1. Re:Operating System (singular) by Anonymous Coward · · Score: 5, Funny

      Yeah, Have they not learned about redundancy?

    2. Re:Operating System (singular) by njcoder · · Score: 5, Funny
      "Have they not learned about redundancy?"

      Yep, they're so good, even the failure was replicated!

    3. Re:Operating System (singular) by Vlad_the_Inhaler · · Score: 4, Informative

      This is partially a question of cost: redundancy costs money and those airlines are rather short of the readies (although this crash will cost serious money).

      For any *normal* 'extreme situation', a reboot should help.

      Having just read that "The operating system that drives the airline's flight plans went down", it might even be a Windows problem. A 'Flight Planning' application is a low-volume application where you work out the optimum route for a plane based on the weather. That bit about the weather involves serious number crunching, and the PC world has more of that kind of power to spare than the mainframe world. I helped write one of these apps 18-20 years ago and the central part has since been converted to run on PCs.

      --
      Mielipiteet omiani - Opinions personal, facts suspect.
    4. Re:Operating System (singular) by Anonymous Coward · · Score: 4, Funny

      I'm an MBA. Would you please explain the joke?

  2. EDS works with a variety of systems by Rex+Code · · Score: 4, Insightful

    EDS is by no means a Windows shop. They work extensively with "big iron" mainframes. In fact, they recently got the contract to handle the database of terrorist information that'll be used at airports. Likely this will be hosted on a 390 or something... Windows can't handle that kind of I/O.

    1. Re:EDS works with a variety of systems by johnnyb · · Score: 4, Informative

      Their reservation system is on VAX/VMS, if I remember correctly. I used to work in their midrange department, but I knew some of the VMS coverage guys. They have quite a diverse setup. The only operating system I _didn't_ see there was HP-UX.

      They actually have a data center that is underground, and has a retinal scanner to get in (for some reason, our group got in with keycards - I'm not really sure why). Their tape library is about three times the size of my house. It's a pretty massive operation. Travelocity, hosted in the same location (but on the ground floor, not downstairs), is a bunch of huge SGI machines (8 processors and more each - probably about 30 of them).

      They run pretty much everything under the sun. I enjoyed being around the cool equipment while I was there, but absolutely hated the "big company" mentality, so I left after a year.

  3. Settle down, we've all seen this before... by the_seal · · Score: 4, Funny
  4. Last thing you want to hear by xIcemanx · · Score: 5, Funny

    I'm guessing the last thing you want to hear on a plane now is the pilot saying, "What do you mean, fatal exception error?"

    >_ Why don't they switch to Linux?

    1. Re:Last thing you want to hear by Brandybuck · · Score: 5, Funny

      Q: How far can the plane fly after a fatal exception error?

      A: All the way to the scene of the crash. Hell, it will probably beat the paramedics there by half an hour!

      --
      Don't blame me, I didn't vote for either of them!
  5. BSOL by Anonymous Coward · · Score: 4, Funny

    Blue screen of life. Because US Air cancelled the flight and we were forced to fly on a competent airline.

  6. EDS? Quelle surprise. by leathered · · Score: 5, Interesting

    Sorry, have to rant where I see EDS mentioned.

    EDS, in cahoots with the UK government, have wasted millions of pounds of taxpayers' money on failed IT projects. Notable ones include the Inland Revenue (UK IRS), Child Support Agency (£50M over budget and still not working) and an email and directory service for the NHS (withdrew at last minute allowing C&W to steal at a much inflated price).

    The blame cannot be laid entirely at the door of EDS, though: the government has been guilty of sloppy auditing and, worst of all, of a willingness to hand over extra money whenever EDS has come around with the begging bowl.

    --
    For all intensive porpoises your a bunch of rediculous loosers
    1. Re:EDS? Quelle surprise. by oddRaisin · · Score: 4, Insightful

      If we're going to lay blame, let's make sure we're spreading it evenly. A lot of contracts, and especially government ones, suffer from extreme scope creep. I have seen projects that started out with a 20 page description grow to over 150 pages by the end of the project.

      EDS and other large IT vendors try their best to discourage scope creep by making changes-after-the-fact billable for time and materials, instead of a negotiated cost. This makes the project go over budget. If the clients knew what they wanted at the beginning, instead of wasting time and money doing engineering on the fly during the project, then the costs wouldn't be so high.

      Don't be so quick to slag EDS about the outage either. There are lots of factors out there that could have contributed. I have worked on projects where the clients say the servers are mission critical, yet can't be bothered to shell out money to upgrade from ultra-1 and ultra-5s, let alone pay for an HA solution. The technical people keep providing the justification and making the requests, but it's the project managers and accountants that really determine what kind of solution is feasible.

  7. I thought everyone knew by Xerp · · Score: 5, Funny

    NEVER open Windows in an airplane!

  8. Re:Great News! by Vlad_the_Inhaler · · Score: 4, Informative

    Typical Airline applications are Reservations, Check-In, Weight-and-Balance, Flight Planning (which route to take and how much fuel to carry) and Ticketing. Once you have left the terminal and are heading for the runway, software crashes cease to be relevant.

    Once you head for the runway, you care about Air Traffic Control's software. The only exception I can think of is for flights to the US where the authorities want passenger lists.

    I work for an airline and we host for other airlines. I feel sorry for whoever carries the can for this mess. As to the OS, those who said it will be MVS are almost certainly correct. AA and US Airways are/were IBM customers.

    --
    Mielipiteet omiani - Opinions personal, facts suspect.
  9. Re:Great News! by Chess_the_cat · · Score: 4, Insightful

    What scary news? The airplanes are piloted by people, not computers. And certainly not the computers that control flight plans. Do you think that airplanes will start falling from the skies because a computer went down somewhere? I guess you packed your basement with cans of beans for Y2K too.

    --
    Support the First Amendment. Read at -1
  10. Re:Windows by Anonymous Coward · · Score: 5, Informative

    The following entities were NOT mentioned in the article you're linking to:

    (1) American Airlines,
    (2) US Airways,
    (3) EDS.

    So, what the hell are you talking about?
    Why did you link to this article?
    (I know, I know, because nobody will read it anyway)

  11. Re:Windows by ChatHuant · · Score: 4, Informative

    Sounds like a troll. The article quoted by the parent is about a small regional airline (Atlantic Coast Airlines) that's doing its IT work internally. The article doesn't mention EDS at all. Moreover, browsing EDS's site, you can see that the solution they implemented for Continental Airlines is UNIX-based.

  12. Not Windows, Unix by JohnQPublic · · Score: 5, Informative

    This is undoubtedly a problem with Sabre, which EDS runs on behalf of Sabre Holdings. Both American Airlines and US Airways use Sabre for much of their operations.

    Sabre started its life as an American Airlines internal system (SABER, slight spelling difference), running on a rare operating system (PARS, later called ACP and currently TPF) on IBM mainframes. In the last few years Sabre completed a lengthy migration to HP Unix on Non-Stop (i.e. ex-Tandem) hardware. The mainframe systems were rock solid, but software talent was hard to come by, so they decided the time had come to switch.

    Sorry, no Microsoft to blame here!

  13. What *REALLY* happened... by catdevnull · · Score: 5, Funny

    At about 4:30 a.m., the outsourced SysAdmin was setting up to do routine patches to Windows 2003 server nodes. But just before, he decided to check his e-mail with Outlook and he opened an important message from his system administrator advising him that his e-mail would be de-activated if he didn't open the important attachment. I think we all know what happened after that...

    --

    I might know what I'm talkin' about, but then again, this is Slashdot...
  14. Do you know the cost of redundancy? by aepervius · · Score: 4, Interesting

    We studied it around here, for one major airline in the EU. We wanted a "backup system" in case the main system went down. Total cost, without maintenance: about *3 whole days* of traffic "benefits"... Yes, that much. Right now the project is still being discussed, but most of us think it is dead in the water. Instead, the "older" and "less powerful" development system will be used in case of a breakdown.

    Redundancy is OK, as long as it is not bleeding you dry.

    --
    C. Sagan : A demon haunted world:
    http://www.amazon.com/gp/product/0345409469/
    visit randi.org
  15. You probably won't hear it by Anonymous Coward · · Score: 5, Informative

    The systems that run the aircraft and the navigational and communication systems really are redundant. It's the law. It also means that there are usually two different ways to do something, not just the same thing repeated twice.

    Example 1 - The pilot and co-pilot can't eat the same meal. That way, only one of them can get food poisoning.

    Example 2 - The hydraulic system fails and the wheels won't go down. There's a hand crank.

    Example 3 - The communication systems at every tower I have worked at have two separate backbones. There are two of absolutely everything. If that fails, there are emergency radios under the desk. If the emergency radios don't work ... We used to joke that the controllers would climb to the top of the tower and wave fire extinguishers to warn the planes away. (I think it was a joke.)

    Example 4 - You can't fly very far over open water in a single-engine aircraft.

    It used to be frustrating working on systems older than I was but we never had to worry about surprises.

    Of course all of this redundancy is very expensive. You spend the money where people's lives are at stake. On the other hand, if the worst problem is that some planes will be late, perhaps you don't spend the big bucks.

  16. Re:My guess ... by sysjkb · · Score: 4, Informative
    Sounds very unlikely to me. You will find weird custom S/360 derivatives in places like the Space Shuttle, but coordination and route planning doesn't sound a likely place for one.

    Of the 360-based operating systems, IBM's TPF has a major presence in the airline industry, but this probably isn't the system in question. TPF tends to handle ticketing and reservations. TPF stands for the Transaction Processing Facility; it's the descendant of the old Airline Control Program (ACP) developed for Sabre. Sabre in fact is still running TPF, although I believe they're busy transitioning away from the mainframe to Tandem's er I mean Compaq's er I mean HP's NonStop/UX.

    Of course, it might not be an IBM mainframe at all; Unisys has a niche in the airline industry. But heck; given that this is route planning, just about anything from AIX to z/OS is a possibility. Even *shudder* Windows.

  17. I found the root of the problem by Anonymous Coward · · Score: 5, Funny

    There is a line of code that raised the problem but is commented in Punjabi, I think it says "fuck this $3/hour job".

  18. Probably Sabre Holdings, rest probably wrong by Markus+Registrada · · Score: 5, Interesting
    First, they didn't "complete a migration". They're still deep in the middle of it, and will be for years to come.

    Second, this failure isn't in the Sabre reservations system, it's in some ancillary product, so who knows? Maybe they have no intention of switching it to Unix.

    Third, he didn't say so, but the migration isn't just to Unix. It's also migration to MySQL! (Hahahahahahahaha. Then again, coming from TPF, coded in assembly language for 4Kword pages, and a hierarchical database, that might seem pretty advanced.) Sabre had to fund a MySQL port to 64 bits, and a new "stored procedures" feature.

  19. Re:Wild speculation by Archibald+Buttle · · Score: 4, Interesting

    Blaming MS is the easy way out.

    I just read all the stories that were linked to this article.

    None of them blamed Microsoft. In fact the only blame pushed in their direction was your comment...

    The articles did say that there was a problem with the operating system. Now we don't know who exactly said this, or what they said precisely, so it is quite possible that this isn't entirely accurate reporting.

    I find it very difficult to believe that they would have any single points of failure in a system of that importance.

    I agree it's unlikely, but it is possible that there is a single point of failure in their system. There are a great many shoddily engineered systems in use today.

  20. Not "OS" by Master+of+Transhuman · · Score: 5, Informative

    When they said "operating system", they meant "operations system" - not the OS.

    See this quote from one of the articles:

    Wagner said a database malfunctioned that "basically runs every aspect of our client operations -- aircraft dispatch, crew scheduling (and) reporting weight, passenger load, balance."

    This system is hosted by EDS, who only said it was a "systems issue".

    So there's no evidence it was an OS problem. It could have been anything - OS, Oracle/DB2/SQL Server database, application code, upgrade, whatever.

    Nothing to conclude here except that somebody screwed up - and even that isn't certain - could have been a bad memory board someplace, who knows.

    Whether they had a backup is irrelevant anyway, since the "backup" might have taken three hours to bring up when you're dealing with a production system like this. "Failover" is what you want, and what they should have had, but if something got screwed up there, it could still have been three hours.

    Shouldn't have happened, but crap like this happens all the time because nobody can do their damn jobs.

    --
    Richard Steven Hack - This sig is TOO GODDAMN SHORT TO DO ANYTHING USEFUL WITH! MORONS!
  21. More info about Sabre than you ever wanted... by airbatica · · Score: 4, Informative

    Sabre is a multitude of software products, for lack of a better definition. They include RES, DECS, TIM, BMAS and a couple of others that I can't remember.

    All Sabre applications are text mode, no GUI whatsoever... think CLI from hell, with no command history if you fat finger an entry.

    The system that went down was probably DECS (Dispatch Environment Control System), which is the system used by both American and USAir for generating flight plans, load planning, weight and balance, and various other flight operations functions.

    RES is the Reservations system, which covers the spectrum from building reservations and selling tickets, to customer checkin, boarding and god knows what else. IIRC, it will even do car rentals and hotels.

    TIM is also called Timatic. It's used for accessing information from the US State Department regarding international travel to any country, from any country in the world. It covers entry and exit requirements, documentation, and pretty much anything you could want to know.

    I don't remember what BMAS stands for, but it is a lost bag tracking and reporting system. When AA or US loses your luggage, this is what they use to find it.

    Sabre is used by a whole variety of airlines and travel agencies, and is customised in modules to each particular user's needs.

    Now you are probably wondering how I know all this... I work for a major airline that uses a majority of the systems listed above, with the exception of the Dispatch system. We were not affected by whatever snafu took down that portion of Sabre :)