Slashdot Mirror


Comair System Crashes; Passengers Stranded

Broerman writes "30,000 people have had their flights cancelled by Comair this weekend thanks to a computer system shutdown. It appears that, due to weather and other problems, flights began to be cancelled on Thursday and the backlog choked the system. 1,100 flights have been cancelled so far, including all flights through 12/26. Does anyone know what platform their system was based on? What kind of system just totally crashes? The official statement is that 'There was a cumulative effect with the canceled flights and trying to get crew assigned that caused the system to be overwhelmed.' It seems highly improbable that a system would crash because it had too many reservations. The system should only be able to hold as many reservations as it has flights/seats. It seems more likely that the system was overloaded with use and that caused a meltdown. When you add in the problems experienced by US Airways, this hasn't been a Merry Christmas for many."

246 of 398 comments

  1. Fire away! by weeksie · · Score: 5, Funny

    Anybody know what they were running? I'd like to see this flamewar get started as soon as possible.

    1. Re:Fire away! by mirko · · Score: 5, Insightful

      There recently was a big card problem here, in Europe.
      It did not come from a particular OS but just because a partition got filled by index tablespace extents.
      So, it could just be that they ran out of space and it froze the whole application.

      --
      Trolling using another account since 2005.
    2. Re:Fire away! by Deviate_X · · Score: 3, Interesting

      Interesting...

      Job postings (Comair, Inc. jobs) might give some insight into what they are using.

    3. Re:Fire away! by theonetruekeebler · · Score: 2, Insightful
      Based on those postings, I'm guessing the application is based on either Oracle or Sybase on HP-UX.

      My preliminary diagnosis: blown rollback segment. With too many flights being cancelled, the simultaneous rescheduling of all those crew resulted in a SQL transaction that exceeded the size of what the DBMS could undo. So an uncommitted statement failed and the application code either was not prepared for such a possibility or could only handle it by timing out. Scheduling tasks could no longer move forward, and right now some poor DBA is hoping to Christ that he printed out that e-mail he wrote asking for more disk space...

      --
      This is not my sandwich.
    4. Re:Fire away! by [Xorian] · · Score: 5, Informative

      Someone from Comair (who shall remain anonymous) provided me with some details which people here would be interested in:

      The computer system in question runs AIX. The box itself is still up and running just fine; this is purely an application error. This application was not written in-house at Comair, but by another large aerospace company -- SBS (http://www.sbsint.com/, owned by Boeing.) This bit of software does not use an external database, it tracks everything itself. It is a dedicated system responsible only for flight crew assignments. (The blather in the original submission about passenger reservations is way off-base. Those functions are handled by a completely different system.)

      The great majority of Comair's traffic flows through the midwest, and the central base of operations is in Cincinnati. The midwest was hit by a major snowstorm this week, causing many, many crew reassignments. It appears right now that the application in question has a hard limit of 32,000 changes per month (ouch). Consider that Comair runs 1,100 flights a day and there are usually 3 crew members on each aircraft. A big storm like this can cause problems for days after the snow stops falling. That's a whole lot of crew changes.
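
      A rough back-of-the-envelope check of those figures, as a sketch. The 32,000 limit, the 1,100 flights a day and the 3 crew per aircraft come from the post above; the idea that a storm forces every crew slot to be changed once per day is an illustrative assumption, not a reported number.

```python
# Figures quoted in the post above.
MONTHLY_CHANGE_LIMIT = 32_000   # reported hard limit on changes per month
FLIGHTS_PER_DAY = 1_100
CREW_PER_FLIGHT = 3

# Crew slots that exist on a normal day.
crew_slots_per_day = FLIGHTS_PER_DAY * CREW_PER_FLIGHT

# Hypothetical worst case (an assumption, not a reported figure):
# every crew slot is reassigned once per day while the disruption lasts.
days_to_hit_limit = MONTHLY_CHANGE_LIMIT / crew_slots_per_day

print(crew_slots_per_day)            # 3300
print(round(days_to_hit_limit, 1))   # 9.7
```

      Under that assumption, well under two weeks of disruption would be enough to use up a month's allowance.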

      In Comair's defense, this has never happened before and is unlikely to happen again. The crew system was already on the chopping block long before this incident, with its replacement scheduled to go live in January. If this freak storm had happened a month later, this likely never would have occurred.

      --
      CVS is teh suck. Use Vesta instead.
    5. Re:Fire away! by [Xorian] · · Score: 4, Informative


      Just to be absolutely clear: I've only ever communicated with this person on-line, and I can't verify who they are in real life or that they actually work for Comair. It seemed credible though, and it seemed worth posting to debunk the Slashdot knee-jerk reaction of blaming Microsoft. To me, an application using a 16-bit integer for something seems like a very likely explanation.
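
      To illustrate why a limit near 32,000 suggests a 16-bit integer, here is a minimal sketch. It is hypothetical: nobody in this thread has seen the actual counter, and `bump` is an invented helper. It uses `ctypes.c_int16` to mimic two's-complement wraparound; in real C code, signed overflow is undefined behaviour rather than a guaranteed wrap.

```python
import ctypes

INT16_MAX = 2**15 - 1   # 32767, the largest value a signed 16-bit integer holds


def bump(counter: int) -> int:
    """Increment a counter as a signed 16-bit variable would, wrapping on overflow."""
    return ctypes.c_int16(counter + 1).value


print(INT16_MAX)          # 32767
print(bump(32_000))       # 32001
print(bump(INT16_MAX))    # -32768
```

      A change count that suddenly goes negative is the kind of state an old application is unlikely to handle gracefully.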


      --
      CVS is teh suck. Use Vesta instead.
    6. Re:Fire away! by Anonymous Coward · · Score: 5, Informative
      If it was the crew scheduling system, and it was SBS's Maestro Crew scheduling system, I can fill in some details.

      Maestro is delivered on AIX, uses a rather old version of Informix for its database, and is tied together using the TUXEDO TP monitor from BEA.

      The business logic is written in C, and abstracted away using Tuxedo.

      In the case of a major schedule disruption, this program isn't responsible for "solving" the problem, but is responsible for being the system of record that holds the new crew schedule.

      My guess is that the changes to the crew schedule were large enough that some piece of the system was overwhelmed. ( For example, a transaction that was too large and overran the rollback buffers in Informix ).

      Without the system of record in place, a manual process would be very difficult. You would have to figure out:

      • Which crews were in which locations
      • What aircraft each crew member was qualified on.
      • How long they had flown already that day. ( Legalities about how much time you can fly before you need mandatory rest )
      • Which routes to send those crews on
      • How to get the crews back to a specific city to run the next day's schedule
      Of course, any mistakes you made doing this manually would overflow into other systems. For example, you might send an aircraft that's due maintenance to a city with no maintenance facilities.

      Also, for those that were critical of the system not being highly available...this doesn't sound like the kind of problem that HACMP and replicated databases would have helped with. The hot standby would have choked at the exact same point.

    7. Re:Fire away! by Daa · · Score: 5, Informative

      Just to give you an idea, here is the applicable FAA reg for crew scheduling, and the pilots' contract may have additional terms that must be met.

      121.471 Flight time limitations and rest requirements: All flight crewmembers.

      (a) No certificate holder conducting domestic operations may schedule any flight crewmember and no flight crewmember may accept an assignment for flight time in scheduled air transportation or in other commercial flying if that crewmember's total flight time in all commercial flying will exceed--

      (1) 1,000 hours in any calendar year;

      (2) 100 hours in any calendar month;

      (3) 30 hours in any 7 consecutive days;

      (4) 8 hours between required rest periods.

      (b) Except as provided in paragraph (c) of this section, no certificate holder conducting domestic operations may schedule a flight crewmember and no flight crewmember may accept an assignment for flight time during the 24 consecutive hours preceding the scheduled completion of any flight segment without a scheduled rest period during that 24 hours of at least the following:

      (1) 9 consecutive hours of rest for less than 8 hours of scheduled flight time.

      (2) 10 consecutive hours of rest for 8 or more but less than 9 hours of scheduled flight time.

      (3) 11 consecutive hours of rest for 9 or more hours of scheduled flight time.

      (c) A certificate holder may schedule a flight crewmember for less than the rest required in paragraph (b) of this section or may reduce a scheduled rest under the following conditions:

      (1) A rest required under paragraph (b)(1) of this section may be scheduled for or reduced to a minimum of 8 hours if the flight crewmember is given a rest period of at least 10 hours that must begin no later than 24 hours after the commencement of the reduced rest period.

      (2) A rest required under paragraph (b)(2) of this section may be scheduled for or reduced to a minimum of 8 hours if the flight crewmember is given a rest period of at least 11 hours that must begin no later than 24 hours after the commencement of the reduced rest period.

      (3) A rest required under paragraph (b)(3) of this section may be scheduled for or reduced to a minimum of 9 hours if the flight crewmember is given a rest period of at least 12 hours that must begin no later than 24 hours after the commencement of the reduced rest period.

      (4) No certificate holder may assign, nor may any flight crewmember perform any flight time with the certificate holder unless the flight crewmember has had at least the minimum rest required under this paragraph.

      (d) Each certificate holder conducting domestic operations shall relieve each flight crewmember engaged in scheduled air transportation from all further duty for at least 24 consecutive hours during any 7 consecutive days.

      (e) No certificate holder conducting domestic operations may assign any flight crewmember and no flight crewmember may accept assignment to any duty with the air carrier during any required rest period.

      (f) Time spent in transportation, not local in character, that a certificate holder requires of a flight crewmember and provides to transport the crewmember to an airport at which he is to serve on a flight as a crewmember, or from an airport at which he was relieved from duty to return to his home station, is not considered part of a rest period.

      (g) A flight crewmember is not considered to be scheduled for flight time in excess of flight time limitations if the flights to which he is assigned are scheduled and normally terminate within the limitations, but due to circumstances beyond the control of the certificate holder (such as adverse weather conditions), are not at the time of departure expected to reach their destination within the scheduled time.
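
      The basic thresholds in paragraphs (a) and (b) above can be sketched as two small checks. This is a simplification for illustration only: it ignores the reduced-rest provisions in (c), the 24-hours-off rule in (d) and everything else, and the function names are invented here.

```python
def required_rest_hours(scheduled_flight_hours: float) -> int:
    """Minimum consecutive rest in the preceding 24 hours, per 121.471(b)(1)-(3)."""
    if scheduled_flight_hours < 8:
        return 9      # (b)(1): less than 8 hours of scheduled flight time
    if scheduled_flight_hours < 9:
        return 10     # (b)(2): 8 or more but less than 9 hours
    return 11         # (b)(3): 9 or more hours


def within_flight_time_limits(year_h: float, month_h: float,
                              week_h: float, since_rest_h: float) -> bool:
    """Check the cumulative caps of 121.471(a)(1)-(4); totals may not exceed them."""
    return (year_h <= 1000 and month_h <= 100
            and week_h <= 30 and since_rest_h <= 8)


print(required_rest_hours(7.5))                      # 9
print(required_rest_hours(8.5))                      # 10
print(within_flight_time_limits(900, 101, 28, 7))    # False
```

      Every proposed reassignment has to pass checks like these for each crewmember, which is part of why doing it by hand is so painful.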

    8. Re:Fire away! by Anonymous Coward · · Score: 5, Informative
      No. It is the version of SBS that pre-dated Maestro. It was brought into Comair in the early 1980's. It's written in FORTRAN and uses whatever record management system came with the compiler.

      As such it used some very interesting data representations. For example, it tracked time using julian minutes. There are 44640 minutes in a 31 day month. That's small enough to fit in a 16-bit unsigned variable. This approach, nearly taboo by modern standards, was a God-send during Y2K. The system never needed to know what year it was. It became the running wisecrack, "You can't have a Y2K problem if you don't have a 'Y'".
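
      The arithmetic behind that representation can be sketched as follows. The exact encoding is an assumption (the post only says time was tracked as minutes within a month), and `minute_of_month` is a made-up helper.

```python
UINT16_MAX = 2**16 - 1   # 65535, largest unsigned 16-bit value
INT16_MAX = 2**15 - 1    # 32767, largest signed 16-bit value


def minute_of_month(day: int, hour: int, minute: int) -> int:
    """Minutes elapsed since 00:00 on day 1 of the month (day is 1-based)."""
    return ((day - 1) * 24 + hour) * 60 + minute


minutes_in_31_days = 31 * 24 * 60
last_minute = minute_of_month(31, 23, 59)

print(minutes_in_31_days)           # 44640
print(last_minute)                  # 44639
print(last_minute <= UINT16_MAX)    # True
print(last_minute <= INT16_MAX)     # False
```

      Note that the value fits in an unsigned 16-bit variable but not in a signed one, so the same trick leaves very little headroom.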

      The Aircraft to Flight assignments is another system, but the two share information.

    9. Re:Fire away! by Anonymous Coward · · Score: 1, Insightful

      Ahh. I'm surprised the "pre Maestro" stuff still exists. In fact, I think SBS's preferred platform for the older stuff was Ultrix. If COMAIR waited this long to address replacing this ancient FORTRAN spiderweb, they made their own bed. I think SBS released Maestro to replace that stuff in 1993 or so.

    10. Re:Fire away! by pVoid · · Score: 2, Interesting
      I don't think they keep a SQL transaction running for as long as the flight hasn't taken off.

      SQL transactions generally last seconds and involve operations like "open tr, is there space in this flight?, reserve space, close tr". Not "open tr, wait for flight to fill up, close tr". Rescheduling or canceling flights probably isn't accomplished using transactions: it's application level logic.

      My personal diagnosis: I think it has nothing to do with the backlog, and that the system just melted under high strain (of millions of people trying to book other flights). Either that, or they ran out of disk space.

    11. Re:Fire away! by theonetruekeebler · · Score: 1
      According to TFA (please R), the problem appears to be with the software used to reassign crews to planes, not the reservation system. Assigning crews works like this:
      1. here is a list of flights that need crews
      2. here are the unassigned crewmembers
      3. mash them together into a fine paste.

      Step 3 is the one that can be an O(2^n) problem of assigning weights to different crew/flight combinations. Even with a very clever set of heuristics the problem is at the very least still O(n^2), or more likely O(n^2*lg(n)), with an unconscionably high coefficient.

      There are an amazing number of variables:

      • where are the flights going from/to
      • where are the crewmembers currently located
      • cost of deadheading crew
      • which crew is rated for what type of equipment
      • which crew are already assigned
      • which multirated crew can/should be reassigned
      • how close are crew to their legal flight time limits
      • which flights will not put those crew over their limits

      As you can see, this sort of thing tends to stack up, and involves building lots of intermediate data. If you feed 500 flights into a system designed to handle 20 or 30 at a time, well, the problem is somewhere between 275 and 625 times more complex for O(n^2) and 1300 times more complex for O(n^2*lg(n)), and while building the cost matrix for the flight assignments, yes, they ran out of fucking disk space. Do you not know what a rollback segment is? It's what makes you run out of disk space while updating a table 1300 times larger than you thought it would be. Sheesh.
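
      Those multipliers can be checked with a few lines. This is only a sketch of the ratio arithmetic; the 20, 30 and 500 figures are the ones used in the paragraph above, and the function names are invented.

```python
from math import log2


def growth_n2(old_n: int, new_n: int) -> float:
    """How much more work an O(n^2) algorithm does when n grows from old_n to new_n."""
    return (new_n / old_n) ** 2


def growth_n2_lg(old_n: int, new_n: int) -> float:
    """The same ratio for an O(n^2 * lg(n)) algorithm."""
    return growth_n2(old_n, new_n) * log2(new_n) / log2(old_n)


for designed_for in (20, 30):
    print(designed_for,
          round(growth_n2(designed_for, 500)),
          round(growth_n2_lg(designed_for, 500)))
# 20 625 1297
# 30 278 508
```

      So the "1300 times" figure corresponds to the design size of 20; starting from 30, the O(n^2*lg(n)) ratio is closer to 500.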

      --
      This is not my sandwich.
    12. Re:Fire away! by pVoid · · Score: 2, Insightful
      Do you not know what a rollback segment is? It's what makes you run out of disk space while updating a table 1300 times larger than you thought it would be

      Yes, but you pretty much spelled out what my point was in that the n^2 complexity issue is unrelated to transactional operations. That is, a transaction is a transaction, it is scalable, so it doesn't matter whether the actual operation for computing stuff is O(n^2), the transaction is still a fixed cost. On a side point: I don't agree that because the problem is 1300 times more complex, the updates are 1300 times bigger. The problem is still based on n elements: it just happens that computing the solution of a problem with n elements takes n^2 time... the end result though is still n elements to update.

      That being said, I am fairly confident modern relational databases are scalable to the point of being able to handle a 500 fold increase (if only by simply slowing down to a crawl - but not crashing). If anything, it's probably internal application logic that wasn't able to handle the added computational complexity and at a certain point hit a hard limit of its scalability (some fixed sized arrays, or indexes of some sort).

      My comment about 'ran out of disk space' was more along the lines of "it's either an application fault, or something mundane like someone forgot to check if they had sufficient disk space" (something which can happen anytime due to neglect)

    13. Re:Fire away! by theonetruekeebler · · Score: 1
      I don't agree that because the problem is 1300 times more complex, the updates are 1300 times bigger.
      More precisely, the complexity is a function of the number of unassigned crew multiplied by the number of unassigned flights. In a typical solution, a matrix is built, costs are assigned to each combination, and assignments are built from there. Certain heuristics can keep ludicrous combinations out of the matrix, but the matrix is still going to be big. Since this is an RDBMS-based application, cost setting goes like this:
      UPDATE crew_cost_matrix SET distance=get_distance(crew_id, flight_id);
      Updating the entire table in one pass. So: If the table is big enough, the rollback segments (or something else) explode.
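
      To put numbers on that, here is a sketch of how many rows such a single-pass UPDATE would touch. The crew and flight counts are hypothetical, picked only to show the growth, and `cost_matrix_rows` is an invented helper; it assumes one row per crew/flight pairing with no heuristic pruning.

```python
def cost_matrix_rows(unassigned_crew: int, unassigned_flights: int) -> int:
    """One row per crew/flight pairing, before any heuristic pruning."""
    return unassigned_crew * unassigned_flights


# Hypothetical sizes (assumptions, not figures from Comair).
normal_day = cost_matrix_rows(unassigned_crew=90, unassigned_flights=30)
storm_day = cost_matrix_rows(unassigned_crew=1_500, unassigned_flights=500)

print(normal_day)                # 2700
print(storm_day)                 # 750000
print(storm_day // normal_day)   # 277
```

      An UPDATE with no WHERE clause has to keep undo information for every one of those rows until it commits, so the undo space needed grows with the row count.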

      I will cheerfully concede, however, that the process could require O(n^2) computation time, thus bringing the system to a halt that way, before the disks even light up.

      --
      This is not my sandwich.
    14. Re:Fire away! by pVoid · · Score: 1
      Agreed.

      (always makes me feel warm inside that you can have logical conversations every once in a while on /. =)

    15. Re:Fire away! by theonetruekeebler · · Score: 1

      Yeh. Sorry I got snippy earlier. New baby in the house. Sleeplessness affects mood.

      --
      This is not my sandwich.
    16. Re:Fire away! by TechSoft04 · · Score: 1

      I live in Cincinnati and we have storms like this every few years. Comair has been lucky in prior years that this application hasn't choked. My hunch is that Comair has been growing in the past few years and has added more pilots and stewardesses, which caused the application to meet that critical breaking point in a storm like the one they had. But to not have any type of contingency plan in place is insane! Even a hot site or live backup server could have helped them in regards to limiting how many changes could be made after the crash. Load balancing hardware and software could have helped. I have worked with hundreds of IT departments in large companies over 22 years, and having a failover system and a contingency plan has been priority 1. IT shops understand how much it costs their companies to be down just one hour. No matter what software or hardware Comair is using, there is no excuse not to have redundancy.

    17. Re:Fire away! by Tassach · · Score: 1

      That's because once you start a highly technical thread, the 14 year olds can't even understand the conversation enough to interject a comment.

      --
      Why is it that the proponents of "one nation under God" are so eager to get rid of "liberty and justice for all"?
    18. Re:Fire away! by pVoid · · Score: 1

      Ahh. Man, look. I'm really not a "told you so" kind of guy, and I'm just being jovial here, but it turns out I was right. It *was* an internal scalability issue after all! =)

    19. Re:Fire away! by theonetruekeebler · · Score: 1
      Rather than fret that my theory didn't pan out, I'd just like to say

      BAAHAHAHAHA! What'd they prototype this on, a Gameboy?

      TFA says there was a hard limit of "32000". I wonder if they mean a signed 16-bit counter or a #define TABLE_MAX 32658. Either way, Ewww!

      --
      This is not my sandwich.
    20. Re:Fire away! by ddusza · · Score: 1

      Rats! I couldn't get my FAR/AIM 2004 out quick enough for this story! Don Student--PPSEL 44.5 hrs and holding....

      --
      Don't fear the penguins
    21. Re:Fire away! by pVoid · · Score: 1
      I know man. Thing is, I'm sure this is in some obscure part of their mainframe system, and that code was written while Pan-Am was still around.

      I wouldn't be surprised if they actually didn't have the source anymore. But I'm sure they do now, since they're a responsible company, right? =)

  2. Happens all the time... by Anonymous Coward · · Score: 5, Interesting

    When I lived in Chicago, they would lose their radar system on what seemed like a strong wind. And I got stuck in Denver overnight once because the computer system they use to calculate the weight of departing flights crashed. I have a feeling these kinds of crashes are much more common than most people think.

    1. Re:Happens all the time... by hughk · · Score: 4, Informative
      I have a lot of friends working at a large airline.

      Yes, but it is mostly recoverable. The heavy iron handles things like backend reservations, checkin and cargo. Smaller systems handle things like weight/balance and fuel and PCs are typically used for the front-ends.

      Weight/balance calcs can be done more or less by hand if necessary, however a larger fuel margin is needed. Checkin can be done by hand (you have seen those sticky label systems). However to lose reservations is a major problem.

      --
      See my journal, I write things there
    2. Re:Happens all the time... by dattaway · · Score: 1

      Apparently, this kind of crash is recoverable, but I wouldn't feel good about it happening.

    3. Re:Happens all the time... by Greyfox · · Score: 4, Interesting
      From looking at the various terminals that the airline people use, I suspect that most of those airline systems are held together with duct tape and library paste and no one really understands how the whole system works anymore. We see that a lot in non-IT industries (And a few IT ones, too.) Of course, the folks using the IBM ones are not ever supposed to go down...

      I moonlighted as an AS/400 operator for a cruise line for a while. We had the system go down once because the janitor turned off the air conditioner in the closet the AS/400 lived in. They didn't dedicate a more secure facility for the computer because the computer wasn't demonstrably central to how the company made money. Turns out they couldn't launch a ship without it. Oops. I suspect that mentality is also prevalent throughout the non-IT industries. They don't know how important their computers are to their business models until those computers die on them.

      --

      I'm trying to teach myself to set people on fire with my mind... Is it hot in here?

    4. Re:Happens all the time... by Rosonowski · · Score: 2, Funny

      Wow. I would hate to be the one sitting there when that happened.

      There's some...thing on the ... wing

      --
      01101001 01100001 01101101 01101110 01101111 01110100 01100001 01101100 01100001 01110111 01111001 01100101 01110010
    5. Re:Happens all the time... by budgenator · · Score: 2, Insightful

      Not too hard to imagine. I see a system that's a combination of Fortran 66, COBOL, and C, all sort of working together over the years. All parts have had numerous patches and changes applied over the years until no one understands it anymore, with each iteration making the system more fragile. Now they are lucky if they have the source code for the current build.
      Each time the industry is making money and IT is flush, a project is started to examine all the code in the system and refactor and rewrite it to modern standards, and each time the project gets just past the planning phase, the economy takes a dump and the team gets laid off.
      Now that the problem has had an economic impact on the company, the PHB is going to send it off to India, to some kid 6 months out of college who is going to have to google the meaning of a GOTO statement used in the million lines of code that are older than he is.

      --
      Apocalypse Cancelled, Sorry, No Ticket Refunds
    6. Re:Happens all the time... by TopShelf · · Score: 4, Funny

      I used to work with a guy who at one time was an HP3000 operator back when those things were as big as your average washer/dryer combo. His shop had about a dozen of these things, and one night he and a buddy were playing Frisbee with the circular write-protect rings that were used on the reel-to-reel tape drives.

      Sure enough, his buddy whipped one at his head, and as he ducked out of the way, he fell back and by accident hit the power switch located on the back of one of the HP3000s. In an instant, all the ticket terminals for one airline (I can't recall which one) at O'Hare airport went down, prompting a frantic call from VPs wondering what disaster had struck. So who knows what could have happened this time around...

      --
      Stop by my site where I write about ERP systems & more
    7. Re:Happens all the time... by phil+reed · · Score: 2, Insightful

      Of course, the folks using the IBM ones are not ever supposed to go down...
      There's a difference between the machine crashing and the application crashing.

      --

      ...phil
      "For a list of the ways which technology has failed to improve our quality of life, press 3."
    8. Re:Happens all the time... by arodland · · Score: 1
      Don't you mean:

      There's a man! on the wing of the plane!

    9. Re:Happens all the time... by Rosonowski · · Score: 1

      Ok, I'll admit. It's been so long since I've actually seen the movie that I'm just working off of faded memories and parodies, so you might be right.

      --
      01101001 01100001 01101101 01101110 01101111 01110100 01100001 01101100 01100001 01110111 01111001 01100101 01110010
    10. Re:Happens all the time... by A+Naughty+Moose · · Score: 1
      as opposed to some one?


      No, as opposed to some thing. As in the Twilight Zone episode Nightmare at 20,000 Feet, where William Shatner played a guy who got a window seat on a plane and saw some thing messing around on the wing, presumably trying to make the plane crash.
    11. Re:Happens all the time... by Coniagas · · Score: 2, Funny

      Without mentioning names, I have also worked on several airline systems on a contract basis. Two years ago I was asked to look at a problem with flight ops and was shown a 486 DX2/80 running Novell 2.11. I was told to just patch it till they could look at replacing the system. I did, and a few months ago I was in the same office and was asked to look at the flight ops server that was "burping". They had upgraded to a P2-400 and were still running Novell 2.11.

      I was told this was a major upgrade. Some things never change.

    12. Re:Happens all the time... by Greyfox · · Score: 2, Insightful
      Funny how you never really hear about the applications written in COBOL, Fortran and PL/1 crashing. You get the impression that all those applications run for years at a time without so much as a hiccup. It's only with the invasion of GUIs and "modern" design techniques and languages that you start hearing about crashes like this. Granted the newer applications tend to be more ambitious about what they do...

      I'd love to see some uptime numbers for past systems versus the systems we have today. I wonder if they'd show the downward trend that I suspect they would.

      --

      I'm trying to teach myself to set people on fire with my mind... Is it hot in here?

    13. Re:Happens all the time... by Kristoffer+Lunden · · Score: 1

      Mr Smith: "There is a gremlin destroying the plane! You've gotta believe me!"
      Speaker: "Why should I believe you? You're Hitler!"

    14. Re:Happens all the time... by myov · · Score: 1

      A few months back, Air Canada shut down for at least a day because the system which calculates fuel requirements (and outsourced to IBM) went down. I thought it was due to an upgrade, but I could be wrong.

      --
      I use Macs to up my productivity, so up yours Microsoft!
    15. Re:Happens all the time... by HiThere · · Score: 2, Interesting

      There were many of them that did, however, crash. But the reason you don't hear about it much is that most of them weren't designed to be running all of the time, but only occasionally. If one crashed (and was a known good program) you'd just re-run it. Frequently that was your only choice, as you might not have anything but the binary. (Sloppy contracts often left consultants with the only copy of the source.)

      I did hear of one company that went out of business because their accounting system was written in a combination of those languages (plus a bit of assembler, and some binary patches). When it was done, they let the consultants go. A few years later the consultants didn't have a copy of the source anymore, and some tax law changes took effect. Oops! (That's not exactly a crash, but it wiped out the whole company.)

      OTOH, when I was writing fortran I had frequent crashes. I never got programs as solid as I later did with C. But they were "good enough". (Actually, a bit better than good enough. I was criticized for "gold plating" code that didn't need it.)

      A new degree of error frequency, however, entered with dynamic memory allocation. This allowed memory leaks that had previously been the province of the compiler (and assembly language subroutines). One must write very disciplined C code to avoid memory allocation problems unless you just don't do dynamic memory allocation. And as multi-tasking operating systems became common it also became more common to have interaction problems. Etc.

      But I can guarantee you that it's quite as possible to have those problems with PL/1 if you use a multi-tasking OS. And likewise if you use Java or Python, or a similar language with constraints on pointer use, you can avoid those problems. (This doesn't get rid of other problems. Thread synchronization is still problematic ... though you might check out Erlang or Inferno. I think they both claim to have general solutions. [The Erlang solution has been ported to Python under the name of Candygram, but I haven't checked it out yet.])

      But if you haven't heard of the older programs failing, it's because they are older, and the flaky ones have been retired or repaired.

      --

      I think we've pushed this "anyone can grow up to be president" thing too far.
    16. Re:Happens all the time... by jjp5421 · · Score: 1

      Preaching to the choir bro!!!

    17. Re:Happens all the time... by randombit · · Score: 1

      Funny how you never really hear about the applications written in COBOL, Fortran and PL/1 crashing.

      Not really. They were probably written 20 or more years ago, there has been plenty of time to catch most of the bugs. Not a function of language, techniques, or skill, just time and use. And stuff written in COBOL and PL/1 probably doesn't get touched much, so no new bugs.

      I'd love to see some uptime numbers for past systems versus the systems we have today. I wonder if they'd show the downward trend that I suspect they would.

      On average, sure. Most servers don't stay up for more than a few months, a couple of years at best. But they also cost a lot less than an S/360 did in 1965. You want a box that will stay up, buy a Tandem or something.

    18. Re:Happens all the time... by arodland · · Score: 1

      Or we're talking about different things. But I still like mine better. Best... Shatner... Ever!

    19. Re:Happens all the time... by some+guy+I+know · · Score: 1
      In addition to what the other two responders wrote, the requirements for present-day software are higher than those of 20 years ago:
      • Applications are expected to interact with other applications.
      • Tax laws and regulations are more numerous and Byzantine than they used to be.
      • The people who use the software tend to get less training than they used to, making it more likely that they will do something stupid.
      • Management is less in awe of and respects less the people who run the machines, and, with rapidly decreasing profit margins (and even without them), are likely to under-allocate the resources needed to do a project well.
      --
      Those who sacrifice security to condemn liberty deserve to repeat history or something. - Benjamin Santayana
    20. Re:Happens all the time... by Greyfox · · Score: 1

      Hmm. That's interesting. By the way, I see a lot of Java programs "crash" with uncaught NullPointerExceptions. Just because Java eliminates the burden of dealing with dynamic memory allocation doesn't mean your programs magically become crashproof in that language either. It takes a sloppy programmer, but that's most of the programmers out there.

      --

      I'm trying to teach myself to set people on fire with my mind... Is it hot in here?

    21. Re:Happens all the time... by Hard_Code · · Score: 1

      NullPointerExceptions are a hell of a lot more recoverable than say, segfaults, or worse yet, corruption that keeps a system nominally "working" but in a corrupted state, e.g. an int wrap around, etc.

      That's sort of like saying "If you think about it, you can put all the armor in the world on a tank and it can still be blown up".
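
      To make the loud-versus-silent distinction concrete, here is a minimal Python sketch. It is only an illustration: `seats_left` and `add_int32` are made-up names, and `ctypes.c_int32` stands in for a fixed-width integer in a language like C or Java, since Python's own integers don't wrap.

```python
import ctypes


def seats_left(flight):
    # Using a missing record fails loudly: Python raises AttributeError,
    # roughly the analogue of Java's NullPointerException.
    return flight.capacity - flight.booked


def add_int32(a, b):
    # Emulate 32-bit signed addition. ctypes does no overflow checking,
    # so an out-of-range result silently wraps around.
    return ctypes.c_int32(a + b).value


def loud_failure():
    # The error is visible and can be caught, logged, and recovered from.
    try:
        seats_left(None)
    except AttributeError:
        return "caught"
    return "no error"


def silent_failure():
    # No exception here: the counter just becomes a large negative number
    # and the program carries on with corrupted state.
    return add_int32(2**31 - 1, 1)
```

      The first failure stops you at the point of the bug; the second one lets the system keep "working" with a bad value until something downstream notices.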

      --

      It's 10 PM. Do you know if you're un-American?
    22. Re:Happens all the time... by aminorex · · Score: 1

      C'mon, insightful? Ludicrous. Wait for 30 years,
      to allow all the easy prey to die off, and the same
      could be said for Java apps, or C#, or OCAML, whatever.

      Selection of the fittest, man. COBOL, Fortran,
      and PL/1 died off because they could not compete.
      The apps they compiled live on, but only while they are competitive.

      --
      -I like my women like I like my tea: green-
  3. Official my arse... by Omicron32 · · Score: 4, Insightful

    Sounds like my Mother wrote the official statement. A techy would never report something in that way.

    Besides, it's pretty obvious their OS wasn't digitally signed. :p

    1. Re:Official my arse... by Saven+Marek · · Score: 2, Informative

      You know I think it was. btw the system being used by Comair?

      It's one of SCO's last large-scale deployments. You know who to blame now.

      Online Anime Gallery's

    2. Re:Official my arse... by AKnightCowboy · · Score: 1
      The Comair system runs on Linux using an IBM DB2 backend. No wonder it crashed. Linux isn't built to handle that kind of load. The Windows 2000 Server system they were previously running with MS-SQL handled last year's Christmas rush with no problems.

      /pulled that out of my ass, Merry Post-XMAS day! :-)

  4. Someone's gotta say it... by mOoZik · · Score: 3, Insightful

    Yep, it was Windows XP. ;)

    I don't know. Frankly, it has less to do with the platform than the custom software that runs on it.

    1. Re:Someone's gotta say it... by jcr · · Score: 2, Interesting

      Well, judging by the IT jobs they're advertising on their web site, it looks like a combination Windows/Linux/UNIX shop.

      At any rate, I suspect they'll be looking for a new IT director Real Soon.

      -jcr

      --
      The only title of honor that a tyrant can grant is "Enemy of the State."
    2. Re:Someone's gotta say it... by mOoZik · · Score: 1

      I agree. Either the system was never thoroughly tested or there was a weak link that went undiscovered. In any event, heads should roll, as 30,000 people were affected and it resulted in a lot of lost revenue for the airline.

    3. Re:Someone's gotta say it... by adeydas · · Score: 1

      Just the words out of my mouth. It was most probably the software or a hardware glitch that brought the system down...

    4. Re:Someone's gotta say it... by lachlan76 · · Score: 1

      At any rate, I suspect they'll be looking for a new IT director Real Soon.

      At what point did the head sysadmin become responsible for finding bugs in the code?

    5. Re:Someone's gotta say it... by Brando_Calrisean · · Score: 1


      At what point did the head sysadmin become responsible for finding bugs in the code?


      At what point did IT Directors become 'head sysadmins'? In my experience, IT Directors are executive-types responsible for the entire IT infrastructure of an organization - and this failure would certainly fall on them.

      --
      Don't call me a cowboy, and don't tell me to slow down!
    6. Re:Someone's gotta say it... by Pharmboy · · Score: 3, Insightful

      You would think so. The IT Director is responsible for making sure everything IT works. Not to do it himself, but to make sure it is done and done right. I can't see how someone can argue with that. Even if it IS the janitor unplugging the UPS to plug in a floor buffer.

      Whether it is the cooling system for the computers, the operating system, the applications or simple hardware issues, it HAS to be the IT Director's responsibility. I mean, who the hell else?

      --
      Tequila: It's not just for breakfast anymore!
    7. Re:Someone's gotta say it... by Antique+Geekmeister · · Score: 4, Insightful

      Occasionally, however, the head IT guy gets over-ridden by management or by available finances. I've been there, saying "we need to spend money on this" and having to make do with much less money, or even with a cut in funding. You need to document the problem in advance to cover your ass, and get it in print and saved offsite to protect yourself from that kind of mistake. I've done that, too. It helped protect me from a nasty lawsuit because I demonstrated where I had told a consulting client, in print, when the systems would start failing and the resulting legal liabilities, and gotten it signed by the company notary.

    8. Re:Someone's gotta say it... by Subm · · Score: 1


      They were using a cluster of Nintendo NES's.

      It was working fine until someone told them they could overclock them.

      Even then there were only minor glitches until someone pushed the overclocking to 6MHz.

    9. Re:Someone's gotta say it... by Anonymous Coward · · Score: 2, Informative

      I've done contract programming for Comair. They use HP-UX for the servers that handle most things except HR, which is mostly Windows. The system that went down was a COTS app that handles crew scheduling. It is a bid system that allows crew to bid for flights based on seniority w/ constraints to match FAA rules.

      Their IT director is really sharp, but he faces some real problems. First, IIRC, they only created a dedicated IT shop about three years ago. Second, their budget is small compared to the task they have to perform. Comair is an airline and airlines have been in real trouble since 9/11.

    10. Re:Someone's gotta say it... by Lally+Singh · · Score: 1

      He's the one responsible for testing it :-)

      --
      Care about electronic freedom? Consider donating to the EFF!
    11. Re:Someone's gotta say it... by rah1420 · · Score: 1

      Their IT director is really sharp, but he faces some real problems.

      Is this the first, second or third envelope? :)

      --
      Mit der Dummheit kämpfen Götter selbst vergebens.
    12. Re:Someone's gotta say it... by Marillion · · Score: 1

      I know that most of the facts in the previous post are true, except that the IT director is a SHE.

      --
      This is a boring sig
  5. It doesn't matter... by Anonymous Coward · · Score: 2, Insightful

    They're a bunch of incompetent boobs. The news keeps reporting on a "computer glitch" or a "computer malfunction". That's bullshit. This happened because some human(s) fucked up.

    1. Re:It doesn't matter... by stfvon007 · · Score: 1

      Computer glitches and malfunctions happen, and that is forgivable. What makes this so idiotic of the company is that they did not have any backups. No competent company would go around without redundancy of critical equipment as well as backups.

      --
      All misspellings and grammatical errors in the above post are intentional and part of my artistic expression.
  6. Bringing the /. effect to the weary masses. by Anonymous Coward · · Score: 2, Funny

    Linking to their home page will surely help the situation..

  7. My theory? by Ckwop · · Score: 4, Funny

    The janitor pulled out the plug for the mainframe and used it to drive his floor polisher..

    Simon.

    1. Re:My theory? by Zorilla · · Score: 1

      Nerd (Doug): We need the outlet for our rock tumbler.
      Bart & Lisa: PLUG IT IN! PLUG IT IN!
      Nerd (Doug): What, the rock tumbler or the TV?
      Bart & Lisa: THE TV! THE TV!

      (Itchy and Scratchy theme plays, Krusty comes back on)

      Krusty: WOW! They'll never let us show that one again... never in a million years!

      --

      It would be cool if it didn't suck.
    2. Re:My theory? by bcmm · · Score: 3, Funny
      BOFH excuse #38: secretary plugged hairdryer into UPS.
      That is where you got the idea for that post, right?
      --
      # cat /dev/mem | strings | grep -i llama
      Damn, my RAM is full of llamas.
    3. Re:My theory? by rlauzon · · Score: 5, Funny

      Probably not. It's an old story (quickly retold):

      Army base computer going down every night. So the grunt in charge of it stayed the night to see what was happening. When the computers went down, he heard the hum of the floor buffer.

      The janitor had plugged his floor buffer into the same power as the computers and it caused the crashes. It was quickly fixed by telling the janitor to not do that and putting locking covers on the power outlets.

      But they dreaded telling the base commander what the issue was. So they told him it was "a buffer problem."

    4. Re:My theory? by caluml · · Score: 1

      It does happen. Every night at 8pm at a web hosting company I used to work at, all the sites on a server would go down, and come back in 5 minutes. It was a cleaner using a socket.

    5. Re:My theory? by budgenator · · Score: 1

      Yes VIKI, your logic is inarguable, we'll just run windows update, and convert all the legacy systems to .NET.

      --
      Apocalypse Cancelled, Sorry, No Ticket Refunds
    6. Re:My theory? by TheWanderingHermit · · Score: 1

      When I was just starting my business, my system was developed on a Linux box and I had to put one in each client's office (until, eventually, I had a Java program that was multi-platform). I hadn't done a lot of programming (almost none in 10 years), but the one smart thing I did was make sure my program logged everything it did, with the time it did it.

      I had a system in an office 2 hours away that went down the same night every week. After checking the logs, an idea hit me and I called the client.

      ME: Do you ever work late in that office?
      CLIENT: Sometimes.
      ME: When does the cleaning staff get there?
      CLIENT: What's that got to do with it?
      ME: I'll explain in a minute.

      It turns out the cleaning staff, once a week, was coming in and turning off the switch on the powerstrip that fed my system. I had them put tape over the switch and add a warning note to it. It never happened again.

    7. Re:My theory? by jridley · · Score: 5, Funny

      A friend was sysadmin at a manufacturing plant, and the janitor kept plugging into the power conditioned sockets with a very large, power-hungry floor polisher. He was actually blowing power supplies. Every one cost several thousand dollars in service calls to replace the power supply and downtime.

      My friend put "COMPUTER USE ONLY" stickers OVER the power-conditioned sockets. The janitor ripped them off to plug in, and blew another power supply.

      My friend finally confronted the janitor, who was a really obstinate PITA. He stood there and said "Yeah, I did it, and I'm gonna keep doing it, and I don't give a damn about you or your fu*kin' computers."

      This was an automotive union shop, very difficult to get people fired.

      But, in a show of karma rarely witnessed by mortals, the VP of the division was standing within earshot but out of sight. When the janitor finished saying he didn't give a damn that he was costing the company $10,000 a week because he was too lazy to go get an extension cord, the VP walked around the corner and said hi. I don't know whether the guy ran to his car or the VP kicked his ass right over the top of it.

    8. Re:My theory? by Decimal+Dave · · Score: 1

      This happened to me once, except the janitor didn't have to unplug anything. What he did was plug the floor waxer into an open socket on the UPS that fed our editorial system. The waxer overloaded the power supply and it shut down. It took us a while to determine exactly why the UPS failed since the diagnostics said everything was ok. Then we noticed how clean the floor was...

      --

      "Leave the strategizing to those of us with planet-sized brains." -Tycho
    9. Re:My theory? by itwerx · · Score: 1

      ...logic is inarguable

      VIKI actually said, "My logic is undeniable."

    10. Re:My theory? by Anonymous Coward · · Score: 1, Funny

      We saw the same problem a few years ago, in DB2 Support. Customer kept getting a system crash around the same time of day, nothing of consequence in either the db2diag.log or the system logs. The customer couldn't recreate, and traces were getting us nothing. Eventually, as the customer kept blaming us (of course) and went critsit, we asked him to physically monitor the system terminal to see what was going on. As he was on a conference call, a janitor came in and (IIRC, got this story second-hand) plugged his vacuum into the UPS circuit.

    11. Re:My theory? by ces · · Score: 1

      We saw the same problem a few years ago, in DB2 Support. Customer kept getting a system crash around the same time of day, nothing of consequence in either the db2diag.log or the system logs. The customer couldn't recreate, and traces were getting us nothing. Eventually, as the customer kept blaming us (of course) and went critsit, we asked him to physically monitor the system terminal to see what was going on. As he was on a conference call, a janitor came in and (IIRC, got this story second-hand) plugged his vacuum into the UPS circuit.

      I've seen similar in person. At the university lab I used to spend way too much time in back in the late 80's and early 90's, I watched a cleaner plug his vacuum into the same circuit that a bunch of the lab's X terminals, Macs, and NeXTs were on. There was a 'pop' sound and about 30 X terminals, 15 Macs and 10 NeXTs all suddenly lost power. The resulting power surge killed a number of devices on the same circuit outright, along with many others that were coupled to them electrically via 10BASE2 cable. For at least a year afterward devices would die and, when pulled apart, would show evidence of arcing on the circuit board from a power surge.

      Needless to say the university managed to rewire the lab areas in that building with outlets off of the big power conditioner in back for the datacenter gear in record time. Fortunately the janitors already knew better than to plug into the special orange outlets if they wanted to keep their jobs.

      --
      Happy Fun Ball is for external use only.
    12. Re:My theory? by jridley · · Score: 1

      The point was that there was an exceedingly credible witness. There were others there, but everyone else around at the time was union and would not have backed up my friend's story. He just got lucky that there was management around that the guy didn't spot before confessing loudly and obstinately.

      Certainly his boss could have fired him. His boss being around the corner would have been just as good, but having a corporate officer there just made it that much sweeter.

    13. Re:My theory? by oh · · Score: 1
      BOFH excuse #38: secretary plugged hairdryer into UPS.
      That is where you got the idea for that post, right?

      This may sound stupid, but it happens.

      I was once helping to move the computer of the CIO's PA, and she had a little 2000 W heater under her desk. Of course, she didn't know that there was a difference between the red power points and the white ones. She was quite surprised when I explained that the red points are reliable power and are generator backed; she had just used the closest point (which was red).

      In another company the generator runs were performed over a weekend, with advance notice given to staff. So that food left in the fridge wouldn't go off, people would simply plug the fridge into reliable power rather than clean it out for the weekend.

      There are other stupid things, like running the aircon off the UPS rather than just the generator (if the generator doesn't start you will drain the UPS before the room gets too hot). You may have UPS power, but will your door entry system work? Getting power right can be hard.

      --
      Democracy isn't about no one telling you what to do. It's about everyone telling you what to do.
  8. stating the obvious by Anonymous Coward · · Score: 5, Insightful

    "Does anyone know what platform their system was based on? What kind of system just totally crashes?"

    A stab in the dark here but I'm assuming a system without foresight and redundancy?

  9. It's obvious... by bcmm · · Score: 3, Funny
    What kind of system just totally crashes?
    Oh come on...
    That doesn't need answering.
    --
    # cat /dev/mem | strings | grep -i llama
    Damn, my RAM is full of llamas.
    1. Re:It's obvious... by Zorilla · · Score: 1

      Oh come on...
      That doesn't need answering.


      Damn! We warned them to test KDE 3.3 out before upgrading!

      (Ok, so just more obnoxious than anywhere near fatal)

      --

      It would be cool if it didn't suck.
  10. Software. by eightball01 · · Score: 1

    I blame the software. Sounds like a more likely culprit than the OS, even if it is Windows.

    1. Re:Software. by canuck57 · · Score: 1

      I blame the software. Sounds like a more likely culprit than the OS, even if it is Windows.

      I don't doubt it was the software, but the real cause was the management of that software component. Was it tested, or was testing skipped to save a few bucks? Or were the testers telling management what they wanted to hear rather than the truth? Or perhaps they needed bigger computers or smarter software developers, but went cheap.

      Don't know Comair too much, but it is safe to say they should be looking at the management practices that led to this if they really want to fix the problem.

      But it is easy to blame the software, as it is not capable of lying or denial, and it usually deals poorly with irrational behavior and input.

  11. It was running on SCO Unix... by bani · · Score: 4, Funny

    They obviously didn't take McBride's "license or we will have you shut down" threats seriously enough.

  12. blaming the system can backfire by ext42fs · · Score: 5, Insightful

    It's not the OS, it's the people behind it who are to blame. Yes, stupidity and MSW often go together, but in a few years one will probably occasionally see a massive Linux outage due to... similarly stupid people.

  13. Scalability and Twelve Step TrustABLE IT by NZheretic · · Score: 2, Interesting

    Sounds like Comair could have used a little virtualized scalability and third party audited builds.
    See Twelve Step TrustABLE IT : VLSBs in VDNZs From TBAs.
    and also The ActiveGrid(TM) Grid Application Server and Grid Computing in general.

    1. Re:Scalability and Twelve Step TrustABLE IT by hughk · · Score: 4, Insightful
      No, it's more difficult in the airline industry. The system by default tries to keep as many planes in the air earning money as possible. If you have an outage which disrupts this choreography, there is a tremendous knock-on effect as passengers/urgent cargo must be rebooked.

      I have seen the major hub for an airline closed because of snow for just a couple of hours in the early morning, but the resulting chaos of rescheduling/rebooking caused the reservations system to crash after just a few minutes of uptime. The same would keep happening after restarts.

      It is normal to test a system up to several times normal load, but they were seeing peaks at over 100x. The old, 3270 emulator based system would have slowly got through it but the newer system died.

      --
      See my journal, I write things there
    2. Re:Scalability and Twelve Step TrustABLE IT by gl4ss · · Score: 1

      they aren't building them for normal use so why didn't they test it under the chaos that comes when there is downage?

      it's not an excuse to miss research on what the system could be hit with.

      --
      world was created 5 seconds before this post as it is.
    3. Re:Scalability and Twelve Step TrustABLE IT by Rebar · · Score: 1

      The old, 3270 emulator based system would have slowly got through it but the newer system died.

      Wow, I didn't know that 3270 emulators were even programmable, and surely wouldn't try to base an airline reservation system on them. Seems far better to use something like a mainframe than a grid of terminal emulators, although there must be a few distributed mips there...

    4. Re:Scalability and Twelve Step TrustABLE IT by hughk · · Score: 1

      The old system used keyboard macros but many experienced users just typed the commands completely themselves. The backend ran on heavy metal (Unisys) and that did most of the work. It seems a waste of the PC but the system functioned well.

      --
      See my journal, I write things there
  14. This is getting a little too common for them. by jhobbs · · Score: 4, Interesting

    Back on May 1st of this year Delta's internal traffic monitoring system grounded them worldwide when it was hit by a worm (forget which one). Yours truly was flying that day. I spent 7 hours on a runway in Cleveland. (Talk about adding insult to injury.) Comair is a regional carrier of Delta's. I wonder who handles Delta's IT needs?

    1. Re:This is getting a little too common for them. by sacremon · · Score: 1

      When I was working there as internal tech support about six years ago, there was a dedicated division, named Delta Technology, that did all the development work. That included not only the software that ran on the mainframes and Unix boxen but also the hardware like the ticket scanners that they've implemented in that time. Given how well they knew how to deal with their desktop machines (Win2K Pro boxes), the vast majority of the software developers didn't know squat about Windows. Of course, that doesn't stop them from developing software for it...

      --
      If you can't beat them, embrace and extend them.
    2. Re:This is getting a little too common for them. by Deadstick · · Score: 1
      I spent 7 hours on a runway

      No you didn't...but you may have spent 7 hours on a ramp or taxiway.

      rj

    3. Re:This is getting a little too common for them. by jhobbs · · Score: 1
      No you didn't...but you may have spent 7 hours on a ramp or taxiway.

      Allow me to rephrase. . . I spent 7 whole hours in a non-moving plane that was parked on a concrete surface and filled with just under 200 really pissy people screaming at flight attendants.

  15. Travel tip by Anonymous Coward · · Score: 1, Interesting

    FAA's Rule 240 says that if your flight gets canceled for any reason other than weather, the airline has to get you on the next available flight to your destination, regardless of carrier. So if you're stuck in an airport bar reading this article go talk to your airline!

    1. Re:Travel tip by xlation · · Score: 5, Informative

      From: http://www.fly.faa.gov/FAQ/faq.html

      The term "Rule 240" refers to a rule that existed before airline deregulation. There is no longer an actual Rule 240. The term, as it is now used, refers to each airlines "conditions of carriage" policy. You would need to contact the airlines to obtain this.

    2. Re:Travel tip by reallocate · · Score: 1

      True, but...It's Christmas, everyone is booked up, and thousands of flights were already cancelled due to weather.

      --
      -- Slashdot: When Public Access TV Says "No"
  16. Re:The system runs Linux by bcmm · · Score: 1

    No, it means they didn't make a big enough swap partition.

    --
    # cat /dev/mem | strings | grep -i llama
    Damn, my RAM is full of llamas.
  17. Failure due to inability/unwillingness to test/QA by Anonymous Coward · · Score: 1, Insightful

    It is not easy to do real world extreme situation testing on large systems, but I wish people would at least try.

    It is fun to say Windows, blah, blah but given the number of buffer overflow problems found in programs/packages on all platforms, I would say that many programmers of every stripe severely underestimate the real world range/type/size of data their programs will encounter when in non-typical situations.

    To whoever wrote/maintains/admins this software: Global "climate change" means weather "events" will be more frequent and more extreme in coming years, and another terrorist event on US soil may cause days of air travel disruption. Please "refactor" your shit with those things in mind. You're on the East Coast and Midwest, for god's sake; you're going to get storms that will shut down regions for days at a time. What happens when the FAA finds some issue with an aircraft part or maint. procedure and grounds your whole damn fleet to have it fixed?

  18. BUG! by Piranhaa · · Score: 1

    Now, I could be wrong, but I heard from a lot of people that it is the year 2005 bug... How did we not see this coming?

  19. Platform and software? by MadFarmAnimalz · · Score: 1

    This and this might offer a clue.

    --
    Blearf. Blearf, I say.
    1. Re:Platform and software? by Kalak · · Score: 1

      It doesn't look like the staff scheduling software, unless customers need to use a web interface to schedule their own flight attendants and pilots.

      --
      I am, and always will be, an idiot. Karma: Coma (mostly effected by .hack)
  20. Scalability on demand and third party servers by NZheretic · · Score: 1
    First of all, it all depends on where the bottlenecks are in the processing of the transactions. That is dictated by the combination of the hardware, the network bandwidth and the overall design of the existing software system. The worst cases are bottlenecks in the design of the software, where all transactions have to pass some/all data through a single process/processor. If the problem is just hardware scalability or reliability, then grid/cluster computing can help.

    If you choose a standardized virtualized platform then you need not be limited to using in-house clusters. Check out the ActiveGrid(TM) info page; it includes support for third-party distributed hosting providers such as Akamai. In the future, other providers will offer massively scalable systems such as Cray's Red Storm Cluster. All running Linux.

  21. Where did the system fail under stress? by NZheretic · · Score: 1

    Was it the design of the software or the limitations of the hardware? See my post on Scalability on demand and third party servers.

    1. Re:Where did the system fail under stress? by hughk · · Score: 1

      I do not know the nature of the Comair system, but software design is the major issue with systems that degrade catastrophically rather than gradually. Please remember that major airlines used to run with much slower hardware up to the eighties (indeed, much less processing power than my PDA), however they did have very high I/O throughput and intelligent frontends.
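
      As a rough illustration of gradual versus catastrophic degradation, here is a small Python sketch of a bounded work queue that refuses new requests when full instead of letting the backlog grow until something falls over. Everything in it (`BoundedWorkQueue`, `simulate`, the numbers) is hypothetical and has nothing to do with how the Comair system was actually built.

```python
from collections import deque


class BoundedWorkQueue:
    """A queue that sheds load once max_pending requests are waiting."""

    def __init__(self, max_pending):
        self.max_pending = max_pending
        self.pending = deque()
        self.rejected = 0

    def submit(self, request):
        # Refuse the request outright rather than queueing without limit.
        if len(self.pending) >= self.max_pending:
            self.rejected += 1
            return False
        self.pending.append(request)
        return True

    def process(self, budget):
        # Handle at most `budget` requests per tick, oldest first.
        done = []
        while self.pending and len(done) < budget:
            done.append(self.pending.popleft())
        return done


def simulate(arrivals_per_tick, capacity_per_tick, max_pending, ticks):
    """Return (completed, rejected, still_pending) after `ticks` rounds."""
    queue = BoundedWorkQueue(max_pending)
    completed = 0
    for tick in range(ticks):
        for i in range(arrivals_per_tick):
            queue.submit((tick, i))
        completed += len(queue.process(capacity_per_tick))
    return completed, queue.rejected, len(queue.pending)
```

      With arrivals at 10x capacity, `simulate(100, 10, 50, 5)` still completes work at the full rate of 10 per tick and keeps the backlog capped; the excess is turned away explicitly, which is ugly but survivable.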

      --
      See my journal, I write things there
  22. But management saved 13.7% by hiring H1-Visas by Soyobob · · Score: 2, Interesting

    Too bad the airline will go bust because of this. But then all airlines are losing billions except for Southwest.

    1. Re:But management saved 13.7% by hiring H1-Visas by canuck57 · · Score: 1
      I noticed in your subject:

      But management saved 13.7% by hiring H1-Visas

      So maybe upper management will outsource management, as they can no longer blame the American worker. Poor I/T and computer management practices are not problems of the workers; they are a problem of management, and H1 visas have really nothing to do with it. But poorly skilled American workers need to hone their skills even if company training programs don't exist, as many are poorly skilled for their job titles. I actually met a Senior Web Developer who didn't know which part of a URL was the host and which the domain, and they were American. You can go overseas and get this cheaper.

  23. Crew assignment is a hard problem by rsilva · · Score: 5, Informative

    'There was a cumulative effect with the canceled flights and trying to get crew assigned that caused the system to be overwhelmed.'

    I am only trying to make sense of the comment above from the official statement.

    Crew assignment is a hard problem; it is usually formulated as a MILP (Mixed Integer Linear Program).

    Such problems may be very hard to solve in reasonable time. Maybe (I'm shooting in the dark here) the first delays made the crew assignment problems grow too large to be solved in reasonable time. This would generate a snowball effect, as the assignment problems would keep on growing, making the system "crash".

    We may never know what really happened but this would be a nice example for my classes :-)

    1. Re:Crew assignment is a hard problem by timeOday · · Score: 1

      You can't blame something like this on algorithmic complexity. Finding an optimal solution may require an impractical amount of time, but a workable solution within a few percent of optimal is normally much easier. In the long run, a few percent may mean the difference between life and death for an airline, but you must retain the ability to cope with short-term emergencies, even if it means a lot of scrambling around and some wasted money in the short term. Most businesses can't afford many complete meltdowns like this Comair scheduling disaster.

    2. Re:Crew assignment is a hard problem by sporty · · Score: 1
      Not particularly. You can also use graph theory for this. Once you have a means to convert the data into a graph, and a means to convert something like a vertex colouring back into people assignments, it's as easy as your graph colouring algorithm. You can do it either by brute force or via a heuristic, in reasonable time, with a good enough computer. Even in a huge, huge graph.


      Then it's a matter of feeding the data at a rate so you don't backlog faster than the amount of data you get in.


      If you are wanting an explanation of turning it into a graph, I can spend an hour or so turning the various constraints into one. Just need to know/think about all the constraints, such as, making sure enough staff are moving into the right places to go to a "next flight".

      --

      -
      ping -f 255.255.255.255 # if only

    3. Re:Crew assignment is a hard problem by sporty · · Score: 2

      Gah, I said vertex colouring when I meant network flow. My bad. :)
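
      For the curious, a toy version of the network flow idea can be sketched in Python as bipartite matching with augmenting paths, which is a special case of max flow. The names (`assign_crews`, the `eligible` map) are invented for illustration, and a real crew scheduler would also have to handle duty-time limits, positioning, seniority bids and so on, none of which appear here.

```python
def assign_crews(eligible):
    """Match crews to flights, one crew per flight, as many as possible.

    `eligible` maps each crew to the list of flights it may legally fly.
    Returns a dict mapping flight -> crew.
    """
    flight_to_crew = {}

    def try_assign(crew, seen):
        # Look for an augmenting path starting from this crew.
        for flight in eligible[crew]:
            if flight in seen:
                continue
            seen.add(flight)
            holder = flight_to_crew.get(flight)
            # Take the flight if it is free, or if its current crew
            # can be moved onto some other flight.
            if holder is None or try_assign(holder, seen):
                flight_to_crew[flight] = crew
                return True
        return False

    for crew in eligible:
        try_assign(crew, set())
    return flight_to_crew


example = {
    "crew_a": ["F1", "F2"],
    "crew_b": ["F1"],
    "crew_c": ["F2", "F3"],
}
schedule = assign_crews(example)
```

      In the example, crew_a first grabs F1 but gets bumped to F2 so that crew_b, who can only fly F1, is not left without a flight. That bumping is the augmenting path at work.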

      --

      -
      ping -f 255.255.255.255 # if only

    4. Re:Crew assignment is a hard problem by coyote-san · · Score: 4, Interesting

      It's far harder than that alone since you also have to get the aircraft back to the right city (many are in the wrong city due to airport shutdowns due to the weather). Obviously you want to optimize the number of passengers carried along for those flights, but at the same time you'll be "burning" allowed worktime for the crew.

      Even worse the crew and aircraft are independent variables. Obviously you need a crew to operate a flight, but the crew may end up in the "wrong" city for the usual schedule. It may be better to leave a plane on the ground and fly its crew "deadhead" to the "right" city than to have them fly a load of passengers to the "wrong" city.

      There are reasonably efficient algorithms to solve these problems, but we spent most of my entire second-semester graduate-level algorithms class studying them (network flows). The algorithms most developers would come up with (including me after a decade of experience and a graduate-level algorithms class) are extremely inefficient and scale horribly.

      The bottom line is that it's easy to imagine a system that has no problem with perturbations from the regular schedule but is totally overwhelmed when starting from scratch. I hope the bean counter who saved the company a few bucks by insisting on far more modest hardware gets canned for his costly lack of foresight, but we all know that IT will catch the heat.

      --
      For every complex problem there is an answer that is clear, simple, and wrong. -- H L Mencken
    5. Re:Crew assigment is a hard problem by forkazoo · · Score: 1

      On one of the old PBS math programs, possibly Sol Garfunkel's show, they did a piece on the crew assignment problem. They tried to feed a simplified version of the problem into a massive mainframe, which ran out of memory, and they were never able to get an answer out of it. Granted, this was probably in the '80s, when a massive system had multiple megawords of core, but it impressed me nonetheless!

    6. Re:Crew assigment is a hard problem by Tablizer · · Score: 1

      Software tends to do well only under conditions it has been road-tested under. I suspect one section/component/application got overwhelmed with requests that it was not designed to handle and dumped the problem off on another application that similarly was not designed to handle them. Maybe each section got stuck in retry loops and caused a feedback cycle of the problems. And/or, perhaps a deadlock where component A was waiting on component B, and B was waiting on C, which was waiting on A.
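
      The A-waits-on-B-waits-on-C-waits-on-A case is a cycle in a "wait-for" graph, and it can be detected with a depth-first search. A rough Python sketch follows; the `find_cycle` name and the component labels are made up for illustration, and nothing here is known about how Comair's system was actually structured.

```python
def find_cycle(waits_for):
    """Return a list of components forming a wait cycle, or None.

    waits_for maps each component to the components it is blocked on.
    """
    WHITE, GREY, BLACK = 0, 1, 2   # unvisited, on current path, finished
    state = {}
    path = []

    def visit(node):
        state[node] = GREY
        path.append(node)
        for nxt in waits_for.get(node, ()):
            colour = state.get(nxt, WHITE)
            if colour == GREY:
                # nxt is already on the current path: that is a cycle.
                return path[path.index(nxt):] + [nxt]
            if colour == WHITE:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        state[node] = BLACK
        return None

    for node in list(waits_for):
        if state.get(node, WHITE) == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


deadlock = find_cycle({"A": ["B"], "B": ["C"], "C": ["A"]})
```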

    7. Re:Crew assigment is a hard problem by coyote-san · · Score: 1

      They're coupled, but independent in the sense that you have to solve the problem of getting the planes into place _and_ the problem of getting the people into place. The crew that flew a plane in doesn't have to be the crew that flies it out. You can deadhead crew but you can't strap a 737 onto the back of an A3.

      --
      For every complex problem there is an answer that is clear, simple, and wrong. -- H L Mencken
    8. Re:Crew assigment is a hard problem by coyote-san · · Score: 1
      I guess... that they are not glamorous enough to tickle the imagination of the best and brightest.


      No, remarkable progress has been made. The problem is that the appropriate use is rare enough that it's not worth the time to teach the algorithms to most students - the time would be better spent on other algorithms.


      But the results can be dramatic in those one-off situations that can use the algorithm. My professor mentioned rewriting the med student-training hospital matching program and the intern-specialty scheduling programs using network flow algorithms. Overnight runs now take seconds.

      --
      For every complex problem there is an answer that is clear, simple, and wrong. -- H L Mencken
  24. Leasing third party servers for stress testing by NZheretic · · Score: 1

    One more advantage of a virtualised standard platform would be the ability to do development and stress testing on third party servers. Full-on stress testing is something that most organizations cannot afford to do on the currently deployed hardware.

    1. Re:Leasing third party servers for stress testing by hughk · · Score: 1
      The problem is that the daily schedule of an airline is extremely complicated. One issue is that many airlines have downsized their older and more experienced staff so they lack the ability to run the airline without their extensive IT systems. Even with the knowledge, you still need to be able to reschedule slots with the airports as well as new flight plans (also usually filed by computer).

      It is then an issue as to whether you really want to design IT systems for every scenario. It costs a *lot* of money to do this and is usually only warranted in a safety critical domain (i.e., ATC). Comair's solution was to scratch the flights and thus ensure that aircraft were at their start positions for the next day.

      --
      See my journal, I write things there
    2. Re:Leasing third party servers for stress testing by budgenator · · Score: 1

      Somehow a blizzard during the Christmas or Thanksgiving rush would seem like a likely event rather than a bizarre event for an airline.
      The real impact is,
      1. The customers will see this not as an unusual event but as "they screwed up my annual vacation", and they will remember for life. How are they going to sell to a customer after they screwed up the last Christmas a loved relative was alive?
      2. When the airlines beg for a bail-out, the customers will remember that when they had a chance to make some money they blew it.

      --
      Apocalypse Cancelled, Sorry, No Ticket Refunds
    3. Re:Leasing third party servers for stress testing by hughk · · Score: 1

      As mentioned, a lot of airlines have been able to cope with this kind of thing in the past. To shut down for the day seems to be rather an exceptional response, and I hope the market remembers them.

      --
      See my journal, I write things there
  25. Can anyone say... by carlmenezes · · Score: 2, Funny

    ...slashdotted reservations?

    --
    Find a job you like and you will never work a day in your life.
    1. Re:Can anyone say... by ScrewMaster · · Score: 1

      Since we're talking about airline systems, I'd say "crashdotted" would be a better word.

      --
      The higher the technology, the sharper that two-edged sword.
  26. Re:The system runs Linux by carlmenezes · · Score: 1

    Yeah, but the real cause for the crash was an Access backend. So there! :)

    --
    Find a job you like and you will never work a day in your life.
  27. 30,000? by __aafkqj3628 · · Score: 4, Funny

    30,000 passengers? Getting dangerously close to an integer overflow there.

    1. Re:30,000? by La+Gris · · Score: 1

      Sure, now the counter says they put -32768 passengers in the void.

      The overall cost of all this will be less than #NaN and by chance, the security risk has been rated as low as #DIV0 %.

      --
      Léa Gris
    2. Re:30,000? by sporty · · Score: 1

      This is the 21st century. Integers are 64 bit now. 30k is close to a short, a signed one at that. Such waste. What are schools teaching you these days. Back in my day, an integer was 1 bit! and you liked it!

      --

      -
      ping -f 255.255.255.255 # if only

    3. Re:30,000? by Euler · · Score: 1

      Wouldn't surprise me at all if there were still 16 bit integers in an old creaky database system.

    4. Re:30,000? by forkazoo · · Score: 1

      Don't worry, they are running on an 18 bit minicomputer. They can handle 256 thousand passengers before they get angry at the programmer! Good solid PDP-7, or something. :)

    5. Re:30,000? by edp · · Score: 5, Funny
      "30,000 passengers? Getting dangerously close to an integer overflow there."

      That is not a bug but an accurate model of reality. When you strand 32,768 passengers, they will turn negative.
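
      For anyone who wants to see the joke run, here is a small Python sketch of what happens when a count is stored in a signed 16-bit field. The `to_int16` helper is hypothetical, and whether the Comair system used 16-bit fields anywhere is pure speculation in this thread.

```python
def to_int16(n):
    """Wrap an integer the way a signed 16-bit (two's complement) field would."""
    return ((n + 0x8000) & 0xFFFF) - 0x8000


stranded = to_int16(30000)        # still fits
last_good = to_int16(32767)       # largest value a signed 16-bit field holds
one_more = to_int16(32767 + 1)    # wraps around to the most negative value
```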

    6. Re:30,000? by T-Ranger · · Score: 1

      If it is an IBM mainframe app, or an app for a different mainframe that followed IBM's lead, or an app for a PC written by a mainframe programmer, it would likely be 31 bits. Mainframes have been 32 bit systems since the late '60s.

    7. Re:30,000? by handy_vandal · · Score: 1

      When you strand 32,768 passengers, they will turn negative.

      Made me laugh!

      -kgj

      --
      -kgj
  28. System Tracked Crew Location, Not Reservations by reallocate · · Score: 5, Informative

    Of course, a techie didn't write the PR release. Who in their right mind would let a techie anywhere near a PR release?

    BTW, Comair, a Delta feeder headquartered outside Cincinnati, says the system that crashed was used to monitor crew locations and track working hours to ensure no one went over the legal maximum. Comair says the system crashed as a result of massive crew rescheduling following a record snow in their service area on Wednesday. There is no backup.
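
    For a sense of what "track crew locations and working hours" means at its very simplest, here is a toy Python sketch. The function name, the data layout, and the 8.0-hour limit are all placeholders I made up; the real legal limits and Comair's actual data model are not described anywhere in the story.

```python
def eligible_crews(crews, flight_city, flight_hours, max_hours):
    """Return the crews that are in the right city and would stay within
    max_hours after flying flight_hours more.

    crews maps a crew name to (current_city, hours_worked_so_far).
    max_hours is a hypothetical limit, not the actual regulation.
    """
    return sorted(
        name
        for name, (city, hours) in crews.items()
        if city == flight_city and hours + flight_hours <= max_hours
    )


crews = {
    "crew1": ("CVG", 6.0),   # right city, but too many hours already
    "crew2": ("CVG", 2.5),   # right city, hours available
    "crew3": ("ATL", 1.0),   # plenty of hours, wrong city
}
available = eligible_crews(crews, "CVG", flight_hours=3.0, max_hours=8.0)
```

    After a storm, most of the `(city, hours)` pairs are wrong for the published schedule at the same time, which is why the rescheduling load spikes.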

    --
    -- Slashdot: When Public Access TV Says "No"
    1. Re:System Tracked Crew Location, Not Reservations by Impy+the+Impiuos+Imp · · Score: 2, Funny

      Gosh, looks lke idiot programmer assumed a 256 length crew relocation array was big enuf fer anybuddy!

      --
      (-1: Post disagrees with my already-settled worldview) is not a valid mod option.
    2. Re:System Tracked Crew Location, Not Reservations by Pharmboy · · Score: 4, Funny

      You know, I have my OWN reservations about flying on an airline when they have no backups and can't keep their computers from crashing. What's to keep their planes in the air?

      The last thing I want to hear at 30k feet is that my current flight has been cancelled...

      --
      Tequila: It's not just for breakfast anymore!
    3. Re:System Tracked Crew Location, Not Reservations by shyster · · Score: 3, Funny
      You know, I have my OWN reservations about flying on an airline when they have no backups and can't keep their computers from crashing. What's to keep their planes in the air?


      The Bernoulli Principle. And I don't think computers crashing are going to affect it. This isn't the Matrix, after all.

    4. Re:System Tracked Crew Location, Not Reservations by Pharmboy · · Score: 2, Interesting

      I think you are overthinking it. My point is simply that a company that can not be trusted to keep their computers fully functional, can not be trusted to keep their aircraft fully functional. This is based on the premise that it is easier to keep the computers running than the aircraft, which I can easily assume, based upon my own experience.

      I also don't eat at diners where the help isn't properly groomed. Same principal: if you can't take of simple stuff, you probably can't take of something more important and/or complex.

      --
      Tequila: It's not just for breakfast anymore!
    5. Re:System Tracked Crew Location, Not Reservations by logicnazi · · Score: 2, Insightful

      Do you also refuse to eat at a relative's house if their computer is virus laden or crash prone? After all, if they can't be trusted to keep their computer working, why should you trust them to make safe, sanitary food?

      Perhaps if computer usage/programming had evolved to the level of personal hygiene, namely that routine effort anyone could make would prevent computer crashes, your point would be convincing. However, in practice we realize that even the best professional programmers make errors, even buffer overflows (and we don't even really know it was an 'error'; perhaps the program exited gracefully after realizing the demands exceeded its capacity and simply hadn't been programmed to handle a situation of this size). So unlike your hygiene example, this hardly impeaches the basic organizational discipline/competency.

      Had this really been a computer engaged in flight-critical tasks I would feel quite differently. A programming error or even an unanticipated shutdown is not acceptable in systems necessary for real-time flight control. Since this was instead a system to reassign crew and guarantee compliance with federal labour law, I feel much differently. In fact, if this system had been subject to a rigorous source code review by an outside team to check for bugs, or linked into some sort of failover system with differently programmed systems accomplishing the same task, I would worry that their priorities were misplaced.

      Arguably an airline, given their limited budgets, which puts too much redundancy into their non-critical systems has an incorrect set of priorities.

      --

      If you liked this thought maybe you would find my blog nice too:

    6. Re:System Tracked Crew Location, Not Reservations by Mr.+Slippery · · Score: 1
      What's to keep their planes in the air?

      The Bernoulli Principle.

      Not so much. Read the next page at the site you linked.

      --
      Tom Swiss | the infamous tms | my blog
      You cannot wash away blood with blood
    7. Re:System Tracked Crew Location, Not Reservations by benjamindees · · Score: 2, Insightful

      Perhaps there's a better principle you could apply, namely that anyone, be it company or person, only has a finite amount of resources (time, money) at their disposal, and choose to dedicate them to specific tasks.

      Perhaps unkempt diners are more concerned with the quality of their food than the ambiance. Perhaps the IT guy with twelve certifications knows more about getting certifications than about working on computers. Perhaps the vendor that sends you a Christmas card every year is pulling employees off of doing real work in order to make it look to you like they have their shit together. Perhaps the antisocial guy with the unkempt hair and the socks-with-sandals is more concerned with proving his latest theory than with what you think of him.

      Perhaps appearances can be deceiving.

      --
      "I assumed blithely that there were no elves out there in the darkness"
    8. Re:System Tracked Crew Location, Not Reservations by jonbrewer · · Score: 1

      My point is simply that a company that can not be trusted to keep their computers fully functional, can not be trusted to keep their aircraft fully functional.

      And there you are very wrong. There are a million controls on maintenance of aircraft. Every part of a plane has a history and a pile of paperwork to go with it.

      If such documentation and controls existed for every component of every software package used by the airlines:

      1. New Airline Information Systems would be built by two or three programming shops in the world.
      2. They'd be massively expensive.
      3. They would have lifecycles of 30+ years.
      4. The government would audit IT organizations working on the systems and would investigate every system crash.

    9. Re:System Tracked Crew Location, Not Reservations by Anonymous Coward · · Score: 1, Insightful
      I also don't eat at diners where the help isn't properly groomed. Same principal: if you can't take of simple stuff, you probably can't take of something more important and/or complex.
      Principle, not principal. You also left out the word "care", not once but twice. If you can't take care of simple stuff like grammar and spelling, you probably can't take care of something more important and/or complex (like programming, maybe?).
    10. Re:System Tracked Crew Location, Not Reservations by Shark · · Score: 1

      The Bernoulli Principle. And I don't think computers crashing are going to affect it. This isn't the Matrix, after all.

      Pfff, we all know which pill *you* picked...

      --
      Mind the frickin' laser...
  29. Y2K by MicklePickle · · Score: 1

    With the number of Y2K 'fixes' I have seen around (and some of them very dubious), I wouldn't be at all surprised if it was a Y2K problem. Looks like someone didn't have all their test cases written down properly and/or didn't test properly and/or tried to 'fix' the problem.

    --
    -- main(s){printf(s="main(s){printf(s=%c%s%c,34,s,34) ;}",34,s,34);} $p='$p=%c%s%
  30. I don't know about their internal system... by Glowing+Fish · · Score: 2, Interesting

    As a preliminary finding that may or may not give us a clue as to what the internal system was running, Netcraft reports that www.comair.com is running Apache on HP-UX.
    So don't assume that the internal system was Windows just yet. Then again, don't assume that it wasn't.

    --
    Hopefully I didn't put any [] around my words.
    1. Re:I don't know about their internal system... by Zachary+Kessin · · Score: 1

      Also don't assume it was the OS that died; it is very possible that the computers were up, just not responding to the client software or otherwise screwed up. I would guess that it was their own custom software that died.

      --
      Erlang Developer and podcaster
    2. Re:I don't know about their internal system... by angrykeyboarder · · Score: 1

      Their web server is unrelated to their Crew Scheduling system or reservations systems.

      --
      Scott

      ©20014 angrykeyboarder & Elmer Fudd. All Wights Wesewved
  31. whole story? by confusion · · Score: 4, Informative
    This Comair story is all I'm seeing getting press. I think it's a lot bigger than that.
    My sister flew Delta on Dec 23rd from Detroit to Atlanta. Plane was 2 hours late, but no big thing. Waited 5 hours for her luggage, with no dice. By the time we got in line for luggage services, there were at least 600 people in the line already.
    Talking to other passengers from 10+ different flights from different cities, no one got their luggage that night. Apparently, it wasn't just Atlanta - the local news in Tampa and Detroit had segments on how the airports had taken over parts of taxiways to sort through seas of bags that didn't make it on to planes.
    It's been 2 days, and Delta has no idea where the stuff from that flight is. I'm guessing it isn't just Comair that got hit by some computer problems.

    Jerry
    http://www.syslog.org/

    1. Re:whole story? by garcia · · Score: 4, Interesting

      Personally I think that Delta was being a bunch of assholes about the whole thing...

      Seeing that my 7pm flight was cancelled for the 23rd I spent 20 minutes redialing from two different phones until I got past a busy signal. After 50 minutes on hold I got through to a representative who scheduled me for the 24th's 7pm flight. I spent the rest of the time rearranging time off from work, the dog's time to be spent at the kennel, car rental stuff, and phone calls to my fiance who would meet me at the airport, and to family we were supposed to see.

      At 7am on the 24th the flight was already cancelled. At this point I didn't give a shit anymore. Delta was saying I would have to use my tickets by the 15th of January because "it wasn't their fault". I knew it wasn't the fucking weather down there as plenty of people were saying it was fine in the area. So I call again and get through after redialing for 65 minutes. I get through to a rep after 50 more minutes in queue. She tells me she can't do anything but schedule me for the 25th at 7pm so I'd have to get in queue for the reissue desk. Fine...

      After 2 hours and 11 minutes in queue (with no hold music or sound for that matter) someone calls on my home line at 5:15pm from Delta to tell me my 7pm flight is cancelled (cute, I would have been at the airport by then). I tell that rep to get me into the reissue queue as I've been on hold with them for 2 hours.

      I finally get through and tell them I want my money back. They tell me I need to speak to customer service. After waiting on hold (with the reissue rep) for 25 minutes the reissue rep offers to refund my money.

      We can't fly out for New Year's as the kennel is booked and I'd feel horrible asking someone to watch our dog in our house for more than 1 night. So basically we have to wait quite some time to fly down there again.

      It was a little bit of a pain in the ass to wait on hold and be jerked around for two days for something that was their fault when they continually claimed wasn't. BAD WAY TO TRY AND PLEASE A CUSTOMER.

      Thanks for ruining our Christmas.

    2. Re:whole story? by wwahammy · · Score: 1

      So they acted like a regular airline?

    3. Re:whole story? by realdpk · · Score: 1

      It's amazing to me how much energy has been spent trying to convince us that airport security is better now, and how not having nail clippers and stuff on planes is making us safe, and yet they still can't get something as basic as "put the bag on the plane with the passenger" done right.

    4. Re:whole story? by HeghmoH · · Score: 2, Informative

      These days, "we hate the customer" seems to be the motto of all of the big airlines.

      This summer, I was flying from Paris to Ft. Lauderdale via Philadelphia on USAir. The Paris->Philadelphia leg was handled by the same plane that does USAir's Philadelphia->Paris flight that same day. The incoming flight was about four hours late, so of course our outgoing flight was also four hours late. Sucks, but what can you do.

      So we get into Philadelphia at about 9PM instead of 4:30PM and everybody rushes to get any last-minute connections they can. I was already stuffed and had to wait for the next day's flight, but a lot of people had chances to make late flights to their destination. We all get off the plane, go through customs, get to USAir's rebooking desk.

      Two people are working this desk.

      An airplane with three hundred people comes in four hours late. USAir knew that this flight would be late almost a full day in advance, since it was a cascade effect from the other flight's delay. And yet, none of USAir's genius managers had the presence of mind to call in a few extra employees that night to speed things along.

      A lot of people missed connections they otherwise could have made, because they had to wait in line for an hour to get new tickets.

      Since USAir obviously hates their customers exceptionally strongly, I won't be flying with them again.

      This isn't really an isolated incident, either, just the most recent bad one. The entire industry has a serious problem with this, and I have a feeling it's going to take a couple of high-profile bankruptcies before they get a grip on it.

      --
      Mod down posts with a "Free Mac Mini/iPod" sig, they're spam!
    5. Re:whole story? by the+pickle · · Score: 1

      Apparently the United and US Airways bankruptcies weren't high-profile enough...

      p

    6. Re:whole story? by winwar · · Score: 2, Informative

      "I have little sympathy for people that whine about holiday travel when they didn't plan for things like this."

      Okay troll, I'll bite. Maybe he had a limited amount of time off. Maybe that was the most convenient time to fly. Whatever. It doesn't matter.

      He shouldn't have to plan for weather, high traffic, and/or computer screwups. That is the airline's JOB. You know, the people who took the money and agreed to get him from point A to point B. Bad weather in the winter? From the massive effects it has on the airlines, you'd think this is the first time they have ever experienced it.... Running a computer system they KNOW will fail under load?!? Other airlines running out of deicing fluid?!? Excuse me, it IS THE AIRLINE'S FAULT. When your system is such that one winter storm will screw it up, and it happens repeatedly, and you do nothing to change it, it is broken and your fault.

      But they don't care. And that was the grandparent's point. Admit it is your fault, refund his money, and let him make other plans.

      People accept that there will be problems-lying to them just pisses them off and guarantees that they WON'T believe you if it ever really isn't your fault. Accepting blame tends to build respect.

    7. Re:whole story? by babbage · · Score: 1

      According to news stories I heard this weekend, there were two problems going on: the meltdown at Comair/Delta due to weather (and as known now, software), and a work stoppage among baggage handlers trying to negotiate a new contract. It sounds like you may be dealing with fallout from the baggage issue, which was purely a human matter, not a software one.

  32. Re:Slashdot this by Zorilla · · Score: 1

    Don't smirk; it'll be the next topic on Ask Slashdot.

    --

    It would be cool if it didn't suck.
  33. Not surprising, coming from Comair by Anonymous Coward · · Score: 5, Interesting

    Some of my co-workers are on contract developing Java software for Comair.

    Comair are very tied to particular systems, and don't want to change even when the developers have pointed out problems. Case in point: a J2EE-based employee portal, based on Novell exteNd (Novell Portal Service) and a one-way HPUX server. NPS runs in Tomcat, which is servicing requests (via mod_jk) through Apache. No other application shares the machine, and Comair will only consider vertical scaling, not horizontal.

    The application creates at least two threads per connection, and when the thread count goes beyond a relatively low threshold (between 300 and 400), Tomcat deadlocks. It's not because they're running out of space in the allocated JVM heap, and they've tuned mod_jk to allow for heavy load. The current solution is to restart Tomcat when the system locks up.

    Novell's support has been less than stellar, so the Java contracting group was informally asked what to do. We had all kinds of useful suggestions, from dumping NPS for another portal implementation, to creating custom thread-pools, to using JDK 1.4 new I/O and a minimally-threaded design, and even using round-robin DNS and a group of independent portal servers to share the load. Comair are wedded to particular minimal cost solutions, however, and it shows.
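
    To illustrate the "custom thread-pools" suggestion: the point is to cap the number of worker threads and queue the rest of the work, instead of spawning two threads per connection until something falls over. The portal itself is Java; the sketch below is Python only to keep it short, and `serve_all`, `handler`, and the limit of 8 workers are invented for the example.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def serve_all(requests, handler, max_workers=8):
    """Handle every request using at most max_workers threads.

    Requests beyond the cap wait in the executor's queue rather than
    each getting a fresh thread.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(handler, requests))


def handler(request_id):
    # Stand-in for real work; records which pool thread served the request.
    return (request_id, threading.current_thread().name)


results = serve_all(range(1000), handler, max_workers=8)
threads_used = {name for _, name in results}
```

    A thousand requests go through, but the set of thread names never grows past the cap, which keeps the server on the safe side of whatever threshold makes it lock up.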

    At least when the portal crashes, it only impacts employees and not passengers.

  34. Read the code, Luke (episode II) by chiph · · Score: 4, Funny

    Somewhere deep in the code is a comment that says:

    // I don't need to check for this condition because
    // my asshole manager Steve Johnson says it'll
    // never happen


    {friggin' slash - When I say plain old text, I mean plain old text!}

  35. I'm surprised by antifoidulus · · Score: 2, Interesting

    that in the name of sensationalism reporters haven't said, "terrorism is probably not to blame but the Dept. of Homeland Security is looking into it." It seems that after Sep. 11th, the news wants to try to connect everything even remotely bad with terrorism, and of course the Dept. of Homeland Security encourages them by using as vague of language as possible. Are people that easily frightened?

    1. Re:I'm surprised by HangingChad · · Score: 3, Insightful
      It seems that after Sep. 11th, the news wants to try to connect everything even remotely bad with terrorism

      What else do they have to do? They've got this huge-ass budget and all those people watching a lot of honest citizens. It was about 8 years between the first attempt on the World Trade Center and the second. We've built and paid for this entire monster agency for an event that might be 10 or 15 years away. What are they going to do in the meantime? Grope women at the airport. They have to do something to justify their existence; otherwise we'd have to admit we over-reacted to 9-11.

      --
      That's our life, the big wheel of shit. - The Fat Man, Blue Tango Salvage
    2. Re:I'm surprised by Quixote · · Score: 1, Insightful
      Are people that easily frightened?

      As Nov 2 showed, yes they are.

    3. Re:I'm surprised by ScrewMaster · · Score: 1

      Well ... on the other hand, your reply has nothing to do with the situation at hand either, and you're currently modded 0, so I guess the moderation system works after all.

      --
      The higher the technology, the sharper that two-edged sword.
  36. Exactly by 36-bitter · · Score: 1

    Computers don't freak out or get depressed when work piles up. Backlogs mean nothing; they just keep processing one piece at a time until the pieces run out. I think someone was speaking imprecisely.

    I suppose that the system *could* have been built with a rule to detect that the results are becoming more and more untimely, and at some point just say "TILT!" and deliberately exit. I can't imagine why, though; getting there late is better than sitting in the terminal forever.

    1. Re:Exactly by Anonymous Coward · · Score: 1

      A lot of their problems are handled by algorithms with quadratic or worse performance. It *should* detect when work is piling up too fast and seek good enough solutions... but that obviously wasn't done.

      Some programmer probably got fired for telling their boss that the code was shit, and unless time was spent fixing it a meltdown was probable. Or maybe I'm imagining things; I know the last company I worked for is going to have this type of problem.

  37. From old information... by gminks · · Score: 5, Informative
    According to this article [written in 1995], Delta and AT&T created a new company called TransQuest Information Solutions.

    This article outlines how this joint venture re-vamped Delta's IT systems (again remember, this is 1995):


    During 1995 and 1996, TransQuest reengineered Delta's systems to migrate them from Hitachi mainframes running Natural, Adabas, and DB2 to an open systems environment. The new systems are written in C++ and access Sybase databases of reusable and distributed objects. The systems run primarily on Sun, HP and AT&T servers under UNIX with clients running under UNIX, MS-DOS, and Windows. The clients are connected to the servers over high bandwidth TCP/IP frame relay networks.

    Job titles for the company's 1,100 computer professionals include Systems Engineer and Software Engineer 1 through 8. Staff members recently developed an aircraft weight balance system that can be accessed by pilots to determine how luggage and fuel have been distributed within the aircraft for balance during a flight. This system was developed in C++ on AT&T and HP UNIX servers and will be available on 40,000 devices to 2,000 users.


    The trail runs dry here, job postings stopped around 2001.

    Which really raises suspicions that all the code is written and maintained offshore. The question now becomes who is handling this for Delta.

    One of Tata's spinoffs, Airline Financial Support Services, is described as


    "an example of an external service provider that handles a wide range of back-office functions for the airlines. AFS handles sales, refund, traffic and cargo; performs fare audits; manages yields and revenues by performing departure and post-departure processing checks; books crews; deals with overbooked flights and wait-lists; administers frequent flyer programs; draws up flight navigation charts, such as landing or route facility charts; and provides customer care." This according to ebstrategy.com


    Wipro handles some of Delta's inbound reservation calls in India and the Philippines.

    In conclusion, it would appear that either Tata's AFS arm or Wipro do the IT for Delta airlines.
    1. Re:From old information... by Kalak · · Score: 1

      Since it appears to be crew scheduling that was the issue, I'd look in the direction of Tata. Thanks for the info.

      --
      I am, and always will be, an idiot. Karma: Coma (mostly effected by .hack)
    2. Re:From old information... by The+Wookie · · Score: 1

      Having been a part of the ugly TransQuest fiasco, I am pretty familiar with most of this. TransQuest was a joint venture between Delta and NCR, which was owned by AT&T at the time. TQ had an absolutely terrible turnover rate and made a lot of temp agencies rich.

      Around the middle of 1996, I think, AT&T divested itself of NCR and Delta ended up making TQ a wholly-owned subsidiary. I left in early 1997, and I think it wasn't more than a year or two before TQ became Delta Technologies and the TQ employees became Delta employees again. As far as I know, Delta still does most of its own IT, I know some of my former co-workers are still there.

      Delta does contract some things out, but they are almost always installed and maintained in Atlanta.

  38. Re:The system runs Linux by Antique+Geekmeister · · Score: 1

    Nah, under Linux you can trivially create new files as swap space when needed. It may mean they overflowed available partition space on critical systems, or were unable to administer a heavily loaded system fast enough to add swap before it overflowed.

    Knowing nothing else, I'd guess they overflowed a key database partition. A lot of old programmers very foolishly over-partition available disk, trying to outguess the OS about what partition will need how much space and, instead of protecting themselves from disastrous overflows, actually causing disastrous overflows. It's an old programmer's habit that's hard to train people out of.

  39. No manual process? by SCHecklerX · · Score: 1

    WTF can't they do it manually? It's just keeping track of seats on planes for fsck's sake. Sure, they may not be able to accommodate everyone right away, but they could certainly do better than "nobody can fly at all because our computer system crashed". If a restaurant loses their computer, they don't stop admitting people. They just go back to paper orders/receipts.

    1. Re:No manual process? by aggles · · Score: 2, Insightful

      Hopefully someone from Comair reads /. and will not be able to resist spilling the beans. This sounds like a lawsuit in the making. It was not weather related - it was someone trying to save a buck, either by writing crappy software or by having poor operational procedures. This is a Sarbanes-Oxley event - and hopefully, the truth will come out about what happened, and why the backup procedures were either not in place or did not work. I don't want to see them go bankrupt, but they should be held accountable.

    2. Re:No manual process? by wwahammy · · Score: 1

      It's a little more complex than a restaurant. We're talking about tens of thousands of passengers, plus rerouting flights to new places, assigning the correct crew members to flights and lots of other accounting type things. I would assume also that the reservation software is linked in some way to one of Homeland Security's wonderful databases, so that just makes everything a lot harder. It's not quite as simple as tallying how many people get on the plane.

    3. Re:No manual process? by winwar · · Score: 1

      "It's a little more complex than a restaurant. We're talking about tens of thousands of passengers, plus rerouting flights to new places, assigning the correct crew members to flights and lots of other accounting type things."

      Sure, it's complex. But they have lots of employees. Anyone not working could have been called in (aka, mandatory overtime).

      The process would have been slow and inefficient. But at least it shows you are doing something. That you give a damn. And it gets something done. Because, after all, it was the airline's fault in the first place....

      But, they don't care and are cheap. And it shows.

  40. seen it before by sacrilicious · · Score: 1
    It appears that due to weather and other problems that flights began to be cancelled on Thursday and the backlog choked the system. 1,100 flights have been cancelled so far, including all flights through 12/26. Does anyone know what platform their system was based on? What kind of system just totally crashes?

    Sounds like Diebold may have been contracted for the job.

    --
    - First they ignore you, then they laugh at you, then ???, then profit.
  41. I'd like to know by HangingChad · · Score: 2, Interesting
    Not just the database platform and front end but who built it. This just has E-D-S stamped all over it. Everybody has a system go down once in a while, but it just seems like EDS has had more than their share.

    This is a worst case scenario for a system of that nature because of so many dependent calculations and calls to other systems. It takes more than just having a plane and a crew...which is a lot of work all by itself. It has to have a gate and connecting flights. Then multiply all that by 30,000 people, roughly 120 plane loads, and complicate it by some airports being closed. I bet you could actually watch the lights get dimmer in the server room. Still when you know the potential peak demand you have reserve capacity. Slow is okay, stop is unacceptable.

    --
    That's our life, the big wheel of shit. - The Fat Man, Blue Tango Salvage
  42. True story by john-gal · · Score: 1

    Well, the janitor did not pull the plug, but at a major airline, the carpenter did. He wanted to plug in his electric drill and pulled the plug on the server and its power backup and every other wire plugged into a power source in sight!!! This came to light after it was reported to my company (we provide the software) that the system had crashed totally. We sent 2 people over and they came back laughing so hard that it took them 30 minutes to tell us what had happened.

  43. Bailout by parliboy · · Score: 1

    After 9/11, pretty much all of the domestic airlines were bailed out by the government to keep them from going poof (except for Southwest and a couple of others, who didn't have their heads up their asses). So I just want to know how long it will take this Delta affiliate to plead for money. Not only has it screwed over all of those passengers, but the taxpayers will collectively pay for it.

    --
    "You're never ready, just less unprepared."
  44. MOD PARENT UP by johannesg · · Score: 1

    At least, if he is speaking the truth about this ;-)

  45. Scare thought by ZeroReality · · Score: 1

    Some companies refuse to do a complete system overhaul. They just keep patching and upgrading their decade-old software.

    I did a little research: Comair was made Y2K compliant with only minor changes.
    http://budgettravel.about.com/library/weekly/aa101599.htm What I am saying is the software could be a couple of decades old and be a rat's nest of code. COBOL, anyone?

  46. Re:It's a tragedy when... by gl4ss · · Score: 1

    actually disaster management strategy IS optional.

    this story as an evidence.

    though seriously, there's quite a lot of companies out there that instead of hiring incompetent people could be better off buying the services from outside. outsourcing doesn't necessarily mean it's crap, there's a lot of domestic in-house crap and idiots everywhere.

    --
    world was created 5 seconds before this post as it is.
  47. Southwest refuses to drink the Kool-aid by Oswald · · Score: 4, Interesting
    This computer problem of Comair's just demonstrates how unworkable the hub-and-spoke system of flight scheduling is. It's a flawed concept, foisted on a naive public by an industry locked in some sort of mass psychosis. In the pursuit of minor economies of scale, the big airlines treat their passengers like packages (hey! it works for Fedex, and their cargo can't even walk itself to the next gate...), treat airport runways and air traffic controllers like unlimited resources, and waste vast amounts of jet fuel. The fact that Southwest Airlines (which does not use a hub-and-spoke scheduling system) is profitable, and the rest of our major airlines are either in, just out of, or about to go into, bankruptcy doesn't seem to dent their thick skulls.

    I have watched the operation at Atlanta for over 21 years, and I've seen how cutthroat the competition for a major hub is, but it feels like watching two dogs fight over two bones--you can't tell if they're fighting out of greed or stupidity. Southwest doesn't even fly into Atlanta--they know that only a pyrrhic victory would be possible under those circumstances. Management at the other airlines has been criminally incompetent ever since airline deregulation, but it's the passengers, employees and shareholders who pay the penalty time and again.

  48. Possible system OS by Anonymous Coward · · Score: 1, Informative

    Judging by www.comair.com and their job ops, it's probably HP-UX or Windows. More than likely the Unix flavor rather than Windows. Why down for a couple of days? Probably a database restore. Never happens in TPF. Those mainframe systems crash and are back up with very little database degradation. By the way, in the job ops, if you want to be a crew scheduler, you only need an HS diploma!

  49. Must Account for Overbooking by syntap · · Score: 1

    The system should only be able to hold as many reservations as it has flights/seats.

    Useless commentary. Airlines overbook to fill planes and the system has to accommodate this.

  50. Huh? by slavemowgli · · Score: 1

    What kind of system just totally crashes?

    Is that a trick question?

    --
    quidquid latine dictum sit altum videtur.
  51. Re:Southwest refuses to drink the Kool-aid by HR · · Score: 3, Interesting

    The problem with your analysis is that point-to-point flying doesn't work when you start talking about international travel. It's just not possible to fly passengers to, say, Germany or Japan from every domestic airport. The way you do it is to accumulate passengers at a major hub on the coast and then fly from there.

  52. Re:Southwest refuses to drink the Kool-aid by PPGMD · · Score: 3, Insightful
    It isn't that easy. For the longest time Southwest was the hardest airline to book a flight on because they had no web system that could figure out its route system (they only released one 5 years later). Up until about July of this year, to book a web flight you needed a route map and schedule to figure out what cities you had to go through if there was no direct flight option.

    The hub-spoke system is easier to manage, and can be profitable if the airlines realize that they aren't unlimited resources, and decentralize the hubs on a limited basis.

    Anyways, Southwest doesn't drink anyone's koolaid; they run all their own in-house designed systems (I am not sure they are even on Sabre anymore), including web apps. It's an interesting concept, but it probably causes their IT managers to pull their hair out.

  53. Car dealer by Anonymous Coward · · Score: 1, Interesting

    I worked on a car dealers' wide area network for a short time. Their entire network, all connections to other dealerships, internet connectivity, not to mention their Novell network, dealership inventory, parts, and tie-in to the manufacturer(s) was tied to a single router. They had problems, and I finally drove out there, and found the router "installed" in the drop ceiling above the mechanics' bathroom. The opposite side of that wall was the backer board for the telephone lines, located in a broom closet. I pulled the router down, and the inside had green mildew on the board. Routinely, the housekeeping service would unplug the 25 foot ORANGE extension cord plugged into the single-socket bathroom outlet! I advised the general manager about these problems, told them that they'd best extend their demarc, move the router to a better location, but they never bothered to fix it.

  54. Er, I think I found the problem...they pay squat! by SharpNose · · Score: 2, Interesting

    From Yahoo Jobs:

    Software Engineer Cincinnati, OH $40K -$50K

  55. Probably TPF by Anonymous Coward · · Score: 1, Informative

    More than likely it is TPF as Delta is a TPF shop.

    TPF (http://www-306.ibm.com/software/htp/tpf/index.html) has been around since the '60s and is used by all the major airlines, most of the large hotels and, most bizarrely, NYC 911.

  56. A better snow job. They need it. by twitter · · Score: 1, Interesting
    I am only trying to make sense out of the above comment from the official statement above.

    My wife says things just snowballed.

    Crew assignment is a hard problem...

    Records keeping, very tricky. You would not want to try that with any old database, no sir, it might pop a window. Just thinking about how every other airline has managed this tricky problem since before computers makes my head hurt.

    We may never know what really happened but this would be a nice example for my classes :-)

    Yeah, it's a real class act for those 30,000 people sitting around in airports for Christmas, employees doing the same and those who have to recover from this disaster. Management is going to be happy about the publicity they just earned while their huge capital investment in AIRPLANES sits idle during a time of year that's supposed to be their most profitable because their far too expensive M$ "solution" "melted". A chain is only as strong as its weakest link. Employees, I'm sure, are also stranded for Christmas. For the New Year they get to ponder layoffs. What a happy company for you to dissect at your leisure next semester. Season's Best!

    Here's what I'll bet you might learn: WHEN SOMETHING MELTS, YOU LOSE YOUR ASS IF YOU DEPEND ON IT. MICROSOFT MELTS AND HAS POOR OR NO FAIL OVER CAPABILITY, SO YOU BETTER NOT DEPEND ON IT.

    --

    Friends don't help friends install M$ junk.

  57. Re:Southwest refuses to drink the Kool-aid by Anonymous Coward · · Score: 4, Informative
    Actually, the only thing that makes these sorts of problems easier for Southwest is the consolidated fleet type. With nothing but 737s, you don't add complexity to the scheduler for things like pilot and f/a qualifications.

    What happened to Comair here could happen to just about any airline. There is no comprehensive suite of software that handles crew scheduling, aircraft scheduling, reservations, and the myriad of other functions that are needed to run an airline.

    Reservations, for other than tiny airlines, are still managed by large TPF mainframes. TPF is a very "bare bones" operating system that runs on IBM mainframes, and was written specifically to deal with high volume / high transaction rate systems. Personally, I've seen 5 attempts at 3 different airlines to replace it with something modern (like Unix with an RDBMS). Each attempt failed miserably, and the airline went back to TPF. Note that TPF is not MVS, OS/390, or any other more mainstream mainframe OS. It's purpose-built.

    Unfortunately, this means that all of the other applications have to interface with TPF via screen scraping. To further compound the problem, no "suites" exist to handle the following functions, so most airlines have to "sew together" best of breed solutions for these basic functions:

    • Crew Scheduling - F/A's and pilots bid on slots to fly, this system takes those bids and turns it into a schedule.
    • Aircraft Scheduling - Tracks which tail numbers are flying which flights for the dispatchers
    • Optimization - Different optimizers to do things like:
      • Fuel Tankering - Use the jets as "tankers" so that you buy fuel where it's cheapest for flights later in the day
      • Crew Optimization - "Traveling Salesman" type solver to incur lowest labor cost, get crews back to home base, etc
      • Schedule Optimization - Use the aircraft in the most cost efficient way to cover all of your scheduled flights.
      • Maintenance Optimization - Pull aircraft in for Scheduled Maintenance at the optimum time.
      • Reaccommodation - When things go wrong (weather, mechanicals, whatever), pull in all of the above variables to crank out a new schedule, crewing, mx schedule, etc.
    • Booking Engines, for the internet and reservations agents
    • Point of Sale and Boarding functions for agents, skycaps, and kiosks
    • Interline functions where other airlines sell your tickets, and transfers for baggage, etc
    Anyhow, this list isn't comprehensive, but shows enough of the disparate pieces that you can imagine why these "glitches" happen. Very few of the items from the list above come from the same vendor, or even run on the same platforms.
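
To make one item on that list concrete, here is a minimal Python sketch of the fuel tankering trade-off: buying extra fuel where it is cheap only pays if the price gap outweighs the fuel burned hauling the extra weight. The function name, the prices, and the single `carry_burn_fraction` parameter are illustrative assumptions; real optimizers model aircraft performance, weight limits and multi-leg rotations, and this is not any vendor's method.

```python
def tankering_saving(fuel_needed_kg, origin_price, dest_price, carry_burn_fraction):
    """Estimated saving from loading the next leg's fuel at the origin.

    fuel_needed_kg      -- fuel that must be on board at the destination
    origin_price        -- price per kg at the origin airport
    dest_price          -- price per kg at the destination airport
    carry_burn_fraction -- assumed share of the extra fuel burned just to
                           carry it (0 <= fraction < 1)

    A positive result means tankering is cheaper under these assumptions.
    """
    if not 0 <= carry_burn_fraction < 1:
        raise ValueError("carry_burn_fraction must be in [0, 1)")
    # Load more than needed so that the required amount survives the flight.
    loaded_kg = fuel_needed_kg / (1 - carry_burn_fraction)
    cost_if_tankered = loaded_kg * origin_price
    cost_if_bought_at_destination = fuel_needed_kg * dest_price
    return cost_if_bought_at_destination - cost_if_tankered


if __name__ == "__main__":
    # Made-up numbers: 2,000 kg needed, 4% burn penalty for carrying it.
    saving = tankering_saving(2000, 0.55, 0.70, 0.04)
    print(f"Estimated saving: {saving:.2f}")
```

Even this toy version needs prices from one system, fuel requirements from another and the aircraft rotation from a third, which is the integration problem described above.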
  58. Re:Southwest refuses to drink the Kool-aid by tigerc · · Score: 1

    Hub and spoke isn't the problem. You NEED it to get anywhere that's not a popular destination. By saying that hub and spoke is a flawed concept, you effectively resign smaller cities to death.

    Say I want to go to Butte, Montana and I live in Boston. How is a direct route method of Southwest airlines going to get me there? Isn't the most efficient and cost effective way for an airline to transport me on a larger jet, to say Denver, and then, use a smaller less than 100 passenger plane to fly me to my destination of Butte? How do you say to medium sized cities without rail lines, we're not going to use the hub and spoke method and we're going to destroy any sort of business you have (not just tourism, but meetings/conferences)? How do you tell that to the people who will have to drive hours to get to a major airport?

    I agree that the upper management is corrupt. It really is. But just because management is corrupt, you can't go saying that hub and spoke is flawed. In the future, I foresee a multitude of direct route airlines, and one big airline that still employs hub and spoke (either government subsidized, or just large and efficient enough to turn a profit). After all, how are airlines like Southwest going to get us overseas? Suddenly, the Jet Blue/Southwest system doesn't seem so efficient.

  59. Here is a clue by AnuradhaRatnaweera · · Score: 1
    Does anyone know what platform their system was based on?
    This site has a clue: check the link about Alan Cox's Windows 2000 snapshot capabilities. ;-)
  60. Re:Southwest refuses to drink the Kool-aid by ScrewMaster · · Score: 1

    So, what you're saying is this industry is operated by a disorganized hodge-podge of cross-connected hacked-up computer systems, such that it is a minor technological miracle that planes a. get off the ground, or b. ever make it to where they're going. Thanks ... I feel much better now.

    --
    The higher the technology, the sharper that two-edged sword.
  61. Yep, you are right! by Anonymous Coward · · Score: 5, Informative

    Your statements are accurate.

    I was a unix sys admin there, but left for greener pastures during the dot-com craze. The non-redundant hardware at the time ran AIX, and had a great support contract from IBM. The SBS application, however, always had monthly issues, at least at that airline. They were looking for a replacement then, and I'm not surprised they still haven't replaced it.

  62. Simple Solution by jeephistorian · · Score: 2, Informative

    Take Amtrak!

    Amtrak receives around $500 million for a total budget, while air travel receives around $15 billion in subsidies. Take the train and save everyone money!

    _____________

    --
    Huh?
    1. Re:Simple Solution by jcuervo · · Score: 1
      Take Amtrak!
      Amtrak is actually way more anal about photo ID. Where the airlines accepted my Bank of America debit card with my photo on it as identification, Amtrak didn't. Fortunately, it wasn't that long of a drive...
      --
      Assume I was drunk when I posted this.
    2. Re:Simple Solution by HeghmoH · · Score: 1

      It's too bad Amtrak sucks harder than I believed physically possible, and can't even keep their trains on time in clear weather during the quietest periods of the year.

      --
      Mod down posts with a "Free Mac Mini/iPod" sig, they're spam!
    3. Re:Simple Solution by the+pickle · · Score: 3, Insightful

      Sure, that's eminently practical. I can take 48 hours to get from Detroit to LA, or I can take six (including travel time and check-in time at both airports).

      p

  63. Southwest's fuel by alder · · Score: 1
    ...just demonstrates how unworkable the hub-and-spoke system of flight scheduling is
    Quite possibly that is a big factor. Yet Southwest, at least for now, cannot be used as a litmus test. Their timely fuel hedging helps them reduce the cost enormously. (see also here)
    1. Re:Southwest's fuel by thogard · · Score: 1

      You can look at other airlines around the world that use the old TWA style system and compare them to the airlines using the Southwest style system. The ones using the Southwest system are doing well and the ones using the old steamship ticketing model are having huge problems.

      I still don't understand why they put the plane departure time on the ticket when they should put the check-in time.

      I don't care about airlines anymore. I have a pilot's license and only use them for very long flights or overseas flights.

  64. Re:A better snow job. They need it. by Lord_Dweomer · · Score: 1
    "What a happy company for you to dissect at your leisure next semester. Season's Best!"

    Oh come off it. What would you like him to do, sit there feeling bad for all the people who got screwed over? At least he's being proactive about it studying and analyzing it to try to figure out ways to prevent it in the future, as opposed to sitting on Slashdot and bitching how someone is being analytical when the troll (sorry, poster) thinks they should be sympathetic instead.

    Of course, he might be sympathetic as well, but who cares when we have blanket assumptions like the ones you've made in your flamebait post.

    --
    Buy Steampunk Clothing Online!
  65. Re:Southwest refuses to drink the Kool-aid by Zak3056 · · Score: 1

    It's an interesting concept, but it probably causes their IT managers to pull their hair out.

    Interesting question: Would you rather have an easy(ier) job with a company that loses billions of dollars a year, or a hard(er) job with a company that actually makes money and is going to be around for the next five years?

    Me, I'd choose the latter. I'd rather be bald than working for a company with no future.

    --
    What part of "shall not be infringed" is so hard to understand?
  66. Re:Southwest refuses to drink the Kool-aid by hemp · · Score: 1

    Yep...SWA is still on SABRE or more precisely SAS, Braniff's old rez system in Tulsa(still TPF based).

    Everything else is done in house in Dallas.

    Which leads one to a question - why does the most profitable airline in USA not outsource its IT??

    --
    Skip ------ See the latest from http://www.anArchyFortWorth.com
  67. Delta fuct me and didn't even kiss me! by y2imm · · Score: 1

    I got into Boston on the evening of the 24th for the final leg of my HNL-YFC flight. The small gate area for Comair passengers was overflowing with not-so-happy-looking folks. Checking with a few of them, I discovered many had been waiting since early afternoon for their flights, with nothing more from the Delta folks but glib, non-informative announcements now and then. The aircraft were on the ground, but there were no crews available to fly them. I waited patiently while my flight got delayed and delayed again. Only after many passengers had taken the last available flights to alternate destinations did Delta/Comair finally admit our flight was cancelled. They gave us hotel/meal vouchers (well, I got a hotel voucher anyway) and we went our way disappointed but expecting to get to our destinations a day late.

    The next morning when we arrived back at Logan, our flight had already been cancelled. The lineup stretched from one end of Terminal C to the doors leading to Terminal B, and then some. While we waited, an airport rep was walking the lines asking where people were going. When our Comair destinations came up, he said "Go home. There'll be no flights out until maybe Monday or Tuesday." I thought people were going to cry right there. Our little band decided to try renting a car and driving. No luck, with no cars available at Logan that day. We tried buses, but no buses were running to our destinations that day. I seized upon the idea of using my Air Canada Aeroplan miles to get a free ticket to my destination (plus booking and taxes, $60).

    When I was waiting in the Air Canada line, I saw one of the people I was with earlier. She told me the Delta rep told her to go to other airlines and ask if they would give carriage based on the Delta tickets she had. Basically, Delta was telling people they were on their own. One guy told me he watched a Delta agent tell a lady to fuck off. Anyway, I got to the counter and asked the Air Canada agent if they were doing anything for Delta passengers. She told me unless Delta signed the ticket over to us they were not doing anything special for us. I didn't know what that meant, but I knew Delta was doing fuck all for us, so I went ahead and used my Aeroplan mileage ticket.

    The Delta fuck-up was basically all the news on Christmas and Boxing Days. Even the Canadian Immigration guys who never talk or joke around were feeling sorry for us. I don't know what I can get from Delta for the HUGE PITA they caused me, but I don't think I'll ever fly them again.

    1. Re:Delta fuct me and didn't even kiss me! by angrykeyboarder · · Score: 1

      "...he told me unless Delta signed the ticket over to us they were not doing anything special for us. I didn't know what that meant..."

      The Air Canada agent was probably assuming that the Delta ticket in question was a non-refundable ticket. If so, it was only valid on Delta (or its regional affiliates like Comair).

      In the case of a ticket endorsed "Valid only on XX" (as non-refundable tickets are) said ticket would have to be reissued to reflect validity on another airline before any other airline could take it.

      In other words, the other airline wants to be paid by Delta for accepting their passengers.

      --
      Scott

      ©20014 angrykeyboarder & Elmer Fudd. All Wights Wesewved
  68. Didn't RTFP very well by A+nonymous+Coward · · Score: 1

    He said he fell back and by accident hit the power switch.

    You said Write-permit rings or write rings are light and don't actually fly well, so I doubt they were playing with that as it is unlikely to mass enough to flip the switch.

    His body certainly had enough mass.

  69. McPay. by Anonymous Coward · · Score: 1, Insightful

    "From Yahoo Jobs:

    Software Engineer Cincinnati, OH $40K -$50K"

    That's more than I make at McDonalds.

    1. Re:McPay. by groot · · Score: 1

      Not by much!

      --
      "Just remember, it takes a village idiot." -- The Motley Fool.
  70. Re:A better snow job. They need it. by Dachannien · · Score: 1

    A few years ago, a fellow was running an antique steam engine at the county fair. The steam tank ruptured, and one person was killed.

    This is obviously a tragic event for the person who was killed and his family, but it's also an interesting and important engineering problem. So much so, in fact, that the following year's PhD qualifier in our mechanical engineering department consisted of an analysis and redesign of the steam engine, so as to prevent such an explosion from happening again.

    It's a *good* thing to study problems like these, even in an academic sense, because academia is the very first step in the production of usable goods. If people aren't learning from these mistakes, then problems like the stranding of thousands of passengers in airports or the accidental death of a poor fellow at the county fair will happen again.

    because their far too expensive M$ "solution" "melted"

    In case you forgot to read the first half of the comments to this thread, it's already been revealed that ComAir's system runs on AIX, a product of IBM, and that the software was developed by a subsidiary of Boeing. When you flame Microsoft for something, at least make sure they're *involved* first.

  71. response from an AA employee by dan_bethe · · Score: 2, Interesting

    I sent a summary of these Slashdot comments to my cousin who works at American Airlines hq in Dallas. Here's his response!

    ---

    "ugh... I worked 9pm-1am yesterday (xmas day). I spent the first two
    hours of my shift calling people to tell them their flight was
    cancelled and reschedule them. Most of them were taking flights out to
    Miami and the Caribbean to spend New Years Eve partying on the beach.
    Honestly, I had little pity telling them they were going to miss out on
    one day of tanning especially since they seem to 'blame' the weather on
    us.

    "One hour into my shift our reference system went down. No IT people
    were willing to come in and fix it. I had the system up for booking
    flights and making reservations, but I could not look up any of our
    rules and regulations. Ah well, enjoy your xmas off IT guys!! Enjoy
    the weather in Cabo San Lucas!! Cheers!!

    "Fortunately, we have a backup of all our html files saved as text
    files. However each text file can only hold several hundred text
    characters. So, when I want to look up our baggage policies the normal
    html file is called BAG INFO. In the backup system BAG INFO is
    separated into 10 or 20 text files and I have to 'page' through them by
    typing BAG INFO P2, BAG INFO P3, BAG INFO P4. The text files are not
    indexed and are not searchable. It took me 10 minutes to find and
    advise someone how big a bag they can take to Puerto Rico.

    "After I started taking incoming calls again, there were people calling
    in on Christmas day to book their trips for Spring Break. There were
    over 100 calls on hold to talk to us, and there were people sitting on
    hold for half an hour to ask me how much it would cost to book a trip
    to Fort Lauderdale in March. Couldn't that wait until the day after
    Christmas?

    "Yes, the airline industry does not prepare for emergencies as well as
    it could for the holidays when people want to travel in record numbers.
    However, I think the general public could try to have their own backup
    plans in place as well and realize that the travel industry in general
    does not have the equipment or the staff to handle everyone in the
    country wanting to travel all at once in one week. Do people stock
    their refrigerators year round with enough food to feed everyone in
    their families at one meal like they do at Christmas?

    "Even though we try to accommodate everyone as best as we can on the
    holidays, we want to have a holiday just as badly as
    everyone else. Working in the travel industry should not indenture us
    to be your slaves over holidays. The public needs to have a little bit
    of compassion and realize how much we give up in our own personal lives
    just to help you get where you are going. Frankly, the way most people
    treat me on the phones I don't think they deserve our help and
    compassion. And don't call on Christmas day to book flights in March.
    That phone call is making someone work on a day they shouldn't have to.

    "anyways.... heh..... guess i had a bad night at work last night, huh

    "MERRY XMAS!"

    1. Re:response from an AA employee by winwar · · Score: 1

      Some comments:

      "One hour into my shift our reference system went down. No IT people were willing to come in and fix it."

      I wonder why. If this is accurate, of course. I would define this as an emergency; willingness would have nothing to do with it. In other words, if you don't come in, you have just accepted unemployment.... Of course, if this wasn't in their contract, etc., it's just one more instance of incompetent management....

      "Yes, the airline industry does not prepare for emergencies as well as it could for the holidays when people want to travel in record numbers. However, I think the general public could try to have their own backup plans in place as well and realize that the travel industry in general does not have the equipment or the staff to handle everyone in the country wanting to travel all at once in one week."

      Ahh, yes. The "I'm incompetent, but it's your fault" defense. If they didn't have the staff or equipment, why did they sell the tickets? I mean, it's not like this was a surprise; this increase in travel happens every year. Hell, the airlines encourage it. Look, it is the airlines' job to prepare for things like this. If they can't or won't, then why don't they advertise this fact? Perhaps because they will be seen (rightly) as greedy and incompetent idiots? I mean, if they offered to refund people's tickets, even those that weren't refundable, most of the animosity would disappear.

      "Working in the travel industry should not indenture us to be your slaves over holidays."

      What a crock of crap! You get paid for your work. All labor laws remain in effect. Therefore you are not slaves.

      "The public needs to have a little bit of compassion and realize how much we give up in our own personal lives just to help you get where you are going."

      And exactly what do you give up? It's called a job. You get paid for it. With the full knowledge of what it is going to be like this time of year. And it's not like you helped people get where they PAID to go very well....

      "Frankly, the way most people treat me on the phones I don't think they deserve our help and
      compassion."

      And the last time I was treated with compassion by the airlines in this situation was, oh, let's see, never. Look, you took their money for a service, now you are calling to say, oops sorry, we can't deliver this service because of the weather (because we are actually incompetent goes unsaid). Did you offer refunds? Probably not (oh, but you bought a non-refundable ticket....) Look, people don't deserve crap over the phone, but I doubt they were getting much help or compassion....

    2. Re:response from an AA employee by dan_bethe · · Score: 1
      All labor laws remain in effect. Therefore you are not slaves.

      Unbelievable. How to even respond ... the original author used an expression called "hyperbole". Comparable to "reduction to absurdity" for the purpose of illustration and summary.

      This was commentary on the personal experience of one individual inside a megacorporation -- an individual you haven't encountered and know absolutely nothing about. He merely advocates compassion and common sense. You're trying to project the idea that individual clerks are in any way responsible for the industry's strategic calamity. Then you ignore that he conceded the calamity in the first place. Then you use that projection as a strawman to stand on as a cheap way to try to elevate your own nondescript allegations.

      And the last time I was treated with compassion by the airlines in this situation was, oh, let's see, never.

      As his commentary proves, you would have if you'd called him! Or any of the subset of countless other highly trained and compassionate clerks who do their best to improve the calamity from within their isolated station, some of whom could use a pick-me-up after the last 50 people who dumped on 'em.

      If you want to pay attention and write about your experience or any other *original* thought so as to *add* to the discussion, then please do. Otherwise, I'm done with regurgitating negativity in the Slashdot guerilla verbal warfare zone.

    3. Re:response from an AA employee by Lord+Flipper · · Score: 1

      Agreed 100%

      I use a Mac at home, Windows at work, and started on old IBM mainframes (Fortran/COBOL) in the early Seventies.

      I read, respond to, and often ignore, posts on a variety of Mac user support LISTSERVS and whatnot. I hear bitching, from people who never read manuals, QuickStart guides, ReadMe.txt files, etc, all the fucking time, saying "This co. sucks", "these -company name- tech support/call desk people Suck", etc, etc... Okay, already, we all know that 1) Help Desk isn't a lot of people's first priority when it comes to Vocational Choices, and 2) Nobody had a gun held to their head (unless you count the need for food, shelter, 'stupid' things like hungry kids, as a 'gun', which I would), when they took those jobs. All that being said, every company I've seen singled out as being the 'worst' happens to be a company I've called. Take Earthlink, let's say: there have been times, back in Boca, after severe storms, when I wanted to rule out local damage as a 'fault' for net interruptions, and, even though I knew they might not admit it, I would call Earthlink in Atlanta, to see if the DHCP servers in Miami were okay. Nine times out of ten I could get the guy on the phone to get past the "All systems are go" or "Did you reboot?" script-talk, and actually poll the servers in Miami, and give me info. Amazing, eh? How did I do that? By being polite on the fucking phone.

      For a while I would just say "Win2kPro" when they'd ask what OS are you on? Heheh, and then I'd silently translate their advice into mac-speak. That worked pretty good too, but talking to human beings as if they were human beings is just the simplest 'ticket', not only for 'support', but in Life.

      One last example: Microsoft. Everybody hates the company, of course, that's a no-brainer, but the people that work there? I don't know, they might be 'people' too... let's see: I run VirtualPC on a Titanium Powerbook. (I run a lot of things in VPC, but 2kpro was my example.) So, after a few years, some unrelated issues and whatnot, I find myself with a re-install of Windows, that was simple, and no ID, no Auth number, no serial, etc, can't find the "card" with the number on it. (Which I'm convinced never existed, until 9 months later when I find it). Ok? So, I call Redmond, expecting the worst, and who wouldn't expect the same? Call answered on 2nd or 3rd ring. Whoa. No voice mail. (WTF?). Nice enough fellow, believes I never had 'the Card', says "Hang on a minute, I'll be right back". Meanwhile, he knows I'm on a Mac, because I set the scene... 15 fucking seconds later he's back on the line, gives me a new serial, and Auth Number, says if I find the 'card that never existed', "don't worry about it, you'll have 2 then". And I'm golden. So much for that. The point is: Companies are about people. "Corporations" are 'entities'. There's a difference. And as for attitude on the part of help desk, etc, it's a software issue, used to have an acronym: GIGO, anybody old enough to remember that? The old Garbage In, Garbage Out.

      As for the guy with the 'Insult to injury' regarding Cleveland... I spent 5 hours in rain, on the tarmac at Kennedy, before they taxied back to the gate to let us all off a cramped 747 to stretch and eat. (airline regs said nothing but peanuts and candy till at elevation)... I stood in a nice long line, was one lady away from the cash register with 'dinner', when the girl from the Airline entered with "Flight blah blah Reboarding Immediately!!!", waved bye-bye to dins, and proceeded to re-board, and wait another 5 hours out on the tarmac. Heheh. A greasy spoon, with a coffee, a smoke, and dinner-on-the-way in Cleveland would have been "Nirvana Tonight", as far as I was concerned. (this from a current New Yorker).

  72. Re:Southwest refuses to drink the Kool-aid by PPGMD · · Score: 1
    Some of the systems work fine in almost every environment, it's simply that overall Southwest refuses to use them. Whenever possible they like to develop it in house.

    They believe in doing things differently from all the rest of the airlines, but IMO they are throwing the baby out with the bath water in some cases. Developing their own software has left them behind when it comes to web orders, which means possible revenue and fewer empty seats.

    Don't get me wrong I respect Southwest, but there are areas where they could have done better.

  73. Re:Southwest refuses to drink the Kool-aid by angrykeyboarder · · Score: 1

    As has been pointed out by others, while hub-and-spoke has its flaws, it's also the best solution (from an economic standpoint) to get from, say, Billings, MT to Jacksonville, FL.

    There is no way any airline could make money flying that route directly.

    And Southwest does, in fact, operate on a hub-and-spoke system. They have major hubs in Dallas (DAL), Houston (HOU), Phoenix (PHX) and Baltimore (BWI), among others.

    However, they also supplement that system with regular point-to-point service, but only on profitable routes (from big metro area "a" to big metro area "b"). Again, they'd lose their shirt on the "Billings-Tallahassee" route.

    On the other hand, Delta can get you from Billings to Tallahassee....

    --
    Scott

    ©20014 angrykeyboarder & Elmer Fudd. All Wights Wesewved
  74. Proof that you guys mean NOTHING to Joe Sixpack by gfecyk · · Score: 1

    A freak weather event did more damage to a computerized reservation system in one night, than all of the hackers, viruses, trojans, spyware and idiot lusers combined over all of 2004.

    Unless you take the overinflated guesstimates of the likes of mi2g at face value, anyway.

    Toronto faced a blizzard last week. Some two hundred flights were cancelled because of bad weather. Air Canada, WestJet, Jetsgo, etc. didn't go down even if their planes did. That, too, caused more damage to YYZ's fiscal health than all of its computer security woes combined through 2004.

    And then I read the letter posted here about the IT guys sunning themselves during all of this.

    Merry xmas. I hope you can justify your jobs in 2005.

    --
    Use Evolution instead of Outlook? Bewa
  75. Re:Not entirely accurate by angrykeyboarder · · Score: 1

    Still, "Rule 240" went out with Airline Deregulation years ago. It's a generic term these days and every airline can pretty much do what they want as far as re-accommodating passengers.

    For the most part they are fairly consistent and I'm sure that's mostly only for customer service/competition reasons.

    --
    Scott

    ©20014 angrykeyboarder & Elmer Fudd. All Wights Wesewved
  76. Re:Southwest refuses to drink the Kool-aid by winwar · · Score: 1

    "Hub and spoke isn't the problem."

    Yes and no. Southwest does have hubs. I would say that Salt Lake City is a Southwest hub.

    "You NEED it to get anywhere that's not a nonpopular destination."

    Well, you have a few choices.

    Accept that you can't fly everywhere. (Olympia, WA, the capital of the state, has had great trouble keeping air service of ANY kind; they seem to survive).

    Accept that it will be more inconvenient. Instead of flying to a hub, you will have to fly to a city from which the flight that goes to "nowhere, USA" originates. Many Southwest flights have multiple stops....

    Accept that you will have only a few large profitable "hub and spoke airlines" (fewer than now because they aren't profitable....) And they will routinely suffer huge delays....

    "By saying that hub and spoke is a flawed concept, you effectively resign smaller cities to death."

    No, they just don't get convenient/cheap air service. If I want to go to Olympia, Wa., I have to fly into SeaTac (an hour drive). Always been that way, probably always will be.

    Remember the old joke (paraphrased):

    How do you become a millionaire? Buy an airline when you are a billionaire....

  77. Algorithms by coyote-san · · Score: 1

    I think the key algorithms we studied in the mid-90s were developed in the late 80s. If my poor memory is right, the complexity dropped from O(n^3) to something like O(n^lg(n)) or O(n^lg(lg(n))).

    There's even more stunning improvement in the algorithms for solving multidimensional partial differential equations. (e.g., weather). Put a modern vector processor supercomputer running the algorithms from the 70s in one room, and a TRS-80 running the latest algorithms in another room, and the TRS-80 will easily beat the supercomputer. (Assuming it has sufficient memory to hold all of the model data, of course!)

    Implement the modern algorithm on a 1024-node cluster....

    --
    For every complex problem there is an answer that is clear, simple, and wrong. -- H L Mencken
  78. Surely you know the ditty ... by chris_sawtell · · Score: 1

    Time to spare?
    Go by Air.

  79. at our plant, it was little under desk heaters by DeprecatedFeature · · Score: 1

    that all the women had. inductive loads are a real bitch.

    --
    maybe one day i'll be smart enough to come up with a cool sig, too.
    1. Re:at our plant, it was little under desk heaters by jrockway · · Score: 1

      A heater is a resistive load.

      The blower is inductive, but the heating element is what's using the power.
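
      A rough back-of-the-envelope check on why a few desk heaters add up: for a purely resistive load, current is just power divided by voltage. The 1500 W rating, 120 V supply and 15 A circuit used below are illustrative assumptions, not figures from anyone's plant.

```python
def heater_current(power_watts: float, volts: float) -> float:
    """Current drawn by a purely resistive load, from P = V * I."""
    return power_watts / volts


def heaters_per_circuit(power_watts: float, volts: float, breaker_amps: float) -> int:
    """How many identical heaters fit on one circuit before exceeding its rating."""
    return int(breaker_amps // heater_current(power_watts, volts))


# Assumed example values: a 1500 W heater on a 120 V, 15 A circuit.
amps = heater_current(1500, 120)
count = heaters_per_circuit(1500, 120, 15)
print(f"{amps} A per heater, {count} heater(s) per circuit")
```

      Under those assumptions a single heater already uses most of the circuit, before any monitors or PCs are counted.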

      --
      My other car is first.
    2. Re:at our plant, it was little under desk heaters by Walt+Dismal · · Score: 1

      I had a similar problem. I was in a corner cubicle bordered by two other people. My monitor constantly had fluctuating image size and I figured there was something on the line pulling the voltage down. A rather obstinate co-employee had a massive heater plugged into her cubicle. She refused to disconnect the load. I came in a few days later in the afternoon to find that a fire had occurred that morning, in the cubicle walls. She had pulled enough current to heat up the wires in the plastic channels for power cables, and the plastic actually caught fire. They had smelled smoke but nothing was visible for a while. Cubicle wiring does not have circuit breakers; it depends instead on the building's. The building was wired for big loads of current.

  80. Re:Outsourcing, of course! by hoolie · · Score: 1

    Yep, they will do anything to "cut costs".

  81. not really a technical problem by snow_man · · Score: 1

    in my way of thinking this was a management failure. technical, financial, and operational management share this joyful problem.

    my understanding is that the software wasn't designed for the volume of txn pushed down its throat. slowly this information "trickled up" the food chain and right back down. rinse, repeat.

    the folks that supported the system would say it's overloaded. the technically-responsible management would tell the next layer up that they needed money to fix the problem. the financially-responsible management layer would tell the operationally-responsible layer that someone wanted more money than the fiscal budgeting planned for and it would be painful to fix. the operational management would squeal like stuck pigs and never, ever, tell shareholders a story like that one. the financially-responsible people would convey this unhappiness downwards. technically-responsible management would "find a way to make it work" and divert the unhappiness to the support people. the support people would piss/moan/bitch and make-it-so as best as possible.

    repeat this cycle 'til flames shoot out of the mission critical system(s).

    then "suddenly" the problem is handled properly but in a somewhat hurried manner.

    --
    i am snow. fear me.
  82. Re:Southwest refuses to drink the Kool-aid by RollingThunder · · Score: 1

    The trick is to have a crewcut. That way you can't get a good enough hold to pull the hair out. ;)

  83. Re:Southwest refuses to drink the Kool-aid by /dev/trash · · Score: 1

    What if I wanna fly to Atlanta?

  84. proactive and reactive. by twitter · · Score: 1
    At least he's being proactive about it studying and analyzing it to try to figure out ways to prevent it in the future,

    The proactive people said not to use M$ crap. Studying the mess afterwards is the very definition of reactive. The lesson was obvious before it happened, and the rest of us are entitled to a happy, "I told you so".

    we have blanket assumptions... in your flamebait post.

    Microsoft sucks, yeah, yeah, yeah. All your apologies are worthless.

    --

    Friends don't help friends install M$ junk.

  85. bring it on. by twitter · · Score: 1
    It's not the OS, it's the people behind who's to blame. Yes, stupidity and MSW often go together but in a few years one will probably occasionally see a massive linux outage due to... similarly stupid people.

    It's interesting how you dismiss an OS with a track record of failure in order to blame any other programmer. This assumes Microsoft has better programmers than anyone else, an assumption Microsoft marketing loves you for but which is unsupported by any objective review of performance. The same "stupid people" have been and still are writing applications for Linux, Unix and Mac right now, but they have better tools and make fewer mistakes there than they have with M$'s crappy SDKs and pathetic OS.

    Two examples of software that just works are Apache and Sendmail. People write all sorts of applications for both of them without this kind of meltdown and both dominate their "markets". Microsoft's efforts at both, IIS and Exchange, have been a total disaster.

    Wanna bet what crappy OS is behind this? Blaming your developers is sorry stuff.

    --

    Friends don't help friends install M$ junk.

    1. Re:bring it on. by ext42fs · · Score: 1

      I don't dismiss the OS: M$ is crap and smart people just don't use it. The guy who picks a crappy OS to start with is a deep cause. Incompetent programmers, operators, whatever are a deep cause as well: they don't have enough clue and will fsck up some day even when they use linux. Are we then going to blame linux?

      Another example:
      If you can't view a web page because it is written for IE, blaming M$ is one thing, but it is more effective to educate the world about the stupid webdesign company which did that lousy job. By now, only the really stupid people are not aware of M$ crappiness.

  86. Some clarification by Anonymous Coward · · Score: 1, Informative

    Well... to try and provide a little clarification here, as I work for Comair. Here's the skinny:

    Crew and aircraft scheduling is done through a software package called SBS Track. This very same software package is used by many other airlines, including the two I worked for before coming to Comair. I don't know if their systems have the same hard-coded limit that ours does or not. This software package has _nothing_ to do with reservations, or anything concerning passengers whatsoever. It is simply the software we use to schedule our aircraft and crews to fly the list of flights that Delta wants us to fly.

    Crew scheduling is done by creating "pairings". A pairing is a sequence of flights that comprise a crewmember's trip. Anytime a change is made, a new pairing is generated, with the new sequence of flights. The system has a hard-coded limit of 32k pairings ("transactions" is what the IT folks call it) in a calendar month. As of 10:00 pm on 12/24, that limit was reached. Crew Scheduling was unable to create any new pairings, unable to track who would be flying what airplane to where, and basically unable to keep the airline flying at that point.

    It was not any kind of a hardware failure, there are backups for that. It is simply a software limitation, that when it was coded many years ago, nobody realistically thought it would ever be reached. Why they hardcoded a limit into it in the first place is beyond my knowledge. :)
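
    For anyone wondering what a limit like that looks like in code, here is a minimal sketch. It is not SBS Track's actual code: the class and method names are made up for illustration, and the guess that "32k" means the ceiling of a signed 16-bit field (32767) is an assumption. It shows a per-month counter with a hard ceiling, plus the kind of early warning that would flag the problem before the ceiling is hit.

```python
INT16_MAX = 2**15 - 1  # 32767; assumed reading of the "32k" limit


class PairingLimitReached(Exception):
    """Raised when no more pairings can be recorded for the month."""


class MonthlyPairingCounter:
    """Hypothetical per-month counter with a hard ceiling and an early warning."""

    def __init__(self, limit: int = INT16_MAX, warn_fraction: float = 0.8):
        self.limit = limit
        self.warn_at = int(limit * warn_fraction)
        self.count = 0

    def add_pairing(self) -> bool:
        """Record one pairing; return True once usage has reached the warning level."""
        if self.count >= self.limit:
            raise PairingLimitReached(f"limit of {self.limit} pairings reached")
        self.count += 1
        return self.count >= self.warn_at

    def start_new_month(self) -> None:
        """Reset the counter, as happens at the start of a calendar month."""
        self.count = 0
```

    With a warning threshold like this, operations would hear about the ceiling while there was still headroom, rather than at the moment new pairings started failing.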

    A major part of the problem is Comair's concentration in Cincinnati. CVG is our only crew base, and it is the largest single crew base of any airline in the world. Over 1800 pilots and 1100 flight attendants in one base. Not even any of the majors have a single base that large. Several of our software packages are woefully inadequate, and replacements have been sought for some time.

    As for getting things up and running on paper, this is a monumental task. Scheduling for 160+ aircraft and 2900+ crewmembers, and compliance with all FAA regulations, maintenance requirements, crew rest requirements, and contractual requirements is incredibly complex. In addition, we have crews and aircraft stranded across the country due to the weather that moved through that caused this whole mess in the first place. Add to that the very limited number of people who actually have the knowledge of all the requirements for scheduling, and coming up with a full schedule for the next day would be nearly impossible.

    Jan. 1 starts a new month, and the system will return to full functionality then. Until that date, however, our operations will be very limited.

    1. Re:Some clarification by DenvilleSteve · · Score: 1

      Really good post! How could Comair IT Management not have been aware of the limitation in this critical application?

    2. Re:Some clarification by DenvilleSteve · · Score: 1

      or lawsuits by stranded passengers!

  87. No, by twitter · · Score: 1
    Try again, bitch. That's an IBM "solution" Comair was running. IBM, yes. Does that hurt? Feeling mighty retarded yet?

    No, I don't feel hurt by some highly moderated anonymous "inside information" that clearly contradicts

    --

    Friends don't help friends install M$ junk.

  88. Oh jesus christ... by Anonymous Coward · · Score: 1, Informative

    Some of you have no clue.

    The BS&T quotient on your average travel application is on the relatively nuts scale. Expedia, Travelocity, hotwire, priceline, whatever -- I'd ask that some of you with simple solutions go and speak to the lead travel-server dev for the product.

    You'll probably have to change pants after the conversation. Travel is stable, reliable, and generally rock-solid. The algos for selecting airline flight prices or hotel room block-reservations are known and well-tested. The methods and protocols of communication are well-documented and generally straightforward.

    Until recently, it was all on hardware (and I'm speaking generally about the large travel providers -- Worldspan and Sabre come to mind) that was considered arcane. Ancient versions of Netware on an X.25 pad; screen-scrapers on top of it. Have fun trying to modernize!

    This does not surprise me in the slightest. We are stressing our ancient systems more than ever these days, and it should not be a surprise when the occasional ancient application (ctime, folks) gets floored and dies a bloody death.

    It'll be patched in a month.

  89. Re:Huge earthquake by cammoblammo · · Score: 1

    Okay, nobody's going to read this, because it was yesterday's story and it's been modded into oblivion.

    But can somebody please explain to me how this is off-topic? Somebody posted something that wasn't, admittedly, perfectly on topic (but worth a `Flamebait' more than `Offtopic') and I replied to that exact post.

    I could cope with being modded Flamebait. Having a read, it probably does qualify. I'd even be happy with a Troll. But I made a perfectly relevant, and thus ontopic, reply to a post.

    I could go on, but there's no real point. Metamods, a bit of justice please?

    --

    Cogito, ergo sig.

  90. Re:Southwest refuses to drink the Kool-aid by PPGMD · · Score: 1

    Learned that long ago from a military man, I understood why later, he did a tour at the Pentagon.

  91. Re:Southwest refuses to drink the Kool-aid by Lord+Flipper · · Score: 1

    I have to wonder about the Healthcare area being worse.

    I worked, off and on, at a major VA Hospital. The VA, of course, like all parts of the Military system not related to procurement, is seriously underfunded. But the only localised systems in the entire facility I worked in were the CT scanners. They needed super hi-def imaging, so they used SGIs, and something that 'looked like' an X-window-type OS. I never ran those boxes, so I don't have better, more accurate info on that, sorry. (It might have been some version of Solaris with a KDE or Gnome lookalike desktop, don't know.) They were all leased from GE, anyway, and had really intelligent people running them. Every other department was tied in to the same overall 'system', as far as boxes and software, but separated from other departments (for obvious reasons, "Admissions" didn't need to be looking at/accessing Food Services, etc).

    As for the rest of the hospital, all of it was on a unified system. With the exception of paper-based Medical Records. You cannot imagine the enormity of the paper-based issue. It defies simple 'scanning' and conversion to electronic docs, due to the wide range of forms, crazy handwriting, etc. All that aside, maybe the 'normal' hospital systems are different. Or your post was referring to the Healthcare industry at the governmental level. Not sure.

    But (this is the 'plug'): If you, or anyone, care(s) about the kids and fellows that have already 'done their duty' (whether you're pro- or anti- political war is irrelevant, IMHO) be aware that the Administration (White House, power structure, whatever) is contemplating further severe cutbacks in health care, hazard duty pay, death benefits for families, etc, and write to your representatives, expressing your opinions.

    The attitude, expressed by some, that the airlines (or any company, for that matter) should go bankrupt, to be 'taught a lesson' for management stupidity or bean-counting decisions, may seem reasonable, but the only people hurt (and they are numerous) are the ones with the least responsibility for the failures as a whole. Think of it this way: In a country that has a tax system and a Congress (whose primary purpose is to divvy up tax dollars between competing corporations, aka the real 'Special Interests') working as a system designed to facilitate Corporate Welfare for the rich and powerful, the last thing we need is more 'little people' needing money for food, shelter, their families, etc.

  92. Maybe example of too much Horizontal vs Vertical by petepdx · · Score: 1

    Vendor A, B, C, ...; package Z, Y, X, ...;
    integrator 1, 2, 3 ...; contractor ....

    Maybe I'm too old, but it sure was nice when
    at least the applications were written in house,
    yea we would bitch when the writer of the code
    was long gone, but still, we could fix it.

    Oh well, I just wish I could get used to saying
    "it's the vendor's fault" whenever something goes
    wrong.

    Comair made the decision not to replace/upgrade
    before January. End of Year, make the profits look better? The roll of the dice?

    -pete

  93. always preview by DeprecatedFeature · · Score: 1

    which i didn't do. i clipped out the bit referring to the floor buffers which also nailed us. sorry, thanks.

    --
    maybe one day i'll be smart enough to come up with a cool sig, too.
  94. War stories, get your war stories heah! by hey! · · Score: 1

    Back in the early 80's I used to work making applications for and servicing very early microcomputers from a company called IMS. They basically consisted of a dishwasher-sized box with an S100 bus into which a number of single-board Z80 computers plugged, sharing a single hard drive (that's "Winchester" to you, sonny) controlled by a master computer, and communicating using a hacked-up CP/M-compatible system called TurboDos. The result was one of the first multiuser computer systems suitable and affordable enough for routine office use; you could equip a half dozen people for a paltry twenty or thirty thousand dollars.

    Anyway, the manufacturer discovered that the office environment was a bit more, uh, challenging than they anticipated. Static was turning out to be an issue. So one day they issued a tech bulletin saying that secretaries should stop wearing pantyhose. In the formal (in those days) frozen Northeast, women did not wear slacks to work, any more than men went to work without a tie. So the resourceful ladies began to bring spray bottles of fabric softener to work; they'd run into the ladies room to "freshen up" every hour or two.

    Of course, aside from the inexperience of the manufacturers, we still had the usual load of PEBKAC malarkey. We had one customer who was complaining about hardware crashes. Now you can imagine these weren't altogether uncommon given the primitive nature of these boxes and the operating system they ran, but he seemed to be having a really bad time of it. We sent a tech, who on entering the customer's newly built and rather swanky computer room, noticed a dimmer switch on the wall next to the door.

    "What's this?" he asked.

    "Oh, that runs the computer."

    --
    Post may contain irony: discontinue use if experiencing mood swings, nausea or elevated blood pressure.
  95. Re:AIX? by Zeinfeld · · Score: 2, Funny
    While AIX, "Ain't unIX", might be described as Unix and the advert looks like HR drool, I'd still wager that some thing M$ failed something Sybase and that the AIX rumor is someone blowing smoke up your ass. Comparing reputations, AIX vrs. M$, the choice is clear.

    So let's think this one through for a second. The people who work there say the system that failed runs on AIX and that it's the application that's gone whoopsie. So they obviously must be lying since everyone knows that the minute an application is ported to AIX all the bugs fall out of it.

    Of course with this type of thinking there is no way that reputations are ever going to change since every computer error is attributed to Windows even if it has nothing to do with the issue.

    I suspect that the HR advert is for a completely unrelated job.

    I would also hazard a guess that the real problem at the place now is not the system anymore. The system is probably back up but they are now having to deal with planes that are in the wrong places and crews that have no flying hours left because of decisions that were taken manually while the system was down.

    --
    Looking for an Information Security student project suggestion?
    Try http://dotcrimeManifesto.com/
  96. Re:AIX? by Fallen_Knight · · Score: 1

    It's a 24+ year old application; it should have been replaced years ago with a more modern system.

    Not much an OS can do about a variable overflow in an application program, Windows or AIX.

  97. Best (flame bait) quote for this situation. by notany · · Score: 1

    High on the list of things Lisp offers that most other languages botch is the idea that (+ x 1) for any integer x should return a number bigger than x in all cases. It seems like such a small point, but it's often quite useful. -- Kent M. Pitman
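
    To make the quote concrete, here is a small sketch in Python, whose built-in int is arbitrary-precision in the same spirit as Lisp integers. The fixed-width case is simulated with ctypes.c_int16, which does no overflow checking. The function names are just illustrative, and nothing here says the Comair application actually used a 16-bit field; that part is an assumption.

```python
import ctypes


def add_one_int16(x: int) -> int:
    """Add 1 the way a signed 16-bit field would: the result silently wraps."""
    return ctypes.c_int16(x + 1).value


def add_one_bignum(x: int) -> int:
    """Add 1 with an arbitrary-precision integer: always bigger than x."""
    return x + 1


for x in (5, 32766, 32767):
    print(x, add_one_int16(x), add_one_bignum(x))
```

    The last row is the interesting one: the bignum version keeps counting, while the 16-bit version comes back smaller than the number it started with.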

    --
    Dyslexics have more fnu.
  98. Re:Southwest refuses to drink the Kool-aid by MadHungarian1917 · · Score: 1

    Because SWA knows that IT is a critical part of their business not an "expense" to be minimized. Anybody with the proper qualifications can build an airline. To run one efficiently one needs to be in control of all the variables. To do this one needs efficient and effective IT.

  99. Looks like I was wrong. by twitter · · Score: 1
    So let's think this one through for a second. The people who work there say the system that failed runs on AIX and that it's the application that's gone whoopsie. So they obviously must be lying ...

    At the time, you did not know that people who worked there said that. All you had to go on was a post by another Slashdotter claiming an anonymous person told them that. It was pure hearsay, but it seems to have turned out correct as Comair has come out and said the same thing.

    with this type of thinking there is no way that reputations are ever going to change since every computer error is attributed to Windows even if it has nothing to do with the issue.

    Actually, the reputation will not change because Microsoft will not change. This one whoopsie just happened to not be Windows; that does not make the platform any more stable. Microsoft has been warned about the dangers of their system designs but has chosen to blunder forth.

    Given a choice, which one would you rather be responsible for? Which one would you use for a mission critical application? Unbelievably, many airlines use Windows as a terminal for ticketing and other very important functions.

    --

    Friends don't help friends install M$ junk.