Slashdot Mirror


Comair Done In by 16-Bit Counter

Gogo Dodo writes "According to the Cincinnati Post, the Comair system crash was caused by an overflowed 16-bit counter. Perhaps Comair should have paid for the software upgrade to MaestroCrew." You heard it here first...

4 of 441 comments

  1. From Another article... by bje2 · · Score: 4, Interesting

    From InformationWeek:

    "The computer failure that grounded an airline's entire fleet over the Christmas weekend and stranded thousands of travelers was due to creaky software that couldn't count higher than 32,768." ...

    According to the Post, the software -- which tracks all details of crew scheduling, including how long they have flown (an FAA regulation restricts airtime), and logs every change -- has a 16-bit counter that limits the number of changes to 32,768 in any given month. ...

    To be fair (although it's not an excuse), 32K crew changes in a month? That's like 1,000 a day. That's crazy!...
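For anyone who wants to see the limit in action, here is a minimal Python sketch of a signed 16-bit counter wrapping around. The articles do not say whether Comair's counter was signed or unsigned, so the signed two's-complement behaviour is an assumption, and `wrap_int16` and `log_changes` are made-up names for illustration, not anything from the real scheduling software.

```python
INT16_MAX = 32767    # largest value a signed 16-bit integer can hold
INT16_MIN = -32768   # smallest value a signed 16-bit integer can hold


def wrap_int16(n: int) -> int:
    """Reduce n to the value a two's-complement signed 16-bit field would hold."""
    return ((n + 0x8000) & 0xFFFF) - 0x8000


def log_changes(start: int, count: int) -> int:
    """Hypothetical change logger: bump a 16-bit counter `count` times."""
    counter = start
    for _ in range(count):
        counter = wrap_int16(counter + 1)
    return counter


print(log_changes(0, 32767))   # 32767, still fine
print(log_changes(0, 32768))   # -32768, one change too many
print(round(32768 / 31))       # 1057 changes per day over a 31-day month
```

The last line is just the back-of-the-envelope arithmetic: roughly a thousand changes a day is enough to hit the ceiling within a month.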

    --

    "Facts are meaningless. You could use facts to prove anything that's even remotely true." - Homer Simpson
  2. Let's not be too hard.. by Staplerh · · Score: 4, Interesting

    This was a horrible chain of events that severely inconvenienced a lot of people for Christmas, and I would be hoppin' mad if I was in any of their places. However, let's not jump on ComAir too hard, IMHO. From TFA:

    "This probably seemed like plenty to the designers, but when the storms hit last week, they caused many, many crew reassignments, and the value of 32,000 was exceeded," he said.

    It's true, it was an extreme confluence of circumstances... horrid weather (heck, there was snow in some Texas town for the first time in like 80 years or something, read it in some glurge article) coupled with the winter holidays. They should redesign their system and admit that they've grown to a level where their system is unable to handle extreme circumstances, and this should serve as a great wake-up call for them.

    In the past I've always chuckled at the thought of 'upgrading for the sake of upgrading', but I suppose this is one case where an earlier upgrade could have saved them millions and made a lot of people's holidays better.

    --
    "There's no success like failure, and failure's no success at all."
    - Bob Dylan
  3. There was a high profile example of this problem by hey! · · Score: 4, Interesting

    back in the early 80's. There was a big financial company that had an automated system that watched the prices of certain commodities and issued automated trade orders. The transactions were stored in arrays addressed by 16 bit signed integers, with the (now) highly predictable result on the first day that trading volume exceeded 16384 transactions. Since in C arrays are just syntactic sugar for pointer arithmetic, the system started executing trades based on "data" from random bits of heap memory. This apparently went on for some time before a human being figured out something had gone wrong, and (reportedly) the company lost billions in a single day. This might be somewhat exaggerated, since the event now has passed into folklore.

    In any case, this is one of those incidents like the Therac-25 accidents that experienced programmers should always have in mind.
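A rough sketch of the pointer-arithmetic failure, simulated in Python so it runs the same everywhere. Everything here is an assumption for illustration: the 4-byte slot size, the flat `memory` layout, and the helper names `store` and `load_via_int16`. The point is only that once the index wraps negative, `base + index * size` lands in bytes that were never part of the array.

```python
import struct

SLOT = 4                      # assumed bytes per stored transaction (32-bit price)
HEAP_BEFORE = 32768 * SLOT    # unrelated heap bytes sitting below the array
CAPACITY = 40000              # slots actually allocated for transactions


def wrap_int16(n: int) -> int:
    """Value a two's-complement signed 16-bit index variable would hold."""
    return ((n + 0x8000) & 0xFFFF) - 0x8000


# Flat "memory": junk bytes first, then the zero-filled transaction array.
memory = bytearray(b"\xAB" * HEAP_BEFORE) + bytearray(CAPACITY * SLOT)
BASE = HEAP_BEFORE            # address where the array starts


def store(index: int, price: int) -> None:
    """Write a price using a full-width index (the correct address)."""
    struct.pack_into("<i", memory, BASE + index * SLOT, price)


def load_via_int16(index: int) -> int:
    """Read a price, but squeeze the index through a 16-bit variable first."""
    i16 = wrap_int16(index)
    addr = BASE + i16 * SLOT  # what array[i] means in C: *(array + i)
    return struct.unpack_from("<i", memory, addr)[0]


store(100, 55)
store(32768, 100)
print(load_via_int16(100))    # 55, small indexes behave
print(load_via_int16(32768))  # not 100: the read came from the junk region
```

A bounds check on the index before the read, or simply a wider index type, would have turned silent garbage into either a clean error or a correct lookup.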

    --
    Post may contain irony: discontinue use if experiencing mood swings, nausea or elevated blood pressure.
  4. Happened to me too by wandazulu · · Score: 4, Interesting

    I worked at a bank in the early 90s that had a trading system based on SQL Server and the client was written in Visual Basic 3. Apart from every other bad design choice in this system (I inherited it when the designers got promoted and started working on another, even bigger system), the all-important record counter was an integer, so when trade 32768 was posted, the application crashed, and simply could not be started again, because the first thing it did was try to show the current total (it was written for operators to use, not traders). Worse was that the counter variable wasn't a global; it was often a stack variable, and always with a different name (sometimes iCounter, sometimes iCount, sometimes x).

    The upshot was that I was able to convince management to totally scrap it and allow me to write a new one. The downside was that the idiot who designed the original system went on to spend 100 million dollars on this new, grandiose system that was also eventually scrapped, but he knew long before that his turkey wasn't going to fly, so he quit and became a lead architect at some other company.

    *Sigh*...okay, back to coding.
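A small Python sketch of why that app could never restart. Visual Basic's `Integer` was a signed 16-bit type that raised a run-time overflow error instead of wrapping, so any code path that loaded the total into one would fail once the total passed 32,767. Python's `struct` module behaves similarly for a 16-bit field, which makes it a handy stand-in. The function names below are hypothetical, not from the original system.

```python
import struct


def save_counter_16(count: int) -> bytes:
    """Pack the count into a signed 16-bit field; out-of-range values raise struct.error."""
    return struct.pack("<h", count)


def save_counter_32(count: int) -> bytes:
    """Same idea with a signed 32-bit field, good for about 2.1 billion trades."""
    return struct.pack("<i", count)


def show_total(count: int, save) -> str:
    """Hypothetical startup step: store the total, then display it."""
    try:
        save(count)
        return f"total trades: {count}"
    except struct.error:
        return "overflow"


print(show_total(32767, save_counter_16))  # total trades: 32767
print(show_total(32768, save_counter_16))  # overflow
print(show_total(32768, save_counter_32))  # total trades: 32768
```

Widening the field is the cheap fix; the harder part in a codebase like the one described is finding every local copy of the counter that still uses the narrow type.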