Slashdot Mirror


Comair Done In by 16-Bit Counter

Gogo Dodo writes "According to the Cincinnati Post, the Comair system crash was caused by an overflowed 16-bit counter. Perhaps Comair should have paid for the software upgrade to MaestroCrew." You heard it here first...

13 of 441 comments

  1. Well... by Tuxedo+Jack · · Score: 5, Funny

    It seems that 16 bits and 640K wasn't enough for them after all.

    --

    Striking fear in the authors of godawful fanfiction, I am here, appearing in darkness, Tuxedo Jack!
  2. Bugtraq covered this as well.. by EvilStein · · Score: 5, Informative

    Here's the original post:

    Hi,

    On Christmas Day last Saturday, Comair Airlines had to completely stop flying
    all of its planes due to computer problems. Comair blamed the computer
    problems on their pilot scheduling software being overloaded after bad
    weather earlier in the week forced many flights to be rescheduled. Comair
    now hopes to have all of its 1,100 daily flights restored by tomorrow.

    An article which was published today at the Cincinnati Post Web site
    provides some interesting details of a software failure in Comair's pilot
    scheduling software:

    How it happened
    http://www.cincypost.com/2004/12/28/comp12-28-2004.html

    According to the article, Comair is running a 15-year old scheduling
    software package from SBS International (www.sbsint.com). The software has
    a hard limit of 32,000 schedule changes per month. With all of the bad
    weather last week, Comair apparently hit this limit and then was unable to
    assign pilots to planes.

    It sounds like 16-bit integers are being used in the SBS International
    scheduling software to identify transactions. Given that the software is 15
    years old, this design decision perhaps was made to save on memory usage.
    In retrospect, 16-bit integers were probably not a good choice.

    An anonymous message posted to Slashdot the day after Christmas first
    described the software failure at Comair:

    http://slashdot.org/comments.pl?sid=134005&cid=11185556

    Earlier this year, an overflow of a 32-bit counter in Windows shut down air
    traffic control over southern California for 3 hours:

    Microsoft server crash nearly causes 800-plane pile-up
    http://www.techworld.com/opsys/news/index.cfm?NewsID=2275

    This problem occurred because of a known design flaw in older versions of
    Windows:

    http://tinyurl.com/5n9gc

    Richard M. Smith
    http://www.ComputerBytesMan.com

    1. Re:Bugtraq covered this as well.. by dmccarty · · Score: 5, Insightful
      It sounds like 16-bit integers are being used in the SBS International scheduling software to identify transactions. Given that the software is 15 years old, this design decision perhaps was made to save on memory usage. In retrospect, 16-bit integers were probably not a good choice.

      Rubbish. Don't judge yesteryear's programs by today's standards. Back then 4MB RAM cost more than $200. That's how important memory conservation was. In 1989 using an int was a perfectly acceptable choice. If you were programming back then you'd know how loath programmers were to use longs when they didn't have to. (Granted, an unsigned int would've worked better here, but that 64K limit could've also been reached.)

      The software spec probably says something to the effect of "Don't attempt to schedule more than 32,767 crew changes." If you're running software that's more than a decade old you need to know what the limits of your software are.
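
To put numbers on the int versus unsigned int point, here is a minimal C sketch. It assumes the counter really was a 16-bit integer, which the article does not confirm, and the helper names are made up for illustration.

```c
#include <stdint.h>

/* Room left before a signed 16-bit counter reaches its maximum (32,767). */
int32_t room_signed16(int16_t used) {
    return (int32_t)INT16_MAX - (int32_t)used;
}

/* Room left before an unsigned 16-bit counter reaches its maximum (65,535). */
int32_t room_unsigned16(uint16_t used) {
    return (int32_t)UINT16_MAX - (int32_t)used;
}

/* Unsigned arithmetic wraps modulo 65,536, so the value after 65,535 is 0. */
uint16_t next_unsigned16(uint16_t n) {
    return (uint16_t)(n + 1u);
}
```

Going unsigned only doubles the headroom: after 32,000 changes a signed counter has 767 left and an unsigned one has 33,535, and past 65,535 the unsigned counter silently wraps to 0.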

      --
      Have fun: Join D.N.A. (National Dyslexics Association)
    2. Re:Bugtraq covered this as well.. by imsabbel · · Score: 5, Informative

      $200 for 4MB? That's more 1994 than 1989...

      --
      HI O WISE PRINCE. WHT TOOK U SO DAM LONG?
  3. Re:Maybe it had "worked just fine" for them? by kirun · · Score: 5, Insightful

    It's interesting because it provides a lesson in software design - arbitrary limits will trip you up eventually. It's not as if nobody knew to avoid them before, though.

    --
    I'm scared of numbers that can't be written as a fraction. It's an irrational fear.
  4. Re:Maybe it had "worked just fine" for them? by jedidiah · · Score: 5, Insightful

    This assumes that they had the resources. Given the current competitive environment in terms of consumer price and fuel costs, it would not be surprising if IT got the short end of things.

    --
    A Pirate and a Puritan look the same on a balance sheet.
  5. Let's try to remember by CodeWanker · · Score: 5, Funny

    That when you are talking about an airline, a COMPUTER crash is by far the least traumatic kind you can have.

    --


    "Wow. Now THAT'S a lot of angry Indians." - Lt. Col. George Armstrong Custer
  6. Re:Comair? by buckeyeguy · · Score: 5, Insightful
    Potential trolls aside, Comair is a regional air carrier, based at the Greater Cinci airport, that was bought up by Delta, and turned into their secondary route provider. They handle both short and medium-range non-stop flights (i.e. Ohio to Atlanta or Orlando). So it's more closely-related than the code-sharing arrangement that some carriers have.

    Now my question would be, since they're owned by Delta, why wouldn't Comair flights be handled within Delta's own reservation/flight tracking system?

    p.s. I've traveled through CVG, on Delta, during the holidays. Not anymore... One weather-delayed flight and the whole system falls apart.

    --
    I'd have a personalized plate on my car, but "toxic bachelor" won't fit into 7 letters.
  7. Re:From Another article... by Anonymous Coward · · Score: 5, Funny

    >... 32K crew changes in a month? that's like 1,000 a day? that's crazy!

    You aren't by any chance the original developer of this software?

  8. Error checking is the real culprit by Anonymous Coward · · Score: 5, Insightful

    So it turned out to be problematic to use a signed 16-bit integer.

    But the real problem is a lack of error checking. It sounds like the code had something like:

    int num_crew_changes; ...
    crew_change_list[++num_crew_changes] = blah;

    And the counter wrapped and the system crashed.

    The code should have said:

    if (num_crew_changes == MAXINT)
    {
        ERROR(E1234, "too many crew changes");
    }

    The system is still degraded after 32767 crew changes. It might be so degraded as to be unusable. But at least the company would know the extent of the degradation and could pull out the appropriate "Plan B". It's much safer and better to work around a known problem of known scope than to work around a system crash when you don't know the exact problem.
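
The check described above can be sketched as a small self-contained C example. Nothing is known about the actual SBS code, so the names `crew_change_log`, `record_crew_change`, and `MAX_CREW_CHANGES` are hypothetical, and the 16-bit counter is an assumption carried over from the article.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_CREW_CHANGES INT16_MAX  /* 32,767: assumed capacity of the counter */

/* Hypothetical record of one month's schedule changes. */
struct crew_change_log {
    int16_t count;  /* number of changes recorded so far */
};

/*
 * Record one more change. Returns false and leaves the count untouched
 * when the log is full, so the caller can report "too many crew changes"
 * instead of letting the counter wrap.
 */
bool record_crew_change(struct crew_change_log *changes) {
    if (changes->count >= MAX_CREW_CHANGES) {
        return false;  /* refuse rather than overflow */
    }
    changes->count++;
    return true;
}
```

The limit is still there, but hitting it becomes a reported condition with a known scope rather than a crash.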

  9. Re:Maybe it had "worked just fine" for them? by Remlik · · Score: 5, Informative

    bet *now* they'll upgrade, but until this particularly hairy situation arose, they didn't really see a need to upgrade a computer scheduling system that had been working great for them.

    RTFA RTFA RTFA - The new system goes live in January. Good god, it's like herding cats around here.

    Gotta love /. when you can get modded +5 insightful without RTFA AND posting verbal vomit....

    --
    Apple free since 1990!
  10. ComAir Now Hiring IT People by JavaDev04 · · Score: 5, Funny

    Hey everybody! Comair is hiring Unix System Administrators and IT Software Engineers! http://www.comair.com/hr/other/

  11. Re:actually... by nuclearspike · · Score: 5, Funny

    I heard it from the ComAir desk at the airport when I was trying to get home. :(