Comair Done In by 16-Bit Counter
Gogo Dodo writes "According to the Cincinnati Post, the Comair system crash was caused by an overflowed 16-bit counter. Perhaps Comair should have paid for the software upgrade to MaestroCrew." You heard it here first...
I believe this will answer your question:
Tom Carter, a computer consultant with Clover Link Systems of Los Angeles, said the application has a hard limit of 32,000 changes in a single month.
"This probably seemed like plenty to the designers, but when the storms hit last week, they caused many, many crew reassignments, and the value of 32,000 was exceeded," he said.
So it sounds like a signed int.
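A quick sanity check on that guess, as a rough sketch: the article only says "32,000", so the 16-bit width and the names below are assumptions made for illustration, not anything confirmed about the scheduling software.

```python
# Illustrative only: compare the reported limit with the natural 16-bit ceilings.
BITS = 16                          # assumed counter width
SIGNED_MAX = 2 ** (BITS - 1) - 1   # largest signed 16-bit value
UNSIGNED_MAX = 2 ** BITS - 1       # largest unsigned 16-bit value
REPORTED_LIMIT = 32_000            # figure quoted in the article

def nearest_ceiling(reported):
    """Return the name of the 16-bit ceiling closest to the reported figure."""
    candidates = {"signed": SIGNED_MAX, "unsigned": UNSIGNED_MAX}
    return min(candidates, key=lambda name: abs(candidates[name] - reported))

print(SIGNED_MAX, UNSIGNED_MAX, nearest_ceiling(REPORTED_LIMIT))
```

An unsigned 16-bit counter would have allowed roughly twice as many changes, which is why "32,000" reads like a rounded signed limit.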
It was a joke! When you give me that look it was a joke.
Here's the original post:
Hi,
On Christmas Day last Saturday, Comair Airlines had to completely stop flying
all of its planes due to computer problems. Comair blamed the computer
problems on their pilot scheduling software being overloaded after bad
weather earlier in the week forced many flights to be rescheduled. Comair
now hopes to have all of its 1,100 daily flights restored by tomorrow.
An article which was published today at the Cincinnati Post Web site
provides some interesting details of a software failure in Comair's pilot
scheduling software:
How it happened
http://www.cincypost.com/2004/12/28/comp12-28-200
According to the article, Comair is running a 15-year old scheduling
software package from SBS International (www.sbsint.com). The software has
a hard limit of 32,000 schedule changes per month. With all of the bad
weather last week, Comair apparently hit this limit and then was unable to
assign pilots to planes.
It sounds like 16-bit integers are being used in the SBS International
scheduling software to identify transactions. Given that the software is 15
years old, this design decision perhaps was made to save on memory usage.
In retrospect, 16-bit integers were probably not a good choice.
An anonymous message posted to Slashdot the day after Christmas first
described the software failure at Comair:
http://slashdot.org/comments.pl?sid=134005&cid=11
Earlier this year, an overflow of a 32-bit counter in Windows shut down air
traffic control over southern California for 3 hours:
Microsoft server crash nearly causes 800-plane pile-up
http://www.techworld.com/opsys/news/index.cfm?New
This problem occurred because of a known design flaw in older versions of
Windows:
http://tinyurl.com/5n9gc
Richard M. Smith
http://www.ComputerBytesMan.com
Having once done tech support for the Maestro program used by Comair (and other scheduling software for other airlines as well), I think the software is junk. The employees undoubtedly said "I told you so!" when it broke, because they hated it as much as the support team did. IMO the airline didn't bother upgrading because they didn't think the old version was broken enough or outdated enough to warrant it.
I bet *now* they'll upgrade, but until this particularly hairy situation arose, they didn't really see a need to upgrade a computer scheduling system that had been working great for them.
Gotta love /. when you can get modded +5 Insightful without RTFA AND posting verbal vomit....
RTFA RTFA RTFA - The new system goes live in January. Good god, it's like herding cats around here.
Apple free since 1990!
Maybe Maestro should just die. My friend is a flight attendant for Southwest and has to use Maestro to plan her schedule. To use it she has to citrix into their main server and wait for an open client (I assume they have either a license or horrible programming restriction on concurrent users). On the very day that the new schedules are posted, it can take hours to log in. It's a joke.
This stuff could be handled by a team of a dozen web-based programmers (Java? C? ASP? LAMP? You pick.) in a few months. It's not difficult.
RTFA
It was a signed integer. The problem occurred at 2^15 (32,768), although the article reported it as 32,000.
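For anyone who wants to see the wraparound, here is a minimal sketch. It assumes a two's-complement signed 16-bit counter incremented once per schedule change; the function names are made up for the example, and nothing here is taken from the actual software, whose exact failure mode the article does not describe.

```python
def to_int16(n):
    """Interpret the low 16 bits of n as a signed two's-complement value."""
    n &= 0xFFFF
    return n - 0x10000 if n >= 0x8000 else n

def count_changes(num_changes):
    """Simulate a hypothetical 16-bit signed change counter."""
    counter = 0
    for _ in range(num_changes):
        counter = to_int16(counter + 1)  # wraps instead of growing past 32767
    return counter

print(count_changes(32767))  # last value that still fits
print(count_changes(32768))  # one change too many: the counter goes negative
```

Whether the real program wrapped to a negative number, refused the change, or crashed outright is not stated anywhere in the thread; the sketch only shows why 2^15 is the natural breaking point.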
My wife works for Comair here in Cincinnati. The computer system under discussion was in the process of being upgraded prior to the crash. Comair's IT recognized weaknesses in the current system some time ago. The upgrade just happened to be taking a little longer than anticipated. Timing is a bitch, isn't it?