Air Traffic Snafu: FAA System Runs Out of Memory
minstrelmike writes: Over the weekend, hundreds of flights were delayed or canceled in the Washington, D.C. area after air traffic systems malfunctioned. Now, the FAA says the problem was related to a recent software upgrade at a local radar facility. The software had been upgraded to display customized windows of reference data that were supposed to disappear once deleted. Unfortunately, the systems ended up running out of memory. The FAA's report is vague about whether it was operator error or software error: "... as controllers adjusted their unique settings, those changes remained in memory until the storage limit was filled." Wonder what programming language they used?
I have actually seen something similar to this before, also involving an Air Traffic Control.
They were having some problem in handling "Large Messages", I am not sure of the exact details / circumstances - I was only peripherally involved. Anyway, the programmer wrote these to a file, then they were processed asynchronously and deleted. This minor change was tested - as usual at the site - by someone shooting an hour's production traffic through the test system and checking for unexpected aborts or other abnormalities. All was fine, the spooling file was 1% full.
The patch went online. 4 days later (it was a Sunday morning and it was snowing) the file hit some limit and refused to accept new messages. At that moment things went "Keystone Cops".
The switch was duly made and everything was working again.
It turned out that the deletion of the processed records had a bug. One hour of live data left the file 1% full. 100 hours . . . do the math. It took 5 or 10 minutes for the programmer to fix the problem, he could have done it live on the Sunday if anyone had bothered to tell him what was going on.
One of the lessons from that is also relevant here - one hour of live data left the file 1% full. I'd bet that they were testing that the new feature worked, not looking for hidden side-effects.
Mielipiteet omiani - Opinions personal, facts suspect.
Surgeons leave tools in patients because they have no process when operating on a patient. Read the Checklist Manifesto sometime and read what the author has to say about best practices in the operating room. Everyone makes mistakes. The process we follow is what allows us to catch those mistakes, and prevent any mistakes from re-occurring.
the growth in cynicism and rebellion has not been without cause