Slashdot Mirror


Computer Error Grounds Japanese Flights

zephiros writes "Mainichi Daily News reports that a "computer glitch" in Tokyo air traffic control systems resulted in the cancellation of 203 flights this weekend. At 7am Saturday, the error "caused the names of airlines and flight numbers to disappear from radar screens." A Japan Times article suggests the problem may be related to upgrades on a system which exchanges flight plans with the Defense Agency. Makes one wonder about the integration and maintenance risks of systems like CAPPS II."

41 of 154 comments

  1. Risk Maintenance 101 by gostats · · Score: 4, Interesting

    I've worked quite a bit with risk maintenance. Most often, situations like these increase the budget for disaster prevention and other related expenses. This failure *should* lead to fewer failures in the future and a generally safer airport. But then again, that all depends on how much passion they have for their job.

    Maybe I should take a trip to Japan in a few months.

    1. Re:Risk Maintenance 101 by kryonD · · Score: 3, Informative

      Actually, the damage was almost minimal to the Japanese air system. The delay only lasted 50 minutes. Unlike American travellers, Japanese people will quietly and orderly board a fully booked 747 in under 20 minutes. If asked to hurry, they will board it even faster. That combined with Narita and Haneda's ability to handle traffic far above their average had most flights back on time before noon. Only a small handful of international passengers may have had to rebook a connecting flight. Domestic flights are almost always direct.

      As far as risk management, had there actually been a perceived emergency due to the malfunctioning radar display system, the airports would default to an agreement with Yokota and Atsugi US airbases to provide fallback flight control facilities.

      This is really a non news item. The system administrator correctly applied upgrades during non-critical operation time. (i.e. not during the main business week) The problem was identified early on and corrected pretty damned quick. This happens hundreds of times a week all over the world. Had the glitch actually halted the entire Japanese air system for a long period of time, then it would make more sense.

      --
      I've dirtied my hands writing poetry, for the sake of seduction; that is, for the sake of a useful cause. --Dostoevsky
  2. Re:Windows? by SILIZIUMM · · Score: 4, Funny

    Sure, look at the photo: http://web.ukonline.co.uk/eric.price/humour2/AIRPORT.jpg

  3. Why? by Cyno01 · · Score: 3, Insightful

    Why even bring up CAPPS II? It has nothing to do with air traffic control, only with passenger data.

    --
    "Sic Semper Tyrannosaurus Rex."
    1. Re:Why? by abh · · Score: 3, Funny

      Because this is Slashdot, where every article must somehow involve a violation of rights by the big bad government.

    2. Re:Why? by 56 · · Score: 4, Insightful
      Because the program that caused the error was similar to CAPPSII.

      To quote the article:

      A Japan Times article suggests the problem may be related to upgrades on a system which exchanges flight plans with the Defense Agency.

    3. Re:Why? by zephiros · · Score: 3, Interesting
      Because it's a large system that will have to integrate with numerous airline systems from god knows how many vendors. And it will need to be maintained and patched. And it's a potential single point of failure (from a software standpoint; obviously they could stripe it across as much hardware as needed).

      Even if CAPPS is only connected to ticketing and passenger information, a bug could result in a pretty nasty transportation snarl. Suppose airlines are unable to issue boarding passes for an hour, or an unusually large number of people were flagged for screening.

      For any of these total-information-awareness type systems, one has to ask "what happens when some part of the patchwork breaks?" Even the most diehard "I have nothing to hide from my government" type understands that multi-hour flight delays are bad.

  4. What does CAPPS II have to do with this? by revmoo · · Score: 5, Insightful
    "Makes one wonder about the integration and maintenance risks of systems like CAPPS II."

    Does that seem like flamebait to anyone else? Computers crash all the time. Granted, steps can be taken to ensure redundancy, but this is nothing new. This problem has nothing to do with the CAPPS II system other than the fact that they are both computerized systems. I'm not trying to defend CAPPS II; I just don't think that it is in any way related to this Tokyo airlines problem. Computers crash, it's a fact of life. The real question here is why there weren't multiple redundancies in place for such a mission-critical application.

    --
    I would expect such blatant racism on Fark, but on Slashdot? Mods please ban this asshole.
    1. Re:What does CAPPS II have to do with this? by Phroggy · · Score: 4, Insightful

      Computers crash, it's a fact of life...

      Been listening to Microsoft too much lately, eh? It shouldn't be something we take for granted.

      --
      $x='S24;r)>63/* h@<5+oZ)32"5cz';$me='phroggy'x$];
      $x=~y+ -xz+\0-Tx+;print$_^chop$me for split'',$x;
    2. Re:What does CAPPS II have to do with this? by revmoo · · Score: 3, Insightful
      I take it you've never had a kernel panic then...

      What about hardware failures? Even the best code still has bugs in it, and the potential to fail

      --
      I would expect such blatant racism on Fark, but on Slashdot? Mods please ban this asshole.
    3. Re:What does CAPPS II have to do with this? by Anonymous Coward · · Score: 2, Funny
      Even the best code still has bugs in it,

      I've never been able to crash helloworld.c.

    4. Re:What does CAPPS II have to do with this? by The_dev0 · · Score: 2, Funny
      Yeah, but that is just a cover for this conspiracy. I was on one of these flights, and the captain announced:

      "Uh, folks, we're experiencing some moderate Godzilla-related turbulence at this time, so I'm going to go ahead and ask you to put your seatbelts back on. When we get to 35 thousand feet, he usually does let go, so from there on out, all we have to worry about is Mothra, and, uh, we do have reports he's tied up with Gamera and Rodan at the present time. Thank you very much."

      Don't believe the lies!

      --
      Never fight naked, unless you're in prison...
  5. redundancy by Brigadier · · Score: 4, Insightful



    Am I the only one wondering why there was no redundancy? Effective redundancy, that is. One would think something as important as air traffic control should have several layers of complete redundancy. If a control tower has, say, a catastrophic failure, there is another a.) civilian or b.) military control center able to hand off instructions, which would include all flight information: passengers, cargo, flight log, flight plan, everything.

    1. Re:redundancy by Mr+Rohan · · Score: 2, Interesting

      Am I the only one wondering why there was no redundancy?

      Typically there are redundant systems as well as manual processes - in Sydney Australia there's even a redundant tower, which is used if the main tower stops working (e.g. major power problems).

  6. Re:This is wonderfil news for opensource! by fewnorms · · Score: 2, Funny

    Euhhh . . . . wasn't that Linus Thorvalds? =]

    --
    Veni, Vidi, Velcro!
  7. ATC and CAPPS II are NOT connected by MyNameIsFred · · Score: 4, Informative
    Obviously, upgrades to Air Traffic Control (ATC) systems and to the communication links to ATC can cause problems. There is a significant safety-of-flight issue. Therefore, the FAA maintains strict control of these systems and, in fact, has a dedicated network reserved for ATC. Only "essential" programs and systems are allowed to connect to it.

    Passenger listings, airline booking systems, and related software are NOT connected to the ATC network. Since CAPPS II looks at booking data, credit card info, and related data, it would not be connected to the ATC network.

  8. how do they test the system? by NotAnotherReboot · · Score: 3, Interesting

    Out of curiosity, how does one go about testing a system like this? Do they test changes to the code in a live system (not using the newer version, just looking at it along with the old one)? Are there flight emulators that will feed fake data to the software, which in turn displays what it is receiving? Do they do extensive testing between new systems that perform different functions yet interface as well? It seems to me a large part of the budget for these projects has to be testing.

  9. Anyone see the other news on this site?! by caluml · · Score: 4, Insightful

    Anyone see the other news on this site?!

    Police recover rock climber's body after fatal fall
    Motorcyclist dies after being hit by a truck
    61-year-old jobless man fatally abuses senile mother
    Dad dies of shock after son's repeated beatings
    Comic questioned over hitting woman in restaurant
    Death row inmate dies in prison cell

    Can someone in Japan please confirm that this is a freaky, awful day, and that Japan isn't normally this bad?

    Although that last one is quite ironic.

  10. Re:Computers are just too fragile? by caluml · · Score: 2, Funny

    Your hot-standby mouse should have kicked in and taken over without you noticing.. ;)

  11. Re:2 things I want to know... by Anonymous Coward · · Score: 5, Informative
    > 1) How the hell did the flights get DOWN once the radar died? It said they disappeared from radar, and you don't keep radar on the planes that are on the ground, so....?

    Read the article. It says that just the airline name and flightnumber tags printed beside the radar blips vanished. The radar worked just fine.

    > 2) Whose bright idea was it to do a "systems upgrade" while there were large, flying metal objects carrying many people still in the air?!?!

    Read the article. The change was made early in the morning on a weekend. When would you suggest?

    > Wouldn't you do a test run, install it on a backup system, or one that's not systems-critical?

    The article (did you read it?) hints that might have been a networking problem when they integrated the military database with the civilian database. A backup system is a good first start, but isn't always the same as the production system. Network problems can't always be perfectly tested or simulated.

  12. The explanation by anon*127.0.0.1 · · Score: 5, Funny

    I think it's obviously Y2K related. Civilization as we know it should be coming to an end in a week or so.

    --
    I am NOT a man!
    I am a free number!
  13. Who needs computers? by ignoramus · · Score: 2, Funny

    "Computers are just no good," said one 51-year-old company manager leaving for Sapporo. "I'm sure they're helpful, but they're just too fragile." Uh, yeah, I also have a feeling they may be a little helpful. Good luck controlling 70 percent of all air traffic in Japan with abacii and the Everyday Memory Builder...

  14. ATC?It's a big tower, but that's not important now by Alien+Being · · Score: 4, Funny

    Loger Murdock: We have crearance Crarence.
    Captain Oveur: Loger, Loger. What's our vector Victor?
    Tower voice: Tower's ladio crearance, over!
    Captain Oveur: That's Crarence Oveur! Oveur.
    Tower voice: Loger.
    Roger Murdock: Huh?
    Tower voice: Loger, over.
    Roger Murdock: Huh?
    Captain Oveur: Huh?

  15. some glitch by LuxFX · · Score: 4, Interesting

    If this was an error in the code, then how were they able to repair it in just 54 minutes? That's a pretty narrow window when it comes to rounding up the programmers, searching through the source, then repairing, testing, redistributing to the entire system, and rebooting the whole thing.

    Kind of like how Hugh Jackman can hack into the DoD from a computer he's never touched before in Swordfish.

    I'm tempted to think that this was much more human error than a bona fide "computer glitch". Maybe that 54 minutes was the time it took to call in their expert, have him look at the system, and declare "Why, you must have hit F11, which toggles the flight information. Just hit it again and it comes back."

    --
    Punctanym: alternate spelling of words using punctuation or numerals in place of some or all of its letters; see 'leet'
  16. Yes, it is that bad by Anonymous Coward · · Score: 2, Informative

    I've lived here for several years now, and the above stories really are an average selection. On a true freaky, awful day, you would see stories far worse.

  17. Computer or Programmer error? by Technomancer · · Score: 5, Insightful

    Was it the computer that failed some operation, or a lousy programmer who made a mistake in the program?
    I am sick of people complaining about "computer errors" when they are at fault.

  18. THIS is why you don't upgrade by NineNine · · Score: 2, Interesting

    ... or at least upgrade as little as possible. No matter how much planning and testing is done, upgrades can and will screw things up. I'm always reading things like "luckily, you can recompile the new kernel every week or so" or "a new version is coming out so I have to upgrade", and I'm thinking... yeah, at home, maybe, if you have nothing better to do. But this is an extreme example of why companies that are worth their salt don't upgrade at the drop of a hat.

  19. Re:2 things I want to know... by mickwd · · Score: 2, Insightful

    "Whose bright idea was it to do a "systems upgrade" while there were large, flying metal objects carrying many people still in the air?!?!"

    Actually, there are planes in the air most hours of the day. There is no time when planes aren't flying.

    The best time (when there are fewest planes) may be at night. But that's just the time when the people actually doing the upgrade are going to be half asleep.

    "Wouldn't you do a test run, install it on a backup system, or one that's not systems-critical?"

    I'm sure they did. But the live system is bound to be different in some small way
    - maybe a different (more powerful) system, which might cause different timing issues;
    - maybe a different disk configuration, perhaps with a file system running out of space (e.g. more online logs);
    - maybe the live database (if any) is different to that on the backup system.

    These things can easily go wrong. In my experience, it's vital to ensure you have a way of getting some sort of system operational if you do screw up. Maybe ensure a backup system is capable of running live first, then attempt the update of the live system, and if it goes wrong, you have a backup system capable of operating until you can correct the screw-up.
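
    The order of operations described above (prove the backup can run live, then touch the live system, then fall back if it goes wrong) could be sketched roughly like this. It is only an illustration: `System`, `healthy`, `apply_upgrade` and `upgrade_with_fallback` are made-up names, not anything from the Tokyo ATC system or any real tool.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class System:
    name: str
    version: str
    healthy: Callable[[], bool]  # hypothetical health probe for this system


def upgrade_with_fallback(live: System, backup: System, new_version: str,
                          apply_upgrade: Callable[[System, str], None]) -> System:
    """Return the system that should carry live traffic after the attempt."""
    # Step 1: refuse to touch the live system unless the backup can take over.
    if not backup.healthy():
        raise RuntimeError("backup cannot run live; upgrade aborted")

    # Step 2: attempt the upgrade on the live system only.
    try:
        apply_upgrade(live, new_version)
        if live.healthy():
            return live
    except Exception:
        pass  # handled the same way as a failed health check below

    # Step 3: the upgrade went wrong, so the untouched backup carries traffic
    # until the live system is corrected.
    return backup
```

    The point of the sketch is only the ordering: the backup is checked before anything is changed, and it is never upgraded in the same step as the live system.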

  20. DATELINE: JAPAN by infonography · · Score: 4, Funny
    DATELINE: JAPAN (maybe)

    Computer related story about a programming error halting Air traffic control system in Japan is entered in a pre-posting queue to Slashdot.

    DETAILS: Limited and not noteworthy.

    REAL NEWSWORTHYNESS: Not really. No deaths reported.

    DATELINE: SLASHDOT HQ

    PREPOST WORD SEARCH: code runs check for Important items. - keyword search generate matches for two known hot item words [COMPUTER & JAPAN]

    HENTAI AND GIANT ROBOT FACTOR?: n/a

    CUTE BABE?: n/a

    SEARCH FOR BIG NAMES- JOBS, ELLISON, GATES, TORVALDS, STALLMAN, CowboyNeal?: n/a

    Microsoft Bashing Factor: High

    PRIMARY ACTION TAKEN: Story authorizes posting of story to Slashdot

    SECONDARY ACTION TAKEN: activate Inquisitors of the Holy Order of Linux, First Poster Squad IM'ed, new Sex story featuring Whicky the slashdot cat beta authorized.

    STATUS REPORT: Status Quo Achieved.

    RESOLUTION: Computer error found between keyboard and chair

    --
    Sorry about the writing. Robot fingers, you know? Cliff Steele in DOOM PATROL #23
  21. Slashdotted by banzai75 · · Score: 5, Funny

    At 7am Saturday, the error "caused the names of airlines and flight numbers to disappear from radar screens."

    I'm guessing there was an article posted yesterday on Slashdot that linked directly to their system.

  22. Everything fails. by T-Ranger · · Score: 2, Insightful
    Water mains under the street break. Suspension bridges collapse. Buildings collapse. Ships founder and sink. Brakes on cars fail, causing crashes. Trains derail.

    No, computers shouldn't crash. But they will eventually fail, just like everything else will.

  23. This 80's Show by Tablizer · · Score: 2, Funny

    If this was an error in the code, then how were they able to repair it in just 54 minutes? That's a pretty narrow window when it comes to rounding up the programmers, searching through the source...

    If this was the 80's, I could say: "Their programmers are Samurai trained, and if they don't work fast and accurately, they have to commit hara-kiri (disembowelment) in front of their peers", and everybody would believe me. Guess I'll just have to make up shit about Islam instead.

  24. Japanese software industry. by coday · · Score: 2, Insightful

    Having worked in the software industry here in Japan for the last two years, I have had my eyes opened to the true state of affairs. Most 'westerners' have an idealized view of the high-tech world of Japan. This is far from reality. The fact is that software development here is at best poorly done: little design, short timelines (okay, that one is universal), and a lack of quality assurance. I can't say why this is the case, but shoddy products are in abundance. It may be the result of trying to shove a relatively new industry into an old-style organization, or the lack of individualism, but I'm guessing at these. This story does not surprise me. All I know is I am looking forward to returning to the industry in Canada.

  25. A few thoughts on redundancy. by muonzoo · · Score: 5, Informative
    I think this is one of those rare times where I have an opinion that's actually relevant. :-)

    First, people need to understand that no Bad Things will happen if an ATC system goes offline while planes are under its jurisdiction. ICAO member countries (and most nations for that matter) have strong procedural rules in place that keep planes separated without the help of radar. This is especially true in the enroute case. (Area control centres handle overflight and enroute traffic.) Everyone is separated by at least 1000' vertical and 3 miles horizontal at all times. The altitude restrictions and clearances that each pilot receives are chosen specifically so that in the event of loss of communications, the pilot can continue to his "clearance limit" without any problem. Well, you ask, what happens when he gets to his clearance limit and still isn't communicating with air traffic control? They hold. This is all laid out quite clearly. These rules have been around since before RADAR because that's the way it was done.
    Just take a look at the RADAR coverage map of Canada (one is visible at the link above). There are lots of places that don't even HAVE radar coverage.
    The old tried and true clearance and time/speed based conflict resolution works and works well.

    Secondly, and more importantly, there really isn't any news in this article. It's scaremongering. This happens all the time. It's an inconvenience, but rarely a safety concern.

    For those who asked about it; yes, typically a new system is run in parallel with the legacy system for a period of time (sometimes 24 months) before it is used as the primary control. Notice that the old system is live and the new system is shadowing. That way, anomalies that are found do not impact any flights.
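
    For anyone who wants the shadowing idea in concrete terms, here is a minimal sketch. Both systems see the same input, only the legacy answer is ever used, and disagreements are just logged. The function names and the flight-ID strings in the usage note are invented for illustration; real ATC software is obviously nothing like this simple.

```python
from typing import Any, Callable, Iterable, List, Tuple


def run_in_shadow(inputs: Iterable[Any],
                  legacy: Callable[[Any], Any],
                  candidate: Callable[[Any], Any]) -> Tuple[List[Any], List[tuple]]:
    """Return (outputs actually used, anomalies seen in the shadow system)."""
    used = []
    anomalies = []
    for item in inputs:
        primary = legacy(item)   # the legacy system stays in control
        used.append(primary)
        try:
            shadow = candidate(item)  # the new system only watches
        except Exception as exc:
            # A crash in the shadow system is recorded, never propagated.
            anomalies.append((item, primary, repr(exc)))
            continue
        if shadow != primary:
            anomalies.append((item, primary, shadow))
    return used, anomalies
```

    Calling `run_in_shadow(["ana45", "jal123"], str.upper, buggy_labeler)` with a hypothetical `buggy_labeler` that blanks one label would still hand controllers the legacy labels, while the blank one shows up in the anomaly list.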
    [*flame proof underwear on*]
    Is it just me, or does the press dig around for 'news' in about as diligent a manner as Slashdot?
    1. Re:A few thoughts on redundancy. by Microlith · · Score: 2, Informative

      Mainichi Daily News (daily daily news) is often regarded (especially MDN English) as being a tabloid.

      Generally they go for sensational headlines and stories (their "Wai-Wai" section is the most popular).

    2. Re:A few thoughts on redundancy. by lommer · · Score: 2, Funny

      Does the ICAO [icao.org] have strong procedural rules in place on what to do in the event of a slashdotting?

      Might be time to get out the rulebook...

    3. Re:A few thoughts on redundancy. by Oswald · · Score: 2, Informative
      Hmmm. Perhaps I can help with a few misconceptions here, based on over 19 years of air traffic control experience at Atlanta Center.

      First, people need to understand that some bad things might happen if enough ATC systems go offline at once. Bad things are less likely to happen, as the poster states, if the outages occur in the enroute (my) environment, because the planes are generally farther apart than in terminal airspace. (Picky notes: enroute separation is 5 miles (not 3) OR 1000 feet--not AND--but I'm sure that was just a misstatement.) But they're not THAT far apart. This post makes it sound like any time we want to we can drop back to good old non-radar control. Well, standard separation in a non-radar environment is as high as 10 minutes flying time (longitudinally, which is to say along the same route). That's a lot more than the five miles I was using when the radar was working. The transition will be a bit tricky, and if I have to do it for any length of time, traffic will slow to a virtual standstill.
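
      To put rough numbers on the gap between the two standards, here is a small sketch. The 5 miles, 1000 feet and 10 minutes come from the paragraph above; treating "miles" as nautical miles and picking a ground speed of 480 knots are my own assumptions for the arithmetic, not figures from the comment.

```python
RADAR_LATERAL_NM = 5.0       # enroute radar minimum (assumed to be nautical miles)
RADAR_VERTICAL_FT = 1000.0   # vertical alternative
NON_RADAR_MINUTES = 10.0     # "as high as 10 minutes" along the same route


def radar_separated(lateral_nm: float, vertical_ft: float) -> bool:
    """Either minimum is enough: 5 miles OR 1000 feet, not both."""
    return lateral_nm >= RADAR_LATERAL_NM or vertical_ft >= RADAR_VERTICAL_FT


def non_radar_spacing_nm(ground_speed_kt: float,
                         minutes: float = NON_RADAR_MINUTES) -> float:
    """Distance flown in `minutes` at a ground speed in knots (nm per hour)."""
    return ground_speed_kt * minutes / 60.0


# Assumed ground speed of 480 knots, purely for illustration.
spacing = non_radar_spacing_nm(480.0)   # 80.0 nm
ratio = spacing / RADAR_LATERAL_NM      # 16.0 times the radar minimum
```

      Under those assumptions, two aircraft on the same route that could be 5 miles apart on radar need about 80 miles between them without it, which is why traffic slows so badly.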

      What's more, it is simply not true that aircraft clearances cover eventualities like lost communications or lost radar. This is a myth, and one that new on-the-job trainees quickly get de-programmed out of their heads. It's not possible to issue clearances that are good all the way to your clearance limit--every aircraft that departs, deviates for weather, changes destination, or even changes altitude (say, for turbulence) has the potential to screw up everybody else's "perfect" clearances. We truly don't even try to come up with such clearances. As for the idea that everybody will get to their clearance limit (actually, it's the published holding pattern for the route they're on to their clearance limit--probably that was simplified for clarity) and hold, that's great until you get the part about "until their estimated time of arrival" (original poster left that part out). Now you have planes dropping out of holding (and BTW, who assigned altitudes to make sure 6 aircraft didn't hold at the same altitude when the radios went out?), not necessarily from the bottom first, and flying to their destination airport. It's a 5-times-a-day event at hubs like Atlanta for 30+ aircraft to be scheduled over one fix in an hour--what are we gonna use for sequencing? TCAS? Common Traffic Advisory freqs? Get serious.

      I'm not trying to scare anybody here. There are redundant systems (and they're pretty well-seasoned at this point anyway, so they almost never break), and ways to get hold of aircraft through company radios, and it really is a big sky. But it doesn't do anybody any good to pretend that it's not dangerous to try to sort out a major arrival rush by looking in your fish-finder and chatting with the other pilots til the controller gets back.

      ATC was invented many decades ago because airplanes flew into each other without it. Those were props, flying to destinations with a tenth the volume of a modern hub. Maybe someday we'll have some cool hive-mind software that will allow the airplanes to sort everything out between themselves, and there won't be anymore ground-based controllers. (I won't see it in my career, cause I retire in less than 6 years.) Until that time, controllers and reliable control equipment will continue to be necessary for safety as well as expediency.

  26. Bean Counters by Jetson · · Score: 3, Insightful

    Redundancy started to suffer when the bean counters took over. Air Traffic Control is no longer an exercise in absolute safety but one of "risk management". This means that when the system designer says "I want a fully redundant hot standby system in a separate building powered from a different grid feed and on its own battery backup" the bean-counters say "you can have a warm standby (because we wouldn't want to have to pay for two software licenses) in a separate rack in the same computer room (have you looked at the cost of raised flooring lately?)". Instead of asking "what can we do to avoid a failure?" they tend to ask "how long will each failure last and how much will that cost us in lost revenue?"

  27. Re:Was It Linux Based? by fitten · · Score: 4, Insightful

    If it had been open source, this problem would have never happened. With millions of eyeballs detailing the code, we'd have found and corrected this bug before it ever occurred. Whats more, if the flaw did get thru, the operator could have jumped in and fixed the problem real time.

    OMG... man, are you brainwashed. First, as impossible as it may seem (gasp), open source software has bugs in it too. Second, even if it were open source, what million eyes would be looking at the code? I bet there isn't any source in the OSS archives that a "million eyes" have looked through. Third, you assume that the operator is a) a programmer, and b) familiar enough with the code to debug it and understand just what in the hell the code is doing anyway. Keep repeating your mantras, fanboy; may they always give you a warm tingly feeling as you say them.

  28. The Real Story??? by rm3friskerFTN · · Score: 2, Interesting
    From The DrudgeReport on 02MAR2003 @ 2204 PST

    Intelligence reports about the terrorist threat to the Hawaiian harbor bombed by the Japanese in World War II were sent to senior U.S. officials in the past two weeks and coincided with reports of the planning of a major attack by Osama bin Laden's terrorist group.

    GERTZ: Terrorists aim at Pearl Harbor; Plan to hijack airliners, fly them into nuclear subs

    --

    I believe Juanita

    1. Re:The Real Story??? by rm3friskerFTN · · Score: 2, Interesting

      The Washington Times has still more details.

      --

      I believe Juanita