Slashdot Mirror


Air Traffic Snafu: FAA System Runs Out of Memory

minstrelmike writes: Over the weekend, hundreds of flights were delayed or canceled in the Washington, D.C. area after air traffic systems malfunctioned. Now, the FAA says the problem was related to a recent software upgrade at a local radar facility. The software had been upgraded to display customized windows of reference data that were supposed to disappear once deleted. Unfortunately, the systems ended up running out of memory. The FAA's report is vague about whether it was operator error or software error: "... as controllers adjusted their unique settings, those changes remained in memory until the storage limit was filled." Wonder what programming language they used?

234 comments

  1. But, but, but... by Anonymous Coward · · Score: 3, Funny

    But nobody should ever need anything more than 640k!

    1. Re:But, but, but... by Anonymous Coward · · Score: 0

      That myth has been debunked enough, but to be fair the original alleged quote didn't say it was all anyone would *ever* need.

    2. Re:But, but, but... by msauve · · Score: 1

      ...as long as you remember to do an occasional FRE("") in your code.

      --
      "National Security is the chief cause of national insecurity." - Celine's First Law
    3. Re:But, but, but... by Richard+Steiner · · Score: 2, Informative

      One advantage of many airline online transaction systems: An applications programmer cannot do a malloc equivalent.

      Programs are created with a fixed memory size, and complex applications are simply a series of program modules which pass data between each other via common memory areas or memory-mapped files.

      Memory leaks in such an environment are quite rare.

      --
      Mainframe/UNIX Bit Twiddler and long time Windows/Linux Hobbyist.
      The Theorem Theorem: If If, Then Then.
    4. Re:But, but, but... by Anonymous Coward · · Score: 0

      Mods modded up this stale hackneyed joke "+3 funny," and modded down my comment to -1 because they don't want to hear a valid point?

      Put down the fucking crack, mods!

    5. Re:But, but, but... by Anonymous Coward · · Score: 0

      need to get more by putting up into himem...

  2. no data ever really by turkeydance · · Score: 1

    disappears.

    1. Re:no data ever really by U2xhc2hkb3QgU3Vja3M · · Score: 1

      ...begins to mumble lines from Blue Skies.

  3. Software error ... by gstoddart · · Score: 5, Informative

    You can make the argument that if the software allowed the operators to crash the system, it's a software fault.

    You can also make the argument that stuff like this should have been tested in parallel with the live system so this wasn't a possibility.

    I mean, my god, what are the change management and testing practices which allowed this to only be discovered in your real system?

    I've been around a few systems which had to do with aircraft ... and the rules and practices surrounding them are pretty paranoid and rigorous, because the stakes are so high. For an actual air traffic system I'm stunned this happened.

    I guess I'm not surprised, but I am stunned.

    --
    Lost at C:>. Found at C.
    1. Re:Software error ... by AlecC · · Score: 2

      The loose description sounds like something not being garbage collected when it should have been. So no single change caused the problem. It might well have been caused by controllers playing with a new toy, in a way they would never do once it had settled in and testers would not do. It is difficult to observe heap leakage - even if you check free space after a run, it is not clear what the right value is.

      --
      Consciousness is an illusion caused by an excess of self consciousness.
    2. Re:Software error ... by Anonymous Coward · · Score: 5, Insightful

      No, no, no, no, no! The concept of garbage collecting is a reaction to poor coding practices and reliance on it is laziness. Software engineers responsible for real-time, public safety software should be capable of managing memory in their code!

    3. Re:Software error ... by Anonymous Coward · · Score: 0

      You can also make the argument that stuff like this should have been tested in parallel with the live system so this wasn't a possibility.

      It probably was, but the number of flights is going up exponentially. What the live system was handling during testing 5-10 years ago will be far less than what the software is expected to cope with today.

    4. Re:Software error ... by U2xhc2hkb3QgU3Vja3M · · Score: 5, Insightful

      Not only should they be "capable" of managing memory in their code, it should be part of the software design itself.

    5. Re:Software error ... by fahrbot-bot · · Score: 2

      I mean, my god, what are the change management and testing practices which allowed this to only be discovered in your real system?

      Don't know, probably just Government ineptitude. Let's ask free-market leaders how they handle things: Toyota (brake/accelerator pedals) or Chrysler / GM (remote access) or Boeing (Li-ion batteries) ... -- oh wait.

      --
      It must have been something you assimilated. . . .
    6. Re:Software error ... by dwpro · · Score: 4, Insightful

      Software engineers responsible for real-time, public safety software should be capable of managing memory in their code

      And surgeons responsible for cutting open live human beings should be capable of not leaving tools in the person they're operating on, but it still happens. Professionals make mistakes. Garbage collection is a useful tool to make it more difficult to screw up.

      --
      Millions long for immortality who do not know what to do with themselves on a rainy Sunday afternoon. -- Susan Ertz
    7. Re:Software error ... by jamstar7 · · Score: 5, Insightful

      Couple things to keep in mind.

      The civilian aircraft control system has been chronically underfunded for decades, since Reagan fired PATCO. One of the things they were on strike for was for better equipment to do their jobs better, easier, and with less stress. Even in the 80's, the computers and radars were dinosaurs best kept in a museum. Upgrades since then have always been a day late and a dollar short.

      The airspace above the US is the busiest in the world, and it's just getting worse. They don't even report near-misses anymore to the media unless the pilots can see each other giving them the finger. They're that common.

      Nothing will be done until 3 or 4 planes do a mid-air and the public outcry is so bad that people are ready to march on the FAA's office with torches and pitchforks. Then there will be a massive round of public firings to appease the crowd, a slight boost in funding to the FAA, followed by further deregulation of the airlines.

      Personally, with all the deregulation already, I'm surprised more planes don't shed parts along the way.

      --
      Understanding the scope of the problem is the first step on the path to true panic.
    8. Re:Software error ... by Archangel+Michael · · Score: 1

      humans make mistakes. Get over it. Even when we try not to, we still make mistakes. And also, when shit goes horribly wrong, humans make incredible decisions that are crazy insane that defy logic, that allow people to live when nobody should, even when no human mistake actually occurred.

      Sully Landing the plane on the Hudson is a great example. Probably the only way that scenario works is to land the plane there. Right guy, in the right place, thinking crazy thoughts in an emergency.

      --
      Agent K: A *person* is smart. People are dumb, stupid, panicky animals, and you know it.
    9. Re:Software error ... by SpeedBump0619 · · Score: 4, Informative

      Professionals make mistakes. Garbage collection is a useful tool to make it more difficult to screw up.

      I get this. And as a software engineer I fully agree. However, in practical terms, there shouldn't be any dynamic memory management happening at all.

      It's a real-time system. It *must* interact, on time, with all the planes that are in its domain. That should be a bounded, predictable load, or there's no way to guarantee responsiveness. Given that, an analysis should have been done on the maximum number of elements the system supported. Those elements should have been preallocated (into a pool if you want to treat them "dynamically") before actual operation began. If/When the pool allocator ran out of items it should do two things: allocate (dynamically) more, and scream bloody murder to everyone who would listen regarding the unexpected allocation.

      This is (one of) the reason(s) I generally haven't liked garbage collected languages for real time systems. There's rarely ever a way to guard against unexpected allocations, because *every* allocation is blind.
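
      A minimal sketch of the pool idea described above, in C. The names (`track_t`, `POOL_SIZE`, `track_alloc`) and the pool size of 4 are hypothetical, chosen for illustration; nothing here is taken from the FAA system. Slots are preallocated before operation begins, and an allocation past the pool falls back to `malloc` while logging an alarm:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

#define POOL_SIZE 4  /* hypothetical worst-case element count from a load analysis */

typedef struct track {
    int id;                   /* payload placeholder */
    int from_pool;            /* 1 if this slot lives in the static pool */
    struct track *next_free;  /* free-list link, used only while the slot is free */
} track_t;

static track_t pool[POOL_SIZE];
static track_t *free_list = NULL;
static int overflow_allocs = 0;  /* number of unexpected dynamic allocations */

/* Build the free list once, before operation begins. */
void pool_init(void) {
    free_list = NULL;
    overflow_allocs = 0;
    for (int i = POOL_SIZE - 1; i >= 0; i--) {
        pool[i].from_pool = 1;
        pool[i].next_free = free_list;
        free_list = &pool[i];
    }
}

/* Take a slot from the pool; if the pool is empty, allocate and raise an alarm. */
track_t *track_alloc(void) {
    if (free_list != NULL) {
        track_t *t = free_list;
        free_list = t->next_free;
        t->next_free = NULL;
        return t;
    }
    track_t *t = malloc(sizeof *t);
    if (t == NULL) {
        return NULL;
    }
    t->id = 0;
    t->from_pool = 0;
    t->next_free = NULL;
    overflow_allocs++;
    fprintf(stderr, "ALARM: pool exhausted, dynamic allocation #%d\n", overflow_allocs);
    return t;
}

/* Return a slot to the pool, or to the heap if it was an overflow allocation. */
void track_free(track_t *t) {
    if (t == NULL) {
        return;
    }
    if (t->from_pool) {
        t->next_free = free_list;
        free_list = t;
    } else {
        free(t);
    }
}

int pool_overflow_count(void) {
    return overflow_allocs;
}
```

      In this sketch the overflow counter is the thing a monitor would watch: any nonzero value means the load analysis was wrong, even though the system kept running.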

    10. Re:Software error ... by Anonymous Coward · · Score: 0

      FAA regs are very rigorous. If someone thinks seven years is a lot of time to keep records, the FAA demands 50 years of retention.

      With how critical this is to life safety, this isn't unwarranted, and in general, the FAA is known to be quite competent.

      The sad thing is, proper programming methods for producing code have been known for decades. The reason they are not put into practice is simple: programming and doing proofs of all the states software can wind up in is expensive, and with the "the second it builds successfully, it gets shipped" philosophy that is endemic to the computer industry now, it will take a multi-plane mid-air disaster to prod Congress [1] and other parties into having provable software standards [2].

      [1]: The FAA is doing its best. It is Congress that needs to cut some pork and open the purse strings.

      [2]: It isn't cheap, but Ada was developed as a language where one could prove without a doubt what states a program can end up in. Maybe it is time to return to a programming language centered around security and stability for life-safety tasks, as opposed to throwing whatever is the cheapest solution proposed.

    11. Re:Software error ... by Anonymous Coward · · Score: 5, Insightful

      Pretty sure the defined time frame of "since Reagan fired PATCO" involves being critical of every administration since then that hasn't resolved the issues. I suppose one wouldn't want basic reading comprehension to get in the way of a good partisan knee jerk lash out though.

      Americans are fucking weird.

    12. Re: Software error ... by Anonymous Coward · · Score: 0

      You are crazy. Programming languages like Java, Python, etc were designed so that memory leaks and buffer overruns are drastically reduced if not eliminated entirely.

      I dread having to go back to "older" languages since my productivity would drop significantly. Let the developers of languages worry about memory management, let developers that use those higher order languages worry about their problem space: keeping planes safely in the air in this case.

    13. Re:Software error ... by Rei · · Score: 5, Insightful

      So, I actually am a programmer for an ATC system...

      First off, this isn't as bad as it sounds as far as safety goes. One first needs to ask themselves, "what is the purpose of an ATC system?". The simple answer is, "don't ever let two aircraft exist in the same location at the same time". So any two aircraft can be separated in a) time, b) location, or c) altitude, and so long as they meet the minimum safety distances, that's all okay. Complicating this is the great variety of hardware on the aircraft, communications methods and protocols, and gaps in the information available to you, plus the wide variety in ATC systems and how they talk to each other. And there's a lot of potential instability at each stage. So basically ATC systems are massive collections of "special cases" that need to be handled on top of the basics. Maybe some line in Denmark is garbling messages that lead to you being fed bogus data. Maybe some aircraft in India's buggy hardware is for some reason spamming everyone on the network. Maybe you've got two different systems handling radar data and one says the radars are all fine and the other says they're not. Maybe the aircraft says they were at X point at Y time but some radar says something different. These are the sorts of things we have to deal with on a weekly if not daily basis, and they cause what seems like it should be a very simple piece of software to become a really huge system.

      As mentioned, there can be lots of instability. Yep, it's true, these things can be rather buggy - both hardware and software. They're usually old designs that may have been poor design from the beginning, but have had to be continually patched and patched over the course of decades. Don't like that? Throw some more funds in for new ATC systems designed from scratch, otherwise this is going to continue to be the reality (yeah, new subsystems do come in every now and then for various purposes, but old systems are slow to go away).

      So, instability and bugs can sound scary. But remember the goals of an ATC system: separation. So let's just say that you lose the whole system for a long time - what do you do? Well, you basically revert to paper, and you've got a LOT more phone calls to make. You have to allow for more separation, and because of the increased workload, you can't handle as many planes. So you have to greatly reduce the number of planes in your region - they have to divert or wait. It's big delays, which costs big money. But it's not like we just start guessing whether planes are going to run into something or not.

      Our software here is predominantly old C code with a little bit of C++, and miscellaneous tools like yacc and lex. There are changelog entries dating back to the 80s - though that's the manual changelogs, it didn't go under revision control until the late 90s. Its core uses macros to an annoying degree to emulate object-oriented design in C; macros can be nested dozens of layers deep. It makes bugs very hard to find sometimes, but it's the core of the software, so it's not something that can be easily changed. So we do our best. Yes, there are "WONTFIX" bugs that we know about, and operators have documented procedures for working around them (usually involving restarting some module - the system is very modular, you don't have to restart the whole thing to fix a part that's acting up). But we always prioritize fixing the things that get in the way of their work the most - there's a lot of direct back and forth. Again, safety always takes top priority, then throughput. Everything else is way down below on the priority list.

      Changes work through the following process. A report of a bug or feature request is made. Someone analyses it and if they think it's worth working on writes up a task and assigns it to a programmer. The programmer works on the task and when they think it's ready they submit it for code review. Another programmer looks through all of the code and tries to see if they have any complaints. After any necessary back and forth to get things r

      --
      "99 dead duelists of Dios on the wall. 99 dead duelists of Dios! Take one's ring, pass it around..."
    14. Re:Software error ... by operagost · · Score: 2

      As far as safety goes, the private airlines are heavily regulated. And maintenance is not under the purview of the air traffic controllers, so this is a red herring and "shedding parts" is mere hyperbole in any case.

      If the systems were allegedly "dinosaurs" in the 1980s, I would think they'd be causing "mid-airs" on a regular basis right now. That they are not tells me that the systems have been upgraded.

      Reagan went into office SUPPORTING PATCO. They actually endorsed him over Jimmy Carter, who had ignored them. But they decided to test Reagan after only 7 months in office by striking illegally under Federal law. They certainly had concerns, but striking is illegal. These are the realities. There are now two organizations representing controllers, so they are by no means unrepresented.

      --

      Gamingmuseum.com: Give your 3D accelerator a rest.
    15. Re:Software error ... by Anonymous Coward · · Score: 0

      i hope you have an equal distrust of malloc/free virtual memory, and tricks like auto-extending stacks.

      in fact anything except compile time allocation

    16. Re:Software error ... by Rob+Riggs · · Score: 3, Interesting

      Surgeons leave tools in patients because they have no process when operating on a patient. Read the Checklist Manifesto sometime and read what the author has to say about best practices in the operating room. Everyone makes mistakes. The process we follow is what allows us to catch those mistakes, and prevent any mistakes from re-occurring.

      --
      the growth in cynicism and rebellion has not been without cause
    17. Re:Software error ... by Anonymous Coward · · Score: 1

      Using garbage collection is the screw up.

    18. Re:Software error ... by operagost · · Score: 2

      So, when Obama and the Dems controlled everything, when Clinton and the Dems controlled everything, and neither of them fixed the problem you describe as being "Reagan fired PATCO" (which has little or nothing to do with the code and systems today, being 30+ years ago), you still chose to blame something that can only be described as tangential to the current problem.

      Indeed, Clinton had an opportunity to help loosen the union restrictions in Taft-Hartley (passed with veto override by both parties in 1947 and subsequently used by the President who vetoed it), but he did not do so. And, of course, neither has Obama as you stated. I guess Obama needs a third term to have time to get things like this done.

      --

      Gamingmuseum.com: Give your 3D accelerator a rest.
    19. Re:Software error ... by tomhath · · Score: 2, Informative

      The civilian aircraft control system has been chronically underfunded for decades, since Reagan fired PATCO.

      Reagan initiated and appropriately funded a complete overhaul of the control system.

      The illegal strike by the air traffic controllers is irrelevant.

    20. Re:Software error ... by Anonymous Coward · · Score: 0

      I wouldn't consider Clinton, her husband or Obama left wing, more like "Reagan Lite" for a more moderate flavor of objectivist conservatism.

      Of course by the current Randian standards, anything to the slight right of center is immediately branded leftist.

      Of course I'm tempted to vote for Rand Paul just because I want to see those food-stamp conservatives get their entitlements cut.

    21. Re:Software error ... by fahrbot-bot · · Score: 1

      humans make mistakes. Get over it.

      Agreed. Which, of course, was my point. Should I have included a "/sarcasm" tag?

      --
      It must have been something you assimilated. . . .
    22. Re:Software error ... by Anonymous Coward · · Score: 0

      You clearly don't know what you are talking about. Bro, do you even code?

    23. Re:Software error ... by Anonymous Coward · · Score: 1

      Right guy, in the right place, thinking crazy thoughts in an emergency.

      Actually, they weren't crazy thoughts, they were *exactly the thoughts he was trained to have,* because pilot training includes things like handling emergency unpowered landings.

      The fact that he was able to execute it so well is, again, a testament to the amount of rigorous training and practice that pilots experience.

      It wasn't some "hey here's a crazy thought, I'll land on the river, nobody's ever done this before!" It was a "hey, my training says that if you lose power to both engines, you try to find a flat, open area where you can glide the plane in and ditch it safely. And if that flat, open area happens to be water, you try to ditch it near boats, to maximize the chance that you can be rescued quickly." These were all intentional, deliberate actions the pilot took - it doesn't mean they always work out well for every pilot who tries them, and the pilot was absolutely highly skilled to bring in an unpowered A-320 to a water landing with no major loss of life, but to presume that Capt. Sullenberger was the only person capable of thinking of that response betrays a shocking ignorance of exactly what it takes to become a commercial pilot.

    24. Re: Software error ... by Anonymous Coward · · Score: 2, Insightful

      Nice job trying to dodge responsibility, right wing whacko. Your side is so busy funneling money to their big contractor buddies for stuff that makes news when it actually works (see, pretty much every defense program ever). They then blew off concerns from people doing actual work to make a political point, and your complaint is that the other side can't fix your screwups fast enough?

      How about your people stop sabotaging agencies that do useful things just so they can say that government never does anything right? (Because if government does do something right, conservatives will jump in and fix that for you.)

    25. Re:Software error ... by Anonymous Coward · · Score: 0

      This is an insightful, lucid, well structured comment that offers a good perspective of the actual development practices of people doing ATC work.

      I predict that you will end up shouted down and modded to a "-1, Troll".

    26. Re:Software error ... by phantomfive · · Score: 4, Informative

      You are trying to be sarcastic, but the MISRA standard for embedded systems includes these rules:

      1) absolutely no recursion. it could lead to stack overflows.
      2) absolutely no local variables. it could lead to stack overflows.
      3) absolutely no use of malloc or free. it could lead to stack overflows.

      So yeah, that has been an accepted approach for many years.

      --
      "First they came for the slanderers and i said nothing."
    27. Re:Software error ... by Anonymous Coward · · Score: 0

      Doing as you say suggests it's simply an array of objects representing each plane. The reality is there'll be many data structures which interact with each other. Preallocating everything is then no longer feasible.

      And just because you're preallocating space doesn't mean you can't have memory leaks. You can run out of slots because you've failed to tag it as no longer in use. Exactly the same as running out of memory because you failed to call free. All you're doing is reimplementing the memory stack that malloc/free manage, and in doing so bringing in additional code that's likely to contain bugs that malloc/free do not. You're also making it more likely that A will run out of space because it's allocated to B when B doesn't actually require it.
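
      To illustrate the point about slots never being tagged as free, here is a small C sketch. The slot table, `window_open`, and the deliberately buggy `window_close_buggy` are all hypothetical names, and the bug shown is only an assumption about how such a leak could arise, not a description of what the FAA software did:

```c
#include <assert.h>
#include <stddef.h>

#define MAX_WINDOWS 3  /* hypothetical number of preallocated slots */

typedef struct {
    int in_use;  /* 1 while the slot is considered occupied */
    int owner;   /* hypothetical controller id */
} window_slot_t;

static window_slot_t slots[MAX_WINDOWS];  /* zero-initialized: all slots free */

/* Returns a slot index, or -1 when every slot is marked in use. */
int window_open(int owner) {
    for (int i = 0; i < MAX_WINDOWS; i++) {
        if (!slots[i].in_use) {
            slots[i].in_use = 1;
            slots[i].owner = owner;
            return i;
        }
    }
    return -1;
}

/* Correct close: releases the slot so it can be reused. */
void window_close(int idx) {
    if (idx >= 0 && idx < MAX_WINDOWS) {
        slots[idx].in_use = 0;
    }
}

/* Buggy close: the window "disappears" but in_use is never cleared. */
void window_close_buggy(int idx) {
    (void)idx;
}

/* Count slots currently marked as occupied. */
int slots_in_use(void) {
    int n = 0;
    for (int i = 0; i < MAX_WINDOWS; i++) {
        n += slots[i].in_use;
    }
    return n;
}
```

      No `malloc` is involved, yet after three open/close cycles through the buggy path the fourth `window_open` fails: the pool has leaked in exactly the way a heap would.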

    28. Re:Software error ... by Anonymous Coward · · Score: 1

      You are letting your bias show. You right wing blow hard. He blamed all admins since then.

    29. Re:Software error ... by Anonymous Coward · · Score: 0

      Actually, it sounds like NOT using garbage collection was the screw up -- hence the memory leak(s).

      Yes, I know that even with a garbage collector, you can leak memory (by not de-referencing the memory), but that is an entirely different class of problem. It is not 'leaked' memory if your object is alive by being referenced by another object (ad infinitum). Basically, in garbage collected languages, you get memory de-allocation for free, you actually have to go out of your way to retain references to actually cause a leak.

      Contrast that to say 'C' and other low-level languages: you have to go out of your way to 'free' memory, even when a static-code analysis can prove that the memory can never be used (after a return from a method for example), you still have to explicitly call free() or you suffer a memory leak.
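
      As a sketch of the C case described above: the function below allocates a buffer and returns early on one path without freeing it. `counted_malloc`, `counted_free`, and `live_blocks` are hypothetical helpers added only so the leak can be observed; they are not standard library calls:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

long live_blocks = 0;  /* allocations made through the wrappers and not yet freed */

void *counted_malloc(size_t n) {
    void *p = malloc(n);
    if (p != NULL) {
        live_blocks++;
    }
    return p;
}

void counted_free(void *p) {
    if (p != NULL) {
        live_blocks--;
        free(p);
    }
}

/* Returns 0 for a non-empty message, -1 otherwise. Leaks on the error path. */
int parse_leaky(const char *msg) {
    char *buf = counted_malloc(strlen(msg) + 1);
    if (buf == NULL) {
        return -1;
    }
    strcpy(buf, msg);
    if (buf[0] == '\0') {
        return -1;  /* BUG: early return skips counted_free(buf) */
    }
    counted_free(buf);
    return 0;
}

/* Same logic with a single exit point, so the buffer is always released. */
int parse_fixed(const char *msg) {
    char *buf = counted_malloc(strlen(msg) + 1);
    if (buf == NULL) {
        return -1;
    }
    strcpy(buf, msg);
    int rc = (buf[0] == '\0') ? -1 : 0;
    counted_free(buf);
    return rc;
}
```

      The leak only appears on the rarely taken error path, which is why it can survive testing; a tool such as valgrind, mentioned elsewhere in this thread, should flag the block lost by `parse_leaky` if that path is exercised.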

      So...which is more likely: the FAA is using a low-level language that doesn't support garbage-collection, or they are using one of the newer languages and went out of their way to OOM?

    30. Re:Software error ... by stabiesoft · · Score: 1

      Umm, valgrind

    31. Re:Software error ... by Anonymous Coward · · Score: 1

      Next time a recruiter contacts you, tell him you're looking for $200k (push up all our salaries).

      Why would we want to drive down salaries in the Silicon Valley?

    32. Re:Software error ... by jittles · · Score: 2

      Software engineers responsible for real-time, public safety software should be capable of managing memory in their code

      And surgeons responsible for cutting open live human beings should be capable of not leaving tools in the person they're operating on, but it still happens. Professionals make mistakes. Garbage collection is a useful tool to make it more difficult to screw up.

      Until the entire air traffic system grinds to a halt at the same time every day while java garbage collects everything. No, garbage collection is not the answer. There are more performant ways to manage memory.

    33. Re:Software error ... by phantomfive · · Score: 4, Informative

      Garbage collection is a useful tool to make it more difficult to screw up.

      Recently I've seen a lot of memory leaks in Java and Javascript. People stick things in a hash table or a queue, then forget to remove them (angular.js also has gotchas to watch for avoiding memory leaks). Because programmers in those languages don't think about memory, they end up with more memory leaks than programmers in C.

      For a system that needs high reliability, garbage collection is not the answer, and can make things worse.

      --
      "First they came for the slanderers and i said nothing."
    34. Re:Software error ... by Anonymous Coward · · Score: 2, Insightful

      The reading from one such as this stops when they see a negative comment about Saint Reagan.

    35. Re:Software error ... by Minwee · · Score: 2

      Only because there is no moderation for "Wrong Website For That Kind Of Thing".

    36. Re:Software error ... by phantomfive · · Score: 1

      No prob, if you live in Silicon Valley, ask for $500k. Push up all our salaries.

      --
      "First they came for the slanderers and i said nothing."
    37. Re:Software error ... by Anonymous Coward · · Score: 0

      Actually ATC systems are, at most, soft real time. The nearest thing we have to hard real time is surveillance (i.e. radars etc, downstream processing and display), which has an end-to-end latency of a few seconds. The controller uses a GUI, and that GUI has to be fast enough to feel responsive. But the sort of real-time process control you are talking about, with embedded processors and hard deadlines measured in milliseconds, is just not anywhere to be seen.

      ERAM is (mostly at least) written in Ada but that just reflects the culture and economics of ATC software development, where 40-year product lifecycles are not uncommon. There are operational ATC systems written in Java that use GC and they work just fine.

      As for GC in hard real time, it depends on just how hard. There are process control applications where the latency between sensor and actuator is a key part of the stability analysis, and I wouldn't want to use GC for those. But take a look at the next "hard real time" application you see and ask yourself if a missed deadline once per hour is going to be a practical issue. For a surprising number the answer is "no".

      BTW, preallocating elements in a pool doesn't solve the allocation problem, it just renames it. It's quite possible that this is exactly how ERAM worked, but deallocating the slots from this particular pool wasn't happening.

    38. Re:Software error ... by Triklyn · · Score: 1

      one question...

      i can understand 4 planes having midair collisions... you know 4 in total.

      2 separate midair collisions...

      but in what possible universe do you imagine 3 planes colliding in total.

    39. Re:Software error ... by Anonymous Coward · · Score: 0

      OK, so how do you know that the problem wasn't that they had preallocated a pool of these objects, and when the pool ran out the failure was that the system was "screaming bloody murder"?

      dom

    40. Re:Software error ... by Anonymous Coward · · Score: 1

      People who haven't done real-time, human-life-at-risk, development just don't understand.

      Don't use dynamic memory when a human life is on the line.

      It is inconvenient, but necessary. Even the best teams make memory mistakes. This is a team activity. Testing will never find all memory issues. Garbage collection - never used it myself. Seems slow.

      When you personally know the people at risk, I suspect you are even more careful with the code. Not to forget that national pride is also at risk.

    41. Re:Software error ... by Anonymous Coward · · Score: 0

      ... who is this troll?

    42. Re:Software error ... by Anonymous Coward · · Score: 0

      Easiest way to fix this is implement your own stack.

    43. Re:Software error ... by Impy+the+Impiuos+Imp · · Score: 1

      Bugs in car software that don't crash the car can still crash the car if they distract the driver.

      Even normal features in other contexts can cause such things, like cranking up the volume before playing a CD, then the CD blasts, distracting the driver, either in immediate scare, or as they fumble for the volume in a panic.

      No, this was bad and a terrible mistake on their part.

      --
      (-1: Post disagrees with my already-settled worldview) is not a valid mod option.
    44. Re:Software error ... by swillden · · Score: 3, Insightful

      No, no, no, no, no! The concept of garbage collecting is a reaction to poor coding practices and reliance on it is laziness. Software engineers responsible for real-time, public safety software should be capable of managing memory in their code!

      Garbage collection is a red herring. The notion that "real" software engineers must use manual deallocation is just as silly as the idea that garbage collection eliminates memory leaks. Though GC actually does eliminate dangling pointer bugs... by turning them into memory leaks.

      Garbage collection is a viable and reasonable strategy for handling deallocation -- in fact it can be significantly more efficient than manual deallocation, in terms of cycles spent on deallocation -- but it's not a panacea. It doesn't eliminate the need to think about object lifetimes or memory consumption. It reduces the amount of development effort focused on those issues, trading it instead for management of GC times. Whether that tradeoff is a net benefit depends on the context and system requirements.

      And that is what real software engineers do. They don't choose their tools based on which is the manliest and best for proving their coding prowess. They choose based on the nature of the problem and the resources available. Where GC interruptions can be tolerated, or safely scheduled, GC is a tool that automates away significant engineering effort. That's a good thing. Hard real-time systems generally don't tolerate GC very well, but virtually anything that interacts with people does tolerate brief (50 ms or less) GC pauses, and that's actually quite easy to achieve.

      --
      Note to ACs: I usually delete AC replies without reading them. If you want to talk to me, log in.
    45. Re:Software error ... by swillden · · Score: 1

      3) absolutely no use of malloc or free. it could lead to stack overflows.

      You mean heap exhaustion. Use of malloc and free cannot cause stack overflows.

      --
      Note to ACs: I usually delete AC replies without reading them. If you want to talk to me, log in.
    46. Re:Software error ... by phantomfive · · Score: 1

      Yes.

      --
      "First they came for the slanderers and i said nothing."
    47. Re:Software error ... by MobSwatter · · Score: 1

      Funny how this morphed into a political shit slinging contest.

      Perhaps no one bothered to mention that the NYSE was down at the same time, or that a sunspot produced a CME that arrived at Earth a day earlier than expected, or that the Earth's magnetic field is weakening at the moment and happens to be the thing protecting us from CMEs...

    48. Re: Software error ... by ZiggyM · · Score: 1

      malloc and free do not cause stack overflow but heap memory full errors.

    49. Re:Software error ... by PuckSR · · Score: 2

      It was relevant because it was one of the issues highlighted at the time.
      Also, Reagan announced the overhaul as a RESPONSE to the strike, but it wasn't given the type of fast-track authorization that would have made it useful

    50. Re:Software error ... by turbidostato · · Score: 2

      "but in what possible universe do you imagine 3 planes colliding in total."

      I would bet that, within civil aviation, it's easier to have three planes colliding mid-air than just two or, at least, three involved with two crashing and a near-miss.

    51. Re: Software error ... by phantomfive · · Score: 1

      Yes, my mistake.

      --
      "First they came for the slanderers and i said nothing."
    52. Re:Software error ... by DomNF15 · · Score: 1

      Also, garbage collecting doesn't help for cases of handle or GDI object leaks (.NET), which can also crash the system with out-of-memory exceptions. I probably disagree to some extent with the concept of garbage collecting being a reaction to poor coding practice: how annoying is it to have to write delete or delete [] every time you allocate, or for that matter initialize a bunch of variables to 0 manually every single time?

    53. Re:Software error ... by Anonymous Coward · · Score: 0

      Seriously? You really need to review your GC knowledge. A lot has changed now that we can perform things in parallel.

    54. Re:Software error ... by Anonymous Coward · · Score: 0

      Sounds like the both of you are blaming a language feature when lack of proper testing is the culprit. Memory profiling is recommended for all long-duration processes, regardless of whether the language provides garbage collection.

    55. Re:Software error ... by phantomfive · · Score: 1

      Sounds like the both of you are blaming a language feature when lack of proper testing is the culprit.

      It's impossible to test every scenario. There are too many. Dijkstra pointed out that it's better to avoid bugs in the first place.

      Testing is still helpful, though.

      --
      "First they came for the slanderers and i said nothing."
    56. Re:Software error ... by LWATCDR · · Score: 1

      Actually it was a partisan attack.
      "The civilian aircraft control system has been chronically underfunded for decades, since Reagan fired PATCO. One of the things they were on strike for was for better equipment to do their jobs better, easier, and with less stress. Even in the 80's, the computers and radars were dinosaurs best kept in a museum. Upgrades since then have always been a day late and a dollar short."

      From the statement you can see that the attack is in the phrase "since Reagan fired PATCO".
      Yet he also states that one of the reasons for the strike was the outdated equipment they were using so the lack of funding would actually predate the firing of the members of PATCO.

      It is logical that one partisan attack would inspire others.

      BTW yes, the ATC system has had issues for decades, with a really botched upgrade. All political parties can share the blame.

      --
      See my blog http://ilovecookes.blogspot.com/ for light hearted technical information.
    57. Re:Software error ... by Anonymous Coward · · Score: 0

      You are surprised? Try reading the NTSB accident reports starting back in the 50's. They had some bad crashes and the NTSB figured out the reasons and made good suggestions to the FAA which has greatly improved all of aviation, but especially in the US. Gee, who came up with the crazy idea to have an independent investigative organization to watch over the FAA?
      PS: one of the early 80's crashes involved an FAA field director trying to take off in a DC-3. He forgot or probably didn't know you had to lock the tail wheel or the DC-3 was difficult to steer, especially at speed, as on takeoff. It crashed and burned: about 5 or 8 dead.

      The AC systems they have now are fantastically improved. Wind shear is still a problem along with CAT upsets, which will greatly increase with climate change (404ppm CO2 now). A stewardess was just reported to have suffered serious back injury from a "jolt" because she struck the ceiling in the cabin.

      Keep your seat belts on.

    58. Re:Software error ... by rrr00bb5454 · · Score: 2

      But there is also the issue of some reasonable level of proof that the code is robust; akin to the assurances you get from a good compiler that the machine language behaves like the source code. If you work within truly large C code bases (I estimate that I'm on one right now), the completely manual approach is just not good enough. Garbage collection isn't the only answer of course, but tooling is essential. In the future, higher languages are definitely going to play a role. C/C++ aren't keeping up with changes being created by multi-core. Innovations like LLVM help to keep making progress, but ultimately, embedded systems are going to look something like Rust while everything else is going to move up to a higher level abstraction. The abstraction just has to be high enough that we can get away from compilers being utterly blind, where we can ask the compiler if code is memory safe or conforms to protocols in its interfaces. (See Coq related projects producing subsets of C that can be proven correct)

    59. Re:Software error ... by Archangel+Michael · · Score: 1

      The civilian aircraft control system has been chronically underfunded for decades, since Reagan fired PATCO. One of the things they were on strike for was for better equipment to do their jobs better, easier ...

      So, the problem existed before Reagan, and after Reagan, being "chronic" but we only name Reagan because .... ???

      --
      Agent K: A *person* is smart. People are dumb, stupid, panicky animals, and you know it.
    60. Re:Software error ... by geoskd · · Score: 2

      delete or delete [] every time you allocate, or for that matter initialize a bunch of variables to 0 manually every single time?

      It's really not that difficult unless you are not very clear on how your memory is being used and how the algorithms around it perform. With a clear understanding of the system being designed, cleaning up your memory allocation is a no-brainer. The only time it presents anything remotely complicated is when the programmer doesn't understand the system they are working on. Under those circumstances, I would posit that the programmer should probably not be programming that particular project. I am well aware that stringent guidelines to that effect would knock 50% or more of the programmers out of the pool, and I am OK with that. Anyone with any sense at all should be OK with that, especially for safety critical systems.

      --
      I wish I had a good sig, but all the good ones are copyrighted
    61. Re:Software error ... by Richard+Steiner · · Score: 1

      No, no, no, no, no! The concept of garbage collecting is a reaction to poor coding practices and reliance on it is laziness. Software engineers responsible for real-time, public safety software should be capable of managing memory in their code!

      Meh. Real-time public safety software should not be running in an environment which allows an application to arbitrarily request memory.

      I've written code for multiple mainframe online transaction environments and in UNIX C/C++ and Java environments. Guess which one is more problematic when it comes to memory leaks and other similar issues?

      I'm hardly suggesting that a mainframe environment is a viable answer, but the approach that such environments take was taken for a reason. Controlled environments are far less prone to programmer stupidity than uncontrolled ones.

      --
      Mainframe/UNIX Bit Twiddler and long time Windows/Linux Hobbyist.
      The Theorem Theorem: If If, Then Then.
    62. Re:Software error ... by AmiMoJo · · Score: 1

      With systems like this you generally don't do much memory management, in fact. You statically allocate everything and then you only have the stack to worry about. It's safer that way because then there is no chance of memory leaks, or running out of memory due to allocating too much. If your code is careful to avoid recursion then the maximum stack allocation is predictable as well.

      Most embedded systems operate that way, as well as safety critical systems. In another comment Rei seems to suggest that this might not be the case for ATC software, which I find surprising and slightly worrying.
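
      The "allocate everything up front" idea can be sketched as a fixed-capacity pool. This is only an illustration in Python, with `FixedPool` and the track labels as made-up names. A real embedded system would typically use a statically sized array in C or Ada, and Python's own lists still live on the heap. The point is the behaviour: capacity is decided once, and running out is an explicit result the caller must handle, not silent growth.

```python
class FixedPool:
    """Fixed-capacity slot pool: all storage is created once, up front."""

    def __init__(self, capacity):
        self._slots = [None] * capacity      # every slot exists from the start
        self._used = [False] * capacity
        self._free = list(range(capacity - 1, -1, -1))  # stack of free indices

    def acquire(self, value):
        """Store value in a free slot and return its index, or None if full."""
        if not self._free:
            return None                      # explicit failure instead of growth
        index = self._free.pop()
        self._slots[index] = value
        self._used[index] = True
        return index

    def release(self, index):
        """Give a slot back to the pool."""
        if not self._used[index]:
            raise ValueError("slot is already free")
        self._slots[index] = None
        self._used[index] = False
        self._free.append(index)

    def in_use(self):
        return len(self._slots) - len(self._free)


pool = FixedPool(capacity=2)
a = pool.acquire("track A")
b = pool.acquire("track B")
c = pool.acquire("track C")   # pool is full, so this returns None
pool.release(a)
d = pool.acquire("track D")   # reuses the slot that "track A" gave back
```

      The worst-case memory footprint is then known before the system ever goes operational.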

      --
      const int one = 65536; (Silvermoon, Texture.cs)
      SJW, n: "Someone I don't like, and by the way I'm a fuckwit" - AC
    63. Re:Software error ... by umghhh · · Score: 1

      There are many things that should be part of software design: communication techniques, decision making, decision making in a group, project management techniques, product life cycle, maintenance, fault management, and a few others. If one concentrates on memory management, one can teach the manual ways, the semi-manual ways, and the garbage collector, which, contrary to what some think, is not some Aztec god that does things automagically but something that tracks which objects are still referenced and thus can release your objects only if they are not linked from anywhere. GC also has consequences for performance, and there are ways to tune it other than hoping it works.
      OTOH that is probably pointless anyway, and if all software were, maybe not first time right, but say 3rd time right, then the world would be less funny.

    64. Re:Software error ... by ChrisMaple · · Score: 1

      The idea that the computing power required to keep track of, and report dangers to, all the aircraft in a region is more than could be handled by a 1985 PC is preposterous. There just isn't that much going on and it isn't happening all that fast.

      At any one time, there are roughly 5500 maximum IFR flights aloft for the whole US, so any region will have a considerably smaller number. Updating status once a second means an aircraft's position might have changed by 800 feet. If this can't be handled automatically and reliably, something is well and truly screwed that can't be fixed by throwing money at hardware.
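
      As a quick sanity check on the 800 feet figure, the arithmetic can be written out. The function names are just for illustration; the conversion constants are the standard definitions of the statute and nautical mile.

```python
FEET_PER_STATUTE_MILE = 5280
FEET_PER_NAUTICAL_MILE = 1852 / 0.3048   # 1 nmi = 1852 m, 1 ft = 0.3048 m
SECONDS_PER_HOUR = 3600


def feet_per_second_to_mph(fps):
    return fps * SECONDS_PER_HOUR / FEET_PER_STATUTE_MILE


def feet_per_second_to_knots(fps):
    return fps * SECONDS_PER_HOUR / FEET_PER_NAUTICAL_MILE


mph = feet_per_second_to_mph(800)      # about 545 mph
knots = feet_per_second_to_knots(800)  # about 474 knots
```

      So 800 feet per one-second update corresponds to roughly 545 mph, which is in the range of a jet airliner at cruise.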

      --
      Contribute to civilization: ari.aynrand.org/donate
    65. Re:Software error ... by Anonymous Coward · · Score: 0

      Microsoft based system: software error
      Apple based system: user error

      In reality: hardware error.

    66. Re:Software error ... by Anonymous Coward · · Score: 0

      a. don't stick java and javascript into the same sentence. Apples -n- oranges (more of Apples and sour grapes).
      b. Hashtables and such in java have a graceful way of handling overflows, yes, an exception is likely involved, but typically don't crash an app like a C/C++ hashtable would.

    67. Re:Software error ... by Anonymous Coward · · Score: 0

      but in what possible universe do you imagine 3 planes colliding in total.

      3 or 4 collision incidents, not necessarily at the same time. The phrasing of the post in question supports this. Your comprehension is the issue.

    68. Re:Software error ... by Anonymous Coward · · Score: 0

      Just the ones carping on Slashdot... If they bothered to register, it's even worse.

    69. Re:Software error ... by Anonymous Coward · · Score: 0

      The concept of garbage collecting is a reaction to poor coding practices and reliance on it is laziness.

      Talk about overgeneralizing. Maybe in real-time systems, but in general, garbage collection can not only reduce the occurrence of many types of memory leaks (not all), but also make memory allocations quicker and increase throughput. There are certain allocation optimizations you can do when your heap is compacted.

      Next up, C is a horrible language because you should know exactly what instructions are being compiled at all times!

    70. Re:Software error ... by Anonymous Coward · · Score: 0

      Heh, I worked on many (mostly government) systems back in the 90's that did the same thing with C and macros. I hated working with that code so much. It just made no sense when straight C would have worked better in every conceivable way. Shoving OO into macros literally made every part of the development worse. Especially back in those days when macros were buggy and lacked many of the features of modern macros.

      From what I saw, the people that designed those systems had long forgotten how to even write software (maybe they did some COBOL or FORTRAN decades previously) and were just hyped up on the OO buzz. They were determined to use it no matter if it was a good idea or not. They didn't trust C++ compilers so just hammered the OO screws in with C at any cost.

      I actually rewrote several of those systems in Perl if you can believe it. It was a fraction of the amount of the original code, ran faster, was way easier to maintain, and only took a couple days to implement. (insert Perl joke here but it was pretty much the only scripting language back then outside of shell languages and I swear it is possible to write clean Perl if you want to)

    71. Re:Software error ... by Anonymous Coward · · Score: 0

      Maybe Surgeons need a garbage collector. For marketing purposes it could be called an instruments collector.

    72. Re:Software error ... by Anonymous Coward · · Score: 0

      "Absolutely no use of local variables" sounds like a recipe for concurrency bugs and for extra bugs caused by non-readability / non-maintainability of the code.

    73. Re:Software error ... by Anonymous Coward · · Score: 0

      Just declare everything as static final so it doesn't get garbage collected.

    74. Re:Software error ... by phantomfive · · Score: 1

      "Absolutely no use of local variables" sounds like a recipe for concurrency bugs and for extra bugs caused by non-readability / non-maintainability of the code.

      It means greater responsibility is on the programmers to keep their code clean. You're not going to get it done with cheap code-monkeys, you'll need to pay for people who understand how to program.

      It can be done, however.

      --
      "First they came for the slanderers and i said nothing."
    75. Re:Software error ... by TapeCutter · · Score: 1

      Better processes are the only answer; they can minimise the harm from mistakes, but they will only catch foreseeable mistakes. Some mistakes are so bizarre that they are not foreseeable, eg: I live alone and the other day I caught myself putting the soy-sauce bottle in the microwave. I meant to put it in the fridge but didn't realize my mistake until I started closing the microwave door. Auto-pilot told me the next step after closing the door was to set the timer, which meant I was forced to think about what I was cooking and for how long. At that point auto-pilot switched off and I was left wondering why I was about to nuke the soy-sauce. Auto-pilot happens to all of us; there are very few experienced drivers who haven't run the daily commute on it for the entire journey.

      --
      And did you exchange a walk on part in the war for a lead role in a cage? - Pink Floyd.
    76. Re:Software error ... by Anonymous Coward · · Score: 0

      I wrote some code for the FAA several years ago. We put a lot of effort into making a simple, correct design, and the resulting code was well received by operators.

      Later the FAA hired another company to "upgrade" our system. Gave them huge wads of cash, orders of magnitude more than we had. Our source code was also made available. New company completely ignored it, our offers of design support, lessons learned, etc. NIH, clean slate design.

      Huge dev team. Impressive presentations. Massive budget overruns. Major bugs.

      My code is still running reliably at some large international airports. The new code is slowly shaking bugs out at other airports.

      Cash and clean slates are not a cure-all. Technical competence is also required (and in short supply). The problems start with the FAA's intentionally byzantine bureaucracy. It repels efficiency and hence competence.

      We had a fairly agile dev process, with everyone involved in analysis and programming. The other company had a "throw it over the wall" process like you describe. We had a semi-automated testing process, with the ability to reprocess years worth of data and flag anomalous discrepancies for human inspection.

    77. Re:Software error ... by Anonymous Coward · · Score: 0

      "Recently I've seen a lot of memory leaks in Java and Javascript."

      No you haven't. By definition, you have not. You might have seen someone leave something in scope unnecessarily but that memory is NOT leaked.

    78. Re:Software error ... by Anonymous Coward · · Score: 0

      "Though GC actually does eliminate dangling pointer bugs... by turning them into memory leaks."

      Give us an example of a dangling pointer being turned into a leak in Java. In other words show me a pointer that points to deallocated memory and some memory that's allocated but cannot be pointed to any more (because I don't know where it is).

    79. Re:Software error ... by phantomfive · · Score: 1

      What definition are you using?

      --
      "First they came for the slanderers and i said nothing."
    80. Re:Software error ... by dave420 · · Score: 1

      I think he defines a memory leak as where you create and destroy a variable and have more used memory than before. Which is not too bad of a definition, I guess.

    81. Re:Software error ... by allcoolnameswheretak · · Score: 1

      No, no, no, no, no! The concept of garbage collecting is a reaction to poor coding practices and reliance on it is laziness.

      Not really. The concept of garbage collection is a simple one - if an object is no longer referenced anywhere, free the memory.
      This makes perfect sense and also frees the developer's mind from making sure everything is always deallocated, thereby eliminating a huge number of potential errors, such as premature memory deallocation.

      But what many novice programmers don't realize is that you must still manage your memory in a garbage collected environment. Objects that are no longer needed have to be removed from collections - all those references must be cleared so that the memory will be freed.

      So I agree with the second part:

      Software engineers responsible for real-time, public safety software should be capable of managing memory in their code!
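
      The point about clearing references out of collections can be shown with a small Python sketch. `WindowSettings`, `registry`, and the "sector-7" key are hypothetical names, and the immediate reclamation at the end relies on CPython's reference counting. A weak reference is used only as a probe to observe whether the object is still alive.

```python
import gc
import weakref


class WindowSettings:
    """Stand-in for some per-user object the program creates and discards."""

    def __init__(self, name):
        self.name = name


# A long-lived collection, e.g. a cache or listener list owned by the application.
registry = {}


def open_window(name):
    window = WindowSettings(name)
    registry[name] = window       # the registry now holds a strong reference
    return window


window = open_window("sector-7")
probe = weakref.ref(window)       # observes the object without keeping it alive

del window                        # the program "forgets" the window...
gc.collect()
still_alive_after_del = probe() is not None   # True: the registry still refers to it

registry.pop("sector-7")          # ...but only removing the entry makes it garbage
gc.collect()
alive_after_removal = probe() is not None     # False in CPython
```

      Until the entry is removed from the collection, the collector is doing exactly what it should: keeping a reachable object alive.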

    82. Re:Software error ... by phantomfive · · Score: 1

      I've seen some people who say that when you leak memory in C or C++, it is worse than in Java, because when you leak it in C or C++, the OS can't get the memory back when the program quits.

      If you create a variable, use it to attach the object to the DOM, then destroy the variable, you've just created a memory leak in Javascript. Swing commonly has memory leaks where the framework keeps track of objects that have been forgotten by the rest of the program.

      And let's be honest, with memory leaks in C, every single allocation is remembered, with its size known. Malloc and Free have their own data structure keeping track of all that.

      --
      "First they came for the slanderers and i said nothing."
    83. Re:Software error ... by jittles · · Score: 1

      Seriously? You really need to review your GC knowledge. A lot has changed now that we can perform things in parallel.

      I've worked on a project with a 16 core CPU and 196GB of RAM handling billions of records of data. The system would literally grind to a halt for 5 minutes every single day while java garbage collection occurred. And you had no control over when it would occur. This was not even 2 years ago so if you're referring to multithreaded programming then NOTHING has changed. If something has changed in Java itself, then you may have a valid point.

    84. Re:Software error ... by jittles · · Score: 1

      That should be 16 CPU cores, not a 16 core CPU. It was 4 quad cores with hyperthreading. I'm still waking up (16 physical cores, 16 hyperthreading cores).

    85. Re:Software error ... by swillden · · Score: 1

      "Though GC actually does eliminate dangling pointer bugs... by turning them into memory leaks."

      Give us an example of a dangling pointer being turned into a leak in Java. In other words show me a pointer that points to deallocated memory and some memory that's allocated but cannot be pointed to any more (because I don't know where it is).

      A dangling pointer in a non-GC'd language is one which the programmer thinks will no longer be used and therefore deallocates, but which actually does get referenced later. Boom.

      In a GC'd language, the same scenario results in the object not being deallocated, so it's still around when it gets referenced later. No boom, which is good. However, this is arguably a form of memory leak because the programmer believes it should have been deallocated, and is assuming that the memory is available for other uses.

      --
      Note to ACs: I usually delete AC replies without reading them. If you want to talk to me, log in.
    86. Re:Software error ... by Smerta · · Score: 1

      ...the MISRA standard for embedded systems includes these rules: 2) absolutely no local variables. it could lead to stack overflows.

      ????

      Could you please cite the MISRA C 2004, C 2012, or C++ 2008 rule that forbids local variables? I don't remember such a rule.

      Also I'll make the distinction between a local variable (which could be static, and thus not on the stack) and an "automatic" variable, which is local and almost always allocated on the stack (but not required by the standard -- in fact, small automatics might only live in registers & not even use stack memory).

      While it's absolutely true that a large automatic object ("variable") can blow the stack, I don't recall MISRA forbidding such objects.

    87. Re:Software error ... by lsatenstein · · Score: 1

      Not only should they be "capable" of managing memory in their code, it should be part of the software design itself.

      And they should not allow for memory fragmentation.

      --
      Leslie Satenstein Montreal Quebec Canada
    88. Re:Software error ... by phantomfive · · Score: 1

      Sorry, I just looked through MISRA C 2004 and 1998 and couldn't find it anywhere. This was my original source. Looks like I should have been more careful.

      For completeness, on the other two rules:
      in MISRA C 2004, rule 20.4 forbids heap allocations, and rule 16.2 forbids recursion.

      --
      "First they came for the slanderers and i said nothing."
    89. Re:Software error ... by Bengie · · Score: 1

      GC still requires a stop-the-world pause during compaction or any time references need to change. Certain parts of the GC can be done in parallel, like cleaning up dereferenced objects. The GC will eventually kick in and can cause jitter. Improved GCs will preemptively stop the world in smaller chunks rather than waiting until the last possible moment and then grinding to a halt for a long time.

    90. Re:Software error ... by Bengie · · Score: 1

      when you leak it in C or C++, the OS can't get the memory back when the program quits

      Incorrect, but with some truth. When a program quits, all of its virtual memory is released, but any unmanaged resources like file descriptors could somehow be left in an opened state.

    91. Re:Software error ... by phantomfive · · Score: 1

      You mean if the program has been forked? That is not a memory problem, and it is not a language problem. Java will do the same thing.

      --
      "First they came for the slanderers and i said nothing."
    92. Re:Software error ... by SpeedBump0619 · · Score: 1

      Mostly I'd agree, but there are a few exceptions:

      1) no recursion. Except perhaps forms of tail recursion known to exit in bounded time. But you definitely have to ask yourself why you are doing something that could either be unrolled into a loop or has some kind of exponential growth potential.

      2) Local variables are fine so long as the analysis is done to guarantee the maximum stack requirement is pre-committed. I mean, realistically the return pointer is a stack variable, so just calling a function that returns would violate the "no local variables" rule. I wouldn't allow dynamically sized items, because then bugs might cause stack overflows...

      3) I agree that malloc and free are forbidden *during real-time operation*. However, in some situations you use dynamic allocation if you can pre-allocate all necessary elements prior to entering operational states. This really depends on whether your system *has* a pre-operational state.
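
      For point 1, here is a small sketch of what "unrolled into a loop" looks like, using Euclid's algorithm as an arbitrary example (the function names are illustrative). Python is used for brevity; note that Python, like most C compilers without optimization, does not guarantee tail-call elimination, so the recursive form really does consume one stack frame per step.

```python
def gcd_recursive(a, b):
    # Tail-recursive form: stack depth grows with the number of steps.
    if b == 0:
        return a
    return gcd_recursive(b, a % b)


def gcd_iterative(a, b):
    # Same algorithm as a loop: constant stack use regardless of input.
    while b != 0:
        a, b = b, a % b
    return a


recursive_result = gcd_recursive(1071, 462)
iterative_result = gcd_iterative(1071, 462)
```

      Both return the same value; only the loop version has a stack requirement that can be bounded without analysing the inputs.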

    93. Re:Software error ... by rrr00bb5454 · · Score: 1

      Do you actually mean alloca() ... no dynamically sized local variables? Because that makes sense. I would say that it makes more sense to ban global variables than to ban local variables. Although, returning pointers to values that were once on the stack could be quite a hazard that might give me pause to ever use pointers to local variables.

    94. Re:Software error ... by phantomfive · · Score: 1

      The point is that with global variables, it's easy to know at compile time the maximum amount of RAM that will be used. If you program that way, you can guarantee to not run out of memory.

      Of course there are other ways to achieve the same goal, but that is one of them.

      --
      "First they came for the slanderers and i said nothing."
    95. Re:Software error ... by Anonymous Coward · · Score: 0

      Surgeons leave tools in patients because they have no process when operating on a patient. Read the Checklist Manifesto sometime and read what the author has to say about best practices in the operating room.

      You're a bit confused by that article. Nurses have been using checklists for decades. My mom talks about how when they can't find a sponge that's on the list, they have to X-ray the patient even though they're pretty sure it just got lost in the garbage. The article you linked is about having a checklist for the surgeon and it having some very specific things like everyone introducing themselves. Keeping a list of every instrument and verifying where they all ended up is standard procedure in every OR I've ever heard of.

    96. Re:Software error ... by ebvwfbw · · Score: 1

      Reagan didn't fire them. They fired themselves. They signed a no strike agreement and they knew it. So does every federal government worker, so do the police for example. It's simple, you strike, you're fired. President (love him or hate him, whoever it is at the time) has absolutely nothing to do with it.

      Union boss didn't care, he wasn't an ATC professional. Like almost all the unions out there, they just collect a boatload of money for doing very very little. I know, I used to belong to a union. Big rip off.

  4. This is why we like C by Murdoch5 · · Score: 1

    So sloppy, untested, programming crashed the system. I'm willing to bet that if you take the hood off the system, it's written using high level languages instead of C.

    1. Re:This is why we like C by crashumbc · · Score: 1

      I kinda doubt that. My understanding is most of the US's air-traffic control systems (and software) are ancient.

    2. Re:This is why we like C by Anonymous Coward · · Score: 0

      Wut.

      There aren't any memory leaks when you write in C?

    3. Re:This is why we like C by Anonymous Coward · · Score: 0

      You clearly have never written C if you believe that sloppy, untested C code won't crash the system.

    4. Re:This is why we like C by bobbied · · Score: 1

      While you *could* write an ATC system in C, why would you? Given that this system was envisioned and designed nearly 20 years ago, I have a feeling they used the accepted tools of the trade for the day and have upgraded to new platforms since.

      My best guess here is that what failed is the system engineering followed by failure to performance test. As you move from C++ applications to say Java, your memory and CPU requirements go up, way up. What may work GREAT in the lab, may consume way too many resources when put into production. My guess is the SE failed to specify the memory constraints and the system integration team didn't test them. Which is about par for the "government contract" course...

      --
      "File to fit, pound to insert, paint to match" - Aircraft Maintenance 101
    5. Re:This is why we like C by AlecC · · Score: 1

      Many display systems are. It sounds like a heap problem to me. If you are building a display which only selects and monitors an underlying database, which may well be managed in C, it is plausible to use a higher level language. But C can have heap leaks, too.

      --
      Consciousness is an illusion caused by an excess of self consciousness.
    6. Re:This is why we like C by kelemvor4 · · Score: 3, Funny

      Wut.

      There aren't any memory leaks when you write in C?

      Maybe you have them, but I don't. That's because my C-peen is larger than yours.

    7. Re:This is why we like C by 0123456 · · Score: 1

      I kinda doubt that. My understanding is most of the US's air-traffic control systems (and software) are ancient.

      Somehow, I doubt it was 2,000,000 lines of assembly language.

    8. Re:This is why we like C by U2xhc2hkb3QgU3Vja3M · · Score: 3, Funny

      He said ancient, not precambrian.

    9. Re:This is why we like C by bobbied · · Score: 1

      Wut.

      There aren't any memory leaks when you write in C?

      Not me.. I never use malloc or free....

      --
      "File to fit, pound to insert, paint to match" - Aircraft Maintenance 101
    10. Re:This is why we like C by Anonymous Coward · · Score: 0

      Given that this system was envisioned and designed nearly 20 years ago, I have a feeling they used the accepted tools of the trade for the day and have upgraded to new platforms since

      WAY older than 20 years. My money is on FORTRAN and Assembly (I'll bet they need to have old emulated stuff custom-built).

      In the early-to-mid 1980s, there was an effort to rewrite the ATCS in Ada. I believe the tab was on the order of $8B. I had friends that worked on it.

      Funny...you never hear about it. Wonder what happened?

      I think they will be taking another stab at it. Probably try Ada again.

    11. Re:This is why we like C by gstoddart · · Score: 2

      Well, whatever this was coded in, it was recently upgraded ... you know, not ancient.

      The language written in doesn't matter. It was a new change, insufficiently tested, and which failed in the real world with a corner case nobody anticipated.

      That's a pretty large failure of coding, testing, and deployment.

      It's a bunch of things, but really you'd expect the people responsible for it would have been a LOT more paranoid and rigorous about it.

      It's an air traffic system after all, around Washington for crying out loud ... which means there's probably some security people going completely apoplectic.

      I mean, the movie scenario of knowing there's an update to the ATC system around Washington and then all of the fanciful plot devices you can add in are suddenly slightly more plausible ... if "air traffic control offline around Washington" isn't begging to have a Bruce Willis movie, I don't know what is.

      You want to bet someone at DHS didn't have a couple of extra Rolaids when this happened?

      --
      Lost at C:>. Found at C.
    12. Re:This is why we like C by alex67500 · · Score: 1

      Wut.

      There aren't any memory leaks when you write in C?

      Not me.. I never use malloc or free....

      That's the spirit. sbrk(2) FTW :-D

    13. Re:This is why we like C by Anonymous Coward · · Score: 0

      Yeah, writing stuff in sloppy C code can only crash the entire eastern seaboard's phone system.

    14. Re:This is why we like C by godefroi · · Score: 1

      Surely! C has saved countless software systems from memory leaks over the years, not to mention other various classes of bugs that simply WOULD NOT EXIST had we stopped evolving programming languages past C.

      --
      Karma: Poor (Mostly affected by lame karma-joke sigs)
    15. Re:This is why we like C by RenderSeven · · Score: 1

      ... if "air traffic control offline around Washington" isn't begging to have a Bruce Willis movie, I don't know what is...

      "Die Hard Drive"?

      How about "Bjarne Stroustrup Must Die Hard"?

    16. Re:This is why we like C by Vlad_the_Inhaler · · Score: 4, Interesting

      I have actually seen something similar to this before, also involving an Air Traffic Control.

      They were having some problem in handling "Large Messages", I am not sure of the exact details / circumstances - I was only peripherally involved. Anyway, the programmer wrote these to a file, then they were processed asynchronously and deleted. This minor change was tested - as usual at the site - by someone shooting an hour's production traffic through the test system and checking for unexpected aborts or other abnormalities. All was fine, the spooling file was 1% full.
      The patch went online. 4 days later (it was a Sunday morning and it was snowing) the file hit some limit and refused to accept new messages. At that moment things went "Keystone Cops".

      • All department heads were informed, except programming. Given that only the one patch had been applied in the previous week, that was not very helpful. Headless chickens ran around trying to find a solution.
      • Standard practice in this type of situation was to switch to the backup/standby system. Since ATC data is very short lived, the backup system had an empty database which would then be populated dynamically. All "Station Chiefs" had to approve this step. One refused because he could not see any problem. Finally someone managed to make him understand what the problem was, then it was "oh yes, we are seeing that as well". His was the smallest station of course.
      • Standard procedure was also to switch to manual control - rather than automated - and cancel short-haul flights. The railways could take up the slack. This was done.

      The switch was duly made and everything was working again.
      It turned out that the deletion of the processed records had a bug. One hour of live data left the file 1% full. 100 hours . . . do the math. It took 5 or 10 minutes for the programmer to fix the problem, he could have done it live on the Sunday if anyone had bothered to tell him what was going on.

      One of the lessons from that is also relevant here - one hour of live data left the file 1% full. I'd bet that they were testing that the new feature worked, not looking for hidden side-effects.
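
      The arithmetic behind that lesson, as a rough sketch: if one hour of traffic leaves the file 1% full and nothing is ever reclaimed, a straight-line extrapolation puts the failure at about 100 hours. The function name and the linear-growth assumption are illustrative only, not taken from the real system.

```python
def hours_until_full(fill_fraction_after_test: float, test_hours: float) -> float:
    """Linearly extrapolate when a spool file fills up.

    Assumes usage grows at a constant rate and nothing is reclaimed,
    which is what a broken delete looks like from the outside.
    """
    if fill_fraction_after_test <= 0:
        return float("inf")  # no measurable growth during the test
    growth_per_hour = fill_fraction_after_test / test_hours
    return 1.0 / growth_per_hour


# Figures from the story: a one-hour test left the file 1% full.
hours = hours_until_full(0.01, 1.0)
days = hours / 24
```

      Running a check like this against the test results would have flagged that a file which should return to empty was instead on course to fill in roughly four days.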

      --
      Mielipiteet omiani - Opinions personal, facts suspect.
    17. Re:This is why we like C by delt0r · · Score: 1

      What the fuck? If you use all the memory in C you run out of memory like every other fucking language! What is this tripe with programming languages as religions round here? You think the magic C pixie fairy is going to magic more memory into your system?

      --
      If information wants to be free, why does my internet connection cost so much?
    18. Re:This is why we like C by Murdoch5 · · Score: 1

      If you run out of memory because you didn't SELF MANAGE it correctly, it's still your fault. Memory leaks are always the fault of the developer, unless they're hardware based.

    19. Re:This is why we like C by Anonymous Coward · · Score: 0

      So sloppy, untested, programming crashed the system. I'm willing to bet that if you take the hood off the system, it's written using high level languages instead of C.

      Why?

      Nothing in C makes sloppy untested programming impossible.
      The crash here sounds similar to what happens when you run a recursive Fibonacci calculator for long sequences in C.

      My guess is that the system in question has very limited memory and the issue is that someone who isn't used to the idea that windows take up an amount of heap space you have to care about didn't think to limit the number of open windows, so users could open enough windows to crash the system. That can happen in any language. It should have been caught in testing but depending how many windows it actually took could be easy to miss.

    20. Re:This is why we like C by Anonymous Coward · · Score: 0

      If you have a C-peen you probably need to consult a urologist, a little bend isn't uncommon but that's pretty extreme.

    21. Re:This is why we like C by sjames · · Score: 1

      So you're saying C would never allow failure to free the mallocs? I'm not aware of that dialect, please enlighten us!

    22. Re:This is why we like C by Rei · · Score: 1

      I can't speak for their code, but ours is written in C. And... (don't throw tomatoes, I'm actively working to remedy this!)... it's C compiled without warnings even enabled, let alone -Werror. When you turn warnings on you get heavily, heavily spammed.

      Hopefully I'll have that aspect fixed within a few months, if other tasks don't eat up too much of my time.

      --
      "99 dead duelists of Dios on the wall. 99 dead duelists of Dios! Take one's ring, pass it around..."
    23. Re:This is why we like C by dpidcoe · · Score: 1

      Memory leaks are always the fault of the developer, unless they're hardware based.

      And the first rule of software development is that it's always a hardware problem until proven otherwise!

    24. Re:This is why we like C by Anonymous Coward · · Score: 0

      if "air traffic control offline around Washington" isn't begging to have a Bruce Willis movie, I don't know what is.

      Jesus fuck, ease up with the histrionics broseph.

      If the ATC system malfunctions, planes don't fall out of the sky. If the ATC system goes offline, planes don't get uncontrollably set on an autopilot course that smashes them into each other over populated areas.

      If the ATC ends up inoperative, the first thing they do is revert to manual control - they do practice drills on this all the time. They also increase distance between planes to allow for the high latency of manual control. They also minimize the amount of traffic in their airspace, by diverting traffic away to other airports.

      Yes, it's *inconvenient* and yes, it's *expensive* and *disruptive* for people to get diverted, but the shutdown of the ATC system doesn't immediately turn every plane within 200 miles of Washington DC into a guided missile or a 100 million dollar lawn dart.

      Also, if you think that the DHS is simply relying on commercial radar and air traffic control to secure the airspace over Washington, I've got a bridge in Brooklyn I'd like to sell you. The military, you know, part of the department of defense, is quite capable of monitoring the restricted airspace over Washington and keeping it secure, even if air traffic control is offline.

    25. Re:This is why we like C by Murdoch5 · · Score: 1

      No, I'm not saying that at all. I'm saying that it's a combination of several factors:

      1. Sloppy, unprofessional programming, such as freeing a non-allocated memory block.
      2. Modern languages that take over memory management for you.

      I'm pointing out C because when developers stop relying on managed memory and garbage collectors they learn to be more responsible and in general avoid simple memory issues.

    26. Re:This is why we like C by Murdoch5 · · Score: 1

      Hey, as long as you have a plan, that's awesome :-).

      I'm currently an embedded system developer and web system developer. ALL of my C code MUST compile with -Wall -Wextra -Wpedantic -Werror turned on and must pass strict valgrind scans.

    27. Re:This is why we like C by Murdoch5 · · Score: 1

      Said the poor developer. Prove it's hardware before assuming.

    28. Re:This is why we like C by Rei · · Score: 1

      Our code doesn't even work with valgrind. It (again, no tomatoes!) uses shmat to seize a particular address space (clobbering whatever was there in the process) because it uses pointer addresses that are hard coded into the program rather than being allocated dynamically by the operating system.

      I wish I was kidding. It's kind of like this.

      --
      "99 dead duelists of Dios on the wall. 99 dead duelists of Dios! Take one's ring, pass it around..."
    29. Re:This is why we like C by Murdoch5 · · Score: 1

      I feel sorry for you, can I ask what your program does and how old it is?

    30. Re:This is why we like C by CWCheese · · Score: 1

      Mod +5 on the Ada observation; I too had friends in the early '90s who had worked on the Ada rewrite which was abandoned, so they ended up at our telecomm firm writing C on our projects, then C++ etc etc. Seems there has been a near continuous chain of ATCS rewrite initiatives since the '80s, none of which has replaced the core systems which very likely still have a good bit of FORTRAN (66 or 77) and COBOL inside.

      --
      Have a Day!
    31. Re:This is why we like C by Anonymous Coward · · Score: 0

      Actually, that was AT&T deploying the sloppy code to production during the middle of the morning on a business day.

    32. Re:This is why we like C by gstoddart · · Score: 3, Insightful

      It took 5 or 10 minutes for the programmer to fix the problem, he could have done it live on the Sunday if anyone had bothered to tell him what was going on.

      I have seen far too many occasions where some hotshot made an out of band code change, broke prod, and then said "oh, it's just a quick fix".

      It would have to be one hell of an emergency to have live changes on a prod system be anything other than a hanging offense. I've seen more problems caused by it than fixed by it.

      I've experienced several outages caused by someone who was either thinking "it's just a quick fix", or was trying to sneak in a fix for something which shouldn't have left their desk in the first place.

      --
      Lost at C:>. Found at C.
    33. Re:This is why we like C by Chris+Mattern · · Score: 1

      Yes, it's not really all that dangerous. But movie makers aren't interested in what's real--they're interested in what's exciting. They're perfectly prepared to depict ATC as having planes falling out of the sky with any failure, because that'll make for a movie that'll pull in an audience.

    34. Re:This is why we like C by Anonymous Coward · · Score: 0

      Many display systems are. It sounds like a heap problem to me. If you are building a display which only selects and monitors an underlying database, which may well be managed in C, it is plausible to use a higher level language. But C can have heap leaks, too.

      The hell you say? /sarcasm

      Lower level programming languages (like 'C') are far more prone to memory leaks compared to languages with garbage collectors. I witnessed Microsoft's "TrustedInstaller.exe" (their patching / updating tool) eating 8GB memory just this past week trying to determine what updates were needed. No it was not a virus, this is just the sad state that Windows is in when you do a clean install and have 100s of patches to be applied. Now I understand why MS publishes "ServicePacks" (because their updater is a horribly inefficient POS when there are more than a handful of updates).

    35. Re:This is why we like C by Alioth · · Score: 1

      Don't get me started on Die Hard 2. Argh.

    36. Re:This is why we like C by Anonymous Coward · · Score: 0

      1. Original ATC systems were likely written in ADA not C++.

      2. The problem originated from not properly disposing display information and/or not handling the scenario where memory is being consumed by too many custom displays.

    37. Re:This is why we like C by Bill_the_Engineer · · Score: 1

      I'm sure it was said "tongue in cheek".

      We used to have this joke flow diagram in our meeting area which basically said (I can't do it justice):

      1. Problem with a subsystem?
      Hardware engineer: Looks like a software issue.
      Software engineer: Looks like a hardware issue.

      2. Running out of time?
      Hardware engineer: We can emulate this missing hardware function in software.
      Software engineer: We didn't need that feature.

      --
      These comments are my own and do not necessarily reflect the views or opinions of my employer or colleagues...
    38. Re:This is why we like C by Anonymous Coward · · Score: 0

      He mentioned earlier in the thread, ATC and sometime in the 80s

    39. Re:This is why we like C by Murdoch5 · · Score: 1

      HA!!!

      That's seriously funny! The way I would have written that:

      Running out of time:
      1. Hardware Developer - You only had to read the datasheet.
      2. Software Developer - This is why marketing doesn't set deadlines.

      As an embedded engineer, I NEVER blame hardware without being completely sure that it's the problem.

    40. Re:This is why we like C by Bill_the_Engineer · · Score: 1

      As a fellow embedded engineer, neither do I. Mostly because the hardware engineer sits right next to me at the meetings. ;)

      Really it's because we have documentation and engineering processes.

      --
      These comments are my own and do not necessarily reflect the views or opinions of my employer or colleagues...
  5. which language? by Anonymous Coward · · Score: 0

    Don't see that on the articles

  6. What programming language... by Anonymous Coward · · Score: 0

    Why VBA, of course

  7. Programming language doesn't matter by rockmuelle · · Score: 4, Insightful

    You can have poor memory management in any language.

    Sure, historically C/C++ have been known for memory leaks due to memory that's not freed, but in Java/Python/pick-your-favorite-garbage-collected-language, or when using smart pointers in C++, all you need to do is have a container that keeps a reference to everything and nothing will go away. It's not hard to do this.

    Based on the summary, it sounds like that's what happened. Some monitor views just kept a list of everything and the developer forgot to purge the lists when things went out of, er, scope.
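
    A minimal sketch of that failure mode in a garbage-collected language, assuming a made-up View class and a module-level registry (neither comes from the FAA system). The window is "deleted" by the user, but the registry still references it, so the collector cannot reclaim it until the list is purged.

```python
import gc
import weakref


class View:
    """Stand-in for a customized reference-data window (hypothetical)."""

    def __init__(self, name):
        self.name = name


all_views = []  # registry that nothing ever purges


def open_view(name):
    view = View(name)
    all_views.append(view)  # the registry keeps a strong reference
    return view


view = open_view("sector-12")
probe = weakref.ref(view)  # lets us observe whether the object is still alive

del view  # the user "deletes" the window
gc.collect()
alive_before_purge = probe() is not None  # the registry still holds it

all_views.clear()  # the purge the developer forgot
gc.collect()
alive_after_purge = probe() is not None
```

    The weak reference is only there to observe the object's lifetime; the bug itself is nothing more than the append without a matching remove.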

    -Chris

    1. Re:Programming language doesn't matter by IAMBatman · · Score: 1

      It annoys me that people like you get upvoted, because you are wrong. Very simply, a theorem stating that the maximum memory use under all circumstances is lower than X should be proved. Obviously, that didn't happen. Yes, such systems do exist.

    2. Re:Programming language doesn't matter by tomhath · · Score: 2

      It's very simple to prove a student's program consisting of a few modules and a couple hundred lines of code. Expand that out to hundreds of programmers, thousands of modules and tens of millions of lines of code and it's not so simple anymore.

    3. Re:Programming language doesn't matter by IAMBatman · · Score: 1

      The systems described in the news item do not have 10M+ lines of code nor do they require hundreds of programmers. Even so, that's not a limitation of these systems.

    4. Re:Programming language doesn't matter by phantomfive · · Score: 1

      Very simply, a theorem stating that the maximum memory use under all circumstances is lower than X should be proved.

      Proving that is impossible in a garbage collected language like Java or Python.

      --
      "First they came for the slanderers and i said nothing."
    5. Re:Programming language doesn't matter by Anonymous Coward · · Score: 0

      Very simply, a theorem stating that the maximum memory use under all circumstances is lower than X should be proved.

      Proving that is impossible in a garbage collected language like Java or Python.

      Uh, it is really simple: -Xmx1024m -- good luck exceeding 1GB in the Java heap with that.

    6. Re:Programming language doesn't matter by Jeremi · · Score: 1

      Just like you can die in a car accident, no matter what kind of car you drive. And yet, you're still safer in a Volvo than in a Pinto.

      --


      I don't care if it's 90,000 hectares. That lake was not my doing.
    7. Re:Programming language doesn't matter by rockmuelle · · Score: 1

      Ok Batman, prove it. Using real programming languages, not toy academic ones. Particularly, prove for any program written in C, which according to other posts on this thread is the language that was used. Make sure your proof works for heavily macroed C programs with upwards of 2M lines of code developed by thousands of people over the last 30 or so years (again, see other posts for the details). Make sure any checker developed using your method completes its analysis before the heat death of the universe, too.

      Then, just for completeness, prove it for all programs in the other languages I mentioned. You know, the languages people actually write software in.

      And don't forget that external inputs are allowed (those pesky side effects). We're tracking airplanes here. If nothing else, a stream of inputs must be supported by your proof.

      It annoys me that people like you make comments like that with nothing to back them up.

    8. Re:Programming language doesn't matter by IAMBatman · · Score: 1

      No, you just say that it is impossible. I call it an engineering problem. You just confused some concepts and as a result you are now spreading your ignorance. I hate it when people do that.

    9. Re:Programming language doesn't matter by phantomfive · · Score: 1

      No, you just say that it is impossible. I call it an engineering problem

      I hope you never engineer something mission-critical, because you will engineer it very poorly.

      You just confused some concepts and as a result you are now spreading your ignorance. I hate it when people do that.

      Instead of insulting me, correct me.

      --
      "First they came for the slanderers and i said nothing."
    10. Re:Programming language doesn't matter by IAMBatman · · Score: 1

      Oops, I already built (and build) mission critical stuff. Additionally, you should wish that I was also developing flight control software and life critical systems; current development methods as used in industry are either limited or lacking in the kind of dynamic behaviour they support. There really isn't any point in explaining to you what you weren't able to figure out yourself. It's like I am building a space ship and you, who have just built your first paper plane, are telling me that I am doing it wrong. I hope you understand. If, however, you represent a multi-billion dollar company, I am ready to explain for the small price of 250K USD how something like that could be achieved.

    11. Re:Programming language doesn't matter by phantomfive · · Score: 1

      Oops, I already built (and build) mission critical stuff.

      Please tell me the brands so I can be sure to avoid them.

      --
      "First they came for the slanderers and i said nothing."
    12. Re:Programming language doesn't matter by Anonymous Coward · · Score: 0

      Computer has 2GB of memory.
      Program runs within computer memory.
      Computer cannot gain memory.
      Program cannot switch computers.
      Program can never use more than 2GB as there isn't more than 2GB.

      I really suck at proofs. Someone can write that up better than me.

      I agree with the parent. Formal methods are extremely difficult and costly. As far as I know only researchers use them.

  8. It's not clear they ran out of RAM by Snotnose · · Score: 1

    From the way it reads the system could have allowed for, say, 256 spiffy windows. If they weren't getting deleted as expected they could have drained that pool of spiffy windows no matter how much RAM they had.
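
    Roughly what that would look like, as a sketch: a fixed pool of 256 window slots (the number is just the guess above) and a close routine that forgets to return the slot. The class and function names are invented for illustration. The pool runs dry after 256 opens regardless of how much RAM the machine has.

```python
class WindowPool:
    """Fixed-size pool of window slots; capacity is independent of RAM."""

    def __init__(self, capacity=256):
        self.free_slots = list(range(capacity))
        self.in_use = set()

    def acquire(self):
        if not self.free_slots:
            raise RuntimeError("window pool exhausted")
        slot = self.free_slots.pop()
        self.in_use.add(slot)
        return slot

    def release(self, slot):
        self.in_use.remove(slot)
        self.free_slots.append(slot)


def buggy_close(pool, slot):
    """Hides the window but never calls pool.release(slot)."""
    pass


pool = WindowPool()
opened = 0
failure = None
try:
    while True:
        slot = pool.acquire()
        opened += 1
        buggy_close(pool, slot)  # slot is never handed back
except RuntimeError as exc:
    failure = str(exc)
```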

    1. Re:It's not clear they ran out of RAM by Anonymous Coward · · Score: 0

      And no matter the language used !!!

  9. Someone lost their job? by Anonymous Coward · · Score: 0

    https://www.usajobs.gov/GetJob/ViewDetails/413146100

  10. QA process? by scsirob · · Score: 2

    I don't care what language they use. It could be BASIC for all I care.

    What I do care about is what their QA process looks like. How did this not get caught in testing??

    --
    To Terminate, or not to Terminate, that's the question - SCSIROB
    1. Re:QA process? by Brett+Buck · · Score: 5, Insightful

      It didn't get caught in testing because testing is by far the most expensive and time-consuming part of the development process, and is always the first thing to get cut/trimmed/"streamlined". Just like it has been forever.

    2. Re:QA process? by bobbied · · Score: 2

      It didn't get caught in testing because testing is by far the most expensive and time-consuming part of the development process, and is always the first thing to get cut/trimmed/"streamlined". Just like it has been forever.

      There is one more reason... Testing is the LAST thing you do before a release, so as the schedule slips to the right the last task on the schedule ALWAYS gets squeezed into smaller and smaller schedules. Less time means less testing.

      --
      "File to fit, pound to insert, paint to match" - Aircraft Maintenance 101
    3. Re:QA process? by Zak3056 · · Score: 1

      It didn't get caught in testing because testing is by far the most expensive and time-consuming part of the development process, and is always the first thing to get cut/trimmed/"streamlined". Just like it has been forever.

      While what you say is, sadly, how the world actually works, the above should never occur with safety critical systems. "We'll fix the bugs in production" is absolutely unacceptable when your possible failure modes include dead people and hundreds of millions of dollars in damage.

      --
      What part of "shall not be infringed" is so hard to understand?
    4. Re:QA process? by Brett+Buck · · Score: 2

      Well, a big part of the distinction you are making (life-critical VS commodity software) has gotten lost by the various programming "cargo cults", where every problem is the same and every solution fits into some sort of stupid ritual.

    5. Re:QA process? by CODiNE · · Score: 1

      I'm amazed continuous integration isn't a software industry norm. At the bare minimum adding a regression test for every bug fix would slowly grow a really nice validation suite.

      --
      Cwm, fjord-bank glyphs vext quiz
    6. Re:QA process? by Chelloveck · · Score: 1

      Small memory leaks are very hard to find in testing. Most testing cycles involve testing a particular feature, looking for pass/fail for that feature. Say the feature is window display, as the summary seems to imply. Okay, does the window pop up when the command is given? Does it contain the right contents? Does it go away when commanded? Check, check, and check. Ship it! It's very unusual to test a feature long enough that a smallish memory leak adds up to anything noticeable. "System crashes when window is opened and closed 256 times in a session." QA is just plain never going to get to that point. It may take weeks of heavy use to get to the breaking point. Sure, it would be *nice* if there was a month-long burn-in period where the system was used heavily to expose any slow leaks, but that never happens.

      From the summary I can't tell if the leak is anything that could be fixed by garbage collection. Was a block simply not freed and lost? Or was a reference to it still in a list somewhere, so that it would never get garbage collected even if that was a feature of the language? Is the memory in question on the regular heap so that something like valgrind could find that blocks weren't freed? Is the memory part of a specialized buffer pool managed by some other means? There's nowhere near enough information to go on, but that's not stopping anyone here from jumping to conclusions about the development and QA process.
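
      For the plain-heap case, a soak test does not have to take a month. A minimal sketch, assuming hypothetical open/close functions and using Python's standard tracemalloc module: repeat the open/close cycle many times and compare allocated bytes before and after. The 1 KiB buffer per window and the cycle count are arbitrary choices for the illustration.

```python
import tracemalloc


class Window:
    """Stand-in for a display window holding some per-window state."""

    def __init__(self):
        self.buffer = bytearray(1024)


open_windows = set()  # bookkeeping the close routine is supposed to trim


def open_window():
    window = Window()
    open_windows.add(window)
    return window


def leaky_close(window):
    pass  # window vanishes from the screen, but the set still holds it


def correct_close(window):
    open_windows.discard(window)


def soak(close_window, cycles=1000):
    """Open and close a window many times; return net bytes still allocated."""
    tracemalloc.start()
    before, _ = tracemalloc.get_traced_memory()
    for _ in range(cycles):
        close_window(open_window())
    after, _ = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return after - before


leaky_growth = soak(leaky_close)
open_windows.clear()  # reset state between the two runs
clean_growth = soak(correct_close)
```

      With the leaky close the net growth is on the order of a megabyte after 1000 cycles, while the correct close stays near zero. A check like this only covers memory the allocator can see; a specialized buffer pool would need its own counter.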

      --
      Chelloveck
      I give up on debugging. From now on, SIGSEGV is a feature.
  11. As this is the FAA by RevWaldo · · Score: 5, Funny

    I'm betting someone just got some of the punch cards mixed up.

    .

    1. Re:As this is the FAA by Richard+Steiner · · Score: 1

      Nothing a good card sorter couldn't fix. That's why we use line numbers on our COBOL decks and all that. :)

      Er. UseD. UseD. :)

      --
      Mainframe/UNIX Bit Twiddler and long time Windows/Linux Hobbyist.
      The Theorem Theorem: If If, Then Then.
    2. Re:As this is the FAA by Anonymous Coward · · Score: 0

      Maybe the punch card stack became too large and they got a stack overflow.

  12. Programming Language by lbmouse · · Score: 1

    Depends a little on the OS... a while back it was a combination of OS/400, AIX, and MVS | OS/390 | z/OS.

  13. Language: ADA by JumboMessiah · · Score: 5, Informative

    While everyone speculates on GC vs heap vs what flavor is my coffee, ERAM approach systems use ADA as the language of choice.

    reference

    1. Re:Language: ADA by Anonymous Coward · · Score: 0

      your reference says ada AND java

    2. Re:Language: ADA by Anonymous Coward · · Score: 0

      It also mentions Visual Basic. Glad you passed the 4th grade, but it looks like reading comprehension got the best of you.

  14. Question Mark by Anonymous Coward · · Score: 0, Offtopic

    I wonder why people tend to put question marks at the end of sentences that begin with "I wonder".

    "I eat". What do you eat? "I eat whole-grain wheat toast for breakfast."
    "I wonder". What do you wonder? "I wonder what programming language they were using?"

    Is it some valley girl thing?

    1. Re:Question Mark by U2xhc2hkb3QgU3Vja3M · · Score: 0

      Maybe because the general idea of the sentence is a question in itself, but for yourself? I see plenty of worse mistakes all day long, too many to start wondering about such details.

      Misspled words, people who cant use punctuation properly people who dont capitalize the first word in there sentences and people who dont know the difference between there and their.

      The previous sentence was self-referencing sarcasm.

  15. THIS. by tlambert · · Score: 0

    THIS.

    1. Re:THIS. by Anonymous Coward · · Score: 0

      THAT.

    2. Re: THIS. by Anonymous Coward · · Score: 0

      and the OTHER

  16. Better thrashing... by thrillseeker · · Score: 1

    ...than crashing. Well-designed systems do not die when running out of memory - they recognize the issue, and either at the general OS level or at the specific application level, begin shifting the memory requirements to storage. Yes, they run (much) slower - but it gives an opportunity for some system more aware of the big picture than the application (e.g. the operator) to prioritize and recover. As others have alluded to - how did this situation not get found in a proper testing process?

    1. Re:Better thrashing... by 0123456 · · Score: 1

      Yeah, because what could possibly go wrong in an air traffic control system when the computers are thrashing like crazy as they run out of RAM?

      I've worked with some non-critical systems used in aviation, and they shut down and switch to the backup when they get close to running out of RAM. A few seconds' delay to swap in data about airliners that are travelling a few miles apart at a few hundred miles an hour could kill hundreds of people.

    2. Re:Better thrashing... by DaMattster · · Score: 1

      Yeah, because what could possibly go wrong in an air traffic control system when the computers are thrashing like crazy as they run out of RAM?

      I've worked with some non-critical systems used in aviation, and they shut down and switch to the backup when they get close to running out of RAM. A few seconds' delay to swap in data about airliners that are travelling a few miles apart at a few hundred miles an hour could kill hundreds of people.

      Very well put! The system should have failed over to a backup. Seconds could mean the difference between life and catastrophic death.

    3. Re:Better thrashing... by Anonymous Coward · · Score: 0

      Actually, ATC procedures are designed to avoid catastrophic death even if the computers go down completely. Notice that the result of this outage was delayed flights, not crashed flights. That doesn't change the fact that a thrashing system wouldn't be acceptable, but the notion that ATC works on such margins that a few seconds of outage would send airliners plummeting is simply wrong.

  17. Actually, it was a bunch of 68K assembly. by tlambert · · Score: 2

    I kinda doubt that. My understanding is that most of the US's air-traffic control systems (and software) are ancient.

    Somehow, I doubt it was 2,000,000 lines of assembly language.

    Actually, the old system was a bunch of 68K assembly. Nowhere near 2,000,000 lines of it. I know one of the guys who wrote some of it.

    1. Re:Actually, it was a bunch of 68K assembly. by 0123456 · · Score: 1

      And this article, if you actually read it, explicitly refers to 'two million lines of code'.

    2. Re:Actually, it was a bunch of 68K assembly. by cdrudge · · Score: 1

      And his comment, if you actually read it, explicitly refers to the old system, the one that is being replaced by the new system that has 'two million lines of code'.

    3. Re:Actually, it was a bunch of 68K assembly. by 0123456 · · Score: 1

      Which is irrelevant, since it's not that system that failed.

    4. Re:Actually, it was a bunch of 68K assembly. by tlambert · · Score: 1

      Which is irrelevant, since it's not that system that failed.

      No, it's the system that would have NOT failed, had they been using it instead of the shiny, new, failure prone system.

  18. Implemented in Ada 2005 by deppli · · Score: 2

    A quick search reveals Lockheed Martin used Ada 2005 primarily to implement ERAM. "Ada's Vital Role in New US Air Traffic Control Systems": http://www.iaeng.org/publicati...

    "... real-time, and object-oriented language. Now it offers more safety and portability than Java, and better efficiency and flexibility than of C/C++"

    "The new Ada 2005 has introduced more robust capabilities based on user experience. The language offers particular innovations which helps make safety assurance less costly and further improves high integrity ..."

    1. Re:Implemented in Ada 2005 by Anonymous Coward · · Score: 2, Informative

      The backend code is implemented in Ada but all of the display code is implemented in a mix of C and C++

  19. Nobody by U2xhc2hkb3QgU3Vja3M · · Score: 5, Funny

    Nobody expects the ERROR: OUT OF MEMORY.

    1. Re:Nobody by Anonymous Coward · · Score: 0

      Our chief weapon is surprise! Surprise and fear!

  20. Why does any program ever need more than 640KB by JoeyRox · · Score: 1

    Heard that somewhere before :)

  21. Re:As this is the FAA Easy Fix, Punched tape by BoRegardless · · Score: 2

    No more mixed cards

  22. You Said by dcw3 · · Score: 2, Informative

    But, you said that 8G was enough!

    --
    Just another day in Paradise
  23. Sadly by DaMattster · · Score: 1

    This is just indicative of America's crumbling infrastructure due to extreme ineptitude at the elected leadership level.

  24. Need to reboot by dheltzel · · Score: 1

    Even if the manual says that Win 95 no longer needs to be rebooted every day like Windows 3.11, it's still a good idea.

  25. And the language is...... by jeremyp · · Score: 4, Informative

    Ada and Java apparently

    http://dl.acm.org/citation.cfm...

    --
    All I want is a secure system where it's easy to do anything I want. Is that too much to ask ~~ Randall Munroe
    1. Re:And the language is...... by Anonymous Coward · · Score: 1

      No, it was due to incompetent people.

    2. Re:And the language is...... by Anonymous Coward · · Score: 0

      https://en.wikipedia.org/wiki/Ada_(programming_language)#History

    3. Re:And the language is...... by ChrisMaple · · Score: 1

      ADA is the Americans with Disabilities Act; also the American Dental Association. Ada is a computer language. You're blaming this on cripples and dentists?

      --
      Contribute to civilization: ari.aynrand.org/donate
  26. Considering that it's the FAA... by erp_consultant · · Score: 1

    the whole thing is probably coded with a stone tablet and chisel.

  27. If this is still based on the 20+ year-old system, it's Ada.

  28. Language? What O/S are they using, Windows? by rstanley · · Score: 1

    I hope not! Too many systems are! Banks are still using Windows on ATMs!

  29. back to the software language used... by swschrad · · Score: 1

    it was F- , pronounced F minus , and it is an indicator of programmer skill and attention.

    --
    if this is supposed to be a new economy, how come they still want my old fashioned money?
  30. Very funny guys. by tlambert · · Score: 1

    Very funny guys.

    If you can't handle explicit memory management, then you can't write in MISRA-C, and if you can't write in MISRA-C, you should probably be kept as far as possible away from coding on life support systems, because you are not going to be good at it.

    The idea that they could "run out of memory" in the first place implies heap allocation, at the very least, which is prohibited in the MISRA-C standard.

    You may remember the Toyota acceleration bug in the ECC; it was due to a practice that they would have caught before it started ending up in dead people, had they been enforcing their own coding standards (which partially intersected with the MISRA-C coding standard, then under development and nearing completion).
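    To make the "no heap allocation" point concrete, here is a rough sketch of the fixed-pool discipline. It is written in Python purely for illustration (real MISRA-C code would use static arrays in C), and the `FixedPool` class, its method names, and the "window" values are all made up for this example; nothing here is taken from the FAA system.

```python
class FixedPool:
    """Fixed-capacity pool: every slot is created once, up front."""

    def __init__(self, capacity):
        self._slots = [None] * capacity        # storage sized once, never grown
        self._used = [False] * capacity        # which slots are currently taken
        self._free = list(range(capacity - 1, -1, -1))  # stack of free indices

    def acquire(self, value):
        """Store value in a free slot; return its index, or None if full."""
        if not self._free:
            return None                        # bounded, explicit failure
        index = self._free.pop()
        self._slots[index] = value
        self._used[index] = True
        return index

    def release(self, index):
        """Give a slot back so it can be reused."""
        if not self._used[index]:
            raise ValueError("slot is not in use")
        self._slots[index] = None
        self._used[index] = False
        self._free.append(index)

    def in_use(self):
        """Number of slots currently occupied."""
        return len(self._slots) - len(self._free)


pool = FixedPool(capacity=2)
a = pool.acquire("window A")
b = pool.acquire("window B")
c = pool.acquire("window C")   # pool is full, so this request is refused
pool.release(a)                # deleting a window frees its slot
d = pool.acquire("window D")   # the freed slot is reused
```

    The point of the design is that the worst case is known at build time: a request can be refused, but the pool can never quietly grow until storage runs out.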

  31. If only there were some kind of portable machine... by tlambert · · Score: 1

    And surgeons responsible for cutting open live human beings should be capable of not leaving tools in the person they're operating on, but it still happens. Professionals make mistakes. Garbage collection is a useful tool to make it more difficult to screw up.

    If only there were some kind of portable machine one could use in order to look for metal left in a patient's body...

    If only there were a requirement to do things like count gauze pads before and after surgeries, and then account for any numerical discrepancy as being ON (not IN) the patient, and/or going into the biohazard disposal unit. You know: a documented procedure.

    Oh, wait: there is.

  32. Scorpion by Anonymous Coward · · Score: 0

    Did they get Walter O'Brien to race under a plane, so they can upload the correct software to reset the system?

  33. Re:Programming Language At FAA Radar Rooms? by Anonymous Coward · · Score: 0

    Calm down, Mr. Trump, it's time for your medication again.

  34. Why are we subsidizing planes? by WillAffleckUW · · Score: 1

    One of the major impacts on climate change is inefficient jet planes.

    Why not double the fee for non-787 fuel sipping jet planes (or turboprops), so that the impact cost has a real cost?

    There's your money.

    And take any extra and put it into high speed passenger and freight rail lines along the dense West urban I-5 corridor that produces 50 percent of all US GDP.

    --
    -- Tigger warning: This post may contain tiggers! --
  35. Really? by Anonymous Coward · · Score: 0

    I wonder what 'programmer' they used.

  36. Re: Software Programmer Environment by cyberhooligan77 · · Score: 1

    In the case of a software error, it should be "Software Programming Environment", not just "Programming Environment".

    It's not just about the programming language: a language you like may have bad libraries or wrong programming logic, or vice versa, a language you don't like may have good programming logic and good libraries.

  37. ADA is the programming language... by gavron · · Score: 1
  38. Primary Buffer Panel by Anomalyst · · Score: 2

    I'm surprised more planes don't shed parts along the way.

    Did the Primary Buffer Panel just fall off my gorram ship for no apparent reason?

    --
    There is no right to feel safe thru security vaudeville at the expense of everyone's freedom, privacy and tax money.
  39. How many H1B programmers were on this project by Anonymous Coward · · Score: 0

    And I wonder how many 'most excellent' H1B programmers worked on this snafu.

  40. Garbage Collection by Anonymous Coward · · Score: 0

    Maybe they could [garbage collection] have used [garbage collection] java [garbage collection] [garbage collection] which would [garbage collection] [garbage collection] have ensured [garbage collection][garbage collection] lots of memory [garbage collection] [garbage collection] is always [garbage collection] available, at the [garbage collection] [garbage collection] small [garbage collection] [garbage collection] [garbage collection] price [garbage collection] [garbage collection] [garbage collection] of a runtime [garbage collection] [garbage collection] [garbage collection] hit and non [garbage collection] [garbage collection] realtime [garbage collection] [garbage collection] [garbage collection] priority. What [garbage collection] [garbage collection] could [garbage collection] [garbage collection] [garbage collection] possible [garbage collection] [garbage collection] [garbage collection] [garbage collection] happen.

    There is a reason Java [garbage collection] isn't used for critical "realtime" systems. Imagine the garbage collector dropping events or being "slightly behind".

    1. Re:Garbage Collection by Anonymous Coward · · Score: 0


      Garbage Collection?? Here is an old SlashDot reference...

      C# Memory Leak Torpedoed Princeton's DARPA Chances
      http://slashdot.org/story/07/1...

  41. My guess by reboot246 · · Score: 1

    I bet somebody was running Firefox.

  42. Late-80s Development Process Failures by billstewart · · Score: 2

    I think this system that failed is part of the same one I helped bid on upgrading in the late 80s. (We were the lucky ones who lost the bid; IBM were the poor suckers who won it.) The Advanced Automation System was supposed to have a budget of something like 4 years and $4B, or maybe it was $7B, but either way it ran way way over that, in both years and billions, before being restructured, partly because the problem is really hard, partly because the specs were extremely unrealistic, and partly because we were required to use DOD-STD-2167 software development methodology, a very heavy clumsy version of waterfall process.

    The important requirement was that if anything went wrong and two airplanes crashed and fell out of the sky, mobs of citizens and Congresscritters would descend on FAA headquarters with torches and pitchforks and budget cuts, so everything had to be ultra-conservatively spec'd to prevent that from happening. I'm extremely annoyed to hear the FAA saying that except for this failure, they've been running 99.99% reliability this year. Four 9s? The specs we were supposed to meet were 8 9s, and since nobody was willing to ask the FAA to define a failure event, our management was conservatively aiming for 10 9s. (An average system controlled about 100 radars, and the big difference is whether a "failure" means "all the radars are out" or "any single radar is out".) This kind of reliability meant that duplicating everything wasn't good enough; you had to triplicate every piece of equipment, or double-double it, because otherwise the possibility of one piece failing while you had its backup down for preventative maintenance for 5 minutes blew your numbers for the year. (No matter that the radars were connected back to the data center over circuits that had 3.5-4 9s, just because of the usual risk of physical damage.) We later found out that the FAA shut down the then-current 1960s system for four hours a night, running on the backup equipment (which was a 1970s transistorized upgrade to the 1940s/1950s version) to keep the backup system reliable and operators trained.

    --

    Bill Stewart
    New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks
    1. Re:Late-80s Development Process Failures by lokedhs · · Score: 1
      9 nines? That's 32 milliseconds downtime per year.

      How do you even detect downtime in such a system?
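
      For anyone who wants to check the arithmetic, a quick sketch follows. It assumes "N nines" means an unavailability of 10^-N and a 365.25-day year; the function name is made up for this example.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600   # 31,557,600 s, using a 365.25-day year


def downtime_seconds_per_year(nines):
    """Allowed downtime per year for an availability of `nines` nines."""
    unavailability = 10.0 ** (-nines)   # e.g. 4 nines -> 0.0001
    return unavailability * SECONDS_PER_YEAR


for n in (4, 8, 9, 10):
    ms = downtime_seconds_per_year(n) * 1000.0
    print(f"{n} nines: {ms:.3f} ms of downtime per year")
```

      Nine nines comes out at roughly 31.6 ms per year, which is where the 32 ms figure comes from; four nines is a little under an hour.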

    2. Re: Late-80s Development Process Failures by ceoyoyo · · Score: 1

      If it goes down, it failed.

    3. Re: Late-80s Development Process Failures by Anonymous Coward · · Score: 0

      I worked on AAS for IBM and it was a disaster in terms of what the FAA wanted. The reliability numbers were so high that we tried to write a custom communications stack because the TCP/IP stack in the OS wasn't good enough. The hardware architecture was crazy overkill with quad token ring networks backed up by Ethernet and bridges everywhere. I left the project but it was going way over in terms of money and time. The project was too big to manage. Government at its finest...

    4. Re:Late-80s Development Process Failures by billstewart · · Score: 1

      As with telephone switches, you end up needing a large fraction of your code to be monitoring the state of the system to make sure everything's working. But also, you don't get that kind of reliability by having systems that flake out for a few milliseconds - you end up with multiple systems in parallel, and you know failure probabilities of the individual systems and build the thing to try to eliminate common-mode failures. So Box A, Box B, and Box C are in parallel, and each one has alarms that detect whether the other ones are down and the backup needs to take over for the primary, so if Box A is down, Box B takes over while you fix Box A, and Box C is there to take over if Box B also fails (or was already down when Box A failed.) Or alternatively, you've got an A/B pair, and a C/D pair, and if A fails, the C/D pair takes over while you fix A, reducing the risk that B will fail before you've done that.
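
      A back-of-the-envelope sketch of why parallel boxes buy you nines. It assumes the boxes fail independently, which is exactly the assumption that common-mode failures break, and the per-box figure of 3 nines is a made-up number for illustration, not anything from the real system.

```python
import math


def parallel_unavailability(unit_unavailability, units):
    """Probability that all `units` independent boxes are down at once."""
    return unit_unavailability ** units


def nines(unavailability):
    """Express an unavailability as a count of nines (1e-4 -> 4.0)."""
    return -math.log10(unavailability)


BOX_UNAVAILABILITY = 1e-3   # hypothetical: each box alone is a 3-nines box

for boxes in (1, 2, 3):
    total = parallel_unavailability(BOX_UNAVAILABILITY, boxes)
    print(f"{boxes} box(es) in parallel: about {nines(total):.1f} nines")
```

      Under those assumptions each extra box adds the per-box nines again, so a pair with one box down for maintenance is back to single-box numbers for that window.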

      We were part of the project because we're good at systems integration and government projects, but also because we had processor chips that did trig functions really really fast (for late-80s definitions of fast.) Turns out that the data was all coming from sensors with 12-bit A/D converters, and the fastest way to do trig functions on them isn't a floating-point chip - it's a lookup table :-)
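
      The lookup-table trick looks roughly like this. The sketch assumes the 12-bit reading encodes an angle over one full turn (4096 steps per circle); that encoding and the function names are assumptions for illustration, not a description of the actual sensors.

```python
import math

ADC_BITS = 12
ADC_LEVELS = 1 << ADC_BITS   # 4096 possible readings from a 12-bit converter

# Precompute sin() once for every possible reading.
SIN_TABLE = [math.sin(2.0 * math.pi * code / ADC_LEVELS)
             for code in range(ADC_LEVELS)]


def sin_from_adc(code):
    """Sine of the angle encoded by a 12-bit reading: one table lookup."""
    return SIN_TABLE[code & (ADC_LEVELS - 1)]


def cos_from_adc(code):
    """Cosine via the same table: cos(x) = sin(x + quarter turn)."""
    return SIN_TABLE[(code + ADC_LEVELS // 4) & (ADC_LEVELS - 1)]
```

      With only 4096 possible inputs the table holds every answer the sensor can ever ask for, so there is nothing left to compute at run time.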

      --

      Bill Stewart
      New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks
  43. See my rant about working on AAS above :-) by billstewart · · Score: 1

    I just wrote a rant about AAS in a comment above. What a disaster that was, and the system we got out of it was partly so late and unreliable because the FAA way way overspec'd the first version.

    --

    Bill Stewart
    New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks
  44. Ada, not MISRA-C by billstewart · · Score: 1

    I don't know what the current FAA system is written in, but back in the late 80s when I was working on a previous attempt to upgrade it, we had to write in Ada. (Actually, we mostly had to write in DOD-STD-2167 development methodology, and no I don't mean 2167A, but if we'd gotten far enough in the process to be coding, it would have been in Ada, generally emulating systems originally written in JOVIAL.)

    --

    Bill Stewart
    New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks