Slashdot Mirror


Blackout Cause: Buggy Code

blanca writes "The big northeast blackout from last summer was caused in part by a software bug in an energy management system sold by General Electric, according to a story on SecurityFocus. The bug meant that a computerized alarm that should have been triggered never went off, hindering FirstEnergy's response to the train of events that led to the cascading blackout. Investigators found the bug in an intensive code audit following the outage, and a patch is now available."

33 of 377 comments

  1. See what happens? by poofmeisterp · · Score: 4, Insightful

    ... when you outsource to the lowest bidder?

    I've said enough.

  2. Development vs Engineering by bmongar · · Score: 4, Insightful

    The term 'Software Engineering' is bandied about in the software industry. I think little that you could call engineering happens. Software is developed. It doesn't meet the strict standards of testing and reliability of physical products.
    I am a software developer not an engineer, as are most people in the field. Software won't become an engineering science until companies are willing to pay for that process. Given the current trend towards cost cutting I don't see that happening anytime soon.

    --
    As x approaches total apathy I couldn't care less.
    1. Re: Development vs Engineering by Black+Parrot · · Score: 4, Insightful


      > I am a software developer not an engineer, as are most people in the field. Software won't become an engineering science until companies are willing to pay for that process. Given the current trend towards cost cutting I don't see that happening anytime soon.

      It will be interesting to follow the lawsuit news on this one. If someone gets squeezed hard enough, we might see a movement toward good engineering praxis as a result.

      More likely the politicians will step in and bail them out, but ISTM that as society continues to rely more and more on software, at some point we're going to decide that we can't afford not to set and follow good engineering standards.

      --
      Sheesh, evil *and* a jerk. -- Jade
  3. Would this be any better in an OSS environment? by bernywork · · Score: 3, Insightful

    Just a question for everyone here:

    Who thinks this could have been any better with Open Source and why?

    People make the comment of the many eyes, but who is really looking at the code?

    --
    Curiosity was framed; ignorance killed the cat. -- Author unknown
    1. Re:Would this be any better in an OSS environment? by Anonymous Coward · · Score: 2, Insightful

      The initial bug would still have been produced with an open source model. There would still have been a huge blackout. The difference is that the bug might have been found and patched much quicker. If you had been without electricity for a week and if you had the source to the application you might have had some incentive to look into the source yourself to prevent it from happening to you again. The great thing is that it would also prevent the same thing from happening to everybody else at the same time.

    2. Re:Would this be any better in an OSS environment? by eraserewind · · Score: 5, Insightful
      > People make the comment of the many eyes, but who is really looking at the code?
      Probably nobody, especially if you are talking about something as dull as a utility management app. That's why companies pay people to look at these things.

      Open source almost certainly would not have prevented the bug. The bug might have been found faster after it happened though, because curious (or under pressure from their boss) engineers in every facility affected would spend at least some time trying to figure out what went wrong.

      Having the source is great, and you would be surprised at the number of companies who license the source for what they use. Risk management is important. Free isn't everything, you can get many of the same things by paying :-)
    3. Re:Would this be any better in an OSS environment? by Detritus · · Score: 3, Insightful
      I don't care whether it is open source or closed source or divine inspiration, software reliability requires testing. Depending on the reliability requirements, proper testing can be very expensive. That's assuming anyone has even bothered to state reliability requirements.

      There are also system reliability requirements to be considered. Hardware fails. Software fails. Is the system designed to detect and cope with component failures?

      GE's software may suck. I don't know. I've never seen it. I am suspicious of people who attempt to hide their own negligence by blaming a third party.

      --
      Mea navis aericumbens anguillis abundat
  4. Argument from ignorance by Gothmolly · · Score: 2, Insightful

    "Things are so complicated, we don't know that a small event, or series of small events won't bring down the whole system"

    Yeah, well I don't know that I won't be fired tomorrow for reading Slashdot at work, but that doesn't mean that I will.

    --
    I want to delete my account but Slashdot doesn't allow it.
  5. Software "Engineering"? by fygment · · Score: 5, Insightful

    Now if in fact this was buggy code, and if Software Engineers are in fact part of the engineering profession, then a professional body should be taking the engineer(s) to task. This would be the same thing that would take place in the event that a civil engineer signed off on faulty building plans. But smart money says no software "engineer" will get nailed.

    A look at the software industry will show this to be the norm. And that is why there is such a problem with having people claiming the title of "software engineer". "Engineer" doesn't just mean having the technical savvy, it also means having a responsibility to the public for the use of that knowledge and being beholden to a professional body charged with ensuring you are held accountable.

    --
    "Consensus" in science is _always_ a political construct.
    1. Re:Software "Engineering"? by Detritus · · Score: 5, Insightful

      You can't have responsibility without authority. The building never gets built without the signature of the civil engineer on the plans. Few software engineers have that control.

      --
      Mea navis aericumbens anguillis abundat
    2. Re:Software "Engineering"? by Anonymous Coward · · Score: 5, Insightful

      That's why you'll never see a proper software ENGINEER... when engineers undertake a project they know the materials, the requirements, the environment, etc. As soon as a piece of software goes out the door all bets are off.

      How long do you think engineering (as it stands today) would last if that bridge meant to stand on bedrock spanning no more than 1000' and carry a load of no more than 1500 tons at any given time were suddenly put on a sandy bed, stretched to cover 1100' and carry 1600 tons... oh yes, and the user didn't like that third support so they removed it.

      Software and engineering are VASTLY different disciplines. If software is ever judged like engineering then it would kill the market because the EULAs would have to say that you use THIS motherboard with X amount of RAM and Y amount of hard-drive space. The agreement would only be in effect as long as you used OS "ABC" and no other processes besides those required by the OS and the programme in question were running. It would make the cost of running a business prohibitively expensive.

      When you consider that most large-scale software development projects are equivalent in complexity to building structures like the Golden Gate Bridge or the Empire State Building (I didn't want to mention any buildings outside the US since I realise the audience on here is largely American and probably wouldn't know what I was talking about) consider the cost of actually treating software development the same way... I'm sure companies everywhere will be lining up to pay $300M for that content management system.

    3. Re:Software "Engineering"? by CharlieG · · Score: 4, Insightful

      You're right - MOST software "engineers" aren't. Guess what? If they were, you would NOT see death march projects, software would cost a LOT more, and when the chief "eng" on the software project (or for that matter any Engineer on the project) said "This can NOT ship, it's not ready", the company would have to suck it up, and NOT ship.

      Software Engineers would have to carry E&O insurance (Think of it as malpractice insurance, like an MD's). It MIGHT be supplied by their boss, but...

      And in exchange for taking on this risk, what would a software Engineer EARN? You'd better believe it would be a LOT more than it is now.

      You would still have "coders" - in fact, MOST "software engineers" would go back to their pre title inflation title - "Programmer". The SE on the job would be responsible for all the code that the programmers wrote

      Just like MOST jobs don't have to be signed off by a PE, most software would NOT have to be signed off by an SE - but if you use software that wasn't signed off by an SE, and you caused $50B in losses, you would lose YOUR shirt

      At this point in time, it seems that the people of the US just have NOT found the need to come up with the idea of a licensed SE. I predict it will happen, and within the next 25-30 years. There have been movements within the programming trade to do this. It's coming - but when?

      Right now, software development is very much like the "guilds" of the Middle Ages. You didn't have PEs back then - you had folks who learned from other folks, and you had projects that failed massively. Eventually, things became codified, and a lot of the failures stopped - at least for day to day stuff. But guess what? Buildings still fall down, even in construction (read the book "Why Buildings Fall Down"). It's just that for "common" designs, it doesn't happen

      --
      -- 73 de KG2V For the Children - RKBA! "You are what you do when it counts" - the Masso
    4. Re:Software "Engineering"? by Anonymous Coward · · Score: 1, Insightful

      I think you have some odd impressions of engineering. I am a mechanical engineer, and I never get decent requirements for what I'm supposed to design. I have to "guess and check" - design something and show it around to see if it meets the requirements that nobody tells me. I have to overdesign, because I know that it is going to be mistreated. If it breaks, I simply point to the spot on the drawing where it says "must be mounted on bedrock", and the contractor loses their ass, which is why they do what's on the drawing (sometimes).

      I think there are few software projects that can compare to a real world building of any size. Think "every single variable is analog, and only partially known." We just have more experience, so we can get them right, and we know how to parallel them better, so you can have a large project team that still produces a buildable result.

    5. Re:Software "Engineering"? by YU+Nicks+NE+Way · · Score: 4, Insightful

      Engineering is all about tolerances and modes of failure. If I design my car to be able to take a fifteen mph front end collision, and you drive into a wall at thirty, I'm not responsible, and my E&O won't wind up paying out.

      Currently, software is built in a craft/guild model: senior developers (masters) teach junior developers (journeymen) who've reached a certain level of expertise. Interns (apprentices) are drafted into the profession and groomed into junior devs. There is a widely held notion of subjective quality, and we can recognize a masterwork, but we can't quantify what it takes to generate one.

      Software engineering will become a true engineering discipline only when there is an objective measure of defect level and an objective notion of what constitutes an adequately circumscribed operating environment. Once we have adequate definitions of those things, though, software production will become industrialized almost immediately.

  6. TIBCO middleware by Anonymous Coward · · Score: 3, Insightful
    Never have I worked with a vendor so arrogant and yet so totally clueless. Their UDP based reliability protocol is total crap, regardless of their boasts that it is equiv to TCP.


    And yep, it runs on major critical systems, including energy systems and satellites.


    Lean on it in the slightest and it will crash and burn with little chance for recovery. Tibco even says they don't test their own software (lack of docs lowers their liability). Press them for test results and they will offer you to pay them to test for you.


    > When a backup server kicked-in, it also failed, unable to handle the accumulation of unprocessed events that had queued up since the main system's failure.

    Sounds like classic Tibco.
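
The failure mode quoted above (a standby that inherits an unbounded backlog and falls over) can be illustrated with a bounded queue. This is only a minimal sketch in Python: `BoundedEventQueue` and its methods are hypothetical names, not anything from TIBCO or the XA/21, and whether dropping old events is acceptable depends entirely on the system. The point is only that the backlog stays bounded and the loss is counted rather than silent.

```python
from collections import deque


class BoundedEventQueue:
    """Hypothetical backlog that keeps only the newest `capacity` events."""

    def __init__(self, capacity: int) -> None:
        self._events = deque(maxlen=capacity)
        self.dropped = 0  # how many old events were evicted to make room

    def push(self, event: str) -> None:
        # A full deque with maxlen evicts its oldest entry on append,
        # so count the eviction before it happens.
        if len(self._events) == self._events.maxlen:
            self.dropped += 1
        self._events.append(event)

    def drain(self) -> list:
        """Hand the surviving backlog to whoever takes over, oldest first."""
        events = list(self._events)
        self._events.clear()
        return events
```

A standby that calls `drain()` on takeover gets at most `capacity` events, and `dropped` tells the operators how much history was lost.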

  7. Metroid by Graymalkin · · Score: 5, Insightful

    Blaming the black out on a software bug is a damn cop-out. The cause of the black out was a horribly managed electrical grid that can barely keep up with the current demand. Any major failure in the system can cause a cascading failure of the entire section of the grid. That is a horrible design. A software bug may have been the trigger but it is by no means the true cause.

    The grid in the North East US is supplied by horribly inefficient and antiquated power lines that were struggling to keep up thirty years ago. That they are still in use today is an outright crime. There's also the issue of the operators of the lines and generators trying to save a few bucks by cutting maintenance on equipment and facilities and cutting supervising staffs down to skeleton crews. It is much easier to fit "software bug" into a sound bite so the news media will stick with that. Unfortunately the real cause of the black out is not ever going to be patched and another blackout is as inevitable as this last one was. I hope next time a few more people will have invested in backup generators or some alternate form of power to keep from losing their business during a blackout.

    --
    I'm a loner Dottie, a Rebel.
    1. Re:Metroid by that_xmas · · Score: 2, Insightful

      You're right, it is horrible that we are still using this old power grid. Of course, no one wants new power lines built in their back yard, it may lower their property values. On top of that, 20 years ago we were going through the "EMF causes cancer!" scare. People were blaming cancer clusters on power lines. *sigh* Welcome to the United States of Short-sightedness.

    2. Re:Metroid by Milalwi · · Score: 2, Insightful

      > The cause of the black out was a horribly managed electrical grid that can barely keep up with the current demand.

      Wow. Quite an accusation. Any facts to back it up?

      > Any major failure in the system can cause a cascading failure of the entire section of the grid. That is a horrible design.

      Really? There are major circuit outages on the Eastern Interconnected Network every day. The system is designed to have the local area go black instead of blacking out a widespread area. That was the lesson of the 1965 blackout, and the reason the 1977 NYC blackout was limited to the NYC/Long Island areas. By design, blackouts are supposed to stop at the interconnections between control areas, and the fact that the 2003 North Eastern blackout took out several control areas is what was surprising. In the end, however, it did stop at control area boundaries.

      How many major, widespread blackouts have occurred in the Eastern Interconnected Network in the last 40 years or so? Note that the Eastern Interconnected Network does not include Texas, Quebec or systems west of the Rockies. I am using widespread to mean affecting several system/control areas. The 1977 NYC blackout, although large, did not spread past the New York City/Long Island area.

      This reminds me of the old SNL skit "Common Knowledge Jeopardy". A few public figures make ill-informed comments about a subject and suddenly everyone thinks it's a fact.

      > The grid in the North East US is supplied by horribly inefficient and antiquated power lines that were struggling to keep up thirty years ago. That they are still in use today is an outright crime.

      What do you mean by inefficient? Do you think that the conductors somehow wear out? Equipment is inspected and replaced as needed. Yes, it's still done. This is not to say that maintenance procedures are perfect, of course.

      As another poster in this article stated, part of the problem is that no one wants new power lines in their back yard. (NIMBY, Not In My Back Yard) Another part of the problem, in my not-so-humble opinion, is that the feds are driving "de-regulation" of the generation portion of the system only, and they're not providing any logical (again, IMNHO) method for funding transmission system upgrades. In fact, having a well-designed transmission system is becoming a liability as it continues to cost money, but the ability to make money from it is disappearing. (Yes, I meant to quote de-regulation, as they're not de-regulating anything, they're just changing the regulations)


      > Unfortunately the real cause of the black out is not ever going to be patched and another blackout is as inevitable as this last one was.


      What would you recommend as a patch? Seriously, I'm interested to know what you think should be fixed and how.

      The report detailing what happened on 14-Aug-2003 is quite well written and interesting. I recommend it.

      There are major changes resulting from what we've learned from the study of the events of 14-Aug-2003, just as we learned and changed due to the events of 9-Nov-1965. People are thinking about these problems.

      Milalwi
  8. OK, time to revisit advanced development methods by starseeker · · Score: 2, Insightful

    If this isn't a call to take a closer look at the possibility of more widely using tools like Z and B to develop important software, I don't know what is.

    Yes, they're difficult. Yes, they aren't likely to eliminate all bugs. BUT. They provide a much better chance (as I understand it - I'm not an expert) that what is designed is what actually gets implemented. That shifts the burden onto the design, but that's OK - that burden was always there. It just means that the design gets properly implemented, which is all that can reasonably be asked of the coding process.

    Currently, again as I understand it, the life of a software program in development is a constant struggle by the developers to cope with ever changing demands of customers. I think if people want matters to improve the customers are going to have to come to grips with reality, take the time to sit down and think things through, and make all critical design decisions BEFORE the development process begins. More expensive up front? You bet. That's why I think companies should look at cooperative effort for this type of thing. Distribute the cost of developing one really good program across an industry. A lot of the same core functionality can likely be shared between businesses - if they all pay for one proper design and implementation of an open program up front, and they all get copies of the logic and proof code with rights to extend as they see fit, they all benefit. They can also open up the more general parts of the package to the world at large under GPL, and anyone could contribute who can generate valid B and Z designs/proofs. Sort of an "academic" open source code development forum - peer review and all. The companies get the benefit of all new development - if they are using it internally they can extend the GPL code for themselves, so long as they don't distribute it. If they do distribute it, they can do so under GPL for everyone to enhance. A plugin based model can also allow them to develop components to the system they can sell as commercial software, if they wish.

    Whether this would work/appeal with corporate thinking I have no idea - many of those folks seem to view cooperation like the plague. But it might allow a higher grade of software to be developed and universally used, and I have a hard time imagining how that could be a bad thing for anyone.

    --
    "I object to doing things that computers can do." -- Olin Shivers, lispers.org
  9. Not Surprised by Anonymous Coward · · Score: 4, Insightful

    Posting anonymously for obvious reasons to me :)

    Given my personal experience with this certain Fortune 5 company and software development as a whole, I am not surprised.

    The bottom line is that there is soooo much software developed here by non-computer programmers. There are many great Engineers (Mechanical, Aerospace, etc.) here, yet very few can write good code. Many of them are asked to write code nonetheless and thanks to the travesty that is Visual Basic and other Rapid Application Development tools the code that is produced is extremely un-maintainable.

    Then you have the matter of people moving jobs every 2 years and the poor bastard who has to maintain someone else's code gets lost inside of it.

    Consider me very frustrated at the whole process.

  10. Re:The real cause... by Anonymous Coward · · Score: 2, Insightful

    and yes, there is no reason that a 12" tree should be anywhere CLOSE to a 50 MV line.

    Rather, there is no reason that a 50 MV line should be anywhere close to a 12" tree.

  11. Argument against centralization by sphealey · · Score: 3, Insightful
    In the wake of the blackout there were a lot of calls to create a centralized, monolithic dispatching center that would manage all electric generation and transmission in North America.

    To me, this report gives a good example of why a monolithic (monocultural) dispatching system is not a good idea. If every transaction were controlled by a central center, a single software bug could shut down the entire North American grid.

    sPh

  12. Re:Visual Basic by cassidyc · · Score: 1, Insightful

    This is informative why??? mention of some of your friends who have nothing to do with XA21?

    And some random comments on GE selling crypto hardware....

    where's the connection??

    Clues please?

  13. More Reliable than Mars Rover by occamboy · · Score: 4, Insightful

    In all fairness...

    The Mars Rover's software crashed in just a few days.

    Virtually all software should be designed and tested better than it is.

    However, I'm perplexed at why the Mars Rover failure and resurrection is considered a miracle of human ingenuity, rather than an indictment of crummy testing.

    I'll not excuse the power grid software either; but it seems to work more reliably than the software on the Rover.

    1. Re:More Reliable than Mars Rover by hpulley · · Score: 2, Insightful

      It is not considered a miracle but it is considered amazing. It is hard enough to debug things sitting on your desk, harder to debug someone else's problem over the phone and worse from orbit but imagine debugging a problem with 10 minutes of light delay! And there is only one computer on that rover so they were using the buggy computer to recover; not an easy task. In the end it turned out to be flawed file management code in the flash memory; the daily TODO list was kept in flash and it couldn't find it so it panicked and booted over and over, like a home computer with corrupted config/startup files. Not an easy thing to debug from millions of miles away.

      --
      $#!^ happens, but why does it always have to happen to me???
    2. Re:More Reliable than Mars Rover by Anonymous Coward · · Score: 4, Insightful

      Complete testing is impossible. The number of cases that can occur is enormous. To test every single one is impossible within the lifetime of any civilization, let alone the lifetime of a human being or the lifetime of the software itself. Even if you could test every case you can think of, you've still tested only the cases you can think of. What are you going to do, sit around all day and think, "What would happen if a cosmic ray flipped this bit while a surge from the camera's actuators caused the processor to reboot at the same time a martian gave it a good hard kick in the side and spilled martian beer on it?" That's ridiculous.

      Complete testing is impossible.
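
The scale of the claim above is easy to check with back-of-the-envelope arithmetic. The sketch below is an illustration in Python, not part of the original comment: it assumes, hypothetically, that the whole input is `input_bits` independent bits and that a test rig sustains `tests_per_second` cases; both numbers are made up for the example.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def years_to_test_exhaustively(input_bits: int, tests_per_second: float) -> float:
    """Years needed to try every combination of `input_bits` bits of input."""
    cases = 2 ** input_bits
    return cases / tests_per_second / SECONDS_PER_YEAR


# Two 32-bit inputs at a billion tests per second: several centuries.
print(years_to_test_exhaustively(64, 1e9))
# Four 32-bit inputs at the same rate: far beyond any civilization's lifetime.
print(years_to_test_exhaustively(128, 1e9))
```

Every extra bit of input or internal state doubles the figure, which is why testing strategies sample the space instead of enumerating it.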

    3. Re:More Reliable than Mars Rover by Citizen+of+Earth · · Score: 4, Insightful

      > Virtually all software should be designed and tested better than it is.

      "Software sucks because users demand it to."

      Unless every single software company does this, the ones that don't will own the market by virtue of supplying software that "mostly works" two years ahead of the others that supply software that is "perfect, minus epsilon". Then, all of the perfectionados go out of business, and the market returns to its present state. Things are the way they are because that's how various market pressures make them.

    4. Re:More Reliable than Mars Rover by Mr.+Piddle · · Score: 3, Insightful

      > Things are the way they are because that's how various market pressures make them.

      The market is slowly changing, thankfully. A good example of a maturing market would be our good old friend: home electrical wiring. How long did it take before every new home since probably the early 1980s was wired pretty much identically? They went through several different types of wire and insulation, grounded and ungrounded outlets, fuses and circuit breakers, etc. In a lot of ways, the software world is no different, and I'd say we're at the aluminum wire stage with the various incarnations of systems we have and accompanying reliability and security problems.

      --
      Vote in November. You won't regret it.
  14. Re:Uh... by TimTheFoolMan · · Score: 5, Insightful

    According to the SecurityFocus article, the operators had no way of knowing, because the data wasn't "live." This is a common problem with SCADA systems--the systems will display the "last known-good value" if something goes offline. However, the system should also visibly identify the data as "out of service" or "offline," and this didn't seem to happen. That could be an issue at the server, or it could be something blamed on the people commissioning the XA/21 system (assuming the display is configurable enough to allow you to program it at this level).

    Even so, there should have been sufficient watchdog messages between the client, the server, and the field hardware for the XA/21 to broadcast a general alarm along the lines of "I can't talk to the stinking field, so we're all flying blind here, you morons!" This is exactly the same as software in my industry (HVAC fire/security systems for large buildings), where if you lose communication to a subsystem or the field, you have to raise alarms all over the place.

    The real question is how you could lose such comm and the operators had no visible indication that they were relying on old data. This sounds like a missed requirement, if not insufficient testing.

    Tim
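
The two requirements described in this comment (flag last known-good values as offline, and alarm when communication is lost) can be sketched in a few lines. This is a minimal illustration in Python under stated assumptions: `Reading`, `render`, `watchdog_alarms`, the tag names, and the 10-second limit are all hypothetical and have nothing to do with how the XA/21 is actually built.

```python
from __future__ import annotations

from dataclasses import dataclass

STALE_AFTER_S = 10.0  # hypothetical freshness limit, in seconds


@dataclass
class Reading:
    value: float
    received_at: float  # when the value arrived from the field, in seconds


def render(tag: str, reading: Reading, now: float) -> str:
    """Show the last known-good value, visibly flagged if it is no longer live."""
    age = now - reading.received_at
    if age > STALE_AFTER_S:
        return f"{tag}: {reading.value} [OFFLINE, {age:.0f}s old]"
    return f"{tag}: {reading.value}"


def watchdog_alarms(last_heartbeat: dict[str, float], now: float) -> list[str]:
    """Return one alarm per subsystem whose heartbeat is overdue."""
    return [
        f"ALARM: no communication with {name} for {now - seen:.0f}s"
        for name, seen in sorted(last_heartbeat.items())
        if now - seen > STALE_AFTER_S
    ]
```

Passing `now` in explicitly, instead of reading the clock inside the functions, keeps the staleness logic trivially testable, which matters for exactly the kind of rarely exercised path discussed here.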

  15. blaming the software is easy by dewdrops · · Score: 3, Insightful


    So the software didn't raise alarms as it should've. That's bad. But it seems to me that the software is being made a scapegoat here. It's much easier to blame "that #$@&@$ computer" than "FirstEnergy's failure to trim back trees encroaching on high-voltage power lines" or the fact that the infrastructure for the power grid is old and poorly set up such that one failure can bring down the whole system. There's no reason why a failure in Ohio should blackout New York and there's nothing software can do to fix that.

  16. Re:Hmm by Anonymous Coward · · Score: 1, Insightful

    There is also the fact that consumer use of electricity is growing faster than the infrastructure to support it. If you can squeeze an additional 10-20% transmission capacity by more efficient use of existing facilities, then you can hold on until new infrastructure is built.

  17. No Wintel bashing? Oh wait it's RISC/UNIX code! by Glasswire · · Score: 3, Insightful

    Had this been a Windows-based system, the torrent of comments about how unreliable the OS and platform fundamentally were would be huge.

    Funny, just because this ships for "industrial strength" AIX / Solaris RISC systems (see specs on pg 8), I don't see any cheap, reflexive comments about the platform.

    I guess the message here is that good or bad code can be written for any architecture.

  18. Re:Not very analogous... by naarok · · Score: 2, Insightful

    Water accelerates the growth of a plant, but it doesn't cause the plant to be. The seed did that.