Slashdot Mirror


Tracking the Blackout Bug

Alien54 writes "This earlier Slash story cited a CNN news report on how the August blackout was preventable. But, as seen in this Security Focus article, things are not so simple. 'In the initial stages, nobody really knew what the root cause was,' says Mike Unum, manager of commercial solutions at GE Energy. 'We test exhaustively, we test with third parties, and we had in excess of three million online operational hours in which nothing had ever exercised that bug,' says Unum. 'I'm not sure that more testing would have revealed that. Unfortunately, that's kind of the nature of software... you may never find the problem. I don't think that's unique to control systems or any particular vendor software.' Which leads to a number of other questions."

3 of 207 comments

  1. B Method? by starseeker · · Score: 5, Interesting

    "the bug was unmasked as a particularly subtle incarnation of a common programming error called a 'race condition,' triggered on August 14th by a perfect storm of events and alarm conditions on the equipment being monitored. The bug had a window of opportunity measured in milliseconds."

    Isn't this the type of problem the B Method (and maybe the Z language too) is designed to address? Use proof logic from the start: once you have decided on the behavior you want, design the system in such a way that you can prove it implements that behavior.

    That doesn't mean the DESIGN is flawless, of course. But if we start engineering software on as many levels as we can, mightn't things improve? Normal software development and testing would never have found a critical bug with rare trigger conditions and a millisecond window. If you need precision on that level, you need (for starters) to KNOW your implementation of your design is sound, and preferably that the code you are running exactly implements the proven logic. Isn't this what the B Method was created for?
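
    To make the "window of opportunity" concrete, here is a toy sketch, not the actual GE code and not the B Method itself. It assumes two hypothetical workers, A and B, that each increment a shared counter as a separate read step and write step. Instead of running the program and hoping to hit the bad timing, it enumerates every possible interleaving, which is a very small stand-in for the exhaustive reasoning that formal tools aim for.

```python
from itertools import permutations


def run(schedule):
    """Run two workers that each increment a shared counter once.

    `schedule` is a string such as "ABAB". Each letter lets that worker
    take its next step: first a read of the counter, then a write-back.
    """
    shared = 0
    local = {}  # value each worker read and has not yet written back
    for worker in schedule:
        if worker not in local:
            local[worker] = shared      # step 1: read
        else:
            shared = local[worker] + 1  # step 2: write back read value + 1
    return shared


def all_schedules():
    """Every distinct ordering of two steps from A and two steps from B."""
    return sorted({"".join(p) for p in permutations("AABB")})


def lost_updates():
    """Schedules where one increment is silently lost (result is not 2)."""
    return [s for s in all_schedules() if run(s) != 2]


if __name__ == "__main__":
    for s in all_schedules():
        print(s, "->", run(s))
```

    With only two steps per worker the bad orderings are easy to hit. In a real system the read and write may sit microseconds apart among millions of other operations, so ordinary test runs almost never land in the window, while an exhaustive check still finds it.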

    --
    "I object to doing things that computers can do." -- Olin Shivers, lispers.org
  2. Re:The problem with SCADA systems by Vancorps · · Score: 5, Interesting
    This all reminds me of the movie Resident Evil where they shut down power and all the doors unlock when power is restored.

    You bring up a great point about failure states. I work for several large hotels, and the fire control systems are the ones that alert whenever there is a problem of any kind, largely because any problem needs to be addressed immediately. That makes sense.

    I would think power systems would be designed along the same lines, since odds are that ANY failure whatsoever needs the immediate attention of the engineers who maintain the system. This is not a requirement for all software, but when it comes to such critical services, why doesn't everybody follow the same practice? It seems so blatantly obvious that alarms should have been raised.
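
    One common way to get "alarm on any failure" is a heartbeat watchdog: each component checks in periodically, and silence itself is treated as a fault. The sketch below is only an illustration under my own assumptions. The `Watchdog` class, the component names, and the 5-second timeout are all hypothetical and say nothing about how any real SCADA or fire control product works.

```python
from dataclasses import dataclass, field


@dataclass
class Watchdog:
    """Tracks the last heartbeat time of each monitored component."""

    timeout_s: float
    last_beat: dict = field(default_factory=dict)

    def beat(self, name, now):
        """Record that component `name` reported in at time `now` (seconds)."""
        self.last_beat[name] = now

    def stale(self, now):
        """Return components whose last heartbeat is older than the timeout."""
        return sorted(
            name
            for name, t in self.last_beat.items()
            if now - t > self.timeout_s
        )


if __name__ == "__main__":
    wd = Watchdog(timeout_s=5.0)
    wd.beat("alarm_processor", now=0.0)
    wd.beat("telemetry", now=0.0)
    wd.beat("telemetry", now=8.0)   # telemetry keeps reporting
    # alarm_processor has gone quiet, so it is the one flagged
    print(wd.stale(now=10.0))
```

    The point of the design is that a stalled alarm process cannot stay invisible: it does not have to notice its own failure, because something else notices that it stopped talking.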

    Also, in situations where you don't work on a live environment, you can always create a test environment that is for all intents and purposes "live." For the web development work I do, I have a testing domain that is used to test sites, to ensure that because they work here in my lab they will work when I hand them off to the client. It's 100% accurate. I've seen it done with countless other systems, so why wasn't it done here?

  3. The American jackasses who blamed Canada by Kevin Mitnick · · Score: 5, Interesting

    Did anyone ever retract their statements? I know the NY Mayor was pretty quick to blame us Canucks.