Tracking the Blackout Bug

The problem with SCADA systems by up+up+down+down+lrlr · 2004-04-10 07:04 · Score: 1, Insightful

According to the SecurtyFocus article, the operators had no way of knowing, because the data wasn't "live." This is a common problem with SCADA systems--the systems will display the "last known-good value" if something goes offline. However, the system should also visibly identify the data as "out of service" or "offline," and this didn't seem to happen. That could be an issue at the server, or it could be something blamed on the people commissioning the XA/21 system (assuming the display is configurable enough to allow you to program it at this level).

Even so, there should have been sufficient watchdog messages between the client, the server, and the field hardware for the XA/21 to broadcast a general alarm along the lines of "I can't talk to the stinking field, so we're all flying blind here, you morons!" This is exactly the same as software in my industry (HVAC fire/security systems for large buildings), where if you lose communication to a subsystem or the field, you have to raise alarms all over the place.
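To make the watchdog idea concrete, here is a minimal sketch in Python. It assumes each link (client, server, field) is expected to send a heartbeat at least once per timeout period. The class name `LinkWatchdog`, its methods, and the alarm wording are all hypothetical and are not taken from the XA/21 or any real SCADA product.

```python
import time


class LinkWatchdog:
    """Tracks the last heartbeat seen on each monitored link (hypothetical design)."""

    def __init__(self, links, timeout_s, now=time.monotonic):
        self._now = now  # injectable clock, so the logic can be tested
        self.timeout_s = timeout_s
        start = now()
        # Treat startup as the first heartbeat so every link gets a full timeout.
        self.last_seen = {name: start for name in links}

    def heartbeat(self, link):
        """Record that a heartbeat just arrived on this link."""
        self.last_seen[link] = self._now()

    def silent_links(self):
        """Return the links whose last heartbeat is older than the timeout."""
        t = self._now()
        return sorted(
            name for name, seen in self.last_seen.items()
            if t - seen > self.timeout_s
        )

    def general_alarm(self):
        """Return an alarm message if any link has gone silent, else None."""
        silent = self.silent_links()
        if silent:
            return "GENERAL ALARM: no heartbeat from " + ", ".join(silent)
        return None
```

Note that the check itself has to run somewhere that does not share a failure mode with the links it watches, otherwise the watchdog goes down with everything else.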

The real question is how you could lose such comm and the operators had no visible indication that they were relying on old data. This sounds like a missed requirement, if not insufficient testing.

Tim

Re:The problem with SCADA systems by Vancorps · 2004-04-10 07:13 · Score: 5, Interesting

This all reminds me of the movie Resident Evil where they shut down power and all the doors unlock when power is restored.
You bring up a great point about failure states. I work for several large hotels and the fire control systems are the ones that alert whenever there is any problem of any kind largely because any problem of any kind needs to be addressed immediately so it makes sense.
I would think power systems would work along the same lines, since odds are that ANY failure whatsoever needs the immediate attention of the engineers who maintain the system. This is not a requirement for all software, but when it comes to such critical services, why doesn't everybody follow the same practice? It seems so blatantly obvious that alarms should have been raised.
Also, in situation's where you don't work on a live environment you can always create a test environment that is for all intensive purposes "live" For web development work I do I have a testing domain which is used to test sites to ensure that because they work here in my lab they will work when I hand them off to the client. Its 100% accurate, I've seen it done with countless other systems, so why wasn't it done here?
Re:The problem with SCADA systems by Anonymous Coward · 2004-04-10 07:21 · Score: 0

Also, in situation's where you don't work on a live environment you can always create a test environment that is for all intensive purposes "live"
What?
Re:The problem with SCADA systems by Vancorps · 2004-04-10 07:29 · Score: 1

Every test you can run will behave exactly the same in the real environment. I'm not sure why that is even remotely hard to understand.
Re:The problem with SCADA systems by Anonymous Coward · 2004-04-10 07:35 · Score: 0

Hmm.
Re:The problem with SCADA systems by EddWo · 2004-04-10 07:42 · Score: 1

It should be "for all intents and purposes", thats what he's getting at. Nevermind, you're not the first person to write it as they here it.

--
"Taligent is still pure vapor. Maybe they'll be the last who jumps up on Openstep... "
Re:The problem with SCADA systems by Vancorps · 2004-04-10 07:48 · Score: 1

Yeah, I figured that is what it was, but language is about communicating an idea and the point got across so it is considered acceptable. The meaning still fits my purpose even though it is not the common saying.
Re:The problem with SCADA systems by Anonymous Coward · 2004-04-10 08:07 · Score: 0

FYI, it's "all intents and purposes," not "all intensive purposes."
Re:The problem with SCADA systems by rand.srand() · 2004-04-10 08:52 · Score: 1

Having been at the plant where they make/support the XA/21, it's no wonder the thing failed. In the last few years they've axed the entire support crew, and tried to sub it out to recent high school grads. The last few good people worked really hard but couldn't document a single thing in the pressure to release systems.

As for the updates and how that works, etc., the XA/21 system uses RTUs in the field, which are basically 1200 baud modems with some instrumentation and a simple controller. They call back into the main system every interval and update their status. From what I've read, however, the trouble was that the master server set had failed, and the secondary didn't switch over correctly. The RTUs were madly calling into the system reporting error status, but the XA/21 was dead and couldn't report good or bad status.

GE Energy can say that they can't test everything, but the real problem is that they aren't testing much of anything in Melbourne any more...
Re:The problem with SCADA systems by 0x0000 · 2004-04-10 09:18 · Score: 1

It was my understanding that it was the watchdog that was supposed to raise the alarm when the field comms went down that failed. I.e. it was not a single point failure. This is the impression I got from the news coverage immediately following the blackout, at least.

It sounded to me as though there was an alarm mechanism in place, but it was not redundant.

It is an interesting point that the system was not displaying real-time data. I did visit a power control station once some years ago (a small one, belonging to Georgia Power), and remember being impressed with the fact that much of the control software was (appeared to be) real-time, in that operators could manually switch grids in and out using a terminal that resembled a Pac-Man game -- a flat, tabletop display with a scrolling map controlled by a trackball.

That does not speak to the monitoring and alarm systems, though, which were quite different. If anyone had asked, I would have thought that all the power monitoring and control systems were considered "hard real-time" applications. I wonder now if the monitoring system was properly integrated with the real-time control system.

At any rate, there are safety-critical systems that have to meet (more-or-less) independent standards of robustness, e.g. RTCA DO-178B for software in aircraft. Are there similar standards for critical utility systems software? If not, perhaps there should be. Power grids and phone systems can be life-critical systems under a variety of circumstances, and I don't know that this has been addressed by any authoritative independent body...

--
"The Internet is made of cats."
Re:The problem with SCADA systems by miu · 2004-04-10 09:22 · Score: 4, Insightful

For web development work I do I have a testing domain which is used to test sites to ensure that because they work here in my lab they will work when I hand them off to the client. Its 100% accurate, I've seen it done with countless other systems, so why wasn't it done here?
Mostly because web systems are still toys compared to real systems.
These systems get real and very intensive testing in labs as close to live as they can get. Even once they knew the conditions and affected subsystems it took the dev and testing teams months to recreate this bug in the lab. The lab is never just like real life, it cannot be - because even real life now is not always the real life of 10 seconds ago.

--

[Set Cain on fire and steal his lute.]
Re:The problem with SCADA systems by Kirill Lokshin · 2004-04-10 09:32 · Score: 2, Interesting

This is exactly the same as software in my industry (HVAC fire/security systems for large buildings), where if you lose communication to a subsystem or the field, you have to raise alarms all over the place.

And perhaps the software in question also tries to do that. However, there are any number of reasons it could still fail.

Consider the following scenario: one software component (a process, if you will) is responsible for synchronizing the data between the remote testing station and the local data storage. Another pulls the locally stored data and displays it to the user. The natural place to check for lost comm is in the first component; but if, for some reason, the lost comm causes that component to fail, the second one may not be aware that the locally cached data is not being refreshed (a silly mistake, but I've seen it happen). Furthermore, the user will be unaware that the link failed because the process responsible for generating the notification will no longer be running.
Re:The problem with SCADA systems by spurdy · 2004-04-10 09:58 · Score: 2, Interesting

You make a good point, but in my company, we have hundreds of data points reporting continuously. When the communications (telephone company) fails, which it does multiple times every day, you end up with wrong data temporarily. If the operator had to investigate every comm failure, he'd never get anything else done. So, there has to be a threshold somewhere of when does a problem reach a level that it needs to generate an alarm.
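One simple way to express such a threshold is to alarm only after several consecutive missed polls on the same point, so that brief telephone-line dropouts clear themselves silently. The sketch below is illustrative only: `CommFailureFilter`, its method names, and the choice of counting consecutive misses (rather than, say, elapsed time) are assumptions, not anything described in the thread.

```python
class CommFailureFilter:
    """Suppresses alarms for short comm dropouts (hypothetical design)."""

    def __init__(self, max_consecutive_misses):
        self.max_misses = max_consecutive_misses
        self.misses = {}  # point name -> current run of failed polls

    def record(self, point, ok):
        """Record one poll result; return True if the point should alarm now."""
        if ok:
            # Any successful poll resets the run of failures.
            self.misses[point] = 0
            return False
        self.misses[point] = self.misses.get(point, 0) + 1
        return self.misses[point] >= self.max_misses
```

Picking the threshold is the hard part: too low and operators drown in noise, too high and a real outage goes unreported for too long.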
Re:The problem with SCADA systems by fermion · 2004-04-10 10:13 · Score: 2, Interesting

It kind of depends on how often the out-of-date conditions occur and how long they last. My understanding is that the design of proper alarms is actually a complicated security issue, and improper alarms lead to less effective security.
For example, I once worked at a place with many, many Windows web servers. Every time a server failed, an alarm would sound. But the reason we used Windows servers is that they were dirt cheap, so we could buy enough to compensate for the expected frequent failures. The result was near-constant alarms that were uniformly ignored. Therefore, the alarms resulted in no security benefits. This place had many other examples of impressive front door security with nonexistent backdoor security.
It could be that the data was often "not live". Such 'failures' might be due to perfectly legitimate and expected conditions. As such, these would not be exceptions, in the sense that they were not unexpected. It is quite possible that the system was designed to have a human check some board on a periodic basis to confirm the age of the data. It may be that as long as an operator did this job once an hour there would be no problem. Some group may have decided that additional indication would not do any good because the data was so often "not live" that the operators would suffer blindness to the alarm.
Of course we do not know this for sure, but it could happen. But it is a consideration. As another example my check engine light has been on for a long time, and yet the mechanic says that nothing is significantly wrong with the engine. How will I ever trust the light again?

--
"She's a scientist and a lesbian. She's not going to let it slide." Orphan Black
Re:The problem with SCADA systems by MntlChaos · 2004-04-10 10:20 · Score: 1

Timestamps should be able to solve that quite easily. If the data is older than 1-2 refresh intervals, throw up an alarm.
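A rough sketch of that check in Python follows. It assumes every cached value carries the time of its last update, in seconds. The function names, the `[STALE]` marker, and the default of two refresh intervals are illustrative choices.

```python
def is_stale(last_update_s, now_s, refresh_interval_s, allowed_intervals=2):
    """True if the value is older than the allowed number of refresh intervals."""
    return (now_s - last_update_s) > allowed_intervals * refresh_interval_s


def render(value, last_update_s, now_s, refresh_interval_s):
    """Format a value for display, visibly tagging it when it is stale."""
    if is_stale(last_update_s, now_s, refresh_interval_s):
        return f"{value} [STALE]"
    return str(value)
```

Because the display computes staleness from the stored timestamp itself, it still works if the process that was supposed to refresh the data has died.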
Re:The problem with SCADA systems by Anonymous Coward · 2004-04-10 10:30 · Score: 0

"The real question is how you could lose such comm and the operators had no visible indication that they were relying on old data. This sounds like a missed requirement, if not insufficient testing."

They did have an indication. Typically alarms come in every minute or two. Where I work if we don't see any alarms after 5-10 minutes we start looking for the cause, usually a restart of the computer fixes it or IT specialists are called out. In Ohio apparently the operators received no alarms for over an hour and never investigated, this despite the fact that they were even getting calls from adjacent areas seeing things happening on their system.
Re:The problem with SCADA systems by Fishead · 2004-04-10 10:51 · Score: 2, Interesting

Wasn't Chernobyl taken out by a test gone bad?

Testing is all fine and good, but there are always going to be instances where something will remain undetectable for years until circumstances are just right (wrong?)

I am a technician at a plant that makes batteries and we see this all the time.

I remember one time when an operator was cleaning a conveyor with a cloth soaked in methanol (standard procedure) but forgot about the rag he had left on the underside of the running conveyor. Once the methanol had all evaporated, the dry rag got caught on the conveyor and jammed in the sprocket. At the same instant a valve had opened to fill the electrolyte tank. The jammed sprocket blew a breaker, which stopped the machine. The PLC (Programmable Logic Controller) is programmed to keep valves in their current state in the case of an emergency (you kill fewer operators this way), but in this case it should have closed the valve. The result was a large puddle of nasty-smelling, toxic, expensive electrolyte underneath the machine. Much fun.
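The underlying design question is whether "hold the current state" should be the emergency behaviour for every output, or whether some valves need their own fail-safe position. The Python sketch below illustrates a per-valve policy. It is not PLC code, and the names `OnFault`, `apply_fault_policy`, and the valve names are all made up for the example.

```python
from enum import Enum


class OnFault(Enum):
    HOLD = "hold"    # keep whatever state the valve was in
    CLOSE = "close"  # force the valve shut
    OPEN = "open"    # force the valve open


def apply_fault_policy(current_states, policy):
    """Return valve states (True = open) after an emergency stop.

    Valves not listed in the policy fall back to HOLD, which matches the
    default behaviour described in the story above.
    """
    result = {}
    for valve, is_open in current_states.items():
        action = policy.get(valve, OnFault.HOLD)
        if action is OnFault.CLOSE:
            result[valve] = False
        elif action is OnFault.OPEN:
            result[valve] = True
        else:
            result[valve] = is_open
    return result
```

With a policy of `{"electrolyte_fill": OnFault.CLOSE}`, the fill valve would shut on a breaker trip while everything else stayed where it was.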

My point is that as much as we try to make our machines foolproof, there is always at least one fool out there that will one day outsmart you.
Re:The problem with SCADA systems by wintermute1974 · 2004-04-10 11:32 · Score: 2, Informative

I agree. These SCADA systems can become quite complex. If you are interested, you can even read General Electric's brochures for the XA/21 system.
Re:The problem with SCADA systems by Anonymous Coward · 2004-04-10 11:56 · Score: 0

Tim

You need to read the article again and read the final report. This had nothing to do with "live" data or field device communications. The problem was that the alarm subsystem failed before transmission line outages began occurring. Quoting directly from the final report:

"Although the alarm processing function of FE's EMS failed, the remainder of that system generally continued to collect valid real-time status information and measurements about FE's power system, and continued to have supervisory control over the FE system. The EMS also continued to send its normal and expected collection of information on to other monitoring points and authorities, including MISO and AEP."
Re:The problem with SCADA systems by Anonymous Coward · 2004-04-10 12:23 · Score: 0

"write it as they here it."
Umm....
Re:The problem with SCADA systems by palndron · 2004-04-10 13:12 · Score: 0

If you read some of the articles on the problem, I believe that the issue was a failure of a redundant system while it was trying to send out the alarms - basically they had automatically failed over to a redundant "server" whose alarm queues were hosed, so their client displays did not get fed new alarms. I don't believe that the issue was with not seeing live data.

Also, to your points on SCADA systems - they don't show the last known value unless specifically configured to do so.

--
a man, a plan, a canal, panama
Re:The problem with SCADA systems by Anonymous Coward · 2004-04-10 13:27 · Score: 0

The problem wasn't JUST with the SCADA system. As a power system operator, I can confidently say that someone was asleep at the wheel. If you will read the government's reports from the 1965 blackout in New York, it isn't the first time operators were negligent.
Re:The problem with SCADA systems by Vancorps · 2004-04-10 16:19 · Score: 2, Interesting

Web systems were but one example. I'll throw out another, much more complex example. Take DNA from bacteria and splice it with stem cells to produce nerves much more resistant to damage. You are talking thousands upon thousands of long protein strands, and for most of them you have no idea which performs what task. Do this without destroying the cell. When you are done with that test you move on to a more complex test until ultimately you are ready to do it with humans, at which time you can accurately predict exactly what it will do. Yes, there are occasions when it doesn't happen that way, but that is usually because something was missed in the testing procedure.
The elitism seen here is incredible; just because a system in and of itself isn't complex doesn't mean you can't take stock of how it is managed. Although personally I'm about to design a call center application for Mercedes that will be used by hundreds of thousands of people. This system can get quite complex, albeit not as important as a power system.
When it comes to troubleshooting systems you always have the option of making an exact scale model. You scale it up for more precision. This is a simple concept and apparently a lot of people think just because a system is complex and antiquated the same ideas can't apply.
Re:The problem with SCADA systems by Eskarel · 2004-04-10 16:53 · Score: 1

Perhaps the theory was that they shouldn't need a status indicator change to tell them the lights are out.
Re:The problem with SCADA systems by IncohereD · 2004-04-10 17:24 · Score: 1

"but language is about communicating an idea and the point got across so it is considered acceptable"

See, this is the REAL reason there's hundreds/thousands of languages on earth - people don't give a shit about getting things right, because 'i got my point across'. Over a few generations the language you're speaking bears no resemblance to the original, and the one on the other side of the mountains has diverged entirely differently to the point where the two are mutually unrecognizable.

Language IS about communicating ideas. But you can only communicate with someone if you're speaking the same language. So have respect for the language, and get it right.
Re:The problem with SCADA systems by miu · 2004-04-10 18:18 · Score: 2, Interesting

When it comes to troubleshooting systems you always have the option of making an exact scale model. You scale it up for more precision. This is a simple concept and apparently a lot of people think just because a system is complex and antiquated the same ideas can't apply.
Even if you could create a model to test with that is identical to the live system you cannot test every possible situation which can occur in the real world. Integration testing can only test those things which can be envisioned by those responsible for testing.
You absolutely do the best testing you can, unit test every piece of functionality, test subsystems and whole systems in integration testing, but you will never test every single possibility. The more complex (and antiquated) the system, the greater the number of interactions, and the greater the potential for bugs. I'm convinced that there are bugs lurking in every piece of hardware and software I use, the conditions under which those bugs manifest may have never occurred, but they are there.
I'm not fatalistic about software quality, and I don't disagree that we need to test better, but complexity to testing difficulty is not linear and I dislike seeing it trivialized. People who underestimate the difference between a system with 100 parts and 1000 parts are in for a rough time.
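As a back-of-the-envelope illustration of that non-linearity, count only the possible pairwise interactions between parts. This is a deliberately simplified model (real systems also have three-way and timing-dependent interactions), and the function name is just for the example.

```python
from math import comb


def pairwise_interactions(parts):
    """Number of distinct pairs that can be formed from `parts` components."""
    return comb(parts, 2)


for n in (100, 1000):
    print(n, pairwise_interactions(n))
# prints:
# 100 4950
# 1000 499500
```

Ten times the parts gives roughly a hundred times the pairs to consider, before any higher-order interactions are counted.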

--

[Set Cain on fire and steal his lute.]
Re:The problem with SCADA systems by Vancorps · 2004-04-10 20:21 · Score: 1

I said nothing wrong; my sentence used all English words and meant the same thing. We have hundreds, even thousands of ways of expressing an idea in English. Because it is not a common phrase or an expected phrasing doesn't make it wrong. The meanings of all the words in my sentence were correct. No words were made up, no definitions were changed; it just didn't use a word in the manner which was expected, so people are deeming it wrong.
There is nothing wrong with the phrasing. This is not why there are so many languages on earth. There is a reason I can understand a person that speaks English in Kenya and England.
Re:The problem with SCADA systems by Vancorps · 2004-04-10 20:26 · Score: 1

I'm sorry if you think I trivialize the issue when in fact I think it is one of the most important steps to creating software that people are going to rely on. If it's a chat client it obviously doesn't need the same level of testing, but when it comes to the power grid there needs to be constant testing so that you spot problems before they become an issue. I don't believe any of this testing was done after the control systems were installed and verified. Security is a process, not a product, so I believe software should continually be tested.
That said, it is possible to create a model of the power grid that will behave exactly the same. The real problem is the fact that all power grids are not created equal. Some were built 80 years ago, some 5 years and so the technology needs to be tested better to ensure that the system can properly interact.
Re:The problem with SCADA systems by EddWo · 2004-04-10 23:36 · Score: 1

exactly

right it as thay here it

--
"Taligent is still pure vapor. Maybe they'll be the last who jumps up on Openstep... "
Re:The problem with SCADA systems by EddWo · 2004-04-10 23:44 · Score: 1

well "intensive purposes" does not actually mean the same thing as "intents and purposes"

By saying it out loud we can recognise that what you meant was not actually what you wrote, but that does not make what you wrote mean the same as what you think it does.

"intensive purposes" could be seen as meaning something like "the same thing in extreme circumstances"
whereas "intents and purposes" is more like "practically the same thing in all circumstances"

--
"Taligent is still pure vapor. Maybe they'll be the last who jumps up on Openstep... "
Re:The problem with SCADA systems by Jussi K. Kojootti · 2004-04-11 00:08 · Score: 1

That said, it is possible to create a model of the power grid that will behave exactly the same.
Yeah right, and the operator is Laplace's demon. It is trivializing to think that a 100% accurate test environment could be built for a system as complex as a power grid...
Re:The problem with SCADA systems by Vancorps · 2004-04-11 06:31 · Score: 1

So that means we shouldn't strive for 99% and work up to 100%? Seriously, did you honestly think that was a firm number? It doesn't have to start out that way, but you should always work towards it.
To just assume that you can never reach 100% accuracy is just plain naive; modifying the test scenarios and expanding the test setup for more precision will eventually create an environment that will duplicate real life.
It sure sounds like you are trivializing the importance of testing complex systems. To just accept defeat won't get anyone anywhere. If they have to create a scale model of NYC 50 miles wide they could create a duplicate environment. Hell, they could just hook up a city to a test grid and offer real cheap electricity with that caveat.
Re:The problem with SCADA systems by Vancorps · 2004-04-11 06:34 · Score: 1

Or I could have meant what I wrote and your assumption was flawed. I could have meant that by testing extremes you will see the worst scenarios. In the case of the power grid that would be a power failure, which last I checked doesn't happen under any normal condition.
Re:The problem with SCADA systems by EddWo · 2004-04-11 07:38 · Score: 1

"create a test environment that is for all intensive purposes "live" "

Definitely seems more like you were trying to say "practically the same thing in all circumstances" than "the same only in extreme circumstances" to me. Why would it make sense to create a test environment that was not generally representative of the system you were trying to simulate?

--
"Taligent is still pure vapor. Maybe they'll be the last who jumps up on Openstep... "
Re:The problem with SCADA systems by Vancorps · 2004-04-11 22:24 · Score: 1

That depends on what you are trying to simulate. If you want to know how a system deals with failure then model that doesn't break is very useful
Re:The problem with SCADA systems by Anonymous Coward · 2004-04-12 01:44 · Score: 0

Why is this moderated up? It's a blatant copy from a post under an earlier article. It even copied the typo in "SecurtyFocus". Christ, it even copied the original poster's name "Tim" at the end.

Software bug was just one part of bigger problem by bonnyman · 2004-04-10 07:04 · Score: 5, Informative

The software bug was just one piece of a much bigger problem; I wouldn't want to overstate its role. There were many other factors; here are just a few:

Poor vegetation management probably played an even bigger role as overloaded power lines warmed up, expanded and sagged into trees and bushes that were supposed to have been cut back.

Poor communications between utilities played a major role.

This whole section of the transmission system was known to be unstable.

An inadequate regulatory structure lacked teeth to deal with known problems.

Lack of adequate transmission line capacity

If all these other problems hadn't been in place, the software bug might never have surfaced. And certainly, the problems would have been contained within a much smaller area -- maybe just First Energy's service area.

An article featured on Slashdot last year lays out the underlying complexity of the power grid very well: "The World's Largest Machine"

--

Al Bonnyman
Community Broadband Networks

strange! Not GE?! by Anonymous Coward · 2004-04-10 07:05 · Score: 0

This is really strange, since GE is one of those companies that is really high on quality. Their products are absolutely trustworthy. The Six Sigma focus at GE is famous. GE's jet engines apparently are 12 sigma.

Well if you've got no warning... by mindless4210 · 2004-04-10 07:07 · Score: 2, Insightful

how can you respond to an incident? It just goes to show the need for multiple monitoring systems in mission critical systems.

--
Wireless News www.DailyWireless

Re:Software bug was just one part of bigger proble by Raindance · 2004-04-10 07:07 · Score: 4, Interesting

I agree that there's more to this than just one line of code, as some folks seem to believe- I think referring to it as 'one bug' is rather misleading.

As well refer to the things leading up to WWII as 'one problem'.

For the 21st century... by Anonymous Coward · 2004-04-10 07:10 · Score: 3, Funny

If a bug exists in the code, but it's never triggered, is it really a bug?

Re:For the 21st century... by Raven42rac · 2004-04-10 07:14 · Score: 4, Insightful

Yes, yes it is. If a mime gets hit by a tree in the forest, does anyone care? Sometimes, no matter how much testing you do, shit just happens. It is a fact of life. Show me one perfect, bug free, piece of software. Stuff breaks all the time, we only notice it when it affects us. We take for granted sometimes how good we have it. Power in this country is extremely reliable. We act as if a bomb dropped when the power goes out. Some parts of the world do not have power, clean water, etc. We should think of that before we start whining about having to actually talk to each other, use candles, read books, etc.

--
I hate sigs.
Re:For the 21st century... by Vancorps · 2004-04-10 07:22 · Score: 1

Those parts of the world don't rely on power for virtually every aspect of their lives. Electricity is used everywhere; it is like a bomb dropped if power goes out in the U.S.
Many businesses can't function, businesses of all types from banks to some hotels, to retailers, printing presses, the list goes on.
We are very fortunate to have a power grid as stable as it is. For the most part things do just work, although there is no telling how much damage is done every year to electronics because someone turned a tv on and the resulting spike gives ICs the twitch.
Re:For the 21st century... by timeOday · 2004-04-10 07:31 · Score: 2, Interesting

I suppose one silver lining in having an outage once a year or so is that it forces us to keep backup systems for hospitals etc in place. If we only lost power once every 10 years, probably nobody at the hospital would even know what to do when power was lost, and people could die. It's just so hard to keep a backup system maintained and working if you are never forced to really use it once in a while. Like planning ahead for a weeklong camping trip, if you don't work up to it by taking shorter trips your chances of being fully prepared are nigh on 0%.
Re:For the 21st century... by evilviper · 2004-04-10 08:15 · Score: 4, Insightful

Power in this country is extremely reliable.

Actually, that's statistically untrue. We have, perhaps, the least reliable power system in all the countries of the first-world. Sure, 3rd-world countries have worse-off power systems, but the comparison isn't valid at all.

Some parts of the world do not have power, clean water, etc. We should think of that before we start whining about having to actually talk to each other, use candles, read books, etc.

Since when does the hardship of others make an unreliable power system a plus? Some places may be worse, but so what? We pay a lot for power, and expect our money is being spent on making sure we DO NOT have many outages.

Meanwhile, in California, prices are high, and power was VERY unreliable. "Rolling Blackouts" anyone?

My point is this. If something is broken, we want to fix it. We don't want to sit around saying "Well, it isn't as broke as that one". If we do, pretty soon it will get worse, and worse, and worse, until we have no other countries to point at.

How about our medical system, and water utility? Should we accept thousands of deaths due to malpractice, or contaminated water, by just saying "Well, it's not as bad as country XYZ"? No, I don't think anyone would believe that, but it's really the same thing. Power outages do mean deaths, and do mean losses of lots of money. Businesses can't run, food can't be properly preserved, or even delivered. People die of heat-stroke, or hypothermia due to power loss. Ambulances can't get through dense traffic caused by traffic signals losing power, etc.

A power outage is a lot more serious than people "whining" about not being able to watch TV... And yet you get moderated up anyhow... Amazing.

--
Slashdot gets worse every day... Pipedot: News for nerds, without the corporate slant
Re:For the 21st century... by Fratz · 2004-04-10 08:27 · Score: 1

Sometimes, no matter how much testing you do, shit just happens. It is a fact of life. Show me one perfect, bug free, piece of software. Stuff breaks all the time, we only notice it when it affects us.
Next time you're in a hospital and there's some embedded system pumping you full of [radiation | medication | oxygen] you'd better pray the developers of those systems didn't have the same viewpoint.

--
-- Fratz, human
Re:For the 21st century... by ummit · 2004-04-10 09:29 · Score: 1

If a bug exists in the code, but it's never triggered, is it really a bug?
You may have been joking, but in case someone takes you seriously, I'll respond seriously.
Yes, a bug that has not been triggered is absolutely a bug. Real-world software gets used under evolving circumstances, and ought to be able to respond at least somewhat robustly to circumstances beyond those for which it was explicitly tested.
If you believe that bugs that haven't shown themselves don't exist, it's easy to get into a sloppy mindset where you code for the end of the day, relying on your QA department to catch your mistakes, and imagining that the code is somehow "perfect" if it passes its test cases. (But what about the test cases that the QA department hasn't written yet?)
If, on the other hand, you can figure out ways of writing code that's initially and inherently bug-free, not because someone found and reported each bug and you fixed it, but rather because you were careful right from square one, and imagined and coded for all the cases that could come up (not just the ones that seemed important), you'll end up with code that is robust, that doesn't fall over every time something unexpected happens...
Don't let anyone tell you that this is impossible -- it can be done, and with practice it's even easy.
Re:For the 21st century... by Mark_in_Brazil · 2004-04-10 10:07 · Score: 4, Informative

Meanwhile, in California, prices are high, and power was VERY unreliable. "Rolling Blackouts" anyone?
Good point. I live in Brazil, and there's a real sick tendency among people here to kiss American ass and fantasize that the United States are a place where everything works perfectly and nobody has to pay for anything. When they do that, I chuckle and point out things like the difference in the electrical power systems in the two countries.
NOTE: I AM NOT SAYING BRAZIL IS BETTER THAN THE USA... JUST THAT IT'S NOT WORSE EITHER.
Brazil's electrical power, as of 2001, was about 97% hydroelectric. Because of years of below-average rainfall, this system was threatened, and in 2001, we were told there might be "rolling blackouts" here (except that the Brazilian government, unlike the US government, was honest enough to call it what it was: power rationing). We ended up not getting any "rolling blackouts," and a regression toward the mean in rainfall has left us sufficiently well off that we don't even have to use the new polluting thermo plants that were built around the time of the crisis. Electrical power here is cheap and reliable, especially compared to places like California, where a lot of my friends had to endure "rolling blackouts" because the folks at the deregulated power companies decided to put more money on their bottom line by not investing in infrastructure upgrades and maintenance. So the execs who made those decisions increased profits in the short term, increasing their bonuses and the value of their stock. When the $#!+ hit the fan, guess who had to pay, both in damages from "rolling blackouts" and in higher rates? The consumers, of course!
The only power problems I've had here in São Paulo were a neighborhood issue, not a city-wide, state-wide, or nation-wide problem. Basically, the new condo across the street overloaded the local grid 3 times in a 2-week span. The worst thing is that the new condo has its own generator, so the newcomers would knock out the neighborhood power and then not even notice, because their generator kicked in. Meanwhile, those of us who had already been in the neighborhood were screwed. Even those problems have been resolved, though. With even more people moving into the new condo, it's been about 6 weeks since we had a problem. The power companies here are pretty efficient. Yeah, I'd have liked for somebody to stop people from moving into the new condo until the local power grid was adequately updated, but they responded pretty quickly once the problem did present itself in an inconvenient way.

--Mark

--
"It is nice to know that the computer understands the problem. But I would like to understand it too." --Eugene Wigner
Re:For the 21st century... by evilviper · 2004-04-10 10:49 · Score: 1

I live in Brazil, and there's a real sick tendency among people here to kiss American ass and fantasize that the United States are a place where everything works perfectly and nobody has to pay for anything.

I'm sure it's not just Brazil.

In the US, the big thing we have going for us is a lot of competition between companies. That means a lot of choice in products, and low prices. However, there are a lot of exceptions to that, and we are completely screwed when it comes to anything monopolized, or government controlled.

Gas prices are sky-rocketing for absolutely no reason, and the government seems completely uninterested in even investigating. Prices are now about $2.30/gallon in this area of California, which is amazingly high. I know other countries are worse off, but it's getting pretty bad here.

Local governments are pretty much useless. Roads are in poor shape, and not getting fixed for many years. In fact, the major road around here was ripped up by construction workers, and then they left it that way, unfinished. Not a real hardship all-in-all, but it just points out the fact that the local government is completely incapable of getting the simplest task done properly.

What I find even worse is public safety. Here in California, we have earthquakes, fires, mudslides, floods, etc. There has never been anything done to prevent huge forest fires (and it just recently caught up with us), and nothing is done to prevent the numerous mudslides. Even though both have trivially easy solutions, the city and state governments are completely useless to do anything about them.

What makes things worse (and causes fire and flood problems) is that they don't do their job in telling people where they can't build homes. You can get permission to build a house feet from the edge of a river that constantly floods, even though anyone in their right mind knows it will be seriously water damaged, if not washed away in less than 5 years. The same is said for homes built near forests, prone to fires.

I wish it was just that our local government was incompetent, but I see homes being built where they shouldn't be, all over the country. Homes built on the banks of the Mississippi river are guaranteed to get flooded within a few years.

It's to the point that I don't know what good having a city/county/state government is. They take a huge chunk of our money, and provide us with practically nothing for it. Worse yet, our education system is worse than just about anywhere else in the world. It's gone so far that our schools have de-evolved into mandatory 6 hour per-day prisons for kids until they are 18 years old. It's even becoming publicly acknowledged that that's all they are.

About a year before he became governor, Schwarzenegger was supporting a bill that would increase funding for afterschool programs. None of the ads about it talked about how good it would be for the children. They all talked about how the hour after schools let out is the time with the highest instances of crimes, essentially saying that children should be imprisoned for an extra hour every day.

Anyhow, that's the end of this rant. I hope I've provided plenty of ammo for the next time you hear someone talk about how great the US is.

--
Slashdot gets worse every day... Pipedot: News for nerds, without the corporate slant
Re:For the 21st century... by Anonymous Coward · 2004-04-10 12:27 · Score: 0

Ah Brazil, the best looking trannies in the world! What is it with Brazil and fiiiiine trannies? Can someone explain that to me?
Re:For the 21st century... by DerekLyons · 2004-04-10 12:40 · Score: 1

Meanwhile, in California, prices are high, and power was VERY unreliable. "Rolling Blackouts" anyone?
Sorry, but those blackouts have nothing to do with low reliability, and everything to do with lack of capacity. Nothing in the system was broken, nothing in the system failed, there simply wasn't enough power.
Lack of capacity isn't lack of reliability.
Re:For the 21st century... by evilviper · 2004-04-10 12:52 · Score: 1

Lack of capacity isn't lack of reliability.

Yes, it is. In fact, lack of capacity is a CAUSE of the lack of reliability.

If there is not power to your wall, the power system has failed. It doesn't matter how reliable the individual power plants are, we are talking about overall grid reliability.

--
Slashdot gets worse every day... Pipedot: News for nerds, without the corporate slant
Re:For the 21st century... by Raven42rac · 2004-04-10 14:04 · Score: 1

Don't pull a "what about the children?!?!" on me. I tend to not worry about things. People who expect perfection are likely to be disappointed. Am I a medical software developer? No. If I were I would not hold the same viewpoint. I would strive to make the best code possible, but it is not possible to test for every contingency. Expecting perfection from everything is insane. As an atheist, I do not pray that often. I know it is just a figure of speech though.

--
I hate sigs.
Re:For the 21st century... by /dev/trash · 2004-04-10 14:11 · Score: 1

Power is reliable in this country? Hmm, then the UPS Home Office market must be a myth.

I have two units myself.
Re:For the 21st century... by Raven42rac · 2004-04-10 14:22 · Score: 1

Just because there is a market for contingency does not denote unreliability. How do you define reliability? Power going out once a day, week, month, year? Car batteries rarely die, but does that mean that a set of jumper cables is a vote of no confidence? No, it means you are prepared for contingency.

--
I hate sigs.
Re:For the 21st century... by /dev/trash · 2004-04-10 14:50 · Score: 1

Well, I bought my UPSes mainly to have a cleaner, steadier supply of power to things. Brownouts are hell on electronics.
Re:For the 21st century... by ealar+dlanvuli · 2004-04-10 17:56 · Score: 1

Yet we accept $130 office visits so the doctors can afford malpractice insurance.

Pick a sane example, thanks.

--
I live in a giant bucket.
Re:For the 21st century... by Anonymous Coward · 2004-04-10 18:00 · Score: 0

I wish it was just that our local government was incompetent, but I see homes being built where they shouldn't be, all over the country. Homes built on the banks of the Mississippi river are guaranteed to get flooded within a few years.

Sort of. Most larger municipalities won't grant building permits unless you're behind a levee or on a bluff.

Thankfully for us the Mississippi left a lot of bluffs =p.
Re:For the 21st century... by Ben+Hutchings · 2004-04-11 06:08 · Score: 1

There was no shortage of power, but Enron had cornered the market and was scamming CA.

You know... by Steamhead · 2004-04-10 07:11 · Score: 0

Not everything is feasibly preventable without something happening first. Honestly, people are people and they might overlook something.

B Method? by starseeker · 2004-04-10 07:12 · Score: 5, Interesting

"the bug was unmasked as a particularly subtle incarnation of a common programming error called a "race condition," triggered on August 14th by a perfect storm of events and alarm conditions on the equipment being monitored. The bug had a window of opportunity measured in milliseconds. "

Isn't this the type of problem the B Method (and maybe the Z language too) are designed to address? Use proof logic initially - once you have decided on a behavior you want, design the system in such a way that it is provable it executes this design.

That doesn't mean the DESIGN is flawless, of course. But if we start engineering software on as many levels as we can, mightn't things improve? Normal software development and testing would never have found a critical bug with rare trigger conditions and a millisecond window. If you need precision on that level, you need (for starters) to KNOW your implementation of your design is sound, and preferably the code you are running exactly implements the proven logic. Isn't this what the B Method was created for?

--
"I object to doing things that computers can do." -- Olin Shivers, lispers.org

Re:B Method? by mccalli · 2004-04-10 07:38 · Score: 4, Interesting

Isn't this the type of problem the B Method (and maybe the Z language too) are designed to address? Use proof logic initially - once you have decided on a behavior you want, design the system in such a way that it is provable it executes this design.
Ye gods, you've frightened the hell out of me with reference to Z. I'd almost entirely forgotten it, and had hoped its cold corpse would lie in the ground undisturbed, undiscovered and most importantly of all unreferenced until the end of time. Still, "That is not dead which can eternal lie"...
Z is a beautiful way to mathematically prove that you have design bugs at the highest level possible. You can then design your unit tests around those bugs, and confirm that they're valid.
That's it. It provides nothing else that unit testing on its own couldn't do, with the exception of a few salaries and a research grant here and there. Whilst you can mathematically prove implementations of certain designs, the vast majority of designs have more complex interactions. Try using Z for a multithreaded real-time environment for example - my Software Engineering tutor at the time, Iain Sommerville (well known in the field due to his books, oh and 'at the time' would be ~1993), basically said that Z just breaks down in those circumstances. I wouldn't know - I personally had no clue how to even make it begin in those circumstances, let alone break down.
Please confine Z to camp-fire ghost stories used to scare new programmers. It always was a living hell, and it really shouldn't be resurrected now.
Cheers,
Ian
Re:B Method? by Orne · 2004-04-10 07:39 · Score: 4, Interesting

SCADA systems transport data samples. My company's system collects from several hundred thousand meters, about half of which are expected to send in a sample about once every 10 seconds, some as fast as once every two seconds. The concept is that you have a communications buffer that collects the data: the link writes to the memory while the other EMS applications (about a dozen) read from it.

Now admittedly, FirstEnergy's system is a little smaller in territory, but I wonder if their mergers over the recent years (Cleveland Electric and Ohio Edison became FE, and then proceeded to take Toledo Edison and GPU of PA) have outpaced the collection capabilities of their mainframe (which was already at the end of its life and was scheduled to be replaced). That could account for some of the "slowing" that the G.E. testers said they had to do to make the race condition appear.
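
To make the writer/reader arrangement above concrete, here is a minimal sketch in Python. It is not how XA/21 or any real EMS is built; `SampleBuffer`, its methods, and the `max_age` threshold are all made up for illustration. The value and its timestamp live in two separate tables, so without the lock a reader could catch a new value paired with an old timestamp, which is the sort of narrow window a race condition hides in. The reader also gets a status flag back, so old data can at least be marked as old.

```python
import threading


class SampleBuffer:
    """Hypothetical shared buffer: one comm-link writer, many reader applications."""

    def __init__(self):
        self._lock = threading.Lock()
        self._values = {}   # meter_id -> last sampled value
        self._stamps = {}   # meter_id -> time the sample arrived

    def write(self, meter_id, value, timestamp):
        # Both tables are updated under one lock, so a reader never sees
        # a fresh value next to a stale timestamp (or the reverse).
        with self._lock:
            self._values[meter_id] = value
            self._stamps[meter_id] = timestamp

    def read(self, meter_id, now, max_age):
        # Take a consistent snapshot of both fields while holding the lock.
        with self._lock:
            if meter_id not in self._values:
                return None, "offline"
            value = self._values[meter_id]
            stamp = self._stamps[meter_id]
        # Classify outside the lock: "last known-good" data is still
        # returned, but it is labelled so a display can show it as old.
        status = "live" if now - stamp <= max_age else "stale"
        return value, status
```

A reader that calls `buf.read("meter-1", now, max_age=10.0)` gets back the value plus one of `"live"`, `"stale"`, or `"offline"`. What the display does with that flag is a separate question.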
Re:B Method? by Mr.+Slippery · 2004-04-10 08:02 · Score: 2, Interesting

Use proof logic initially - once you have decided on a behavior you want, design the system in such a way that it is provable it executes this design.

Problem is, doing and verifying proofs is just as subject to error as creating and reviewing code. All you've really done is change your symbol set.

--
Tom Swiss | the infamous tms | my blog
You cannot wash away blood with blood
Re:B Method? by Anonymous Coward · 2004-04-10 09:00 · Score: 0

Use proof logic initially

How do you know that the proof was done correctly? Who tests it? How do you know you're not missing a test case, or have one incorrect?

A proof is more or less the same thing as a program: a series of steps in a formal language that transforms some input into some output. However, the steps in the proof tend to be primitive, very low-level. It's like rewriting your program in assembler to give you confidence that the high-level version is correct.

It's generally harder to get the proof right than the program in the first place.
Re:B Method? by bruthasj · 2004-04-10 19:47 · Score: 2, Insightful

design the system in such a way that it is provable it executes this design

Unfortunately, if you actually come out of the library and the computer labs, software has to be done -- yesterday. Flawless, provable code would cause most software houses to go bankrupt. It's a fact of life...
Re:B Method? by Roydd+McWilson · 2004-04-10 21:49 · Score: 1

No, the whole idea of a proof is that it is a statement about something which can be verified as correct using a very simple algorithm. The problem is that the specification, to which you are proving your implementation conforms, may itself have bugs.

--
THE NERD IS THE COMPUTER.
Re:B Method? by starseeker · 2004-04-13 07:28 · Score: 1

Possibly true. Which is why I view this type of development as a perfect way for open source to do things - we might potentially achieve a level of software quality impossible in a commercial environment. Or, alternatively, perhaps someone like IBM could sponsor some high powered developers to develop a general purpose OS on such principles, viewing it as a long term investment. If they succeed and create a Linux replacement that just doesn't have software problems below the user application level, imagine the $$ IBM could make. They could sell hardware running this proven, open system where they've also proven the hardware and drivers work perfectly with the system, and put IBM level support behind the hardware itself. Other people would start to use the OS since it would be of high quality, and as businesses look around for the most reliable solution they would see IBM, who developed it, is also selling hardware proven to work well with it. Ka-ching.

Maybe it would work, maybe not. But as software becomes more critical, it becomes more and more worthwhile to Do It Right The First Time. Because as Windows has clearly proven, people don't always update their systems and if they don't they can hurt the network as well as themselves.

--
"I object to doing things that computers can do." -- Olin Shivers, lispers.org

I don't trust this Mike Unum guy... by JessLeah · 2004-04-10 07:13 · Score: 1, Funny

...I don't know him from a hole in the wall. But his cousin, E. Pluribus Unum.... that guy, I trust. :)

--
Honey, I shrunk the Cygwin

The American jackasses who blamed Canada by Kevin+Mitnick · 2004-04-10 07:14 · Score: 5, Interesting

Did anyone ever retract their statements? I know the NY Mayor was pretty quick to blame us Canucks.

Re:The American jackasses who blamed Canada by Scrameustache · 2004-04-10 07:56 · Score: 1

Blame Canada, Blame Canada!

With their beady lil' eyes and flapping heads so full of lies!

Xenophobic, yes.

"In the initial stages, nobody really knew what the root cause was,"
But you know, there are freedom canadians! And they put gravy and cheese on their freedom fries, those foreign weirdos...

--
You can't take the sky from me...
Re:The American jackasses who blamed Canada by spinkham · 2004-04-10 08:11 · Score: 3, Funny

We blame you, you blame the Newfies. It's the pecking order around here, deal with it ;-)

--
Blessed are the pessimists, for they have made backups.

Oh come on... by Anonymous Coward · 2004-04-10 07:18 · Score: 1, Funny

We all know it was Microsoft's fault, this is Slashdot remember? The Blaster Worm?

Oh wait..

Re:Oh come on... by xenoandroid · 2004-04-10 12:33 · Score: 1

AC is right! All those damn Windows machines overloading the power grid.

Makes me wonder by Killjoy_NL · 2004-04-10 07:21 · Score: 1

If they did all this testing and this bug didn't show up, it makes me wonder how many killer bugs are still in there.

--
This is the sig that says NI (again)

Canada has a history of bad grid control by Orne · 2004-04-10 07:23 · Score: 2, Informative

From the perspective of New York, they saw a surge race through their system East to West, through the choke point into Canada at Niagara station. NY constantly has problems with IMO not following schedules, and from their perspective, this was yet another incident of bad reliability control across the border.

What they didn't know is that the energy was routed through the southern bit of Canada along the lake area, back into the USA in Michigan, to feed all of the communities along the southern shores of the Great Lakes. The reason this happened is that the coastal towns became electrically isolated from southern Ohio because of failures in FirstEnergy territory. I don't think to this day FE has accepted full responsibility for their roles in the failures, something I think should be done with a good house-clearing in their company...

Re:Canada has a history of bad grid control by Anonymous Coward · 2004-04-10 15:50 · Score: 0

You have presented no proof of any kind to back up your ridiculous assertion about Canada's power grid being faulty. Slashdot supports hyperlinks - why don't you use one to prove the nonsense you are spouting? But you won't because you're just a clown.

Testing isn't the answer... by evilviper · 2004-04-10 07:25 · Score: 5, Insightful

You can't expect just testing to reveal all bugs in a program. Even a simple program would have to be fed completely random data constantly, in every different order and circumstance conceivable, for a very long time, to reveal all bugs. That's just not a real option.

The only way to have bug-free software is to write it properly. You have to modularize and simplify everything down to the point that each one is easily understandable, and it is easy to detect when one is providing a senseless answer (in other words, cross-checking every result). Then, you have to tie them all together in a robust but simple way.

I know it's far easier to say it than do it, but it seems like nobody even tries to do it these days. Even mission-critical systems are commonly built as a single monolithic program, and when you have a lot of things going on within a single program, with no checks of the sanity of the data going into or coming out of each component, there is no way to be 100% certain that the program is theoretically and genuinely perfect. Meanwhile, by modularizing everything, you can PROVE that it is actually perfect.

But this is really just the old Macrokernel vs. Microkernel argument all over again. A Microkernel can be perfect, while a macrokernel can never be completely bug-free, but people just find the latter to be easier to write, and then spend hundreds of times more man-hours finding and removing bugs, rather than spending (less, overall) time doing it correctly in the first place.

Oh yes, almost forgot, IMHO...

--
Slashdot gets worse every day... Pipedot: News for nerds, without the corporate slant

Re:Testing isn't the answer... by Grimmtooth · 2004-04-10 08:18 · Score: 2, Insightful

Your comments remind me of an old QA maxim: "We can only prove the existence of bugs - not the absence of them."

You invoke the magic buzzwords of "modular design" as if it were a new thing. It isn't. That concept is older - in practice, even - than the median user on /.. Edsger W. Dijkstra was one of the earliest proponents of such coding practices - you can find archives of his papers HERE and see for yourself.

Magic buzzwords can't prevent defects from occurring. QA can't find them all, no matter the budget or amount of time they spend on it. You can only minimize the effects of bugs and put procedures in place to deal with them, programmatically and non-programmatically.

"Our software contains no known undetected bugs."

--
/* .sigs are irrelevant */
Re:Testing isn't the answer... by Kirill+Lokshin · 2004-04-10 09:45 · Score: 2, Insightful

Meanwhile, by modularizing everything, you can PROVE that it is actually perfect.

Umm, no. Modular design is great for theoretical process correctness, i.e. if a certain input is made to the running program, will it provably produce a certain output. The main problem with this, of course, is it assumes that the program is physically running the whole time.

The systems that (I assume) are being used here have to deal with more ephemeral and unpredictable conditions: failing hardware, CPUs going offline in the middle of instructions, random electric interference, etc. The main issue is that the program may not be able to run in its original state, and attempting to recover usually deals with potential data loss, which prevents good theoretical proofs.
Re:Testing isn't the answer... by ummit · 2004-04-10 10:00 · Score: 1

You can't expect just testing to reveal all bugs...
The only way to have bug-free software is to write it properly. You have to... simplify everything down to [be] easily understandable...
I know it's far easier to say it than do it, but it seems like nobody even tries to do it these days.
Hear, hear.
The current "state of the art" in software is pathetic, and unfortunately the vast majority of computer users (including, it appears, users of systems that ought to be safety-critical) have become so inured to buggy systems that they don't demand anything better, and few vendors are therefore motivated to try to do better. And yet the techniques for writing reliable, robust software have been known for decades -- but they're never as fashionable as the latest whiz-bang methodology du jour that lets you bang out a huge volume of seemingly-working code in a hurry.
Re:Testing isn't the answer... by evilviper · 2004-04-10 10:06 · Score: 1

"We can only prove the existence of bugs - not the absence of them."

Fortunately, that's not true. Unfortunately, nobody seems to even try to write provably bug-free code.

You see, everything dealing with computers is math. Everything that happens can be simplified down to a binary-base math problem. In math (unlike the real world) you CAN prove that something is perfect.

For instance, it was a while back that a /. story touted the first provably unbreakable encryption method. Now, it's not a method that is really feasible currently, but it is provably perfect. Programs can be the same.

You invoke the magic buzzwords of "modular design" as if it were a new thing.

Now that's just wrong. I didn't pretend that they are new... In fact I pointed out that it's like the old microkernel vs. macrokernel argument all over again. Did you skip over that part, or do you not have any idea what a microkernel is?

QA can't find them all, no matter the budget or amount of time they spend on it.

That was exactly the point of my post.

You can only minimize the effects of bugs and put procedures in place to deal with them, programmatically and non-programmatically.

Again, that is almost exactly the point of my post. Through simplification, modularization, and cross-checks, you CAN stop a faulty result (presumably from a bug) from being propagated to the next step in a calculation, thereby preventing a bug from causing an incorrect result.

Logic errors can be prevented. Buffer overflows can be stopped. Race conditions can be avoided. Programs CAN BE perfect.
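
For what it's worth, a cross-check of the kind described above might look like the following Python sketch. The names (`checked`, `SanityError`, `load_fraction`) and the 0-to-1 range are invented for the example, not taken from any real system. The idea is only that each small module declares what a sane answer looks like, and anything outside that range is stopped instead of being handed to the next step.

```python
class SanityError(ValueError):
    """Raised when a module produces a result outside its declared sane range."""


def checked(lo, hi):
    """Decorator (illustrative): refuse to pass on results outside [lo, hi]."""
    def wrap(fn):
        def inner(*args):
            result = fn(*args)
            # A NaN fails this comparison too, so it is also rejected.
            if not (lo <= result <= hi):
                raise SanityError(
                    f"{fn.__name__} returned {result!r}, outside [{lo}, {hi}]"
                )
            return result
        return inner
    return wrap


@checked(0.0, 1.0)
def load_fraction(load_mw, capacity_mw):
    # Fraction of capacity in use; anything above 1.0 means bad inputs.
    return load_mw / capacity_mw
```

Here `load_fraction(50.0, 100.0)` passes through, while `load_fraction(150.0, 100.0)` raises `SanityError` before the bad number reaches anything downstream. Note that this only catches results that look wrong; a plausible but incorrect value still gets through, so it limits the damage from a bug and does not prove the code correct.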

--
Slashdot gets worse every day... Pipedot: News for nerds, without the corporate slant
Re:Testing isn't the answer... by fermion · 2004-04-10 10:45 · Score: 1

Absolutely, it is a design issue. It is putting all rules in a well known location and everyone uses those rules to access the data. There is no sneaking around and looking at or changing data outside of the agreed upon rules. If the rules are implemented as perfectly as possible in one place, then we are halfway to good code.
When we wrote in Fortran and C and the other old style languages, we talked about modules and minimizing the coupling, or how much one module knew about the other modules. For example, there is no reason for modules other than the one that accesses the data to know exactly how the data is stored or what rules need to be followed to safely and securely access the data. As a matter of fact, it would be foolish to assume that the other modules could know, and even if they did, could implement the rules perfectly in multiple functions.
Nowadays we admit that programmers are sneaky and will not follow the agreed upon rules. They will in fact get pompous and creative and try to re-implement the rules in new and unique ways. So we encapsulate data under increasingly complex layers of abstraction and hope that no one looks in the include files to figure out what the original structure is. It still is an honor system, but at least now any violations can be considered willful and malicious.
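
A small Python sketch of the "rules in one place" idea, with everything in it (`VoltageLog`, the millivolt storage, the accepted range) made up for illustration: the storage format and the validation rule live inside one class, and callers only ever see `record()` and `latest()`. Python's leading underscore is exactly the honor system described above, since nothing physically stops someone from reaching into `_millivolts`.

```python
class VoltageLog:
    """Hypothetical data module: the only place that knows how readings are stored."""

    def __init__(self):
        # Storage detail: integer millivolts. Callers should not rely on this.
        self._millivolts = []

    def record(self, volts):
        # The access rule lives here and nowhere else.
        if not (0.0 <= volts <= 1_000_000.0):
            raise ValueError(f"implausible reading: {volts!r} V")
        self._millivolts.append(round(volts * 1000))

    def latest(self):
        if not self._millivolts:
            raise LookupError("no readings recorded yet")
        # Convert back to volts on the way out; the format stays hidden.
        return self._millivolts[-1] / 1000
```

If the storage later changes from a list of millivolts to something else, only this class changes, which is the low-coupling point made above.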

--
"She's a scientist and a lesbian. She's not going to let it slide." Orphan Black
Re:Testing isn't the answer... by just+fiddling+around · 2004-04-10 17:50 · Score: 1

I have no idea of the length or breadth of your experience in the field, but NOBODY writes monolithic programs anymore.

Functions are mandatory. Functions break down a program's task(s) in smaller chunks.

Projects which are implemented by a team of more than one programmer cannot be monolithic, for reasons which are self-evident.

There, you have it: two reasons why you stated a non-problem. Even if a system is broken down into perfect minimal functional units, bugs happen.

Non-trivial systems have so many parts interacting together that the interactions almost always are the source of bugs; some interactions are not predicted or expected by the designers or programmers. That is why nobody has a "silver bullet" to prevent bugs in non-trivial systems. Some day, a Nobel prize will be awarded to somebody who solves a part of this problem.

--
You're not old until regret takes the place of your dreams.

Reasons for power blackouts by pcraven · 2004-04-10 07:27 · Score: 4, Interesting

I've been reading several papers on this for a grad class I'm taking. One of the several problems is no government control. If a power outage might be prevented by shedding some load (turning out power to some people), no company wants to step up to the plate and be the one to turn out the power to their customers. So they luck out, or they have a massive power outage.

This paper (click on the PDF link) has a good summary of the problems in keeping power outages from happening again.

Re:Reasons for power blackouts by ctr2sprt · 2004-04-10 08:06 · Score: 1

I'm not really seeing where government control would change that. If they were quicker to pull the trigger and cut power to 100,000 homes, we'd just be seeing that every 3 months as soon as anything trivial went wrong. And because it's the government, there's nothing you can do about it.
No, I'm not ready to give up on an industry that, so far, is so exceptionally reliable that most people are without electricity for maybe 5 hours out of the year. We get excited just for approaching that level of reliability in computers.
Re:Reasons for power blackouts by mc6809e · 2004-04-10 08:18 · Score: 1

I'm not really seeing where government control would change that. If they were quicker to pull the trigger and cut power to 100,000 homes, we'd just be seeing that every 3 months as soon as anything trivial went wrong. And because it's the government, there's nothing you can do about it.
No, I'm not ready to give up on an industry that, so far, is so exceptionally reliable that most people are without electricity for maybe 5 hours out of the year. We get excited just for approaching that level of reliability in computers.

And don't forget how the cheap energy they produce literally saves lives.

Consider the heat-wave in Europe that killed thousands. The mean summer temperature in Paris is the same as Detroit, Chicago, and Denver and when heat-waves strike these cities, very few lives are lost.

Why? Because cheap energy makes things like air-conditioning affordable and people don't die. In places like France, it's just too expensive to have air-conditioning.
Re:Reasons for power blackouts by Orne · 2004-04-10 08:20 · Score: 1

I disagree, government is never the answer if you want something truly fixed. There are plenty of rules in place on how to maintain a reliable system, rules formed by the industry itself as "best practice" procedures; not to mention that there's already an alliance called NERC for US & Canada that's supposed to be managing it. A similar government commission FERC exists for setting USA policy only. Thirdly, there's another coalition called NAESB who sets the common standards for energy markets.

What doesn't exist is legally binding penalties on those who don't follow the "best practices" on how to run a control area. (Why can't we sue our utility company like we could any other private industry? Government.) Most of FirstEnergy's failures documented in the final report were not because there weren't any rules in place, it's that they weren't obeying the procedures already laid out; procedures that would have notified neighbors they were having issues, giving them time to rebalance the energy flows. This is a change that's been in the big "energy bill" for the last 4 years as the Senate sits and refuses to act on it, as the Democrats won't have anything to do with Republican proposed bills. The politicians have been arguing about Standard Market Design for 3 years, no progress. Private industry realized it needed common market rules for better efficiency and cost savings, so it's been implementing it themselves. If you leave things to the government, they argue and argue and nothing gets done.

The recent proposals, including the IEEE paper you link to, want to mandate additional collection equipment on every utility company in North America, so that one (government, of course) agency can collect all the data and have the big picture view. Well, in the next two years thanks to private industry advances brought on by deregulation, we may be down from hundreds to maybe 10 private institutions called Regional Transmission Organizations (RTOs) that will have the same big picture of vast swaths of the USA, with no government involvement whatsoever. That's the path I'd rather see.
Re:Reasons for power blackouts by kf4lhp · 2004-04-10 08:46 · Score: 1

My power, here in Tennessee, is supplied by the federal government - namely the Tennessee Valley Authority. Has been for all 24 years I've been alive. And it's damn reliable too. Granted, TVA doesn't actually sell me the power - they sell it to the Electric Power Board of Chattanooga, which is operated by the city, and they distribute power to the masses.

The number of outages I've experienced can be counted on one hand. Even with a tropical storm blowing through in 1996, a 3 foot blizzard in 1993, and sundry other storms and so forth, no outages. Only one of any significant length was due to a squirrel trying to nest in a transformer. Squirrels are highly conductive, they just burn out really quickly.

Another point to make about TVA is that their generating plants are scattered throughout the territory, instead of having them concentrated in certain locations, and they operate a wide mix of nuclear, coal, and hydro plants. Diversity is a good thing.

Most of the cooperatives around here are the same way. Low rates, high reliability and pretty good customer service. Taking the demand for profits out of the mix can never hurt the customers.
Re:Reasons for power blackouts by dabblah · 2004-04-10 09:04 · Score: 1

What exactly can the government control in this case? Are you proposing that the government establish and maintain SCADA and EMS systems for the purpose of continuous monitoring and backstop of the power grid? And if so, what are they going to do when a problem on the grid becomes unstable and spreads explosively beyond a single control area?

The root of the problem in this blackout was massive human error, software bug or no.

Not to go too libertarian on you, but the solution is strong contractual provisions among the interconnecting parties. Blithely saying "Government Fix This" isn't going to accomplish much to prevent anything, unless it can act in real time, which nobody is proposing. The government basically already does what it needs to in requiring reserve margins of energy and transmission of regulated monopoly control areas.

Had MISO the authority to tell FirstEnergy to get their voltage under control, or to tell them they were out of N-1 compliance, and for either violation they would be penalized x dollars for every minute beyond z that they are out (where z is 30 for N-1, some reasonable y for voltage, and where x is something on the order of 50000, say), FE would have gotten into shape. Alternatively, PJM, AEP, DLCO, MECS, and DPL could have provisions for not violating inadvertent scheduling of energy and reactive, and they could separate from FE for being out of compliance. FE would have blacked out, but better them alone than the entire northeast...
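For what it's worth, the penalty rule sketched above is easy to write down. This is a minimal Python illustration, not anything taken from MISO or NERC rules: the function name is made up, and reading x as a flat dollars-per-minute rate is my assumption.

```python
def compliance_penalty(minutes_out, grace_minutes=30, dollars_per_minute=50_000):
    """Penalty owed for staying out of compliance past the grace period.

    grace_minutes plays the role of z (30 for N-1) and dollars_per_minute
    the role of x (on the order of 50000). Both values are illustrative
    assumptions, not real tariff figures.
    """
    overtime = max(0, minutes_out - grace_minutes)
    return overtime * dollars_per_minute


# 45 minutes out of N-1 compliance is 15 minutes past the 30-minute limit.
print(compliance_penalty(45))  # 750000
```

At that rate an hour past the limit would cost three million dollars, which is the kind of number that gets a control room's attention.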
Re:Reasons for power blackouts by inKubus · 2004-04-10 15:20 · Score: 1

If a power outage might be prevented by shedding some load (turning out power to some people), no company wants to step up to the plate and be the one to turn out the power to their customers.

Maybe like a lottery system that gets drawn at the beginning of each day randomly and sent to each utility. Or a disk that gets sent ahead of time that has the rotation on it and an offset so you'd never need to do it again.

Basically, if the fit hits the shan then a notification goes out to every utility, they check their entry and if they're number 1, they turn off. No compromise.

Then at least it's fair for all the companies.
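A rough sketch of how such a pre-agreed draw could work, in Python. Everything here is hypothetical (the function name, the utility names, using the date string as the seed); the point is only that every party can compute the same order independently, so nobody has to negotiate during an emergency.

```python
import random


def shedding_order(utilities, draw_date):
    """Return the day's load-shedding order.

    Every utility that runs this with the same list and the same date
    string gets the same answer, so no live coordination is needed.
    """
    rng = random.Random(draw_date)   # seed with the date, e.g. "2004-04-10"
    order = sorted(utilities)        # canonical starting order
    rng.shuffle(order)
    return order


# Hypothetical participants; whoever comes out first sheds load first.
today = shedding_order(["Utility A", "Utility B", "Utility C"], "2004-04-10")
print(today[0])
```

A real scheme would need the seed to be something no participant can choose or predict in advance, which a plain date is not.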

--
Cool! Amazing Toys.
Re:Reasons for power blackouts by Ben+Hutchings · 2004-04-11 05:40 · Score: 1

In places like France, it's just too expensive to have air-conditioning.

According to the US DoE the domestic electricity prices ($/kWh) in 2000 were 0.102 in France and 0.082 in the US, not counting state taxes in the latter case. That doesn't seem like a sufficiently large price difference to explain the lower use of AC in France.
Re:Reasons for power blackouts by mc6809e · 2004-04-11 14:03 · Score: 1

According to the US DoE the domestic electricity prices ($/kWh) in 2000 were 0.102 in France and 0.082 in the US, not counting state taxes in the latter case. That doesn't seem like a sufficiently large price difference to explain the lower use of AC in France.

But look back a little further. The price in 1995 was as high as $0.167/kWh. That's nearly twice as much as what the US was paying at the same time. Now I realize you might object that that was several years ago, but ask yourself this: how long does it take for people to respond to decreases in energy costs before they decide to buy an air-conditioner? 1 year, 2 years, 3 years? How long? A history of high energy prices probably discouraged buying air-conditioners even after prices began to decline.

In 1999 the price was still at $0.121/kWh. It's only come down recently because France has begun to liberalize energy production.

Still even with all that the price is about 25% higher. That might not seem like a lot, but keep in mind the per capita GDP of France is only about 71% that of the US, so there is less money to begin with.
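The arithmetic behind that, as a quick Python check. The prices and the 71% figure are the ones quoted in this thread; dividing the price ratio by the GDP ratio is only a crude way to express relative burden, and that step is my assumption.

```python
france_2000 = 0.102   # $/kWh, DoE figure quoted above
us_2000 = 0.082       # $/kWh, DoE figure quoted above
gdp_ratio = 0.71      # French per capita GDP relative to the US, as stated above

price_ratio = france_2000 / us_2000
relative_burden = price_ratio / gdp_ratio   # crude income-adjusted comparison

print(f"{price_ratio:.3f} {relative_burden:.3f}")  # 1.244 1.752
```

So the sticker price is about 24% higher, but measured against income the gap looks closer to 75%.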

Re:Software bug was just one part of bigger proble by Vancorps · 2004-04-10 07:27 · Score: 1, Flamebait

I believe you can trace it all to one problem. Lack of management...

Realistically none of these problems had to happen and wouldn't have happened if the people in charge were doing their jobs. Maybe they were working on a way to make cold fusion feasible, I don't know but if they were negligent then they need to be removed from their position. If they were just too busy with other aspects of the system then they need to bring more people in so the system can be properly maintained. A power outage is a big deal. Of course, one outage is hardly a trend so probably the whole thing is just blown out of proportion.

World's largest machine by stefanb · 2004-04-10 07:30 · Score: 4, Interesting

An article featured on Slashdot last year lays out the underlying complexity of the power grid very well: "The World's Largest Machine"

OK, it's nitpicking, but the largest machine is arguably the telephone system. Among other things, it maintains a synchronized clock (8 kHz base), even across oceans and continents.

Re:World's largest machine by Creepy+Crawler · 2004-04-10 08:01 · Score: 1

Well then, it's the Internet too. Tele lines are just data lines with splitters for a/d and d/a coxes for the "phone" part.
--
- Mod parent up! by Anonymous Coward (Score:1) Thurs, Nov 31, @13:37
Re:World's largest machine by Anonymous Coward · 2004-04-10 08:16 · Score: 1, Informative

Umm, actually, you can get sample slippage when crossing clock domains in the telephone network. Although it _appears_ to be completely synchronized, it's really just that all the different master clocks have really, really tight tolerances.
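To put a number on "really, really tight": a back-of-the-envelope Python sketch of how often two free-running master clocks drift apart by one whole 8 kHz frame. The 1e-11 tolerance per clock is an assumed figure (it is the value usually quoted for primary reference clocks), and the function name is made up for illustration.

```python
FRAME_SECONDS = 1 / 8000   # one 8 kHz frame lasts 125 microseconds


def seconds_between_slips(relative_offset):
    """Time for two free-running clocks to drift apart by one whole frame."""
    return FRAME_SECONDS / relative_offset


# Assumption: each master clock is within 1e-11 of nominal, so two of them
# can disagree by up to 2e-11 in the worst case.
days = seconds_between_slips(2e-11) / 86400
print(round(days, 1))  # 72.3
```

So under that assumption a link between two clock domains drops or repeats a sample roughly once every ten weeks, which is why it looks synchronized from the outside.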
Re:World's largest machine by the+unbeliever · 2004-04-10 08:47 · Score: 1

There was no concept of 'data lines' when most of the current telephone infrastructure was laid out.
Re:World's largest machine by Creepy+Crawler · 2004-04-10 11:54 · Score: 1

Stupid. They WERE data lines that sent analog voice data. That's all they could send.

When we went more digital, we assigned large optic ring networks (SONET) where parts were for digital sending of analog and the others were for true digital data. Ever since the switch of ESS back in '87, we've been all using digital phone lines. They just happen to have a D/A hooked to them (that's what leads to our houses).
--
- Mod parent up! by Anonymous Coward (Score:1) Thurs, Nov 31, @13:37
Re:World's largest machine by IncohereD · 2004-04-10 17:28 · Score: 2, Insightful

Among other things, it maintains a synchronized clock (8 kHz base), even across oceans and continents.

It's actually plesiochronous, and only synchronized within certain (relatively large) regions. And I don't know where you're getting that 8 kHz figure from.

Basic relativity (not to mention propagation) will tell you that what you're describing is impossible.

Race conditions are nasty ... by cagle_.25 · 2004-04-10 07:36 · Score: 5, Insightful

As you programmers all know, avoiding race conditions is really difficult. The fellow Neumann quoted in the article who said

But Peter Neumann, principal scientist at SRI International and moderator of the Risks Digest, says that the root problem is that makers of critical systems aren't availing themselves of a large body of academic research into how to make software bulletproof.

is overly optimistic; it's theoretically impossible to write a general test to find all race conditions in code. This is a variant of the Halting Problem.

--
Human being (n.): A genetically human, genetically distinct, functioning organism.

Re:Race conditions are nasty ... by Animats · 2004-04-10 07:47 · Score: 2, Informative

it's theoretically impossible to write a general test to find all race conditions in code.
Baloney. It is possible to write programs for which race conditions are undecidable. Such programs are broken. It is possible to write programs for which race condition detection is NP-hard. Such programs are broken if N is large. It is also possible to write programs for which race conditions can be proven to be absent. That's what you want to do.
Actually, it's straightforward to design software to be free of race conditions on a single machine. You then have a deadlock avoidance problem, but deadlocks are easily detected when they occur.
Hardware is routinely designed to be free of race conditions, after all.
Re:Race conditions are nasty ... by Mr.+Slippery · 2004-04-10 08:10 · Score: 2, Insightful

it's theoretically impossible to write a general test to find all race conditions in code. This is a variant of the Halting Problem.

I doubt PGN was referring to software to test for race conditions; I expect he was alluding to methods for writing code that does not contain them. People have, after all, been thinking about Dining Philosophers for quite a while now, yet coders still do amazingly stupid things with threads.
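As one example of the kind of method meant here, a minimal Python sketch of Dining Philosophers using a fixed global lock order, the textbook way to rule out the circular wait that causes deadlock. The function and variable names are made up, and this is an illustration of the technique, not a claim about how any real control system is written.

```python
import threading


def dine(n_philosophers=5, meals=100):
    """Run the philosophers to completion and return meals eaten by each.

    Requires at least two philosophers, since each needs two distinct forks.
    """
    forks = [threading.Lock() for _ in range(n_philosophers)]
    eaten = [0] * n_philosophers

    def philosopher(i):
        left, right = i, (i + 1) % n_philosophers
        # Always pick up the lower-numbered fork first. With one global
        # order on the locks, a cycle of waiting threads cannot form.
        first, second = sorted((left, right))
        for _ in range(meals):
            with forks[first]:
                with forks[second]:
                    eaten[i] += 1   # only philosopher i ever writes eaten[i]

    threads = [threading.Thread(target=philosopher, args=(i,))
               for i in range(n_philosophers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return eaten


print(dine())  # [100, 100, 100, 100, 100]
```

If every philosopher instead grabbed the left fork first, the same program could hang forever, and testing might never show it.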

--
Tom Swiss | the infamous tms | my blog
You cannot wash away blood with blood
Re:Race conditions are nasty ... by platipusrc · 2004-04-10 08:19 · Score: 3, Informative

how do you have a large nondeterministic?

hint: NP-hard is a problem that is NP-complete, or worse. An NP-hard problem does not have to be solvable. NP in this context stands for nondeterministic polynomial (with reference to time bounds). NP means that a problem can be solved in polynomial time with an infinitely parallel system. NP-complete problems are at least as hard as all other NP problems.

Sorry, it just bugs me whenever people try to talk about theory of CS and use "non-polynomial" or something else for NP.

--
And the muscular cyborg German dudes dance with sexy French Canadians
Re:Race conditions are nasty ... by ummit · 2004-04-10 09:52 · Score: 1

As you programmers all know, avoiding race conditions is really difficult.
I'm sorry, but this is just wrong. It's easy to avoid race conditions: don't write excessively complex, multithreaded systems that are too complicated for you or anyone else to understand. Unfortunately, it's also easy -- quite seductively easy -- to try to write excessively complex, multithreaded systems that are too complicated for you or anyone else to understand, and that's why so much code is so buggy. But it doesn't have to be that way.
"The fellow Neumann" isn't overly optimistic, he's dead-on accurate. Don't try to excuse GE's failings by claiming that "it's theoretically impossible" to find some bug. It's theoretically impossible for a bumblebee to fly, it's theoretically impossible for life to exist, yet somehow we and the birds and the bees are all here.
"People who believe a thing to be impossible should not stand in the way of those who are doing it." I fervently wish all the apologists for bad code would get out of the way, so that we could start raising the bar a little and get software out of the Dark Ages it's currently mired in. Again, it doesn't have to be that way.
Re:Race conditions are nasty ... by foosballhound · 2004-04-10 10:12 · Score: 1

actually, hardware isn't always free of lockups. anybody remember the 6502 HCF ("halt and catch fire") instruction? look up "6502 HCF" on google. one of the opcodes was 0x02, which locked up the processor so even NMI interrupts didn't work. some of the other HCF-type opcodes were more dramatic.
Re:Race conditions are nasty ... by Anonymous Coward · 2004-04-10 12:34 · Score: 0

But Peter Neumann, principal scientist at SRI International and moderator of the Risks Digest, says that the root problem is that makers of critical systems aren't availing themselves of a large body of academic research into how to make software bulletproof.
It's precisely because it's academic research that much of that large body isn't making its way into production software. Academics don't have to write code on a budget and a schedule, and don't really have a grip on what happens day-to-day. Much of that 'large body' falls into one of two categories: a) telling us to do what we are already doing, but do it 'better' or b) complex methodologies for proving and testing that are themselves prone to the very errors they are supposed to prevent (as well as being expensive as hell to implement).
Re:Race conditions are nasty ... by Tony-A · 2004-04-10 12:55 · Score: 1

It's easy to avoid race conditions:

Right. Just one step at a time.

Unfortunately, the real world is asynchronous and it doesn't really work to say "Stop the world, I've got some computing to do".

it's also easy -- quite seductively easy -- to try to write excessively complex, multithreaded systems that are too complicated for you or anyone else to understand,
You're right, but methinks you understate the case.
Re:Race conditions are nasty ... by ealar+dlanvuli · 2004-04-10 17:45 · Score: 1

How did this get modded informative? The guy made a simple capitalization error that took two seconds for anyone with a brain and a CS degree to fix in their head.

Hint: s/N/n/g

--
I live in a giant bucket.
Re:Race conditions are nasty ... by Animats · 2004-04-11 05:13 · Score: 1

Back then, people didn't have race condition detectors for VLSI designs. The 6502 was a 1970s design. CAD barely existed. ICs were laid out using Kodalith and X-acto knives. There's been progress since then.
Re:Race conditions are nasty ... by foosballhound · 2004-04-11 07:04 · Score: 1

good points. One of the rate-limiting factors in computer science, IMHO, is that it's hard to make money on software checking tools. It's hard enough to get business to buy a $300 tool, let alone something that costs millions (or $25,000, as the hw checking tool you mentioned costs). to spend real money, the bean counters need a spreadsheet model that basically says: "we spend $$$ and it saves us $$$$$$$", "so ok, that's a good business decision." in hardware, if the tool saves a single tape-out, that justifies a lot. in hardware, if the tool saves a single recall, that saves a lot. in hardware, if the tool saves a lawsuit, that saves a lot. in software, there's nothing like a tape-out, where some senior level person has to sign off on a big-ticket milestone. there are no recalls to factor in as a cost. (this is getting worse, since business discovered that products could be updated over the net) there's no real legal liability to factor in as a cost. basically the software has to be checked manually for bugs. so the cost of finding the nth bug goes up rapidly, perhaps exponentially. however the COST of the nth bug goes down exponentially.
Re:Race conditions are nasty ... by platipusrc · 2004-04-11 19:03 · Score: 1

'n' doesn't matter much for NP-hard problems because anything larger than a trivial amount of input is going to take a very long time to run to completion, if it does so at all.

'n' is the size of the input, so it still isn't correct to use it in the sense that you're speaking of, unless you're trying to say that it's ok to have an NP-hard race condition determination if you have a smaller than 1000 line program? I'd say that in programs where race conditions will matter, there will almost always be a significant amount of code. Besides, there was redundancy in your GP's post. Undecidable problems are NP-hard.

--
And the muscular cyborg German dudes dance with sexy French Canadians

Software ENGINEERING by Anonymous Coward · 2004-04-10 07:39 · Score: 4, Interesting

If I want to build a large structure (bridge or building) where it is possible that public safety is at issue, I had better have an engineer's signature on the drawings.

This case seems like a real good argument for having the same requirement for software.

Good engineering practice would probably have prevented this. A simple example of such a system would be a burglar/fire alarm panel. The system is self-checking. If any part of the system isn't working (i.e., someone cuts a wire), then that causes an alarm.

I realize that there will be strange undetectable bugs in software but if the system as a whole is properly engineered, the system will fail gracefully and safely.

Re:Software ENGINEERING by Orne · 2004-04-10 08:45 · Score: 3, Interesting

The two systems you describe are fundamentally different from the design of this alarming system. In fire or safety, the "reading" is the voltage of the closed loop wire itself; 12 volts connected, 0 volts open.

Now imagine if you have a layer in between; you want to monitor the fire status of a complex of warehouses from a single room several miles away. Digitize (analog-to-digital) the signals from all of the individual buildings, transport the data to a common computer, and view the data there. Figure you have several hundred buildings you're watching at once, and now you're getting closer in scale to how the grid dispatchers get their data.

Now imagine that the computer's software back at the main station reads all these meters, and if a line's open (say you're tracking window openings for security), it writes an alarm to a text log on the screen; on a good day, you don't get any alarms. Now suppose the driver that writes the alarms to the screen hangs; since you weren't expecting any alarms, you're not that concerned that you aren't seeing anything. That's pretty much what caught FirstEnergy for those 3 hours that afternoon, while the system was failing and they didn't realize they needed to act.
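The usual defense against "no news looks like good news" is a heartbeat: the alarm path sends a routine message on a schedule, and silence itself becomes an alarm. A minimal Python sketch, with a made-up class name and an arbitrary 10-second limit, and no claim that XA/21 works (or fails) this way:

```python
class AlarmFeedMonitor:
    """Tracks when the alarm feed was last heard from."""

    def __init__(self, max_silence_s=10.0):
        self.max_silence_s = max_silence_s
        self.last_heard = None

    def on_message(self, now):
        # Heartbeats and real alarms both prove the feed is still running.
        self.last_heard = now

    def feed_alive(self, now):
        if self.last_heard is None:
            return False
        return (now - self.last_heard) <= self.max_silence_s


monitor = AlarmFeedMonitor(max_silence_s=10.0)
monitor.on_message(now=100.0)
print(monitor.feed_alive(now=105.0))  # True
print(monitor.feed_alive(now=130.0))  # False
```

With something like this on the operator's screen, a hung alarm driver shows up as a dead feed within seconds instead of a quiet afternoon.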
Re:Software ENGINEERING by Kirill+Lokshin · 2004-04-10 09:48 · Score: 2, Insightful

the system will fail gracefully and safely.

A mission-critical system, by definition, cannot fail "safely", since it must not fail at all.
Re:Software ENGINEERING by sjames · 2004-04-10 09:52 · Score: 3, Interesting

At first glance, that can seem like a good idea, but are you prepared to pay for that signoff from each engineer whenever you install a piece of software?

A PE signs off on each particular instance of a design, taking intended use, site and other construction into account. If you then build elsewhere, you need a new signoff. If you make any significant change (including adding other structural elements to the design, that is, installing more software), you'll need a new signoff. Add a new network driver, another signoff. Upgrade the CPU? You guessed it!

Some software is poorly designed and crash prone. Other software is well designed but cannot be signed off on because it might be installed on nearly anything that pretends to be a compatible platform.

The one justification for that sort of signoff is in situations where a bug will kill someone. Even then, the system should be divided into critical and auxiliary parts to limit what must be signed off on.

Autopilots work that way. You have a small and relatively simple part that assures safe conditions, is extensively tested, and rarely changed. Another portion is more frequently updated, attempts to optimize the flight and provides a nicer interface. The latter can fail completely and the plane will continue to fly (possibly with poor fuel economy and the pilot navigating manually, but it won't fall out of the sky).

There are many tradeoffs. In some sense, many small distributed systems are more robust than centralized control. However, it's a lot easier to create a chaotic system that way. If you do, you won't know until the system falls into a weird state without warning.
Re:Software ENGINEERING by iabervon · 2004-04-10 10:34 · Score: 2, Informative

One issue is that there is no safe state for the system to go to if the control system breaks down. Bringing the power grid in an area down safely is as hard as bringing it up safely (which, if you remember, took a while) and is harder than just keeping the system running.

The system is full of inductors, whose voltage drop is determined by the change in current through them. If you disconnect a transmission line, suddenly you're trying to change the current to 0, which puts all of the inductors at whatever voltage is necessary to make the current change more slowly. Generally, the way of making the current change more slowly is either to shoot a bolt of lightning across the gap you're creating or to melt your equipment into a conductive lump of metal, but this is only a temporary solution. Instead, the inductors (inside transformers and such) can melt down so that they aren't inductors any more and the current can change more quickly. Of course, when this happens, the next segment of transmission line is now not getting current, so it has the same problem.

The only safe way to bring down the grid is by coordinating with the adjacent grids to carefully remove the load on the line you want to disable; but that's not really an option when the problem is that communication is out.
Re:Software ENGINEERING by dabblah · 2004-04-10 23:02 · Score: 1

Your last statement lets FE off a little light. They knew they had voltage problems that day (they knew they had them every day; Davis-Besse was down). Eastlake 5 tripped on over production of voltage before the initial event that led to the blackout. Not seeing alarms under those conditions should have set off an alarm in somebody's mind...

Bug free! by Ghoser777 · 2004-04-10 07:43 · Score: 4, Funny

int main()
{
    return 0;
}

Because I have shown you bug free software, does that invalidate the rest of your argument?

Matt Fahrenbacher

--
James Tiberius Kirk: "Spock, the women on your planet are logical. No other planet in the galaxy can make that claim."

Re:Bug free! by Raven42rac · 2004-04-10 07:46 · Score: 0, Flamebait

No, no it doesn't. I meant software in the sense of "running power plants" or "actually does something". Not "Hello World" samples.

--
I hate sigs.
Re:Bug free! by Creepy+Crawler · 2004-04-10 07:55 · Score: 1

Then your compiler must've fucked up.

See you relied on BUGGY software to make a binary of your "perfect program"
--
- Mod parent up! by Anonymous Coward (Score:1) Thurs, Nov 31, @13:37
Re:Bug free! by Anonymous Coward · 2004-04-10 08:56 · Score: 0

- you misread the specification; the return value should have been 1.

- you forgot to declare argc / argv, so you're liable for parameters mismatches with your runtime system that could have been caught at compile time.

- since we're talking embedded control devices for power plants, the return value is incorrect.

- since we're talking embedded control devices for power plants, returning _at all_ is incorrect. Returning from main means a system reset if you did the rest of your design right, a lockup if not.
Re:Bug free! by cd14 · 2004-04-10 09:37 · Score: 0

Matthew Fahrenbacher - WOW, who would have thought i'd hear from you on slashdot again. Man, how the heck are you?! This is Chris Dozois, you may remember me from Junior High band.. computer stuff over at Sherwin's, etc. Congrats on your graduation!
Re:Bug free! by SB5 · 2004-04-10 09:51 · Score: 1

I am sure my computers at my High School could have compiled that wrong. Don't ask how but they screwed up the Hello World program. It also didn't matter what compiler we used, for some reason we returned errors. Then again that was years ago, we played Quake2, GTA2, Starcraft during the first 4 months since the computers weren't running properly and software hadn't arrived. Then we played catch up the rest of the year. Getting a year's worth of C++ in half the time is not my idea of fun.

--
If what you are reading sounds funny, or sarcastic, lame, or stupid
it is because it is supposed to be. just laugh

Additional Information by Orne · 2004-04-10 07:47 · Score: 3, Interesting

Oddly enough, while writing a comment to another user's message, I threw some info into Google to learn about FirstEnergy's EMS system, and found this other SecurityFocus story from February 2004, which gives more raw facts than this newer story.

"DiNicola said Thursday that the company, working with GE and energy consultants from Kema Inc., had pinned the trouble on a software glitch by late October and completed its fix by Nov. 19..."

"With the software not functioning properly at that point, data that should have been deleted were instead retained, slowing performance, he said. Similar troubles affected the backup systems." This dovetails well with why the testers had to "slow" their testing to make the race condition appear.

342 years of online operational hours? by VoidEngineer · 2004-04-10 07:51 · Score: 2, Insightful

So, as far as I can figure, there are 24 hours in a day, and 365 days in a year, which equals about 8760 hours in a year (give or take).

Now then, 3 million hours divided by 8760 hours per year equals approximately 342 years, modulo 4080 hours (i.e. approximately 170 days...).

Now then... how the hell do they get the idea that they've been up-and-running for 342 years? Are they counting things in parallel? Even if they were counting end-user operational hours, the number should at least be a couple orders-of-magnitude higher, no?

3M online operational hours sounds like fuddy-duddy accounting to me... although, obviously I haven't looked over the books. I would be interested to see how they came up with this number.

Re:342 years of online operational hours? by Anonymous Coward · 2004-04-10 07:55 · Score: 1, Insightful

They said this software was running at about 100 utilities world-wide, so that's an average of 3.42 years at each installation.
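The numbers check out if you count installations in parallel. A quick Python check, using the 3 million hours and the roughly 100 utilities mentioned in this thread (the variable names are just for illustration):

```python
HOURS_PER_YEAR = 24 * 365          # 8760, ignoring leap years
TOTAL_HOURS = 3_000_000            # vendor's claimed operational hours
INSTALLATIONS = 100                # "about 100 utilities", per the comment above

years, leftover_hours = divmod(TOTAL_HOURS, HOURS_PER_YEAR)
print(years, leftover_hours)       # 342 4080

per_site_years = TOTAL_HOURS / HOURS_PER_YEAR / INSTALLATIONS
print(round(per_site_years, 2))    # 3.42
```

The leftover 4080 hours is 170 days, and 3.42 years per site is a perfectly ordinary deployment age for this kind of system.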

aQazaQa
Re:342 years of online operational hours? by Creepy+Crawler · 2004-04-10 07:58 · Score: 3, Interesting

342/x

x = "how many reactors they have in operation"
--
- Mod parent up! by Anonymous Coward (Score:1) Thurs, Nov 31, @13:37

Comment removed by account_deleted · 2004-04-10 07:52 · Score: 1, Insightful

Comment removed based on user account deletion

This Defines All Catastrophic Failures by Allen+Zadr · 2004-04-10 07:59 · Score: 1

They say that no airline crash was ever the result of a single failure. There are always at least three failures, in systems or sub-systems, human or computer (but usually both), that lead to an airline crash.

In the case of HVAC fire systems, there are probably over 500,000 installations of HVAC systems, and these are tested under real fire conditions several times a year (where the type of feedback seen in this blackout investigation is made, each time).

I think this should support Raindance's point

--
Kinetic stupidity has a new brand leader: Allen Zadr.

Testing vs RTFS. Proprietary vs open. by SharpFang · 2004-04-10 08:02 · Score: 4, Insightful

/* rand() returns an int in [0, RAND_MAX]; compare it directly to the magic value */
if (rand() == 31337) {
    blow_up();
} else {
    do_your_work();
}

Now I can't imagine any amount of testing in proprietary software that could reveal this example of malicious code. In open source one look at the code will reveal it. Of course not all cases are so obvious, but reading the code should always be used together with "testing the software". How do you know lots of proprietary software that IS closed-source isn't, e.g., a gateway for terrorists? How do you know the biggest companies' stuff isn't all trojans? It wouldn't be hard to hide it. Say your software is a kind of server. It does its job okay unless it receives TCP packets starting with a certain string. Then it just executes the commands contained after that string. Boom. No amount of -testing- will reveal this.
And there are bugs that can be triggered once in several billion cases. Only looking at the code could fix them and explaining "we did a lot of tests" is bullshit.
I put a lot of iron, gum, different materials, C4, glass and some more together and it goes, I call it "a car" and I rode 1000's of kilometers okay. Now no amount of testing in all road conditions will reveal it contains the C4 explosives. Looking under the hood will reveal it really fast.

--
45 5F E1 04 22 CA 29 C4 93 3F 95 05 2B 79 2A B2

Re:Testing vs RTFS. Proprietary vs open. by Anonymous Coward · 2004-04-10 08:59 · Score: 1

How do you know lots of proprietary software that IS closed-source isn't, e.g., a gateway for terrorists?
Assuming they open their source code, how do you know that the binary they're deploying was compiled honestly from the sources that everybody gets to read? Opening the sources also removes some protection of obscurity (yes, yes, not to be relied on, etc, but is still real). Simply by reading software code one can put together the arrangements and interdependencies of hardware subsystems that are deployed, and may make it easier to attack.
I'm not saying open sourcing mission-critical software is not going to help, but you do still have to weigh the risks and benefits. How many people actually will help? This isn't Mozilla or Linux, which many people actually deploy and use, and it's tedious work. How much easier would it be for a terrorist? Put another way, do you know how to cause a catastrophic power failure if you forcibly take over such a facility? Would it be easier if you had studied the source code of its control systems?
But the bottom line is, at some point you just have to trust somebody at least minimally, since there's zero chance they'll let you compile the code and upload it onto their machines.
Only looking at the code could fix them and explaining "we did a lot of tests" is bullshit.
How do you know they don't look at the code? I don't really expect lawyers and spokespeople to explain the details of their software development process. The numbers cited appear to be simply for lawsuit avoidance purposes.
Re:Testing vs RTFS. Proprietary vs open. by Anonymous Coward · 2004-04-10 09:08 · Score: 0

can't imagine any amount of testing in proprietary software that could reveal this example of malicious code. In open source one look at the code will reveal it

You have a limited imagination. "Proprietary" doesn't mean "invisible". It just means invisible to you. The rest of us can take one look at the code and spot the bug, even more easily than if it were open source -- because it's our job.

Code inspection was invented by an evil, profit-seeking corporation, for use on their own proprietary products. It's not something you can only do on open source.

On the other hand, spotting bugs, in say, OOo Impress, is not my job -- or anyone else's, judging by the fact that I found three major bugs in five minutes the first time I tried to use version 1.1. One of the great myths of open source is that users have any time, training, or inclination to go bug-hunting in their apps. It's possible to do so, if there's some mission-critical problem that they absolutely have to work around. But in general, everyone has their own problems to deal with, and using software is just a tool to deal with those problems, not an end in and of itself -- unless you're a hobbyist. More likely, you'll just try another tool that doesn't have that particular problem and get on with your business.

Just because people can look at the code doesn't mean that they will.

Fixing with words what needs engineering. by Futurepower(R) · 2004-04-10 08:08 · Score: 1

From the Slashdot story: "Unfortunately, that's kind of the nature of software... you may never find the problem."

What the parent poster said sounds right. The GE spokesperson is just trying to fix with bullshit what should be fixed with engineering.

"We test exhaustively..." by Fratz · 2004-04-10 08:12 · Score: 4, Insightful

Um, no you don't. By definition, if you tested exhaustively, you'd have found everything that could possibly go wrong with whatever you tested.

I'm not saying it's always feasible to test exhaustively, but don't say you did when you clearly didn't.

Also: "we had in excess of three million online operational hours in which nothing had ever exercised that bug"

Taken with the "exhaustively" statement, I'm thinking that whoever said these things doesn't understand QA very well. It's easy to write code that works well when everything's good, and it's often just as easy to test that. It's another thing entirely to write code that works well (or fails gracefully) when everything's wrong. And again, it's harder to test that.

--
-- Fratz, human

Clocks by Detritus · 2004-04-10 08:13 · Score: 2, Interesting

That's one reason that I like to put UTC clocks on displays. A quick glance at the clock will tell you if the display subsystem has crashed.

I'm also a big fan of watchdog timers. The process that periodically resets the timer can make all sorts of health and sanity checks.
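
A minimal sketch of the watchdog idea in Python, assuming a purely software timer. The `Watchdog` class, `kick()` and the `on_expire` callback are names made up for illustration; a real control system would more likely lean on a hardware timer. The monitored process calls `kick()` only after its health checks pass, and if it hangs, the alarm fires on its own.

```python
import threading
import time


class Watchdog:
    """Hypothetical software watchdog: calls on_expire if not kicked in time."""

    def __init__(self, timeout_s, on_expire):
        self.timeout_s = timeout_s
        self.on_expire = on_expire
        self._timer = None
        self._lock = threading.Lock()

    def kick(self):
        """Restart the countdown; call this only after health checks pass."""
        with self._lock:
            if self._timer is not None:
                self._timer.cancel()
            self._timer = threading.Timer(self.timeout_s, self.on_expire)
            self._timer.daemon = True
            self._timer.start()

    def stop(self):
        """Cancel the countdown entirely."""
        with self._lock:
            if self._timer is not None:
                self._timer.cancel()
                self._timer = None


def demo():
    expired = threading.Event()
    dog = Watchdog(0.5, expired.set)
    # Healthy phase: the process keeps kicking well inside the timeout.
    for _ in range(3):
        dog.kick()
        time.sleep(0.05)
    healthy = not expired.is_set()
    # Simulated hang: nobody kicks any more, so the watchdog must fire.
    fired = expired.wait(timeout=3.0)
    dog.stop()
    return healthy, fired
```

The point of the design is that silence is treated as failure: the alarm does not depend on the hung process noticing that it is hung.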

--
Mea navis aericumbens anguillis abundat

Re:Clocks by corngrower · 2004-04-10 11:45 · Score: 2, Interesting

A watchdog timer on the alarm system (that was deadlocked) would probably have prevented this scenario. And I also agree that displaying clocks on the screen is a good way for the operator to see if the display system is functioning properly.

Not having the display system give a visual indication of stale data was also a deficiency.
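
As a rough sketch of what such an indication could look like (Python, with a made-up `Reading` type and an arbitrary 10-second threshold, neither taken from the XA/21): every value carries the time it was received, and the display code refuses to show an old value without flagging it.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    value: float
    timestamp: float  # seconds since epoch when the value arrived from the field


def render(reading, now, max_age_s=10.0):
    """Format a reading for display, marking it visibly if it is too old."""
    age = now - reading.timestamp
    if age > max_age_s:
        return f"{reading.value:.1f} [STALE {age:.0f}s]"
    return f"{reading.value:.1f}"
```

For example, `render(Reading(345.0, 1000.0), now=1090.0)` gives `345.0 [STALE 90s]`, while the same reading rendered at `now=1005.0` is shown as plain `345.0`.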

There also seems to have been a problem in that the data collection and monitoring portion of the system was held up by a malfunctioning alarm system.

de centralised power by zogger · 2004-04-10 08:16 · Score: 2, Insightful

I think the nation/region would be served better if we stepped back a bit and took another look at more decentralised power generation as a full bore government encouraged option. Not as a complete replacement, but frankly, I see no reason we can't have millions more solar panels and wind generators out there. Economy of scale in manufacturing, spurring on even more R&D, etc, works for everything else it appears. And having a lot more points of production, spread out, would help to mitigate cascading failures, especially if islanding was more precise and easier to implement in smaller areas. Wind, where the average wind is adequate enough, is especially cost effective now, approaching coal burning costs per watt. Solar is nice where applicable by climate because it can make use of dead space going to waste, millions of roofs already there.

I like the "not all your eggs in one basket" approach to problems, and I believe in backups for everything.

Re:de centralised power by Anonymous Coward · 2004-04-10 19:09 · Score: 0

Exactly, if they took even a fraction of the billions they gave to PG&E to bail them out after the Enron induced "energy crisis" in California and put it into solar we would have added resilience to the system right where it is needed - solar radiation patterns follow the peak usage patterns and decentralized generation reduces load on interconnect points.

During the cold war, US researchers analysed our transportation infrastructure and found it to be more resilient due to being more decentralized (Russian transport was based more on rail and very few warm water ports and inland water ways).

So micro, nano, distributed generation would seem to solve all sorts of problems - terrorist attack impact, peak usage problems, regional redistribution of power, etc. The reason it isn't happening is because it takes away power from those who currently hold it, and as far as I can tell there is no valid reason that California or anywhere else wouldn't massively benefit from decentralized generation.

Re:Software bug was just one part of bigger proble by Detritus · 2004-04-10 08:17 · Score: 2, Funny

They were doing their job, cutting budgets and payroll costs. Oh, you wanted the system to operate reliably too?

--
Mea navis aericumbens anguillis abundat

Then why do they get paid? Re:You know... by sharper56 · 2004-04-10 08:19 · Score: 1

Once you hit a level of professionalism, then you are PAID to think outside the box and anticipate unlikely problems.

In a disaster, this becomes the difference between companies that take a financial loss and those that file Chapter 11.

Re:Software bug was just one part of bigger proble by Shakrai · 2004-04-10 08:20 · Score: 5, Insightful

The software bug was just one piece of a much bigger problem; I wouldn't want to overstate its role. There were many other factors; here are just a few:

Why don't we point out the real problem that likely caused this to happen. Energy deregulation in the first place.

I know I'll be jumped on by the free market types for daring to suggest this, but I'd rather have a regulated monopoly than a free-market for my life essential services any day of the week. That article you linked is very interesting reading. Some quotes:

Prior to the implementation of Federal Energy Regulatory Commission Order 888, which greatly expanded electricity trading, the cost of electricity, excluding fuel costs, was gradually falling. However, after Order 888, and some retail deregulation, prices increased by about 10%, costing consumers $20 billion a year.

"Under the new system, the financial incentive was to run things up to the limit of capacity," explains Carreras. In fact, energy companies did more: they gamed the system. Federal investigations later showed that employees of Enron and other energy traders "knowingly and intentionally" filed transmission schedules designed to block competitors' access to the grid and to drive up prices by creating artificial shortages. In California, this behavior resulted in widespread blackouts, the doubling and tripling of retail rates, and eventual costs to ratepayers and taxpayers of more than $30 billion. In the more tightly regulated Eastern Interconnect, retail prices rose less dramatically.
In the four years between the issuance of Order 888 and its full implementation, engineers began to warn that the new rules ignored the physics of the grid. The new policies "do not recognize the single-machine characteristics of the electric-power network," Casazza wrote in 1998. "The new rule balkanized control over the single machine," he explains. "It is like having every player in an orchestra use their own tunes."
Equally important, the frequency stability of the grid rapidly deteriorated, with average hourly frequency deviations from 60 Hz leaping from 1.3 mHz in May 1999, to 4.9 mHz in May 2000, to 7.6 mHz by January 2001. As predicted, the new trading had the effect of overstressing and destabilizing the grid.

Of course it's the first quote that rings true with me. If deregulation is so friggen great then where is the cheap electric? Why can my Village sell me electric for $0.04/kWh with their regulated municipal power authority (while paying their workers Government rates and with Government benefits) when my girlfriend (who lives a whole two miles away) pays $0.14/kWh for electric supplied by a company that is supposedly part of the free market (a company that pays their employees crap and outsources their call center/billing functions to India). What's the problem with that picture?

Before energy deregulation the price of our electric was regulated by the PSC (Public Service Commission) and was fairly stable. The company that had the monopoly in this area made a set amount of profit (it wasn't a bad stock to pick up either -- you knew what you were getting), treated their employees well and charged a fair rate. Nowadays they treat their employees like crap, the stock has tanked because they are eating the price difference from their suppliers (otherwise we'd be paying about $0.20 kWh) and they are being raped by out of state suppliers that bought all of their generation capacity.

In another slightly related story the out of state company that bought one of their power plants sued the local township because they wanted the tax levy on the power plant reduced. They claimed that it wasn't worth what it used to be because they didn't plan on operating it (it was to be backup generation). After a three-year legal battle the township lost (ran out of money to pay the lawyers) and the tax levy was reduced by some 60%. Property and school taxes on

--
I want peace on earth and goodwill toward man.
We are the United States Government! We don't do that sort of thing.

Permanent Alarms by Detritus · 2004-04-10 08:21 · Score: 1

That's assuming the faults get fixed. I've seen buildings with the new fancy computerized fire alarm systems where alarms for sensor and wiring faults get ignored for months.

--
Mea navis aericumbens anguillis abundat

Re:Permanent Alarms by Vancorps · 2004-04-10 08:24 · Score: 1

In my experience it's impossible to ignore since it sets off the fire alarms throughout the building and calls the fire department, who have to come out and investigate. Only they are allowed to shut it off; if you do, then you get yourself a nice hefty fine. I guess it's not like that everywhere from the sounds.
Re:Permanent Alarms by innocent_white_lamb · 2004-04-10 08:43 · Score: 1

I guess it's not like that everywhere from the sounds.

Absolutely.

At a shopping mall near where I used to live, an alarm monitoring panel was located right beside the main doors into the place. A fault light was flashing on that panel right from the day that they opened the mall until the day that I moved away.

--
If you're a zombie and you know it, bite your friend!

two words: formal methods by Anonymous Coward · 2004-04-10 08:22 · Score: 1, Informative

It is possible for some problems to construct a formal
description of the code. There are many,
many tools (e.g., SPIN, ACL2) that take this
formal description and produce a rigorous
proof of some property, e.g., that some state is
never reached, that a safety or liveness property
is upheld, etc.
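
To give a feel for what such a tool does, here is a toy explicit-state search in plain Python. It is not SPIN or ACL2, just the same basic idea in miniature: enumerate every reachable state of a small model and check a property in each one. The model is my own made-up example: each process takes two locks in a given order and then releases both. The search reports a reachable state in which some process is unfinished but nobody can move (a deadlock), or None if no such state exists.

```python
from collections import deque


def successors(state, orders):
    """Yield every state reachable in one step from `state`."""
    pcs, owners = state
    for p, order in enumerate(orders):
        pc = pcs[p]
        if pc < 2:
            # Process p tries to take its next lock; it can only move if free.
            lock = order[pc]
            if owners[lock] is None:
                new_owners = list(owners)
                new_owners[lock] = p
                new_pcs = list(pcs)
                new_pcs[p] = pc + 1
                yield (tuple(new_pcs), tuple(new_owners))
        elif pc == 2:
            # Process p holds both locks: release them and finish (pc = 3).
            new_owners = tuple(None if o == p else o for o in owners)
            new_pcs = list(pcs)
            new_pcs[p] = 3
            yield (tuple(new_pcs), new_owners)


def find_deadlock(orders):
    """Breadth-first search of the whole state space for a stuck state."""
    start = ((0,) * len(orders), (None, None))
    seen = {start}
    frontier = deque([start])
    while frontier:
        state = frontier.popleft()
        nxt = list(successors(state, orders))
        if not nxt and any(pc != 3 for pc in state[0]):
            return state  # nobody can move, yet someone is unfinished
        for s in nxt:
            if s not in seen:
                seen.add(s)
                frontier.append(s)
    return None
```

With opposite lock orders, `find_deadlock([(0, 1), (1, 0)])` returns the state where each process holds one lock and waits for the other; with the same order, `find_deadlock([(0, 1), (0, 1)])` returns None. Real tools do this on models with vastly more states, which is where the time and cost come in.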

http://spinroot.com/spin/whatispin.html

AMD uses this to test the floating point unit
in their chips, to make sure the algorithm they
use will not result in an Intel-style half
billion dollar mistake.

The question is: does your application warrant
the time and cost needed to create the formal
description of the problem, needed to drive these
tools.

Re:two words: formal methods by Tony-A · 2004-04-10 12:32 · Score: 1

take this formal description and produce a rigorous proof of some property, e.g., that some state is never reached ... and then have the system go berserk when that state is reached.

The problem is that while you can get a rigorous proof (Wasn't the parallel postulate "proved" in the 13th century or so?) of the formal description, you have nothing remotely like a proof, formal or otherwise, that the formal description actually matches reality.

A perfect example by maximilln · 2004-04-10 08:22 · Score: 1

This probably would've been prevented if they had compiled using -O3 and -march=athlon-xp.

Someone said "always go with package installs" and that person had more seniority.

Unum. 'I'm not sure that more testing would have revealed that. Unfortunately, that's kind of the nature of software... you may never find the problem. I don't think that's unique to control systems or any particular vendor software.'

--
+++ATHZ 99:5:80

Re:A perfect example by ptr2void · 2004-04-10 10:00 · Score: 1

This probably would've been prevented if they had compiled using -O3 and -march=athlon-xp.

You mean that would have been prevented if they hadn't..., right?

The real problem by GISGEOLOGYGEEK · 2004-04-10 08:26 · Score: 0, Flamebait

Ok, so they found the trigger ... poor maintenance left cables hanging down on trees, and a bug in software failed to set an alarm off when those cables tripped off.

But they need to deal with the REAL PROBLEM.

The surrounding electrical utilities, when they measured the power fluctuations hours before the cascade, acted solely to protect themselves instead of protecting the system and the customers. They acted to trip off their own systems and shunt the power drain to other utilities.

By doing so, loads too big to fill were thrown on down the line, forcing more and more utilities to trip off... the cascade failure.

The utilities are required by law to act for the system first, before ducking their heads in the sand the way they all did. They could have isolated a small area and left the outage as a minor event never making the news.

Instead in typical dumbass ignorant american fashion, everyone ignored what was happening including Dubya and tried to blame it on a utility in Canada.

The truth is out now, but with their rude american ways and short attention spans, it will never occur to them to even apologize to the Canadian people and systems that they in fact disrupted.

--
George Bush + Linux = "I will not let information get in the way of the fight against Windows"

Re:The real problem by sjames · 2004-04-10 10:19 · Score: 2, Informative

From the reports I have seen, other than FE, the various companies did take appropriate action and shed load where necessary, it's just that the situation developed too quickly (from their perspective) and was too large to save by the time they could see it.

The problem was that the grid was running too close to capacity in general. Since the electricity is traveling as fast as any control signal could, it is necessary for the system to be able to tolerate whatever condition may exist long enough for systems to react and get a command transmitted. To make matters worse, you can't just switch off that much current, it takes several seconds for a switch to trip and the arc to be extinguished.

At 50% of design limits, a sudden doubling of demand due to a failure is no big problem (but needs to be dealt with before something else goes wrong), if you're at 90% though, you have a problem.
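
The arithmetic behind that, as a small Python sketch. It assumes the simplest possible model, which is mine and not the poster's: identical parallel lines whose load is split evenly among the survivors when one trips. Real power flows follow the network's physics, so treat this only as an illustration of why headroom matters.

```python
def load_after_failure(utilisation, lines=2, failed=1):
    """Per-line load, as a fraction of rating, after `failed` of `lines` trip.

    Assumes identical lines and an even split of the total load (a simplification).
    """
    surviving = lines - failed
    if surviving <= 0:
        raise ValueError("at least one line must survive")
    return utilisation * lines / surviving


def is_overloaded(utilisation, lines=2, failed=1):
    """True if the surviving lines would be pushed past 100% of rating."""
    return load_after_failure(utilisation, lines, failed) > 1.0
```

Two lines at 50% each: lose one and the survivor sits at 100%, right at its limit. Two lines at 90% each: lose one and the survivor is asked for 180%, so it trips too and the problem moves on down the line.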

The real problem is that peak capacity simply isn't there. Our grid does run around 90% during peak load. The question is, are we willing to pay for the extra peak capacity.

California's problems were quite different since it was basically an effort to wring out more profit than existed in the system. THAT is a good reason for regulation. It may be that a more limited form of that is why we don't have more peak capacity, and that needs to be addressed.

I know a test methodology which would find that by Anonymous Coward · 2004-04-10 08:29 · Score: 1

It's called "code coverage analysis". You run tests, with the code profiled to track which instructions have run. Then you generate a report, and go look at all the code which never got run, and try to figure out how to change your tests to make it run.

And then you add "fault injection", which is a technique to force "errors" to happen (which in this case would cause a particular return value from rand()) - and Ta Da! You have found the "bug".
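
A minimal Python sketch of the fault-injection half, using a made-up `process()` function with a branch that ordinary testing would essentially never reach. The trick is simply to make the "random" source replaceable, so a test can force the rare value. (Coverage tools such as coverage.py handle the other half, reporting which lines never ran.)

```python
import random


def process(reading, rng=random.random):
    """Toy function with a rare branch that normal test runs almost never hit."""
    if rng() < 1e-9:
        raise RuntimeError("rare path reached")
    return reading * 2


def run_with_fault(reading):
    """Fault injection: force the random source to return the rare value."""
    try:
        process(reading, rng=lambda: 0.0)
    except RuntimeError as exc:
        return f"found: {exc}"
    return "no fault"
```

`process(21, rng=lambda: 0.5)` takes the normal path and returns 42; `run_with_fault(21)` drives the code down the branch that three million operational hours might never have exercised.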

But then, this is /., so I wouldn't expect very many of the posters to actually have a clue...

Mutexes and Locks by Detritus · 2004-04-10 08:30 · Score: 1

It isn't that difficult for most common cases. You just put mutex semaphores or locks on shared data structures.
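
For the common case that looks roughly like this in Python (the `TripCounter` class is a hypothetical example, not anything from the system under discussion): the lock lives inside the data structure, so callers cannot forget to take it.

```python
import threading


class TripCounter:
    """Shared tally guarded by a mutex (hypothetical example class)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._counts = {}

    def record(self, line_id):
        # Read-modify-write: without the lock, two threads could both read
        # the old count and one of the increments would be lost.
        with self._lock:
            self._counts[line_id] = self._counts.get(line_id, 0) + 1

    def snapshot(self):
        # Return a copy so callers never see the dict mid-update.
        with self._lock:
            return dict(self._counts)


def demo(n_threads=4, per_thread=1000):
    counter = TripCounter()

    def worker():
        for _ in range(per_thread):
            counter.record("line-A")

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.snapshot()
```

The hard cases start when an operation needs two such structures at once, which is where lock ordering and deadlock come in.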

You need programmers with a good background in real-time and concurrent programming, who understand the hazards and how to avoid them.

--
Mea navis aericumbens anguillis abundat

Re:Mutexes and Locks by Tony-A · 2004-04-10 12:47 · Score: 1

You need programmers with a good background in real-time and concurrent programming, who understand the hazards and how to avoid them.

Agreed. Including all the places that look innocent but are capable of encountering such hazards. Including the pathological cases where innocent-looking code can have extremely evil consequences. Including code that looks dangerous but is in fact safe. Including code that looks safe but is in fact dangerous.
Re:Mutexes and Locks by Ben+Hutchings · 2004-04-11 06:03 · Score: 1

It isn't that difficult for most common cases. You just put mutex semaphores or locks on shared data structures.

Semaphores are difficult to use correctly in the general case, but mutexes are fine. However, Java, Win32, and I suspect .NET (but I really don't know about that) provide recursive mutexes which provide easily enough rope to hang yourself - particularly since waiting undoes all the locks in Java.

You need programmers with a good background in real-time and concurrent programming, who understand the hazards and how to avoid them.

Yup. Unfortunately, while modern programming environments like Java encourage or require the use of multi-threading and many server applications require it for scalability, most programmers really don't have a good grasp of these things. They write lock-free code because they either (a) don't consider competing threads at all or (b) don't understand that memory access can be reordered in unexpected ways in the absence of explicit synchronisation.

Statistics by Detritus · 2004-04-10 08:43 · Score: 1

Exhaustive testing, however you wish to define that, can reduce the number of defects in the code, but it isn't going to eliminate them in a complex system. The number of defects found per unit of test time follows a predictable curve, where each new defect found requires more test time. It's like accelerating to the speed of light, the closer you get to 0 defects, the more test time is needed.
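
One way to make that curve concrete, purely as an assumption on my part since no particular model is named here: suppose the expected number of defects found by test time t is m(t) = a(1 - e^(-bt)), where a is the number of latent defects and b a detection rate. Then the time needed to find a fraction f of them is t = -ln(1 - f) / b, which grows without bound as f approaches 1.

```python
import math


def hours_to_find(fraction, b):
    """Test time needed to find `fraction` of the latent defects.

    Assumes the exponential model m(t) = a * (1 - exp(-b * t)); `b` is a
    made-up detection-rate parameter, in defects-fraction per hour.
    """
    if not 0 <= fraction < 1:
        raise ValueError("fraction must be in [0, 1); 100% takes forever")
    return -math.log(1.0 - fraction) / b
```

Under this model each extra "nine" costs the same additional test time: going from 90% to 99% of defects found takes as long as getting to 90% did in the first place, and 100% is never reached.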

--
Mea navis aericumbens anguillis abundat

Re:Statistics by chgros · 2004-04-10 10:25 · Score: 2, Informative

Exhaustive testing, however you wish to define that
Exhaustive \Ex*haust"ive\, a.
Serving or tending to exhaust; exhibiting all the facts or arguments; as, an exhaustive method. Ex*haust"ive*ly, adv.

Basically, it should mean you've tested everything (which is of course impossible in most cases).
The term usually used (and rightfully so) is extensive testing.
Re:Statistics by aardvarkjoe · 2004-04-10 12:35 · Score: 2, Funny

Maybe it just means that they got very tired while testing the software?

--

How can we continue to believe in a just universe and freedom to eat crackers if we have no ale?

Re:Software bug was just one part of bigger proble by RobinH · 2004-04-10 08:43 · Score: 1

There were many other factors; here are just a few:

Yeah, and don't forget the biggest cause: Canada! We all knew immediately that it was their fault. They probably wrote this software too.

--
"I have never let my schooling interfere with my education." - Mark Twain

The solution: PoIP (Power over IP) by uberTr011 · 2004-04-10 09:26 · Score: 0

Power over IP would prevent blackouts like this from happening in the future. The internet is the solution to everything... even power.

Re:The solution: PoIP (Power over IP) by pedrop357 · 2004-04-10 15:27 · Score: 1

But can PoIP be carried by pigeons?
Re:The solution: PoIP (Power over IP) by uberTr011 · 2004-04-10 16:08 · Score: 0

No, but pigeons could be carried over IP! Haha! The internet wins again :-P
Re:The solution: PoIP (Power over IP) by pedrop357 · 2004-04-10 18:34 · Score: 1

You're talking about the new V.Pigeon compression standard.

Testing and robustness in critical realtime system by 0x0000 · 2004-04-10 09:35 · Score: 1

We test exhaustively, we test with third parties, and we had in excess of three million online operational hours in which nothing had ever exercised that bug,' says Unum.

I feel obliged to point out 2 things about this statement:

3,000,000 is not a very big number when you're talking about computerized systems.
If we're talking about 3 million hours of operation of this system, then it was a very old system by digital computing system standards. Have the system maintenance records been examined?
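
To put the figure in perspective, a quick calculation in Python, assuming round-the-clock operation and an average year of 365.25 days. The `installations` parameter is hypothetical; the quote does not say how many sites the hours were summed over.

```python
HOURS_PER_YEAR = 24 * 365.25  # 8766 hours in an average year


def years_of_operation(total_hours, installations=1):
    """Years each installation would have run to accumulate total_hours."""
    return total_hours / installations / HOURS_PER_YEAR
```

On a single installation, 3,000,000 hours would be about 342 years, which suggests the figure is a total across many sites. Spread over, say, 100 installations it comes to roughly 3.4 years each.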

Systems designed to operate for 3 million unbroken hours without failure should have been tested both before release, and after release using information gathered during operation.

I would expect (were this my system, for instance) to have to periodically redesign software and upgrade hardware. Accountants might hate it, but it has to be done if you want to guarantee your uptime.

And finally (and all you EE's out there please consider carefully before you decide I need my clock cleaned for this remark): this is the kind of thing you can expect to happen when you send electrical engineers to a software engineering job. (And don't all you kiddies who consider yourselves Software Engineers get too excited patting yourselves on the back, either; you haven't heard my thoughts on the current state of software engineering programs, yet; there are good reasons the EE's still dominate the embedded and real-time fields)

The testing required and at least some part of the requirements analysis should have involved software engineers. The fact that there probably were no software engineers when this system was designed and implemented just highlights the fact that the corps involved were too busy trying to prevent distributed power technology from catching hold to maintain and upgrade their existing systems as new technology became available -- esp. on the software side. Imo.

--
"The Internet is made of cats."

Blame Game by The+Monster · 2004-04-10 09:39 · Score: 1

We all know it was Microsoft's fault...Blaster Worm?

If you want to know the truth, ask former White House Cyberterrorism expert Richard Clarke. He'll tell you that he had been warning both the Clinton and Bush administrations about this, and although Clinton's team had approved a plan to deal with the menace (but never actually got around to implementing it), none of Bush's senior aides listened to him, and instead wanted to do a pre-emptive strike on Kazaa to eliminate Weapons of Mass Distribution. It's all in his new book Against all Crackers.

--

[100% ISO 646 Compliant]
SVM, ERGO MONSTRO.

Re:Software bug was just one part of bigger proble by Grayswan · 2004-04-10 10:11 · Score: 2, Interesting

Why don't we point out the real problem that likely caused this to happen. Energy deregulation in the first place.

I think it is more accurate to say that deregulation enabled, not caused, the problem. Certainly First Energy used deregulation to put in place much of the pieces of the problem. You just don't hear about all the well run deregulated power systems.

--
If you open your mind too wide, people will throw trash in it.

You can't simulate the real world by A+nonymous+Coward · 2004-04-10 10:12 · Score: 1

No matter how fancy your testing system, the real world has more connections, more idiots with fingers on keyboards, more feet tripping over cables, more weather knocking out transformers and lines, more everything.

I'm not sure why that is even remotely hard to understand.

--
Infuriate left and right

Re:You can't simulate the real world by Vancorps · 2004-04-10 16:30 · Score: 2, Insightful

Forgot rats, for some reason they like to chew cables.
Now, for an example. I stress tested a database I am in the process of building for Mercedes, I made the machine come to a crawl. I did it to a dual cpu server, a quad cpu server, and a 16 cpu server. Guess what? They all behaved exactly the same as the system grew. Now scale it up to the DB/2 cluster that it will actually be working on. I do the same thing and guess what? Yep, the exact same result.
If testing fails to produce an outcome that brings a fault then there is a flaw in the testing procedure. The real world can have more connections, but I don't care, software can be 100% bug free. I constantly hear programmers saying that every complex piece of software will have bugs. It's poor planning, and management; even if the programmer is incredibly gifted they will make mistakes without proper planning.
Yes, in the real world there are unforeseen variables but as systems become critical they should be undergoing more testing to ensure such things don't occur. But as I said in my original post, one outage in recent years is hardly a trend and is more likely just blown out of proportion.

bugs are not inevitable by ummit · 2004-04-10 10:21 · Score: 4, Insightful

We test exhaustively... I'm not sure that more testing would have revealed that.

For an obscure race condition, this is undoubtedly true.

Unfortunately, that's kind of the nature of software... you may never find the problem.

This is sorta true, sorta false, and definitely misleading.

I don't think that's unique to control systems or any particular vendor software.

No, it's not unique; bugs that may never be found are rampant in most varieties of software. What's false -- tragically, crushingly false -- is the presumption that these unfindable bugs are therefore inevitable. They are not.

If there's a class of bugs that's hard to test for -- and of course there are many such classes -- the prudent thing to do is to find development methodologies that skirt those bugs entirely. If you don't put in so many bugs in the first place, you obviously don't have to work so hard trying to find and fix them.
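
One example of such a methodology, offered as my own illustration since none is named here: instead of sharing a data structure between threads and hoping every access is locked correctly, give the structure a single owner thread and have everyone else send it messages. A rough Python sketch, with made-up names throughout:

```python
import queue
import threading


def owner_loop(inbox, outbox):
    """Only this thread ever touches `counts`, so it needs no lock at all."""
    counts = {}
    while True:
        msg = inbox.get()
        if msg is None:  # sentinel: no more messages are coming
            break
        counts[msg] = counts.get(msg, 0) + 1
    outbox.put(counts)


def producer(inbox, name, n):
    """Report an event n times by message instead of touching shared state."""
    for _ in range(n):
        inbox.put(name)


def demo():
    inbox, outbox = queue.Queue(), queue.Queue()
    owner = threading.Thread(target=owner_loop, args=(inbox, outbox))
    owner.start()
    producers = [
        threading.Thread(target=producer, args=(inbox, name, 500))
        for name in ("line_trip", "overload")
    ]
    for p in producers:
        p.start()
    for p in producers:
        p.join()
    inbox.put(None)  # all producers are done, tell the owner to finish
    owner.join()
    return outbox.get()
```

There is no race on `counts` to test for, because no second thread can reach it. That whole class of bug was designed out rather than hunted down afterwards.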

A loose end? by Anonymous Coward · 2004-04-10 11:45 · Score: 0

It's been said there's a monitoring business just North of here (Indianapolis) which is responsible for tracking power issues and taking care of these types of situations and helping to balance the power grid when it happens.

The local media has investigated it pretty thoroughly and determined they were a major cause in the blackout becoming as widespread as it was.

Perfect, Bug free software. by Electrawn · 2004-04-10 11:53 · Score: 1

Apollo landed on 40,000 lines of beautiful, bug free code. Yes, Mission critical can be done perfectly, it just takes half the GDP of the USA to do.

Also brings up the joke if they can land a guy on the moon in 40,000 lines...What the heck is going on with Windows 95 that it needs 16 million?

Couldn't find a HTML link fast, Word doc:
Word reference

Re:Software bug was just one part of bigger proble by wintermute1974 · 2004-04-10 12:00 · Score: 2, Insightful

You just don't hear about all the well run deregulated power systems.

Yes, we do not hear about them, because they do not exist.

Sure, it was First Energy's lines that failed initially, but if it wasn't First Energy, some other utility would have failed eventually. The engineering and the legal descriptions of the current electrical generation and distribution system in North America are at odds with one another.

There's a good technical discussion on the failings of the power grid that may interest you.

Re:Software bug was just one part of bigger proble by arodland · 2004-04-10 12:09 · Score: 1

Naturally, deregulation, the method by which government supports monopolies by restricting competition in bizarre ways (as opposed to the less-fashionable old tactic, regulation, by which government supports monopolies directly), is a major contributor to the problem. But it is critical that one distinguish between the shortcomings of deregulation and the shortcomings of unregulated utilities. They are not at all the same thing.

Rolling vs 1 by Anonymous Coward · 2004-04-10 12:17 · Score: 0

Not to flog a dead issue: but why is it that it's ok for California to have Months of Rolling blackouts, affecting more people (total) over a longer period of time than the entire single blackout on the east coast?

Re:Software bug was just one part of bigger proble by Shakrai · 2004-04-10 12:28 · Score: 1

I think it is more accurate to say that deregulation enabled, not caused, the problem.

So if I enable a problem that wasn't enabled before that means I didn't cause it? Explain that one to me.

You just don't hear about all the well run deregulated power systems.

Generally speaking if you don't hear about something then it probably doesn't exist.

--
I want peace on earth and goodwill toward man.
We are the United States Government! We don't do that sort of thing.

Re:Software bug was just one part of bigger proble by bluGill · 2004-04-10 13:35 · Score: 1

Perhaps because the municipal power authorities don't pay any attention to the future, take new lines that the non-municipals paid to install without paying for them, have many more customers per mile, and do minimal maintenance.

At least in my area it is like that. I'm a member of an electric co-op. We have 16 customers per mile of line on average, the nearest investor owned utility has ~45, and the municipal ~115. The municipal takes the high profit lines, and leaves the rest to someone else. Both the company, and the co-op are paying attention to future needs, making sure generators are getting upgraded before there is a need. The Municipals know nothing about running a utility, so they do only what is required to get by.

I find it hard to feel sorry for that one township you cited, since there are many townships without that high taxed power plant around. I'll admit a bias, the nearest city to me is facing a budget crunch because they counted on a power plant to pay for everything, and those taxes are going away, now they want to annex me to pay their debt on a beautiful (but too large) town hall, and other boondoggles.

Wholesale rates have gone up everywhere. I live in an area where there never was regulation, and we face exactly the same higher rates. Coal prices are higher. Haven't you noticed that gas is nearly twice what it was 5 years ago? (Was $1.15/gallon, $1.78 now) It all connects.

Re:Software bug was just one part of bigger proble by Anonymous Coward · 2004-04-10 13:44 · Score: 0

I work for a large electrical utility in Texas. I've heard that vegetation management
(as you put it) is the first thing to go when budgets get tight; they'd rather pay out
bonuses than proactively trim trees to prevent undue outages.

Why isn't this Open Source? by chris_sawtell · 2004-04-10 14:09 · Score: 1

While I can understand that one does not necessarily want every Tom, Dick, and Henrietta checking changes into the current CVS branch, software which is created to reliably serve the General Public's need on a 24/7 basis, should be available for the said General Public to at least examine and critique. This would create not only the much needed conduit between Industry and Academia, but also the background 'body of literature' which is so essential to all learning. It would also vastly improve the code quality as the coders would know that they were doing their job in the public gaze.

Comment removed by account_deleted · 2004-04-10 14:18 · Score: 1

Comment removed based on user account deletion

Re:Software bug was just one part of bigger proble by Shakrai · 2004-04-10 14:52 · Score: 2, Insightful

The Municipals know nothing about running a utility, so they do only what is required to get by

I'm going to call bullshit on that. My Village has been running municipal electric since the 1910s. It is self-sustaining (i.e: takes in enough money to operate without using tax dollars) and geared towards the future. They didn't annex any lines or equipment from private companies -- it was built from the ground up. They don't own their own generator plants anymore (last one went offline in the 50s) -- they buy it from the wholesale grid just like everybody else. And yet somehow they are able to provide it at $0.04 kWh without screwing over their employees or customers. This municipal grid feeds everybody in town from houses to streetlights to factories. I'll grant you they don't have to serve a rural area but rural areas aren't automatically four to five times as expensive -- if that was the case then why do my parents, girlfriend, grandparents and friends all pay the same high rate even though they all live in suburban or urban areas with the exception of Grandma?

I find it hard to feel sorry for that one township you cited, since there are many townships without that high taxed power plant around.

Perhaps you'd feel for them more if it was your friends and family that lived there. Perhaps you'd feel for them if half of them had previously worked there before being laid off by the company that bought the plant and screwed the town over. Then the company fired the plant back up and brought in its own people from out of state to run things. But that's ok -- last I heard New York state was going to go after them. Care to place bets on who will run out of money first in that legal battle?

Wholesale rates have gone up everywhere. I live in an area where there never was regulation, and we face exactly the same higher rates. Coal prices are higher.

So coal prices are the reason why Enron and its buddies were slashing power production at their plants in California to drive up the wholesale prices? I wish my state would just tell the Feds to fuck off and regulate our own power industry. Everybody was better served when it was regulated -- from the power company itself (I never heard them complaining when they were posting a 15-20% profit margin) to the consumers. The product was more reliable. Those are facts. I just don't think it's a good idea to allow essential services that you really can't get elsewhere to be run by unregulated industries. What's your option if your power company is screwing you (and they are screwing you in all likelihood because if they don't they won't survive because they are being screwed by the wholesalers)? Not using electric? Try that one in a New England winter.

While I'm on this rant I might also point out the phone and cable companies. Before the deregulation of the phone companies my standard phone service was $15-$20 (before long distance charges). Now it's $35 -- or would be if I still had a landline phone. Before the deregulation of the cable industry (forced on my state by the Feds) we had tons of local cable companies and basic cable (50-70 channels depending on where you lived) was $15 a month. Now with Time Warner it's $45 a month. The only thing you can count on Time Warner for is to raise their rates once a year. You can set your watch by it.

--
I want peace on earth and goodwill toward man.
We are the United States Government! We don't do that sort of thing.

Re:Software bug was just one part of bigger proble by Anonymous Coward · 2004-04-10 15:17 · Score: 0

Before the deregulation of the phone companies my standard phone service was $15-$20 (before long distance charges). Now it's $35

Amen, brother.

Not only that, but 20 years ago, our phone bill came on a postcard.

Oh, absolutely by A+nonymous+Coward · 2004-04-10 17:01 · Score: 1

Once you figure out how to simulate the electric power grid for a good section of the country, you will be set. You seem to have a good handle on the approach, just scale up and scale up.

One outage in recent years is hardly a trend and is more likely just blown out of proportion.

Quite. A quarter of the country losing power is surely blowing things out of proportion, after all, that's how you scale things up, eh?

--
Infuriate left and right

Re:Oh, absolutely by Vancorps · 2004-04-10 17:08 · Score: 1

It's a single outage; if it happened again inside of ten years then I'd say there was a problem. That doesn't mean we can't look at the causes and try to prevent them from happening again. Seems more like it happened as a result of cascading circumstances. There was a lot of harm done from the outage, that's for sure, but the problem isn't necessarily as large as a lot of people seem to think it is. Perhaps it is, I don't have all the facts on the matter. At any rate, I feel confident that the power grid will function and can still be relied upon.

Re:Software bug was just one part of bigger proble by swillden · 2004-04-10 18:03 · Score: 1

Generally speaking if you don't hear about something then it probably doesn't exist.

Yeah, like those fictional women who aren't mugged and gang-raped, and those non-existent cars that manage never to crash into one another.

There are many, many, many things that do exist, but you don't hear about them, because they work just fine and therefore remain unnoticed.

Ask a competent network administrator how many people know his name...

--
Note to ACs: I usually delete AC replies without reading them. If you want to talk to me, log in.

Re:Software bug was just one part of bigger proble by rickshaf · 2004-04-10 19:35 · Score: 1

The writer makes some excellent points, and I certainly agree with the statement that the power grids are literally the World's biggest machines. And being such, they are among the World's most complex systems. Therefore, they are subject to the laws of Chaos Mathematics. In other words, no matter how well we test such a system, and no matter how many safeguards we build into it, it will occasionally behave chaotically. This is true for Shuttle crashes, airplane crashes, train crashes, any really large system (crashing). Anytime something of this sort happens, and you read or hear that "a string of very improbable events all happened at once", and the investigator goes on to say that, "if one of these events hadn't happened, the plane wouldn't have crashed.", you just heard about "Chaos in Action". The long list of improbable events the writer mentioned just illustrates my point.

Testing cannot guarantee systems by Goonie · 2004-04-10 21:30 · Score: 2, Insightful

If testing fails to produce an outcome that brings a fault then there is a flaw in the testing procedure. The real world can have more connections, but I don't care, software can be 100% bug free.

The first thing they teach you in a software testing course is that testing cannot guarantee the absence of bugs. The only way you can guarantee, through testing alone, that your program is error-free is to exhaustively test every possible "input" (combination of external inputs and internal state) and check them. When was the last time you wrote a program with a finite (and tractable) input space?
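To put a rough number on that, here is a back-of-the-envelope sketch. The function name, the two 32-bit inputs, and the billion-tests-per-second rate are all assumptions picked for illustration, not figures from any real test rig.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def years_to_test_exhaustively(input_bits, tests_per_second=1e9):
    """Years needed to run every possible input of the given bit width once.

    Both parameters are illustrative assumptions, not measurements.
    """
    total_inputs = 2 ** input_bits
    return total_inputs / tests_per_second / SECONDS_PER_YEAR


# A function taking two 32-bit integers has 64 bits of external input,
# and that is before counting any internal state.
print(f"{years_to_test_exhaustively(64):.0f} years")
```

With these made-up numbers the answer comes out to roughly 585 years, and that is for one small function with no internal state at all.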

If you need 100% reliable programs, you'll need to prove them correct, and that's enormously difficult to do, and doesn't help if the bug is the result of a flaw in the program's requirement specification rather than an incorrect implementation of that specification.

What testing *can* do is provide estimates of a system's reliability, and in the real world that's all you're going to get.
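As a minimal sketch of what such an estimate looks like: if a system passes N tests with zero failures, and you assume the tests are independent and drawn from the same mix of inputs the system sees in service (a big assumption), you can bound the per-run failure probability. The function name and the 95% default are mine, chosen for illustration.

```python
def failure_rate_upper_bound(passed_tests, confidence=0.95):
    """Upper bound on per-run failure probability after failure-free testing.

    Assumes independent tests sampled from the real operational profile.
    """
    if passed_tests <= 0:
        raise ValueError("need at least one passing test")
    # Solve (1 - p) ** passed_tests = 1 - confidence for p.
    return 1.0 - (1.0 - confidence) ** (1.0 / passed_tests)


for n in (300, 3_000, 30_000):
    print(n, f"{failure_rate_upper_bound(n):.5f}")
```

At 95% confidence this works out to about 3/N (the so-called rule of three), so even 30,000 clean runs only tell you the failure rate is probably below about one in ten thousand. It never gets you to zero.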

--

Any sufficiently advanced technology is indistinguishable from a rigged demo
--Andy Finkel (J. Klass?)

A "slash story" about the blackout? by Wakko+Warner · 2004-04-10 21:32 · Score: 1

Blackout slashfic? Is the internet broken, can I get my quarter back?

- A.P.

--
"Remember when the U.S. had a drug problem, and then we declared a War On Drugs, and now you can't buy drugs anymore?"

for all intensive purposes by Anonymous Coward · 2004-04-11 00:06 · Score: 0

He means for all intents and purposes.

Another victim of whole language reading. He did as he was taught, he guessed at what would convey his thoughts.

Phonics rules.

Re:Software bug was just one part of bigger proble by buzzcutbuddha · 2004-04-11 01:46 · Score: 1

" If deregulation is so friggen great then where is the cheap electric? Why can my Village sell me electric for $0.04/kWh with their regulated municipal power authority (while paying their workers Government rates and with Government benefits) when my girlfriend (who lives a whole two miles away) pays $0.14/kWh for electric supplied by a company that is supposedly part of the free market (a company that pays their employees crap and outsources their call center/billing functions to India)."

Well, let's handle this first. Your municipal can offer lower rates because you're paying more in taxes to subsidize them. You pay local, state, and federal taxes which then go to artificially lower the up-front costs you pay for electricity. But it is not necessarily cheaper.

An analogy would be Canadian drug prices. It's easy for those of us in the US to marvel at the lower prices of drugs in Canada without first considering the fact that it's only cheaper because Canadians pay a bulk of their taxes towards their health care expenses (29% of total tax revenue in 2002).

If it's not taxes, then the municipal funds itself by offering bonds, which then pushes the higher costs onto future subscribers. This isn't an effective solution, as it depends on future growth to give current subscribers a lower rate. You're effectively mortgaging your children's future so you can leave the lights on now.

Furthermore, if municipals were truly better, then it should have been the Canadian Power authorities or ConEdison that recognized the problem and cut Ohio off of the grid, thereby preventing this whole problem. But they didn't. Instead it was a private company outside of Philadelphia that helps maintain the grid that recognized the issue as it was happening and isolated it further. In fact, they said in a news story that they practice the very type of blackout event twice a year. They do this because they have a responsibility to their shareholders, and their customers and know that screwing either of them is not good business.

Yes, a company like Enron can game the system, and screw a lot of people, but I think we can honestly recognize:

Enron is the exception, and not the norm. Not many companies operate like Enron did, or were as unethical as they were.
I think we can all agree that unethical behavior, ignorance, and incompetence are not limited to private corporations, but government agencies, municipal authorities also exhibit those human qualities.

btw, nice strawman, mentioning outsourcing while talking about a deregulated power company. sure to get a raise, but can we keep the logical fallacies to a minimum please? thanks

Re:Software bug was just one part of bigger proble by Melantha_Bacchae · 2004-04-11 01:55 · Score: 1

I know none of the big unregulated power companies are saints - all of them put profit before safety and reliability these days.

But First Energy is spectacularly unsafe. The nation's second and third worst nuclear accidents happened on their watch, at their Davis-Besse plant in Ohio. For six years they just wiped off the leaking coolant from the reactor head. It was inspected several times by the Nuclear Regulatory Commission. It wasn't until there was a hole eaten in the head big enough to stick a gallon milk jug into, with only a thin (1/16 inch) veneer of stainless steel between the US and a Three Mile Island to Chernobyl accident, that it was even discovered and shut down! They are trying, with NRC supervision (like that helps), to restart it with a new head, and last I heard they couldn't go 24 hours without another failure of some sort or another!

This is what I would do:

First, I would re-regulate the industry and get tough. Make it clear that power companies are expected to be safe and reliable as being a power company is a public trust.

Second, regulatory bodies should not be responsible for industry promotion. They should be regulatory bodies only. This would toughen inspections, as there would be no more "nudge, nudge, wink, wink" at their industry "pals".

Third, look at what worked during the blackout. Coal and nuclear, dirty power, went down. Clean energy, such as the Niagara Falls hydroelectric plant, kept right on chugging. That shows us a good direction to go for the future.

Fourth, repair or replace the grid. It is held together badly by a bunch of companies who don't want to spend money on it. Either repair it, or localize the power sources (even per building) so no more huge blackouts can occur.

Shinoda: "Is Godzilla showing his hatred toward man-made energy?"
Godzilla: "Human! Impertinent! I rule the Atom!"
"Godzilla 2000 Millennium" (Japanese version)

and the best part is... by zogger · 2004-04-11 03:26 · Score: 1

... you can DO IT YOURSELF and not wait on the government or the energy monopolies. And it's scalable from $10 on up. At the ten buck level you can get dedicated small solar powered devices, I have a radio I wrote about that has a crank genny on the side and a solar panel on the top. A friend gave it to me, he sold them, and I know his wholesale cost was around 10 clams, retail is around 30. We have a small solar rig here for grid juice backup, 3 panels, charge controller, batteries, small inverter, and we have a small wind genny. The wind genny I keep non mounted as a backup now in case of nasty storm damage, but I got all the stuff needed to quick install it within an afternoon should there be a severe emergency. I believe in backups for backups.. I tell you on the wind genny, there is an industry going begging from potential customers just not realising how well they work and how cheap they are. We could put entire laid off out of work US rust belt guys back to work making them in mass quantities, all they are is a freaking vacuum cleaner motor (more or less, casually speaking now, they are DC not AC mostly) with some propeller doo dads on them. I mean, easy to make, cheap too. They go up in size from there of course. We also have a couple of smallish fuel gennys for backup to that here, and backup to THAT we got firewood and kerosene..

I built a small scale demonstration model methane digester before, man o man there's another major *thing* being under utilized inside the US, you get burnable gas easy. Took me less than 1/2 hour to build a working model out of scrap junk I had kicking around.

We DON'T have an energy crisis, we have a MONOPOLY energy supplier & governmental & media -> to the people education crisis. The fatcats who make trillions off "energy" DON'T want people to find out how easy and affordable it is to be your own micro energy producer. They want you to keep sending them a check, month after month, forever. Produce your own you can pay it off and own that sucker. Grid only is rent your juice from them, zero price guarantees down the road. I issued a challenge several times, I'd like to see ONE example where joe paycheck can go to any local electrico monopoly and get a carved in stone price guarantee good for ten to 20 years down the road. No one has even bothered replying, because it don't exist. So you can't say what it's going to cost you even next year, let alone 20 years down the road. Folks looking at retirement and a more restricted income might want to think on that some. With home produced, you got that guarantee, at least you'll always have "some" power that can't be fugged over by government/industry/politics. Just like with a nice garden you can always produce a lot of your chow. Just makes sense to me.. You know up front what it costs.

I always chime in on any energy related topics here at slasherdott, and on other forums, with anecdotals to help counteract the industry FUD out there. To produce at least some of your power-for any random regular joepaycheck, is QUITE doable and affordable now, especially when there are any number of big lenders out there that will let you tie in your start up costs into your 20 year note. Costs no more for a real decent home primary or backup system than an additional bathroom in your house. And it's doable and scaleable from hardly zip, I started with one small panel and one battery, worked up from there.

I have seen people who will gladly drop more on a big screen TV than it would have cost them for a good starter rig, then complain that "it isn't affordable". Geeks especially, home power production and storage has a GREAT application in the SOHO, you get a killer good UPS system out of it and you get controllable, tweakable clean double emphasis clean day to day power. win/win there, you're gonna have/want a UPS system for your boxes anyway, might as well do the logical next step and make it *nice*.

The real problem with software by TheLink · 2004-04-11 05:57 · Score: 1

I am not a software engineer but I think the real problem with software in practice is this:

With civil engineering stuff, people draw the blueprints etc, make the nice plastic/clay models, and once everything seems fine they build the real thing.

With software the plastic/clay models are actually _fully_functional_, and too many people think that's the real thing, especially since the plastic/clay models are as costly to make as the real thing if not _more_costly_.

But hey, I'm probably not as smart as those "Real Programmers" talking about P, NP and so on...

I just work in the IT security line, so what would I know...

--

Too many replies beneath your current threshold

Re:Software bug was just one part of bigger proble by Shakrai · 2004-04-11 09:05 · Score: 1

Third, look at what worked during the blackout. Coal and nuclear, dirty power, went down. Clean energy, such as the Niagara Falls hydroelectric plant, kept right on chugging. That shows us a good direction to go for the future.

Problem with hydroelectric is there hasn't been a new dam built in this country for a few decades. I'd take a dam (granted it harms the local fish population -- but there are ways to avoid or limit this) over a coal powerplant spewing radioactive dust into the atmosphere any day. I also wouldn't mind seeing more investment in nuclear power -- though that seems to be a taboo subject these days. Gas is also an option -- it pollutes but at least it's not as bad as coal and we (combined with Canada) have large reserves of the stuff.

Fourth, repair or replace the grid. It is held together badly by a bunch of companies who don't want to spend money on it. Either repair it, or localize the power sources (even per building) so no more huge blackouts can occur.

I think the grid is in better shape than people give it credit for. It just wasn't designed with this deregulated system (transporting power over hundreds of miles) in mind. If you re-regulated the power industry the grid would probably be in pretty good shape.

I'd like to see my state implement regulation again. People bitch about the bureaucracy and high taxes of New York but at least when you need it (be it the Public Service Commission, the Insurance Department, legal aid, etc etc) it's going to be there for you. We should buy back all the power plants from the out of state bastards that are trying to hold us hostage (the only reason it isn't working as well as it did in California is we have access to cheap hydroelectric power from Canada -- so they can't blackmail us as effectively as they did California) and let our local utility companies run things again.

Of course it probably won't happen until there is another disaster. That's just the way we seem to work these days (be it with blackouts, 9/11, or what have you).

--
I want peace on earth and goodwill toward man.
We are the United States Government! We don't do that sort of thing.

Re:Software bug was just one part of bigger proble by Shakrai · 2004-04-11 09:20 · Score: 2, Interesting

Well, let's handle this first. Your municipal can offer lower rates because you're paying more in taxes to subsidize them. You pay local, state, and federal taxes which then go to artificially lower the up-front costs you pay for electricity. But it is not necessarily cheaper.

Bzzzt wrong answer. My municipal power agency has been self-sustaining since 1920. They don't take in any tax dollars -- they run it all on the money they take in. Sure it's a Government run Agency so it can't make a profit (though they do take in extra cash for a rainy day fund) -- but for the sake of the argument if they increased prices 50% (to make a profit) they'd still be cheaper than the non-municipal options.

If it's not taxes, then the municipal funds itself by offering bonds, which then pushes the higher costs onto future subscribers.

Wrong again. The last bond they issued was back in the 1950s to build a new substation. The Agency started in the 1900s off tax dollars with a charter to provide street lighting. Over time they hooked up private customers (the infrastructure was already in place) and became self-sustaining. Perhaps that's the exception rather than the rule but you shouldn't go painting all municipal power with a broad brush of "You are just being screwed on your taxes" or what not.

Enron is the exception, and not the norm. Not many companies operate like Enron did, or were as unethical as they were.

Really? Did you bother to read the story about the power plant in a local township near me? After they won their petty tax battle by exhausting the town's financial resources they fired the plant back up with out of state employees that they brought in. Sure we could rehire the local people that used to work there but they actually fought us on our tax levy so fuck em! I hope NYS shoves it up their ass -- they are going after them last I heard and something tells me that NYS won't run out of money like the township did.

I think we can all agree that unethical behavior, ignorance, and incompetence are not limited to private corporations, but government agencies, municipal authorities also exhibit those human qualities.

Your point?

btw, nice strawman, mentioning outsourcing while talking about a deregulated power company. sure to get a raise, but can we keep the logical fallacies to a minimum please? thanks

Why not? It's a valid point. Our power company (which was always a publicly held company) used to make enough profit that they could hire local people and pay them a decent (some would say too high but that's another story) wage. Now that they were forced to sell off their generation capacity they are being raked over the coals by the out of state suppliers and profits are a thing of the past.

So how did they respond? By laying off as many workers as possible and outsourcing whatever they could. And they still aren't back in the black. The PSC isn't going to let them charge the $0.20 kWh it would cost to put them in the black (why should they? All the money would just be leaving NYS) so it's a lose-lose battle for all involved. The customers get screwed, the employees get screwed, the townships get screwed and the shareholders (of the power company) get screwed. The only people who are winning are the shareholders of the out of state energy company that's screwing us over. The only reason it's not as bad as it was in California is because NYS has access to cheap hydroelectric power from Canada. That's the only thing keeping them from screwing us completely -- and it's the only thing keeping our power companies solvent. Thank god the Canadian companies at least have some ethics and responsibility.

So keep advocating your deregulated industry. I'm waiting for individual states to just start regulating it on their own. It wouldn't be the first time.

--
I want peace on earth and goodwill toward man.
We are the United States Government! We don't do that sort of thing.

Re:Software bug was just one part of bigger proble by buzzcutbuddha · 2004-04-11 12:59 · Score: 1

Okay, so you've had a bad experience in your area, with a poorly run company, and you've got a stellar municipality. Kudos to the PU, and it's a shame that the company can't find their ass with both hands. But why judge EVERYONE'S experiences based on what's happening locally? Why return everyone to a single government-run monopoly based on your township? Deregulation doesn't work for you, and you've found a town that agrees with you. Cool. I don't want that and I'm happy where I am. That's what I love about this country: the ability for us to disagree politely and go our different ways. If, in the future, deregulation doesn't work, we'll fix it. If it does work, it will spread to those who want it.

Re:Software bug was just one part of bigger proble by Shakrai · 2004-04-11 13:13 · Score: 1

Okay, so you've had a bad experience in your area, with a poorly run company, and you've got a stellar municipality

It's not my area -- it's my entire state that is having this problem. The power company isn't being run poorly -- they are simply trying to survive while being raked over the coals by their suppliers. They built a functional system from the ground up and were forced to sell parts of it off (the power plants) to out of state suppliers and become nothing more than an energy deliveryman because some dolt in Washington figured it would be a good idea.

What's the solution? Let them pass the charges on to the customers? Hint: Our (already shitty) economy won't survive the electric rates going up by a factor of two or three. We are being raped and nobody seems to give a damn.

Why return everyone to a single government-run monopoly based on your township

I wasn't advocating that. I was advocating a return to a regulated power system -- perhaps a regulated monopoly but not a Government-run monopoly (though that is an idea).

Deregulation doesn't work for you, and you've found a town that agrees with you. Cool. I don't want that and I'm happy where I am.

Well good for you. But deregulation has been a disaster for my entire state. If it works for you then great -- but we will probably regulate our power industry again (with or without the permission of Washington). If that means our Attorney General needs to file suit against the Federal regulations and laws then so be it. We won't be the next California. We watched all the help they got from the Feds when the shit hit the fan -- the cynical part of me wonders if they didn't allow that to happen because California (like New York) is a Democratic bastion.

--
I want peace on earth and goodwill toward man.
We are the United States Government! We don't do that sort of thing.

Re:Software bug was just one part of bigger proble by bpettichord · 2004-04-12 07:24 · Score: 1

Poor vegetation management probably played an even bigger role as overloaded power lines warmed up, expanded and sagged into trees and bushes that were supposed to have been cut back.

This is a red herring. The alarm system's purpose was to alert the system operators if a transmission line went down. These things happen, and that is why they have an alarm system. A failure in an alarm system will never lead to a serious problem if the events it is supposed to detect never happen. This does not absolve the XA/21 developers in any way.
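For what it's worth, the usual guard against a silently dead alarm system is a watchdog on the alarm system itself. Here is a minimal sketch of the idea; the class name, the timeout, and the injectable clock are all hypothetical, and I have no idea how XA/21 is actually built.

```python
import time


class HeartbeatWatchdog:
    """Flags a monitored component as stale if it has been silent too long."""

    def __init__(self, timeout_seconds, clock=time.monotonic):
        self.timeout_seconds = timeout_seconds
        self.clock = clock  # injectable so the logic can be tested without waiting
        self.last_heartbeat = clock()

    def heartbeat(self):
        # Called by the monitored component (e.g. an alarm processor) while healthy.
        self.last_heartbeat = self.clock()

    def is_stale(self):
        # True once the component has been silent for longer than the timeout.
        return self.clock() - self.last_heartbeat > self.timeout_seconds


# Usage with a fake clock: silence past the timeout trips the watchdog.
now = [0.0]
watchdog = HeartbeatWatchdog(timeout_seconds=5.0, clock=lambda: now[0])
now[0] = 6.0
if watchdog.is_stale():
    print("ALARM PROCESSOR SILENT: displayed data may be stale")
```

The point of the design is that the check runs somewhere other than the thing being checked, so a hung alarm process cannot also hang the warning that it has hung.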

207 comments