Domain: nerc.com
Stories and comments across the archive that link to nerc.com.
Comments · 64
-
Re:Plus there was a built-in governor
The August 14, 2003 blackout on the U.S. East coast was due to a heat wave that caused the electrical system to be overloaded by too many air conditioners.
On the contrary, the official explanation (p. 17) is:
The Ohio phase of the August 14, 2003, blackout was caused by deficiencies in specific practices, equipment, and human decisions by various organizations that affected conditions and outcomes that afternoon--for example, insufficient reactive power was an issue in the blackout, but it was not a cause in itself. Rather, deficiencies in corporate policies, lack of adherence to industry policies, and inadequate management of reactive power and voltage caused the blackout, rather than the lack of reactive power. There are four groups of causes for the blackout:
1: FirstEnergy (FE) and ECAR failed to assess and understand the inadequacies of FE's system, particularly with respect to voltage instability and the vulnerability of the Cleveland-Akron area, and FE did not operate its system with appropriate voltage criteria.
2: Inadequate situational awareness at FirstEnergy. FE did not recognize or understand the deteriorating condition of its system.
3: FE failed to manage adequately tree growth in its transmission rights-of-way.
4: Failure of the interconnected grid's reliability organizations to provide effective real-time diagnostic support.
Also, hydrocarbons come more from transportation than electrical generation, these days.
-
Re:Reasons for power blackouts
I disagree, government is never the answer if you want something truly fixed. There are plenty of rules in place on how to maintain a reliable system, rules formed by the industry itself as "best practice" procedures; not to mention that there's already an alliance called NERC for US & Canada that's supposed to be managing it. A similar government commission, FERC, exists for setting USA policy only. Thirdly, there's another coalition called NAESB that sets the common standards for energy markets.
What doesn't exist is legally binding penalties on those who don't follow the "best practices" on how to run a control area. (Why can't we sue our utility company like we could any other private industry? Government.) Most of FirstEnergy's failures documented in the final report happened not because there weren't any rules in place, but because they weren't obeying the procedures already laid out; procedures that would have notified neighbors they were having issues, giving them time to rebalance the energy flows. This is a change that's been in the big "energy bill" for the last 4 years as the Senate sits and refuses to act on it, as the Democrats won't have anything to do with Republican-proposed bills. The politicians have been arguing about Standard Market Design for 3 years, no progress. Private industry realized it needed common market rules for better efficiency and cost savings, so it's been implementing them itself. If you leave things to the government, they argue and argue and nothing gets done.
The recent proposals, including the IEEE paper you link to, want to mandate additional collection equipment on every utility company in North America, so that one (government, of course) agency can collect all the data and have the big picture view. Well, in the next two years thanks to private industry advances brought on by deregulation, we may be down from hundreds to maybe 10 private institutions called Regional Transmission Organizations (RTOs) that will have the same big picture of vast swaths of the USA, with no government involvement whatsoever. That's the path I'd rather see.
-
The alarm bug contributed but was not the cause
After looking at the original report, it looks more like the GE XA21 SCADA network failure was not the primary cause of the cascading failure but more an effect of the failure. The key failure seems to be a software system called the "State Estimator" (SE) that is used by the Midwest Independent System Operator (MISO), a NERC reliability coordinator, to develop optimal solutions for the planned operating level of all of the power generation and transmission equipment in the MISO area covering about 10 midwest states and 1 million square miles. It is not described in much detail but the SE seems to be an optimization tool using a linear programming model that gathers availability data for all of the major system components and load demand every five minutes and then calculates the 'optimal' use of those system components to maintain system reliability at the required level. The 'solution' of the model is then used to plan the operation of the overall system by sending the target operating levels to each facility in the system.
So why did it fail? Two reasons. First, the model depends on having accurate availability information from each major system component. Status information is sent to MISO in Indiana by the "ECAR" data network or by direct links. On the day of the failure, the direct link to a key transmission line was not working and the analyst had turned off the estimator to troubleshoot it. After fixing the problem, he went to lunch and forgot to put the system back in automatic mode where it would develop updated solutions. This situation existed for about two and a half hours, from 12:15 to 14:40. Second, when the estimator was switched back to automatic, it was unable to develop a solution because another key transmission line had overloaded and tripped and *its* new non-operational status was unknown to the model, apparently because the status of that line is assumed to be 'on' until told otherwise. This problem was not corrected until 16:04.
The bottom line is that a critical major planning tool was not available for about 4 hours for a regional generation and distribution system that absolutely required its use to be operated successfully when the system power supply was very close to the demand.
The SCADA system itself did not fail, but its alarm function did, which provides alarms to control room operators about system operational problems. The problem with the alarm function seems to be a case of too many alarms for the system to handle as the problems multiplied. The software bug that they are now reporting was probably related to the unexpectedly large number of alarms that the system was experiencing. The new alarm inputs built up and then overflowed the process input buffers. The alarm system just stalled while processing an alarm event and the alarm function stopped. Then, at 14:41 the primary server hosting the alarm processing application failed due to some combination of the stalling of the alarm application and the queueing to the remote terminals. The hapless backup server then was automatically activated and everything was transferred to it, even the functional non-alarm stuff. The backup server failed after 13 minutes. Basically, the SCADA alarm system seems to have been massively overloaded (which shouldn't ever happen, of course) beyond the capability of the system design to cope with. The bug apparently prevented an indication that the alarm system was failing but it looks like the cascading failure still would have occurred even if the software bug had not been present because the system deterioration had progressed too far to recover by the time that the bug manifested itself.
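For what it's worth, the "buffer overflows, then nobody is told" failure mode described above can be illustrated with a small sketch. This is only an illustration under my own assumptions: a fixed-size queue, with `AlarmBuffer`, `push`, `pop`, and `healthy` as hypothetical names. It says nothing about how the XA21 alarm processor is actually built.

```python
from collections import deque


class AlarmBuffer:
    """Hypothetical bounded alarm queue that counts lost alarms instead of stalling silently."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.pending = deque()
        self.dropped = 0  # alarms rejected because the buffer was full

    def push(self, alarm: str) -> bool:
        """Queue an alarm; return False (and count it) if the buffer is already full."""
        if len(self.pending) >= self.capacity:
            self.dropped += 1
            return False
        self.pending.append(alarm)
        return True

    def pop(self):
        """Return the oldest pending alarm, or None if nothing is queued."""
        return self.pending.popleft() if self.pending else None

    def healthy(self) -> bool:
        """False once any alarm has been lost, so the operators can be told."""
        return self.dropped == 0


# Five alarms arrive while nothing is draining a three-slot buffer.
buf = AlarmBuffer(capacity=3)
for i in range(5):
    buf.push(f"line trip {i}")
print(buf.healthy(), buf.dropped)
```

The point of the sketch is the `healthy()` check: an overloaded alarm path that can at least report "I am losing alarms" gives the control room something to act on, which is the indication the comment says was missing.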
The immediate cause of the failure seems to be the forgetfulness of the analyst who was operating the planning model. The significant underlying contributory cause seems to be a very poor regional operational design in which a critical centralized system planning tool was being used with insufficient backup and oversight. It looks as though both Unix and Windows escape blame. The SCADA system probably was doing far more than its designers intended and probably performed heroically until it died. 'Aye Captain...I canna do no more.'
-
Re:I predict two things:
In energy companies, there are a lot of things that have to happen that create no competitive advantage, such as energy scheduling. It's just something that has to be done. And then there are the regulatory requirements, where the vertical market vendors really make bank. Organizations like NERC (the North American Electric Reliability Council) put out fully documented requirements for software, vendors code to it, wrap it in promises of end-to-end solutions for business-specific needs, and charge through the nose for it. Half of it turns out to be vaporware, and the only parts that work well are the ones that had to adhere to NERC specs.
I work for a small utility company, in the IT department, and I started talking to management recently about the possibility of open sourcing some of our apps. I think they're a little scared of being the first ones in our industry to do it (we haven't heard of any others, and I've done some searching). I tried to get a project going, but it faltered due to lack of resources. I sure would like to see a good project get off the ground for e-Tagging, energy scheduling, OASIS (the subject of my project), outage management, or any of the other non-competitive things we have to do.
-
Re:Scary Concept...
As you can see from the DoE summary, the grid entered a cascade state (failed) on about the 9th redundancy.
Breaker trips in New Jersey, and north of NYC, were examples of "good" operations, where it halted the voltage collapse by isolating load. South and West were spared, but it sucked if you were on the wrong side of the Hudson station.
-
Re:Use open source in government
Every software in government, which is paid for from citizens taxes, should be open source.
How about, "Every software package mandated by government decree (like e-Tagging for electric power) should have a government-sponsored open source alternative," instead? Oftentimes, as in the case linked here, a government agency writes a software specification and requires vendors to adhere to it. That should be the case with electronic voting, too, and there should be open source alternatives to vendor-based packages.
-
The bullshit is yours.
If I had gone and said the north american power grid should be replaced at the wake of the outages [ . . . ], I would have been accused of countless acts of civil disobediance.
My first question is what is wrong with Slashdot? I mean someone saw fit to give the parent coward "Insightful" for what she or he wrote? Someone wind the clock back before 2000 when Slashdot wasn't frequented by Microsoft apologists.
I'm not sure what makes you think your exercising your 1st Amendment right to speak freely (assuming you're a US citizen) would be branded civil disobedience, but in case you're really worried (and not just ranting) know you're in good company: first, the outage of August 2003 has produced a US-Canadian task force to investigate problems with the aging power grid. In fact, the power grid is so important that it is the subject of dozens of assessments conducted by the North American Electric Reliability Council. Let's just say that NERC is not sanguine about the reliability of the North American power grid. The problem is so widespread that even US lawmakers anticipate a massive political dispute.
Regarding your comparison of the power grid to the Internet, network events such as MSBlaster and Sobig.F highlight the fragility of an information network built of insecure nodes. At present, the overwhelming majority of the nodes of the Internet are powered by Microsoft software. For better or for worse, "press releases and open letters right at the wake [sic] of major worms" draw attention to the real effects of maintaining so insecure an information network. MSBlaster and Sobig.F are not theories but facts and so prove the unreliability of an Internet composed mainly of Microsoft-powered nodes. The timely discussion of network events such as MSBlaster, Nimda, Code Red, Sobig.X, etc. in the press should, in my opinion, be an obligation of network administrators.
Given your post, you'd probably have us ignore the problem in the hopes that the next worm/virus/trojan does not damage our shared information network even more spectacularly. Thanks, but I would rather disseminate information and share data about such network events rather than stop my eyes, ears, and mouth with sand.
-
Scarier thought
Did you all know that all power transactions on public power systems travel over the internet? Wanna hear something a little better? The backup plan in case of internet breakage is by E-Mail and then finally defaulting to the old fax machine. With the increasing complexity of transactions, increasing dependence on automation of power delivery, and an upcoming rollout of the ETag 1.7 transaction upgrade in April, who's to say the light switches will work in the future?
In light of this article and the probability that the public phone system is very susceptible to a terrorist or otherwise dangerous attack, shouldn't there be a dedicated messaging medium for the power grid? Say, Satellite or Microwave? I realize how daunting such a project would be, as well as how cost prohibitive, but look at it this way: A foreign or national threat doesn't attack the power generation facilities; instead, they DDoS a server responsible for scheduling the power delivery, thus disrupting or decreasing the reliability of the power grid. Statewide or even interstate power blackouts are just one of a million effects of such an attack.
I'm not proclaiming a doomsday here, but with the current plight of Enron, shouldn't there be a little more scrutiny?
Related links:
FERC - Federal Energy Regulatory Commission
NERC - North American Electric Reliability Council
-
Re:Interesting Network Layout Challenge
This club, near Princeton, NJ, manned the statewide Red Cross radio hq from shortly after the blast until about 8:30pm last night (Wednesday). They coordinate communications statewide among the other chapters, and use HF to coordinate with the national hq in Washington. Excellent group of people.
-
A view from the inside.
A bit of background. I work in the Electric Utility industry, so take my comments knowing that bias and that I might have a clue about this stuff. In any event, my comments are my own. My employer doesn't even know I read Slashdot.
- But the construction of new power-generating plants and transmission lines to meet that demand has virtually ground to a standstill in the same period as companies wait to understand the effects of deregulation of the electric utility industry.
Not surprising, actually. However, the reasons for the delays are different for power plants than transmission lines. Power plants are being built these days, as companies react to the incredibly high prices of two summers ago. The market price for energy spiked at about $7-10k/MWh in summer of 1998, in part due to sellers defaulting on energy sales. Transmission lines are very hard to get built, in large part due to the "Not in my back yard" syndrome mentioned by another poster.
- The imbalance threatens to grow even larger in coming months amid projections that electricity demand will grow 17 percent by 2007 as transmission capacity rises only 4 percent.
This is the real problem. There has been and continues to be very little incentive to build new transmission lines. Remember that in nearly all states (perhaps all, I am only aware of the states where we do business) the Transmission system is not being deregulated. As a result, the owners of transmission systems will only be compensated for their investment via the regulatory process.
Finally, I have to comment on this:
- Byron noted that utilities can promise only 99.9 percent reliability--a figure that translates to about eight hours of blackouts a year--while high-tech firms stand to lose millions of dollars from a blackout lasting just a minute.
"We need 99.9999 percent reliability for e-commerce, and we need more flexibility from regulation to achieve that," he said.
Some (hopefully) useful links:
NERC - NERC was formed following the 1965 Northeast US blackout.
The misc.industry.utilities.electric newsgroup homepage.
The Federal Energy Regulatory Commission.
Milalwi
(First time poster, long time reader.)
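As a footnote to the 99.9 percent versus 99.9999 percent figures quoted above, the arithmetic is easy to check. A minimal sketch, assuming a non-leap year of 8760 hours; the function name `annual_downtime_hours` is just something I made up for the illustration.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def annual_downtime_hours(availability_percent: float) -> float:
    """Expected hours of outage per year at a given availability percentage."""
    return (1 - availability_percent / 100) * HOURS_PER_YEAR


for pct in (99.9, 99.9999):
    hours = annual_downtime_hours(pct)
    print(f"{pct}% available: {hours:.5f} hours/year ({hours * 3600:.1f} seconds/year)")
```

At 99.9 percent this works out to roughly 8.76 hours a year, which matches the "about eight hours" in the quote. At 99.9999 percent it is roughly 31.5 seconds a year, so the gap between what utilities promise and what the quoted executive is asking for is about three orders of magnitude.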
-
Things are getting pretty thin
Read through the reliability assessment reports of the North American Electric Reliability Council (NERC). You will find that power companies are scrambling to build enough generating capacity to have adequate power availability margins. The "safe" level is a 15% margin; most regions are expected to fall below this in the next few years while the construction of new capacity is completed.
Interestingly, the western side of the US is projected to have the best power generating capacity while having the least reliable design. Many areas have power grids that are inherently reliable in design, but have insufficient capacity to meet demand.
-JD
-
Re:So we lose power (1st?)
Since Nuclear plants account for about 20 - 25% of US generating capacity, it is unlikely that having all nuc plants shutdown would cause the lights to go out.
You seem to be suggesting that we have 20-25% excess capacity, which is untrue. Depending on the time of year we are talking about, it is estimated that for 1999 we have approximately 15% excess capacity during the middle of summer, and approx. 25% during the middle of winter. So I believe some of us would be in the dark, if we were to take all the nuke plants out.
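To make the argument above concrete, here is a rough back-of-the-envelope sketch. It rests on assumptions the comment does not spell out: that "excess capacity" means capacity above peak demand expressed as a fraction of that demand, that the 20-25% nuclear figure is a share of total capacity, and that transmission limits can be ignored. The function name `coverage_after_shutdown` is made up for the illustration.

```python
def coverage_after_shutdown(excess_capacity: float, nuclear_share: float) -> float:
    """Fraction of peak demand still covered once nuclear capacity is removed.

    excess_capacity: capacity above peak demand, as a fraction of demand (assumed definition)
    nuclear_share:   nuclear capacity as a fraction of total capacity
    """
    total_capacity = 1 + excess_capacity  # peak demand normalised to 1
    return total_capacity * (1 - nuclear_share)


for season, margin in (("summer", 0.15), ("winter", 0.25)):
    for share in (0.20, 0.25):
        covered = coverage_after_shutdown(margin, share)
        print(f"{season}, nuclear {share:.0%}: {covered:.1%} of peak demand covered")
```

Under these assumptions a 15% summer margin with a 20% nuclear share leaves about 92% of peak demand covered, and only the most favourable case (25% winter margin, 20% nuclear share) breaks even, which is consistent with the claim that some of us would be in the dark.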
Besides, if SimCity is any indication, you don't want to be running your powerplants at 100% capacity for too long! :)
Anyway, as you've said, worst case, we won't have to power ALL of them down since some are already "compliant". I'm just trying to bring some real numbers to the table.
Try here: NERC