Software Update Shuts Down Nuclear Power Plant

Posted by Soulskill on Friday June 6, 2008 @11:58AM from the we-have-safety-systems-because-we-are-very-stupid dept.

Garabito writes "Hatch Nuclear Power Plant near Baxley, Georgia, was forced into a 48-hour emergency shutdown when a computer on the plant's business network was rebooted after an engineer installed a software update. The Washington Post reports, 'The computer in question was used to monitor chemical and diagnostic data from one of the facility's primary control systems, and the software update was designed to synchronize data on both systems. According to a report filed with the Nuclear Regulatory Commission, when the updated computer rebooted, it reset the data on the control system, causing safety systems to errantly interpret the lack of data as a drop in water reservoirs that cool the plant's radioactive nuclear fuel rods. As a result, automated safety systems at the plant triggered a shutdown.' Personally, I don't think letting devices on a critical control system accept data values from the business network is a good idea."

19 of 355 comments

Install Complete... by Anonymous Coward · 2008-06-06 12:02 · Score: 5, Funny

Must restart reactor to complete software installation.

[Yes] [No] [OMFG!]
Hmmm, threw an exception by Anonymous Coward · 2008-06-06 12:03 · Score: 5, Insightful

I'd rather it shut itself down then suffer major failure.
1. Re:Hmmm, threw an exception by xlv · 2008-06-06 12:44 · Score: 5, Funny
  
  "I'd rather it shut itself down then suffer major failure." Personally, I'd rather it didn't suffer a major failure at all, whether after a shutdown or not. Oh, you meant "than" and not "then". Never mind...
Critical Update by Enderandrew · 2008-06-06 12:04 · Score: 5, Funny

Adds a whole new meaning to "Critical Update".

--
http://blindscribblings.com - Tasty pop-culture in conceptual fashion.
Oblig Simpsons reference by J'ai Friedpork · 2008-06-06 12:11 · Score: 5, Funny

"Vent radioactive gas? Venting gas prevents explosion. [Yes / No]"

--
Took this comment seriously, did you?
Misreading of the Article by Anonymous Coward · 2008-06-06 12:12 · Score: 5, Interesting

"Personally, I don't think letting devices on a critical control system accept data values from the business network is a good idea." The article did not say that the data values were being read from the machine that was rebooted. It actually said that the rebooting triggered a problem in which values could not be read.

I wonder if they were using something like EPICS. I worked on a large experiment which used EPICS to control the system. Rebooting a machine would sometimes expose a problem with resources not being freed, eventually leading to a situation where data channels would read the 'INVALID/MISSING' value. The solution, as anyone who has worked on this sort of experiment will know, was to reboot more machines until the thing worked. ;-)

(I don't mean to complain about EPICS. It is very powerful and flexible... it's just that the version we used had these occasional hiccups.)
EULA! by bluephone · 2008-06-06 12:20 · Score: 5, Funny

It says right in the EULA that it's not to be used in a nuclear power plant!

--
jX [ Make everything as simple as possible, but no simpler. - Einstein ]
The problem is the update - not business network by markdj · 2008-06-06 12:21 · Score: 5, Interesting

I write this type of software for a living, so I know that having a computer on the business network connected to the control computers is a risk, but that risk can be managed. The problem here is that the software update wiped out the nuclear control system data. This exposes two serious problems. First, customers are always asking why they can't update their system while it is still running. We liken that to changing a tire while driving down the road. Second, the software update did not respect the data in the nuclear control system and instead synchronized it to the new initial data in the update on the other system! Not a good idea. In critical safety systems, you always practice an update before actually doing one.
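A sketch of the second point, assuming each synchronized value carries a timestamp and a flag marking placeholder "initial data" shipped with an update (both are assumptions; nothing in the story describes the plant's actual data model). The sync first computes a plan without writing anything, which doubles as the rehearsal the comment recommends, and it refuses to overwrite a live value with a placeholder. The names `Value`, `plan_sync`, `apply_sync`, `reservoir_level`, and `pump_temp` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Value:
    value: float
    timestamp: float          # when the value was measured or last written
    is_default: bool = False  # True for placeholder initial data shipped with an update


def plan_sync(source: Dict[str, Value], target: Dict[str, Value]) -> List[Tuple[str, Value]]:
    """Compute the writes a sync would make, without performing them (a dry run)."""
    writes = []
    for tag, new in source.items():
        old = target.get(tag)
        if new.is_default and old is not None:
            # Never let placeholder data from the update overwrite a live value.
            continue
        if old is None or new.timestamp > old.timestamp:
            writes.append((tag, new))
    return writes


def apply_sync(writes: List[Tuple[str, Value]], target: Dict[str, Value]) -> None:
    """Perform a previously reviewed plan."""
    for tag, new in writes:
        target[tag] = new


target = {"reservoir_level": Value(4.2, timestamp=100.0)}
source = {
    "reservoir_level": Value(0.0, timestamp=200.0, is_default=True),  # reset by the update
    "pump_temp": Value(55.0, timestamp=150.0),
}
plan = plan_sync(source, target)
print(plan)  # only pump_temp is scheduled
apply_sync(plan, target)
print(target["reservoir_level"].value)  # 4.2
```

Keeping planning separate from applying means the same plan can be reviewed, or rehearsed on a test system, before `apply_sync` touches anything.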
Re::O by Lurker2288 · 2008-06-06 12:40 · Score: 5, Insightful

What exactly do you find frightening about an automatic safety system doing exactly what it's supposed to in response to unusual input?
Only the biz machine was updated. Why trouble? by Ungrounded Lightning · 2008-06-06 12:46 · Score: 5, Insightful

"Secondly the software update did not respect the data in the nuclear control system and synchronized it to new initial data in the update on the other system! Not a good idea. In critical safety systems, you always practice an update before actually doing one."

I have no problem with a computer on the process control subnet reporting information to a computer on the business subnet.

I have a BIG problem with a computer on the business subnet being able to modify and corrupt data in a computer on the process control subnet.

"I can't dump data to the business side" is a reason to make a log entry and maybe sound a minor alarm. It's not a reason to shut down the reactor (unless the data is needed for regulatory compliance and the process control side isn't able to buffer it until the business side is working correctly.)

But if a business subnet computer can tamper with something as critical as a process control machine's idea of the level of coolant in a reservoir, it rings my "design flaw" alarms.

Is it ONLY able to reset it to "empty" as a poorly designed part of a communication restart sequence? Or could it also make the process control machine think the level was nominal when it WAS empty?

IMHO this should be examined more closely. It may have exposed a dangerous flaw in the software design.

Security flaws don't care whether they're exercised by mischance or malice. If nothing else, this is a way to DoS a nuclear plant through a break-in on the business side of the net.

--
Bantam Dominique roosters crow a four-note song. Once you've heard it as "Happy BIRTHday" you can't NOT hear it that way
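The arrangement the comment above asks for, where process-control data flows out to the business side and an outage there produces a log entry rather than a shutdown, can be sketched as follows. It assumes a hypothetical `send` transport that raises `ConnectionError` when the business side is unreachable, and a bounded buffer so the control side never grows without limit while it waits (the oldest readings are dropped once it is full). `OneWayExporter` and its parameters are hypothetical names; the class deliberately has no method for accepting data back from the business side.

```python
import logging
from collections import deque
from typing import Callable, Deque, Dict

log = logging.getLogger("export")


class OneWayExporter:
    """Pushes process-control readings out to the business side; there is no inbound write path."""

    def __init__(self, send: Callable[[Dict[str, float]], None], max_buffered: int = 1000):
        # Hypothetical transport, assumed to raise ConnectionError when the far side is down.
        self._send = send
        # Bounded buffer: once full, appending discards the oldest reading.
        self._pending: Deque[Dict[str, float]] = deque(maxlen=max_buffered)

    def publish(self, reading: Dict[str, float]) -> None:
        # Buffer a copy, so the receiver never holds a reference to the caller's dict.
        self._pending.append(dict(reading))
        while self._pending:
            try:
                self._send(self._pending[0])
            except ConnectionError:
                # Business side unreachable: log it and keep buffering. Not a reason to trip.
                log.warning("business side unreachable; %d readings buffered", len(self._pending))
                return
            self._pending.popleft()

    @property
    def backlog(self) -> int:
        return len(self._pending)
```

While the business side is down, readings accumulate in the backlog; the next successful `publish` drains them in order.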
This was not a "fail-safe" incident by Drenaran · 2008-06-06 12:48 · Score: 5, Insightful

The problem here is that the system didn't shut down because it detected an error in the data collection system; instead, it incorrectly detected a problem that did not in fact exist and then acted on it. While the engineer in me is fairly certain that the system is designed to always fail to a safe state (as in, any automatic emergency response couldn't accidentally make things worse, at least not without raising all sorts of alarms), it is still concerning that internal control systems can be so affected by external servers.

In the article they mention that the system wasn't designed for security (since it was meant to be internal), but this isn't a security issue at all! Any system that relies on other systems should be designed to assume that failures can and will occur in those systems. That is not to say it needs to verify or evaluate incoming data to make sure it is "good", but rather that it can tell the difference between receiving data (such as current water levels) and receiving no data at all (system failure). Once it can do that, it can ideally switch automatically to a backup system, or do what it did here and enter a fail-safe state, the difference being that it does so while pointing out the actual problem rather than an incorrectly perceived problem in a different part of the system.
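A minimal sketch of telling "no data" apart from "low reading", assuming each reading carries a timestamp and that a reading older than a staleness limit counts as missing. The setpoint, the staleness limit, and all names (`Reading`, `evaluate`, `Action`) are hypothetical. Both failure paths still trip, so the system stays fail-safe, but the reported cause differs, and a fresh backup sensor is tried before giving up.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Action(Enum):
    RUN = "run"
    USE_BACKUP = "run on backup sensor"
    TRIP_LOW_LEVEL = "trip: coolant level low"
    TRIP_DATA_LOSS = "trip: level data unavailable"


@dataclass
class Reading:
    level_m: float
    timestamp: float  # seconds, same clock as `now`


LOW_LEVEL_M = 3.0  # hypothetical low-level setpoint
MAX_AGE_S = 2.0    # hypothetical staleness limit


def fresh(r: Optional[Reading], now: float) -> bool:
    """A reading counts only if it exists and is recent enough."""
    return r is not None and now - r.timestamp <= MAX_AGE_S


def evaluate(primary: Optional[Reading], backup: Optional[Reading], now: float) -> Action:
    if fresh(primary, now):
        source, ok = primary, Action.RUN
    elif fresh(backup, now):
        source, ok = backup, Action.USE_BACKUP
    else:
        # No usable data at all: still fail safe, but report the real cause.
        return Action.TRIP_DATA_LOSS
    if source.level_m < LOW_LEVEL_M:
        return Action.TRIP_LOW_LEVEL
    return ok
```

The key design choice is that "missing" is represented explicitly (`None` or stale) instead of being reset to a number that happens to look like an empty reservoir.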
Re:Wow that is so funny by Anonymous Coward · 2008-06-06 12:50 · Score: 5, Insightful

And a shutdown, while inconvenient, is not a catastrophe. In fact, it speaks well for the plant's safety that it did automatically shut down when faced with bad data.
Re:the slashdot crowd is dying to know... by Anonymous Coward · 2008-06-06 13:05 · Score: 5, Funny

If it was running Windows the OS is at fault.
If it was running something else then the application was at fault.
Re:One begs the question by badboy_tw2002 · 2008-06-06 13:20 · Score: 5, Funny

Good enough evidence for me! Microsoft caused a nuclear meltdown! Quickly, to the Blogo-Sphere!
Re:just to shortcircuit the nuclear hysteria by dbIII · 2008-06-06 13:32 · Score: 5, Insightful

While that may be true, the first full-scale pebble-bed prototypes have yet to go online; however, construction of several in China is at an advanced stage. As Superphénix showed with fast breeders, you really need a full-scale prototype to identify all of the problems (it was economics, not safety issues, that killed fast breeders).

India's accelerated thorium idea is also very promising.

The major problem I see with US nuclear power is the assumption that it is a solved problem, so almost nothing has been spent on R&D for decades. The "new generation" of reactors from Westinghouse and others is little more than 1960s white elephants painted green.
Re:Wow that is so funny by Wo1ke · 2008-06-06 13:50 · Score: 5, Insightful

Yeah, so when a sensor breaks and stops sending in data, it'll keep running like usual, with maybe a small error code in the background. Cause, you know, that's how we want nuclear fucking powerplants to work.
Re:Wow that is so funny by icebike · 2008-06-06 16:10 · Score: 5, Informative

What part of FAIL SAFE don't you understand?

The System FAILED. It is programmed to SAFE the reactor when shit happens.

Without its sensors it had no choice but to assume the worst case and scram the reactor.

It did it the right way. It did it the way it was programmed to do it.

What would you have it do to determine why it is no longer getting critical data? Send out a droid to check the cat5 cables? It's a frikin computer in a rack, not R2D2.

It worked the way it was supposed to.

Take a step back and let the big boys handle the reactor, please.

--
Sig Battery depleted. Reverting to safe mode.
Re:Wow that is so funny by barius · 2008-06-06 17:42 · Score: 5, Insightful

I think you're missing the real point, which is that the central safety systems are being fed data from a 'business network'. What would happen if that computer had an issue that caused it to send the same data continuously even when the coolant level had really dropped? WHY are any safety systems receiving data from an insecure network?

It's bad enough that most reactors use regular PCs to do the data collection and reporting, given the security risks posed by such systems (especially if networked), but I never realized they would be so stupid as to feed data in the other direction like this!
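One partial mitigation for the stuck-feed scenario raised above is a sequence (heartbeat) counter that the sender increments on every frame, so the receiver can notice when it keeps getting the same frame. This sketch assumes such a `seq` field exists in the protocol; `Frame`, `FreezeDetector`, and the three-cycle threshold are hypothetical. It would not catch a sender that keeps incrementing the counter while reporting a stale level; that needs independent cross-checks such as redundant sensors.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Frame:
    seq: int       # sender increments this on every frame (assumed protocol field)
    level_m: float


class FreezeDetector:
    """Flags a feed whose sequence counter stops advancing."""

    def __init__(self, max_missed_cycles: int = 3):
        self.max_missed = max_missed_cycles
        self.last_seq: Optional[int] = None
        self.missed = 0

    def check(self, frame: Frame) -> bool:
        """Call once per scan cycle; returns True while the feed is considered healthy."""
        if self.last_seq is not None and frame.seq == self.last_seq:
            self.missed += 1  # same frame again: the sender may be frozen
        else:
            self.missed = 0
            self.last_seq = frame.seq
        return self.missed < self.max_missed
```

Note that an unchanged level with an advancing counter is treated as healthy, since a steady level is normal; only a counter that stops moving is suspicious.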
Re:Wow that is so funny by Anonymous Coward · 2008-06-06 18:11 · Score: 5, Informative

Obviously you have -zero- experience with power plant networks. Allow me to enlighten you, albeit anonymously.

The reason machines like this receive data from networks that could be considered 'less secure' is because telemetry is required from a multitude of sources to actually ascertain any useful realtime information. Aggregation machines have to speak many different protocols and translate between them while communicating with other machines that belong at other plants, cities, states, and even companies to effectively get an accurate picture of the entire grid's current conditions.

The world of plant control machines themselves is very vendor-driven. Most facilities have turnkey solutions brought in by the few major players in this field: ABB, Hathaway, GE, etc. Those players don't even use the same SCADA protocols. Some use ICCP, some use DNP, and others prefer Etherpoll. I've seen RS232 data encapsulated into everything from fully meshed TCP connections via OSIsoft's PI to barely encoded into Modbus and slapped onto Ethernet with only an understanding of ARP.

The solutions are required because electricity is not just one power plant pumping watts blindly. Instead, you have a multitude of plants all pushing power onto ISO-controlled grids that all have to work in concert with each other. This requires -- yes, you guessed it -- networking! The world of plant networks is pretty complex despite the hype you see in the media. The business of making actual watts appear magically at your house at a nice, consistent 60 Hz is vastly more involved than most people realize.

Telemetry comes from secured networks, business networks, and other companies and controlling agencies. That is how it works. Period.

If you are actually interested in seeing the way these are regulated to be secured, the information is cleverly hidden in plain sight at the NERC website.
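The AC comment mentions Modbus framed onto Ethernet. As a rough illustration of the translation work an aggregation machine does, here is a minimal decoder for a Modbus TCP response to function code 0x03 (Read Holding Registers), following the public Modbus TCP framing: a 7-byte MBAP header (transaction ID, protocol ID 0, length, unit ID) followed by the PDU. The function name is hypothetical, and what the register values mean (scaling, units, tag names) is device-specific and not shown.

```python
import struct
from typing import List


def parse_read_holding_registers(adu: bytes) -> List[int]:
    """Decode a Modbus TCP response to function 0x03 into 16-bit register values."""
    if len(adu) < 9:
        raise ValueError("frame too short")
    # MBAP header (7 bytes) plus the function code, all big-endian.
    _tid, proto, length, _unit, func = struct.unpack(">HHHBB", adu[:8])
    if proto != 0:
        raise ValueError("not a Modbus protocol identifier")
    if length != len(adu) - 6:  # length counts the unit ID plus the PDU
        raise ValueError("MBAP length does not match frame size")
    if func == 0x83:  # 0x03 with the exception bit set
        raise ValueError(f"device returned exception code {adu[8]}")
    if func != 0x03:
        raise ValueError(f"unexpected function code {func:#04x}")
    byte_count = adu[8]
    data = adu[9:9 + byte_count]
    if byte_count % 2 or len(data) != byte_count:
        raise ValueError("bad byte count")
    return list(struct.unpack(f">{byte_count // 2}H", data))
```

Every vendor protocol on such a network needs a translator of this kind, which is part of why aggregation machines end up talking to so many sources.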