Software Update Shuts Down Nuclear Power Plant
Garabito writes "Hatch Nuclear Power Plant near Baxley, Georgia was forced into a 48-hour emergency shutdown when a computer on the plant's business network was rebooted after an engineer installed a software update. The Washington Post reports, 'The computer in question was used to monitor chemical and diagnostic data from one of the facility's primary control systems, and the software update was designed to synchronize data on both systems. According to a report filed with the Nuclear Regulatory Commission, when the updated computer rebooted, it reset the data on the control system, causing safety systems to errantly interpret the lack of data as a drop in water reservoirs that cool the plant's radioactive nuclear fuel rods. As a result, automated safety systems at the plant triggered a shutdown.' Personally, I don't think letting devices on a critical control system accept data values from the business network is a good idea."
Maybe nuclear power isn't a better choice.
Must restart reactor to complete software installation.
[Yes] [No] [OMFG!]
Scary!
1010011010
I'd rather it shut itself down than suffer a major failure.
Adds a whole new meaning to "Critical Update".
http://blindscribblings.com - Tasty pop-culture in conceptual fashion.
When updating the computer that controls the entire facility, HAVE AN UNDO PLAN!
Was it running a Microsoft by-product or not?
Personally, I am reassured that these reactors are designed to shut down at the drop of a hat. This is not a situation where fuck-ups should be masked; any discontinuity, however minor, really needs to be highlighted and dealt with immediately.
"I bless every day that I continue to live, for every day is pure profit."
Surely this computer thingy must be the same as my home computer thingy....it always works when I turn it off and on again.
Sure glad the safety systems kicked in as per normal.
It was probably just a Microsoft Windows Update, I don't see how that could cause any problems....
As a regulatory matter, wouldn't there be some checks and balances to keep critical systems on their own separate ring rather than directly interdependent?
This is beyond incompetence... it is gross negligence.
Critical Updates are ready to be installed on your nuclear reactor. You must restart to complete them.
That's what you get for using Microsoft.
"Vent radioactive gas? Venting gas prevents explosion. [Yes / No]"
Took this comment seriously, did you?
To me it sounds much more like they have a bad system design if it's impossible to reboot one of the machines, or if the system can't run with one of them offline. That's not something to blame on the software update (shouldn't such things be expected anyway?)
I guess "software update" may have been used to bash Microsoft a little, not that it says Windows Update; or maybe the poster hates all kinds of software updates?
I wonder if they were using something like EPICS. I worked on a large experiment which used EPICS to control the system. Rebooting a machine would sometimes expose a problem with resources not being freed, eventually leading to a situation where data channels would read the 'INVALID/MISSING' value. The solution, as anyone who has worked on this sort of experiment will know, was to reboot more machines until the thing worked.
(I don't mean to complain about EPICS. It is very powerful and flexible... it's just that the version we used had these occasional hiccups.)
Reminds me of Terminal Error.
did it run Windows?
The only possible interpretation of any research whatever in the 'social sciences' is: some do, some don't
I for one welcome our new radioactive overlords.
Press hot grits to continue.
In Soviet Russia, reactor reboots you.
Yes, but does the reactor run Linux?
1) Break crucial system on reactor with update
2) Sell real update
3) Profit!
Does this actually mean that every system that affects operations in the plant _doesn't_ have a duplicate system running identical software acting as a shadow/backup? This would seem like a very basic level of system protection to have in a Nuclear Power Plant... If they had maintained such a system they would have loaded the new update onto the backup servers (which would be identical in every possible way to the mains), the system would have "broken" as it did here, and they would have been able to keep operating while they figured out the problem.
Also, before you make the argument "but what if the update is critical?" - it's a Nuclear Power Plant! If any sort of update can be classified as so very urgent they couldn't put it off a couple days then I'd say we have bigger problems.
It says right in the EULA that it's not to be used in a nuclear power plant!
jX [ Make everything as simple as possible, but no simpler. - Einstein ]
While perhaps the system should be designed to behave differently, what happened here was a good thing. When things went wrong, rather than the reactor systems freaking out and doing random crap, they were properly designed to shift to a known safe state (i.e. Shut the hell down).
I write this type of software for a living so I know that having a computer on the business network connected to the control computers is a risk, but that risk can be managed. The problem here is that the software update wiped out the nuclear control system data. This exposes two bad problems. First, customers are always asking why they can't update their system while it is still running. We liken that to changing your tire while driving down the road. Secondly the software update did not respect the data in the nuclear control system and synchronized it to new initial data in the update on the other system! Not a good idea. In critical safety systems, you always practice an update before actually doing one.
Think about the cost associated with having and maintaining a completely hot-pluggable second control system. How much do you want your power bills to go up to pay for that? And what would be the point?
They have a perfectly adequate safety system that did exactly what it's supposed to do. It read confusing data and decided to shut the reactor down until a human came along and explained things satisfactorily. What's wrong with that? Aside from having the reactor offline for 48 hours, there was no other cost.
"... The move to SCADA systems boosts efficiency at utilities because it allows workers to operate equipment remotely."
Another proof that Homer Simpson was truly ahead of his time.
Are you mad, woman? You never know when an old calendar might come in handy. Sure, it's not 1985 now, but who knows what tomorrow will bring? -Homer
"Don't let fools fool you. They are the clever ones."
The chemical diagnostic data is damn important because it may determine things like corrosion rates, the amount of impurities circulating in the water, the potential for clogs, etc. As with all other software, errors occasionally occur, and the appropriate way to respond when they do is to shut down and blow some whistles so as to ensure that the reactor is brought into a safe state before something else goes wrong. This is one of those cases where "Better safe than sorry" is a really rather good motto.
Patch Tuesday?
--
[Insert signature here]
Get ready to enter a new era of failed policies cribbed straight from the failed state of the USSR.
I'm gonna have to agree with that last statement in the summary. Basically under these circumstances, you take out the switch and you take out the plant and I doubt they guard the network closet as well as the reactor core. Plus the whole hacking thing. You really don't need to watch youtube videos and check your e-mail from a control computer and you can bring any actually needed updates and files to it manually via USB drive.
Google's Super Secret Search Algorithm: SELECT @search_results FROM internet WHERE @search_results = 'good'
The summary said: when a computer on the plant's business network was rebooted after an engineer installed a software update
We all know what really happened. Dude rebooted the computer so that Windows automatic update reminder to reboot wouldn't interrupt his Solitaire game every 10 minutes.
You are in a maze of twisty little passages, all alike.
The business computers should not be connected to the control network. What a crap design. It's as bad as me updating my laptop and having to ask Google to reboot their servers.
Engineering is the art of compromise.
"GROSS NEGLIGENCE - Failure to use even the slightest amount of care in a way that shows Recklessness or willful disregard for the safety of others." - 'Lectric Law Library.
Yeah, those bastards, the way they used THE SLIGHTEST AMOUNT OF CARE in designing a system that shuts down in response to unexpected data so as to avoid RECKLESSNESS with the SAFETY OF OTHERS.
Secondly the software update did not respect the data in the nuclear control system and synchronized it to new initial data in the update on the other system! Not a good idea. In critical safety systems, you always practice an update before actually doing one.
I have no problem with a computer on the process control subnet reporting information to a computer on the business subnet.
I have a BIG problem with a computer on the business subnet being able to modify and corrupt data in a computer on the process control subnet.
"I can't dump data to the business side" is a reason to make a log entry and maybe sound a minor alarm. It's not a reason to shut down the reactor (unless the data is needed for regulatory compliance and the process control side isn't able to buffer it until the business side is working correctly.)
But if a business subnet computer can tamper with something as critical as a process control machine's idea of the level of coolant in a reservoir, it rings my "design flaw" alarms.
Is it ONLY able to reset it to "empty" as a poorly designed part of a communication restart sequence? Or could it also make the process control machine think the level was nominal when it WAS empty?
IMHO this should be examined more closely. It may have exposed a dangerous flaw in the software design.
Security flaws don't care if they're exercised by mischance or malice. If nothing else, this is a way to DoS a nuclear plant through a break-in on the business side of the net.
Bantam Dominique roosters crow a four-note song. Once you've heard it as "Happy BIRTHday" you can't NOT hear it that way
This is why you keep the IT nerds away from the process network.
I've had a whole plant lose view of its system because some well-meaning retard in IT decided to push updates onto a SCADA system without qualifying the updates... never had it KILL the control side of things though... well done, whoever you were, you've done well.
Burma?
The problem here is that the system didn't shut down because it detected an error in the data collection system; instead it incorrectly detected a problem that did not in fact exist and then proceeded to take action. While the engineer in me is fairly certain that the system is designed to always fail to a safe state (as in, any automatic emergency response couldn't accidentally make things worse - at least not without raising all sorts of alarms), it is still concerning that internal control systems can be so affected by external servers.
In the article they mention that the system wasn't designed for security (since it was meant to be internal) - but this isn't a security issue at all! Any sort of system that relies upon other systems should be designed to assume failure can and will occur in other systems - that is not to say that it needs to verify/evaluate incoming data to make sure it is "good", but rather that it can tell the difference between receiving data (such as current water levels) and receiving no data at all (system failure). Once it has that it can ideally automatically switch to a backup system, or do what it did here and enter a fail-safe state (the difference being that it does so while pointing out the actual problem and not an incorrectly perceived problem in a different part of the system).
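To make the "data vs. no data" distinction concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration (the names `Reading`, `Status` and `evaluate`, the limits, the units); it is not how the plant's system works. The point is only that a missing or stale reading gets its own trip reason instead of being treated as a level of zero.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    OK = "ok"
    LOW_LEVEL = "trip: water level below limit"
    DATA_LOSS = "trip: no valid level data"


@dataclass
class Reading:
    value: float      # reported water level, arbitrary units (hypothetical)
    timestamp: float  # seconds, on the same clock as `now`


def evaluate(reading: Optional[Reading], now: float,
             low_limit: float = 2.0, max_age: float = 5.0) -> Status:
    """Classify a level reading, keeping 'no data' separate from 'low level'."""
    # No reading at all, or one that is too old, is a data-loss condition.
    if reading is None or now - reading.timestamp > max_age:
        return Status.DATA_LOSS
    # Only a fresh reading is allowed to say anything about the level itself.
    if reading.value < low_limit:
        return Status.LOW_LEVEL
    return Status.OK


print(evaluate(None, now=100.0).value)                # trip: no valid level data
print(evaluate(Reading(7.5, 99.0), now=100.0).value)  # ok
```

Both non-OK outcomes can still trip the plant; the difference is that the operators are told which one actually happened.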
Huh? I've read the NTSB report on that accident - and nowhere in it (IIRC) are computers implicated. The accident occurred due to damage to the pipes from construction equipment.
Rereading the report pretty much confirms my recollection: the SCADA system was not implicated as a primary or contributory cause of the accident. The SCADA system was malfunctioning at the time of the accident, but did not cause the overpressure, and 'may' have allowed the operators to relieve pressure had it been functioning and had they observed the pressure spike. The rupture was caused by construction damage to the pipeline and a faulty relief valve.
We liken that to changing your tire while driving down the road.
;}
Oh sure, NOW you think of a debian slogan
Good thing it wasn't written in Smalltalk. The slogan there is building the rest of the boat while underway.
The thing I'm a bit puzzled about. . . if this system has data which is so important that the whole plant must be SHUT DOWN for two days if it fails, then why aren't there *at least* TWO of them (I'd say there's a good argument for 3 or 4, but. . .)? That way, you can take one out of the loop for updates, verify the update didn't hose your data, sync the data from the 'live' system, then put it online, take the other one offline, and complete the update on it.
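The one-at-a-time procedure described above can be sketched roughly like this. It is a toy model under stated assumptions: `Node`, `rolling_update` and the `apply_update` callback are hypothetical names, and "verify" here just means "the update did not change the node's data".

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for one of the redundant monitoring machines."""
    name: str
    version: int
    online: bool = True
    data: dict = field(default_factory=dict)


def rolling_update(nodes, new_version, apply_update):
    """Update nodes one at a time so at least one stays online throughout.

    apply_update(node) is a caller-supplied function that installs the update
    and may (wrongly) alter node.data. Returns True if every node was updated,
    False if the rollout was aborted.
    """
    for node in nodes:
        peers = [n for n in nodes if n is not node and n.online]
        if not peers:
            raise RuntimeError("refusing to take the last online node offline")

        node.online = False              # take it out of the loop
        before = dict(node.data)
        apply_update(node)

        if node.data != before:          # the update hosed the data
            return False                 # abort; node stays offline for repair

        node.version = new_version
        node.data = dict(peers[0].data)  # resync from the live system
        node.online = True               # back in service before the next one
    return True


a = Node("a", 1, data={"level": 7.5})
b = Node("b", 1, data={"level": 7.5})
print(rolling_update([a, b], 2, lambda n: n.data.clear()))  # False
print(a.online, b.online)                                   # False True
```

In the failing case the first machine is left offline for someone to look at, while the second one never stopped serving good data.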
If I were the power co owning this plant, I'd be ticked if the plant was dark for 2 days. With the price of energy these days, and the amount of energy a single Nuclear plant can generate, you're talking some real serious cash when the thing is down for 2 days. Especially if I have to look forward to the same thing happening again, potentially every time our systems need updating (not that it necessarily would happen every time, I would sure hope it wouldn't, but with only one system, every update is a potential for the whole plant to go down for some period of time).
maybe ;-)
Before there are too many retarded "OMG why was it on the business network!!!?LOL!??!" comments, I'll cover that right here:
It says the software is supposed to sync data between the control system and the business network. Obviously it has to be connected to both sides somehow. I'm not a power plant designer, but there's probably a good reason why people might need access to that data from the control system, and thus some kind of system acting as a safe bridge between the two rather than allowing unrestricted access from the business network.
The update f'd up and the control network went "Holy crap where did the cooling water go? Abort!" Everything worked like it was supposed to. The failure was caused by not testing the update in a lab environment before applying it to a live system.
At least it did not turn it into a meltdown, so at least the safety features worked in the software.
That is definitely a glass half full, as opposed to empty.
He's trying to find an opportunity to bash Microsoft!
Every system in a nuclear power plant has to be completely backed up. There should be no single point of failure. In fact, many/most of the backup systems are backed up.
The reactor should be shut down until this design fault is rectified.
Btw, compare the cost of a redundant computer system with that of a spare coolant pump. This is a pretty cheap problem to fix.
I'll admit I don't know the first thing about nuclear power plants, and even less about their control systems. With that in mind I would like to know what great benefit is to be had by connecting these systems to the business network. Are these benefits worth the risk even if it is a managed risk?
most freakouts surrounding nuclear power are based on 1960s technology. modern reactor designs, such as pebble bed reactors, are designed to be passively safe. that is, you can just walk away from them, doing nothing, and they will not release gas, go china syndrome, or anything else unsafe. older nuke tech requires active safety management: someone must always be on the job, making sure nothing f***s up. designing safety into nuclear reactor design from the philosophical ground up is the way of the future
intellectual property law is philosophically incoherent. it is your moral duty to ignore it or sabotage it
Computers are not a good choice because Windows sucks and M$ won't die. Hmmm, looks like that was what was wrong with the plant too.
Friends don't help friends install M$ junk.
Obviously you have never worked around a Nuclear Power Reactor when it did an emergency shutdown.
At the nuc site I worked at, there were two networks: the business network and the ops network. Data flowed from the ops network to the business network for statistics gathering only. The single thing that the business network did that affected operations and safety (regardless of my boss' attempt to justify budget) was the generation of work-orders. A total failure of the business network would - at worst - result in a routine observation job being missed, which would cause the systems on the ops network to detect a 'fault' and bring the reactor away from criticality.
Yes, a simple software fault can 'shut down' a nuclear plant. These things are designed to 'trip' and shut-down automatically at the slightest thing going wrong. The most advanced and safest Nuc plant in the UK (SXB) does - or at least did - trip once a month or more.
Get a volt-meter that is sensitive to a thousandth of a volt, and allow it to shut down your house when its input is not ideal. Give yourself three thousandths of a volt either way off 'normal' and you are maybe experiencing the ridiculous measures a modern nuc plant puts itself under.
As a SCADA MMI guy, I just have to ask, where the hell was the intelligent human oversight? Luckily, it failed gracefully. From what I've seen, this is just typical grossly negligent corporate effort. IMHO, these affluent corporations are way overpaid and far too irresponsible.
From the summary: If it's monitoring the primary control system then it seems to me like the machine would have to be on the control network. The real issue is why did the primary control system accept a reset from a monitoring system. It sounds like there's more than one bug to track down.
When our name is on the back of your car, we're behind you all the way!
^c^c^c
Wait, so they let their computer networked systems overwrite what the hard wired transmitters were telling them? The refuelling water tank, emergency charging tank and condensate storage tank all have bog standard level transmitters on them which report the level of borated water inside.
How can a computer system that takes its values from these systems suddenly overwrite those values? All the plant's control should be from those transmitters, not from the reported data which goes via countless computers...
This sounds a little fishy to me (not that I work on a nuclear plant or anything...)...
Scenario : System comes up. Things don't work quite right. Some configurations are tweaked and system is now working fine.
Reboot. The tweaked configurations happen to go away. No one remembers which ones they were. The system is b0rked for a while.
I would hope that isn't the case for that system, but I have seen it happen before.
....you can just imagine that, like most companies, their business network is all MS Windows boxes that also have internet access, so it is completely vulnerable to outside hacking too.
If this hadn't happened it would probably have been only a matter of time before some hacker chanced upon the fact that they could actually control the nuclear facility from some compromised Windows box.
It's amazing that these days some sys-admins/network admins still don't get it.
Let's just hope that this incident is enough to get them fired and for the company to hire people that know enough to make the system properly secure.
This reminds me of that time a single computer shut down an airport for several hours. It was a Win9x machine that an employee would reboot monthly because of a known bug. After that employee left, nobody rebooted the machine and it crashed.
The known bug was that Win 9x stored the number of sec since the last reboot in an int(?) and after about 47 days it would pass its max value and shut down. I did a quick search and this might be the story: http://software.silicon.com/applications/0,39024653,39124122,00.htm
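For what it's worth, the arithmetic is easy to check. As far as I know, the documented Windows 95/98 uptime bug involved a 32-bit counter of milliseconds rather than seconds, which wraps after roughly 49.7 days; a 32-bit count of seconds would take over a century to overflow. A quick sketch (the function names are made up for illustration):

```python
MS_PER_DAY = 1000 * 60 * 60 * 24
SECONDS_PER_YEAR = 60 * 60 * 24 * 365.25


def days_until_wrap_ms(bits: int = 32) -> float:
    """Days until an unsigned counter of milliseconds overflows."""
    return 2 ** bits / MS_PER_DAY


def years_until_wrap_s(bits: int = 32) -> float:
    """Years until an unsigned counter of seconds overflows."""
    return 2 ** bits / SECONDS_PER_YEAR


print(f"{days_until_wrap_ms():.1f} days")   # 49.7 days
print(f"{years_until_wrap_s():.0f} years")  # 136 years
```

So a machine that gets rebooted monthly never reaches the limit, and one that is left alone falls over some time in its second month.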
Im a gamer, not a grammer major. This post is full of spelling and grammer mistakes.
... enter 4, 8, 15, 16, 23, 42.
Or else all hell breaks loose.
Have gnu, will travel.
... because then the computer would enter an infinite loop of reboots after the update.
Dohh! You beat me to this!! Well, I'll have a donut and console myself.
http://slashdot.org/comments.pl?sid=573665&cid=23655635
This reminds me of a funny image macro I saw a year ago about unauthorized upgrades to Glibc on servers:
http://img338.imageshack.us/img338/6923/1155529424306te2.jpg
If I had a dime for every function that says "There is 0 foo" when it really means "I don't know how much foo there is", I'd be a millionaire...
In Soviet Washington the swamp drains you.
Didn't I hear that it was recently the 25th anniversary of the movie "WarGames" and that when that was released, we had high mucky mucks talking about locking up modems and cutting phone lines?
Now, the clowns running these Nukes are trying to tell us that in the 80's and 90's there was no need to be worried about network security. What freak'n morons. Why fear terrorists when we have our own moronic citizens doing dumb shit like letting a Windows PC send data to a nuke's control system? I'm still under the belief that the N.Y. power surge and blackout was due to a Windows virus flooding the same ethernet network used to send display signaling information to the control room. When the Windows virus flooded the network, the ethernet messages could not get out, the queue filled and crashed the app. Look ma, no status messages telling the control personnel a surge was occurring and BAM, shutdowns start going off like falling dominoes.
We have nothing to fear but ourselves.....
LoB
"Anyone who stands out in the middle of a road looks like roadkill to me." --Linus
Subnet? It should not even have been on the same physical network!
ah, the wonders of try/catch.
but we all know what it is that requires the 'system update' to reboot...
the microsoft PR nerds must've been on that one.
This is what I call real mission critical apps... not something some business analyst dreams up for the accounting or billing system. It's amazing how crappy the commodity software running mission critical apps is (I am looking at MS Windows... but off-the-shelf Linux/Unix distros can't be that much better either).
These should run on multiple systems running in a cluster... you know, lock step etc., where updating or shutting down one machine does not necessitate the reactor going to a safe mode! The system should be run on customised _Real Time OSes_ with RT apps in which every piece of code has been audited and proven not to be harmful, and _NOT_ connected to the internet, or at the very least behind firewalls/gateways/filters that have been assured to work (CC 4+ or better would be a good start).
It's alarming that crappy general purpose OSes are creeping into real mission critical control systems... from reactors to combat systems that have the potential to kill thousands or millions of people. Sheesh.
~AC
Something is not right here...
Yes, the safety system kicking in is "a good thing".
Pulling data from another computer system for a safety related control system is not a bright idea (the weakest link problem).
Historically, in a safety control system in an oil & gas environment, all the inputs to the safety system are either hardwired or pulled from another safety system controller which has the appropriate level of redundancy (CPU boards and communication paths with communication watchdog timers).
Even transmitters in some circumstances can not be trusted hence the 2 out of 3 voting systems (take three transmitters measuring the same value and pick the middle of the three, if one of the transmitters fails high or low your choice will be the safe option).
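The "pick the middle of the three" rule is simple enough to sketch in a few lines of Python. The function name and the example values are made up for illustration:

```python
def mid_value_select(a: float, b: float, c: float) -> float:
    """Return the middle of three transmitter readings.

    A single transmitter failing high or low can never be the middle value,
    so the result always comes from one of the two healthy transmitters.
    """
    return sorted((a, b, c))[1]


print(mid_value_select(7.4, 7.5, 7.6))    # 7.5  (all healthy)
print(mid_value_select(7.4, 0.0, 7.6))    # 7.4  (one failed low)
print(mid_value_select(7.4, 999.0, 7.6))  # 7.6  (one failed high)
```

Note that this only protects against one bad transmitter at a time; two failing in the same direction will drag the selected value with them.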
Someone needs a serious think about where this plant is getting data for its safety shutdown system.
ZombieEngineer
was anyone else hoping this was due to a Windows update of some kind?
If this was a chemical plant I would be asking to see the HAZOP (http://en.wikipedia.org/wiki/Hazop) reports.
HAZOP studies are serious mind numbing exercises of systematically identifying every possible operational hazard. Should a hazard occur a mitigation action needs to be implemented. The resulting mitigation actions then themselves need to be run through the HAZOP process.
It should be fairly obvious that this is a recursive process, and that modern chemical plant designs favor simple, intrinsically safe methods that don't require a complicated control scheme; otherwise the design engineer is condemned till doomsday to reviewing the safety of the plant.
ZombieEngineer
Just 5 years ago, controls engineers wouldn't breathe a word about how vulnerable the world was (and maybe still is). There are special computers called PLCs (Programmable Logic Controllers) that control just about everything in this world from factories to power plants to waste management facilities. They are the brains of all automation. They are also all connected to computers and those computers are all networked on LANs. And in the past, those computers were every bit as likely to get viruses as anyone else's computer. The fortunate thing is that no one who ever wrote a virus ever bothered to write one that would mess with the logic in a PLC. It would have been so easy. In fact, it still is. 99% of all the PLCs in the world are connected to computers in an unsecured fashion. If a virus in a PC were to write randomly into PLC memory, whatever that PLC was controlling would come to an uncontrolled halt. Engineers would never figure out why all the processors were crashing - diagnostics don't exist to monitor this kind of attack. In the days of highly potent viruses like Nimda, it wasn't uncommon for scores of computers that connected to all the PLCs in a given facility to be infected. If Nimda or its kind had carried a PLC-targeting payload we would have seen disaster much greater than the biggest doomsayers ever predicted from the millennium bug.
But how did that ISO standard get approved? Microsoft OOXML as a standard springs to mind.
Undetectable Steganography? Yep, there's an app fo
it's too bloody late! those systems are here to stay... and we'll probably see more and more
Safety control systems in the chemical industry have been used for 20+ years. These systems have:
- redundant CPU modules (which can be hot plugged)
- redundant IO modules (which can be hot plugged)
- redundant communication systems
- self diagnostics (can detect a failed output transistor)
- internal diagnostics (CPU voting to detect failed CPU core)
- standard algorithms for redundant transmitters
Shutting down is the "safer option"; however, there are still risks (such as thermally stressing pipework). It is a lesser-of-two-evils problem. This stuff is bread & butter for the chemical industry; there are a number of control companies that refuse to deal with the nuclear industry due to the requirement for unlimited indemnity.
ZombieEngineer
With the limited info and reporter's interpretation of hearsay....
Likely what happened was someone who was working on statistical process analysis figured they could optimize the process and save a few $$$ if they could have the *ControlSystem update its control loops with data calculated from this data server. The *Controls team dudes said "Sure, we can update the control with that variable, meanwhile if the data exchange goes unhealthy, we'll drop the value to zero (or other bound) which will still keep our fail-safe intact and trip the system."
But they didn't bother to do due diligence to verify whether the reliability & availability of the *ControlSystem was impacted by this new single point of failure.
What might be scarier is this lack of scrutiny to a *Controls change implies they could have missed a failure mode that could lead to a fail-unsafe condition. Then the only saving grace is that there are multiple independent protection systems which are watching the critical process elements that aren't typically tied to the same interchange network or can't be updated via a network data exchange.
The article is not clear if just the main *ControlSystem tripped the unit (bad enough), or if the backup protections also thought they needed to trip the unit (really bad).
Chalk it up to lazy engineering practice and oversight... and ignorance of networking threats to the *Control network. Sharpen your pencils, boys.
-Cheers
I'm really sad you posted AC, because I'm dying to hear some cool shop stories about nuclear reactors scramming.
You can't put too much water into a nuclear reactor
IMHO this should be examined more closely. It may have exposed a dangerous flaw in the software design.
Surely software design is diagrammed, studied and HAZOPed as much as the average P&ID?
Georgia's in Florida, dumbass!
Safety concerns could be reduced dramatically
if the operators and their families were
required to live within walking distance
of the plant they operate.
There are such requirements in the US, be they for SIL ratings, performing haz-op reviews, etc. Particularly in nuclear apps.
In a plant, not all control systems are SIL rated, but the safety backups usually are....though more and more operators are buying or upgrading to SIL qualified systems and extending SIL to other than just the safety and protection backups.
In this case, the engineers were probably asleep at the wheel and didn't realize the changes they made to the control software impacted the trip & protection systems, so didn't bother to even have a haz-op review prior to making the change to get updates to a control parameter (or set of parameters) from a networked device. They probably figured they were just adding a trim or tuning variable of some kind to the control loop and didn't do ANY real failure analysis.
Oops.
Oh well, time for all the governing bodies like the NRC to get out the microscopes and take a peek at the plant's operating procedures and the engineers' adherence to them.
Cheers
Cyber cyber cyber. Just when you thought that was going out of style!
It doesn't really matter in this case if the operation system is looking at plant data from a minor monitoring system. What is troubling here is that it's completely reliant upon this minor monitoring system. If this box someplace is so important as to cause an emergency shutdown in a nuclear power plant then one would think there would be a backup system that comes into place when the primary monitoring system goes down. Did they think this box would never have a hardware failure? That it would last forever as some kind of cosmic perpetual motion machine? I am very worried that operations management systems like this even get implemented in high security and important locations such as a nuclear power plant. Looks like it's time to hire a better and more intelligent Information Systems and Network Manager.
Well, theres ya a step in a "firesale".
It only has one computer checking the water reservoir??!!??!!
Doesn't sound very safe to me.
"Personally, I don't think letting devices on a critical control system accept data values from the business network is a good idea."
Yeah, well that's why you obviously wouldn't succeed in business. You can't seem to grasp that things like time-to-market and pleasing focus groups are far more important than piddly little things like that. Geez.
Oh, you're not stuck, you're just unable to let go of the onion rings.
Yes, the system ultimately made the right choice, shutting down with a perceived loss of critical information.
However, this was a best-choice response to a poorly engineered shutdown system.
A properly designed critical shutdown system would have completely independent sensors, for exactly this reason. By design, no external system (i.e. business network data collection) should be able to compromise the integrity of a safety system in any way. Safety systems are designed to be redundant within themselves on many levels so that even if some link in the chain were to fail, there's another link waiting to take its place until repaired. Business systems, and often standard control systems, do not have that sort of availability/reliability, and so should have no part in the safety system.
Yeah, those bastards, the way they used THE SLIGHTEST AMOUNT OF CARE in designing a system that shuts down in response to unexpected data so as to avoid RECKLESSNESS with the SAFETY OF OTHERS. And to top it off they had the gall to report it instead of covering it up.
One needs an odd number of monitoring systems, so two will not suffice. If there are only two and they report two different things then one still has to shut down the plant until things are sorted out. In fact, now that there are two systems, there is twice the chance of something going wrong. If there are three systems then the majority wins. There is still the problem of what to do with the bad system, as hot swapping a new one in has the potential to bring everything down. It would also be a nightmare to test all the failure conditions. For example, one of the early shuttle launches was scrubbed because when all the redundant computers tried to synch up, the clock signal's edge appeared, and the system had only been designed to deal with the high and low states of the signal; all the testing never encountered this condition. Redundant systems can also give a false sense of security if they are not maintained independently. For instance, if someone makes the same mistake on all of them, or a batch of defective parts is used in all of them, then they can easily all fail for the same reason within a short timespan of each other.
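For what it's worth, the "majority wins" idea can be sketched in a few lines of Python. This is only an illustration under my own assumptions: each channel reports a boolean "level OK" flag, and the names majority_vote and should_trip are made up for the example, not taken from any real plant logic.

```python
from collections import Counter
from typing import Optional, Sequence


def majority_vote(readings: Sequence[bool]) -> Optional[bool]:
    """Return the value reported by a strict majority of channels.

    Returns None when no strict majority exists (e.g. two channels
    that disagree, or no channels at all).
    """
    if not readings:
        return None
    value, count = Counter(readings).most_common(1)[0]
    return value if count > len(readings) / 2 else None


def should_trip(level_ok_flags: Sequence[bool]) -> bool:
    """Fail-safe rule: trip unless a strict majority says the level is OK."""
    return majority_vote(level_ok_flags) is not True


# Three channels, one faulty: the majority says OK, so no trip.
print(should_trip([True, True, False]))
# Two channels that disagree: no majority, so the safe choice is to trip.
print(should_trip([True, False]))
```

Note that this does nothing about the common-cause failures mentioned above: three channels with the same bad patch will happily vote the same wrong answer.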
Why is it every time anything goes even slightly off-optimal in the nuclear industry it's news? Every day thousands are injured and killed in and by other industries and nuclear power just keeps on quietly pumping out the megawatts, but if somebody sneezes in the control room everyone in the world knows about it.
Haven't they ever heard of redundant systems? I would have thought that having more than one controller on vital equipment was obvious. Of course, there is another kind of redundancy that might become relevant for the responsible engineer; although I am not sure I think the guy should be fired - knowing how finances trump security, safety and common sense in most companies, he probably wasn't given the resources necessary.
"And then it rebooted" What? But that means.. Oh my god, please please do not tell me that in our nuclear power plants they run WINDOWS on critical parts of the system? Are they f...ing stupid? It is dumb enough to run it at home, when you want a stable system, but there.. I will try to ask our plant what they use, and if I get the same answer I will seriously consider moving my home to the place that is farthest away from it (and from any other). :(
Ubuntu, a terminal, Python and Slashdot. That's all you need.
Don't forget the gun.
Patents Drive Free Software as Hurricanes Drive Construction Industry
Let me guess, a Nuke plant dependent on a machine running Windows?
Time to make that sign, "The end of the world is near!" LOL!
It's not a question of whether the shutdown was the right thing to do. It definitely was: the system thought there was a problem, so it shut down.
The problem here is that the system acted on corrupt data thinking it was the right data. It could (almost) as easily have kept running when it should have shut down, had the opposite data been fed in during a cooling failure.
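To make the point concrete, here is a minimal Python sketch of treating "no valid data" as its own condition instead of reading it as a real low level. Everything in it is hypothetical: the Reading type, the limits and the ages are numbers I picked for illustration, not anything from the NRC report.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Action(Enum):
    CONTINUE = "continue"
    TRIP_LOW_LEVEL = "trip: low level"
    TRIP_BAD_DATA = "trip: data invalid"


@dataclass
class Reading:
    value: Optional[float]  # None models a reset or missing value
    age_s: float            # seconds since the value was last refreshed


def evaluate(reading: Reading,
             low_limit: float = 1.0,       # hypothetical trip setpoint
             max_plausible: float = 10.0,  # hypothetical sensor range
             max_age_s: float = 2.0) -> Action:
    """Validate the reading first, then compare it with the setpoint."""
    # Missing or stale data is flagged as bad data, not as a low level.
    if reading.value is None or reading.age_s > max_age_s:
        return Action.TRIP_BAD_DATA
    # Values outside the physically plausible range are also bad data.
    if not (0.0 <= reading.value <= max_plausible):
        return Action.TRIP_BAD_DATA
    if reading.value < low_limit:
        return Action.TRIP_LOW_LEVEL
    return Action.CONTINUE


print(evaluate(Reading(value=None, age_s=0.0)).value)
print(evaluate(Reading(value=5.0, age_s=0.5)).value)
```

Both bad-data and low-level cases still end in a trip here, which keeps it fail-safe, but the operators are told which one it was. The range check is also what catches the "opposite" case of a corrupt value that looks too healthy to be true.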
_Renewable energy_ is the way of the future. Cue nuclear energy fanboy flames in 3.. 2.. 1..
This looks like USA almost made a nice little Chernobyl for itself :)
But it will be when Windows 7 comes out.
Slashdot. Unreadable news to annoy nerds. - wonkey_monkey
Why are they running a nuclear reactor on Windows?! Are there any other operating systems that automatically reboot after an update?
Someone tell me how I can protect myself and my family from stupid fucking people, please?
----
"We regret to inform you that we've melted half the planet because we were stupid."
All nuclear power plants have simulators to train the operators. They should all be required to test all changes and software updates in the simulator environment before installing them on line.
That implies that the scope of the simulators may need to be expanded. In addition to training operators, they should become a sandbox for testing any and all things connected directly or indirectly to operational systems.
If business systems become interconnected with operational systems, then the business systems too must be replicated in the simulator environment. That might become a very onerous requirement, but that very difficulty could have a benefit. Architectures that prove to be very onerous to duplicate in a simulated environment should be rejected and redesigned.
It seems that potential loss of electric production running several tens of thousands of dollars per hour would be enough reason to maintain a complete simulation system with a match for every online computer and another computer simulating the reactor. All software changes ought to be wrung out thoroughly before trying them on the actual control systems.
"Part of the challenge is we have all of this infrastructure in the control systems that was put in place in the 1980s and '90s that was not designed with security in mind, and all of a sudden these systems are being connected to [Internet-facing] business networks" said Brian Ahern, president and chief executive of Industrial Defender Inc., a Foxborough, Mass.-based SCADA security company.
No, the problem is putting 'computers' on the Internet that were most certainly designed with security in mind, something the 'computers' most certainly fail at. To put it bluntly, running your SCADA units on Windows over the Internet is the dumbest thing I ever heard of. And that they are still running such designs five years after the great blackout of 2003 demonstrates incompetence and negligence bordering on the criminal.
davecb5620@gmail.com
"Good enough evidence for me! Microsoft caused a nuclear meltdown! Quickly, to the Blogo-Sphere!"
.. include explicit mention of some unknown 'computer problems' at FirstEnergy, the Ohio utility thought to have triggered the regional power failures, in those preceding hours"
That's only funny if it wasn't even partly true. But here's something really funny:
"The Slammer worm penetrated a private computer network at Ohio's Davis-Besse nuclear power plant in January and disabled a safety monitoring system for nearly five hours"
"TRANSCRIPTS of telephone conversations between utility operators
davecb5620@gmail.com
"This is why you keep the IT nerds away from the process network"
.. :)
What's a 'process network' and who exactly do you get to fix your 'process network'?
"I've had a whole plant lose view of its system because some well meaning retard in IT decided to push updates onto a SCADA system without qualifying the updates....... never had it KILL the control side of things though....well done whoever you were, you've done well"
Assuming the above anecdote was even true, such an incident would never occur on a live system, and I'll tell you for why. You never update a live system - got that - never. At least in any competently run IT department.
What I suspect happened in the Georgia nuclear power plant was that some automatic patch process broke the 'computer', the rest of the story is just so much smoke screen.
re: 'qualifying the updates': Just how exactly do you qualify an update? Is an update the same as a patch or a bug fix? What motivates you to apply such qualifying updates? I mean, if the computer ain't broke, then don't fix it.
If it's security updates then why bother? I mean, if the 'process network' is secure you wouldn't need to. I would have thought they used end to end gateways running on embedded hardware, providing a VPN connection to the SCADA units.
But then again I only ever provided IT services to the double glazing sector, and what do I know.
davecb5620@gmail.com
Four windmills shut down in an emergency data failure today. Then we turned them back on. No nuclear risk. No threat.
How much did the coal industry pay for this article? Notice all the coal billboards and TV ads in the northeast? NRC reports are becoming fuel for anti-nuclear FUD articles.
Actually yes, you are correct. You would need at least three monitor sensors for each data value. However, for the computer reading data from these sensors you only need a single backup system. It should be high enough priority to replace the original computer system and bring it back online within a 24 to 36 hour time frame at most. If you were very paranoid then yes, perhaps three computer systems might be needed. However, to triple check data values you only need an odd number of sensors, not necessarily an odd number of computer systems that read the values of the sensors. Good eye, however, in pointing out the odd number needs.
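A rough Python sketch of the three-sensor idea, for anyone curious: take the middle value of the three readings so that one wild sensor cannot drag the result, and flag whichever channel has drifted away from the others. The function names and the tolerance are my own assumptions for illustration.

```python
import statistics
from typing import List, Sequence


def mid_value_select(readings: Sequence[float]) -> float:
    """Return the median reading; with three sensors this is the middle one."""
    return statistics.median(readings)


def suspect_channels(readings: Sequence[float], tolerance: float) -> List[int]:
    """Return indices of sensors that differ from the median by more than tolerance."""
    mid = mid_value_select(readings)
    return [i for i, r in enumerate(readings) if abs(r - mid) > tolerance]


# Hypothetical water-level readings; the third sensor has failed to zero.
levels = [5.0, 5.1, 0.0]
print(mid_value_select(levels))
print(suspect_channels(levels, tolerance=0.5))
```

The single computer doing this arithmetic is still a single point of failure, which is where the backup machine mentioned above comes in.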
Personally, I work at a nuclear plant and I know that the NRC requires that business computers cannot send data to control system computers. Each control system has to be isolated via a firewall which only allows data to be sent from the control network out. No data is allowed to be sent in.
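The outbound-only rule described here boils down to a very small allowlist. The following Python sketch is purely illustrative: the zone names and the is_allowed function are hypothetical, and a real plant would enforce this in firewall or data-diode hardware, not in application code.

```python
# Hypothetical zone names. The only permitted direction is control -> business.
ALLOWED_FLOWS = {("control", "business")}


def is_allowed(src_zone: str, dst_zone: str) -> bool:
    """Return True only for flows on the allowlist; everything else is denied."""
    return (src_zone, dst_zone) in ALLOWED_FLOWS


print(is_allowed("control", "business"))  # data leaving the control network
print(is_allowed("business", "control"))  # data trying to get in
```

Default-deny is the design choice that matters: anything not explicitly listed, including any new zone someone adds later, is blocked.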
I saw UAC, I was thinking ID's DOOM game... but-- Microsoft UAC-- much better fit for the situation...
every day http://en.wikipedia.org/wiki/Special:Random
There are several measures of feedwater flow and reactor water level that a chemical monitoring program should never have been able to override.
I am a name troll of Westlake. Visit my homepage to learn why.
Actually, this was a fail-safe incident. Something in one of the monitoring systems screwed up - resetting data. In this situation the only logical safe thing to do is shut down, because you no longer know what the real state of your system is.
Example: 3 Mile Island had a water sensor in a drain trap (yeah I know BRILLIANT). This sensor is the one the engineers were reading to "know" they had water in the reactor. Meanwhile all the water boiled out due to a jammed pressure relief valve. Had the engineers bothered to check one of the other water sensors earlier, they would never have been within 45 minutes of a total and complete meltdown - far worse than Chernobyl. So, I'm rather glad that this reactor took the Human element out and forced them to look at more than just the one gauge they look at, because "that's the way we've always done it".
Unbelievable, I've seen more thorough testing on a treatment plant computer system. EVERYTHING is tested thoroughly before implementing updates. Don't take water treatment plants lightly, they are the one process equal in complexity to a nuclear power plant.
EP
"A process network is the network the SCADA/DCS system and its physical controllers sit on, usually segregated from corporate LAN"
You do seem to have implemented a solution, which begs the question as to why the rest of the power industry hasn't also done it. See here where they're still using the Internet to relay SCADA data. They obviously don't use your methodology. Just who do you work for again?
"the vendor usually checks out stuff like windows updates, and assesses the impact in the system"
How do you check out a service pack without installing it on a live system? When a service pack breaks something, the usual solution is to reinstall, reinstall, reinstall. Again you do seem to have thought of a solution that would be of use to the rest of the IT industry.
"Updates are usually installed to fix/improve system operation"
My understanding is that service packs are provided by the software vendor in response to general issues, and not specifically to correct problems in a specific installation. In *nix land, if there's a bug then you can directly contact the programmers and get specific solutions to your problem. I guess it's a different mindset.
davecb5620@gmail.com
Your statement needs qualifying.
Solar. Wind. Hydroelectric. Geothermal. Four other choices. These are available but do not produce the sheer quantity of energy that a nuclear reactor does. Fusion will, theoretically, but it is still in the theory stages.
Though you are right, the economical, low carbon emission choice in practice now is nuclear.
TANSTAAFL GIGO Acronyms to live by!
Unlike most posters here, I have actually been a Reactor Operator, albeit many years ago. Yes, you want any questionable signal from primary systems to initiate a scram. But I question why any primary system was connected to or using data from such a questionable source. In my experience, all such sources were hard cards - no software involved. Obviously "modern" plants have become too modern for my tastes.
For those who wonder what a scram is - it's from the early early tests where the rods were actually pulled by a human tugging a rope attached to a pulley. Once pulled, the rope was tied off, and he stood by with an axe. If the pooh hit the rotating blades, he chopped the rope. Super-Critical Reactor Axe Man = SCRAM.
Edwin I. Hatch Nuclear Power Plant has two General Electric boiling water reactors, a type of nuclear reactor developed by General Electric in the mid 1950s.