Software Update Shuts Down Nuclear Power Plant

Install Complete... by Anonymous Coward · 2008-06-06 12:02 · Score: 5, Funny

Must restart reactor to complete software installation.

[Yes] [No] [OMFG!]

Re:Install Complete... by sharkey · 2008-06-06 12:21 · Score: 2, Insightful

What, did they change the phone number in Dial-Up Networking?

--
"Outlook not so good." That magic 8-ball knows everything! I'll ask about Exchange Server next.
Re:Install Complete... by Anonymous Coward · 2008-06-06 12:34 · Score: 3, Funny

Looks like the plug and play device 'Nuclear Reactor' is not fully SP3 compatible...
Re:Install Complete... by SlashWombat · 2008-06-06 12:41 · Score: 2, Funny

CTRL-ALT-DEL -> Kaboom.

or perhaps just another variation on the BSOD (Blue Screen Of Death)
Re:Install Complete... by arbiter1 · 2008-06-06 12:54 · Score: 1

I think the box would say [No] [OMFG I am gonna get fired]
Re:Install Complete... by Anonymous Coward · 2008-06-06 13:25 · Score: 1, Funny

DRIVER_IRQL_NOT_LESS_OR_EQUAL
driver: cooling_system.sys

Operator: "Ohcrapohcrapohcrap! runningrunningRUNNING!"
Re:Install Complete... by fm6 · 2008-06-06 14:20 · Score: 2, Funny

Nuclear plants don't go "kaboom". They just emit a pleasant glow.
Re:Install Complete... by JustOK · 2008-06-06 15:24 · Score: 1

UAC jokes ensue... Cancel or Allow?

--
rewriting history since 2109
Re:Install Complete... by Joe+Jay+Bee · 2008-06-06 15:54 · Score: 2, Interesting

Tell that to the people who worked at Chernobyl.

Oh wait, you can't; they were all blowed up. ;)

--
I write bullshit
Re:Install Complete... by CycoChuck · 2008-06-06 16:41 · Score: 1

Looks like the plug and play device 'Nuclear Reactor' is not fully SP3 compatible... They installed Vista but only had XP drivers.

--
Windows is as solid as quicksand.
Re:Install Complete... by ben2umbc · 2008-06-06 17:43 · Score: 1

It's just that warm fuzzy feeling as your skin starts to melt...
Re:Install Complete... by sgant · 2008-06-06 18:34 · Score: 1

Yes, but in relative terms it was a small "kaboom" and not the "kaboom" many people associate with "nuclear kabooms".

--

"Leo Fender was in a 'state of grace' when he designed the Stratocaster." -- Paul Reed Smith
Re:Install Complete... by SlashWombat · 2008-06-06 19:42 · Score: 3, Informative

Obviously, you have never seen pictures of Chernobyl. While it wasn't like an atomic bomb, it certainly went KABOOM. It blew a several-hundred-ton metal lid clean off the reactor and demolished a fair percentage of the building containing the reactor core.
Re:Install Complete... by BlueParrot · 2008-06-06 22:19 · Score: 1

Actually they weren't. The explosion destroyed the reactor lid and parts of the reactor building but didn't cause much damage to the plant's control room. It was mainly radioactive particles emitted during the subsequent meltdown that killed people.
Re:Install Complete... by kitgerrits · 2008-06-06 23:35 · Score: 1

Correction:
[Yes] [No] [File not found]

--
"I was in love with a beautiful blonde once, dear. She drove me to drink. It's the one thing I am indebted to her for."
Re:Install Complete... by tristian_was_here · 2008-06-07 01:14 · Score: 1

It wasn't Vista they updated; it was probably iTunes.

Hmmm, threw an exception by Anonymous Coward · 2008-06-06 12:03 · Score: 5, Insightful

I'd rather it shut itself down then suffer major failure.

Re:Hmmm, threw an exception by xlv · 2008-06-06 12:44 · Score: 5, Funny

I'd rather it shut itself down then suffer major failure.

Personally, I'd rather it doesn't suffer a major failure at all, whether it's after a shutdown or not. Oh you meant than and not then, never mind...
Re:Hmmm, threw an exception by Chandon+Seldon · 2008-06-06 14:34 · Score: 1

Surely education levels haven't slipped that far.

They easily could have.
In a couple of common accents, the words are pronounced the same. Further, this is the internet - not everyone speaks English as a first language.

--
-- The act of censorship is always worse than whatever is being censored. Always.
Re:Hmmm, threw an exception by CanadianRealist · 2008-06-06 14:57 · Score: 3, Funny

Come on people think about it, "Anonymous Coward" is a pretty English sounding name. I bet English is his first language.

Suppose, for example, that his first language was French, then he'd likely have a name like "Caword Anonoumouse".
Re:Hmmm, threw an exception by pipingguy · 2008-06-06 15:31 · Score: 1

The phenomenon is similar to the misuse of 'their', 'they're', 'there' (all of which sound the same to me as a non-accented Canadian) and 'we're', 'where' and 'were' (each of which is pronounced differently - to my ear). If you listen to Brits speaking, they often make the latter three words sound the same.
Re:Hmmm, threw an exception by iocat · 2008-06-06 17:03 · Score: 1

I like their Reactor's logic...
IF [anything seems fucked up] THEN SHUT DOWN REACTOR
Much rather have a business PC take the reactor offline than have anything *not* take the reactor offline when it needs it. I <3 nuclear power, but they need a better PR person. The spin here should be: "Look how safe nuclear power is! We'll go offline at the drop of a hat if we need to, etc."

--
Dude, I think I can see my house from here.
Re:Hmmm, threw an exception by zippthorne · 2008-06-06 18:21 · Score: 1

If his first language was french, his name would be more like, "Français anonyme."

--
Can you be Even More Awesome?!
Re:Hmmm, threw an exception by ShadowFalls · 2008-06-06 21:15 · Score: 1

Or as the one error I ran into with Windows Server 2003, "Catastrophic Failure". That sounds about right.
Re:Hmmm, threw an exception by Gnavpot · 2008-06-06 22:12 · Score: 1

Further, this is the internet - not everyone speaks English as a first language.

I am pretty sure that the then/than mistake is mostly done by native English speakers. The rest of us have learned most English words by reading them instead of hearing them.
Re:Hmmm, threw an exception by Hognoxious · 2008-06-06 23:10 · Score: 1

In a couple of common accents, the words are pronounced the same.
Such as? South African perhaps (efter ell, they enly heve wen vewel) but that's hardly common. Unless you live in South Africa, I suppose.

--
Confucius say, "Find worm in apple - bad. Find half a worm - worse."
Re:Hmmm, threw an exception by Danny+Rathjens · 2008-06-06 23:20 · Score: 1

Words our stored in our brain related by sound in addition to semantics; especially when it is our first language. When you are speaking out loud and you retrieve a word from your brain and say it out loud it doesn't matter if you retrieved the wrong homonym because people can't tell. If you are typing, however, this mistake shows up quite often. And spell checking software doesn't catch it because we are typing real words, simply the wrong words. :)
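To make the spell-checker point concrete: a dictionary-lookup checker only asks whether each word exists, so a sentence full of real-but-wrong homophones passes cleanly. A minimal sketch, assuming a tiny hypothetical word list in place of a real dictionary:

```python
import re

# Hypothetical miniature dictionary; a real checker would load tens of thousands of words.
DICTIONARY = {
    "words", "are", "our", "stored", "in", "brain", "by", "sound",
    "your", "you're", "their", "there", "they're", "then", "than",
}


def misspelled(text: str) -> list[str]:
    """Return the words not found in DICTIONARY (a case-insensitive membership test only)."""
    words = re.findall(r"[a-z']+", text.lower())
    return [w for w in words if w not in DICTIONARY]


# "our" is a real word, so the homophone slip goes unflagged.
print(misspelled("Words our stored in our brain by sound"))
# A genuine typo is caught, because "stord" is not in the list.
print(misspelled("Words are stord in our brain"))
```

Catching "our" for "are" needs context (grammar or a language model), which plain lookup never sees.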

Furthermore, certain types of people - namely, a lot of us computer programming geeks - are more detail-oriented than most, so the incorrect words stand out to us even more than to most people. That's a useful quirk when finding a single semicolon or comma out of place can fix a bug in your program, but it makes reading normal written speech, with all its warts and blemishes, a bit annoying. 8^) The one that most annoys me is when people type "your" instead of "you're" or "you are".
Re:Hmmm, threw an exception by Danny+Rathjens · 2008-06-06 23:23 · Score: 1

No. I didn't type "our" instead of "are" on purpose. But let's just pretend I did, ok. :)
Re:Hmmm, threw an exception by jimicus · 2008-06-06 23:35 · Score: 2, Informative

I like their Reactor's logic...

IF [anything seems fucked up] THEN SHUT DOWN REACTOR

Friend of mine used to work in a nuclear power plant and that was basically how everything was set up. The staff were essentially there to prevent the reactor shutting itself down.
Re:Hmmm, threw an exception by dwater · 2008-06-07 00:36 · Score: 1

> 'we're', 'where' and 'were' ... If you listen to Brits speaking, they often make the latter three words sound the same.

I resemble that remark. I think they all sound different, no matter who's speaking. I don't know how to write this phonetically, but this is pretty much how I have always heard them:

we're - weer
where - ware
were - wirr

--
Max.
Re:Hmmm, threw an exception by Laur · 2008-06-09 02:29 · Score: 1

Suppose, for example, that his first language was French, then he'd likely have a name like "Caword Anonoumouse".
An anonoumouse once bit my sister.

--
When you lose something irreplaceable, you don't mourn for the thing you lost, you mourn for yourself. - Harpo Marx

Critical Update by Enderandrew · 2008-06-06 12:04 · Score: 5, Funny

Adds a whole new meaning to "Critical Update".

--
http://blindscribblings.com - Tasty pop-culture in conceptual fashion.

Re:Critical Update by pyxl · 2008-06-06 14:13 · Score: 3, Funny

Supercritical update.

--

Given enough hydrogen, just about anything is possible.
Re:Critical Update by rhdaly · 2008-06-06 14:37 · Score: 1

Responding to undo an accidental downmod. Please ignore.

--
0 bottles of beer on the wall, 0 bottles of beer, take 1 down, pass it around, 4294967295 bottles of beer on the wall.
Re:Critical Update by CycoChuck · 2008-06-06 17:33 · Score: 1

Was this update really needed? Or did some tech just install it because it was available and M$ said it was critical?

--
Windows is as solid as quicksand.

Lesson learned: by Aaron32 · 2008-06-06 12:05 · Score: 1

When updating the computer that controls the entire facility, HAVE AN UNDO PLAN!

Re:Lesson learned: by J'ai+Friedpork · 2008-06-06 12:07 · Score: 2, Funny

Actually, I think that the lesson learned here was "when dicking around with the boss's computer, make sure it's not plugged into anything important first."

--
Took this comment seriously, did you?
Re:Lesson learned: by Anonymous Coward · 2008-06-06 12:17 · Score: 3, Insightful

However useful a tip that may be, it has nothing to do with this incident. You clearly never even made it to the article summary, let alone the actual article.

"... when the updated computer rebooted, it reset the data on the control system, causing safety systems to errantly interpret the lack of data as a drop in water reservoirs that cool the plant's radioactive nuclear fuel rods. As a result, automated safety systems at the plant triggered a shutdown."

From that snippet alone, it stands to reason that _any_ reboot of the computer would have caused this reset at the control system. Nor is this at all surprising; go reset any data collection system connected to controller software for any sort of industrial process and see if the controller doesn't receive spurious data.

To me this is an example of the automated system doing its job. "Hark! I am a coolant reservoir monitor and I have reason to believe there may be a loss of coolant inventory. Time to trip the system."
Re:Lesson learned: by bluefoxlucid · 2008-06-06 13:13 · Score: 4, Informative

No, it has no reason to believe the coolant system has water. It's called FAIL SAFE; if I'm not quite sure, then fuck it, back off and shut the grid down and go MAKE SURE everything looks right.

The proper response of a nuclear cooling system to not knowing whether or not it's working correctly is not "let's keep running hot and see if more sample data comes across."
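The fail-safe rule above can be sketched as a trip check that treats "no data" or "stale data" exactly like "bad data". A minimal sketch; the function name, the level limit, and the staleness window are hypothetical illustration values, not anything from the plant in the article:

```python
import math
import time
from typing import Optional

# Hypothetical values for illustration only.
LOW_LEVEL_LIMIT = 40.0   # minimum acceptable reservoir level, percent
MAX_AGE_SECONDS = 5.0    # readings older than this are treated as missing


def should_trip(level: Optional[float], timestamp: Optional[float],
                now: Optional[float] = None) -> bool:
    """Return True (trip) unless there is a fresh, valid reading above the limit."""
    if now is None:
        now = time.time()
    if level is None or timestamp is None:
        return True                  # no data at all: assume the worst
    if math.isnan(level):
        return True                  # unreadable value: assume the worst
    if now - timestamp > MAX_AGE_SECONDS:
        return True                  # stale data: assume the worst
    return level < LOW_LEVEL_LIMIT   # only a fresh, valid reading is trusted


# Usage with a fixed clock so the results are deterministic.
print(should_trip(75.0, 999.0, now=1000.0))   # False: fresh and healthy
print(should_trip(None, None, now=1000.0))    # True: data lost (e.g. after a reset)
print(should_trip(75.0, 990.0, now=1000.0))   # True: last reading is 10 s old
```

Every uncertain branch returns True; the only path that keeps the reactor running is positive confirmation that the reading is fresh, valid, and in range.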

--
Support my political activism on Patreon.
Re:Lesson learned: by Kavli · 2008-06-06 21:28 · Score: 1

When you can inject data from the business network to change the behaviour, I have to question whether it's really fail-safe.
What if the water level was really low and the injected data said it was normal?
I have no information that this is possible. Hopefully there are other sanity checks between sensors and accumulated data, but still...

Fail-Safe by lobiusmoop · 2008-06-06 12:07 · Score: 4, Insightful

Personally, I am reassured that these reactors are designed to shut down at the drop of a hat. This is not a situation where fuck-ups should be masked; any discontinuity, however minor, really needs to be highlighted and dealt with immediately.

--
"I bless every day that I continue to live, for every day is pure profit."

Re:Fail-Safe by Sitnalta · 2008-06-06 12:17 · Score: 2, Funny

Yeah, but you don't want the reactor shutting down because the computer system is shit. That is most definitely not reassuring to me.
Re:Fail-Safe by snkline · 2008-06-06 12:23 · Score: 4, Insightful

Umm, yes you do. If something in the system is shit, you don't want the reactor ON!
Re:Fail-Safe by NMerriam · 2008-06-06 12:25 · Score: 3, Insightful

Yeah, but you don't want the reactor shutting down because the computer system is shit. That is most definitely not reassuring to me.

On the contrary, shutting down because the system is shit sounds like a much better option than continuing to run despite the shittiness of the computer monitoring everything.

Of course, the ideal situation would be to have good computers that only get updated in scheduled, planned ways so that you don't have the issue at all. But shutting everything down when something is amiss is the only sensible response.

--
Recursive: Adj. See Recursive.
Re:Fail-Safe by Skrapion · 2008-06-06 17:07 · Score: 3, Insightful

That's my point. I don't want a reactor with ANY flaws. No matter how safe its default shutdown thresholds are. And I'd like to be king of all Londinium and wear a shiny hat.

Systems without flaws will never exist, so we need to design systems that do reasonable things when they encounter flaws.

In this case, the flaw wasn't even caused by the machines, but instead was directly caused by the "fleshy" parts of the system, and the machines still managed to handle the problem safely.

--
The details are trivial and useless; The reasons, as always, purely human ones.
Re:Fail-Safe by distantbody · 2008-06-06 18:31 · Score: 2, Insightful

The problem isn't that it shut down-- that's fine; the problem is that a software update for a nuclear power plant was actually allowed to produce an unexpected/unplanned event!
Re:Fail-Safe by doom · 2008-06-08 02:15 · Score: 1

software update for a nuclear power plant was actually allowed to produce an unexpected/unplanned event

Yes, software isn't just used to create cute bouncy cat icons for your web page, it's actually used as a component in critical safety systems. And when something is screwed with that safety system you want it to fail safely, which is what has happened here.
We might speculate that they need a better test setup (a network of machines that models the concurrency issues in the actual plant better), but I'm sure that's obvious to them too now, after the fact.
(I think some people here are confusing the internal network of the plant with the internet.)

D'oh! by file_reaper · 2008-06-06 12:07 · Score: 1

Surely this computer thingy must be the same as my home computer thingy....it always works when I turn it off and on again.

Sure glad the safety systems kicked in as per normal.

How could NRC even allow this in the first place? by McNihil · 2008-06-06 12:09 · Score: 1

As the regulator, shouldn't the NRC have some checks and balances to keep critical systems on their own separate ring, rather than directly interdependent?

This is beyond incompetence... it is gross negligence.

Oblig Simpsons reference by J'ai+Friedpork · 2008-06-06 12:11 · Score: 5, Funny

"Vent radioactive gas? Venting gas prevents explosion. [Yes / No]"

--
Took this comment seriously, did you?

Re:Oblig Simpsons reference by Phanatic1a · 2008-06-06 12:17 · Score: 2, Funny

I'm impressed that for once dad's butt prevented the release of toxic g-
Re:Oblig Simpsons reference by truthsearch · 2008-06-06 12:38 · Score: 1, Redundant

Hey! All I have to type is Y. (To Marge) Hey, Miss Doesn't-find-me-attractive-sexually-anymore: I just tripled my productivity!

--
Developers: We can use your help.
Re:Oblig Simpsons reference by j79zlr · 2008-06-06 13:13 · Score: 1

I wash myself with a rag on a stick.

--
I'm not not licking toads.
Re:Oblig Simpsons reference by Redfeather · 2008-06-06 13:48 · Score: 1

You don't have enough vespene gas!

--
Those things you're doing with that stuff you just bought? That's not what it's for! -
Re:Oblig Simpsons reference by Annymouse+Cowherd · 2008-06-06 14:45 · Score: 2, Funny

We require more vespene gas, noob.
Re:Oblig Simpsons reference by CycoChuck · 2008-06-06 16:39 · Score: 1

To start, press any key.... Where's the any key?

--
Windows is as solid as quicksand.
Re:Oblig Simpsons reference by Acapulco · 2008-06-07 00:03 · Score: 1

"En - Oh"

--
Slashdot. Unreadable news to annoy nerds. - wonkey_monkey
Re:Oblig Simpsons reference by WK2 · 2008-06-07 04:16 · Score: 2, Funny

My wife for hire!

--
Write your own Choose Your Own Adventure. http://www.freegameengines.org/gamebook-engine/

More like bad system design by aliquis · 2008-06-06 12:11 · Score: 1

To me it sounds much more like they have a bad system design if the plant can't keep running while one of the machines is rebooted or offline. That's not something to blame on the software update (shouldn't such things be expected anyway?).

I guess "software update" may have been used to bash Microsoft a little, not that it says Windows Update; or maybe the poster hates all kinds of software updates?

Re:More like bad system design by RiotingPacifist · 2008-06-06 12:20 · Score: 4, Insightful

The only safe way to update a system is a reboot. Sure, you CAN do some stuff on Linux, BSD, etc. to avoid having to reboot (hell, this was probably running some Unix derivative, so it was probably possible to do the update without rebooting), but you wouldn't want to run the risk of introducing an unchecked bug by doing a live update. When your choices are:
a) high chance of accidentally shutting down a reactor harmlessly
b) small chance of fucking up a nuclear reactor
you'll always go for (a), if you're sane.

--
IranAir Flight 655 never forget!
Re:More like bad system design by VENONA · 2008-06-06 19:20 · Score: 2, Insightful

"It would be even better if the control network had a web server"

Probably not. Web servers are complex, and likely targets for attack. And the business people will end up doing endless cut and paste.

A better solution would be to accumulate the data that the businesspeople need on a single system on the control LAN. That system rsyncs CSV files onto a system on the business LAN. No connections are initiated from the business LAN into the control LAN, and the data are more useful to MIS people on the business LAN.
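One way to sketch the control-LAN side of that push model: the data host writes a complete CSV snapshot atomically (temp file, then rename), and a separate outbound job ships it to the business LAN. This is a minimal sketch under assumptions; the function name, columns, and paths are hypothetical, and nothing in it listens for inbound connections.

```python
import csv
import os
import tempfile


def write_snapshot(path: str, fieldnames: list[str], rows: list[dict]) -> None:
    """Write rows to a CSV file atomically, so a reader never sees a half-written file."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file in the same directory so the final rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(rows)
        os.replace(tmp_path, path)   # atomically replaces the previous snapshot
    except BaseException:
        os.unlink(tmp_path)          # don't leave partial temp files behind
        raise
```

A scheduled job on the control side could then push the directory outward with something like `rsync -a --exclude='*.tmp' /var/export/ datahost.business.example:/incoming/` (paths and host name hypothetical), so the connection is always opened from the control LAN.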

--
What you do with a computer does not constitute the whole of computing.
Re:More like bad system design by ralphdaugherty · 2008-06-06 21:03 · Score: 1

well, one way to synch the data is restart the reactor.

sort of a primitive way to do it though.
Re:More like bad system design by aliquis · 2008-06-07 02:03 · Score: 1

Yeah, but the only thing I meant was that they should have a second system which could handle the same things while the first one was offline due to the reboot.
Re:More like bad system design by aliquis · 2008-06-07 02:08 · Score: 2, Informative

Bullshit. Apache with PHP, CGI, some database backend and so on may be vulnerable, together with whatever page it runs. But a simple network application that answers a page request and returns a fixed page, with no scripts or anything, is neither complex nor a likely target.

Just run a script outside the web server process that updates the fixed web page, and be done with it. Feel free to tell me how anyone would play around with that solution ...

"Endless cut and paste", yeah, because it's sooo impossible to have a Perl script or whatever fetch the web page and cut out the data you care about? Though sure, some form of read-only network solution may be even better than something web-based. I'm sure the GP was OK with such a design as well.
Re:More like bad system design by VENONA · 2008-06-07 04:21 · Score: 3, Insightful

Simplicity is better than complexity if you're really after security. You could write a small Web server, which did nothing more than respond to HTTP requests, and which was provably secure. It's been done. But it's also one more piece of software that has to be maintained. Or use a large Web server, such as Apache. It's been a long while since there was a remote exploit against Apache when it was simply serving static pages. A DoS attack might still be possible, but that shouldn't accomplish anything but revealing the attack, as long as the software on the control-LAN systems that update the data host doesn't become wedged if the data host becomes unresponsive. Which you would still want to test for, BTW.

*However*, one of the more powerful ideas in configuring highly secure LANs is that the more-secure LAN is simply never allowed to accept connections from the less-secure LAN. It's also really easy to firewall, it makes your network easier to audit, etc. If you're a security practitioner, it makes your life easier. You still have to worry about the sneaker-net, physical security, etc., but now you're more able to focus your resources on those areas. Once again, simplicity is better than complexity if you're really after security.

I don't know where you got the idea that I thought it was "sooo impossible to have a perl script or whatever fetch the webpage and cut out the data you care about." It's easy. But pretty much nothing is as easy to extract data from as a CSV file, which you could process with nothing more than awk. That doesn't get you far with automating report generation, populating a database, or whatever else you intend to *do* with the data, but there are endless tools for those jobs--Perl included.
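As a rough illustration of how little code the business-side consumer needs once the data arrives as CSV, here is a sketch using only Python's standard `csv` module. The `sensor` and `value` column names are hypothetical:

```python
import csv
import io


def max_by_sensor(csv_text: str) -> dict[str, float]:
    """Return the largest 'value' seen for each 'sensor' in a CSV snapshot."""
    maxima: dict[str, float] = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        sensor, value = row["sensor"], float(row["value"])
        if sensor not in maxima or value > maxima[sensor]:
            maxima[sensor] = value
    return maxima


snapshot = "sensor,value\nlevel,75.0\nlevel,80.5\nph,7.1\n"
print(max_by_sensor(snapshot))  # {'level': 80.5, 'ph': 7.1}
```

Nothing here depends on page layout, so a cosmetic change on the producing side cannot silently break the parser the way it can with screen-scraping.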

Also, in my experience, people want to mess with Web pages. They're more visual, and people tend to want to 'improve' them, meaning your Perl screen-scraper likely has to change as well. I see a lot less clamor for changing the data format in CSV files.

In the end, use what you need--XML, for all I care. Just *don't allow your less-secure LAN to initiate connections into your more-secure LAN*. That was the root cause of the failure described in TFA. It's one of many reasons the rule is so basic, though obviously not yet widely enough followed. Ideally, hosts on a secure LAN communicate with *nothing* outside that LAN. You justify and document every[1] step away from that ideal, if for no other reason than that it plays hell with formal trust models, which can be important inputs into designing a thorough audit. I don't see how you justify accepting incoming traffic when there's an easy way to avoid it. In an audit, I'd be busting you for that Web server. Simple as that.
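The "never accept connections from the less-secure LAN" rule is simple enough to check mechanically during an audit. A minimal sketch, assuming you already keep an inventory of permitted flows labelled by zone; the zone names, trust ordering, and `Flow` record are hypothetical and not taken from any firewall product:

```python
from dataclasses import dataclass

# Hypothetical zones, ordered from least to most trusted.
TRUST = {"internet": 0, "business": 1, "control": 2}


@dataclass(frozen=True)
class Flow:
    initiator: str   # zone that opens the connection
    responder: str   # zone that accepts it
    service: str


def violations(flows: list[Flow]) -> list[Flow]:
    """Return every flow in which a less-trusted zone initiates into a more-trusted one."""
    return [f for f in flows if TRUST[f.initiator] < TRUST[f.responder]]


inventory = [
    Flow("control", "business", "rsync push of CSV snapshots"),
    Flow("business", "control", "HTTP to data host"),
]
for flow in violations(inventory):
    print("VIOLATION:", flow)
```

The check only looks at who opens the connection, which is the property that is easy to enforce in a firewall and easy to audit.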

An approach like the one above is likely to make life easier for several internal groups, including office staff. And quite possibly the ultimate users--power consumers.

[1] I mean every, not most. For example, how do you handle time? I favor an NTP server on the secure LAN taking time inputs from the GPS cloud. I've never worked for an organization that had a spare atomic clock lying around, or I'd have used that, and eliminated one more external data flow.

--
What you do with a computer does not constitute the whole of computing.

Misreading of the Article by Anonymous Coward · 2008-06-06 12:12 · Score: 5, Interesting

"Personally, I don't think letting devices on a critical control system accept data values from the business network is a good idea." The article did not say that the data values were being read from the machine that was rebooted. It actually said that the rebooting triggered a problem in which values could not be read.

I wonder if they were using something like EPICS. I worked on a large experiment which used EPICS to control the system. Rebooting a machine would sometimes expose a problem with resources not being freed, eventually leading to a situation where data channels would read the 'INVALID/MISSING' value. The solution, as anyone who has worked on this sort of experiment will know, was to reboot more machines until the thing worked. ;-)

(I don't mean to complain about EPICS. It is very powerful and flexible... it's just that the version we used had these occasional hiccups.)

Re:Misreading of the Article by SoapBox17 · 2008-06-06 13:13 · Score: 1

The article did not say that the data values were being read from the machine that was rebooted. It actually said that the rebooting triggered a problem in which values could not be read.
No, actually, the summary says "when the updated computer rebooted, it reset the data on the control system, causing safety systems to errantly interpret the lack of data as a drop in water reservoirs"... That doesn't really have much to do with the reboot itself (causing the computer to be unreachable or whatever) but that the data wasn't persistent. Completely different.
Re:Misreading of the Article by Anonymous Coward · 2008-06-06 13:14 · Score: 1, Interesting

It actually said that the rebooting triggered a problem in which values could not be read.

I feel so fucking vindicated:

Long uptimes are a bad thing! How do you know a configuration change hasn't rendered one of your startup scripts ineffective? If you have to reboot for some unexpected reason, you could be stuck debugging unrelated problems at very inopportune moments.

You need to schedule regular reboots so that you can test that your servers can start up fine at a moment's notice. Long uptimes are a sign a sysadmin hasn't been doing his job.

You're right. While you're on the phone with hazmat explaining that you have an issue with green goo, how about I test the reboots of my PBX before you give your address?

Yeah, I run mission-critical systems. Yes, I have proper redundancy and resiliency systems. Think I'm going to disrupt operations to test my reboots? Hell no. When it comes to public safety, 5 nines is the *only* option.
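For scale, "5 nines" means 99.999% availability. A quick worked figure, assuming a 365.25-day year:

```latex
\text{downtime per year} = (1 - 0.99999) \times 365.25 \times 24 \times 60\ \text{min}
  = 10^{-5} \times 525\,960\ \text{min} \approx 5.26\ \text{min}
```

So a single test reboot that takes longer than about five minutes would, by itself, use up the whole year's downtime budget.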

Looks like necrogram or somebody with his attitude is responsible for this.
Re:Misreading of the Article by beakerMeep · 2008-06-06 13:44 · Score: 2, Funny

So you're saying it was an EPIC auto-fail?

--
meep

Terminal Error by Anubis_Ascended · 2008-06-06 12:13 · Score: 2, Interesting

Reminds me of Terminal Error.

the slashdot crowd is dying to know... by mathfeel · 2008-06-06 12:13 · Score: 4, Funny

did it run Windows?

--
The only possible interpretation of any research whatever in the 'social sciences' is: some do, some don't

Re:the slashdot crowd is dying to know... by Anonymous Coward · 2008-06-06 13:05 · Score: 5, Funny

If it was running Windows the OS is at fault.
If it was running something else then the application was at fault.
Re:the slashdot crowd is dying to know... by Arkaic · 2008-06-07 07:06 · Score: 1

Actually, when I worked briefly for a large company which I shall not name, I discovered that there are control systems running Windows 2000. The systems in question were not in nuclear plants, but they were responsible for equipment which could lead to very bad things happening (think large explosions) if something went sufficiently wrong.

Re:One begs the question by fyoder · 2008-06-06 12:18 · Score: 1

Was it running a Microsoft by-product or not? The article doesn't say. I suppose it could have been Ubuntu, they've had a couple of kernel updates recently, but somehow I doubt it.

--
Loose lips lose spit.

EULA! by bluephone · 2008-06-06 12:20 · Score: 5, Funny

It says right in the EULA that it's not to be used in a nuclear power plant!

--
jX [ Make everything as simple as possible, but no simpler. - Einstein ]

Re:EULA! by ConceptJunkie · 2008-06-06 12:28 · Score: 1

Yeah, but everyone knows those EULAs are unenforceable.

--
You are in a maze of twisty little passages, all alike.
Re:EULA! by cjb658 · 2008-06-06 13:35 · Score: 1

Perhaps they changed it before they installed Windows.

In XP, it's just a plain text file on the CD.
Re:EULA! by Kalriath · 2008-06-06 23:20 · Score: 1

They also clearly aren't using iTunes, which says the same... oh wait, iTunes isn't allowed to be used to manufacture biochemical weapons, not power stations. Sorry, my bad.

--
For a site about things like basic rights, Slashdot users sure do like to censor "dissent".

This was Good by snkline · 2008-06-06 12:21 · Score: 3, Insightful

While perhaps the system should be designed to behave differently, what happened here was a good thing. When things went wrong, rather than the reactor systems freaking out and doing random crap, they were properly designed to shift to a known safe state (i.e. Shut the hell down).

The problem is the update - not business network by markdj · 2008-06-06 12:21 · Score: 5, Interesting

I write this type of software for a living, so I know that having a computer on the business network connected to the control computers is a risk, but that risk can be managed. The problem here is that the software update wiped out the nuclear control system data. This exposes two bad problems. First, customers are always asking why they can't update their system while it is still running. We liken that to changing your tire while driving down the road. Second, the software update did not respect the data in the nuclear control system and synchronized it to new initial data in the update on the other system! Not a good idea. In critical safety systems, you always practice an update before actually doing one.
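The "update must respect the live data" point can be sketched as a merge rule: incoming values flagged as initial defaults never overwrite existing live values, and real data only wins when it is strictly newer. This is a hypothetical illustration; the record fields and the sequence-number scheme are assumptions, since the article does not describe how the plant's systems actually synchronize.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    value: float
    seq: int          # version number assigned by the authoritative (control-side) source
    is_initial: bool  # True for default/initial values shipped with a software update


def merge(live: dict[str, Record], incoming: dict[str, Record]) -> dict[str, Record]:
    """Merge incoming records without letting defaults or older data overwrite live values."""
    merged = dict(live)
    for key, new in incoming.items():
        old = live.get(key)
        if old is None:
            merged[key] = new          # nothing live to protect yet
        elif new.is_initial:
            continue                   # never clobber real data with initial values
        elif new.seq > old.seq:
            merged[key] = new          # strictly newer real data wins
    return merged


live = {"reservoir_level": Record(75.0, seq=42, is_initial=False)}
update = {"reservoir_level": Record(0.0, seq=1, is_initial=True),
          "ph": Record(7.0, seq=1, is_initial=True)}
print(merge(live, update))
```

Here the update's zeroed reservoir level is ignored, while the previously unknown `ph` key is accepted.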

big increases in your power bill! by Quadraginta · 2008-06-06 12:24 · Score: 3, Insightful

Think about the cost associated with having and maintaining a completely hot-pluggable second control system. How much do you want your power bills to go up to pay for that? And what would be the point?

They have a perfectly adequate safety system that did exactly what it's supposed to do. It read confusing data and decided to shut the reactor down until a human came along and explained things satisfactorily. What's wrong with that? Aside from having the reactor offline for 48 hours, there was no other cost.

Re:big increases in your power bill! by afidel · 2008-06-06 13:49 · Score: 1

Frankly, I'm surprised something as expensive as a nuclear plant DOESN'T have redundant control systems with voting. I mean, if something as cheap as a commuter jet has it, why doesn't a $1B+ plant have it? The fact is, taking a large baseload plant offline during peak system load can and has led to cascading failures in the grid, and that costs the economy a LOT of money.
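Redundant channels with voting are commonly arranged as 2-out-of-3 (2oo3): the plant trips only when at least two independent channels agree, so one spurious channel cannot shut it down on its own. A minimal sketch of the voting logic; the function name is hypothetical, and counting a failed channel as a trip vote is an assumption for this sketch (real protection systems handle failed channels in various ways):

```python
from typing import Optional, Sequence


def vote_trip(demands: Sequence[Optional[bool]], required: int = 2) -> bool:
    """Return True if at least `required` channels demand a trip.

    A channel reporting None (failed or unreachable) is counted as a trip demand,
    so losing a channel pushes the vote toward the safe state, never away from it.
    """
    votes = sum(1 for d in demands if d is None or d)
    return votes >= required


print(vote_trip([False, False, True]))   # False: one spurious channel cannot trip the plant
print(vote_trip([True, None, False]))    # True: one real demand plus one failed channel
print(vote_trip([False, False, False]))  # False: all channels healthy
```

This trades a little complexity for availability: a single bad input no longer takes the plant offline, yet any two problems still fail safe.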

--
There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
Re:big increases in your power bill! by epsalon · 2008-06-06 14:08 · Score: 1

For a commuter jet there is no failsafe mode. You can't just have the jet "shut down" when something wrong is detected. To deal with emergencies you need all systems functioning until you can get the jet to land.
In a nuclear power plant there is failsafe: drop the control rods. Sure, the plant won't produce electricity, but it will be safe. Compare to an unscheduled emergency landing of a jet.

--

Make even shorter URLs - 8LN.org

"King-size Homer" season 7 episode 7, Nov 5, 1995 by layer3switch · 2008-06-06 12:25 · Score: 4, Funny

"... The move to SCADA systems boosts efficiency at utilities because it allows workers to operate equipment remotely."

Another proof that Homer Simpson was truly ahead of his time.

Are you mad, woman? You never know when an old calendar might come in handy. Sure, it's not 1985 now, but who knows what tomorrow will bring? -Homer

--
"Don't let fools fool you. They are the clever ones."

Working as intended by BlueParrot · 2008-06-06 12:25 · Score: 2, Insightful

The chemical diagnostic data is damn important because it may determine things like corrosion rates, the amount of impurities circulating in the water, the potential for clogs, etc. As with all other software, errors occasionally occur, and the appropriate way to respond when they do is to shut down and blow some whistles so as to ensure that the reactor is brought into a safe state before something else goes wrong. This is one of those cases where "Better safe than sorry" is a really rather good motto.

Re:Obligatory by Kamokazi · 2008-06-06 12:26 · Score: 4, Funny

Don't forget about the now mutated sharks living in the coolant water growing frickin' laser beams on their heads.

--
As our way of thanking you for your positive contributions to Slashdot, you are eligible to disable Slashdot 2.0.

well duh by ILuvRamen · 2008-06-06 12:27 · Score: 1

I'm gonna have to agree with that last statement in the summary. Basically, under these circumstances, you take out the switch and you take out the plant, and I doubt they guard the network closet as well as the reactor core. Plus there's the whole hacking thing. You really don't need to watch YouTube videos and check your e-mail from a control computer, and you can bring any updates and files it actually needs to it manually via USB drive.

--
Google's Super Secret Search Algorithm: SELECT @search_results FROM internet WHERE @search_results = 'good'

Here's the real story... by ConceptJunkie · 2008-06-06 12:30 · Score: 1

The summary said: when a computer on the plant's business network was rebooted after an engineer installed a software update

We all know what really happened. Dude rebooted the computer so that the Windows Automatic Updates reboot reminder wouldn't interrupt his Solitaire game every 10 minutes.

--
You are in a maze of twisty little passages, all alike.

Re:The problem is the update - not business networ by dissy · 2008-06-06 12:30 · Score: 4, Funny

First, customers are always asking why they can't update their system while it is still running. We liken that to changing your tire while driving down the road.

Oh sure, NOW you think of a debian slogan ;}

Re:Obligatory by turbidostato · 2008-06-06 12:34 · Score: 1

"Don't forget about the now mutated sharks living in the coolant water growing frickin' laser beams on their heads."

Wow! Imagine a beowulf cluster of these!

Re::O by Lurker2288 · 2008-06-06 12:40 · Score: 5, Insightful

What exactly do you find frightening about an automatic safety system doing exactly what it's supposed to in response to unusual input?

Re:Wow that is so funny by Anonymous Coward · 2008-06-06 12:41 · Score: 4, Insightful

Correct. It is not the better choice. In the foreseeable future, it is the only choice.

Re:No! by Anonymous Coward · 2008-06-06 12:43 · Score: 1, Insightful

Wow, way to parrot the summary.

Re:How could NRC even allow this in the first plac by Lurker2288 · 2008-06-06 12:44 · Score: 3, Funny

"GROSS NEGLIGENCE - Failure to use even the slightest amount of care in a way that shows Recklessness or willful disregard for the safety of others." - 'Lectric Law Library.

Yeah, those bastards, the way they used THE SLIGHTEST AMOUNT OF CARE in designing a system that shuts down in response to unexpected data so as to avoid RECKLESSNESS with the SAFETY OF OTHERS.

Only the biz machine was updated. Why trouble? by Ungrounded+Lightning · 2008-06-06 12:46 · Score: 5, Insightful

Secondly, the software update did not respect the data in the nuclear control system and synchronized it to new initial data in the update on the other system! Not a good idea. In critical safety systems, you always practice an update before actually doing one.

I have no problem with a computer on the process control subnet reporting information to a computer on the business subnet.

I have a BIG problem with a computer on the business subnet being able to modify and corrupt data in a computer on the process control subnet.

"I can't dump data to the business side" is a reason to make a log entry and maybe sound a minor alarm. It's not a reason to shut down the reactor (unless the data is needed for regulatory compliance and the process control side isn't able to buffer it until the business side is working correctly.)

But if a business subnet computer can tamper with something as critical as a process control machine's idea of the level of coolant in a reservoir, it rings my "design flaw" alarms.

Is it ONLY able to reset it to "empty" as a poorly-designed part of a communication restart sequence? Or could it also make the process control machine think the level was nominal when it WAS empty?

IMHO this should be examined more closely. It may have exposed a dangerous flaw in the software design.

Security flaws don't care if they're exercised by mischance or malice. If nothing else, this is a way to DoS a nuclear plant through a break-in on the business side of the net.
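The "I can't dump data to the business side" case can be kept harmless. Below is a minimal Python sketch of the buffering behaviour suggested above. It assumes a hypothetical `send` callable for the link to the business side that raises `ConnectionError` when that side is down. A failed export only logs a warning (the "minor alarm") and keeps the records queued. It has no path to any trip logic.

```python
import logging
from collections import deque

log = logging.getLogger("business_export")

class BusinessExporter:
    """Hypothetical process-control-side exporter: pushes records outward,
    buffers them while the business side is down, and never trips anything."""

    def __init__(self, send, max_buffered=10_000):
        self.send = send  # assumed to raise ConnectionError when the link is down
        self.buffer = deque(maxlen=max_buffered)  # oldest records drop off when full
        self.minor_alarm = False

    def export(self, record):
        self.buffer.append(record)
        self.flush()

    def flush(self):
        # Send oldest-first; stop at the first failure and keep the rest queued.
        while self.buffer:
            try:
                self.send(self.buffer[0])
            except ConnectionError:
                if not self.minor_alarm:
                    log.warning("business side unreachable, %d record(s) buffered",
                                len(self.buffer))
                self.minor_alarm = True  # a log entry and a minor alarm, nothing more
                return
            self.buffer.popleft()
        self.minor_alarm = False
```

The `maxlen` bound means the oldest records are silently dropped if an outage lasts long enough. If the data is needed for regulatory compliance, that bound would have to be sized for it, or replaced with durable storage.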

--
Bantam Dominique roosters crow a four-note song. Once you've heard it as "Happy BIRTHday" you can't NOT hear it that way

This is why... by rat7307 · 2008-06-06 12:47 · Score: 3, Interesting

This is why you keep the IT nerds away from the process network.

I've had a whole plant lose view of its system because some well-meaning retard in IT decided to push updates onto a SCADA system without qualifying the updates... never had it KILL the control side of things, though... well done, whoever you were, you've done well.

--
Burma?

Re:This is why... by rat7307 · 2008-06-06 12:51 · Score: 1

...although, after re-reading the story, it's a little vague... was he updating some random PC, or was he actually updating the SCADA/process control software/firmware?

If it's the latter, I feel for him :-) , but you have to do your homework before going all patch crazy!

--
Burma?
Re:This is why... by freedom_india · 2008-06-06 16:04 · Score: 1

No self-respecting IT nerd would suggest a Windows box for such crucial operations.
It must be some half-assed manager sent by management to "control" costs in a nuke plant because the fat cats couldn't get enough income to buy another yacht.
And that half-assed manager must have suggested, in typical PPT fashion, that replacing Solaris machines with Windows was cheaper by $1500 per seat.

--
"Doing what i can, with what i have." ~ Burt Gummer
Re:This is why... by rat7307 · 2008-06-06 20:39 · Score: 1

90%+ of all supervisory/SCADA systems run on Windows, sorry to shoot you down.

The controllers may run proprietary OSes, but nearly all the overarching architecture above the controller layer is Windows-based.

I've yet to see a widely used CURRENT safety management system or SCADA/hybrid DCS system that isn't running on a Windows platform.

There are still older systems out there, but a lot of them now run under emulation on Windows as the hardware gets rarer.

(I work for a reasonably big vendor in this field)

--
Burma?
Re:This is why... by bloobloo · 2008-06-07 06:20 · Score: 1

Foxboro I/A can be bought for either Windows or Solaris - at least that's what I'm using at the moment.

This was not a "fail-safe" incident by Drenaran · 2008-06-06 12:48 · Score: 5, Insightful

The problem here is that the system didn't shut down because it detected an error in the data collection system; instead, it incorrectly detected a problem that did not in fact exist and then proceeded to take action. While the engineer in me is fairly certain that the system is designed to always fail to a safe state (as in, any automatic emergency response couldn't accidentally make things worse, at least not without raising all sorts of alarms), it is still concerning that internal control systems can be so affected by external servers.

In the article they mention that the system wasn't designed for security (since it was meant to be internal), but this isn't a security issue at all! Any sort of system that relies upon other systems should be designed to assume that failure can and will occur in those other systems. That is not to say that it needs to verify/evaluate incoming data to make sure it is "good", but rather that it can tell the difference between receiving data (such as current water levels) and receiving no data at all (system failure). Once it has that, it can ideally switch automatically to a backup system, or do what it did here and enter a fail-safe state (the difference being that it does so while pointing out the actual problem, not an incorrectly perceived problem in a different part of the system).
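Here is a minimal Python sketch of the distinction between data and no data. It assumes a hypothetical level reading where `None` means "no data received" rather than "reservoir empty". It also assumes a backup source and a made-up trip setpoint. The plant still fails safe when both sources go silent, but the trip reason names the real problem.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class Action(Enum):
    RUN = "run"
    LOW_LEVEL_TRIP = "trip: low reservoir level"
    DATA_LOSS_TRIP = "trip: no level data from primary or backup"

@dataclass
class Reading:
    level_m: Optional[float]  # None means "no data received", never "empty"

LOW_LEVEL_M = 2.0  # hypothetical trip setpoint, for illustration only

def evaluate(primary: Reading, backup: Reading) -> Action:
    # Prefer the primary source; fall back to the backup if the primary is silent.
    source = primary if primary.level_m is not None else backup
    if source.level_m is None:
        # Still fail safe, but report the actual cause: lost data, not lost water.
        return Action.DATA_LOSS_TRIP
    if source.level_m < LOW_LEVEL_M:
        return Action.LOW_LEVEL_TRIP
    return Action.RUN
```

The key design choice is that "no data" stays a value of its own instead of defaulting to zero. A reset that blanks the data can then never masquerade as a low water level.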

Re:This was not a "fail-safe" incident by Dachannien · 2008-06-06 15:31 · Score: 2, Insightful

"instead it incorrectly detected a problem that did not in fact exist"

This might be splitting hairs, but I'd say it correctly detected a data inconsistency and responded appropriately. There could be a dangerous condition that is indistinguishable to the failsafe system from what actually happened, and it could be a condition that nobody's ever thought of before. It's far better to trigger the failsafe when a data inconsistency has occurred than to make a potentially incorrect automated judgment concerning the cause of the inconsistency, leading to a more severe problem down the road.
Re:This was not a "fail-safe" incident by bendodge · 2008-06-06 16:02 · Score: 1

I don't really agree that lack of sensor data should be taken lightly, but it is interesting to think that wrecking external systems could cause it to shut down. If an enemy can't make it overheat and meltdown, shutting it off is the second best option, esp. if it is powering military operations.

--
The government can't save you.
Re:This was not a "fail-safe" incident by rant64 · 2008-06-07 00:03 · Score: 1

What strikes me as odd is that this computer system, which can ultimately cause the shutdown of the facility, is a single point of failure. The control system relies on critical data from this single computer, so why isn't this designed as a majority set with three monitoring systems? That way, one component can fail, or be taken offline for maintenance, with the two others still providing reliable data. Only when two or more components start reporting different data, or no data at all, can you call it an emergency.

I know they do this to critical airplane systems. Why not a nuclear facility?
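The majority arrangement described above is usually called 2-out-of-3 (2oo3) voting. Here is a minimal Python sketch. It assumes each channel reports a float, or `None` when it is offline, plus a hypothetical agreement `tolerance`. Any two live channels that agree win. Otherwise the result is `None`, and the caller should treat that as an emergency.

```python
from itertools import combinations
from typing import Optional, Sequence

def vote_2oo3(readings: Sequence[Optional[float]], tolerance: float) -> Optional[float]:
    """Return an agreed value if at least two of the three channels agree
    within `tolerance`, or None if no two live channels agree."""
    if len(readings) != 3:
        raise ValueError("expected exactly three channels")
    # Channels reporting None are offline (e.g. out for maintenance).
    live = [r for r in readings if r is not None]
    for a, b in combinations(live, 2):
        if abs(a - b) <= tolerance:
            return (a + b) / 2  # the first agreeing pair wins
    return None  # fewer than two live channels agree: treat as an emergency
```

With one channel out for maintenance, the other two still carry the vote. A single rebooting monitor would therefore only register as one dissenting or silent channel.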

--
MMO Vampire Role Playing

Huh? by DerekLyons · 2008-06-06 12:49 · Score: 1

From TFA

In June 1999, a steel gas pipeline ruptured near Bellingham, Wash., killing two children and an 18-year-old, and injuring eight others. A subsequent investigation found that a computer failure just prior to the accident locked out the central control room operating the pipeline, preventing technicians from relieving pressure in the pipeline.

Huh? I've read the NTSB report on that accident - and nowhere in it (IIRC) are computers implicated. The accident occurred due to damage to the pipes from construction equipment.

Rereading the report pretty much confirms my recollection: the SCADA system was not implicated as a primary or contributory cause of the accident. The SCADA system was malfunctioning at the time of the accident, but it did not cause the overpressure, and it 'may' have allowed the operators to relieve pressure had it been functioning and had they observed the pressure spike. The rupture was caused by construction damage to the pipeline and a faulty relief valve.

Re:No! by maxume · 2008-06-06 12:49 · Score: 1

It's since been disconnected.

--
Nerd rage is the funniest rage.

Re:One begs the question by CaptainTux · 2008-06-06 12:49 · Score: 1

Remember when the Slammer worm hit the net a few years ago? There was an article in some defense newspaper I saw that mentioned that they were concerned about power generation and management facilities being hit by the worm. So, from that, I would say it's a reasonable assumption that the facility was running some version of Microsoft Windows (probably NT4 or 2000).

--
Anthony Papillion
Advanced Data Concepts, Inc.
"Quality Custom Software and IT Services"

Re:Wow that is so funny by Anonymous Coward · 2008-06-06 12:50 · Score: 5, Insightful

And a shutdown, while inconvenient, is not a catastrophe. In fact, it speaks well for the plant's safety that it did automatically shut down when faced with bad data.

Re:The problem is the update - not business networ by Ungrounded+Lightning · 2008-06-06 12:51 · Score: 1

We liken that to changing your tire while driving down the road.

Oh sure, NOW you think of a debian slogan ;}

Good thing it wasn't written in Smalltalk. The slogan there is building the rest of the boat while underway.

--
Bantam Dominique roosters crow a four-note song. Once you've heard it as "Happy BIRTHday" you can't NOT hear it that way

Where is the redundancy? by JSBiff · 2008-06-06 12:52 · Score: 1, Redundant

The thing I'm a bit puzzled about. . . if this system has data which is so important that the whole plant must be SHUT DOWN for two days if it fails, then why aren't there *at least* TWO of them (I'd say there's a good argument for 3 or 4, but. . .)? That way, you can take one out of the loop for updates, verify the update didn't hose your data, sync the data from the 'live' system, then put it online, take the other one offline, and complete the update on it.
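Here is a minimal Python sketch of that rolling-update procedure. It assumes hypothetical `Unit` objects with a version, a data dict and an online flag, plus a caller-supplied `verify` check. Each unit is updated only while another one is live. It is then verified, resynced from a live peer and returned to service.

```python
from dataclasses import dataclass, field

@dataclass
class Unit:
    name: str
    version: str
    data: dict = field(default_factory=dict)
    online: bool = True

def rolling_update(units, new_version, verify):
    """Update redundant units one at a time so at least one stays live.
    `verify` is a hypothetical post-update check supplied by the caller."""
    for unit in units:
        live_peers = [u for u in units if u is not unit and u.online]
        if not live_peers:
            raise RuntimeError("refusing to take the last live unit offline")
        unit.online = False                   # take it out of the loop
        unit.version = new_version            # apply the update
        if not verify(unit):
            raise RuntimeError(f"{unit.name}: update failed verification; left offline")
        unit.data = dict(live_peers[0].data)  # resync from the live system
        unit.online = True                    # put it back online
```

With only one unit, the function refuses to proceed rather than take the plant's only copy offline. That is exactly the single-system situation the post is worried about.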

If I were the power co owning this plant, I'd be ticked if the plant was dark for 2 days. With the price of energy these days, and the amount of energy a single nuclear plant can generate, you're talking some real serious cash when the thing is down for 2 days. Especially if I have to look forward to the same thing happening again, potentially every time our systems need updating (not that it would necessarily happen every time, and I would sure hope it wouldn't, but with only one system, every update is a potential for the whole plant to go down for some period of time).

Re:Where is the redundancy? by stephanruby · 2008-06-06 15:03 · Score: 1

If I were the power co owning this plant, I'd be ticked if the plant was dark for 2 days. With the price of energy these days, and the amount of energy a single Nuclear plant can generate, you're talking some real serious cash when the thing is down for 2 days.
Yeah, you'd be pissed, or you could be silently ecstatic too, either way it really depends on how your incentives are structured. If the Enron debacle taught us anything, it was that some power plants made more money for their owners, the more frequently their power plants failed, and the longer they failed.

Re:Obligatory by Alex+Belits · 2008-06-06 12:57 · Score: 1

No, but I have heard that frogs occasionally live there...

--
Contrary to the popular belief, there indeed is no God.

This was NOT a failure! by Anonymous Coward · 2008-06-06 13:00 · Score: 2, Insightful

Before there are too many retarded "OMG why was it on the business network!!!?LOL!??!" comments, I'll cover that right here:

It says the software is supposed to sync data between the control system and the business network. Obviously it has to be connected to both sides somehow. I'm not a power plant designer, but there's probably a good reason why people might need access to that data from the control system, and thus some kind of system acting as a safe bridge between the two rather than allowing unrestricted access from the business network.

The update f'd up and the control network went "Holy crap where did the cooling water go? Abort!" Everything worked like it was supposed to. The failure was caused by not testing the update in a lab environment before applying it to a live system.

Re:This was NOT a failure! by datajack · 2008-06-06 13:31 · Score: 1

Hmm .. not sure why the ops network would have to rely on such data sent from the business network. Monitoring of levels of important stuff is an ops function to my mind.

I'll admit that I'm too drunk to read TFA at the mo, so may have missed some detail :)
Re:This was NOT a failure! by mysidia · 2008-06-06 14:21 · Score: 1

It says the software is supposed to sync data between the control system and the business network. Obviously it has to be connected to both sides somehow.

The WTF here is that the ops network has any sort of data sync'ed from the business network.
If the data is out of sync, the data from the ops network should always win, and data from the business network simply cannot be trusted!

All the measurement devices should be on the ops network.

The business network's view of the ops network should be read-only.

I.E. there should be signal lines to send data from the ops network to the business network, but no method for the business network to send data to the ops network on an automated basis.

There are security ramifications in allowing any data to be uploaded to equipment on the ops network.
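Here is a minimal in-process Python sketch of that one-way arrangement, using hypothetical class and tag names. The ops side publishes read-only snapshots through `types.MappingProxyType`. The business-side sync only ever copies ops values over its own, so ops data wins any conflict. A real plant would enforce the direction at the network or hardware level (the "signal lines" above), which an in-process sketch cannot show.

```python
from types import MappingProxyType

class OpsHistorian:
    """Hypothetical ops-side store. The only path to the business side is
    publish(), which hands out a read-only snapshot; nothing flows back."""

    def __init__(self):
        self._values = {}

    def record(self, tag, value):
        # Only ops-side instrumentation calls this.
        self._values[tag] = value

    def publish(self):
        # Copy, then wrap read-only: the business side can read but not write.
        return MappingProxyType(dict(self._values))

def business_sync(business_copy: dict, snapshot) -> dict:
    """One-directional sync: where the two differ, the ops snapshot always wins."""
    merged = dict(business_copy)
    merged.update(snapshot)
    return merged
```

A stale or reset business-side value (say, a level of 0.0 after a reboot) is simply overwritten on the next sync. It is never pushed back to the ops side.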

It kinda worked then... by dindi · 2008-06-06 13:00 · Score: 3, Insightful

At least it did not turn it into a meltdown, so at least the safety features worked in the software.

That is definitely a glass half full, as opposed to empty.

MOD PARENT UP! by Lux · 2008-06-06 13:01 · Score: 4, Funny

He's trying to find an opportunity to bash Microsoft!

just to shortcircuit the nuclear hysteria by circletimessquare · 2008-06-06 13:06 · Score: 4, Informative

most freakouts surrounding nuclear power are based on 1960s technology. modern reactor designs, such as pebble bed reactors, are designed to be passively safe. that is, you can just walk away from them, doing nothing, and they will not release gas, go china syndrome, or anything else unsafe. older nuke tech requires active safety management: someone must always be on the job, making sure nothing f***s up. designing safety into nuclear reactor design from the philosophical ground up is the way of the future

--
intellectual property law is philosophically incoherent. it is your moral duty to ignore it or sabotage it

Re:just to shortcircuit the nuclear hysteria by dbIII · 2008-06-06 13:32 · Score: 5, Insightful

While that may be true, the first full-scale pebble bed prototypes are yet to go online; however, construction of several in China is at an advanced stage. As Superphénix showed with fast breeders, you really need a full-scale prototype to identify all of the problems (it was economic problems, not safety issues, that killed fast breeders).
India's accelerated thorium idea is also very promising.
The major problem I see with US nuclear power is the assumption that it is a solved problem, and almost zero has been spent on R&D for decades. The "new generation" of reactors from Westinghouse and others is little more than 1960s white elephants painted green.
Re:just to shortcircuit the nuclear hysteria by BlueParrot · 2008-06-06 22:42 · Score: 1

The major problem I see with US nuclear power is the assumption that it is a solved problem and almost zero has been spent on R&D for decades. The "new generation" of reactors from Westinghouse and others is little more than 1960's white elephants painted green.

This may be true when it comes to reactor engineering, but when it comes to the fuel cycle, pyroprocessing has been researched actively at ANL and INL even after the muppets shut down the IFR. Much of the work on pyroprocessing was allowed to proceed after the IFR shutdown, except for actinide recovery, though this research was later resumed under the Gen IV initiative as part of the Advanced Fuel Cycle Initiative program.

For those not too familiar with it, pyroprocessing is an alternative to aqueous reprocessing of nuclear waste, with a few advantages. In particular, since pyroprocessing is based on electrochemical reactions in molten salts rather than organic extraction agents in nitric acid, it doesn't use any water and thus doesn't produce liquid waste. So, in contrast to the British and French reprocessing plants, a pyroprocessing plant wouldn't have to discharge any radioactive water into the sea. Instead, the salt and fission products are immobilized in a zeolite, forming a ceramic that can be disposed of in geological repositories.

Furthermore, pyroprocessing can recover more than 99% of the long-lived actinide wastes (performance reported in 2007). If you destroy those in fast breeders, you are left with waste that decays to uranium levels of radioactivity within a few hundred years, meaning a repository like Yucca Mountain could more or less be guaranteed to contain it long enough.

Finally, if you did use breeders in conjunction with pyroprocessing then simply the amount of uranium left in the nuclear waste from previous reactors would be sufficient for hundreds of years of current energy consumption, thus eliminating the need for uranium mining, enrichment and a second waste repository.

There has been some political opposition to advanced nuclear technologies, but with oil set to breach $200 a barrel by the end of the year, and with both presidential candidates open to developing new nuclear technology, it appears likely that this research will continue for some time.
Re:just to shortcircuit the nuclear hysteria by dbIII · 2008-06-07 13:34 · Score: 1

The above is an advancement in reprocessing, and it sounds like an advance in waste disposal until you consider the point Synroc was at two decades ago. Even before then, the people in that project realised that mere encapsulation was a dead end due to problems with leaching. The cutting edge of nuclear waste disposal technology in the USA is two decades behind a poorly funded Australian project (which now works), just as reactor design is two decades or more behind South Africa (pebble bed), and more basic research on reactor concepts is two decades behind India (accelerated thorium). Nuclear research has been starved in the USA to the point where almost all of the advances are made elsewhere. To make things far worse, there is a powerful lobbying effort to build old plant, which does not work very well, as a mechanism to extract a lot of cash from the taxpayer.
Re:just to shortcircuit the nuclear hysteria by doom · 2008-06-08 02:34 · Score: 1

most freakouts surrounding nuclear power are based on 1960s technology. modern reactor designs, such as pebble bed reactors, are designed to be passively safe. that is, you can just walk away from them, doing nothing, and they will not release gas, go china syndrome, or anything else unsafe. older nuke tech requires active safety management: someone must always be on the job, making sure nothing f***s up. designing safety into nuclear reactor design from the philosophical ground up is the way of the future

You're fully in sync with the current state of pro-nuclear argument, as far as I can tell (note: I am also a pro-nuclear person), but personally, I've got problems with the line you're taking. The goal seems to be to both reassure people and to subtly flatter them -- yes, you people were absolutely right about nuclear power, but things are different now, the problems have been fixed.
I would argue that the truth is more like this: you people were in complete hysterics over nothing, and nuclear power has always been one of the safest ways of generating power (e.g. the worst incident in the US released nothing toxic and killed no one, while coal power spews toxic substances that kill thousands annually). The trouble is that it appears to be a complete impossibility to get the American people to admit to themselves that they got something wrong (e.g. look at the current delusional state surrounding the Iraq invasion), so it's probably more politic to press the "new technology" button.
The trouble is that this leaves a fundamental problem in place, and untouched: our collective intelligence is incredibly low. We need some way of improving the way we make decisions.
Re:just to shortcircuit the nuclear hysteria by QuantumPion · 2008-06-09 04:30 · Score: 1

There is another type of reactor that is passively safe, but uses existing light-water technology. It is called the ESBWR. The problem with pebble beds is that there is no industry experience with them, and the cost of designing, testing, researching, manufacturing, and licensing a whole new design is too costly to be worthwhile in the short-term.

The ESBWR is just as passively safe as pebble beds are, but it uses a more conventional BWR configuration. It uses natural circulation and has large pools of above-grade water for emergency cooling. The design is so robust that it requires no operator interaction for 72 hours in case of an emergency, and even after that time only minimal action is required.

Re:One begs the question by Viper+Daimao · 2008-06-06 13:09 · Score: 3, Informative

one begs the question...

No one doesn't

--
"In the game of life, someone always has to lose. To me, if life were fair, that someone would always be Oklahoma." -DKR

Re:Wow that is so funny by Anonymous Coward · 2008-06-06 13:10 · Score: 3, Insightful

Agreed. That was good software design to assume a worst-case scenario when the sensors stopped sending in data. The alternative (sending pager alerts or something) would be far worse.

Re::O by 3vi1 · 2008-06-06 13:18 · Score: 2, Funny

"What exactly do you find frightening about an automatic safety system doing exactly what it's supposed to in response to unusual input?"

The part where a reboot was required. That makes me worried that they were using Windows.

The chemical company I work for has VAX/Unix systems that haven't been rebooted in over four years... and only then because of power outages.

Re:One begs the question by badboy_tw2002 · 2008-06-06 13:20 · Score: 5, Funny

Good enough evidence for me! Microsoft caused a nuclear meltdown! Quickly, to the Blogo-Sphere!

Re:One begs the question by cjb658 · 2008-06-06 13:27 · Score: 1

Nuclear power plants run Windows NT?

No wonder we have such a N.I.M.B.Y. problem with them.

Reboot on the business network? by datajack · 2008-06-06 13:27 · Score: 1

At the nuc site I worked at, there were two networks: the business network and the ops network. Data flowed from the ops network to the business network for statistics gathering only. The single thing the business network did that affected operations and safety (regardless of my boss' attempts to justify budget) was the generation of work orders. A total failure of the business network would, at worst, result in a routine observation job being missed, which would cause the systems on the ops network to detect a 'fault' and bring the reactor away from criticality.

Yes, a simple software fault can 'shut down' a nuclear plant. These things are designed to 'trip' and shut down automatically at the slightest thing going wrong. The most advanced and safest nuc plant in the UK (SXB) does, or at least did, trip once a month or more.

Get a voltmeter that is sensitive to a thousandth of a volt, and allow it to shut down your house when its input is not ideal. Give yourself three thousandths of a volt either way off 'normal', and you may start to experience the ridiculous measures a modern nuc plant puts itself under.

Business Network? by camperdave · 2008-06-06 13:31 · Score: 4, Interesting

The business computers should not be connected to the control network.

From the summary:

The computer in question was used to monitor chemical and diagnostic data from one of the facility's primary control systems...
... when the updated computer rebooted, it reset the data on the control system...

If it's monitoring the primary control system then it seems to me like the machine would have to be on the control network. The real issue is why did the primary control system accept a reset from a monitoring system. It sounds like there's more than one bug to track down.

--
When our name is on the back of your car, we're behind you all the way!

Re:Business Network? by VENONA · 2008-06-06 18:57 · Score: 2, Informative

"The real issue is why did the primary control system accept a reset..."

That was my first thought, too: a huge separation of privilege flaw. But, from TFA, "...when the updated computer rebooted, it reset the data on the control system, causing safety systems to errantly interpret the lack of data as a drop in water reservoirs that cool the plant's radioactive nuclear fuel rods."

So it's not that the system on the control side accepted a restart command that it shouldn't have. I'm not saying there's no problem--just that this wasn't the failure mode.

But it does make you wonder what else is wrong at this place, doesn't it?

TFA has a link to a Government Accountability Office paper on problems they found with TVA, which operates multiple reactors. Have a look at http://www.gao.gov/new.items/d08526.pdf to see a real mess. It's 62 pages of badness, but just reading page 2, "Results in Brief," will give most people the twitching horrors.

Password issues, bypassed firewalls, unpatched systems, limited logging, limited IDS, configuration management policy problems, physical security and training problems, etc. Apparently TVA has left no stone unturned in their efforts to fail an audit.

--
What you do with a computer does not constitute the whole of computing.

Fred? What's wrong with your keyboard? by 1310nm · 2008-06-06 13:31 · Score: 1

^c^c^c

Re::O by afidel · 2008-06-06 13:34 · Score: 4, Insightful

I have quite a few Windows 2003 servers that haven't been rebooted since August 2006, when we upgraded our computer room to a small datacenter (we went from a single busline and a constantly breaking AC unit to dual UPSes powered by separate generators and dual chillers with separate condensers). It's not like it's impossible to get good uptimes on Windows. The only servers we reboot on a regular basis are our Citrix servers, due to some bad code on Citrix's part that leaks memory over time, and our Oracle server, due to a bug where 10gR2 pulls time from the deprecated tick counter (the same one that used to crash Windows 9x), which rolls over after ~49.7 days. Both of those are the result of poor third-party coding, not bugs in Windows.
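For context, a 32-bit millisecond tick counter wraps after 2^32 ms, roughly 49.7 days. Here is a minimal Python sketch of that arithmetic, simulating such a counter with plain integers. It also shows the usual fix for code that measures elapsed time across the wrap: modular subtraction instead of a naive difference.

```python
WRAP = 2 ** 32  # a 32-bit unsigned millisecond counter wraps back to 0 here

def days_until_wrap() -> float:
    # 2**32 ms converted to days: about 49.7
    return WRAP / 1000 / 86_400

def naive_elapsed_ms(start: int, now: int) -> int:
    # The buggy version: goes hugely negative once the counter wraps.
    return now - start

def elapsed_ms(start: int, now: int) -> int:
    """Elapsed ticks between two counter readings, correct across a single wrap."""
    return (now - start) % WRAP
```

For example, a reading taken 10 ms before the wrap and another taken 5 ms after it give 15 ms with `elapsed_ms`, but a large negative number with `naive_elapsed_ms`.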

--
There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.

Re:Only the biz machine was updated. Why trouble? by Platinumrat · 2008-06-06 13:35 · Score: 4, Interesting

Secondly, the software update did not respect the data in the nuclear control system and synchronized it to new initial data in the update on the other system! Not a good idea. In critical safety systems, you always practice an update before actually doing one.

I have no problem with a computer on the process control subnet reporting information to a computer on the business subnet.

I have a BIG problem with a computer on the business subnet being able to modify and corrupt data in a computer on the process control subnet.

"I can't dump data to the business side" is a reason to make a log entry and maybe sound a minor alarm. It's not a reason to shut down the reactor (unless the data is needed for regulatory compliance and the process control side isn't able to buffer it until the business side is working correctly.)

But if a business subnet computer can tamper with something as critical as a process control machine's idea of the level of coolant in a reservoir, it rings my "design flaw" alarms.

Is it ONLY able to reset it to "empty" as a poorly-designed part of a communication restart sequence? Or could it also make the process control machine think the level was nominal when it WAS empty?

IMHO this should be examined more closely. It may have exposed a dangerous flaw in the software design.

Security flaws don't care if they're exercised by mischance or malice. If nothing else, this is a way to DoS a nuclear plant through a break-in on the business side of the net.

I agree with the previous post. In railway signalling (at least outside of the USA), formal safety processes must be followed for software design and configuration. Part of that is a formal hazard analysis. Various Safety Integrity Levels (SILs) are applied to the different control and monitoring components, from SIL-0 (lowest) to SIL-4 (stuff that can kill people if it goes wrong).
There is no condition under which it is even acceptable for a business system to feed vital sensor data to the control system. A hazard analysis should always be performed when making any changes to a control system, and at that point this sort of thing should have been detected.
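Here is a minimal Python sketch of the kind of rule such a hazard analysis might encode. It assumes a hypothetical model where each component carries a SIL from 0 to 4 and each data link is marked vital or not. It flags any vital link fed by a lower-integrity source. It is only an illustration of the "business system must not feed vital data" rule, not a substitute for a formal hazard analysis.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Component:
    name: str
    sil: int  # 0 = not safety-related ... 4 = failure can kill people

def vital_link_hazards(links):
    """Return a finding for every vital data link whose source has a lower SIL
    than the component consuming it. `links` holds (source, consumer, is_vital)."""
    findings = []
    for source, consumer, is_vital in links:
        if is_vital and source.sil < consumer.sil:
            findings.append(
                f"{source.name} (SIL-{source.sil}) feeds vital data to "
                f"{consumer.name} (SIL-{consumer.sil})"
            )
    return findings
```

Non-vital links, such as control-side data reported outward to a business machine, pass the check. That matches the "reporting is fine, feeding vital data is not" position above.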

Re:Wow that is so funny by zappepcs · 2008-06-06 13:36 · Score: 4, Insightful

I'd go just a bit further and say that it speaks well for the software coders. There are at least three ways to treat any 'out of bounds' condition. They chose to make sure that the safe action was chosen.

This is an area where loosely controlled teamwork gets into trouble, unless all coders treat data passed to and from their code in the same uniform way.

It also makes me wonder how the code will react to certain malicious software, should it get loose in the facility. If I were writing code to destroy a nuclear facility, the way data is passed from one process to another is one of the places I would definitely attack, along with other vulnerable spots.

It is sort of reassuring to have seen a failure result in a controlled shutdown rather than some other, more undesirable action.

--
Support NYCountryLawyer RIAA vs People

Re:The problem is the update - not business networ by cjb658 · 2008-06-06 13:37 · Score: 2, Funny

The problem here is that the software update wiped out the nuclear control system data. Maybe it didn't pass WGA?

damned if you do... by Phantom+of+the+Opera · 2008-06-06 13:41 · Score: 3, Insightful

Scenario : System comes up. Things don't work quite right. Some configurations are tweaked and system is now working fine.

Reboot. The tweaked configurations happen to go away. No one remembers which ones they were. The system is b0rked for a while.

I would hope that isn't the case for that system, but I have seen it happen before.

Re:damned if you do... by TheUnknownCoder · 2008-06-06 16:28 · Score: 1

I would hope that isn't the case for that system, but I have seen it happen before.

In a nuclear plant? No, you did not.

--
Uncopyrightable: The longest word you can write without repeating a letter.
Re:damned if you do... by Phantom+of+the+Opera · 2008-06-07 08:55 · Score: 1

I would hope that isn't the case for that system, but I have seen it happen before. In a nuclear plant? No you did not.

Goodness me, no! Everyone knows that nuclear plants are immune from generalization.

....and... by JustNiz · 2008-06-06 13:46 · Score: 1

....you can just imagine that, like most companies, their business network is all MS Windows boxes that also have internet access, so it is completely vulnerable to outside hacking too.

If this hadn't happened, it would probably only have been a matter of time before some hacker chanced upon the fact that they could actually control the nuclear facility from some compromised Windows box.

It's amazing that these days some sysadmins/network admins still don't get it.
Let's just hope that this incident is enough to get them fired and for the company to hire people who know enough to make the system properly secure.

Re:....and... by maxume · 2008-06-06 14:37 · Score: 1

You are dramatically overstating the consequences of this going undetected. Some hacker could have chanced upon it, triggering a safety shutdown and causing them to diagnose and correct the issue. That doesn't mean that there are no other problems, but this problem was handled safely, merely at a greater cost than necessary.

--
Nerd rage is the funniest rage.

Every 108 minutes.... by PPH · 2008-06-06 13:49 · Score: 2, Funny

... enter 4, 8, 15, 16, 23, 42.

Or else all hell breaks loose.

--
Have gnu, will travel.

Re:Wow that is so funny by Wo1ke · 2008-06-06 13:50 · Score: 5, Insightful

Yeah, so when a sensor breaks and stops sending in data, it'll keep running like usual, with maybe a small error code in the background. Cause, you know, that's how we want nuclear fucking powerplants to work.

Couldn't have been Vista... by julie-h · 2008-06-06 13:52 · Score: 1

... because then the computer would enter an infinite loop of reboots after the update.

Re:"King-size Homer" season 7 episode 7, Nov 5, 19 by LM741N · 2008-06-06 13:53 · Score: 1

Dohh! You beat me to this!! Well, I'll have a donut and console myself.

Hey! by Anonymous Coward · 2008-06-06 13:55 · Score: 1, Funny

Who marked the parent troll? It's true that "twitter" is never mentioned in the thread and that this is what makes all those other threads "sockpuppety".

http://slashdot.org/comments.pl?sid=573665&cid=23655635

Re::O by AnotherBrian · 2008-06-06 14:00 · Score: 1

It sounds like the computer responsible for sending data about the water level in one of the cooling tanks had to be rebooted. When the safety systems noticed the lost connection, they assumed a problem with a critical part of the reactor and performed a controlled shutdown. This is really exactly what should have happened.

Perhaps in the future the safety systems could be designed to accept a 5-minute, operator-initiated override on systems that don't require immediate feedback on failure. Maybe they do already and someone forgot to set it.
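A rough Python sketch of the override idea, assuming the caller supplies timestamps in seconds (for example from `time.monotonic()`). A silent data link trips the watchdog unless an operator has started a bypass, and any requested bypass is clamped to 5 minutes. Class and method names are hypothetical; whether a real protection system may be bypassed at all is set by the plant's own rules, not by code like this.

```python
from typing import Optional


class LinkWatchdog:
    """Trips when a data link goes silent, unless a timed operator bypass is active.

    Times are plain seconds supplied by the caller, which keeps the sketch
    testable. All names and values are illustrative.
    """

    def __init__(self, timeout_s: float, max_bypass_s: float = 300.0):
        self.timeout_s = timeout_s
        self.max_bypass_s = max_bypass_s  # 5 minutes, per the suggestion above
        self.last_heartbeat: Optional[float] = None
        self.bypass_until: Optional[float] = None

    def heartbeat(self, now: float) -> None:
        self.last_heartbeat = now

    def start_bypass(self, now: float, duration_s: float) -> None:
        # Clamp the request so an operator can never bypass for longer than allowed.
        self.bypass_until = now + min(duration_s, self.max_bypass_s)

    def should_trip(self, now: float) -> bool:
        if self.bypass_until is not None and now < self.bypass_until:
            return False  # bypass active: tolerate the outage for now
        if self.last_heartbeat is None:
            return True  # never heard from the link: fail safe
        return now - self.last_heartbeat > self.timeout_s
```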

Re:Wow that is so funny by archkittens · 2008-06-06 14:01 · Score: 2, Funny

Cause, you know, that's how we want nuclear fucking powerplants to work.

Why go with that when "over three million men trust natural male enhancement"? Coal fucking powerplants are dirty, it's true, but many consider the act itself dirty. I don't see a problem.

0 is not a substitute for "no data" by mi · 2008-06-06 14:08 · Score: 1

interpret the lack of data as a drop in water reservoirs

If I had a dime for every function that says "There is 0 foo" when it really means "I don't know how much foo there is", I'd be a millionaire...
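One way to avoid the "0 foo" problem, sketched in Python under the assumption that raw readings arrive as strings: return `None` for "unknown" (including empty input, unparseable text and NaN), and let the safety logic trip with a distinct, loggable reason for missing data versus a genuinely low level. The function names, `TripReason` values and the limit are illustrative only.

```python
import math
from enum import Enum
from typing import Optional


class TripReason(Enum):
    NONE = "none"
    DATA_MISSING = "data missing"
    LOW_LEVEL = "low level"


def parse_level(raw: Optional[str]) -> Optional[float]:
    """Return the tank level, or None when it is unknown. Never a fake 0."""
    if raw is None or not raw.strip():
        return None
    try:
        value = float(raw)
    except ValueError:
        return None
    return value if math.isfinite(value) else None  # NaN/inf count as unknown


def trip_reason(level: Optional[float], low_limit: float) -> TripReason:
    # Unknown and low both trip, but for different reasons that can be logged.
    if level is None:
        return TripReason.DATA_MISSING
    if level < low_limit:
        return TripReason.LOW_LEVEL
    return TripReason.NONE
```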

--
In Soviet Washington the swamp drains you.

Re:0 is not a substitute for "no data" by rdebath · 2008-06-06 20:02 · Score: 1

If I got NaN when I asked "how much coolant is there for the reactor" I'd shut the fucker down too!
I might ask "What! How much?" first, but my finger would be hovering as I did.

Re:Only the biz machine was updated. Why trouble? by Anonymous Coward · 2008-06-06 14:10 · Score: 1, Insightful

Subnet? It should not even have been on the same physical network!

Re:One begs the question by conan1989 · 2008-06-06 14:11 · Score: 1

could have been WinME

Re:Wow that is so funny by NuclearError · 2008-06-06 14:13 · Score: 1

As they say in reactor operations, "When in doubt, SCRAM it out."

--
Nuclear engineers build weapons. Civil engineers build targets.

Re:One begs the question by ppanon · 2008-06-06 14:14 · Score: 2, Insightful

Yeah, sure. When somebody screws up an expression in a way that makes no sense, we should just accept it. In addition, since people on Slashdot constantly misuse pairs of homonyms like then/than, effect/affect, their/they're, we should just ignore historical usage differences and use them interchangeably. We should just accept sloppiness and mediocrity because that's how Western civilization was built.

Then again, maybe intelligent and well-educated people will just ignore people who aren't intelligent enough or who can't be bothered to learn how to properly communicate. The medium is the message, and a badly-formed message says to the recipient either "I don't care enough about talking to you to take the time to say it properly" or "the content of the message can't be that great if I can't be educated enough to learn to express it well enough".

I don't get out of the way for subgeniuses.

--
Laissez lire, et laissez danser; ces deux amusements ne feront jamais de mal au monde. - Voltaire

Re::O by ozmanjusri · 2008-06-06 14:17 · Score: 1

You don't think computers connected to a nuclear power plant's control system should be fully patched?

--
"I've got more toys than Teruhisa Kitahara."

ah by im_not_jose · 2008-06-06 14:26 · Score: 1

ah, the wonders of try/catch.

Re:One begs the question by courseofhumanevents · 2008-06-06 14:27 · Score: 1

"Begs the question" makes perfect sense when used as "raises the question," and it's understood by the majority of those who would hear it. This is my criterion for a functional phrase. Who gives an ass if it also happens to be a name for a logical fallacy? The contextual differences should be enough to immediately determine whether it's meant in the logic sense or as a phrase. Complaining about it is nothing but misguided pedantry.

Re:One begs the question by courseofhumanevents · 2008-06-06 14:35 · Score: 1

The problem with your logic here is that using "begs the question" outside of its meaning as a logical fallacy is not actually an error, but a literal interpretation of the phrase. Using it in this sense doesn't actually do any harm to its original sense.

Re::O by afidel · 2008-06-06 14:48 · Score: 2, Informative

Probably not; they should be air-gapped, with tight control over access to the network they sit on. I don't like the idea of SCADA systems being on a shared network to begin with. In fact, there's speculation that several recent incidents nationwide were due to systems on the shared network being compromised by targeted attacks from China. That may be conspiracy theory speculation, but I've seen it discussed enough on serious network security boards that I'm starting to wonder if there isn't some ring of truth to it.

Patching for patching's sake is an IT fetish that leads to more problems than it solves just as often as not. In fact, the only problem I've had in the last two years that caused any significant client disruption was caused by a bad DAT update (patch) to our AV software.

--
There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.

Time for someone to review the shutdown system by ZombieEngineer · 2008-06-06 14:50 · Score: 2, Insightful

Something is not right here...

Yes, the safety system kicking in is "a good thing".

Pulling data from another computer system for a safety related control system is not a bright idea (the weakest link problem).

Historically, for a safety control system in an oil & gas environment, all the inputs to the safety system are either hardwired or pulled from another safety system controller that has the appropriate level of redundancy (CPU boards and communication paths with communication watchdog timers).

Even transmitters in some circumstances cannot be trusted, hence 2-out-of-3 voting systems (take three transmitters measuring the same value and pick the middle of the three; if one of the transmitters fails high or low, your choice will still be the safe option).
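The transmitter voting described above (often called mid-value selection) can be sketched in a few lines of Python. The function names and the deviation tolerance are illustrative assumptions; the communication watchdog timers from the previous paragraph are not modelled here.

```python
from typing import List, Sequence


def mid_value_select(readings: Sequence[float]) -> float:
    """Return the middle of three redundant transmitter readings.

    If a single transmitter fails high or low, it ends up at one end of the
    sorted list, so the selected value comes from a healthy transmitter.
    """
    if len(readings) != 3:
        raise ValueError("expected exactly three readings")
    return sorted(readings)[1]


def deviating_transmitters(readings: Sequence[float], tolerance: float) -> List[int]:
    """Indices of transmitters that differ from the selected value by more than `tolerance`."""
    selected = mid_value_select(readings)
    return [i for i, r in enumerate(readings) if abs(r - selected) > tolerance]
```

The second function is one way to turn the vote into a maintenance alarm: the failed transmitter is outvoted for control purposes but still flagged for repair.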

Someone needs a serious think about where this plant is getting data for its safety shutdown system.

ZombieEngineer

Re:One begs the question by daybot · 2008-06-06 15:00 · Score: 1, Flamebait

one begs the question...

No, one doesn't. I've managed to restrain myself from marking this as Troll. The Wikipedia article you link to clearly explains that it's a contentious issue: language is in the hands of the people, and all my life I've heard this term used to mean "raises the question". Anyone who argues against this usage is at least 20 years too late.

Re:One begs the question by daybot · 2008-06-06 15:06 · Score: 1

...we should just ignore historical usage differences and use them interchangeably. We should just accept sloppiness and mediocrity because that's how Western civilization was built.

I think you'll find it's civilisation.

Re:Wow that is so funny by spikedvodka · 2008-06-06 15:06 · Score: 4, Insightful

It's not a nuclear power plant, but still, my network...

I've set Nagios up to monitor my network, and any loss of signal is considered CRITICAL: not just a warning, but critical... and I need to know right then.
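For reference, Nagios plugins report state through their exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) plus a line of status text. A small Python sketch of a check that maps loss of signal straight to CRITICAL, as described above; the probe itself is not shown, and the `classify` helper and 200 ms warning threshold are assumptions for illustration.

```python
import sys
from typing import Optional, Tuple

# Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def classify(latency_ms: Optional[float], warn_ms: float = 200.0) -> Tuple[int, str]:
    """Map one probe result to a Nagios state and status line.

    latency_ms is None when the probe got no reply at all (loss of signal),
    which goes straight to CRITICAL instead of WARNING.
    """
    if latency_ms is None:
        return CRITICAL, "CRITICAL - no signal"
    if latency_ms > warn_ms:
        return WARNING, f"WARNING - latency {latency_ms:.0f} ms"
    return OK, f"OK - latency {latency_ms:.0f} ms"


def report(latency_ms: Optional[float]) -> None:
    """Print the status line and exit with the matching code, as a plugin would."""
    code, message = classify(latency_ms)
    print(message)
    sys.exit(code)
```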

--
I will not give in to the terrorists. I will not become fearful.

Time for a HAZOP (Hazard and Operability) study by ZombieEngineer · 2008-06-06 15:08 · Score: 1

If this was a chemical plant I would be asking to see the HAZOP (http://en.wikipedia.org/wiki/Hazop) reports.

HAZOP studies are seriously mind-numbing exercises in systematically identifying every possible operational hazard. For each hazard identified, a mitigation action needs to be implemented. The resulting mitigation actions then themselves need to be run through the HAZOP process.

It should be fairly obvious that this is a recursive process, and that modern chemical plant designs favor simple, intrinsically safe methods that don't require a complicated control scheme; otherwise the design engineer is condemned to reviewing the safety of the plant until doomsday.
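The recursive part (every mitigation goes back through the review) behaves like a worklist. A toy Python sketch, assuming the study's findings are given as a lookup table; a real HAZOP is a team exercise driven by guidewords, so the table and the `max_rounds` cap here are illustrative only.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple


def hazop_review(start: List[str], mitigations: Dict[str, List[str]], max_rounds: int = 10) -> List[str]:
    """Review items breadth-first; every mitigation is itself queued for review.

    `mitigations` maps a hazard or mitigation to the new mitigations it
    requires (a stand-in for a study team's findings). Returns items in the
    order reviewed; raises if the chain is still growing after max_rounds.
    """
    reviewed: List[str] = []
    seen = set(start)
    queue: Deque[Tuple[str, int]] = deque((item, 0) for item in start)
    while queue:
        item, depth = queue.popleft()
        if depth >= max_rounds:
            raise RuntimeError(f"review did not converge: {item!r} at depth {depth}")
        reviewed.append(item)
        for follow_up in mitigations.get(item, []):
            if follow_up not in seen:  # each item is reviewed once
                seen.add(follow_up)
                queue.append((follow_up, depth + 1))
    return reviewed
```

The deeper the chain of "mitigation introduces a new hazard", the longer the review runs, which is the argument above for simple, intrinsically safe designs.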

ZombieEngineer

Re:Wow that is so funny by dryeo · 2008-06-06 15:09 · Score: 1

I'd guess that there are multiple redundant sensors and that when one goes out of range you'd get an error about a bad sensor. In this case, where it is actually a software problem, all sensors went out of range, which points to a real problem being likely, so shutting down is the safe move.
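A Python sketch of the reasoning above, with one added assumption: the comment doesn't say what happens when some but not all sensors are out of range, so here a majority out of range is treated as a real process problem and a minority as a bad sensor. The `Verdict` names and limits are hypothetical.

```python
from enum import Enum
from typing import Sequence


class Verdict(Enum):
    NORMAL = "normal"
    BAD_SENSOR = "bad sensor"  # raise a maintenance alarm, keep running
    SHUTDOWN = "shutdown"      # most channels agree something is wrong


def assess(readings: Sequence[float], low: float, high: float) -> Verdict:
    if not readings:
        return Verdict.SHUTDOWN  # no data at all: fail safe
    # A negated range check also counts NaN readings as out of range.
    bad = sum(1 for r in readings if not (low <= r <= high))
    if bad == 0:
        return Verdict.NORMAL
    if 2 * bad > len(readings):  # majority out of range: assume the process, not the sensors
        return Verdict.SHUTDOWN
    return Verdict.BAD_SENSOR
```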

--
https://en.wikipedia.org/wiki/Inverted_totalitarianism

Re:One begs the question by courseofhumanevents · 2008-06-06 15:14 · Score: 1

Unfortunately for your argument, there is value in "beg the question" as "raise the question," and language already has moved in that direction.

We were lucky to get through the danger years by Descartes123 · 2008-06-06 15:17 · Score: 1

Just 5 years ago, controls engineers wouldn't breathe a word about how vulnerable the world was (and maybe still is). There are special computers called PLCs (Programmable Logic Controllers) that control just about everything in this world, from factories to power plants to waste management facilities. They are the brains of all automation. They are also all connected to computers, and those computers are all networked on LANs. And in the past, those computers were every bit as likely to get viruses as anyone else's computer. The fortunate thing is that no one who ever wrote a virus bothered to write one that would mess with the logic in a PLC. It would have been so easy. In fact, it still is. 99% of all the PLCs in the world are connected to computers in an unsecured fashion. If a virus in a PC were to write randomly into PLC memory, whatever that PLC was controlling would come to an uncontrolled halt. Engineers would never figure out why all the processors were crashing; diagnostics don't exist to monitor this kind of attack. In the days of highly potent viruses like Nimda, it wasn't uncommon for scores of computers that connected to all the PLCs in a given facility to be infected. If Nimda or its kind had carried a PLC-targeting payload, we would have seen a disaster much greater than the biggest doomsayers ever predicted from the millennium bug.

Re:We were lucky to get through the danger years by NerveGas · 2008-06-06 18:40 · Score: 1

I know several people who work or have worked in factory automation, underwater ROVs, and other areas like that. These days, the computers controlling the machinery often are just PCs (well - expensive, small, industrially-cased PCs, but PCs nonetheless), running any of the standard operating systems. No need for special virii or exploits, just the same old regular ones.

Being able to tap into the vast numbers of programmers who can program for Linux or Windows means that the companies have a lot more options for hiring people at lower wages, and might be able to crank out a somewhat-working solution faster than their competitors at a lower cost. And nuclear safety be damned, the lowest bid is the lowest bid, right?

--
Oh, you're not stuck, you're just unable to let go of the onion rings.

ISO 9000!! by arthurpaliden · 2008-06-06 15:18 · Score: 1

But how did that ISO standard get approved? Microsoft OOXML as a standard springs to mind.

--
Undetectable Steganography? Yep, there's an app fo

Such systems already exist. by ZombieEngineer · 2008-06-06 15:20 · Score: 3, Interesting

Safety control systems in the chemical industry have been used for 20+ years. These systems have:
- redundant CPU modules (which can be hot plugged)
- redundant IO modules (which can be hot plugged)
- redundant communication systems
- self diagnostics (can detect a failed output transistor)
- internal diagnostics (CPU voting to detect a failed CPU core)
- standard algorithms for redundant transmitters

Shutting down is the "safer option"; however, there are still risks (such as thermal stressing of pipework). It is a lesser-of-two-evils problem.

This stuff is bread & butter for the chemical industry; there are a number of control companies that refuse to deal with the nuclear industry due to the requirement for unlimited indemnity.

ZombieEngineer

Re:One begs the question by ortholattice · 2008-06-06 15:22 · Score: 1

In addition, since people on Slashdot constantly misuse pairs of homonyms like then/than, effect/affect, their/they're, we should just ignore historical usage differences and use them interchangeably.

Don't forget its/it's and their/there. But my vote would go to lose/loose - allowing the latter as an alternate spelling of the former would go a long way towards making Slashdot an exemplary display of the grammatical/spelling skills of geeks.

Re:Wow that is so funny by profplump · 2008-06-06 15:22 · Score: 4, Informative

The system as a whole *did* know the reading was bogus. The control/safety system shut down because it stopped getting "safe" indications from the monitoring/input system. It seems pretty clear that the input system itself correctly logged the reason for the error.

The interface to the control system for the tank level doesn't (or at least shouldn't) have an entire separate "error" parameter -- it probably takes a simple numeric value from the input system.

The input software knows when the readings are bogus or missing. In that case it either stops sending input, which would presumably trigger a watchdog in the control system, or it sends data that indicates a worst-case scenario, with which the control system can do whatever it does in a worst-case scenario.

The control system itself doesn't care why safe input parameters may be missing; it only cares that it cannot rely on the input it needs for safe operation. Giving it any more information just adds code and interface complexity to safety-critical software.

Here's the system as implemented:
    level = tank.getLevel()
    if (level < SANE_MIN || level > SANE_MAX)
        level = 0
    control.input.set(TANK_LEVEL, level)

Here's the system you describe:
    error = 1
    level = tank.getLevel()
    if (level >= SANE_MIN && level <= SANE_MAX)
        error = 0
    control.input.set(TANK_LEVEL, level, error)

The latter makes the safety-critical control software more complex, with more test cases and more input parameters, none of which add any value to the safe operation of the control system. The error parameter potentially allows for operation during transient errors, but that's a decision you can make in other ways, without adding interface complexity.
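A runnable Python version of the two interfaces compared above, assuming made-up limits (`SANE_MIN`, `SANE_MAX`, `LOW_LEVEL_TRIP`) since the real ones aren't given. The range checks are written as negated `min <= x <= max` tests so that a NaN reading is also treated as insane.

```python
from typing import Tuple

# Illustrative limits only; the real values are not given in the article.
SANE_MIN, SANE_MAX = 0.0, 100.0
LOW_LEVEL_TRIP = 20.0


def input_single_value(raw: float) -> float:
    """'As implemented': an insane reading is replaced by 0, the worst case."""
    if not (SANE_MIN <= raw <= SANE_MAX):  # also true for NaN
        return 0.0
    return raw


def input_with_error_flag(raw: float) -> Tuple[float, bool]:
    """The alternative: pass the raw value plus a separate error flag."""
    error = not (SANE_MIN <= raw <= SANE_MAX)
    return raw, error


def control_trips_single(level: float) -> bool:
    # One input, one comparison: a worst-case value and a real low level trip alike.
    return level < LOW_LEVEL_TRIP


def control_trips_flagged(level: float, error: bool) -> bool:
    # Two inputs: the control side must also define and test what the flag means.
    return error or level < LOW_LEVEL_TRIP
```

Both versions trip on a bogus reading; the difference is only in how many inputs the control side has to handle and test, which is the trade-off being argued here.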

The only inconvenience of the simpler interface is that you have to check the logs from the input device in addition to the control device to determine why the error occurred. And please don't argue that consolidated error logging is worth extra code complexity -- that's probably not even true in a web app, let alone a human-safety control system.

Re:This was Good -Bull by Dachannien · 2008-06-06 15:34 · Score: 1

I'm really sad you posted AC, because I'm dying to hear some cool shop stories about nuclear reactors scramming.

Re:Wow that is so funny by fallungus · 2008-06-06 15:36 · Score: 3, Funny

You can't put too much water in a nuclear reactor.

--
You call this a sig?

Re:Wow that is so funny by iminplaya · 2008-06-06 15:38 · Score: 1

It is sort of reassuring to have seen a failure result in a controlled shutdown rather than some other, more undesirable action.

You're not kidding!

--
What?

Water by Grocks · 2008-06-06 15:38 · Score: 2, Informative

You can't put too much water into a nuclear reactor

Re:Water by Anonymous Coward · 2008-06-06 17:45 · Score: 1, Interesting

You can't put too much water into a nuclear reactor

Actually, you can, particularly in a PWR. If the turbine trips, or any other kind of loss of heatsink accident occurs, the primary loop coolant will initially heat up and expand. Without a gas volume to buffer the resulting pressure increase, the piping would burst. To make an automotive analogy (which I'm sure the typical /. user will appreciate), it would be like putting very rigid shocks and springs on a car. This is the exact reason why the operators at TMI initially turned off the Emergency Core Cooling System. They saw pressurizer water level rising and were concerned that the pressurizer would "go solid". Since the pressurizer is physically located above the pressure vessel, they assumed (wrongly) that the core was covered and turned off ECCS.

Also, if the water level is too high in the steam generator (or pressure vessel, in the case of a BWR), you will get water droplets mixed in with the steam going to the turbines. This is a good way to damage turbine blades.

Third, if you're concerned about maintaining a BWR subcritical, you shouldn't let the water level get too high. The water surrounding the core acts as a reflector, decreasing neutron leakage. So, higher water level leads to increased reactivity. In fact, my recollection is that, in some cases, the emergency operating procedures suggest lowering the water level in order to control reactivity.

On a different note, the reason this incident is somewhat concerning (to me, at least) is that the logic for the reactor protection system is supposed to be not only fail-safe but also fault-tolerant. There are typically four independent channels, and the logic to actually get a scram is ((A || B) && (C || D)). So the question is, how did one computer failure cause multiple, supposedly independent channels to indicate a scram condition?
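The voting logic quoted above can be checked exhaustively in a few lines of Python. The sketch confirms that no single channel can cause or block a scram on its own, and that at least two channels (one from each pair) must trip, which is why a single computer failure producing a scram points to something common to several channels. Channel names follow the comment; everything else is illustrative.

```python
from itertools import product


def scram(a: bool, b: bool, c: bool, d: bool) -> bool:
    """Trip only if (A or B) and (C or D)."""
    return (a or b) and (c or d)


def single_channel_spurious_trip_possible() -> bool:
    """True if any one channel tripping on its own causes a scram."""
    for i in range(4):
        channels = [False] * 4
        channels[i] = True
        if scram(*channels):
            return True
    return False


def single_channel_failure_blocks_scram() -> bool:
    """True if any one channel stuck 'not tripped' prevents a scram the others demand."""
    for i in range(4):
        channels = [True] * 4
        channels[i] = False
        if not scram(*channels):
            return True
    return False


def min_channels_to_scram() -> int:
    """Smallest number of tripped channels that produces a scram."""
    return min(sum(combo) for combo in product([False, True], repeat=4) if scram(*combo))
```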

Lastly, given the many statements suggesting that the electrical and software systems are on a hair-trigger, it's worthwhile to note that many mechanical failures don't require the plant to shut down immediately. The tech specs have the details. For example, the Hope Creek plant has been operating since Wednesday morning with one of its Emergency Core Cooling Systems declared inoperable. That's right, they do not currently have a safety-rated system capable of injecting water when the reactor is at operating pressure. And they're allowed, by law, to operate like this for two weeks.
Re:Water by turgid · 2008-06-07 07:41 · Score: 1

Third, if you're concerned about maintaining a BWR subcritical, you shouldn't let the water level get too high. The water surrounding the core acts as a reflector, decreasing neutron leakage. So, higher water level leads to increased reactivity. In fact, my recollection is that, in some cases, the emergency operating procedures suggest lowering the water level in order to control reactivity.
BWRs are an accident waiting to happen. Don't they have boron injection (into the cooling water) to absorb neutrons in an emergency, like PWRs? Sizewell B does.
For example, the Hope Creek plant has been operating since Wednesday morning with one of it's Emergency Core Cooling Systems declared inoperable. That's right, they do not currently have a safety-rated system capable of injecting water when the reactor is at operating pressure. And they're allowed, by law, to operate like this for two weeks.
I'm glad I don't live in America. You guys have a very strange way of doing nuclear power.

--
Stick Men

Re:Only the biz machine was updated. Why trouble? by pipingguy · 2008-06-06 15:41 · Score: 1

IMHO this should be examined more closely. It may have exposed a dangerous flaw in the software design.

Surely software design is diagrammed, studied and HAZOPed as much as the average P&ID?

Re:Wow that is so funny by GigaplexNZ · 2008-06-06 15:43 · Score: 1

That's not what I'm suggesting at all. I'm suggesting that the accidental reset itself could have triggered a critical system shutdown instead of relying on the reset data being out of range. What if the reset fooled the system into thinking some subsystem that was experiencing a worst case scenario was within safe operating conditions? Why a workstation on the corporate network was able to reset this data is another issue in itself.

Re:One begs the question by Anonymous Coward · 2008-06-06 15:45 · Score: 1, Funny

Now I want to add a "Frequently Begged Questions" list to my site, just to piss off grammar Nazis.

Re::O by Anonymous Coward · 2008-06-06 15:46 · Score: 1, Insightful

What exactly do you find frightening about an automatic safety system doing exactly what it's supposed to in response to unusual input? The words "nuclear reactor" scare many people, like "monster in closet" scares many kids.

Re:One begs the question by courseofhumanevents · 2008-06-06 15:50 · Score: 3, Insightful

That's the thing, though: it's no more a misuse of a phrase than using "kick the bucket" to describe literally kicking a bucket would be. The definition of it can be easily achieved by examining the words it uses and their contexts, making it much less likely to confuse a non-native speaker than many other expressions in wide use. The main source of confusion would be if someone tries to make it out to be an invalid phrase.

One of the great things about English is that one can phrase something a million different ways and still get the same meaning; banning the use of one phrase because it happens to also be the name of a logical fallacy is silly and pointless.

Re:One begs the question by courseofhumanevents · 2008-06-06 15:53 · Score: 1

I submit for citation the original post. In it is a perfectly valid and widely understood use of the phrase.

Re:Only the biz machine was updated. Why trouble? by Anonymous Coward · 2008-06-06 16:02 · Score: 3, Informative

There are such requirements in the US, be they for SIL ratings, performing haz-op reviews, etc. Particularly in nuclear apps.

In a plant, not all control systems are SIL rated, but the safety backups usually are....though more and more operators are buying or upgrading to SIL qualified systems and extending SIL to other than just the safety and protection backups.

In this case, the engineers were probably asleep at the wheel and didn't realize the changes they made to the control software impacted the trip & protection systems, so didn't bother to even have a haz-op review prior to making the change to get updates to a control parameter (or set of parameters) from a networked device. They probably figured they were just adding a trim or tuning variable of some kind to the control loop and didn't do ANY real failure analysis.

Oops.

Oh well, time for all the governing bodies like the NRC to get out the microscopes and take a peek at the plant's operating procedures and the engineers' adherence to them.

Cheers

Re:Wow that is so funny by Skrapion · 2008-06-06 16:07 · Score: 1

What if the reset fooled the system into thinking some subsystem that was experiencing a worst case scenario was within safe operating conditions?

It sounds like, if the system isn't receiving any data, it assumes the worst-case scenario (i.e. no water reserves). So in your example, the system would still trigger a critical failure.

--
The details are trivial and useless; The reasons, as always, purely human ones.

Re:Wow that is so funny by icebike · 2008-06-06 16:10 · Score: 5, Informative

What part of FAIL SAFE don't you understand?

The System FAILED. It is programmed to SAFE the reactor when shit happens.

Without its sensors it had no choice but to assume the worst case and scram the reactor.

It did it the right way. It did it the way it was programmed to do it.

What would you have it do to determine why it is no longer getting critical data? Send out a droid to check the Cat5 cables? It's a friggin' computer in a rack, not R2D2.

It worked the way it was supposed to.

Take a step back and let the big boys handle the reactor, Please.

--
Sig Battery depleted. Reverting to safe mode.

Re::O by Skrapion · 2008-06-06 16:16 · Score: 1

So... you haven't updated the kernel in four years?

--
The details are trivial and useless; The reasons, as always, purely human ones.

Re::O by sbjornda · 2008-06-06 16:17 · Score: 3, Insightful

Patching for patching sake is an IT fetish

Well, the auditors seem to expect it... as do the vendors when we call for support - "Oh, you say foobar isn't working... well it looks like you're 15 revisions behind; why don't you just fix that and call me when you're done. Oh, your policies state you need to test and certify them? Well I guess I won't be hearing from you for a while, then."

--
.nosig

Re:Wow that is so funny by icebike · 2008-06-06 16:20 · Score: 1

No data = data out of range.

Data anomaly = shutdown.

Rats chewing on a wire could have been the cause. I don't care. Intelligent adults want the system put into a safe condition when things are out of line. You can always determine the problem and restart the reactor if you shut it down in time.

There is no such thing as a perfect system. But this one comes pretty close.

--
Sig Battery depleted. Reverting to safe mode.

Re:One begs the question by Skrapion · 2008-06-06 16:48 · Score: 1

What do you mean "makes no sense"?

The "controversial" use of the phrase means "requests or invites another question", which I would argue is a fairly logical interpretation of the phrase.

The way you're using the phrase means "taking for granted a principle", which isn't at all intuitive or logical. In fact, it seems to be a corruption of Aristotle's original phrase, "Petitio Principii" ("begs the principle").

So, maybe you should just start using the phrase "begs the principle" instead of "begs the question".

--
The details are trivial and useless; The reasons, as always, purely human ones.

Re:Wow that is so funny by MadnessASAP · 2008-06-06 16:56 · Score: 1

Because it is FAIL-SAFE, a term you seem not to understand here. No data = very very very bad data, so the system did exactly what it should do, which was put the reactor into the safest mode it could. Anything else is added complexity. In this case, what was very likely a PLC expected a stream of data from this computer; it stopped receiving this data, so it did the only sane thing it could do: it shut everything down as fast as possible.

--
I may agree with what you say, but I will defend to the death your right to face the consequences of saying it.

Re:Wow that is so funny by GigaplexNZ · 2008-06-06 17:02 · Score: 1

The problem wasn't caused by some sensor not sending data, it was data in the database being actively reset on purpose by a bit of software on another machine. This wasn't a case of "if some data anomaly occurs, shut down" - the article suggests this was a case of "some data anomaly occurred which happened to make the system think there is insufficient water" which caused a shutdown by chance. The data anomaly caused by the reset might not have fooled the system into thinking the water level was out of range.

Major system dependent on a minor system bad logic by kilroy0097 · 2008-06-06 17:16 · Score: 2, Insightful

It doesn't really matter in this case if the operations system is looking at plant data from a minor monitoring system. What is troubling here is that it's completely reliant upon this minor monitoring system. If this box someplace is so important as to cause an emergency shutdown in a nuclear power plant, then one would think there would be a backup system that takes over when the primary monitoring system goes down. Did they think this box would never have a hardware failure? That it would last forever as some kind of cosmic perpetual motion machine? I am very worried that operations management systems like this even get implemented in high-security, important locations such as a nuclear power plant. Looks like it's time to hire a better and more intelligent Information Systems and Network Manager.

Re:Wow that is so funny by GigaplexNZ · 2008-06-06 17:20 · Score: 4, Insightful

It did it the way it was programmed to do it.

Based on the information provided in the article, it was programmed to shut down due to lack of water. What actually happened was an accidental data reset. A separate fail-safe mechanism should have detected the missing critical data. Instead, according to the article, the reset caused the safety systems to

errantly interpret the lack of data as a drop in water reservoirs

I would rather it detect unsafe conditions correctly, as opposed to errantly. The plant should have shut down as it did, but it sounds a bit like chance that it actually did.

Re:Wow that is so funny by GigaplexNZ · 2008-06-06 17:23 · Score: 4, Insightful

I understand exactly what fail-safe means. I agree that no data = very very very bad data. I agree that it should have gone into the safest possible mode. I don't agree that the "low water level" detection is the correct mechanism to determine the "no data = very very very bad data" condition. I'm suggesting that, based on the information quoted in my original post, safety systems errantly interpreting the lack of data as a drop in water reservoirs does not necessarily sound like good planning; it sounds more like chance that some erroneous interpretation picked up on the invalid state. It may have detected the "no data = very very very bad data" case and shut down for that reason, but that's not what the article is suggesting. Other users hinting that I am a moron for thinking that the plant shouldn't have shut down have misinterpreted what I was trying to get across.
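The distinction being argued here, between tripping because the level is low and tripping because the level is unknown, can be sketched in a few lines of Python. This is a minimal illustration, not the plant's actual logic: it assumes a single level reading that arrives either as a number or as `None` when the data is missing, and the setpoint, `Trip` and `evaluate_level` are hypothetical names. Both paths shut the plant down; only the recorded reason differs.

```python
from enum import Enum
from typing import Optional

# Hypothetical low-level trip setpoint, in percent of instrument range.
LOW_LEVEL_SETPOINT = 30.0


class Trip(Enum):
    NONE = "keep running"
    LOW_LEVEL = "trip: water level below setpoint"
    NO_DATA = "trip: water level data missing"


def evaluate_level(level: Optional[float]) -> Trip:
    """Decide whether to trip, keeping 'no data' separate from 'low level'."""
    if level is None:
        # Missing data is treated as unsafe, but logged as its own cause
        return Trip.NO_DATA
    if level < LOW_LEVEL_SETPOINT:
        return Trip.LOW_LEVEL
    return Trip.NONE


print(evaluate_level(None).value)  # trip: water level data missing
```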

Re:Wow that is so funny by barius · 2008-06-06 17:42 · Score: 5, Insightful

I think you're missing the real point, which is that the central safety systems are being fed data from a 'business network'. What would happen if that computer had an issue that caused it to send the same data continuously even when the coolant level had really dropped? WHY are any safety systems receiving data from an insecure network?

It's bad enough that most reactors use regular PCs to do the data collection and reporting, given the security risks posed by such systems (especially if networked), but I never realized they would be so stupid as to feed data in the other direction like this!

Re:Wow that is so funny by barius · 2008-06-06 17:50 · Score: 1

I disagree. In this case the failure mode of the error was not foreseen, and while the code reacted in a safe way, there is no reason to believe that it was specifically coded to do this. From the description it appears to me that the system requires continuous input of coolant levels; without that input the level appears to be zero (a continuous input of 0). What would have happened if the software upgrade had caused the same levels to be reported continuously? A real drop in coolant might not have been detected! Not foreseeing a failure mode in which a data provider becomes corrupt seems like a serious problem to me.
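One way to catch the "same levels reported continuously" failure mode is a flat-line check. The Python sketch below is a hypothetical illustration: it assumes a live analog signal always carries some measurable noise, and `FrozenValueDetector`, `window` and `tolerance` are made-up names and defaults, not anything from the plant or the article.

```python
from collections import deque


class FrozenValueDetector:
    """Flags a signal whose reading has not changed over `window` samples.

    Assumes a live level signal normally wanders a little, so a perfectly
    flat value suggests a stuck transmitter or a stale data feed.
    """

    def __init__(self, window: int = 5, tolerance: float = 0.0):
        self.window = window
        self.tolerance = tolerance
        self.samples = deque(maxlen=window)

    def update(self, value: float) -> bool:
        """Add a sample; return True if the signal looks frozen."""
        self.samples.append(value)
        if len(self.samples) < self.window:
            return False  # not enough history to judge yet
        return max(self.samples) - min(self.samples) <= self.tolerance
```

A flagged signal would then be handled like missing data. Choosing the window and tolerance is the hard part, since a genuinely steady process can look frozen if the tolerance is too tight.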

Re:Wow that is so funny by dave87656 · 2008-06-06 17:57 · Score: 1

I think it makes sense to shut down when you cannot verify that your data is 100% available. The software did interpret "Lack of Data" as exactly that, "Lack of Data", and shut down accordingly.

Re:Wow that is so funny by Anonymous Coward · 2008-06-06 18:11 · Score: 5, Informative

I think you're missing the real point, which is that the central safety systems are being fed data from a 'business network'. What would happen if that computer had an issue that caused it to send the same data continuously even when the coolant level had really dropped? WHY are any safety systems receiving data from an insecure network?

It's bad enough that most reactors use regular PCs to do the data collection and reporting, given the security risks posed by such systems (especially if networked), but I never realized they would be so stupid as to feed data in the other direction like this!

Obviously you have -zero- experience with power plant networks. Allow me to enlighten you, albeit anonymously.

The reason machines like this receive data from networks that could be considered 'less secure' is because telemetry is required from a multitude of sources to actually ascertain any useful realtime information. Aggregation machines have to speak many different protocols and translate between them while communicating with other machines that belong at other plants, cities, states, and even companies to effectively get an accurate picture of the entire grid's current conditions.

The world of plant control machines themselves is very vendor-driven. Most facilities have turnkey solutions brought in by the few major players in this field: ABB, Hathaway, GE, etc. Those players don't even use the same SCADA protocols. Some use ICCP, some use DNP, and others prefer Etherpoll. I've seen RS232 data encapsulated into everything from fully meshed TCP connections via OSIsoft's PI to being barely encoded into Modbus and slapped onto Ethernet with only an understanding of ARP.

The solutions are required because electricity is not just one power plant pumping watts blindly. Instead, you have a multitude of plants all pushing power onto ISO-controlled grids that all have to work in concert with each other. This requires -- yes, you guessed it -- networking! The world of plant networks is pretty complex despite the hype you see in the media. The business of making actual watts appear magically at your house at a nice, consistent 60Hz is vastly more involved than most people realize.

Telemetry comes from secured networks, business networks, and other companies and controlling agencies. That is how it works. Period.

If you are actually interested in seeing the way these are regulated to be secured, the information is cleverly hidden in plain sight at the NERC website.

Oh, yeah? by NerveGas · 2008-06-06 18:34 · Score: 1

"Personally, I don't think letting devices on a critical control system accept data values from the business network is a good idea."

Yeah, well that's why you obviously wouldn't succeed in business. You can't seem to grasp that things like time-to-market and pleasing focus groups are far more important than piddly little things like that. Geez.

--
Oh, you're not stuck, you're just unable to let go of the onion rings.

Correct Response to a Faulty Design by spaten · 2008-06-06 18:46 · Score: 1, Informative

Yes, the system ultimately made the right choice, shutting down on a perceived loss of critical information.

However, this was a best-choice response to a poorly engineered shutdown system.

A properly designed critical shutdown system would have completely independent sensors, for exactly this reason. By design, no external system (i.e. business network data collection) should be able to compromise the integrity of a safety system in any way. Safety systems are designed to be redundant within themselves on many levels, so that even if some link in the chain were to fail, there's another link waiting to take its place until it is repaired. Business systems, and often standard control systems, do not have that sort of availability/reliability, and so should have no part in the safety system.

Re::O by tonyr60 · 2008-06-06 18:53 · Score: 1

Well they just might have been running a modern OS.

Re:How could NRC even allow this in the first plac by Xarin · 2008-06-06 19:02 · Score: 2, Funny

"GROSS NEGLIGENCE - Failure to use even the slightest amount of care in a way that shows Recklessness or willful disregard for the safety of others." - 'Lectric Law Library.

Yeah, those bastards, the way they used THE SLIGHTEST AMOUNT OF CARE in designing a system that shuts down in response to unexpected data so as to avoid RECKLESSNESS with the SAFETY OF OTHERS. And to top it off they had the gall to report it instead of covering it up.

Re:Major system dependent on a minor system bad lo by Xarin · 2008-06-06 19:21 · Score: 1

One needs an odd number of monitoring systems, so two will not suffice. If there are only two and they report different things, then one still has to shut down the plant until things are sorted out. In fact, with two systems there is twice the chance of something going wrong. If there are three systems, then the majority wins. There is still the problem of what to do with the bad system, as hot-swapping a new one in has the potential to bring everything down.

It would also be a nightmare to test all the failure conditions. For example, one of the early shuttle launches was scrubbed because when all the redundant computers tried to sync up, the clock signal's edge appeared; the system had only been designed to deal with the high and low states of the signal, and all the testing had never encountered this condition. Redundant systems can also give a false sense of security if they are not maintained independently. For instance, if someone makes the same mistake on all of them, or a batch of defective parts is used in all of them, then they can easily all fail for the same reason within a short time of each other.
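The voting argument above can be sketched as two small Python functions, assuming each channel has already reduced its readings to a simple trip/no-trip vote (the function names are hypothetical). With two channels a disagreement cannot be resolved, so either vote trips the plant; with three, a single faulty channel is outvoted.

```python
from typing import Sequence


def one_out_of_two_trip(a: bool, b: bool) -> bool:
    """Two channels: a disagreement can't be resolved, so any trip demand wins."""
    return a or b


def two_out_of_three_trip(votes: Sequence[bool]) -> bool:
    """Three channels: trip only when a majority (two or more) demand it."""
    if len(votes) != 3:
        raise ValueError("expected exactly three channel votes")
    return sum(votes) >= 2


# One faulty channel demanding a trip:
print(one_out_of_two_trip(True, False))             # True  (plant shuts down)
print(two_out_of_three_trip([True, False, False]))  # False (fault is outvoted)
```

Neither function helps with the common-mode failures described above, where every channel is wrong in the same way.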

Not news in any other industry by kombipom · 2008-06-06 19:36 · Score: 1

Why is it that every time anything goes even slightly off-optimal in the nuclear industry it's news? Every day thousands are injured and killed in and by other industries, and nuclear power just keeps on quietly pumping out the megawatts, but if somebody sneezes in the control room everyone in the world knows about it.

Re:Wow that is so funny by johannesg · 2008-06-06 20:03 · Score: 1

What would you have it do to determine why it is no longer getting critical data? Send out a droid to check the cat5 cables? It's a frikin' computer in a rack, not R2D2. Oh, I'm so going to use that one during a meeting...

Re::O by Undead+NDR · 2008-06-06 20:07 · Score: 1

With all the updates that came out for VAX/Unix in the last four years?!?

Redundancy by jandersen · 2008-06-06 20:10 · Score: 1

Haven't they ever heard of redundant systems? I would have thought that having more than one controller on vital equipment was obvious. Of course, there is another kind of redundancy that might become relevant for the responsible engineer; although I am not sure I think the guy should be fired - knowing how finances trump security, safety and common sense in most companies, he probably wasn't given the resources necessary.

Re:Wow that is so funny by networkBoy · 2008-06-06 20:19 · Score: 1

um...
I swear I don't have anything but a trolling myspace page but:
Me Too!

That line is *too* good.

--
whois gawk date unzip strip find touch finger mount join nice man top fsck grep eject more yes exit umount sleep dump

Re::O by IntlHarvester · 2008-06-06 20:41 · Score: 1

You just rephrased what he said more eloquently. :)

--
Business. Numbers. Money. People. Computer World.

They shouldn't have used - The Reboot OS™ - by mpathy · 2008-06-06 20:45 · Score: 1

"And then it rebooted" What? But that means... Oh my god, please, please do not tell me that our nuclear power plants run WINDOWS on critical parts of the system? Are they f...ing stupid? It is dumb enough to run it at home when you want a stable system, but there... I will try to ask our plant what they use, and if I get the same answer I am seriously considering moving my home as far away as possible from it (and any other). :(

--
Ubuntu, a terminal, Python and Slashdot. Thats all you need.

Re:One begs the question by ppanon · 2008-06-06 20:46 · Score: 1

I think you'll find it's civilisation.

In French, and maybe in American English, but I live in Western Canada and was writing in Canadian English. In that context, it's civilization.

--
Laissez lire, et laissez danser; ces deux amusements ne feront jamais de mal au monde. - Voltaire

Re:One begs the question by ppanon · 2008-06-06 20:57 · Score: 1

The point is that it's not a person that begs the question. It's the situation or the argument.

Let me be more explicit. In either interpretation, be it classical or contemporary, "one" never begs the question unless you're talking about circular theorems regarding numbers to the power of 0.

--
Laissez lire, et laissez danser; ces deux amusements ne feront jamais de mal au monde. - Voltaire

Re:One begs the question by ppanon · 2008-06-06 20:59 · Score: 1

see this post

--
Laissez lire, et laissez danser; ces deux amusements ne feront jamais de mal au monde. - Voltaire

Re:Wow that is so funny by Hognoxious · 2008-06-06 21:15 · Score: 1

How was it 'left to chance'? If it doesn't know what the water level is, it doesn't know that it's dangerous, but it doesn't know that it's safe either. Seems it erred on the side of caution, which is correct behaviour, at least to anyone with a clue.

--
Confucius say, "Find worm in apple - bad. Find half a worm - worse."

you also need a gun by someone1234 · 2008-06-06 22:10 · Score: 1

Don't forget the gun.

--
Patents Drive Free Software as Hurricanes Drive Construction Industry

Re:Wow that is so funny by Hognoxious · 2008-06-06 22:49 · Score: 1

That's not what I'm suggesting at all.

Well at least two people interpreted your post the way GP did. Perhaps you should learn to express yourself more clearly?

What if the reset fooled the system into thinking some subsystem that was experiencing a worst case scenario was within safe operating conditions?

Do you have a proposed mechanism for how that scenario could occur?

In any case I don't see the relevance of that to this discussion, which boils down to this: do you assume no news is good news, or bad news?

Why a workstation on the corporate network was able to reset this data is another issue in itself.

Well that much is true.

--
Confucius say, "Find worm in apple - bad. Find half a worm - worse."

Re:Wow that is so funny by GigaplexNZ · 2008-06-06 23:01 · Score: 1

Well at least two people interpreted your post the way GP did. Perhaps you should learn to express yourself more clearly?

I seem to be picking up that vibe too.

Do you have a proposed mechanism for how that scenario could occur?

The temperature monitoring subsystem didn't appear to fail with the data reset, but the water subsystem did. The mechanism that supposedly allowed the temperature subsystem to ignore the problem is worth considering as an example.

In any case I don't see the relevance of that to this discussion, which boils down to this: do you assume no news is good news, or bad news?

Given that most news these days is bad news, I'd rather have the absence of news. Critical data on the other hand, I'd assume there is a problem if it is missing.

Re:Wow that is so funny by ichigo+2.0 · 2008-06-06 23:03 · Score: 1

While that is interesting, it doesn't really explain why a PC is feeding water level data to the safety system. Surely the water level isn't something other power plants need to know? Unless I'm missing something, the water level sensors should be directly connected to the safety systems and not through an intermediary PC.

Re:One begs the question by Kalriath · 2008-06-06 23:11 · Score: 1

Actually, it's BRITISH English that uses the "s" - American uses "z" instead.

--
For a site about things like basic rights, Slashdot users sure do like to censor "dissent".

Re:Wow that is so funny by BlueParrot · 2008-06-06 23:18 · Score: 1

Uhm, some reactors are cooled with liquid sodium, and trust me, you CAN put too much water into those ...

Re::O by jimicus · 2008-06-06 23:32 · Score: 1

With all the updates that came out for VAX/Unix in the last four years?!?

Deciding whether or not to apply patches is a cost/benefit risk analysis.

Potential cost: Only that the business-critical system they're controlling stops working. Nothing important, y'know.

Potential benefit: Well, if they're on a very isolated network and access to the systems is already very tightly restricted, that's a good question. A lot of Vax systems never even had TCP/IP installed - you use a serial console to connect, which by definition restricts how easy it is to get to the system in the first place.

I think a lot of people are missing the point here by mlopes · 2008-06-06 23:47 · Score: 1

It's not a question of whether the shutdown was the right thing to do. It definitely was: the system thought there was a problem, so it shut down.

The problem here is that the system acted on corrupt data, thinking it was the right data. It could (almost) as easily have kept running when it should have shut down, if the opposite data had been fed in during a cooling failure.

Just to summon some nuclear hysteria by DrBoumBoum · 2008-06-07 00:05 · Score: 1

_Renewable energy_ is the way of the future. Cue flames from nuclear energy fanboys in 3.. 2.. 1..

Re:The problem is the update - not business networ by Acapulco · 2008-06-07 00:08 · Score: 1

But it will be when Windows 7 comes out.

--
Slashdot. Unreadable news to annoy nerds. - wonkey_monkey

Re:Wow that is so funny by Hijacked+Public · 2008-06-07 00:44 · Score: 1

The control/safety system shut down because it stopped getting "safe" indications from the monitoring/input system. It seems pretty clear that the input system itself correctly logged the reason for the error.

No, at least not according to the article.

According to the article a business network computer reset data on the control network, thus the safety system shut the plant down. The control system was not relying on data from the business network, but it was possible to alter control system data from the business network, which is the part that is bad.

Later in the article it is noted that all physical connections between the business and control networks have been removed, which further supports the idea that the control side doesn't rely on receiving data from the business side.

And any error logic to deal with bad field devices or high tank levels or whatever else probably looks more like:

|--[I:3.0 > 62]------------------(O:4.4) alarm---------------|
|--[/]I:3.1------(O:4.5) permissive to safety system-|

Or as best I can ASCII draw ladder logic with this stupid fucking lameness filter.
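For readers who don't read ladder logic, here is a rough Python equivalent of one scan of those two rungs. It is only a sketch: it assumes `I:3.0` is an already-scaled analog input (a tank level, say) and `I:3.1` is a discrete fault bit, which the post doesn't actually specify, and the `scan` function and dictionary layout are hypothetical.

```python
def scan(inputs: dict) -> dict:
    """Evaluate both rungs once; PLC addresses are kept as dictionary keys."""
    outputs = {}
    # Rung 1: comparison [I:3.0 > 62] drives the alarm coil O:4.4
    outputs["O:4.4"] = inputs["I:3.0"] > 62
    # Rung 2: examine-if-open contact [/] on I:3.1 drives the permissive O:4.5,
    # so the safety system only gets its permissive while I:3.1 is OFF
    outputs["O:4.5"] = not inputs["I:3.1"]
    return outputs


print(scan({"I:3.0": 70, "I:3.1": False}))  # {'O:4.4': True, 'O:4.5': True}
```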

--
"Sacrifice for the good of The State" - The State

WHAT!? by Mistah+Bunny · 2008-06-07 00:49 · Score: 1

Why are they running a nuclear reactor on Windows?! Are there any other operating systems that automatically reboot after an update?

Re:Wow that is so funny by AikonMGB · 2008-06-07 00:55 · Score: 1

You are quoting a line from a Washington Post staff writer. How do you know that he's an expert in nuclear power plant IT? Does he really know why the safety controls shut the plant down? Maybe when the IT guys were explaining it to him over the phone, they used poor terminology and the writer in turn interpreted it as being in error.

I don't see why people are arguing over this. One of two things happened:

The software was written correctly such that in the event of a lack of data, for whatever reason, it shut the plant down.
The software was written "incorrectly" and shut down because something unexpected happened.

Come to think of it, that second situation doesn't even seem to be all that "incorrect;" if they truly hadn't thought of the possibility of this happening when doing the coding, then the software still failed-over in the safest manner possible.

Aikon-

Re::O by Joe+The+Dragon · 2008-06-07 01:35 · Score: 1

So your systems don't have most of the updates installed?

Re:One begs the question by Tango42 · 2008-06-07 01:47 · Score: 1

These days, the "wrong" usage is far more common than the "right" one. English doesn't have a central body that defines it (unlike some languages); it's defined by how people use it, so if more people use it incorrectly than correctly, the incorrect usage actually becomes correct.

Ding! Fries are done. by Sun.Jedi · 2008-06-07 02:00 · Score: 1

Someone tell me how I can protect myself and my family from stupid fucking people, please?
----
"We regret to inform you that we've melted half the planet because we were stupid."

Re:Wow that is so funny by protobion · 2008-06-07 02:01 · Score: 1

Based on the information provided in the article

But articles of this nature are not the best source for "How to run a nuclear power plant" tips, are they?

In my experience, the situation is almost always more complicated than what a journalist reports. Having no information about how the sensors were set up or what the hardware involved is like, I would say the software reacted as it should have. We don't know; it might even have reported that it was getting no data.
But whatever the reason, "if Water Level = 0, shut the plant down" is a safe baseline to go by. The shutdown sequence was probably a separate module in the system, independent of what other modules in the software are doing. Its only job is to act as a watchdog: assume the worst and shut the plant down. The engineers can figure out why that happened later... because this way the plant won't turn into a hot radioactive carnage-dump.
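The watchdog idea can be sketched in Python as a timer that the data path has to keep resetting. This is a hypothetical illustration: the class name, the `timeout_s` parameter and the injectable clock (there only to make it testable) are not taken from any real plant system.

```python
import time
from typing import Callable


class Watchdog:
    """Independent watchdog: demands shutdown if fresh data stops arriving."""

    def __init__(self, timeout_s: float, clock: Callable[[], float] = time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock
        self.last_update = clock()

    def feed(self) -> None:
        """Called by the data path each time a valid reading arrives."""
        self.last_update = self.clock()

    def shutdown_required(self) -> bool:
        """True once the data path has been silent for longer than the timeout."""
        return self.clock() - self.last_update > self.timeout_s
```

The point of keeping it separate is that it doesn't need to know *why* the data stopped; silence alone is enough to assume the worst.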

--
Essentia non sunt multiplicanda praeter necessitatem.

Require Testing On The Simulator by anorlunda · 2008-06-07 02:06 · Score: 1

All nuclear power plants have simulators to train the operators. They should all be required to test all changes and software updates in the simulator environment before installing them on line.

That implies that the scope of the simulators may need to be expanded. In addition to training operators, they should become a sandbox for testing any and all things connected directly or indirectly to operational systems.

If business systems become interconnected with operational systems, then the business systems too must be replicated in the simulator environment. That might become a very onerous requirement, but that very difficulty could have a benefit. Architectures that prove to be very onerous to duplicate in a simulated environment should be rejected and redesigned.

not designed with security in mind .. by rs232 · 2008-06-07 02:46 · Score: 1

"Part of the challenge is we have all of this infrastructure in the control systems that was put in place in the 1980s and '90s that was not designed with security in mind, and all of sudden these systems are being connected to [Internet-facing] business networks" said Brian Ahern, president and chief executive of Industrial Defender Inc., a Foxborough, Mass.-based SCADA security company

No, the problem is putting 'computers' on the Internet that were most certainly designed with security in mind, something the 'computers' most certainly fail at. To put it bluntly, running your SCADA units on Windows over the Internet is the dumbest thing I ever heard of. And that they are still running such designs five years after the great blackout of 2003 demonstrates incompetence and negligence bordering on the criminal.

--
davecb5620@gmail.com

the answer to the question .. by rs232 · 2008-06-07 02:54 · Score: 1

"Good enough evidence for me! Microsoft caused a nuclear meltdown! Quickly, to the Blogo-Sphere!"

That's only funny if it wasn't even partly true. But here's something really funny:

"The Slammer worm penetrated a private computer network at Ohio's Davis-Besse nuclear power plant in January and disabled a safety monitoring system for nearly five hours"

"TRANSCRIPTS of telephone conversations between utility operators .. include explicit mention of some unknown 'computer problems' at FirstEnergy, the Ohio utility thought to have triggered the regional power failures, in those preceding hours"

--
davecb5620@gmail.com

just who is this idiot ? by rs232 · 2008-06-07 03:13 · Score: 1

"This is why you keep the IT nerds away from the process network"

What's a 'process network' and who exactly do you get to fix your 'process network' ?

"I've had a whole plant lose view of it's system because some well meaning retard in IT decided to push updates onto a SCADA system without qualifying the updates....... never had it KILL the control side of things though....well done whoever you were, you've done well"

Assuming the above anecdote was even true, such an incident would never occur on a live system, and I'll tell you for why. You never update a live system - got that - never. At least in any competently run IT department.

What I suspect happened in the Georgia nuclear power plant was that some automatic patch process broke the 'computer', the rest of the story is just so much smoke screen.

re: 'qualifying the updates': Just how exactly do you qualify an update? Is an update the same as a patch or a bug fix? What motivates you to apply such qualifying updates? I mean if the computer ain't broke, then don't fix it.

If it's security updates, then why bother? I mean, if the 'process network' is secure you wouldn't need them. I would have thought they used end-to-end gateways running on embedded hardware, providing a VPN connection to the SCADA units.

But then again I only ever provided IT services to the double glazing sector, and what do I know .. :)

--
davecb5620@gmail.com

Re:just who is this idiot ? by rat7307 · 2008-06-07 09:54 · Score: 1

What's a 'process network' and who exactly do you get to fix your 'process network' ?

A process network is the network the SCADA/DCS system and its physical controllers sit on, usually segregated from the corporate LAN, with maybe the odd server that bridges between the two.... and to fix it, you call *ME* :-P
Usually a Process Network is NOT managed by a site's I.T. department, or at best they do all the IP allocations etc. and then leave it the hell alone.
Assuming the above anecdote was even true, such an incident would never occur on a live system, and I'll tell you for why. You never update a live system - got that - never. At least in any competently run IT department.

Usually on a big system, you have 2 or more SCADA servers, with redundancy built in. Usually you take one server down, upgrade the software, then swap over and repeat, ensuring no loss of view or control. The same goes for the controllers: most critical systems would operate redundant PLCs to ensure updates can occur. Therefore you CAN and often DO update live systems.........
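The rolling upgrade described here can be simulated in a few lines of Python. It is a toy model: `Server` and `rolling_upgrade` are hypothetical, and the sketch leaves out the switchover itself and any post-upgrade health check, which a real procedure would include. It only records how many servers remain in service at each step.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Server:
    name: str
    version: str
    online: bool = True


def rolling_upgrade(servers: List[Server], new_version: str) -> List[int]:
    """Upgrade redundant servers one at a time.

    Returns how many servers were still in service while each one was being
    upgraded, so the caller can check that view and control were never lost.
    """
    in_service = []
    for server in servers:
        server.online = False                 # take this server out of service
        in_service.append(sum(s.online for s in servers))
        server.version = new_version          # install the update while offline
        server.online = True                  # return it to service before the next
    return in_service


pair = [Server("scada-a", "1.0"), Server("scada-b", "1.0")]
print(rolling_upgrade(pair, "1.1"))  # [1, 1]
```

With only one server, the same loop would report `[0]`, which is exactly the loss of view that redundancy is there to avoid.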
re: 'qualifying the updates': Just how exactly do you qualify an update? Is an update the same as a patch or a bug fix? What motivates you to apply such qualifying updates? I mean if the computer ain't broke, then don't fix it.

By qualifying patches, I mean the vendor usually checks out stuff like windows updates, and assesses the impact in the system. You don't just go install the latest Microsoft hotfix for the hell of it, but I've seen it done....and I've had to fix the impact of that!

Updates are usually installed to fix/improve system operation.
It is a common procedure and there are usually controls in place, but occasionally things go pear shaped.

Standard PLCs can be surprisingly easy to halt, Safety PLCs even easier.....but one would have hoped in a Nuclear Plant there would have been procedural controls to avoid this.
But then again I only ever provided IT services to the double glazing sector, and what do I know .. :)

Hey, we all have different areas of expertise, this just happens to be my bread and butter! :-)

--
Burma?

Re::O by sjames · 2008-06-07 03:49 · Score: 1

Patching is entirely appropriate in many situations, but SCADA systems are certainly NOT one of them.

A public-facing server needs to be patched frequently to prevent the 'sploit of the day. It may cause other issues, but they won't be as bad as having your server turned into a spambot or online casino scam behind your back.

As you say, SCADA systems should be protected by an airgap. At most, a translator system should provide read-only data to the business side. In that case, any patching is a bad idea. Long-term stability and having everything using exactly the same very well tested version is a far better strategy for avoiding problems. Rollout of a new version is (and should be) a long process of careful testing. Certainly not something to do monthly, sight unseen.

Re:Wow that is so funny by Hognoxious · 2008-06-07 03:52 · Score: 1

I don't agree that the "low water level" detection is the correct mechanism to determine the "no data = very very very bad data" condition.

Nobody suggested it was. What they said was that if you don't know that the water level is good (so either it's bad, or we don't know) then shutdown is the safe option. That would apply to pressure, temperature or any other variable that can make big bangs happen.

Do learn the difference between an example and a definition.

--
Confucius say, "Find worm in apple - bad. Find half a worm - worse."

Re::O by sjames · 2008-06-07 04:06 · Score: 1

Perhaps in the future the safetys could be designed to accept a 5 min. operator initiated override on systems that don't require immediate feedback on failure.

Loss of coolant is perhaps the most serious event that can happen to a reactor. That's where meltdowns come from. That is one of those systems that requires immediate feedback, and so shouldn't allow an override.
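The policy being argued for (short operator overrides for slow, non-critical inputs and none at all for coolant) could be expressed as a simple rule. In this Python sketch the parameter names and the `NO_OVERRIDE` set are hypothetical; the five-minute limit comes from the quoted suggestion.

```python
# Hypothetical set of inputs whose loss needs immediate action, so no override.
NO_OVERRIDE = {"coolant_level", "coolant_flow"}
MAX_OVERRIDE_S = 5 * 60  # the five-minute window from the quoted suggestion


def override_allowed(parameter: str, requested_s: float) -> bool:
    """Permit a short, bounded operator override only for non-critical inputs."""
    if parameter in NO_OVERRIDE:
        return False
    return 0 < requested_s <= MAX_OVERRIDE_S
```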

Had the reactors at Chernobyl not allowed manual overrides like that, nobody would know that name today.

The mistake was allowing a system that important to be affected by a machine in the BUSINESS NETWORK (that is, a plain ol' LAN in a cube farm). This is exactly the sort of thing that can happen when you connect the red network to the black network.

Ideally, the data should have been gathered by a machine in the control network and published to a machine on the business network via a one way serial connection (That is, the Rx pin on the control network machine is actually not connected).
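With the receive pin disconnected, the one-way property is enforced by hardware, but it changes what the software on each end has to do: the receiver can never ask for a retransmission, so each frame must carry its own integrity check and sequence number. A minimal Python sketch of that framing follows; the frame layout (sequence number, one float reading, CRC32) and the function names are assumptions for illustration.

```python
import struct
import zlib
from typing import List, Optional, Tuple

# Hypothetical frame layout: big-endian 4-byte sequence number and 8-byte float,
# followed by a 4-byte CRC32 of those 12 bytes.
BODY = struct.Struct(">Id")
FRAME_LEN = BODY.size + 4


def encode(seq: int, reading: float) -> bytes:
    """Sender side: nothing here waits for, or could receive, an acknowledgment."""
    body = BODY.pack(seq, reading)
    return body + struct.pack(">I", zlib.crc32(body))


def decode(frame: bytes) -> Optional[Tuple[int, float]]:
    """Receiver side: return (seq, reading), or None if the frame is damaged."""
    if len(frame) != FRAME_LEN:
        return None
    body = frame[:BODY.size]
    (crc,) = struct.unpack(">I", frame[BODY.size:])
    if zlib.crc32(body) != crc:
        return None  # cannot request a resend over a one-way link, so drop it
    return BODY.unpack(body)


def missing_sequences(seen: List[int]) -> List[int]:
    """Sequence numbers that never arrived between the first and last seen."""
    if not seen:
        return []
    return sorted(set(range(min(seen), max(seen) + 1)) - set(seen))
```

The business side can then report gaps and corrupt frames, but it has no way to influence anything on the control side.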

Re:Major system dependent on a minor system bad lo by kilroy0097 · 2008-06-07 04:35 · Score: 1

Actually, yes, you are correct. You would need at least three monitoring sensors for each data value. However, for the computer reading data from these sensors you only need a single backup system. Replacing the original computer system and bringing it back online should be a high enough priority to happen within a 24 to 36 hour time frame at most. If you were very paranoid, then yes, perhaps three computer systems might be needed. However, to triple-check data values you only need an odd number of sensors, not necessarily an odd number of computer systems reading those sensors. Good eye, however, in pointing out the need for an odd number.

Re:perhaps they should have used java. by TheSpoom · 2008-06-07 04:52 · Score: 1

Yes, because clearly they want to be using a closed-source virtual machine reading all of their mission-critical code in ways that can't be predicted.

--
It's better to vote for what you want and not get it than to vote for what you don't want and get it.
- E. Debs

wow-- did I ever have a different take by way2trivial · 2008-06-07 05:28 · Score: 1

When I saw UAC, I was thinking of id's DOOM game... but Microsoft UAC is a much better fit for the situation...

--
every day http://en.wikipedia.org/wiki/Special:Random

Try again. by westbake · 2008-06-07 06:01 · Score: 1

There are several measures of feedwater flow and reactor water level that a chemical monitoring program should never have been able to override.

--
I am a name troll of Westlake. Visit my homepage to learn why.

Re:Try again. by icebike · 2008-06-07 06:27 · Score: 1

And I suspect a failure of any ONE of these measures would have the same effect. Safe the reactor and THEN figure out why it failed.

It's pretty clear from this thread that everyone thinks the state of the art in monitoring and control is much further advanced than it actually is.

Too much TV.

The simple fact is that loss of data, anomalous data, out of range data, disagreeing data ALL constitute valid reasons to shut down the reactor. This is the safe and conservative thing to do. It's the right thing to do.
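The categories listed above map fairly directly onto checks a trip system could run on redundant channels before any setpoint logic. The Python sketch below covers loss of data, out-of-range data and disagreeing data; anomalous data is left out because it needs a model of normal behaviour. The instrument span, the tolerance and the function name are hypothetical values chosen for illustration.

```python
from typing import Optional, Sequence

# Hypothetical instrument span and cross-channel tolerance, for illustration only.
SPAN_MIN, SPAN_MAX = 0.0, 100.0
MAX_DISAGREEMENT = 5.0


def shutdown_reason(channels: Sequence[Optional[float]]) -> Optional[str]:
    """Return why the reactor should be tripped, or None if the data looks sound."""
    if not channels or any(v is None for v in channels):
        return "loss of data"
    if any(not (SPAN_MIN <= v <= SPAN_MAX) for v in channels):
        return "out of range data"
    if max(channels) - min(channels) > MAX_DISAGREEMENT:
        return "disagreeing data"
    return None
```

Any non-None result is treated as a reason to trip; nothing in the sketch tries to decide which channel is lying before doing so.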

This is why we have an electrical GRID. We can afford to be CAREFUL and shutdown a reactor rather than take risks with ever more complex and failure prone software.

I'm very glad the folks on this thread are not programming reactor management systems. We would have a Three Mile Island incident every two weeks. They need to go back to writing their little Visual Basic toys and leave the critical stuff to adults.

--
Sig Battery depleted. Reverting to safe mode.

Re:Wow that is so funny by barius · 2008-06-07 06:35 · Score: 1

You're right I don't have professional experience. I'm not entirely without knowledge though. My University thesis for Software Eng. was to code a nuclear power reactor monitoring system. So, yes I do have some inside knowledge of their workings from all the research I had to do.

While your explanation of the inter-networking systems and protocols was all very interesting, it isn't really relevant to what I posted previously. I already know that the monitoring and reporting systems are highly networked, often using 3rd party software and hardware (writing this kind of software was what my thesis was about!).

The problem, as I see it, is that the localized safety system at the plant was receiving data from what amounts to an insecure terminal. I know there are other backup systems acting redundantly, but a failure mode that depends on an insecure server is a threat that doesn't need to exist.

Re:perhaps they should have used java. by TheSpoom · 2008-06-07 07:49 · Score: 1

Ah, didn't realize they'd fully open sourced it... last I heard they had only open sourced portions of the libraries. And I didn't know about that last bit at all.

--
It's better to vote for what you want and not get it than to vote for what you don't want and get it.
- E. Debs

This was a "fail-safe" incident by celtic_hackr · 2008-06-07 09:21 · Score: 1

Actually, this was a fail-safe incident. Something in one of the monitoring systems screwed up, resetting data. In this situation the only logical, safe thing to do is shut down, because you no longer know what the real state of your system is.

Example: 3 Mile Island had a water sensor in a drain trap (yeah, I know, BRILLIANT). This sensor is the one the engineers were reading to "know" they had water in the reactor. Meanwhile all the water boiled out due to a jammed pressure relief valve. Had the engineers bothered to check one of the other water sensors earlier, they would never have been within 45 minutes of a total and complete meltdown, far worse than Chernobyl. So, I'm rather glad that this reactor took the Human element out and forced them to look at more than just the one gauge they always look at "because that's the way we've always done it".
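The point here is not to trust a single gauge. One common way redundant channels are combined is mid-value (median) selection; the sketch below illustrates that general technique only and is not a claim about what TMI or this plant actually used. The function name and the readings are hypothetical:

```python
import statistics
from typing import Optional, Sequence


def mid_value_select(readings: Sequence[Optional[float]]) -> Optional[float]:
    """Median of the valid readings, or None when fewer than two channels report.

    With three channels, one wildly wrong gauge (high or low) cannot push the
    selected value outside the range spanned by the other two.
    """
    valid = [r for r in readings if r is not None]
    if len(valid) < 2:
        return None  # not enough independent evidence: treat the level as unknown
    return statistics.median(valid)


# One gauge falsely reports plenty of water; the other two say the level is low.
print(mid_value_select([9.0, 1.0, 1.2]))  # 1.2
```

The single optimistic reading simply ends up at the top of the sorted list and is ignored, which is the opposite of staring at one gauge.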

Re:This was a "fail-safe" incident by doom · 2008-06-08 02:10 · Score: 1

So, I'm rather glad that this reactor took the Human element out and forced them to look at more than just the one gauge they look at, because "that's the way we've always done it".

Actually, the "human element" kind of repaired itself after TMI. Those guys repeatedly overrode the safety systems and prevented the plant from shutting itself down. After that incident, no one in the nuclear industry was willing to say, "oh, it's probably just a false alarm again".

Re:Wow that is so funny by sjames · 2008-06-07 11:22 · Score: 1

Actually, it interpreted a critical parameter perfectly. It reports an existent water level iff there is adequate justification (sensor data) to draw that conclusion.

In the absence of evidence that there is adequate water (the safe condition), it assumes that there is not.

By the same token, in a situation where water present would be bad, the system assumes there IS water unless it has adequate justification to accept that there is NOT.

That in a nutshell is failsafe design.
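The asymmetry described above can be written as two small predicates that each fall back to the unsafe answer when evidence is missing. This is a hedged Python sketch: the function names, the two-confirmation threshold and the level values are assumptions for illustration only.

```python
from typing import Optional, Sequence


def water_adequate(readings: Sequence[Optional[float]], min_level: float,
                   min_confirmations: int = 2) -> bool:
    """Claim 'enough water' only if at least min_confirmations valid readings support it.

    Missing (None) or low readings are not evidence, so the default answer is 'no'.
    """
    confirmations = sum(1 for r in readings if r is not None and r >= min_level)
    return confirmations >= min_confirmations


def area_dry(readings: Sequence[Optional[float]], max_level: float,
             min_confirmations: int = 2) -> bool:
    """Mirror image for a place where water is the hazard: claim 'dry' only with
    enough evidence, otherwise assume water IS present."""
    confirmations = sum(1 for r in readings if r is not None and r <= max_level)
    return confirmations >= min_confirmations


print(water_adequate([None, None, 5.0], min_level=3.0))  # False: one sensor is not enough
print(area_dry([None, None, 0.0], max_level=0.1))        # False: assume wet
```

Both functions require positive evidence for the safe claim, so a dead or disconnected sensor can only push the answer toward the conservative side.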

Re::O by dcam · 2008-06-07 12:11 · Score: 1

I have quite a few Windows 2003 servers that haven't been rebooted since August 2006

So you haven't been installing the patches. I count reboots in October, September, November, December, January, February, April, January being particularly interesting as the TCP/IP stack was updated. Care to share your IP address? Enquiring minds want to know.

--
meh

Re:One begs the question by ppanon · 2008-06-07 21:55 · Score: 1

Ah, it would seem that's a rare case where the Canadian spelling is like the American spelling and not the British then. Doesn't change that daybot's attempt to correct the corrector fell flat. You had better success; thanks.

--
Laissez lire, et laissez danser; ces deux amusements ne feront jamais de mal au monde. - Voltaire

Re:One begs the question by ppanon · 2008-06-07 21:58 · Score: 1

When questions start handing out change to people, I'll start using the expression like jo42.

--
Laissez lire, et laissez danser; ces deux amusements ne feront jamais de mal au monde. - Voltaire

Re: just who do you work for ? by rs232 · 2008-06-08 02:19 · Score: 1

"A process network is the network the SCADA/DCS system and it's physical controllers sit on, usually segregated from corporate LAN"

You do seem to have implemented a solution, which begs the question as to why the rest of the power industry hasn't also done it. See here, where they're still using the Internet to relay SCADA data. They obviously don't use your methodology. Just who do you work for again?

"the vendor usually checks out stuff like windows updates, and assesses the impact in the system"

How do you check out a service pack without installing it on a live system? When a service pack breaks something, the usual solution is to reinstall, reinstall, reinstall. Again, you do seem to have thought of a solution that would be of use to the rest of the IT industry.

"Updates are usually installed to fix/improve system operation"

My understanding is that service packs are provided by the software vendor in response to general issues, not to correct problems in a specific installation. In *nix land, if there's a bug you can contact the programmers directly and get specific solutions to your problem. I guess it's a different mindset.

--
davecb5620@gmail.com

Re:One begs the question by daybot · 2008-06-08 21:09 · Score: 1

Doesn't change that daybot's attempt to correct the corrector fell flat. You had better success; thanks.

Er... I was pointing out the irony of the American spelling in the context of your statement:

"When somebody screws up an expression in a way that makes no sense, we should just accept it. In addition, since people on Slashdot constantly misuse pairs of homonyms like then/than, effect/affect, their/they're, we should just ignore historical usage differences and use them interchangeably. We should just accept sloppiness and mediocrity because that's how Western civilization was built."

umm, more choices, less dangerous by xalorous · 2008-06-08 23:18 · Score: 1

Your statement needs qualifying.

Solar. Wind. Hydroelectric. Geothermal. Four other choices. These are available but do not produce the sheer quantity of energy that a nuclear reactor does. Fusion will, theoretically, but it is still in the theoretical stage.

That said, you are right: the economical, low-carbon-emission choice available in practice today is nuclear.

--
TANSTAAFL GIGO Acronyms to live by!

Re:One begs the question by ppanon · 2008-06-09 21:47 · Score: 1

I am afraid that I fail to see the irony since spelling civilization vs. civilisation is not an issue of homonyms (words with different meanings that sound the same) but instead one of regional spelling differences for a single word, and I spelled it correctly for the region in which I live.

--
Laissez lire, et laissez danser; ces deux amusements ne feront jamais de mal au monde. - Voltaire

This shows both good and bad by tame1 · 2008-06-10 07:45 · Score: 1

Unlike most posters here, I have actually been a Reactor Operator, albeit many years ago. Yes, you want any questionable signal from primary systems to initiate a scram. But I question why any primary system was connected to or using data from such a questionable source. In my experience, all such sources were hard cards - no software involved. Obviously "modern" plants have become too modern for my tastes.

For those who wonder what a scram is: it's from the very early tests, where the rods were actually pulled by a human tugging a rope attached to a pulley. Once pulled, the rope was tied off, and he stood by with an axe. If the pooh hit the rotating blades, he chopped the rope. Super-Critical Reactor Axe Man = SCRAM.

Slashdot Mirror

259 of 355 comments