Data Storm Caused Nuclear Plant To Shut Down
rs232 writes to let us know that the US House of Representatives Committee on Homeland Security called this week for the Nuclear Regulatory Commission to further investigate the cause of excessive network traffic that shut down an Alabama nuclear plant. Investigators want to know whether the data storm could have been initiated from outside the plant.
All of the plant employees were looking up Starcraft 2 news.
I am on the road crew. This is my stop sign.
>Investigators want to know whether the data storm could have been initiated from outside the plant.
Do investigators also want to know how a "data storm" could have caused a nuclear plant to shut down?
Some choice quotes, emphasis added:
An investigation into the failure found that the controllers for the pumps locked up following a spike in data traffic -- referred to as a "data storm" in the NRC notice -- on the power plant's internal control system network. The deluge of data was apparently caused by a separate malfunctioning control device, known as a programmable logic controller (PLC).
"Conversations between the Homeland Security Committee staff and the NRC representatives suggest that it is possible that this incident could have come from outside the plant," Committee Chairman Bennie G. Thompson (D-Miss.) and Subcommittee Chairman James R. Langevin (D-RI) stated in the letter. "Unless and until the cause of the excessive network load can be explained, there is no way for either the licensee (power company) or the NRC to know that this was not an external distributed denial-of-service attack."
Wow. Just...wow. As if you needed more proof that this wasn't a hacking attempt:
"The integrated control system (ICS) network is not connected to the network outside the plant, but it is connected to a very large number of controllers and devices in the plant," Johnson said. "You can end up with a lot of information, and it appears to be more than it could handle."
Seriously, how stupid do you have to be to think "OMG, Haxxors?" Answer: work at Homeland inSecurity, or be a Congresscritter. They already figured it out. It was a controller for a specific piece of equipment that flooded the network and triggered a bug in the variable-frequency-drive controllers for pumps.
Please help metamoderate.
You'd hope that in something as critical as a nuclear power plant the answer would be, very quickly, "no, it didn't come from an external source because that's impossible". Followed by detailed analysis of the logs to determine which internal system screwed up.
That said, the article is a bit sparse on actual technical details, so my derision may be unwarranted.
Sounds to me that the vendors under-engineered their network and still charged mega-bucks for it. The auditors, I'm sure, are making the most out of this to justify their fee.
Nothing to see, move along - I'll say!
I prefer Flambe as opposed to flamebait.
As usual, the American government is looking to extend its control over things. "Oh noes, look what terrorists might have done. Homeland security needs more funding and less oversight to prevent this in the future." When will people learn to assume the government is lying first, then wait for them to prove themselves right later?
- I voted for Nintendo and against Bush
When you get back to the real world, let us know. You don't just wave a magic wand and completely redesign and reimplement a highly complex safety-critical system.
Mea navis aericumbens anguillis abundat
It's not the IT people. PLCs are coded by EEs, not IT people.
Isn't it a bit odd that they were using a non-deterministic network, something like Ethernet by the sound of it? Back in the early 90s, I was always told that networks like Ethernet were great for office apps, but not where you wanted guaranteed times for message delivery. For that, token ring, FDDI and the like were better. What is the network infrastructure of choice in a nuclear power station?
After yet another re-reading, I find this government even more insanely stupid than I would have hoped for...
> Such failures are common among PLC and supervisory control and data acquisition (SCADA) systems, because the manufacturers do not test the devices' handling of bad data, said Dale Peterson, CEO of industrial system security firm DigitalBond.
> "What is happening in this marketplace is that vendors will build their own (network) stacks to make it cheaper," Peterson said. "And it works, but when (the device) gets anything that it didn't expect, it will gag."
So you mean to tell me there is pretty much no enforcement making manufacturers maintain compliance on their products, even if those products are going into a nuclear *ANYTHING*... which in the worst-case scenario could cause a catastrophe. Yet we have regulatory commissions on the flow of ketchup, regulatory commissions/directions/etc. on weight loss products, lipsticks, etc. (FDA), but this place is not concerned with nuclear plants. Sinful.
Infiltrated dot Net
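For what it's worth, the "it will gag" failure mode Peterson describes is the kind of thing defensive parsing is meant to prevent. Here is a minimal Python sketch of a receiver that rejects anything it doesn't expect instead of choking on it. The frame layout (1-byte type, 2-byte big-endian length, payload, 1-byte additive checksum), the type codes, and the size limit are all made up for illustration; this is not any real PLC protocol.

```python
import struct

HEADER = struct.Struct(">BH")   # hypothetical layout: message type, payload length
MAX_PAYLOAD = 256               # hypothetical upper bound for this device
KNOWN_TYPES = {0x01, 0x02}      # hypothetical message types


def parse_frame(frame: bytes):
    """Return (msg_type, payload), or None if the frame is malformed."""
    if len(frame) < HEADER.size + 1:
        return None                       # too short for header + checksum
    msg_type, length = HEADER.unpack_from(frame)
    if msg_type not in KNOWN_TYPES or length > MAX_PAYLOAD:
        return None                       # unknown type or absurd length
    if len(frame) != HEADER.size + length + 1:
        return None                       # declared length disagrees with actual size
    if (sum(frame[:-1]) & 0xFF) != frame[-1]:
        return None                       # checksum mismatch
    return msg_type, frame[HEADER.size:HEADER.size + length]


def make_frame(msg_type: int, payload: bytes) -> bytes:
    """Build a well-formed frame in the same hypothetical format."""
    body = HEADER.pack(msg_type, len(payload)) + payload
    return body + bytes([sum(body) & 0xFF])
```

The point is only that every field is checked before it is trusted, so garbage gets dropped rather than acted on.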
At least their reactor failed to "off" this time...
Schwab
Editor, A1-AAA AmeriCaptions
i think the fact that an unforeseen erroneous condition caused the plant to *shutdown* and not *meltdown* is a pretty good indication that it was designed quite well.
There will always be unforeseen situations. The key is for the system to shut down in an orderly fashion. In programming, this is accomplished through use of error traps.
Now, the hysteria surrounding terrorism is another thing the plant engineers have to worry about.
i just wonder if and when we get to put this hysteria behind us, and get along with our lives. unfortunately, terry gilliam's brazil is on a constant loop in my mind these days. . . .
mr c
"Physics is like sex. Sure, it may give some practical results, but that's not why we do it." - R. Feynman
A cat fell asleep on a keyboard
Tor networks are generally not *that* fast.. so causing a data storm is not likely. ;)
;P
Sometimes such connections are sooo slow, it makes users cry. They don't call it onion routing for nothing, eh?
> Seriously, how stupid do you have to be to think "OMG, Haxxors?" Answer: work at Homeland inSecurity, or be a Congresscritter. They already figured it out. It was a controller for a specific piece of equipment that flooded the network and triggered a bug in the variable-frequency-drive controllers for pumps.
As someone who used to work in systems engineering for a sister BWR, I think the inspection is a good idea. Oh, there's dumb and there's nuclear dumb, but this is not a case of either. Nuclear dumb involves putting machine gun nests inside the plant. Finding the root cause of the accident is a good idea.
Handwaving about a PLC device won't do. What ultimately caused the PLC malfunction needs to be answered at a component level. There's going to be something wrong with it, and that should be reported and every other device like it needs to be ripped out and trashed. If there is no component failure, there's a software problem, which also must be understood.
Yes, it could have been hackers. The "internal control network" might at some point hit a desk that's connected to the wider world. It could be something mundane and unintentional, like an operator's virused-up laptop.
An outage like that is something that's going to have both NRC and corporate ass-chewers looking at everything. Corporate might want to paint a nice picture for the NRC, but the poor devil that lies to them goes to jail. In either case, the problem will be identified and eliminated.
You might also have noted in the article that this is not the first plant to go thumbs down over some winblows-borne virus. In 2003, the Slammer worm caused havoc at an offline Ohio plant. Yes, that was hackers. They did not mean to do it, but the plant's systems were open to it and failed. That's not acceptable from any standpoint.
Despite the better advice of the computer people at the plants, Entergy is a big M$ Partner. They take the big dogs out fishing and sell them the works. Ten years ago, M$ had something worthwhile and interesting. It was used in places it should not have been. Worse, the flaws from ten years ago have not been addressed or fixed. A good clean up is in order.
Friends don't help friends install M$ junk.
ENL4:RG3 UR FU3L R0:DS! Z1R:C0NIUM R3:INF0RC3M3NT - CH3:4P35T PR1:CES!
"Let's face it, it's a good story. Accuracy would kill it."
> Firstly I would re-design that entire infrastructure and rid that power plant of incompetent IT people.
You need to find the root cause. You don't know it yet, so you don't really know what to do.
Chances are, the cause has been written up by the four or five systems engineering people in charge of the plant. They ARE competent, but they are never given the resources they need.
> Why wasn't there any failover? Who knows.
There was a failover - they overrode the broken thing. Had the operators been gassed, the plant would have turned itself off when the water level got too high or low. This is a big deal, but ultimately the plant was safely shut down and no one got hurt. It's designed to do that even if you could shear the feed water pipe off, and they did not let the newfangled control network mess with that.
> data storm
Is that a nice way of saying they were downloading pr0n?
> US House of Representatives Committee on Homeland Security called [...] to further investigate
Boss: "So we don't have the backups for the first two weeks in April"
Employee: "Yes Boss. They were obviously misplaced by terrorists"
When Homeland Security is done, my refrigerator door was left ajar last night. I think it was terrorists too. Think I'll phone this one in.
"Ok, techie, give me the gist of it."
"It seems the problem was with the NC9828A chip"
"Oh? And what was the problem?"
"It melted, basically. It went bonkers."
"Ah, and then what happened?"
"Err... it caused the shutdown."
"But how?"
"Well, I presume the AH-982's got deluged with data, so they shut off."
"Ah, so it was some sort of data thing."
"Kind of, the failing chip would start sending data in the network t--"
"Hey, it's like a storm of data! Hah! I get it!"
"Umm, basically."
"Oh man. A data storm! I better tell the NRC"
"Ok, sure."
Later...
"Sir, I have the cause of the shutdown, it was caused by what the tech guys here would call a data storm."
"A data storm? Wow. So your reactors got a bunch of bad datas, right?"
"Errr.. kind of, the microchips melted."
"Data can do that?"
"Yeah, it's like a storm on our, uh, logic networks. I guess that can melt the microchips"
"Uh oh. Maybe this storm came from outside the plant! One of those hacker attacks!"
"Hmmmm, the guy said it melted, but I suppo--"
"Oh crap I better inform Homeland Security!"
"Ok, sure."
Later still...
"Yeah, we had a data storm and it melted the reactor networks."
"How did this data storm happen?"
"I don't think they know yet, but it messed up big time."
"My God. Do you realize this could be Al Qaeda?!!"
"Could realize wha--"
"Al Qaeda! Terrorists. Internets terrorists."
"I don't know if the reactors are hooked up to the Interne--"
"Listen. Keep this quiet, but make sure you tell everyone you know. These reactors are not safe! No one is safe from the terror!"
"Well, it was a data storm. Can terrorists make data storms?"
"Yes. They caused your meltdown."
"No, no, the microchips melted down because of the storm. A meltdow--"
"In the terror business, there's more than one type of meltdown, you just let us handle this."
"Ok, sure."
1) They can't describe what happened
2) They can't tell whether outside interference, whatever its nature, occurred
3) That this might have an internal/design cause
This might sound unreasonable, but I would never expect a power plant (which has a lot of things depending on it) to shut down unless there was a major failure of a component or some other safety risk. Network traffic on its own, or its effects, shouldn't ever be the cause. In a nuclear power plant you control ALL the nodes attached to the network. Those nodes should not be in a position where they can saturate any individual node to the point of failure, especially if that failure causes a shutdown of something as critical as a power station.
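One common way to keep any single sender from saturating a receiver is per-source rate limiting. Below is a minimal token-bucket sketch in Python, purely as an illustration: the class name, the rate and burst figures, and the idea that the plant's gear could be configured this way are all assumptions, not anything stated in the article. Time is passed in explicitly so the behaviour is easy to reason about.

```python
class TokenBucket:
    """Allow at most `rate` messages per second, with bursts up to `burst`."""

    def __init__(self, rate: float, burst: int, now: float = 0.0):
        self.rate = rate
        self.burst = burst
        self.tokens = float(burst)   # start with a full bucket
        self.last = now

    def allow(self, now: float) -> bool:
        # Refill in proportion to elapsed time, capped at the burst size.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = max(self.last, now)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False                 # over the limit: drop the message


def simulate(bucket, arrival_times):
    """Return (accepted, dropped) for messages arriving at the given times."""
    accepted = sum(1 for t in arrival_times if bucket.allow(t))
    return accepted, len(arrival_times) - accepted
```

With a bucket of `rate=10, burst=5`, a thousand messages arriving in the same instant get cut down to the burst size, while a sender that stays under the rate is never throttled.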
I can think of times where I have seen massive network spikes, usually caused by issues with routing on fairly non-trivial networks, or loops where mistakes have been made and policies have not been followed (lack of sleep or lack of patience), but then comparing an advertising company's internal network at 3am, or a paper factory's network at midnight, to a nuclear power station is taking it a little far.
> There will always be unforeseen situations. The key is for the system to shut down in an orderly fashion. In programming, this is accomplished through use of error traps.
That would be fair if we were talking about a software failure after some sort of unforeseen environmental issue. It would even be OK if an auto plant stopped production because of an unforeseen fault. And whilst power plants should certainly fail safe, they should be robust enough that a situation where failure is the only option is extremely difficult to achieve. Whatever happened to redundancy?
> Now, the hysteria surrounding terrorism is another thing the plant engineers have to worry about.
As for the external angle, terrorism or not, I doubt it. If there is a system that can be brought down by weight of traffic, and that system is important enough that failure requires a power-plant reboot (:)), then there needs to be an air gap. Someone up thread suggested an employee's laptop with a virus as a possible method of infection. Who in the hell allows an unchecked laptop of any description onto their LAN? Never mind a network that also contains components that run a power plant!!
I would suggest that this is hype to 1) keep terrorism at the top of everyone's agenda, and make people feel unsafe, after all that sells papers and grabs viewers (which in turn sell advertising), 2) deflect some of the negativity that this incident would produce (I wish that I could blame terrorists for my mistakes sometimes... "no that project plan... I haven't got it, but I'm checking to see if my poor time management is caused by terrorism or simply my inability to organise my resources properly"), and 3) attract additional funding, which security risks presumably do; surely it would be nice to get an extra few million in the next budget.
Honestly, this probably shows a component failure and some poor design: understandable, but unacceptable in this area. If (and I say "if" with some considerable doubt) this turns out to be, or is reported as, an external event, then whoever enabled external network access to what appear to be critical systems within a nuclear power plant on the US mainland needs to be identified and punished, together with the contractors who built or maintained it, the managers or consultants that assessed and managed it, and the politicians who have responsibility for public safety. But as I said, it will probably turn out to be a simple component failure and some poor design.
Great news, guys. This is going to be a non-issue. People are freaking out because a digital device is involved, and freaking out because a nuclear power plant was involved, but I do industrial control system and DCS design for a living, and I'll tell you right now that you simply can't access control networks from the outside. There are separate, often redundant networks, and even then, depending on the way the plant was designed, we're talking Modbus Plus or something that PCs don't normally access.
It's been a long time.
Who in God's name connects a plant's coolant regulation systems to the Internet? How could it be an outside agent when the "data storm" happened on the plant's INTERNAL network.
The article says that explicitly. "Internal network." The DHS is worried about outside agents penetrating the plant personnel, not someone with a laptop uploading a virus like Jeff Goldblum in "Independence Day."
If there *was* such a "data storm" attack, it would _have_ to be caused by an inside saboteur. The plant needs to focus on HUMAN security, not computer security. Either that or they need to reconsider a faulty design.
But can we try, just try, not to write completely hysterical baloney? Hysterical baloney is a trademark of "Homeland Security," and they might see fit to sue.
--
Toro
I have actually seen such a problem myself: controllers crashing because someone was testing the network. The problem was, of course, that the CPU spent so much time handling the volume of packets on the network that it didn't have enough time left for its real-time application. (It didn't help that the platform didn't support DMA.)
Solution: Make the network interrupt handler threaded and prioritize it below the real-time application. Sure, that doesn't help the SCADA performance, but you have to make sure that the real-time application meets its deadlines no matter what is going on on the network. I simply don't buy that you can secure a network stretching over more than 1 meter against "data storms."
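To make the prioritization argument concrete, here is a toy Python simulation, not real interrupt or thread code. It compares two policies over a series of control cycles: handling packets first (what an unthrottled interrupt handler effectively does) versus reserving the control task's CPU time first and giving packets only the leftover. All the timing constants and the queue limit are invented for illustration.

```python
from collections import deque

CYCLE_BUDGET_US = 1000     # hypothetical: one control cycle lasts 1 ms
CONTROL_COST_US = 400      # hypothetical: CPU time the control task needs
PACKET_COST_US = 50        # hypothetical: CPU time to handle one packet
QUEUE_LIMIT = 64           # hypothetical: packets buffered before dropping


def run(packets_per_cycle, control_first):
    """Simulate the cycles; return (missed_deadlines, packets_dropped)."""
    queue = deque()
    missed = dropped = 0
    for arrivals in packets_per_cycle:
        for _ in range(arrivals):
            if len(queue) < QUEUE_LIMIT:
                queue.append(1)
            else:
                dropped += 1                 # buffer full: packet is lost
        budget = CYCLE_BUDGET_US
        if control_first:
            budget -= CONTROL_COST_US        # control task always gets its time
        # Handle queued packets with whatever CPU time is available.
        while queue and budget >= PACKET_COST_US:
            queue.popleft()
            budget -= PACKET_COST_US
        if not control_first and budget < CONTROL_COST_US:
            missed += 1                      # packet work starved the control task
    return missed, dropped
```

Feeding both policies the same burst, e.g. `run([2, 2, 500, 500, 2], control_first=False)` versus `control_first=True`, shows the trade-off described above: the control-first policy drops more packets (worse SCADA performance) but never misses a control deadline.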
A "data storm" can be caused by lots of things, even an unstable driver causing a NIC to spew garbage packets. Or an application that hits a bug and begins spewing to the network. Or a failure of Spanning Tree causing network loops to arise (which can really mess up an Ethernet).
The weirdest I ever saw was a situation at a school where the entire network (built around high-end Cisco switches) crashed hard. It took 3 hours of troubleshooting and disconnecting various segments to finally pin down the cause. It was a little mini-switch that some teacher attached to the LAN that somehow had a meltdown and began spewing "valid" Ethernet packets with all kinds of random garbage source and destination MAC addresses, random payload, and valid checksums. No hosts were attached to the mini-switch, so it had to be something in its microcontroller going haywire. This caused every switch to go nuts trying to maintain its forwarding tables ("show cpu" was 100% utilization) and resulted in no traffic going anywhere. It even crossed VLAN boundaries since all the switches had "trunk" ports using tagged VLANs, so the garbage packets still made it through the entire LAN.
These things happen sometimes. Network gear is generally pretty robust, but can still fail in weird ways.
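The forwarding-table churn in that story can be sketched with a toy learning switch in Python. This is a simplified model under stated assumptions (a small fixed-size table with oldest-first eviction, MAC addresses as plain integers), not how any particular Cisco switch is implemented. It shows how frames with random source addresses push legitimate hosts out of the table, so their traffic has to be flooded again.

```python
import random
from collections import OrderedDict


class LearningSwitch:
    def __init__(self, table_size):
        self.table_size = table_size
        self.table = OrderedDict()   # MAC address -> port, oldest entry first
        self.flooded = 0             # frames sent out of every port

    def receive(self, src, dst, port):
        # Learn (or refresh) the source address, evicting the oldest if full.
        self.table.pop(src, None)
        if len(self.table) >= self.table_size:
            self.table.popitem(last=False)
        self.table[src] = port
        # Forward: an unknown destination means flooding the frame.
        if dst not in self.table:
            self.flooded += 1


def random_mac(rng):
    """A random 48-bit address, standing in for the garbage frames."""
    return rng.getrandbits(48)
```

Two hosts talking to each other cause one flood and then settle down. After a burst of frames built with `random_mac`, both hosts have been evicted and the next frame between them is flooded again, which is the constant relearning that pins the CPU in the story.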