One Failed NIC Strands 20,000 At LAX

Posted by kdawson on Wednesday August 15, 2007 @07:56AM from the comp-dot-risks dept.

The card in question experienced a partial failure that started about 12:50 p.m. Saturday, said Jennifer Connors, a chief in the office of field operations for the Customs and Border Protection agency. As data overloaded the system, a domino effect occurred with other computer network cards, eventually causing a total system failure. A spokeswoman for the airports agency said airport and customs officials are discussing how to handle a similar incident should it occur in the future.

That's all it takes by Marxist+Hacker+42 · 2007-08-15 04:55 · Score: 1, Interesting

Though I heard it was a switch. Same idea though- all it takes is one malfunctioning card flooding the LAN with bad packets to bring it all down.

--
SJW: a person who perceives an injustice, and while correcting it, commits a greater injustice.
1. Re:That's all it takes by Jeremiah+Cornelius · 2007-08-15 06:53 · Score: 2, Interesting
  
  Then that would lead me to think "hub", not switch. Or just a really shitty switch...
  
  --
  "Flyin' in just a sweet place,
  Never been known to fail..."
2. Re:That's all it takes by KillerCow · 2007-08-15 08:15 · Score: 5, Interesting
  
  I am not a networks guy... but it's my understanding that a switch acts like a hub when it sees a TO: MAC address that it doesn't know what port it's on. They learn the switching structure of a network by watching the FROM fields on the datagrams. When the switch powers up, it behaves exactly like a hub and just watches/learns what MAC addresses are on which ports and builds a switching table. If it starts getting garbage packets, it will look at the TO field and say "I don't know what port this should go out on, so I have to send it on all of them." So garbage packets would overwhelm a network even if it was switched.
  
  It would take a router to stop this from happening. I don't think that there are many networks that use routers for internal partitioning. Even then, that entire network behind that router would be flooded.
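The learn-and-flood behavior described above can be sketched in a few lines. This is a toy model only, not how any particular switch is implemented; the class name `LearningSwitch`, the port numbering, and the MAC addresses are made up for illustration.

```python
from typing import Dict, List

BROADCAST = "ff:ff:ff:ff:ff:ff"


class LearningSwitch:
    """Toy model of a layer 2 switch with a MAC address table (hypothetical)."""

    def __init__(self, num_ports: int) -> None:
        self.ports = list(range(num_ports))
        # Source MAC -> port it was last seen on.
        self.mac_table: Dict[str, int] = {}

    def receive(self, in_port: int, src: str, dst: str) -> List[int]:
        """Return the list of ports the frame is sent out of."""
        # Learn: the sender is reachable through the port the frame came in on.
        self.mac_table[src] = in_port
        out_port = self.mac_table.get(dst)
        if dst == BROADCAST or out_port is None:
            # Broadcast or unknown destination: flood like a hub.
            return [p for p in self.ports if p != in_port]
        if out_port == in_port:
            # Destination lives on the arrival port, nothing to forward.
            return []
        return [out_port]


sw = LearningSwitch(4)
# Destination not learned yet, so the frame is flooded.
print(sw.receive(0, "aa:aa:aa:aa:aa:01", "aa:aa:aa:aa:aa:02"))
# The reply goes to a known address, so only one port sees it.
print(sw.receive(1, "aa:aa:aa:aa:aa:02", "aa:aa:aa:aa:aa:01"))
# A garbage destination is never learned, so every such frame is flooded.
print(sw.receive(2, "de:ad:be:ef:00:01", "de:ad:be:ef:99:99"))
```

The third call is the failure case: a card spewing frames to addresses nobody owns gets every one of them copied to all other ports.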
3. Re:That's all it takes by Kadin2048 · 2007-08-15 08:23 · Score: 5, Interesting
  
  Would you think that LAX is running anything that out-of-date or crappy? I assume that they're running everything with spit, duct tape, wishful thinking, ancient custom software, near-fossilized hardware, and Excel spreadsheets ... just like pretty much everything else in the public sector.
  
  I've seen what's running some government agencies, and it's frightening.
  
  --
  "Ladies and gentlemen, my killbot features Lotus Notes and a machine gun. It is the finest available."
4. Re:That's all it takes by quanticle · 2007-08-15 10:55 · Score: 2, Interesting
  
  Surely management understands that redundancy is good.
  
  No. In management's eyes, redundancy is bad. You're paying twice as much, but you're not getting any extra functionality in return.
  
  --
  We all know what to do, but we don't know how to get re-elected once we have done it
5. Re:That's all it takes by Kadin2048 · 2007-08-15 15:10 · Score: 2, Interesting
  
  I pity you, your state and everyone else using Access.
  
  Yeah, Access is a piece of shit. Unfortunately, it's a lot better than using Excel as a database, which is in many cases the alternative that I've witnessed.
  
  There is also a lack of alternatives: you have FileMakerPro, which is neat (I like it) but not very appealing to some because it has a significant learning curve compared to Access and is also proprietary and expensive; aside from that you have OO.org's Base, which is still immature; and then you've got custom SQL+webforms, which is usually the right choice for non-trivial projects, but requires users to realize the scope of their project at the outset.
  
  And as crummy as Access is, at least it gives you a path towards a separate frontend/backend. You don't get that when each employee is keeping their own critical information on a massive spreadsheet on their workstation's hard drive. And in more places than I'd like to think about, that's the way things work -- it's the dark side of giving every employee an actual computer as opposed to a dumb terminal.
  
  --
  "Ladies and gentlemen, my killbot features Lotus Notes and a machine gun. It is the finest available."
6. Re:That's all it takes by Anonymous Coward · 2007-08-15 20:18 · Score: 1, Interesting
  
  If the system was based on decent Layer 3 switches for all ports, configured to perform TCP/IP switching across all segments, then only the switch itself would be able to take down the network.
  
  That being said, wiring every port in an international airport for Layer 3 and ONLY Layer 3 switching would cost about $300 per port, and that's probably for something like 5000 ports. So figure a minimum of $1.5 million for network switches.
  
  In all other mission critical systems I've ever worked on (specifically in telecom and power) there are timelines for replacing equipment on schedules to avoid equipment reaching end of life. At the very least the equipment would be sent back to the manufacturer for refurbishing which would even include reflowing all the solder, often costing more than new equipment. It would add about $1.5 million + $3-$5 million installation+configuration every 2-3 years to the IT budget of the airport.
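For anyone who wants to check the numbers above, here is the arithmetic as a quick sketch. The port count, per-port price, installation range, and replacement cycle are the guesses from this comment; spreading the total evenly over the cycle to get a per-year figure is an added assumption, not something the figures above state.

```python
PORTS = 5000          # rough port count guessed above
COST_PER_PORT = 300   # dollars per Layer 3 port, also a guess

hardware = PORTS * COST_PER_PORT

# Installation + configuration range and replacement cycle from the comment.
INSTALL_LOW, INSTALL_HIGH = 3_000_000, 5_000_000
CYCLE_SHORT_YEARS, CYCLE_LONG_YEARS = 2, 3


def annualized(hardware_cost: int, install_cost: int, cycle_years: int) -> float:
    """Spread one replacement cycle evenly over its years (an assumption)."""
    return (hardware_cost + install_cost) / cycle_years


best_case = annualized(hardware, INSTALL_LOW, CYCLE_LONG_YEARS)
worst_case = annualized(hardware, INSTALL_HIGH, CYCLE_SHORT_YEARS)

print(f"hardware per cycle: ${hardware:,}")
print(f"per year: ${best_case:,.0f} to ${worst_case:,.0f}")
```

Under those assumptions the hardware comes to $1,500,000 per cycle and the whole thing lands between $1,500,000 and $3,250,000 a year.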
  
  That really is chump change to an international airport, but let's be serious, if we look at standards based on this is chump change or that is chump change, pretty soon, you have some freak with a post-it note fetish wallpapering a 400 sq. meter area with heart and pony shaped post-it notes because after all, what's $1 million on the grander scale, the airport won't even miss it. In something closer to reality, it means that people will be less conservative about upgrading from a 17" LCD to a 19" LCD just because it's chump change.
  
  Given the uncountable number of critical systems within an airport, security, fire department, fuel containment, tire refurbishing, wing repair, (yes, I'm making this crap up) etc... network switches just don't sound that important to people whose daily purchasing decisions could actually decide the lives of the 200 or more passengers on each plane that takes off or lands.
  
  It puts things in perspective sometimes to recognize that an outage that forces an airport to reroute traffic safely to other airports and delay flights isn't as important as decisions that make sure fuel tanks and hoses don't explode, causing the death of 500 or more people. I'm sure, however, that the airport will now dedicate a little more money to networking to try and keep this from happening again.
Re:Whiskey Tango Foxtrot by Jeremiah+Cornelius · 2007-08-15 08:01 · Score: 3, Interesting

Well.

Token ring sure used to fail like this! 1 bad station sending 10,000 ring-purge messages a second? Still, it was a truck. Files under 1Mb could be transferred, and this was TR/4, not 16!

--
"Flyin' in just a sweet place,
Never been known to fail..."
You figure it out by COMON$ · 2007-08-15 08:01 · Score: 3, Interesting

Let me know. Knowing how to prevent failure due to a flaky NIC on a network is a very large issue.
First you see latency on a network, then you fire up a sniffer and hope to god you can get enough packets to deduce which is the flaky card without shutting down every NIC on your network.
Of course I did write a paper on this behavior years ago in my CS networking class. The idea was to take a Snort box and a series of custom scripts and notify admins of spikes on the network outside the normal operating range in that device's history. However, implementing this successfully in an elegant fashion has been beyond me and I just rely on Nagios to do a lot of my bidding.
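The notify-on-spikes idea above can be roughed out as a per-device baseline check. This is only a sketch under loud assumptions: the class `SpikeDetector`, the mean-plus-three-standard-deviations rule, and the packets-per-second samples are all hypothetical, and nothing here talks to Snort or Nagios.

```python
from collections import defaultdict, deque
from statistics import mean, pstdev


class SpikeDetector:
    """Flag traffic samples far above a device's own recent history (hypothetical)."""

    def __init__(self, window: int = 60, threshold: float = 3.0,
                 min_samples: int = 10) -> None:
        # One bounded history of packets-per-second samples per device.
        self.history = defaultdict(lambda: deque(maxlen=window))
        self.threshold = threshold
        self.min_samples = min_samples

    def observe(self, device: str, packets_per_sec: float) -> bool:
        """Record a sample and return True if it looks like a spike."""
        past = self.history[device]
        alert = False
        if len(past) >= self.min_samples:
            # Floor the spread at 1.0 so a perfectly flat history
            # does not make every tiny change look like a spike.
            spread = max(pstdev(past), 1.0)
            limit = mean(past) + self.threshold * spread
            alert = packets_per_sec > limit
        if not alert:
            # Keep spikes out of the baseline so a sustained flood
            # does not become the new "normal".
            past.append(packets_per_sec)
        return alert


detector = SpikeDetector()
for i in range(20):
    detector.observe("nic-a", 95 if i % 2 == 0 else 105)
print(detector.observe("nic-a", 50_000))
```

The hard part mentioned above is still open: choosing a window and threshold that do not page the admins every time someone starts a backup.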

--
CS: It is all sink or swim...oh and did I mention there are sharks in that water?
Re:Whiskey Tango Foxtrot by mhall119 · 2007-08-15 08:04 · Score: 2, Interesting

A compromised system might be able to do it, but a system just going dark? The article says it was a partial failure, so I'm guessing the NIC didn't "go dark", instead it started flooding the network with bad packets.

--
http://www.mhall119.com
Re:Whiskey Tango Foxtrot by Anonymous Coward · 2007-08-15 08:09 · Score: 1, Interesting

I'm guessing the NIC didn't "go dark", instead it started flooding the network with bad packets.

Yeah, and any decent switch (and some not-so-decent) would detect this and shut the port down.

Hell, I have a 7 year old dlink 8-port at home that can do this!
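What "detect this and shut the port down" might look like, as a rough sketch. The class `PortGuard`, the frames-per-interval limit, and the rule that a tripped port stays down until someone re-enables it are all assumptions for illustration; real switches differ by vendor and model.

```python
from typing import Dict, Set


class PortGuard:
    """Disable a port that exceeds a frame budget within one interval (hypothetical)."""

    def __init__(self, max_frames_per_interval: int) -> None:
        self.limit = max_frames_per_interval
        self.counts: Dict[int, int] = {}
        self.disabled: Set[int] = set()

    def frame_arrived(self, port: int) -> bool:
        """Return True if the frame is accepted, False if it is dropped."""
        if port in self.disabled:
            return False
        self.counts[port] = self.counts.get(port, 0) + 1
        if self.counts[port] > self.limit:
            # Budget blown: take the port out of service.
            self.disabled.add(port)
            return False
        return True

    def end_interval(self) -> None:
        """Reset the counters; disabled ports stay down."""
        self.counts.clear()

    def enable(self, port: int) -> None:
        """Manual re-enable, e.g. after the bad card has been replaced."""
        self.disabled.discard(port)
        self.counts.pop(port, None)


guard = PortGuard(max_frames_per_interval=1000)
accepted = sum(guard.frame_arrived(7) for _ in range(5000))
print(accepted, 7 in guard.disabled)
```

Only the misbehaving port goes dark; the rest of the LAN keeps working, which is the whole point.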
Re:Whiskey Tango Foxtrot by Billosaur · 2007-08-15 08:11 · Score: 2, Interesting

And beyond that... how come there is no redundancy? After 9/11, every IT organization on the planet began making sure there was some form of fail-over to a backup system or disaster recovery site to ensure that critical systems could not go down as the result of something similar or some other large-scale disaster. Not only was this system apparently cobbled together, but there was no regard for the possibility of it failing for any reason.

--
GetOuttaMySpace - The Anti-Social Network
nic can take down a segment by KDN · 2007-08-15 08:24 · Score: 3, Interesting

Years ago we had a 10BT nic go defective so that whenever the nic was plugged into the switch it would obliterate traffic on that segment. The fun part: EVEN IF THE NIC WAS NOT PLUGGED INTO THE PC. Luckily that happened in one of the few areas that had switches at the time, everything else was one huge flat lan.
1. Re:nic can take down a segment by KDN · 2007-08-15 10:26 · Score: 2, Interesting
  
  Excuse me, but why the hell did you test for that in the first place?
  It was during the debugging phase. We got it to occur, and then turned off one machine at a time. When all the machines on the segment were off and the switch was still jabber isolated we all went "WTF?!" and then started unplugging cables.
The scope of the problem by WheelDweller · 2007-08-15 11:47 · Score: 5, Interesting

I agree, but the scope of the problem is much larger.

Americans are still designing systems (and I'm talking WHOLE systems, not just the computers) for the industrial revolution. Much the same way, we're educating our kids for the same purpose- to make them cogs for manufacturing.

The Japanese have a more 'cellular' structure, as opposed to the 'pyramid' designed back a couple of 'turns of the century' ago. One man on top drives five, who drive 200, who drive them all. But the Japanese model is more like object orientation: each unit has private parts. So long as the command it's given produces the proper results and stays within budget, who cares?

Assembly lines gather at their meetings and decide policy on their own. "Fred has been late 3 times this week; do we care?" and the only people to whom it matters, decide. There's no need for a strict, top-down policy, especially since only tiny organizations all do only one job.

Imagine the broken structures in a holding company; they own a newspaper, a carwash and a grocery store; the top man can't say "We'll only use glass containers", because that would be a disaster in a car wash. They can't say "we choose leaded inks", which might be fine for the car wash, but dangerous at the newspaper. Each unit has its own purpose.

So how about giving the network admins the power to do *whatever* it takes to let them keep the equipment up to date? As long as it runs, under budget, and doesn't get'em on the newspapers, who cares about the specifics? Why not let the unused budget from every year sit in an account (not being taken back) and use THAT to improve infrastructure?

If these guys were able to have that kind of control, this discussion wouldn't be happening.

--
--- For a good time mail uce@ftc.gov