One Failed NIC Strands 20,000 At LAX
The card in question experienced a partial failure that started about 12:50 p.m. Saturday, said Jennifer Connors, a chief in the office of field operations for the Customs and Border Protection agency. As data overloaded the system, a domino effect occurred with other computer network cards, eventually causing a total system failure. A spokeswoman for the airports agency said airport and customs officials are discussing how to handle a similar incident should it occur in the future.
Though I heard it was a switch. Same idea though- all it takes is one malfunctioning card flooding the LAN with bad packets to bring it all down.
SJW: a person who perceives an injustice, and while correcting it, commits a greater injustice.
According to the effing article, it wasn't even a server, but a goddamn desktop. How in the holy hell does a desktop take down the whole system? I can't even conceive of a situation where that could be the case on anything other than a network designed by chimps, especially through a hardware failure...A compromised system might be able to do it, but a system just going dark?
For that to have had any effect at all, that system must have been the lynchpin for a critical piece of the network...probably some Homeland security abortion tacked on to the network, or some such crap...This is like the time I traced a network meltdown to a 4 port hub (not a switch, an unmanaged hub) that was plugged into (not a joke) a T-3 concentrator on one port, and three subnets of around 200 computers each on the other 3 ports. Every single one of the outbound cables from the $15.00 hub terminated in a piece of networking infrastructure costing not less than $10,000.
This is like that. Single point of failure in the worst possible way. Gross incompetence, shortsightedness, and general disregard for things like "uptime"; pretty much what we've come to expect from the airline industry these days. If I'm not flying myself, I'm going to be driving, sailing, or riding a goddamn bicycle before I fly commercial.
ad logicam Claiming a proposition is false because it was presented as the conclusion of a fallacious argument.
"...said airport and customs officials are discussing how to handle a similar incident should it occur in the future."
What makes them think they'll get another shot? Rank and file voters are ready with their own plan...should a 'similar incident' by the same fools happen again.
First you see latency on a network, then you fire up a sniffer and hope to god you can get enough packets to deduce which is the flaky card without shutting down every NIC on your network.
Of course, I did write a paper on this behavior years ago in my CS networking class: take a Snort box and a series of custom scripts to notify admins of spikes on the network outside the normal operating range in that device's history. However, implementing this successfully in an elegant fashion has been beyond me, and I just rely on Nagios to do a lot of my bidding.
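For anyone wondering what "outside the normal range for that device's history" could look like in a script, here is a minimal sketch. It is not the paper's method or anything Snort or Nagios ships with: the function name, the three-sigma rule, and the sample numbers are all assumptions made up for illustration.

```python
from statistics import mean, stdev

def is_spike(history, sample, k=3.0, min_samples=5, min_sigma=1.0):
    """Return True if `sample` is more than k standard deviations
    above the mean of this device's own history.

    All parameters are hypothetical tuning knobs, not taken from any tool.
    """
    if len(history) < min_samples:
        return False  # too little history to say what "normal" is
    mu = mean(history)
    # Floor the deviation so a perfectly flat history does not flag tiny changes
    sigma = max(stdev(history), min_sigma)
    return sample > mu + k * sigma

# Hypothetical packets-per-second readings for one device
history = [100, 110, 95, 105, 102, 98]
print(is_spike(history, 104))   # ordinary reading
print(is_spike(history, 5000))  # the kind of reading a babbling NIC might produce
```

Keeping a separate history per device is the point: a reading that is normal for a file server can be wildly abnormal for a kiosk desktop.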
CS: It is all sink or swim...oh and did I mention there are sharks in that water?
...for not firing the networking manager. The fact that they were NOT terrified that this news would get out and were too stupid to cover it up indicates he/she and their subordinates SIMPLY DON'T KNOW THEY DID ANYTHING WRONG by not putting in a sufficiently monitored switch architecture which would rapidly alert IT staff and lock out the offending node.
Simply amazing. Will someone in the press publish the names of these losers so they can be blacklisted?
Just not, I think it was deliberate, some org trying to see how they could gum up the system easily. I just don't think they want to admit they got hacked.
Zero proof, I just never take governmental reasons for spectacular failures at face value any longer. I used to when I was younger, but after seeing 7,892 lies in a row-well, I just don't trust them on anything important, and I don't believe in coincidences. That is my default position. Whatever they say first-is a lie until proven otherwise, and proof that would convince me is not some spokesperson claiming such and such.
There's too much weird stuff going on, especially with the markets and money supply and an apparent outright war with some global factions in this weird economy, and I think they are going to need a serious public emergency distraction real soon now to take the heat off the ongoing meltdown, and it IS melting down right now. Those failures are not accidents in other words, nor are a lot of the other "failures" we are seeing lately, I think they have been mostly all attacks by parties at this time unknown.
It just *stinks* to me, way too much BS being spoon fed to everyone in the media for it all to be true facts. Too much in a short time frame...just ain't buying it. And the rats deserting the sinking ships, a lot of biz execs bailing out, obvious collusion with artificially propping up some fatcats, weird murmurings from overseas..looks like historically all the major shifts before WW1, very similar.
Any one event, sure, a couple, possible, not these dozens of weird events in such a short time frame....nope, not probably unless it is on purpose. This is asymmetrical warfare, or false flag, or tests or probes or...dang something. It stinks. If I could pin it down better I would, but I am looking at this daily and if I see something that makes it all fit I'll do another journal entry about it.
I've been antsy about all my preps lately, more so than usual, even for a fanatic like me.
I'll say one thing, keep your battery supplies good and have the best possible water filter handy, and a shortwave radio with some freqs programmed in, and take another look at the bugout bags if that is what will be needed because of where folks live, and make sure you have a good supply of h95 masks and assorted other medkit stuff. I got a feeling some cities are going to be hurting soon. And that's all it is, a feeling, can't get better than that at this time.
DHS's idea of a "backup plan" will probably be to build a huge fenced area into which to dump arriving passengers when their systems are down.
Thanks a lot! I just had to look up "felching". Now I'll have that disgusting imagery in my mind the rest of the afternoon!
to any problem. Just do what my company does. Have a meeting! And remember that 8 hrs. of meetings per day will truly brighten your outlook!
Ha! Ha!
Teaches you to rely on technology...
LACP, also known as IEEE 802.3ad, supports aggregating NICs both to improve overall bandwidth and to deal gracefully with failed links.
More info at http://en.wikipedia.org/wiki/Link_Aggregation_Control_Protocol
Systems seem to be more commonly shipping with multiple NICs (esp. servers) so maybe this will be used more and more. It is important to note that the network switch/router needs to be able to support LACP (dumb/cheap switches do not while expensive/managed ones do) so that might be a barrier. Cisco switches and maybe others have implemented proprietary trunking/aggregation schemes but this 802.3ad is a standard.
In practice, I tried to use LACP with a Linksys SRW2048 $800 switch (targeted at small-businesses, much cheaper than typical managed switch) but it did not work reliably (performance got worse, some clients could not connect/timed-out.) Still working on it.
if it takes half an hour for a 'homeland security officer' to write down your job, because the _spell_checker_ doesn't know the word 'physics'... (and it is obvious that the woman behind the desk is _never_ going to
I'm sorry, but I'm kind of glad that people from the USA had to experience this stupid border protection.
Homeland security is a master of FUD btw, shouting at people to 'get in the line' and stuff to make them nervous, so possible terrorists start pissing in their pants and handing over bombs...
Always a warm welcome flying to the usa these days.
(Posting anonymously, because this homeland security crap scares me.)
To all you novice net admins out there: network cards do *not* like chunky peanut butter! Smooth/creamy only, please.
Now you see what happens when some joker thinks [s]he can get away with using chunky for something as critical as proper care and feeding of network cards. Pfft.
Bah! Kids these days... I tell ya. Probably the same folks that think the interwebnet is the same as the World Wide Web.
Great, Scott! What's next?!
Running 'Nix is like owning a Lightsaber. It's "a more elegant weapon for a more civilized time."
Wait, what?
If it was running Ubuntu and had the same hardware, they could have experienced the same problem as these guys.
The game.
When the U.S. Government is letting millions of illegal aliens cross over from Mexico and live here with impunity, then what the fuck is the point with stopping a few thousand document carrying people getting off of planes from entering the country?
I guess the system exists to give the appearance that the feds actually give a shit.
And then the Pres and Congress wonder why their approval ratings are as small as their shoe sizes...
Out of order? Fuck! Even in the future nothing works! - Dark Helmet (Rick Moranis) "Spaceballs"
Years ago we had a 10BT nic go defective so that whenever the nic was plugged into the switch it would obliterate traffic on that segment. The fun part: EVEN IF THE NIC WAS NOT PLUGGED INTO THE PC. Luckily that happened in one of the few areas that had switches at the time, everything else was one huge flat lan.
It seems the controller in L.A. suggested the same thing years ago (pdf): http://www.lacity.org/ctr/press/ctrpress18616087_12152003.pdf
Got Trader Joe's? friendwich.com RSS feeds work now!
How about doing regular police work instead of pre crime, so that passengers don't have to stand around while your network flakes out?
Friends don't help friends install M$ junk.
We had something similar happen at my building when I worked at Kent State University. The air conditioning was being worked on and the workers thought it would be a good idea to plug an AC unit into the server room, something they had been specifically told not to do. The additional load of the AC flipped the breaker and set off all the alarms, all the switches lost power, and the backup units shut down all servers. It wouldn't have been so bad except that all university traffic ends up going through our building for internet, which caused all routers to become backed up. ALL of them. What made things worse was that the new sysadmin didn't know about some of the backup systems, and no one knew how to reset the breakers (it was a special system), plus there was a special pin that had to be used that no one knew. It was a hilarious 2 hours without internet.
Except in the future, the incident isn't going to be similar, aside from being similarly boneheaded. This attitude of "only defend yourself from things that have already happened to you before" is just plain dumb. Obviously their system was set up and administered by a boneheaded organization to begin with, and now that same boneheaded organization is rushing to convene a committee to discuss a committee to discuss how to prevent something that already happened from happening again. The root flaw is still in the organization.
In Repressive Burma, it's not just your connection that dies. slashdot.org/comments.pl?sid=314547&cid=20819199
The NIC that failed isn't the part that's at fault. NICs fail, and can be counted on to do so inevitably, if relatively unpredictably (MTBF is statistical).
The real problem NIC is the one that wasn't there as backup. Either a redundant one already online, or a hotswap one for a brief downtime, or just a spare that could be replaced after a quick diagnostic according to the system's exception handling runbook of emergency procedures.
Of course, we can't blame a NIC that doesn't exist, even if we're blaming it for not existing. We have to blame the people who designed and deployed the system with the single point of failure, and the managers and oversight staff who let the airport depend on that single point of failure.
But instead I'm sure we'll blame the dead NIC. Which gave its life in service to its country.
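To put rough numbers on "MTBF is statistical", here is a back-of-the-envelope sketch. It assumes a constant failure rate (the usual exponential model) and that the NICs fail independently of one another; the 100,000-hour MTBF is a made-up figure, not a spec for any real card.

```python
import math

def p_fail_within(hours, mtbf_hours):
    """Probability that one unit fails within `hours`,
    assuming a constant failure rate (exponential model)."""
    return 1.0 - math.exp(-hours / mtbf_hours)

def p_all_fail_within(hours, mtbf_hours, n_units):
    """Probability that all n units fail within `hours`,
    assuming the failures are independent of each other."""
    return p_fail_within(hours, mtbf_hours) ** n_units

HOURS_PER_YEAR = 8760
MTBF = 100_000  # hypothetical MTBF in hours

one = p_fail_within(HOURS_PER_YEAR, MTBF)
both = p_all_fail_within(HOURS_PER_YEAR, MTBF, 2)
print(f"single NIC fails within a year: {one:.2%}")
print(f"both of two NICs fail within a year: {both:.2%}")
```

Under those assumptions a second NIC cuts the chance of losing everything in a given year by more than a factor of ten. Independence is the weak spot: a spare does nothing against a card that fails by flooding the segment, since that takes out the healthy cards' traffic too.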
--
make install -not war
Where I work, if there's a packet storm someplace (server is getting attacked, server is attacker, or someone just has a really phat pipe on the other end and is moving a ton of data) we get a SNMP TRAP for packet threshold on the offending port. BAM! You know where the problem is, and since we have managed switches you just shut off the port if you can't resolve the problem.
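A rough sketch of the check behind a per-port packet threshold alert, assuming you poll a packet counter for each port at a fixed interval. The port names, counter values, and threshold are invented, and this says nothing about how any particular switch or SNMP agent generates its traps.

```python
COUNTER32_MODULUS = 2 ** 32  # 32-bit interface counters wrap around at this value

def packets_per_second(prev_count, curr_count, interval_s, modulus=COUNTER32_MODULUS):
    """Packet rate between two counter readings, tolerating one counter wrap."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    delta = (curr_count - prev_count) % modulus
    return delta / interval_s

def ports_over_threshold(prev, curr, interval_s, threshold_pps):
    """Return the sorted names of ports whose packet rate exceeds the threshold.

    `prev` and `curr` map port name -> counter reading (hypothetical data shape).
    """
    return sorted(
        port
        for port in curr
        if port in prev
        and packets_per_second(prev[port], curr[port], interval_s) > threshold_pps
    )

# Two polls taken 60 seconds apart; the second port's counter wrapped in between
prev = {"Fa0/1": 1_000, "Fa0/2": 4_294_967_000}
curr = {"Fa0/1": 1_600, "Fa0/2": 2_999_704}
print(ports_over_threshold(prev, curr, interval_s=60, threshold_pps=20_000))
```

The list that comes back is the "BAM, you know where the problem is" part; shutting the port off is still a human (or separately automated) decision.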
Having said that, since the managed switches are gigE uplinked and each port is only 10/100, I don't think we've ever had a problem where a server was outbounding and brought down the switch/network (just made some extra latency). We've had some really large inbounds occasionally take down a whole switch, and heaven forbid some idiot shuts the port off on an inbound attack instead of nulling it at the border, cause then the ARP drops and the DOS gets forwarded to every port on the VLAN on a ton of switches.. but a broken NIC packet storming would not have been an issue.
OK, so maybe they don't have managed switches all the way down to the lowest point on the network. They should still have SOME further up the chain and be monitoring them such that they know from what direction the problem is coming, and shut it off / look at it with a sniffer etc.
Infrastructure that is as important as an airport should have its own infrastructure properly equipped and maintained with managed equipment, making this nearly a non-issue and certainly one easily resolved.
I used to work for a very large travel agency and have seen queues of travel reservations get pretty backed up and cause problems before, although on a smaller scale.
Most reservations are checked for problems automatically but pushed through by a person and moved from one queue to another. If the program that checks them crashes, it can back things up.
I remember a program crashing and a queue getting 2000+ reservations in it before someone figured out what was going on and it had things screwed up for about 2 days while a replacement computer gradually cleared the queue out.
Contrary to popular belief, Unix is user friendly. It just happens to be particular about who it makes friends with.
Is this an out-take from the "BRADY BUNCH"?
Haven't you mods ever seen Airplane?
You're right to a point. An ethernet frame, along with the source and destination addresses, has a checksum. A switch that is using a store and forward procedure is supposed to drop the frame if the checksum is invalid. If the nic was throwing garbled frames onto the network, it would have to be garbled in such a way as to have a valid checksum (assuming they are using store and forward switches in the first place).
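A small sketch of the check described above. Ethernet's frame check sequence is a CRC-32, which is the same CRC that Python's zlib.crc32 computes; appending it least-significant byte first is the usual software convention. The frame contents and the function names here are made up, and a real switch does this in hardware.

```python
import struct
import zlib

def append_fcs(frame: bytes) -> bytes:
    """Append a CRC-32 frame check sequence to a frame (header + payload)."""
    return frame + struct.pack("<I", zlib.crc32(frame) & 0xFFFFFFFF)

def fcs_ok(frame_with_fcs: bytes) -> bool:
    """Recompute the CRC over the body and compare it with the trailing 4 bytes."""
    if len(frame_with_fcs) < 4:
        return False
    body, fcs = frame_with_fcs[:-4], frame_with_fcs[-4:]
    return struct.pack("<I", zlib.crc32(body) & 0xFFFFFFFF) == fcs

def store_and_forward(frames):
    """Forward only frames whose checksum verifies; silently drop the rest."""
    return [f for f in frames if fcs_ok(f)]

# Hypothetical frame: broadcast destination, made-up source MAC, IPv4 ethertype
good = append_fcs(b"\xff" * 6 + b"\x00\x11\x22\x33\x44\x55" + b"\x08\x00" + b"payload")
garbled = bytearray(good)
garbled[20] ^= 0x01  # flip one bit, as a flaky NIC or bad cable might
print(len(store_and_forward([good, bytes(garbled)])))  # only the intact frame survives
```

So random garbage gets filtered at the first store-and-forward hop. A NIC that emits well-formed frames at a ridiculous rate sails straight through this check, which may be closer to what happened here.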
When our name is on the back of your car, we're behind you all the way!
Perhaps they upgraded their token ring to thinnet.
When our name is on the back of your car, we're behind you all the way!
This brings out an obvious point: despite the advances we have made in computing and IT, the field is still relatively young and not that robust.
This is the equivalent of your car stops working and the 'check engine' light does not even come on. At least now some of the technology for cars is getting to the point that it will find the problem for you. The same still cannot be said for large computer networks.
When people stop treating computers as flawless wonder machines, then we shall see some real progress made.
Change to Wifi because that can't have NIC faults.
C'mon folk... help me out here!
Engineering is the art of compromise.
So a desktop got infected and started to spew crap onto the network. Then we blame it on the NIC itself..
Hah, security, what is that? Half our personal information can be found on p2p networks because government employees can't actually do the job; they have to screw around and download music / movies or who knows what else. What would make airport employees any different..
When you boil it down to the root problem you'll find it's lack of leadership allowing these problems / attitudes to exist.
And great, our tax dollars paid for this screw-up.
Got any references or links to various tutorials and/or documents on how I could setup my network to notify me about a rogue NIC?
This space is not for rent.
Sadly, many real-world systems are often nothing like what people might envision them as. We all sit back in our chairs reading slashdot and thinking everything is masterfully architected, fully HA, redundant, etc.
Then as you work more places you start seeing that this is pretty far from actual truth. Many "production" systems are held together by rubber bands, and duct tape if you're lucky (but not even the good kind.) In my experience it can be a combination of poor funding, poor priorities, technical management that doesn't understand technology, or just a lack of experience or skills among the workers.
Not every place is a Google or Yahoo!, that I can imagine look and smell like technology wherever you go on their fancy campuses. Most organizations are businesses first, and tech shops last. If software and hardware appears to "work", it is hard to convince anybody in a typical business that anything should change- even if what is "working" is a one-off prototype running on desktop hardware. It often requires strong technical management and a good CIO/CTO to make sure that things happen like they should.
I suspect that a lot of things that we consider "critical" in our society are a hell of a lot less robust under the hood than anything Google is running.
Now that being said, a good slice and dice of a network will save you some heartache but it will not solve the problem as at some point your miracle subnets (I will assume you meant vlans) will all have to connect somewhere, now you have reduced the chance of a nic failing as there are less NICs and all are confined to their respective homes, but you still are dealing with NICs and they still short out, go bad, mice chew, and all goes haywire.
Of course you don't know these things unless you have actually experienced the scenarios (experienced it here with equipment less than a year old) on a large network, and by large I mean at least 1000+ nodes.
My solution to the problem? Get away from large enterprise networks and stick to smaller ones, I really enjoy the perks of having a sub 500 node network and I have the time and can afford the equipment to cut things up properly.
CS: It is all sink or swim...oh and did I mention there are sharks in that water?
Are they saying that one bad card destroyed other cards? That seems a bit unusual.
"It's the height of ridiculousness to say for those 9 lines you get hundreds of millions."
Once again, the terrorists are beaten to the punch by good-old-fashioned American incompetence.
By disabling one of our major airports, we've preemptively removed a major terrorist target. Beat that, FEMA! Score one for the FAA!
who forgot to pay the IT guy? last time i checked shouldn't most switches correct this issue... i could see them using hubs and thats what really caused the overload :)
(yes i know i suck at spelling fell free to correct my grammar and/or spellin i dont care, im still not going to change
Yeah, my employer makes monitoring software, so obviously I'm biased here, but... are you freaking joking me? They couldn't bother to install a monitoring system to warn them their NIC was fried? Hmm... I can think of one offhand ;)
-John Mark
Hyperic Community Manager
As with most of the TSA and the DHS, poor design abetted by high levels of secrecy and politicization results in a fragile system. A large number of single points of failure and a lack of fall-backs and fail-safes mean that one problem hoses the system. One wonders what happens when things really go pear shaped in an unanticipated fashion.
A workstation NIC takes down a network? Never happened to me. Complete BS and total snafu. Where were spare parts? See the sequel: "Mission Uncritical" starring either Jesus or Tom Cruise.
It is laughable that there is no non-computerised backup for the system. (How about filling out the forms and scanning them in later?)
This story is incomplete:
like the time I traced a network meltdown to a 4 port hub (not a switch, an unmanaged hub) that was plugged into (not a joke) a T-3 concentrator on one port, and three subnets of around 200 computers each on the other 3 ports. Every single one of the outbound cables from the $15.00 hub terminated in a piece of networking infrastructure costing not less than $10,000.
Tell me that the old hub was hidden in a ceiling tile and that it melted because the HVAC dude thought it was unused. Then you will have matched the worst story I have heard yet.
Friends don't help friends install M$ junk.
Google search chattering NIC. You want to emulate this? Take a NIC and hard code it to 100/full and set your switch to auto or 100/half. Now start transferring a ton of data to the server. Watch what happens to the other network devices on that switch. A chattering NIC is similar. Sometimes worse.
Sometimes stuff just happens.
I agree, but the scope of the problem is much larger.
Americans are still designing systems (and I'm talking WHOLE systems, not just the computers) for the industrial revolution. Much the same way, we're educating our kids for the same purpose- to make them cogs for manufacturing.
The Japanese have a more 'cellular' structure, as opposed to the 'pyramid' designed back a couple of 'turns of the century' ago. One man on top drives five, who drive 200, who drive them all. But the Japanese model is more like object orientation: each unit has private parts. So long as the command it's given produces the proper results and stays within budget, who cares?
Assembly lines gather at their meetings and decide policy on their own. "Fred has been late 3 times this week; do we care?" and the only people to whom it matters, decide. There's no need for a strict, top-down policy, especially since only tiny organizations all do only one job.
Imagine the broken structures in a holding company; they own a newspaper, a carwash and a grocery store; the top man can't say "We'll only use glass containers", because that would be a disaster in a car wash. They can't say "we choose leaded inks", which might be fine for the car wash but dangerous at the newspaper. Each unit has its own purpose.
So how about giving the network admins the power to do *whatever* it takes to let them keep the equipment up to date? As long as it runs, under budget, and doesn't get'em on the newspapers, who cares about the specifics? Why not let the unused budget from every year sit in an account (not being taken back) and use THAT to improve infrastructure?
If these guys were able to have that kind of control, this discussion wouldn't be happening.
--- For a good time mail uce@ftc.gov
Linksys 1 LAX 0
I blame Microsoft since they took Clippy out of Word. He was my only friend.
Yes, NICs can go crazy and start blasting broadcasts or unicasts over your network. If you have a Cisco switch (or any other that supports storm-control-like features) you may want to enable it; it costs you nothing but the time it takes to update the config. On the access switch (the one connected to your PCs), get into config mode and type this on every interface that connects directly to a PC (use the interface range command to speed things up if you want): Switch(config-if)#storm-control unicast level X, where X is the percent of total interface bandwidth you specify as the threshold for cutting access to that port. It's measured every second, so if you have a 100 meg port and you set it to 30, then if the PC pushes more than 30 meg a sec in unicasts the switch kills the port until the PC calms down; if it's a 10 meg port, the 30 then equals 3 meg, etc. You can also add a second line to control broadcasts by changing the word unicast to broadcast. If they had this in place, when the NIC went nuts the switch would have killed the port, and no outage (I assume a lot here, but you get the point).
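The percentage arithmetic above, worked through as a toy model. This only mirrors the numbers in the comment (a level of 30 on a 100 meg port versus a 10 meg port); the function names are invented and it does not model how any real switch samples traffic or recovers the port.

```python
def storm_threshold_mbps(port_speed_mbps, level_percent):
    """Traffic rate at which the port gets cut off, as a share of port speed."""
    if not 0 <= level_percent <= 100:
        raise ValueError("level_percent must be between 0 and 100")
    return port_speed_mbps * level_percent / 100

def port_blocked(observed_mbps, port_speed_mbps, level_percent):
    """True if the traffic seen in the last one-second window exceeds the threshold."""
    return observed_mbps > storm_threshold_mbps(port_speed_mbps, level_percent)

print(storm_threshold_mbps(100, 30))  # threshold in Mbit/s on a 100 meg port
print(storm_threshold_mbps(10, 30))   # same level on a 10 meg port
print(port_blocked(45, 100, 30))      # a PC pushing 45 meg a sec on a 100 meg port
```

Because the level is a percentage, the same config line gives very different absolute limits on ports of different speeds, so it is worth checking what each port actually negotiated before picking X.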
Man, when a Token Ring card went bad, it was hell on the network, nothing worked because the token would not get passed properly.
The worst thing is when a user decides to unplug the cable to move something or whatever. Then the token can fall out and you have to spend hours on your hands and knees with a magnifying glass trying to find the damn thing!
Its true! I saw it in a Dilbert cartoon!
In the free world the media isn't government run; the government is media run.
The CNN article was older and adds nothing to the main point of the discussion, that a cheap network component caused such an extensive failure. It appears the only purpose of the CNN page was its higher body count.
When did I say Cisco routers? Thanks for the advice though. We have a very modest Sun Fire that monitors close to ten thousand cable modems using a system similar to what (I think) I'm describing. You're right about that, and the UDP traffic below. But again, my idea of checking up on the clients, not the switch, might lead you to the root of the problem more easily. I would not use it as my only tool, however. With very large networks like this, more than one type of monitoring is a good thing. A few sflow/netflow collectors, like ntop etc., could also be useful. Add an SNMP graphing server for your routers and more centered equipment, like MRTG or PRTG, and some custom stuff such as these scripts I like, and you have tools to find out who's crashing your network. Cisco equipment seems to not have a problem forwarding data to a netflow collector BTW. I've also used dnstop, which is great for finding bad clients / DoS, but that won't find a wonky Ethernet card I imagine. Once in a college dorm I followed the most active orange lights down the tree of 10Mb Cisco switches all the way to a very senile old ISA Ethernet card in a 386. It was bringing about 200 clients down. We kindly upgraded the kid's computer. Network latency quite suddenly started happening in very set intervals for about a day, strangely correlating to in class / out of class times, so we kind of knew it was an Ethernet card. Managing a cable plant is not "thousands of ports" though. A CMTS will discriminate amongst hundreds of modems on one port thanks to the quite amazing upstream scheduling algorithms built into a few ASICs, and modems don't fail into a crash-the-network state. Crappy Ethernet cards fail into a crash-the-modem state though. Often.
I cannot unsee what I saw'ed.
Thanks for making me curious enough to look it up as well. >_
So, what are you telling me, that their tubes got clogged?
The Christian Right is Neither (Christian nor right). See: Matthew 23, Matthew 25, Ezekiel 16:48-50
This reminds me of a situation that happened on my campus network. Somebody in one of the campus dorm rooms plugged in a router backwards. The router took down much of the network because it was serving up DHCP addresses. The IT people were able to track down the offending segment and shut it off within two hours.
I dated this nerdy girl named "Freda Felcher." We did some nasty things together, like when I tea-bagged her with a computer mouse while a force-feedback joystick hummed along in her anus. God, she did everything for me. She didn't like eels, however she like elephant trunks. :-(
i'm not even an operations guy but even i know enough to say "when it occurs in the future", not "should it occur in the future".
...so we can segment our networks so they don't *all* get hosed when something like this happens?
Didn't someone tell them it's a series of tubes not a single tube... duh.
It seems to me that some of the new Network Access Control (NAC) technologies might be able to mitigate such a situation - they can automatically shut down switch ports that are using excessive bandwidth or doing other "naughty things".
I can think of at least one way that a bad NIC can take out other hardware... Etherkiller. Kids, don't try this at home!
Did you mount a military-grade, variable-focus MASER on an unlicensed artificial intelligence?
Why can't you just make it work? It worked fine like this before when the switch was in the other closet, I'm tired of spending money replacing these things.
Not wanting to name names, but /me works at a large financial institution in Germany's southwest, and we are only now phasing out Token Ring equipment in our branch offices. Main site has mostly been migrated to Ethernet something like 2 years ago, aside from some legacy equipment that we have to maintain for backward compatibility and/or auditing requirements. Oh, and I still have a classic IBM 8228 MAU below my desk. Nothing's more satisfying than the "click" of the closing relay at the end of the day. Ah, those were the days... when real network engineers didn't need mains power to run their equipment, and could tell a successful insert into the ring without looking at some fancy LEDs.