One Failed NIC Strands 20,000 At LAX

Re:That's all it takes by Jeremiah+Cornelius · 2007-08-15 06:53 · Score: 2, Interesting

Then that would lead me to think "hub", not switch. Or just a really shitty switch...

--
"Flyin' in just a sweet place,
Never been known to fail..."

Whiskey Tango Foxtrot by SatanicPuppy · 2007-08-15 07:58 · Score: 5, Insightful

According to the effing article, it wasn't even a server, but a goddamn desktop. How in the holy hell does a desktop take down the whole system? I can't even conceive of a situation where that could be the case on anything other than a network designed by chimps, especially through a hardware failure...A compromised system might be able to do it, but a system just going dark?

For that to have had any effect at all, that system must have been the lynchpin for a critical piece of the network...probably some Homeland security abortion tacked on to the network, or some such crap...This is like the time I traced a network meltdown to a 4 port hub (not a switch, an unmanaged hub) that was plugged into (not a joke) a T-3 concentrator on one port, and three subnets of around 200 computers each on the other 3 ports. Every single one of the outbound cables from the $15.00 hub terminated in a piece of networking infrastructure costing not less than $10,000.

This is like that. Single point of failure in the worst possible way. Gross incompetence, shortsightedness, and general disregard for things like "uptime"; pretty much what we've come to expect from the airline industry these days. If I'm not flying myself, I'm going to be driving, sailing, or riding a goddamn bicycle before I fly commercial.

--
ad logicam Claiming a proposition is false because it was presented as the conclusion of a fallacious argument.

Re:Whiskey Tango Foxtrot by Jeremiah+Cornelius · 2007-08-15 08:01 · Score: 3, Interesting

Well.

Token ring sure used to fail like this! 1 bad station sending 10,000 ring-purge messages a second? Still, it was a truck. Files under 1Mb could be transferred, and this was TR/4, not 16!

--
"Flyin' in just a sweet place,
Never been known to fail..."
Re:Whiskey Tango Foxtrot by mhall119 · 2007-08-15 08:04 · Score: 2, Interesting

A compromised system might be able to do it, but a system just going dark?

The article says it was a partial failure, so I'm guessing the NIC didn't "go dark"; instead, it started flooding the network with bad packets.

--
http://www.mhall119.com
Re:Whiskey Tango Foxtrot by MightyMartian · 2007-08-15 08:07 · Score: 5, Insightful

If the NIC starts broadcasting like nuts, it will overwhelm everything on the segment. If you have a flat network topology, then kla-boom, everything goes down the shits. A semi-decent switch ought to deal with a broadcast storm. The best way to deal with it is to split your network up, thus rendering the scope of such an incident significantly smaller.
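
A rough sketch of the "split your network up" point, using Python's standard ipaddress module. The 10.0.0.0/22 range and the /24 split are made-up numbers for illustration, not anything from the article. It only shows how many hosts share a broadcast domain before and after the split.

```python
import ipaddress

def broadcast_domain_sizes(network: str, new_prefix: int) -> dict:
    """Compare usable hosts per broadcast domain before and after subnetting."""
    flat = ipaddress.ip_network(network)
    pieces = list(flat.subnets(new_prefix=new_prefix))
    return {
        # Subtract the network and broadcast addresses from each range.
        "flat_hosts": flat.num_addresses - 2,
        "segments": len(pieces),
        "hosts_per_segment": pieces[0].num_addresses - 2,
    }

# Hypothetical flat LAN split into smaller routed segments.
result = broadcast_domain_sizes("10.0.0.0/22", 24)
print(result)
```

With these assumed numbers, a jabbering NIC would hit one segment of a few hundred hosts instead of the whole range, provided the segments really are separated at layer 3.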

--
The world's burning. Moped Jesus spotted on I50. Details at 11.
Re:Whiskey Tango Foxtrot by Billosaur · 2007-08-15 08:11 · Score: 2, Interesting

And beyond that... how come there is no redundancy? After 9/11, every IT organization on the planet began making sure there was some form of fail-over to a backup system or disaster recovery site to ensure that critical systems could not go down as the result of something similar or some other large-scale disaster. Not only was this system cobbled together apparently, there was no regard for the possibility of it failing for any reason.

--
GetOuttaMySpace - The Anti-Social Network
Re:Whiskey Tango Foxtrot by sigipickl · 2007-08-15 08:16 · Score: 2, Informative

This totally sounds like a token ring problem.... Either network flooding or dropped packets (tokens). These issues used to be a bear to track down- going from machine to machine in serial from the MAU...

Ethernet and switching has made me fat- I never have to leave my desk to troubleshoot.

--
Never trust anyone who takes pride in being called a 'geek'....
Re:Whiskey Tango Foxtrot by kylemonger · 2007-08-15 08:41 · Score: 2, Insightful

Do these people hire idiots with no training or experience or what?
I think just hiring idiots would be enough. No need to train them.
Re:Whiskey Tango Foxtrot by dave562 · 2007-08-15 08:47 · Score: 2, Insightful

They concentrated all of the redundancy dollars into layer B of the OSI model... the bureaucracy. There wasn't anything left for the lower layers.
Re:Whiskey Tango Foxtrot by Jaxoreth · 2007-08-15 09:53 · Score: 4, Funny

Still, it was a truck.
Which explains why it's not used in the Internet.

--
In general, it is safe and legal to kill your children. -- POSIX Programmer's Guide
Re:Whiskey Tango Foxtrot by Solandri · 2007-08-15 10:28 · Score: 2, Funny

Yeah, I had that happen at a small business I consulted for. Their flat LAN died. I eventually tracked the problem down to a cheap unmanaged switch which had a network cable plugged into it for people to plug their laptops into. Whoever used it last thought leaving the unplugged cable lying on the desk looked untidy, so they "helpfully" plugged it into an empty socket on the same switch.
Re:Whiskey Tango Foxtrot by flappinbooger · 2007-08-15 14:08 · Score: 2, Funny

But Token Rings are, like, obsolete and stuff, surely there wouldn't be something that obsolete in a place like an airport, right?

Right?

[crickets chirping]

Right?

--
Flappinbooger isn't my real name

Re:That's all it takes by Svet-Am · 2007-08-15 08:00 · Score: 5, Insightful

Of course they're running old and outdated hardware. When things work, particularly in a mission critical situation, you don't touch them! Even if the IT admins knew that computer was old and on the brink of dying, how are they supposed to convince the suits and beancounters of that? Non-technical people take the approach that since computers are inherently binary (work or no-work) that if the machine is up and running _right now_ then there is no problem and no sense in spending money to replace it.

If the IT folks were clueless about this machine's age or condition, then the blame lies solely with them for not knowing what the hell they were doing. However, if it was the other folks who shot the IT folks down about upgrading then "welcome to the current state of business", unfortunately.

--
[move .sig! for great justice, take off every .sig!]

In other news... by djupedal · 2007-08-15 08:01 · Score: 2, Insightful

"...said airport and customs officials are discussing how to handle a similar incident should it occur in the future."

What makes them think they'll get another shot? Rank and file voters are ready with their own plan...should a 'similar incident' by the same fools happen again.

You figure it out by COMON$ · 2007-08-15 08:01 · Score: 3, Interesting

Let me know; knowing how to prevent failure due to a flaky NIC on a network is a very large issue.

First you see latency on a network, then you fire up a sniffer and hope to god you can get enough packets to deduce which is the flaky card without shutting down every NIC on your network.

Of course I did write a paper on this behavior years ago in my CS networking class: taking a Snort box and a series of custom scripts to notify admins of spikes on the network outside of normal operating ranges for that device's history. However, implementing this successfully in an elegant fashion has been beyond me and I just rely on Nagios to do a lot of my bidding.
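
For what it's worth, the "outside of normal operating ranges for that device's history" idea can be sketched in a few lines of Python. This is only a sketch under assumptions: the packets-per-second samples, the 3-sigma cutoff, and the is_spike name are all invented here, and it has nothing to do with how Snort or Nagios actually work.

```python
from statistics import mean, stdev

def is_spike(history, current, sigmas=3.0, min_samples=5):
    """Return True if `current` is far above this device's own history."""
    if len(history) < min_samples:
        return False  # not enough history to judge yet
    avg = mean(history)
    spread = stdev(history)
    # Guard against a perfectly flat history, where stdev would be 0.
    threshold = avg + sigmas * max(spread, 1.0)
    return current > threshold

# Hypothetical packets-per-second samples for one device.
history_pps = [120, 135, 110, 128, 140, 125]
normal = is_spike(history_pps, 130)
storm = is_spike(history_pps, 9000)
print(normal, storm)
```

The hard part is everything around this: collecting the samples per device and deciding who gets woken up.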

--
CS: It is all sink or swim...oh and did I mention there are sharks in that water?

Re:You figure it out by GreggBz · 2007-08-15 08:21 · Score: 4, Informative

One not too unreasonable strategy is to set up SNMP traps on all your NICs. This is not unlike the cable modem watching software at most Cable ISPs.

At first, I can envision it being a PITA if you have a variety of NIC hardware, especially finding all those MIBs. But they are all pretty standard these days, and your polling interval could be fairly long, like every 2 minutes. You could script the results, sorting all the naughties and periodic non-responders to the top of the list. That would narrow things down a heck of a lot in a circumstance like this.

No alarms, but at least a quick heartbeat of your (conceivably very large) network. A similar system can be used to watch 30,000+ cable modems, without too much load on the snmp trap server.
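
The "script the results" step might look something like this in Python. It is a sketch only: the PollResult fields and port names are hypothetical, and the actual SNMP polling is left out entirely; this just sorts whatever the poller returned.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class PollResult:
    port: str
    responded: bool
    error_rate: Optional[float] = None  # errors per second; None if no reply

def triage(results):
    """Put non-responders first, then the highest error rates."""
    def key(r):
        # False sorts before True, so non-responders float to the top.
        return (r.responded, -(r.error_rate or 0.0))
    return sorted(results, key=key)

# Hypothetical results from one polling pass.
results = [
    PollResult("gi0/1", True, 0.0),
    PollResult("gi0/2", True, 950.0),
    PollResult("gi0/3", False),
    PollResult("gi0/4", True, 3.5),
]
ordered = [r.port for r in triage(results)]
print(ordered)
```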
Re:You figure it out by ctr2sprt · 2007-08-15 10:32 · Score: 5, Informative

One not too unreasonable strategy is to set up SNMP traps on all your NICs.

That doesn't make much sense. If the NIC goes down or starts misbehaving, the chances of your NIC's SNMP traps arriving at their destination are effectively zero. You probably mean setting up traps on your switches with threshold traps on all the interfaces, the switch's CPU, CAM table size, etc. Which would be more useful. You could also use a syslog server, which is going to be considerably easier if you don't have a dedicated monitoring solution.

But they are all pretty standard these days, and your polling interval could be fairly long, like every 2 minutes.

You're not thinking of traps if you're talking about polling. Traps are initiated by the switch (or other device) and sent to your log monster. You can use SNMP polling of the sort that e.g. MRTG and OpenNMS do which, with appropriate thresholds, can get you most of the same benefits. But don't use it on Cisco hardware, not if you want your network to function, anyway. Their CPUs can't handle SNMP polling, not at the level you're talking about.

No alarms, but at least a quick heartbeat of your (conceivably very large) network. A similar system can be used to watch 30,000+ cable modems, without too much load on the snmp trap server.

I think you are underestimating exactly how much SNMP trap spam network devices send. You'll get a trap for the ambient temperature being too high. You'll get a trap if you send more than X frames per second ("threshold fired"), and another trap two seconds later when it drops below Y fps ("threshold rearmed"). You'll get at least four link traps whenever a box reboots (down for the reboot, up/down during POST, up when the OS boots; probably another up/down as the OS negotiates link speed and duplex), plus an STP-related trap for each link state change ("port 2/21 is FORWARDING"). You'll get traps when CDP randomly finds, or loses, some device somewhere on the network. You'll get an army of traps whenever you create, delete, or change a vlan. If you've got a layer 7 switch that does health checks, you'll get about ten traps every time one of your HA webservers takes more than 100ms to serve its test page, which happens about once per server per minute even when nothing is wrong.
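
The "threshold fired" / "threshold rearmed" pair is just hysteresis, and a toy Python version shows why one bursty port produces a steady trickle of traps. The class name and the 1000/800 fps numbers are made up for illustration; real switches configure this per vendor.

```python
class RisingThreshold:
    """Fires once above `fire_at`, then rearms once the value drops below `rearm_at`."""

    def __init__(self, fire_at, rearm_at):
        if rearm_at > fire_at:
            raise ValueError("rearm_at must not exceed fire_at")
        self.fire_at = fire_at
        self.rearm_at = rearm_at
        self.armed = True

    def observe(self, value):
        """Return the trap text this sample would generate, or None."""
        if self.armed and value > self.fire_at:
            self.armed = False
            return "threshold fired"
        if not self.armed and value < self.rearm_at:
            self.armed = True
            return "threshold rearmed"
        return None

# Hypothetical frames-per-second samples from one port.
t = RisingThreshold(fire_at=1000, rearm_at=800)
events = [t.observe(fps) for fps in [500, 1200, 1500, 900, 700, 1100]]
print(events)
```

Six samples, three traps, and that is one port with nothing actually broken.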

And the best part is that because SNMP traps are UDP, they are the first thing to get thrown away when the shit hits the fan. So when a failing NIC starts jabbering and the poor switch's CPU goes to 100%, you'll never see a trap. All you'll see are a bunch of boxes on the same vlan going up and down for no apparent reason. You might get a fps threshold trap from some gear on your distribution or core layers, assuming it's sufficiently beefy to handle a panicked switch screaming ARPs at a gig a second and have some brains left over, but that's about it. More likely you won't have a clue that anything is wrong until the switch kicks and 40 boxes go down for five minutes.

Monitoring a network with tens of thousands of switch ports sucks hardcore, there's no way around it.
Re:You figure it out by huge · 2007-08-16 01:03 · Score: 2, Informative

And the best part is that because SNMP traps are UDP, they are the first thing to get thrown away when the shit hits the fan.
In some cases it might be a better idea to use an inform instead of a trap, since informs are acknowledged by the receiver.

--
-- Reality checks don't bounce.

Head of IT for LAX should be fired... by Glasswire · 2007-08-15 08:05 · Score: 3, Insightful

...for not firing the networking manager. The fact that they were NOT terrified that this news would get out and were too stupid to cover it up indicates he/she and their subordinates SIMPLY DON'T KNOW THEY DID ANYTHING WRONG by not putting in a sufficiently monitored switch architecture which would rapidly alert IT staff and lock out the offending node.
Simply amazing. Will someone in the press publish the names of these losers so they can be blacklisted?

Re:Head of IT for LAX should be fired... by Rob+T+Firefly · 2007-08-15 08:17 · Score: 5, Funny

They have to find someone who can not only design a vital high-traffic network and maintain it... but who didn't have fish for dinner.

--
Slashdot Burying Stories About Slashdot Media Owned
Re:Head of IT for LAX should be fired... by kschendel · 2007-08-15 08:24 · Score: 3, Informative

RTFA. This was a *Customs* system. Not LAX, not airlines. The only blame that the airlines can (and should) get for this is not shining the big light on Customs and Border Patrol from the very start. I think it's time that the airlines started putting public and private pressure on CBP and TSA to get the hell out of the way. It's not as if they are actually securing anything.

CBP deserves a punch in the nose for not having a proper network design with redundancy; and another punch in the nose for not having any clue what to do in an outage. They should have a reduced-service backup plan, and a manual backup plan, and a diversion backup plan. There's no excuse for federal officials to sit there like idiots waiting for things to magically get fixed. Oh wait, I guess some of them ARE idiots.

The backup plan by Animats · 2007-08-15 08:08 · Score: 5, Funny

DHS's idea of a "backup plan" will probably be to build a huge fenced area into which to dump arriving passengers when their systems are down.

Re:That's all it takes by COMON$ · 2007-08-15 08:09 · Score: 2, Insightful

apparently you are not familiar with what a bad nic does to even the best of switches.

--
CS: It is all sink or swim...oh and did I mention there are sharks in that water?

LACP by dy2t · 2007-08-15 08:15 · Score: 2

Also known as IEEE 802.3ad, it supports aggregating NICs both to improve overall bandwidth and to gracefully deal with failed links.
More info at http://en.wikipedia.org/wiki/Link_Aggregation_Control_Protocol

Systems seem to be more commonly shipping with multiple NICs (esp. servers) so maybe this will be used more and more. It is important to note that the network switch/router needs to be able to support LACP (dumb/cheap switches do not while expensive/managed ones do) so that might be a barrier. Cisco switches and maybe others have implemented proprietary trunking/aggregation schemes but this 802.3ad is a standard.

In practice, I tried to use LACP with a Linksys SRW2048 $800 switch (targeted at small-businesses, much cheaper than typical managed switch) but it did not work reliably (performance got worse, some clients could not connect/timed-out.) Still working on it.

Re:That's all it takes by KillerCow · 2007-08-15 08:15 · Score: 5, Interesting

I am not a networks guy... but it's my understanding that a switch acts like a hub when it sees a TO: MAC address that it doesn't know what port it's on. They learn the switching structure of a network by watching the FROM fields on the datagrams. When the switch powers up, it behaves exactly like a hub and just watches/learns what MAC addresses are on which ports and builds a switching table. If it starts getting garbage packets, it will look at the TO field and say "I don't know what port this should go out on, so I have to send it on all of them." So garbage packets would overwhelm a network even if it was switched.
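
That learn-and-flood behavior is easy to sketch in Python. This is a toy model under assumptions: the class, the port numbers, and the shortened MAC strings are invented, and it ignores broadcast addresses, table aging, and VLANs.

```python
class LearningSwitch:
    def __init__(self, ports):
        self.ports = list(ports)
        self.mac_table = {}  # source MAC -> port it was last seen on

    def handle_frame(self, in_port, src_mac, dst_mac):
        """Return the list of ports the frame is sent out on."""
        self.mac_table[src_mac] = in_port  # learn from the FROM field
        out_port = self.mac_table.get(dst_mac)
        if out_port is None:
            # Unknown destination: flood everywhere except the ingress port.
            return [p for p in self.ports if p != in_port]
        if out_port == in_port:
            return []  # destination is on the same port; nothing to forward
        return [out_port]

sw = LearningSwitch(ports=[1, 2, 3, 4])
first = sw.handle_frame(1, "aa:aa", "bb:bb")    # bb:bb unknown, so flooded
reply = sw.handle_frame(2, "bb:bb", "aa:aa")    # aa:aa was learned on port 1
garbage = sw.handle_frame(3, "cc:cc", "de:ad")  # bogus destination, flooded
print(first, reply, garbage)
```

Every frame with a destination the table has never seen takes the flood path, which is the case described above.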

It would take a router to stop this from happening. I don't think that there are many networks that use routers for internal partitioning. Even then, that entire network behind that router would be flooded.

Let that be a lesson to you... by urlgrey · 2007-08-15 08:22 · Score: 3, Funny

To all you novice net admins out there: network cards do *not* like chunky peanut butter! Smooth/creamy only, please.

Now you see what happens when some joker thinks [s]he can get away with using chunky for something as critical as proper care and feeding of network cards. Pfft.

Bah! Kids these days... I tell ya. Probably the same folks that think the interwebnet is the same as the World Wide Web.

Great, Scott! What's next?!

--
Running 'Nix is like owning a Lightsaber. It's "a more elegant weapon for a more civilized time."

The whole system is pointless anyway by Potent · 2007-08-15 08:23 · Score: 4, Insightful

When the U.S. Government is letting millions of illegal aliens cross over from Mexico and live here with impunity, then what the fuck is the point with stopping a few thousand document carrying people getting off of planes from entering the country?

I guess the system exists to give the appearance that the feds actually give a shit.

And then the Pres and Congress wonder why their approval ratings are as small as their shoe sizes...

--
Out of order? Fuck! Even in the future nothing works! - Dark Helmet (Rick Moranis) "Spaceballs"

Re:That's all it takes by Kadin2048 · 2007-08-15 08:23 · Score: 5, Interesting

Would you think that LAX is running anything that out-of-date or crappy? I assume that they're running everything with spit, duct tape, wishful thinking, ancient custom software, near-fossilized hardware, and Excel spreadsheets ... just like pretty much everything else in the public sector.

I've seen what's running some government agencies, and it's frightening.

--
"Ladies and gentlemen, my killbot features Lotus Notes and a machine gun. It is the finest available."

nic can take down a segment by KDN · 2007-08-15 08:24 · Score: 3, Interesting

Years ago we had a 10BT nic go defective so that whenever the nic was plugged into the switch it would obliterate traffic on that segment. The fun part: EVEN IF THE NIC WAS NOT PLUGGED INTO THE PC. Luckily that happened in one of the few areas that had switches at the time, everything else was one huge flat lan.

Re:nic can take down a segment by KDN · 2007-08-15 10:26 · Score: 2, Interesting

Excuse me, but why the hell did you test for that in the first place?
It was during the debugging phase. We got it to occur, and then turned off one machine at a time. When all the machines on the segment were off and the switch was still jabber isolated we all went "WTF?!" and then started unplugging cables.

"A similar incident" by The+One+and+Only · 2007-08-15 08:30 · Score: 2, Insightful

A spokeswoman for the airports agency, said airport and customs officials are discussing how to handle a similar incident should it occur in the future.

Except in the future, the incident isn't going to be similar, aside from being similarly boneheaded. This attitude of "only defend yourself from things that have already happened to you before" is just plain dumb. Obviously their system was set up and administered by a boneheaded organization to begin with, and now that same boneheaded organization is rushing to convene a committee to discuss a committee to discuss how to prevent something that already happened from happening again. The root flaw is still in the organization.

--
In Repressive Burma, it's not just your connection that dies. slashdot.org/comments.pl?sid=314547&cid=20819199

Blaming the Wrong NIC by Doc+Ruby · 2007-08-15 08:30 · Score: 2, Insightful

The NIC that failed isn't the part that's at fault. NICs fail, and can be counted on to do so inevitably, if relatively unpredictably (MTBF is statistical).

The real problem NIC is the one that wasn't there as backup. Either a redundant one already online, or a hotswap one for a brief downtime, or just a spare that could be replaced after a quick diagnostic according to the system's exception handling runbook of emergency procedures.

Of course, we can't blame a NIC that doesn't exist, even if we're blaming it for not existing. We have to blame the people who designed and deployed the system with the single point of failure, and the managers and oversight staff who let the airport depend on that single point of failure.

But instead I'm sure we'll blame the dead NIC. Which gave its life in service to its country.

--
make install -not war

Managed switches are FTW by Sehnsucht · 2007-08-15 08:39 · Score: 2, Insightful

Where I work, if there's a packet storm someplace (server is getting attacked, server is attacker, or someone just has a really phat pipe on the other end and is moving a ton of data) we get a SNMP TRAP for packet threshold on the offending port. BAM! You know where the problem is, and since we have managed switches you just shut off the port if you can't resolve the problem.

Having said that, since the managed switches are gigE uplinked and each port is only 10/100, I don't think we've ever had a problem where a server was outbounding and brought down the switch/network (just made some extra latency). We've had some really large inbounds occasionally take down a whole switch, and heaven forbid some idiot shuts the port off on an inbound attack instead of nulling it at the border, cause then the ARP drops and the DOS gets forwarded to every port on the VLAN on a ton of switches.. but a broken NIC packet storming would not have been an issue.

OK, so maybe they don't have managed switches all the way down to the lowest point on the network. They should still have SOME further up the chain and be monitoring them such that they know from what direction the problem is coming, and shut it off / look at it with a sniffer etc.

Infrastructure that is as important as an airport should have its own infrastructure properly equipped and maintained with managed equipment, making this nearly a non-issue and certainly one easily resolved.

It depends on the switch by camperdave · 2007-08-15 08:57 · Score: 4, Informative

You're right to a point. An ethernet frame, along with the source and destination addresses, has a checksum. A switch that is using a store and forward procedure is supposed to drop the frame if the checksum is invalid. If the nic was throwing garbled frames onto the network, it would have to be garbled in such a way as to have a valid checksum (assuming they are using store and forward switches in the first place).
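
A minimal Python sketch of that store-and-forward check, assuming the frame check sequence is a CRC-32 over the rest of the frame. zlib.crc32 uses the same polynomial; the little-endian packing of the trailer is my assumption about how it is usually represented in software, and the frame contents are made up.

```python
import struct
import zlib

def append_fcs(frame_without_fcs: bytes) -> bytes:
    """Append a CRC-32 trailer, the way a sending NIC would."""
    fcs = zlib.crc32(frame_without_fcs) & 0xFFFFFFFF
    return frame_without_fcs + struct.pack("<I", fcs)

def should_forward(frame: bytes) -> bool:
    """Store-and-forward check: recompute the CRC and compare to the trailer."""
    if len(frame) < 4:
        return False
    body, trailer = frame[:-4], frame[-4:]
    expected = struct.pack("<I", zlib.crc32(body) & 0xFFFFFFFF)
    return trailer == expected

# Hypothetical frame: destination, source, EtherType, payload.
good = append_fcs(b"\xff" * 6 + b"\x02" * 6 + b"\x08\x00" + b"payload" * 8)
garbled = bytes([good[0] ^ 0x01]) + good[1:]  # flip a single bit

ok = should_forward(good)
dropped = not should_forward(garbled)
print(ok, dropped)
```

A NIC that computes a correct checksum over its own garbage passes this check, which is why it only helps against frames damaged after the CRC was computed.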

--
When our name is on the back of your car, we're behind you all the way!

Re:That's all it takes by EmperorKagato · 2007-08-15 09:13 · Score: 5, Insightful

Even if the IT admins knew that computer was old and on the brink of dying, how are they supposed to convince the suits and beancounters of that?

You show the suits and bean counters how much it costs the company if the system failed and time was spent recovering that system.

--
----- You know you have ego issues when you register a domain in your name.

sadly... this may be typical by bwy · 2007-08-15 09:17 · Score: 4, Insightful

Sadly, many real-world systems are often nothing like what people might envision them as. We all sit back in our chairs reading slashdot and thinking everything is masterfully architected, fully HA, redundant, etc.

Then as you work more places you start seeing that this is pretty far from actual truth. Many "production" systems are held together by rubber bands, and duct tape if you're lucky (but not even the good kind.) In my experience it can be a combination of poor funding, poor priorities, technical management that doesn't understand technology, or just a lack of experience or skills among the workers.

Not every place is a Google or Yahoo!, that I can imagine look and smell like technology wherever you go on their fancy campuses. Most organizations are businesses first, and tech shops last. If software and hardware appears to "work", it is hard to convince anybody in a typical business that anything should change- even if what is "working" is a one-off prototype running on desktop hardware. It often requires strong technical management and a good CIO/CTO to make sure that things happen like they should.

I suspect that a lot of things that we consider "critical" in our society are a hell of a lot less robust under the hood than anything Google is running.

Re:That's all it takes by ThinkingInBinary · 2007-08-15 09:31 · Score: 3, Insightful

Of course they're running old and outdated hardware. When things work, particularly in a mission critical situation, you don't touch them! Even if the IT admins knew that computer was old and on the brink of dying, how are they supposed to convince the suits and beancounters of that? Non-technical people take the approach that since computers are inherently binary (work or no-work) that if the machine is up and running _right now_ then there is no problem and no sense in spending money to replace it.

There's no reason you can't leave the almost-broken computer there and get a new one. You just build a backup system. Surely management understands that redundancy is good. Then, when the crappy one breaks, you can swap it out instantly. That way, you don't have to mess with things prematurely, but you're only down for hopefully a few minutes. (Of course, replacing it "intentionally", before it fails, is more reliable, but keeping a backup system is a viable alternative if nobody wants to touch the working system.)

--
ttuttle is a rankmaniac

Re:That's all it takes by Vengance+Daemon · 2007-08-15 09:41 · Score: 4, Informative

Why are you assuming that this is an Ethernet network? As old as the equipment they are using is, it may be a Token Ring network - the symptoms that were described sound just like a "beaconing" token ring network.

Re:That's all it takes by Greventls · 2007-08-15 10:04 · Score: 3, Insightful

The new system is usually extremely expensive. Why spend all that money on a new system when the old one works? I know programmers who refuse to update their code from VB3.

Re:That's all it takes by spun · 2007-08-15 10:20 · Score: 2, Funny

I work in the public sector, and we don't use spit or duct tape much. We have custom software, it's not ancient but it's written in COBOL anyway. The hardware is mostly new IBM blades and blade centers and we're phasing out the older stuff. We use Access databases, not Excel spreadsheets. But then, we're a state agency, not the Federal Government, so we may be doing it wrong.

--
- None can love freedom heartily, but good men; the rest love not freedom, but license. -- John Milton

Re:That's all it takes by quanticle · 2007-08-15 10:53 · Score: 5, Insightful

You show the suits and bean counters how much it costs the company if the system failed and time was spent recovering that system.

That's very difficult to do, and your estimates of the costs will be called into question. It's often impossible to predict how long it'll take to diagnose and fix a problem unless you've already diagnosed and fixed a similar problem.

Making this kind of estimate also places you into a lose-lose position. If your estimate was high, then management sees you as "chicken little" and will be more likely to dismiss further concerns as more fearmongering. If your estimate was low, then the blame for the outage will cascade down onto you for not showing/convincing management that new equipment was needed.

--
We all know what to do, but we don't know how to get re-elected once we have done it

Um, what about a paper backup?? by LeRandy · 2007-08-15 10:53 · Score: 3, Insightful

Am I the only one laughing that back in old, antiquated Europe, our passport control have the ability to read the documents, with their own eyes? Oh I forget, how are you supposed to treat your visitors like criminals if you can't take their photograph, fingerprints, and 30-odd other bits of personal data to make sure we aren't terrier-ists (fans of small dogs). It doesn't help prevent terrorist attacks, but it does give you a nice big data mine (and how are you supposed to undermine people's rights effectively if you don't know everything about them).

It is laughable that there is no non-computerised backup for the system. (How about filling out the forms and scanning them in later?)

Re:That's all it takes by quanticle · 2007-08-15 10:55 · Score: 2, Interesting

Surely management understands that redundancy is good.

No. In management's eyes, redundancy is bad. You're paying twice as much, but you're not getting any extra functionality in return.

--
We all know what to do, but we don't know how to get re-elected once we have done it

The scope of the problem by WheelDweller · 2007-08-15 11:47 · Score: 5, Interesting

I agree, but the scope of the problem is much larger.

Americans are still designing systems (and I'm talking WHOLE systems, not just the computers) for the industrial revolution. Much the same way, we're educating our kids for the same purpose- to make them cogs for manufacturing.

The Japanese have a more 'cellular' structure, as opposed to the 'pyramid' designed back a couple of 'turns of the century' ago. One man on top drives five, who drive 200, who drive them all. But the Japanese model is more like object orientation: each unit has private parts. So long as the command it's given produces the proper results and stays within budget, who cares?

Assembly lines gather at their meetings and decide policy on their own. "Fred has been late 3 times this week; do we care?" and the only people to whom it matters, decide. There's no need for a strict, top-down policy, especially since only tiny organizations all do only one job.

Imagine the broken structures in a holding company; they own a newspaper, a carwash and a grocery store; the top man can't say "We'll only use glass containers", because that would be a disaster in a car wash. They can't say "we choose leaded inks" which might be fine for the car wash, but dangerous at the newspaper. Each unit has its own purpose.

So how about giving the network admins the power to do *whatever* it takes to let them keep the equipment up to date? As long as it runs, under budget, and doesn't get'em on the newspapers, who cares about the specifics? Why not let the unused budget from every year sit in an account (not being taken back) and use THAT to improve infrastructure?

If these guys were able to have that kind of control, this discussion wouldn't be happening.

--
--- For a good time mail uce@ftc.gov

A Cisco Config to prevent this by ScaredOfTheMan · 2007-08-15 12:37 · Score: 3, Informative

Yes, NICs can go crazy and start blasting broadcasts or unicasts over your network. If you have a Cisco switch (or any other that supports storm-control-like features), you may want to enable it; it costs you nothing but the time it takes to update the config. On the access switch (the one connected to your PCs), get into config mode and type this on every interface that connects directly to a PC (use the interface range command to speed things up if you want):

    Switch(config-if)#storm-control unicast level X

where X is the percent of total interface bandwidth you specify as the threshold for cutting access to that port. It's measured every second, so if you have a 100 meg port and you set it to 30, and the PC pushes more than 30 meg a second in unicasts, the switch kills the port until the PC calms down. If it's a 10 meg port, the 30 then equals 3 meg, etc. You can also add a second line to control broadcasts by changing the word unicast to broadcast.

If they had this in place, when the NIC went nuts, the switch would have killed the port, and no outage (I assume a lot here, but you get the point).
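
The percent-to-bandwidth arithmetic, spelled out in Python. This only mirrors the numbers in the post; the function names are made up, and it does not model what any particular switch does once the level is exceeded.

```python
def storm_threshold_mbps(port_speed_mbps: float, level_percent: float) -> float:
    """Convert a storm-control level (percent of port speed) into Mbit/s."""
    if not 0 <= level_percent <= 100:
        raise ValueError("level_percent must be between 0 and 100")
    return port_speed_mbps * level_percent / 100.0

def over_threshold(observed_mbps: float, port_speed_mbps: float,
                   level_percent: float) -> bool:
    """True if the observed rate in a one-second window exceeds the level."""
    return observed_mbps > storm_threshold_mbps(port_speed_mbps, level_percent)

fast_port = storm_threshold_mbps(100, 30)   # 100 meg port at level 30
slow_port = storm_threshold_mbps(10, 30)    # 10 meg port at level 30
print(fast_port, slow_port)
print(over_threshold(45, 100, 30), over_threshold(2, 10, 30))
```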

Re:That's all it takes by dbIII · 2007-08-15 12:45 · Score: 2, Insightful

Then they do not believe you until you can point at 20,000 people stranded at LAX. At this point you are fired since you knew about the problem, made some fuss, but did not make enough fuss to actually convince the suits and bean counters. It does help others that can then point at the problem of somebody else and get their suits and bean counters to pay attention. This is why infrastructure failure disasters go in cycles determined by the attention span and age of management - each new generation has to see a major failure before they listen while engineers have the benefit of written knowledge going back years.

damn tokens... by myowntrueself · 2007-08-15 12:47 · Score: 2, Funny

Man, when a Token Ring card went bad, it was hell on the network, nothing worked because the token would not get passed properly.

The worst thing is when a user decides to unplug the cable to move something or whatever. Then the token can fall out and you have to spend hours on your hands and knees with a magnifying glass trying to find the damn thing!

It's true! I saw it in a Dilbert cartoon!

--
In the free world the media isn't government run; the government is media run.

Re:That's all it takes by Kadin2048 · 2007-08-15 15:10 · Score: 2, Interesting

I pity you, your state and everyone else using Access.

Yeah, Access is a piece of shit. Unfortunately, it's a lot better than using Excel as a database, which is in many cases the alternative that I've witnessed.

There is also a lack of alternatives: you have FileMakerPro, which is neat (I like it) but not very appealing to some because it has a significant learning curve compared to Access and is also proprietary and expensive; aside from that you have OO.org's Base, which is still immature; and then you've got custom SQL+webforms, which is usually the right choice for non-trivial projects, but requires users to realize the scope of their project at the outset.

And as crummy as Access is, at least it gives you a path towards a separate frontend/backend. You don't get that when each employee is keeping their own critical information on a massive spreadsheet on their workstation's hard drive. And in more places than I'd like to think about, that's the way things work -- it's the dark side of giving every employee an actual computer as opposed to a dumb terminal.

--
"Ladies and gentlemen, my killbot features Lotus Notes and a machine gun. It is the finest available."

Re:That's all it takes by itwerx · 2007-08-15 17:04 · Score: 2, Insightful

That's very difficult to do, and your estimates of the costs will be called into question.

Right, but that's why IT doesn't provide the numbers. It just provides the scenario and it's the bean-counters (BC) that provide the numbers.

IT: "We have some really old hardware that's going to fail any day now..."

BC: "So what?"

IT: "Well, that's a good question, we know it's going to cost $Bazillion to fix so we need to find out if it's worth it or not. Here's what will happen when it dies - LAX completely shuts down. Would that hurt the bottom line enough to justify budgeting $Bazillion?"

BC: "OMFG!" [throws money]

Re:That's all it takes by dbIII · 2007-08-15 22:31 · Score: 4, Insightful

No - it implies a great deal of management has become a shallow oral tradition with all the problems that implies. They are not learning from anything before them and react with great surprise when a Rupert Murdoch or a Bill Gates who does know how to learn from the mistakes of others leaves them with effectively nothing but their underwear. It's like Cortez in Mexico - he used tactics of Roman generals that he had read about against those that did not have a written history.

In contrast, technical staff get to hear a lot about the Tacoma Narrows Bridge, Liberty Ships, Titanic or similar disasters from long ago as illustrations of how things can go wrong before they get let out of their first year of training. Some management would discard those lessons as things from the days of dinosaurs, which is why we seem to have maintenance, infrastructure and contingency plans reduced to nothing every decade and then be seen as important in the years immediately following a string of expensive or deadly disasters.

Slashdot Mirror
