One Failed NIC Strands 20,000 At LAX

Posted by kdawson on Wednesday August 15, 2007 @07:56AM from the comp-dot-risks dept.

The card in question experienced a partial failure that started about 12:50 p.m. Saturday, said Jennifer Connors, a chief in the office of field operations for the Customs and Border Protection agency. As data overloaded the system, a domino effect occurred with other computer network cards, eventually causing a total system failure. A spokeswoman for the airports agency said airport and customs officials are discussing how to handle a similar incident should it occur in the future.

That's all it takes by Marxist+Hacker+42 · 2007-08-15 04:55 · Score: 1, Interesting

Though I heard it was a switch. Same idea, though: all it takes is one malfunctioning card flooding the LAN with bad packets to bring it all down.

--
SJW: a person who perceives an injustice, and while correcting it, commits a greater injustice.
1. Re:That's all it takes by Jeremiah+Cornelius · 2007-08-15 06:53 · Score: 2, Interesting
  
  Then that would lead me to think "hub", not switch. Or just a really shitty switch...
  
  --
  "Flyin' in just a sweet place,
  Never been known to fail..."
2. Re:That's all it takes by morgan_greywolf · 2007-08-15 07:49 · Score: 1
  
  Would you think that LAX is running anything that out-of-date or crappy? Maybe, maybe not, but it does make a good case to run solid, proven and reliable network infrastructure hardware from a major manufacturer.
  
  Oh yeah, and redundant-path network connections for critical portions of the network wouldn't hurt either.
  
  --
  My blog
3. Re:That's all it takes by Svet-Am · 2007-08-15 08:00 · Score: 5, Insightful
  
  Of course they're running old and outdated hardware. When things work, particularly in a mission-critical situation, you don't touch them! Even if the IT admins knew that computer was old and on the brink of dying, how are they supposed to convince the suits and beancounters of that? Non-technical people take the approach that since computers are inherently binary (work or no-work), if the machine is up and running _right now_ then there is no problem and no sense in spending money to replace it.
  
  If the IT folks were clueless about this machine's age or condition, then the blame lies solely with them for not knowing what the hell they were doing. However, if it was the other folks who shot the IT folks down about upgrading then "welcome to the current state of business", unfortunately.
  
  --
  [move .sig! for great justice, take off every .sig!]
4. Re:That's all it takes by SatanicPuppy · 2007-08-15 08:07 · Score: 1
  
  Sure, if you're buying consumer grade switching hardware, and you have only one subnet, or all your subnets are weirdly bridged or whatever.
  
  For my money, this should never have happened from a problem with one machine. That's wholly unacceptable. My home network is robust enough to handle one bad machine without going down completely...Hell, I could lose a whole subnet and no one on the other subnet would notice a thing.
  
  If this system or switch or whatever is critical, there should have been a fail over. They should have been able to trace the problem, and they should have been able to isolate it or remove it entirely. If you really do have a card going nuts and spamming the network, that is laughably easy to trace, unless you're in the habit of assigning dynamic IPs to critical pieces of your network.
  
  --
  ad logicam Claiming a proposition is false because it was presented as the conclusion of a fallacious argument.
5. Re:That's all it takes by COMON$ · 2007-08-15 08:09 · Score: 2, Insightful
  
  Apparently you are not familiar with what a bad NIC does to even the best of switches.
  
  --
  CS: It is all sink or swim...oh and did I mention there are sharks in that water?
6. Re:That's all it takes by KillerCow · 2007-08-15 08:15 · Score: 5, Interesting
  
  I am not a networks guy... but it's my understanding that a switch acts like a hub when it sees a TO: MAC address and doesn't know which port that address is on. Switches learn the switching structure of a network by watching the FROM fields on the datagrams. When the switch powers up, it behaves exactly like a hub and just watches/learns which MAC addresses are on which ports and builds a switching table. If it starts getting garbage packets, it will look at the TO field and say "I don't know what port this should go out on, so I have to send it on all of them." So garbage packets would overwhelm a network even if it was switched.
  
  It would take a router to stop this from happening. I don't think that there are many networks that use routers for internal partitioning. Even then, that entire network behind that router would be flooded.
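
The learn-then-flood behaviour described in the comment above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not any vendor's implementation: the class name `LearningSwitch`, the port numbering, and the shortened MAC strings are all made up for illustration.

```python
# Toy model of a learning switch. Hypothetical class for illustration only.
class LearningSwitch:
    def __init__(self, num_ports):
        self.num_ports = num_ports
        self.mac_table = {}  # source MAC -> port it was last seen on

    def handle_frame(self, in_port, src_mac, dst_mac):
        """Return the list of ports the frame is sent out of."""
        # Learn: remember which port the sender is reachable on.
        self.mac_table[src_mac] = in_port
        out_port = self.mac_table.get(dst_mac)
        if out_port is None:
            # Unknown destination: flood to every port except the ingress port.
            return [p for p in range(self.num_ports) if p != in_port]
        if out_port == in_port:
            # Destination sits on the same port; nothing to forward.
            return []
        return [out_port]


switch = LearningSwitch(num_ports=4)
print(switch.handle_frame(0, "aa:aa", "bb:bb"))  # destination not learned yet
print(switch.handle_frame(1, "bb:bb", "aa:aa"))  # destination already learned
print(switch.handle_frame(2, "cc:cc", "de:ad"))  # garbage destination
```

In this model the first and third frames go out of every port but the one they arrived on, while the second goes out of a single port. A card spraying frames with random destinations would hit the flooding branch every time, which is the failure mode the comment describes.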
7. Re:That's all it takes by Kadin2048 · 2007-08-15 08:23 · Score: 5, Interesting
  
  Would you think that LAX is running anything that out-of-date or crappy?

  I assume that they're running everything with spit, duct tape, wishful thinking, ancient custom software, near-fossilized hardware, and Excel spreadsheets ... just like pretty much everything else in the public sector.
  
  I've seen what's running some government agencies, and it's frightening.
  
  --
  "Ladies and gentlemen, my killbot features Lotus Notes and a machine gun. It is the finest available."
8. Re:That's all it takes by Shagg · 2007-08-15 09:07 · Score: 1
  
  Would you think that LAX is running anything that out-of-date or crappy?

  I'm surprised they're even using IP. Many airline systems are still running on X.25 and mainframes.
  
  So, no, running out-of-date hardware wouldn't surprise me at all.
  
  --
  Unix is user friendly, it's just selective about who its friends are.
9. Re:That's all it takes by i8myh8 · 2007-08-15 09:10 · Score: 1
  
  Uhh.. negative. Any managed switch worth a sh!t would've noticed the bad checksum and dropped the packet. Any network admin worth a sh!t would've had it set up so he knew where the problem was via SOME reporting capability. If 1 NIC was bombing the network someone should've known about it.
10. Re:That's all it takes by SatireWolf · 2007-08-15 09:10 · Score: 1
  
  Any switch worth its salt has an automatic ACK throttle which can be turned on to control network flooding in the event of NIC failure. Apparently the IT department at LAX hasn't heard of reading the manual and just plugged it in like a Linksys router and expected the defaults to be 'good enough'.
11. Re:That's all it takes by EmperorKagato · 2007-08-15 09:13 · Score: 5, Insightful
  
  Even if the IT admins knew that computer was old and on the brink of dying, how are they supposed to convince the suits and beancounters of that?
  You show the suits and bean counters how much it costs the company if the system failed and time was spent recovering that system.
  
  --
  ----- You know you have ego issues when you register a domain in your name.
12. Re:That's all it takes by ThinkingInBinary · 2007-08-15 09:31 · Score: 3, Insightful
  
  Of course they're running old and outdated hardware. When things work, particularly in a mission-critical situation, you don't touch them! Even if the IT admins knew that computer was old and on the brink of dying, how are they supposed to convince the suits and beancounters of that? Non-technical people take the approach that since computers are inherently binary (work or no-work), if the machine is up and running _right now_ then there is no problem and no sense in spending money to replace it.
  
  There's no reason you can't leave the almost-broken computer there and get a new one. You just build a backup system. Surely management understands that redundancy is good. Then, when the crappy one breaks, you can swap it out instantly. That way, you don't have to mess with things prematurely, but you're only down for hopefully a few minutes. (Of course, replacing it "intentionally", before it fails, is more reliable, but keeping a backup system is a viable alternative if nobody wants to touch the working system.)
  
  --
  ttuttle is a rankmaniac
13. Re:That's all it takes by Vengance+Daemon · 2007-08-15 09:41 · Score: 4, Informative
  
  Why are you assuming that this is an Ethernet network? As old as the equipment they are using is, it may be a Token Ring network - the symptoms that were described sound just like a "beaconing" token ring network.
14. Re:That's all it takes by Greventls · 2007-08-15 10:04 · Score: 3, Insightful
  
  The new system is usually extremely expensive. Why spend all that money on a new system when the old one works? I know programmers who refuse to update their code from VB3.
15. Re:That's all it takes by spun · 2007-08-15 10:20 · Score: 2, Funny
  
  I work in the public sector, and we don't use spit or duct tape much. We have custom software; it's not ancient, but it's written in COBOL anyway. The hardware is mostly new IBM blades and blade centers and we're phasing out the older stuff. We use Access databases, not Excel spreadsheets. But then, we're a state agency, not the Federal Government, so we may be doing it wrong.
  
  --
  - None can love freedom heartily, but good men; the rest love not freedom, but license. -- John Milton
16. Re:That's all it takes by HowIsMyDriving? · 2007-08-15 10:52 · Score: 1
  
  That is the first thing I thought, "I bet they are still using Token Ring." Man, when a Token Ring card went bad, it was hell on the network, nothing worked because the token would not get passed properly.
  
  --
  Welcome to the Entropy Bar, may I take your order?
17. Re:That's all it takes by quanticle · 2007-08-15 10:53 · Score: 5, Insightful
  
  You show the suits and bean counters how much it costs the company if the system failed and time was spent recovering that system.
  
  That's very difficult to do, and your estimates of the costs will be called into question. It's often impossible to predict how long it'll take to diagnose and fix a problem unless you've already diagnosed and fixed a similar problem.
  
  Making this kind of estimate also places you into a lose-lose position. If your estimate was high, then management sees you as "chicken little" and will be more likely to dismiss further concerns as more fearmongering. If your estimate was low, then the blame for the outage will cascade down onto you for not showing/convincing management that new equipment was needed.
  
  --
  We all know what to do, but we don't know how to get re-elected once we have done it
18. Re:That's all it takes by quanticle · 2007-08-15 10:55 · Score: 2, Interesting
  
  Surely management understands that redundancy is good.
  
  No. In management's eyes, redundancy is bad. You're paying twice as much, but you're not getting any extra functionality in return.
  
  --
  We all know what to do, but we don't know how to get re-elected once we have done it
19. Re:That's all it takes by scottv67 · 2007-08-15 10:57 · Score: 1
  
  I don't think that there are many networks that use routers for internal partitioning.
  
  We have *tons* of routers that separate the various subnets (which map 1-to-1 to VLANs) on our internal network. How do you get from one "broadcast domain" to another? Via the default gateway on your own subnet. That is passing through a router. (It may not be a physical box called a 'router' but the packets are still being routed).
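
The point about crossing broadcast domains through the default gateway can be illustrated with Python's standard `ipaddress` module. This is only a sketch: the function name `next_hop`, the addresses, and the /24 prefix are assumptions chosen for the example, not details from the thread.

```python
import ipaddress


def next_hop(src_ip, dst_ip, prefix_len, gateway):
    """Return where the sender hands the packet off.

    If the destination is inside the sender's own subnet, the sender
    delivers directly; otherwise it sends to its default gateway,
    which means the packet is routed.
    """
    subnet = ipaddress.ip_interface(f"{src_ip}/{prefix_len}").network
    if ipaddress.ip_address(dst_ip) in subnet:
        return dst_ip
    return gateway


# Same subnet: delivered directly, no router involved.
print(next_hop("10.0.1.20", "10.0.1.99", 24, "10.0.1.1"))
# Different subnet: handed to the default gateway.
print(next_hop("10.0.1.20", "10.0.2.99", 24, "10.0.1.1"))
```

The second call returns the gateway address, which is the routing step the comment refers to, whether or not the device doing it is a physical box labelled "router".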
20. Re:That's all it takes by morgan_greywolf · 2007-08-15 11:00 · Score: 1
  
  But then, we're a state agency, not the Federal Government, so we may be doing it wrong.
  
  Why yes, yes you are.
  
  --
  My blog
21. Re:That's all it takes by leeward · 2007-08-15 11:09 · Score: 1
  
  Of course they're running old and outdated hardware. When things work, particularly in a mission-critical situation, you don't touch them! Even if the IT admins knew that computer was old and on the brink of dying, how are they supposed to convince the suits and beancounters of that?
  
  And why shouldn't the bean counters expect the old, outdated hardware to work? Quite a bit of the air traffic control system is hardware that is decades old, though it is slowly being replaced. You have to go to the bean counters and say: yes, this equipment is not as old as all that other equipment that has been running for decades, but it is cheap C.. made junk, and cannot be expected to operate reliably for more than a few years.
22. Re:That's all it takes by twitter · 2007-08-15 11:22 · Score: 1
  
  I work in the public sector, and we don't use spit or duct tape much. ... We use Access databases, not Excel spreadsheets. But then, we're a state agency, not the Federal Government, so we may be doing it wrong.
  
  The beauty of closed source software is that it looks as good as the graphic design people can make a box, and the customer never knows about the bad blood, spit and duct tape holding it all together. Even when the software breaks, the beautiful box tells the user that it's all their fault.
  
  I pity you, your state and everyone else using Access.
  
  --
  Friends don't help friends install M$ junk.
23. Re:That's all it takes by gkhan1 · 2007-08-15 11:23 · Score: 1
  
  Forgive my ignorance, but has anyone actually used Token Ring since, like, the mid-90s? I thought it was a completely outdated technology, replaced with ethernet?
24. Re:That's all it takes by phoenixwade · 2007-08-15 11:31 · Score: 1
  
  apparently you are not familiar with what a bad nic does to even the best of switches.

  And you are surprised by this why?
  
  --
  A positive attitude may not solve all your problems, but it will annoy enough people to make it worth the effort.
25. Re:That's all it takes by phoenixwade · 2007-08-15 11:37 · Score: 1
  
  Forgive my ignorance, but has anyone actually used Token Ring since, like, the mid-90s? I thought it was a completely outdated technology, replaced with ethernet?

  Well, yes. As of December of last year, there were at least 4 airports (minor regionals, admittedly) that still had token ring networks handling some site traffic.
  
  --
  A positive attitude may not solve all your problems, but it will annoy enough people to make it worth the effort.
26. Re:That's all it takes by kayditty · 2007-08-15 11:45 · Score: 1
  
  An ethernet switch can also revert to broadcast mode (hub-like functionality) if flooded with ARP requests, overflowing its internal ARP cache.
  
  And, as far as checksumming goes, it's worth noting that packets are summed differently with Fragment Free and Cut Through switching technologies, which, it's at least somewhat possible, they could be using.
  
  Perhaps a better page here: http://www.intel.com/support/express/switches/sb/cs-014410.htm
27. Re:That's all it takes by desertfool · 2007-08-15 11:56 · Score: 1
  
  I hate to say "mod parent up" but I see this every day. We in IT have to be accurate, even when we are dealing with "what if" scenarios. No matter what happens, IT loses.
  
  --
  Just a dude. Stuck in IT.
28. Re:That's all it takes by gkhan1 · 2007-08-15 12:01 · Score: 1
  
  Huh. Well, there you go. Learn something every day.
29. Re:That's all it takes by Macthorpe · 2007-08-15 12:14 · Score: 1
  
  I pity the kind of closed-minded idiot who would willingly throw aside something that works because of his own personal ideology.
  
  --
  "It does not do to leave a live dragon out of your calculations, if you live near him." - Tolkien
30. Re:That's all it takes by Chabil+Ha' · 2007-08-15 12:28 · Score: 1
  
  Surely management understands that redundancy is good.

  In my experience they only think it's good when the incident has already happened and they've handed you your head.
  
  --
  We're all hypocrites. We all have hidden parts, it's the contrast between them that make us more a hypocrite than others
31. Re:That's all it takes by dbIII · 2007-08-15 12:45 · Score: 2, Insightful
  
  Then they do not believe you until you can point at 20,000 people stranded at LAX. At this point you are fired since you knew about the problem, made some fuss, but did not make enough fuss to actually convince the suits and bean counters. It does help others, who can then point at somebody else's problem and get their suits and bean counters to pay attention. This is why infrastructure failure disasters go in cycles determined by the attention span and age of management - each new generation has to see a major failure before they listen while engineers have the benefit of written knowledge going back years.
32. Re:That's all it takes by karnal · 2007-08-15 13:03 · Score: 1
  
  I'm kinda curious where you're getting the upside down "i" character from....
  
  --
  Karnal
33. Re:That's all it takes by karnal · 2007-08-15 13:07 · Score: 1
  
  Quick addition to that - they usually don't like it when you tell them "I told you so" as you're on the way out.....
  
  --
  Karnal
34. Re:That's all it takes by kd4tgc · 2007-08-15 13:10 · Score: 1
  
  Remember that this is no ordinary IT dept here. They're government employees. If it's broke, we'll fix it tomorrow. I know from personal experience that only unimportant machines get replaced first.
35. Re:That's all it takes by Degrees · 2007-08-15 13:22 · Score: 1
  
  We shut down our Token-Ring MAU only last year - so it wouldn't surprise me to find that other people still had it. On the other hand, I would expect LAX (and especially DHS) to be on modern equipment.
  
  --
  "The most sensible request of government we make is not, "Do something!" But "Quit it!"
36. Re:That's all it takes by GoodOmens · 2007-08-15 13:27 · Score: 1
  
  It's more expensive to upgrade than just replacing the hardware. You have to factor in staff/consulting costs as well. A project I am working on currently costs around 60k/blade for an upgrade to new hardware. This cost factors in redundant attached SANs, staff costs for tracking the plan, staff costs for migrating the hardware etc .... Hard to swallow considering they are going to do around 1500 ....
37. Re:That's all it takes by Allnighterking · 2007-08-15 14:23 · Score: 1
  
  This will perhaps sound foolish, but I've recently learned the secret here. Show them a pie chart. They grok pie charts, and if the slice can be made to appear large enough they start to listen real close.
  
  --
  I'm sorry, I'm to tired to be witty at the moment so this message will have to do.
38. Re:That's all it takes by Kadin2048 · 2007-08-15 15:10 · Score: 2, Interesting
  
  I pity you, your state and everyone else using Access.
  
  Yeah, Access is a piece of shit. Unfortunately, it's a lot better than using Excel as a database, which is in many cases the alternative that I've witnessed.
  
  There is also a lack of alternatives: you have FileMakerPro, which is neat (I like it) but not very appealing to some because it has a significant learning curve compared to Access and is also proprietary and expensive; aside from that you have OO.org's Base, which is still immature; and then you've got custom SQL+webforms, which is usually the right choice for non-trivial projects, but requires users to realize the scope of their project at the outset.
  
  And as crummy as Access is, at least it gives you a path towards a separate frontend/backend. You don't get that when each employee is keeping their own critical information on a massive spreadsheet on their workstation's hard drive. And in more places than I'd like to think about, that's the way things work -- it's the dark side of giving every employee an actual computer as opposed to a dumb terminal.
  
  --
  "Ladies and gentlemen, my killbot features Lotus Notes and a machine gun. It is the finest available."
39. Re:That's all it takes by dodongo · 2007-08-15 16:00 · Score: 1
  
  Would you think that LAX is running anything that out-of-date or crappy?
  
  What, you mean like the air traffic control system? Yes. Yes, I would.
40. Re:That's all it takes by mforbes · 2007-08-15 16:00 · Score: 1
  
  Yes, but the space shuttles still rely on five IBM AP-101 (about as powerful as an 80286 processor) machines for their flight guidance. However, the code has been thoroughly vetted on those machines, and the cost of switching to more modern processors and re-vetting the code, along with the risk that hey, it didn't work right! is prohibitive.
  
  Sometimes the legacy solutions are best left alone.
  
  --
  Allegedly real newspaper headline from 1998:
  Man Struck by Lightning Faces Battery Charge
41. Re:That's all it takes by boaworm · 2007-08-15 16:35 · Score: 1
  
  No, but it's funny as hell to do so anyhow :D
  
  --
  Probable impossibilities are to be preferred to improbable possibilities.
  Aristotele
42. Re:That's all it takes by kcelery · 2007-08-15 16:52 · Score: 1
  
  I would think the IT folks were not clueless, but helpless. Especially when the upper management is run by accountants who do not know the difference between 99% uptime and the 99.99% uptime. They think the difference is insignificant and the saving is immense.
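
To put numbers on the gap between 99% and 99.99% uptime mentioned above, here is a short Python calculation. It assumes a 365-day year and treats "uptime" as a simple fraction of wall-clock time; the function name is made up for the example.

```python
HOURS_PER_YEAR = 365 * 24  # assumption: non-leap year


def downtime_hours_per_year(uptime_percent):
    """Hours per year a system may be down while still meeting the uptime figure."""
    return HOURS_PER_YEAR * (1 - uptime_percent / 100)


for pct in (99.0, 99.99):
    hours = downtime_hours_per_year(pct)
    print(f"{pct}% uptime allows about {hours:.2f} hours "
          f"({hours * 60:.0f} minutes) of downtime per year")
```

Under those assumptions, 99% works out to roughly 87.6 hours of permitted downtime a year, while 99.99% is a little under an hour. That is a factor of 100, which is hardly insignificant.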
43. Re:That's all it takes by itwerx · 2007-08-15 17:04 · Score: 2, Insightful
  
  That's very difficult to do, and your estimates of the costs will be called into question.
  
  Right, but that's why IT doesn't provide the numbers. It just provides the scenario and it's the bean-counters (BC) that provide the numbers.
  
  IT: "We have some really old hardware that's going to fail any day now..."
  
  BC: "So what?"
  
  IT: "Well, that's a good question, we know it's going to cost $Bazillion to fix so we need to find out if it's worth it or not. Here's what will happen when it dies - LAX completely shuts down. Would that hurt the bottom line enough to justify budgeting $Bazillion?"
  
  BC: "OMFG!" [throws money]
44. Re:That's all it takes by Anonymous Coward · 2007-08-15 18:17 · Score: 1, Insightful
  
  Making this kind of estimate also places you into a lose-lose position. If your estimate was high, then management sees you as "chicken little" and will be more likely to dismiss further concerns as more fearmongering. If your estimate was low, then the blame for the outage will cascade down onto you for not showing/convincing management that new equipment was needed.
  
  Rightfully so - it would be your mistake. Don't ever give a single number unless it's solid. Give a confidence interval (even a huge, rough, unscientific one) instead.
45. Re:That's all it takes by dynamo · 2007-08-15 19:42 · Score: 1
  
  As simple and stupid as that sounds, I totally agree.
  Everyone wants a bigger slice.
46. Re:That's all it takes by Anonymous Coward · 2007-08-15 20:18 · Score: 1, Interesting
  
  If the system was based on decent Layer 3 switches for all ports, configured to perform TCP/IP switching across all segments, then only the switch itself would be able to take down the network.
  
  That being said, the cost of wiring every port in an international airport for Layer 3 and ONLY Layer 3 switching would cost about $300 per port and that's probably for like 5000 ports. So figure a minimum of $1.5 million for network switches.
  
  In all other mission critical systems I've ever worked on (specifically in telecom and power) there are timelines for replacing equipment on schedules to avoid equipment reaching end of life. At the very least the equipment would be sent back to the manufacturer for refurbishing which would even include reflowing all the solder, often costing more than new equipment. It would add about $1.5 million + $3-$5 million installation+configuration every 2-3 years to the IT budget of the airport.
  
  That really is chump change to an international airport, but let's be serious, if we look at standards based on this is chump change or that is chump change, pretty soon, you have some freak with a post-it note fetish wallpapering a 400 sq. meter area with heart and pony shaped post-it notes because after all, what's $1 million on the grander scale, the airport won't even miss it. In something closer to reality, it means that people will be less conservative about upgrading from a 17" LCD to a 19" LCD just because it's chump change.
  
  Given the uncountable number of critical systems within an airport, security, fire department, fuel containment, tire refurbishing, wing repair, (yes, I'm making this crap up) etc... network switches just don't sound that important to the people regularly making purchasing decisions on a daily basis that could actually decide the lives of 200 or more passengers on each plane to take off or land.
  
  It puts things in perspective sometimes to recognize that an outage that forces an airport to reroute traffic safely to other airports and delay flights isn't as important as decisions that make sure fuel tanks and hoses don't explode, causing the death of 500 or more people. I'm sure, however, that the airport will now dedicate a little more money to networking to try and keep this from happening again.
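
The back-of-envelope figures in the comment above ($300 per port, about 5000 ports, $3-$5 million for installation and configuration, a 2-3 year replacement cycle) can be checked with a few lines of Python. All inputs are the commenter's own rough guesses; the variable and function names are only for illustration.

```python
PORTS = 5000          # commenter's guess at the port count
COST_PER_PORT = 300   # commenter's guess, in dollars

hardware_cost = PORTS * COST_PER_PORT


def annualized_cost(hardware, install, cycle_years):
    """Spread one replacement cycle's total cost evenly over its length."""
    return (hardware + install) / cycle_years


# Cheapest case: $3M install, replaced every 3 years.
low = annualized_cost(hardware_cost, 3_000_000, 3)
# Most expensive case: $5M install, replaced every 2 years.
high = annualized_cost(hardware_cost, 5_000_000, 2)

print(f"Hardware per cycle: ${hardware_cost:,}")
print(f"Annualized range:   ${low:,.0f} to ${high:,.0f}")
```

With those inputs the hardware comes to the $1.5 million the comment quotes, and the yearly budget impact lands between $1.5 million and $3.25 million.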
47. Re:That's all it takes by RockDoctor · 2007-08-15 21:11 · Score: 1
  
  This is why infrastructure failure disasters go in cycles determined by the attention span and age of management - each new generation has to see a major failure before they listen while engineers have the benefit of written knowledge going back years.
  
  That implies that the engineering department can still read records from years ago, while management-type people can't.
  I was going to make a bitchy comment showing management types to be the morons we all know that they are. Peter Principle and all that. Then I thought about the turnover rate of obfuscated paradigm-changing bullshit that one sees coming out of management types (and going into management types, judging from the drivel in airport book shops) ... and one wonders if they really can't understand the records from the previous management cycle. Plus, of course, this generation of management know that the previous generation were knuckle-dragging retards unfit to supervise the pencil-sharpening rota in a word-processing office (if they weren't such retards, they'd still be here. Right?)
  I think this might be the kernel of an excuse for some of the stupidities of management. Not a very good kernel, and screaming out for a more scientific approach to "management" as a subject to try to get the foundations right. But who really cares?
  
  --
  Birds are not dinosaur descendants;birds are dinosaurs, for all useful meanings of "birds", "are" and "dinosaurs"
48. Re:That's all it takes by RockDoctor · 2007-08-15 21:18 · Score: 1
  
  Surely management understands that redundancy is good.
  
  No. Management understands that redundancy of equipment is expensive, and that failures of equipment normally lead to redundancies in the technical staff who failed to work the correct miracles on the day of the failure.
  
  That sounds like a recommendation for deliberately setting up occasional problems to be solved, just to remind the management that there is a need to listen to what the technical staff say. Dropping network performance to 30% for a week, and gradually restoring it by cutting down on the PHB's porn-downloading bandwidth should work adequately. But you'll need to write the log files carefully before the event.
  
  --
  Birds are not dinosaur descendants;birds are dinosaurs, for all useful meanings of "birds", "are" and "dinosaurs"
49. Re:That's all it takes by tomofumi · 2007-08-15 21:31 · Score: 1
  
  Token Ring is still very useful in some special cases, such as servers on a very busy network that need a guaranteed time slot to transfer data. It may be useful somewhere like an airport for transferring critical data/commands which can't be delayed by network collisions. But I suppose switches nowadays can do the same thing?
50. Re:That's all it takes by dbIII · 2007-08-15 22:31 · Score: 4, Insightful
  
  No - it implies a great deal of management has become a shallow oral tradition with all the problems that implies. They are not learning from anything before them and react with great surprise when a Rupert Murdoch or a Bill Gates who does know how to learn from the mistakes of others leaves them with effectively nothing but their underwear. It's like Cortez in Mexico - he used tactics of Roman generals that he had read about against those that did not have a written history.
  In contrast, technical staff get to hear a lot about the Tacoma Narrows Bridge, Liberty Ships, Titanic or similar disasters from long ago as illustrations of how things can go wrong before they get let out of their first year of training. Some management would discard those lessons as things from the days of dinosaurs, which is why we seem to have maintenance, infrastructure and contingency plans reduced to nothing every decade and then seen as important again in the years immediately following a string of expensive or deadly disasters.
51. Re:That's all it takes by tinkerghost · 2007-08-16 00:58 · Score: 1
  
  Surely management understands that redundancy is good.
  
  I'm on my 3rd switch in 18 months because the closet it's kept in doesn't vent properly. When the switch goes down, half the phones go down - I can't get the switch moved, and I can't get the closet ventilated. The management solution was to go with cheaper switches so they can write them off without paying so much. I expect to be making a run to one of the stores sometime around the end of next month trying to find a 24-port switch in stock.
  Management doesn't understand redundancy, nor do they understand preventative maintenance when it comes to electronics. I'm down to scavenging carcasses for HDs from systems with burned (MB || PS) because management won't let me buy new ones to have on hand when one fails. Amazingly, 5yr old HDs scavenged from already failed systems don't seem to have an exceedingly long lifespan.
52. Re:That's all it takes by spun · 2007-08-16 02:19 · Score: 1
  
  Oh don't pity me, I don't have to support it. We're looking into open source alternatives, but we've got around 2,000 Access databases statewide. The really sad thing is, people are pulling copies of our main dataset from Sybase and working with those. Ugh. I want to slap the consultants who designed that thing.
  
  We're going to be using PostgreSQL in our next major rewrite and we simply won't be allowing anyone to dump the entire dataset. You'd be surprised how much support there is for open source here, even among management. Most of our back end servers run Linux. That's why they brought me on here, and that is why, thankfully, I don't have anything to do with any Microsoft products.
  
  --
  - None can love freedom heartily, but good men; the rest love not freedom, but license. -- John Milton
53. Re:That's all it takes by spun · 2007-08-16 02:23 · Score: 1
  
  Heh. But to be honest, it's no worse than many private sector setups I've seen. I've inherited much bigger messes than this before. The one thing that really upset me, seeing as how it affected me directly, was the near criminal disregard for proper fiber channel cable routing to our SAN. We experienced two kinked cables before I got management to schedule some downtime to fix it.
  
  --
  - None can love freedom heartily, but good men; the rest love not freedom, but license. -- John Milton
54. Re:That's all it takes by arivanov · 2007-08-16 03:41 · Score: 1
  
  Sorry, you are being overly optimistic.
  
  Both management and engineering types are currently taught to view systems from the point of view of how they work, not how they fail. Computer and network engineers nowadays no longer study optimal control, advanced parts of probability theory and other mathematical principles which determine system stability. As a result, they are in the same boat as management. They think positively and while they may have some remote recollections of the Tacoma Narrows they do not quite understand why it collapsed. At least this is the case in the computer science and networking industry.
  
  --
  Baker's Law: Misery no longer loves company. Nowadays it insists on it
  http://www.sigsegv.cx/
55. Re:That's all it takes by uncoveror · 2007-08-16 03:45 · Score: 1
  
  Not only that, but a non-computerized backup way of doing things needs to be available when the high-tech whizbangs go haywire. How did customs operate before computers? They could do it that way again until the system was back up. Tools like paper, pens and clipboards are probably all they would need.
  
  --
  The Uncoveror: It's the real news.
56. Re:That's all it takes by RockDoctor · 2007-08-16 03:53 · Score: 1
  
  They think positively and while they may have some remote recollections of the Tacoma Narrows they do not quite understand why it collapsed. At least this is the case in the computer science and networking industry.
  
  I'd hope that my friend who recently completed a network engineering degree would have a damned sight better idea of why the Tacoma bridge collapsed than that!
  Then again, his networking degree was taken in the engineering department, not the computing department.
  
  --
  Birds are not dinosaur descendants;birds are dinosaurs, for all useful meanings of "birds", "are" and "dinosaurs"
57. Re:That's all it takes by HedRat · 2007-08-16 05:00 · Score: 1
  
  I haven't noticed that. Perhaps it's a problem on your netwo
58. Re:That's all it takes by karnal · 2007-08-16 05:02 · Score: 1
  
  Awesome.
  
  --
  Karnal
59. Re:That's all it takes by fredklein · 2007-08-16 05:09 · Score: 1
  
  Presentation to Management:
  
  Cost of replacing switches 3 times: $$$$
  Cost of lost time when phones go down: $$
  Cost of lost time while IT replaces switch instead of doing their real job: $$$
  
  Cost of cutting a hole in the closet door and mounting a fan: $$
  
  Savings that could conceivably go to the manager's bonus: $$$$
  
  Any questions?
60. Re:That's all it takes by afidel · 2007-08-16 06:23 · Score: 1
  
  Yes, but the damage should have been confined to one switch/blade if they were using decent equipment. I know that most modern Cisco switches will defensively shut down a port if they see too much garbage coming across. I had a CEO with ADD place a cable from one port into another port in a training room and it only took down the one switch because the adjacent switches shut off their uplinks to the switch with the switching loop.
  
  --
  There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
61. Re:That's all it takes by WuphonsReach · 2007-08-16 06:45 · Score: 1
  
  Exactly on the mark.
  
  MS Access fills, very successfully, an important niche. And OpenOffice still has a ways to go in order to fill its shoes. Need to make a backup of a table? Ctrl-Paste! Backup the entire MDB? Send to compressed file! No need to keep track of 20+ files that make up the database (and the queries, the forms, the reports, the code), although it would be nice for version control purposes.
  
  MS Access is one of the few reasons that we still use MS Windows. There's nothing quite like it yet that can replace it on non-Windows (and Windows) platforms.
  
  --
  Wolde you bothe eate your cake, and have your cake?
62. Re:That's all it takes by quanticle · 2007-08-16 07:34 · Score: 1
  
  The problem is that it's almost never that cut-and-dried. You don't know (i.e. with 100% probability) that something catastrophic will happen. You just think that something might happen if the conditions are sufficiently bad.
  
  The real scenario would go as follows:
  IT: We have some really old hardware that's about to die.
  
  BC: So what?
  
  IT: Well, probably nothing will happen, but there's a 25% chance that it'll cause a cascade failure that'll bring down LAX.
  
  BC: Only a 25% chance? You're asking us to spend $LOTS for a problem that might never occur? We'll fund it next year.
  
  --
  We all know what to do, but we don't know how to get re-elected once we have done it
63. Re:That's all it takes by zerkon · 2007-08-16 07:38 · Score: 1
  
  My high school (I graduated in '02) was using Token Ring until I was in about 10th grade (2000). So yes. And given that a government entity, in my experience so far (I work for one), changes INCREDIBLY slowly, it wouldn't be too far out of the realm of possibility for that to be the case...
  
  --
  The Answer
64. Re:That's all it takes by RogerWilco · 2007-08-16 21:29 · Score: 1
  
  I've seen it running power plants, with everyone afraid of modifying anything, because the original coders were no longer there and nobody really understood what it did.
  
  Convincing Management that this is a risk is however beyond mortal men for some reason.
  
  --
  RogerWilco the Adventurous Janitor
65. Re:That's all it takes by Doctor-Optimal · 2007-08-17 07:36 · Score: 1
  
  In the old days there was a lot less air travel, and therefore less demand for %Airport System% (customs, baggage claim, whatever). Going back to the old ways only works when the "old way" and the "new way" are dealing with the same (or similar) load. Also people trained on the "new way" might not be trained in the "old way". (And why should they be? Training time, like everything else, is available in limited quantities.)
  
  --
  New punctuation update "~" (no quotes) at the end of a line to indicate sarcasm. ~
66. Re:That's all it takes by itwerx · 2007-08-19 17:22 · Score: 1
  
  I think most readers will understand that was not meant to be a real world example. Nonetheless, to take your version at face value, one is still putting the onus of the decision on the bean-counters. You don't provide the 25% number with that phrasing and you don't say "probably nothing". Both of those result in IT taking on some or all of the burden of the decision.
  Instead you say there's a 100% chance of failure, which is true. Then you can say there's ~25% chance that the failure will result in LAX shutting down but you need [some number less than $Bazillion] for proper risk analysis. That way if the problem isn't remedied it's NOT the IT dept's head on the block.
  The fact is that IT people are regularly expected to perform risk and cost/benefit analysis which they are simply not qualified to do. Not to say that IT people can't do either one at all, but IT people are famous for not setting boundaries and this is one area where IT regularly plays cowboy and shoots itself in the foot.
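  
  For anyone who wants the arithmetic behind that phrasing spelled out, here is a minimal sketch. The 100% and ~25% come from the post above; the dollar amounts are placeholders made up purely for illustration, and `expected_outage_cost` is a hypothetical helper, not anybody's real risk model.

```python
def expected_outage_cost(p_failure, p_cascade_given_failure, outage_cost):
    """Expected loss = P(hardware fails) * P(cascade | failure) * cost of outage."""
    return p_failure * p_cascade_given_failure * outage_cost


# Placeholder figures, NOT real LAX numbers.
replacement_cost = 50_000        # hypothetical price of new hardware
outage_cost = 10_000_000         # hypothetical cost of a full shutdown

# Old hardware will fail eventually (100%); ~25% of failures cascade.
risk = expected_outage_cost(1.0, 0.25, outage_cost)
print(risk, risk > replacement_cost)
```

  With these made-up inputs the expected loss dwarfs the replacement cost, but the point stands that somebody qualified should be supplying the real numbers.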
Whiskey Tango Foxtrot by SatanicPuppy · 2007-08-15 07:58 · Score: 5, Insightful

According to the effing article, it wasn't even a server, but a goddamn desktop. How in the holy hell does a desktop take down the whole system? I can't even conceive of a situation where that could be the case on anything other than a network designed by chimps, especially through a hardware failure...A compromised system might be able to do it, but a system just going dark?

For that to have had any effect at all, that system must have been the lynchpin for a critical piece of the network...probably some Homeland security abortion tacked on to the network, or some such crap...This is like the time I traced a network meltdown to a 4 port hub (not a switch, an unmanaged hub) that was plugged into (not a joke) a T-3 concentrator on one port, and three subnets of around 200 computers each on the other 3 ports. Every single one of the outbound cables from the $15.00 hub terminated in a piece of networking infrastructure costing not less than $10,000.

This is like that. Single point of failure in the worst possible way. Gross incompetence, shortsightedness, and general disregard for things like "uptime"; pretty much what we've come to expect from the airline industry these days. If I'm not flying myself, I'm going to be driving, sailing, or riding a goddamn bicycle before I fly commercial.

--
ad logicam Claiming a proposition is false because it was presented as the conclusion of a fallacious argument.
1. Re:Whiskey Tango Foxtrot by Jeremiah+Cornelius · 2007-08-15 08:01 · Score: 3, Interesting
  
  Well.
  
  Token ring sure used to fail like this! 1 bad station sending 10,000 ring-purge messages a second? Still, it was a truck. Files under 1Mb could be transferred, and this was TR/4, not 16!
  
  --
  "Flyin' in just a sweet place,
  Never been known to fail..."
2. Re:Whiskey Tango Foxtrot by spitek · 2007-08-15 08:02 · Score: 1
  
  Chimps make flat networks... with hubs... what are switches and routers? I agree; I've run into this problem several times in my career, for real. Bad network design. But hey, I've also worked in wiring closets in airports, most of which these days are community-run by the airport authority and leased by the airlines or other tenants of the airports. I used to work for an airline. NOT surprised one bit.
3. Re:Whiskey Tango Foxtrot by mhall119 · 2007-08-15 08:04 · Score: 2, Interesting
  
  A compromised system might be able to do it, but a system just going dark?
  The article says it was a partial failure, so I'm guessing the NIC didn't "go dark", instead it started flooding the network with bad packets.
  
  --
  http://www.mhall119.com
4. Re:Whiskey Tango Foxtrot by MightyMartian · 2007-08-15 08:07 · Score: 5, Insightful
  
  If the NIC starts broadcasting like nuts, it will overwhelm everything on the segment. If you have a flat network topology, then kla-boom, everything goes down the shits. A semi-decent switch ought to deal with a broadcast storm. The best way to deal with it is to split your network up, thus rendering the scope of such an incident significantly smaller.
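
  To put a rough number on "significantly smaller", here is a toy sketch. It assumes a broadcast frame reaches every host in the sender's broadcast domain and nothing beyond it; the host counts and segment names are invented for the example, and `hosts_hit_by_storm` is a hypothetical helper.

```python
# Toy model: a broadcast storm is seen by every host in the faulty NIC's
# broadcast domain (the faulty host itself included) and by nobody else.

def hosts_hit_by_storm(segments, faulty_segment):
    """Return how many hosts sit in the same broadcast domain as the faulty NIC."""
    return len(segments[faulty_segment])


# Flat network: 600 hypothetical hosts in a single broadcast domain.
flat = {"lan": [f"host{i}" for i in range(600)]}

# The same hosts split into three routed segments of 200 each.
segmented = {
    f"vlan{n}": [f"host{i}" for i in range(n * 200, (n + 1) * 200)]
    for n in range(3)
}

flat_hit = hosts_hit_by_storm(flat, "lan")
segmented_hit = hosts_hit_by_storm(segmented, "vlan1")
print(flat_hit, segmented_hit)
```

  Same bad NIC, a third of the damage, and the other two segments never notice.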
  
  --
  The world's burning. Moped Jesus spotted on I50. Details at 11.
5. Re:Whiskey Tango Foxtrot by Anonymous Coward · 2007-08-15 08:09 · Score: 1, Interesting
  
  I'm guessing the NIC didn't "go dark", instead it started flooding the network with bad packets.
  Yeah, and any decent switch (and some not-so-decent) would detect this and shut the port down.
  
  Hell, I have a 7 year old dlink 8-port at home that can do this!
6. Re:Whiskey Tango Foxtrot by Billosaur · 2007-08-15 08:11 · Score: 2, Interesting
  
  And beyond that... how come there is no redundancy? After 9/11, every IT organization on the planet began making sure there was some form of fail-over to a backup system or disaster recovery site to ensure that critical systems could not go down as the result of something similar or some other large-scale disaster. Not only was this system cobbled together apparently, there was no regard for the possibility of it failing for any reason.
  
  --
  GetOuttaMySpace - The Anti-Social Network
7. Re:Whiskey Tango Foxtrot by spitek · 2007-08-15 08:11 · Score: 1
  
  Great point about token ring. It was heavily used in these types of situations as well. Don't know about LAX, but a lot of the airports have upgraded since then. Anybody get to use Switched 100Mb token ring?? Too bad that didn't make it. Could have been cool: the advantages of both switching speed and the performance under load that token ring has.
8. Re:Whiskey Tango Foxtrot by Anonymous Coward · 2007-08-15 08:14 · Score: 1, Funny
  
  At a previous employer, we kept having a Cisco Switch crash and become unresponsive, making about 1/3 of the people connected to it in the office lose connection.
  
  After about 2 to 3 hours of investigation, and it going down twice after we'd bring it back up, we found the problem. The f**king intern, who was worthless anyway and about to get fired for other stupid mishaps, had a Netgear switch he was using for setting up new desktops and thought it'd be cute to plug one port of the switch into another port on it. That was creating the havoc and bringing down the switch and part of the network for most.
  
  I think he got fired two weeks later. I guess he had it coming, since several times he would go home for lunch and take a nap, coming back after 3 or 4 hours because he had overslept during his one-hour lunch break.
9. Re:Whiskey Tango Foxtrot by morgan_greywolf · 2007-08-15 08:15 · Score: 1
  
  Ugh. A flat network may be fine for something small, but for something as big and complex as an airport network, especially one at an airport the size of LAX? Unthinkable. Do these people hire idiots with no training or experience or what?
  
  --
  My blog
10. Re:Whiskey Tango Foxtrot by sigipickl · 2007-08-15 08:16 · Score: 2, Informative
  
  This totally sounds like a token ring problem.... Either network flooding or dropped packets (tokens). These issues used to be a bear to track down- going from machine to machine in serial from the MAU...
  
  Ethernet and switching has made me fat- I never have to leave my desk to troubleshoot.
  
  --
  Never trust anyone who takes pride in being called a 'geek'....
11. Re:Whiskey Tango Foxtrot by SatanicPuppy · 2007-08-15 08:20 · Score: 1
  
  Yup. I've never really seen a situation where you'd have more than a dozen or so computers on a crappy layer 1 switch. Higher quality hardware would throttle this stuff down to the very most local layer, unless you're specifically multicasting across the whole network, which is a security horror story.
  
  --
  ad logicam Claiming a proposition is false because it was presented as the conclusion of a fallacious argument.
12. Re:Whiskey Tango Foxtrot by gEvil+(beta) · 2007-08-15 08:22 · Score: 1
  
  I told the boss we should get a proper network connection. But noooooo, he insisted that getting a consumer-level DSL connection and using Windows Internet Connection Sharing was the way to go...
  
  --
  This guy's the limit!
13. Re:Whiskey Tango Foxtrot by jafiwam · 2007-08-15 08:28 · Score: 1
  
  According to the effing article, it wasn't even a server, but a goddamn desktop. How in the holy hell does a desktop take down the whole system? I can't even conceive of a situation where that could be the case on anything other than a network designed by chimps, especially through a hardware failure...A compromised system might be able to do it, but a system just going dark?
  Los Angeles World Airports is a unique system of four airports owned and operated by the City of Los Angeles.
  Any further questions?
  Probably the lowest bidder union labor designing and setting it up. Shoulda called IBM.
14. Re:Whiskey Tango Foxtrot by spitek · 2007-08-15 08:36 · Score: 1
  
  Well, yes, yes and what. Now, I haven't been in the airline business for a few years, but there was one network I ran into that had over 2000 seats sitting on a flat network. One would hope that LAX in general is not like that; I do not know how LAX's network is set up. However, a government agency does have a higher likelihood of running in the dark ages. A flaky network card sending out bad packets I have seen, and I have seen it cause all sorts of problems. But for it to do the damage it did? That never should have happened. On top of that, there is the time it took to find out what the problem was. Sounds like poor network design and, yes, chimps and idiots. Or it is always possible that there is talent, just no money. Sorry for them if that is the case.
15. Re:Whiskey Tango Foxtrot by SatanicPuppy · 2007-08-15 08:38 · Score: 1
  
  Oh, no doubt. It's clearly what happened...This kind of thing is almost impossible with modern switching hardware, and not even the really expensive stuff, but the reasonable consumer stuff as well.
  
  Fricking stupid. People think it'll never come back to bite them, and it always does.
  
  --
  ad logicam Claiming a proposition is false because it was presented as the conclusion of a fallacious argument.
16. Re:Whiskey Tango Foxtrot by archen · 2007-08-15 08:40 · Score: 1
  
  If you have two different network segments which are supposed to be routed and a connection between them appears, you end up with a scenario where packets may go to the switch or may hop the rogue connection. I had this happen at a new facility our company moved to. I had a few single switches divided in half with VLANS and somewhere someone ended up with a crossed connection. The result was strange packet storms that would mysteriously cause network interfaces to shut down completely on machines.
  
  It actually took me a while to figure it out until I saw our internal (FreeBSD) network firewall was reporting traffic coming in on the wrong interface. After that it was a zoo finding the wire...
17. Re:Whiskey Tango Foxtrot by charlesnw · 2007-08-15 08:40 · Score: 1
  
  Um. The systems that control aircraft are completely separate from systems used to manage passengers.
  
  --
  Charles Wyble System Engineer
18. Re:Whiskey Tango Foxtrot by kylemonger · 2007-08-15 08:41 · Score: 2, Insightful
  
  Do these people hire idiots with no training or experience or what?
  I think just hiring idiots would be enough. No need to train them.
19. Re:Whiskey Tango Foxtrot by daveywest · 2007-08-15 08:42 · Score: 1
  
  I worked on a network where a server kept dropping connections and users were reporting high latency. We eventually had to use a process of elimination to isolate the bad connection before we found the bad line in the server room. We yanked it out and waited for the phone call from someone who couldn't get their email. Turns out they decided to turn off DHCP and self-assign an IP address: the same one as the server.
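  
  In hindsight, this kind of conflict can also be spotted from a list of observed (IP, MAC) pairs instead of yanking cables. A minimal sketch follows; the addresses are made up, `find_duplicate_ips` is a hypothetical helper, and how you collect the pairs (ARP tables, switch logs, whatever) is left as an assumption.

```python
from collections import defaultdict


def find_duplicate_ips(arp_entries):
    """Return {ip: sorted MACs} for every IP claimed by more than one MAC."""
    macs_by_ip = defaultdict(set)
    for ip, mac in arp_entries:
        macs_by_ip[ip].add(mac.lower())   # normalise case before comparing
    return {ip: sorted(macs) for ip, macs in macs_by_ip.items() if len(macs) > 1}


# Made-up observations: the server and a self-assigned desktop both claim .5
observed = [
    ("10.0.0.5", "00:11:22:33:44:55"),
    ("10.0.0.9", "00:11:22:33:44:66"),
    ("10.0.0.5", "AA:BB:CC:DD:EE:FF"),
]
conflicts = find_duplicate_ips(observed)
print(conflicts)
```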
20. Re:Whiskey Tango Foxtrot by Joe+The+Dragon · 2007-08-15 08:43 · Score: 1
  
  That is what the M$ tests talk about in their network setups.
21. Re:Whiskey Tango Foxtrot by Kadin2048 · 2007-08-15 08:45 · Score: 1
  
  Do these people hire idiots with no training or experience or what?
  Probably they do to some extent, but if it's like other places I've worked, they probably hire people who have a clue, but then tell them to do little bits and pieces, and never give them enough resources to actually do the job right.
  
  It's a lot of "we'll pay you to come out and install this." They don't want to hear 'well, you should really re-think the architecture of your whole network' as a response. They just want the new piece grafted on, and if you don't do the job, they'll just find somebody who will.
  
  That's how these horrible abortions of big systems / networks happen. They usually don't start off like that. They just grow and evolve without much in the way of a central plan until they finally keel over and die. Nobody wants to spend the time, money, or downtime to tear things down and rebuild them until they actually fail. So they just grow out of control.
  
  They probably had a flat network (or a switched one without any subnets) because that was the only way to keep everything working as it grew; as different contractors came in and tacked on this or that, they just added it on to whatever was there.
  
  --
  "Ladies and gentlemen, my killbot features Lotus Notes and a machine gun. It is the finest available."
22. Re:Whiskey Tango Foxtrot by dave562 · 2007-08-15 08:47 · Score: 2, Insightful
  
  They concentrated all of the redundancy dollars into layer B of the OSI model... the bureaucracy. There wasn't anything left for the lower layers.
23. Re:Whiskey Tango Foxtrot by SatanicPuppy · 2007-08-15 08:50 · Score: 1
  
  So? Does it give you confidence in the rest of their equipment when one misbehaving computer can bring down their entire network for nine hours?
  
  Bunch of monkeys. The reason I don't fly commercial anymore has nothing to do with the planes. It has everything to do with the airports.
  
  --
  ad logicam Claiming a proposition is false because it was presented as the conclusion of a fallacious argument.
24. Re:Whiskey Tango Foxtrot by charlesnw · 2007-08-15 08:52 · Score: 1
  
  After 9/11, every IT organization on the planet began making sure there was some form of fail-over to a backup system or disaster recovery site
  
  Um no? A number of large organizations do not have a disaster recovery site. Just the other day Cisco.com was down for a few hours.
  
  --
  Charles Wyble System Engineer
25. Re:Whiskey Tango Foxtrot by cez · 2007-08-15 08:54 · Score: 1
  
  By linking the Netgear on two ports to the Cisco (regardless of whether a cross-over is used) you introduce a loop into the network... a big no-no without proper VLANing or routing in between, as the MAC forwarding table of the Cisco will pick up both ports. Using Spanning Tree Protocol is one line of defense for this, but not an end-all cure. Depending on the router or type of trunking that the Cisco was uplinked to, the whole segment might be automatically taken offline and disabled to prevent further damage, so usually the whole switch stack drops from the network.
  
  --
  Walk with Music;
26. Re:Whiskey Tango Foxtrot by cciechad · 2007-08-15 09:03 · Score: 1
  
  Not really. Most likely it was an STP issue in the Netgear case (damn unmanaged switches with no/bad STP support). Basically what you end up with is a loop, which can cause switch CPU to go to 100% even on 6500s, which makes tracing it interesting. Fastest way is to start popping cards until it becomes responsive, then isolate to a port. I don't like putting portfast on user ports anymore because of this.
  
  --
  https://www.fsf.org/associate/support_freedom
27. Re:Whiskey Tango Foxtrot by Lehk228 · 2007-08-15 09:05 · Score: 1
  
  I think the problem occurs when the switches "learn" a circular route and so packets get stuck in that circular route at whatever speed the network runs at.
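
  A toy simulation of that "stuck in a circle" behaviour, for what it's worth. It assumes the simplest possible rule (each switch floods a broadcast frame out of every link except the one it arrived on, with no spanning tree and no MAC learning), one cable per switch pair, and three hypothetical switches A, B and C. Real hardware is messier.

```python
def frames_in_flight(links, start, steps):
    """Flood one broadcast frame from `start` and count copies after each step.

    A frame is a (switch, arrived_from) pair. Each step, every switch
    forwards each frame to all neighbours except the one it came from.
    Ethernet frames carry no TTL, so nothing ever expires a copy.
    """
    neighbours = {}
    for a, b in links:
        neighbours.setdefault(a, []).append(b)
        neighbours.setdefault(b, []).append(a)

    frames = [(start, None)]
    history = []
    for _ in range(steps):
        forwarded = []
        for switch, came_from in frames:
            for peer in neighbours.get(switch, []):
                if peer != came_from:
                    forwarded.append((peer, switch))
        frames = forwarded
        history.append(len(frames))
    return history


# Three switches in a line: the frame runs off the end and is gone.
tree = [("A", "B"), ("B", "C")]
# One extra cable closes the loop A-B-C-A.
loop = tree + [("C", "A")]

print(frames_in_flight(tree, "A", 5))   # [1, 1, 0, 0, 0]
print(frames_in_flight(loop, "A", 5))   # [2, 2, 2, 2, 2]
```

  In the looped case two copies chase each other around the ring forever, one in each direction, and every new broadcast adds two more.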
  
  --
  Snowden and Manning are heroes.
28. Re:Whiskey Tango Foxtrot by andy314159pi · 2007-08-15 09:16 · Score: 1
  
  I think he got fired two weeks later. I guess he had it coming, since several times he would go home for lunch and take a nap, coming back after 3 or 4 hours because he had overslept during his one-hour lunch break.
  Yeah I totally take the lunchtime siesta at my office, and that limits my oversleeping to, at most, two hours. What a complete office noob. By the way, how do you fire an unpaid intern?
29. Re:Whiskey Tango Foxtrot by Kadin2048 · 2007-08-15 09:16 · Score: 1
  
  Idiot intern notwithstanding, should that really have been possible? I thought that any decent router would see that a loop had occurred and shut down the port connected to it, rather than forwarding all the broadcast storm packets.
  
  After all, preventing layer 2 loops is what Spanning Tree is all about, and I thought Cisco had some similar system for figuring out if a link was unidirectional (if you're sending packets down to something and not getting anything back, it can shut it down, to keep it from just sending out lots of bogus requests).
  
  I doubt that crappy consumer switches do STP, but the upstream Cisco one should have ... shouldn't it?
  
  --
  "Ladies and gentlemen, my killbot features Lotus Notes and a machine gun. It is the finest available."
30. Re:Whiskey Tango Foxtrot by UbuntuDupe · 2007-08-15 09:35 · Score: 1
  
  How in the holy hell does a desktop take down the whole system? I can't even conceive of a situation where that could be the case on anything other than a network designed by chimps
  
  Heh, I think you're starting to get the sensation I had when one tiny error in GRUB locked me out of my computer entirely, to the point where even having the Ubuntu Install CD couldn't gain me any access to any OS whatsoever.
  
  Geez, what kind of chimp would allow such a damaging failure to occur along such a vital path, right?
  
  --
  Apology to Ubuntu forum.
31. Re:Whiskey Tango Foxtrot by badasscat · 2007-08-15 09:45 · Score: 1
  
  So? Does it give you confidence in the rest of their equipment when one misbehaving computer can bring down their entire network for nine hours?
  
  The point is ATC is not controlled by "their equipment". The airport authority and the FAA are two wholly separate agencies.
  
  It would be like worrying about the strength of the US Army based on the fact that a prisoner escaped from a police car owned by the NYPD. Sure, both agencies involve guys carrying guns, but they otherwise have nothing to do with each other.
  
  The FAA's got its own problems with its computer systems, but the two systems are 100% separate. Funded, designed and built separately and through different processes, run by different people.
32. Re:Whiskey Tango Foxtrot by SatanicPuppy · 2007-08-15 09:46 · Score: 1
  
  No, not really. You've got to expect to lose machines; failure happens. Could have been a motherboard or a power supply. I'd still expect you to be able to boot from CD though. You try knoppix? You should be able to boot to knoppix, then mount the /boot partition and have your way with grub.
  
  The thing is, a network topology is wildly different from a computer. It should be designed for parts of it to drop off, and parts to go berserk...These things happen all the time. It should be designed with a minimum of bottlenecks, and it should be extremely easy to identify and isolate problems.
  
  --
  ad logicam Claiming a proposition is false because it was presented as the conclusion of a fallacious argument.
33. Re:Whiskey Tango Foxtrot by DA-MAN · 2007-08-15 09:48 · Score: 1
  
  Um no? A number of large organizations do not have a disaster recovery site. Just the other day Cisco.com was down for a few hours.
  
  Business continuity style disaster recovery doesn't really take public facing websites into account as high priority. Usually it's the payroll, accounts receivable and things needed to keep a business moving forward in case of a disaster.
  
  Letting customers visit your public website is probably the lowest priority in recovering from an actual disaster.
  
  --
  Can I get an eye poke?
  Dog House Forum
34. Re:Whiskey Tango Foxtrot by Spazmania · 2007-08-15 09:48 · Score: 1
  
  And even a marginally competent network administrator ought to be able to recognize that they face a packet storm and isolate the problem in about 30 minutes through the simple expedient of, "Unplug this switch. Did the problem stop? No. Plug it back in and unplug the next switch."
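
  Written out as a procedure, that is just a linear scan. A minimal sketch, where `storm_present` is a hypothetical probe (say, someone watching a traffic counter) and the switch names are invented:

```python
def find_storm_source(switches, storm_present):
    """Unplug one switch at a time; return the one whose removal stops the storm.

    `storm_present(unplugged)` is a stand-in for a human or tool reporting
    whether the storm continues while the named switch is disconnected.
    """
    for switch in switches:
        if not storm_present(switch):
            return switch   # storm stopped: the culprit hangs off this switch
        # storm continues: plug it back in and try the next one
    return None             # no single switch accounts for the storm


# Made-up example: the faulty NIC is attached to "sw-7".
switches = [f"sw-{n}" for n in range(1, 11)]
culprit = find_storm_source(switches, lambda unplugged: unplugged != "sw-7")
print(culprit)
```

  Ten switches means at most ten unplug-and-check rounds, then the same trick port by port on the guilty switch.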
  
  I'll bet the sucker not only keeps his job but gets a commendation for finding the problem.
  
  --
  Moderating "-1, Disagree" is simple censorship. Have the guts to post your opinion.
35. Re:Whiskey Tango Foxtrot by UbuntuDupe · 2007-08-15 09:51 · Score: 1
  
  No, not really. You've got to expect to lose machines; failure happens. Could have been a motherboard or a power supply.
  
  Yeah, then I could go to the local store, buy a part, replace it, and move on with my life, like I've done before (a power supply did in fact go out on me before).
  
  I'd still expect you to be able to boot from CD though. You try knoppix? You should be able to boot to knoppix, then mount the /boot partition and have your way with grub.
  
  Hey! Capital idea! I'd download it on that computer I can't use and burn it on the CD drive I don't have access to!
  
  --
  Apology to Ubuntu forum.
36. Re:Whiskey Tango Foxtrot by Jaxoreth · 2007-08-15 09:53 · Score: 4, Funny
  
  Still, it was a truck.
  Which explains why it's not used in the Internet.
  
  --
  In general, it is safe and legal to kill your children. -- POSIX Programmer's Guide
37. Re:Whiskey Tango Foxtrot by ZachPruckowski · 2007-08-15 10:04 · Score: 1
  
  Redundancy only helps when you have a system that stops working, not one that malfunctions:
  
  For instance, imagine a RAID 1 in which the data is becoming corrupted. Having redundancy doesn't help: you just have two copies of a corrupted file.
  
  In this instance, a network card started spewing out crap. Because it could fill its pipe, and most of the packets were rebroadcast down most of the other cables, they also filled those cables.
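  
  The RAID 1 point in a few lines of code, as a rough sketch only. `Mirror` is a made-up toy class, not a real storage API: it survives losing a disk, but it faithfully mirrors garbage too.

```python
class Mirror:
    """Toy RAID 1: every write goes to both disks."""

    def __init__(self):
        self.disks = [{}, {}]

    def write(self, name, data):
        for disk in self.disks:
            if disk is not None:
                disk[name] = data

    def lose_disk(self, index):
        self.disks[index] = None      # simulate a disk that stops working

    def read(self, name):
        for disk in self.disks:
            if disk is not None:
                return disk[name]
        raise IOError("no surviving disk")


# Case 1: a component stops working. Redundancy helps.
stopped = Mirror()
stopped.write("manifest", "good data")
stopped.lose_disk(0)
print(stopped.read("manifest"))

# Case 2: a component malfunctions and writes garbage. Both copies get it.
corrupt = Mirror()
corrupt.write("manifest", "g@rb4ge")
print([disk["manifest"] for disk in corrupt.disks])
```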
38. Re:Whiskey Tango Foxtrot by hedley · 2007-08-15 10:22 · Score: 1
  
  Back in the late 80's we had a network of Apollo DN300's. Days of lost time when the ring went down. Once an entire day was lost. IT was scrambling about in the ceiling tiles trying to TDR the cable and finally found a NIC out in a workstation in a closed office; the employee was on vacation. While the owner was away, that NIC, like a bad Xmas tree bulb, took down the whole LAN because it wouldn't play nice and pass on the token. Total junk and a total waste of time. I used to dream of taking that Apollo from CA to Chelmsford, Mass. and heaving it, wrapped in lemons, at the stairs of Apollo's plant.
39. Re:Whiskey Tango Foxtrot by IdolizingStewie · 2007-08-15 10:27 · Score: 1
  
  A lot of interns are paid these days. I made nearly $9000 this summer, working less than two and a half months.
40. Re:Whiskey Tango Foxtrot by Solandri · 2007-08-15 10:28 · Score: 2, Funny
  
  Yeah, I had that happen at a small business I consulted for. Their flat LAN died. I eventually tracked the problem down to a cheap unmanaged switch which had a network cable plugged into it for people to plug their laptops into. Whoever used it last thought leaving the unplugged cable laying on the desk looked untidy, so they "helpfully" plugged it into an empty socket on the same switch.
41. Re:Whiskey Tango Foxtrot by FussionMan · 2007-08-15 10:32 · Score: 1
  
  Do you work at DHS?
42. Re:Whiskey Tango Foxtrot by quizwedge · 2007-08-15 10:33 · Score: 1
  
  [Quote]This is like the time I traced a network meltdown to a 4 port hub (not a switch, an unmanaged hub) that was plugged into (not a joke) a T-3 concentrator on one port, and three subnets of around 200 computers each on the other 3 ports. Every single one of the outbound cables from the $15.00 hub terminated in a piece of networking infrastructure costing not less than $10,000.[/Quote]
  
  LOL. Perhaps they ran out of funding after buying all of the rest of the hardware? :)
  
  --
  I have no .sig
43. Re:Whiskey Tango Foxtrot by cez · 2007-08-15 11:01 · Score: 1
  
  2...more or less. There is a packet coming in... and it goes, HEY! There's my MAC, I'mma hit that mofo up. Gets there, and as the packet comes into the new queue it goes, Hey, I see my MAC over there... wtf? Goes back... etc, etc.
  
  --
  Walk with Music;
44. Re:Whiskey Tango Foxtrot by SatanicPuppy · 2007-08-15 11:23 · Score: 1
  
  Not true for most people; the average schmo will have a computer with a proprietary motherboard (e.g. Dell), and the ubergeeks will scorn all boards that can be bought outside of Akihabara or some other exotic tech mecca. I'm not that rabid, but I still don't go down to Best Buy and buy a crap motherboard...Though I did buy a power supply there once.
  
  As for Knoppix, if you commonly corrupt your bootloader to the point where you can no longer access the machine, I recommend you burn yourself a copy next time you have access to a functional computer. A facility with Knoppix and a bunch of spare knoppix disks kept on hand will save you a world of grief. The Knoppix STD is right up there with the tone wand, the multitool, and the Fluke in my geek emergency kit.
  
  --
  ad logicam Claiming a proposition is false because it was presented as the conclusion of a fallacious argument.
45. Re:Whiskey Tango Foxtrot by TheSkyIsPurple · 2007-08-15 11:23 · Score: 1
  
  I don't know their network architecture, but if they connect the remote sites (i.e., LAX) to a central site over VPN, and don't distribute the concentrators appropriately, you could have one desktop machine drop every airport off the map at once.
  
  I've seen exactly that happen in a worldwide distributed call center environment.
46. Re:Whiskey Tango Foxtrot by SatanicPuppy · 2007-08-15 11:35 · Score: 1
  
  I have no explanation...It was just crammed in the bottom of the rack. At that time there was no fiber in the building, so it was just on a piece of CAT5...I stuck a toner on the end up in the comm room, and dug through the wires in the server room until I found it, tugged on the wire to see what it was attached to, and that little piece of shit fell out of the rack. I literally could not believe it...If it had been hooked into Lego hardware, or something by Mattel I could not have been more surprised.
  
  I snagged some emergency capital, and put together a fiber/gigabit solution just to the server room...To date it's the most bang for my buck I've ever gotten for spending less than 5k. My boss took the credit, and I got reprimanded for being late on an unrelated deliverable. God I love I/T.
  
  --
  ad logicam Claiming a proposition is false because it was presented as the conclusion of a fallacious argument.
47. Re:Whiskey Tango Foxtrot by gad_zuki! · 2007-08-15 11:37 · Score: 1
  
  >How in the holy hell does a desktop take down the whole system?
  
  Easy. We used to call it Token Ring.
48. Re:Whiskey Tango Foxtrot by phoenixwade · 2007-08-15 11:49 · Score: 1
  
  After about 2 to 3 hours of investigation and it going down twice after we'd bring it back up, we soon found the problem. F**king intern, who was worthless anyways and about to get fired for other stupid mishaps, had a Netgear switch he was using for setting up new desktops and thought it'd be cute to plug one port of the switch into another port on it. That was creating the havoc and bringing down the switch and part of the network for most.
  
  Intern notwithstanding, whoever was managing the network needed a kick in the ass too; your switch should have detected the loop and shut down that channel. Moreover, it should have been pretty obvious where the problem child was. 3 hours of investigation? Sounds like your team was taking advantage of a little siesta time too.
  
  --
  A positive attitude may not solve all your problems, but it will annoy enough people to make it worth the effort.
49. Re:Whiskey Tango Foxtrot by Jeremiah+Cornelius · 2007-08-15 11:56 · Score: 1
  
  DCA IRMA!
  
  --
  "Flyin' in just a sweet place,
  Never been known to fail..."
50. Re:Whiskey Tango Foxtrot by Phroggy · 2007-08-15 12:11 · Score: 1
  
  Still, it was a truck.
  Which explains why it's not used in the Internet. Because the Internet is a series of tubes?
  
  --
  $x='S24;r)>63/* h@<5+oZ)32"5cz';$me='phroggy'x$];
  $x=~y+ -xz+\0-Tx+;print$_^chop$me for split'',$x;
51. Re:Whiskey Tango Foxtrot by karnal · 2007-08-15 12:54 · Score: 1
  
  *bangs head on desk* That sucks.
  
  But you should keep your user population and your server population on different segments. Then the worst they can do is take out the management addy of a switch or a printer...
  
  --
  Karnal
52. Re:Whiskey Tango Foxtrot by dbIII · 2007-08-15 13:04 · Score: 1
  
  Good point. I'm curious about whether marginally competent network administrators were kept away due to balkanisation of the network in the name of security and they had to wait until the uber homeland people got there and tried to figure things out that were outside of their experience. Just cynical due to seeing situations where nobody is authorised to do their job and nobody is in charge - hopefully this was not the case.
53. Re:Whiskey Tango Foxtrot by flappinbooger · 2007-08-15 14:08 · Score: 2, Funny
  
  But Token Rings are, like, obsolete and stuff, surely there wouldn't be something that obsolete in a place like an airport, right?
  
  Right?
  
  [crickets chirping]
  
  Right?
  
  --
  Flappinbooger isn't my real name
54. Re:Whiskey Tango Foxtrot by adolf · 2007-08-15 14:39 · Score: 1
  
  It's not just the lone, cheap unmanaged switches that do this.
  
  I once saw a well-designed, multi-building, redundant network of HP Procurve switches fail completely. In this case, it wasn't just a loose cable - someone decided to plug both the "LAN" and "PC" ports of an IP phone (with integrated ethernet switch) into separate ports on the wall.
  
  It's easy to do on accident. It'd probably be just as easy to do on purpose, given root access to a system on the LAN and malicious intent.
  
  --
  Kid-proof tablet..
55. Re:Whiskey Tango Foxtrot by moosesocks · 2007-08-15 15:48 · Score: 1
  
  Plug a crummy $15 unmanaged hub into itself (i.e. create a physical loopback).
  
  You'll likely generate a packet storm that works its way surprisingly far up the network until a fairly expensive/intelligent piece of equipment catches and blocks it.
  
  This happened once in the middle school I worked at. Our backbone switches caught it, but it did take an entire wing of the building offline for an hour or so before we could isolate the problem (we had some nice managed switches on the backend, and midrange Netgear switches in the rest of the building, and a single damn hub sitting dormant behind someone's desk).
  
  After that incident, we purchased and installed managed switches on the next level of hierarchy, and ran some extra so that no more than four computers were more than a single hop away from a managed switch, with most being directly connected to "very nice" hardware. The fiber-linked backbone we got along with it was a nice perk that essentially came "free" with our improved infrastructure.
  
  This was in a modestly funded K-8 school with about 300 students in it. For anything remotely similar to happen in one of the largest airports in the country is completely and profoundly inexcusable. Their IT staff had better have a damned good excuse if they want to keep their jobs.
  
  I'd grant them that such a failure could take out a single ticketing counter or something like a baggage carousel if they weren't meticulously careful when planning the network. However, to take out the entire airport is incomprehensible.
  
  --
  -- If you try to fail and succeed, which have you done? - Uli's moose
56. Re:Whiskey Tango Foxtrot by CodeBuster · 2007-08-15 17:45 · Score: 1
  
  This is like the time I traced a network meltdown to a 4 port hub (not a switch, an unmanaged hub) that was plugged into (not a joke) a T-3 concentrator on one port, and three subnets of around 200 computers each on the other 3 ports. Every single one of the outbound cables from the $15.00 hub terminated in a piece of networking infrastructure costing not less than $10,000 dollars.
  
  Did you find the "network administrator" who was responsible for that hardware choice? Unless he had an autographed get-out-of-jail-free card from a manager saying that there was no money in the budget (after buying those $10,000 devices) to replace that $15 hub, at least temporarily, with a $50 switch from the nearest big box electronics outlet until something more appropriate could be ordered, the "network administrator" should have been fired for incompetence. And if the manager signed that piece of paper, they should fire him instead. They should really stop selling hubs; they are just about worthless now that switches are so darn cheap.
57. Re:Whiskey Tango Foxtrot by pe1chl · 2007-08-15 20:31 · Score: 1
  
  This is why I always wonder why manufacturers are so dumb to implement "automatic mdi/mdi-x" on all ports of unmanaged switches.
  You see this all the time today. There are almost no situations where it is useful. There is only one common situation where you need a crossover cable or port: when connecting two switches. In the past, there usually was a small switch near port 24 to select mdi/mdi-x (crossover) and it always was sufficient.
  Even when they want to automate that, it could have been done on a single port.
  
  Switches with auto mdi/mdi-x on all ports make it very easy to crash the network. Just plug a cable between two wall-outlets (accidentally or not).
  This was not a problem before this feature was introduced.
58. Re:Whiskey Tango Foxtrot by tomofumi · 2007-08-15 23:22 · Score: 1
  
  I had a similar case at a past job: a stupid user installed a small 4-port hub and plugged in 2 PCs that belonged to different subnets, which let traffic pass between the 2 subnets... and our Novell NetWare server started showing errors about receiving traffic on the wrong interface... It took us the whole morning to find where in the building the crazy hub was. It is really hard to stop ordinary users from installing their own little hubs under their desks unless some kind of company policy is enforced.
59. Re:Whiskey Tango Foxtrot by JonathanR · 2007-08-15 23:29 · Score: 1
  
  costing not less than $10,000 dollars.
  I presume you get that sort of currency from an ATM machine?
60. Re:Whiskey Tango Foxtrot by ekimminau · 2007-08-16 01:34 · Score: 1
  
  You are assuming they had spanning tree turned on. I've been in a LARGE number of environments where it has been specifically disabled because of some long forgotten incident where it was blamed.
  
  --
  Armaments, 2-9-21 And Saint Attila raised the hand grenade up on high, saying, 'O Lord, bless this Thy hand grenade' N
61. Re:Whiskey Tango Foxtrot by phillyclaude · 2007-08-16 04:12 · Score: 1
  
  Whoops. Spanning Tree Protocol exists to prevent this problem. If those HP Switches had it configured, they would have blocked one of the looped ports, and problem solved.
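  
  A rough sketch of the graph idea behind that, not of the protocol itself: real STP elects a root bridge and compares path costs by exchanging BPDUs, none of which is modelled here. The function name, the device names and the assumption that every device is reachable from the chosen root are all made up for illustration. It keeps one loop-free set of links and reports the rest as the ones that would have to be blocked.

```python
from collections import deque

def spanning_tree_blocked_links(links, root):
    """Return (forwarding, blocked) link lists for an undirected topology.

    links: list of (a, b) pairs naming directly connected devices.
    root:  device to grow the tree from (stands in for the root bridge).
    A breadth-first search keeps the first link that reaches each device;
    every other link would close a loop, so it is reported as blocked.
    Assumes every device is reachable from root.
    """
    neighbors = {}
    for idx, (a, b) in enumerate(links):
        neighbors.setdefault(a, []).append((b, idx))
        neighbors.setdefault(b, []).append((a, idx))

    seen = {root}
    used = set()
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for other, idx in neighbors.get(node, []):
            if other not in seen:
                seen.add(other)
                used.add(idx)
                queue.append(other)

    forwarding = [links[i] for i in sorted(used)]
    blocked = [links[i] for i in range(len(links)) if i not in used]
    return forwarding, blocked

if __name__ == "__main__":
    # Hypothetical version of the IP phone case: both phone ports
    # end up patched into the same switch.
    topology = [("core", "sw1"), ("sw1", "phone"), ("sw1", "phone")]
    forwarding, blocked = spanning_tree_blocked_links(topology, "core")
    print("forwarding:", forwarding)
    print("blocked:   ", blocked)
```

  The second phone link is the one that closes the loop, so it is the one that lands in the blocked list.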
  
  --
  A computer without a Microsoft operating system is like a dog without bricks tied to its head
62. Re:Whiskey Tango Foxtrot by S.O.B. · 2007-08-16 12:46 · Score: 1
  
  Yeah, and any decent switch (and some not-so-decent) would detect this and shut the port down.
  
  Hell, I have a 7 year old dlink 8-port at home that can do this!
  
  A layer 3 switch could handle this. Your D-Link router is not layer 3 and would collapse in seconds...if it lasted that long.
  
  --
  Some of what I say is fact, some is conjecture, the rest I'm just blowing out my ass...you guess.
63. Re:Whiskey Tango Foxtrot by charlesnw · 2007-08-22 08:59 · Score: 1
  
  Yes I am well aware of that. A number of large organizations do not have disaster recovery for back end systems either. I know this for a fact based on first hand experience at a number of organizations.
  
  --
  Charles Wyble System Engineer
In other news... by djupedal · 2007-08-15 08:01 · Score: 2, Insightful

"...said airport and customs officials are discussing how to handle a similar incident should it occur in the future."

What makes them think they'll get another shot? Rank and file voters are ready with their own plan...should a 'similar incident' by the same fools happen again.
1. Re:In other news... by fm6 · 2007-08-15 09:55 · Score: 1
  
  What makes them think they'll get another shot?
  You mean, besides the fact that DHS still has the same inept upper management they had during Katrina? And the fact that voters won't have any say in the matter until November 2008?
You figure it out by COMON$ · 2007-08-15 08:01 · Score: 3, Interesting

Let me know. Knowing how to prevent failure due to a flaky NIC on a network is a very large issue.
First you see latency on a network, then you fire up a sniffer and hope to god you can get enough packets to deduce which is the flaky card without shutting down every NIC on your network.
Of course I did write a paper on this behavior years ago in my CS networking class: take a Snort box and a series of custom scripts to notify admins of spikes on the network outside the normal operating range in that device's history. However, implementing this successfully in an elegant fashion has been beyond me, and I just rely on Nagios to do a lot of my bidding.
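
A minimal sketch of the "outside that device's own history" check, assuming per-device traffic samples are already being collected somewhere. This is not the Snort setup from the paper; the function names, device names, packet rates and the 3-sigma cutoff are all made up for illustration.

```python
from statistics import mean, stdev

def is_spike(history, sample, sigmas=3.0, min_samples=5):
    """Return True if sample is far above this device's own history.

    history: past readings (e.g. packets/sec) for one device.
    sigmas and min_samples are illustrative defaults, not tuned values.
    """
    if len(history) < min_samples:
        return False  # not enough history to judge
    mu = mean(history)
    sd = stdev(history)
    # Guard against a perfectly flat history (sd == 0).
    threshold = mu + sigmas * max(sd, 1.0)
    return sample > threshold

def find_spikes(histories, latest, sigmas=3.0):
    """Return device names whose latest reading is a spike, sorted by name."""
    return sorted(
        dev for dev, sample in latest.items()
        if is_spike(histories.get(dev, []), sample, sigmas)
    )

if __name__ == "__main__":
    histories = {
        "desk-017": [120, 130, 110, 125, 118, 122],
        "desk-042": [90, 95, 100, 92, 97, 94],
    }
    latest = {"desk-017": 128, "desk-042": 48000}
    print(find_spikes(histories, latest))
```

Comparing each device against its own baseline rather than a network-wide number is the point: a busy server and a quiet desktop get different thresholds.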

--
CS: It is all sink or swim...oh and did I mention there are sharks in that water?
1. Re:You figure it out by Anonymous Coward · 2007-08-15 08:20 · Score: 1, Insightful
  
  Why would anyone be stupid enough to have all hosts in a mission-critical setting on one subnet?
  Maybe you meant it's a "large issue" if you're a complete moron and put everything on one subnet, but everything is an issue if you're a complete moron, so there's nothing special about nics.
2. Re:You figure it out by GreggBz · 2007-08-15 08:21 · Score: 4, Informative
  
  One not too unreasonable strategy is to set up SNMP traps on all your NICs. This is not unlike the cable modem watching software at most Cable ISPs.
  
  At first, I can envision it being a PITA if you have a variety of NIC hardware especially finding all those MIBs. But they are all pretty standard these days, and your polling interval could be fairly long, like every 2 minutes. You could script the results, sorting all the naughties and periodic non-responders to the top of the list. That would narrow things down a heck of a lot in a circumstance like this.
  
  No alarms, but at least a quick heartbeat of your (conceivably very large) network. A similar system can be used to watch 30,000+ cable modems, without too much load on the snmp trap server.
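  
  The "sort the naughties and non-responders to the top" step could look something like this. It is only a sketch: it assumes the poll results have already been gathered into dicts by whatever does the polling, and the key names and host names are hypothetical. No actual SNMP calls are made.

```python
def rank_suspects(polls):
    """Sort poll results so the worst-looking NICs come first.

    polls: list of dicts with hypothetical keys
      'host'      - name of the polled machine
      'responded' - False if the poll timed out
      'errors'    - interface error count since the previous poll
    Non-responders sort first, then hosts by descending error count.
    """
    return sorted(
        polls,
        key=lambda p: (p["responded"], -p.get("errors", 0), p["host"]),
    )

if __name__ == "__main__":
    polls = [
        {"host": "gate-12", "responded": True, "errors": 0},
        {"host": "gate-07", "responded": True, "errors": 5310},
        {"host": "gate-03", "responded": False, "errors": 0},
    ]
    for p in rank_suspects(polls):
        status = p["errors"] if p["responded"] else "no response"
        print(p["host"], status)
```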
3. Re:You figure it out by asphaltjesus · 2007-08-15 08:21 · Score: 1
  
  It's called teaming on windows and we use it. In fact, we had a flaky NIC just the other day. I'm not sure how many cards/vendors support teaming outside of HPaq.
  
  On linux, it's called bonding. This is a killer feature.
  
  I had some very limited professional experience with LAWA in the last couple of years. (LAWA runs LAX.) I have no doubt there is quite a bit of the usual consultant chicanery going on, whereby they don't actually hire qualified IT people, just people an elected official or two or three may know. The IT staff on hand most likely has quite limited authority. Other than hiring more consultants they know to document the failure, little will ever come of it.
  
  How is that scenario possible you ask? Well, LAWA is a HUGE cash cow for the city/county so there are naturally quite a few political contributors lined up to get their goods/services contracts fulfilled.
  
  --
  Got Trader Joe's? friendwich.com RSS feeds work now!
4. Re:You figure it out by SatanicPuppy · 2007-08-15 08:32 · Score: 1
  
  The AC is right. Your network topology should be spread out over a number of subnets, and they should only talk to each other where it's critical. The subnets should be separated by expensive managed switches, or by custom hardware configured to monitor packet traffic and isolate problems. Critical systems should be largely inaccessible to the vast majority of the network, and where they are accessible the access is monitored and throttled. If one machine takes too much traffic, you need a second machine set up in a load balancing configuration.
  
  This stuff is basic. To have one card take down a whole network...I can't even conceive. There isn't one card that can talk to my whole network on all ports, and there would never be a need for such a thing.
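  
  For the addressing side of that, a small sketch using Python's standard ipaddress module. The 10.20.0.0/22 block and the host addresses are invented, and carving up the address space only helps if the subnets really are separated by routed or managed boundaries as described above.

```python
import ipaddress

def carve_subnets(block, new_prefix):
    """Split one address block into equal subnets of the given prefix length."""
    network = ipaddress.ip_network(block)
    return list(network.subnets(new_prefix=new_prefix))

def same_subnet(subnets, host_a, host_b):
    """True if both hosts fall inside the same subnet from the list."""
    a = ipaddress.ip_address(host_a)
    b = ipaddress.ip_address(host_b)
    return any(a in net and b in net for net in subnets)

if __name__ == "__main__":
    subnets = carve_subnets("10.20.0.0/22", 24)
    for net in subnets:
        # Subtract the network and broadcast addresses.
        print(net, "usable hosts:", net.num_addresses - 2)
    print(same_subnet(subnets, "10.20.0.15", "10.20.0.200"))
    print(same_subnet(subnets, "10.20.0.15", "10.20.2.15"))
```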
  
  --
  ad logicam Claiming a proposition is false because it was presented as the conclusion of a fallacious argument.
5. Re:You figure it out by t0rkm3 · 2007-08-15 08:58 · Score: 1
  
  You can see a similar behavior from Cisco IPS if you enable and tune the anomaly detector engine. This in turn feeds MARS... which is groovy except the alerting stinks within MARS. So you have to beat up Cisco and they'll hash out an XSLT that will prettify the XML garbage into a nice little HTML email for the desktop support guys to chase down the offender. Couple that with some Perl to grab the fields and shove them in a DB for easy reference...
  
  It works, and it works a lot more easily than anything else that I have deployed to accomplish a similar task.
6. Re:You figure it out by COMON$ · 2007-08-15 09:33 · Score: 1
  
  You are correct and you speak far truer than the AC. Basic VLANs should be a part of any admin's topology. I worked on a network once where all 100 servers were connected to the same VLAN as an additional 120 PCs; the subnet was just about full and they were asking for the problem above. I agree with you wholeheartedly, but my question is more about local subnets, 100 PCs here and 100 PCs there, where I would like to detect a faulty NIC before it disturbs the rest of the PCs.
  
  --
  CS: It is all sink or swim...oh and did I mention there are sharks in that water?
7. Re:You figure it out by ctr2sprt · 2007-08-15 10:32 · Score: 5, Informative
  
  One not too unreasonable strategy is to set up SNMP traps on all your NICs.
  
  That doesn't make much sense. If the NIC goes down or starts misbehaving, the chances of your NIC's SNMP traps arriving at their destination are effectively zero. You probably mean setting up traps on your switches with threshold traps on all the interfaces, the switch's CPU, CAM table size, etc. Which would be more useful. You could also use a syslog server, which is going to be considerably easier if you don't have a dedicated monitoring solution.
  
  But they are all pretty standard these days, and your polling interval could be fairly long, like every 2 minutes.
  
  You're not thinking of traps if you're talking about polling. Traps are initiated by the switch (or other device) and sent to your log monster. You can use SNMP polling of the sort that e.g. MRTG and OpenNMS do which, with appropriate thresholds, can get you most of the same benefits. But don't use it on Cisco hardware, not if you want your network to function, anyway. Their CPUs can't handle SNMP polling, not at the level you're talking about.
  
  No alarms, but at least a quick heartbeat of your (conceivably very large) network. A similar system can be used to watch 30,000+ cable modems, without too much load on the snmp trap server.
  
  I think you are underestimating exactly how much SNMP trap spam network devices send. You'll get a trap for the ambient temperature being too high. You'll get a trap if you send more than X frames per second ("threshold fired"), and another trap two seconds later when it drops below Y fps ("threshold rearmed"). You'll get at least four link traps whenever a box reboots (down for the reboot, up/down during POST, up when the OS boots; probably another up/down as the OS negotiates link speed and duplex), plus an STP-related trap for each link state change ("port 2/21 is FORWARDING"). You'll get traps when CDP randomly finds, or loses, some device somewhere on the network. You'll get an army of traps whenever you create, delete, or change a vlan. If you've got a layer 7 switch that does health checks, you'll get about ten traps every time one of your HA webservers takes more than 100ms to serve its test page, which happens about once per server per minute even when nothing is wrong.
  
  And the best part is that because SNMP traps are UDP, they are the first thing to get thrown away when the shit hits the fan. So when a failing NIC starts jabbering and the poor switch's CPU goes to 100%, you'll never see a trap. All you'll see are a bunch of boxes on the same vlan going up and down for no apparent reason. You might get a fps threshold trap from some gear on your distribution or core layers, assuming it's sufficiently beefy to handle a panicked switch screaming ARPs at a gig a second and have some brains left over, but that's about it. More likely you won't have a clue that anything is wrong until the switch kicks and 40 boxes go down for five minutes.
  
  Monitoring a network with tens of thousands of switch ports sucks hardcore, there's no way around it.
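  
  The "threshold fired" / "threshold rearmed" pair mentioned above is plain hysteresis, and a tiny model shows why the gap between X and Y matters. This is a sketch with invented class and parameter names and invented frame rates, not any vendor's implementation.

```python
class ThresholdAlarm:
    """Rising threshold with a lower rearm level (hysteresis)."""

    def __init__(self, fire_at, rearm_at):
        if rearm_at > fire_at:
            raise ValueError("rearm_at must not exceed fire_at")
        self.fire_at = fire_at
        self.rearm_at = rearm_at
        self.armed = True

    def update(self, value):
        """Feed one reading; return 'fired', 'rearmed', or None."""
        if self.armed and value > self.fire_at:
            self.armed = False
            return "fired"
        if not self.armed and value < self.rearm_at:
            self.armed = True
            return "rearmed"
        return None

if __name__ == "__main__":
    alarm = ThresholdAlarm(fire_at=1000, rearm_at=600)
    readings = [200, 1200, 1500, 900, 1100, 500, 1300]
    for fps in readings:
        event = alarm.update(fps)
        if event:
            print(fps, event)
```

  With the rearm level well below the firing level, a rate that hovers around the firing level produces one notification instead of a fired/rearmed pair every couple of seconds.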
8. Re:You figure it out by huge · 2007-08-16 01:03 · Score: 2, Informative
  
  And the best part is that because SNMP traps are UDP, they are the first thing to get thrown away when the shit hits the fan.
  In some cases it might be a better idea to use an inform instead of a trap, since informs are acknowledged by the receiver and can be retransmitted.
  
  --
  -- Reality checks don't bounce.
Head of IT for LAX should be fired... by Glasswire · 2007-08-15 08:05 · Score: 3, Insightful

...for not firing the networking manager. The fact that they were NOT terrified that this news would get out and were too stupid to cover it up indicates he/she and their subordinates SIMPLY DON'T KNOW THEY DID ANYTHING WRONG by not putting in a sufficiently monitored switch architecture which would rapidly alert IT staff and lock out the offending node.
Simply amazing. Will someone in the press publish the names of these losers so they can be blacklisted?
1. Re:Head of IT for LAX should be fired... by Rob+T+Firefly · 2007-08-15 08:17 · Score: 5, Funny
  
  They have to find someone who can not only design a vital high-traffic network and maintain it... but who didn't have fish for dinner.
  
  --
  Slashdot Burying Stories About Slashdot Media Owned
2. Re:Head of IT for LAX should be fired... by kschendel · 2007-08-15 08:24 · Score: 3, Informative
  
  RTFA. This was a *Customs* system. Not LAX, not airlines. The only blame that the airlines can (and should) get for this is not shining the big light on Customs and Border Protection from the very start. I think it's time that the airlines started putting public and private pressure on CBP and TSA to get the hell out of the way. It's not as if they are actually securing anything.
  
  CBP deserves a punch in the nose for not having a proper network design with redundancy; and another punch in the nose for not having any clue what to do in an outage. They should have a reduced-service backup plan, and a manual backup plan, and a diversion backup plan. There's no excuse for federal officials to sit there like idiots waiting for things to magically get fixed. Oh wait, I guess some of them ARE idiots.
3. Re:Head of IT for LAX should be fired... by failedlogic · 2007-08-15 10:30 · Score: 1
  
  That's the easy way out but probably not the best one. As is often the case in government, the "officials" think they're always right in areas outside their expertise. The admin probably knows what he's doing but has to convince people who don't know anything about anything to change their policies so that he can make the network work correctly. They don't see anything wrong with the way it's set up - it was working well *before* the failure, right? So, no need to change!
4. Re:Head of IT for LAX should be fired... by bigstrat2003 · 2007-08-15 10:33 · Score: 1
  
  Yes, let's fire people because they don't cover up their mistakes. Then we'll have no one working in any position of importance but liars. Brilliant!
  IF (and that's a big if) we accept your logic that the lack of a cover-up means the IT head/network admin (your post isn't terribly clear on this point) didn't realize there was anything wrong with the way things were being done, then yes, I suppose that person should be fired. However, I call that logic "bullshit". Maybe I'm too damn optimistic, but I'd prefer to believe that it's because someone has the courage to take the heat for their mistakes, rather than try and lie to save face... not someone who's just too incompetent to know what the hell is going on.
  
  --
  "16MB (fuck off, MiB fascists)" - The Mighty Buzzard
5. Re:Head of IT for LAX should be fired... by dodongo · 2007-08-15 15:45 · Score: 1
  
  "There's no excuse for federal officials to sit there like idiots waiting for things to magically get fixed. Oh wait, I guess some of them ARE idiots."
  
  Not to mention the fact that you have a self-fulfilling excuse there. Federal officials sit there like idiots waiting for things to magically get fixed, de facto, because they are federal officials.
6. Re:Head of IT for LAX should be fired... by NaDrew · 2007-08-16 08:52 · Score: 1
  
  They have to find someone who can not only design a vital high-traffic network and maintain it... but who didn't have fish for dinner.
  
  Surely you can't be serious!
  
  --
  Vista:XPSP2::ME:98SE
7. Re:Head of IT for LAX should be fired... by Glasswire · 2007-08-17 13:47 · Score: 1
  
  The comment about the lack of a cover up was sarcasm. I would have expected someone senior to hide what happened the moment they realized what a horrible lack of preparedness this exposed. Instead, the incredible details came out, meaning that no one thought this was a grotesquely abnormal failure that should never have happened, but (I can only surmise) it was merely perceived to be unavoidable bad luck.
  The senior management should be fired not just because they were badly prepared, but because they don't seem to have realized they screwed up.
The backup plan by Animats · 2007-08-15 08:08 · Score: 5, Funny

DHS's idea of a "backup plan" will probably be to build a huge fenced area into which to dump arriving passengers when their systems are down.
1. Re:The backup plan by djupedal · 2007-08-15 08:15 · Score: 1
  
  :)
  
  I hear FEMA has several new/used camp trailers I'm sure DHS could avail themselves of.
2. Re:The backup plan by Aellus · 2007-08-15 12:02 · Score: 1
  
  I like how this was modded up as Funny. It is funny, but it's also something I could really see as a reality... :(
3. Re:The backup plan by Scruffy+Dan · 2007-08-15 16:26 · Score: 1
  
  LAX already has this (or at least had it). Years ago, when arriving at LAX from another country and catching a connecting flight that did not land in the US (say arriving at LAX from Mexico and taking a connection to Canada), you did not have to go through US customs. Instead you were relegated to a sealed section of the airport where you would wait for your connection to arrive. Essentially it was a few rooms and a few benches, and that's it. No food, no shops, no nothing. It sure sucks if your connection happened to be delayed by several hours... not that I am bitter
  
  --
  Just another crappy blog
4. Re:The backup plan by aaarrrgggh · 2007-08-15 17:01 · Score: 1
  
  ...The real irony is that there is a long line to get into this little "paradise."
  
  Such an awful airport. Guess it represents the city appropriately, though.
Re:I don't believe any of it by COMON$ · 2007-08-15 08:14 · Score: 1

Having worked for the gov't I think you underestimate the quality of employees there...how does that saying go: "Two things are infinite, the universe and gov't stupidity"? Could be a hack but they wouldn't know unless they brought in someone from the private sector who is smart enough to charge a bagillion an hour to show them how to properly plug in the NIC.
Yes, I am glad to be out of that velvet lined rut and in a world where there are actual professionals.

--
CS: It is all sink or swim...oh and did I mention there are sharks in that water?
Stuff Normally All Forked Up by jo42 · 2007-08-15 08:14 · Score: 1

Ha! Ha!

Teaches you to rely on technology...
1. Re:Stuff Normally All Forked Up by MightyMartian · 2007-08-15 08:23 · Score: 1
  
  Yessirree. Let's put one of the busiest airports on the planet on a paper-messenger boy backup. Yeah, that'll clear the backlog real well.
  
  --
  The world's burning. Moped Jesus spotted on I50. Details at 11.
LACP by dy2t · 2007-08-15 08:15 · Score: 2

Also known as IEEE 802.3ad, it supports aggregating NICs both to improve overall bandwidth and to deal gracefully with failed links.
More info at http://en.wikipedia.org/wiki/Link_Aggregation_Control_Protocol

Systems seem to be more commonly shipping with multiple NICs (esp. servers) so maybe this will be used more and more. It is important to note that the network switch/router needs to be able to support LACP (dumb/cheap switches do not while expensive/managed ones do) so that might be a barrier. Cisco switches and maybe others have implemented proprietary trunking/aggregation schemes but this 802.3ad is a standard.

In practice, I tried to use LACP with a Linksys SRW2048 $800 switch (targeted at small-businesses, much cheaper than typical managed switch) but it did not work reliably (performance got worse, some clients could not connect/timed-out.) Still working on it.
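
To make the "more bandwidth plus graceful failure" part concrete, here is a minimal sketch of how an aggregate might spread flows over its member links. It does not model LACP negotiation at all, real hash policies differ between vendors and drivers, and the function name, link names and MAC strings are invented for illustration.

```python
import zlib

def pick_link(src_mac, dst_mac, links):
    """Choose one active member link for a flow by hashing its addresses.

    links: dict of hypothetical link name -> True (up) / False (down).
    Hashing keeps every frame of one flow on the same link, and a
    failed link simply drops out of the candidate list.
    """
    active = sorted(name for name, up in links.items() if up)
    if not active:
        raise RuntimeError("no active links in the aggregate")
    key = f"{src_mac}->{dst_mac}".encode()
    return active[zlib.crc32(key) % len(active)]

if __name__ == "__main__":
    links = {"eth0": True, "eth1": True}
    flow = ("00:11:22:33:44:55", "66:77:88:99:aa:bb")
    chosen = pick_link(*flow, links)
    print("flow uses", chosen)
    links[chosen] = False  # simulate that member link failing
    print("after failure, flow uses", pick_link(*flow, links))
```

Both ends have to agree on which ports belong to the aggregate, which is the part a switch without LACP support cannot do.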
Re:I don't believe any of it by ABasketOfPups · 2007-08-15 08:22 · Score: 1

So... have you seen "The Number 23" by any chance?
Let that be a lesson to you... by urlgrey · 2007-08-15 08:22 · Score: 3, Funny

To all you novice net admins out there: network cards do *not* like chunky peanut butter! Smooth/creamy only, please.

Now you see what happens when some joker thinks [s]he can get away with using chunky for something as critical as proper care and feeding of network cards. Pfft.

Bah! Kids these days... I tell ya. Probably the same folks that think the interwebnet is the same as the World Wide Web.

Great, Scott! What's next?!

--
Running 'Nix is like owning a Lightsaber. It's "a more elegant weapon for a more civilized time."
1. Re:Let that be a lesson to you... by Repossessed · 2007-08-15 13:56 · Score: 1
  
  I feel compelled to comment on your signature.
  
  Running *nix is more like having a lightsaber/blaster/speeder/x-wing/an incomplete (but fully operational) death star in the same device.
  
  And an instruction manual with pages spread all over the galaxy.
  
  --
  Liberte, Egalite, Fraternite (TM)
The whole system is pointless anyway by Potent · 2007-08-15 08:23 · Score: 4, Insightful

When the U.S. Government is letting millions of illegal aliens cross over from Mexico and live here with impunity, then what the fuck is the point with stopping a few thousand document carrying people getting off of planes from entering the country?

I guess the system exists to give the appearance that the feds actually give a shit.

And then the Pres and Congress wonder why their approval ratings are as small as their shoe sizes...

--
Out of order? Fuck! Even in the future nothing works! - Dark Helmet (Rick Moranis) "Spaceballs"
nic can take down a segment by KDN · 2007-08-15 08:24 · Score: 3, Interesting

Years ago we had a 10BT nic go defective so that whenever the nic was plugged into the switch it would obliterate traffic on that segment. The fun part: EVEN IF THE NIC WAS NOT PLUGGED INTO THE PC. Luckily that happened in one of the few areas that had switches at the time, everything else was one huge flat lan.
1. Re:nic can take down a segment by GuldKalle · 2007-08-15 09:28 · Score: 1
  
  Excuse me, but why the hell did you test for that in the first place? Was the computer just dead weight to hold the NIC in place?
  
  --
  What?
2. Re:nic can take down a segment by zmollusc · 2007-08-15 10:10 · Score: 1
  
  Maybe they were surprised that the fault didn't go away when that pc was powered down, so decided to test the card alone?
  
  --
  They whose government reduces their essential liberties for temporary security, receive neither liberty nor security.
3. Re:nic can take down a segment by KDN · 2007-08-15 10:26 · Score: 2, Interesting
  
  Excuse me, but why the hell did you test for that in the first place?
  It was during the debugging phase. We got it to occur, and then turned off one machine at a time. When all the machines on the segment were off and the switch was still jabber isolated we all went "WTF?!" and then started unplugging cables.
4. Re:nic can take down a segment by cbhacking · 2007-08-15 12:30 · Score: 1
  
  Interesting story. One slight correction: even if the NIC's computer was not turned on. I'm pretty sure if you physically removed the NIC from the machine it would have gone dark. However, modern systems are designed to be bootable via a network signal, so the card likely still gets some power even when the machine is off. There's a reason you remove the power cord, not just flip the switch in back, before doing open-case maintenance.
  
  --
  There's no place I could be, since I've found Serenity...
5. Re:nic can take down a segment by DMUTPeregrine · 2007-08-15 18:19 · Score: 1
  
  Actually, you should flip the switch in back, then remove the ATX cord from the motherboard. OR you should separately ground the case (the better option). The case should be grounded for all work done.
  
  --
  Not a sentence!
Re: Follow-up by asphaltjesus · 2007-08-15 08:24 · Score: 1

It seems the controller in L.A. suggested the same thing years ago. (pdf)

http://www.lacity.org/ctr/press/ctrpress18616087_12152003.pdf

--
Got Trader Joe's? friendwich.com RSS feeds work now!
Re:I don't believe any of it by denbesten · 2007-08-15 08:25 · Score: 1

I believe you are thinking of a quote attributed to Albert Einstein:

"Only two things are infinite, the universe and human stupidity, and I'm not sure about the former."
http://www.quotedb.com/quotes/1349
Social not technical problem. by twitter · 2007-08-15 08:27 · Score: 1

How about doing regular police work instead of pre crime, so that passengers don't have to stand around while your network flakes out?

--
Friends don't help friends install M$ junk.
Sounds Familiar...... by netrage_is_bad · 2007-08-15 08:28 · Score: 1

We had something similar happen at my building when I worked at Kent State University. The air conditioning was being worked on and the workers thought it would be a good idea to plug an AC unit into the server room, something they had been specifically told not to do. The additional load of the AC flipped the breaker and set off all the alarms, all the switches lost power and backup units shut down all servers. It wouldn't have been so bad except that all university traffic ends up going through our building for internet, which caused all routers to become backed up. ALL of them. What made things worse was the new sysadmin didn't know about some of the backup systems, and no one knew how to reset the breakers (it was a special system) plus there was a special pin that had to be used that no one knew. It was a hilarious 2 hours without internet.
1. Re:Sounds Familiar...... by hsqueak · 2007-08-15 09:35 · Score: 1
  
  I bet those workers had an interesting time explaining the outage...
2. Re:Sounds Familiar...... by dana340 · 2007-08-15 15:52 · Score: 1
  
  I was working at my college when someone doing road work decided that Call Before You Dig was wrong about something... they took a 10" gas-powered saw and started to cut through a 400-pair trunk connecting the two main sections of campus.... the resulting shorts shorted out equipment on both sides. Our network admin was splicing together cables in a manhole under the main street through campus in the pouring rain while others had to replace the equipment on both sides.
  makes you think.. going with wireless is just so much easier!
  
  --
  "10001110101 - periodic table with a centerpiece of mind" -Clutch
3. Re:Sounds Familiar...... by netrage_is_bad · 2007-08-16 01:12 · Score: 1
  
  Wireless is definitely easier, but you just don't get the same performance. I'd prefer that they stop designing systems with single points of failure. Or if you do, have a backup so it can be fixed quickly.
"A similar incident" by The+One+and+Only · 2007-08-15 08:30 · Score: 2, Insightful

A spokeswoman for the airports agency said airport and customs officials are discussing how to handle a similar incident should it occur in the future.
Except in the future, the incident isn't going to be similar, aside from being similarly boneheaded. This attitude of "only defend yourself from things that have already happened to you before" is just plain dumb. Obviously their system was set up and administered by a boneheaded organization to begin with, and now that same boneheaded organization is rushing to convene a committee to discuss a committee to discuss how to prevent something that already happened from happening again. The root flaw is still in the organization.

--
In Repressive Burma, it's not just your connection that dies. slashdot.org/comments.pl?sid=314547&cid=20819199
1. Re:"A similar incident" by Unordained · 2007-08-15 10:04 · Score: 1
  
  Isn't that basically what this whole thing is about, anyway? We sniff your shoes because we had a shoe-bomber. We check liquids because we heard someone might try that. We put cops on planes and tighten security on the cockpit doors because someone took advantage of that hole. We check a very few cargo containers coming into ports because we heard that was an idea thrown around. We suddenly learn to block physical access to the parking lots of FBI buildings because oddly enough, someone made use of that gap too.
  
  Do we have security guards on every train? Have terrorists not yet thought, "oh, the planes are protected, but not the trains"? Do we run all passengers of buses through metal detectors? Do we check papers on the way in and out of major cities?
  
  We do nothing until there's been a threat or an attack, and then we do too much assuming either
  a) people will hate the government for doing nothing when nothing can really be done, or
  b) terrorists are so dumb they'll keep trying the same thing over and over again, and patching the known hole is sufficient to stop them
  
  And then people get in trouble for pointing out to the government where other holes might be, because that's helping the terrorists, giving them ideas? (Don't talk about dirty bombs, we wouldn't want them to figure that out!) It's exactly like the problems we have in the IT security industry, except in the industry, you're hopefully dealing with some logical, geeky people who have a clue -- with our national security, we're dealing with emotional politicians and apathetic voters. The debate's going surprisingly well, considering.
2. Re:"A similar incident" by geekoid · 2007-08-15 10:54 · Score: 1
  
  So you are saying that a NIC will never go down again in the LAX system? That's quite a bold statement.
  
  --
  The Kruger Dunning explains most post on /. http://en.wikipedia.org/wiki/Dunning%E2%80%93Kruger_effect
3. Re:"A similar incident" by The+One+and+Only · 2007-08-15 14:04 · Score: 1
  
  I'm saying that a single NIC going down won't single-handedly hose the system again, unless they're horribly incompetent.
  
  --
  In Repressive Burma, it's not just your connection that dies. slashdot.org/comments.pl?sid=314547&cid=20819199
Blaming the Wrong NIC by Doc+Ruby · 2007-08-15 08:30 · Score: 2, Insightful

The NIC that failed isn't the part that's at fault. NICs fail, and can be counted on to do so inevitably, if relatively unpredictably (MTBF is statistical).

The real problem NIC is the one that wasn't there as backup. Either a redundant one already online, or a hotswap one for a brief downtime, or just a spare that could be replaced after a quick diagnostic according to the system's exception handling runbook of emergency procedures.

Of course, we can't blame a NIC that doesn't exist, even if we're blaming it for not existing. We have to blame the people who designed and deployed the system with the single point of failure, and the managers and oversight staff who let the airport depend on that single point of failure.

But instead I'm sure we'll blame the dead NIC. Which gave its life in service to its country.

--
--
make install -not war
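To put a rough number on "MTBF is statistical", here is a minimal sketch. It assumes failures follow a constant-rate (exponential) model and that NICs fail independently; both are common simplifications, not something the story or the comment states. The function names and the sample MTBF figure are made up for illustration.

```python
import math


def failure_probability(hours: float, mtbf_hours: float) -> float:
    """Chance that one NIC fails within `hours`, assuming a constant failure rate."""
    return 1.0 - math.exp(-hours / mtbf_hours)


def any_failure_probability(hours: float, mtbf_hours: float, nic_count: int) -> float:
    """Chance that at least one of `nic_count` independent NICs fails within `hours`."""
    return 1.0 - math.exp(-nic_count * hours / mtbf_hours)


if __name__ == "__main__":
    # Hypothetical figures: 500,000-hour MTBF, one year of service, 1,000 NICs.
    year = 24 * 365
    print(f"one NIC:    {failure_probability(year, 500_000):.3f}")
    print(f"1,000 NICs: {any_failure_probability(year, 500_000, 1000):.3f}")
```

Under those assumptions a single card is unlikely to die in a given year, but a large fleet will almost certainly lose at least one, which is the argument for planning around the failure rather than the part.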
1. Re:Blaming the Wrong NIC by Doc+Ruby · 2007-08-15 12:00 · Score: 1
  
  You're an anonymous fool. The system should survive the NIC in a desktop dying. It doesn't matter how hard it is to replace. Its dying shouldn't take out the system.
  
  It's getting to the point where I don't even need to read the rest of a comment, when I just see it's an AC who uses the word "dumbass". You're invariably dumbasses. Don't whine to me about how you don't know how to make a fault tolerant network in a critical system like an airport, just because you've got a crappy job where they make you find and replace NICs.
  
  --
  --
  make install -not war
2. Re:Blaming the Wrong NIC by Anonymous Coward · 2007-08-15 15:25 · Score: 1, Funny
  
  It was a single chip on the NIC.
  It was a single register on the chip on the NIC.
  It was a single gate on the register on the chip on the NIC.
  It was a single transistor on the gate on the register on the chip on the NIC.
  It was a single junction on the transistor on the gate on the register on the chip on the NIC.
  It was a single molecule on the junction on the transistor on the gate on the register on the chip on the NIC.
  It was a single atom in the molecule on the junction on the transistor on the gate on the register on the chip on the NIC.
  It was a single proton in the atom in the molecule on the junction on the transistor on the gate on the register on the chip on the NIC.
  It was a single quark in the proton in the atom in the molecule on the junction on the transistor on the gate on the register on the chip on the NIC- and I'm looking at you, Strange!!
3. Re:Blaming the Wrong NIC by chribo · 2007-08-15 20:00 · Score: 1
  
  Even worse, it can take quite a long time (10 minutes or more) until the network recovers after the mad NIC has been unplugged (I wouldn't believe that if I hadn't experienced this situation in a cluster).
  
  - chribo
Managed switches are FTW by Sehnsucht · 2007-08-15 08:39 · Score: 2, Insightful

Where I work, if there's a packet storm someplace (server is getting attacked, server is attacker, or someone just has a really phat pipe on the other end and is moving a ton of data) we get an SNMP trap for packet threshold on the offending port. BAM! You know where the problem is, and since we have managed switches you just shut off the port if you can't resolve the problem.

Having said that, since the managed switches are gigE uplinked and each port is only 10/100, I don't think we've ever had a problem where a server was outbounding and brought down the switch/network (just made some extra latency). We've had some really large inbounds occasionally take down a whole switch, and heaven forbid some idiot shuts the port off on an inbound attack instead of nulling it at the border, cause then the ARP drops and the DOS gets forwarded to every port on the VLAN on a ton of switches.. but a broken NIC packet storming would not have been an issue.

OK, so maybe they don't have managed switches all the way down to the lowest point on the network. They should still have SOME further up the chain and be monitoring them such that they know from what direction the problem is coming, and shut it off / look at it with a sniffer etc.

Infrastructure that is as important as an airport should have its own infrastructure properly equipped and maintained with managed equipment, making this nearly a non-issue and certainly one easily resolved.
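The threshold check described above can be sketched roughly as follows. This is only an illustration of the decision logic, not how any particular switch or monitoring product does it: the port names, rates, and action strings are hypothetical, and the per-port packet rates are assumed to come from whatever monitoring is already in place.

```python
from typing import Dict, List


def ports_over_threshold(pps_by_port: Dict[str, float], threshold_pps: float) -> List[str]:
    """Return the ports whose packets-per-second rate exceeds the threshold."""
    return sorted(port for port, pps in pps_by_port.items() if pps > threshold_pps)


def suggested_action(direction: str) -> str:
    """Map a storm's direction to the response the comment recommends."""
    if direction == "outbound":
        # A local machine is the source: cutting its port isolates it.
        return "shut off the switch port"
    if direction == "inbound":
        # Shutting the port on an inbound flood spreads it across the VLAN.
        return "null-route at the border"
    raise ValueError(f"unknown direction: {direction!r}")


if __name__ == "__main__":
    rates = {"Fa0/1": 1200.0, "Fa0/2": 90000.0, "Fa0/3": 50.0}  # made-up samples
    for port in ports_over_threshold(rates, threshold_pps=50000.0):
        print(port, "->", suggested_action("outbound"))
```

Keeping the direction explicit reflects the warning in the comment that the right response to an inbound attack is not the same as the response to a misbehaving local machine.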
not too surprised by mytrip · 2007-08-15 08:40 · Score: 1

I used to work for a very large travel agency and have seen queues of travel reservations get pretty backed up and cause problems before, although on a smaller scale.

Most reservations are checked for problems automatically but pushed through by a person and moved from one queue to another. If the program that checks them crashes, it can back things up.

I remember a program crashing and a queue getting 2000+ reservations in it before someone figured out what was going on and it had things screwed up for about 2 days while a replacement computer gradually cleared the queue out.

--
Contrary to popular belief, Unix is user friendly. It just happens to be particular about who it makes friends with.
It depends on the switch by camperdave · 2007-08-15 08:57 · Score: 4, Informative

You're right to a point. An Ethernet frame, along with the source and destination addresses, has a checksum. A switch that is using a store-and-forward procedure is supposed to drop the frame if the checksum is invalid. If the NIC was throwing garbled frames onto the network, they would have to be garbled in such a way as to have a valid checksum (assuming they are using store-and-forward switches in the first place).

--
When our name is on the back of your car, we're behind you all the way!
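The store-and-forward check camperdave describes can be sketched like this. It is a simplified model, not switch firmware: it treats a frame as a byte string whose last four bytes are the CRC-32 frame check sequence, uses Python's `zlib.crc32` (the same CRC-32 polynomial Ethernet uses), and the function names are invented for the example.

```python
import zlib
from typing import Optional


def append_fcs(frame_without_fcs: bytes) -> bytes:
    """Append a 4-byte CRC-32 frame check sequence to the frame."""
    fcs = zlib.crc32(frame_without_fcs) & 0xFFFFFFFF
    return frame_without_fcs + fcs.to_bytes(4, "little")


def fcs_is_valid(frame: bytes) -> bool:
    """Recompute the CRC over the frame body and compare it with the trailing FCS."""
    if len(frame) < 5:
        return False
    body, fcs = frame[:-4], frame[-4:]
    return (zlib.crc32(body) & 0xFFFFFFFF) == int.from_bytes(fcs, "little")


def store_and_forward(frame: bytes) -> Optional[bytes]:
    """Forward the frame only if its checksum is valid; otherwise drop it (None)."""
    return frame if fcs_is_valid(frame) else None


if __name__ == "__main__":
    good = append_fcs(bytes(60))      # dummy 60-byte frame body
    garbled = bytearray(good)
    garbled[10] ^= 0x01               # flip one bit, as a faulty NIC might
    print(store_and_forward(good) is not None)
    print(store_and_forward(bytes(garbled)) is not None)
```

A cut-through switch starts forwarding before the whole frame (and therefore the checksum) has arrived, which is why the caveat about store-and-forward matters.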
Token Ring upgrade by camperdave · 2007-08-15 09:01 · Score: 1

Perhaps they upgraded their token ring to thinnet.

--
When our name is on the back of your car, we're behind you all the way!
IT is not that advanced by Herkum01 · 2007-08-15 09:07 · Score: 1

This brings out an obvious point: despite the advances we have made in computing and IT, it is still relatively young and not that robust.

This is the equivalent of your car stopping and the 'check engine' light not even coming on. At least now some of the technology for cars is getting to the point that it will find the problem for you. The same still cannot be said for large computer networks.

When people stop treating computers as flawless wonder machines, then we shall see some real progress made.
1. Re:IT is not that advanced by MightyMartian · 2007-08-15 10:20 · Score: 1
  
  Ethernet is hardly a new technology. Anybody with an ounce of knowledge should know about broadcast storms, and should know the standard techniques for isolating them (rebooting router/switch/hub is an awfully good start, and then from there taking individual NICs physically off the segment). There are tools that can help, but even if you're stuck with antiquated hubs and the like, there are still ways of dealing with this.
  
  Where I come from is over a decade of hard-won experience dealing with network issues. I can't imagine what caused that enormous a delay in getting things going again. I have visions of an inexperienced, confused tech running around freaking out, sweating profusely and crapping his pants. I find panic is the enemy of network administration, and it's difficult to put that aside particularly when the big awful happens in some mission critical app.
  
  --
  The world's burning. Moped Jesus spotted on I50. Details at 11.
2. Re:IT is not that advanced by Herkum01 · 2007-08-15 13:25 · Score: 1
  
  No, Ethernet is pretty basic, and so are the tools for finding problems. A cable that is not terminated correctly can be almost impossible to detect without someone sitting down to scan network traffic. That requires a fairly high level of knowledge to do.
  
  My point is that people plug stuff in and expect it to work, and it does for the most part. However, when there is a problem you can either brute-force it (pulling cables or reinstalling a whole OS) or use tools that require a high level of knowledge to use.
No, a multi-front plan by EmbeddedJanitor · 2007-08-15 09:08 · Score: 1

Arrest all NIC designers, engineers, network stack developers, IT managers,... on suspicion of conspiring to cause the problem.
Change to Wifi because that can't have NIC faults.
C'mon folk... help me out here!

--
Engineering is the art of compromise.
1. Re:No, a multi-front plan by toetagger1 · 2007-08-15 10:09 · Score: 1
  
  Arrest all NIC designers, engineers, network stack developers, IT managers,... on suspicion of conspiring to cause the problem.
  Change to Wifi because that can't have NIC faults.
  C'mon folk... help me out here!
  
  Print each package to be sent over the network, use the USPS first class mail to send it to the right destination on time, and hire a bunch of undocumented immigrants to enter the data again.
  I'm sure they already have a nice database to use to find prospects that could do the data entry!
  
  --
  who | grep -i blond | date cd ~; unzip; touch; strip; finger; mount; gasp; yes; uptime; umount; sleep
2. Re:No, a multi-front plan by SpaceLifeForm · 2007-08-15 11:46 · Score: 1
  
  Copper miners.
  
  --
  You are being MICROattacked, from various angles, in a SOFT manner.
Spewing by PacketScan · 2007-08-15 09:12 · Score: 1, Flamebait

So a desktop got infected and started to spew crap onto the network. Then we blame it on the NIC itself..
Hah, security, what is that? Half our personal information can be found on p2p networks because government employees can't actually do the job; they have to screw around and download music / movies or who knows what else. What would make airport employees any different?
When you boil it down to the root problem you'll find it's a lack of leadership allowing these problems / attitudes to exist.
And great, our tax dollars paid for this screw-up.
References? by TypoNAM · 2007-08-15 09:15 · Score: 1

Got any references or links to various tutorials and/or documents on how I could set up my network to notify me about a rogue NIC?

--
This space is not for rent.
1. Re:References? by Shadowruni · 2007-08-15 10:42 · Score: 1
  
  I'd say look at Solarwinds. Or Nagios with proper plugins. A common thing I've seen with bad NICs is flapping on a given interface and I know at least Cisco will raise bloody hell about it. Sometimes it's something that's not auto-negotiating properly, but if you see that all of a sudden and then you see a bunch of systems go dead, you've got an idea of where to start looking/routing around.
  BTW my captcha for this was spastic!
  
  --
  "Chinese Amazons, power armor, laser swords.... things just meant to be." - Shampoo, A Very Scary Bet
2. Re:References? by i8myh8 · 2007-08-15 12:38 · Score: 1
  
  I'm not a network admin, haven't been for years but I think GFI Lan Guard will monitor network traffic and let you know when the network is being flooded and by what MAC Address. There are several packages available that do just that.
sadly... this may be typical by bwy · 2007-08-15 09:17 · Score: 4, Insightful

Sadly, many real-world systems are often nothing like what people might envision them as. We all sit back in our chairs reading slashdot and thinking everything is masterfully architected, fully HA, redundant, etc.

Then as you work more places you start seeing that this is pretty far from actual truth. Many "production" systems are held together by rubber bands, and duct tape if you're lucky (but not even the good kind.) In my experience it can be a combination of poor funding, poor priorities, technical management that doesn't understand technology, or just a lack of experience or skills among the workers.

Not every place is a Google or Yahoo!, that I can imagine look and smell like technology wherever you go on their fancy campuses. Most organizations are businesses first, and tech shops last. If software and hardware appears to "work", it is hard to convince anybody in a typical business that anything should change- even if what is "working" is a one-off prototype running on desktop hardware. It often requires strong technical management and a good CIO/CTO to make sure that things happen like they should.

I suspect that a lot of things that we consider "critical" in our society are a hell of a lot less robust under the hood than anything Google is running.
Sigh, ignorance is bliss by COMON$ · 2007-08-15 09:26 · Score: 1

No wonder you posted AC; an answer like that is just begging to be flamed (modders, here is your chance to click that button). Subnetting alone is not going to fix a flaky NIC issue. Personally I love walking into a network where the admin doesn't understand the purpose of a subnet and just cuts everything up without segmenting the actual physical network. You need to go beyond /25 or /32, and create a substantial amount of subnets. A flaky NIC will broadcast across and tear up that network as if there were no subnets. Then you have to figure in the time it takes to cut up those networks and maintain all the VLANs, and you are looking at serious $$$.
Now that being said, a good slice and dice of a network will save you some heartache, but it will not solve the problem, as at some point your miracle subnets (I will assume you meant VLANs) will all have to connect somewhere. Now you have reduced the chance of a NIC failing, as there are fewer NICs and all are confined to their respective homes, but you are still dealing with NICs, and they still short out, go bad, get chewed by mice, and all goes haywire.
Of course you don't know these things unless you have actually experienced the scenarios (experienced it here with equipment less than a year old) on a large network, and by large I mean at least 1000+ nodes.
My solution to the problem? Get away from large enterprise networks and stick to smaller ones. I really enjoy the perks of having a sub-500-node network, and I have the time and can afford the equipment to cut things up properly.

--
CS: It is all sink or swim...oh and did I mention there are sharks in that water?
1. Re:Sigh, ignorance is bliss by scottv67 · 2007-08-15 11:13 · Score: 1
  
  You need to go beyond /25 or /32, and create a substantial amount of subnets.
  
  So if you go past a /32, what exactly are you using for a subnet mask? /33? /34? I'd love to see your network. If you use one host on your /33 subnet to ping another host on your /33 subnet, are the round-trip times negative? ;^)
  
  Have you been getting your network design tips from Michael Keaton ("220, 221 - whatever it takes.")?
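The prefix-length point is easy to check with Python's standard `ipaddress` module. The addresses below are arbitrary private-range examples and the helper names are made up; the sketch only shows how many addresses a /25 and a /32 hold and that an IPv4 prefix longer than /32 is rejected.

```python
import ipaddress


def subnet_size(cidr: str) -> int:
    """Number of addresses covered by a network written in CIDR notation."""
    return ipaddress.ip_network(cidr).num_addresses


def is_valid_network(cidr: str) -> bool:
    """True if the string parses as a network; invalid prefixes raise ValueError."""
    try:
        ipaddress.ip_network(cidr)
        return True
    except ValueError:
        return False


if __name__ == "__main__":
    for cidr in ("192.168.1.0/25", "192.168.1.1/32", "192.168.1.0/33"):
        if is_valid_network(cidr):
            print(cidr, "->", subnet_size(cidr), "addresses")
        else:
            print(cidr, "-> not a valid network")
```

A /32 already names exactly one host, so there is nothing left to carve beyond it.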
2. Re:Sigh, ignorance is bliss by COMON$ · 2007-08-16 01:08 · Score: 1
  
  sorry it was meant to mean, you have to go beyond just logically applying a subnet, there needs to be some physical separation as well. Sorry for the confusion but thanks for the Mr Mom quote :)
  
  --
  CS: It is all sink or swim...oh and did I mention there are sharks in that water?
3. Re:Sigh, ignorance is bliss by COMON$ · 2007-08-16 01:12 · Score: 1
  
  I am not aware of their topology, but even if the terminals were all separate from the servers, if all the PCs in the airport were on one subnet (a bit stupid in that field, but let's just say they took this one bit of precaution), and the terminals were unable to get to the server, no one would be able to validate passengers and the airport would be shut down.
  I once saw a network that was meticulously cut up so every PC was on its own VLAN, I guess that would be one way to do it but wow, what a PITA.
  
  --
  CS: It is all sink or swim...oh and did I mention there are sharks in that water?
4. Re:Sigh, ignorance is bliss by COMON$ · 2007-08-16 08:34 · Score: 1
  
  from your link:
  An interconnected, but independent segment of a network that is identified by its Internet Protocol (IP) address.
  A LAN that is part of a larger logical network.
  A portion of a network, which may be a physically independent network, which shares a network address with other portions of the network and is distinguished by a subnet number. A subnet is to a network what a network is to an internet.
  or here for a wiki definition: In computer networks, a subnetwork or subnet is a range of logical addresses within the address space that is assigned to an organization.
  I wouldn't retort, but maybe, just maybe, by pointing out this flaw in the network admin wannabe's logic I can keep from having to clean up all the messes people make because they are dabbling in things they don't really understand.
  to clarify further from wikipedia:
  However, subnetting allows the network to be logically divided regardless of the physical layout of a network...
  But to be fair most people understand it as this: A typical subnet is a physical network served by one router, for instance an Ethernet network (consisting of one or several Ethernet segments or local area networks, interconnected by network switches and network bridges) or a Virtual Local Area Network (VLAN).
  BUT IT "professionals" have tended to think that if they just say /25-/32 that they are doing enough to survive a storm or to quiet down their network.
  
  --
  CS: It is all sink or swim...oh and did I mention there are sharks in that water?
5. Re:Sigh, ignorance is bliss by COMON$ · 2007-08-16 08:53 · Score: 1
  
  It was the terminology you used. subnet != VLAN. As I have mentioned in previous posts on this topic, there is an issue where tracking down a NIC going haywire on a LAN is difficult at best unless you get really lucky. We also don't know what kind of network this is; you have to think beyond Ethernet here. Some of these places are still stuck with systems that in the past "have just worked", so they don't change because of the cost to upgrade and the fear of the difficulty of the new system.
  Which brings me back to my point that finding a bad NIC on a LAN before it causes serious damage is an issue that raises all sorts of fun questions:
  How big is too big for a VLAN?
  How small is too small?
  Can IDS systems be customized to catch this and notify an admin?
  What kind of switches are there available to shut off a port that is misbehaving?
  What about broadcast storms?
  What happens when your NIC fries the router it is attached to? And so on and so forth.
  Of course maybe I am just hyper sensitive about the issue because I have seen the stupidity you refer to and have had to deal with it on almost every network I have been brought into as a consultant or full time employee.
  
  --
  CS: It is all sink or swim...oh and did I mention there are sharks in that water?
Are They Saying...? by Nom+du+Keyboard · 2007-08-15 09:35 · Score: 1

Are they saying that one bad card destroyed other cards? That seems a bit unusual.

--
"It's the height of ridiculousness to say for those 9 lines you get hundreds of millions."
1. Re:Are They Saying...? by tomstdenis · 2007-08-15 10:14 · Score: 1
  
  This is what happens when laymen report tech news...
  
  Tom
  
  --
  Someday, I'll have a real sig.
Re:I don't believe any of it by COMON$ · 2007-08-15 09:36 · Score: 1

Yes, but this is slashdot so I feel the compulsive need to render quotes in a way to fit my needs.

--
CS: It is all sink or swim...oh and did I mention there are sharks in that water?
oh who forgot... by crashelite · 2007-08-15 09:43 · Score: 1

who forgot to pay the IT guy? last time i checked shouldn't most switches correct this issue... i could see them using hubs and that's what really caused the overload :)

--
(yes i know i suck at spelling fell free to correct my grammar and/or spellin i dont care, im still not going to change
1. Re:oh who forgot... by crashelite · 2007-08-15 19:20 · Score: 1
  
  last time i checked it doesn't take 9 hours to fix a NIC, or even scan the network for what computer is putting out crap all over. if it was a "bad NIC" that caused it then there would have been major warning signs on the machine itself (that and most of the times that i have ever seen a NIC fail the OS just says F-it and disregards it). i have seen it cause routers (linksys and so on) to reboot, but routers like those are not like the managed switches in networks like the ones the airport should have. not to mention it was a DT computer they say, and last time i checked most computers at airports are just terminals, not full-blown desktop PCs... but i do agree with the fact that one NIC would not take out a whole network. it could take part of it out but not the whole thing
  
  --
  (yes i know i suck at spelling fell free to correct my grammar and/or spellin i dont care, im still not going to change
Uh... Monitor Much? by porkrind · 2007-08-15 09:48 · Score: 1

Yeah, my employer makes monitoring software, so obviously I'm biased here, but... are you freaking joking me? They couldn't bother to install a monitoring system to warn them their NIC was fried? Hmm... I can think of one offhand ;)

-John Mark

--
Hyperic Community Manager
1. Re:Uh... Monitor Much? by porkrind · 2007-08-15 09:52 · Score: 1
  
  Oh and of course, it doesn't exactly inspire confidence in the airline industry that a single point of failure can bring down a whole system. What about hard drives? Do they have contingency plans for when these fail? Memory?
  
  -John Mark
  
  --
  Hyperic Community Manager
Fragile System by monopole · 2007-08-15 10:01 · Score: 1

Like most of the TSA and the DHS, poor design abetted by high levels of secrecy and politicization results in a fragile system. A large number of single points of failure and a lack of fall-backs and fail-safes mean that one problem hoses the system. One wonders what happens when things really go pear-shaped in an unanticipated fashion.
Um, what about a paper backup?? by LeRandy · 2007-08-15 10:53 · Score: 3, Insightful

Am I the only one laughing that back in old, antiquated Europe, our passport control have the ability to read the documents, with their own eyes? Oh I forget, how are you supposed to treat your visitors like criminals if you can't take their photograph, fingerprints, and 30-odd other bits of personal data to make sure we aren't terrier-ists (fans of small dogs). It doesn't help prevent terrorist attacks, but it does give you a nice big data mine (and how are you supposed to undermine people's rights effectively if you don't know everything about them).
It is laughable that there is no non-computerised backup for the system. (How about filling out the forms and scanning them in later?)
Ah, only one thing missing. by twitter · 2007-08-15 11:30 · Score: 1

This story is incomplete:

like the time I traced a network meltdown to a 4 port hub (not a switch, an unmanaged hub) that was plugged into (not a joke) a T-3 concentrator on one port, and three subnets of around 200 computers each on the other 3 ports. Every single one of the outbound cables from the $15.00 hub terminated in a piece of networking infrastructure costing not less than $10,000.

Tell me that the old hub was hidden in a ceiling tile and that it melted because the HVAC dude thought it was unused. Then you will have matched the worst story I have heard yet.

--
Friends don't help friends install M$ junk.
Chattering NIC by ACMENEWSLLC · 2007-08-15 11:40 · Score: 1

Google search chattering NIC. You want to emulate this? Take a NIC and hard code it to 100/full and set your switch to auto or 100/half. Now start transferring a ton of data to the server. Watch what happens to the other network devices on that switch. A chattering NIC is similar. Sometimes worse.

Sometimes stuff just happens.
The scope of the problem by WheelDweller · 2007-08-15 11:47 · Score: 5, Interesting

I agree, but the scope of the problem is much larger.

Americans are still designing systems (and I'm talking WHOLE systems, not just the computers) for the industrial revolution. Much the same way, we're educating our kids for the same purpose- to make them cogs for manufacturing.

The Japanese have a more 'cellular' structure, as opposed to the 'pyramid' designed back a couple of 'turns of the century' ago. One man on top drives five, who drive 200, who drive them all. But the Japanese model is more like object orientation: each unit has private parts. So long as the command it's given produces the proper results and stays within budget, who cares?

Assembly lines gather at their meetings and decide policy on their own. "Fred has been late 3 times this week; do we care?" and the only people to whom it matters, decide. There's no need for a strict, top-down policy, especially since only tiny organizations all do only one job.

Imagine the broken structures in a holding company; they own a newspaper, a carwash and a grocery store; the top man can't say "We'll only use glass containers", because that would be a disaster in a car wash. They can't say "we choose leaded inks", which might be fine for the car wash, but a danger at the newspaper. Each unit has its own purpose.

So how about giving the network admins the power to do *whatever* it takes to let them keep the equipment up to date? As long as it runs, under budget, and doesn't get'em on the newspapers, who cares about the specifics? Why not let the unused budget from every year sit in an account (not being taken back) and use THAT to improve infrastructure?

If these guys were able to have that kind of control, this discussion wouldn't be happening.

--
--- For a good time mail uce@ftc.gov
Lawl by TrashGUY · 2007-08-15 11:47 · Score: 1, Funny

Linksys 1 LAX 0
I Blame Microsoft by triso · 2007-08-15 12:30 · Score: 1

I blame Microsoft since they took Clippy out of Word. He was my only friend.
A Cisco Config to prevent this by ScaredOfTheMan · 2007-08-15 12:37 · Score: 3, Informative

Yes, NICs can go crazy and start blasting broadcasts or unicasts over your network. If you have a Cisco switch (or any other that supports storm-control-like features) you may want to enable it; it costs you nothing but the time it takes you to update the config. On the access switch (the one connected to your PCs), get into config mode and type this on every interface that connects directly to a PC (use the interface range command to speed things up if you want):

Switch(config-if)#storm-control unicast level X

where X is the percent of total interface bandwidth you specify as the threshold for cutting access to that port. It's measured every second, so if you have a 100 meg port and you set it to 30, if the PC pushes more than 30 meg a sec in unicasts the switch kills the port till the PC calms down; if it's a 10 meg port the 30 then equals 3 meg, etc. You can also add a second line to control broadcasts by changing the word unicast to broadcast. If they had this in place, when the NIC went nuts, the switch would have killed the port, and no outage (I assume a lot here, but you get the point).
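The percentage arithmetic in that comment works out as below. This is only a back-of-the-envelope model with invented function names; what a real switch does when the level is exceeded (drop the excess, or disable the port) depends on the platform and on how the feature is configured, so treat the "suppressed" flag as a stand-in for either.

```python
def storm_threshold_mbps(port_speed_mbps: float, level_percent: float) -> float:
    """Traffic level, in Mb/s, at which the configured percentage kicks in."""
    return port_speed_mbps * level_percent / 100.0


def is_suppressed(observed_mbps: float, port_speed_mbps: float, level_percent: float) -> bool:
    """True if one second's observed traffic exceeds the configured level."""
    return observed_mbps > storm_threshold_mbps(port_speed_mbps, level_percent)


if __name__ == "__main__":
    for speed in (100.0, 10.0):
        print(f"{speed:g} meg port at level 30 ->", storm_threshold_mbps(speed, 30.0), "meg")
    # A NIC flooding a 100 meg port at 80 meg/s would cross a level of 30.
    print(is_suppressed(80.0, 100.0, 30.0))
```

Because the level is a percentage of the port's speed, the same config line gives a different absolute ceiling on a 10 meg port than on a 100 meg one.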
damn tokens... by myowntrueself · 2007-08-15 12:47 · Score: 2, Funny

Man, when a Token Ring card went bad, it was hell on the network, nothing worked because the token would not get passed properly.

The worst thing is when a user decides to unplug the cable to move something or whatever. Then the token can fall out and you have to spend hours on your hands and knees with a magnifying glass trying to find the damn thing!

It's true! I saw it in a Dilbert cartoon!

--
In the free world the media isn't government run; the government is media run.
1. Re:damn tokens... by totally+bogus+dude · 2007-08-15 16:24 · Score: 1
  
  Most non-trivial networks will use spanning tree, and something like "bpduguard" on all ports which aren't expected to be connected to another switch. If you connect the switch to itself (or another switch), the BPDUs will cause it to shut down the receiving port, at least for a while. No flood.
  
  Mind you, not all networks do this, and I have managed to put a loop in a network once or twice. Makes the LEDs blink a lot. Very pretty.
How many people were stranded? by schwit1 · 2007-08-15 13:26 · Score: 1

The CNN article was older and adds nothing to the main point of the discussion, that a cheap network component caused such an extensive failure. It appears the only purpose of the CNN page was its higher body count.
Ok, you got me. by GreggBz · 2007-08-15 13:30 · Score: 1

That doesn't make much sense. If the NIC goes down or starts misbehaving, the chances of your NIC's SNMP traps arriving at their destination are effectively zero. You probably mean setting up traps on your switches with threshold traps on all the interfaces, the switch's CPU, CAM table size, etc. Which would be more useful. You could also use a syslog server, which is going to be considerably easier if you don't have a dedicated monitoring solution.
I'm talking about scripted snmpget commands, looking at a DHCP leases file to determine active clients, periodically polling them, and managing historical data in some kind of database. If the device is misbehaving and not responding, flag it as nonresponsive.
You're not thinking of traps if you're talking about polling. Traps are initiated by the switch (or other device) and sent to your log monster. You can use SNMP polling of the sort that e.g. MRTG and OpenNMS do which, with appropriate thresholds, can get you most of the same benefits. But don't use it on Cisco hardware, not if you want your network to function, anyway. Their CPUs can't handle SNMP polling, not at the level you're talking about.
Yes, I'm definitely confused about the whole trap thing. Not traps. I looked it up for clarification.
When did I say Cisco routers? Thanks for the advice though. We have a very modest Sun Fire that monitors close to ten thousand cable modems using a system similar to what (I think) I'm describing.
And the best part is that because SNMP traps are UDP, they are the first thing to get thrown away when the shit hits the fan. So when a failing NIC starts jabbering and the poor switch's CPU goes to 100%, you'll never see a trap.
You're right about that, and the UDP traffic below. But again, my idea of checking up on the clients, not the switch, might lead you to the root of the problem more easily. I would not use it as my only tool, however. With very large networks like this, more than one type of monitoring is a good thing. A few sflow/netflow collectors, like ntop, could also be useful. Add an SNMP graphing server for your routers and more central equipment, like MRTG or PRTG, and some custom stuff such as these scripts I like, and you have tools to find out who's crashing your network. Cisco equipment seems to not have a problem forwarding data to a netflow collector, BTW. I've also used dnstop, which is great for finding bad clients / DoS, but that won't find a wonky Ethernet card, I imagine.
And the best part is that because SNMP traps are UDP, they are the first thing to get thrown away when the shit hits the fan. So when a failing NIC starts jabbering and the poor switch's CPU goes to 100%, you'll never see a trap. All you'll see are a bunch of boxes on the same vlan going up and down for no apparent reason. You might get a fps threshold trap from some gear on your distribution or core layers, assuming it's sufficiently beefy to handle a panicked switch screaming ARPs at a gig a second and have some brains left over, but that's about it. More likely you won't have a clue that anything is wrong until the switch kicks and 40 boxes go down for five minutes.

Monitoring a network with tens of thousands of switch ports sucks hardcore, there's no way around it.
Once in a college dorm I followed the most active orange lights down the tree of 10Mb Cisco switches all the way to a very senile old ISA Ethernet card in a 386. It was bringing about 200 clients down. We kindly upgraded the kid's computer. Network latency had quite suddenly started happening at very set intervals for about a day, strangely correlating to in-class / out-of-class times, so we kind of knew it was an Ethernet card. Managing a cable plant is not "thousands of ports" though. A CMTS will discriminate amongst hundreds of modems on one port thanks to the quite amazing upstream scheduling algorithms built into a few ASICs, and modems don't fail into a crash-the-network state. Crappy Ethernet cards fail into a crash-the-modem state though. Often.
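
The polling idea from earlier in this comment (read the DHCP leases file, snmpget each active client, flag the ones that don't answer) could be sketched roughly as below. This is a guess at the shape of such a script, not the poster's actual code: it assumes an ISC dhcpd style leases file with "binding state" lines, net-snmp's snmpget on the PATH, SNMP v2c with a "public" community, and sysUpTime.0 as the OID to ask for. All function names are made up, and the historical database is left out.

```python
import re
import subprocess
from typing import Callable, Dict, Iterable, List

# Assumed format: ISC dhcpd leases entries, "lease <ip> { ... }" without nested braces.
LEASE_RE = re.compile(r"lease\s+(\d{1,3}(?:\.\d{1,3}){3})\s*\{(.*?)\}", re.DOTALL)
STATE_RE = re.compile(r"^\s*binding state (\w+);", re.MULTILINE)

SYS_UPTIME_OID = "1.3.6.1.2.1.1.3.0"  # sysUpTime.0, a cheap value to request


def active_clients(leases_text: str) -> List[str]:
    """Return IPs whose most recent lease entry is in the 'active' state."""
    latest: Dict[str, str] = {}
    for ip, body in LEASE_RE.findall(leases_text):
        match = STATE_RE.search(body)
        if match:
            latest[ip] = match.group(1)  # later entries override earlier ones
    return [ip for ip, state in latest.items() if state == "active"]


def snmp_probe(host: str, community: str = "public", timeout_s: int = 2) -> bool:
    """Return True if the host answers a single snmpget, False otherwise."""
    cmd = ["snmpget", "-v", "2c", "-c", community,
           "-t", str(timeout_s), "-r", "1", host, SYS_UPTIME_OID]
    try:
        result = subprocess.run(cmd, capture_output=True, timeout=timeout_s * 3)
    except (OSError, subprocess.TimeoutExpired):
        return False  # snmpget missing or hung: treat the host as nonresponsive
    return result.returncode == 0


def poll(clients: Iterable[str],
         probe: Callable[[str], bool] = snmp_probe) -> Dict[str, bool]:
    """Map each client IP to whether it responded to the probe."""
    return {ip: probe(ip) for ip in clients}
```

Run from cron, something like poll(active_clients(open(path).read())) would give one snapshot per interval (path being wherever your leases file lives). The probe is passed in as a parameter so the logic can be exercised without touching the network.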
Re:That makes sense by quizzicus · 2007-08-15 13:34 · Score: 1

The average journalist is not technical enough to convey information from a technical source to a technical audience. And they can't just quote, either. No, they have to "explain" the story in their own words, whether or not they understand it. The cause of this problem, of course, is that skilled technical people would rather not be journalists. This is why it's so hard to find people to write documentation.
Re:I don't believe any of it by ColdWetDog · 2007-08-15 14:04 · Score: 1

Take your medicine, take a walk. Back away from the computer.
You'll be fine...

--
Faster! Faster! Faster would be better!
It's a series of tubes by kimvette · 2007-08-15 14:09 · Score: 1

So, what are you telling me, that their tubes got clogged?

--
The Christian Right is Neither (Christian nor right). See: Matthew 23, Matthew 25, Ezekiel 16:48-50
Re:I don't believe any of it by l79327 · 2007-08-15 15:14 · Score: 1

Hack, ha, $10.00 used mini switch from home with looped cabling is the real terror weapon.
reminds me of my campus network by damonlab · 2007-08-15 16:02 · Score: 1

This reminds me of a situation that happened on my campus network. Somebody in one of the campus dorm rooms plugged in a router backwards. The router took down much of the network because it was serving up DHCP addresses. The IT people were able to track down the offending segment and shut it off within two hours.
*when* it occurs in the future by cathector · 2007-08-15 19:23 · Score: 1

i'm not even an operations guy but even i know enough to say "when it occurs in the future", not "should it occur in the future".
isn't this why we use VLANs? by RMH101 · 2007-08-15 21:20 · Score: 1

...so we can segment our networks so they don't *all* get hosed when something like this happens?
Network Access Control? by TheSync · 2007-08-16 02:36 · Score: 1

It seems to me that some of the new Network Access Control (NAC) technologies might be able to mitigate such a situation - they can automatically shut down switch ports that are using excessive bandwidth or doing other "naughty things".
Etherkiller by WhiteDragon · 2007-08-16 05:08 · Score: 1

I can think of at least one way that a bad NIC can take out other hardware... Etherkiller. Kids, don't try this at home!

--
Did you mount a military-grade, variable-focus MASER on an unlicensed artificial intelligence?
Any questions? by tinkerghost · 2007-08-16 07:06 · Score: 1

Why can't you just make it work? It worked fine like this before when the switch was in the other closet, I'm tired of spending money replacing these things.
1. Re:Any questions? by fredklein · 2007-08-16 16:26 · Score: 1
  
  "Put your hand on top of your monitor. Feel that warmth? All electronic equipment creates heat. Too much heat fries electronics. I can "make it work" for the $$$ I mentioned, or we can keep spending $$$$ to replace fried equipment."
Re:I don't believe any of it by zogger · 2007-08-16 08:34 · Score: 1

I prefer boron titanium alloy, you insensitive clod!