Plug In an Ethernet Cable, Take Your Datacenter Offline
New submitter jddj writes: The Next Web reports on a hilarious design failure built into Cisco's 3650 and 3850 Series switches, which TNW terms "A Network Engineer's Worst Nightmare". By plugging in a hooded Ethernet cable, you...well, you'll just have to see the picture and laugh. They write: "The cables, which are sometimes accidentally used in datacenters, feature a protective boot that sticks out over the top to ensure the release tab isn’t accidentally pressed or broken off, rendering the cable useless. That boot would hit the reset button which happened to be positioned directly above port one of the Cisco switch, which causes the device to quietly reset to factory settings."
"There’s an easy way to prevent it happening at all, by disabling the button." Another easy way to prevent this from happening: DON'T BUY THIS SWITCH.
Regardless of the connector design, putting the reset button directly above a port is bad design. It's simply too easy to hit it with your thumb while plugging in or removing a cable. I suppose holding it down for several seconds resets the switch to factory defaults, which is exactly what happens when a booted cable keeps it pressed. Still, that more severe problem aside, the placement was a bad design in the first place.
Better known as 318230.
Are 'config t' and 'write erase' too difficult to remember? Bothered by all those inconvenient keystrokes? Try the new EasyBoot(TM) from Cisco, the most convenient way to reset your router!
Building Better Software
From the article:
The cables, which are sometimes accidentally used in datacenters, feature a protective boot that sticks out over the top to ensure the release
and then
Such a situation could cause a problem in any size datacenter, where these switches and cables are commonly used
So are they commonly used by accident? Accidentally used commonly? I read the article to figure out what type of cable was often used, but apparently it's these cables, only by accident, all the time.
If a single device brings down your entire data center, you've got design problems, and your architect should be fired or retrained. These days everything is redundant in triplicate at minimum, and new devices spin up automatically via automated provisioning and Chef/Puppet-type setups. Even if your core router (why would you have just one!?!?!?!) shits the bed and resets to factory defaults with VLAN 1, basic STP and no routing interfaces configured, a proper MSTP / VRF / TRILL / SDN (OpenFlow, etc.) setup should route around that shit if your NOC folks did a good job. QA will already have tested the "core Clos spine device reboots to factory defaults" test case, at which point you just have another device for a low-paid lackey to swap out when your network monitor goes yellow.
If you work in a Fortune 500 datacenter and you can't handle this sort of outage, get the fuck out. You're the reason shit's going downhill. Also, if a Cisco 3650 or 3850 brings down your datacenter, see the previous negative asshole sentiment, or get a new job if your manager is responsible for such a clusterfuck. No participation trophy for such asshattery.
'We are trying to prove ourselves wrong as quickly as possible, because only in that way can we find progress.' RPF
You're plugging it in wrong.
While I like the auto-LART feature, I wonder why the button is there at all: if the switch is working properly, it doesn't need a reset button.
If the switch is not working properly, it needs to be burdensome to power-cycle it, to encourage people to complain loudly to the responsible vendor(s) until the product actually works.
In these modern times, I think an accessible reset switch is like: "Yo dawg, I heard you like to 'fix' things by pushing buttons, so we put buttons on your Enterprise switches so you can reset one-handed while you [...]"
ObTopic: I once helped take down an enterprise LAN with an Ethernet cable. It was 10-ish years ago, and we had just installed a new-fangled VoIP phone system. Each VoIP deskset had a built-in unmanaged 10/100 switch. This was a very handy thing before our modern, enlightened structured-cabling rollouts, because the phone could be trivially daisy-chained with a desktop computer, and standardized PoE was not yet a thing.
Anyhow, we started late on a Wednesday and finished just before the start of business on Thursday: record time for replacing an old Nortel with a few hundred extensions, I tell you. Then I went home and died on my couch, having been awake and actually working (prep, etc.) for about 40 hours.
At 7:23 AM, my phone rang. It was my manager. Their entire network had crashed, hard. They blamed us. They were livid. I read my manager the NSFW riot act, hung up, and went back to sleep.
Turns out that after we left, some unknown person had plugged both external switched ports of a deskset into both jacks of a wallplate connected to a then-high-end HP ProCurve switch, which itself connected to a factory and office tower full of other HP ProCurve switches, carefully set up in a redundant "mesh fabric" mode. This carefully constructed, redundant network then died in a broadcast packet storm.
Once they found the error and unplugged that one extraneous, heads-will-roll wayward wire, things more or less instantly returned to normal.
(STP would've instantly made this a complete non-issue, but at the time STP and HP's mesh conflicted with each other and could not coexist. I understand this was subsequently resolved, though I don't deal with HP switches often enough to verify.)
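For anyone who has never watched a loop melt a LAN: an Ethernet frame carries no TTL, and a plain switch floods a broadcast out every port except the one it arrived on. Below is a toy Python model of that, under hypothetical assumptions: both wallplate jacks patch into the same switch "P", the deskset's built-in switch is "D", and the node names and port numbers are made up for illustration. The `spanning_tree()` helper only imitates the loop-free topology STP converges to by blocking every link not on a BFS tree; it is not the 802.1D protocol (no BPDUs, bridge priorities or path costs), and it says nothing about how HP's mesh mode actually worked.

```python
from collections import deque

# Hypothetical topology loosely based on the anecdote: switch "P" (the
# ProCurve) has two cables to the deskset's built-in switch "D", which is
# what closes the loop. "PC" hangs off P; "PHONE" is the phone itself on
# D's internal port. Each link is (node_a, port_a, node_b, port_b).
LINKS = [
    ("P", 1, "D", 1),
    ("P", 2, "D", 2),      # the second, extraneous cable
    ("P", 3, "PC", 1),
    ("D", 3, "PHONE", 1),
]
HOSTS = {"PC", "PHONE"}    # end stations consume frames, never forward them


def build_ports(links):
    """Map each (node, port) to the (node, port) at the other end of its cable."""
    ports = {}
    for a, pa, b, pb in links:
        ports[(a, pa)] = (b, pb)
        ports[(b, pb)] = (a, pa)
    return ports


def spanning_tree(links, root):
    """Keep only the links on a BFS tree rooted at `root`; block the rest.

    Imitates the loop-free result STP converges to, not the 802.1D
    protocol itself (no BPDUs, bridge priorities or path costs).
    """
    adjacency = {}
    for link in links:
        a, _, b, _ = link
        adjacency.setdefault(a, []).append((b, link))
        adjacency.setdefault(b, []).append((a, link))
    seen, kept = {root}, []
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for neighbour, link in adjacency[node]:
            if neighbour not in seen:   # first link to reach a node wins
                seen.add(neighbour)
                kept.append(link)
                queue.append(neighbour)
    return kept


def flood(links, source, max_hops=20):
    """Send one broadcast from `source`; return frame copies in flight per hop.

    Switches copy the frame out of every port except the ingress port.
    Frames carry no TTL, so copies on a loop never expire; the
    simulation just stops after `max_hops`.
    """
    ports = build_ports(links)
    in_flight = [ports[(source, 1)]]            # (receiving node, ingress port)
    history = []
    for _ in range(max_hops):
        next_hop = []
        for node, ingress in in_flight:
            if node in HOSTS:
                continue                         # delivered, not forwarded
            for (owner, port), far_end in ports.items():
                if owner == node and port != ingress:
                    next_hop.append(far_end)
        in_flight = next_hop
        history.append(len(in_flight))
        if not in_flight:
            break
    return history


if __name__ == "__main__":
    print(flood(LINKS, "PC")[:5])                      # [2, 4, 4, 4, 4]
    print(flood(spanning_tree(LINKS, "P"), "PC"))      # [1, 1, 0]
```

With both cables active, the frame count never reaches zero: copies just keep circling between P and D and landing on every host again and again. With the redundant cable blocked, the broadcast dies after two hops. In a real meshed fabric, every extra redundant path tends to multiply the copies rather than merely recirculate them, which is what turns a loop into a storm.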
Kid-proof tablet.