Slashdot Mirror


Stupid Data Center Tricks

jcatcw writes "A university network is brought down when two network cables are plugged into the wrong hub. An employee is injured after an ill-timed entry into a data center. Overheated systems are shut down by a thermostat setting changed from Fahrenheit to Celsius. And, of course, Big Red Buttons. These are just a few of the data center disasters caused by human folly."

17 of 305 comments

  1. bad article is bad by X0563511 · · Score: 5, Insightful

    The summary reads like a digg post, and has two different links that, in actuality, link to the exact same thing.

    This needs some fixin'.

    --
    For large sets, this will be our guide even unto death, for the LORD will work for each type of data it is applied to...
    1. Re:bad article is bad by Anonymous Coward · · Score: 1, Insightful

      For me, it's usually more due to time constraints and the belief that if I spend too much time composing a narrative somebody else will submit it first.

    2. Re:bad article is bad by macwhizkid · · Score: 5, Insightful

      Article also needs fixin' in the lessons learned from the incidents described. Look, I'm sorry, but if your hospital network was inadvertently taken down by a "rogue wireless access point", the lesson to be learned isn't that "human errors account for more problems than technical errors" -- it's that your network design is fundamentally flawed.

      Or the woman who backed up the office database, reinstalled SQL Server, and backed up the new (empty) server onto the same tape. Yeah, a new tape would have solved that problem. Or, you know, not being a mindless automaton. Reminds me of a quote one of my high school teachers was fond of: "Life is hard. But life is really hard if you're stupid."

    3. Re:bad article is bad by macwhizkid · · Score: 2, Insightful

      It is not obvious to someone replacing the backup tape whether the backup is appended to the previous backup or replaces the previous one entirely. The former was not all that uncommon back when backup tapes had decent sizes. These days where you need 4 tapes to backup a single drive no one appends.

      Yeah, it's not clear from TFA whether she thought there was enough space, or was just clueless. Regardless, though, when you have mission-critical data on a single drive you shut it down, put it in a fire safe until you're ready to restore, whatever. But you don't just casually keep using it. And who backs up a test database install anyway?

      It's just interesting that the first story in the article was a technical problem (poor network design/admin) being blamed on user error (unauthorized wireless AP/network cable plugged into wrong switch), while the second story was procedural user error (do the backup every day, no matter what) being blamed on a technical problem (the backup system).

    4. Re:bad article is bad by fishbowl · · Score: 2, Insightful

      >These days where you need 4 tapes to backup a single drive no one appends.

      These days with LTO-4, my biggest problem is having enough time to guarantee a daily backup.

      --
      -fb Everything not expressly forbidden is now mandatory.
    5. Re:bad article is bad by mlts · · Score: 2, Insightful

      This is one reason that D2D2T setups are a good thing. If the tape gets overwritten, most likely the copy sitting on the HDD is still useful for recovering.

      One thing I highly recommend businesses get is a backup server. It can be as simple as a 1U Linux box connected to a Drobo that does an rsync. It can be a passive box that does Samba/CIFS serving, with one account for each machine and each machine's individual backup program dumping to it. Or the machine can have an active role and run a backup utility like Retrospect. The advantage of the active role is that one can plug another external drive into the server, copy backed-up data to it, then take the external drive offsite for additional data protection. It isn't as good as full tape rotation, but better than nothing.
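
      A minimal sketch of the disk-staging half of that idea, assuming the backup program leaves a dump file on local disk. The function name `stage_backup` and the dated naming scheme are made up for illustration and are not part of any backup product. The point is only that the disk copy is never overwritten, so a later bad dump cannot replace a good one.

```python
import shutil
from datetime import date
from pathlib import Path


def stage_backup(dump: Path, staging_dir: Path, day: date) -> Path:
    """Copy a backup dump into the disk staging area under a dated name.

    Refuses to overwrite an existing copy, so a later (possibly empty)
    dump cannot silently replace an earlier good one.
    """
    staging_dir.mkdir(parents=True, exist_ok=True)
    # Hypothetical naming scheme: ISO date prefix plus the original file name.
    target = staging_dir / f"{day.isoformat()}-{dump.name}"
    if target.exists():
        raise FileExistsError(f"{target} is already staged; not overwriting")
    shutil.copy2(dump, target)  # copy2 also preserves timestamps
    return target
```

      Copying the staged files on to tape or an external drive would be a separate step.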

    6. Re:bad article is bad by internewt · · Score: 2, Insightful

      Seriously?! Your job title is Network Administrator! Administer the damn network! It's what you were hired to do!

      And if management doesn't want to make the change?

      Get it in writing.

      Then when you are set up for a fall when the inevitable happens, you have something to cover your arse with.

      --
      Car analogies break down.
  2. Re:Not using Cisco ACLs by omglolbah · · Score: 3, Insightful

    Amusingly anyone who ever worked as tech crew at a lan party knows that this is the first thing you look for... :p

  3. Re:Quad Graphics 2000 by Anonymous Coward · · Score: 3, Insightful

    Why the fuck was the button unlabeled? That's the REAL MISTAKE.

  4. Re:Video FTW by dsoltesz · · Score: 2, Insightful

    Thank you... you've single-handedly made spending my time on recycled, old digg news completely and totally worth it.

  5. Re:Not using Cisco ACLs by Gumbercules!! · · Score: 3, Insightful

    I have to agree with this guy. As soon as IP addresses started being assigned incorrectly, the first thing I would be doing is checking the DHCP server. ipconfig /all on a Windows box (so maybe 3 seconds of typing) would give this answer.

    More to the point, though - why was another DHCP server allowed on the network? Can your switches not block or refuse to route DHCP traffic from the wrong host?? Otherwise every single student who brings in their own wifi box is going to shut down the network.
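
    On managed switches this kind of blocking is usually done with a feature commonly called DHCP snooping. As a rough illustration of the check itself, here is a sketch that assumes you have already collected DHCP offers (for example from a packet capture) as `(server_ip, server_mac)` pairs. The function name and the data layout are assumptions made up for this example.

```python
def find_rogue_dhcp_servers(offers, authorized):
    """Return DHCP servers seen in offers that are not on the allow-list.

    offers:     iterable of (server_ip, server_mac) tuples (assumed format)
    authorized: set of server IP strings that are allowed to hand out leases
    Result maps each unauthorized server IP to the set of MACs seen for it.
    """
    rogue = {}
    for server_ip, server_mac in offers:
        if server_ip not in authorized:
            rogue.setdefault(server_ip, set()).add(server_mac)
    return rogue
```

    The MAC addresses are kept because they are what you would look up in the switch tables to find the port the rogue box is plugged into.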

  6. Re:Power strips (with on/off buttons) are bad by Velox_SwiftFox · · Score: 3, Insightful

    Covering those power strip buttons with a hardened glob fixing them in the "on" position is what an electric glue gun is for.

  7. Re:Network meltdown due to hub cross-connects by coryking · · Score: 2, Insightful
  8. Re:Not using Cisco ACLs by blair1q · · Score: 4, Insightful

    Or unplug it.

    The slow part is figuring out that that's the problem. The first time it happens to you.

    Which is why it's good to have oldbies around, to whom lots of weird shit has happened.

  9. Fun with PIX by mkiwi · · Score: 3, Insightful

    I had fun with a company a while back. They have about 300 employees and do ~90mil/year, so this is a small corporation.

    Anyway, the company was trying to get a VPN tunnel established to their China office, and they were having a hell of a time at it. The employees on the China side had no IT experience so everything was done remotely.

    It just so happens that one of the Chinese employees was recruited to make a change to the PIX firewall on the China side in order to get everything working. To our astonishment, it worked, and we had a secure VPN tunnel established.

    The problem was that accounts in the US started to get locked out, alphabetically, every 30 minutes. Our Active Directory was getting tons of password crack attempts from inside our internal network. I was using LDAP to develop an application at the time, so naturally I was the suspect for all these lockouts.

    Fast-forward a week. We look at the configuration of the Chinese firewall and it allowed all access from any IP address on the Chinese side. In other words, crackers were trying to get into our systems through our VPN tunnel in China. In effect, our corporate LAN had been directly connected to the Internet. Once we figured that out, I was free to go back to work and the network lived to see another day, but that incident caused major trouble for all our employees.

    Moral of the story: Don't trust a Chinese firewall.
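
    A sketch of the kind of audit that would have caught this sooner, assuming the firewall rules have been exported into simple dicts with `action` and `source` keys. That layout is hypothetical and is not PIX configuration syntax. It flags permit rules whose source range is wider than a chosen threshold.

```python
import ipaddress


def overly_broad_permits(rules, max_addresses=256):
    """Return permit rules whose source network covers too many addresses.

    rules: list of dicts like {"action": "permit", "source": "0.0.0.0/0"}
           (assumed export format, not a real firewall syntax)
    """
    flagged = []
    for rule in rules:
        if rule["action"] != "permit":
            continue
        network = ipaddress.ip_network(rule["source"])
        if network.num_addresses > max_addresses:
            flagged.append(rule)
    return flagged
```

    A rule permitting `0.0.0.0/0` is the "any IP address" case from the story and would be flagged at any sensible threshold.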

  10. Re:Network meltdown due to hub cross-connects by X0563511 · · Score: 2, Insightful

    Well.

    The Foundry switch I was screwing around with today... wasn't letting the IP Engineer send all the VLANs to the mirror port. I could only watch management traffic (STP, etc) and nothing of any actual use.

    It was great! Finally I got pissed off and shoved a homemade passive tap on the uplink and was -then- able to see the issue.

    A hub would have made this a 5 minute job.

    --
    For large sets, this will be our guide even unto death, for the LORD will work for each type of data it is applied to...
  11. Onsite Training = Bad by Bruha · · Score: 2, Insightful

    I don't care where you work, if you're on site doing training, you're probably also sucked back into the work cycle. I see it all the time at work. I have always preferred offsite training, with the cell phones turned off. It also helps if you have to use your laptop in the lab, because 99% of the time it means you cannot VPN into work, so email is not a concern either.

    I think other data center operators would agree we're all understaffed, and I work on a network with hundreds of millions of customers using it on a 24/7 cycle. The other danger nobody speaks of is that some companies are too passive when it comes to testing redundancy. Half the time, while there's redundancy in the system to keep a DMZ up and running, there's no spare DMZ capacity to handle a true outage, such as a fiber ring failure that isolates the data center, or some other disaster. Companies need to design their redundancy so you can unplug the entire data center and your customers never know it. If you do not, you will rue the day a true outage hits the entire data center, and you will hear about it on the news later. Not a good thing.
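
    The "unplug the entire data center" test can be written down as simple arithmetic. This is a sketch that assumes each site is described by a capacity and a current load in the same units. The function name and the numbers in the usage note are made up for illustration.

```python
def survives_single_site_loss(sites):
    """Check, for each site, whether the others could carry the full load.

    sites: dict mapping site name -> (capacity, current_load), same units
    Returns a dict mapping site name -> True if the total load still fits
    in the remaining sites' combined capacity after that site is lost.
    """
    total_load = sum(load for _, load in sites.values())
    result = {}
    for lost in sites:
        remaining_capacity = sum(
            capacity for name, (capacity, _) in sites.items() if name != lost
        )
        result[lost] = remaining_capacity >= total_load
    return result
```

    For example, two sites each with capacity 100 and load 60 fail the check, since one site alone cannot carry 120. Three sites each with capacity 100 and load 40 pass it.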