Slashdot Mirror


Stupid Data Center Tricks

jcatcw writes "A university network is brought down when two network cables are plugged into the wrong hub. An employee is injured after an ill-timed entry into a data center. Overheated systems are shut down by a thermostat setting changed from Fahrenheit to Celsius. And, of course, Big Red Buttons. These are just a few of the data center disasters caused by human folly."

24 of 305 comments

  1. bad article is bad by X0563511 · · Score: 5, Insightful

    The summary reads like a digg post, and has two different links that, in actuality, link to the exact same thing.

    This needs some fixin'.

    --
    For large sets, this will be our guide even unto death, for the LORD will work for each type of data it is applied to...
    1. Re:bad article is bad by macwhizkid · · Score: 5, Insightful

      Article also needs fixin' in the lessons learned from the incidents described. Look, I'm sorry, but if your hospital network was inadvertently taken down by a "rogue wireless access point", the lesson to be learned isn't that "human errors account for more problems than technical errors" -- it's that your network design is fundamentally flawed.

      Or the woman who backed up the office database, reinstalled SQL server, and backed up the new (empty) server on the same tape. Yeah, a new tape would have solved that problem. Or, you know, not being a mindless automaton. Reminds me of a quote one of my high school teachers was fond of: "Life is hard. But life is really hard if you're stupid."

    2. Re:bad article is bad by Sponge+Bath · · Score: 4, Funny

      I only got a 200 on my English SAT. I's got no writin' skills.

      You has done been promoted to /. editor. Collect your "Grammer be important!" t-shirt at the door.

  2. Network meltdown due to hub cross-connects by Florian+Weimer · · Score: 5, Interesting

    Can this really happen easily? I thought for really ugly things to happen, you need to have switches (without working STP, that is).

    1. Re:Network meltdown due to hub cross-connects by omglolbah · · Score: 4, Informative

      Oh yes, it works quite well for sabotaging a network.

      It used to be a constant issue at LAN parties where "pranksters" would do it before going to sleep... Usually we never found them but when we did we flogged them with cat5 cables stripped of insulation :p

    2. Re:Network meltdown due to hub cross-connects by ianalis · · Score: 4, Interesting

      According to CCNA Sem 1, a hub is a multiport repeater that operates at layer 1. A switch is a multiport bridge that operates at layer 2. I thought these definitions were universally accepted and used, until I worked with non-Cisco devices. I now have to refer to L2 and L3 switches even though CCNA taught me that these are switches and routers, respectively.

    3. Re:Network meltdown due to hub cross-connects by pushing-robot · · Score: 4, Funny

      Ah, yes, what network technician hasn't felt the sting of the old "cat5 o' eight tails"?

      --
      How can I believe you when you tell me what I don't want to hear?
    4. Re:Network meltdown due to hub cross-connects by Geoff-with-a-G · · Score: 5, Informative

      I'm CCNP, taking my CCIE lab next month, I'll give this a shot.

      Yes, the "cow goes moo" level definitions you get are "hub = L1, switch = L2, router = L3" but the reality is more complex.
      A hub is essentially a multi-port repeater. It just takes data in on one port and spews it out all the others.
      A switch is a device that uses hardware (not CPU/software) to consult a simple lookup table which tells it which port(s) to forward the data to, and does so very fast (if not always wire-speed). Think of the GPU/graphics card in your PC: it does one specific thing super fast.
      A router is a device that understands network hierarchy/topology (in the case of IP, this is mainly about subnetting, but there are plenty of other routed protocols) and can traverse that hierarchy/topology to determine the next hop towards a destination.

      Now, because of the protocol addressing in Ethernet and IP, these lend themselves easily to hub/switch/router = L1/L2/L3, but they're not really defined that way.

      These days, most Cisco switches (3560, 3750, 6500, etc) run IOS, the software which can do routing, and which uses CEF. CEF in a nutshell takes the routing table (which would best be represented as a tree) and compiles it into a "FIB", which is essentially a flat lookup-table version of that same (layer 3, IP) table. It also caches a copy of the L2 header that the router needs to forward an L3 packet. The hardware (ASICs) in the switches hold this FIB, and thus allow them to "switch" IP/L3 packets at fast rates and without CPU intervention, thus making them still "switches", even if they run a routing protocol and build a routing table.

      Meanwhile, when Cisco refers to a "router" in marketing terms, they're talking about a device with a (relatively) powerful CPU, which can not only perform actual routing, but also usually more CPU-intensive inter-network tasks like Netflow and NBAR.
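
      The "compile the routing table into a flat FIB" idea described above can be sketched roughly as follows. This is only a toy model: the route prefixes, next hops, port names and MAC addresses are made up for illustration, and real ASICs do the lookup in specialised hardware rather than with a linear scan like this one.

```python
import ipaddress

# Hypothetical routing table: prefix -> next-hop IP (illustrative values only).
ROUTES = {
    "0.0.0.0/0": "192.0.2.1",
    "10.0.0.0/8": "192.0.2.2",
    "10.1.0.0/16": "192.0.2.3",
}

# Hypothetical adjacency data: next-hop IP -> (egress port, next-hop MAC).
# This stands in for the cached L2 header information mentioned above.
ADJACENCY = {
    "192.0.2.1": ("Gi0/1", "00:00:5e:00:53:01"),
    "192.0.2.2": ("Gi0/2", "00:00:5e:00:53:02"),
    "192.0.2.3": ("Gi0/3", "00:00:5e:00:53:03"),
}


def build_fib(routes, adjacency):
    """Flatten the routes into a list ordered longest prefix first,
    with the L2 forwarding details already resolved for each entry."""
    fib = []
    for prefix, next_hop in routes.items():
        port, mac = adjacency[next_hop]
        fib.append((ipaddress.ip_network(prefix), port, mac))
    fib.sort(key=lambda entry: entry[0].prefixlen, reverse=True)
    return fib


def lookup(fib, destination):
    """Return (port, next-hop MAC) for the most specific matching prefix,
    or None if nothing matches."""
    address = ipaddress.ip_address(destination)
    for network, port, mac in fib:
        if address in network:
            return port, mac
    return None


FIB = build_fib(ROUTES, ADJACENCY)
```

      The point of the precomputation is that forwarding a packet needs no routing decision at all, just a table hit that already carries the egress port and the L2 rewrite.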

  3. Router Plugged Into Itself by Anonymous Coward · · Score: 5, Funny

    Where I work a couple years ago one of the non-technical people decided to plug a router into itself. Ended up bringing down the whole network for ~25 people in a company which depended on the Internet (Internet marketing company).

    Unfortunately one of the tech guys figured it out literally as everyone was standing by the elevator waiting for it to take us home. We were that close to freedom :(

  4. Quad Graphics 2000 by Anonymous Coward · · Score: 5, Interesting

    In the summer of 2000 I worked at Quad/Graphics (printer, at least at that time, of Time, Newsweek, Playboy, and several other big-name publications). I was on a team of interns inventorying the company's computer equipment -- scanning bar coded equipment, and giving bar codes to those odds and ends that managed to slip through the cracks in the previous years. (It's amazing what grew legs and walked from one plant to another 40 miles away without being noticed.)

    One of my co-workers got curious about the unlabeled big red button in the server room. Because he lied about hitting it, the servers were down for a day and a half while a team tried to find out what wiring or environmental monitor fault caused the shutdown. That little stunt cost my co-worker his job and cost the company several million dollars in productivity. It slowed or stopped work at three plants in Wisconsin, one in New York, and one in Georgia.

    The real pisser was the guilty party lying about it, thereby starting the wild goose chase. If he had been honest, or even claimed it was an accident, the servers would have all been up within the hour, and at most plants little or no productivity would have been lost.

    The reality: a 20 year old's shame cost a company millions.

    1. Re:Quad Graphics 2000 by FictionPimp · · Score: 5, Funny

      Well, where I work some maintenance genius decided that the location of the red button (near the entrance door) was too risky. They said people coming in the door could hit it while trying to turn on the lights.

      Their solution? They moved it to behind the racks. So every time I bend down to move or check something I have to be conscious not to turn off the power to the entire room with my ass.

    2. Re:Quad Graphics 2000 by drsmithy · · Score: 5, Funny

      One of my co-workers got curious about the unlabeled big red button in the server room. Because he lied about hitting it [...]

      At a previous job we had one of these (albeit with a "Do not push this, ever" label above it) that did nothing more than set off a siren and snap a photo of the offender with a hidden camera. Much amusement was had by all when some new employee's curiosity inevitably got the better of them.

  5. Video by AnonymousClown · · Score: 5, Funny
    Here's a video of a tech worker explaining why these things happen.

    It's very disturbing and you'll see why these things happen.

    --
    RIP America

    July 4, 1776 - September 11, 2001

  6. Re:Not using Cisco ACLs by jimicus · · Score: 4, Informative

    Hours?

    You get something on the network which has an IP from the offending DHCP server, use ARP to establish what that DHCP server's MAC address is, then look up the switches' own tables to figure out which port that MAC is plugged into, switch that port off, and wait for the equipment owner to start complaining. Takes about 3-5 minutes to do by hand, and some switches can do it automatically.
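
    The lookup chain above (server IP, then MAC via ARP, then switch port via the MAC table) is simple enough to sketch. The tables here are plain dictionaries with invented contents; how you actually pull the ARP cache and the switch's MAC address table depends on your OS and switch vendor, so that part is left out.

```python
def normalize_mac(mac):
    """Lower-case a MAC and use ':' separators so table keys compare equal."""
    return mac.lower().replace("-", ":")


def find_rogue_port(dhcp_server_ip, arp_table, mac_table):
    """Map the rogue DHCP server's IP to a MAC via ARP data, then map that
    MAC to a switch port. Returns (mac, port); either may be None."""
    mac = arp_table.get(dhcp_server_ip)
    if mac is None:
        return None, None
    mac = normalize_mac(mac)
    ports = {normalize_mac(m): p for m, p in mac_table.items()}
    return mac, ports.get(mac)


# Hypothetical data: ARP cache of an affected client, and one switch's MAC table.
ARP_TABLE = {"192.168.1.1": "00-00-5E-00-53-AA", "192.168.1.20": "00:00:5e:00:53:10"}
MAC_TABLE = {"00:00:5e:00:53:aa": "Fa0/17", "00:00:5e:00:53:10": "Fa0/3"}
```

    On a multi-switch network the port you find may be an uplink, in which case you repeat the MAC table lookup on the next switch down until you reach an edge port.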

  7. Re:I got a good one too! by Yvan256 · · Score: 4, Funny

    192.168.x.x? That's amazing. I've got the same IPs on my luggage.

  8. My favourite human error - a true story by Kupfernigk · · Score: 5, Interesting
    This was a server room at an (unnamed) UK PLC. The air conditioning had remote management, and the remote management notified the maintenance people that attention was needed. So someone was sent out, on a Friday afternoon.

    When he arrived, most of the staff had gone home and the skeleton IT staff didn't want to hang around. So, they sent him away on the basis that his work wasn't "scheduled".

    Everybody came back on Monday to find totally fried servers.

    --
    From scarped cliff or quarried stone she cries "A thousand types are gone, I care for nothing, no not one."
    1. Re:My favourite human error - a true story by dirk · · Score: 5, Funny

      I have a better AC story. We had a second AC unit installed in the server room, as the first was cranking 24/7 and was just barely keeping up, with the thought that the 2 of them in tandem could handle the load. A few days after it was installed, we noticed the room was hot when we got in in the morning. Not enough to cause alarms, but hotter than it should be. As the day went on, it dropped, so we chalked it up to a one-time fluke. This happened a time or 2 more throughout the week, but it always dropped during the day.

      Finally the weekend came, and it got hot enough to cause an alarm. We got in and the AC units kicked on without us actually doing anything, and the room started to cool down. We called our AC guys and they checked both systems and couldn't find anything wrong with either of them. Well, the same thing happened again that night.

      Finally, someone was there late, trying to see if they could see what was going on. Everything was fine throughout the evening, so they finally decided to leave. Luckily, they noticed as they walked out the door and flipped off the lights that the AC units both turned off. He went back in to verify, and when he turned the lights back on, the AC units both started again. Turned the lights off, and they both shut off again. The genius (lowest bid) company that we hired to install the new AC unit had wired both units into the wall switch for the lights! So when we were there checking, we had the lights on and everything worked perfectly. We went home for the day and turned off the lights, and the AC units with them. Needless to say, that company isn't even allowed inside our building anymore!

      --

      "Information wants to be expensive" - Stewart Brand, the same guy who said "Information wants to be free"
  9. data centers 101 by ei4anb · · Score: 4, Funny

    Those data centers in the article sound huge, some may even have up to ten servers!

  10. Re:Not using Cisco ACLs by blair1q · · Score: 4, Insightful

    Or unplug it.

    The slow part is figuring out that that's the problem. The first time it happens to you.

    Which is why it's good to have oldbies around, to whom lots of weird shit has happened.

  11. Mainframe days story by assemblerex · · Score: 5, Interesting

    The old tape machines (six foot tall) used to put out a tremendous amount of heat. Space is at a premium, so in the mainframe room the drives were normally put edge to edge, with one pushing air in and the other pulling air out. The machines had two 10-12" fans per unit, so stacking two or three units was fine. One site had so many machines side to side (over 7), the air coming out the last machine regularly set things on FIRE. It was not uncommon for the machine to ignite lint going through the stack, with it coming out the end as a small explosion like dust in a grain silo explosion. A fire extinguisher was kept on hand, and the wall eventually got a stainless steel panel because it was so common.

  12. Ah, the memories! And lessons, too. by martyb · · Score: 5, Funny

    Ah, the memories! Here are some of the stories I've heard and/or witnessed over the years.

    1. Orientation: As a co-op student at DEC in 1980, I was told this (possibly apocryphal) story. On seemingly random occasions, a fixed-head disk drive would crash at the main plant in Maynard, Massachusetts. Not all of the drives, just a couple. Apparently the problem was isolated when someone was midway between the computer room and the loading dock. They heard the bump of a truck backing hard into the loading dock followed very shortly by a curse from the computer room! It apparently caused enough of a jolt to cause platters to tilt up and hit the heads... but only on the drives which were oriented north-south; those oriented east-west were not affected. So came the directive that all drives, henceforth, needed to be oriented north-south.
    2. Hot Stuff: Seems that a mini-computer developed a nasty tendency to crash in the early afternoon. But only on some days. Diagnostics were run. Job schedules were checked and evaluated. All the software and hardware checked out A-OK. This went on for quite a while until someone noticed that there was a big window to the outside and that in the early afternoon the sun's light would fall upon the computer. This additional heat load was enough to put components out of expected operational norms and caused a crash.
    3. Cool!: A friend of mine was a field engineer for DEC back in the day when minicomputers had core memory. He was called into a site where their system had some intermittent crashes. He ran diagnostics. All seemed to be within spec. He replaced memory boards. Still crashed. Replaced mother boards. Reloaded the OS from fresh tapes. Still crashed. He finally noticed that one of the fans on the rack was not an official DEC fan. Though it WAS within spec for airflow and power draw, it was NOT within spec for magnetic shielding... it would sporadically cause bit flips in the (magnetic) core memory. Swapping out the fan solved the problem.
    4. This sucked: Another place had a problem with a computer that would sometimes crash in the early evening after everyone went home for the day. Well, not everyone. The cleaning staff apparently noticed a convenient power strip on a rack and plugged their vacuum cleaner into it. The resulting voltage sag took down the server!
    5. Buttons: Every couple years, IBM would hold an open house where anyone in the community could come in and get a tour of the facility (Kingston, NY). This was back in 1984, IIRC. PCs were just starting to make an impact at this time... big iron was king. We're talking about a huge raised-floor area with multiple mainframes, storage, tape drives... MANY millions of dollars per system. A few hundred users on a system was quite an accomplishment back then and these boxes could handle a thousand users. We were also in the midst of a huge test effort of the next release of VM/SP. I had come in that Sunday afternoon to get several tests done (death marches are no fun). All of a sudden the mainframe I was on crashed. Hard. I'd grown accustomed to this as we were at a point where we were "eating our own dog food"; the production system was running the latest build of the OS. But an hour later it was STILL down. Apparently, a tour guide had led a group to one of the operator consoles and a child could not resist pressing buttons. Back in those days, booting a mainframe meant a "re-IPL" (Initial Program Load). Only if the computer was REALLY messed up and wouldn't boot would someone re-IML the system (Initial Microcode Load). Guess which button the kid pressed? It left the system in such a wonky state that it had to be reloaded from tape. All the development work of that weekend was lost and had to be recreated and rebuilt. (It was a weekend and backups were only done on weekday nights.) It took us a week to get things back to normal.
    6. Drivers: A friend of mine at IBM told me of an
  13. Washer in the UPS by Bob9113 · · Score: 4, Interesting

    My favorite was at a big office building. An electrician was upgrading the fluorescent fixtures in the server room. He dropped a washer into one of the UPSs, where it promptly completed a circuit that was never meant to be. The batteries unloaded and fried the step-down transformer out at the street. The building had a diesel backup generator, which kicked in -- and sucked the fuel tank dry later that day. For the next week there were fuel trucks pulling up a few times a day. Construction of a larger fuel tank began about a week later.

  14. Re:Not using Cisco ACLs by fluffy99 · · Score: 4, Informative

    Cisco switches have a wonderful feature called dhcp snooping.

    Not supported on many of the lower end Cisco edge switches. I believe it also interferes with DHCP relaying.
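
    For anyone who hasn't met the feature, here is a toy model of the general idea behind DHCP snooping: DHCP server replies are only accepted from ports marked as trusted, and accepted leases are remembered as bindings. The class, method names and port names are invented for this sketch; it is not Cisco's implementation and ignores VLANs, lease expiry and relay handling.

```python
class SnoopingSwitch:
    """Simplified DHCP snooping model: trusted ports plus a binding table."""

    def __init__(self, trusted_ports):
        self.trusted_ports = set(trusted_ports)
        self.bindings = {}  # client port -> (client IP, client MAC)

    def on_dhcp_server_reply(self, ingress_port, client_port, client_ip, client_mac):
        """Handle a DHCP server reply. Replies arriving on untrusted ports
        are dropped (returns False); trusted ones are recorded and forwarded."""
        if ingress_port not in self.trusted_ports:
            return False
        self.bindings[client_port] = (client_ip, client_mac.lower())
        return True

    def binding_matches(self, port, src_ip, src_mac):
        """True if traffic on this port uses the IP/MAC it was leased."""
        return self.bindings.get(port) == (src_ip, src_mac.lower())


# Hypothetical setup: the uplink to the real DHCP server is the only trusted port.
switch = SnoopingSwitch(trusted_ports=["Gi0/1"])
rogue_accepted = switch.on_dhcp_server_reply("Fa0/17", "Fa0/3", "192.168.1.50", "00:00:5E:00:53:10")
real_accepted = switch.on_dhcp_server_reply("Gi0/1", "Fa0/3", "10.0.0.50", "00:00:5E:00:53:10")
```

    The binding table is the useful by-product: source-checking features can consult it to reject traffic from addresses that were never handed out by the legitimate server.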

    Another great tool is "ip verify source vlan dhcp-snooping" which can be used to block traffic from IPs/MACs that did not obtain their IP from the DHCP server. This nicely prevents users from statically assigning addresses and/or spoofing their MAC address.

  15. USB drive running mission critical WAFS by gagol · · Score: 4, Interesting

    I was employed at a 50-employee publicity company. They have a couple of offices across the country and need to share a filesystem through WAFS. The main repository for the WAFS was running off a USB drive, connected to the server with a cable that was too short. I pointed out the problem multiple times to my IT boss (no IT background whatsoever) without success, tried to raise the issue with the owner of the company, without success, and one day the worst happened. The USB controller of the drive fried and we lost the last day of work. The Windows server went AWOL. It took an external consultant 3½ days to rebuild the main server, which was running the AD, WAFS, Exchange and our enterprise database. It cost us an account worth $12 MILLION.

    The big boss then hired consultants and paid them over a thousand bucks to be told the exact same thing I had pointed out 3 months earlier when I audited the IT infrastructure. Two months later she came to me and asked how much it would cost to have a bullet-proof infrastructure. I told her to invest around 80K in a virtualisation solution with scripts to move VMs around when the workload changes, and to go with consolidated storage with live backups and replication. It was too expensive. Another three months passed, she hired some consultants and paid them thousands more to be told basically the same thing I had told her 3 months earlier... That is when I quit.

    --
    Tomorrow is another day...