Slashdot Mirror


Stupid Data Center Tricks

jcatcw writes "A university network is brought down when two network cables are plugged into the wrong hub. An employee is injured after an ill-timed entry into a data center. Overheated systems are shut down by a thermostat setting changed from Fahrenheit to Celsius. And, of course, Big Red Buttons. These are just a few of the data center disasters caused by human folly."

305 comments

  1. bad article is bad by X0563511 · · Score: 5, Insightful

    The summary reads like a digg post, and has two different links that, in actuality, link to the exact same thing.

    This needs some fixin'.

    --
    For large sets, this will be our guide even unto death, for the LORD will work for each type of data it is applied to...
    1. Re:bad article is bad by X0563511 · · Score: 0, Redundant

      Oh. And the summary text is, verbatim, the first part of the article. Wow, Timothy... this was just bad.

      --
      For large sets, this will be our guide even unto death, for the LORD will work for each type of data it is applied to...
    2. Re:bad article is bad by Anonymous Coward · · Score: 1, Informative

      Let me help out a bit

      Printable Version

    3. Re:bad article is bad by Anonymous Coward · · Score: 2, Interesting

      I seem to remember in the early days of Telehouse London an engineer switched off power to the
      entire building. Only two routes out of the UK remained (one was a 256k satellite connection)
      that had their own back-up power.

    4. Re:bad article is bad by Timex · · Score: 3, Funny

      the summary text is, verbatim, the first part of the article.

      It is my personal observation that this seems to be the best way to get anything on the front page: using the article text as the "summary". Isn't it nice to see that Slashdot submitters are so original in their writing skill? :D

      --
      When politicians are involved, everyone loses.
    5. Re:bad article is bad by Anonymous Coward · · Score: 1, Insightful

      For me, it's usually more due to time constraints and the belief that if I spend too much time composing a narrative somebody else will submit it first.

    6. Re:bad article is bad by macwhizkid · · Score: 5, Insightful

      Article also needs fixin' in the lessons learned from the incidents described. Look, I'm sorry, but if your hospital network was inadvertently taken down by a "rogue wireless access point", the lesson to be learned isn't that "human errors account for more problems than technical errors" -- it's that your network design is fundamentally flawed.

      Or the woman who backed up the office database, reinstalled SQL server, and backed up the new (empty) server on the same tape. Yeah, a new tape would have solved that problem. Or, you know, not being a mindless automaton. Reminds me of a quote one of my high school teachers was fond of: "Life is hard. But life is really hard if you're stupid."

    7. Re:bad article is bad by commodore64_love · · Score: 2, Interesting

      But.....

      I only got a 200 on my English SAT. I's got no writin' skills. That's why I became a computer geek instead.

      --
      "I disapprove of what you say, but I will defend to the death your right to say it." - historian Evelyn Beatrice Hall
    8. Re:bad article is bad by dsoltesz · · Score: 2, Informative

      *yawn* That's because it was on digg, posted in a nearly identical fashion, two days ago. Agreed. Bad article is bad. And now it's old.

    9. Re:bad article is bad by OnlineAlias · · Score: 2, Interesting

      The first one, at the IU school of medicine, I'm very familiar with that place...they have no data center to speak of, and I do not know that person. I never heard of that incident. Also, who doesn't run spanning tree with BPDU guard and other such protections? I know IU does, for a fact.

      Something is very very wrong with that article.

    10. Re:bad article is bad by Mitchell314 · · Score: 1

      . . . so that makes it perfect for /. !
      :P

      --
      I read TFA and all I got was this lousy cookie
    11. Re:bad article is bad by Cylix · · Score: 1

      They said hub so maybe it was from the 90s?

      That was a big danger back in the day when running a lot of hubs and reserving switches closer to the core.

      So either it was a limitation of funds that led to the problem or a limitation of intelligence.

      --
      "You should always go to other people's funerals; otherwise, they won't come to yours." -- Yogi Berra
    12. Re:bad article is bad by amorsen · · Score: 1

      Or the woman who backed up the office database, reinstalled SQL server, and backed up the new (empty) server on the same tape. Yeah, a new tape would have solved that problem. Or, you know, not being a mindless automaton.

      It is not obvious to someone replacing the backup tape whether the backup is appended to the previous backup or replaces the previous one entirely. The former was not all that uncommon back when backup tapes had decent sizes. These days where you need 4 tapes to backup a single drive no one appends.

      Of course there are tons of other things wrong with a one-tape backup schedule, but again she couldn't necessarily be expected to know about them.
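
A toy illustration of the append-versus-overwrite point above. This is only a sketch: the "tape" is modelled as a Python list, and `write_backup` is a hypothetical helper, not part of any real backup tool.

```python
def write_backup(tape, backup, append):
    """Return the tape contents after one backup run.

    tape   -- list of backup sets already on the tape (oldest first)
    backup -- the new backup set to write
    append -- True: add after the existing sets; False: start from the
              beginning of the tape, discarding whatever was there
    """
    if append:
        return tape + [backup]
    return [backup]


tape = ["office database (good)"]
appended = write_backup(tape, "fresh SQL Server install (empty)", append=True)
overwritten = write_backup(tape, "fresh SQL Server install (empty)", append=False)
```

In append mode the good copy is still on the tape; in overwrite mode it is gone, and nothing about the physical act of inserting the tape tells the operator which mode is in effect.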

      --
      Finally! A year of moderation! Ready for 2019?
    13. Re:bad article is bad by macwhizkid · · Score: 2, Insightful

      It is not obvious to someone replacing the backup tape whether the backup is appended to the previous backup or replaces the previous one entirely. The former was not all that uncommon back when backup tapes had decent sizes. These days where you need 4 tapes to backup a single drive no one appends.

      Yeah, it's not clear from TFA whether she thought there was enough space, or was just clueless. Regardless, though, when you have mission critical data on a single drive you shut it down, put in a fire safe until you're ready to restore, whatever. But you don't just casually keep using it. And who backs up a test database install anyway?

      It's just interesting that the first story in the article was a technical problem (poor network design/admin) being blamed on user error (unauthorized wireless AP/network cable plugged into wrong switch), while the second story was procedural user error (do the backup every day, no matter what) being blamed on a technical problem (the backup system).

    14. Re:bad article is bad by fishbowl · · Score: 2, Insightful

      >These days where you need 4 tapes to backup a single drive no one appends.

      These days with LTO-4, my biggest problem is having enough time to guarantee a daily backup.

      --
      -fb Everything not expressly forbidden is now mandatory.
    15. Re:bad article is bad by kefler · · Score: 1

      Agreed. I just buried it.

    16. Re:bad article is bad by The+Grim+Reefer2 · · Score: 1, Offtopic

      Isn't it nice to see that Slashdot submitters are so original in their writing skill? :D

      I believe you meant to say, "Ain't it gr8 2 C th@ /. submitters R so 0Ri9IN4l @ writin' skillz?"

    17. Re:bad article is bad by Sponge+Bath · · Score: 4, Funny

      I only got a 200 on my English SAT. I's got no writin' skills.

      You has done been promoted to /. editor. Collect your "Grammer be important!" t-shirt at the door.

    18. Re:bad article is bad by green1 · · Score: 2, Informative

      To this day most cheap switches still can't handle a network cable with the 2 ends plugged in to the same switch. As a telco company technician I can't count the number of times I've solved someone's Internet connectivity problem by unplugging said cable. (I sort of understand it when there's a big mess of cables and you can't see where they all go, but I've also seen some really ridiculous ones where the troublesome cable is less than a foot long and therefore extremely obviously out of place!)

      And before someone says it, yes I'm positive these are switches and not hubs. (I don't think I've actually seen a hub in use in a couple of years now... can you still even buy them?)

      Now I hope that high end switch gear is better, but I have to admit that I haven't tried the experiment to find out (most high end switch gear I have access to is in use for mission critical stuff, no matter how remote the chances of actually taking it down, I'm not going to try the experiment)

    19. Re:bad article is bad by BrokenHalo · · Score: 2, Interesting

      while the second story was procedural user error (do the backup every day, no matter what) being blamed on a technical problem (the backup system).

      Back in the late '80s when I was working on Prime "mini-computers" (as such machines were then known), I would receive periodic calls from Prime's tech support to alert me to (yet) another bug found in their BRMS (Backup/Restore Management System), and would I pretty-please stop using it. As it happened, I was using their less sophisticated but otherwise bombproof dump/restore utilities, so this was never an issue for me, but it was still pretty funny...

    20. Re:bad article is bad by BrokenHalo · · Score: 1

      These days where you need 4 tapes to backup a single drive no one appends. These days with LTO-4, my biggest problem is having enough time to guarantee a daily backup.

      Then you are fortunate to have such leeway. I won't preface this with a "get off my lawn" or anything uttered in a Yorkshire accent, but for the first 15 years of my work with computers, backups occupied hundreds of tapes (and lots of time) every single day.

    21. Re:bad article is bad by mlts · · Score: 2, Insightful

      This is one reason that D2D2T setups are a good thing. If the tape gets overwritten, most likely the copy sitting on the HDD is still useful for recovering.

      One thing I highly recommend businesses get is a backup server. It can be as simple as a 1U Linux box connected to a Drobo that does an rsync. It can be a passive box that does Samba/CIFS serving, one account for each machine and each machine's individual backup program dump to it. Or the machine can have an active role and run a backup utility like Retrospect. The advantage of the active role is that one can plug another external drive into the server, copy backed up data to it, then take the external drive offsite for additional data protection. It isn't as good as full tape rotation, but better than nothing.

    22. Re:bad article is bad by adolf · · Score: 0

      I only got a 200 on my English SAT. I's got no writin' skills.

      You has done been promoted to /. editor. Collect your "Grammer be impotent!" t-shirt at the door.

      FTFY.

    23. Re:bad article is bad by niteshifter · · Score: 1

      ... Reminds me of a quote one of my high school teachers was fond of: "Life is hard. But life is really hard if you're stupid."

      Good teacher, following the sage words of the great Western philosopher Hondo*:
      "Life is tough. It's tougher when you're stupid."

      * aka John Wayne ;)

    24. Re:bad article is bad by PrecambrianRabbit · · Score: 1

      the lesson to be learned isn't that "human errors account for more problems than technical errors" -- it's that your network design is fundamentally flawed.

      No kidding. The sysadmin who uttered that quote comes across poorly as well. He makes the excuse: "It was like that when I got here, so I inherited the bad design." Seriously?! Your job title is Network Administrator! Administer the damn network! It's what you were hired to do!

    25. Re:bad article is bad by Geoff-with-a-G · · Score: 1

      Well, since nobody reads the articles, just the summaries, that actually seems rather efficient.

    26. Re:bad article is bad by pgmrdlm · · Score: 1

      Seriously?! Your job title is Network Administrator! Administer the damn network! It's what you were hired to do!

      And if management doesn't want to make the change?

      --
      Anonymous comments are as pathetic as the anonymous "sources" that contaminate gutless journalism from the New York Time
    27. Re:bad article is bad by PrecambrianRabbit · · Score: 1

      Then you don't make the change. But then the rationale is: "management didn't approve it," instead of "It was this way when I came on board." There are other valid reasons, too; maybe it would be too expensive, maybe the downtime (or risk of downtime) would be too great. But the reason given was "I inherited it," which isn't a good one, in my opinion.

    28. Re:bad article is bad by internewt · · Score: 2, Insightful

      Seriously?! Your job title is Network Administrator! Administer the damn network! It's what you were hired to do!

      And if management doesn't want to make the change?

      Get it in writing.

      Then when you are set up for a fall when the inevitable happens, you have something to cover your arse with.

      --
      Car analogies break down.
    29. Re:bad article is bad by pgmrdlm · · Score: 1
      Well, I was taught early on to have everything in writing when it came to management decisions. Everything from correspondence when testing where they said just move on, it's not a real problem. To decisions like what is being discussed here.

      Never trust management, even if you're part of it.

      --
      Anonymous comments are as pathetic as the anonymous "sources" that contaminate gutless journalism from the New York Time
    30. Re:bad article is bad by Anonymous Coward · · Score: 0

      Suck-start a shotgun. FTFY hasn't been funny since 1997.

    31. Re:bad article is bad by fishbowl · · Score: 1

      If I had multiple LTO-4 drives I'm sure I could use them. My system right now is to backup about a terabyte per day (within the compression), and the problem is that even with the best SAS drives I can get, I'm stuck with the amount of data that can be moved in 9 hours. More than that, and I can't have a verify pass. LTO-4 is the first format I've had the pleasure of using, where the media size wasn't horribly below the typical storage use case. "Hundreds of gigabytes" is starting to approach "reasonable critical daily backup space" except for audio/video media shops. There's really not much you can do for that situation, especially considering the typical budget.
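
For a rough sense of the numbers above, here is the arithmetic as a small sketch. Assumptions: "a terabyte" is taken as 10^12 bytes, the window is the 9 hours mentioned, and `required_rate_mb_s` is a hypothetical helper. The 120 MB/s figure is the published native (uncompressed) transfer rate for LTO-4.

```python
def required_rate_mb_s(data_bytes, window_hours, passes=1):
    """Sustained rate (MB/s) needed to move data_bytes within the window.

    passes=2 models a write pass followed by a verify pass over the same data.
    """
    seconds = window_hours * 3600
    return data_bytes * passes / seconds / 1e6


TERABYTE = 10**12          # assumption: decimal terabyte
LTO4_NATIVE_MB_S = 120     # LTO-4 native transfer rate

write_only = required_rate_mb_s(TERABYTE, 9)
write_and_verify = required_rate_mb_s(TERABYTE, 9, passes=2)
```

Under these assumptions the write alone needs about 31 MB/s sustained and write plus verify about 62 MB/s. Both are below the drive's native rate, which fits the point that the limit is how fast the data can be pulled off the source disks rather than the tape format.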

      --
      -fb Everything not expressly forbidden is now mandatory.
    32. Re:bad article is bad by adolf · · Score: 1

      If I were talented enough to suck-start a shotgun, I'd have a couple of ribs removed and never leave the house.

      FWIW. HAND.

    33. Re:bad article is bad by afidel · · Score: 2, Interesting

      Actual Cisco stuff (as opposed to Linksys gear with a Cisco badge) will discover a loop in an adjacent switch and shutdown the uplink port. Of course if you haven't turned on sw portfast the switch will do spanning tree which will keep the port from ever coming up, so yes better switches will definitely solve the problem. I had a network where the training room and C* row were serviced from the same 48 port switch, our very ADD CEO was in the training room trying to ignore a boring meeting and plugged two adjacent popups into each other, took down C* row but the upstream switch caught the problem so the rest of the company kept working. CFO threw a fit until the root cause was determined and then it was solved by putting a warning on the ports in the training room =)

      --
      There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
    34. Re:bad article is bad by deniable · · Score: 1

      Ah ha, I think you've found a way to lure the Viagra Spammers into the open. Does anyone have a spare car battery?

    35. Re:bad article is bad by adolf · · Score: 1

      Sorry about that.

      I've got a car battery, but I don't think it'll do the job unless you use it to crush their head. I do have a neon sign transformer handy, though. This, a couple of clip leads, and a pair of piercing needles should do the job.

    36. Re:bad article is bad by Anonymous Coward · · Score: 0

      Multiple servers, multiple tape drives.

    37. Re:bad article is bad by jamesh · · Score: 1

      D2D2T is a great way of solving the "my file is corrupt. I need the copy from wednesday" problem and the "our server room burnt down, here's some new servers, make it happen" problem. A good backup application can take periodic incremental backups to disk for point in time restoration, and then synthesize a full backup to tape every night, for total-loss restoration. A dedicated backup server does this better as you say. I use Bacula for this.
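
The "synthesize a full backup" idea can be sketched in a few lines. This is an illustration of the concept only, not how Bacula or any other product implements it; the dict-based backup format and the `synthesize_full` name are assumptions made for the example.

```python
def synthesize_full(last_full, incrementals):
    """Build a new full backup from the last full plus later incrementals.

    Each backup maps file path -> file content. In an incremental, a value
    of None marks a file that was deleted since the previous backup.
    Incrementals must be given oldest first.
    """
    result = dict(last_full)
    for incremental in incrementals:
        for path, content in incremental.items():
            if content is None:
                result.pop(path, None)   # file was deleted
            else:
                result[path] = content   # file was added or changed
    return result


sunday_full = {"a.txt": "v1", "b.txt": "v1"}
monday = {"a.txt": "v2"}
tuesday = {"b.txt": None, "c.txt": "v1"}

tuesday_full = synthesize_full(sunday_full, [monday, tuesday])
```

The incrementals stay on disk for point-in-time restores, while the synthesized full is what goes to tape for the total-loss case.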

    38. Re:bad article is bad by Anonymous Coward · · Score: 1, Funny

      Suck-start a shotgun. FTFY has never been funny.

      FTFY

    39. Re:bad article is bad by amentajo · · Score: 1

      Regardless, though, when you have mission critical data on a single drive you shut it down, put in a fire safe until you're ready to restore, whatever. But you don't just casually keep using it.

      And I'm sure that the subject of this story knows that very well now, having made that mistake. Depending on the software involved and the quality of the written instructions that the subject was given, it may not have been obvious to a non-professional that there was a risk of losing existing data contained on that tape.

      And who backs up a test database install anyway?

      Someone who wants to make sure that the backups are being properly created and can be restored successfully before leaving it alone for an extended period of time.

      the second story was procedural user error (do the backup every day, no matter what) being blamed on a technical problem (the backup system).

      I call it a wash. Yes, it was a user error to back up the test database to a tape with valuable data on it.
      However, the article implies that the user requested a $35 tape from her employer to use for new backups (it only actually says that the employer didn't authorize $35 for a second tape... not that a request was actually made). If this request were to have been approved and the user used the new tape for all future backups (and kept the old one read-only to preserve historical data), then the problem would not have occurred.

    40. Re:bad article is bad by gtall · · Score: 1

      This isn't really a data center story, but it does involve computers. We were installing a metal forming line in a factory in Birmingham, Ala. I had the software controlled flying die and another engineer had the strictly hardware controlled flying cutoff. The metal came off a huge coil through a loop into a feed to stop and through another loop to the flying die and then the flying cutoff to chop it to the correct length. The loops act like queues allowing the material to come or exit continuously while either being taken out or fed in stages. The feed to stop stamped some holes in the steel, the flying die stamped some other holes after the roll-former and then the flying cutoff did its thing. The flying cutoff die had a ball screw connected to a motor via a coupling. The coupling was filled with about 5/8 inch ball bearings, about 30 of these.

      We power up the line, the feed to stop is stamping, the flying die takes off and catches the line and the die comes under the press which chomps down and the holes are made. The point on the line where the cutoff is to be made comes up before the flying cutoff...which promptly takes off like a bat out of hell, fails to cut off, runs into the forward stops...wham, wham, wham...the motor trips out. We scratch our heads thinking. The hardware engineer insists on trying it again with the rest of us assuming since nothing has changed, it will do the same thing again. I'm sitting on top of the control box about 6 feet up to see the whole line. Line starts up, feed to stop does its thing, flying die does its thing. The point on the line where the cutoff is to be made comes up before the flying cutoff...which promptly takes off like a bat out of hell, fails to cut off, runs into the forward stops...wham, wham, blam...ball bearings flying everywhere. I hit the deck not wanting to die that particular day or lose my eyesight.

    41. Re:bad article is bad by deniable · · Score: 1

      And yet you got all the apostrophes in the right places.

    42. Re:bad article is bad by Anonymous Coward · · Score: 0

      *raises hand*

      of course we also run 10Mbps on our GigE stuff since the powers that be figure anything faster will DESTROY THE TUBEZ!

      Anon to hide my shame.

    43. Re:bad article is bad by mcgrew · · Score: 1

      Compile stopped: Syntax error

    44. Re:bad article is bad by mcgrew · · Score: 1

      It is my personal observation that this seems to be the best way to get anything on the front page: using the article text as the "summary". Isn't it nice to see that Slashdot submitters are so original in their writing skill?

      You can't always blame the submitter. Some stories I've submitted were posted verbatim, some were completely rewritten by the /. editors.

    45. Re:bad article is bad by Anonymous Coward · · Score: 0

      That is the base score, literally for getting your name correctly bubbled in.

      Congratulations.

    46. Re:bad article is bad by Cramer · · Score: 1

      Few people get to build and maintain the Perfect Network(tm). Even when building a new network from the ground up, there are often constraints limiting the design. (money being in the top two.) When you're handed a random network, rarely (closer to never) do you have the luxury of rebuilding it. Even the simple operation of cleaning up a decade of spaghettified wiring has its issues.

    47. Re:bad article is bad by Bengie · · Score: 1

      wait.. people still use tape?

    48. Re:bad article is bad by badkarmadayaccount · · Score: 1

      I think I see a market niche for archive drives, this tape business is annoying - simple, lots of spare blocks, low RPM, shock/temp resistant, easily replaceable and interchangeable across drive models control electronics.

      --
      I know tobacco is bad for you, so I smoke weed with crack.
  2. Oh really? by Iriscal · · Score: 0

    So this is why Comcast has been stonewalling me with their excuses.

  3. Network meltdown due to hub cross-connects by Florian+Weimer · · Score: 5, Interesting

    Can this really happen easily? I thought for really ugly things to happen, you need to have switches (without working STP, that is).

    1. Re:Network meltdown due to hub cross-connects by Lehk228 · · Score: 1

      a hub can also be a switch. I have worked with people who referred to both switches and repeaters as hubs

      --
      Snowden and Manning are heroes.
    2. Re:Network meltdown due to hub cross-connects by Pentium100 · · Score: 3, Informative

      This should work quite OK with hubs. A hub, after all, sends the packet to every port except the one where it came from. So two hubs in a loop should just forward the same packet back and forth all the time.
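
A minimal simulation of that back-and-forth, assuming idealized hubs that repeat every frame out of every port except the one it arrived on. The `count_repeats` function and the (hub, port) naming are made up for the sketch, and real hubs would also be colliding on a shared medium, which this ignores.

```python
from collections import deque


def count_repeats(ports_per_hub, cables, entry, max_events=1000):
    """Count how many times one frame is repeated before it dies out.

    cables -- list of ((hub, port), (hub, port)) pairs, one per patch cable
    entry  -- (hub, port) where the frame first arrives
    Ports with no cable are treated as end hosts that absorb the frame.
    Stops at max_events so a looped topology does not run forever.
    """
    # Each cable end points at the other end.
    link = {}
    for a, b in cables:
        link[a] = b
        link[b] = a

    pending = deque([entry])   # frames arriving at a (hub, port)
    events = 0
    while pending and events < max_events:
        hub, in_port = pending.popleft()
        events += 1
        # Repeat out of every port except the one the frame came in on.
        for port in range(1, ports_per_hub + 1):
            if port == in_port:
                continue
            far_end = link.get((hub, port))
            if far_end is not None:
                pending.append(far_end)
    return events


one_cable = [(("A", 1), ("B", 1))]
two_cables = [(("A", 1), ("B", 1)), (("A", 2), ("B", 2))]

normal = count_repeats(3, one_cable, entry=("A", 3))
looped = count_repeats(3, two_cables, entry=("A", 3))
```

With a single cable the frame is repeated once by each hub and then absorbed. With the second cable in place it never dies out, and the count only stops because of the `max_events` cap.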

    3. Re:Network meltdown due to hub cross-connects by omglolbah · · Score: 4, Informative

      Oh yes, it works quite well for sabotaging a network.

      It used to be a constant issue at LAN parties where "pranksters" would do it before going to sleep... Usually we never found them but when we did we flogged them with cat5 cables stripped of insulation :p

    4. Re:Network meltdown due to hub cross-connects by betterunixthanunix · · Score: 1

      I saw this happen at my high school once -- someone thought it would be funny to connect one port of an old switch to another port on that same switch. The entire network was flooded for a day while the IT staff tried to figure out where the switch was.

      That was years ago though, I would have thought that by now, these issues had been resolved.

      --
      Palm trees and 8
    5. Re:Network meltdown due to hub cross-connects by jimicus · · Score: 1

      It has in theory. Spanning tree should take care of it.

      Though I have seen interop issues which prevent any traffic from going between two different vendors' STP-enabled switches.

    6. Re:Network meltdown due to hub cross-connects by ianalis · · Score: 4, Interesting

      According to CCNA Sem 1, a hub is a multiport repeater that operates in layer 1. A switch is a multiport bridge that operates in layer 2. I thought these definitions are universally accepted and used, until I used non-Cisco devices. I now have to refer to L2 and L3 switches even if CCNA taught me that these are switches and routers, respectively.
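
The textbook distinction can be sketched as follows: a hub repeats to every other port with no memory, while a switch learns which source address lives behind which port and forwards accordingly. This is a simplified model with made-up names (`hub_receive`, `LearningSwitch`); real bridges also age out table entries and handle broadcast and multicast addresses.

```python
def hub_receive(num_ports, in_port):
    """Layer 1 repeater: send out of every port except the incoming one."""
    return [p for p in range(1, num_ports + 1) if p != in_port]


class LearningSwitch:
    """Layer 2 bridge: learn source addresses, forward by destination."""

    def __init__(self, num_ports):
        self.num_ports = num_ports
        self.mac_table = {}   # MAC address -> port it was last seen on

    def receive(self, in_port, src_mac, dst_mac):
        # Learn where the sender lives.
        self.mac_table[src_mac] = in_port
        out_port = self.mac_table.get(dst_mac)
        if out_port is None:
            # Unknown destination: flood, exactly like a hub would.
            return hub_receive(self.num_ports, in_port)
        if out_port == in_port:
            # Destination is on the segment the frame came from: drop it.
            return []
        return [out_port]


switch = LearningSwitch(4)
first = switch.receive(1, "aa", "bb")    # "bb" not learned yet
reply = switch.receive(2, "bb", "aa")    # "aa" was learned on port 1
second = switch.receive(1, "aa", "bb")   # "bb" was learned on port 2
```

Note that a switch with an empty table floods like a hub, which is part of why a loop hurts on either kind of device.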

    7. Re:Network meltdown due to hub cross-connects by X0563511 · · Score: 2, Interesting

      It's so irritating when you ask for a hub, and someone hands you a switch. Stores do the same thing. It's hard enough to find hubs, let alone find them when the categorization lumps them together.

      No, I said hub. I don't want switching. I want bits coming in one port to come back out of all the others.

      You can do that with a switch, but getting a switch that can do that is a bit more pricey than a real hub...

      --
      For large sets, this will be our guide even unto death, for the LORD will work for each type of data it is applied to...
    8. Re:Network meltdown due to hub cross-connects by MoogMan · · Score: 1

      Reading TFA, it was almost certainly because STP wasn't set up correctly. For instance, if the switchport in question had bpduguard enabled then it would have become disabled as soon as the erroneous hub was added, resulting in a localised issue not a network-wide problem.

      It's an issue that many Network Engineers learn the hard way exactly once and fix quickly by reviewing their STP configuration and in many cases, introduce QoS for sanity.

      "We didn't do an official lessons learned [exercise] after this, it was just more of a 'don't do that again,'" says Bowers

      Well, apart from that guy.

    9. Re:Network meltdown due to hub cross-connects by coryking · · Score: 2, Insightful
    10. Re:Network meltdown due to hub cross-connects by coryking · · Score: 0, Offtopic

      Stupid slashdot botching my HTML...

    11. Re:Network meltdown due to hub cross-connects by Mad+Bad+Rabbit · · Score: 2, Interesting

      Cheap deep-packet inspection (using an old hub and Wireshark) ?

      --
      >;k
    12. Re:Network meltdown due to hub cross-connects by Anonymous Coward · · Score: 0

      BPDU guard would not help since hubs do not generate BPDUs as they are not bridges.

    13. Re:Network meltdown due to hub cross-connects by Shimbo · · Score: 2, Informative

      Can this really happen easily? I thought for really ugly things to happen, you need to have switches (without working STP, that is).

      Spanning tree can not deal with the situation where there is a loop on a single port, which you can do easily by attaching a consumer grade switch. There are various workarounds (such as BPDU protection) but they aren't standard, and require manual configuration. Once your network gets big enough, you probably can't afford not to use them, though.

    14. Re:Network meltdown due to hub cross-connects by pushing-robot · · Score: 4, Funny

      Ah, yes, what network technician hasn't felt the sting of the old "cat5 o' eight tails"?

      --
      How can I believe you when you tell me what I don't want to hear?
    15. Re:Network meltdown due to hub cross-connects by pushing-robot · · Score: 1

      Or they saw a loose cord and helpfully plugged it into the nearest available jack. Never attribute to malice...

      --
      How can I believe you when you tell me what I don't want to hear?
    16. Re:Network meltdown due to hub cross-connects by Anonymous Coward · · Score: 0

      Well, a router IS an L3 switch, even if it is frequently called a router. For that matter, a switch is an L2 router. Routing and Switching are largely two different names put on the same function. The major difference is how the tables get populated.

    17. Re:Network meltdown due to hub cross-connects by bsDaemon · · Score: 2, Informative

      There is such a thing as a Layer 3 switch. They have routing functionality built-in, mostly to reduce latency for inter-vlan routing across a single switch. Cisco makes devices called Layer 3 switches, which are different from routers.

    18. Re:Network meltdown due to hub cross-connects by tuxicle · · Score: 1

      Most switches with a management interface of some kind (this includes Netgear units with a "web" interface) allow you to designate a 'monitor' port that will let you do what you're describing.

    19. Re:Network meltdown due to hub cross-connects by omglolbah · · Score: 1

      When you're 16 working at a LAN party you get somewhat motivated when an 18 year old girl wearing duct-tape clothing (skimpy at that :p) wields such a tool :p

    20. Re:Network meltdown due to hub cross-connects by ColdWetDog · · Score: 3, Funny

      When you're 16 working at a LAN party you get somewhat motivated when an 18 year old girl wearing duct-tape clothing (skimpy at that :p) wields such a tool :p

      Yes, and now look at you. Years later, life wasted. Posting to Slashdot on a weekend.

      If only you had listened to your mother and gone into welding.

      --
      Faster! Faster! Faster would be better!
    21. Re:Network meltdown due to hub cross-connects by lumbercartel.ca · · Score: 1

      Ah, yes, what network technician hasn't felt the sting of the old "cat5 o' eight tails"?

      You're thinking of the Cat5 o' Nine Tails -- or maybe you just lost count because you were on the receiving end of one?

    22. Re:Network meltdown due to hub cross-connects by Anonymous Coward · · Score: 0

      Where you would have worked on your car the whole weekend while the wife complained about her doing all the housework all while you talked to your other motor enthusiast on the phone about the next car show you would attend... or dragrace.. or tractor pulling.. or what ever the fuck. No, I'll stay in computers.

    23. Re:Network meltdown due to hub cross-connects by Anonymous Coward · · Score: 2, Funny

      Ah, yes, what network technician hasn't felt the sting of the old "cat5 o' eight tails"?

      You're thinking of the Cat5 o' Nine Tails -- or maybe you just lost count because you were on the receiving end of one?

      <--- Joke

      O<-- You

      CAT5 cable has eight conductors inside it... hence the joke.

    24. Re:Network meltdown due to hub cross-connects by omglolbah · · Score: 1

      I had supportive parents that encouraged me to do what I wanted to do with my life :p

      I work with oil rig control systems and I'm happy doing that.

      My life is by no means perfect but I'm fairly happy where I am at the moment ;)

    25. Re:Network meltdown due to hub cross-connects by FlyMysticalDJ · · Score: 0, Redundant

      Cat o'nine tails? More like cat-5 cables! amirite?

    26. Re:Network meltdown due to hub cross-connects by Anonymous Coward · · Score: 0

      a cat5-of-9-tails, as it were

    27. Re:Network meltdown due to hub cross-connects by Anonymous Coward · · Score: 0

      Such things aren't useful anymore because 100mb "hubs" were rare and gig "hubs" simply don't exist. And these days, who wants to snoop a 100 or 10mb link?

      Do what everyone else does these days and get a switch that does port mirroring.

    28. Re:Network meltdown due to hub cross-connects by Pentium100 · · Score: 1

      He said "cheap". A dumb hub is cheaper than a managed switch.

    29. Re:Network meltdown due to hub cross-connects by AK+Marc · · Score: 1

      I don't know jack about that product, but on a glance it included Full-Duplex as a feature. My understanding was that hubs only do half. Either my recollection is wrong, or that is a switch that they are calling a hub.

    30. Re:Network meltdown due to hub cross-connects by AK+Marc · · Score: 1

      How can you tell a layer-3 switch from a router? As far as I can tell, it's an issue of counting ports, which isn't in any official definition.

    31. Re:Network meltdown due to hub cross-connects by Anonymous Coward · · Score: 0

      CAT6, on the other hand, would make a nice Cat6 o' Nine Tails. One of them would even be sturdy nylon. *ouch*.

    32. Re:Network meltdown due to hub cross-connects by bsDaemon · · Score: 2, Informative

      The physical difference is pretty much the key. The Layer 3 switch will have a bunch of Ethernet ports, but generally no serial ports (other than the console and auxiliary, of course). The layer 3 switch tends to push most of the work logic off onto ASICs rather than doing it in software on CPU time, too. That way you don't suffer much performance loss when routing between VLANs, but you wouldn't put one at a WAN uplink or network border.

    33. Re:Network meltdown due to hub cross-connects by pacman+on+prozac · · Score: 1

      The hub itself wouldn't generate any BPDUs, but since it just repeats electrical signals on the wire then it would be forwarding those from the next switch back up the loop (likely to be the same physical switch) so BPDU guard would still shut the port down.

      There are other loop protections, Cisco switches send loopback packets onto the line and will shut the link down if they see their own loopback packet again. It's a default setting so should work even if BPDU guard (and storm control etc) aren't enabled, unless it's specifically turned off with the "no keepalive" command.
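
      A toy version of the keepalive check described above, in Python. This is only a sketch of the idea (the port sends a loopback frame with the switch's own MAC and disables itself if that frame comes back); the class name, field names and frame format are invented for illustration and are not how IOS represents any of this.

```python
class SwitchPort:
    """Toy port that err-disables itself when it hears its own keepalive."""

    def __init__(self, name, switch_mac, keepalive=True):
        self.name = name
        self.switch_mac = switch_mac
        self.keepalive = keepalive   # False models "no keepalive"
        self.state = "up"

    def receive(self, frame):
        """frame is a dict with 'src' and 'type'. Returns True if accepted."""
        if self.state != "up":
            return False
        own_loopback = (frame["type"] == "loopback"
                        and frame["src"] == self.switch_mac)
        if self.keepalive and own_loopback:
            # Our own keepalive came back in: something downstream is looped.
            self.state = "err-disabled"
            return False
        return True


looped_frame = {"type": "loopback", "src": "00:00:5e:00:53:01"}

guarded = SwitchPort("Fa0/1", "00:00:5e:00:53:01")
guarded.receive(looped_frame)

unguarded = SwitchPort("Fa0/2", "00:00:5e:00:53:01", keepalive=False)
unguarded.receive(looped_frame)

print(guarded.state, unguarded.state)
```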

    34. Re:Network meltdown due to hub cross-connects by omglolbah · · Score: 1

      Or if you want to inflict some real damage just leave the pairs in their shielding to make a cat6-o-4+1 eviltail of pain and suffering

    35. Re:Network meltdown due to hub cross-connects by X0563511 · · Score: 2, Insightful

      Well.

      The foundry switch I was screwing around with today... wasn't letting the IP Engineer send all the vlans to the mirror port. I could only watch management traffic (STP, etc) and nothing of any actual use.

      It was great! Finally I got pissed off and shoved a homemade passive tap on the uplink and was -then- able to see the issue.

      A hub would have made this a 5 minute job.

      --
      For large sets, this will be our guide even unto death, for the LORD will work for each type of data it is applied to...
    36. Re:Network meltdown due to hub cross-connects by Anonymous Coward · · Score: 0

      A hub should not create a new collision-domain, so the two hubs should create an instant collision that stops all transmission due to the half-duplex collision detection algorithm. A switch creates separate collision domains on each port, so two switches can buffer and bounce a frame back and forth. This is the real distinction between hub and switch.

    37. Re:Network meltdown due to hub cross-connects by dawgs72 · · Score: 1

      Oh yes it can happen that easily. It happened at the University I was working at about a month ago. We were ghosting a batch of 30 new laptops and showing the new tech how to set up the computers to be imaged. He inadvertently created a loop-back, and caused the entire campus network to shut down.

    38. Re:Network meltdown due to hub cross-connects by Geoff-with-a-G · · Score: 5, Informative

      I'm CCNP, taking my CCIE lab next month, I'll give this a shot.

      Yes, the "cow goes moo" level definitions you get are "hub = L1, switch = L2, router = L3" but the reality is more complex.
      A hub is essentially a multi-port repeater. It just takes data in on one port and spews it out all the others.
      A switch is a device that uses hardware (not CPU/software) to consult a simple lookup table which tells it which port(s) to forward the data, and does so very fast (if not always wire-speed). Think like the GPU/graphics card in your PC. Something specific super fast.
      A router is a device that understands network hierarchy/topology (in the case of IP, this is mainly about subnetting, but there are plenty of other routed protocols) and can traverse that hierarchy/topology to determine the next hop towards a destination.
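
      To make the hub and switch descriptions above concrete, here is a small Python toy model. It is only a sketch: the class names are made up, MAC addresses are shortened to two letters, and a real switch does this lookup in hardware rather than in a dict.

```python
class Hub:
    """Multi-port repeater: whatever comes in goes out every other port."""

    def __init__(self, num_ports):
        self.num_ports = num_ports

    def forward(self, in_port, src_mac, dst_mac):
        return [p for p in range(self.num_ports) if p != in_port]


class LearningSwitch:
    """Learns which port each source MAC lives on, then forwards by lookup."""

    def __init__(self, num_ports):
        self.num_ports = num_ports
        self.mac_table = {}  # MAC address -> port

    def forward(self, in_port, src_mac, dst_mac):
        self.mac_table[src_mac] = in_port        # learn where the sender is
        out_port = self.mac_table.get(dst_mac)
        if out_port is None:                     # unknown destination: flood
            return [p for p in range(self.num_ports) if p != in_port]
        if out_port == in_port:                  # destination is on the same port
            return []
        return [out_port]


hub = Hub(4)
sw = LearningSwitch(4)

hub_out = hub.forward(0, "aa", "bb")   # hub always floods
first = sw.forward(0, "aa", "bb")      # "bb" not learned yet, so flood
reply = sw.forward(1, "bb", "aa")      # "aa" was learned on port 0
print(hub_out, first, reply)
```

      The switch only behaves like a hub for destinations it has not learned yet; after the first reply it sends frames to a single port.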

      Now, because of the protocol addressing in Ethernet and IP, these lend themselves easily to hub/switch/router = L1/L2/L3, but they're not really defined that way.

      These days, most Cisco switches (3560, 3750, 6500, etc) run IOS, the software which can do routing, and which uses CEF. CEF in a nutshell takes the routing table (which would best be represented as a tree) and compiles it into a "FIB", which is essentially a flat lookup-table version of that same (layer 3, IP) table. It also caches a copy of the L2 header that the router needs to forward an L3 packet. The hardware (ASICs) in the switches hold this FIB, and thus allow them to "switch" IP/L3 packets at fast rates and without CPU intervention, thus making them still "switches", even if they run a routing protocol and build a routing table.
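
      A rough Python sketch of the "routing table compiled into a flat lookup table" idea. It is not how CEF is implemented (real hardware uses tries/TCAM, not a linear scan); the function names, the example prefixes and the next-hop MAC strings standing in for the cached L2 header are all assumptions for illustration.

```python
import ipaddress


def build_fib(routes):
    """routes: list of (prefix, next_hop, l2_rewrite).

    Returns a flat list sorted longest prefix first, so the first
    match found during lookup is the most specific one.
    """
    fib = [(ipaddress.ip_network(p), nh, rw) for p, nh, rw in routes]
    fib.sort(key=lambda entry: entry[0].prefixlen, reverse=True)
    return fib


def lookup(fib, dst):
    """Return (next_hop, l2_rewrite) for dst, or None if nothing matches."""
    addr = ipaddress.ip_address(dst)
    for net, next_hop, rewrite in fib:
        if addr in net:
            return next_hop, rewrite
    return None


routes = [
    ("0.0.0.0/0", "192.0.2.254", "00:00:5e:00:53:fe"),
    ("10.0.0.0/8", "192.0.2.1", "00:00:5e:00:53:01"),
    ("10.1.0.0/16", "192.0.2.2", "00:00:5e:00:53:02"),
]
fib = build_fib(routes)

specific = lookup(fib, "10.1.2.3")
broader = lookup(fib, "10.9.9.9")
default = lookup(fib, "198.51.100.7")
print(specific, broader, default)
```

      The point is that the forwarding decision and the L2 header to write come out of a single table lookup, with no routing protocol involved at forwarding time.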

      Meanwhile, when Cisco refers to a "router" in marketing terms, they're talking about a device with a (relatively) powerful CPU, which can not only perform actual routing, but also usually more CPU-intensive inter-network tasks like Netflow and NBAR.

    39. Re:Network meltdown due to hub cross-connects by Anonymous Coward · · Score: 2, Funny

      It's a "cat5 o' NINE tails"

      Which is how you achieve 5 9's reliability, once you take it to your vendor's sales rep.

    40. Re:Network meltdown due to hub cross-connects by autocracy · · Score: 1

      I built a loopback plug. Two wires connecting 1 and 2 to 3 and 6.

      --
      SIG: HUP
    41. Re:Network meltdown due to hub cross-connects by autocracy · · Score: 1

      I built a loopback plug. Two wires that connected 1 and 2 to 3 and 6. Plug it into one port and you're done.

      --
      SIG: HUP
    42. Re:Network meltdown due to hub cross-connects by Anonymous Coward · · Score: 0
    43. Re:Network meltdown due to hub cross-connects by complete+loony · · Score: 1

      We've had some slightly more insidious issues at LAN parties, where PCs with multiple NICs and NAT configured were linked both to the LAN and directly to each other at the same time.

      --
      09F91102 no, 455FE104 nope, F190A1E8 uh-uh, 7A5F8A09 that's not it, C87294CE no. Ah! 452F6E403CDF10714E41DFAA257D313F.
    44. Re:Network meltdown due to hub cross-connects by Anonymous Coward · · Score: 1, Informative

      a hub can also be a switch. I have worked with people who referred to both switches and repeaters as hubs

      A repeater is a two-port device, traditionally they are half-duplex but you can find full-duplex versions. The only real use for them is to extend the length of an ethernet span- most ethernet networks don't need repeaters anymore, so other than some special commercial-grade ethernet extenders they're pretty hard to find.

      A hub is simply a multi-port repeater. Also traditionally half-duplex, although you can find them in full duplex nowadays. Data in one port echoes out all other ports, which makes a mess of your network really fast as you add machines to it. These days they are primarily used as a "poor man's" port-mirror or as paperweights. Most of the ones for sale in retail stores are still going to be 10 or 10/100, but in half duplex. Usually if you want full duplex you'll have to order online.

      A switch is a device which will only send the data to the correct port (unless you have another port mirroring it, in which case both get it). Most switches off the shelf are at least 10/100 full duplex, and it's pretty common to see them in 10/100/1000 full duplex these days. And they usually aren't much more expensive than a hub.. so most networks don't bother with hubs at all anymore.

      So you can make a switch operate as a hub if you want, but you can't make a hub act like a switch.

      In addition, managed switches these days often have layer 3 or higher functions built in, so what you get off the shelf might not really be a 'pure' switch. The line between switches and routers is certainly starting to blur these days.

      Cheap deep-packet inspection (using an old hub and Wireshark) ?

      If it's an old hub, see my above point regarding half vs. full duplex. In any event, if you chip out an extra $10 or $15 bucks over a hub you can get a managed switch with port mirroring capability, which will work a lot better and give much more fine-grained control over what traffic you want to sniff. I'm talking maybe 50 bucks max (quick google search shows some cheap full-duplex 10/100 managed switches with mirroring for under $45US)

      DISCLAIMER:

      It's been a while since I saw the CCNA exam, but I think Cisco officially considers the difference between a switch and a hub to be that the hub is a half-duplex repeater, and switches are full-duplex single-port forwarding devices. Just be aware that in the real world there is a lot more gray area than on the Cisco exams.

    45. Re:Network meltdown due to hub cross-connects by Anonymous Coward · · Score: 0

      You're right. You cannot do full duplex on a hub. Both packets would need to be sent out on all ports, giving you a nice collision.

      Also, hubs aren't 10/100. You cannot have data coming in at 100 mbit and going out at 10, which would be required in a hub. In a switch, it works fine as long as only part of the 100 mbit data is going to the 10 mbit port, and after that you start dropping packets. Hubs are usually 10 mbit, but do exist in 100 mbit.

    46. Re:Network meltdown due to hub cross-connects by evildarkdeathclicheo · · Score: 2, Informative

      Modern routers are actually switches, not routers. They use packet based switching, not processor based routing like their ancient predecessors. Hell even Cisco tried to fix this when they introduced the GSR (gigabit switch/routers) late last century. It is really "how" these devices direct traffic from one port to the next that defines what they "are", not what OSI layer they operate at. That said, it's still easier for people to understand using the old-school nomenclature.

    47. Re:Network meltdown due to hub cross-connects by AK+Marc · · Score: 1

      10/100 hubs are usually handled by being a two port switch and multi-port hub. Though they contain a switch, the only time they act as such is across the differing speeds.

    48. Re:Network meltdown due to hub cross-connects by Cramer · · Score: 1

      No one makes true hubs anymore. It would very likely be a 10base-something device to boot. Yes, there were (are) 100meg hubs -- there are two sitting in my lab -- but by then switches (aka bridges) were becoming the norm. Today, networking standards don't support such things -- 1G, 10G, 40G, 100G, etc. are all full duplex transports.

    49. Re:Network meltdown due to hub cross-connects by Cramer · · Score: 1

      ... or using a tap from the start would have made this a 5 min job.

    50. Re:Network meltdown due to hub cross-connects by Cramer · · Score: 1

      A L3 switch is a switch that can do IP (IPv6) routing. An unmanaged switch is never a "L3 switch". If your switch has an IP address for management only, then it isn't a L3 switch. There are L3 switches with as few as 4 or 8 ports and upwards to hundreds of ports, exactly like L2 switches -- the number of ports means nothing. You'll have to read the specs for the device.

    51. Re:Network meltdown due to hub cross-connects by Cramer · · Score: 1

      You should've left it with the "cow goes moo" definitions. The definition is the definition. Hubs operate at layer 1 -- they are, for lack of a better term, analog devices. (I have a 10base-2 hub ("repeater") and there isn't a single piece of digital anything in it.) Switches are layer 2 devices -- Frame Relay Switch, ATM Switch, Ethernet Switch, etc. They look at layer 2 information to decide what to do. Routers pay attention to layer 3 information -- IP (v4 or v6), IPX, AppleTalk, etc. How they go about doing this forwarding is irrelevant to the classification -- though almost all traffic forwarding, be it at layer 2 or 3, is done "in silicon" by specialized processors these days, leaving the general purpose CPU to handle configuration and general management (routing protocols, VLAN tables, dot11 authentication, etc.).

      Yes, there are devices that blur the line. But that doesn't change the definitions. Routing happens at layer 3. Switching happens at layer 2. That does not preclude a single device from doing both. What you call such a device is open to marketing meetings. :-) Lots of people call them "Layer 3 Switches"; Nortel calls them "Ethernet Routing Switches".

      BTW, the reason (some) Cisco switches run IOS is for consistency. There was a time when they wanted everything to run IOS. There are still some switches that run CatOS -- a hold over from the purchase of Kalpana(?). The current generation PIX/ASA firewalls have configurations that look very much like IOS.

    52. Re:Network meltdown due to hub cross-connects by Cramer · · Score: 1

      If it's done Right(tm), it should cause a continuous collision, which any good hub sees as a "jabber" condition and disables ("partitions") the port. A very cheap "virtual wire" hub, on the other hand, jams every port (and if left like this long enough, might actually burn out).

    53. Re:Network meltdown due to hub cross-connects by Cramer · · Score: 1

      Dumb, unmanaged switches don't do spanning-tree.

      At any rate, it shouldn't take DAYS to find a single device. Start unplugging cables until the flood stops... follow that cable and repeat.

    54. Re:Network meltdown due to hub cross-connects by Anonymous Coward · · Score: 0

      A hub is an easy way for an administrator like myself to spy on the traffic in the school I worked at. I just plug in the hub at the point I want to see and monitor all the traffic. I was glad I found a hub at my school.

    55. Re:Network meltdown due to hub cross-connects by Geoff-with-a-G · · Score: 1

      I guess we're gonna have to disagree here. "The" definition is not "the definition", since there's many definitions. If you want to appeal to an authoritative definition here, Wikipedia says...

      "The term commonly refers to a network bridge that processes and routes data at the data link layer (layer 2) of the OSI model. Switches that additionally process data at the network layer (layer 3 and above) are often referred to as Layer 3 switches or multilayer switches."

      http://en.wikipedia.org/wiki/Network_switch

      IOS on switches isn't only about consistency (else they wouldn't be rolling out a whole new generation of NX-OS) but rather about adding all the valuable routing and services code they've spent years developing to a wider array of devices.

      I can run BGP and OSPF on a 3560 "switch". You can't tell me that's a pure layer 2 device.

    56. Re:Network meltdown due to hub cross-connects by Cramer · · Score: 1

      IOS on switches isn't only about consistency (else they wouldn't be rolling out a whole new generation of NX-OS) but rather about adding all the valuable routing and services code they've spent years developing to a wider array of devices.

      You must be new, and never talked to Cisco engineers from the period. They were putting IOS on switches long before they had any L3 switches. The L2 hardware wasn't (and still isn't) part of the routing domain. Sure, they can run a routing process, but they don't route. They only have an IP address for management; in a perfect world, that network is isolated for security. The ONLY reason for running IOS on everything was to have the same interface on everything. Instead of having to understand the pix firewalls, the catalyst switches, the ios routers, etc., if they all run IOS, your engineers only have to understand one language. You don't have to have completely independent software development camps. And training becomes a lot less complicated.

      The vision hasn't exactly held together :-) IOS, IOS XR, NX-OS, Linux based ASA's, etc. However, from a bird's eye view, they're configured more-or-less the same way. (it's not as bad as CatOS vs. IOS.)

      [In Cisco circles, mls (multilayer switching) is something very different.]

    57. Re:Network meltdown due to hub cross-connects by X0563511 · · Score: 1

      I didn't mention I had to make the tap on the spot. Meh.

      --
      For large sets, this will be our guide even unto death, for the LORD will work for each type of data it is applied to...
    58. Re:Network meltdown due to hub cross-connects by Geoff-with-a-G · · Score: 1

      I won't burn too much more time trying to assert how many Cisco engineers I've talked to; I'll stick to the technical stuff.

      "Sure, they can run a routing process, but they don't route. They only have an IP address for management" is simply incorrect:
      http://www.cisco.com/en/US/docs/switches/lan/catalyst3560/software/release/12.2_50_se/configuration/guide/swiprout.html

      Do I need to paste command outputs from the Distro switches in my building which are routing between 96 different VLAN interfaces, all with IP addresses on them? Your statements are accurate for 2924/3548 generation switches, but modern "layer 3 switches" are actually layer 3 switches, capable of routing packets across network boundaries.

      If you're referring to the fact that the EARL ASICs on the 6500 supervisors are separate from the MSFC which runs the routing protocols, then that's correct but specific to that platform (and 7600's, which are almost the same hardware). However that's still the routing protocol, not the routing. The PFC lives on the supervisor and contains the Layer 3 forwarding information, thus "routing" those packets (L3 switching really, but you don't seem to believe in L3 switching).

      Classic "mls" is also much in the minority, as nearly all of Cisco's "multilayer switching" is done by CEF these days, not requiring a punt to the MSFC or "router" even for the first packet of a flow.

    59. Re:Network meltdown due to hub cross-connects by Anonymous Coward · · Score: 0

      Ouch, the painful truth of realisation!

  4. Router Plugged Into Itself by Anonymous Coward · · Score: 5, Funny

    Where I work a couple years ago one of the non-technical people decided to plug a router into itself. Ended up bringing down the whole network for ~25 people in a company which depended on the Internet (Internet marketing company).

    Unfortunately one of the tech guys figured it out literally as everyone was standing by the elevator waiting for it to take us home. We were that close to freedom :(

    1. Re:Router Plugged Into Itself by mister_dave · · Score: 1

      Internet marketing company

      Spam?

    2. Re:Router Plugged Into Itself by Anonymous Coward · · Score: 0

      The Question is, why did a nono-person decide to plug the router to anywhere..

    3. Re:Router Plugged Into Itself by X0563511 · · Score: 1

      Nah, those guys just lease dedicated servers until they get an abuse takedown, then move on (or bitch and whine to squeeze that server for all it's worth)

      --
      For large sets, this will be our guide even unto death, for the LORD will work for each type of data it is applied to...
  5. Don't try this at work... by alphatel · · Score: 2, Interesting
    • Plug all the ethernet-like T1 cables into a switch
    • Change the administrator password and forget what you changed it to
    • Hang everything off a single power strip, no UPS
    • Buy expensive remote management cards but don't bother to configure them
    --
    When the foot seeks the place of the head, the line is crossed. Know your place. Keep your place. Be a shoe.
    1. Re:Don't try this at work... by v1 · · Score: 3, Interesting

      - run thinnet lines along the floor under people's desks, for them to occasionally get kicked and aggravate loose crimps, taking entire banks of computers (in a different wing of the building) off the LAN with maddening irregularity

      - plug a critical switch into one of the ups's "surge only" outlets

      - install expensive new baytech RPMs on the servers at all remote locations, and forget to configure several of the servers to "power on after power failure".

      - on the one local server you cannot remote manage, plug its inaccessible monitor into a wall outlet

      honorable mention:

      - junk the last service machine you have laying around that has a scsi card in it while you still have a few servers using scsi drives

      --
      I work for the Department of Redundancy Department.
  6. Not using Cisco ACLs by Nimey · · Score: 3, Interesting

    Our entire network was brought down a few years ago when a student plugged a consumer router into his dorm room's port. Said router provided DHCP, and having two conflicting DHCP servers on the network terminally confused everything that didn't use static IPs.

    Took our networking guys hours to trace that one down.

    --
    Hail Eris, full of mischief...

    E pluribus sanguinem
    1. Re:Not using Cisco ACLs by omglolbah · · Score: 3, Insightful

      Amusingly anyone who ever worked as tech crew at a lan party knows that this is the first thing you look for... :p

    2. Re:Not using Cisco ACLs by GuldKalle · · Score: 2, Interesting

      I had that error too, on a city-wide network. The solution? Get an IP from the offending router, go to its web interface, use the default password to get in, and disable DHCP.

      --
      What?
    3. Re:Not using Cisco ACLs by jimicus · · Score: 4, Informative

      Hours?

      You get something on the network which has an IP from the offending DHCP server, use ARP to establish what that DHCP server's MAC address is, then look up the switches' own tables to figure out which port that MAC is plugged into, switch that port off, and wait for the equipment owner to start complaining. Takes about 3-5 minutes to do by hand, and some switches can do it automatically.
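
      The by-hand procedure above, sketched in Python. This assumes you have already dumped the ARP cache and each switch's MAC address table into plain dicts; the switch names, port names, addresses and the `uplinks` set are all hypothetical, and nothing here talks to real equipment.

```python
def locate_mac(mac, mac_tables, uplinks):
    """Return (switch, port) pairs where the MAC is seen on a non-uplink port.

    mac_tables: {switch_name: {mac: port}}
    uplinks: set of (switch_name, port) that lead to other switches.
    """
    hits = []
    for switch, table in mac_tables.items():
        port = table.get(mac)
        if port is not None and (switch, port) not in uplinks:
            hits.append((switch, port))
    return hits


def find_rogue_dhcp_port(server_ip, arp_table, mac_tables, uplinks):
    """arp_table: {ip: mac}. Returns (mac, [(switch, port), ...])."""
    mac = arp_table.get(server_ip)
    if mac is None:
        return None, []
    return mac, locate_mac(mac, mac_tables, uplinks)


arp_table = {"192.168.1.1": "00:11:22:33:44:55"}
mac_tables = {
    "core": {"00:11:22:33:44:55": "Gi0/1"},
    "dorm-3": {"00:11:22:33:44:55": "Fa0/17"},
}
uplinks = {("core", "Gi0/1")}   # core sees the MAC only via its link to dorm-3

mac, ports = find_rogue_dhcp_port("192.168.1.1", arp_table, mac_tables, uplinks)
print(mac, ports)
```

      Filtering out uplink ports matters because every switch between you and the offender will have learned the same MAC; only the edge port is the one worth shutting.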

    4. Re:Not using Cisco ACLs by Anonymous Coward · · Score: 0

      Uh, it took hours to figure out that was the problem genius.

    5. Re:Not using Cisco ACLs by TangoMargarine · · Score: 1

      Your signature is particularly appropriate in this situation :-)

      --
      Unity? Screw that: XFCE. Slashdot Beta? Screw that: SoylentNews. Australis? Screw that: Pale Moon. UX developers DIAF
    6. Re:Not using Cisco ACLs by X0563511 · · Score: 1

      Hmm. People seem to get an address from one of two subnets, randomly. I wonder what the problem could be!?

      That, and people seem to be afraid of firing up the ol' packet sniffer... it would have been REALLY clear (immediately) what the problem is, should someone do that.
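
      What the sniffer would show can be boiled down to a few lines of Python. This is a sketch that assumes you have already pulled (server identifier, offered address) pairs out of captured DHCP offers by whatever means; the function name and the addresses are made up.

```python
def unexpected_dhcp_servers(offers, known_servers):
    """offers: iterable of (server_id, offered_ip) seen on the wire.

    Returns {server_id: [offered_ip, ...]} for servers not in known_servers.
    """
    rogue = {}
    for server_id, offered_ip in offers:
        if server_id not in known_servers:
            rogue.setdefault(server_id, []).append(offered_ip)
    return rogue


offers = [
    ("10.0.0.2", "10.0.5.17"),
    ("192.168.1.1", "192.168.1.100"),
    ("10.0.0.2", "10.0.5.18"),
    ("192.168.1.1", "192.168.1.101"),
]
rogue = unexpected_dhcp_servers(offers, known_servers={"10.0.0.2"})
print(rogue)
```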

      If you don't have (or don't know how to make) a passive tap, GTFO.

      --
      For large sets, this will be our guide even unto death, for the LORD will work for each type of data it is applied to...
    7. Re:Not using Cisco ACLs by contrapunctus · · Score: 2, Funny

      I have done this error before :)

      What surprised me was that the linksys router assigned IP numbers up through the uplink connection. I thought that was impossible, guess not.

    8. Re:Not using Cisco ACLs by PrimordialSoup · · Score: 0

      hours to figure out..in which room of the dorm it was plugged in !

    9. Re:Not using Cisco ACLs by Gumbercules!! · · Score: 3, Insightful

      I have to agree with this guy. As soon as IP addresses started being assigned incorrectly, the first thing I would be doing is checking the DHCP server. ipconfig /all on a windows box (so maybe 3 seconds of typing) would give this answer.

      More to the point, though - why was another DHCP allowed on the network? Can your switches not block or refuse to route DHCP traffic from the wrong host?? Otherwise every single student who brings in their own wifi box is going to shut down the network.

    10. Re:Not using Cisco ACLs by eric2hill · · Score: 2, Informative

      Cisco switches have a wonderful feature called dhcp snooping.

      ip dhcp snooping
      Followed by
      ip dhcp snooping trust
      on your port that supplies DHCP to the network. This ensures that only the trusted port can hand out dhcp addresses, and as a bonus, the switch tells you which MAC has which IP.
      show ip dhcp snooping binding
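
      For anyone wondering what those commands buy you, here is a simplified Python model of the behaviour: server-side DHCP messages are only accepted on trusted ports, and the switch records which client got which address. It is a sketch of the idea, not of Cisco's implementation; the class, the message names as plain strings and the port names are assumptions.

```python
SERVER_MESSAGES = {"OFFER", "ACK", "NAK"}


class SnoopingSwitch:
    """Drops DHCP server replies from untrusted ports and tracks bindings."""

    def __init__(self, trusted_ports):
        self.trusted_ports = set(trusted_ports)
        self.client_ports = {}   # client MAC -> port the client was seen on
        self.bindings = {}       # client MAC -> (ip, port)

    def handle(self, port, msg_type, client_mac, ip=None):
        """Return True to forward the DHCP message, False to drop it."""
        if msg_type in SERVER_MESSAGES:
            if port not in self.trusted_ports:
                return False                       # rogue server: drop
            if msg_type == "ACK":
                self.bindings[client_mac] = (ip, self.client_ports.get(client_mac))
            return True
        # Client-side message (DISCOVER, REQUEST, ...): remember the port.
        self.client_ports[client_mac] = port
        return True


sw = SnoopingSwitch(trusted_ports=["Gi0/1"])
results = [
    sw.handle("Fa0/5", "DISCOVER", "aa"),
    sw.handle("Fa0/9", "OFFER", "aa", "192.168.1.50"),   # dorm-room router
    sw.handle("Gi0/1", "OFFER", "aa", "10.0.0.50"),      # real server
    sw.handle("Fa0/5", "REQUEST", "aa"),
    sw.handle("Gi0/1", "ACK", "aa", "10.0.0.50"),
]
print(results, sw.bindings)
```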

      --
      LOAD "SIG",8,1
      LOADING...
      READY.
      RUN
    11. Re:Not using Cisco ACLs by blair1q · · Score: 4, Insightful

      Or unplug it.

      The slow part is figuring out that that's the problem. The first time it happens to you.

      Which is why it's good to have oldbies around, to whom lots of weird shit has happened.

    12. Re:Not using Cisco ACLs by Nimey · · Score: 1

      *shrug* Most likely they'd never considered a "hostile" DHCP server on the network (lots of other things could have killed the network, so they thought), and had never seen what that looks like.

      OTOH we can't pay very well, so we can't get top-notch talent.

      --
      Hail Eris, full of mischief...

      E pluribus sanguinem
    13. Re:Not using Cisco ACLs by jimicus · · Score: 1

      *shrug* Most likely they'd never considered a "hostile" DHCP server on the network (lots of other things could have killed the network, so they thought), and had never seen what that looks like.

      OTOH we can't pay very well, so we can't get top-notch talent.

      My employer develops router firmware. Our engineers are experts at finding odd ways to kill the network ;)

    14. Re:Not using Cisco ACLs by Darth_brooks · · Score: 1

      That just tells you what it's plugged in to. Doesn't necessarily tell you *where* it is, it just narrows it down. and if you can't disable that switch port remotely....hoo boy...and since it's in a dorm you have the risk of multiple patches in a single room or worse, someone smart enough to say "hey, this doesn't work in my room, lemme try my friend's room down the hall..."

      Goes back to the old line "I've lost a server. Literally lost it. It's up, it responds to ping, i just cant *find* it."

      --
      There are some people that if they don't know, you can't tell 'em.
    15. Re:Not using Cisco ACLs by jimicus · · Score: 1

      You have no business running a network of any significant size without switches you can remotely disable ports on - and hopefully up to date documentation of which physical ports are patched into which switch ports, though I know from experience that such documentation tends to be obsolete before it's even completed.

    16. Re:Not using Cisco ACLs by sumdumass · · Score: 1

      Well, to be fair, this portion of the thread branched off of someone claiming it happened at lan parties. So you're right, there was no business, but it's a little difficult to get people to tell you what a MAC address is let alone what their MAC address is before connecting.

      And the biggest offender I have seen in this regard is the MS internet connection sharing BS where someone connects to the same port they would have connected to at home.

    17. Re:Not using Cisco ACLs by fluffy99 · · Score: 4, Informative

      Cisco switches have a wonderful feature called dhcp snooping.

      Not supported on many of the lower end Cisco edge switches. I believe it also interferes with DHCP relaying.

      Another great tool is "ip verify source vlan dhcp-snooping" which can be used to block traffic from IPs/macs that did not obtain their IP from the DHCP server. This nicely prevents users from statically assigning addresses and/or spoofing their mac address.

    18. Re:Not using Cisco ACLs by green1 · · Score: 1

      I had a friend years ago (who worked for the local cable company) who had 2 cable modems in the same house, at one point they ended up accidentally configured as follows:
      modem1 -> linksys router -> hub -> modem2
      This ended up causing havoc on the entire cable company's node at the time as the internal DHCP server ended up exposed to the external network and many other customers were reaching it instead of the cable co's server. Apparently his buddy in the cable company's NOC called him up and was quite unhappy with him....

      Now in the defence of cable networks, I believe that they are much more secure than they were at that time (must have been about 13 or 14 years ago) and I suspect a similar trick these days would not result in a similar level of havoc.

    19. Re:Not using Cisco ACLs by green1 · · Score: 1

      He said it was a city-wide network, often it's MUCH easier to do the software fix in such a case as locating the problem, and dispatching someone to the right location to deal with it can take significant amounts of time. That said, if you are in charge of a network of that size you should have the ability to turn off the port that it's connected to (rather a software version of unplugging it) and then you can wait for the idiot to call your help desk and administer the appropriate punishment.

      However more often than not, the person who finds the problem isn't one of the people trusted with such access, nor do they have any meaningful way of contacting anyone who does, or even of being taken seriously by the IT department, leaving their only option being to take matters in to their own hands and doing what the original poster suggested and simply logging in to the device and disabling it in any way they can.

    20. Re:Not using Cisco ACLs by lumbercartel.ca · · Score: 1

      Default router password lists are a very important tool for matters such as this. This is slowly becoming less useful though as more and more users are actually reading the product manuals and changing the administrator password from its default before unintentionally serving DHCP to the internet.

    21. Re:Not using Cisco ACLs by green1 · · Score: 1

      I don't think it has anything to do with reading manuals, manufacturers of such devices now include all sorts of "setup" apps which force users to do this, on the bright side it means fewer unsecured wireless access points lying around (many ship secure by default now) on the down side it means fewer unsecured wireless access points lying around (when you just need your net connection in a strange location!)

      I am finding that on more and more customer owned equipment though that I am no longer able to just sit down and enter "admin" in either the user or password field (or both) and get access. the next step is the manufacturers need to get their setup applications to put a sticker on the router with the settings info, it's all great to be secure, but when the users don't know the settings or password you're back to square one.

    22. Re:Not using Cisco ACLs by Anonymous Coward · · Score: 0

      I had a similar issue once that confused the hell out of me. It was an extremely small (4 employees or so?) healthcare staffing office, and typically speaking the symptom was that the server was not reachable. So of course after seeing that for myself, the first thing I tried was to start up a ping to try to get a handle on where the problem was.

      Okay, the ping isn't getting through so it's time to--what the hell? One got through. But now it stopped. And... okay, another one got through.

      Well son of a bitch. I was pretty confused at that point. Did the obvious things of course; checked IP addresses, etc. Long story short whatever piss-ass NICs they had bought did not assign any MAC addresses. Two of the NICs on their network (including the one in their server) had an address of 00:00:00:00:00:00. I just arbitrarily assigned some (which is probably not the world's greatest approach) and went about my day.
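
      Spotting that kind of thing is easy to script once you have a host-to-MAC inventory. A small Python sketch, with invented hostnames and addresses:

```python
from collections import defaultdict


def duplicate_macs(hosts):
    """hosts: iterable of (hostname, mac).

    Returns {mac: [hostnames]} for every MAC claimed by more than one host.
    """
    by_mac = defaultdict(list)
    for name, mac in hosts:
        by_mac[mac.lower()].append(name)
    return {mac: names for mac, names in by_mac.items() if len(names) > 1}


hosts = [
    ("server", "00:00:00:00:00:00"),
    ("frontdesk", "00:1A:2B:3C:4D:5E"),
    ("billing", "00:00:00:00:00:00"),
]
dupes = duplicate_macs(hosts)
print(dupes)
```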

    23. Re:Not using Cisco ACLs by socsoc · · Score: 1

      I've done this before without any ill results. What consumer router provides DHCP to WAN?

      Or better, what university thought it'd be a good idea to pass DHCP from ports dedicated to students?

    24. Re:Not using Cisco ACLs by Nimey · · Score: 1

      Couldn't tell you what kind of router that was - the student unplugged it before our networking guy tracked down which port it was on.

      As to the other thing, it was more a crime of omission. One supposes that our Cisco kit will default to passing any packet, including DHCP, and our guys didn't know how to block that.

      --
      Hail Eris, full of mischief...

      E pluribus sanguinem
    25. Re:Not using Cisco ACLs by nametaken · · Score: 1

      For all we know the student plugged in the LAN side.

      The network should have prevented this anyway.

    26. Re:Not using Cisco ACLs by niteshifter · · Score: 1

      ...

      Another great tool is "ip verify source vlan dhcp-snooping" which can be used to block traffic from IPs/macs that did not obtain their IP from the DHCP server. This nicely prevents users from statically assigning addresses and/or spoofing their mac address.

      The first - locking IP addys - is always true. The second is not when the physical port can be accessed.

    27. Re:Not using Cisco ACLs by Anonymous Coward · · Score: 0

      I haven't seen a Cisco managed switch (not counting their Linksys stuff) made in the past 5 years that doesn't have DHCP snooping. Maybe longer.

      DHCP snooping doesn't interfere with relaying. You just need to mark the uplink port as trusted (can't remember the command off the top of my head), so you only allow DHCP responses from that one port.
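
      The idea is simple enough to model in a few lines. This is a toy simulation of the trusted-port rule, not Cisco's implementation and not switch configuration; the class name and port labels are invented for the example.

```python
# DHCP message types that only a server should ever send.
SERVER_MESSAGES = {"OFFER", "ACK", "NAK"}

class SnoopingSwitch:
    """Toy model: server-originated DHCP is only accepted on trusted ports."""

    def __init__(self, trusted_ports):
        self.trusted_ports = set(trusted_ports)
        self.dropped = []  # (port, message type) pairs that were filtered

    def forward(self, ingress_port, message_type):
        """Return True if the DHCP message may be forwarded."""
        if message_type in SERVER_MESSAGES and ingress_port not in self.trusted_ports:
            self.dropped.append((ingress_port, message_type))
            return False
        return True

switch = SnoopingSwitch(trusted_ports={"uplink"})
print(switch.forward("uplink", "OFFER"))     # legitimate server behind the uplink
print(switch.forward("port17", "DISCOVER"))  # ordinary client request
print(switch.forward("port17", "OFFER"))     # rogue server on a user port
print(switch.dropped)
```

      Client messages pass from any port, which is why relaying keeps working as long as the port facing the real server is the trusted one.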

    28. Re:Not using Cisco ACLs by Anonymous Coward · · Score: 0

      Generally, no. Traditional switches are strictly layer-two devices; which means they don't know the packets they're carrying contain IP (let alone recognising DHCP in particular). Modern Cisco switches (apparently) have such 'look inside the packet' type features, however.

    29. Re:Not using Cisco ACLs by eric2hill · · Score: 1

      That used to be true on the older 1900 gear, but all the new 2960's have it. Even the 3500's had it in the later CatOS stack. It doesn't interfere with relaying as long as you're trusting the correct port upstream. You can even configure a router to relay DHCP requests to remote DHCP servers and PXE servers at the same time for remote-boot services without onsite servers.

      In any case, many of the network-based threats that are out have mitigation procedures available, as long as someone's willing to spend the time/effort/energy/money to implement them. It all winds up being a balance between reliability and cost, as usual.

      --
      LOAD "SIG",8,1
      LOADING...
      READY.
      RUN
    30. Re:Not using Cisco ACLs by ZorinLynx · · Score: 1

      Shouldn't take hours to find this.

      1. Fire up a network sniffer on a test machine.
      2. Renew the DHCP lease. Two responses will come in, the correct one and the one from the rogue server.
      3. We now have the hardware address of the bad server. Start at the core switch and follow the ports that have that address on them until you get to the offending end station.
      4. Disable the port on the switch.
      5. Kill* the user who plugged in the offending device.

      * - Punishment may vary, but death is surely the best way to keep the problem from happening again, and to set an example for the rest of the lusers. ;)
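
      Step 2 boils down to comparing whoever answered against the servers you know about. A minimal sketch, assuming you have already pulled the (server MAC, server IP) pairs out of the sniffer capture by hand; the addresses below are made up.

```python
def rogue_servers(offers, authorised_macs):
    """Return the (mac, ip) pairs of DHCP responders that are not authorised.

    offers: iterable of (server_mac, server_ip) seen after one lease renewal.
    authorised_macs: MAC addresses of the DHCP servers you actually run.
    """
    authorised = {mac.lower() for mac in authorised_macs}
    rogues = {(mac.lower(), ip) for mac, ip in offers if mac.lower() not in authorised}
    return sorted(rogues)

# Hypothetical capture: the real server plus a home router someone plugged in.
offers = [
    ("00:1A:2B:3C:4D:5E", "10.1.0.2"),
    ("00:22:6B:AA:BB:CC", "192.168.1.1"),
]
print(rogue_servers(offers, authorised_macs=["00:1a:2b:3c:4d:5e"]))
```

      The MAC it prints is the one to chase through the switches in step 3.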

    31. Re:Not using Cisco ACLs by ZorinLynx · · Score: 1

      I should add that THIS is why network switches should NEVER use DHCP to obtain their management IP addresses. I see this practice frequently and it makes me want to smack around the admin and teach them some common sense.

      Running around the building with a laptop because your switches can't obtain a DHCP lease is something no one should ever have to do!

    32. Re:Not using Cisco ACLs by jackbird · · Score: 1

      Verizon FiOS routers put the default (unique and reasonably secure) SSID and WEP key on the label next to the S/N

    33. Re:Not using Cisco ACLs by dbIII · · Score: 1

      Then you go to the next switch, and then the next in the chain etc. With old gear you are better off just walking around looking at what is plugged into the wall sockets if you've never seen that MAC address before. Or to save even more time, just walk over to where the software developers are, see which one looks guilty, then look under his desk.

    34. Re:Not using Cisco ACLs by dbIII · · Score: 1

      Can your switches not block or refuse to route DHCP traffic from the wrong host

      In the past only expensive switches could do this. Now mid-range switches can do it. Even now, if you can trust your staff or can put up with an outage from this every year or two, you can get around twice the number of ports on dumb switches for the same money.
      If you only have a handful of things on dynamic addresses the disruption can be minor, particularly if you firewall off potential troublemakers (the second outage was only one developer disrupting one other laptop).
      If I was in the situation of the admin described I would have attempted to migrate as much as possible to static addresses until a budget was available to upgrade the switches, but it would probably be low on the list of priorities. It's not something you expect to happen very often.

    35. Re:Not using Cisco ACLs by Darth_brooks · · Score: 1

      Of course not, but I've gone in to plenty of situations where a site has grown up around oddball hardware. It's nice when you've got a free hand to design everything in your own image from the ground up, but 99 times out of 100 you're walking in to someone else's career of mistakes.

      --
      There are some people that if they don't know, you can't tell 'em.
    36. Re:Not using Cisco ACLs by Anonymous Coward · · Score: 0

      DHCP servers have changed recently to by default ignore requests for the wrong IP. Before that change, default was to send a DHCPNAK reply to those requests, and when both DHCP servers do this, nobody is getting an IP.

      They will then keep using their old IP (if they have one), until the lease expires. So the result is (or was) that after a while, machines start dropping off the network one by one, or even falling back to the 169.254 (or whatever) autoconfiguration subnet.

    37. Re:Not using Cisco ACLs by X0563511 · · Score: 1

      Ick... in that case, this doesn't sound like fun.

      Still, a packet sniffer would solve the mystery in seconds. People really should stop being afraid of them.

      --
      For large sets, this will be our guide even unto death, for the LORD will work for each type of data it is applied to...
    38. Re:Not using Cisco ACLs by Cramer · · Score: 1

      see which one looks guilty

      Indeed. This fondly reminds me of finding a warez server on the corp. lan. From looking at the traffic graphs (mrtg) to standing at his desk... under 15 minutes. The machine in question... sitting on the desk across from his minus a 3com nic and a WD hard drive. He had "gone for a smoke." "Send him to my desk when he gets back," as I walk away with the still hot mini-tower.

      (Moral: if you're going to run a warez server, you might want to talk to the admin who's going to notice a 5000% increase in traffic.)

  7. Quad Graphics 2000 by Anonymous Coward · · Score: 5, Interesting

    In the summer of 2000 I worked at Quad/Graphics (printer, at least at that time, of Time, Newsweek, Playboy, and several other big-name publications). I was on a team of interns inventorying the company's computer equipment -- scanning bar coded equipment, and giving bar codes to those odds and ends that managed to slip through the cracks in the previous years. (It's amazing what grew legs and walked from one plant to another 40 miles away without being noticed.)

    One of my co-workers got curious about the unlabeled big red button in the server room. Because he lied about hitting it, the servers were down for a day and a half while a team tried to find out what wiring or environmental monitor fault caused the shutdown. That little stunt cost my co-worker his job and cost the company several million dollars in productivity. It slowed or stopped work at three plants in Wisconsin, one in New York, and one in Georgia.

    The real pisser was the guilty party lying about it, thereby starting the wild goose chase. If he had been honest, or even claimed it was an accident, the servers would have all been up within the hour, and at most plants little or no productivity would have been lost.

    The reality: a 20 year old's shame cost a company millions.

    1. Re:Quad Graphics 2000 by Anonymous Coward · · Score: 3, Insightful

      Why the fuck was the button unlabeled? That's the REAL MISTAKE.

    2. Re:Quad Graphics 2000 by FictionPimp · · Score: 5, Funny

      Well, where I work some maintenance genius decided that the location of the red button (near the entrance door) was too risky. They said people coming in the door could hit it while trying to turn on the lights.

      Their solution? They moved it to behind the racks. So every time I bend down to move or check something I have to be conscious not to turn off the power to the entire room with my ass.

    3. Re:Quad Graphics 2000 by Anonymous Coward · · Score: 1, Informative

      Someone needs a Molly-guard

    4. Re:Quad Graphics 2000 by X0563511 · · Score: 3, Informative

      Hmm, if only someone could invent some kind of cover to prevent accidental use...

      I think a compounding issue is that the facilities guy (or higher up) is a cheapass.

      --
      For large sets, this will be our guide even unto death, for the LORD will work for each type of data it is applied to...
    5. Re:Quad Graphics 2000 by Anonymous Coward · · Score: 0

      What label do you give to a big red button? All I can think of is "Big Red Button - Do Not Press."

    6. Re:Quad Graphics 2000 by Mark+Hood · · Score: 1

      We had a similar one - the doors to the switch room were slightly pushed into the room, so there was a one-foot return either side. The switch to unlock the doors was on one side, and thus invisible if you walked up to the doors and started looking.

      Guess where the emergency power off switch was? If you said 'right beside the door, unlabelled' you win a long weekend rebuilding disks from backups.

      After the 2nd time the cleaner plunged herself into darkness, they added a label, molly-guard and moved the button.

      --
      Liked this comment? Why not buy me something nice
    7. Re:Quad Graphics 2000 by Ben4jammin · · Score: 1

      I may be getting cynical in my old age (about to hit the big four oh)...but if I am troubleshooting this, I am checking for a button press REGARDLESS of what anyone says about not pushing it. I have yet to see a button (labeled or not) that doesn't get pressed at some point. Of course, due to my cynicism I would have had a camera pointed at said button.

    8. Re:Quad Graphics 2000 by Anonymous Coward · · Score: 0

      That little stunt cost my co-worker his job

      What exactly cost him his job? Pressing the button, or lying about it?

    9. Re:Quad Graphics 2000 by Sulphur · · Score: 1

      This is not the button you are looking for. Nothing to press here. Move along.

    10. Re:Quad Graphics 2000 by sjames · · Score: 1

      The real pisser was the guilty party lying about it, thereby starting the wild goose chase. If he had been honest, or even claimed it was an accident, the servers would have all been up within the hour, and at most plants little or no productivity would have been lost.

      It's a combination effect. Part one is a control that has an effect way out of line with its actuation (just one press and everything goes to hell). Buttons like that need to have a pin to be pulled first, glass to break, something.

      Part two is a culture that places more emphasis on assigning blame and punishing the blamed party than on just solving the problem and getting on with life. He probably covered it up because he thought telling even if it was an accident would get him fired.

    11. Re:Quad Graphics 2000 by KiloByte · · Score: 1

      Hey, that cover is not enough to stop a crafty 3-year-old, let alone a curious temp.

      As a toddler, at mom's work (kindergarten was only for families of members of the Party...), I turned off a row of disk drives -- of the washing machine sized kind, standing far away from the actual computers. They had such molly guards and had their power switches far up, but it was a matter of standing on my toes and lifting the flap. And I repeated that twice on other drives before getting stopped. And funnily enough, they banned kids from that place only when I did it again on another day :p

      --
      The creatures outside looked from Alt-Right to Antifa; but already it was impossible to say which was which.
    12. Re:Quad Graphics 2000 by drsmithy · · Score: 5, Funny

      One of my co-workers got curious about the unlabeled big red button in the server room. Because he lied about hitting it [...]

      At a previous job we had one of these (albeit with a "Do not push this, ever" label above it) that did nothing more than set off a siren and snap a photo of the offender with a hidden camera. Much amusement was had by all when some new employee's curiosity inevitably got the better of them.

    13. Re:Quad Graphics 2000 by Anonymous Coward · · Score: 0

      Oh my God, best post ever, man. I almost pissed my pants.

      ROFL

      Thanks, I'm going to do that at my work

    14. Re:Quad Graphics 2000 by afidel · · Score: 1

      Buttons like that need to have a pin to be pulled first, glass to break, something.

      Yep, ours has a Molly cover and then a ring behind the button on the plunger that keeps it from being pressed until you break a row of beads and pull the ring out.

      --
      There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
    15. Re:Quad Graphics 2000 by vertinox · · Score: 1

      The reality: a 20 year old's shame cost a company millions.

      No. The person too lazy to label the button cost the company millions of dollars.

      It's like throwing a fish at a bear and not expecting him to eat it.

      Oh and don't just label it "Do not press!" that will just entice someone.

      "Emergency Shutdown" would have sufficed.

      --
      "I am the king of the Romans, and am superior to rules of grammar!"
      -Sigismund, Holy Roman Emperor (1368-1437)
    16. Re:Quad Graphics 2000 by Anonymous Coward · · Score: 0

      It was labelled, it was RED. Red = kill switch.

    17. Re:Quad Graphics 2000 by QuantumBeep · · Score: 1

      Molly?

  8. Obligatory: The Etherkiller by Anonymous Coward · · Score: 2, Funny
    1. Re:Obligatory: The Etherkiller by lumbercartel.ca · · Score: 1

      V.35 Killer

      Taking serial to new extremes. T-1 down and telco says it's not their equipment that's at fault? Take matters into your own hands and assure them it's their problem.

      Ha ha, this one's a classic! But nobody would ever do this -- after all, everyone loves their phone company!

  9. From TFA by ep32g79 · · Score: 1

    Sure, technology causes its share of headaches, but human error accounts for roughly 70% of all data-center problems.

    And 70% of all statistics are made up on the spot.

    1. Re:From TFA by Anonymous Coward · · Score: 0

      Sure, technology causes its share of headaches, but human error accounts for roughly 70% of all data-center problems.

      And 70% of all statistics are made up on the spot.

      According to the Uptime Institute, a New York-based research and consulting organization that focuses on data-center performance, human error causes roughly 70% of the problems that plague data centers today. The group analyzed 4,500 data-center incidents, including 400 full downtime events

      70% of all Slashdot readers who make snarky comments don't read the actual article.

    2. Re:From TFA by lumbercartel.ca · · Score: 1

      Assuming this study wasn't entirely automated, what was the margin of error?
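
      For a rough feel only: if the 4,500 incidents could be treated as a simple random sample (the article does not say they were, so this is purely an assumption), the usual 95% margin of error for a proportion works out like this.

```python
import math

def margin_of_error(p, n, z=1.96):
    """Normal-approximation margin of error for a sample proportion.

    p: observed proportion, n: sample size, z: 1.96 for roughly 95% confidence.
    """
    return z * math.sqrt(p * (1 - p) / n)

moe = margin_of_error(0.70, 4500)
print(f"70% +/- {moe * 100:.1f} percentage points")
```

      That covers sampling noise only. It says nothing about how incidents were selected or how "human error" was classified, which is probably where the real uncertainty is.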

  10. Video by AnonymousClown · · Score: 5, Funny
    Here's a video of a tech worker explaining why these things happen.

    It's very disturbing and you'll see why these things happen.

    --
    RIP America

    July 4, 1776 - September 11, 2001

    1. Re:Video by Anonymous Coward · · Score: 0

      In related video :

      http://www.youtube.com/watch?v=F7DYbDoh0R8&feature=related

      Classic.

  11. Our University... by arhhook · · Score: 1

    Our University was brought to its knees when a student in the residence halls was putzing around and accidentally installed a DHCP server on his box. Because the effects were unknown to the student that installed the DHCP server, it took about a day before they knew what was going on and disabled his switchport on the network.

  12. I got a good one too! by debile · · Score: 1

    Someone plugged a home router into the network at the government office where I was doing consulting. (He wanted a switch to plug in a networked printer.)

    The router started giving 192.168.x.x IP to everyone on the floor, soon including a few servers (including the Lotus Notes one)

    Took 3 days for the admins to find out the source of the problem and where the router was... abysmal loss of productivity needless to say I gave them a good speech on not routing 192.168 packets on the network and isolating their networks.

    1. Re:I got a good one too! by tagno25 · · Score: 1

      Took 3 days for the admins to find out the source of the problem and where the router was... abysmal loss of productivity needless to say I gave them a good speech on not routing 192.168 packets on the network and isolating their networks.

      The biggest problem there is that the servers were getting their IP from a DHCP server.

    2. Re:I got a good one too! by Bengie · · Score: 1

      or that the servers are not on their own VLAN with an ACL that blocks DHCP from other VLANs

    3. Re:I got a good one too! by ledow · · Score: 1

      And that the switches weren't blocking DHCP from anything but the authorised DHCP server, and that it took 3 days to track down a rogue DHCP server (not hard, you usually get the MAC address in seconds, trace that to a port, disconnect the port and see who shouts that their network connection isn't working - if it's a remote switch on the end of that port, go to that switch, rinse and repeat).

      Hell, it would take less than an hour if you just pulled cables at random until that MAC disappeared.

      Like most of the things in the story - incompetent admins and IT setups allow human error to be amplified. Seriously, one of them is basically a hospital network not using spanning-tree.
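
      The "trace that to a port, go to the next switch, rinse and repeat" loop from the first paragraph can be sketched as below. It assumes you have dumped each switch's MAC address table and know which ports lead to which downstream switch; the switch names, ports and tables are invented for the example.

```python
def trace_mac(mac, start_switch, mac_tables, uplinks):
    """Follow `mac` from switch to switch until it lands on an edge port.

    mac_tables: switch name -> {mac: port the MAC was learned on}
    uplinks: (switch name, port) -> downstream switch name
    Returns the (switch, port) hops; the last hop is where the device sits.
    """
    path = []
    visited = set()
    switch = start_switch
    while switch is not None and switch not in visited:
        visited.add(switch)  # guards against a loop in the uplink map
        port = mac_tables.get(switch, {}).get(mac)
        if port is None:
            break  # this switch has not seen the MAC; trail ends here
        path.append((switch, port))
        switch = uplinks.get((switch, port))  # None means an edge port
    return path

# Hypothetical topology: core -> floor2 -> user port.
mac_tables = {
    "core": {"00:22:6b:aa:bb:cc": "Gi0/3"},
    "floor2": {"00:22:6b:aa:bb:cc": "Fa0/17"},
}
uplinks = {("core", "Gi0/3"): "floor2"}
print(trace_mac("00:22:6b:aa:bb:cc", "core", mac_tables, uplinks))
```

      The last hop is the port to disable before waiting to see who shouts.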

    4. Re:I got a good one too! by Yvan256 · · Score: 4, Funny

      192.168.x.x? That's amazing. I've got the same IPs on my luggage.

    5. Re:I got a good one too! by Anonymous Coward · · Score: 0

      I happened to be the 256th customer of a website provider, and my website ended up with the x.x.x.255 IP-address. After blaming slow DNS updates for a while I realized the unusual IP-address.
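
      Whether a .255 address is a perfectly good host or the subnet broadcast depends entirely on the netmask, and gear that assumes a /24 everywhere may get it wrong. A quick check with Python's standard ipaddress module; the 10.0.0.x addresses are just example values, not the ones from the story.

```python
import ipaddress

def is_broadcast(address, network):
    """True if `address` is the broadcast address of `network`."""
    net = ipaddress.ip_network(network)
    return ipaddress.ip_address(address) == net.broadcast_address

print(is_broadcast("10.0.0.255", "10.0.0.0/24"))  # True: last address of the /24
print(is_broadcast("10.0.0.255", "10.0.0.0/23"))  # False: ordinary host in the /23
```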

  13. Don't forget accidentally triggering the Halon by RogueWarrior65 · · Score: 1

    Way back in the day at the B.U. computer center, the machine room had an extensive Halon fire system with nozzles under the raised flooring and on the ceiling. Pretty big room that housed an IBM mainframe, about a half dozen tape drives, maybe 50 refrigerator-sized disk drives, racks and racks of magnetic tape, a laser printer the size of a small car, networking hardware, etc. etc. One day, the maintenance people were walking through and their two-way radios set off the secondary fire alarm. At that point, you had about 10 seconds to escape. Watching the security camera video afterward was highly entertaining. One moment you saw the operator standing in front of the consoles and the next you saw him bolting out of the double doors.

    1. Re:Don't forget accidentally triggering the Halon by Anonymous Coward · · Score: 0

      We've had that happen with our CO2 canisters. Somebody set off the smoke detectors and the guard on duty wasn't at his desk to cancel the 30 second countdown, so out went the power and in went all the CO2 gas. Took us 3 hours to cycle the air in the comp room and get everything rebooted again.

    2. Re:Don't forget accidentally triggering the Halon by Anonymous Coward · · Score: 0

      One time, we had a problem with two interns getting trapped in our serving room when one of them managed to trigger the alarm system. Doors locked, air pumped out, replaced by argon gas. Resetting the alarms wasn't a problem, and we managed to evacuate the bodies inside of some carpet rolls before anybody could ask too many difficult questions...

    3. Re:Don't forget accidentally triggering the Halon by Szechuan+Vanilla · · Score: 1

      Bah, amateurs: REAL computer operation personnel can breathe Halon...

      --
      This space intentionally left blank.
    4. Re:Don't forget accidentally triggering the Halon by mrchilly0 · · Score: 1

      We had a tech in one of our hubs go out for a smoke. He came back in, (this is where the details are sketchy) and he thinks he exhaled his last drag as he walked in the hub. An alarm sounded (later found out it wasn't even the smoke detector) and he hit the dump button. Video showed him tripping over cables as he ran to the OPPOSITE exit since that's the side he parked on. He is the only tech that I know in the company that was subjected to weekly drug tests for 3 months.

  14. Re:Video FTW by dsoltesz · · Score: 2, Insightful

    Thank you... you've single-handedly made spending my time on recycled, old digg news completely and totally worth it.

  15. My favourite human error - a true story by Kupfernigk · · Score: 5, Interesting
    This was a server room at an (unnamed) UK PLC. The air conditioning had remote management, and the remote management notified the maintenance people that attention was needed. So someone was sent out, on a Friday afternoon.

    When he arrived, most of the staff had gone home and the skeleton IT staff didn't want to hang around. So, they sent him away on the basis that his work wasn't "scheduled".

    Everybody came back on Monday to find totally fried servers.

    --
    From scarped cliff or quarried stone she cries "A thousand types are gone, I care for nothing, no not one."
    1. Re:My favourite human error - a true story by dirk · · Score: 5, Funny

      I have a better AC story. We had a second AC unit installed in the server room, as the first was cranking 24/7 and was just barely keeping up, with the thought that the 2 of them in tandem could handle the load. A few days after it was installed, we noticed the room was hot when we got in in the morning. Not enough to cause alarms, but hotter than it should be. As the day went on, it dropped, so we chalked it up to a one time fluke. This happened a time or 2 more throughout the week, but it always dropped during the day. Finally the weekend came, and it got hot enough to cause an alarm. We got in and the AC units kicked on without us actually doing anything, and the room started to cool down. We called our AC guys and they checked both systems and couldn't find anything wrong with either of them. Well, the same thing happened again that night. Finally, someone was there late, trying to see if they could see what was going on. Everything was fine throughout the evening, so they finally decided to leave. Luckily, they noticed as they walked out the door and flipped off the lights that the AC units both turned off. He went back in to verify, and when he turned the lights back on, the AC units both started again. Turned the lights off, and they both shut off again. The genius (lowest bid) company that we hired to install the new AC unit had wired both units into the wall switch for the lights! So when we were there checking, we had the lights on and everything worked perfectly. We went home for the day and turned off the lights, and the AC units. Needless to say, that company isn't even allowed inside out building anymore!

      --

      "Information wants to be expensive" - Stewart Brand, the same guy who said "Information wants to be free"
    2. Re:My favourite human error - a true story by Linker3000 · · Score: 2, Funny

      More AC fun - all in the same room - as refurbed into a computer room in the 1980s by the in-house maintenance team:

      1) They re-lined the walls, but also boxed in the radiators without turning them off - we had numerous AC engineers turning up and scratching their heads while they re-did their thermal load calculations until we realised our walls were warm to the touch.

      2) They put the AC stat on a pillar by the windows so in the summer, the heat radiation falling on the stat from outside made the AC run harder than required, which we really didn't notice as the room was 'cold', but come winter, the AC often wouldn't kick in until we had the stat relocated.

      Best cock-up I saw was a computer room with a 4ft under-floor void. There should have been a 4 inch void, but there was a major cock-up between architects and builders. The floor panels sat on some spookily-sized pillars (which must have been specially made) and the IT staff actually put some servers under the floor.

      --
      AT&ROFLMAO
    3. Re:My favourite human error - a true story by xenn · · Score: 1

      Needless to say, that company isn't even allowed inside out building anymore!

      freaky man, sounds like a dream I once had.

    4. Re:My favourite human error - a true story by internewt · · Score: 3, Funny

      Best cock-up I saw was a computer room with a 4ft under-floor void. There should have been a 4 inch void, but there was a major cock-up between architects and builders. The floor panels sat on some spookily-sized pillars (which must have been specially made) and the IT staff actually put some servers under the floor.

      Was Nigel Tufnel the architect?

      --
      Car analogies break down.
    5. Re:My favourite human error - a true story by SleazyRidr · · Score: 1

      I skimmed over your comment, I merged UK PLC into KFC, so then your last line read to me as;

      Everybody came back on Monday to find country fried servers.

      I lolled, then went back to reading it properly.

  16. Human error rate by frisket · · Score: 1
    Human error rate is enormously variable, but for infrequently-occurring tasks (those you only do occasionally, not every day), a value of between 1% and 2% is a useful approximation.

    I am fortunate in working in an organisation with perhaps the best and most competent ops manager I have ever worked with, but even with well-written procedures and well-trained ops staff, errors still occur — but very rarely.
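
    To put that 1% to 2% in perspective, here is a small sketch of how a per-task rate compounds over many tasks. It assumes every task fails independently at the same rate, which is a simplification; the task count is an arbitrary example.

```python
def p_at_least_one_error(per_task_rate, tasks):
    """Chance of one or more errors across `tasks` independent attempts."""
    return 1 - (1 - per_task_rate) ** tasks

for rate in (0.01, 0.02):
    chance = p_at_least_one_error(rate, tasks=50)
    print(f"{rate:.0%} per task over 50 tasks: {chance:.0%} chance of at least one error")
```

    So even a very good team should expect the occasional slip once the number of rarely-performed tasks adds up.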

  17. cascade failures by Velox_SwiftFox · · Score: 3, Interesting

    How can this leave out the standard cascade failure scenario?

    Trying to achieve redundancy, someone gets what they think is worst-case-30A of servers with multiple power supplies, plugs one power supply on each into one PDU rated 30A, one power supply into the other.

    They may or may not know that the derated capacity of the circuit is only 24A; the data center is unlikely to warn them, as they only appear to be using 15A per circuit at most.

    Anyway, something happens to one of the PDUs and the power is lost from it. Perhaps power factor corrections (remember the derating?) and cron jobs running at midnight on all the servers that raise the load high simultaneously. Maybe just the failure of one of the PDUs that was feared, causing the attempt at "redundancy".

    In any case, all of the load is then put on the remaining circuit, and it always fails. The whole rack loses power.
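
    The arithmetic behind this is short enough to check before racking anything. A minimal sketch using the numbers above (30A of servers, two 30A circuits) and assuming the 24A figure comes from the usual 80% continuous-load derating; the function name is made up.

```python
def failover_check(total_load_amps, breaker_rating_amps, derating=0.8):
    """Compare an A/B-fed rack's load against one circuit's usable capacity."""
    usable = breaker_rating_amps * derating
    per_circuit_normal = total_load_amps / 2  # load split evenly across A and B
    return {
        "usable_amps": usable,
        "normal_per_circuit": per_circuit_normal,
        "ok_normally": per_circuit_normal <= usable,
        # After one feed dies, the survivor carries the whole rack.
        "ok_after_failover": total_load_amps <= usable,
    }

print(failover_check(total_load_amps=30, breaker_rating_amps=30))
```

    With those numbers the rack looks fine at 15A per side and is over the limit the moment either side drops, so true redundancy means keeping the total load, not the per-circuit load, under the derated figure.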

    1. Re:cascade failures by omglolbah · · Score: 2, Interesting

      Yep, it is one of the specific steps when we define requirements for server racks. Sadly not all the customers pay attention and then yell for us to come fix the mess when they find out years later :p

      This is especially fun if the trip to the "datacenter" involves a helicopter ride to the oil rig where it is located :p

    2. Re:cascade failures by georgewilliamherbert · · Score: 1

      The sad part is, datacenter power people are now on the "avoid stranded power" trip trying to increase power efficiency (UPSes and PDUs running at 80% are much more efficient than those running at 50%). They don't seem to understand or be willing to provision to support one leg actually failing completely.

      They're handling the "one server out of tens has a power supply failure on one leg" failure, but not the "the whole rack flips to only using B power due to X"...

    3. Re:cascade failures by omglolbah · · Score: 1

      I see that too often unfortunately. Mostly early in projects when they are trying to find places to save money... Usually non-technical people asking questions about why the redundancy is so expensive.

      What it usually boils down to for us when it comes to 'selling' proper redundancy is reducing downtime due to maintenance.

      If you have a properly redundant power system you can take down the A side for maintenance or upgrades while B is up etc.

      On a side note.... I take perverse pleasure in doing redundancy and blackout tests :p
      There is something oddly satisfying about flipping a big red switch and watching -nothing- happen but a red line on a screen and a little buzzer going nuts *cackles*

    4. Re:cascade failures by Bengie · · Score: 1

      I see lots of high-end server grade UPSs claiming 97% efficiency at 80%+ load and 95% efficiency at 35%+ load. Is 80% really that great?

  18. Power strips (with on/off buttons) are bad by gavving · · Score: 2, Funny

    So I'm working in this company's datacenter on their networking equipment. But it's installed in such a crappy way that there's a floor tile pulled right next to the rack and the cables are run down into that hole. I'm working around on the equipment and step down into the hole by accident. At that point I notice that it's suddenly a lot quieter where I'm standing. I look down and realize I'd just stepped on the power button of a power strip that most of the networking equipment was plugged into. Oh Sh!t. At the time the room was empty except for me, so I quickly turn the strip back on. About the time the switches are just finishing coming back up, one of the company's IT guys comes in and asks if anything's going on. I look at him a little confused and say "I'm not sure, what's up?". The network's back up by the time they noticed it.... I probably should have admitted it, but no harm, no foul. :)

    1. Re:Power strips (with on/off buttons) are bad by Velox_SwiftFox · · Score: 3, Insightful

      Covering those power strip buttons with a hardened glob fixing them in the "on" position is what an electric glue gun is for.

    2. Re:Power strips (with on/off buttons) are bad by Anonymous Coward · · Score: 0

      Those are surge protectors. They are to stop fires if something is drawing too much power, or if lightning hits they save equipment.

      Please do not disable them.

    3. Re:Power strips (with on/off buttons) are bad by Anonymous Coward · · Score: 2, Informative

      The power switch on an extension strip is not a surge protector. The surge protection mechanism is usually an internal varistor. It's safe to disable the switch mechanism.

      Don't disable the breaker or fuse, however. That's suicidal.

    4. Re:Power strips (with on/off buttons) are bad by gedhrel · · Score: 1

      Agh.

      We've had a number of problems caused by contractors who behave like naughty children in this fashion. So much so that our briefing to people let into the machine room (our estates people will just let contractors in and leave them unsupervised!) includes "if you accidentally hit the big red button (that has a cover over it these days because it is right where you'd expect a lightswitch to be, and we've been stung by that before) or pour your marguerita into the UPS stack, yank out a power cable because you've climbed up a rack rather than getting the ladder*, or ANYTHING, do not try to turn stuff back on. Let us know what happened. We are aware that accidents happen and there will be no recriminations for honesty."

      * or the guy with a hacksaw in his hand who says: "whilst I was being careful and not sawing through your fibre runs, I noticed that someone had sawn through your fibre run. Honest." It really is a bloody circus.

    5. Re:Power strips (with on/off buttons) are bad by trapnest · · Score: 1

      Except where the switch also functions as a breaker of sorts. Glueing it closed will not disable the breaker, but it will prevent people from resetting it.

    6. Re:Power strips (with on/off buttons) are bad by Critical+Facilities · · Score: 1

      Those are surge protectors. They are to stop fires if something is drawing too much power, or if lightning hits they save equipment. Please do not disable them.

      If you think those "surge protectors" are going to save your gear from any significant surge, you've got another think coming. Those flimsy "circuit breakers" built in to them are more of a liability than they are a protection. Besides, do you really think a lightning strike is going to get from your electrical primaries, through your transfer switches, through your UPS, several breakers (with trip units), and come all the way through the PDU? Time to lose the power strip, it's bad practice.

    7. Re:Power strips (with on/off buttons) are bad by Anonymous Coward · · Score: 0

      What about hot gluing cards to the motherboard? I guess to prevent unseating.

  19. data centers 101 by ei4anb · · Score: 4, Funny

    Those data centers in the article sound huge, some may even have up to ten servers!

    1. Re:data centers 101 by cowboy76Spain · · Score: 1

      Well, those are the ones that will probably have fewer people, and with less experience, servicing them.... you can try to manage the first couple of servers with some "flexibility"; when you have hundreds of them everything must be done "by the book" or things definitely go wrong.

      When I got to my current job, a couple of servers (our first rack servers) were installed, and nobody was "in charge" of them. Being myself a guy with initiative, I did the best that I could with them even if I had only experience in programming. The second funny thing I found was that, when one of the mirrored disks failed and I called for a spare, I gave back the good one. The first funny thing is that the backup configured by the people who set up the machines didn't really back up anything of importance (it was funny because we found out about it just after #2).

      --
      Why can't /. have a rich-text editor? Editing your own HTML is so XXth century.
  20. Oops... by ReederDa · · Score: 1

    You've got to admit, although the results were disastrous, someone will remember this and have a good laugh over it. I am now.

  21. Electrical Contractors by EmagGeek · · Score: 1

    I can definitely relate to that one. I've never had one that didn't try to deviate from plan to increase their profit on the job. I've even seen them put breakers in a panel that weren't connected to anything to make it appear as if they ran the circuit, when all they did was piggyback a circuit on another one to save the cost of running the wire. By the time you find the problem, they're long gone.

    Gotta watch them like a hawk and make sure they do everything they're supposed to do.

    1. Re:Electrical Contractors by John+Hasler · · Score: 1

      > By the time you find the problem, they're long gone.

      That's why payment should not be authorized until the work has been inspected and signed off.

      --
      Warning: this article may contain humor, sarcasm, parody, and perhaps even irony. Read at your own risk.
  22. Mainframe days story by assemblerex · · Score: 5, Interesting

    The old tape machines (six foot tall) used to put out a tremendous amount of heat. Space is at a premium, so in the mainframe room the drives were normally put edge to edge, with one pushing air in and the other pulling air out. The machines had two 10-12" fans per unit, so stacking two or three units was fine. One site had so many machines side to side (over 7), the air coming out the last machine regularly set things on FIRE. It was not uncommon for the machine to ignite lint going through the stack, with it coming out the end as a small explosion like dust in a grain silo explosion. A fire extinguisher was kept on hand, and the wall eventually got a stainless steel panel because it was so common.

    1. Re:Mainframe days story by Idarubicin · · Score: 1

      One site had so many machines side to side (over 7), the air coming out the last machine regularly set things on FIRE. It was not uncommon for the machine to ignite lint going through the stack, with it coming out the end as a small explosion like dust in a grain silo explosion. A fire extinguisher was kept on hand, and the wall eventually got a stainless steel panel because it was so common.

      I call BS.

      Thermodynamics 101: If the air coming out of the last unit is hot enough to ignite things, then what is the minimum temperature of the stuff inside?

      I can maybe believe that there was some sort of electrical fault inside that was infrequently arcing (maybe when a dust bunny passed through the fans?) and that might have caused the apparent problem. But there's no way to have functional electronics that are hot enough to ignite organic matter.

      --
      ~Idarubicin
    2. Re:Mainframe days story by Cramer · · Score: 1

      The air setting things on fire, I doubt. However, the components being cooled certainly can get hot enough to spark lint. It's a Bad Idea(tm) but it can happen.

    3. Re:Mainframe days story by Cramer · · Score: 1

      This sounds a lot like the MaxTNT story at PSINet... those things are left-right vented. So a few rows of those things can stack a great deal of heat across a room. Their solution was bits of cardboard to deflect the air.

      I can personally attest to the chimney effect with a rack of USR Total Control modem shelves. The air into the bottom one was 20C; the air at the top (through 7 shelves) was 38C. Bottom line... don't stack them on top of each other, and don't put that many in a single rack.

  23. Don't forget the classics by coryking · · Score: 0

    I've seen a network brought down when a student (or employee) plugged their toy windows 2000 server into the campus network. Said "server" was configured as a domain controller (or whatever they called it before active directory, it's been a while). Toss in DHCP and their box got DOS'd as the entire campus tried using them for authentication.

    Good times. Can you even do that kind of thing these days?

    1. Re:Don't forget the classics by Ben4jammin · · Score: 1

      Yes, you can. Unfortunately the concept of broadcast traffic is well beyond some people who should know better. DHCP (with or without Active Directory) can make the network look "broken" to the average user.

      I work at a college and we have classrooms devoted to MS classes covering such things. So, we put small routers in the server room to serve just these specialty rooms and block said broadcast traffic (such as DHCP) from affecting the rest of the student network. Invariably, some instructor vastly overestimates his/her understanding of networking and removes the classroom from the router and puts it on the student network. Next time they or a student fires up a server, the entire student network is inoperable due to incorrect gateway and DNS settings.

      Even with such history, the instructors are allowed to have Router/WAPs with no input from IT. So it goes like this: 1) Router/WAP delivered to Campus A...within hours the student network at Campus A is "down" 2) Router/WAP delivered to Campus B...within hours the student network at Campus B is "down" 3) Router/WAP delivered to Campus C...within hours the student network at Campus C is "down"

      Of course each occurrence means someone from IT has to drop what they are doing and make (often an after-hours) trip to said campus to fix it. You would think technology instructors would be smart enough to figure out: the network was working, I plugged something in, the network stopped working...maybe I should unplug it? But sadly, they don't figure that out. In a related story, IT now has a supply of company bought WAPs sitting in a closet.

    2. Re:Don't forget the classics by fluffy99 · · Score: 1

      I've seen a network brought down when a student (or employee) plugged their toy windows 2000 server into the campus network. Said "server" was configured as a domain controller (or whatever they called it before active directory, it's been a while). Toss in DHCP and their box got DOS'd as the entire campus tried using them for authentication.

      Good times. Can you even do that kind of thing these days?

      Sure you can still do that today. All it takes is a poorly setup network and clueless sysadmin, just as it did back then.

      The damage from a rogue DHCP server isn't the instantaneous problem everyone describes, as clients normally try to go back to the previous DHCP server to renew their lease. Many OSs (though not all) will remember that server through a reboot, but most devices don't and they will disappear as they get rebooted.

    3. Re:Don't forget the classics by dbIII · · Score: 1

      Sure you can still do that today. All it takes is a poorly setup network and clueless sysadmin

      You don't have to be clueless, all it takes is having old equipment or cheap equipment. It's not the sort of thing that should bring a network down for hours if anyone with a clue is on site.

    4. Re:Don't forget the classics by Anonymous Coward · · Score: 0

      Funny sounds like my college district: Is this Junior or Upper Division, and Central California or elsewhere? :)

      I just finished up an automotive and a computer programming degree, and you know what the scary part was? The auto degree had *MUCH* more competent individuals. The other amusing part? The teachers with the lowest competencies had the highest degree levels (IE Auto teachers were mostly non-college graduates being 'grandfathered in' with the requirement they be actively working on a degree and have it within the next 4-8 years. Computer teachers for their part had to already have them, and were required to have a master's or above to teach programming.) However the most technically competent teachers for the most part had the least formal education, and in the few cases where their competency was matched by a degree they'd had a decade of work in some high stress field and 'retired' to teaching because they thought of it as a more enjoyable and productive field than their former one.

  24. FedEx, get insurance/ship your server by AnAdventurer · · Score: 3, Interesting

    When I was IT manager for a big retail mfg we had a cross-country move from the SF bay area to TN (closer to shipping hubs and lower tax rates). I was hired for the new plant, and I was there setting up everything (I did not know the company knew next to nothing about technology). The last thing shipped before the company shut down for the move was the data server, sent via 2-day FedEx. The CFO packed it up and shipped it out; as the driver pulled away from the bay the server fell off the bumper and onto the cement. They picked it up (looking undamaged in its box). When I opened it there was a shower of parts. A hard drive had detached from the case but not the cable and had swung around in that case like a flail. CFO had NOT INSURED the shipment or taken anything apart. That and much more to save $50 here and there.

    --
    6.8SPC TR of 550, l xwind at 6, drift rt at 26" drops 77". AT has 503 ft-lbs at 1403 fps. FT 0.86
    1. Re:FedEx, get insurance/ship your server by sjames · · Score: 1

      I can understand not insuring it though. I shipped a 1U fully insured. Double boxed w/ foam inserts and all. It arrived at its destination in a different box. The back was caved in exactly as would be expected if it were gored w/ a forklift. They refused to pay claiming it was "improperly packed".

      In the case you talk about, they SHOULD have paid anyway since they failed to take even minimal care.

    2. Re:FedEx, get insurance/ship your server by Anonymous Coward · · Score: 0

      We had a server that we shipped to one of our satellite locations after being configured at HQ, and one of the techs was supposed to follow a few days after for installation. We insured the package for the cost of the server; however much we paid (vendor) for it. Our (shipping carrier) dropped it off one of their conveyer belts. The front right side of the 5U chassis was messed up, and (vendor) declared the warranty void so we couldn't use it in production even if it did turn on. At that point we realized that we'd have to schedule another trip out at an additional expense. Lesson learned - Insure your packages for not only the value of the materials, but for whatever additional cost you'll incur for having to perform the replacement (airfare, hotel, etc).

    3. Re:FedEx, get insurance/ship your server by Anonymous Coward · · Score: 0

      The shipping company would likely deny a claim for more than the value of the server.

    4. Re:FedEx, get insurance/ship your server by dbIII · · Score: 1

      I have an almost identical story - was it FedEx for you too? In my case they refunded the shipping to the guys that sent it but I still had a forked server - functioning but with a bent front plate and zero resale value.

    5. Re:FedEx, get insurance/ship your server by sjames · · Score: 1

      It was. Unfortunately, this one wasn't at all salvageable, the mainboard was broken and the whole case was warped and bent too badly to even put another board in.

      There have been other incidents with other shippers. The one common thread is that no matter how obviously they mistreated the package, they NEVER apologized and paid up on the insurance.

    6. Re:FedEx, get insurance/ship your server by pnutjam · · Score: 1

      Servers always have zero resale value.

  25. How about... by denzacar · · Score: 1

    ICBM Launch Control - Moscow, Leningrad, Novosibirsk
    WAIT FOR PRESIDENT'S ORDERS BEFORE PRESSING

    Deathly silence after someone does press the button should be adequate punishment.
    Naturally, potential super-criminals, James Bond villains and right-leaning survivalist nationalist employees should be explained button's real purpose to avoid accidents caused by someone deciding to rid the world of communism during their lunch break.

    --
    Mit der Dummheit kämpfen Götter selbst vergebens
  26. DC close to water by jalewis · · Score: 0

    I worked in a datacenter that was two blocks from the harbor. The datacenter is on the second floor, but what the hell do you do if you're in the building and there is a flood, or if you're at home and have to get to the DC? It reminds me of New Orleans, but that didn't stop them from building it.

    1. Re:DC close to water by Dynedain · · Score: 1

      Honestly if there's a flood that prevents you from getting into the building, then the datacenter will probably be down anyways. All those underground telecom switching stations and power stations don't hold up well in flood conditions.

      --
      I'm out of my mind right now, but feel free to leave a message.....
  27. Big Red Button Story by trydk · · Score: 1

    Many years ago I worked at a mainframe installation (IBM S/360 to give you an idea of my age ;-). The computer was installed at the back of a huge room with plenty of space for expansion. For some incomprehensible reason BRBs (Big Red Buttons) were placed along the skirting board every ten feet or so, which had hitherto not been a problem -- with all the space nobody came near during daily (and nightly) operations.

    Every morning at around two AM a guy came with a load of cassettes containing cheques from the banks for clearing. He usually just opened the door to the room and shoved each cassette in to slide, like curling stones, across the floor to the cheque sorter.

    And one morning, well ... A cassette decided to slide all the way across the room and unerringly triggered one of the BRBs square on. Half a night's work to be redone.

  28. Data center power by PPH · · Score: 3, Interesting

    Back when I worked for Boeing, we had an "interesting" condition in our major Seattle area data center (the one built right on top of a major earthquake fault line). It seems that the contractors who had built the power system had cut a few corners and used a couple of incorrect bolts on lugs in some switchgear. The result of this was that, over time, poor connections could lead to high temperatures and electrical fires. So, plans were made to do maintenance work on the panels.

    Initially, it was believed that the system, a dually redundant utility feed with diesel gen sets, UPS supplies and redundant circuits feeding each rack could be shut down in sections. So the repairs could be done on one part at a time, keeping critical systems running on the alternate circuits. No such luck. It seems that bolts were not the only thing contractors skimped upon. We had half of a dual power system. We had to shut down the entire server center (and the company) over an extended weekend*.

    *Antics ensued here as well. The IT folks took months putting together a shut down/power up plan which considered numerous dependencies between systems. Everything had a scheduled time and everyone was supposed to check in with coordinators before touching anything. But on the shutdown day, the DNS folks came in early (there was a football game on TV they didn't want to miss) and pulled the plug on their stuff, effectively bringing everything else to a screeching halt.
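
    The dependency-aware shutdown plan described above can be sketched as a topological sort. This is only an illustration: the service names and the dependency map are hypothetical, not taken from the story, and a real plan would also carry times and sign-offs.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each service lists what it needs running first.
DEPENDS_ON = {
    "dns": set(),
    "ldap": {"dns"},
    "database": {"dns", "ldap"},
    "app_server": {"dns", "database"},
    "web_frontend": {"dns", "app_server"},
}


def power_up_order(depends_on):
    """Return an order in which every service starts after its dependencies."""
    return list(TopologicalSorter(depends_on).static_order())


def shutdown_order(depends_on):
    """Shutdown is the reverse: dependents go down before what they rely on."""
    return list(reversed(power_up_order(depends_on)))


if __name__ == "__main__":
    print("power up: ", power_up_order(DEPENDS_ON))
    print("shut down:", shutdown_order(DEPENDS_ON))
```

    In this toy map everything depends on DNS, so it comes up first and goes down last, which is exactly the ordering that was broken in the story.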

    --
    Have gnu, will travel.
    1. Re:Data center power by thegarbz · · Score: 2, Interesting

      Basic rules of redundancy. A UPS isn't!

      We had a similar situation to yours except we actually had a dual power system. The circuit breakers on the output however had very dodgy lugs on their cables which caused the circuit breakers to heat up, A LOT. This moved them very close to their rated trip current. When we eventually came in to do maintenance on one of the UPSes we turned it off as per procedure, naturally the entire load moved to the other. About 30 seconds later we hear a click come from a distribution board on the wall, and suddenly refinery operators were shouting panicked abuse through the 2ways to turn the damn thing back on.

      These UPSes fed the emergency shutdown system of an oil refinery. Operators don't like their naps interrupted.

    2. Re:Data center power by PPH · · Score: 1

      Basic rules of redundancy. A UPS isn't!

      Particularly when management looks at the loading on each UPS and rejects your request for funding for larger units because each is only loaded to 60% of capacity.

      But then this is true of everything from data centers to high voltage transmission lines. They start out designed with a capacity margin adequate for emergencies. But then management insists that they be run right up to their maximum thermal capacity under normal load conditions. And then something happens.
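
      The arithmetic behind the 60% complaint can be sketched as follows. It assumes identical units and that a failed unit's load is split evenly among the survivors; both are simplifying assumptions, and the function names are made up for illustration.

```python
def loads_after_failure(loads_pct, failed_index):
    """Per-unit load (percent of capacity) on the survivors after one unit fails.

    Assumes identical units and an even split of the failed unit's load.
    """
    survivors = [load for i, load in enumerate(loads_pct) if i != failed_index]
    if not survivors:
        raise ValueError("no surviving units to carry the load")
    extra = loads_pct[failed_index] / len(survivors)
    return [load + extra for load in survivors]


def survives_any_single_failure(loads_pct, limit_pct=100.0):
    """True if no survivor exceeds limit_pct whichever single unit fails."""
    return all(
        max(loads_after_failure(loads_pct, i)) <= limit_pct
        for i in range(len(loads_pct))
    )


if __name__ == "__main__":
    print(loads_after_failure([60.0, 60.0], 0))
    print(survives_any_single_failure([60.0, 60.0]))
```

      With two units at 60% each, the survivor would be asked for 120% of its rating, so a redundant pair only stays redundant while each unit sits at or below 50%.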

      --
      Have gnu, will travel.
  29. This Simply Demonstrates ... by smpoole7 · · Score: 1

    ... that an idiot with his/her hand on a switch, a breaker or a power cord is more dangerous than even the worst computer bug.

    (Judging from the houses that I see on my way to work each morning, some people shouldn't even be allowed to buy PAINT without supervision. And we provide them with computers and access to the Internet nowadays!)

    (If that doesn't terrify you, you have nerves of steel.)

    --
    Cogito, igitur comedam pizza.
  30. web guy vs sales dude by nuonguy · · Score: 1

    http://www.youtube.com/watch?v=7wRxASytPuQ is the most common reason servers go down. Come on, show of hands, how many of you have been a part of a scenario like this?

  31. Ethernet routing loops FTL by Just+Brew+It! · · Score: 1

    We've had multiple incidents nearly identical to one of the stupid tricks described in the article. One of our (former) techs had a habit of running two cables between the same pair of switches... or even plugging both ends of a single cable into the same switch! Needless to say, neither of these scenarios ends well.

    1. Re:Ethernet routing loops FTL by aXis100 · · Score: 1

      1) They are switching loops, not routing loops.
      2) If you enable spanning tree on all ports, you won't have any problems - in fact you can then make multiple connections for redundancy.
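
      A rough sketch of why both mistakes from the parent post create a loop. This is not spanning tree itself (the real protocol elects a root bridge and weighs port costs); it only flags links that close a cycle, which are the ones a loop-free topology would have to block. Switch names are hypothetical.

```python
def find_redundant_links(links):
    """Return the links that close a loop, in the order they were cabled.

    links: iterable of (switch_a, switch_b) pairs. Uses union-find: a link
    whose two ends are already connected creates a switching loop.
    """
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    redundant = []
    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            redundant.append((a, b))
        else:
            parent[root_a] = root_b
    return redundant


if __name__ == "__main__":
    cabling = [
        ("sw1", "sw2"),
        ("sw1", "sw2"),  # second cable between the same pair
        ("sw3", "sw3"),  # both ends of one cable in the same switch
        ("sw2", "sw3"),
    ]
    print(find_redundant_links(cabling))
```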

  32. Big red button: a true story by eparker05 · · Score: 2, Informative

    My mother, who is a database admin for a county office (and has been for a long time), was getting a tour of a brand new mainframe server in the basement of her department's building back in the early 80's. At some point during the tour a large red button was pointed out that controlled the water-free fire suppression system. When pressed it activated a countdown safety timer that could be deactivated when the button was pulled back out.

    Always wanting to try things for herself, she went to the red button at the end of the tour and pressed it. No timer was activated, instead a noticeable shutting down sound was heard as the buzzing of the mainframe died down. She accidentally hit the manual power-off button for the mainframe which was situated very close to the fire suppression button and happened to look similar.

    All the IT staff of that building got to go home early that day because the mainframe took several hours to reboot and it was already lunch. She was very embarrassed and I have heard that story many times.

  33. Ah, the memories! And lessons, too. by martyb · · Score: 5, Funny

    Ah, the memories! Here are some of the stories I've heard and/or witnessed over the years.

    1. Orientation: As a co-op student at DEC in 1980, I was told this (possibly apocryphal) story. On seemingly random occasions, a fixed-head disk drive would crash at the main plant in Maynard, Massachusetts. Not all of the drives, just a couple. Apparently the problem was isolated when someone was midway between the computer room and the loading dock. They heard the bump of a truck backing hard into the loading dock followed very shortly by a curse from the computer room! It apparently caused enough of a jolt to cause platters to tilt up and hit the heads... but only on the drives which were oriented north-south; those oriented east-west were not affected. So came the directive that all drives, henceforth, needed to be oriented north-south.
    2. Hot Stuff: Seems that a mini-computer developed a nasty tendency to crash in the early afternoon. But only on some days. Diagnostics were run. Job schedules were checked and evaluated. All the software and hardware checked out A-OK. This went on for quite a while until someone noticed that there was a big window to the outside and that in the early afternoon the sun's light would fall upon the computer. This additional heat load was enough to put components out of expected operational norms and caused a crash.
    3. Cool!: A friend of mine was a field engineer for DEC back in the day when minicomputers had core memory. He was called into a site where their system had some intermittent crashes. He ran diagnostics. All seemed to be within spec. He replaced memory boards. Still crashed. Replaced mother boards. Reloaded the OS from fresh tapes. Still crashed. He finally noticed that one of the fans on the rack was not an official DEC fan. Though it WAS within spec for airflow and power draw, it was NOT within spec for magnetic shielding... it would sporadically cause bit flips in the (magnetic) core memory. Swapping out the fan solved the problem.
    4. This sucked: Another place had a problem with a computer that would sometimes crash in the early evening after everyone went home for the day. Well, not everyone. The cleaning staff apparently noticed a convenient power strip on a rack and plugged their vacuum cleaner into it. The resulting voltage sag took down the server!
    5. Buttons: Every couple years, IBM would hold an open house where anyone in the community could come in and get a tour of the facility (Kingston, NY). This was back in 1984, IIRC. PCs were just starting to make an impact at this time... big iron was king. We're talking about a huge raised-floor area with multiple mainframes, storage, tape drives... MANY millions of dollars per system. A few hundred users on a system was quite an accomplishment back then and these boxes could handle a thousand users. We were also in the midst of a huge test effort of the next release of VM/SP. I had come in that Sunday afternoon to get several tests done (death marches are no fun). All of a sudden the mainframe I was on crashed. Hard. I'd grown accustomed to this as we were at a point where we were "eating our own dog food"; the production system was running the latest build of the OS. But, an hour later and it was STILL down. Apparently, a tour guide had led a group to one of the operator consoles and a child could not resist pressing buttons. Back in those days, booting a mainframe meant "re-IPL" Initial Program Load. Unless the computer was REALLY messed up and wouldn't boot. Only then would someone re-IML the system. Initial Microcode Load. Guess which button the kid pressed? It left the system in such a wonky state that it had to be reloaded from tape. All the development work of that weekend was lost and had to be recreated and rebuilt. (It was a weekend and backups were only done on weekday nights.) It took us a week to get things back to normal.
    6. Drivers: A friend of mine at IBM told me of an
  34. Obligatory by garyisabusyguy · · Score: 3, Funny
    --
    Wherever You Go, There You Are
    1. Re:Obligatory by lumbercartel.ca · · Score: 1

      That was hilarious. Will there ever be a sequel?

    2. Re:Obligatory by IMightB · · Score: 1

      Here's the orig WWW site. Bonus it appears that they have been busy lately and have 2 more episodes avail.

      http://www.thewebsiteisdown.com/

  35. Washer in the UPS by Bob9113 · · Score: 4, Interesting

    My favorite was at a big office building. An electrician was upgrading the fluorescent fixtures in the server room. He dropped a washer into one of the UPSs, where it promptly completed a circuit that was never meant to be. The batteries unloaded and fried the step-down transformer out at the street. The building had a diesel backup generator, which kicked in -- and sucked the fuel tank dry later that day. For the next week there were fuel trucks pulling up a few times a day. Construction of a larger fuel tank began about a week later.

    1. Re:Washer in the UPS by Cramer · · Score: 1

      I recall the story of the telco electricians "losing" a screwdriver. They dropped it (accidentally) into a 400A(?) -48VDC distribution panel. They couldn't even find the handle. And it didn't trip anything. :-)

  36. Know your colo contracts by 1984 · · Score: 2, Interesting

    I had one a few years back which highlighted issues with both our attention to the network behavior, and the ISP's procedures. One day the network engineer came over and asked if I knew why all the traffic on our upstream seemed to be going over the 'B' link, where it would typically head over the 'A' link to the same provider. The equipment was symmetrical and there was no performance impact, it was just odd because A was the preferred link. We looked back over the throughput graphs and saw that the change had occurred abruptly several days ago. We then inspected the A link and found it down. Our equipment seemed fine, though, so we got in touch with the outfit that was both colo provider and ISP.

    After the usual confusion it was finally determined that one of the ISP's staff had "noticed a cable not quite seated" while working on the data center floor. He had apparently followed a "standard procedure" to remove and clean the cable before plugging it back in. It was a fiber cable and he managed to plug it back in wrong (transposed connectors on a fiber cable). Not only was the notion of cleaning the cable end bizarre -- what, wipe it on his t-shirt? -- and never fully explained, but there was no followup check to find out what that cable was for and whether it still worked. It didn't, for nearly a week. That highlighted that we were missing checks on the individual links to the ISP and needed those in addition to checks for upstream connectivity. We fixed those promptly.
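
    The gap in the monitoring can be sketched like this: an end-to-end check stays green as long as any one link carries traffic, so each link needs its own alert. The link names and the status dictionary are hypothetical; how the up/down state is collected is left out.

```python
def evaluate_uplinks(link_up):
    """Return (connectivity_ok, alerts) for a dict of link name -> is_up.

    Connectivity is considered fine while at least one link is up, but every
    individual down link still produces its own alert.
    """
    if not link_up:
        raise ValueError("no links configured")
    down = sorted(name for name, up in link_up.items() if not up)
    alerts = [f"link {name} is down" for name in down]
    connectivity_ok = len(down) < len(link_up)
    if not connectivity_ok:
        alerts.append("no upstream connectivity")
    return connectivity_ok, alerts


if __name__ == "__main__":
    # The situation in the story: A silently dead, B carrying everything.
    print(evaluate_uplinks({"A": False, "B": True}))
```

    A connectivity-only check would report nothing wrong in that state, which is how the A link stayed down for nearly a week.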

    Best part was that our CTO had, in a former misguided life, been a lawyer and had been largely responsible for drafting the hosting contract. As such, the sliding scale of penalties for outages went up to one-month free for multi-day incidents. The special kicker was that the credit applied to "the facility in which the outage occurred", rather than just to the directly affected items. Less power (not included in the penalty) the ISP ended up crediting us over $70K for that mistake. I have no idea if they train their DC staff better these days about well-meaning interference with random bits of equipment.

    1. Re:Know your colo contracts by Jeremy+Erwin · · Score: 2, Informative

      Not only was the notion of cleaning the cable end bizarre -- what, wipe it on his t-shirt? -- and never fully explained,

      There are in fact, standard procedures for cleaning fibre optic cable.

    2. Re:Know your colo contracts by Jayfar · · Score: 2, Informative

      After the usual confusion it was finally determined that one of the ISP's staff had "noticed a cable not quite seated" while working on the data center floor. He had apparently followed a "standard procedure" to remove and clean the cable before plugging it back in. It was a fiber cable and he managed to plug it back in wrong (transposed connectors on a fiber cable). Not only was the notion of cleaning the cable end bizarre -- what, wipe it on his t-shirt? -- and never fully explained, but there was no followup check to find out what that cable was for and whether it still worked. It didn't, for nearly a week.

      Actually there's nothing odd about cleaning a fiber connection at all and it is a very exacting process (see link below). Apparently exacting in this case just didn't include re-inserting the ends in the right holes.

      Inspection and Cleaning Procedures for Fiber-Optic Connections
      http://www.cisco.com/en/US/tech/tk482/tk876/technologies_white_paper09186a0080254eba.shtml

    3. Re:Know your colo contracts by 1984 · · Score: 2, Informative

      That's what I was getting at -- it's not as if it's a simple case of blowing on the end to clear out some fluff. Detailed procedures, including not least unplugging the other end of said cable to make sure it's unlit, which would include finding said other end. And likely go and get the various items required for the cleaning procedure. Which would add up at least to a conversation or two, and perhaps one with us the customer discussing the topic. I'm not disagreeing with cleaning of fiber cables sometimes being necessary, but I didn't for a moment believe all that had actually gone on.

  37. Fun with PIX by mkiwi · · Score: 3, Insightful

    I had fun with a company awhile back. They are about 300 employees and ~90mil/year, so this is a small corporation.

    Anyway, the company was trying to get a VPN tunnel established to their China office, and they were having a hell of a time at it. The employees on the China side had no IT experience so everything was done remotely.

    It just so happens that one of the Chinese employees was recruited to make a change to the PIX firewall on the China side in order to get everything working. To our astonishment, it worked, and we had a secure VPN tunnel established.

    The problem was accounts in the US started to get locked out, alphabetically, every 30 minutes. Our Active Directory was getting tons of password crack attempts from inside our internal network. I was using LDAP to develop an application at the time, so naturally I was suspect for causing all these lockouts.

    Fast-forward a week. We look at the configuration of the Chinese firewall and it allowed all access from any IP address on the Chinese side. In other words, crackers were trying to get into our systems through our VPN tunnel in China. In effect, our corporate LAN had been directly connected to the Internet. Once we figured that out, I was free to go back to work and the network lived to see another day, but that incident caused major trouble for all our employees.

    Moral of the story: Don't trust a Chinese firewall.

  38. A year or so ago by Naznarreb · · Score: 1

    A year or so ago my company's entire nationwide network was taken down for several hours by a single misconfigured router in Texas.

  39. None of us are innocent. by BrokenHalo · · Score: 3, Interesting

    Good judgement comes from experience. And most experience comes as a result of bad judgement.

    Just about anyone who has been in the line of fire as sysadmin for long enough will recall some ill-conceived notion that caused untold trouble. Since my earliest experience with commercial computers was in a batch-processing environment, my initial mishaps rarely inconvenienced anybody other than myself. But I still recall an incident much later (early '90s) when I inadvertently managed to delete the ":per" directory on a Data General mainframe (more or less equivalent to /dev on a *nix box), then having to watch for about 45 minutes while my users' PIDs disappeared. I'll never forget that red-faced moment of knocking on my boss's door and letting him know he might want to leave his phone off the hook for the next hour...

    1. Re:None of us are innocent. by Helen+O'Boyle · · Score: 3, Interesting

      Good post title, BrokenHalo. I'll chime in with my two.

      1987, my first full-time job. I was a small ISV's UNIX guru. I wanted to remove everything under /usr/someone. I cd'd to /usr/someone and typed, "rm -r *", then I realized, hey, I know that won't get everything, better add some more, and the command became, "rm -r * .*". I realized, oh, no, this'll get .. too, so I better change it to: "rm -r * .?*". It took about 12 microseconds after I hit enter to realize that ".?*" still included "..". Yes, disastrous results ensued, even though I was able to ^C to avoid most of the damage, and I had the backup tape (back in the day, we used reels) in the tape drive just as users (other devs) began to notice that /usr/lib wasn't there. Yep, I have my own memories of red-facedly telling my boss, "oops, I did this, I'm in the process of fixing it now. Give me half an hour." In the future, "rm -r /usr/someone" did the trick nicely.

      Early 1990s, I was consulting in the data center of a company with 8 locations around the world. It contained the company's central servers that were accessed by about 700 users. Being a consultant, they didn't have a good place to put me, so I ended up at a desk in the computer room. Behind me was a large counter-high UPS that the previous occupant had used as somewhat of a credenza, and I carried on the tradition. That is, until the day I had put my cape on there, and the cape slid down and through one of those Rube Goldberg miracles caught the UPS master shutoff handle, pulled it down, and I heard about 30 servers (thank goodness there weren't more) powering down instantaneously. Amazingly, I lived, based on the ops manager pointing out to the powers that be that it was a freak accident and that others had been setting similar stuff in the same place for years. The cape, however, was not allowed back in the data center.

      Fortunately, I've had better luck and/or been more careful over the past 20 years.
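
For anyone wondering why ".?*" still bites: a minimal sketch, assuming a shell that expands dot patterns against every directory entry, "." and ".." included (many newer shells skip those two). It uses Python's fnmatch module as a stand-in for the shell's matcher; the file names are made up for illustration.

```python
from fnmatch import fnmatchcase

def matched(pattern, names):
    """Return the names that a glob-style pattern would select."""
    return [n for n in names if fnmatchcase(n, pattern)]

# Hypothetical directory listing, including the two special entries.
names = [".", "..", ".profile", "..hidden", "src"]

# ".?*" means: a dot, then any one character, then anything.
# ".." fits that description, so the parent directory is selected too.
risky = matched(".?*", names)

# A common workaround: require the second character not to be a dot,
# then separately pick up names that start with two dots plus more.
safer = matched(".[!.]*", names) + matched("..?*", names)

print("risky:", risky)
print("safer:", safer)
```

The safer pair of patterns never selects "." or "..", though naming the directory outright, as the post ends up doing, avoids the question entirely.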

    2. Re:None of us are innocent. by afidel · · Score: 3, Funny

      Wait, this was a 700 person company and they had single power source servers? Yeah the root cause of that one was not your cape =)

      --
      There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
    3. Re:None of us are innocent. by Anonymous Coward · · Score: 1, Interesting

      My co-worker was a Windows guy learning *nix commands on a Mac with OS X. He tried rm -r on a mount point that he had learned to map to our dev server housing the builds and source. Just like Mr. Jobs kept saying, it simply works. Luckily, the server was backed up nightly. The restore took about a day. Everybody got read-only access after that.

    4. Re:None of us are innocent. by kv9 · · Score: 1

      ...cape...

      superman is that you

    5. Re:None of us are innocent. by Anonymous Coward · · Score: 0

      A cape?!? Were you wearing your underwear over your pants, too?

    6. Re:None of us are innocent. by yurtinus · · Score: 1

      No, I'm betting it's just Cory Doctorow.

      --
      +1 Disagree
    7. Re:None of us are innocent. by dgower2 · · Score: 1

      Why do people type "*nix" instead of spelling it out?

      --

      Proverbs 21:19 It is better to dwell in the wilderness, than with a contentious and an angry woman.

  40. USB drive running mission critical WAFS by gagol · · Score: 4, Interesting

    I was employed at a 50-employee publicity company. They have a couple of offices across the country and need to share a filesystem through WAFS. The main repository for the WAFS was running off a USB drive, connected to the server with a cable that was too short. I pointed out the problem multiple times to my IT boss (no IT background whatsoever) without success, and tried to raise the issue with the owner of the company, also without success. Then one day the worst happened: the USB controller of the drive fried and we lost the last day of work. The Windows server went AWOL. It took an external consultant 3½ days to rebuild the main server, which was running AD, WAFS, Exchange and our enterprise database. It cost us an account worth $12 MILLION. The big boss then hired consultants and gave them over a thousand bucks to be told the exact same thing I had pointed out 3 months earlier when I audited the IT infrastructure. Two months later she came to me and asked how much it would cost to have a bullet-proof infrastructure. I told her to invest around 80K in a virtualisation solution with scripts to move VMs around when the workload changes, and to go with consolidated storage with live backups and replication. It was too expensive. Another three months passed; she hired some consultants and gave them thousands of dollars more to be told basically the same thing I had told her 3 months earlier... That is where I quit.

    --
    Tomorrow is another day...
    1. Re:USB drive running mission critical WAFS by Anonymous Coward · · Score: 0

      Well, what was your job? Sounds to me like you weren't involved in IT, but you think you were the IT Guru of the office. Get back to delivering the post.

    2. Re:USB drive running mission critical WAFS by Anonymous Coward · · Score: 0

      a USB drive running Women's Auxiliary Ferrying Squadron? I am drunk but that seems unlikely.

    3. Re:USB drive running mission critical WAFS by St.Creed · · Score: 1

      I had the same problem. Then I became a freelance 'datawarehouse architect' and now I'm one of those asshole consultants who get paid a lot to say the same things you already told your boss. Believe me, it's way more fun than being ignored :)

      --
      Therefore, by the (faulty) logic you're using, you're just a cow with a keyboard - osu-neko (2604)
  41. Happened at an auto plant, too. by Ungrounded+Lightning · · Score: 1

    Computer room was in middle of plant on second floor. Fire sprinkler pipe went through concrete floor under raised floor via a hole lined with a somewhat oversized pipe. Was small clearance around the pipe.

    Plant was shut down for model changeover (when the line workers go deer hunting and the plant engineers and related workers fix or change everything that needs fixing or changing.) Somebody was welding a cable rack near the ceiling and the smoke drifted up through the gap, into the space under the raised floor, and set off the halon. A decade's worth of dust and many of the raised floor sections went flying.

    Security responded to the alarm. No sign of fire. Per procedure they switched to the backup halon tank. Half an hour later...

    --
    Bantam Dominique roosters crow a four-note song. Once you've heard it as "Happy BIRTHday" you can't NOT hear it that way
  42. There is. Tubes. (possibly before you were born) by Kupfernigk · · Score: 2, Informative

    I don't know how old these tape machines were, but I can assure you that back in the day we had power systems that used vacuum tubes, and the tube space needed to be air cooled. The air temperature could reach several hundred Celsius if the fans stopped. Shortly after this would come the plop of inrushing air as the envelope of a KT88 collapsed at the hottest point. It would not be good design practice to series the units like this, but again back in the day thermal management wasn't even a black art. The last piece of electronic equipment I recall that used large power tubes in its control circuits was still in service in 1982, and the power resistors had to be replaced regularly because otherwise they would eventually burn out.

    --
    From scarped cliff or quarried stone she cries "A thousand types are gone, I care for nothing, no not one."
  43. My button story.... by Anonymous Coward · · Score: 0

    Supervisor said try this key in the fire pull station in room "x" that had a halon type fire suppression system. I asked what would happen if the alarm went off? Sup said there was a ~10 sec delay in which a press of the button above the pull station would stop the discharge. As you've already guessed, there was no delay --- instantly $20,000 of gas discharged into the room, half of which came from a nozzle aimed at my head (who thought of that layout?) while doors and vents slammed shut.

    Fastest I've ever spent money with nothing to show for it. $72,000,000 an hour if the discharge lasted only a second. Of course that pales with the U.S. Federal government which spent $302,511,415.53 an hour during 2009 according to this quasi source.

    note: sup actually took responsibility
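
The hourly figures above are easy to sanity-check. A back-of-the-envelope sketch, assuming (as the post does) that the whole $20,000 went out in one second; the variable names are only illustrative.

```python
SECONDS_PER_HOUR = 60 * 60
HOURS_PER_YEAR = 24 * 365  # 2009 was not a leap year

discharge_cost = 20_000      # dollars, from the post
discharge_seconds = 1        # the post's "only a second" assumption

# Spending rate if the discharge had somehow kept going for an hour.
halon_hourly_rate = discharge_cost / discharge_seconds * SECONDS_PER_HOUR

federal_hourly_rate = 302_511_415.53  # dollars per hour, as quoted in the post
# Annual total implied by that hourly figure.
federal_annual = federal_hourly_rate * HOURS_PER_YEAR

print(f"Halon:   ${halon_hourly_rate:,.0f} per hour")
print(f"Federal: ${federal_annual:,.0f} per year")
```

So the $72,000,000 an hour checks out, and the quoted federal rate works out to roughly $2.65 trillion over the year.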

  44. DS3 down- who kicked the cable? by toygeek · · Score: 1

    Working at a small web hosting company as senior tech support lead, plus junior sysadmin for 100+ servers, I had a very busy day explaining why people's websites were not coming online. Our 45 Mbps DS3 was down, with nothing but a 20 Mbps DS3 over ATM to handle the load. We ended up shutting down web services to reduce bandwidth consumption just so that people could check their email.

    This went on for 9 hours. Our ISP at the time was at a complete loss as to why our line was down. Their guys started at their POP in the San Francisco bay area and drove to every single POP along the way to Reno, NV (our location) until they got to Sacramento. There, they found a cable that had been bumped loose during maintenance done earlier in the day.

    We lost a ton of money that day. In return, so did they.

    1. Re:DS3 down- who kicked the cable? by Glendale2x · · Score: 1

      Was that Colossus?

      --
      this is my sig
    2. Re:DS3 down- who kicked the cable? by toygeek · · Score: 1

      IIRC it was UUNet

  45. Re:Ah, the memories! And lessons, too. by socsoc · · Score: 1

    but only on the drives which were oriented north-south; those oriented east-west were not affected. So came the directive that all drives, henceforth, needed to be oriented north-south.

    That seems counter-productive. They were oriented into the less optimal position?

  46. One print page! by antdude · · Score: 1
    --
    Ant(Dude) @ Quality Foraged Links (AQFL.net) & The Ant Farm (antfarm.ma.cx / antfarm.home.dhs.org).
  47. A classic by Anonymous Coward · · Score: 1, Interesting

    One of my favorite stories my grandfather told me is a story about a computer that would screw up its calculations at the same time every day, about 2pm. This was back in the 60s, when computers were rather large. Basically, if the accountants ran the job in the morning, everything checked out. But if they were to run the batch through in the afternoon, their results would all be off. After two days of checking all the standard stuff (bad memory modules, bad cooling, what have you), he noticed that there was this loud banging noise that would start in the afternoon. He went out of the office to the next set of offices over, which had a machine shop. Turns out the machine shop press would start running about 2pm every day, and that machine press happened to be on the same circuit as the adding machine, so it would draw off just enough power to screw with the results.

  48. Re:OT - Anyone know any LAN mapping software? by Anonymous Coward · · Score: 0

    Shut up, GayFucks.

  49. FOR HIRE: THE NORTH AMERICAN MIDDLE CLASS by Anonymous Coward · · Score: 0, Troll

    Is your country a backward shit hole, run by some totalitarian asshole?

    Well, buddy, your troubles are over!

    FOR HIRE: THE NORTH AMERICAN MIDDLE CLASS

    Unwanted, unloved and unemployed, Hire the people that built up the most advanced civilization on the planet to build up your country !

    Why be a third world shithole run by a tyrant when you can be the world's next super power!

    Call today!

    Operators, that speak fluent English, are standing by!

  50. Re:Ah, the memories! And lessons, too. by martyb · · Score: 2, Informative

    but only on the drives which were oriented north-south; those oriented east-west were not affected. So came the directive that all drives, henceforth, needed to be oriented north-south.

    That seems counter-productive. They were oriented into the less optimal position?

    Yes, I blew that one... Oops! But let me take this opportunity to point out something that I realized only after posting the GP post... That I was able to deduce the problem I had with the PBX, because I applied what I learned from the situation with the cleaning staff using a slot on a rack's outlet strip to plug in their vacuum cleaner.

    IOW, although some of these stories seem funny in retrospect, they can also prove to be great learning opportunities, too! I'm looking forward to reading the other posts in this thread. I should probably head over to the "daily wtf" web site, again, too.

  51. Onsite Training = Bad by Bruha · · Score: 2, Insightful

    I don't care where you work, if you're on site doing training, you're probably also sucked back into the work cycle. I see it all the time at work. I have always preferred offsite training: turn off the cell phones. It also helps if you have to use your laptop in the lab, because 99% of the time it means you cannot VPN into work, so email is not a concern either.

    I think my fellow data center operators would agree we're all understaffed, and I work on a network with hundreds of millions of customers using it on a 24/7 cycle. The other danger nobody speaks of is that some companies are too passive when it comes to testing redundancy. Half the time, while there's redundancy in the system to keep a DMZ up and running, there's no spare DMZ capacity to handle a true outage, such as a fiber ring failure that isolates the data center, or some other disaster. Companies need to design their redundancy so you can unplug the entire data center and your customers never know it, because if you do not, you will rue the day a true outage happens that impacts the entire data center and you will hear about it on the news later. Not a good thing.

  52. Air Force fun by Anonymous Coward · · Score: 1, Interesting

    Some Air Force instructors told us in class that one time one of the tech school instructors wanted to brush up on his cisco skills, so he asked the IT if they had any old routers lying around. They did have one that they thought was cleared out, so they gave it over to him telling him to play around with it. He made the wonderful mistake of plugging it into the network inside the building and it started propagating all the old router information all over the network, which was hooked in the unclassified base network.

  53. Why did you have to bring this up? by Anonymous Coward · · Score: 1, Interesting

    Why did you have to bring this up? You brought back bad memories of the time that I actually did this. I worked at the time as a computer operator in the Southland Corporation data center in Dallas. We had moved into our newly built headquarters building and there was a red light switch on the wall by the master breakers. We all wondered for days what the switch would do and I was the only one who eventually got brave (stupid?) enough to throw it. The master breakers to the computer immediately dropped out. So we tried to flip the master breakers up again, but they wouldn't budge. We had to call building maintenance, and after an hour of delayed production waiting for them to call in, we were able to get power to the computers back. We didn't know that you had to reset the breakers first by forcing them all the way down. I was scheduled to be promoted into programming any time, so I was really sweating it. The other computer room operators and the evening manager decided not to tell anyone who caused the breakers to trip. There was a major management inquisition about what had happened but everyone kept quiet. Finally, the evening manager was told that he had until the next day to out the culprit. He was going to do this the next day, but said that management decided to drop it. I was saved. A glass-covered wooden box was made to cover the switch. I was promoted to programmer shortly afterwards. Climb mountains, but don't ever flip switches because they are there.

  54. Plating everything in gold stops rust - but costs by dbIII · · Score: 1

    Not really - just that trust in human nature or physical hospital security is flawed, or there's a budget that means there is no choice other than to trust somebody to stick to the policy. Fully managed switches that could block DHCP used to be very expensive, and even now they are significantly more expensive than dumb switches that get every other part of the job done.
    Don't make the mistake of thinking a hospital considers computer networking part of its core business and upgrades networking components as often as a software company or a small office that doesn't have much equipment to upgrade.

  55. Magic/More Magic by Dadoo · · Score: 2, Informative

    I can't believe no one's posted Guy Steele's Magic/More Magic story, yet:

            http://everything2.com/user/Accipiter/writeups/Magic

    --
    Sit, Ubuntu, sit. Good dog.
  56. Another one by dbIII · · Score: 1

    I would usually turn off the server room lights on the way out the door but left them on one night. A non-IT guy leaving tired around 9pm opened the door, hit the light switch and the AC switch next to it by mistake. It took less than five minutes to hit 60C, big spike on the temperature graph before it all shut down. Luckily nothing had to be replaced.
    This encouraged a cover over the switch and a second AC unit. Then one day we lost one phase of power; both AC units were on the same phase and went down while the servers stayed up, so I shut down what I could and hunted around for industrial fans.

  57. Once when I was at Data Center camp... by rekoil · · Score: 1

    A Big Red Button incident knocked livejournal.com offline for 2 days back in 2003. I was working for their colo provider (the owner of said Button) at the time.

  58. "Magic Switch" story - did it really happen? by twosat · · Score: 1

    I stumbled onto a story of a PDP-10 with a mysterious "magic switch" some time ago; did it really happen or is it just a story? http://catb.org/jargon/html/magic-story.html

  59. Re:Ah, the memories! And lessons, too. by dkf · · Score: 1

    great learning opportunities

    That one goes in the euphemism file entry for "horrible disaster".

    --
    "Little does he know, but there is no 'I' in 'Idiot'!"
  60. Pure spin to put all the blame on the young guy by Anonymous Coward · · Score: 0

    You should have had the button labelled.
    If he didn't admit it, how do you know he did it?
    Also, the tech team were shown to be pretty damn stupid, not being able to track down the fact that a large red button had been pushed. Didn't they know that the button existed? If they knew it existed, were they aware of its significance? If so, then why the fsck did it take them so long to consider it?

    The newbie was at fault, but so were the team who failed to identify the problem for a day and a half.

  61. If it is important enough to insure... by PMBjornerud · · Score: 1

    I can understand not insuring it though. I shipped a 1U fully insured. Double boxed w/ foam inserts and all. It arrived at its destination in a different box. The back was caved in exactly as would be expected if it were gored w/ a forklift. They refused to pay claiming it was "improperly packed".

    Always take a picture of box and content before shipping anything.

    --
    I lost my sig.
    1. Re:If it is important enough to insure... by sjames · · Score: 1

      It's really sad that it's come to that, but it certainly has.

  62. Here's another one by Anonymous Coward · · Score: 0

    My dad told me this one:

    He was a tech installing a replacement network. Once the new network was up (some 40 PCs), most of them would run for a while, then suddenly crash and reboot. After checking cables and PCs for a couple of days he noticed that one PC didn't seem to have the problem, so he took a long look at that one. That PC was the only one not plugged into an earthed power socket... so the problem was current running from that PC through the earth wire of the network.

    Sometimes the one that works causes the problem!

  63. Oldie but a goodie by Locke2005 · · Score: 1

    Many years ago Northrop University had 2 PDP 11/34 boxes sitting next to each other. One day the sysadmin decided to network them together by connecting an RS-232 cable between them -- boom, both systems crashed. Reboot, try connecting again, same thing. Suddenly it dawns on him -- he neglected to turn off the echo on the ports used for the interconnection, meaning the first character sent got echoed back and forth in an infinite loop, generating interrupts on both machines faster than the CPU could handle them.
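
The echo loop is easy to reproduce on paper. A toy sketch, not a model of real PDP-11 serial hardware: each received character counts as one interrupt, and a port with echo enabled sends the character straight back. The function name and the event cap are assumptions made for illustration.

```python
from collections import deque

def simulate(echo_a, echo_b, max_events=1000):
    """Count receive interrupts after host A sends one character to host B.

    The run is capped at max_events because with echo enabled on both
    ports the character would otherwise bounce forever.
    """
    echo = {"A": echo_a, "B": echo_b}
    other = {"A": "B", "B": "A"}
    in_flight = deque([("B", "x")])  # (receiving host, character)
    interrupts = 0
    while in_flight and interrupts < max_events:
        host, ch = in_flight.popleft()
        interrupts += 1                # the receiver takes an interrupt
        if echo[host]:                 # echo on: bounce it back to the sender
            in_flight.append((other[host], ch))
    return interrupts

print("echo on both ports:", simulate(True, True))
print("echo off on both:  ", simulate(False, False))
```

With echo on at both ends the count only stops at the cap; turning echo off on either port ends the exchange after one or two interrupts.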

    --
    I've abandoned my search for truth; now I'm just looking for some useful delusions.
  64. Re:Ah, the memories! And lessons, too. by MrHops · · Score: 1

    Ah, the memories! Here are some of the stories I've heard and or witnessed over the years.

    1. Buttons: Every couple years, IBM would hold an open house where anyone in the community could come in and get a tour of the facility (Kingston, NY). This was back in 1984, IIRC. PCs were just starting to make an impact at this time... big iron was king. We're talking about a huge raised-floor area with multiple mainframes, storage, tape drives... MANY millions of dollars per system. A few hundred users on a system was quite an accomplishment back then and these boxes could handle a thousand users. We were also in the midst of a huge test effort of the next release of VM/SP. I had come in that Sunday afternoon to get several tests done (death marches are no fun). All of a sudden the mainframe I was on crashed. Hard. I'd grown accustomed to this as we were at a point where we were "eating our own dog food"; the production system was running the latest build of the OS. But, an hour later and it was STILL down. Apparently, a tour guide had led a group to one of the operator consoles and a child could not resist pressing buttons. Back in those days, booting a mainframe meant "re-IPL" Initial Program Load. Unless the computer was REALLY messed up and wouldn't boot. Only then would someone re-IML the system. Initial Microcode Load. Guess which button the kid pressed? It left the system in such a wonky state that it had to be reloaded from tape. All the development work of that weekend was lost and had to be recreated and rebuilt. (It was a weekend and backups were only done on weekday nights.) It took us a week to get things back to normal.

    Hey, I have a similar story from when I was working at Dartmouth College in the mid-80's. I was on third shift with two other guys, one who knew what he was doing, and one who was, uh, not fully technology-enabled.

    For some reason, one night the latter person thought it would be a good idea to clean out the cabinet of our Honeywell mainframe. With a broom. A long-handled push broom.

    This was on a weekend, when we normally do a full backup (onto good old 9-track tapes), reboot the system into protected mode, verify the system integrity, and go into multi-user mode. Well, we finished the backups, and tried to reboot. Nothing was working, and the diagnostics were wonky and pretty uninformative, and we (the useful co-worker and I) spent an hour or so trying to debug what was going on. It wasn't until we asked the third guy about the machine that he mentioned his cleaning. The boot switches for the IPL were on the door, and when he was in there cleaning, the broom handle toggled several of them, leaving the machine in its unusual state.

    Needless to say, we asked him to avoid cleaning mainframes with brooms in the future.

  65. Yes by Anonymous Coward · · Score: 0

    This is definitely possible. If there's two live jacks at your desk, just plug them both into a desktop hub/switch and it will bring your network down completely.

    Something like a 4 or 8 port 3com or netgear switch will do the trick.

  66. A couple from my Dad by Anonymous Coward · · Score: 0

    ... who would have been working on some IBM big iron back in the early 70's. This was in NZ and there were no computers there at the time, so the cards were punched and then sent to the nearest computer (in Sydney), with the results coming back a week or two later. Inexplicably some of the runs would fail with random errors, causing a great deal of lost time, and it wasn't until they noticed one of the assistants picking up the punched holes and pushing them back into the cards that they figured out why. Apparently they didn't like to see all those cards go to waste.

    They also had another issue with one of the systems shutting down due to a fault fairly regularly, but only when one of the operators (a woman) was using it. They eventually traced the problem to her wearing nylon underwear and causing a static charge.

  67. Here, let me google this for you by CSMoran · · Score: 2, Informative

    Why do people type "*nix" instead of spelling it out?

    http://en.wikipedia.org/wiki/*nix

    --
    Every end has half a stick.
  68. Red Buttons by sglines · · Score: 1

    I have 3 data center stories:

    I was installing mainframe software at a Fortune 100 site. There was a line of printers in the computer room that spit paper into a hall to be picked up. If a printer ran out of paper a yellow flashing light went off. If one of the super fast page printers ran out of paper a red light flashed. I was in the computer room around midnight when a new watchman came in on his rounds, saw the flashing lights, panicked, and pressed the red button on the IBM mainframe console. Needless to say, I was sent home and told to come back in a couple of weeks.

    Another time I was in the computer room at a small mainframe installation out in the middle of nowhere. The managers decided that they needed a full bank of batteries for backup so there were a bunch of carpenters banging away next to me. One of them put a nail from a nail gun through the 220 volt main. The computer room sounded like an explosion as the heads of the drop-in disks retracted simultaneously. That bang was followed by dead silence. You have no idea how loud mainframe computer rooms are until the power goes out.

    The last time I experienced a meltdown was at another data center that had redundant everything ... except cooling water. The 3 inch pressurized, chilled water main blew apart, draining the coolant system and leaving 6 inches of water under the raised floor. Fortunately there were no shorts, but within about 20 minutes the temp in the rack room reached 130. It was a sauna in there. One by one the systems powered down, starting with the big DEC VAXes ... the only systems that didn't shut down before we got to them were the SUN servers, mostly Sun 50's. It took us 3 days to get everything back up.

    SG