Slashdot Mirror


Dealing with Development House Disasters?

Skinnytie asks: "I was recently asked by the CEO of the company for which I work to find a resource from which to better understand what to do in the event of a disaster. 'I'm on it, Sir' was my response, and I ran back to my desk and started writing contingency plans and trying to imagine what to do if a meteorite strikes our co-lo facility. I quickly came to realize that there is far more that *could* happen (the CTO gets hit by bus, or the in-house server room gets abducted by aliens...you get the point) than I am even prepared to write plans for. I thought I'd hit the Slashdot audience up for some ideas/horror stories regarding avoiding, dealing with and getting past whatever disasters that have occurred at your development houses. Have at you!"

59 comments

  1. Secure the data by Usquebaugh · · Score: 4, Informative

    First and foremost, make sure you have your data backed up and secure. Nothing else matters if the data is missing. When did you last test your backups?

    Next, ensure you can get your backup data when you need it. You mention a colo, so I'm assuming you have a duplicate system there. Can you get access at 22:15 on July 4th?

    Now draw up your usual emergencies: fire, flood, tornado, earthquake, etc. Have a plan for how to get your systems up and running. Do you need to rent office space? What about net connectivity?

    Lastly, re-check your data backups, do you have everything, is it error free, do you have more than one copy. If you have the data your company can recover.

    No data, no job, it's as simple as that.
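
    One way to make "test your backups" concrete is to restore a backup to a scratch directory and compare checksums against the live data. What follows is only a minimal sketch: the function names and the directory layout are made up for illustration, and it assumes your backup tool can restore to an ordinary directory tree.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its checksum."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare_restore(source: Path, restored: Path) -> dict[str, list[str]]:
    """Report files that are missing from, differ in, or were added to the restored copy."""
    want, got = manifest(source), manifest(restored)
    return {
        "missing": sorted(set(want) - set(got)),
        "corrupt": sorted(k for k in want.keys() & got.keys() if want[k] != got[k]),
        "extra": sorted(set(got) - set(want)),
    }
```

    An empty "missing" and "corrupt" list after a real restore is the answer to "do you have everything, is it error free". Files that changed after the backup was taken will show up as "corrupt", so compare against a snapshot taken at backup time if you can.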

    1. Re:Secure the data by stefanlasiewski · · Score: 3, Informative

      Lastly, re-check your data backups, do you have everything, is it error free, do you have more than one copy. If you have the data your company can recover.

      And something else as important: Where are your backup tapes?

      If they are sitting in a locked cabinet right next to the computers, they won't survive the asteroid blast either.

      Off site backups. A pain to maintain, but a good idea for any contingency plan.

      --
      "Can of worms? The can is open... the worms are everywhere."
    2. Re:Secure the data by MattCohn.com · · Score: 1

      Mine are offsite, sitting on top of our generator's huge power transformer, in a secured, locked fire-proof room. Nothing's gonna get at them!

    3. Re:Secure the data by 0x0d0a · · Score: 3, Funny

      It fills me with pride to know that even if an asteroid pastes me and half the people on earth, my TPS cover sheet templates will still be readable.

    4. Re:Secure the data by ManxStef · · Score: 1
      sitting on top of our generator's huge power transformer

      Uh, maybe I'm being clueless (Hey, I've just come back from the pub so I'm a bit fuzzy!), but don't transformers tend to have really big f*ckoff magnets in them?

      If so I'd imagine your tapes might well be very safe, but they'll also be very blank... ;)

    5. Re:Secure the data by ManxStef · · Score: 1

      Just a quick addition to this (which I know doesn't really apply to really heavily IT-dependent companies such as software development firms or co-lo companies, but is relevant for a large majority of companies) -

      (after checking your backups actually restore) it may be worth keeping your contingency plans minimal, e.g. just covering critical systems with a few spare workstations, and considering this option:
      how much would it cost to run the office the good ol' fashioned manual paper way for say up to a week (in which time you can rebuild the resources), as opposed to the cost of implementing a full disaster recovery plan?

      You may find it's almost as efficient and a hell of a lot cheaper?

    6. Re:Secure the data by MattCohn.com · · Score: 1

      (Score: -1, Whoooooosh...)

  2. A certain friend of mine by stefanlasiewski · · Score: 4, Funny

    'I'm on it, Sir' was my response

    A friend of mine once gave a response that was less gentle:


    Sir, you just laid off half the developers, and half of the support staff, but you didn't reduce the marketing staff.

    There is one manager for every 5 non-managers, we're still not meeting our financial targets, our new "Premium services" campaign is earning $1 for every $1000 we invested, we don't have enough tech staff to fix the bugs, the QA department was reduced to a single person and can't even find the bugs, and tech support is dealing with a growing number of irate customers every day.

    We can barely keep up with the endless list of new tasks that you assign, sir, and you want me to waste my time daydreaming about asteroids?

    We don't need a contingency plan sir, we ARE IN the contingency plan.

    Get real, sir.


    Still kept his job. Ok, maybe he wasn't that snotty...

    --
    "Can of worms? The can is open... the worms are everywhere."
  3. The Sky is falling!!! by TheDarkRogue · · Score: 3, Funny

    The ceiling (or floor to the party (company success celebration) going on upstairs) of the server room fell in at my friend's place of work, impaling their file server with a piece of rebarb(sp?) through the motherboard and a RAID/IDE controller card. All was well till the hole caught on fire due to what was later found to be a damaged coffee cup heater that a stack of books had fallen onto. No one was seriously injured by the event other than the group of people who designed the building.

    This was an architectural firm.

    --
    (Score:0, Interesting)
  4. Partial answer: do dry runs of the plans by herrlich_98 · · Score: 4, Insightful

    One really hard part of plans is to catch *all* the things you are going to need to recover. This advice is kinda like the old advice to actually test your backups occasionally.

    For example, if you have plans to relocate the headquarters to a different site, then every 6 or 12 months try it out to sort out the "glitches". To expand on this example, does everyone know where the other site is and how to get there? How will they know to go there? Are there problems of quorum, where half the managers will be at one site and half at the other making contradicting decisions? Etc, etc, etc... This is also a good time to learn about how your organization operates.

    Whatever your plans and contingencies, do regular dry runs of them, to the extent that it is practically possible.

  5. Backups by CounterZer0 · · Score: 1

    Of everything - not just those files in CVS. Every person, every concept, every document needs to be duplicated, or be /easily/ reconstructed from others that are.
    No person is so special that someone else can't be trained to be 'their backup' - no single person should ever hold the only set of 'keys to the kingdom'.

    1. Re:Backups by Loosewire · · Score: 1

      Hey Stop it, Us BOFH's want to stay in our positions of total power / blackmail and corruption.
      hehe ;_)

      --
      Slashdot - The one stop shop for procrastination
    2. Re:Backups by crashthud · · Score: 1
      I can't agree more.

      In our office, we've had one programmer laid low with no notice by a drug reaction and surgery, out for four months; another programmer injured in an accident, out for a month. In both cases, the paper trail didn't begin to explain just what they were working on adequately for the remainder of the crew to pick up where they left off, and they were, of course, the only ones with info on their specific bits.

      Now that everyone's back, there's still no buy-in from management that we need to take the time for crosstraining. The demo that was missed is forgotten, the delayed deliverables faded into memory, and 'we're just too busy for that'.

      Aargh.

  6. On the other hand... by Anonymous Coward · · Score: 0, Funny

    I was recently asked by the CEO of the company for which I work to find a resource from which to better understand what to do in the event of a disaster. (...) the CTO gets hit by bus, or the in-house server room gets abducted by aliens...you get the point

    On the other hand, the CEO could get hit by bus, and you wouldn't have to deal with those disasters at all...

  7. Categorize by Andrew+Lockhart · · Score: 2, Interesting

    I don't have experience with this myself, but if I were in your boat I would make a system for classifying types of disaster and the appropriate recovery methods for each. For instance, at the top level you would have a disaster resulting in either physical damage or non-physical damage. From there you could classify disaster types according to how much and what physical damage occurred.

    So, your meteorite example would probably fall into something between a horrible fire and earthquake, as the kind of damage inflicted on your facility would be similar in such events.

  8. Rebar, or Rerod, if you're in Missouri NT by stever00t · · Score: 1

    nt

  9. Volcano Disaster Plan by Rick+the+Red · · Score: 2, Funny

    I worked at a large aircraft manufacturer in the Pacific Northwest back when Mt. St. Helens blew. They quickly imposed a "Volcano Disaster Plan" that, AFAIK, is still officially in place. We never followed it, however, because it included such mandates as turning off all equipment at night and sealing it up in plastic and duct tape just in case the building got dusted with ash. Yeah, right! (remember, this was 1980, well before a computer could fit in a garbage bag) It was bad enough for us with our CAD workstations and Tektronix terminals; I can imagine what the boys running the IBM big iron thought of that plan. Where are you going to find a plastic bag big enough for a 370 mainframe?

    --
    If all this should have a reason, we would be the last to know.
    1. Re:Volcano Disaster Plan by jeffy124 · · Score: 1

      sealing it up in plastic and duct tape just in case the building got dusted with ash. Yeah, right!

      that's not all that far fetched. buildings in the area of WTC on 9/11/2001 were all dusty on the inside from the collapsed towers. additionally - there was a museum that saved their inventory by closing off the vents and other ductwork before the two towers fell.

      --
      The One Rule Of Chess You'll Ever Need: Don't play someone who carries a kit in their bookbag.
    2. Re:Volcano Disaster Plan by 74Carlton · · Score: 1
      can imagine what the boys running the IBM big iron thought of that plan.


      They probably just paid for the off site data backup from IBM and didn't think much about it. It's the sort of thing IBM does very well.

    3. Re:Volcano Disaster Plan by Rick+the+Red · · Score: 1
      And so, since 9/11, has every business in New York been sealing up all their electronic equipment with plastic and tape each evening, then un-sealing it the next morning, then re-sealing it, etc?

      I didn't think so.

      --
      If all this should have a reason, we would be the last to know.
    4. Re:Volcano Disaster Plan by Mr.+Slippery · · Score: 1
      And so, since 9/11, has every business in New York been sealing up all their electronic equipment with plastic and tape each evening, then un-sealing it the next morning, then re-sealing it, etc?

      Probably not. But I do remember that the first PC my father bought back in the mid-80s, a Victor 9000, did come with a dust cover...

      --
      Tom Swiss | the infamous tms | my blog
      You cannot wash away blood with blood
  10. What not to do by the_other_one · · Score: 2, Insightful

    I know of one outfit that shall remain nameless.

    They laid off their sys admin.

    Then soon found out what a root password was for.

    --
    134340: I am not a number. I am a free planet!
    1. Re:What not to do by Loosewire · · Score: 1

      I apologise if I'm being ignorant (as usual), but couldn't they just boot disk it and reset the root password (if the BIOS was passworded to stop floppy boot, then use the mobo jumper to reset it)? The only flaw is this would be a pain (read: major financial loss) on production/critical servers.

      --
      Slashdot - The one stop shop for procrastination
    2. Re:What not to do by Anonymous Coward · · Score: 0

      You're assuming it's an x86 system.

      If it's a Sun (for example) and he set the eprom password, then you have to pull the prom, and get Sun to send you a new one because there is no way to clear it.

    3. Re:What not to do by Loosewire · · Score: 1

      Hmm - but it's still not a non-recoverable situation like it sounded in the first post?

      --
      Slashdot - The one stop shop for procrastination
    4. Re:What not to do by Anonymous Coward · · Score: 0

      Well, it's a little more complicated than I first described.

      You see, the bit you have to replace holds the system's host ID number.

      A lot of software is sold nodelocked to a specific host ID, so when that number changes, you have to get new keys from all of the software vendors (or use a workaround that might work).

      It wouldn't be a crisis if you knew what you were doing, but it would still be very annoying.

    5. Re:What not to do by cybermace5 · · Score: 1

      Uh, you make it sound like they LYNCHED the sysadmin and chucked him into the subfloor.

      I'm sure the sysadmin has no legal right to withhold the password to the system. They own it, and owned it when he set the password in the first place.

      --
      ...
  11. A couple of thoughts by travail_jgd · · Score: 2, Insightful

    Instead of planning for fire, flood, and alien attack, categorize things by level of severity: whether or not the primary site is intact, the amount of time it will take to resume normal operations, whether the event is isolated/regional/national, etc. It doesn't matter if your office is ruined by an asteroid or a terrorist's bomb, the company will still be doing its work at another site.

    Identify critical paths and personnel. An organization can function for several weeks without C*O's, but without the "worker bees" the company will grind to a halt almost immediately. Also consider the effects of losing large numbers of staff. If every developer and admin quit, what would be the effect on the company?

    Verify the physical security of all sites. Imagine "man with a gun" security breaches -- would an armed intruder (willing to kill) be able to cause significant damage?

    I visited the colo site for a company I was with. They had multiple electrical hookups and enough fuel to run the diesel generators for 48 hours. There were two separate water main connections (a holdover from the water-cooled mainframe days). The colo was connected to two different phone companies (on opposite sides of the building), plus a dish for satellite uplink. It was incredibly expensive to build and maintain, but it would be more expensive if there was ever any downtime.

    Also take cost into account. How much is your company willing and able to spend for disaster prevention and recovery? Don't forget to include your time in those figures... :)

  12. simple by farnsworth · · Score: 4, Insightful
    having both worked on disaster recovery plans and having worked in a data center in the world trade center that was completely destroyed, I can say that the best recovery plans are extremely simple.

    break it down into procedures that you can take based on real problems, not causes. some things to consider:

    1. internet connectivity down (switch to backup colo)
    2. unrecoverable db (drive to offsite backup storage and get backup data)
    3. app servers fried (engage hot standby boxes)
    4. all of the above (shit, it's going to be a long night)

    etc. I've seen too many recovery plans that are focused on the cause, rather than solutions which is really what these plans are all about. if you really need to, you can cross reference the plans with potential causes. this seems to satisfy the cio types who stay up late wondering 'what happens if _____'.

    of course, a backup plan is totally useless if the 'course of action' section is not possible to carry out, due to bad backup practices or lack of failover equipment. having a disaster recovery plan is no substitute for good policy and an adequate hardware/isp budget.
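
    as a rough illustration of "problems, not causes", the plan can be a plain lookup keyed by what is broken, with an optional cross-reference from causes for the 'what happens if _____' crowd. this is only a sketch: the three procedures are the ones listed above, but the key names and the cause-to-effect mapping are invented for the example.

```python
# Procedures keyed by what is broken (the effect), not by why it broke.
RUNBOOK = {
    "internet_down": "switch to backup colo",
    "db_unrecoverable": "drive to offsite backup storage and get backup data",
    "app_servers_fried": "engage hot standby boxes",
}

# Hypothetical cross-reference from causes to the effects they produce.
CAUSES = {
    "fire": ["db_unrecoverable", "app_servers_fried"],
    "backhoe cuts fibre": ["internet_down"],
    "meteorite": ["internet_down", "db_unrecoverable", "app_servers_fried"],
}


def plan_for(cause: str) -> list[str]:
    """Return the procedures to run for a cause, in runbook order."""
    effects = CAUSES.get(cause, [])
    return [RUNBOOK[effect] for effect in RUNBOOK if effect in effects]
```

    a cause nobody thought of just maps to an empty list until someone works out which of the existing effects it produces; the procedures themselves don't change.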

    --

    There aint no pancake so thin it doesn't have two sides.

    1. Re:simple by stevew · · Score: 3, Interesting

      I've done a lot of disaster scenario planning for emergency service providers - and we come up with some really wild stuff, but the above is perhaps the best advice I've seen here so far. Don't worry about the aliens hijacking the data center, worry about the data center resources not being available for whatever reason.

      The concept of breaking down the recovery phases is the best recovery advice I can give you. Worry about things in sort of a concentric ring of problems much as the previous poster presented. Start with the simplest broken piece and move on to the more complicated.

      The things companies went through due to the WTC going poof is a true real-world example of the worst case scenario occurring - Not only the data center disappeared, so did the staff that ran IT there!

      Of the recovery efforts I've read about - the guys that had deals with hot-standby facilities out of the immediate area came back the quickest.

      --
      Have you compiled your kernel today??
    2. Re:simple by Zapman · · Score: 2, Interesting

      Adding to the last point you mention, the most critical thing you do to any plan is TEST IT.

      Stage a disaster.

      Either that, or just fake it. Take the backup data to another site, try to bring it all up on the internet (under a dr.www.?????.com DNS name, for example), and see what's accessible.

      In case of failure, tune the plan, and try again in 6 months.

      In case of success, tune the plan, and try again in a year.

      Jason

      --
      Zapman
    3. Re:simple by Anonymous Coward · · Score: 1, Funny

      I took your advice and set fire to the machine room. I didn't warn anyone because it is supposed to be a realistic test. They didn't seem to take it too well when I explained that they had "done a very good job" on the test. The only server that was really toasted was the one I tested for the "crazy lawnmower man comes into server room and sloshes gasoline on prized IBM mainframe and drops a match." Some people really cried about all that data loss but a test is a test, right? If people don't do backups then they are responsible for the data loss!

      Unfortunately "they" didn't quite see it that way. I cleared out my desk this afternoon. Thanks a lot.

  13. Know where the backups are. by Anonymous Coward · · Score: 0

    Make sure that you know if the backups are actually where they are supposed to be, and that people know for sure where they are located.

    Also, if you are in charge of backups, and come up with a "clever" storage location, please tell at least one other person where it is.
    After all, you never know what the future holds, and if you are unable (for any reason) to tell other people where the backups are when they are needed, it is as if they never existed in the first place.

    (Posted anon, and with no details to preserve my job).

  14. Really simple by xagon7 · · Score: 1

    Offsite backups, and more than one person who knows and can perform each critical aspect of the business (code or other specialized information), are all that is really necessary. Add a standardized software inventory (OSs, versions, other software and versions, service packs) recording what is on which machines. This is really a procedural issue. This is about all you can really do to be ready for just about anything that is thrown at you.

    1. Re:Really simple by maxume · · Score: 0, Troll

      I have been told to think for myself. I will think for myself for I have been told I will think for myself. The mass media are the opiate; I will think for myself...The mass media are the opiate; I will think for myself...The mass media are the opiate; I will think for myself...The mass media are the opiate; I will think for myself...The mass media are the opiate; I will think for myself...The mass media are the opiate; I will think for myself...The mass media are the opiate; I will think for myself...The mass media are the opiate; I will think for myself...The mass media are the opiate; I will think for myself...The mass media are the opiate; I will think for myself.

      The real problem here is that the fucking repetition is actually to make a point, not to fucking troll. I mean, yeah I'm trolling, but not really in a bad way. Perhaps this is enough for the fucking filter...

      --
      Nerd rage is the funniest rage.
    2. Re:Really simple by xagon7 · · Score: 1

      Besides.... its POPULAR media.... NOT mass media.

      Pop culture.

  15. plan for the effect not cause by linuxbert · · Score: 1

    plan for the effect, not the cause. aliens, fire, flood are all different causes, but they can all cause the same result - destruction of the building.
    plan for that.

    if you need to relocate an office, move it to a hotel. you have space (rooms, convention space) all designed to have furniture brought in, with phones, power and data, and available on short notice. you also have staff to keep it all clean.

    plan the order for IT systems restoration. development can wait a week but if your cash flow stops you're screwed.

    those are my tidbits of advice.
    hope it helps.

  16. Contingency Plans by m0rph3us0 · · Score: 1

    First, find out how much they want to spend, ie. somewhere like the NYSE has 3 remote sites that immediately mirror every transaction that occurs.

    The only reason the NYSE stopped during 9/11 was for political reasons. Most of the brokerage firms in the WTC had warehouses in Jersey rented for years waiting for something like 9/11 to happen.

    It comes down to cost: how much is it worth if you lose all your records? How much is it worth if you are down for 2 weeks? First figure out how much a minute of downtime costs, then a week, then a month; then figure out how much it costs to guard against each of these.

    Mostly what you have to plan for is: suppliers being eliminated, loss of the block your office is in, loss of the city you are in, loss of the nation, and the resulting loss of personnel. Once you've factored in those things you will have a good basis for a contingency plan, and you will find out how much really needs to be planned for and how much is "if it happens we're dead anyways".
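
    The cost comparison can be worked through with a few lines of arithmetic. This is a back-of-the-envelope sketch, not a real model: the Option fields, the expected-cost formula (yearly price plus expected outage cost) and every figure you feed it are assumptions you would replace with your own numbers.

```python
from dataclasses import dataclass


@dataclass
class Option:
    name: str
    yearly_cost: float        # what this level of protection costs per year
    recovery_minutes: float   # expected outage length if a disaster strikes


def expected_yearly_cost(option: Option, cost_per_minute: float,
                         disasters_per_year: float) -> float:
    """Price of the protection plus the expected cost of the downtime it leaves."""
    outage_cost = disasters_per_year * option.recovery_minutes * cost_per_minute
    return option.yearly_cost + outage_cost


def cheapest(options: list[Option], cost_per_minute: float,
             disasters_per_year: float) -> Option:
    """Pick the option with the lowest expected yearly cost."""
    return min(
        options,
        key=lambda o: expected_yearly_cost(o, cost_per_minute, disasters_per_year),
    )
```

    With made-up figures, "tapes only" (cheap, two weeks to recover) wins when a minute of downtime is worth little, and a hot standby site wins once a minute of downtime gets expensive. Where that crossover sits is the number to bring back to whoever holds the budget.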

  17. Backups... and another RAID board by Anonymous Coward · · Score: 2, Insightful
    Yes backups backups backups... OFF SITE BACKUPS. Get a fire safe and keep a set in there too, what the hell.

    Little realized thing until it's too late: Do you have raid arrays? If so, you might want to ponder making sure you can always "get" another raid card of the same type currently running your array. Either have another on a shelf (off site, whatever), or another compatible system.

    Nothing sucks more than to have something stupid like the raid card (or motherboard) die, and you can't find a replacement that will recognize your current array of drives.

  18. 2 points by hswerdfe · · Score: 1

    1. off site backup, of ALL data (and out of country or state backup, if you're worried about rebels, or floods.)

    2. make sure that every person in the company can be replaced, directly (by someone else), or indirectly (by a combination of people)

    --
    --meh--
  19. Years ago by CharlieG · · Score: 3, Interesting

    Years ago (Back in the days of drum memory), I was taken on a tour of a data center (first computer I ever saw)

    The guy told me something about their disaster preps (it was financial data)

    The first thing he pointed out was that there were 2 complete mainframes, side by side. Each was capable of doing the whole job, but....

    The next was pointing out that they had redundant power, plus a generator, and a lesson they learned the hard way - the generator had the ability to power the Air Conditioner as well as the computers - if your server room gets too hot...

    Then he said, "we have a second Identical data center about 5 miles across town"

    Each data center could handle all the customers from that region - yes there would be a performance hit, but...

    Then he said, there are 7 more cities around the world, each with 2 data centers like this one - all transactions go to all 8 cities.

    And then last, he said there was one more data center, in the Outback of Australia. They figured that it was the least likely place to get nuked, and they even planned for that

    Yep, paranoid enough that they wanted their data to survive even if all the major financial centers in the world ALL went "kaboom"

    --
    -- 73 de KG2V For the Children - RKBA! "You are what you do when it counts" - the Masso
  20. Chicago Flood of 1991? by Gerry+Gleason · · Score: 1
    Or thereabouts. I was working in the north suburbs at the time, so it didn't affect us, but a few years later I worked in the Board of Trade building. The bottom line is that the building was totally closed down for an extended period of time. The Chicago Board of Trade (I think they actually own and control the building) pulled up generators on the street to run their systems, but none of the tenants could get any power. The story I got from the company I eventually worked for was that they were able to get in for one day to get their systems and data out, and set up the machines at a new site. I'd bet that many companies were not able to move their equipment, and lost a lot of money that way. After all, the trading went on. I imagine that companies would find a way to keep trading even if their systems were unavailable (i.e. temporary arrangements to enter trades through another company's systems).

    As to what your organization should do, that depends. How much do you have to spend? Disaster Recovery is a big subject, and there are a lot of choices to make. I could help, but I'd have to charge you.

  21. Cracking Sun's hardware security. Easy. by SN74S181 · · Score: 1

    There's a workaround for bypassing the 'eprom' password on Sparcs (actually it's NVRAM with a battery built into the module). You remove the NVRAM chip from its socket, boot up the system to the OK prompt, then plug the chip in live, with the system running, and make your security changes. I have successfully done this on SparcStations that I bought on eBay that had a password. It's slightly risky, but on older Sparc boxes (all those nice classic SparcStations) it would be NUTS to have to buy new NVRAMs.

    The technique is documented here. And here. And here too.

    There's also a technique to tack on a replacement external battery on those NVRAMs. There's no reason to EVER buy a new one for non-critical boxes. Most of my older Sparc boxes have had that surgery performed on their NVRAM chips (involves actual physical surgery on the module) and live happily powered by a pair of AAA cells.

  22. Risk management by Twylite · · Score: 2, Insightful

    What you've been asked (volunteered) to do is a risk analysis. This is a whole lot more than being a l33t admin of a high-availability site, as many Slashdotters seem to think. You've hit on some of the non-technical risks (your CTO example), but to address this properly you need to concentrate on identifying risks, and how to handle them.

    The first thing to realise is that you can't have a preformed contingency plan for everything. What you can do is identify every point of risk, weight it according to likelihood and severity, and develop plans for the "likely worst cases" that you discover. The rest of the risk is a business risk, that is, you insure yourself against it and deal with it if and when it happens.

    You should also bear in mind that, from a technical viewpoint, there are no absolute guarantees. Almost all high-availability strategies protect against a single point of failure, but this isn't enough. What if you have multiple failures? How quickly can you detect and respond to a failure? How long can you suffer a complete outage? (This is really important to know, and "we can't" is not an acceptable answer.) Uptime costs money; calculate the point of balance.

    Ask Google about "organizational risk" - you'll find a lot of information about auditing risk that can put you on the right path.

    --
    i-name =twylite [http://public.xdi.org/=twylite], see idcommons.net
  23. As well as not-so-dry runs by devphil · · Score: 1


    Great way to test the UPS batteries and auto-shutdown software is to walk over to the wall and yank the power cord of the UPS out of the socket.

    Plugging it back in after 30 seconds is a good way to test the "power came back, cancel the shutdown" part of the software, too.

    --
    You cannot apply a technological solution to a sociological problem. (Edwards' Law)
    1. Re:As well as not-so-dry runs by Anonymous Coward · · Score: 0

      Great way to test the UPS batteries and auto-shutdown software is to walk over to the wall and yank the power cord of the UPS out of the socket.

      No, it isn't. UPS companies want you to leave the plug in the wall to maintain the ground connection. Kill the power at the circuit breaker panel instead.

  24. Make sure the disaster is profitable by Andy_R · · Score: 2, Interesting

    There is a natural tendency to think that this is all about keeping the data safe, or about having procedures in place, but I have a different way of looking at it that I think is more practical:

    Make sure the company stays profitable.

    All this involves is insuring against disasters, and making sure the payout will *exceed* whatever it costs to recover.

    If my office exploded and I had to re-build everything from scratch, I'm fine, because my company will soon be getting a cheque that will cover everything, including the expected profits for the next 6 months. If we rebuild in 5 months, the disaster is actually a revenue generator for the company.

    Compare this to someone with excellent plans and the ability to get rebuilt in just a month, but no insurance on the lost profits. They are facing a net loss even if they work like crazy and get the rebuild done in 3 weeks.

    Of course you should still do off-site backups to deal with problems that are 'serious' but not 'disastrous', and have contingency plans in place for day to day traumas, but I think the best way to deal with 'the big one' is simply to insure against it, and make sure the insurance covers the lost profits.

    --
    A pizza of radius z and thickness a has a volume of pi z z a
    1. Re:Make sure the disaster is profitable by Anonymous Coward · · Score: 0

      Disaster Business Plan

      1.) Insure the company for more than it is worth
      2.) ????
      3.) Profit!!

      Please note the arson investigators will be looking closely at your step 2, however.

  25. Dev? by photon317 · · Score: 2, Informative


    For a development shop, I should think that all you really need is to make sure you've got secure, recent, usable backups of your source code and important licensing/contract data.

    Unless you live in an area prone to a certain type of natural disaster, the types of things that cause real contigency plans to go into effect have a statistically small chance of ever happening to you. It's just not worth the money and effort to go to great lengths to make sure the company is running full speed the next day rather than two to three weeks down the road. As long as your data is safe, you should be ok. Just take good backups in duplicate - put one set in a fire safe onsite and ship one to a secure offsite location, perhaps using a service provider like Iron Mountain (although I'd encrypt that tape before I gave it to some random Iron Mountain driver if I were you).

    Some businesses have to worry about 24/7 production operations that can't be allowed to stop. Typical examples of extreme uptime environments are stock exchanges, utility/telco companies, various emergency services, etc. In a lot of these sorts of cases, it's actually justifiable to double or even quadruple the cost of your implementations and the ongoing maintenance and salary costs just to make sure that when a 1:1,000,000 chance event occurs, you experience a 10 second performance hiccup rather than a serious outage. In some cases a one hour outage simply cannot be tolerated at any cost. These are the environments that really have a hard time pushing the bleeding edge of engineering geographically redundant "systems", where systems includes the machines, the networks, and the people using them. A development house, in contrast, is a pretty easy problem.

    --
    11*43+456^2
  26. Decide where to spend your effort. by PinglePongle · · Score: 3, Informative
    There have been a number of posts already, but I would create a simple spreadsheet breaking down all the risks you can think of as follows:

    • Risk category: e.g. people risk (CTO gets hit by a bus), infrastructure risk (your server is destroyed by aliens), legal risk (you get sued by the RIAA for having an MP3 somewhere on your network), commercial risk (your biggest client goes bust), regulatory risk (the government licenses development shops) etc.
    • Risk impact: the impact on the business if the risk were to occur. Probably best to summarize into 5 or so levels, from "negligible" to "unrecoverable".
    • Likelihood: the chances of the risk occurring. Your office is unlikely to get hit by a meteorite, but your top coder may get headhunted and take all that knowledge with him.

    You then have to make a business decision about how much to spend on mitigating each risk. Clearly it's worth spending time on "high likelihood, disastrous impact" risks, but you may not care much about a transport strike stopping the cleaning staff from getting into work for a couple of days.

    When you know which risks you care about, identify mitigation strategies. Typically, this starts with identifying an owner for the risk, who is in charge of the mitigation strategy. For instance, the development manager may have to find ways of mitigating the "top coder headhunted" risk by implementing code review processes, knowledge sharing systems, etc.

    You should not let the business view risk management as a technology issue - it's a business issue. Risk mitigation has associated costs, whether in money, time, or lost opportunity: for example, the best way to avoid not getting paid for your work is to avoid working for unreliable bill payers. If you come up with a wonderful risk list and proposals for mitigation, your work will be wasted unless the business is willing to bear the cost of implementing your proposals.
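
    If a spreadsheet feels too loose, the same register can be sketched in a few lines of code. This is only an illustration: the 1 to 5 scales, the impact-times-likelihood score, the threshold, and every number in the sample register are assumptions of mine, not an established method.

```python
from dataclasses import dataclass

# Five impact levels, from "negligible" to "unrecoverable"; the middle
# three names are placeholders.
IMPACT_LEVELS = ["negligible", "minor", "moderate", "severe", "unrecoverable"]


@dataclass
class Risk:
    category: str
    description: str
    impact: int        # 1..5, position in IMPACT_LEVELS (1 = negligible)
    likelihood: int    # 1..5, where 5 = very likely (assumed scale)
    owner: str = "unassigned"

    @property
    def score(self) -> int:
        # Simple impact x likelihood ranking; an assumption, tune to taste.
        return self.impact * self.likelihood


def prioritize(risks, threshold=8):
    """Risks scoring at or above the threshold, highest score first."""
    worth_it = [r for r in risks if r.score >= threshold]
    return sorted(worth_it, key=lambda r: r.score, reverse=True)


# Sample entries with invented scores.
REGISTER = [
    Risk("people", "top coder gets headhunted", 4, 4, "development manager"),
    Risk("people", "CTO gets hit by a bus", 4, 1),
    Risk("infrastructure", "office hit by a meteorite", 5, 1),
    Risk("people", "transport strike keeps cleaning staff away", 1, 3),
]

if __name__ == "__main__":
    for risk in prioritize(REGISTER):
        print(risk.score, IMPACT_LEVELS[risk.impact - 1], risk.description, risk.owner)
```

    The threshold is where the business decision lives: whoever pays for the mitigation should be the one who sets it.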
    --
    It's all very well in practice, but it will never work in theory.
  27. How about.... by shagymoe · · Score: 1

    The electric company uses a backhoe to cut all your data lines?

    We actually kept supplying a Big Three auto manufacturer in a "just in time" sequencing operation when that happened to us.

    Not bad...

  28. I Can't Believe No One's Mentioned It Yet by Anonymous Coward · · Score: 0

    The worst disaster a company that depends on tech can have is slashdotting.

    Plus, it's a fscking good excuse to get new, fun hardware that rocks for after-hours Unreal 200x Tournament!

  29. Re: re-bar by sczimme · · Score: 1


    a peice of rebarb(sp?) through the motherboard

    This stuff is called 're-bar', which is short for 'reinforcing-bar'. It is a metal rod about a half-inch in diameter (there are larger/smaller versions) that is used to add strength to concrete structures. Re-bar is made with a coarse pattern on the outside so the concrete can get a grip.

    For those of you who are still lost, this is the stuff that Cordelia fell on in 'Lover's Walk', an episode from the second or third season of Buffy the Vampire Slayer. Does that help? :-)

    --
    I want to drag this out as long as possible. Bring me my protractor.
  30. September the 11th by chrisseaton · · Score: 1

    I wonder how businesses in the towers coped after September 11th. They must have had to apply their contingency plans; perhaps you could try looking for someone in one of those companies.

    1. Re:September the 11th by boskone · · Score: 1

      I remember hearing two things regarding the WTC and disaster recovery planning.

      1. In the first bombing attack (in '93?), hundreds of businesses went permanently under because they lost their invoices and were unable to get back up and bill their customers (had no offsite backup).

      2. Many businesses that had primary operations in the WTC on 9-11-01 were running real-time mirroring to remote sites and had zero downtime from an equipment/data availability perspective. Of course, they lost a lot of key people too, so I can't say they had no business interruption, but whatever interruption there was turned out to be very brief. I had a vendor tell me that of the 70 or 80 businesses in the WTC that were all running their high availability products, not one had substantial downtime or an interruption in service.

      So, it is possible to build against this kind of thing. As people have said earlier, however, you have to weigh the risk against the costs.

      But, as the '93 incident shows, all businesses, even tiny ones, need to at least have good offsite backups of critical accounting/billing data to survive even the smallest of fires, floods, and other disasters.

  31. Re: re-bar by stefanlasiewski · · Score: 1

    For those of you who are still lost, this is the stuff that Cordelia fell on in 'Lover's Walk', an episode from the second or third season of Buffy the Vampire Slayer. Does that help? :-)

    I find that the following explanation is a better example:

    When you were a kid, it was the metal bars you stole from the new house next door to play "Darth Vader vs. Luke Skywalker".

    --
    "Can of worms? The can is open... the worms are everywhere."
  32. backup tapes in a fire safe?!? by Kyril · · Score: 1

    I thought those things got hot enough to melt tapes. Certainly the last ones I laid eyes on said explicitly that they wouldn't work for tapes. More expensive models may or may not do better, but if the heat and duration of the fire exceed their rating, you're still pretty screwed.

    1. Re:backup tapes in a fire safe?!? by photon317 · · Score: 1


      Yes, most fire safes I've ever seen are mostly worthless. When you read the specs, it turns out they let through way too much heat for most practical purposes - all they're doing is blocking the physical flame. You're still running a fair chance of total heat destruction on the inside. However, there are fire safes out there that are capable of protecting magnetic media; you just have to look around for them.

      For example, check out http://www.firekingoffice.com/fk_data.html

      --
      11*43+456^2
  33. consider going bankrupt by g4dget · · Score: 1
    You can spend enormous amounts of money on trying to protect against every eventuality. But is that worth it? What makes the US economy so dynamic is that companies can take risks and that they do fail. And while you are trying to take costly steps to protect your data, your competitors may well not be.

    I'm not saying that you shouldn't plan and protect, I'm saying that, in real life, you have to look at what your risks are, what your legal obligations are, and what your competitors are doing. Accepting the risk of going out of business is part of doing business in the first place, and aliens abducting your mainframe should probably be lower on your list of worries than having an understaffed tech support line.