Slashdot Mirror


What Hurricane Sandy Taught IT About Disaster Preparedness

StewBeans writes: The National Oceanic and Atmospheric Administration Climate Prediction Center is calling for calmer than normal storm activity this hurricane season, which runs through Nov. 30. But it's likely that data centers and IT companies in NYC are still taking disaster preparedness seriously. Three years ago, Hurricane Sandy devastated homes, businesses, transportation, and communication in New York, and taught many companies (the hard way) how to keep the lights on when the lights were literally off for weeks on end. Alphonzo Albright, former CIO of the Office of Information Technology in New York City, gives a behind-the-scenes account of what life and business were like in the dark, cold days following Hurricane Sandy in NYC. He also shares tips for other tech leaders to create their own Business Continuity Plan in case this year's storms take a turn for the worse.

68 comments

  1. Geographic diversity by Todd+Knarr · · Score: 5, Informative

    First rule: have facilities capable of running your business in more than one location. Everywhere is susceptible to disaster of one sort or another, but if you pick areas far apart that aren't geographically similar they probably won't both suffer disasters at the same time.

    Second rule: the probability of disaster taking out your main facilities is 100%. It will happen. The only question is exactly when it'll happen, and the only constant in the answer is that it won't be at a good time. If anyone in your organization doesn't like this, remind them that reality doesn't really care what they like.

    1. Re:Geographic diversity by turbidostato · · Score: 4, Insightful

      I should add a rule zero then: Take your time to properly understand your costs and revenues so you can make a sensible investment. Maybe it ends up being cheaper just to close the doors for a week every 30 years than to pay for your A-bomb-proof continuity plan.

      And then a zero-plus: Make sure you get business alignment in writing. Maybe the board member who agreed to your investment-sensible, less-than-A-bomb-proof continuity plan will want you as a scapegoat once the shit hits the fan.

    2. Re:Geographic diversity by Lumpy · · Score: 4, Interesting

      A week? In most data disasters you are down for at least 30 days. Hell, you can't get an order for servers in from Dell, even on rush, faster than 2 weeks.

      If your company can survive zero revenue and 100% loss for 30 days, you either are sitting on a mountain of money, or your business is more of a hobby than anything else.

      Oh, and if you lose your accounting data due to lack of a bomb-proof plan, expect fines in the high six-figure range.

      --
      Do not look at laser with remaining good eye.
    3. Re:Geographic diversity by turbidostato · · Score: 0

      I see your low ID and I can't help but ask... are you really *that* obtuse?

    4. Re:Geographic diversity by Anonymous Coward · · Score: 1

      Lumpy is actually completely right, and as usual you kiddies that have zero experience or education in managing IT just don't have a clue. You can't get replacement hardware instantly; no, corporations don't go to Best Buy. Also, I suggest you look up Sarbanes-Oxley and the fines for not having a working backup plan.

    5. Re:Geographic diversity by turbidostato · · Score: 1

      C'mon, Lumpy, we all know it's you.

    6. Re:Geographic diversity by tlambert · · Score: 1

      A week? In most data disasters you are down for at least 30 days. Hell, you can't get an order for servers in from Dell, even on rush, faster than 2 weeks.

      Do companies actually bulk-order from Dell any more? This is actually the most I've heard about Dell for months.

      I know Google manufactures their own computers, for the most part. They do use Dells as build machines for things like Chrome and ChromeOS, and they're a cheap way of throwing CPUs at the problem, instead of making the Ubuntu build process actually effective and efficient, for that matter, but those are pretty specialized use cases.

      Google also routinely runs "This data center got destroyed in an earthquake/this data center was destroyed in a sudden military conflict/this data center was destroyed by terrorists/This datacenter was taken offline by a nuclear plant melting down/This data center went offline because it was solar powered, and it was gloomy for 5 days in a row and the batteries ran out/etc." exercises.

      The exercises take place on the live network, and if someone screwed up, it's sometimes visible externally (when this happens, the exercise is ended for that service to get it back up and the service/service team is "red tagged"). For the most part, nobody notices except the people inside Google, which is ready to fly new containerized data centers pretty much anywhere in the world with about 24 hours of downtime, max, anyway.

    7. Re:Geographic diversity by Anonymous Coward · · Score: 0

      "Hell, you can't get an order for servers in from Dell, even on rush, faster than 2 weeks."

      Funny, I just placed an order and they'll be here in TWO DAYS GUARANTEED.

      Would you like to try again when you actively use these services and know what the fuck you're talking about?

    8. Re:Geographic diversity by nine-times · · Score: 3, Interesting

      Do companies actually bulk-order from Dell any more? This is actually the most I've heard about Dell for months.

      I know Google manufactures their own computers, for the most part.

      So you think just because Google builds their own servers, it must be that everyone else does the same? There are a few companies out there that aren't Google, and yes, many of them still buy from Dell or HP.

    9. Re:Geographic diversity by Khyber · · Score: 2, Informative

      "I know Google manufactures their own computers, for the most part."

      As a former Google employee, I must say you are full of shit.

      Show me Google's manufacturing plants, please.

      --
      Still waiting on Serviscope_minor to wake up to fucking reality and realize that Jessica Price isn't going to fuck him.
    10. Re:Geographic diversity by nine-times · · Score: 4, Insightful

      Take your time to properly understand your costs and revenues so you can make a sensible investment. Maybe it ends up being cheaper just to close the doors for a week every 30 years than to pay for your A-bomb-proof continuity plan.

      This is an amazingly difficult concept to get people to understand. I've had way too many conversations with people who are sure they need an instantaneous failure-proof disaster recovery plan. They believe their servers should be constantly in sync with multiple copies in various places, such that in the event of a short internet outage, their servers will fail over to an outside copy, and then fail back when the outage ends, automatically and without skipping a beat. Unfortunately, they're willing to spend approximately $0 to achieve this, but that should be fine, because "the cloud" is pretty much free, right?

      It's a similar problem with security. Everyone wants all of their data to be completely secure without any possibility of being compromised under any circumstances, but they also want it to be as convenient as if the data is unsecured, and they don't expect to pay extra for any of it.

      I always try to explain that it's about trade-offs. I can make your data much more secure than it is now, but it'll cost you money, and you'll have to jump through extra hoops to get access to your own data. I can replicate what you need to a remote server, yes, but then you have to pay for the remote server. Depending on exactly what we're talking about, it might not be a real-time sync, or it might not result in anything like an automatic failover. Those things might require special software or services or licenses. Pay enough, and yes, I can probably get you a real-time sync with automatic failover and fail-back, but even then, you could still have an outage. The system that keeps everything in sync and triggers the failover could be the component that fails. Or if there's a total blackout on the East Coast, it might not matter that there's a complete replica automatically started on the West Coast, if all your employees are on the East Coast and without power.

      It's trade-offs. Spend enough money and put up with enough limitations, and you'll get something that does what you want, although imperfectly. Most of the time, for most businesses, it doesn't make sense. "Good enough" is good enough. But people don't like to be told, "A pretty secure network with a pretty good disaster recovery plan is appropriate for you." It makes them feel unimportant, which most executives and business owners can't live with. They want to know that they should have the best thing possible.

    11. Re:Geographic diversity by Anonymous Coward · · Score: 0

      A week? In most data disasters you are down for at least 30 days. Hell, you can't get an order for servers in from Dell, even on rush, faster than 2 weeks.

      That's why you make a plan and sign an SLA before a disaster. When it finally hits, you will have a contract already paid for that a vendor has to fulfill, generally by private freight within 48 hours, but less can be had if the exact details are ironed out in advance.

      There are entire companies that hold duplicates of IT stock just to make money off these sorts of business continuity agreements. Getting paid to worry for someone else is quite lucrative.

    12. Re:Geographic diversity by dgatwood · · Score: 3, Interesting

      A week? In most data disasters you are down for at least 30 days. Hell, you can't get an order for servers in from Dell, even on rush, faster than 2 weeks.

      Maybe true, but you can get a cloud server deployed in a matter of minutes, and you can use that as a temporary (expensive) alternative to servers under your complete control.

      If your company can survive zero revenue and 100% loss for 30 days, you either are sitting on a mountain of money, or your business is more of a hobby than anything else.

      You're making a lot of assumptions that aren't necessarily valid. The amount of downtime and the impact depends heavily on the nature of the company, and in particular, whether sales/income depends on maintaining continuous operation of the business. Take, for example, a company that makes software:

      • On the development side, even if a company's entire repository went away tomorrow, and even if half the development team died, a typical software company could still get back all but the last few days' work (and perhaps a few old branches) by configuring a github instance on Amazon's Elastic Cloud and having the remaining developers push all of the branches from their local checkouts. Downtime would be minimal.
      • On the distribution side, most software companies would be completely unaffected, because distribution is usually handled by a large third-party merchant (Apple, Google, etc.).

      So unless a software company requires critical server infrastructure beyond what they get for free via iCloud, etc., it probably needs very little in the way of disaster preparedness, because the very nature of the work and the tools involved lends itself to being prepared for a disaster automatically.

      On the opposite end of the spectrum, cloud service providers and Internet service companies must have disaster preparedness plans in place, or else everybody who uses their services is screwed. And if they're down for even a couple of days, they're probably going out of business. If Facebook went down for a week, Google+ would become the #1 social network.

      --

      Check out my sci-fi/humor trilogy at PatriotsBooks.

    13. Re:Geographic diversity by sjames · · Score: 2

      It does bring up a good point though. There is a lot of space between the A-bomb-proof data center with a geographically diverse duplicate data center and hot cutover, and "DR, what's that?"

      As you point out, data backup is essential, but that doesn't imply a full duplicate data center. It may be that a very minimal setup is enough to limp along for a few weeks while things get back to normal. Limping doesn't necessarily mean no revenue.

      It's also useful to note that downtime due to storms and such doesn't necessarily mean lost equipment.

    14. Re:Geographic diversity by fustakrakich · · Score: 4, Funny

      Show me Google's manufacturing plants, please.

      Aren't they on the North Pole? I hope they can float...

      --
      “He’s not deformed, he’s just drunk!”
    15. Re:Geographic diversity by tlambert · · Score: 4, Insightful

      "I know Google manufactures their own computers, for the most part."

      As a former Google employee, I must say you are full of shit.

      Show me Google's manufacturing plants, please.

      As a former Google employee myself, I'm bound by my NDA from naming the East Asia contractors who build the actual equipment. Google generally only provides the reference implementation.

      Do you think Dell builds their own boards? They don't. The majority of their server class motherboards are manufactured by ASUS, based on Intel reference designs (Intel also no longer manufactures desktop motherboards, as of Haswell -- yields were too low).

      If you are curious about who made your motherboard, and run Windows, use the following command:
      wmic baseboard det product,Manufacturer,version,serialnumber

      (If you want a GUI version, download "Speccy", run it, and either look for the "Motherboard" section in the "Summary" view, or click on the "Motherboard" list item to get only that information by itself).

      Other OS's have their own commands, as an exercise for the student.
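
      For Linux, one way to do the exercise is to read the DMI fields that the kernel exposes under /sys/class/dmi/id. The sketch below is only an illustration: the function name read_board_info is made up here, and it assumes a Linux system with that sysfs directory present. The serial number file is usually readable only by root, so it may come back empty.

```python
from pathlib import Path

# Linux exposes DMI/SMBIOS fields as small text files in this sysfs directory.
DMI_DIR = Path("/sys/class/dmi/id")

# board_serial is typically root-only, so it may be reported as None.
FIELDS = ("board_vendor", "board_name", "board_version", "board_serial")


def read_board_info(dmi_dir=DMI_DIR):
    """Return a dict of baseboard fields; missing or unreadable ones map to None."""
    info = {}
    for field in FIELDS:
        try:
            info[field] = (Path(dmi_dir) / field).read_text().strip()
        except OSError:
            # Covers both "file not found" and "permission denied".
            info[field] = None
    return info


if __name__ == "__main__":
    for key, value in read_board_info().items():
        print(f"{key}: {value}")
```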

      P.S.: If the information has been obfuscated, you can usually back-track by looking at the BIOS vendor and version information, and then using searches for updated/same versions of the BIOS based on that, to see which platforms the BIOS vendor says it's for. You are welcome.

    16. Re:Geographic diversity by turbidostato · · Score: 2

      "This is an amazingly difficult concept to get people to understand. I've had way too many conversations with people who are sure they need an instantaneous failure-proof disaster recovery plan."

      Given your nickname it seems no wonder that you grasp the concept. Exactly yes to all you say.

      "It makes them feel unimportant, which most executives and business owners can't live with."

      That's true, but I'd say it's only half of the story. Business owners especially are quite sensitive to the money part and quickly understand that there's no point in spending 1M upfront and then 100,000 a year to protect against an outage with a 10-year recurrence that risks 500,000 in losses. From my experience, the other half of the story, especially when dealing with management below board level, is the CYA part. You make the assessment, the numbers are sensible and everybody agrees with them but, in the end, you are still gambling; most managers will gladly pay a lot of (the company's) money just to avoid the chance of being the one who has to go upstairs to tell the big boys that shit happened above and beyond the disaster recovery plan's coverage, just as expected, but still...
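
      The arithmetic behind that first point can be sketched in a few lines, using the figures from the comment (1M upfront, 100,000 a year, 500,000 at risk once every 10 years). The 10-year horizon, the lack of discounting, and the assumption that the protection would prevent the whole loss are simplifications added for illustration.

```python
def protection_cost(upfront, yearly, years):
    """Total spend on the continuity plan over the horizon (no discounting)."""
    return upfront + yearly * years


def expected_loss(loss_per_event, recurrence_years, years):
    """Average loss over the horizon if nothing is done."""
    return loss_per_event * years / recurrence_years


HORIZON_YEARS = 10  # assumed planning horizon, not from the original comment

cost = protection_cost(1_000_000, 100_000, HORIZON_YEARS)
loss = expected_loss(500_000, 10, HORIZON_YEARS)

print(f"cost of protection over {HORIZON_YEARS} years: {cost:,.0f}")
print(f"expected loss without it:           {loss:,.0f}")
print("worth it" if cost < loss else "not worth it on these numbers")
```

      On these numbers the plan costs four times what it is expected to save, which is the "no point" the comment describes.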

      Some anecdotes, just for fun:
      -We need wide geographical coverage. We can't stand a local disaster since it would make us lose about 500,000 a day.
      -Well, your CRM says 80% of your billing comes from companies within this industrial complex... do you think a local disaster won't impact your sales anyway?

      -We need new offices within 72 hours in case this building is lost because we can't afford paying our employees for nothing.
      -Well, do you remember that, even if the building is lost off-hours without loss of life, local labour law lets the company set the dates of 11 of the standard 22 days of yearly holidays? So in case of disaster you can send your people home for two weeks without it costing you an extra dime in wages or lost productivity over the year. (This one was not so simple, but the nub of the case was this.)

      -We need live migration between those two datacenters so in case of disaster there's sub-minute service disruption.
      -You see... this is quite costly, and all you are basically doing is serving almost 100% static content with only minor changes. Wouldn't it be better if you spread the static load and/or let the content converge somewhere between 1 and 6 hours and drive the costs down ten-fold?

    17. Re:Geographic diversity by Anonymous Coward · · Score: 1

      Meh. After Hurricane Irene our whole company was shut down for a week and we survived just fine. We lost a server for some reason but it took less than a day to rebuild it.

    18. Re:Geographic diversity by Taco+Cowboy · · Score: 1

      No, it ain't him!

      --
      Muchas Gracias, Señor Edward Snowden !
    19. Re:Geographic diversity by Anonymous Coward · · Score: 1

      Please tell me how you get a 120Tb database back onto that Amazon server quickly over the internet? Most companies do not run very simple things with barely any data. And you are trusting the Amazon people a whole lot with your company's key information. Not many CSOs would allow what you suggest.
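
      To put a rough number on "quickly", here is a back-of-the-envelope sketch. It assumes the 120Tb figure means 120 decimal terabytes and that the link is the only bottleneck; the function name and the efficiency parameter are illustrative.

```python
def transfer_days(size_tb, link_gbps, efficiency=1.0):
    """Days needed to move size_tb terabytes over a link_gbps link.

    efficiency is the fraction of the nominal link rate actually achieved.
    """
    bits = size_tb * 1e12 * 8                      # decimal terabytes -> bits
    seconds = bits / (link_gbps * 1e9 * efficiency)
    return seconds / 86_400                        # 86,400 seconds per day


if __name__ == "__main__":
    for gbps in (0.1, 1, 10):
        print(f"{gbps:>5} Gbit/s: {transfer_days(120, gbps):6.1f} days")
```

      At a fully used 1 Gbit/s that works out to about 11 days, and at 10 Gbit/s to a bit over one day, before any protocol overhead or restore time on the far end.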

    20. Re:Geographic diversity by nine-times · · Score: 1

      From my experience, with each of your anecdotes, I'd half expect a response like: "Look, I'm not interested in all the technical details. That's your job. Why can't you just give me what I'm asking for?"

      For some reason, this comedy sketch comes to mind.

    21. Re:Geographic diversity by dgatwood · · Score: 1

      As I said, most software companies these days can take advantage of infrastructure like iCloud to avoid keeping their own databases. This makes it somebody else's problem. If Apple loses all of the iCloud data, it would be an end-of-the-world-level crisis, as I said, but there's no feasible way for individual software companies to back up that data, making it entirely out of their control.

      Amazon, of course, is different in that some of their cloud services are much closer to being servers under your control. If you're using that sort of service, then yes, you have to have a disaster preparedness strategy, even if that strategy is as simple as replicating the data on a geographically distant server continuously, and building backups into your schema. However, as software companies go, those sorts of needs are the exception, not the rule.

      --

      Check out my sci-fi/humor trilogy at PatriotsBooks.

    22. Re:Geographic diversity by cyberchondriac · · Score: 2

      That made my day, thanks. I'm familiar with the "We think we should have all the greatest and best services that Fortune500 companies enjoy, even though we're funded like a corner convenience store.. and can we implement this with fewer employees and a smaller budget than we had 5 to 10 years ago?"

      --

      Look back up at my post, now look back down, you're on the Internet. Now look back up. I'm a signature.
    23. Re:Geographic diversity by Anonymous Coward · · Score: 0

      If your company can survive zero revenue and 100% loss for 30 days, you either are sitting on a mountain of money, or your business is more of a hobby than anything else.

      Having at least net 30 days' worth of cash available is hardly a "mountain" and would probably be very prudent. It's like saying "having 3 months worth of expenses in your personal bank account is a 'mountain of money'": no, it's a prudent emergency fund.

      Not having a bunch of cash around is more running a hobby than a real business. Having 1-2 payroll cycles worth of cash is not dumb.

    24. Re:Geographic diversity by Software · · Score: 2
      That should be

      wmic baseboard get product,Manufacturer,version,serialnumber

    25. Re:Geographic diversity by DerekLyons · · Score: 1

      You're making a lot of assumptions that aren't necessarily valid. The amount of downtime and the impact depends heavily on the nature of the company, and in particular, whether sales/income depends on maintaining continuous operation of the business.

      Indeed. It doesn't matter much if my wife's business has a hot spare data center or cloud installation to back up the local one... because a catastrophe that destroys local capacity will almost certainly destroy or seriously damage the physical premises - and the physical inventory located therein. No physical premises, no physical inventory - no business. They maintain off-site backups in the event of a casualty that smokes the servers but leaves the building intact - but they're at the other end of the city, not halfway across the continent. A disaster big enough to wipe out both has also likely taken out the city, and thus the local economy on which the business depends. There's just not much percentage in spending the money and effort to maintain the capability for short term recovery in the event of such a disaster.

      Much of the effort towards .9999 reliability and the ability to recover in an hour from anything short of an asteroid impact that levels the continent seems to me to be IT folks puffing up to prove their importance and gain attention.
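
      For scale, an availability figure translates into allowed downtime as in the sketch below. It assumes a 365-day year and counts every minute of the year as service time; the function name is made up for the example.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600, ignoring leap years


def downtime_minutes_per_year(availability):
    """Minutes of downtime per year permitted by an availability fraction."""
    return (1.0 - availability) * MINUTES_PER_YEAR


if __name__ == "__main__":
    for target in (0.99, 0.999, 0.9999):
        minutes = downtime_minutes_per_year(target)
        print(f"{target}: {minutes:8.1f} minutes/year")
```

      A .9999 target leaves roughly 53 minutes a year, while .99 leaves about 88 hours, which is closer to the "closed for a few days" scenario discussed above.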

      And if they're down for even a couple of days, they're probably going out of business. If Facebook went down for a week, Google+ would become the #1 social network.

      Would it stay that way past the end of the week? That's the real question that you have to answer when making these plans.

    26. Re:Geographic diversity by Khyber · · Score: 1

      "Do you think Dell builds their own boards? They don't."

      As a former HP and Dell engineer, uh, yes, they do.

      They build the original design and then hand that off to a company for mass production.

      Google does NOTHING OF THE SORT. They used pre-built designs that fit their particular form factor and desired specs.

      --
      Still waiting on Serviscope_minor to wake up to fucking reality and realize that Jessica Price isn't going to fuck him.
  2. Sandy was not a hurricane by turkeydance · · Score: 1

    it was a storm, whatever.

    1. Re:Sandy was not a hurricane by bobthesungeek76036 · · Score: 1

      Sandy WAS a hurricane ... it just wasn't when it made landfall in Northeast US...

      --
      Karma: Bad
    2. Re:Sandy was not a hurricane by jeaton · · Score: 1
    3. Re:Sandy was not a hurricane by acoustix · · Score: 1
      --
      "A plan fiendishly clever in its intricacies"- Homer Simpson
  3. New York by sexconker · · Score: 1

    Sandy was a much bigger storm when it was hitting Cuba and fucking up the southern end of the Atlantic coast. It was an actual hurricane then, in fact.

    But allllllllllll we fucking hear about is how New York was unprepared. New York isn't special and doesn't deserve special attention for being unprepared, but it sure turned into a fucking media event that's still going strong today.
    I don't know if the media wanted another Katrina or simply wanted to pander to their favorite place in the world (NYC), but it got old real fucking fast.

    And what the fuck is up with "The National Oceanic and Atmospheric Administration Climate Prediction Center is calling for calmer than normal storm activity this hurricane season"? You don't call for that, you predict it. Calls are to be answered (or not). Predictions are to be met (or not).

    1. Re:New York by turbidostato · · Score: 3, Funny

      "And what the fuck is up with "The National Oceanic and Atmospheric Administration Climate Prediction Center is calling for calmer than normal storm activity this hurricane season"? You don't call for that, you predict it. Calls are to be answered (or not). Predictions are to be met (or not)."

      Nononono... This is the United States of Almighty America. When NOAA calls, hurricanes abide! (or else, we send Chuck Norris).

    2. Re:New York by Anonymous Coward · · Score: 0

      Insecure much? Don't get your Euro-panties in a twist.

    3. Re:New York by Hognoxious · · Score: 1

      Hey! That was uncalled-for!

      --
      Confucius say, "Find worm in apple - bad. Find half a worm - worse."
    4. Re:New York by sexconker · · Score: 0

      Chuck Norris takes the stairs 3 at a time... ...going down.

      That's a sexconker original, feel free to use it.
      Better yet - feel free to try it yourself. Take those stairs 3 at a time when going down.

    5. Re:New York by cyberchondriac · · Score: 1

      I predict a storm of semantic pedantry :)

      --

      Look back up at my post, now look back down, you're on the Internet. Now look back up. I'm a signature.
  4. What it should have taught by CanadianMacFan · · Score: 3, Insightful

    Nothing. Disaster recovery plans are like backups... if you don't test them every so often, then you should assume that they don't work.

    Companies should have already tested their plans and known that they worked so that when any interruption from the storm kicks in their backups would take over as planned.
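
    As one small piece of what "testing" can mean for the backup half of that comparison, the sketch below restores nothing itself but checks that a restored directory matches the source file by file. The function names and the idea of comparing against a live source tree are assumptions for illustration; a real restore test would also cover databases, permissions and whether the applications actually start.

```python
import hashlib
from pathlib import Path


def file_digest(path):
    """SHA-256 of a file, read in chunks so large files do not fill memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_restore(source_dir, restored_dir):
    """Return relative paths that are missing or different in the restored copy."""
    source_dir, restored_dir = Path(source_dir), Path(restored_dir)
    problems = []
    for src in sorted(source_dir.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source_dir)
        dst = restored_dir / rel
        if not dst.is_file() or file_digest(src) != file_digest(dst):
            problems.append(str(rel))
    return problems
```

    An empty list means every source file was found intact in the restored copy; anything else is a reason to distrust the backup before the storm arrives.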

    1. Re:What it should have taught by rtb61 · · Score: 1, Troll

      In the days of corporate douche baggery, corporate recovery plans are like service and support, which are like reliability and security: corporations spend more on executive bonuses when executives spend little or nothing on those things, because bigger profits now. Then it all fucks up and the executives wander off with those bonuses and golden parachutes. This is not by accident; this is because they do not give a fuck about anything but more greed now. You hire psychopaths as corporate executives and that is what you will end up with every single time.

      --
      Chaos - everything, everywhere, everywhen
  5. What about software diversity? by Anonymous Coward · · Score: 1

    You talk of geographic diversity, but that's only part of the picture. Software diversity is critical, too. Not all disasters are storms. Sometimes we have disasters of software architecture. Some will say that systemd is an example of this. Some of its architectural decisions, such as the use of binary logging and how it has subsumed so much unrelated functionality, prove to be very problematic for many users. That's totally separate from its implementation. Even a perfect implementation, which of course is not possible, would suffer from these architectural flaws.

    Yet just when Linux needs software diversity the most, we've seen almost all of the major distros using systemd. Linux users don't even have a choice any longer; unless they want to use an absolutely archaic distro like Slackware, or an impractical distro like Gentoo, they're going to be burdened with systemd. This systemd monoculture poses a huge risk, in my opinion. All it will take is one serious flaw in systemd, and we'll have a situation much worse than the bash/Shellshock disaster. So while using different Linux distributions used to provide some protection, the diversity of the Linux ecosystem is approaching rock bottom, which means the risk is shooting through the roof.

    It's getting to the point where even long time Linux shops have to start introducing FreeBSD, OpenBSD, and even Windows to hedge against the increasing level of risk that can come with using modern Linux and its rapidly developing monoculture. Only through a diversity of software can at least some degree of protection be attained.

    1. Re:What about software diversity? by Anonymous Coward · · Score: 0

      > Some will say that systemd is an example of this

      We encountered this after upgrading a MongoDB cluster on Red Hat 7. The cause was our fault, but systemd hid the clear error message that was printed to stderr. If we had seen that, it would have taken us a few seconds to troubleshoot instead of hours. Also, "systemctl start mongod" returned a zero (non-error) exit status even when MongoDB failed to start.
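
      Given the report that the exit status of the start command could not be trusted, one defensive pattern is to ask systemd afterwards whether the unit is really active. This is a sketch, not a fix for the behaviour described: it assumes systemctl is on the PATH, the function name and the injectable run parameter are illustrative, and a service that dies a few seconds after starting would still slip through without a delay or retry.

```python
import subprocess


def start_and_verify(unit, run=subprocess.run):
    """Start a unit, then check with 'systemctl is-active' that it is running.

    The exit status of the start command is deliberately ignored; only the
    follow-up status query decides the result.
    """
    run(["systemctl", "start", unit], check=False)
    # 'is-active --quiet' prints nothing and exits 0 only if the unit is active.
    status = run(["systemctl", "is-active", "--quiet", unit], check=False)
    return status.returncode == 0


if __name__ == "__main__":
    if not start_and_verify("mongod"):
        print("mongod is not active; check 'journalctl -u mongod' for the error")
```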

    2. Re: What about software diversity? by Anonymous Coward · · Score: 0

      Swallowing stderr makes troubleshooting such a pain in the neck. Often with systemd you have to resort to going through potentially hundreds of thousands of lines of strace output.

    3. Re: What about software diversity? by Anonymous Coward · · Score: 0

      You admit it was your fault. Why blame systemd?

    4. Re: What about software diversity? by Anonymous Coward · · Score: 0

      It's Redhat one word. Shows how stupid you systemd badgers are.

    5. Re: What about software diversity? by Anonymous Coward · · Score: 0

      > Shows how stupid you systemd badgers are.

      Do you mean bashers? That's the funniest typo I've seen in a while.

  6. Lengua Gato by rmdingler · · Score: 1

    Not at all irrelevant,about how much is taught, really, with not a bit of disaster..

    --
    Happiness in intelligent people is the rarest thing I know.

    Ernest Hemingway

  7. Katrina & A Datacenter/ISP by i.r.id10t · · Score: 2

    Wasn't there a datacenter guy who posted here on /. when Katrina hit about all the stuff they went through keeping things up and running at some sort of minimal level?

    Been drinking and google-fu is off but perhaps someone can post it. IIRC it included a blog of what was going on, etc.

    --
    Don't blame me, I voted for Kodos
  8. Not an option for High Frequency Traders by tlambert · · Score: 1

    Not an option for High Frequency Traders. Geographic diversity means locating your fiber optic connection farther away from the transatlantic fiber head ends which make HFT possible.

    1. Re:Not an option for High Frequency Traders by Hognoxious · · Score: 3, Insightful

      Terrible. How will we cope without them?

      --
      Confucius say, "Find worm in apple - bad. Find half a worm - worse."
    2. Re:Not an option for High Frequency Traders by sociocapitalist · · Score: 2

      Not an option for High Frequency Traders. Geographic diversity means locating your fiber optic connection farther away from the transatlantic fiber head ends which make HFT possible.

      Nope -

      HFTs will typically have a presence in two colos per exchange and, depending on where they trade, with multiple exchanges, each exchange being in a different country or region.

      If the disaster is big enough to take out both colos for a given exchange it has probably taken out the exchange as well.

      --
      blindly antisocialist = antisocial
    3. Re:Not an option for High Frequency Traders by Trogre · · Score: 1

      The optimum location for US High Frequency Traders is a few miles east of the transatlantic fiber head ends.

      --
      "Nine times out of ten, starting a fire is not the best way to solve the problem." - my wife
  9. What it taught us? by Ol+Olsoc · · Score: 1

    Almost certainly nothing.

    --
    The shepherds did so well protecting the flock that the sheep no longer believed that wolves existed.
  10. Hurricane taught nothing by Khyber · · Score: 1

    You can guarantee these IT idiots are going to leave the status quo intact for job security.

    --
    Still waiting on Serviscope_minor to wake up to fucking reality and realize that Jessica Price isn't going to fuck him.
  11. Servers need power, cooling and connectivity by Anonymous Coward · · Score: 0

    With so many comments here focusing on servers I'd just like to point out that those servers are generally useless without power, environmental controls and internet connectivity. If your disaster recovery plan doesn't address all of these issues, plus physical security (some looters just want to watch the corporate world burn), it isn't worth the storage capacity it occupies.

  12. No Serious Precautions Taken by JimSadler · · Score: 2

    First we must be willing to build very large facilities capable of storing thousands of tents and food enough to last people for many weeks. And we need to do this in regional centers such that deliveries can take place the day after a storm leaves an area. Areas such as Miami simply cannot be evacuated as the population is way too large. A strong storm can knock out roads and rails and put a large area into severe isolation. A few days' worth of groceries will simply not help much. In my area our grocery stores were destroyed when three storms hit us back to back. Getting a car on the road was next to impossible and quite dangerous. Gasoline was not available at all as the gas wells were all flooded. I had no power for three solid weeks. A situation like that can get to the point at which people raid each other trying to keep from starving. I see no effective measures at all. If another Katrina hit New Orleans the results would be very much like the original Katrina. The repairs made are not designed to stand up to a Category 5 hurricane. And it is only a matter of time before New Orleans gets hit by a Category 5 storm. Miami is in the same boat. We roll the dice here constantly and count on good luck not to bring on a Category 5 storm.

  13. Nothing learned by Anonymous Coward · · Score: 1

    The immense sums forwarded after 9/11 to harden the infrastructure in the NY area were wasted on largess and patronage as usual and a storm came along and proved it. What to do was known, they just didn't do it, and still haven't.
     

  14. 1920s electricity infrastructure by dbIII · · Score: 2

    It taught us that 1920s electricity infrastructure shouldn't be in use in one of the richest cities on the planet - wet wood in contact with high voltage is a bad idea and the inevitable fires happened.
    Funny thing is I know a transmission guy who said "I told you so" based on what he said in the 1960s. Fifty years later that shit was still in service and it burned.

  15. History repeats itself by Monoman · · Score: 2

    Eventually things pretty much go back to the way they were before. I remember seeing a discussion about the lessons learned from Hurricane Andrew (not just IT specific) and how after 7 years things that were important were forgotten or deemed less important. I'm sure the same happened with Hurricane Katrina, Sandy, and many others. It seems to be human nature that these things eventually wear off and become less important. I think Neil deGrasse Tyson was on Joe Rogan's podcast a few years ago and touched on the subject as well.

    --
    Keep the Classic Slashdot.
  16. Test your backup generators .. by nickweller · · Score: 2

    The backup generators failed because there was no electricity to power the fuel pumps (ref). Don't site your critical infrastructure in the basement (ref).

    1. Re:Test your backup generators .. by Monoman · · Score: 2

      Backup generators stopped running when they ran out of fuel because the tanks couldn't be refueled for various reasons: power outages, blocked roads, deliveries behind schedule, etc. Nobody thought they would need an SLA on refills.
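
      The refueling problem comes down to simple arithmetic, sketched here under made-up numbers. The tank size, burn rate, and delivery delay are all assumptions for illustration, not figures from any Sandy-era site.

```python
def runtime_hours(usable_fuel_gal, burn_rate_gal_per_hr):
    """Hours a generator can run on the fuel already on site."""
    if burn_rate_gal_per_hr <= 0:
        raise ValueError("burn rate must be positive")
    return usable_fuel_gal / burn_rate_gal_per_hr

def refill_shortfall_hours(usable_fuel_gal, burn_rate_gal_per_hr, hours_until_refill):
    """Hours without power if the next delivery is delayed; 0 if the tank lasts."""
    runtime = runtime_hours(usable_fuel_gal, burn_rate_gal_per_hr)
    return max(0.0, hours_until_refill - runtime)

# Hypothetical figures: 1,000 gal usable, 25 gal/h burn rate, roads closed for 72 h.
print(runtime_hours(1000, 25))               # 40.0
print(refill_shortfall_hours(1000, 25, 72))  # 32.0
```

      With these numbers the tank lasts 40 hours, so a truck that can't get through for three days leaves a 32-hour gap regardless of what the contract says.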

      --
      Keep the Classic Slashdot.
    2. Re:Test your backup generators .. by Cid+Highwind · · Score: 2

      Or they thought having an SLA for refills meant their fuel trucks get through... even when trees/power lines/National Guard soldiers are blocking the road.

      --
      0 1 - just my two bits
    3. Re:Test your backup generators .. by Monoman · · Score: 2

      Exactly. PHBs think an SLA is a guarantee. Even a "guarantee" isn't a guarantee in some circumstances. I live in "hurricane alley" and can tell you that when a storm hits, things don't usually go as planned.

      --
      Keep the Classic Slashdot.
    4. Re:Test your backup generators .. by Anonymous Coward · · Score: 0

      Very few people are prepared for any sort of emergency especially in the mainland US. These disasters don't happen frequently enough, so the decisions have been to forgo the safety and security. Everything is about cheap, hence annual destruction and insurance payments to reconstruct the cheap wooden homes. Replace them with sturdier concrete exteriors and nobody would have to lose lives or too much money. Instead, the thing that's only been recently done is to place a small "safe room" as a bunker to ride out the storm while the rest of the home and belongings blow away.

      That's still rather short term thinking, because if you go to the island of Guam on the Western Pacific ring of fire, they've replaced their homes with concrete after the 1976 Super Typhoon Pamela, a Category 4 storm that blew out many wooden homes that year. New homes there are all concrete and all include built-in storm shutters. The entire home is a safe room during typhoons and earthquakes. The typhoons there make more frequent landfall at higher intensity than the hurricanes on the Atlantic. The power poles were replaced with concrete poles that withstood 2 decades of typhoons, but many of them still broke during Super Typhoon Pongsona, a category 4+ storm, in 2002. Only 1 indirect death resulted from Pongsona, a storm that reached peak gusts of 173 mph. The US Navy's steel poles remained standing, but all the cables were blown out. People that live there are prepared for storms. I'm sure many of the US Military that have lived and been stationed there for more than a year are more likely to be prepared for emergencies, even when they do come back Stateside. The people that grew up there that have since moved stateside are generally better prepared for disasters and don't have to rush out to buy supplies, because they've already got them at home. They're used to both quakes and typhoons, so they can live on both coasts of the US and still be more prepared than the vast majority of people that grew up here.

      Homes don't generally blow away anymore on Guam. Power still goes out each year, but it's quickly restored, except during the super typhoons. Unlike here, where power will go out even in gale force winds, or even just heavy winds, none of which reach even hurricane category 1. We're a country of short term planners, because we don't have frequent enough disasters to make it matter to the people in charge.

      https://en.wikipedia.org/wiki/Typhoon_Pongsona

  17. Final Rule: Test the Disaster Recovery Plan by wiredog · · Score: 2

    Story time: A few years ago I was working on a web app for a US intel/LEO agency in Northern Virginia. The app had started as a demo, then kind of grew. Like a fungus. It was never really designed, much less designed to shut down and restart unexpectedly. There were some other similarly "designed" apps running in the data center.

    The data center, being under the flight path for an airport, had a continuity of operations ("coop") plan and hardware. The "UPS" was a big generator with a switch so that it would take over when mains power went down. There was also a system designed to handle hot mirroring of everything and switch all network traffic to the backup center if the main center went down.

    A great system which was never tested because what if the test takes the system down for 15 minutes and we thus miss the opportunity to prevent the Next 9/11 and Thousands Die and, worse yet, we have to testify in front of Congress?

    So one day the fire marshal came through the building and, as part of his testing, hit the Big Red Switch. The switch designed to detect this and start the generators (and which was reported to cost $15) failed. All the systems went down, hard. The network switch in place to notify the hot backup site and send all the traffic there also failed. And the Vital Systems Protecting Our Nation From the Next 9/11 went down, worldwide.

    Don't just have a plan, test it.

    p.s. We never were able to determine how much, if any, data was lost....
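
    The idea of a scheduled failover drill can be sketched with a toy model. Everything here (the `Site` class, `route`, `failover_drill`) is hypothetical and stands in for whatever real switching hardware a site has; it only illustrates the habit of cutting the primary on purpose and checking that the backup really takes over.

```python
class Site:
    """Toy stand-in for a data center; 'healthy' is an assumed status flag."""
    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy

def route(primary, backup):
    """Return the site that should receive traffic, or None if both are down."""
    if primary.healthy:
        return primary
    if backup.healthy:
        return backup
    return None

def failover_drill(primary, backup):
    """Take the primary down on purpose and report whether the backup took over."""
    primary.healthy = False
    try:
        return route(primary, backup) is backup
    finally:
        primary.healthy = True  # restore the primary when the drill ends

main = Site("main")
print(failover_drill(main, Site("backup")))                 # True
print(failover_drill(main, Site("backup", healthy=False)))  # False
```

    A drill that returns False on a quiet afternoon is a much cheaper way to learn the backup is broken than a visit from the fire marshal.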

  18. Rule 1 for I.T. by Anonymous Coward · · Score: 1

    Don't shut down the servers that have the emergency disaster plans saved on them.

    That's what my I.T. department did. It took us 3 meetings to convince them to turn them back on.