ISP Recovers in 72 Hours After Leveling by Tornado
aldheorte writes "Amazing story of how an ISP in Jackson, TN, whose main facility was completely leveled by a tornado, recovered in 72 hours. The story is a great recounting of how they executed their disaster recovery plan, what they found they had left out of that plan, data recovery from destroyed hard drives, and perhaps the best argument ever for offsite backups. (Not affiliated with the ISP in question)"
Now that that's out of the way, it never ceases to amaze me how many companies have little to no plan for recovering from a severe disaster, and how far a little bit of ingenuity can go in a company.
Times of crisis and how one deals with them are the mark of successful businesses/employees/people. I don't think that we could recover so quickly should a disaster of that size hit my job, but it'd be fun to try.
This is what happens when people make intelligent plans and then modify them as they see other plans work or fail. I'm glad to see that this was a work in progress rather than some arcane plan in a binder somewhere that no one ever looked at.
The Blaster Master Fighting for Truth, Justice, and Evil Pie since 1979
...is a good enough argument for off site backups. If you don't have them, your backup plan is not enough.
Let me get this straight: all the houses around the ISP have no power, no phone... but they still need to get online?
Runnin' On Empty
Then I've seen the other end of the spectrum - a 6 Billion dollar corporation's world HQ IT center... wow. They have disaster recovery sessions and planning like I never would have imagined. Very cool facility, but it has to be like that. Some day if they get burned, it's all over.
Berto
What amazes me isn't that these people were able to restore service to their customers in 72 hours. They used standard systems administration techniques. BGP was specifically mentioned.
No, what amazes me is that this is news. The IT industry is so full of idiots and morons and MCSEs that taking basic precautions earns you a six-figure salary and news coverage. These folks didn't even have off-site backups; it was luck that they were able to resume business operations (i.e., billing) so soon.
Moral of the story? When automobile manufacturers start getting press coverage for doing a great job because unlike their competition, they install brakes in their vehicles, you know that the top-tier IT managers and executives have switched industries.
Barclay family motto:
Aut agere aut mori.
(Either action or death.)
OK, I may just be jaded: I work in a sector that thinks 5 minutes is an earth-shattering amount of downtime. 72 hours would have me, everybody that works for me, and some C-level guys fired at the companies I work for. First things first: what did they do wrong? Backups stored only on site. This is page 2 of any disaster recovery howto: backups need to be stored both onsite and remotely, and they also need to be verified as functional (yes, I am that manager who insists that servers be restored and checked for functionality on the backup hardware during a work window). From the story, it wasn't even client data so much as their billing DB and other office information. When will people learn that information makes a lot of businesses and needs to be protected? It is a nominal cost to do proper backups and house them remotely, even if it's in a bank vault a few towns over, preferably on the other coast. Satellite uplinks can provide decent amounts of bandwidth in a pinch, though the latency is horrid.
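For anyone wondering what the cheapest form of "verified as functional" might look like, here is a minimal sketch, assuming the backup is a plain directory copy that can be mounted next to the live data. The function names (sha256_of, verify_backup) and the directory layout are made up for illustration, and a checksum comparison is a weaker check than the full restore onto backup hardware described above; it only shows that the bytes survived the copy.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_backup(source_dir: Path, backup_dir: Path) -> list[str]:
    """Compare every file under source_dir with its copy under backup_dir.

    Hypothetical helper: returns a list of problems, empty if the copy
    matches. It does not prove the backup can actually be restored.
    """
    problems = []
    for source_file in sorted(source_dir.rglob("*")):
        if not source_file.is_file():
            continue
        relative = source_file.relative_to(source_dir)
        backup_file = backup_dir / relative
        if not backup_file.is_file():
            problems.append(f"missing: {relative.as_posix()}")
        elif sha256_of(source_file) != sha256_of(backup_file):
            problems.append(f"mismatch: {relative.as_posix()}")
    return problems
```

Run it against both the onsite and the remote copy on a schedule, and treat a non-empty result as a failed backup rather than a warning.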
No sir I dont like it.
Wow! This is exactly the reason that systems administrators generally dislike most members of their development group. Your attitude does not do very much to endear us 'cable monkeys' and 'PHBs' to you.
"IT people", who give a shit about logs and backups and think plugging a PC and monitor into a powerbar is "computer science"
If you think this is all that is involved in running a remotely large and reliable network, you are sadly mistaken my friend. A lot of thought, planning and testing goes into most corporate network infrastructures.....kinda like software development.
"Computer Science" is a very broad term that encompasses much more than just 'programming'.
Many companies in the World Trade Center thought that off-site backup meant the other building.
Cave, wreck, and deep diver.
Prevent email address forgery. Publish SPF records for y
What takes an hour is that the technician has to take care of the other 20 people who can't be bothered to plug a cable back into the wall on their own.
Oh, and, of course, the tech also has to take care of real work - like fixing the programmer's machine after he installs the latest Webshots and Gator software.
Me: "It took our technician an hour to get all of the malware off of Stratjakt's computer that he downloaded from the Internet."
CTO: "Didn't he read the email that I sent out every month for the last six months telling the employees not to install non-work-related software?"
Me: "Well, I asked him about that...he said that he was a programmer and just doesn't care."
CTO: "He's fired."
Oh, and, incidentally, when your self-administering software becomes proficient enough to keep your big foot from wrapping around the network cable and yanking it out of the wall, then I'd say you really had something worthwhile. At this point, though, I have my doubts.
Keep up the good work.
sloth jr
Speaking as an architect, even building a below-ground bunker might not protect you from the full force of Nature with a capital 'N'.
I'm in California, and as such, we design buildings to take a certain scale of earthquake or less; not because clients are cheap, but because above a certain point all bets are off, no matter what kind of building you've built! At some point the force of Nature you're dealing with is so staggering that no amount of preparation or work can give you a guaranteed resistance.
I doubt many buildings could take a direct hit from a tornado; and even if they could, that's not saying that everything that's not the building (i.e. all that fancy computer equipment and nice people inside) wouldn't be sucked out and sent to Oz in a minute...
The company I work for practices disaster recovery once a year on all our major systems.
In the article the writer was talking about how much work it was to migrate the T1 connections, and how they hadn't foreseen that. That is exactly the sort of thing that a practice disaster recovery uncovers.
If you want the model from the place I work it is simple enough:
1. Run the disaster recovery during a 24 hour period
2. Pat yourself on the back for what worked.
3. Ignore what doesn't work.
4. Repeat next year.
Of course next year gets a new step:
3.5 Act surprised that stuff didn't work.
Actually.. I ran a technical support department for a small ISP for a couple years.
It's amazing how accurate you are in regard to the customer viewpoint on downtime.
After having done it myself, I actually have MUCH more respect for technical support engineers/supervisors, because within reason most "downtime" is fixed even before the customer knows about it (i.e. small blips in service).
And the majority of people who purchase an ISP's services have absolutely no idea what it takes to respond to an outage.
....move along....nothing to see here....
When you go to a DRP seminar, they make the claim that the majority of businesses that are knocked out for longer than 48 hours go out of business within 1 year.
This is really sad, and the company could have fired him for being incompetent. He basically destroyed their intellectual property through negligence, wasting all the money they invested in his project, which was almost certainly more than just his salary for that time period.
If a truck driver gets a load and forgets to check his own tie-downs, and as a result loses the load before reaching his destination, whose fault is it?
Besides, as supreme programmer, he should be motivated to work sometimes from home in the middle of the night, and have backups there.
Get off my launchpad!