ISP Recovers in 72 Hours After Leveling by Tornado
aldheorte writes "Amazing story of how an ISP in Jackson, TN, whose main facility was completely leveled by a tornado, recovered in 72 hours. The story is a great recounting of how they executed their disaster recovery plan, what they found they had left out of that plan, data recovery from destroyed hard drives, and perhaps the best argument ever for offsite backups. (Not affiliated with the ISP in question)"
Those businesses should realize they need a backup/disaster plan as well, if they absolutely could not withstand a day of downtime.
Perhaps having the sites mirrored on two colos in two locations, and routing to the other one when the first goes offline.
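The mirrored-colo idea above comes down to "try the primary, fall back to the mirror." Here is a minimal sketch of that selection logic. The site names and the `is_healthy` callback are hypothetical, made up for illustration; a real setup would do this at the DNS or load-balancer level, not in application code.

```python
from typing import Callable, Optional, Sequence


def pick_active_site(
    sites: Sequence[str],
    is_healthy: Callable[[str], bool],
) -> Optional[str]:
    """Return the first site, in priority order, that passes its health check.

    `sites` is ordered from most to least preferred (hypothetical names).
    `is_healthy` is an assumed callback; how health is measured is up to you.
    """
    for site in sites:
        if is_healthy(site):
            return site
    # Every mirror is down: nothing left to route to.
    return None


# Example: the primary colo is offline, so traffic goes to the mirror.
down = {"colo-a"}
active = pick_active_site(["colo-a", "colo-b"], lambda s: s not in down)
```

Keeping the health check as a parameter means the same routine works whether "healthy" means a ping, an HTTP probe, or a manual flag flipped by an operator.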
I don't need no instructions to know how to rock!!!!
that's computerworld receiving the /.ing
the isp is here
picture of the aftermath here
There is much cruelty in the universe, John.
Yeah, we seem to have the tour map.
I think a lot of sites already have contingency plans for sudden traffic increases, and if not, they begin to think about them very seriously once they get a large spike in traffic that causes disruption of service. Even with traffic spike contingency plans, the level you establish as the maximum amount of traffic that you need to be able to sustain, and what amount of latency or downtime is acceptable to the business, can be and often is debated ad nauseam. It costs a lot of money to maintain readiness for, say, double or triple normal site traffic for a large site, and you have to make a business case for balancing that cost with the cost of an outage due to increased traffic.
There are several things you can do to quickly add the capability to handle additional load, and most of them rely on forethought when establishing contracts with your colocation facilities and software/hardware vendors. For instance, most large colo facilities allow you to reserve additional bandwidth capacity. You may pay more for that privilege, but that's part of the cost of preparedness. Also, you may purchase or lease additional hardware and have it set up and ready to install in a short amount of time, but not use it on a regular basis because of high licensing costs.
Licensing costs for database software can be enormous, but in the event of a large spike in traffic, turning on an additional 20 or 30 CPUs on a large database server could save the company a lot of money in lost revenue, especially if your database software vendor specifically allows this in your contract. If the contract doesn't allow this, you may end up paying a lot more in licensing fees than you would have lost in revenue during the outage.
My main point here is that planning for extra traffic is a big cost-benefit balancing act, and it requires a lot of forethought. Most large software, hardware and service providers allow for emergency clauses in contractual agreements, but it's often up to the customer to specifically call those out.
But then again, it's like insurance. You hope you don't need it, but you're glad you have it when you do. And you have to pay for it even if you don't need it.
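To put rough numbers on the balancing act, here is a back-of-the-envelope sketch. Every figure and function name below is hypothetical, picked only to show the arithmetic; plug in your own outage estimates and contract costs.

```python
def expected_outage_loss(
    spikes_per_year: float,
    outage_hours_per_spike: float,
    revenue_per_hour: float,
) -> float:
    """Revenue you expect to lose per year if you do NOT prepare for spikes."""
    return spikes_per_year * outage_hours_per_spike * revenue_per_hour


def readiness_pays_off(
    annual_readiness_cost: float,
    spikes_per_year: float,
    outage_hours_per_spike: float,
    revenue_per_hour: float,
) -> bool:
    """True when the expected loss exceeds what standby capacity costs per year."""
    loss = expected_outage_loss(
        spikes_per_year, outage_hours_per_spike, revenue_per_hour
    )
    return loss > annual_readiness_cost


# Hypothetical example: 2 spikes a year, 6 hours down each, $10,000/hour in sales.
loss = expected_outage_loss(2, 6, 10_000)          # 2 * 6 * 10,000 = 120,000
worth_it = readiness_pays_off(100_000, 2, 6, 10_000)
```

Like an insurance premium, the readiness cost is paid every year whether or not the spike shows up, so the comparison only makes sense against an expected loss, not the worst case.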
Also, when you plan for a traffic spike, you need to consider the source of the traffic. Denial of service attacks are often easy to mitigate with common network practices, and it's just a matter of preparing for those. But real, human-driven traffic is much different: less predictable, and actually capable of generating revenue.
Understanding your company's site infrastructure, software architecture and day-to-day traffic patterns is very important when it comes to handling real traffic spikes. When a real spike happens, network operators, developers and database admins (among others) will probably need to jump into action, looking for and attempting to mitigate bottlenecks as they appear. This can be a difficult task, and there's nothing worse than knowing what the problem is and not being able to do anything effective to combat it in a reasonable amount of time.
Real traffic doesn't just come from other sites. It can also be driven by other forms of communication, such as television, print and other media... even word of mouth (although I haven't seen an example of this). A large, syndicated national television news program that runs during primetime can generate a lot more traffic than most web sites, and those spikes seem to grow by orders of magnitude as the duration and repetition of air time increases. A fifteen-minute segment that is marginally compelling might be enough to swamp all but the largest and most prepared sites. The silver lining of the television spike is that it declines very quickly after the segment ends.
A spike from multiple media sources, for instance print, web, and television, could be very difficult to handle, both in magnitude and duration. Duration isn't often a problem, though, because even the most prepared sites will succumb under a huge spike and