Quickly Switching Your Servers to Backups?
moogoogaipan writes "After a few days of thinking about the quickest way to bring my website back to internet users, I am still stuck at DNS. From experience, even if I set the TTL for my DNS zone file as low as 5 minutes, there are still DNS servers out there that won't update until a few days later (Yeah. I'm looking at you, AOL). Here is my situation. Say that I have my web servers and database servers at a remote backup location, ready to serve. If we get hit by an earthquake at our main location, what can I do in a few hours to get everyone to go to our backup location?"
Use the same provider at both (N) locations and the same IPs for servers/services. Just don't advertise the prefixes via BGP from the backup location until the primary one goes down. Since the IPs never change, stale DNS caches stop mattering.
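To make the "don't advertise until the primary goes down" idea concrete, here is a minimal sketch of the decision logic a watchdog at the backup site might run. Everything in it is an assumption for illustration: the prefix is a documentation range, the three-failure threshold is arbitrary, and the command strings merely imitate the text-style commands some BGP speakers accept from helper scripts. Your router or provider will have its own mechanism.

```python
from typing import Iterable, List, Optional

# Placeholder prefix from the RFC 5737 documentation range (assumption).
PREFIX = "192.0.2.0/24"


def next_action(primary_up: bool, announcing: bool) -> Optional[str]:
    """Return the routing change the backup site should make, if any."""
    if not primary_up and not announcing:
        # Primary is considered dead: start advertising from the backup.
        return f"announce route {PREFIX} next-hop self"
    if primary_up and announcing:
        # Primary is back: stop advertising from the backup.
        return f"withdraw route {PREFIX} next-hop self"
    return None


def run(health_samples: Iterable[bool], failures_needed: int = 3) -> List[str]:
    """Replay health-check results and collect the commands that would be issued.

    The primary is only declared down after `failures_needed` consecutive
    failed checks, so a single dropped probe does not move the prefix.
    """
    commands: List[str] = []
    announcing = False
    consecutive_failures = 0
    for ok in health_samples:
        consecutive_failures = 0 if ok else consecutive_failures + 1
        primary_up = consecutive_failures < failures_needed
        cmd = next_action(primary_up, announcing)
        if cmd is not None:
            commands.append(cmd)
            announcing = cmd.startswith("announce")
    return commands


if __name__ == "__main__":
    # One good check, four failed checks, then recovery.
    for line in run([True, False, False, False, False, True]):
        print(line)
```

In practice you would probably want a human to approve the withdraw step rather than failing back automatically, since a flapping primary would otherwise bounce the prefix between sites.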
You could spend a bundle of money doing global load balancing and maintaining a full hot spare site, or you could figure out how critical it really is that your website be up within 5 minutes of some major disaster like an earthquake.
In the event of a major disaster, the need for "immediate" recovery is actually defined as being able to be back up and running within 24 hours of the event. This is true even for business-critical functions. Unless your business would cease to exist within 24 hours if your website went down, I would consider a 72-hour return to service to be perfectly adequate, and it would cost a whole lot less time and money to set up. Keep in mind that we are talking about an eventuality that would only occur if your primary site were entirely disabled for an extended period of time, which is highly unlikely to happen if you're hosted in any kind of modern data center.
dnsmadeeasy doesn't solve the problem the OP is asking about. They simply monitor your services and start serving a different DNS record if your primary is down.
The OP is concerned with all the DNS servers that aren't yours. Those would already have a cached copy of the record and would continue to serve up the dead DNS record until their (incorrectly configured) TTL expired.
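A toy simulation shows why a 5-minute TTL on your side doesn't help against such resolvers. This is only a sketch: the `min_ttl` floor is a hypothetical stand-in for whatever a misbehaving resolver actually does, the addresses are documentation ranges, and real caches are more complicated.

```python
from dataclasses import dataclass
from typing import Optional

PRIMARY = "192.0.2.10"    # placeholder address of the main site
BACKUP = "198.51.100.10"  # placeholder address of the backup site


@dataclass
class CachingResolver:
    """A resolver that caches one record and may ignore short TTLs."""
    min_ttl: int = 0          # hypothetical floor, in seconds
    cached_ip: str = ""
    expires_at: float = -1.0

    def resolve(self, now: float, authoritative_ip: str, authoritative_ttl: int) -> str:
        if now < self.expires_at:
            return self.cached_ip  # still cached: the zone is not consulted
        # Cache expired: fetch the current record, but never cache it for
        # less than min_ttl, whatever TTL the zone publishes.
        self.cached_ip = authoritative_ip
        self.expires_at = now + max(authoritative_ttl, self.min_ttl)
        return self.cached_ip


def seconds_until_switch(resolver: CachingResolver, ttl: int, failover_at: int,
                         step: int = 60, horizon: int = 7 * 86400) -> Optional[int]:
    """Seconds after failover until the resolver first hands out BACKUP."""
    now = 0
    while now <= horizon:
        # The zone itself flips to the backup address at failover time.
        published = BACKUP if now >= failover_at else PRIMARY
        if resolver.resolve(now, published, ttl) == BACKUP:
            return now - failover_at
        now += step
    return None


if __name__ == "__main__":
    compliant = seconds_until_switch(CachingResolver(), ttl=300, failover_at=660)
    stubborn = seconds_until_switch(CachingResolver(min_ttl=86400), ttl=300, failover_at=660)
    print("compliant resolver switched after", compliant, "seconds")
    print("stubborn resolver switched after", stubborn, "seconds")
```

The compliant resolver picks up the backup address within one TTL of the switch, while the one with a one-day floor keeps handing out the dead address for most of a day, and nothing in your zone file can change that.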
As another poster already mentioned, BGP is really the only technical solution to this problem. All other "solutions" amount to convincing people that they don't really need instant failover in the event of a major disaster.