Quickly Switching Your Servers to Backups?
moogoogaipan writes "After a few days of thinking about the quickest way to bring my website back to internet users, I am still stuck at DNS. From experience, even if I set the TTL for my DNS zone file as low as 5 minutes, there are still DNS servers out there that won't update until a few days later (Yeah. I'm looking at you, AOL). Here is my situation. Say that I have my web servers and database servers at a remote backup location, ready to serve. If we get hit by an earthquake at our main location, what can I do in a few hours to get everyone to go to our backup location?"
Same Provider at both (N) locations, Same IPs for servers/services, Just don't advertise the prefixes via BGP from the backup location until the primary one goes down.
NLB (Network Load Balancing) Cluster, link the two together and have them both serve the website. Not only will it not go down (barring freak accidents like both locations being hit at once) but it will also have the added benefit of presumably double the bandwidth and such.
The only problem is that if you put them in two separate locations, they need to be able to communicate with each other, keep identical copies of the website, and connect to any databases you may need.
Basically, any server clustering setup would probably be a good starting point, provided you can connect the two sites remotely and your website is important enough that it must never go down.
09F911029D74E35BD84156C5635688C0
+2 Troll is Slashdot's way of saying groupthink is confused
Talk to your ISP. They can set it up so the IP addresses at the main location can be rerouted to the DR site almost instantly.
You could spend a bundle of money doing global load balancing and maintaining a full hot spare site, or you could figure out how critical it really is that your website be up within 5 minutes of some major disaster like an earthquake.
In the event of a major disaster, the need for "immediate" recovery is actually defined as being able to be back up and running within 24 hours of the event. This is true even for business critical functions. Unless your business would cease to exist within 24 hours if your website went down, I would consider a 72 hour return to service to be perfectly adequate, and it would cost a whole lot less time and money to set up. Keep in mind that we are talking about an eventuality that would only occur if your primary site was entirely disabled for an extended period of time, which is highly unlikely to happen if you're hosted in any kind of modern data center.
A geographical load balancing solution, such as Coyote Point's Envoy or F5's Global Traffic Manager. Very expensive though.
You could hire an actual IT administrator who knows what they're doing? Like, one who's actually trained?
I hate to say this, but sometimes you want to avoid hosting locations in places that are earthquake, hurricane, or just natural disaster prone if it is that critical.
Then again, even places like NYC fall victim to total power grid failure once in a blue moon, so you do want some type of clustering in place like the prior people mentioned. I can't tell you how many times in IT I've heard someone say, "Some guy in Georgia just dug up a major fiber cable with his backhoe!"
"I am the king of the Romans, and am superior to rules of grammar!"
-Sigismund, Holy Roman Emperor (1368-1437)
We have used DNS failover from dnsmadeeasy.com for a couple years and have put it to the test a couple times. They have had perfect reliability and a low cost (typically well under $100/year).
The method is not perfect, but it is plenty good enough for our needs to protect against something that takes a datacenter down for a prolonged time (several minutes/hours/days). And the price
And to those who recommend avoiding "disaster prone" places: they all have people. People like the backhoe guy who took out the OC192 down the street. Or the core drillers who managed to punch both the primary and secondary optical links to a building of ours at a point where they were too close to each other.
You can roll your own by having a DNS server at each site, with DNS 1 always issuing the IP of server 1 and DNS 2 always issuing the IP of server 2. But there are a number of issues, like traffic hitting both sites at the same time. And you will have to detect more than just a down link, so you will be scripting web test and DNS update systems. By the time you are done, you will have spent decades' worth of dnsmadeeasy fees.
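To give a feel for the "web test" half of a roll-your-own setup, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the function names, the documentation-range IPs, and the rule of failing over only after three consecutive failed checks. The DNS update step itself is deliberately left out, since it depends entirely on which DNS server or provider you use.

```python
import http.client
import urllib.error
import urllib.request


def site_is_healthy(url, timeout=5.0, opener=urllib.request.urlopen):
    """Return True only if `url` answers with HTTP 200 within `timeout` seconds.

    This tests the web service itself, not just whether the link is up.
    `opener` is injectable so the check can be exercised without a network.
    """
    try:
        with opener(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        return False


def choose_address(primary_ok_history, primary_ip, backup_ip, required_failures=3):
    """Pick the IP to publish in DNS.

    Fail over only when the last `required_failures` checks of the primary
    all failed, so a single dropped request does not flip the site.
    """
    recent = primary_ok_history[-required_failures:]
    if len(recent) == required_failures and not any(recent):
        return backup_ip
    return primary_ip


if __name__ == "__main__":
    # Hypothetical check history: one good probe, then three failures in a row.
    history = [True, False, False, False]
    print(choose_address(history, "192.0.2.10", "198.51.100.10"))
```

A cron job would append the result of `site_is_healthy` to the history on each run and push the chosen address to DNS only when it changes. That push is where most of the real work (and the dnsmadeeasy fee comparison) comes in.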
Note: dnsmadeeasy isn't the only game in town. Just the one we happen to use.
~~~~~~~
"You are not remembered for doing what is expected of you." - Atul Chitnis
Depends. After some emergency like an earthquake, IT services are the absolute last thing on my priority list.
The damn website can wait; family needs come first. Most IT people are not paid enough to give a damn about a web site or server anyway.
That was me... sorry... my bad. FSBs (Fiber Seeking Backhoes) are tough to control.
For a real answer, buy Theo Schlossnagle's book, "Scalable Internet Architectures". Theo presented a lengthy and highly-informative session at OSCON last year, and I subsequently bought and read his book. Worth every penny if you're professionally involved in providing reliable Internet services of any kind.
but wrong answer for it. the disaster plan should include backup for key people and assume responsibility for their dependents, so the key people give a schytte about what they're doing and have an out for the whole family from the (hopefully local) disaster.
it's incumbent on manglement to have useful plans, and you should help make what they have useful. shift the end focus and present it to them.
if this is supposed to be a new economy, how come they still want my old fashioned money?
Rather than the name.
Your networking gurus will find that an interesting one. In reality though, a couple of days is probably good enough in the event of something like an earthquake; you may find that people have other things on their minds when that amount of shit hits the fan.
Still, kudos on planning for it. Most IT depts are taken completely by surprise and the business goes down in flames. A decent global filesystem (like AFS) helps massively.
Deleted
If you can get somebody with multiple already-redundant locations to put up a redirection page, then you are all set, since you can basically ride on their redundancy. Personally I think this is the cheapest solution. I don't see any way around DNS RFC violators.
Other options would likely require messing with routing info. I think that is chancy at best.
Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
I'm always amazed at the expensive solutions that people bandy about.
:-)
Just set www.example.com to have two ip addresses: one at your main site, one at your backup site. On the start page at both sites, immediately redirect the browser to www1.example.com (main site) or www2.example.com (backup site). Run a script every five minutes to update that static start page based on whether your main site is still up using whatever technique suits your fancy.
DNS round-robin means that browsers try the different ip addresses until they find one that works. Primary and secondary DNS servers mean that dns lookups try the different servers until they find one that works. Use these features, that's what they're there for!
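A minimal sketch of the start-page script described above, in Python. The www1/www2 hostnames come from the post; the function names, the meta-refresh approach, and the file path in the usage comment are assumptions, and how you decide that the main site is up is left to "whatever technique suits your fancy".

```python
import html
import os


def build_start_page(main_site_up,
                     main_url="http://www1.example.com/",
                     backup_url="http://www2.example.com/"):
    """Return a static HTML page that immediately redirects the browser."""
    target = main_url if main_site_up else backup_url
    safe = html.escape(target, quote=True)
    return (
        "<!DOCTYPE html>\n"
        "<html><head>\n"
        f'<meta http-equiv="refresh" content="0; url={safe}">\n'
        "<title>Redirecting</title>\n"
        "</head><body>\n"
        f'<p>Redirecting to <a href="{safe}">{safe}</a>.</p>\n'
        "</body></html>\n"
    )


def write_start_page(path, page):
    """Write to a temp file, then rename, so a half-written page is never served."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as handle:
        handle.write(page)
    os.replace(tmp_path, path)


# Hypothetical use from a five-minute cron job (path is an assumption):
#   write_start_page("/var/www/html/index.html", build_start_page(main_site_up=False))
```

The same script would run at both sites, so whichever address the browser lands on sends it to a site that is believed to be alive.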
(If you're really big, and have servers all over the world, have your start page redirect people based on their geo-ip location, like sourceforge's downloads. Or pay someone a load of money, because you're obviously rich...)
Enjoy!
Hi, An alternative is to forget the all-or-nothing view: use simple round-robin DNS with enough geographically separated servers for DNS and HTTP/whatever, so that even if one is taken out by a quake or Act of Congress (ewwww, those nature programmes), *most* users will still get through just fine. Any clients/proxies that are smart and that can try out multiple A records for one URL will always get through if even one of your servers is reachable. Example: my main UK server failed strangely yesterday morning, but only about 30% of my visitors can even have noticed, and the other servers worldwide took up some of the load. Just simple and reliable and cheap round-robin DNS. Rgds Damon
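What a "smart" client does with multiple A records can be sketched roughly like this in Python. The function names and the timeout are assumptions for illustration, not a description of any particular browser or proxy: the client resolves every address for the name and then tries them in turn until one accepts a connection.

```python
import socket


def resolve_all(host, port):
    """Return every distinct IP address published for `host`, in resolver order."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    addresses = []
    for _family, _type, _proto, _canonname, sockaddr in infos:
        if sockaddr[0] not in addresses:
            addresses.append(sockaddr[0])
    return addresses


def connect_first_reachable(addresses, port, timeout=3.0,
                            connect=socket.create_connection):
    """Try each address in order and return (address, connection) for the first
    one that accepts; raise OSError only if every address fails.

    `connect` is injectable so the loop can be exercised without a network.
    """
    last_error = None
    for address in addresses:
        try:
            return address, connect((address, port), timeout=timeout)
        except OSError as exc:
            last_error = exc  # remember the failure and move on to the next record
    raise OSError(f"none of {len(addresses)} addresses reachable") from last_error


# Hypothetical use:
#   address, sock = connect_first_reachable(resolve_all("www.example.com", 80), 80)
```

With a short connect timeout, a dead server costs such a client only a few seconds before it moves on to the next record.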
http://m.earth.org.uk/
The industry pros discuss this sort of thing there all the time. The colocation sub-forum would be the best place to ask. I know that sounds odd, but that's the area on WHT where the best network/transit/BGP people hang out.
Nothing is inexplicable; only unexplained -Tom Baker, Doctor Who
You need cross-site IP address takeover. You can accomplish this generically with BGP (but if you're asking these questions, I'd stay away from this for now), or work with your ISP to set up a simple way to accomplish this.
When $ is no issue, a tier 1 colocation provider with their own services would be the best option. They've got big pipes, and will work with you to have the additional services needed. I'd go as far as to say that you're going to want to have a failover script that they would follow in the event of site A going offline. You'd need redundant equipment, or use a DR firm for getting back up.
You can move a block of IP addresses; in my experience, most sites will honor an advertisement of a /24 block. With BGP you can cause this IP block to start getting routed to equipment in another part of the world. In other words, you can keep your DNS the same and cause the IP addresses to move. No DNS propagation time required. BGP changes can propagate in a minute or two, unless it's been flapping and remote routers have dampened the route.
Sean
A disaster tolerant system will have servers configured at multiple sites with real time data update. OpenVMS can do this with the remote sites being far enough away that you need to fly there. "How do I restore service?" is the wrong question; ask "How do I avoid downtime even if a site fails?"
The Amsterdam Police IT group recently announced a 10 year uptime. OpenVMS.org has details on celebrating 10 years of uninterrupted service.
But, can't you use a CNAME for your external name so that clients are forced to come to your DNS server to get the actual IP? I'm not sure it's the most efficient way, but then again your TTL is only 300. Like you make yourself the SOA and only use CNAMEs, and then just point the CNAME to the currently running server? It *would* mean having to have failover DNS servers though. I'm no admin, but I do admin-type work from time to time (read: I admin a single server where I have an internship that only touches the intranet and don't have to worry about such issues on a regular basis and have the manuals on my shelf for just these occasions).
If I mod you up, it doesn't necessarily mean I agree with what you've said, sorry.
It's been a long time since I've been near any kind of routing, wouldn't that require access to two separate Autonomous Systems? I'd be interested to know how this works, good refresher :-)
I personally thought that using VLANs would be a quicker way to go about it, but that's obviously a more localised solution. But you're right in looking for solutions at network level, there's too much ignoring of DNS TTL values going on (AOL being the example we all love) to make any other measure quick enough.
= Ch =
Insert
You don't want to mess with BGP unless you have plenty of money to have a redundant location, and a large enough IP block to justify it. You may find an ISP that has this set up or their own block, but I don't know of any.
The way to go is DNS. For an example of this, look at Akamai:
(Removed because of the fucking lameness filter. It was a very useful DNS lookup. Try 'dig images.apple.com' to see what I saw.)
It's done using an extremely short TTL on the final A record. Obviously this handles the vast majority of cases. I also recommend having a backup DNS site hosted by someone ELSE! Set up your two locations, and host DNS on them, but have third and fourth DNS servers that are authoritative for your domain. That way if your main site is down, you can switch to secondary, but if secondary goes down, you can set up something else in a pinch and point your backup DNS at it. If you don't have this, there's no chance you'll get back up in less than a day, as you'll have to change your domain's DNS servers.
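To see the short TTL on the final A record for yourself, you can read the answer section that `dig +noall +answer <name>` prints, where each line is name, TTL, class, type, and data. The Python sketch below parses lines in that layout. The sample text is made up for illustration (hypothetical names and documentation-range addresses), not the Akamai lookup that the filter ate, and the function names are assumptions.

```python
def parse_answer_section(text):
    """Parse lines laid out as: name, TTL, class, type, data."""
    records = []
    for line in text.splitlines():
        fields = line.split()
        # Skip blanks, dig comment lines, and anything without a numeric TTL.
        if len(fields) < 5 or line.startswith(";") or not fields[1].isdigit():
            continue
        records.append({
            "name": fields[0],
            "ttl": int(fields[1]),
            "type": fields[3],
            "data": " ".join(fields[4:]),
        })
    return records


def shortest_a_record_ttl(records):
    """Return the smallest TTL among A records, or None if there are none."""
    ttls = [record["ttl"] for record in records if record["type"] == "A"]
    return min(ttls) if ttls else None


# Hypothetical answer section: a long-lived CNAME pointing at a short-lived A record.
SAMPLE = """\
www.example.com.           3600  IN  CNAME  www.failover.example.net.
www.failover.example.net.  20    IN  A      192.0.2.10
"""

if __name__ == "__main__":
    print(shortest_a_record_ttl(parse_answer_section(SAMPLE)))
```

The pattern to look for is exactly the one in the sample: the CNAME can be cached for a long time because it never changes, while the A record at the end of the chain expires quickly and can be repointed.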
Also, if you're hosting email for your domain, set up a mail forwarding service that will hold your mail or deliver it to various addresses while your main site is down.
I used Rollernet for both of these services, but I currently only use them for Secondary DNS, since my mail is now hosted by TuffMail. As a former LFS-building home-server-roller, it's nice to have others dealing with that stuff.
Hello little man. I will destroy you!
That's what makes places like Boise, Idaho ideal areas for datacenters.
1) No natural disasters
2) Good Power Grid
3) Cheap HydroPower
4) Unused Fiber Networks
solutionpro is the way to go
sitebacker - that will change your dns within 3m of your site going down.
cheers,
Alex