DNS based Website Failover Solutions?
Chase asks: "I run a couple of websites(including for my work). I'd like to have a backup web server that people would hit when my server goes down. My primary host is on my companies T1 line and even though I've had my server die once the most common reason for my sites to be offline is that our T1 goes down. I've looked at the High-Availability Linux Project but it seems that almost everything there is for failover using ip takeover which isn't an option if my network link dies and my backup server is on a different network. ZoneEdit seems to offer what I'm looking for but I'm wanting a do it myself solution. The only software I've found is Eddie and it seems to have stopped development around 2000. I know DNS based failover doesn't give 100% uptime but with a low cache time and decent monitoring it seems like it's the best solution for having my backup server at a differnt location and on a differnt network. Anyone know of a good solution? (Using Linux and/or Solaris hosts)"
Dyndns.org offers free DNS services for dynamic ip addresses. They also offer a fee service that allows you to use your own domain name. Why not set it up with them? If your web server is unreachable by the other server, it will send a dyndns update query with the new address. Just a thought. -P
If I understand you correctly you you are looking for a F/OSS project to do what you are after.
However if you do actaully have a budget to spend have a look at the 3DNS product from F5 Networks. it does the failover you describe and although it works better if it is intereacting with F5's server load balancing product, it can still monitor and react to standard web servers becoming unavailable.
The Romans didn't find algebra very challenging, because X was always 10
If your T1 is down tht often I'd change providers. My T1 has been 'slow' once in the past year with 1 outage that lasted for about an hour when we first installed it.
/* oops I accidentally made a comment, sorry */
1. Use colocation/Web hosting as the primary site. Their uptimes are usually very strong.
2. You will need a second line. Mandatory. If you really want insane uptime, you'll need dynamic routes ala BGP from both ISP's. If you don't need that much, you could maybe work with an automated probe-and-dnsupdate script which can run outside the network. It would switch the primary DNS to and from the backup IP address which is on the isolated network.
3. Have an equalized DNS entry for both IP addresses. It gives the client a 50% chance of connecting once its dead, but its better than nothing.
4. Tell the site visitors to connect to www1.mysite.com if they're having troubles reaching your site and have www1 pointing to your backup IP. Make sure your DNS servers are network redudant as well, or the whole excersize is pretty pointless.
Bye!
More information here.
Read all about IP take over and distributing server load as sample chapter of O'Reilly's Linux Server Hacks.
Don't know if it works for your setup.
My favorite quote:
If you serve a particularly popular site, you will eventually find the wall at which your server simply can't serve any more requests. In the web server world, this is called the Slashdot effect, and it isn't a pretty site (er, sight)
Ignoring the fact that DNS wasn't designed to handle this (setting your ttl to a low time (e.g., 5min) generates a good amount of useless traffic when your site is up), here is how you might do it:
First, you need to have a monitoring system on the Internet somewhere, not through your T1 because if that goes down it won't be able to update your DNS. You have that already, I'm sure, to test your web site accessibility from the Internet. Of course, at least one of your name servers must be accessible when the T1 goes down too, so that will have to be somewhere (other than on your T1) on the Internet as well.
On this name server enable dynamic updates. Modify your monitor system that checks availability of your site to use Net::DNS to update the IP address of your web server when the monitor fails.
Going all open source, I'd use Net::DNS and nagios for the monitoring software, bind for the name server (which supports dynamic updates), with Linux as the OS.
First thing you need to do if decide what kind of downtime is acceptable. 5 seconds, minutes, hours?
:)
Then you need to look at your services you're offering from your website, is it all static, session-based or what?
Combine the two to figure out how much your downtime is going to actually cost you. For example, if my personal site, which is static, is down for 5 hours the only person who is going to really care is me. And I don't pay myself much.
Flipside, on an ecommerce site with shopping cart, that 5 minutes of downtime could cost a lot of lost sales.
In otherwords, your redundancy plan should match how much you think you'll lose if Bad Things Happen.
Now, you're on a T1 with some personal stuff, let's assume 5 minutes is fine, money lost is minimal, but any more time will be irritating. Your content is static. Here's a cheap DIY solution and yes it's DNS based.
Setup identical webservers on seperate networks. Have those servers also be the nameservers for the website in question. Configure each webserver to only answer an A query as itself. The ttl for the A record needs to be low (5-10 minutes). Now, if one of the servers/networks goes down, clients can only resolve DNS by reaching a server; server down, can't query it, they'll hit the other server.
This method has some downsides, as mentioned bandwidth usage will be higher as more DNS queries will be made. Session-based stuff also won't work, no guarantee which server any given request will hit.
Anything is possible given time and money.
The site is distributed on 4 web servers : 3 on ADSL lines, one on SourceForge. I use 3 independant DNS to announce the web site. On each DNS I also run NAGIOS to monitor each web site. When one of the web site goes down (or up) a special handler (in perl) is called by NAGIOS and dynamicaly update the DNS entry
see global Load balancing for more details and code examples (in french only, but I am working on an English translation).
I set up the DNS TTL to 300 seconds, and NAGIOS can detect a state change in 2 or 3 minutes. So I can have global fail over in less than 10mn.
I have the system running for some month, and it works very well.
It's a king of "poor man's" akamai.
Most registrars will provide you the ability to run at least 2, and usually more name servers (I think 6 is the limit). By using this fact, and the the fact that a client will request dns and use the first authoritive response it gets we can impliment something like the following.
:
Colocation facility 1 machine gets named "DNS1.domain.com" and is a reverse proxy to your real site. Colocation facility 2 machine gets named "DNS2.domain.com" and is also a reverse proxy to your real site. Add cache content sharing between these two servers for extra availability.
You will also be adding DNS servers to each one of those colocated servers. They run as masters (not slaves). The contents of the zones will make each server the single point of contact for your content.
With this setup the following happens when users request your content
Browsers requests DNS lookup.
Client name server queries all the DNS servers for that domain for the request. First response wins.
Browser contacts your colocation server for content.
Colocation server checks its cache of your site.
if content does not exist, it will ask the cache partner for content, and then will query the real site.
Real site serves content to the proxy server at a much reduced rate.
The program isn't debugged until the last user is dead.
... for a VoIP project. It's a really stupid way of getting very high availability, but it can be made to work, and it is cheap to implement.
Basics are:
(1) you need a heart beat to confirm the master machine is running.
(2) You write a simple script using dnsupdate(8) that removes your master and inserts the backup.
(3) You look up the special magic to tell DNS caching to flush on other machines.
Then again, if it dosn't matter to you, don't worry about it. Just do RR-DNS and manually cut out the failed IP. "most" people will get the still-working servers.