DNS based Website Failover Solutions?

Posted by Cliff on Monday May 17, 2004 @06:30PM from the minimizing-downtime dept.

Chase asks: "I run a couple of websites (including one for my work). I'd like to have a backup web server that people would hit when my server goes down. My primary host is on my company's T1 line, and even though I've had my server die once, the most common reason for my sites to be offline is that our T1 goes down. I've looked at the High-Availability Linux Project, but it seems that almost everything there is for failover using IP takeover, which isn't an option if my network link dies and my backup server is on a different network. ZoneEdit seems to offer what I'm looking for, but I'm wanting a do-it-myself solution. The only software I've found is Eddie, and it seems to have stopped development around 2000. I know DNS-based failover doesn't give 100% uptime, but with a low cache time and decent monitoring it seems like the best solution for having my backup server at a different location and on a different network. Anyone know of a good solution? (Using Linux and/or Solaris hosts)"

39 comments

Dyndns by pbulteel73 · 2004-05-17 18:33 · Score: 2, Interesting

Dyndns.org offers free DNS services for dynamic IP addresses. They also offer a paid service that allows you to use your own domain name. Why not set it up with them? If the backup server can't reach your web server, it can send a dyndns update query with the new address. Just a thought. -P
1. Re:Dyndns by anicklin · 2004-05-17 18:46 · Score: 2, Informative
  
  dyndns is pretty good in that with a custom domain, you can set an 'offline' redirect URI. However, this has to be done manually with an internet connection - kind of a problem if the dedicated public connection is unavailable, although you could always revert to some sort of dialup to get onto their web site and update it.
  
  They will let you configure custom TTL values on A (host) records. I set mine to 5 minutes and it works just fine.
  
  There are some automated engines out there which will update the dyndns service automatically, but I have not seen any which will automatically set the unavailable URI if the primary internet connection isn't available.
  
  dyndns is more oriented at people who want to host but their address changes frequently, whether for black-hat, white-hat or ISP DHCP reasons. However, while reliability has never been a problem with their service, it may not suit the needs of a more commercial customer.
  
  Just my two cents as a happy user.
2. Re:Dyndns by filenabber · 2004-05-18 00:17 · Score: 1
  
  Zoneedit does this too but he said "I'm wanting a do it myself solution". Brian
  
  --
  Are you a Candy Addict?
Depends whether you want to pay for it . . . by unixbob · 2004-05-17 18:38 · Score: 4, Informative

If I understand you correctly, you are looking for a F/OSS project to do what you are after.

However, if you do actually have a budget to spend, have a look at the 3DNS product from F5 Networks. It does the failover you describe, and although it works better if it is interacting with F5's server load balancing product, it can still monitor and react to standard web servers becoming unavailable.

--
The Romans didn't find algebra very challenging, because X was always 10
1. Re:Depends whether you want to pay for it . . . by Kurtv · 2004-05-19 12:18 · Score: 1
  
  The F5 stuff can be really expensive; I looked into it. I've seen free/open source monitoring stuff but no failover apps. I'm using no-ip.com's monitoring service. It's basically DNS-based failover, and they monitor my site every 2 or 3 minutes. I priced a whole bunch of providers earlier in the year and found them to be the cheapest.
uhhhh by nocomment · 2004-05-17 18:48 · Score: 2, Informative

If your T1 is down that often, I'd change providers. My T1 has been 'slow' once in the past year, with 1 outage that lasted for about an hour when we first installed it.

--
/* oops I accidentally made a comment, sorry */
/* http://allyourbasearebelongto.us */
1. Re:uhhhh by flonker · 2004-05-17 18:56 · Score: 1
  
  Me too.
  
  Seriously, our main reason to go with a T1 instead of business DSL is because a T1 comes with a guaranteed QoS. We had our T1 line become slow, and they had a tech come over at 4am on a Sunday to fix it. And he was *really* good. (Sprint, in case you're wondering.)
  
  Of course, you can never completely avoid backhoes.
2. Re:uhhhh by nocomment · 2004-05-17 19:20 · Score: 2, Informative
  
  If you need the QoS, but not necessarily a full T1 maybe you should look at SDSL. With ADSL the phone company owns the switching equipment and can turn it off/move/upgrade/whatever whenever they want. But with SDSL the provider (ie speakeasy, covad(if covad does sdsl)) owns the switching equipment and will skip over it when doing their moves/upgrades/whatever. Speakeasy has a QoS guarantee. I still feel safer with a T1 though :-)
  
  Backhoes are easy to fix. I remember when I worked at Mindspring (pre-Earthlink) there was a major outage (a hurricane, I think) in NY that not only broke the T1 (there was exposed fiber) but also left it under 30' of water. It took 7 days to drain the water before the cables could be repaired.
  
  --
  /* oops I accidentally made a comment, sorry */
  /* http://allyourbasearebelongto.us */
3. Re:uhhhh by shoppa · 2004-05-19 06:48 · Score: 1
  
  With regard to SDSL, I've had it for nearly 6 years now and the number of outages caused by equipment off my property can be counted on one hand, and the only outage that lasted more than a few minutes was a few hours.
  This is with Covad (resold by uunet) and with Rhythms (After they were bought out by uunet).
  At the same time, lightning-caused damage and power outages have caused several week-long outages... but when nobody in the neighborhood has electricity for a week it's hard to complain about your SDSL not working :-)
A few ways.. by ADRA · 2004-05-17 19:00 · Score: 4, Informative

1. Use colocation/Web hosting as the primary site. Their uptimes are usually very strong.

2. You will need a second line. Mandatory. If you really want insane uptime, you'll need dynamic routes ala BGP from both ISP's. If you don't need that much, you could maybe work with an automated probe-and-dnsupdate script which can run outside the network. It would switch the primary DNS to and from the backup IP address which is on the isolated network.

3. Have an equalized DNS entry for both IP addresses. It gives the client a 50% chance of connecting once it's dead, but it's better than nothing.

4. Tell the site visitors to connect to www1.mysite.com if they're having trouble reaching your site, and have www1 pointing to your backup IP. Make sure your DNS servers are network redundant as well, or the whole exercise is pretty pointless.

--
Bye!
1. Re:A few ways.. by rbbs · 2004-05-17 21:14 · Score: 1
  
  >>Tell the site visitors to connect to www1.mysite.com
  
  Why not write a little redirect PHP script that is hosted somewhere with mad uptime? That script would ping both hosts and direct the user to the one that responds quickest. Of course, if the PHP script machine went down you'd be toast, but....
  
  no idea if this works in large volumes, but we use something similar for client side redirects...
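
A rough sketch of that redirect idea, written in Python rather than PHP and timing a TCP connect to port 80 instead of an ICMP ping (both are substitutions, not what the poster runs). The host names and the `probe` parameter are made up for illustration.

```python
import socket
import time


def connect_time(host, port=80, timeout=2.0):
    """Seconds taken to open a TCP connection, or None if it fails."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        return None


def pick_target(hosts, probe=connect_time):
    """Return the host with the shortest probe time, or None if all fail."""
    timings = [(probe(h), h) for h in hosts]
    reachable = [(t, h) for t, h in timings if t is not None]
    return min(reachable)[1] if reachable else None
```

The redirect script would then answer with a 302 whose Location points at whatever `pick_target(["www1.example.com", "www2.example.com"])` returns. Probing on every request adds latency, so caching the result for a few seconds is probably wise.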
2. Re:A few ways.. by WSSA · 2004-05-17 22:21 · Score: 1
  
  Have an equalized DNS entry for both IP addresses. It gives the client a 50% chance of connecting once it's dead, but it's better than nothing.
  
  More to the point, have the browser 'stick' to the server they initially connect to. In other words have the www1.mydomain.com server's content contain references only to www1.mydomain.com, and not www.mydomain.com (and similarly for the content on www2). Otherwise you'll have 50% of all links/IMG tags and so on fail rather than just 50% of all initial connections.
You could always use IPv4 Anycasting. by Mordant · 2004-05-17 20:06 · Score: 2, Informative

More information here.
1. Re:You could always use IPv4 Anycasting. by dfn5 · 2004-05-19 03:20 · Score: 1
  
  I can see how this would help in stateless protocols like DNS or for use in routers, such as the IPv4 to IPv6 gateways, but how does this help in the case of a web server?
  
  --
  -- Thou hast strayed far from the path of the Avatar.
2. Re:You could always use IPv4 Anycasting. by Mordant · 2004-05-19 18:00 · Score: 1
  
  It's best-suited for stateless stuff, but you don't get stateful failover within load-balanced Web clusters and such most of the time (unless the content-provider has gone to a lot of trouble to support it, using the right software and hardware and coding), so it's sometimes better than nothing.
Linux server hacks and the slashdot-effect... by kwench · 2004-05-17 20:44 · Score: 2, Interesting

Read all about IP take over and distributing server load as sample chapter of O'Reilly's Linux Server Hacks.
Don't know if it works for your setup.
My favorite quote:
If you serve a particularly popular site, you will eventually find the wall at which your server simply can't serve any more requests. In the web server world, this is called the Slashdot effect, and it isn't a pretty site (er, sight)
RFC 2136 + Net::DNS + your monitoring software by embobo · 2004-05-17 20:53 · Score: 3, Informative

Ignoring the fact that DNS wasn't designed to handle this (setting your ttl to a low time (e.g., 5min) generates a good amount of useless traffic when your site is up), here is how you might do it:

First, you need to have a monitoring system on the Internet somewhere, not through your T1 because if that goes down it won't be able to update your DNS. You have that already, I'm sure, to test your web site accessibility from the Internet. Of course, at least one of your name servers must be accessible when the T1 goes down too, so that will have to be somewhere (other than on your T1) on the Internet as well.

On this name server enable dynamic updates. Modify your monitor system that checks availability of your site to use Net::DNS to update the IP address of your web server when the monitor fails.

Going all open source, I'd use Net::DNS and nagios for the monitoring software, bind for the name server (which supports dynamic updates), with Linux as the OS.
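
For anyone who wants to see the shape of the update step: the sketch below does not use Net::DNS. It is Python that builds a command batch for BIND's `nsupdate` tool, which sends RFC 2136 updates. The server, zone, and addresses are placeholders, and the function name is invented for this example.

```python
def build_nsupdate_batch(server, zone, name, new_ip, ttl=300):
    """Return nsupdate commands that replace every A record for `name`."""
    fqdn = name if name.endswith(".") else name + "."
    lines = [
        f"server {server}",                     # name server that accepts updates
        f"zone {zone}",
        f"update delete {fqdn} A",              # drop the address that stopped answering
        f"update add {fqdn} {ttl} A {new_ip}",  # publish the backup address
        "send",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_nsupdate_batch("ns2.example.net", "example.com",
                               "www.example.com", "192.0.2.20"), end="")
```

The monitor's failure handler would pipe that text into `nsupdate`, normally with a TSIG key (`nsupdate -k keyfile`) so that only the monitor is allowed to change the zone.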
1. Re:RFC 2136 + Net::DNS + your monitoring software by byolinux · 2004-05-17 21:06 · Score: 3, Informative
  
  Nagios
  
  with Linux as the OS
  
  Kernel! And anyway, does the fact you're using GNU/Linux or *BSD actually make a difference to this?
  
  --
  Join the Free Software Foundation
2. Re:RFC 2136 + Net::DNS + your monitoring software by embobo · 2004-05-17 23:26 · Score: 1
  
  It would make no difference, I just threw in the mention of Linux to get some Karma :). OpenBSD would be my choice. The poster mentioned Solaris, which I would shy away from. No need to have scalable performance for something as simple as this.
3. Re:RFC 2136 + Net::DNS + your monitoring software by FistFuck · 2004-05-18 01:13 · Score: 3, Informative
  
  I do it now with two shell scripts.
  
  The key is that I use tcpclient from DJBs ucspi-tcp package:
  
  http://cr.yp.to/ucspi-tcp.html
  
  Don't hurt yourself with BIND, either. Parsing that file is going to hurt your brain. I use grep -v to manage my data file for tinydns:
  
  http://cr.yp.to/djbdns.html
  
  Maybe I'll get around to publishing my work. A brief synopsis:
  
  I do a TCP connection to port 80 on my webservers with a 5 second timeout. If the connection fails, it pulls all IPs associated with that server out of my DNS. Not only does this determine if the server is up, but it also determines if the server needs less load because it can't get to my request in time.
  
  There's a state file for each webserver, ie webserver.up or webserver.down. That's easy to look for later to determine if I need to change the DNS tables.
  
  I run the check every 60 seconds. I only have two servers so it's not too tough.
  
  I also check www.yahoo.com and www.google.com availability over each ISP to determine if an ISP is available. I update DNS based on the ISP conditions as well.
  
  I say again, try to avoid BIND if you can, I can't think of a sane way to process your zone files with shell scripting.
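
Since the scripts were never published, here is only a guess at the two core pieces, in Python instead of shell and without tcpclient: a port-80 connect with a 5 second timeout, and the `webserver.up` / `webserver.down` state files. Function names and the state directory are assumptions.

```python
import os
import socket


def port_open(host, port=80, timeout=5.0):
    """Return True if a TCP connection completes within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def record_state(state_dir, name, is_up):
    """Keep exactly one of <name>.up / <name>.down; return True on a change."""
    wanted = os.path.join(state_dir, f"{name}.{'up' if is_up else 'down'}")
    stale = os.path.join(state_dir, f"{name}.{'down' if is_up else 'up'}")
    changed = not os.path.exists(wanted)
    if os.path.exists(stale):
        os.remove(stale)
    open(wanted, "w").close()  # touch the marker file
    return changed
```

Run from cron every 60 seconds, something like `record_state("/var/run/failover", "web1", port_open("192.0.2.10"))` tells you when the DNS data actually needs rebuilding, so the zone is only touched on a state change.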
4. Re:RFC 2136 + Net::DNS + your monitoring software by ptudor · 2004-05-18 13:30 · Score: 2, Informative
  
  ...I can't think of a sane way to process your zone files with shell scripting.
  Luckily, when moving to tinydns there is a sane way to convert your zone files with shell scripting.
I'm looking for something similar... by byolinux · 2004-05-17 21:01 · Score: 1

... but also redundancy for when servers go down.

If I had multiple servers, could I keep them in sync with rsync? Or is there a better way?

--
Join the Free Software Foundation
Supersparrow by LWolenczak · 2004-05-17 22:37 · Score: 1

Supersparrow is a BGP DNS based GSLB. It is pretty cool. I guess you could use it too. You can find info at http://linuxvirtualserver.org.
Re:Dyndns NOT! by Anonymous Coward · 2004-05-17 22:53 · Score: 0

If you could use OpenBSD then what about CARP? Otherwise Cisco Local Director / Distributed Director some ACNS stuff ( proprietary!) or Alteon load balancer/switch.... ( proprietary!) can't remember.. had some beers already etc.... hiccup :D

Answers on a S.A.E. postcard....
It depends.... by RedHat+Rocky · 2004-05-18 01:34 · Score: 2, Insightful

First thing you need to do is decide what kind of downtime is acceptable. 5 seconds, minutes, hours?

Then you need to look at your services you're offering from your website, is it all static, session-based or what?

Combine the two to figure out how much your downtime is going to actually cost you. For example, if my personal site, which is static, is down for 5 hours the only person who is going to really care is me. And I don't pay myself much. :)

Flipside, on an ecommerce site with shopping cart, that 5 minutes of downtime could cost a lot of lost sales.

In other words, your redundancy plan should match how much you think you'll lose if Bad Things Happen.

Now, you're on a T1 with some personal stuff, let's assume 5 minutes is fine, money lost is minimal, but any more time will be irritating. Your content is static. Here's a cheap DIY solution and yes it's DNS based.

Set up identical webservers on separate networks. Have those servers also be the nameservers for the website in question. Configure each nameserver to answer an A query only with its own address. The TTL for the A record needs to be low (5-10 minutes). Now, if one of the servers/networks goes down, clients can only resolve DNS by reaching a server that is still up; with one server down they can't query it, so they'll hit the other server.

This method has some downsides, as mentioned bandwidth usage will be higher as more DNS queries will be made. Session-based stuff also won't work, no guarantee which server any given request will hit.
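
A toy model of why this works, in Python. It is a simplification: real resolvers pick among nameservers by response time rather than in a fixed order, and cached answers live on until the TTL runs out. All names and addresses are invented.

```python
# Hypothetical sites: each host is both a nameserver and a web server.
SITES = {
    "ns1.example.com": "192.0.2.10",     # on the T1
    "ns2.example.com": "198.51.100.10",  # on a separate network
}


def resolve_www(reachable):
    """Return the A record a resolver would get, trying nameservers in order.

    Each nameserver answers only with its own address, so the address of
    an unreachable site is never handed out to new lookups.
    """
    for ns, own_ip in SITES.items():
        if ns in reachable:
            return own_ip
    return None  # no nameserver answered at all
```

With both sites up, `resolve_www({"ns1.example.com", "ns2.example.com"})` gives the T1 address; with only ns2 reachable it gives the backup address, with no monitoring or DNS updates involved.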

--
Anything is possible given time and money.
see p2pweb.net by p2pweb · 2004-05-18 03:43 · Score: 3, Interesting

I'm working on a similar project : it's called p2pweb.net.
The site is distributed across 4 web servers: 3 on ADSL lines, one on SourceForge. I use 3 independent DNS servers to announce the web site. On each DNS server I also run Nagios to monitor each web site. When one of the web sites goes down (or comes back up), a special handler (in Perl) is called by Nagios and dynamically updates the DNS entry.
See global load balancing for more details and code examples (in French only, but I am working on an English translation).
I set the DNS TTL to 300 seconds, and Nagios can detect a state change in 2 or 3 minutes. So I can have global failover in less than 10 minutes.
I have had the system running for some months, and it works very well.
It's a kind of "poor man's" Akamai.
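
The "less than 10 minutes" figure can be checked with back-of-the-envelope arithmetic. This Python sketch assumes every resolver honors the TTL and that the DNS update itself is near-instant; the function name is made up.

```python
def worst_case_failover(ttl_s, detect_s, update_s=0):
    """Upper bound on seconds before clients that honor the TTL see the new address."""
    # A resolver may have cached the old record just before the update went
    # out, so the full TTL is added on top of detection and update time.
    return detect_s + update_s + ttl_s


print(worst_case_failover(ttl_s=300, detect_s=180) / 60)  # 8.0 minutes
```

So a 3 minute detection plus a 300 second TTL gives 8 minutes in the worst case, which fits inside the 10 minute claim.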
We tried it, and it didn't work. by Anonymous Coward · 2004-05-18 04:05 · Score: 1, Informative

That's right, it didn't! We found that even when we set the TTL to 60 seconds, some DNS servers still cached the old name look-up for hours, if not days. One of our remote sites was using the Windows NT DNS server, and it cached out of date name look-up for 30 days! Damn Microsoft. This makes DNS-based failovers useless for most purposes.
Multiple Master Name Servers by fdragon · 2004-05-18 05:10 · Score: 3, Insightful

Most registrars will provide you the ability to run at least 2, and usually more, name servers (I think 6 is the limit). By using this fact, and the fact that a client will request DNS and use the first authoritative response it gets, we can implement something like the following.

Colocation facility 1 machine gets named "DNS1.domain.com" and is a reverse proxy to your real site. Colocation facility 2 machine gets named "DNS2.domain.com" and is also a reverse proxy to your real site. Add cache content sharing between these two servers for extra availability.

You will also be adding DNS servers to each one of those colocated servers. They run as masters (not slaves). The contents of the zones will make each server the single point of contact for your content.

With this setup the following happens when users request your content :

Browser requests DNS lookup.

Client name server queries all the DNS servers for that domain for the request. First response wins.

Browser contacts your colocation server for content.

Colocation server checks its cache of your site.

If the content is not in its cache, it will ask the cache partner for the content, and only then will it query the real site.

Real site serves content to the proxy server at a much reduced rate.
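
The lookup order in those steps can be sketched in a few lines of Python. Plain dicts stand in for the two proxy caches and `origin` is any callable that fetches from the real site; all three are placeholders, not a real proxy API.

```python
def fetch(path, local_cache, partner_cache, origin):
    """Return (body, source), trying local cache, partner cache, then origin."""
    if path in local_cache:
        return local_cache[path], "local"
    if path in partner_cache:
        body, source = partner_cache[path], "partner"
    else:
        body, source = origin(path), "origin"
    local_cache[path] = body  # later requests are served without leaving this box
    return body, source
```

The point of the ordering is that the real site behind the T1 only sees a request when neither colocated box has the content, which is why its load drops so much.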

--
The program isn't debugged until the last user is dead.
right way, but expensive by uslinux.net · 2004-05-18 08:58 · Score: 1

The right way to do it is with a GSLB (global server load balancer) or a VIP - basically a DNS round-robin that hides the round robin nature and removes broken servers from the valid pool when they're down.

But, as others have mentioned, if you already have a T1 it shouldn't be down much. If it is, you're better off changing providers. Setting your DNS TTL low is a hack that will consume quite a bit of bandwidth.
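
To make "round robin that removes broken servers" concrete, here is a minimal Python sketch. It is not how any particular GSLB product works internally; the class and method names are invented.

```python
from itertools import cycle


class HealthAwarePool:
    """Round-robin over addresses, skipping any currently marked down."""

    def __init__(self, addresses):
        self.addresses = list(addresses)
        self.down = set()
        self._ring = cycle(self.addresses)

    def mark(self, address, is_up):
        """Record the latest health-check result for one address."""
        (self.down.discard if is_up else self.down.add)(address)

    def next_address(self):
        """Return the next healthy address, or None if every server is down."""
        for _ in range(len(self.addresses)):
            candidate = next(self._ring)
            if candidate not in self.down:
                return candidate
        return None
```

A health checker calls `mark()` as servers fail and recover, and the DNS front end answers each query with `next_address()`, so clients never see the rotation or the dead entries.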
I've done it by crmartin · 2004-05-18 09:08 · Score: 2, Interesting

... for a VoIP project. It's a really stupid way of getting very high availability, but it can be made to work, and it is cheap to implement.

Basics are:

(1) you need a heart beat to confirm the master machine is running.

(2) You write a simple script using nsupdate(8) that removes your master and inserts the backup.

(3) You look up the special magic to tell DNS caching to flush on other machines.
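
Step (1) might look something like the Python sketch below. The interval and the number of missed beats tolerated are arbitrary assumptions, chosen so that one dropped packet does not trigger a failover.

```python
class HeartbeatMonitor:
    """Treat the master as dead once `max_missed` intervals pass with no beat."""

    def __init__(self, interval_s=10.0, max_missed=3):
        self.interval_s = interval_s
        self.max_missed = max_missed
        self.last_beat = None

    def beat(self, now):
        """Record a heartbeat received at time `now` (in seconds)."""
        self.last_beat = now

    def master_alive(self, now):
        """True while the last beat is recent enough; False before any beat."""
        if self.last_beat is None:
            return False
        return (now - self.last_beat) < self.interval_s * self.max_missed
```

When `master_alive()` turns false, the script from step (2) runs. Passing `now` in explicitly (for example from `time.monotonic()`) keeps the logic easy to test.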
Don't use DNS failover. by Harik · 2004-05-18 11:14 · Score: 3, Informative

More than one large company enforces a minimum TTL to cut down on outbound lookups. Notably, AOL clients keep hitting the old address up to 24 hours after the switchover. Other ISPs/firewalled companies do the same.
Then again, if it doesn't matter to you, don't worry about it. Just do RR-DNS and manually cut out the failed IP. "Most" people will get the still-working servers.
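
The effect of an enforced minimum is simple to model in Python. The 24 hour floor is the figure quoted above for AOL; the function name and the idea of a single configurable floor are assumptions for illustration.

```python
def effective_ttl(record_ttl_s, resolver_min_ttl_s=0):
    """TTL a cache actually applies when it enforces a floor on record TTLs."""
    return max(record_ttl_s, resolver_min_ttl_s)


DAY = 24 * 60 * 60
print(effective_ttl(300))        # 300: resolver honors the published TTL
print(effective_ttl(300, DAY))   # 86400: a 24-hour floor overrides it
```

However low you set the record's TTL, users behind such a cache wait out the floor, which is why a failover that looks instant from your desk can take a day for some visitors.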
djbdns by decep · 2004-05-18 11:25 · Score: 1

http://cr.yp.to/djbdns/balance.html describes what you are wanting to do (look near the bottom). Your DNS server would have to be colocated, though.
Load Balance your DNS servers! by Dolemite_the_Wiz · 2004-05-18 12:06 · Score: 1

I had this same problem with SMTP servers in a previous job.

The DNS server would fail and, because of an unpublished bug in Windows 2000 where the secondary DNS server assigned to the NIC wouldn't be used, lookups would fail in large numbers if the primary server went down.

Load Balancing Multiple Unix Based DNS servers over UDP did the trick!

Dolemite
_________________

--
Save the World! Use a Quote!
Build it redundant to start with by NateTech · 2004-05-19 06:06 · Score: 1

Instead of "fail-over" think in terms of having two public webservers that load-balance ALL traffic to your site. If one goes down the other takes up the full load.

This complicates the back-end if you have a database driven site, but you were going to have to deal with that anyway.

The "quick and dirty" way to do this is a round-robin DNS CNAME entry that sends traffic from your usual name "www.whatever.com" to "www1.whatever.com" and "www2.whatever.com".

Keep your TTL/update times low, and if you know via your monitoring that www1 went down, remove its entry from DNS.

During the time your customers are hitting www1 and www2 and the www1 machine is down, you'll have an "every other time they hit it they get an error" problem, but you said you were monitoring (preferably from a third, unrelated network), so that's taken care of. You could even script the removal of the DNS entry if you trust your monitoring that much. Of course, you need to deal with corner cases like the monitoring server not being able to monitor while the site is actually up and working fine... stuff like that...
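
One way to handle that corner case, sketched in Python: before pulling an entry, have the monitor probe a couple of unrelated well-known hosts too, and do nothing if it cannot reach anything at all. The action names are invented for this example.

```python
def decide(site_reachable, references_reachable):
    """Choose an action from one probe of the site and probes of reference hosts.

    `references_reachable` is a list of booleans, one per unrelated
    well-known host probed from the same monitor.
    """
    if site_reachable:
        return "keep"
    if not any(references_reachable):
        return "hold"   # the monitor itself looks cut off; do not touch DNS
    return "remove"     # the wider Internet is reachable, the site is not
```

For example, `decide(False, [False, False])` returns "hold", so a monitor that has lost its own uplink does not strip a perfectly healthy server out of DNS.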

Basically this is what many of the commercial products do under the hood. You can go buy F5s or Alteons or any of the other hardware boxes that handle multi-site load-balancing, but you can do it yourself for a fraction of the cost if you understand how, and understand that everyone's working with the same limitations with DNS caching times, etc.

--
+++OK ATH
No-IP.com's monitoring service by splitretina · 2004-05-19 12:10 · Score: 1

No-IP.com has a great monitoring/failover service that I've been using for the past couple of months. We set up a cheap colo on the other side of the country, and when our primary goes down, it switches over. At the very least we can show a page saying things are not normal (can't get to the primary db though).

For the price it's not bad (yearly subscription). Check it out here: http://www.no-ip.com/services.php/page/monitor_advanced

It isn't DIY, but I couldn't find anything that could easily achieve this with only two locations besides No-IP.
High availability on the cheap???? by Anonymous Coward · 2004-05-19 16:01 · Score: 0

You're kidding, right?

If you want serious availability, you pay serious money. While this can mean lotsa things, I'd suggest that, in your case, you find a new bandwidth vendor or find a colo.

If you still want to do it, there are several good solutions available. They are, however, pricy (for good reason, doing this stuff for real is more difficult than it appears).

If you only want "sorta HA," you might take the dyndns suggestions to heart. That being said, expect complaints from users who have browsers that ignore the domain's TTL.
Wicked-ass DNS!! by RazorJ_2000 · 2004-05-20 10:56 · Score: 1

If you've got the budget, then you should check out the Adonis DNS server from BlueCat Networks. The Adonis is hands-down the best DNS server on the planet. It offers high availability, redundancy, high-security data transfers, etc. It has a military-style flash disk option so that there are no moving parts that will fail (especially hard drives these days), etc. Kick-ass BIND support!!

Disclaimer: I used to work there and parted ways rather involuntarily. However, the Adonis DNS is one mean-ass, rock-solid piece of work. I strongly recommend it.

PS: MH suxs at chess and needs to get laid more. ;) You know who you are.

--
pi=sigma{n:0-infinity}[(1/16)^n][(4/(8n+1))-(2/(8n+4))-(1/(8n+5))-(1/(8n+6))]
I'll write failover code for you by simul · 2004-05-21 04:34 · Score: 1

If you want it. It's not that hard to monitor a site and then switch the DNS on it. I wrote the ZoneEdit one.