Slashdot Mirror


Minimizing Downtime When Switching IP Addresses?

GeekTek asks: "As we all know, prices for co-location have plummeted since the height of the dot.com era. We've been shopping around and found a solution that works for us. We have a small setup of about a dozen Debian boxen, a few Windows servers and we run our own name servers (BIND 8.x). Most of our domain names are managed through our OpenSRS account. My concern is switching all of our servers' IP addresses. I cannot have any downtime and I want to minimize the number of trips to the current co-lo (it's >2 hours away). What is the best way to do it? What experiences can you share in similar situations?"

26 of 51 comments

  1. what services are you running? by farnsworth · · Score: 3, Informative
    to get a real answer, you need to say what services you are running. http? smtp? soap? database?

    if it's just http, you can use redirects to mitigate the dns delay. https complicates this since you'll have to get certs for your temporary dns names (i.e., https://old/ redirecting to https://new/, which also responds to https://old/)

    if it's just about anything else, you might consider setting up a vpn between your existing datacenter and your new datacenter, and setup both environments to answer to the same dns names and have your old environment tunnel to your new one.

    in short, doing this with zero downtime is highly dependent on the services you are running. it's not a generic problem that has one solution.

    --

    There aint no pancake so thin it doesn't have two sides.

  2. Short TTLs + rinetd or similar by Brian+Hatch · · Score: 5, Informative
    Decrease the TTL of the DNS records during the switchover. If your current TTL is a day, then at least one day earlier, change it to, say, 300 (5 minutes). You'll experience a higher DNS query rate during that time, but probably nothing you can't handle. (You'd handle it better if you used DJBDNS though.)

    Then when you're done moving a machine, change the IP in DNS. When it seems solid, you put the TTL back to a reasonable (1 day) number.

    During the transition, you can also keep a machine at the old IP address and forward the services to the new IP address using tools such as rinetd or xinetd. This assures that you have all traffic going to the correct machine (possibly through the old machine) but that the old IP address is available during the move for clients that have broken DNS resolvers that don't correctly honor DNS TTL values. The rinetd/xinetd purpose machine can easily be a temporary box, such as a laptop - it's not doing any real processing.

    If you're also moving your DNS machines, move one a week before the big move, update whois, and make sure everything settles down. Then move the other a day or two after the big move.
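
A minimal sketch of the forwarding half of this advice, assuming the classic rinetd.conf rule format of `bindaddress bindport connectaddress connectport`. The helper name `rinetd_rules` and the addresses in the usage note are hypothetical placeholders, not anything from the original setup.

```python
def rinetd_rules(old_ip, new_ip, ports):
    """Build one rinetd.conf forwarding rule per TCP port.

    Each rule tells the box still answering on old_ip to relay
    connections on that port to the same port on new_ip.
    """
    return [f"{old_ip} {port} {new_ip} {port}" for port in ports]


def render_rinetd_conf(old_ip, new_ip, ports):
    """Join the rules into text suitable for writing to a config file."""
    return "\n".join(rinetd_rules(old_ip, new_ip, ports)) + "\n"
```

Calling `render_rinetd_conf("192.0.2.1", "198.51.100.1", [25, 80, 443])` would give three rules for the temporary box. rinetd was written as a TCP redirector, so UDP services such as DNS may need a different approach.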

    1. Re:Short TTLs + rinetd or similar by GeekTek · · Score: 2

      Good points. I'm actually considering using this opportunity to switch to BIND 9.x or DJBDNS. I've been very happy with qmail in the past, but the two most cited factors (complicated for beginners and grumpy DJB) weigh in against DJBDNS. How has your experience been with it?

  3. mirror your servers by Ender+Ryan · · Score: 2
    Well, you could mirror your servers at both locations until your new DNS information propagates, or somehow redirect traffic from your old addresses to the new ones...

    --
    Sticking feathers up your butt does not make you a chicken - Tyler Durden
  4. Fail over by sporty · · Score: 2

    Could you set up redundancy? Set up your DB to replicate between the new colo and the old place? Web pages/apps would need to be duplicated too.

    Mind you, the latency between the two would have to be low enough that your DB doesn't choke doing redundancy/synchronization between your two sites.

    If that can be set up, just change your nameservers to point to the new colo and watch the traffic transition over. Once your old site hits no traffic, you are done.

    There are prolly better ways.. maybe just plain other ways.. but this is off the top of my head.

    --

    -
    ping -f 255.255.255.255 # if only

  5. Reduce TTL by linuxwrangler · · Score: 3, Interesting

    Start reducing your DNS time-to-live. If it is now a week, set it to a day. A couple of days ahead set it to an hour. A few hours ahead set it to 5 minutes (do consider your traffic and make sure that you won't swamp your DNS server - if you are planning a move at night then load should be low and even setting ttl to a minute or two should be fine). Note: there are broken name servers out there and you will get requests to the old address for many hours. Also most browsers don't seem to ask based on DNS TTL so they may not get the new IP until they at least leave your site or perhaps close their browser.

    (Note: I'm assuming you have duplicate equipment since that's the only way to physically move with no downtime unless your configuration allows you to remove half of your stuff and still keep running.)

    Depending on your needs and current design you can also play NAT/Proxy games. I.e., set up a proxy server or use NAT to make your old IP contact your new servers to catch all the misdirected traffic until DNS propagates.

    Last couple of times I did this it was fun to watch. I pulled the trigger on the DNS and could watch the load flow from the old to the new site (we were in the top 500 sites in traffic and did the move during the day so there was a statistically valid sample to work with).

    --

    ~~~~~~~
    "You are not remembered for doing what is expected of you." - Atul Chitnis
    1. Re:Reduce TTL by PerryMason · · Score: 2

      Note: I'm assuming you have duplicate equipment since that's the only way to physically move with no downtime

      Either that or one heck of a transportable UPS and a kickass wireless setup!

      --
      "I'm tired of all this 'Aren't humanity great' bullshit. We're a virus with shoes" - Bill Hicks
  6. DNS then HTTP then SMTP by engine+matrix · · Score: 2, Insightful

    When I changed colos I moved my nameservers first. A week later I tarred the home directories, dumped the mysql databases, and changed the IPs in DNS. Finally, I changed the mailserver IPs. If you're using qmail you can make all of the mail that hits your old server forward to the new one by adding the new IP to the smtproutes file in /var/qmail/control/smtproutes.

  7. Re:Only somewhat off topic... by LWolenczak · · Score: 2

    That's not a feature in IPv4 (as far as I know). You could round-robin the addresses, but that just splits load (that's how so many servers handle www.microsoft.com). If memory serves, the A6 IPv6 Record Type supports priority.

  8. TTL = half the time until switchover by yerricde · · Score: 2, Insightful

    Decrease the TTL of the DNS records during the switchover. If your current TTL is a day, then at least one day earlier, change it to, say, 300 (5 minutes). You'll experience a higher DNS query rate during that time, but probably nothing you can't handle.

    Actually, you can reduce the DNS query rate by continuously setting the TTL to about half the time until the switchover. For instance, 24 hours before the switchover, set it to 12 hours. Then keep decreasing the TTL until it's down to about five minutes. This way, you won't get a continuous flood of DNS requests during the day before the switchover.
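
The halving rule can be sketched as below. This assumes the zone is republished each time the previous TTL runs out; the function names and the default floor (300 seconds) and ceiling (one day) are illustrative choices, not part of the post.

```python
def next_ttl(seconds_until_switch, floor=300, ceiling=86400):
    """TTL to publish now: about half the time left, clamped to [floor, ceiling]."""
    return max(floor, min(ceiling, seconds_until_switch // 2))


def ttl_schedule(seconds_until_switch, floor=300, ceiling=86400):
    """Return (seconds_before_switch, ttl) pairs for each zone update.

    After publishing a TTL we wait for it to expire, then publish the
    next, smaller one, stopping once the floor is reached.
    """
    schedule = []
    remaining = seconds_until_switch
    while remaining > 0:
        ttl = next_ttl(remaining, floor, ceiling)
        schedule.append((remaining, ttl))
        if ttl == floor:
            break
        remaining -= ttl
    return schedule
```

With 24 hours to go, `next_ttl(24 * 3600)` gives 43200 seconds, the 12 hours from the example above. Each later entry in `ttl_schedule` is smaller, so no cache should hold a record much past the switchover.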

    --
    Will I retire or break 10K?
    1. Re:TTL = half the time until switchover by Electrum · · Score: 3, Informative
      Actually, you can reduce the DNS query rate by continuously setting the TTL to about half the time until the switchover. For instance, 24 hours before the switchover, set it to 12 hours. Then keep decreasing the TTL until it's down to about five minutes. This way, you won't get a continuous flood of DNS requests during the day before the switchover.

      Or you could use tinydns, which handles this automatically:

      http://cr.yp.to/djbdns/tinydns-data.html


      You may include a timestamp on each line. If ttl is nonzero (or omitted), the timestamp is a starting time for the information in the line; the line will be ignored before that time. If ttl is zero, the timestamp is an ending time (``time to die'') for the information in the line; tinydns dynamically adjusts ttl so that the line's DNS records are not cached for more than a few seconds past the ending time. A timestamp is an external TAI64 timestamp, printed as 16 lowercase hexadecimal characters. For example, the lines

      +www.heaven.af.mil:1.2.3.4:0:4000000038af1379
      +www.heaven.af.mil:1.2.3.7::4000000038af1379


      specify that www.heaven.af.mil will have address 1.2.3.4 until time 4000000038af1379 (2000-02-19 22:04:31 UTC) and will then switch to IP address 1.2.3.7.
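
A hedged sketch of producing such timestamps. It assumes the libtai convention of 2^62 plus 10 seconds plus the Unix time (no leap-second table), which reproduces the example label above. The function names are hypothetical.

```python
from datetime import datetime, timezone

TAI64_BASE = 2 ** 62   # external TAI64 labels start here
LIBTAI_OFFSET = 10     # assumption: libtai's fixed offset from Unix time


def tai64_label(when):
    """Format a timezone-aware datetime as 16 lowercase hex characters."""
    if when.tzinfo is None:
        raise ValueError("pass a timezone-aware datetime")
    unix_seconds = int(when.timestamp())
    return format(TAI64_BASE + LIBTAI_OFFSET + unix_seconds, "016x")


def tinydns_switch_lines(fqdn, old_ip, new_ip, when):
    """Build the two '+' lines: old address dies at `when`, new one starts."""
    stamp = tai64_label(when)
    return [
        f"+{fqdn}:{old_ip}:0:{stamp}",  # ttl 0: timestamp is the ending time
        f"+{fqdn}:{new_ip}::{stamp}",   # ttl omitted: timestamp is the start
    ]
```

Passing `datetime(2000, 2, 19, 22, 4, 31, tzinfo=timezone.utc)` yields the two lines quoted from the tinydns-data page.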
  9. Another part of the job.... by dpilot · · Score: 2, Insightful

    Use DHCP for server addresses instead of static IP.

    Even though my home network is only two, sometimes three machines, I administer IP addresses through DHCP. The server has a static IP, everything else gets its IP served from DHCP, with a static MAC-to-IP mapping. My DNS is on the same machine.

    For your situation, switch the machines to DHCP at the old location, and have everything running. You would need a temporary machine to act as the DHCP/DNS machine at the new location. When you move your machines, they should simply come up. Watch out for hardcoded IPs in other configs.

    I presume your servers are on a DMZ, and you could arrange one machine as a DHCP/DNS server. Heck, a WalMart $200 box could more than do the job.

    --
    The living have better things to do than to continue hating the dead.
    1. Re:Another part of the job.... by duffbeer703 · · Score: 2

      I have a job opening for a senior network engineer in Boston, MA. Established company, stock options and competitive salary.

      We need people like you who can think outside of the box and come up with creative solutions to complex problems.

      Send me an email and we'll talk.

      --
      Conformity is the jailer of freedom and enemy of growth. -JFK
    2. Re:Another part of the job.... by Black+Copter+Control · · Score: 2
      I think that he was talking about using static DHCP, instead of dynamic DHCP. For those parts of the system that don't need to hardcode the IP address in them, this does help a bit... You move the machine, and the IP changes. You'll still have to figure out where you've got hardcoded IP addresses.

      DHCP certainly isn't a magic bullet but it does get some parts of planning the move into solid existence, and I don't see it causing any problems. If nothing else, the post is at least interesting because until I read it, I hadn't even considered using DHCP as part of a large scale move for machines with static addresses.
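
One way to hunt for those hardcoded addresses is sketched below: walk a config tree and report every IPv4 literal that falls inside the old network. The function name and the idea of scanning a directory such as /etc are assumptions for illustration.

```python
import ipaddress
import os
import re

# Dotted quad that is not part of a longer number or dotted string.
IPV4 = re.compile(r"(?<![\d.])(?:\d{1,3}\.){3}\d{1,3}(?![\d.])")


def find_hardcoded_ips(root, old_network):
    """Return (path, line_number, address) for each old-network IP under root."""
    net = ipaddress.ip_network(old_network)
    hits = []
    for dirpath, _dirs, files in os.walk(root):
        for name in sorted(files):
            path = os.path.join(dirpath, name)
            try:
                with open(path, "r", errors="ignore") as fh:
                    for lineno, line in enumerate(fh, 1):
                        for text in IPV4.findall(line):
                            try:
                                addr = ipaddress.ip_address(text)
                            except ValueError:
                                continue  # e.g. 999.1.1.1 or a version string
                            if addr in net:
                                hits.append((path, lineno, text))
            except OSError:
                continue  # unreadable file, skip it
    return hits
```

Something like `find_hardcoded_ips("/etc", "192.0.2.0/24")` would list candidates to review by hand. It will not catch addresses stored in databases or compiled into binaries.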

      --
      OS Software is like love: The best way to make it grow is to give it away.
  10. A friend had that problem... by TheSHAD0W · · Score: 2

    ...I suggested he change his old website to forward to the new server's IP. It worked just fine, and held everything together while everyone's DNS updated. After a week he was able to take down the old site.

    'Course if the old IP is completely dead, you've got problems. If you're physically moving the server, then I'm sure you can dig up an old 486 to run Apache on as a redirect.
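
If Apache is more than the spare box needs, the redirect can be sketched with Python's standard library. `NEW_BASE` is a placeholder address, and only GET and HEAD are handled here; treat this as an illustration rather than a production redirector.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

NEW_BASE = "http://198.51.100.10"  # hypothetical address of the new server


def redirect_target(new_base, path):
    """Join the new base URL and the requested path (query string included)."""
    return new_base.rstrip("/") + "/" + path.lstrip("/")


class RedirectHandler(BaseHTTPRequestHandler):
    """Answer every GET or HEAD with a temporary redirect to the new site."""

    def _redirect(self):
        self.send_response(302)
        self.send_header("Location", redirect_target(NEW_BASE, self.path))
        self.send_header("Content-Length", "0")
        self.end_headers()

    do_GET = _redirect
    do_HEAD = _redirect


def serve(port=8080):
    """Run the redirector until interrupted."""
    HTTPServer(("", port), RedirectHandler).serve_forever()
```

Running `serve(80)` on the machine holding the old IP would bounce stragglers to the new address. A 302 is used so that clients do not cache the redirect permanently.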

  11. Pure operational exercise by abulafia · · Score: 5, Informative

    Every site is different, with different services, operator skill sets, requirements, demands, and cash.

    Lay out your priorities. You say "everything has to stay up" - maybe that's true, but I moved a rather large commercial site stuck in one colo elsewhere, in pieces, when we had a *lot* of money, and when cost analysis started being done, it turned out we could afford downtime.

    Look at your traffic records, worry about what has to be up and what doesn't. Think *hard* about dependencies.

    Perhaps you can afford two trips (which is what we did): we moved a skeleton crew to the new site (preconfigured and tested, of course), switched DNS (you did think about your TTL, yes?), waited until a site I knew had not cached the DNS picked up the change, and completed the move.

    Perhaps you can buy/borrow from the office/use spares (but be careful about occupying your spares!) for the move.

    Perhaps you can offload the bulk of your traffic elsewhere (Akamai or something) to take the demand off your machines while you're doing it.

    I can't speak to your situation, but there's always a way to make it work - like I said, it is pure operations. Analyse, plan, plan again, execute.

    More hints -

    - Before you're slouching in the colo breaking down the network, copy all data where it needs to be from the comfort of your office. Doublecheck you got it right.
    - when disassembling equipment, label all interconnects, in order, unless every box is flat on a local net, with nothing hanging off of them. Don't forget routers, and don't assume it's stupid to label something obvious. Assume you're going to be brain dead when you put it back together - if something unexpected happens (someone flips the truck?), you will be brain dead. And even if you're not, it does help, esp. with messy SCSI configs, etc.
    - Write out a timeline, and give yourself more time than you need. Make sure other people concerned know what it is.
    - Oh, _back up your machines_. I know, it is obvious, but I know of one company that screwed this up royally.
    - Bring one more person than you need. They might be helpful, and if not, they can at least fetch coffee and donuts when you need them.
    - Bring snacks, lots of them.
    - Convince your accountant insurance is worth it, if they don't believe it already. We were moving ~2M worth of gear, and I would have been even more freaked out than I was if we hadn't insured it while it was being transported.
    - Have a wad of company cash/credit card on hand. You never know what comes up.
    - Ditto for spares, whatever you can - is that disk that's been spinning for 4 years going to come back up? Cat 5?
    - If you have heavy gear, think about whether or not you're going to move it yourself.
    - Overplan it. You'll be glad. Think contingencies and fall back positions.
    - Make sure your staff is well rested before you do it, and that they have whatever they need before you start.

    Hope this helps.

    -j

    --
    I forget what 8 was for.
    1. Re:Pure operational exercise by jea6 · · Score: 2
      Have a wad of company cash/credit card on hand. You never know what comes up.


      Umm... DECLINED?


      Ditto for spares, whatever you can - is that disk that's been spinning for 4 years going to come back up?


      That one is easy: NOTACHANCE.


      Oh, and from personal experience, make sure there are no "flags" on your account at your current provider or they may try to prevent you from removing equipment.

      --

      sarchasm: The gulf between the author of sarcastic wit and the person who doesn't get it.
    2. Re:Pure operational exercise by abulafia · · Score: 2

      2. Turn it into ns1 (import records,etc), turn off ns1 at old location

      3. Update host records for ns1 with Tucows and Internic

      I'm REALLY nervous about running on one name server for the few hours between moving. I also have an irrational fear that when I transition ns1 to its temporary home, it will cause a rift in the time-space continuum making all of my site inaccessible.

      Is it advisable to bring ns0 up with the rest of the equipment and skip the redirect?


      If I'm following you, I wouldn't be too concerned with running on one name server for a short time. The odds of something going wrong on it in that period are lower than the odds of something else going wrong. If it makes you really nervous, have the redirect server also do DNS for that period, although that slows you down - you'll have to wait for Tucows or whoever to make an additional change.

      As for fears of transitioning ns1, what I'd do is not turn off the old ns1 until after you're sure that the new one is being used - wait until the host record update goes through and you see zero traffic on the old one to turn it off.

      Another hint I thought of:

      - if you can get away with it, institute a configuration freeze several days before the move, and reboot everything you can at the old place. I wish I had done this - several machines had very long uptimes, and people had made changes that caused problems when we brought them back up at the new location. If you can test them one at a time, you might save yourself some grief.

      Good luck

      -j

      --
      I forget what 8 was for.
    3. Re:Pure operational exercise by RFC959 · · Score: 2
      Very well said. You have covered almost everything I was going to say. My company has been through two server room moves in the last 18 months, and while both have been messy, we've learned a bit. (Are you secretly one of my coworkers? *g*) Some other tips:

      • Do not attempt to do anything else at the same time as the move, no matter how tempting it is. (You know, the "Oh, since you're going to be bringing the servers down and uncabling everything anyway, maybe you could just...while you're at it.") You will have plenty to do as it is and you will probably have to be doing it at off-hours; don't add complexity to the project.
      • Reboot servers in the week or so prior to the move, just to make sure everything's OK. 4am is not the best time to discover that someone messed with the EEPROM settings and now the host won't boot, or the software won't start because the license has expired. (This goes along with the configuration freeze.)
      • Evaluate both sites in detail before moving anything. Floor layouts, power, everything. And when you actually do the move, have a colo liaison close at hand, in case something has to get changed in a hurry.
      • If you're hiring outside movers (which I recommend, since you will have enough to do without worrying about heavy lifting and driving the $10M truck, too), spend the extra dime and get GOOD ones. Bad ones stand around with blank expressions, have to be told in detail what to do, and handle servers like they're taking out the trash. Good ones have their own (good) moving equipment, have done server room moves before, and don't waste time.
      • Stuff WILL get damaged. Have a camera or two on hand to document it.
      • Do it in two or more crews if possible. One crew works remotely, taking services and hosts down, then the local crew unplugs, moves, and sets up again, only to the point at which things are remotely accessible, at which point the remote crew takes over again to check stuff. (Obviously, someone has to stick around just in case a network cable got overlooked or something, but most people can leave.) This keeps both crews more focussed and fresher than if they had to be on site for the entire thing.
      • One last thought: any business that says "No downtime at all is acceptable!" has a philosophical problem more than a technical one. Sooner or later you may be forced to accept downtime whether you wanted it or not (backhoe? colo fire? worm infestation?), and you should be technically and psychologically prepared for it. There are very few businesses, I think, that absolutely CANNOT have downtime. (I mean, we hear that at my workplace. We sell children's books. I think if some people can't buy children's books in the middle of the night a couple times a year, it's not the end of the world, or even of the company.) If you are one of these, your management should be willing to accept the expense and complexity of truly redundant geographically distant systems. If they're not...we're back to a philosophical problem.

  12. Simple by BoomerSooner · · Score: 2

    1) Mirror both servers.
    2) Change the DNS
    3) When the DNS updates you're done.
    4) Pat self on back.
    5) Get back to work you damn code/admin monkey.
    (This is how it worked at my job.)
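
Step 3 can be checked rather than guessed at. The sketch below compares what a resolver returns against the expected new addresses. It only reflects the view of whichever resolver it runs against, and the hostnames and function name are placeholders.

```python
import socket


def cutover_status(expected, resolve=socket.gethostbyname):
    """Map each hostname to 'new', 'still <ip>', or 'unresolved'.

    expected: dict of hostname -> new IP address.
    resolve:  lookup function, injectable so it can be swapped in tests.
    """
    status = {}
    for name, new_ip in expected.items():
        try:
            seen = resolve(name)
        except OSError:  # socket.gaierror is a subclass of OSError
            status[name] = "unresolved"
            continue
        status[name] = "new" if seen == new_ip else f"still {seen}"
    return status
```

Running it from a few outside hosts, for example against `{"www.example.com": "198.51.100.10"}`, gives a rough picture of how far the change has spread. Watching traffic on the old IP fall to zero remains the better signal.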

  13. Reducing Your Time-to-Live by tswinzig · · Score: 3, Funny

    No, I'm not talking about a DNS setting.

    It's a proven fact your lifespan will actually decrease due to the stress involved in moving a network's IP address and the debugging that goes along with it.

    --

    "And like that ... he's gone."
  14. -12: Flamebait by Captain+Nitpick · · Score: 3, Funny

    Find out what slashdot's admins did the couple of times they moved their servers.

    Then don't do that.

    --
    But then again, I could be wrong.
  15. I'm reminded of a familiar puzzle by cybermace5 · · Score: 2

    Remember this puzzle? Find the answer, and then start moving your servers!

    The farmer (you) is taking his fox (servers), duck (DNS) and corn (customer web space) to market (colocation provider). He is currently stuck on the left bank of a major river (highway). The good news is that he has a boat available (pickup truck). The bad news is that the boat will only hold him and one other item for each crossing (changing over servers without service loss).

    He dare not leave the duck alone with the corn as the corn would get eaten (no servers for DNS), or the duck alone with the fox as the duck would get eaten (no customer web space while swapping drives?). Also the farmer knows from prior experience that he cannot leave the corn alone on the right bank of the river (old location) since a large flock of crows (customers) is waiting to devour it (and you).

    Can you help the farmer get everything across the river safely? (Ask Slashdot can!)

    --
    ...
  16. Re:Only somewhat off topic... by druzicka · · Score: 2, Insightful

    You don't want DNS to handle this. That's what dynamic routing is for. Let your routers use BGP to determine what link is up/down, and let it choose the best path to your server.

    --
    If Happy Fun Ball begins to smoke, get away immediately. Seek shelter and cover head.
  17. DNS Authoritative Servers by elton · · Score: 2, Insightful
    Watch out when reading the comments about changing the TTL on your DNS servers. It is not as simple as that since you state that your DNS servers are part of what would be moving to the new location. When you have a registry like Network Solutions point your domain to a set of nameservers, each nameserver has to have a host NIC handle in their database. Changing the IP address of your nameservers is not difficult, but it is also not trivial. Plus, you have to remember that you have no control over the Network Solutions nameservers (i.e., when they will be restarted, what the TTLs are, etc.)

    To get around this, there are two scenarios:
    1) Use outside nameservers as your authoritative servers for your domain. You may even be able to get your registrar to do this. Some registrars offer it as a feature and others may charge. In any case, having a separate set of nameservers means you can move from colocation facility to colocation facility with relative ease as mentioned in earlier posts.
    2) Set up two servers at the new colo facility as DNS servers and set all of your TTLs etc. to the desired values. Register those IPs as nameservers with Network Solutions (you may be able to do this through your registrar). Then change the IP numbers of your nameservers for the domain names. Wait 48 hours for total propagation and proceed as has been outlined in previous posts.

    Please note that you really should contact your registrar and find out what the procedure is for changing the IP address of a nameserver. I know that in the past when we had to do it, there was a template sent to Network Solutions specifically for this task. This is most likely easier now and probably different for each registrar.

  18. I went through this... by DarkDust · · Score: 2

    two days ago, but only with my one server which is co-lo'd by a friend of mine who is an ISP. They simply set up routing so both the old and new IP addresses could reach my machine and then I only had to set up a second IP address on my ethernet adapter like this:

    ifconfig eth0:1 123.45.67.89 netmask 255.255.255.224 broadcast 123.45.67.95

    That's it. Now the people at my friend's company have set up the DNS to report the new IP address and let it propagate through the 'net. One hour later or so all my domains targeted the new IP address, everything went fine, with zero downtime.

    The best is: everything was done through ssh, I didn't have to move my lazy ass ;-)

    Only one pitfall to be aware of: don't forget to tell your firewall! In my case it was simply adding eth0:1 to the list of firewalled interfaces in SuSE's /etc/sysconfig/SuSEfirewall2 and running "rcSuSEfirewall2 restart"

    Now that everything works I could kick out the old IP address and stuff... but I'm lazy ;-)
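
When adding an alias like this, it is worth double-checking the broadcast value against the netmask, since the broadcast address is derived from the two. A small sketch using Python's `ipaddress` module follows; the function name is hypothetical and the address is the placeholder from the post.

```python
import ipaddress


def alias_command(device, address, netmask):
    """Build an ifconfig alias command with the broadcast derived from the netmask."""
    iface = ipaddress.ip_interface(f"{address}/{netmask}")
    broadcast = iface.network.broadcast_address
    return (
        f"ifconfig {device} {iface.ip} "
        f"netmask {iface.netmask} broadcast {broadcast}"
    )
```

For 123.45.67.89 with netmask 255.255.255.224 the network is 123.45.67.64/27, so the derived broadcast is 123.45.67.95.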