Slashdot Mirror


Minimizing Downtime When Switching IP Addresses?

GeekTek asks: "As we all know, prices for co-location have plummeted since the height of the dot-com era. We've been shopping around and found a solution that works for us. We have a small setup of about a dozen Debian boxen and a few Windows servers, and we run our own name servers (BIND 8.x). Most of our domain names are managed through our OpenSRS account. My concern is switching all of our servers' IP addresses. I cannot have any downtime, and I want to minimize the number of trips to the current co-lo (it's more than 2 hours away). What is the best way to do it? What experiences can you share from similar situations?"

4 of 51 comments

  1. What services are you running? by farnsworth · Score: 3, Informative
    To get a real answer, you need to say what services you are running. HTTP? SMTP? SOAP? Database?

    If it's just HTTP, you can use redirects to mitigate the DNS propagation delay. HTTPS complicates this, since you'll also need certificates for your temporary DNS names (e.g., https://old/ redirecting to https://new/, where the new server also responds to https://old/).
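    A minimal sketch of the redirect idea, using only Python's standard library. It assumes plain HTTP (no certificates involved) and a hypothetical new address, NEW_BASE; every GET or HEAD request that still reaches the old IP gets a temporary redirect to the same path on the new site.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical address of the site at the new colo; substitute your own.
NEW_BASE = "http://new.example.com"


class RedirectHandler(BaseHTTPRequestHandler):
    """Answer every GET/HEAD on the old box with a redirect to the new box."""

    def _redirect(self):
        # 302 (temporary) rather than 301, so browsers don't cache the
        # redirect permanently in case you need to roll back.
        self.send_response(302)
        # self.path includes any query string, so it is preserved too.
        self.send_header("Location", NEW_BASE + self.path)
        self.send_header("Content-Length", "0")
        self.end_headers()

    do_GET = _redirect
    do_HEAD = _redirect

    def log_message(self, format, *args):
        pass  # keep the console quiet; remove this to log redirected requests


def make_server(host="0.0.0.0", port=8080):
    """Build (but don't start) the redirecting server."""
    return HTTPServer((host, port), RedirectHandler)
```

    To run it on the old box, call make_server().serve_forever(); the port and NEW_BASE are placeholders. Choosing 302 over 301 is a judgment call, not something the comment prescribes.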

    For just about anything else, you might consider setting up a VPN between your existing datacenter and your new one, configuring both environments to answer to the same DNS names, and having your old environment tunnel traffic to the new one.

    In short, doing this with zero downtime depends heavily on the services you are running. It's not a generic problem with a single solution.

    --

    There aint no pancake so thin it doesn't have two sides.

  2. Short TTLs + rinetd or similar by Brian Hatch · Score: 5, Informative
    Decrease the TTL of the DNS records ahead of the switchover. If your current TTL is a day, change it to, say, 300 seconds (5 minutes) at least one day beforehand. You'll see a higher DNS query rate during that time, but probably nothing you can't handle. (You'd handle it better if you used djbdns, though.)
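    The timing rule above can be written down as a small calculation. This sketch assumes Unix timestamps in seconds and the example values from the comment (an old TTL of one day, a new TTL of 300 seconds); the function and field names are hypothetical. The re-query factor is only an upper bound, for a busy resolver that re-fetches the record as soon as its cached copy expires.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TtlPlan:
    lower_ttl_by: int          # latest Unix time to publish the short TTL
    stale_until: int           # TTL-honoring resolvers have the new IP by now
    max_refetch_factor: float  # upper bound on how much more often a resolver asks


def plan_ttl_change(switch_at: int, old_ttl: int = 86400, new_ttl: int = 300) -> TtlPlan:
    """Work out the TTL timeline for an IP change at switch_at (Unix seconds)."""
    if not 0 < new_ttl <= old_ttl:
        raise ValueError("new_ttl must be positive and no larger than old_ttl")
    return TtlPlan(
        # A copy cached just before you lower the TTL keeps the old TTL and can
        # live old_ttl seconds longer, so lower it at least that far ahead.
        lower_ttl_by=switch_at - old_ttl,
        # After the change, a resolver still holding the old address (now with
        # the short TTL) drops it within new_ttl seconds.
        stale_until=switch_at + new_ttl,
        # A resolver re-querying on every expiry asks old_ttl/new_ttl times as often.
        max_refetch_factor=old_ttl / new_ttl,
    )
```

    With a switch at Unix time 1,000,000 and the defaults, this gives lower_ttl_by = 913,600 (one day earlier), stale_until = 1,000,300, and a factor of 288.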

    Then, when you're done moving a machine, change its IP in DNS. When things seem solid, put the TTL back to a reasonable number (e.g., 1 day).

    During the transition, you can also keep a machine at the old IP address that forwards the services to the new IP address, using tools such as rinetd or xinetd. This ensures that all traffic reaches the correct machine (possibly via the old one) while the old IP address stays available during the move for clients whose broken DNS resolvers don't honor TTL values. The forwarding machine can easily be a temporary box, such as a laptop; it isn't doing any real processing.
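    rinetd and xinetd handle this through their own configuration files; for illustration, here is a rough Python equivalent of what such a TCP forwarder does, written with asyncio. The LISTEN and TARGET addresses are hypothetical placeholders, and the sketch leaves out things a real tool offers, such as access control and logging.

```python
import asyncio

# Hypothetical addresses: accept on the old IP, forward to the box at the new colo.
LISTEN = ("0.0.0.0", 80)
TARGET = ("203.0.113.10", 80)  # documentation address; substitute your own


async def pipe(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
    """Copy bytes one way until EOF, then pass the EOF along (half-close)."""
    try:
        while data := await reader.read(65536):
            writer.write(data)
            await writer.drain()
        if writer.can_write_eof():
            writer.write_eof()
    except OSError:
        pass  # one side vanished abruptly; stop copying in this direction


async def handle_client(client_reader, client_writer, target_host, target_port):
    """Open a connection to the target and shuttle bytes both ways."""
    try:
        up_reader, up_writer = await asyncio.open_connection(target_host, target_port)
    except OSError:
        client_writer.close()  # target unreachable: drop the client
        return
    try:
        await asyncio.gather(
            pipe(client_reader, up_writer),  # client -> new machine
            pipe(up_reader, client_writer),  # new machine -> client
        )
    finally:
        up_writer.close()
        client_writer.close()


async def start_forwarder(listen_host, listen_port, target_host, target_port):
    """Start listening; each accepted connection is forwarded to the target."""
    async def on_connect(reader, writer):
        await handle_client(reader, writer, target_host, target_port)

    return await asyncio.start_server(on_connect, listen_host, listen_port)


async def main():
    server = await start_forwarder(*LISTEN, *TARGET)
    async with server:
        await server.serve_forever()
```

    Start it with asyncio.run(main()) on the stand-in box; binding a port below 1024 normally requires root.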

    If you're also moving your DNS machines, move one a week before the big move, update whois, and make sure everything settles down. Then move the other a day or two after the big move.

  3. Pure operational exercise by abulafia · Score: 5, Informative

    Every site is different, with different services, operator skill sets, requirements, demands, and cash.

    Lay out your priorities. You say "everything has to stay up", and maybe that's true. But when I moved a rather large commercial site, in pieces, out of the one colo it was stuck in (back when we had a *lot* of money), the cost analysis showed that we could in fact afford some downtime.

    Look at your traffic records and work out what has to stay up and what doesn't. Think *hard* about dependencies.
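    One way to think hard about dependencies is to write them down and let the computer order them. This sketch uses Python's graphlib module (3.9+) on a hypothetical service map, where each service lists what must already be running at the new site before it can come up; each returned "wave" can be brought up in parallel.

```python
from graphlib import TopologicalSorter

# Hypothetical service map: each service -> services it needs running first.
SERVICES = {
    "dns": set(),
    "db": {"dns"},
    "mail": {"dns"},
    "web": {"db", "dns"},
    "backup": {"db", "mail"},
}


def move_waves(deps):
    """Group services into bring-up waves; each wave only needs earlier waves."""
    ts = TopologicalSorter(deps)
    ts.prepare()  # raises graphlib.CycleError if services depend on each other in a loop
    waves = []
    while ts.is_active():
        ready = sorted(ts.get_ready())  # sorted only to make the output stable
        waves.append(ready)
        ts.done(*ready)
    return waves
```

    For the map above, move_waves(SERVICES) gives [["dns"], ["db", "mail"], ["backup", "web"]]. A CycleError is itself useful information: it means two services can't be moved one at a time.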

    Perhaps you can afford two trips (which is what we did). In that case, you move a skeleton crew of machines to the new site (preconfigured and tested, of course), switch DNS (you did think about your TTL, yes?), wait until the change is picked up at a site you know has not cached the old DNS records, and then complete the move.
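    The "wait until it's picked up" step can be automated with a polling loop. This is a sketch: it asks whichever resolver the host running it is configured to use, so run it from a machine whose resolver you know has not cached the old records, as the commenter did. The function names and default timings are hypothetical; resolve, sleep, and clock are parameters only so the loop can be tested without a network.

```python
import socket
import time


def resolve_ipv4(name):
    """IPv4 addresses the local resolver currently returns for name."""
    infos = socket.getaddrinfo(name, None, socket.AF_INET, socket.SOCK_STREAM)
    return {info[4][0] for info in infos}


def wait_for_switch(name, new_ip, timeout=3600, interval=30,
                    resolve=resolve_ipv4, sleep=time.sleep, clock=time.monotonic):
    """Poll until name resolves only to new_ip; return False if timeout expires first."""
    deadline = clock() + timeout
    while True:
        try:
            addresses = resolve(name)
        except socket.gaierror:
            addresses = set()  # treat a failed lookup as "not switched yet"
        if addresses == {new_ip}:
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```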

    Perhaps you can buy extra gear, borrow machines from the office, or use spares for the move (but be careful about tying up your spares!).

    Perhaps you can offload the bulk of your traffic elsewhere (Akamai or something similar) to take the demand off your machines while you're moving them.

    I can't speak to your situation, but there's always a way to make it work; like I said, it is pure operations. Analyse, plan, plan again, execute.

    More hints -

    - Before you're slouching in the colo breaking down the network, copy all data to where it needs to be from the comfort of your office. Double-check that you got it right.
    - When disassembling equipment, label all interconnects, in order, unless every box sits flat on a local net with nothing hanging off it. Don't forget routers, and don't assume it's stupid to label something obvious. Assume you're going to be brain-dead when you put it back together: if something unexpected happens (someone flips the truck?), you will be. And even if you're not, labels help, especially with messy SCSI configs.
    - Write out a timeline, and give yourself more time than you need. Make sure other people concerned know what it is.
    - Oh, _back up your machines_. I know, it is obvious, but I know of one company that screwed this up royally.
    - Bring one more person than you need. They might be helpful, and if not, they can at least fetch coffee and donuts when you need them.
    - Bring snacks, lots of them.
    - Convince your accountant that insurance is worth it, if they don't believe it already. We were moving ~2M worth of gear, and I would have been even more freaked out than I was if we hadn't insured it while it was in transit.
    - Have a wad of company cash/credit card on hand. You never know what comes up.
    - Ditto for spares, whatever you can manage: is that disk that's been spinning for 4 years going to come back up? Do you have spare Cat 5?
    - If you have heavy gear, think about whether or not you're going to move it yourself.
    - Overplan it. You'll be glad. Think about contingencies and fallback positions.
    - Make sure your staff is well rested before you do it, and that they have whatever they need before you start.
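    For the first hint (copy the data ahead of time and double-check it), one way to double-check is to compare checksum manifests of the source and the copy. This sketch builds a SHA-256 manifest of a directory tree with Python's hashlib; you would run tree_manifest on each machine (for example, saving the result with json.dump) and then compare the two. The function names are hypothetical, and the sketch ignores permissions, ownership, and empty directories.

```python
import hashlib
from pathlib import Path


def tree_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    root = Path(root)
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256()
            with path.open("rb") as f:
                # Read in 1 MiB chunks so large files don't fill memory.
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    digest.update(chunk)
            manifest[path.relative_to(root).as_posix()] = digest.hexdigest()
    return manifest


def compare_manifests(src, dst):
    """Return (missing from copy, extra in copy, contents differ), each sorted."""
    missing = sorted(src.keys() - dst.keys())
    extra = sorted(dst.keys() - src.keys())
    changed = sorted(k for k in src.keys() & dst.keys() if src[k] != dst[k])
    return missing, extra, changed
```

    An empty result in all three lists means every file arrived with identical contents.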

    Hope this helps.

    -j

    --
    I forget what 8 was for.
  4. Re: TTL = half the time until switchover by Electrum · Score: 3, Informative
    Actually, you can reduce the DNS query rate by repeatedly setting the TTL to about half the time remaining until the switchover. For instance, 24 hours before the switchover, set it to 12 hours; 12 hours before, set it to 6 hours; and so on, until it's down to about five minutes. This way, you won't get a continuous flood of DNS requests during the whole day before the switchover.
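    The halving schedule is easy to compute ahead of time. This sketch assumes Unix timestamps in seconds, a 24-hour head start, and a 300-second floor, matching the example; TTLs are rounded down to whole seconds, since DNS TTLs are integers. It also assumes the records' original TTL is no longer than the head start, so nothing cached before the first step survives past the switchover. The function name is hypothetical.

```python
def halving_schedule(switchover, head_start=86400, floor=300):
    """Return (publish_at, ttl) pairs, in seconds, each TTL about half the time left."""
    steps = []
    remaining = head_start
    # Keep halving while half the remaining time is still above the floor.
    while remaining // 2 > floor:
        ttl = remaining // 2
        steps.append((switchover - remaining, ttl))
        remaining = ttl  # the next change is due when this TTL runs out
    # Finish at the floor value, published when the last TTL runs out.
    steps.append((switchover - remaining, floor))
    return steps
```

    With the defaults, the TTLs come out as 43200, 21600, 10800, 5400, 2700, 1350, 675, 337, and finally 300 seconds, the last one published 337 seconds before the switchover. Every copy cached under this schedule expires before the switchover.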

    Or you could use tinydns, which handles this automatically:

    http://cr.yp.to/djbdns/tinydns-data.html


    You may include a timestamp on each line. If ttl is nonzero (or omitted), the timestamp is a starting time for the information in the line; the line will be ignored before that time. If ttl is zero, the timestamp is an ending time (``time to die'') for the information in the line; tinydns dynamically adjusts ttl so that the line's DNS records are not cached for more than a few seconds past the ending time. A timestamp is an external TAI64 timestamp, printed as 16 lowercase hexadecimal characters. For example, the lines

    +www.heaven.af.mil:1.2.3.4:0:4000000038af1379
    +www.heaven.af.mil:1.2.3.7::4000000038af1379


    specify that www.heaven.af.mil will have address 1.2.3.4 until time 4000000038af1379 (2000-02-19 22:04:31 UTC) and will then switch to IP address 1.2.3.7.
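    To produce those timestamps from an ordinary date, this sketch follows the convention the example above uses: the label is 2^62 plus 10 plus the Unix time, with leap seconds ignored, which reproduces 4000000038af1379 for 2000-02-19 22:04:31 UTC. The helper names are hypothetical, and the generated lines only cover the simple "+" form shown in the quoted example.

```python
from datetime import datetime, timezone

TAI64_BASE = 2**62  # external TAI64 labels are offset by 2**62


def tai64_hex(moment: datetime) -> str:
    """External TAI64 label as 16 lowercase hex digits (leap seconds ignored)."""
    if moment.tzinfo is None:
        raise ValueError("use a timezone-aware datetime")
    unix_seconds = int(moment.timestamp())
    return format(TAI64_BASE + 10 + unix_seconds, "016x")


def switch_lines(fqdn, old_ip, new_ip, when):
    """Two tinydns-data lines that hand fqdn over from old_ip to new_ip at `when`."""
    stamp = tai64_hex(when)
    return [
        f"+{fqdn}:{old_ip}:0:{stamp}",  # ttl 0: stamp is the old address's ending time
        f"+{fqdn}:{new_ip}::{stamp}",   # ttl omitted: stamp is the new address's start time
    ]
```

    Calling switch_lines("www.heaven.af.mil", "1.2.3.4", "1.2.3.7", datetime(2000, 2, 19, 22, 4, 31, tzinfo=timezone.utc)) yields the two example lines above.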