Multihoming Suggestions w/o at Least a /24?

Posted by Cliff on Thursday February 20, 2003 @08:05AM from the messy-routing-issues-301 dept.

An anonymous reader asks: "I work for a small company that is looking to get a multihomed Internet connection for redundancy. The logical conclusion would be to get another Internet connection from another provider. However, in the case of a primary connection failure, we need to be running BGP to have our internally-hosted sites still accessible to the Internet via the 2nd connection. The problem is that we only have a /28 (16 IPs), which is too small to make it past most route filters, and would then mean that we still couldn't be reached if the primary T1 is down. So, what are our options? (and no, lying and getting a /24 isn't a valid choice)"
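
To make the filtering problem concrete, here is a minimal Python sketch. It assumes the practice the poster describes, that announcements more specific than a /24 are dropped by most route filters. The function name `passes_prefix_filter` and the 192.0.2.0/24 documentation addresses are illustrative and do not come from the original post.

```python
import ipaddress


def passes_prefix_filter(prefix: str, longest_accepted: int = 24) -> bool:
    """Return True if the prefix is short enough to survive the assumed filter."""
    net = ipaddress.ip_network(prefix)
    # A /28 has a longer prefix (is more specific) than a /24, so it is dropped.
    return net.prefixlen <= longest_accepted


small = ipaddress.ip_network("192.0.2.16/28")
print(small.num_addresses)                    # 16 addresses in a /28
print(passes_prefix_filter("192.0.2.16/28"))  # False
print(passes_prefix_filter("192.0.2.0/24"))   # True
```

Real filters vary by provider, so the `longest_accepted` default is only a stand-in for whatever policy the upstream networks actually apply.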

Re:It depends on the services... by photon317 · 2003-02-20 08:21 · Score: 4, Informative

Yeah, outbound traffic is easy; it's the inbound he's having a problem with, I'm sure. The problem with two sets of addresses and DNS switching is the caching. Even if you set your records to expire in 30 seconds or something crazy like that, at various levels the records *will* get cached much longer than that, and it will be "problematic" at best.

This question is truly worthy of Ask Slashdot, which is a first in a long time. I have yet to see a good answer for someone who wants truly redundant internet connectivity and has too small an address space to really do BGP peering.

I thought of one solution at the ISP end of things, which would require partnerships between ISPs. Two distinct competing ISPs could grab a decent-sized netblock and share it. They sell these IPs to customers wanting dual-homed access from both ISPs, and split the money. In this type of scenario the customer can BGP to both ISPs, who in turn BGP with each other and the real backbone, and you can get all the redundancy you need in case of ISP or wan-link failure.

--
11*43+456^2
Legal v.s. technical issues. by pruneau · 2003-02-20 08:24 · Score: 4, Informative

Of course, the usual question is: what can you afford to spend on redundancy?
Because before technical solutions, you might want to review the contract with your access provider to include liabilities. The contract itself might cost more, but it might be simpler than a real redundant solution.
Because unless you know for a fact that your access provider is not reliable and has bad support, playing the redundancy game might be a bit more expensive than "simply" getting a double connection to the internet.
Let's do the exercise: you want a dual internet connection, that's OK, but you surely do not want a single router = single point of failure. So you have to buy another router, most probably the same brand as the one you already have, so as to be able to use the (most probably) proprietary high availability solution. That assumes your current model supports HA; otherwise you will have to buy a more expensive one.
Which brings to mind that having a redundant link (with an SLA :-) from the same provider might be an excellent idea: since they are probably aggregating your /28 with other subnets, your route advertisement won't get lost in their network until it gets aggregated. Just make sure it does not get aggregated on the next hop ;-) Well, if you are willing to pay for multi-homing, wouldn't it be easier to try to obtain an SLA with only one access provider, an SLA including a redundant routing connection, with some redundancy protocol handled

--
[Pruneau /\o^O/\ warranty void if this .sig is removed]
Faking it is your only option. by GoRK · 2003-02-20 09:15 · Score: 2, Informative

You're going to have to do your own redundant routing in between you and a network that is properly multi-homed with BGP out to the larger internet to make this work like I think you are really wanting it to work.

First, find an upstream ISP that is multi-homed to your satisfaction. Buy some IPs from them and put in a router or two for redundancy.

Next, build two or more tunnels to the ISP over different circuits or providers and run your own small BGP network on private IPs between the router at your multihomed ISP and the routers on either end of your connection. Assign the IPs that belong on the multihomed network locally and let your own routers run BGP (or OSPF, or whatever else you want to use instead, like load balancing) between your LAN and the multihomed network.

It's hackish. It will be fairly expensive. It will also, however, work, let you keep your servers on-site, and give you greater control over redundancy and failover than you'd get with two upstream providers allowing you to use BGP anyway.

In the end, it might work out to be cheaper to do this in the long run since you won't have to pay any upstream ISP for letting you do BGP. You'll just have to pay for colocation somewhere, which could be a lot cheaper.

~GoRK
VRRP is a possibility. by FreeLinux · 2003-02-20 09:16 · Score: 2, Informative

If the use of BGP is out of the question, there seems to be only one alternative. However, this solution still leaves the ISP as a single point of failure.

The option is Virtual Router Redundancy Protocol (VRRP). A brief description of VRRP, including a diagram, can be found here. Keep in mind that numerous other manufacturers support the VRRP standard; you don't *have* to go with Cisco. Also, remember that with VRRP there is still a single point of failure, the ISP. This means that your ISP had better be a good one.
Reverse proxy off site? by Richard_at_work · 2003-02-20 09:16 · Score: 2, Informative

How about spending money to have a reverse proxy off site, in a colo somewhere, that decides which line to send traffic down? Clients connect to the advertised IP address for a site from DNS, which is the colo proxy/whatever, and then are either dealt with transparently, like a true proxy, or redirected to whichever line is up at the time.

It's something I have intended to look into for work, as it would just be an extension of what we currently use for firewalling anyway: port 80 is redirected from the gateway to a machine behind the firewall. Carrying out a port 80 redirect to two publicly available IPs is probably just as trivial. In fact, as I've been thinking this through, I have tried it with OpenBSD on both ends and Apache as the web server, and an rdr on the outside box gets my webpage fine.
Re:Fake it with DNS? by aminorex · 2003-02-20 10:16 · Score: 2, Informative

I think that if DNS is the best you can do, you
should round-robin the IPs from each link on both
servers. Only drop the IPs from link1 if link1
goes down. Then even if there is some dead cache
on the network, at least your clients can reach
the server by trying again.
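
A minimal sketch of that record-selection rule, in Python. It assumes something else (a monitor, a cron job) already knows which links are up. The names `a_records`, `link1`, `link2`, and the documentation addresses are hypothetical. Publishing every address when all links look down is also an assumption of this sketch, not part of the comment above.

```python
def a_records(link_ips: dict[str, list[str]], link_up: dict[str, bool]) -> list[str]:
    """Return the A records to publish: all IPs from every link that is up."""
    records = []
    for link, ips in link_ips.items():
        # Only drop a link's IPs while that link is known to be down.
        if link_up.get(link, False):
            records.extend(ips)
    # Assumption: if everything looks down, publish all IPs rather than nothing.
    return records or [ip for ips in link_ips.values() for ip in ips]


links = {"link1": ["192.0.2.10"], "link2": ["198.51.100.10"]}
print(a_records(links, {"link1": True, "link2": True}))
print(a_records(links, {"link1": False, "link2": True}))
```

The output would still have to be written into the zone on both name servers, and cached answers keep the dropped IPs alive until their TTL runs out.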

--
-I like my women like I like my tea: green-
You are entitled to a /24 if going multihomed. by jtavares · 2003-02-20 11:19 · Score: 2, Informative

Hello,

There is no need to lie. Going multihomed is reason enough to request and obtain a /24 from one of your two providers, despite the fact that your network size only requires a /28. I have performed this exercise for companies of your size many times over, and trust me, any major network provider will give you a /24 if you are switching over to BGP and getting a second connection.

The effect of imposing a /24 or greater limit on BGP routes is that providers need to be more sensitive to the needs of companies who, when considering network size alone, can't justify a /24. Thus, going multi-homed is enough of a qualifier by itself to obtain a /24 from an upstream ISP.

-James
The VoyNetworks Solution to redundancy by JWSmythe · 2003-02-20 17:31 · Score: 2, Informative

I know this method will get flamed by quite a few people, but it works very well.

We want 0 downtime. There's no way to guarantee that any equipment is without failure. Something can/will always break. That's something you have to accept.

voyeurweb.com is located in colo facilities in both New York and Tampa. Each facility has its own network drop. The size doesn't matter, but for reference, it's 1000base fiber in each location.

We have at least 5 machines in each location. Each machine has its own IP, and in some cases multiple IPs just to increase its load (faster machines can handle heavier loads).

You put multiple A records in your DNS. When a customer browses to your site, they get any one of the IP's randomly. Here's what an 'nslookup' returns for voyeurweb.com

> nslookup voyeurweb.com | grep Address
Address: 63.208.2.23
Address: 63.208.2.25
Address: 63.208.2.62
Address: 63.208.2.64
Address: 63.208.2.84
Address: 63.208.2.97
Address: 209.247.59.14
Address: 209.247.59.15
Address: 209.247.59.16
Address: 209.247.59.17
Address: 209.247.59.84
Address: 209.247.59.85
Address: 209.247.59.86
Address: 209.247.59.87

The 209.247.59. IPs are in New York. The 63.208.2. IPs are in Tampa. We're favoring the New York network a little bit, because we have some other specialized sites running in Tampa, and want an equal load between the cities. Right now, we're pulling about 450Mb/s per city at peak time. That is a little under half of each 1000Mb/s drop. We've just added 1Gb/s fiber in Los Angeles, to increase our redundancy. How redundant you make yourself is really up to how much the bosses want to spend, and how safe you want your site to be. Like I said, we want 0 downtime, and we achieve it.

The nice part is, if a machine fails, the client hangs for a few seconds, and then goes off to the next IP. If all the IP's in a city fail, the client can potentially hang for up to 30 seconds, before going to a server that works. Your browser will continue to use any IP that works.
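
Roughly what the browser is doing for you, as a minimal Python sketch. This is an approximation of client behaviour, not how any particular browser is implemented. The function name `connect_first_working`, the 5 second timeout, and the injectable `connect` parameter (there so the logic can be tried without a network) are all assumptions.

```python
import socket


def connect_first_working(addresses, port=80, timeout=5.0,
                          connect=socket.create_connection):
    """Try each address in turn; return (address, connection) for the first that answers."""
    last_error = None
    for addr in addresses:
        try:
            # A dead server costs at most `timeout` seconds before we move on.
            return addr, connect((addr, port), timeout)
        except OSError as exc:  # covers refused connections and timeouts
            last_error = exc
    raise OSError(f"no address reachable: {last_error}")


# Example of feeding it every A record for a name (needs a working resolver):
#   infos = socket.getaddrinfo("example.com", 80, proto=socket.IPPROTO_TCP)
#   addr, conn = connect_first_working([info[4][0] for info in infos])
```

The worst case is one timeout per dead address, which is where the "hang for a few seconds" comes from.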

We use a relatively short ttl in our DNS records, so if we decide to shut down all the servers in a city for any reason, within an hour all the traffic stops to them. I've done this many many times now, I'm 100% sure it works.

If we have a known problem (say a server has a hardware failure), you take it out of your DNS, and within an hour, no one is even trying to hit it.

We've done this with pairs of machines, or in the case of voyeurweb.com, up to 25 machines.

It's so simple it shouldn't work. I've been told by quite a few people that it won't work, but then I'll prove to them that it does..

Before we did this (before I was admin), if a machine failed, thousands of viewers would write in complaining immediately. Now, I can take a few machines down for maintenance, and no one notices. If we have a mystery crash at 4am, it's not fatal, we fix it when we can..

If you're a viewer, you probably didn't notice that we shut down all of Tampa for voyeurweb.com, for a week because of provider problems (lack of available bandwidth). You probably didn't even notice when we swapped out all the New York servers.. We were polite with some of them, but got bored with it, and just started yanking cables after a while.. We didn't receive a single email asking why the sites were down. When a server went down, we did notice an increased load across the rest of them. It's nice having a *LOT* of servers running. If you have 10, you only get roughly a 10% load increase across all the other servers when one goes down. :)
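
A quick check of that arithmetic, as a small Python sketch. It assumes traffic is spread evenly and the surviving servers absorb everything the failed ones were carrying. The function name is hypothetical.

```python
def extra_load_per_survivor(total_servers: int, failed: int = 1) -> float:
    """Fractional load increase on each surviving server after `failed` servers drop out."""
    if not 0 <= failed < total_servers:
        raise ValueError("need at least one surviving server")
    survivors = total_servers - failed
    # Each survivor goes from 1/total to 1/survivors of the traffic.
    return failed / survivors


print(f"{extra_load_per_survivor(10):.1%}")  # 11.1% with 10 servers, 1 down
print(f"{extra_load_per_survivor(2):.1%}")   # 100.0% with a mirrored pair
```

So with 10 servers the hit is about 11% each, while a two-server mirror has to be able to carry double its normal load.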

We don't depend on BGP. We don't depend on expensive load balancing equipment. We don't depend on anything other than the fact that people use browsers, and they resolve IP's through DNS.

In your case, you should have two ISP's providing you bandwidth. Each ISP should issue you a block of IP's from their available pool (like, it's hard to be on the Internet without it).. I'd say, if you want a site that stays up, set up a pair of mirrored servers. Give the first one an IP and gateway of the first ISP, and the second one an IP and gateway of the second ISP.. I could name off over a dozen sites that do this now, but I won't. :)

If you want to get real fancy, get one machine, put two IP's on it (one from each provider), and have a script monitor each gateway. If one fails, switch to the other. But this doesn't do anything for redundancy if one server should fail.
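
A minimal sketch of the decision part of such a script, in Python. It assumes a Linux box where `ping -c 1 -W 2` sends one probe with a 2 second timeout. The function names and the injectable `probe` parameter are hypothetical, and actually changing the default route is left out because it depends on your setup.

```python
import subprocess


def gateway_alive(ip: str) -> bool:
    """Return True if one ping to the gateway succeeds (Linux ping flags assumed)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", "2", ip],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def choose_gateway(gateways, current, probe=gateway_alive):
    """Keep the current gateway while it answers; otherwise pick the first live one."""
    if probe(current):
        return current
    for gw in gateways:
        if gw != current and probe(gw):
            return gw
    # Nothing answers: stay put rather than flap between dead gateways.
    return current
```

Run from cron or a loop, the result of `choose_gateway` would then drive whatever command switches the default route on your system. Pinging only the gateway says nothing about trouble further upstream, so a real script would probably also probe a host beyond each ISP.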

I hope I've explained this well. I've never seen it well documented anywhere. It's in the BIND documentation somewhere, but they have a real convoluted method of CNAME's and A records, which other documentation says are completely against some of the RFC's, so you shouldn't do it.

Even our evil nemesis does it...

>nslookup microsoft.com | grep Address | sort
Address: 207.46.134.155
Address: 207.46.134.190
Address: 207.46.134.222
Address: 207.46.249.190
Address: 207.46.249.222
Address: 207.46.249.27

Two different networks, splitting the load. :)

If CNN does it, it must be good.

> nslookup cnn.com | grep Address | sort
Address: 64.236.16.116
Address: 64.236.16.20
Address: 64.236.16.52
Address: 64.236.16.84
Address: 64.236.24.12
Address: 64.236.24.20
Address: 64.236.24.28
Address: 64.236.24.4

BTW, yes we get constant DoS attacks against us.. Sometimes I entertain myself by watching the logs. :) But, it's pretty hard to take down 10 servers that can each push out 150Mb/s (dual NIC's bound together with teql).

To the script kiddie that "took down" one of our machines the other night.. Ummm, you didn't. I was annoyed at seeing the logs, and dropped all traffic from your network. :)

--
Serious? Seriousness is well above my pay grade.