Linux Failover?


Posted by Cliff on Wednesday May 24, 2000 @01:30AM from the is-this-a-real-problem? dept.

Anton asks: "This is a question about Linux failover in business situations. We are a growing B2B company; our product runs on Apache/Linux. We contracted professional services to properly set up our network. After all the hellishly expensive Cisco hardware had been set up, it turns out that for our servers to be configured for failover, each one needs two dual-port NICs configured for one IP and connected to two different switches; furthermore, the driver needs to intelligently switch the ports when the active port fails ... We've never heard of such beasts for Linux, and a net search revealed nothing. Our consultant, however, claims that 'Linux is biting itself in the foot' for not supporting that, and that other industrial-strength OSes like Solaris do in fact support this. Has anyone run into this before or have other ideas?" [nik suggests]: Take a look at Polyserve Understudy, which might be an alternative. FreeBSD and Linux versions are available (and bundled with FreeBSD 4.0).

What you need by Anonymous Coward · 2000-05-23 20:47 · Score: 4

Everything you need is at High Availability Linux.

I too am building (and have built) a B2B exchange on the Linux platform and found JServ to be *INCREDIBLE* for its HA/failover-safe features.

As for the 2 network cards for each machine, that too is a *VERY GOOD* thing. It allows you to partition out your network traffic to achieve much better response time. For example, our network has 2 NICs in each machine. There is a "Web Server to Database" network, there is a firewall-to-webserver network, and we have a separate network for office web surfing and misc stuff like that. Access to the "WebServer to Firewall" network is handled across the router.

One thing to keep in mind when dealing with DB-aware web applications is that unless your code is *VERY POORLY* written, the biggest bottleneck will be network latency.
-Grimsaado
Ignorance of options is not a failure by ch-chuck · 2000-05-23 21:10 · Score: 4

When Linux was cranking up last year, the folks at TurboLinux sales called up promoting their fault-tolerant cluster solution. Although it's not free, you could get a timed demo, so Linux solutions exist.

Just a general observation: Linux is pretty well fleshed out with about anything you can think of in one form or another. It just isn't chasing you down with in-your-face ads and high-pressure sales promos like other commercial products, so it may appear to be deficient, but more often than not a few days (for us slow pokes) of searching and trials will turn up an inexpensive quality solution in some stage of development hidden somewhere.

--
try { do() || do_not(); } catch (JediException err) { yoda(err); }
Failure to implement open standards. by jcostom · 2000-05-23 21:37 · Score: 4

It never ceases to amaze me. Companies want to sell you obnoxious amounts of software and hardware to do something as simple as create a highly available system. Not to mention projects like the Linux-HA people, who, while doing a good thing, are (IMHO) heading in the wrong direction (using RS-232 heartbeats is silly).
Has anyone considered VRRP (Virtual Router Redundancy Protocol)? It's an actual open standard, and it works. It not only works, it works amazingly well.
One of the major users of VRRP technology is Nokia. They've done extensive work on the protocol, and use it in their line of firewalls (which btw run a heavily modified FreeBSD codebase).
VRRP uses multicast packets that are similar to OSPF "Hello" packets to let the partner(s) know it is alive. If the primary machine dies, the backup instantly takes over. When the takeover happens, it not only assumes the IP address of the dead machine, but it also answers for the MAC address of the dead machine.
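For anyone who wants to put numbers on that takeover, here is a minimal Python sketch of two details from the VRRP spec (RFC 2338): the virtual MAC address the backup answers for, and how long a backup stays silent before it declares the master dead. The function names are made up for illustration, and this is nowhere near a VRRP implementation.

```python
def virtual_mac(vrid: int) -> str:
    """Virtual router MAC address: 00-00-5E-00-01-{VRID}."""
    if not 1 <= vrid <= 255:
        raise ValueError("VRID must be in 1..255")
    return "00:00:5e:00:01:%02x" % vrid


def master_down_interval(priority: int, advert_interval: float = 1.0) -> float:
    """Seconds without advertisements after which a backup takes over.

    Master_Down_Interval = 3 * Advertisement_Interval + Skew_Time,
    where Skew_Time = (256 - Priority) / 256.
    """
    if not 1 <= priority <= 254:
        raise ValueError("backup priority must be in 1..254")
    skew_time = (256 - priority) / 256.0
    return 3 * advert_interval + skew_time
```

With the default 1-second advertisement interval and a backup priority of 100, the backup waits about 3.6 seconds before taking over, and higher-priority backups time out slightly sooner, which is how the spec picks a winner among several backups.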
--
The unsig!
Re:Linux High Availability project by SEWilco · 2000-05-23 21:13 · Score: 4
Notice there are many links to related HA items there at the Linux High Availability Project. It sounds as if you're looking for something like FAKE, which lets a machine acquire the IP of another machine in a failure (note that FAKE points out that it has been moved into the "Heartbeat" code at Linux-HA) -- although some link chasing is necessary to learn where it went.
"...dual-port NICs...switch the ports when the active port fails..."
Oh, I see. When one port (or its path) fails, you want to switch the IP to a different port? I don't think "the driver" needs to do that, just change the IP assignments with ifconfig.
- Monitor each link with some sort of heartbeat.
- When there is no response, assign the IP of that link to the backup interface. Just use ifconfig to alter the interface configuration.
- Have the backup interface be on the other NIC, not "switch ports" as you mentioned.
- Dual-port NICs are not needed, if you can fit 3-4 NICs in your machine.
- Have heartbeats running on backup and downed interfaces also, to report problems and repairs.
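The steps above can be sketched roughly as follows in Python. This is only an illustration under some assumptions: the interface names, the address, and the three-missed-beats threshold are made up, and the probe is passed in as a function so you can plug in ping or whatever heartbeat you trust. It builds the ifconfig commands rather than running them.

```python
from typing import Callable, List

MISSED_LIMIT = 3  # assumed threshold; tune for your network


def takeover_commands(ip: str, netmask: str,
                      failed_if: str, backup_if: str) -> List[List[str]]:
    """ifconfig invocations that move `ip` from the failed interface to the backup."""
    return [
        # Clear the address but leave the interface up so it can still be probed.
        ["ifconfig", failed_if, "0.0.0.0"],
        ["ifconfig", backup_if, ip, "netmask", netmask, "up"],
    ]


def monitor(probe: Callable[[str], bool], active_if: str, backup_if: str,
            ip: str, netmask: str, rounds: int) -> List[List[str]]:
    """Probe the active link up to `rounds` times.

    Returns the takeover commands once MISSED_LIMIT consecutive probes
    fail, or an empty list if the link never stays down that long.
    """
    missed = 0
    for _ in range(rounds):
        if probe(active_if):
            missed = 0  # one good beat resets the counter
        else:
            missed += 1
            if missed >= MISSED_LIMIT:
                return takeover_commands(ip, netmask, active_if, backup_if)
    return []
```

Requiring several consecutive misses keeps a single dropped probe from bouncing the address back and forth. Neighbors may still hold the old MAC in their ARP caches after the move, so a real script would probably need to deal with that as well.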
Re:This is the wrong question by x0 · 2000-05-23 21:37 · Score: 4

why do you need dual-port NICs?

On Suns at least, the dual (well, quad) port NICs are used as a heartbeat signal between the active server and the failover box (when using FirstWatch).

True, you could use two separate NICs in each box to provide the same solution, but then you are using up three PCI slots since the heartbeat NICs do not carry any packets.

I am wondering why NICs with more than one port are so danged expensive though? I can see a bit of an increase in price, but there is no way these things should be $400 and up (last time I looked..)

--
In the immortal words of Socrates, who said; 'I drank what?'
Feh! by Greyfox · 2000-05-23 21:36 · Score: 4

My experience with consultants is that a good many of them are clueless. The reason they're consultants is they can easily BS the customer into believing they know what they're talking about long enough to bleed you dry. They may even provide you an actual solution that may even kind of work but which is patently the worst way to do what you were wanting to do. Then when you DO get someone in who knows what he's doing, that guy will have to spend twice as long beating your company into shape because he has to go back and undo everything the previous one did.

--
I'm trying to teach myself to set people on fire with my mind... Is it hot in here?
I think we need more information here. by Ron+Harwood · 2000-05-23 20:42 · Score: 4

What kind of application are you trying to fail-over? A database? A web-server?

If you wanted a web-farm, that's dead stupid easy. Fail-over database/ftp/nfs isn't too hard, but (presently) requires commercial software. Polyserve Understudy, Wizard Watchdog, or even RSF-1 are just some of the HA clustering products available.

--
BlackNova Traders
Easy solutions by sdw · 2000-05-23 20:51 · Score: 5

First, it wouldn't be that difficult to modify the kernel to support this. I've released patches to work around bugs in the ARP reply with two NICs. Routing to the active port shouldn't be too difficult.
A non-kernel invasive version of this would be a script that configures one port with the desired IP/mask and creates the default route. It then puts the other port in promiscuous mode and monitors it for traffic using a libpcap based program or even a possibly modified tcpdump. As soon as it sees any traffic, it switches the configuration and starts monitoring the other port. This could probably be written in 2-4 hours given a network to test it in.
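A minimal Python sketch of the decision logic in that script, assuming some capture loop (libpcap or a tcpdump pipe, not shown here) calls `on_frame` with the name of the port each frame arrived on. The class and method names are hypothetical, and the actual reconfiguration of the ports is left out.

```python
class PortFlipper:
    """Tracks which of two ports holds the IP and which is only being watched."""

    def __init__(self, active: str, standby: str) -> None:
        self.active = active
        self.standby = standby

    def on_frame(self, seen_on: str) -> bool:
        """Record one captured frame; return True if the roles were swapped.

        Traffic on the standby port is taken as the sign that the network
        has moved to that path, so the ports trade places.
        """
        if seen_on == self.standby:
            self.active, self.standby = self.standby, self.active
            return True
        return False
```

When `on_frame` returns True, the caller would move the IP/mask and default route to the new active port and start watching the other one.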
For a possibly simpler solution (i.e. no code to write), use a pair of additional Linux systems. Configure each of them to load balance with LinuxVirtualServer (aka LinuxDirector) or the Piranha version of it to as many backend servers as you have, BUT to different internal IP addresses. Good choices would be 10.* addresses, say 10.0.1.* and 10.0.2.*. Using either a dual-port NIC or two NICs in each server, create two networks, one for each of the load distribution servers. Configure Apache et al. to respond the same way on the IP addresses of each network.
BTW, there are 4 port 100-base-T cards out there, from Adaptec I think.
Good Luck!

--
Stephen D. Williams
Linux High Availability project by dashuhn · 2000-05-23 20:40 · Score: 5

You may want to check out http://linux-ha.org/.
The "heartbeat" application implements node-to-node monitoring over a serial line and UDP and can initiate IP address takeover based on a notion of resources provided by nodes and resource groups. It worked well for me. However, this was only a very basic two-node setup.
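To give a feel for the UDP side of that kind of monitoring, here is a small Python sketch: one node sends a datagram, the other waits for it and treats silence past a deadline as a dead peer. This is not the wire format or code of the heartbeat program itself; the function names and the `deadtime` parameter are just illustrative.

```python
import socket


def send_beat(sock: socket.socket, addr, node: str) -> None:
    """Send one heartbeat datagram carrying this node's name."""
    sock.sendto(node.encode("ascii"), addr)


def peer_alive(sock: socket.socket, deadtime: float) -> bool:
    """Wait up to `deadtime` seconds for a heartbeat datagram.

    Returns True if one arrived, False if the peer stayed silent.
    """
    sock.settimeout(deadtime)
    try:
        sock.recvfrom(64)
        return True
    except socket.timeout:
        return False
```

A node that gets False here is the one that would go on to take over the peer's IP address. UDP can drop packets, so a single missed beat on a busy network is weak evidence on its own.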
Shoot your consultant by arivanov · 2000-05-23 21:48 · Score: 5
Shoot your consultant. With a big gun. Only an idiot will suggest a layer 2 failover for a unix system.
1. Your consultant should learn routing protocols
2. Your consultant should learn the concept of a loopback alias.
3. Your consultant should have an IQ above 25
4. There is absolutely no need for link layer 2 failover where layer 3 will do. Unix is not WinHoze. It knows about routing.
So your task list is:
- Configure loopback aliases on the linux boxes.
- Configure apache to listen only on the loopback alias interface.
- Build gated from the Red Hat sources; they already have the patches for linux-2.2 in. You may use zebra CVS instead, but it is still a bit off in terms of stability. You may need a script that HUPs gated a few times, as gated does not always start clean and update the routing table on 2.2.x.
- Configure ospf on gated and on your cisco gear. Distribute default into OSPF as gated from the 3.5.x tree has no IRDP.
- Shoot your consultant
BTW: your bill is $500.
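A rough Python sketch of the first two items on the task list, generating the loopback alias command and the matching Apache directive. The service address 10.0.0.1 and the alias name lo:0 are assumptions for illustration. The routing daemon configuration is deliberately left out, since its syntax differs between gated and zebra.

```python
import ipaddress
from typing import List


def loopback_alias_cmd(service_ip: str, alias: str = "lo:0") -> List[str]:
    """ifconfig command that puts the service IP on a loopback alias.

    The host netmask (255.255.255.255) keeps the alias to a single address.
    """
    ip = ipaddress.ip_address(service_ip)  # raises ValueError on a bad address
    return ["ifconfig", alias, str(ip), "netmask", "255.255.255.255", "up"]


def apache_listen_directive(service_ip: str, port: int = 80) -> str:
    """Apache Listen line that binds only to the loopback alias address."""
    ip = ipaddress.ip_address(service_ip)
    return "Listen %s:%d" % (ip, port)
```

For example, `loopback_alias_cmd("10.0.0.1")` gives the arguments for `ifconfig lo:0 10.0.0.1 netmask 255.255.255.255 up`, and `apache_listen_directive("10.0.0.1")` gives the line to put in httpd.conf. Because the address lives on loopback rather than on a physical port, it stays reachable over whichever link the routing protocol still considers alive.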
--
Baker's Law: Misery no longer loves company. Nowadays it insists on it
http://www.sigsegv.cx/
Linux does support this... by austad · 2000-05-23 20:56 · Score: 5

You could go with an expensive commercial solution like BigIP from F5, but those will run you at least $30k or so. You could also use Polyserve Understudy, which does pretty much the same thing only under Linux, and it's only about $400 or so. If you have all this expensive Cisco equipment and a Cat6000, you can run Local Director on that without buying additional hardware.

However, I suggest:
http://www.linuxvirtualserver.org or
http://linux-ha.org or
http://www.eddieware.org

It all depends on the application you're running. If it's just http, any of these will work, but if it's something else, you're stuck with linux-HA or Linux Virtual Server. Eddie will only do http as far as I know. Plus Eddie uses Erlang, which may affect performance.

--
Need Free Juniper/NetScreen Support? JuniperForum
Wrong failure being addressed by igaborf · 2000-05-23 21:38 · Score: 5

You're attacking the wrong problem. What you need first is not Linux failover but consultant failover: Your consultant has failed; you need to switch to a new one instantly.