Linux Failover?

Posted by Cliff on Wednesday May 24, 2000 @01:30AM from the is-this-a-real-problem? dept.

Anton asks: "This is a question about Linux failover in business situations. We are a growing B2B company; our product runs on Apache/Linux. We contracted professional services to properly set up our network. After all the hellishly expensive Cisco hardware had been set up, it turns out that for our servers to be configured for failover, each one needs two dual-port NICs configured for one IP and connected to two different switches; furthermore, the driver needs to intelligently switch the ports when the active port fails ... We've never heard of such beasts for Linux, and a net search revealed nothing. Our consultant, however, claims that 'Linux is biting itself in the foot' for not supporting that, and that other industrial-strength OSes like Solaris do in fact support this. Has anyone run into this before or have other ideas?" [nik suggests]: Take a look at Polyserve Understudy, which might be an alternative. FreeBSD and Linux versions are available (and bundled with FreeBSD 4.0).

Solaris "Alternate Pathing" by Anonymous Coward · 2000-05-23 21:04 · Score: 3

The Solaris feature referred to is AP, or Alternate Pathing. It is used for disk devices and network devices. Basically, you create a pseudo network device. This pseudo network device is attached to two real network devices. Since the OS is doing all network activity through the pseudo device, it is capable of working out of either real physical network device without the applications (or the rest of the OS) being aware there was a change. It is similar for disk.
The main reason for AP is for the DR or Dynamic Reconfiguration feature. If you've got three system boards, then you can have some redundant hardware so that you can take down and remove a system board *while the OS is still running*, and keep your network connection going without missing a beat. (Same for disk.) Neat stuff.
Been there, wrote it myself by bluGill · 2000-05-23 21:01 · Score: 3

I wrote software to do this for Solaris (which, as others have pointed out, does not do this without 3rd party software) because we found that no existing solution would fit our needs well. When looking at the price tags we concluded that we could do better. (Come to think of it, there is High Availability Solaris, but it isn't cheaper or better than the 3rd party stuff.)
Basically, we ping something on the other side of the router every 5 seconds, and if the ping doesn't come back we switch to the other port. That is the overview, but you need to do some more isolation before you blindly switch ports.
I strongly recommend you put in some other path between you and the box you are pinging. Several times we have been bitten when the box we were pinging went down and not the router, or alternatively the network was so busy the ping didn't get through within our timeout period.
There is no portable way in Solaris to tell if an Ethernet port has signal. You can find out from some drivers, but when you change to a different Ethernet card you have to do something else to find out.
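
A rough Python sketch of the "isolate before you switch" check described above. It is not the original Solaris code: the function name, the `required_misses` threshold, and the idea of recording one result per target per probe round are all assumptions made for illustration. The actual pinging is left out; the function only decides what to do with the results.

```python
def should_switch(probe_history, required_misses=3):
    """Decide whether to fail over to the standby port.

    probe_history is a list of probe rounds, newest last. Each round is a
    dict mapping a target name to True (reply arrived within the timeout)
    or False (no reply). Targets are hypothetical: for example the host
    behind the router plus a second "witness" host.
    """
    if len(probe_history) < required_misses:
        return False  # not enough history to rule out a busy network
    recent = probe_history[-required_misses:]
    # Switch only if every target was silent in every recent round.
    # One answering target means the port works and a remote box is down.
    return all(not any(results.values()) for results in recent)
```

Requiring several silent rounds covers the busy-network case, and requiring silence from every target covers the case where only the pinged box went down.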
linux option by mattdm · 2000-05-23 20:47 · Score: 3

http://www.linuxvirtualserver.org/

including

Piranha

--
dual port nics? by kevin+lyda · 2000-05-24 00:20 · Score: 3

and when the dual port nic dies and takes both interfaces with it you switch to psychic networking? i'd really suggest two nics not a single dual one.

that said, linux can do routing. why not set up a loopback device and then have it route through either nic? ip was designed to deal with multiple routes, why must your consultant reinvent the wheel? (loopback addresses are published, so they'll be seen on the network)

--
US Citizen living abroad? Register to vote!
Q: is this a good solution? by Felinoid · 2000-05-23 21:04 · Score: 3

I hate saying "I hate to tell you this but" when I really enjoy telling someone something.

So I won't lie.. I'm happy to tell you....

This may not be the right solution.
The problem isn't the consultants. They know their stuff, otherwise you wouldn't be paying them.
They come with years of experience and biases.

Expect a consultant who isn't friendly with Linux to dig up a solution that will not work on Linux.
Yes, Solaris can do a lot of great things Linux cannot. In the end Linux has one great advantage, and that's price. Source code and quick security patch releases are a bonus.

This rule holds for an NT shop....
And don't put it past a consultant to find a feature Solaris doesn't have. You're talking with some of the greater freelance talent in some cases, and if a defect is to be found they can find it. Once found, that defect becomes a case for switching to something the consultant likes.

So once you pick a platform for your shop, pick a consultant who is buddies with your choice. If he then recommends something else, it will be after bleeding dry all possibilities.

This may not be the only way to do it...

and... it may be the worst way to get it done...
Linux has its limits, don't get me wrong, but you can always find more than one solution. Linux may support 4 out of 8 solutions.. if you're only presented 1... there may be a reason...

--
I don't actually exist.
OS Level failover by PenguinX · 2000-05-23 21:37 · Score: 3

One of the things that I have found is that OS-level failover doesn't always work or will have odd problems. If you are looking for enterprise-level uptime, then cobbling together a solution such as this is not for you. The company I work for uses a Cisco LocalDirector to do the work for it. What's great about this sort of solution is that a LocalDirector will round robin, do failover, and do such a dizzying array of other things that it's wondrous. I would suggest you look into this solution or one similar.
This is the wrong question by FascDot+Killed+My+Pr · 2000-05-23 20:37 · Score: 3

Your question: "Here's what the consultant told us is the solution to our problem. Where can we get the hardware?"

What you want to know: "Here's our problem. Here's the solution the consultant came up with. What improvements can /. suggest?"

For instance, why do you need dual-port NICs? If it's just for the throughput, why not just use 2 single-ports? This also provides redundancy in the hardware department.
--
Have Exchange users? Want to run Linux? Can't afford OpenMail?

--
Linux MAPI Server!
http://www.openone.com/software/MailOne/
(Exchange Migration HOWTO coming soon)
1. Re:This is the wrong question by x0 · 2000-05-23 21:37 · Score: 4
  
  why do you need dual-port NICs?
  
  On Suns at least, the dual (well, quad) port NICs are used as a heartbeat signal between the active server and the failover box (when using FirstWatch).
  
  True, you could use two separate NICs in each box to provide the same solution, but then you are using up three PCI slots since the heartbeat NICs do not carry any packets.
  
  I am wondering why NICs with more than one port are so danged expensive though? I can see a bit of an increase in price, but there is no way these things should be $400 and up (last time I looked..)
  
  --
  In the immortal words of Socrates, who said; 'I drank what?'
Re:Failure to implement open standards. by yesod · 2000-05-23 22:28 · Score: 3

I believe Juniper and 3Com also support the use of VRRP.

You may not be able to implement HSRP without paying Cisco a license fee. I'm not sure if anyone has approached Cisco from an open-source viewpoint though.

As for a public implementation - I should have a Linux VRRP implementation out this week.
network failure should be handled by network HW by jalex · 2000-05-23 22:15 · Score: 3

If a network node goes down, it's better if network equipment handles the failover. The server should only handle the failover if the network hardware is incapable of doing so. The server is the weakest part of the network, so you want as little as possible depending on the server.
Say you have a machine with two dual port NICs or even two NICs. Have a script that checks the main network interface every ten seconds. If the main interface becomes unavailable, unload it and load the second interface with the same IP and reset all of your routing information.
If you have a server with that kind of "need" then maybe you should consider having a better routing setup altogether. Consider how www.netscape.com actually resolves to several IP addresses. The options are numerous. The main point is that Linux works just fine. (Although FreeBSD has sexier networking.)
Hardware Support by TheCarp · 2000-05-23 20:41 · Score: 3

> We are a growing B2B company;

Good to know that you are buzzword compliant...
I understand that's very important to some people,
and if I ever figure out who those people are, I
will probably avoid them like the plague.

As for fallover...check out 3com....long ago a
man (who would later go on to teach Unix courses
at WPI and be one of the best teachers I ever
had for anything) designed a piece of hardware
with 1 ethernet port on one side, and 2 on the
other...it was designed to do JUST THAT.

Completely in hardware. He did it for a company
that was later bought out by 3com...he claimed
(a couple of years ago, when I was in his course)
that they still sell the product that he designed.

Of course, why that's even needed is beyond me.
For better redundancy, you really want separate
redundant servers, each with RAID arrays and
probably a couple of localdirectors (or round
robin DNS for a cheaper solution) directing
connections between them (giving both fallover and
increased availability) but...that's just IMNSHO.

After all, if a CPU fries, or a power supply starts
letting its magic smoke out...all the dual port
NICs in the world won't help.

--
"I opened my eyes, and everything went dark again"
Other thoughts by Ron+Harwood · 2000-05-23 20:53 · Score: 3

Making the assumption that you want a web-farm and that, for maximum availability, you are using two Cisco switches, two Cisco PIXes, and two Cisco LocalDirectors... you can still get away with a single NIC per server (not that you have to): put half of your web-servers on one switch, and half on the other.

Otherwise, you can put two NICs in (one on each switch) and assign them each their own IP address... no need to fail over... although I would look at the F5 BIG/IP, as it can make sure that your servers are actually serving up content... the LocalDirector isn't as good at this.

--
BlackNova Traders
redhat piranha? by mikeee · 2000-05-23 20:53 · Score: 3

I think piranha does just this. Byte just ran a look at Linux HA clusters: http://www.byte.com/column/BYT20000510S0001
Re:huh ? by dwakeman · 2000-05-23 20:55 · Score: 3

http://support.intel.com/support/network/adapter/pro100/30504.htm That is the dual port adapter that comes in Dell Servers....Has anybody used this before? D
Building Linux Clusters - O'Reilly by howard_wwtg · 2000-05-23 20:59 · Score: 3

There are many good ways (and even more bad ways) to build a redundant/failover environment. Your consultant is just "seeing the one solution he knows." As others have posted, there are many good solutions already out there.
Building Linux Clusters is just what you should read.... Unfortunately it won't be out until August.
What you need by Anonymous Coward · 2000-05-23 20:47 · Score: 4

Everything you need is at High Availability Linux.

I too am building/have built a B2B exchange on the Linux platform and found JServ to be *INCREDIBLE* at HA/failover-safe features.

As for the 2 network cards for each machine, that too is a *VERY GOOD* thing. It allows you to partition out your network traffic to achieve much better response time. For example, our network has 2 NICs in each machine. There is a "Web Server to Database" network, there is a firewall to webserver network, and we have a separate network for office web surfing and misc stuff like that. Access to the "WebServer to Firewall" network is handled across the router.

One thing to keep in mind when dealing with DB-aware web applications is that unless your code is *VERY POORLY* written, the biggest bottleneck will be network latency.
-Grimsaado
Ignorance of options is not a failure by ch-chuck · 2000-05-23 21:10 · Score: 4

When Linux was cranking up last year the folks at TurboLinux sales called up promoting their fault tolerant cluster solution, altho it's not free you could get a timed demo - so Linux solutions exist.

Just a general observation: Linux is pretty well fleshed out with about anything you can think of in one form or another. It just isn't chasing you down with in-your-face ads and high-pressure sales promos like other commercial products, so it may appear to be deficient, but more often than not a few days (for us slow pokes) of searching and trials will turn up an inexpensive quality solution in some stage of development hidden somewhere.

--
try { do() || do_not(); } catch (JediException err) { yoda(err); }
Failure to implement open standards. by jcostom · 2000-05-23 21:37 · Score: 4

It never ceases to amaze me. Companies want to sell you obnoxious amounts of software and hardware to do something as simple as create a highly available system. Not to mention projects like the Linux-HA people, who, while doing a good thing, are (IMHO) heading in the wrong direction (using RS-232 heartbeats is silly).
Has anyone considered VRRP (Virtual Router Redundancy Protocol)? It's an actual open standard, and it works. It not only works, it works amazingly well.
One of the major users of VRRP technology is Nokia. They've done extensive work on the protocol, and use it in their line of firewalls (which btw run a heavily modified FreeBSD codebase).
VRRP uses multicast packets that are similar to OSPF "Hello" packets to let the partner(s) know it is alive. If the primary machine dies, the backup instantly takes over. When the takeover happens, it not only assumes the IP address of the dead machine, but it also answers for the MAC address of the dead machine.
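
To make the MAC takeover and the failure detection concrete, here is a small Python sketch of two formulas from the VRRP specification (RFC 2338): the virtual router MAC address and the interval a backup waits before declaring the master down. The function names are made up for illustration, and this is only the arithmetic, not a VRRP implementation.

```python
def virtual_mac(vrid):
    """MAC address the current master answers for: 00-00-5E-00-01-{VRID}."""
    if not 1 <= vrid <= 255:
        raise ValueError("VRID must be in 1..255")
    return "00:00:5e:00:01:%02x" % vrid


def master_down_interval(priority, advertisement_interval=1.0):
    """Seconds of silence after which a backup takes over.

    Skew_Time = (256 - Priority) / 256, so higher-priority backups
    time out slightly sooner and win the takeover.
    """
    skew_time = (256 - priority) / 256.0
    return 3 * advertisement_interval + skew_time
```

With the default 1-second advertisement interval and a priority of 100, the takeover delay works out to a little over three and a half seconds, so "instantly" here means a few seconds rather than zero.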
--
The unsig!
Re:Linux High Availability project by SEWilco · 2000-05-23 21:13 · Score: 4
Notice there are many links to related HA items there at the Linux High Availability Project. It sounds as if you're looking for something like FAKE, which lets a machine acquire the IP of another machine in a failure (note that FAKE points out that it has been moved into the "Heartbeat" code at Linux-HA) -- although some link chasing is necessary to learn where it went.
"...dual-port NICs...switch the ports when the active port fails..."
Oh, I see. When one port (or its path) fails, you want to switch the IP to a different port? I don't think "the driver" needs to do that, just change the IP assignments with ifconfig.
- Monitor each link with some sort of heartbeat.
- When there is no response, assign the IP of that link to the backup interface. Just use ifconfig to alter the interface configuration.
- Have the backup interface be on the other NIC, not "switch ports" as you mentioned.
- Dual-port NICs are not needed, if you can fit 3-4 NICs in your machine.
- Have heartbeats running on backup and downed interfaces also, to report problems and repairs.
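
The ifconfig step in the list above could look roughly like this in Python. This is a sketch under assumptions: the interface names, address, and netmask are placeholders, and the function only builds the command lines rather than running them, so it can be tried without root.

```python
def takeover_commands(ip, netmask, failed_if, backup_if):
    """Build the ifconfig calls that move a service IP to the backup NIC.

    Returns a list of argument lists; a caller running as root could pass
    each one to subprocess.run(). Nothing is executed here.
    """
    return [
        # Take the failed interface down so it stops claiming the address.
        ["ifconfig", failed_if, "down"],
        # Bring the address up on the interface attached to the other NIC.
        ["ifconfig", backup_if, ip, "netmask", netmask, "up"],
    ]


if __name__ == "__main__":
    for cmd in takeover_commands("192.168.1.10", "255.255.255.0", "eth0", "eth1"):
        print(" ".join(cmd))
```

Other hosts on the segment may keep the old MAC address in their ARP caches for a while after the move; dealing with that is outside this sketch.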
Feh! by Greyfox · 2000-05-23 21:36 · Score: 4

My experience with consultants is that a good many of them are clueless. The reason they're consultants is they can easily BS the customer into believing they know what they're talking about long enough to bleed you dry. They may even provide you an actual solution that may even kind of work but which is patently the worst way to do what you were wanting to do. Then when you DO get someone in who knows what he's doing, that guy will have to spend twice as long beating your company into shape because he has to go back and undo everything the previous one did.

--
I'm trying to teach myself to set people on fire with my mind... Is it hot in here?
I think we need more information here. by Ron+Harwood · 2000-05-23 20:42 · Score: 4

What kind of application are you trying to fail-over? A database? A web-server?

If you wanted a web-farm, that's dead stupid easy. Fail-over database/ftp/nfs isn't too hard, but (presently) requires commercial software. Polyserve Understudy, Wizard Watchdog, or even RSF-1 are just some of the HA clustering products available.

--
BlackNova Traders
Easy solutions by sdw · 2000-05-23 20:51 · Score: 5

First, it wouldn't be that difficult to modify the kernel to support this. I've released patches to work around bugs in the ARP reply with two NICs. Routing to the active port shouldn't be too difficult.
A non-kernel invasive version of this would be a script that configures one port with the desired IP/mask and creates the default route. It then puts the other port in promiscuous mode and monitors it for traffic using a libpcap based program or even a possibly modified tcpdump. As soon as it sees any traffic, it switches the configuration and starts monitoring the other port. This could probably be written in 2-4 hours given a network to test it in.
For a possibly simpler solution (i.e. no code to write), use a pair of additional Linux systems. Configure each of them to load balance with LinuxVirtualServer (aka LinuxDirector) or the Piranha version of it to as many backend servers as you have, BUT to different internal IP addresses. Good choices would be 10.* addresses, say 10.0.1.* and 10.0.2.*. Using either a dual NIC or two NICs in each server, create two networks, one for each of the load distribution servers. Configure Apache et al to respond to the IP addresses of each network the same.
BTW, there are 4 port 100-base-T cards out there, from Adaptec I think.
Good Luck!

--
Stephen D. Williams
Linux High Availability project by dashuhn · 2000-05-23 20:40 · Score: 5

You may want to check out http://linux-ha.org/.
The "heartbeat" application implements node-to-node monitoring over a serial line and UDP and can initiate IP address takeover based on a notion of resources provided by nodes and resource groups. It worked well for me. However, this was only a very basic two-node setup.
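
The node-to-node monitoring idea can be sketched in a few lines of Python. This is not the heartbeat program's code or configuration: the class name, the `deadtime` parameter, and passing the clock in as an argument are assumptions chosen to keep the example self-contained. Receiving the beats over UDP or a serial line, and the IP takeover itself, are left out.

```python
class HeartbeatMonitor:
    """Track the last heartbeat seen from each peer node."""

    def __init__(self, deadtime=10.0):
        self.deadtime = deadtime  # seconds of silence before a node is dead
        self.last_seen = {}

    def beat(self, node, now):
        """Record a heartbeat from `node` received at time `now`."""
        self.last_seen[node] = now

    def dead_nodes(self, now):
        """Return nodes whose last heartbeat is older than deadtime."""
        return sorted(
            node for node, seen in self.last_seen.items()
            if now - seen > self.deadtime
        )
```

In a two-node setup, the surviving node would call `dead_nodes()` periodically and start taking over the peer's resources once the peer shows up in the result.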
Shoot your consultant by arivanov · 2000-05-23 21:48 · Score: 5
Shoot your consultant. With a big gun. Only an idiot will suggest a layer 2 failover for a unix system.
1. Your consultant should learn routing protocols
2. Your consultant should learn the concept of a loopback alias.
3. Your consultant should have an IQ of above 25
4. There is absolutely no need for link layer 2 failover where layer 3 will do. Unix is not WinHoze. It knows about routing.
So your task list is:
- Configure loopback aliases on the linux boxes.
- Configure apache to listen only on the loopback alias interface.
- Build gated from the rhat sources; they have the patches for linux-2.2 in already. You may use zebra CVS instead, but it is still a bit off in terms of stability. You may need a script that HUPs gated a few times, as gated does not always start clean and update the routing table on 2.2.x.
- Configure ospf on gated and on your cisco gear. Distribute default into OSPF as gated from the 3.5.x tree has no IRDP.
- Shoot your consultant
In btw: your bill is 500$.
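
The first two items of the task list can be sketched as follows, in Python for consistency with nothing more than string building. The service address 10.9.9.1 and the alias name `lo:0` are placeholders, and the function names are invented for this example. The gated and OSPF steps are left out because the comment does not give their configuration.

```python
def loopback_alias_command(service_ip, alias="lo:0"):
    """ifconfig call that adds the service IP as a host address on loopback."""
    return ["ifconfig", alias, service_ip, "netmask", "255.255.255.255", "up"]


def apache_listen_directive(service_ip, port=80):
    """httpd.conf line that binds Apache to the loopback alias only."""
    return "Listen %s:%d" % (service_ip, port)


if __name__ == "__main__":
    print(" ".join(loopback_alias_command("10.9.9.1")))
    print(apache_listen_directive("10.9.9.1"))
```

Because the service address lives on loopback rather than on either NIC, it stays reachable as long as the routing protocol can advertise a path to it through at least one working interface.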
--
Baker's Law: Misery no longer loves company. Nowadays it insists on it
http://www.sigsegv.cx/
Linux does support this... by austad · 2000-05-23 20:56 · Score: 5

You could go with an expensive commercial solution like BigIP from F5, but those will run you at least $30k or so. You could also use Polyserve Understudy, which does pretty much the same thing only under Linux, and it's only about $400 or so. If you have all this expensive Cisco equipment and a Cat6000, you can run Local Director on that without buying additional hardware.

However, I suggest:
http://www.linuxvirtualserver.org or
http://linux-ha.org or
http://www.eddieware.org

It all depends on your application that you're running. If it's just http, any of these will work, but if it's something else, you're stuck with linux-HA or Linux Virtual Server. Eddie will only do http as far as I know. Plus Eddie uses Erlang, which may affect performance.

--
Need Free Juniper/NetScreen Support? JuniperForum
Wrong failure being addressed by igaborf · 2000-05-23 21:38 · Score: 5

You're attacking the wrong problem. What you need first is not Linux failover but consultant failover: Your consultant has failed; you need to switch to a new one instantly.