Linux Failover?
Anton asks: "This is a question about Linux failover in business situations. We are a growing B2B company; our product runs on Apache/Linux. We contracted professional services to properly set up our network. After all the hellishly expensive Cisco hardware had been set up, it turns out that for our servers to be configured for failover, each one needs two dual-port NICs configured for one IP and connected to two different switches; furthermore, the driver needs to intelligently switch the ports when the active port fails ... We've never heard of such beasts for Linux and a net search revealed nothing. Our consultant, however, claims that 'Linux is biting itself in the foot' for not supporting that, and that other industrial-strength OSes like Solaris do in fact support this. Has anyone run into this before or have other ideas?" [nik suggests]: Take a look at Polyserve Understudy, which might be an alternative. FreeBSD and Linux versions are available (and bundled with FreeBSD 4.0).
The old fashioned way of doing it is to set your availability requirements, then pick a solution. Many people have unrealistic expectations or unclear availability goals. The usual conversation goes:
:-)
You: What availability do you want?
Client: 100% - It cannot ever go down at all or the world ends
You: Right, that'll be 10 million a year
Client: How about we schedule some downtime every week?
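To put rough numbers on that conversation, here is a small sketch (illustrative only; the function name is made up for this example) that converts an availability target into the downtime it allows per year:

```python
# Illustrative only: convert an availability target into allowed downtime.
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def allowed_downtime_hours(availability_percent: float) -> float:
    """Hours per year a service may be down and still meet the target."""
    if not 0.0 <= availability_percent <= 100.0:
        raise ValueError("availability must be between 0 and 100 percent")
    return HOURS_PER_YEAR * (1.0 - availability_percent / 100.0)


if __name__ == "__main__":
    for target in (99.0, 99.5, 99.9, 99.99, 99.999):
        hours = allowed_downtime_hours(target)
        print(f"{target}% allows {hours:.2f} hours/year ({hours * 60:.1f} minutes)")
```

One hour of scheduled downtime every week adds up to about 52 hours a year, which on its own caps availability at roughly 99.4%.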
Be careful of references to 'industrial strength' operating systems. There are some very good products available, but they are not part of the base OS, they can cost serious money, and manufacturers will make you jump through hoops to deploy them.
For example, AFAIK SunCluster configurations are built and validated by Sun and then shipped to you. They are also only available on cluster-certified hardware models. If you want to run non-standard software on them, they will probably want that signed off as well.
You do not automatically get application failover unless the application supports it in that particular cluster environment. Commercial cluster products tend to be biased towards transaction processing and database environments (because that's where the need and the big money is) and may not be the best option for web serving etc.
If you want more details, the enterprise computing sites of all the big manufacturers have a stack of three-hundred-page white papers that conclusively prove they are all the best at everything.
Posting anon again because Slashdot never seems to send email to Demon users.
Martin (martin@nospam.sharrow.demon.co.uk)
For a government project requiring network redundancy I have used the following Ethernet link protector: http://www.shoremicrosystems.com/sm-2500.htm Its two outputs run to two separate 3Com switches. It also has an output contact to tell you when it has failed over.
I had a similar problem a few years ago and wrote some scripts to do failover clustering.
I published it in the Linux Journal last year:
http://www.linuxjournal.com/cgi-bin/frames.pl/lj-issues/issue64/3247.html
source/scripts:
ftp://ftp.ssc.com/pub/lj/listings/issue64/3247clusterd.tar.gz
It would solve your problem with layer-2 and layer-3 failover but, without modification, would require two identical servers.
Yes, it will decide which switch/router is actually still live for the next-hop routing as well. It can determine which switch, if any, has failed by pinging a list of 'supposed to be reachable' addresses.
It is a bit dated and I haven't really kept the code up to date, but the principles are there and it has been used for a long time with no major problems.
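The "ping a list of supposed-to-be-reachable addresses" idea can be sketched roughly as follows. This is not the published scripts: the function name, the gateway names and the injected `is_reachable` probe are all assumptions made for illustration, and the addresses come from the documentation ranges.

```python
# Hypothetical sketch: choose the next hop whose "witness" addresses answer.
from typing import Callable, Dict, List, Optional


def pick_live_gateway(
    witnesses: Dict[str, List[str]],
    is_reachable: Callable[[str], bool],
    min_replies: int = 1,
) -> Optional[str]:
    """Return the first gateway with at least min_replies reachable witnesses.

    witnesses maps a gateway name to addresses that should answer when the
    path through that gateway works. Dict order is the preference order.
    """
    for gateway, addresses in witnesses.items():
        replies = sum(1 for addr in addresses if is_reachable(addr))
        if replies >= min_replies:
            return gateway
    return None  # nothing answered anywhere: the fault is probably local


if __name__ == "__main__":
    paths = {
        "switch1": ["192.0.2.1", "192.0.2.2"],
        "switch2": ["198.51.100.1"],
    }
    # Pretend only the addresses behind switch2 still answer.
    print(pick_live_gateway(paths, lambda addr: addr.startswith("198.")))
```

In real use `is_reachable` would wrap whatever probe you trust (an ICMP ping, a TCP connect); it is passed in here so the decision logic can be tested without a network.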
Philip J Lewis
Network Consultant
Dome Computer Consultants Ltd
UK
mailto:slashdot@*REMOVEME*linuxcentre.net
What type of network connection do you have? Assuming you are using Fast Ethernet, Ramix makes a dual-port 100BFX NIC with hardware failover. I don't know if they have Linux drivers, but they definitely work with other UNIX platforms.
Here's a configuration that should work.
1) Compile the kernel with the bonding device as a module.
2) Set your ethernet cards up through a bond device.
3) Plug NIC #1 into switch #1, and NIC #2 into switch #2. Let's say they're on vlan 100.
4) Trunk vlan 100 between the 2 switches.
5) Set your spanning tree config to favor devices seen through switch 1 (your primary).
6) Activate the bonding device. Once you do this, both switches will see the same MAC address, and switch #2's spanning tree config should cause it to administratively shut down the port that is connected to NIC #2.
What happens next is that all traffic goes through NIC #1, and switch #1. However, if switch #1 goes down, or if NIC #1 goes down, then switch #1 will surrender to switch #2 and allow switch #2 to bring up NIC #2, which will share the same IP address and MAC address as NIC #1 (read the doco for the bonding driver if you don't understand how this works).
Actually, this doesn't even require trunking the vlan, but it does require that you have switches that have spanning tree, that you pay careful attention to the spanning tree config, and that both interfaces can see each other so the spanning tree config will isolate them on the network.
NOTE!!! YMMV I don't know for sure that this works under linux. I know that this config should work with sun trunking, and reading the bond_xmit() function in the bonding driver it looks like it should work. I will test it on some cisco equipment if I get a chance.
If you want me to help you set this up, send me email, as I'm available to do such work.
-Peter
== Just my opinion(s)
Actually, this should be doable with channel bonding (where the MAC address is identical across all interfaces) combined with a conservative spanning tree configuration (having trunking between both switches would be sugar on top, in case of the upstream interface dying on the switch).
See my other post for a likely configuration.
-Peter
== Just my opinion(s)
And bay and ipivot. It works, and it's a publicly available RFC. Much nicer situation than HSRP.
-Peter
== Just my opinion(s)
You're out of date as of Solaris 2.6. The Sun trunking module (cost of about $1000 per system it's on), bad documentation and all, takes care of this problem far better than any set of scripts can.
Linux's bonding driver addresses this, too.
-Peter
== Just my opinion(s)
Have you ever talked to someone who has had Sun consultants come in and try to do a Sun cluster install? Maybe they've gotten better since last year, but I talked to at least 2 companies that had been over-promised a working solution by Sun. One company ripped Sun's work out of their data center (Sun didn't have the cluster working a couple of months into the project, and the 3rd party worked far easier and faster) and the other settled for a reduced spec.
*BLECH*
-Peter
== Just my opinion(s)
You can avoid the problem of having to share the channel between switches by using spanning tree to shut one off until it's needed. This ensures that only one interface and switch pair is active for a particular bonded address at any one time, eliminating the need for the switches to have to load-balance between those 8 channels that they have available.
This solution expands to an arbitrary number of switches and an arbitrary number of host interfaces.
-Peter
== Just my opinion(s)
Really? The following code looks pretty foolproof (from bonding.c, bond_xmit()):
    while (good == 0) {
        slave = queue->current_slave->dev;
        if (slave->flags & (IFF_UP|IFF_RUNNING)) {
            skb->dev = slave;
            skb->priority = 1;
            dev_queue_xmit(skb);
            good = 1;
        }
        if (queue->current_slave->next != NULL) {
            queue->current_slave = queue->current_slave->next;
        } else {
            queue->current_slave = queue->head;
        }
    }
Care to explain where the problem lies in detecting if the card is active or not, and how this is less reliable than an IGP?
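For readers who don't read kernel C, the selection logic in the quoted loop can be modeled in a few lines. This is a toy model, not the driver: the function name and the list-of-booleans representation are invented here, and unlike the quoted loop it gives up after one full pass instead of spinning when every slave is down.

```python
# Toy model of the slave-selection loop quoted above; not the kernel's code.
from typing import List, Optional, Tuple


def pick_slave(slaves_up: List[bool], current: int) -> Tuple[Optional[int], int]:
    """Return (index of the slave used, new value of the current pointer).

    Walks the ring starting at `current`, uses the first slave marked up,
    and advances the pointer on every step, as the quoted loop does.
    """
    n = len(slaves_up)
    for _ in range(n):
        candidate = current
        current = (current + 1) % n  # advance whether or not we transmit
        if slaves_up[candidate]:
            return candidate, current
    return None, current  # every slave is down


if __name__ == "__main__":
    print(pick_slave([True, True], 0))    # both links up
    print(pick_slave([False, True], 0))   # first link down, second used
```

Note that the model only knows what the interface flags say; a link that is flagged up but cannot actually pass traffic would still be picked.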
-Peter
== Just my opinion(s)
The bonding driver does this for you w/o needing much/any intervention from the user.
-Peter
== Just my opinion(s)
Have you priced out a pair of fibre channel arrays, a fibre channel extender (over dark fibre or oc3) and a raid 1 driver? It'll be slow as shit (speed of light for the data to travel through the fibre channel->WAN->fibre channel) but it'll get all of your content somewhere else in real time. It'll probably cost you as much as an EMC box, though, which does it a bit more... sanely.
-Peter
== Just my opinion(s)
Right, it isn't the best design. However, the redundant machines are designed so we can operate without them for several minutes if need be. The recovery procedure if the Master really does fail involves the backup rebooting. (These machines control other hardware that must work all the time, and that hardware is both more robust and can operate without the controllers for a short time if need be. You just lose access to the disc, so you cannot reconfigure them.)
I didn't want to get into all the headaches we faced due to the bad design above; it is beyond the scope of the original question. We are, however, redoing things to fix that.
Uh, your knowledge has been shot in the foot. You don't need a 3rd party app for failover in Solaris. Sun offers Sun Cluster (current version is 2.2; 3.0 is going to be released next month for Solaris 8).
There is also Veritas First Watch.
Yea... Those boxes are really nice. The company I work for sells them. You gotta dig that big red ball on the front.
They are that expensive, because people with those kinds of needs are willing to pay that much money.
It's the same everywhere, where have you been?
I just remembered this old Metallica song. . .
These are my friends, See how they glisten. See this one shine, how he smiles in the light.
SCO UnixWare does support NIC failover (for cards that support MAC address reprogramming); unfortunately, it's not one of the most stable OSes I have seen.
Sometimes the same computer must complete the transaction; using another one is not an option (telephony comes to mind).
Two Unix based 'routing' daemons are:
Alternatively, assuming you've eliminated another single point of failure by running two routers, you could run HSRP (since you mentioned Cisco) on the routers. You could set up the network in such a way that you can dual-home each server to a separate switch, without needing link failover or even a routing protocol.
There are several ways to kill this "problem", but the way your *ahem* consultant is recommending sounds like the most overly complex solution. Your consultant may know their systems/lan configuration, but shows little knowledge of routing.
My 2 cents is get a new consultant.
------
---
Segmentation Fault ( core dumped )
>Oh, that was a good joke. Yes, Nokia uses it. >And nobody else
You are talking complete bollocks, just ask Bay, Alteon etc. VRRP works, easy to configure & a lifesaver operationally.
>But of course they didn't even bother to >contribute it back to the FreeBSD Project. Talk >about clueless corporate idiots.
The only clueless idiot around here is the gobshite who posted the prior article.
>Ah, whatever. Sorry for the rant - this whole HA >scene seems to be more annoying than the rest of >the bunch. They all operate in "a customer who >needs that must have money to burn - let's >charge him hefty" mode.
Hmmm you have a peculiar notion of what is HA. Banks etc are not interested in mickey mouse h0x3r solutions with no support.
greg
Moderate this "dumbass" guy down! What the hell? How did he get moderated up to a 2?
Jerk.
In addition to linux-ha, which includes links to Linux Virtual Server, Piranha, Ultramonkey, you can also find organizations that do this for a living. One (the company I work for, to be honest) is Mission Critical Linux. Specify what your needs are, exactly (web service, database failover, file system, etc), then look around.
By the way, is your consultant a reseller of Solaris (since I see he suggested that)?
jeff
Looks like your consultant is shooting himself in the foot. There are many ways to produce failover-- a couple bash scripts could probably do it. I think your consultant just wants to make a sale of some of that expensive hardware, or is not creative enough to think of alternative methods.
DecNet is obsolete. Plain ol' ethernet bridging (not to be confused with routing) requires the ability to set arbitrary MAC source addresses on each outgoing packet. Various IPX-based failover and routing systems require this capability too. Also the Ethernet people don't always get their standards straight, requiring support for yet another header format.
I haven't actually seen a network card that doesn't support arbitrary MAC, but I suppose that some old 8-bit ISA cards may still exist--if you have such a beast, mount it on your wall and get a network card capable of at least a megabit of throughput after host bus overhead.
Note this change usually isn't permanent, i.e. we're not overwriting the NVRAM on the card or anything. The capability is simply due to the fact that the chip doesn't prepend the ethernet header for you, so the software has to fill in the second six bytes of each packet. Linux reads the MAC address from the card's ROM as a default, but you can override this with 'ifconfig', and for Linux bridging the source MAC address is set for each packet forwarded across the bridge.
I'm willing to bet that a lot of NIC chipset designers intend for their chips (or at least most of the die) to be useful inside switches as well as inside network cards. Why design two different devices when you can just design one and sell it twice?
-- I avoid spam by accepting only OpenPGP encrypted or signed email at this address. Clear-signed, RFC2015, heck, even
Both hme0/hme1 have the Same IP and Same MAC address.
Just a little Perl script fails over the card and reconfigures EFS on the fly.
Works ok, but could be more automated. This would work perfectly on any Linux box.
Then of course there are multiple Catalysts, Local Directors, multiple pipes with multiple carriers, OSPF... Very good network design.
Only outage is when GTE/USWEST cuts those damn fiber cables....
Right now we are working on 100% failover, five-nines HA solutions: Sun Cluster, Veritas file system, Oracle HA solutions. Even moving up to a few Sun 10Ks.
Depends on how far you want to go for Reliability.
-IronWolve
The consultant says Linux is "shooting itself in the foot" for not supporting failover?
I've got news for your consultant; Solaris folks buy a third-party product when they want failover capability, such as Legato (formerly Qualix) HA+.
Is Sun shooting themselves in the foot, too?
Third-party products are available for Linux, just like they are for Solaris etc. Buy them if you need them.
--
Ok, I don't want this to sound like an ad, but I work for a company that has a product that should fit your requirements. I work at a startup (called Netboost) that was recently acquired by Intel, and our primary product is a StrongARM-based dual-port NIC with Linux, BSDi, Solaris, and NT driver support. Here's a url with some info. I'm not in marketing, so I don't even know how you'd go about getting one of these (though you could send me mail and I could try and point you to the right person if you were really interested). There is an API that lets you write code for the NIC to handle packets, as well as a bunch of sample code (some stuff I wrote). Anyway, check it out if you're interested.
Rather than failover, consider using a load-balancing device. You'll get use of all of the boxes (and more can be added as traffic dictates) and it will automatically stop routing traffic to a downed box. It can be transparent or semi-transparent to the servers. Of course, the application has to support being run from multiple servers simultaneously.
There are a number of commercial products from F5, Alteon, Extreme, Intel, and others.
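What "automatically stop routing traffic to a downed box" means can be shown with a minimal sketch. It is not how any of the named products work internally; the class and method names are invented, and the health check itself is left out.

```python
# Illustrative round-robin balancer that skips servers marked down.
from typing import Dict, Iterable, Optional


class RoundRobinBalancer:
    def __init__(self, servers: Iterable[str]):
        self.servers = list(servers)
        self.healthy: Dict[str, bool] = {s: True for s in self.servers}
        self._next = 0

    def mark(self, server: str, up: bool) -> None:
        """Record the result of a health check (the check is not modeled)."""
        self.healthy[server] = up

    def choose(self) -> Optional[str]:
        """Return the next healthy server, or None if all are down."""
        for _ in range(len(self.servers)):
            server = self.servers[self._next]
            self._next = (self._next + 1) % len(self.servers)
            if self.healthy[server]:
                return server
        return None


if __name__ == "__main__":
    lb = RoundRobinBalancer(["web1", "web2", "web3"])
    lb.mark("web2", False)  # pretend web2 failed its health check
    print([lb.choose() for _ in range(4)])
```

New requests simply flow around the failed box; sessions that were in progress on it are lost unless the application keeps its state somewhere shared.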
Slashdot has been saturated with 'IANAL' for several weeks now. I think we need a new acronym for this discussion: IDKCAHA. 'I Don't Know Crap About High Availability.'
I've been trained in IBM's high availability product, HACMP, 'High Availability Cluster MultiProcessing' and manage a few production clusters.
This is typical of what I'm hearing on Slashdot today...
If a network node goes down, it's better if network equipment handles the failover.
It ain't that simple. The node that went down has resources other than network ports. What about the application? What about the filesystems? What if the network connection is up but the application or filesystem is down? Show me a chunk of networking equipment that can handle those failures and I'll send you a dollar.
In our production environment, we have an Oracle database running in an IBM HACMP cluster. The SSA (think: SCSI over token ring) drives are shared between the primary node and the backup node but can only be active on one node at a time.
Should a network card fail in the primary node, it will down the sick interface and bring up the IP and MAC on the backup card. Linux, too, can do that rather easily.
But what if the problem isn't the network? What if the application on the primary node has failed? HACMP can down the primary node and bring the application up on the backup node, taking the disk drives with it. I have yet to find a Linux tool that will do disk failover.
I've also seen comments in this discussion to the effect that secondary heartbeat paths are 'silly'. Obviously, the person who made that comment is insane. (We use target mode SCSI over SSA for a heartbeat.)
If your only heartbeat is running over the network and the network fails, neither node knows if the other is up, and both nodes will attempt to claim the disk resources, come up on the same IP/MAC address and start the application. This is Very Bad. What do you think will happen when the network comes back online? I'll give you a hint. All of a sudden you've got duplicate nodes on the network. That's never good.
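The reason for the second heartbeat path comes down to one rule, sketched below. This is not HACMP's logic, just a minimal illustration with invented names: the standby may claim resources only when the peer is silent on every path.

```python
# Sketch of the standby node's decision with two independent heartbeat paths.
from enum import Enum


class Action(Enum):
    STAY_STANDBY = "stay standby"
    TAKE_OVER = "take over disks, IP/MAC and application"


def standby_decision(network_heartbeat_ok: bool, secondary_heartbeat_ok: bool) -> Action:
    """Take over only when the peer is silent on every heartbeat path."""
    if network_heartbeat_ok or secondary_heartbeat_ok:
        # The peer is alive on at least one path. A single dead path means
        # a link fault, not a dead node, so claiming resources now would
        # leave two active nodes.
        return Action.STAY_STANDBY
    return Action.TAKE_OVER


if __name__ == "__main__":
    # Network is down but the secondary path still hears the primary.
    print(standby_decision(False, True).value)
```

With only one heartbeat path the function collapses to "path silent, take over", which is exactly the duplicate-node case described above.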
To answer the consultant question, I think the person is steering you in the wrong direction. Either you need to rethink your failover solution or you need to move to a platform that better supports the type of failover you want to do, such as AIX or Solaris.
Real Soon Now, I expect that there will be a viable Linux HA solution. However, there ain't one now and that's where you are. Since you're a B2B startup with venture funds to burn, I suggest you throw some of that money toward IBM's HACMP or another commercial solution on a platform that has had an HA solution for more than a few months. I'd hate for your company to save a few bucks by using Linux and lose lots of money to downtime.
InitZero
(let the flames begin)
Yup, sure... append lines in lilo.conf are unformatted strings passed to the kernel. The kernel then passes them to the appropriate routine (or module, if you're modular) which does whatever the module-specific commands are telling it to do. In this case I'm telling Donald Becker's network drivers not to stop looking for 3com cards until all four have been found. For more details see the documentation.
--Charlie
For those of us who do not yet know, what does "B2B" stand for? Thanks, -S
Scott Ruttencutter
We Apprentice Developers and Designers
Free as in beer.. No
Free as in non-profit...
Sun doesn't make any money but you're paying for shipping.
That's not free...
I don't actually exist.
There's also a problem with the LD if you need sessions in your application. As long as you have big enough servers that can handle all the traffic coming from behind proxies you can always use the IP-sticky in the LD but experience with this has shown me that sometimes the load balancing can be really bad with this scheme. If the LD just happens to throw three or four big proxy addresses at one machine it will get bogged down and the whole idea with the LD is lost.
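The skew is easy to reproduce on paper. The sketch below is not the Local Director's algorithm; it pins each source address to a server with a simple modulo mapping (an assumption made for the example) and uses invented addresses and user counts.

```python
# Illustration of load skew under source-IP stickiness; addresses and
# user counts are invented for the example.
import ipaddress
from collections import Counter
from typing import Dict, List


def sticky_server(source_ip: str, servers: List[str]) -> str:
    """Pin a source address to one server (simple modulo mapping)."""
    return servers[int(ipaddress.ip_address(source_ip)) % len(servers)]


def load_per_server(users_behind: Dict[str, int], servers: List[str]) -> Counter:
    """Sum the users each server ends up with, given users per source IP."""
    load = Counter({s: 0 for s in servers})
    for source_ip, users in users_behind.items():
        load[sticky_server(source_ip, servers)] += users
    return load


if __name__ == "__main__":
    servers = ["web1", "web2", "web3"]
    traffic = {
        "192.0.2.3": 5000,     # three big proxies...
        "192.0.2.6": 4000,
        "192.0.2.9": 3000,
        "198.51.100.11": 1,    # ...and two individual clients
        "198.51.100.12": 1,
    }
    print(dict(load_per_server(traffic, servers)))
```

The balancer sees five source addresses and spreads them as best it can, but it cannot see that three of them stand for thousands of users each.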
There is of course the ssl-sticky for SSL sessions over HTTP. But there's a problem with that too. If the client uses IE and sits behind a proxy, the SSL session ID will not come from the client. IE by default uses HTTP 1.0 when talking to a proxy, which doesn't allow sessions. This is probably due to the fact that up to a very late point in time MS supplied proxies that talked only 1.0, and now they correct that by disabling the newer protocol in their browser.
At the webserver end it seems like the client would be talking 1.1 but that is not true since the protocol info comes from the proxy, which talks 1.1.
There might be a solution to this but I haven't been able to find anything yet. Otherwise the LD is a working solution that also provides failover for itself if you have two of them.
Did I say it was anything else? I only stated that I didn't use the Piranha package.
In fact, according to the LVS site, it's not Linux Virtual Server.
Piranha is the clustering product from Red Hat Inc., it includes the LVS kernel code, a GUI-based cluster configuration tool and cluster monitoring tool.
If this is important enough to require high availability, and you are forking out the dough for all the Cisco gear...
what other software are you using, and why are you cheaping out to use linux?
I mean, look.. I love linux... don't get me wrong. And linux *can* do this, but it'll take work.
But if you want something that already does it, and considering the money being spent.. why not go with what the consultant said and pick up the Suns he recommends and the failover gear he recommends?
Sheesh. I wanna use linux at work too, but when it comes to a platform for a $60,000 piece of software, it sure didn't make much sense to argue that linux was 'cheaper' than solaris..
Because having an operating system allows for custom tweaking. I have written numerous Perl scripts to do VIP IP accounting and uptime statistics outside of the BigIP interfaces. Having a device like this adds flexibility. As for the Local Director or ServerIron, these platforms tie you to their interfaces to access data, without the option to customize.
Sorry man, I don't control the aliens.
Those who can, do; those who can't, consult.
:-)
funny, that's why I just got into consulting...
I figured if people who don't know what they are doing can make good money telling people how to run their shops, think of what someone who does know something could make.
For this we used the Linux Virtual Server Project and also The Linux High Availability project.
This provides a great, resilient service; the project is live and running like a dream !!!!
Don't believe what you hear from these overpriced consultants.
In a non-critical situation I simply use vhosting on a backup machine that pings a private IP every N seconds. If the private IP fails I can assume the primary machine died and take over its IP. If I see it come back online I release the vhost. Yeah, there might be problems with two machines having the same IP for a second or two, but it hasn't failed for me yet. The biggest problem wasn't the failover; it was keeping the machines synced.
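The takeover and release logic described here fits in a small state machine. This sketch leaves out the probe and the commands that add or remove the address, and the "several misses before takeover" threshold is an assumption added for the example, not something the comment specifies.

```python
# Sketch of the backup's takeover logic; the probe and the commands that
# add or remove the address are deliberately left out.
class BackupWatcher:
    def __init__(self, failures_before_takeover: int = 3):
        # Threshold is an assumption: it avoids flapping on one lost ping.
        self.failures_before_takeover = failures_before_takeover
        self.missed = 0
        self.holding_ip = False

    def observe(self, primary_answered: bool) -> str:
        """Feed one probe result; return 'takeover', 'release' or 'none'."""
        if primary_answered:
            self.missed = 0
            if self.holding_ip:
                self.holding_ip = False
                return "release"
            return "none"
        self.missed += 1
        if not self.holding_ip and self.missed >= self.failures_before_takeover:
            self.holding_ip = True
            return "takeover"
        return "none"


if __name__ == "__main__":
    watcher = BackupWatcher(failures_before_takeover=2)
    for answered in (True, False, False, False, True):
        print(answered, watcher.observe(answered))
```

Calling `observe` once every N seconds with the ping result gives the behaviour described: take the address after sustained silence, hand it back when the primary answers again.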
We are trying to implement this same thing here on our production
network. EVERY machine has 2 connections to any subnet that it is
attached to. We are doing this to eliminate the single point of
failure for any one switch.
The trick is you want one NIC port to failover to the other. One way
to do this is to use the channel bonding stuff in the 2.2.15
kernel. (see linux/Documentation/networking/bonding.txt, or some
such).
This creates a virtual interface that you associate real interfaces
to. All real interfaces get the same MAC address and behave as one
virtual NIC. This gives you two advantages:
- increased bandwidth
- failover
Should one of the NICs fail, the driver will use the good one.
The only catch here is that you need a switch that supports it. It's
called Trunking or Channel Bonding or WhatEverVendorsCallIt. Also, you
want a switch that supports Trunking *across* switches. Should one
switch fail, the other will still work. The only switch I've found
that supports it is the BayStack 450. BUT it only supports up to 6
configured trunks. I need 24. Oh well.
The Linux channel bonding code could be hacked to support a failover
only scenario.
All very good points. So if we knew the problem the poster was trying to solve, we'd be in a better position to evaluate the proposed solution.
For instance, does he need to guarantee an uninterrupted stream? Or does he just need to guarantee the server is "always available"?
--
Have Exchange users? Want to run Linux? Can't afford OpenMail?
Linux MAPI Server!
http://www.openone.com/software/MailOne/
(Exchange Migration HOWTO coming soon)
That's a potential solution. The need is something like "have our webserver available 99.5% of the time" or "guaranteed database integrity" or something.
To take that further, aren't they legally obliged to hold their shareholders' interests above those of their customers...
GoodPint
No - my effort is separate from the source forge project. That must have been set up fairly recently.
It's business to business. Just be glad they're not saying 'intrabusiness e-commerce' or something like that...
ReadThe ReflectionEngine, a cyberpunk style n
I was under the impression that you had to pay if you were going to use it for commercial use
Sun are damn quick about getting security patches out.
Well, I've seen one study. They measured the number of security problems and the time it took to solve them. They measured Linux, Windows, and Solaris. Windows and Linux had hundreds of problems, and Linux was about two to three times as fast in getting patches in. In the end, Linux had about 56 days in which there were open problems, and m$ had a hundred or so.
Solaris had seven bugs. And seven hundred days when those bugs were known and open.
Oops...that won't deal with a failed server, only a failed Ethernet path. A similar configuration, only with FAKE-like IP reassignment in the functional server will allow the still-running server to get the traffic intended for the downed IP...of course, with redundant paths you have to ensure that the downed server gets a chance to first reconnect its backup.
After eating my greater-than symbol, it looks like Slashdot plain text really isn't just plain text after all.
Musta been that god-forsaken Lameness Filter(tm)... could someone mv that to /dev/null, please?
"The best weapon of a dictatorship is secrecy, but the best weapon of a democracy should be the weapon of openness."
The poor guy has a solution involving Linux that he can't implement for want of a device driver. Everybody seems to be proposing alternate solutions, which seems like using a sledgehammer to crack a nut. Is there nobody with a nutcracker?
As for a public implementation - I should have a Linux VRRP implementation out this week.
Very interesting. I suppose that this is related to http://sourceforge.net/project/?group_id=2181, or is that a different project?
As for VRRP, I believe that an accurate description of the protocol is available here
If J.K.R wrote Windows: Puteulanus fenestra mortalis!
Actually, my F5 Labs BigIP load balancers support VRRP also. Works rather well too, I might add. I'm actually running redundant border routers (Cisco 3620s) with HSRP (Cisco's version of VRRP), feeding two Nokia IP440s with VRRP on all interfaces, and behind them are a pair of BigIPs also using VRRP on both the interior and exterior nets.
Hardware address in terms of MAC address. ;)
If the device 00-10-4B-C7-2F-3B suddenly goes tits up, the ARP cache of your sender has to know what the MAC address of your failover device is. In just about every Ethernet card design the MAC address is either supplied by the software or read from a writable register. How do you think the embedded bridge code in Linux works?
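If the failover device keeps its own MAC instead of taking over the failed one, the senders' ARP caches have to be told. The usual mechanism is a gratuitous ARP, where the sender and target IP are the same address. The comment does not propose this; the sketch below only builds such a frame (it does not send it, which would need a raw socket and root), and the IP address used is an example from the documentation range.

```python
# Build (but do not send) a gratuitous ARP request frame.
import socket
import struct


def gratuitous_arp_frame(mac: str, ip: str) -> bytes:
    """Return a 42-byte Ethernet frame announcing that `ip` is at `mac`."""
    hw = bytes.fromhex(mac.replace("-", "").replace(":", ""))
    if len(hw) != 6:
        raise ValueError("MAC address must be 6 bytes")
    addr = socket.inet_aton(ip)
    broadcast = b"\xff" * 6
    # Ethernet header: destination, source, EtherType 0x0806 (ARP)
    ethernet = broadcast + hw + struct.pack("!H", 0x0806)
    arp = struct.pack(
        "!HHBBH6s4s6s4s",
        1,            # hardware type: Ethernet
        0x0800,       # protocol type: IPv4
        6, 4,         # hardware and protocol address lengths
        1,            # opcode: request
        hw, addr,     # sender MAC and IP
        b"\x00" * 6,  # target MAC: unknown
        addr,         # target IP equals sender IP (the "gratuitous" part)
    )
    return ethernet + arp


if __name__ == "__main__":
    frame = gratuitous_arp_frame("00-10-4B-C7-2F-3B", "192.0.2.10")
    print(len(frame), frame.hex())
```

Hosts that already hold an entry for that IP typically update it when they see such a frame, which is why IP-takeover scripts send one right after bringing the address up.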
Nah, that's just cliche compliant ;-)
"It's tough to be bilingual when you get hit in the head."
>> We are a growing B2B company;
> Good to know that you are buzzword compliant...
> I understand thats very important to some people,
> and if I ever figure out who those people are, I
> will probably avoid them like the plague.
Good to know that you are buzzword compliant...
I understand thats very important to some people,
and if I ever figure out who those people are, I
will probably avoid them like the plague.
Error. Too deep recursion.
Unable to read configuration file '/bigassraid/htdig//conf/14229.conf'
Geocrawler error message.
All of us are Linux! Everyone who uses it, who wants it improved. It's us. There is no "They" who is going to shoot "themselves" in the foot. If Linux fails at something its because every single one of us was too lazy to implement the thing that we thought it should do. We have the source for crying out loud. What more do we need?
So enough of this us vs them crap. There is no "them".
Key to financial independence: Spend less than you earn. Save and invest the difference. Do it for a long time.
You can go on like this forever. To be very cautious, have multiple disks, multiple NICs, multiple databases, multiple network connections, etc. Then, copy that exact setup to a totally separate physical location so that you not only have two of everything within one architecture, but you have two of the same architecture.
Once you've decided what you want to achieve by implementing a fail-over solution, then it's time to figure out HOW to do it. That's when you say "okay, if I want everything to be okay when a disk crashes, how would I implement that? OH, RAID would do this for me!". You do that for each fail-over piece.
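How much each duplicated piece buys you can be estimated with two textbook formulas, sketched below. The big assumption is that failures are independent, which components sharing a power feed, a switch or the same software bug often are not; the function names are made up for the example.

```python
# Back-of-the-envelope availability math, assuming independent failures.
def parallel(availability: float, copies: int) -> float:
    """Availability of `copies` redundant units where one survivor is enough."""
    return 1.0 - (1.0 - availability) ** copies


def series(*availabilities: float) -> float:
    """Availability of a chain in which every element must be up."""
    result = 1.0
    for a in availabilities:
        result *= a
    return result


if __name__ == "__main__":
    single = 0.99
    print(f"one 99% box:            {single:.4f}")
    print(f"two in parallel:        {parallel(single, 2):.4f}")
    print(f"two pairs in a chain:   {series(parallel(single, 2), parallel(single, 2)):.4f}")
    print(f"one pair + one 99% box: {series(parallel(single, 2), single):.4f}")
```

The last line is the point of going through each piece in turn: a single unduplicated component in the chain drags the whole figure back down to roughly its own availability.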
Good luck!
-- Get your free Mini Mac http://www.FreeMiniMacs.com/?r=14209873
I recently attended a seminar on Alteon's switching gear.
These things have all sorts of cool features and CPU-power up the yin-yang. They also do heartbeat-style stuff and they look like they mop the floor with practically all other switches/routers for performance and scalability. They can do cool things like examining session ids in packets, and 'binding' a client to a specific server for preserving session state. All in hardware, and all in realtime.
The seminar was one of the more informative ones I have attended. Check them out.
I don't work for them or anything, and don't believe everything I say without further research, but you can find them at www.alteon.com
I gots ta ding a ding dang my dang a long ling long
Read up on your economic theory: this is obvious price discrimination in its purest form.
Joe User only needs a single port hub, and is only willing to spend x dollars. Joe Server can really use that dual port, and he's got a bigger budget, and support costs are always >> initial purchase. So he's willing to spend 3 to 4 hundred times what Joe User will spend.
The NIC manufacturers have found a segmented market and are taking advantage of it. Pure economics.
The previous responses are appropriate, but I'll add this as well...
Because dual-port NICs generally have a pretty specialized function in an enterprise, there's going to be higher expectations in terms of reliability and uptime. It'll also be expected to have support for enterprise-grade applications which may mean special drivers, special qualification, etc. And finally, there's an expectation that it'll have awesome support.
All that stuff might cost a lot, and definitely is something that the manufacturers don't have any qualms about charging heavily for.
And if they really do it, then I don't think anybody is mad.
There's nothing that stops a company from making a dual port card that is just two $10 enet cards on the same board, but they probably won't be able to deliver on the special drivers, support, etc.
David Fung
I don't believe that the BigIp supports a failover nic. I know that failover is done over a DB-9 cable that connects two BigIp controllers.
They run on BSDi, with proprietary hardware drivers, but (major but) they are soon to be converting all software to RedHat.
Dude, if you need server redundancy, get yourself a foundry serveriron. They're cheap. www.foundrynet.com. They will pretty much handle load balancing and failover between your servers.
There's a company by the name of F5. They have a box called BigIP which will do load balancing and failover. It will even failover an active telnet (ssh) session!
Pretty sweet, and you don't have to have two cards chained together as one and whatnot. Each box has its own, and you have one live and one hot-backup, or you can load-balance between the two, whatever.
The BigIP boxes are based on BSD so that's definitely a plus.
-Diggem
I have worked with NT "clusters" that do just this. From what I have seen they create more downtime with their complexity than they prevent.
How about this:
2 single port NICs in each sys with a crossover.
Write some (simple) program to do a heartbeat over this connection.
If the "standby" system loses the heartbeat, it pings the "live" system with its other NIC, which is configured with a different IP than the "live" system. If the ping fails, a little script runs that takes that interface down and brings it back up with the "live" IP.
Have the "standby" system periodically mirror the data on the "live" system (maybe over the private NIC, to keep traffic on the main connection down).
I know that this is not perfect, but it illustrates that there can be a simple solution. It could work quite well in an environment where there is a fair amount of tolerance (i.e., where it is okay to say "transaction failed, please retry").
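The standby side of the scheme above can be sketched roughly as follows. This is only a minimal illustration in Python: the `Standby` class, the three callbacks, and the `missed_limit` threshold are all made-up names and assumptions, and the actual heartbeat, ping, and interface re-addressing are left to whatever you plug in.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Standby:
    """Decision logic for the standby box; every name here is illustrative."""
    heartbeat_ok: Callable[[], bool]  # True if a heartbeat arrived over the crossover link
    ping_live: Callable[[], bool]     # True if the live box answers a ping on the public LAN
    take_over: Callable[[], None]     # re-addresses the public interface with the live IP
    missed_limit: int = 3             # assumed: heartbeats to miss before suspecting failure
    missed: int = 0
    active: bool = False

    def tick(self) -> str:
        """Run once per heartbeat interval and report what was decided."""
        if self.active:
            return "active"
        if self.heartbeat_ok():
            self.missed = 0
            return "standby"
        self.missed += 1
        if self.missed < self.missed_limit:
            return "standby"
        # Heartbeat is gone: double-check over the other NIC before acting,
        # since a dead crossover cable alone should not trigger a takeover.
        if self.ping_live():
            self.missed = 0
            return "heartbeat-link-down"
        self.take_over()
        self.active = True
        return "took-over"
```

Call `tick()` once per interval from a loop or cron-like timer. The second check over the public NIC is what keeps a pulled crossover cable from causing both boxes to claim the same IP.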
-Peter
Slashdot cries out for open standards, then breaks them.
To address your P.S.: This is so completely true. It feels like the real power of the internet is lost on many of the ISPs. The wonderful rerouting of packets that is supposed to take place only happens on the macro level, leaving smaller outages to have major impacts on a few sites at a time.
two (2) ServerIron switches.
At least, I'm fairly certain Foundry Networks makes products that support fail-over in a load balancer. Certainly Arrowpoint and F5 products support this.
I.E. how does one mirror a web site, including session data, between two facilities?
>Yes Solarus can do a lot of great things Linux
> can not. In the end Linux has one
> great advantage and thats price.
> Source code and quick security patch relases is
> a bonus.
Solaris is free, for any system smaller than 8 CPUs, aside from shipping. And get a clue, Sun are damn quick about getting security patches out.
Dell has actually written drivers that will handle this while running the Red Hat Linux distribution of the OS. If you go to their driver support page you might be able to find something that will be modifiable for your current solution.
Unless your company actually has the manpower, knowledge, or resources to develop this solution on your own, I would say your IS Manager made a huge mistake. He should have evaluated the situation, and if he did not have the resources needed, then he should have bought a preconfigured Linux solution such as provided by Dell, Penguin Computing, or anyone else who has the proper resources to complete a task of this magnitude.
"Help me Obi-/.-Kenobi,your my only hope!" -$
Come now, let's not be too hypocritical:
.
OS == Operating system.
NIC == Network interface card.
TCP == Transmission control protocol.
ARP == Address resolution protocol.
. .
"Why can't we just say those?"
kugano
I think we are unsure as to what the need really is in this situation. I am pretty sure it's not failover or high availability boxes. That solution is already present thanks to all the eeekspenseev Cisco hardware. In order to cooperate with said box, you need to bind the same IP addy to two Network Interfaces, each interface going to a different switch. You need to do this again for a total of 4 interfaces, using 2 IP addys. In addition, you need to move to the said interface for a said IP if the primary interface fails.
My intelligence insults itself.
Linux, FreeBSD, Solaris, NT - all are supported by the open development kit for the ACEnic 10/100/1000 adapters, which has a fairly liberal license. Linux drivers developed by CERN are available there (source included, of course).
They work in layer 2 failover by configuring two ACEnics in the same machine, with one IP address. It's supposed to work with the Cisco gear, as well as Extreme and some other vendors (and Alteon's gear, naturally! :).
But if you have an Alteon load balancing switch, it's overkill, really. The ACEnics do have other things to recommend them, such as dedicated ingress and egress processors (MIPS R4400) and the ability to offload interrupt handling from the host. These are high performance NICs, and Alteon is eager to allow developers to support them on whatever platform is desired.
The dual ACEnic setup is intended to guard against an actual failure of the NIC itself.
Edith Keeler Must Die
Haven't seen anyone mention RSF yet, so I'll point you over to www.high-availability.com, their homepage. It can be configured to work with 1 interface/machine (relying on a serial and disk 'heartbeat' to ensure things are up).
All I know about it is what the person who designed it was bragging about. I really am not familiar enough with the device to say more. As I remember (this was a while ago) he was proud to have done it "Completely with PAL logic" and "not using a microcontroller".
"I opened my eyes, and everything went dark again"
B2B - Business to Business.
I can type it faster, and it makes people more likely to give me money.
So what if it's a silly game ? It's their cash, and they're queueing up to throw it at me.
As for failover... check out 3Com... long ago a man (who would later go on to teach Unix courses at WPI and be one of the best teachers I ever had for anything) designed a piece of hardware with 1 Ethernet port on one side, and 2 on the other... it was designed to do JUST THAT.
I just called up 3Com and said, "Please send me two of those pieces of hardware with 3 Ethernet ports on different sides that that guy who teaches Unix designed a few years ago". Thanks for the lead. I can't wait to get 'em.
For better redundancy, you really want separate redundant servers, each with RAID arrays [...snip...] After all, if a CPU fries, or a power supply starts letting its magic smoke out... all the dual-port NICs in the world won't help.
I think you want two servers with the same RAID array. That way when one server goes down, the other server immediately knows this (because that second Ethernet card was for heartbeats between the two), and the second one can take over not only the downed machine's IP address, but also the workload of the downed machine, because the data is on a shared (multimaster) RAID array.
Of course, why that's even needed is beyond me.
apparently...
The idea here wasn't, "In case my machine blows up, I want to have a redundant one there just in case." It was something more like: if you had, for instance, a machine that serves pages and a machine that serves images, and the image server went down, the pages server would serve both images and pages until the image server came back up. The original article was not clear at all about whether they desired failover in case one of their upstream providers went down, or failover in case a machine went down. I was responding to a post which claimed that having two NICs couldn't help if a CPU went down.
What do you mean exactly by "something else" than HTTP? What about non-tunneled IIOP traffic? Would there be any implications? Is there any easy, cheap, Linux-only load-balancing solution?
What's B2B?
Failover is nice for services (file/web/etc.), but I am continually bombarded with database replication requests. The only folks I know of that do this well are Sun and Oracle. Anyone know of a similar platform for MySQL?
Information wants to be Free. Useful Information will cost you.
The need, as I've understood it, was to have two ports on separate switches with *the same* IP-address.
Is this possible to achieve in Linux with two NICs?
/.Mattsson - My native language is not English, so please don't whine over linguistic errors. (That's lame anyway...)
For network devices, Alternate Pathing is crap. You're supposed to stick all the "alternate paths", that is, all the different network connections into the same switch. All right, it helps if your NIC goes down (as long as you have two NICs and not one Quad, don't laugh, I've seen SUN offer these...), but what if the switch blows up? Your SPOF (single point of failure) is just moved over to the switch. On the disk & mem & CPU side, AP is good though.
To be precise, information about SGI's FailSafe software for Linux is at: http://oss.sgi.com/projects/failsafe/ This software has seen much commercial use, including supporting the highest volume of traffic for the Mars lander web site (though that was under IRIX). There was even a Slashdot article about SGI's FailSafe: http://slashdot.org/articles/00/02/26/1224233.shtml
Still, he does an admirable job of separating newbies (and people with zero pattern-recognition ability) from "experienced" /.ers. Now if only anyone needed that...
The illegal we do immediately. The unconstitutional takes a little longer.
--Henry Kissinger
Cuz' investors are stupid and lazy, "venture capitalists" doubly so. Buzzwords offer a simple way for them to decide where to invest their money without having to think or (gasp!) do research.
The illegal we do immediately. The unconstitutional takes a little longer.
--Henry Kissinger
Imagine how much more authoritative you could have sounded if you didn't make so many spelling mistakes, call Windows WindHoze and call an MCSE something with Minesweeper or Solitaire in it.
If you do have a valid point, say it, and don't bash unnecessarily.
Ivo
<grub> Reading
I have a OSICOM technologies 2340-tx board that has 4 10/100 ethernet ports. The tulip driver works fine with it. Street price is $480 at pricewatch. Works great.
-dB
"It if was easy to do, we'd find someone cheaper than you to do it."
Interesting? This gets moderated as "Interesting?"
It's flamebait, and if the moderators weren't so blind from anti-corporate propaganda, they would rate it as such.
Read slashdot for the articles, read the discussion for laughs.
It's good to see that you're just as blind as all the other sheep around here.
I am a technical consultant, who deals daily with customers who think they know better -- customers who don't understand that by specializing in a particular field, I bring a depth of knowledge to each project; customers who don't understand that by working on many projects, I bring a breadth of experience to each new endeavor.
In short, customers who think they can be jacks-of-all-trades, masters-of-all. In a lot of fields, it pays to specialize. And when you're specialized, you can sell your skills to many different people.
As for your "anti-moron" bit, hah! You're clueless
If consultants are so worthless, why do so many companies use them? Because consultants provide skills to companies that their own employees don't have.
Sounds to me like "those who can, consult; those who can't, get a cushy job with no outside pressure."
All this notwithstanding, the original post still wasn't "Interesting"
Know the difference between flamebait and a dissenting opinion
Then some research group came out and said "B2B will be a 4 trillion dollar industry by 2003" (no, literally, the number they used was $4 trillion)
And the true B2B companies saw their stock prices shoot up.
And everyone else wanted their stock price to shoot up, so they announced that they were also "B2B". And now the term means nothing.
Read thestandard.com for news. Read slashdot.org for propaganda
To get a little bit more precise, in order to avoid a Linux driver that can support redundant links, we have now reached additional hardware needs in the order of at least, say, 2 Cisco 7xxx with an appropriate number of interfaces (we _are_ talking about speeds of around 100Mbit and more, are we?). This can easily reach six-figure sums. It isn't. If you do that, these machines get routes installed via ICMP redirect messages. This in turn means that here your convergence mechanism doesn't work any more - any session they have open will break in case of an OSPF announced route flap/failover.
Network design is tricky. Anything but 'just easy'.
Regards,
f.
Somebody get this guy his flamebait mark he so desperately wants....
f.
f.
Would you allow your core router to listen to OSPF updates from an unsecured machine? If yes, I've got a bridge to sell...
f.
Yes, there is VRRP. But there is no publicly available implementation I know of, not to mention two independent interoperable ones (as the IETF usually requires for standards-track RFCs). There's also HSRP. Everybody and his dog uses it in their Cisco routers. Why not take that? After all, as Cisco never stops telling us, all their stuff is standardized and open?
Ah, whatever. Sorry for the rant - this whole HA scene seems to be more annoying than the rest of the bunch. They all operate in "a customer who needs that must have money to burn - let's charge him hefty" mode.
f.
P.S. You don't happen to work for Nokia's IPRG division ?
You can't attach to a fully redundant switched architecture with the help of routing protocols (and, FWIW, I doubt simple channel bonding would help, either). On the IP level this looks like a normal LAN to which you are connected via two interfaces.
So, unless you do some serious magic on all systems involved, you will have the problem of route timeouts, wrong ARP entries and whatever. Any solution involving routing protocols is going to cause serious trouble.
f.
f.
It really bothers me that many consultants consider Linux a bad OS just because they cannot make money selling it or supporting it, because they do not understand its power. If you're looking for dual-port NICs (or even quad-port, for that matter), they will work great in Linux. I have what I would call a VLAN server with two 4-port Intel NICs. They are wonderful cards and I have had no problems with them whatsoever (they do run the up-to-date module). Good luck.
I once read that you could change the MAC address of a card in software - am I dreaming this, or is it only with specific hardware?
As long as you are here and reading these posts, and have somewhat confirmed that you are employed by Microsoft, I'd like to know some more.
How many machines are dedicated to tracking Microsoft operating systems that are online?
When are you going to start prosecuting illegal installations of Office and Windows 2000? I have yet to see anyone actually legally own a license to run this software. (compared with hundreds of pirated copies I've seen)
Maybe you can help stamp out piracy of Microsoft software by offering Linux as a free download from your windowsupdate.microsoft.com site. (which doesn't work well with lynx I might add)
I would be happy to submit a banner to you with such a message emblazoned on it.
Lars -
How about these solutions:
Buy N IIS/NT licenses at whatever they cost ($0) and pay your MCSE team to battle with it on a daily basis.
Install a freenix (at $0 per license) and pay a savvy admin who knows what s/he is doing to make it work.
Looks like the cost factor is on the side of the freenix. Customers like saving money, and not paying Microsoft Taxes(tm)
Lars -
..they cost ($0) and pay..
should read
..they cost (greater than $0) and pay..
after eating my greater than symbol, it looks like slashdot plain text really isn't just plain text after all
Lars -
Ifconfig can enforce whatever hardware address you want to any interface, so the issues 2 and 3 are trivial once the first is solved.
does this with web services, using itself and any other server running a web server, including NT and Solaris. http://www.turbolinux.com
actually, i think the original problem stems from attempting to use a linux cluster as the firewall and cisco's hsrp protocol for backbone connect failover. the issue is that if the cluster node which is plugged into the primary backbone line fails, the second node can't inform the cisco router to switch to its backup. i think you could put a hub between the firewall cluster and the hsrp lines to get around this, but then you have a single point of failure. but my experience as a seamless integrator of leveraged synergies says that pre-ipo b2b's should consider this option going forward.
C is for Cookie.
You are suggesting a solution that potentially costs 100 times as much as the solution offered. The solution was for line failover, which, albeit, isn't terribly useful. You're suggesting a solution for the failure of the machine. A couple of dual-port NICs cost a lot less than a quad PIII Xeon, or whatever kind of server they are running. I am assuming that it's huge. I am also under the assumption that they probably don't want to replace whatever it is with a 486. Crucify me, but business types aren't likely to say, hmm, the backup doesn't have to actually be a superpowerful machine. Also, I would assume that they are probably making an attempt at bandwidth, probably trying to avert going to a fiber optic line. Maybe they are distributing the bandwidth in some other way, maybe they are using several lines. Maybe they have no clue what they are doing. No offense, but you answered a question under the assumption that that was what was being asked. He wants this hardware, and for all I know, he wants to connect to 4 fiber optic lines coming in from separate cities. Why? I don't know! But the nature of the beast could be completely different from the angle from which it has been attacked.
Eh...
Actually, I do have a question for the tech-savvy here on /. Since BSD is a new version of the Solaris code, what are the new innovations which it adds to Solaris? My god, that's funny.
Check the headline.
I'm looking at Tux, and having a real hard time seeing him biting himself in the foot.
The evaluation of an action as 'practical' . . . depends on what it is that one wishes to practice.
At first, I was using the built in Adaptec 78xx chipset, but I found there were problems with FF2 and soft read errors. I bit the bullet and got the RAIDs at that point, and the systems have been humming along with no problem, other than the tapes don't always get switched on time. (Oops!).
If all you are doing is serving web content and e-shopping, then you may want to look at FF2 on SCO. (Alright! I *know* it's sys V, down muggy!).
The ARP problem can be handled by turning arp_keep and arp_prune down in the NIC configuration, and the switches/routers can be handled by aging the ARP cache every 15 seconds. That would give you a window of (worst case) sixty seconds down between switchover, which is automatic. FF2 keeps the mirrored drive slice sync'ed with the second NIC. (You can also alias in the "floating" IP, alias out the "fixed" IP, then ping the router. That will force the router to update the ARP cache with the proper MAC addy.) The second NIC is only for mirror traffic, so if you want to separate the systems by a few miles, that's kosher as long as the pipe between systems is at least 56K. I'd say that with moderate transactions, you would need 128K minimum. If the distance were short, order 2 SDSL modems and a shortest path copper loop from telco. You will also need a low-bandwidth connection for the heartbeat between systems. The heartbeat can use any path (up to six) to talk to the second half, and vice versa.
If you can't take sixty seconds down time, then look at IBM's RS6000 HA50. If you've got the price, then this is for you.
In the many, many times people tell me "It HAS to stay up ALL THE TIME!" I find that a few minutes of down time becomes very acceptable when confronted with the price tag of a system that really will stay up all the time.
Necessity is the plea for every infringement of human freedom. It is the argument of tyrants; it is the creed of slaves.
-m.
Give me any recent version of Windows, and two Intel Pro 10/100 NICs ($39 direct), and I can do fault tolerance and load balancing. You use the intel ProSet utility, and go to 'Adapter teaming'
First of all, you have to tell us more about your existing Cisco infrastructure. Since you have already invested a lot of $ into Cisco hardware, you may want to consider buying a Cisco Local Director box to handle your load balancing and port monitoring. Your DNS will point to a virtual IP address on LD and LD will point to the real IPs of your web farm. It can be configured to monitor port(s), 80 for example, and mark a server OOS (out of service) if the server doesn't reply to LD's request on that port. You can also telnet to LD and mark any server OOS in case you need to bring it down for repairs, application upgrades or beta code testing. Cisco also makes a Distributed Director that works with Local Director, but you may not need it. There's a better product from F5 that does everything and more, but it costs a lot more. Both devices work with the rest of the network and do NOT require any software to be loaded on your Linux and NT boxes.
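The port monitoring described above boils down to "try to connect, and take the server out of rotation if that fails". Here is a rough Python sketch of just that idea. It is not how Local Director is implemented; the function names and the one-second timeout are my own assumptions.

```python
import socket


def port_is_up(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable, or timed out: treat the server as down.
        return False


def in_service(servers, port=80, timeout=1.0):
    """Split servers into (in service, out of service) using the port check."""
    up = [s for s in servers if port_is_up(s, port, timeout)]
    down = [s for s in servers if s not in up]
    return up, down
```

A plain TCP connect only proves that something is listening. A web server that accepts connections but returns errors would still count as "up" here, so a real monitor would probably send an actual HTTP request as well.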
It sounds so easy, doesn't it? Recompile the kernel, tweak Apache and play around with OSPF. The problem is most techies never write down the changes that they make, and if he finds another job or gets hit by a bus, they are screwed! Shoot that consultant anyway, sounds like he deserves it.
When I was working for a stock brokerage, we had a DU DecSafe failover setup. However, it had a drawback in that if something simple like a network card failed, it would still have to fail over the whole system, which would take more than a few minutes because of the database. We decided that we wanted to have the running system fail over to an alternative network card instead.
We came up with a couple of simple programs that helped us out on that. First, we had a program that would set the MAC address on the card to whatever we fed as input. Second, we had a program that would poll the card for a failure once every half second or so. It would exit on failure. Third, we used a script at start-up that set both MAC addresses to some values. Then we started the checking program. This would cause a busy-wait condition. When the card failed, such as by someone disconnecting it, the program would exit, and the rest of the script would run. The rest of the script:
1. downed the bad card
2. changed its MAC address to some third address
3. changed the backup card to the working address
4. upped the backup card with the system's IP address
5. notified the operator
We only changed the MAC address because we had some PCs that couldn't handle the change. If the system had been behind a router this wouldn't have been an issue. If you wanted to, you could do some other things to make this setup even more useful, such as using the second card for heartbeat checking or whatnot.
This was not difficult to write. Most of the work took 2 or 3 hours. A few days to add some nice details to the scripting, and we were off and running. Some things that can be added if you do it right:
Allow the cards to fail back the other way if the first card comes back on-line
Have a third card in the chain if you are really hyper about uptime
just a thought
-cliff (no, not that cliff, the other one)
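For anyone wanting to try the same five steps on Linux, the ordering could be sketched like this. To be clear about assumptions: the original ran on Digital UNIX with home-grown programs, so the net-tools `ifconfig ... hw ether` syntax, the `logger` call for the operator notice, and every interface name and address below are stand-ins of mine. The function only builds the command list and does not run anything.

```python
def failover_commands(bad_if, backup_if, service_mac, parked_mac, service_ip):
    """Return the ordered commands for the five steps; nothing is executed."""
    return [
        # 1. down the bad card
        ["ifconfig", bad_if, "down"],
        # 2. move the bad card to a third, unused MAC address
        ["ifconfig", bad_if, "hw", "ether", parked_mac],
        # 3. give the backup card the working MAC address
        ["ifconfig", backup_if, "hw", "ether", service_mac],
        # 4. bring the backup card up with the system's IP address
        ["ifconfig", backup_if, service_ip, "up"],
        # 5. notify the operator (here: a syslog entry)
        ["logger", "-t", "nic-failover", f"{bad_if} failed, now using {backup_if}"],
    ]


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    for cmd in failover_commands("eth0", "eth1", "02:00:00:00:00:01",
                                 "02:00:00:00:00:03", "192.0.2.10"):
        print(" ".join(cmd))
```

Doing step 2 before step 3 means the two cards never carry the same MAC at the same time, which seems to be the point of the "third address". Whether a given driver lets you change the MAC at all varies, so test on the actual hardware.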
> Oh, I see. When one port (or its path) fails, you want to switch the IP
> to a different port? I don't think "the driver" needs to do that, just change
> the IP assignments with ifconfig.
Your proposed solution is seriously flawed. Namely, in the time between failure and failover, you are losing packets, and the failover is not atomic. This means that there will be a small period of time where *no* interface will have the IP number in question, you may cause sockets to drop during f/o, etc. Solaris and IBM both have HA cluster solutions which address these issues -- your ifconfig-based one does not. You need something integrated into the network drivers at the DLPI level, not some shell-script hack, to have an HA solution. Otherwise, you only have a "Pretty Available" solution. If that.
--
Do daemons dream of electric sleep()?
> we also implement a full quorum based scheme monitoring over the shared SCSI bus
Finally, a Linux vendor with a serious-sounding cluster product. I commend you.
How is your network-level failover? Atomic? Arbitrary hooks for vendor-supplied HA services? Support for a journaling filesystem? How about good integration with a 0+1 RAID solution?
I've been waiting to convert to linux for a while, and can't do it without all of those features.
--
Do daemons dream of electric sleep()?
There are several packages available for Linux that provide HA services. Understudy, mentioned in the article, works well, and on top of that you can get into Piranha from RedHat, which came from the Linux-HA project. Then there's TurboCluster from the TurboLinux guys. And those are just the first few that come to mind. You could even go so far as to "build your own" HA solution with scripts and some kind of remote service monitor like mon.
Note that all abovementioned projects are software-based HA rather than the hardware HA like your consultant has had set up, and every hardware solution I've looked at is extremely expensive and generally overkill for a lot of small to mid-size networks.
If you're just trying to provide 24/7 access to your web servers, you may want to look into some of the offerings that are out there, and see if they might fit your needs for future projects.
"The possibility of mental and physical collapse is very real now."
Good to know you are Jargon File compliant. I understand that's very important to some people, and if I ever find out who these people are, I will make sure to call them fucken dumbasses, you fucken dumbass.
-- the most controversial site on the Web
You can buy multiple Internet connections through different ISPs, but you will still need to have different network addresses for your site in the DNS, and when one of your ISPs goes down, everyone on the Internet who has cached the IP provided by that ISP gets cut off.
I'll grant you the second redundant ServerIron, but the main point is that the bozo Dogbert consultant from the original poster is trying to get network layer redundancy from the operating system instead of the routing infrastructure. And then he slams Linux because he's not getting the job done.
-- King of the Luddites
P.S. Yes, I realize that it is possible to have alternate routes to the same IP address in the routing tables, but try getting two different ISPs to agree to do it for you.
Your consultant obviously owns CISCO and SUN stock. For Linux webserver failover on N boxes
all you need is:
N Linux webservers
1 ServerIron switch from Foundry Networks (http://www.foundrynetworks.com)
You put the N Linux servers behind the switch, each with its own IP, and give ONE IP to the switch. Tell the world to use the one IP on the switch, and the switch can be set up to automatically fail over OR load balance your web servers.
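To make the "one IP in front, N servers behind" idea concrete, here is a toy Python model of the selection step: round-robin over whichever real servers are currently marked healthy. It is only a sketch of the concept, not ServerIron's actual algorithm, and the `VirtualServer` class and its method names are invented for the example.

```python
class VirtualServer:
    """One public address fronting several real servers (illustrative only)."""

    def __init__(self, real_servers):
        self.real_servers = list(real_servers)
        self.healthy = set(self.real_servers)
        self._next = 0  # index where the next round-robin scan starts

    def mark(self, server, up):
        """Record the result of a health check for one real server."""
        if up:
            self.healthy.add(server)
        else:
            self.healthy.discard(server)

    def pick(self):
        """Round-robin over healthy servers; returns None if all are down."""
        n = len(self.real_servers)
        for i in range(n):
            candidate = self.real_servers[(self._next + i) % n]
            if candidate in self.healthy:
                self._next = (self._next + i + 1) % n
                return candidate
        return None
```

With every server healthy this is plain load balancing. Mark all but one as down and the same code gives you failover, which is why one box can do either job.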
-- King of the Luddites
Why put an operating system in where a smart router switch will do? ServerIron from Foundry Networks kicks ass. See my dogbert comment.
It's all part of VC culture - they don't actually know why some companies they invest in fail and why some succeed, so they are absolute suckers for a load of buzzwords wrapped up as a good business model. It is amazing how fashion-conscious they are.
TWW
"Encyclopedia" is to "Wikipedia" what "Library" is to "Some people at a bus stop"
The Linux HA stuff is pretty much as functional as the standard MS Wolfpack, i.e., it isn't really clustering at all.
You can failover IP addresses, you can fail over disks on a shared bus, you can replicate blocks across a LAN, you can automatically stop and start 'resources' with a little careful scripting but it isn't a *real* cluster.
For that you would have to be able to run applications on multiple machines which had simultaneous read and write access to shared resources like databases or file systems. For this to work you have to be able to lock resources on all the machines in the cluster simultaneously. A lock request generated on one system would have to be communicated to all the others before being granted or denied.
Linux just can't do this yet.
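The locking rule described above (a request has to reach every node before it is granted or denied) can be shown with a toy in-process model in Python. All the names are invented, and it deliberately ignores the hard parts a real cluster lock manager has to solve: node failures, lost messages, and two requests racing each other.

```python
class Node:
    """One cluster member with its own view of who holds which lock."""

    def __init__(self, name):
        self.name = name
        self.locks = {}  # resource -> owner name, as this node sees it

    def vote(self, resource, requester):
        """Agree unless this node believes someone else holds the lock."""
        owner = self.locks.get(resource)
        return owner is None or owner == requester


def request_lock(cluster, resource, requester):
    """Grant only if every node agrees, then record the owner on all nodes."""
    if not all(node.vote(resource, requester) for node in cluster):
        return False
    for node in cluster:
        node.locks[resource] = requester
    return True


def release_lock(cluster, resource, requester):
    """Drop the lock on every node, but only if the requester owns it there."""
    for node in cluster:
        if node.locks.get(resource) == requester:
            del node.locks[resource]
```

Even in this simplified form, every lock costs a round trip to every other member, which hints at why real clusters of this kind tend to want a fast dedicated interconnect.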
Gimme Gimme Gimme - Karma!
I thought the consultant thought of redundant access switches (to use Cisco terminology). With multiple NICs in each server (each with the same IP) going to multiple access switches, each access switch going to multiple distribution switches, etc... the probability of network outage due to switch failure is reduced.
B2B... Business to business. Not a buzzword, really, just basic business economics. Round-robin DNS is a kludge at best without a bit of hacking. Real load balancing solutions provide much more peace of mind. But round-robin is cheaper, I'll give you that. Oh, and you seem to be missing the point of multiple NICs. It's not really for NIC failure, although that is an added benefit. It's for upstream switch failure. Each server has multiple NICs going to multiple access switches (Cisco terminology), going to multiple distribution switches, etc... Remember, servers aren't the only thing to think about when building a redundant network.
I'm pretty sure that can be cleared up with a couple of scripts. If you can't find the answer here, try debian.org, or "insert Linux flavor here.org". Usenet also has its considerable collection of Linux gurus... Linux, since it was a toddler, has been designed to run on both mainstream hardware and the equivalent of a motherboard with a hole shot in it. With the multiple ports and flavors of Linux, the answer is probably out there, and quite probably free. I would beware of network consultants complaining about Linux and/or FreeBSD hashes. Either they haven't got the understanding to utilize them at their full potential, they're trying to sell you their preferred brand, or they're trying to nail you for more dough via consultation fees. Linux is extremely progressive, so even if you don't find your solution, it will probably be around within a month or so. krystal_blade
It will be easy to motivate our fellow man; there is hardly anything people treasure more than not being annihilated.
We here at Mission Critical Linux are coming out with a high availability product in June. Without trying to be too commercial sounding, here's the rub: If all you want is load balancing at the HTTP level, then LVS is great (well packaged in the forms of RedHat Piranha, UltraMonkey from VA and TurboLinux). It is also available as misc parts from the previously mentioned linux-ha related sites.
The problem that LVS doesn't address is how to handle the dynamic content tier of an e-commerce site. For example, this may consist of a back end Oracle database running on a redundant pair of storage servers (Linux systems w/ attached shared SCSI storage). In this configuration the collection of HTTP servers would access the Oracle database via network based libraries such as JDBC or functional equivalents. In order to provide high availability Oracle database services, you have a pair of nodes in which an individual instance of the database is accessed by one system in the pair. If you have multiple databases, the pair of storage servers are in an active-active configuration.
Numerous product offerings out there on Linux claim to do failover HA of an Oracle database, but they do not offer high data integrity guarantees to correctly handle network partitioning and node hang scenarios. Consequently, if you read into their white papers, they say that you are vulnerable to data corruption. Ouch! That's where our product (coming out in June) comes in. In addition to the usual host monitoring over redundant network and serial connections, we also implement a full quorum based scheme monitoring over the shared SCSI bus. Additionally, we use remote power management to ensure that all single point of failure scenarios can have service failover with complete data integrity guarantees. We're the same people who implemented Digital's TruCluster products.
Take a look at Enfusion - a product supported by TurboLinux. http://www.turbolinux.org
Commercial support is available.
IBM is committed (so I hear) to supporting the same failover strategy on Linux as it does on AIX.
There have been a number of failover tools listed on http://www.freshmeat.net in the past few months, including failoverd.
Conclusion: Get another contractor.
Many cards can have their MAC address set. Linux ethernet drivers support that where available.
http://www.us.buy.com/comp/product.asp?sku=10160870
This is D-Link's 4-port 10/100 NIC. It has Linux drivers, and it's only $165.
So, in the telephony industry, this is a big requirement... The problem is that you don't want the outage of a single network link to take your machine out of service, or to interrupt any existing transactions.
So, for example, if you have a TCP stream going to a specific NIC and the link between the NIC and the switch gets cut, or the NIC fails or something, then you need to be able to continue the same TCP stream on a second interface.
You end up with several issues. First, on a lot of NICs it's not that easy to figure out when the card is having problems. Second, the second NIC typically has a different hardware address, so you need to update the ARP cache of any machine sending to you. And third, you have to figure out how to tell when the first NIC is working again.
The SBus QFE part may have been space constrained (SBus cards are small), which will bump the price a little. Multi-port PCI NICs normally need a PCI bridge part (actually it's been a while since I bought one, maybe they do it all in one multifunction PCI chip now), which pushes the cost up a little too.
But the big reason is economy of scale. It costs a lot of money to design a product, document it, write drivers, set up distribution channels, and so on. That cost is mostly fixed regardless of how few units you sell.
Contemplate the following example:
Assume for the sake of argument that it costs $1,000,000 to design a PCI board. Now assume I make a 4-port ethernet card (with a parts cost of $40), and you make a one-port ethernet card (with a parts cost of $10). Also assume there are (only) 1,000,000 people on the earth (and all want to be in on the big LAN party). Some of them are uber-geeks and will buy the 4-porter so they can have a 4-porter. Some want a "reliable gaming experience" and will buy the 4-porter because they have 3 more ports if the first fails. Some want to run the LAN server and need more bandwidth. In all, 100,000 people are interested in my product and 900,000 in yours. To exactly cover our costs you need to charge $10 for the parts and a bit over $1 for the "overhead" -- an $11 price. I have to charge $40 for the parts and $10 in overhead -- a $50 price.
A lot of the people who wanted a "reliable gaming experience" are now swayed by your argument that they can buy two cards and get "enough" reliability. Or even 4 of yours ($44), and an extra $6 to buy another ethernet cable in case theirs breaks! A few more are swayed by the argument that $50 is a lot to pay for a network card, look over there, a $10 card. Maybe they should keep the rest of the money, or buy a new game, or save up for a monkey. Soon only 10,000 people want my card. Your overhead drops a little (it is still about $1), but mine rockets to $100!
With a $140 price tag even the uber-geeks start rethinking, and decide maybe they would rather show their geekiness with a $130 EFF contribution and a nifty EFF bumper sticker on the side of their case.
That's when things really start to suck: only the 5 guys holding the LAN party that need my card are now interested in it. The price rockets to $200,040. At that price the 5 guys will spend a long time trying to figure out a way to do the whole gig without my card. In the end maybe they just charge everyone on the planet $5 to get into the LAN party and end up with "free" cards.
There are lots of little things wrong with this example: the guys running the party could probably use 4 of your cards at once, there are more than 1,000,000 people, the overhead costs can vary from product to product, and some people will buy even seriously overpriced goods. But I think it does go a long way towards showing why a Sun QFE costs $1,500 and an Intel Ether Express 100+ is $25.
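For anyone who wants to check the arithmetic in the example above, here is a minimal sketch. The function name and the scenario labels are made up for illustration; the only inputs are the numbers from the post (a $1,000,000 design cost, $10 and $40 parts costs, and the shrinking buyer counts).

```python
def break_even_price(parts_cost, design_cost, buyers):
    """Unit price that exactly recovers parts plus a fixed design cost."""
    return parts_cost + design_cost / buyers


DESIGN_COST = 1_000_000  # fixed cost from the example, in dollars

# (label, parts cost per card, number of buyers) taken from the post
scenarios = [
    ("1-port card, 900,000 buyers", 10, 900_000),
    ("4-port card, 100,000 buyers", 40, 100_000),
    ("4-port card, 10,000 buyers", 40, 10_000),
    ("4-port card, 5 buyers", 40, 5),
]

for label, parts, buyers in scenarios:
    price = break_even_price(parts, DESIGN_COST, buyers)
    print(f"{label}: ${price:,.2f}")
```

The fixed cost divided by a shrinking number of buyers is what drives the price up; the parts cost never changes.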
For a "top flight consultant" you have a few misconceptions. BSD isn't a new version of the Solaris code. Solaris is NOW a SysV derivative, i.e. AT&T code. BSD is BSD. As for the older versions of SunOS that you are thinking about, they separated from the main BSD tree a LONG time ago.
As for offering your customers a product with a company that stands behind its guarantee - you're giving them MS? Why? That is pure FUD. Did you hear about the court ruling handed down a couple of weeks ago, where the software supplier was held immune due to the "we don't guarantee this software for any use" clause in the shrink-wrap agreement? That pretty much leaves the concept of a "Big company" being needed out in the cold.
Have you compiled your kernel today??
Well, what I've gathered from the current discussion is somewhat of a lack of direction. So here are a few things to consider and answer before going forward:
1) Do you need high availability of 1 machine (i.e. 99+% uptime of a single machine)? If the answer is yes, then clustering is the way to go. But doing that right is very expensive (hardware, software).
2) Does it make sense to have a farm of identically configured machines? If you're using Linux / FreeBSD as your web servers and if you only run web servers on them, then you can get away from clustering proper and just throw a ton of machines at the problem, i.e. a farm of web servers.
3) Sounds like the consultant has the right idea with the "expensive Cisco hardware" in making sure Layer 2 is fully redundant. Good step forward. Now ya just need to make sure the hardware that is connected to it will utilize it. Does it?
4) If you're running Solaris, then Alternate Pathing becomes your friend (especially with Quad Fast Ethernet cards), as well as Dynamic Reconfiguration. Are you, or is this a moot thread?
5) Overall, what are you trying to accomplish? Uptime of hardware, uptime of the application, or raw uptime of the web servers? If you've got a setup like /., then clustering proper is not done. As you can see, they just load balance 3 web servers, and then dedicate a box for ads, a box for the database, and a dedicated image box. And we know how little /. is down...
Basically, that's pretty much it. Personally I wouldn't bother with clustering or complicating the web servers that much; I'd cluster the back-end supporting stuff for the web farm, i.e. the back-end database, fully redundant hardware, alternate paths and so on. And then let Cisco's LocalDirector take care of load balancing and checking whether the web server is up or not. (From what my network guy at work tells me, it can do that. I won't personally believe it until I see it.)
"If you insist on using Windoze you're on your own."
The word used to be "supplier".
Oh well at least I'm not seeing "architect" used as a verb anymore. I was just itching to shoot someone then.
I've finally had it: until slashdot gets article moderation, I am not coming back.
I worked on a distributed systems project that needed high reliability LAN connections. The solution we used was a custom NIC that had two Ethernet interfaces and a 68000 with 512K of RAM on a single PCB. Each system broadcast heartbeat messages on all attached LANs. If a primary LAN failed or became partitioned, all systems automatically switched to the backup LAN. This was transparent to the processes sending and receiving data on the network since the NIC routed packets to the Ethernet interface designated as active by the system's LAN monitoring and failure detection software.
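As a rough illustration of the monitoring side of that design (not the original firmware, which I obviously can't reproduce here), the sketch below tracks the last heartbeat seen from each peer on each LAN and picks the first LAN on which every peer is still being heard. The class name, the 3-second timeout and the LAN/peer labels are all assumptions for the example.

```python
class LanMonitor:
    """Pick the active LAN from per-LAN heartbeat freshness (illustrative only)."""

    def __init__(self, lans, peers, timeout=3.0):
        self.lans = list(lans)    # ordered by preference: primary first
        self.peers = set(peers)   # systems we expect to hear from
        self.timeout = timeout    # seconds before a peer counts as silent
        self.last_seen = {}       # (lan, peer) -> time of last heartbeat

    def heartbeat(self, lan, peer, now):
        """Record a heartbeat received from `peer` on `lan` at time `now`."""
        self.last_seen[(lan, peer)] = now

    def lan_healthy(self, lan, now):
        """A LAN is healthy only if every peer was heard on it recently.

        A partitioned LAN (some peers heard, some not) counts as unhealthy.
        """
        return all(
            now - self.last_seen.get((lan, peer), float("-inf")) <= self.timeout
            for peer in self.peers
        )

    def active_lan(self, now):
        """Return the most preferred healthy LAN, or None if all have failed."""
        for lan in self.lans:
            if self.lan_healthy(lan, now):
                return lan
        return None
```

Because every system runs the same rule on the same heartbeats, they should all end up on the same backup LAN without any extra coordination, which is roughly the behaviour described above.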
Mea navis aericumbens anguillis abundat
I could give you a detailed rant style answer but I think it is not worth it.
Most root DNS servers, primary mail relays, etc. use exactly what I said. And there is no such thing as what you said. Been there, done that.
Please get a clue.
Solutions using routing protocols cause serious trouble if and only if designed and implemented by Minesweeper Consultants and Solitaire Experts.
Baker's Law: Misery no longer loves company. Nowadays it insists on it
http://www.sigsegv.cx/
I have to remind you - you do not use physicals. Apache listens on loopback only. So the client retransmits, it goes via the other interface and you have no problem. Session is alive.
talk to (resp. across) a small set of routers (or routing protocol using hosts).
Correct. You talk to two routers, or just to different ifaces on one router, that connect you to the backbone (via different layer 2 devices - switches or hubs). And from there on to the entire internet.
In a similar internal corporate scenario you talk to the routers or the RSM on the switch that separate the servers from the lusers.
I can give you a number of examples where it won't work at all.
Yeah, sure. I have seen a gazillion b0rken network designs written by experts. Most of them with a minesweeper and/or solar certificate. I am not being biased, but core networking is not a subject in either of these certifications. Officially, core network support in Slowarez has a "to be or not to be" status in Sol 8. Check the zebra archive for details. With minesweepers it is not even considered.
You don't happen to post in certain de newsgroups ... ? This somehow sounds ... familiar
No. Never used news. But I am not the only BOFH around.
The problem is that a bunch of karma w** who are out of their depth have immediately flooded the article with comments about Piranha, clusters and other irrelevant things. The question is about failover in case of link failure. The consultant thought of winhoze and chose layer 2. You have a unix system. Unix knows about routing and IP. Hence what you need is a layer 3 solution. For example:
http://slashdot.org/comments.pl?sid=00/05/21/1853216&cid=90
Juniper, 3Com, and Alcatel were at least working on it for a time in 1999. Yeah, that sounds like "just Nokia". :)
HSRP is a hacked version of VRRP v1. Where do you think they got the ideas from???
And no, I don't work for the IPRG group. I've got some friends who used to, and one that still does, but no, I don't work for them.
--
The unsig!
When the consultant installs a network that is clearly not designed for the needs of the company (i.e. supposedly requires special hardware and drivers that the consultant doesn't know how to integrate with your core product) you are being taken for a ride by people with little knowledge and less moral backbone.
If you need multiple ethernet interfaces on a machine they should be separate cards for robust redundant failover. I run 12 linux boxes with 4 ethernet cards in each; my /etc/lilo.conf files look sort of like this:
boot=/dev/sda
map=/boot/map
install=/boot/boot.b
prompt
timeout=50
image=/boot/vmlinuz-2.2.5-15smp
    label=linux-smp
    append="ether=0,0,eth1 ether=0,0,eth2 ether=0,0,eth3"
    root=/dev/sda8
    initrd=/boot/initrd-2.2.5-15.img
    read-only
The append line activates my additional ethernet cards, all of which are 3com 100bTs using Donald Becker's excellent open-source drivers.
Combining this with round-robin DNS using the latest ISC BIND code, you can get incredible fault tolerance at a very low cost. You can even do IDE RAID (hard or soft) if you are too cheap for SCSI, and you can use rsync to keep your servers cloned.
Unless your application is extremely unusual and non-wwwebby, you can accomplish what you need without any expensive Cisco stuff or fancy double-headed cards at all. The consultant is taking you to the cleaners due to greed or a total lack of competence.
--Charlie
For a big company name I'd have recommended SCO, not Microsoft.
HP is also good, but my personal bias prevents me from recommending them for software solutions.
I don't actually exist.
Without RedHat's Piranha package. ;-)
Well.. if you say "Here's what the consultant told us was the solution to our problem"..
where's the solution? Was he just speaking theoretically?
A dual port NIC sounds strange, especially with this behavior. From a networking point of view, though, this makes sense.
Sure, a dual port NIC will help you, *if* it's set up to get around transceiver failure by bringing up the other port.
Two NICs would be better, where the box itself could attempt to configure and use the other NIC if it loses network connectivity.
An even better (and more obvious?) solution is to have two computers..... complete redundancy.
One of the sweet things that most people don't know is that you can hot swap memory and CPUs with the Dynamic Reconfiguration feature. This is only on Sun hardware though.
However, with Solaris 8 (and maybe 7) on x86 hardware, you can hot swap PCI cards. This feature under Linux would be a huge win for the OS.
Need Free Juniper/NetScreen Support? JuniperForum
I have an alpha implementation of VRRP for Linux that I'll be GPLing within the next week or so.
We're using it and it seems to work very well.
Currently for 2.2.x only.
Watch for announcements.
This is not something that's *in* any OS, unless Sun's added it into Solaris in S8. (Could be, I don't get to play with Suns anymore... sniff...)
Although I'm sure the options have changed some since I was fully up on this stuff about three years ago, there were only a handful of failover options at that point, and only one of them worked really well.
That one (Qualix's FirstWatch), interestingly, was in reality a bag of (very good) scripts, which implemented a heartbeat function and, when it detected something wrong, would down the interfaces, re-plumb them if necessary, reset addrs, and up them again. Although it's worth the money they charge, if you're in a serious DIY mode, there's no reason you couldn't write such scripts yourself, and there are almost certainly some already out there, probably as part of the Linux HA project.
Oh, and as an aside, I would stick with the script-based solutions whether you build or buy: they're more reliable, and they leverage the OS better than the proprietary methods. (Qualix's main competitor back when I worked for Sun, consulting for customers on such things, was OpenVision HA, which was a huge, slick, impressive monolith of GUI binaries that had a well-earned reputation for leaving a trail of dead bodies behind it. FirstWatch, on the other hand, was simple and unimpressive in a demo, but it just worked, and worked well, in the real world.)
Qualix was bought by Veritas a few years ago - check with them if you want a decent supported package. (And let's face it - HA is certainly one area where it may not pay to roll your own, since a failure in the HA system in production would be a serious career-limiting move...)
"The future's good and the present is nothing to sneeze at." - Roblimo's last
Let's see here...either (1) you've never worked in a corporate environment where you've had to deal with consultants or (2) you're a consultant yourself and "resemble that remark." From the (admittedly limited) experience I have with them, the original poster's remarks were on-target, though. Those who can, do; those who can't, consult.
It's not an "anti-corporate" bias; it's an "anti-moron" bias. :-)
20 January 2017: the End of an Error.
Our company uses these nifty machines for load balancing and fail-over. They are basically x86 based machines running FreeBSD and some proprietary software. They also have the important things in life like 2 NICs and a nice rack mounted chassis. It is a bit pricey, but you get a very useful manual, support, someone to blame when one is on fire, etc. Most importantly, it works...
One thing... Make sure you're plugging it into 120VAC. The power supply gets very unhappy if you don't... You learn these things when someone labels a 240VAC strip as 120...... Go figure.
I'm not sure, but if true, I find that prospect somewhat revolting. It's a basic admission that companies care more about money than about quality. Usually smaller companies are okay, but the big conglomerates make me skeptical of the good of capitalism in the big picture.
make world, not war
But you said, a few posts up, that your customers want a "tried and tested platform backed by a company that truly cares about their satisfaction." But now you imply that your company doesn't truly care about their satisfaction, but only about their money. Which is it?
I assume you care only enough about their satisfaction as it will bring in the dollars. Ie, you want to barely keep them satisfied enough, such that they'll buy more products. Such is capitalism at its extreme. You choose money over product quality.
Hahaha, a sleazy capitalist fearing his/her eventual demise. Anyway, doesn't this 'company' you speak of truly care more about its shareholders than about its customers' satisfaction?
> I just called up 3com and said, "Please send me
> two of those pieces of hardware with
Well that's nice. Look, I have no use for these
things myself. I don't know what the product is
called, I never bought one. I was simply trying
to offer an idea and point in the right direction.
I never claimed to be able to do more.
I probably could find out the name of the product,
but not in the time frame where it would matter
wrt slashdot comments.
> I think you want two servers with the same RAID
> array....[snip]
Yup...a very good way to do it...I agree (of
course it doesn't handle the RAID array itself
having a catastrophic failure...but given the
redundancy in a good array, that should be more
rare than a system blowing)
>> Of course, why thats even needed is beyond me.
> apparantly..
Thank you for changing the order of what I said
so that it looks like I said something different
than I did.
If you were to look at my original comment, I said
this about the case of SIMPLE ethernet line
failover NOT the redundant servers case.
-Steve
"I opened my eyes, and everything went dark again"
Sorry to be a bit off-topic, but there is a reasonably priced 4-port ethernet solution out there. Compex, Inc. makes a quad port ethernet card (P/N FL400TX/PCI) that sells for $189.95 on buy.com. Looks like it's out of stock right now though.
I purchased one of these for our server (Linux based of course) here at work and have been quite happy with it. I'm using it for subnetting our network (vs. fail-over network links.)
B2B == We sell to businesses. Why can't they just say it?
Will I retire or break 10K?
Your comment is very informative; however, what he's really talking about doing is implementing Cisco Catalyst switches that use HSRP (Hot Standby Router Protocol) and load-balancing in order to give you twice the throughput, without using two separate subnets. This is the preferred and desirable way to implement high availability and layer 2 redundancy. It can be done on both Solaris and NT (don't know about Linux). The point most people are missing here is that it is preferable to do this in hardware, as opposed to software, because the hardware tends to be more reliable. I would trust Cisco IOS to handle my redundancy much more than even Unix (although Unix is very stable). I think most people are answering the wrong question. Now that he has all of this hardware, how does he use it? I would be interested in hearing if there are any devices or device drivers that allow you to do this in Linux.
"When the president does it, that means it's not illegal." - Richard M. Nixon
I worked on an HP-mini, and it used a similar setup. Basically, the two *identical* minis shared a SCSI bus with redundant media. The backup mini would ping the other one over the SCSI bus, and if it didn't get a response it would take the IP of the first one. Worked damn well.
The only drawback is that the backup isn't doing anything but issuing a ping, mirroring the system RAM in machine 1, and waiting. The upside is that short of a missile strike, you had very high reliability. Most failures didn't even cause a pause.
I don't see any problem with using the same method with more systems, though a cluster starts to look attractive after a while.
A firewall can not protect you from yourself. Turn off what you do not need. Do not use the firewall to do your work.
I recently attended the SGI University, and besides receiving a little Tux doll, I sat through some fairly inane discussions about SGI. BUT, this may help you out... they had a session on clustering for high availability. They mainly spoke about being able to hide planned downtime by using 'High Availability, Mission Critical' clusters where if one failed (or was shut down for maintenance) the other would automatically pick up the slack... they also spoke of this with regard to Beowulf clusters... you may want to check out their open source area also, and see if any of this software is publicly available yet. It does sound like they are heading in the right direction with getting into Linux...
Hope this helps
regards,
Benjamin Carlson
"If voting could really change things, it would be illegal. " - Revolution Books, NY
This feature isn't needed for all fail-over schemes, but it does exist for those schemes which use it.
Sounds like a Cisco LocalDirector or one of its competitors could do the trick
The main reason for AP is for the DR or Dynamic Reconfiguration feature. If you've got three system boards, then you can have some redundant hardware so that you can take down and remove a system board *while the OS is still running*, and keep your network connection going without missing a beat. (Same for disk.) Neat stuff.
I wrote software for Solaris (which, as others have pointed out, does not do this without 3rd party software) because we found that no solution would fit our needs well. When looking at the price tag we concluded that we could do better. (Come to think of it, there is a High Availability Solaris, but it isn't cheaper or better than 3rd party stuff.)
Basically we ping something on the other side of the router every 5 seconds, and if the ping doesn't come back we switch to the other port. That is the overview, but you need to do some more isolation before you blindly switch ports.
I strongly recommend you put in some other path between you and the box you are pinging. Several times we have been bitten when the box we were pinging went down and not the router, or alternatively the network was so busy the ping didn't get through within our timeout period.
There is no portable way in Solaris to tell if one ethernet port has signal. You can find out from some drivers, but when you change to a different ethernet card you have to do something else to find out.
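To make the "more isolation before you blindly switch ports" part concrete, here is a minimal sketch of the decision logic only. It is not the Solaris software described above: the class name, the `probe(port, target)` callback, the use of several ping targets, and the `max_misses` threshold are all assumptions for illustration. The caller is expected to supply a real probe (for instance one that runs ping through a given interface) and to call `tick()` every 5 seconds.

```python
class PortFailover:
    """Decide which port should be active, based on reachability probes."""

    def __init__(self, ports, probe, targets, max_misses=3):
        self.ports = list(ports)      # e.g. ["eth0", "eth1"] (hypothetical names)
        self.probe = probe            # probe(port, target) -> bool, caller-supplied
        self.targets = list(targets)  # several hosts, ideally on different paths
        self.max_misses = max_misses  # consecutive failed rounds before switching
        self.active = 0               # index into self.ports
        self.misses = 0               # consecutive rounds with no reply at all

    def tick(self):
        """Run one probe round and return the port that should be active."""
        port = self.ports[self.active]
        if any(self.probe(port, target) for target in self.targets):
            # At least one target answered, so the link itself is fine even
            # if one particular target happens to be down.
            self.misses = 0
        else:
            self.misses += 1
            if self.misses >= self.max_misses:
                # Nothing reachable for several rounds in a row: blame the port.
                self.active = (self.active + 1) % len(self.ports)
                self.misses = 0
        return self.ports[self.active]
```

Probing more than one target covers the case where the pinged box dies but the router is fine, and requiring several consecutive misses covers the case where a busy network drops a single ping.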
including
Piranha
--
and when the dual port nic dies and takes both interfaces with it you switch to psychic networking? i'd really suggest two nics not a single dual one.
that said, linux can do routing. why not set up a loopback device and then have it route through either nic? ip was designed to deal with multiple routes, why must your consultant reinvent the wheel? (loopback addresses are published, so they'll be seen on the network)
US Citizen living abroad? Register to vote!
I hate saying "I hate to tell you this but" when I really enjoy telling someone something.
So I won't lie.. I'm happy to tell you....
This may not be the right solution.
The problem isn't the consultants. They know their stuff, otherwise you wouldn't be paying them.
They come with years of experience and biases.
Expect a consultant who isn't friendly with Linux to dig up a solution that will not work on Linux.
Yes, Solaris can do a lot of great things Linux cannot. In the end Linux has one great advantage and that's price. Source code and quick security patch releases are a bonus.
This rule holds for an NT shop....
And don't put it past a consultant to find a feature Solaris doesn't have. You're talking with some of the greater freelance talent in some cases, and if a defect is to be found they can find it. Once found, that defect becomes a case for switching to something the consultant likes.
So once you pick a platform for your shop, pick a consultant who is buddies with your choice. If he then recommends something else, it will be after bleeding dry all possibilities.
This may not be the only way to do it...
and... it may be the worst way to get it done...
Linux has its limits, don't get me wrong, but you can always find more than one solution. Linux may support 4 out of 8 solutions.. if you're only presented 1... there may be a reason...
I don't actually exist.
One of the things that I have found is that OS level failover doesn't always work or will have odd problems. If you are looking for Enterprise level uptime then cobbling together a solution such as this is not for you. The company I work for uses a Cisco LocalDirector to do the work for it. What's great about this sort of solution is that a LocalDirector will round robin, do failover, and do such a dizzying array of other things that it's wondrous. I would suggest you look into this solution or one similar.
Your question: "Here's what the consultant told us is the solution to our problem. Where can we get the hardware?"
What you want to know: "Here's our problem. Here's the solution the consultant came up with. What improvements can /. suggest?"
For instance, why do you need dual-port NICs? If it's just for the throughput, why not just use 2 single-ports? This also provides redundancy in the hardware department.
--
Have Exchange users? Want to run Linux? Can't afford OpenMail?
Linux MAPI Server!
http://www.openone.com/software/MailOne/
(Exchange Migration HOWTO coming soon)
I believe Juniper and 3Com also support the use of VRRP.
You may not be able to implement HSRP without paying Cisco a license fee. I'm not sure if anyone has approached Cisco from an open-source viewpoint though.
As for a public implementation - I should have a Linux VRRP implementation out this week.
Say you have a machine with two dual port NICs or even two NICs. Have a script that checks the main network interface every ten seconds. If the main interface becomes unavailable, unload it and load the second interface with the same IP and reset all of your routing information.
If you have a server with that kind of "need" then maybe you should consider having a better routing setup altogether. Consider how www.netscape.com will actually resolve to several IP addresses. The options are numerous. The main point is that Linux works just fine. (Although FreeBSD has sexier networking.)
> We are a growing B2B company;
Good to know that you are buzzword compliant...
I understand that's very important to some people,
and if I ever figure out who those people are, I
will probably avoid them like the plague.
As for failover...check out 3com....long ago a
man (who would later go on to teach Unix courses
at WPI and be one of the best teachers I ever
had for anything) designed a piece of hardware
with 1 ethernet port on one side, and 2 on the
other...it was designed to do JUST THAT.
Completely in hardware. He did it for a company
that was later bought out by 3com...he claimed
(a couple of years ago, when I was in his course)
that they still sell the product that he designed.
Of course, why that's even needed is beyond me.
For better redundancy, you really want separate
redundant servers, each with RAID arrays and
probably a couple of localdirectors (or round
robin DNS for a cheaper solution) directing
connections between them (giving both failover and
increased availability) but...that's just IMNSHO.
After all, if a CPU fries, or a power supply starts
letting its magic smoke out...all the dual port
NICs in the world won't help.
"I opened my eyes, and everything went dark again"
Making the assumption that you want a web-farm, and that for maximum availability you are using two Cisco switches, two Cisco PIXes, and two Cisco LocalDirectors... you can still get away with a single NIC (not that you have to): put half of your web-servers on one switch, and half on the other.
Otherwise, you can put two NICs in (one on each switch) and assign them each their own IP address... no need to fail over... although I would look at the F5 BIG/IP, as it can make sure that your servers are serving up content... the LocalDirector isn't as good at this.
BlackNova Traders
I think Piranha does just this. Byte just ran a look at Linux HA clusters: http://www.byte.com/column/BYT20000510S0001
http://support.intel.com/support/network/adapter/pro100/30504.htm That is the dual port adapter that comes in Dell servers.... Has anybody used this before? D
Building Linux Clusters is just what you should read.... Unfortunately it won't be out until August.
Everything you need is at High Availability Linux.
I too am building/have built a B2B exchange on the Linux platform and found JServ to be *INCREDIBLE* at HA/failover-safe features.
As for the 2 network cards for each machine, that too is a *VERY GOOD* thing. It allows you to partition out your network traffic to achieve much better response time. For example our network has 2 NICs in each machine. There is a "Web Server to Database" network, there is a firewall to webserver network, and we have a separate network for office web surfing and misc stuff like that. Access to the "WebServer to Firewall" network is handled across the router.
One thing to keep in mind when dealing with DB aware web applications is that unless your code is *VERY POORLY* written, the biggest bottleneck will be network latency.
-Grimsaado
When Linux was cranking up last year, the folks at TurboLinux sales called up promoting their fault tolerant cluster solution. Although it's not free, you could get a timed demo - so Linux solutions exist.
Just a general observation - Linux is pretty well fleshed out with about anything you can think of in one form or another. It just isn't chasing you down with in-your-face ads and high pressure sales promos like other commercial products, so it may appear to be deficient, but more often than not just a few days (for us slow pokes) of searching and trials will turn up an inexpensive quality solution in some stage of development hidden somewhere.
try { do() || do_not(); } catch (JediException err) { yoda(err); }
Has anyone considered VRRP (Virtual Router Redundancy Protocol)? It's an actual open standard, and it works. It not only works, it works amazingly well.
One of the major users of VRRP technology is Nokia. They've done extensive work on the protocol, and use it in their line of firewalls (which btw run a heavily modified FreeBSD codebase).
VRRP uses multicast packets that are similar to OSPF "Hello" packets to let the partner(s) know it is alive. If the primary machine dies, the backup instantly takes over. When the takeover happens, it not only assumes the IP address of the dead machine, but it also answers for the MAC address of the dead machine.
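For the curious, here is a small sketch of two of the protocol details mentioned above, using the values I recall from RFC 2338 (treat them as assumptions and check the RFC before relying on them): the virtual router answers for a well-known MAC of the form 00-00-5E-00-01-{VRID}, and a backup declares the master dead after three missed advertisements plus a priority-based skew. The function names are made up for the example.

```python
def virtual_mac(vrid):
    """Virtual router MAC address for a given VRID (1-255), per RFC 2338."""
    if not 1 <= vrid <= 255:
        raise ValueError("VRID must be in the range 1-255")
    return f"00:00:5e:00:01:{vrid:02x}"


def master_down_interval(priority, advert_interval=1.0):
    """Seconds a backup waits without advertisements before taking over.

    Master_Down_Interval = 3 * Advertisement_Interval + Skew_Time,
    with Skew_Time = (256 - Priority) / 256, so that higher-priority
    backups time out slightly earlier and win the takeover.
    """
    skew_time = (256 - priority) / 256
    return 3 * advert_interval + skew_time


print(virtual_mac(1))
print(master_down_interval(priority=100))
```

With the default one-second advertisement interval, "instantly" in practice means a little over three seconds, which is still short enough that most TCP sessions survive the takeover.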
--
The unsig!
"...dual-port NICs...switch the ports when the active port fails...
Oh, I see. When one port (or its path) fails, you want to switch the IP to a different port? I don't think "the driver" needs to do that, just change the IP assignments with ifconfig.
My experience with consultants is that a good many of them are clueless. The reason they're consultants is they can easily BS the customer into believing they know what they're talking about long enough to bleed you dry. They may even provide you an actual solution that may even kind of work but which is patently the worst way to do what you were wanting to do. Then when you DO get someone in who knows what he's doing, that guy will have to spend twice as long beating your company into shape because he has to go back and undo everything the previous one did.
I'm trying to teach myself to set people on fire with my mind... Is it hot in here?
What kind of application are you trying to fail-over? A database? A web-server?
If you wanted a web-farm, that's dead stupid easy. Fail-over database/ftp/nfs isn't too hard, but (presently) requires commercial software. Polyserve Understudy, Wizard Watchdog, or even RSF-1 are just some of the HA clustering products available.
BlackNova Traders
A non-kernel invasive version of this would be a script that configures one port with the desired IP/mask and creates the default route. It then puts the other port in promiscuous mode and monitors it for traffic using a libpcap based program or even a possibly modified tcpdump. As soon as it sees any traffic, it switches the configuration and starts monitoring the other port. This could probably be written in 2-4 hours given a network to test it in.
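A rough Python sketch of the control logic described above, under several assumptions: the interface names, addresses, and the `traffic_seen` callback are hypothetical, and the actual packet sniffing (libpcap or tcpdump) is left out and represented only by that callback. The commands are the classic net-tools `ifconfig` and `route` invocations; this only builds the command strings and does not run them.

```python
from typing import Callable, List, Tuple


def config_commands(active: str, standby: str, ip: str,
                    netmask: str, gateway: str) -> List[str]:
    """Commands that put the service IP on `active` and leave
    `standby` unaddressed in promiscuous mode for monitoring."""
    return [
        f"ifconfig {standby} 0.0.0.0 promisc up",
        f"ifconfig {active} {ip} netmask {netmask} -promisc up",
        # Remove any old default route first (fails harmlessly on first run).
        "route del default",
        f"route add default gw {gateway} dev {active}",
    ]


def step(active: str, standby: str,
         traffic_seen: Callable[[str], bool]) -> Tuple[str, str, bool]:
    """One monitoring pass: swap roles as soon as the monitored
    (standby) port sees traffic, as the post describes.
    Returns (active, standby, switched)."""
    if traffic_seen(standby):
        return standby, active, True
    return active, standby, False


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    active, standby, switched = step("eth0", "eth1", lambda iface: iface == "eth1")
    if switched:
        for cmd in config_commands(active, standby, "192.168.1.10",
                                   "255.255.255.0", "192.168.1.1"):
            print(cmd)
```

A real script would loop over `step`, hand each command to a shell, and probably add some damping so the ports do not flap back and forth.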
For a possibly simpler solution (i.e. no code to write), use a pair of additional Linux systems. Configure each of them to load balance with LinuxVirtualServer (aka LinuxDirector), or the Piranha version of it, across as many backend servers as you have, but to different internal IP addresses. Good choices would be 10.* addresses, say 10.0.1.* and 10.0.2.*. Using either a dual-port NIC or two NICs in each server, create two networks, one for each of the load distribution servers. Configure Apache et al. to respond identically on the IP addresses of both networks.
BTW, there are 4-port 100Base-T cards out there, from Adaptec I think.
Good Luck!
Stephen D. Williams
You may want to check out http://linux-ha.org/.
The "heartbeat" application implements node-to-node monitoring over a serial line and UDP and can initiate IP address takeover based on a notion of resources provided by nodes and resource groups. It worked well for me. However, this was only a very basic two-node setup.
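The general idea behind that kind of node-to-node monitoring can be sketched in a few lines of Python. This is not heartbeat's code or configuration; the class, the `deadtime` parameter, and the resource list are illustrative names, and timestamps are passed in explicitly so the logic can be followed without real sockets or serial lines.

```python
from typing import List


class PeerMonitor:
    """Tracks when the peer node was last heard from."""

    def __init__(self, deadtime: float, now: float) -> None:
        self.deadtime = deadtime   # seconds of silence before the peer counts as dead
        self.last_seen = now

    def heartbeat(self, now: float) -> None:
        """Call whenever a heartbeat packet arrives from the peer."""
        self.last_seen = now

    def peer_is_dead(self, now: float) -> bool:
        return now - self.last_seen > self.deadtime


def resources_to_take_over(monitor: PeerMonitor, now: float,
                           peer_resources: List[str]) -> List[str]:
    """Return the peer's resources (e.g. service IPs) to claim, if it is dead."""
    return list(peer_resources) if monitor.peer_is_dead(now) else []


if __name__ == "__main__":
    m = PeerMonitor(deadtime=10.0, now=0.0)
    m.heartbeat(5.0)
    print(resources_to_take_over(m, 14.0, ["192.168.1.10"]))
    print(resources_to_take_over(m, 16.0, ["192.168.1.10"]))
```

Using two independent paths for the heartbeats (serial plus network, as described above) reduces the chance that a single broken cable makes both nodes think the other is dead.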
1. Your consultant should learn routing protocols.
2. Your consultant should learn the concept of a loopback alias.
3. Your consultant should have an IQ above 25.
4. There is absolutely no need for layer 2 (link-layer) failover where layer 3 will do. Unix is not WinHoze. It knows about routing.
So your task list is:
- Configure loopback aliases on the linux boxes.
- Configure apache to listen only on the loopback alias interface.
- Build gated from the rhat sources; they already have the patches for linux-2.2 in. You may use zebra from CVS instead, but it is still a bit off in terms of stability. You may need a script that HUPs gated a few times, as gated does not always start clean and update the routing table on 2.2.x.
- Configure OSPF on gated and on your cisco gear. Distribute the default route into OSPF, as gated from the 3.5.x tree has no IRDP.
- Shoot your consultant
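Why the task list above gives failover without any special NIC driver can be shown with a toy Python model. The service address lives on the loopback alias, the routing protocol advertises it over both physical links, and the router simply uses whichever path is still up. The next-hop addresses and costs below are invented for illustration, and this models only the route choice, not OSPF itself.

```python
from typing import Dict, Optional, Tuple


def best_next_hop(paths: Dict[str, Tuple[int, bool]]) -> Optional[str]:
    """Pick the lowest-cost next hop among links that are still up.

    `paths` maps next-hop address -> (cost, link_up).
    Returns None when no path to the service address survives.
    """
    live = {hop: cost for hop, (cost, up) in paths.items() if up}
    if not live:
        return None
    # Ties on cost are broken by address so the result is deterministic.
    return min(live, key=lambda hop: (live[hop], hop))


if __name__ == "__main__":
    # Two hypothetical paths to the same loopback-alias service address,
    # one through each switch.
    paths = {"10.0.1.5": (10, True), "10.0.2.5": (20, True)}
    print(best_next_hop(paths))          # path via the first switch
    paths["10.0.1.5"] = (10, False)      # first link or switch fails
    print(best_next_hop(paths))          # traffic moves to the second path
```

Because apache listens on the loopback alias rather than on either NIC's address, clients keep using the same IP whichever path is chosen.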
BTW: your bill is $500.
Baker's Law: Misery no longer loves company. Nowadays it insists on it.
http://www.sigsegv.cx/
You could go with an expensive commercial solution like BigIP from F5, but those will run you at least $30k or so. You could also use Polyserve Understudy, which does pretty much the same thing only under Linux, and it's only about $400 or so. If you have all this expensive Cisco equipment and a Cat6000, you can run Local Director on that without buying additional hardware.
However, I suggest:
http://www.linuxvirtualserver.org or
http://linux-ha.org or
http://www.eddieware.org
It all depends on the application you're running. If it's just http, any of these will work, but if it's something else, you're stuck with linux-HA or Linux Virtual Server. Eddie will only do http as far as I know. Plus Eddie uses Erlang, which may affect performance.
Need Free Juniper/NetScreen Support? JuniperForum
You're attacking the wrong problem. What you need first is not Linux failover but consultant failover: Your consultant has failed; you need to switch to a new one instantly.