Linux Failover?
Anton asks: "This is a question about Linux failover in business situations. We are a growing B2B company; our product runs on Apache/Linux. We contracted professional services to properly set up our network. After all the hellishly expensive Cisco hardware had been set up, it turns out that for our servers to be configured for failover, each one needs two dual-port NICs configured for one IP and connected to two different switches; furthermore, the driver needs to intelligently switch the ports when the active port fails ... We've never heard of such beasts for Linux, and a net search revealed nothing. Our consultant, however, claims that 'Linux is biting itself in the foot' by not supporting that, and that other industrial-strength OSes like Solaris do in fact support this. Has anyone run into this before or have other ideas?"
[nik suggests]: Take a look at Polyserve Understudy, which might be an alternative. FreeBSD and Linux versions are available (and bundled with FreeBSD 4.0).
As the other poster said, it depends on the driver. That's the bad news.
The good news is that DECnet requires the driver (and hardware) to be able to change the MAC address. So even most cheap cards can do it, just in case the vendor ever gets the chance to sell to the last shop out there still running DECnet.
Many cards can have their MAC address set. Linux ethernet drivers support that where available.
http://www.us.buy.com/comp/product.asp?sku=10160870
This is D-Link's 4-port 10/100 NIC. It has Linux drivers, and it's only $165.
So, in the telephony industry, this is a big requirement... The problem is that you don't want the outage of a single network link to take your machine out of service, or to interrupt any existing transactions.
So, for example, if you have a TCP stream going to a specific NIC and the link between the NIC and the switch gets cut, or the NIC fails or something, then you need to be able to continue the same TCP stream on a second interface.
You end up with several issues. First, on a lot of NICs it's not that easy to figure out when they're having problems. Second, the second NIC is typically at a different hardware address, so you need to update the ARP cache of any machine sending to you. And third, you have to figure out how to tell when the first NIC is working again.
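One common way to handle the ARP cache issue is a gratuitous ARP: after the address moves to the second NIC, the host broadcasts an ARP packet announcing its own IP with the new hardware address. The poster does not name this technique, so treat the following as a minimal sketch under that assumption. The MAC and IP values are made up, and actually sending the frame (which needs a raw socket and root) is left out.

```python
import socket
import struct

def gratuitous_arp_frame(mac: str, ip: str) -> bytes:
    """Build a broadcast Ethernet frame carrying an ARP request
    in which sender IP and target IP are both our own address."""
    hw = bytes.fromhex(mac.replace(":", ""))
    addr = socket.inet_aton(ip)
    broadcast = b"\xff" * 6

    # Ethernet header: destination, source, EtherType 0x0806 (ARP)
    ether = broadcast + hw + struct.pack("!H", 0x0806)

    # ARP header: hardware type 1 (Ethernet), protocol 0x0800 (IPv4),
    # hardware length 6, protocol length 4, opcode 1 (request)
    arp = struct.pack("!HHBBH", 1, 0x0800, 6, 4, 1)
    arp += hw + addr            # sender hardware and protocol address
    arp += b"\x00" * 6 + addr   # target hardware unknown, target IP = our IP
    return ether + arp

# Hypothetical values for illustration only
frame = gratuitous_arp_frame("00:11:22:33:44:55", "192.0.2.10")
print(len(frame), "bytes")
```

Neighbours that already hold an entry for the IP will usually overwrite it with the new MAC when they see this frame, which is what lets an existing TCP stream continue on the second interface.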
The SBus QFE part may have been space constrained (SBus cards are small), which will bump the price a little. Multi-port PCI NICs normally need a PCI bridge part (actually it's been a while since I bought one, maybe they do it all in one multifunction PCI chip now), which pushes the cost up a little too.
But the big reason is economy of scale. It costs a lot of money to design a product, document it, write drivers, set up distribution channels, and so on. That cost is mostly fixed regardless of how few units you sell.
Contemplate the following example:
Assume for the sake of argument that it costs $1,000,000 to design a PCI board. Now assume I make a 4 port ethernet card (with a parts cost of $40), and you make a one port ethernet card (with a parts cost of $10). Also assume there are (only) 1,000,000 people on the earth (and all want to be in on the big LAN party). Some of them are uber-geeks and will buy the 4-porter so they can have a 4-porter. Some want a "reliable gaming experience" and will buy the 4-porter because they have 3 more ports if the first fails. Some want to run the LAN server and need more bandwidth. In all, 100,000 people are interested in my product and 900,000 in yours. To exactly cover our costs, you need to charge $10 for the parts and a bit over $1 for the "overhead", an $11 price; I have to charge $40 plus $10 in overhead, a $50 price.
A lot of the people who wanted a "reliable game experience" are now swayed by your argument that they can buy two cards and get "enough" reliability. Or even 4 of yours ($44), and an extra $6 to buy another ethernet cable in case theirs breaks! A few more are swayed by the argument that $50 is a lot to pay for a network card when, look over there, there's a $10 card. Maybe they should keep the rest of the money, or buy a new game, or save up for a monkey. Soon only 10,000 people want my card. Your overhead drops a little (it is still about $1), but mine rockets to $100!
With a $140 price tag even the uber-geeks start rethinking, and decide maybe they would rather show their geekiness with a $130 EFF contribution and a nifty EFF bumper sticker on the side of their case.
That's when things really start to suck: only the 5 guys holding the LAN party who need my card are now interested in it. The price rockets to $200,040. At that price the 5 guys will spend a long time trying to figure out a way to do the whole gig without my card. In the end maybe they just charge everyone on the planet $5 to get into the LAN party and end up with "free" cards.
There are lots of little things wrong with this example (the guys running the party could probably use 4 of your cards at once, there are more than 1,000,000 people, the overhead costs can vary from product to product, some people will buy even seriously overpriced goods). But I think it does go a long way towards showing why a Sun QFE costs $1,500 and an Intel EtherExpress 100+ is $25.
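The arithmetic in the example above is just parts cost plus the fixed design cost spread over the number of buyers. A small sketch, using only the made-up numbers from the example (the function name is mine):

```python
DESIGN_COST = 1_000_000  # fixed cost to design the board, from the example

def break_even_price(parts_cost, buyers, design_cost=DESIGN_COST):
    """Price per card that exactly covers parts plus a share of the design cost."""
    return parts_cost + design_cost / buyers

# (description, parts cost, number of buyers)
scenarios = [
    ("1-port card, 900,000 buyers", 10, 900_000),
    ("4-port card, 100,000 buyers", 40, 100_000),
    ("4-port card, 10,000 buyers", 40, 10_000),
    ("4-port card, 5 buyers", 40, 5),
]

for name, parts, buyers in scenarios:
    print(f"{name}: ${break_even_price(parts, buyers):,.2f}")
```

The parts cost never changes; the whole price swing comes from dividing the same $1,000,000 by a shrinking number of buyers.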
For a "top flight consultant" you have a few misconceptions. BSD isn't a new version of the Solaris code. Solaris is NOW a SysV derivative, i.e. ATT code. BSD is BSD. As for the older versions of SunOS that you are thinking about, they separated from the main BSD tree a LONG time ago.
As for offering your customers a product with a company that stands behind its guarantee - you're giving them MS? Why? That is pure FUD. Did you hear about the court case that was handed down a couple of weeks ago, where the software supplier was held immune due to the "we don't guarantee this software for any use" clause in the shrink-wrap agreement? Pretty much leaves the concept of a "Big company" being needed out in the cold.
Have you compiled your kernel today??
Well, what I've gathered from the current discussion is somewhat of a lack of direction. So here are a few things to consider and answer before going forward:
1) Do you need High Availability of 1 machine? (ie 99+% of a single machine) If the answer is yes, then clustering is the way to go. But doing that right is very expensive (hardware, software).
2) Does it make sense to have a farm of identically configured machines? If you're using Linux / FreeBSD as your webservers and if you only run web servers on them, then you can get away from clustering proper and just throw a ton of machines at the problem. ie farm of web servers.
3) Sounds like the Consultant has the right idea with the "expensive Cisco hardware" in making sure Layer 2 is fully redundant. Good step forward. Now ya just need to make sure your hardware that is connected to it will utilize it. Does it?
4) If you're running Solaris, then Alternate Pathing becomes your friend (especially with Quad Fast Ethernet cards), as well as Dynamic Reconfiguration. Are you, or is this a moot thread?
5) Overall, what are you trying to accomplish? Uptime of hardware, uptime of the application, or raw uptime of the web servers? If you've got a setup like /., then clustering proper is not done. As you can see, they just load balance 3 web servers, and then dedicate a box for ads, a box for the database, and a dedicated image box. And we know how little /. is down...
Basically, that's pretty much it. Personally I wouldn't bother with clustering or complicating the web servers that much, I'd cluster the back end supporting stuff for the web farm. ie the back end database, fully redundant hardware, alternate paths and so on. And then let Cisco's Local Director take care of load balancing and checking the web server is up or not. (From what my network guy at work tells me, it can do that. I won't personally believe it until I see it).
"If you insist on using Windoze you're on your own."
The word used to be "supplier".
Oh well at least I'm not seeing "architect" used as a verb anymore. I was just itching to shoot someone then.
I've finally had it: until slashdot gets article moderation, I am not coming back.
I worked on a distributed systems project that needed high reliability LAN connections. The solution we used was a custom NIC that had two Ethernet interfaces and a 68000 with 512K of RAM on a single PCB. Each system broadcast heartbeat messages on all attached LANs. If a primary LAN failed or became partitioned, all systems automatically switched to the backup LAN. This was transparent to the processes sending and receiving data on the network since the NIC routed packets to the Ethernet interface designated as active by the system's LAN monitoring and failure detection software.
Mea navis aericumbens anguillis abundat
I could give you a detailed rant style answer but I think it is not worth it.
Most root DNS servers, primary mail relays, etc. use exactly what I said. And there is no such thing as what you said. Been there, done that.
Please get a clue.
Solutions using routing protocols cause serious trouble if and only if designed and implemented by Minesweeper Consultants and Solitaire Experts.
Baker's Law: Misery no longer loves company. Nowadays it insists on it
http://www.sigsegv.cx/
I have to remind you - you do not use physicals. Apache listens on loopback only. So the client retransmits, it goes via the other interface and you have no problem. Session is alive.
talk to (resp. across) a small set of routers (or routing protocol using hosts).
Correct. You talk to two routers, or just different ifaces on one, that connect you to the backbone (via different layer 2 devices - switches or hubs). And from there on with the entire internet.
In a similar internal corporate scenario you talk to the routers or the RSM on the switch that separate the servers from the lusers.
I can give you a number of examples where it won't work at all.
Yeah, sure. I have seen a gazillion b0rken network designs written by experts. Most of them with a minesweeper and/or solar certificate. I am not being biased, but core networking is not a subject in either of these certifications. Officially core network support in Slowarez is considered with a "to be or not to be" status in Sol 8. Check the zebra archive for details. With minesweepers it is not even considered.
You don't happen to post in certain de newsgroups ... ? This somehow sounds ... familiar
No. Never used news. But I am not the only BOFH around.
Baker's Law: Misery no longer loves company. Nowadays it insists on it
http://www.sigsegv.cx/
Very good besides the fact that Layer2 failover has always been less reliable than layer3. If layer2 was better the internet core would not use OSPF and BGP.
So, overall: OSPF instead.
Baker's Law: Misery no longer loves company. Nowadays it insists on it
http://www.sigsegv.cx/
The problem is that a bunch of karma w** who are out of their scope have immediately flooded the article with comments about Piranha, clusters and other irrelevant things. The question is about failover in case of link failure. The consultant thought of winhoze and chose layer 2. You have a unix system. Unix knows about routing and IP. Hence what you need is a layer 3 solution. For example:
http://slashdot.org/comments.pl?sid=00/05/21/1853216&cid=90
Baker's Law: Misery no longer loves company. Nowadays it insists on it
http://www.sigsegv.cx/
Juniper, 3Com, and Alcatel were at least working on it for a time in 1999. Yeah, that sounds like "just Nokia". :)
HSRP is a hacked version of VRRP v1. Where do you think they got the ideas from???
And no, I don't work for the IPRG group. I've got some friends who used to, and one that still does, but no, I don't work for them.
--
The unsig!
When the consultant installs a network that is clearly not designed for the needs of the company (i.e. supposedly requires special hardware and drivers that the consultant doesn't know how to integrate with your core product) you are being taken for a ride by people with little knowledge and less moral backbone.
If you need multiple ethernet interfaces on a machine they should be separate cards for robust redundant failover. I run 12 linux boxes with 4 ethernet cards in each; my /etc/lilo.conf files look sort of like this:
boot=/dev/sda
map=/boot/map
install=/boot/boot.b
prompt
timeout=50
image=/boot/vmlinuz-2.2.5-15smp
label=linux-smp
append="ether=0,0,eth1 ether=0,0,eth2 ether=0,0,eth3"
root=/dev/sda8
initrd=/boot/initrd-2.2.5-15.img
read-only
The append line activates my additional ethernet cards, all of which are 3com 100bTs using Donald Becker's excellent open-source drivers.
Combining this with round-robin DNS using the latest ISC BIND code, you can get incredible fault tolerance at a very low cost. You can even do IDE RAID (hard or soft) if you are too cheap for SCSI, and you can use rsync to keep your servers as clones of each other.
Unless your application is extremely unusual and non-wwwebby, you can accomplish what you need without any expensive Cisco stuff or fancy double-headed cards at all. The consultant is taking you to the cleaners due to greed or a total lack of competence.
--Charlie
For a big company name I'd have recommended SCO, not Microsoft.
HP is also good, but my personal bias prevents me from recommending them for software solutions.
I don't actually exist.
Without RedHat's Piranha package. ;-)
Well.. if you say "Here's what the consultant told us was the solution to our problem"..
where's the solution? Was he just speaking theoretically?
A dual port nic sounds strange, especially with this behavior. From a networking point of view, this makes sense.
Sure, a dual port nic will help you, *if* it's set up to get around transceiver failure by bringing up the other port.
Two nic's would be better, where the box itself could attempt to configure and use the other nic if it loses network connectivity.
An even better (and more obvious?) solution is to have two computers..... complete redundancy.
One of the sweet things that most people don't know is that you can hot swap memory and CPU's with the Dynamic Reconfiguration feature. This is only on Sun hardware though.
However, with Solaris 8 (and maybe 7) on x86 hardware, you can hotswap PCI cards. This feature under Linux would be a huge win for the OS.
Need Free Juniper/NetScreen Support? JuniperForum
I have an alpha implementation of VRRP for Linux that I'll be GPLing within the next week or so.
We're using it and it seems to work very well.
Currently for 2.2.x only.
Watch for announcements.
This is not something that's *in* any OS, unless Sun's added it into Solaris in S8. (Could be, I don't get to play with Suns anymore... sniff...)
Although I'm sure the options have changed some since I was fully up on this stuff about three years ago, there were only a handful of failover options at that point, and only one of them worked really well.
That one, interestingly was in reality a bag of (very good) scripts, which implemented a heartbeat function and when it detected something wrong, would down the interfaces, re-plumb them if necessary, reset addrs, and up them again. Although it's worth the money they charge, if you're into a serious DIY mode, there's no reason you couldn't write such scripts yourself, and there are almost certainly some already out there, probably as part of the Linux HA project.
Oh, and as an aside, I would stick with the script-based solutions whether you build or buy: they're more reliable, and they leverage the OS better than the proprietary methods. (Qualix's main competitor back when I worked for Sun consulting for customers on such things was OpenVision HA, which was a huge, slick, impressive monolith of GUI binaries that had a well-earned reputation for leaving a trail of dead bodies behind it. FirstWatch, on the other hand, was simple and unimpressive in a demo, but it just worked, and worked well, in the real world.)
Qualix was bought by Veritas a few years ago - check with them if you want a decent supported package. (And let's face it - HA is certainly one area where it may not pay to roll your own, since a failure in the HA system in production would be a serious career-limiting move...)
"The future's good and the present is nothing to sneeze at." - Roblimo's last
Let's see here...either (1) you've never worked in a corporate environment where you've had to deal with consultants or (2) you're a consultant yourself and "resemble that remark." From the (admittedly limited) experience I have with them, the original poster's remarks were on-target, though. Those who can, do; those who can't, consult.
It's not an "anti-corporate" bias; it's an "anti-moron" bias. :-)
20 January 2017: the End of an Error.
Our company uses these nifty machines for load balancing and fail-over. They are basically x86 based machines running FreeBSD and some proprietary software. They also have the important things in life like 2 NICs and a nice rack mounted chassis. It is a bit pricey, but you get a very useful manual, support, someone to blame when one is on fire, etc. Most importantly, it works...
One thing... Make sure you're plugging it into 120VAC. The power supply gets very unhappy if you don't... You learn these things when someone labels a 240VAC strip as 120...... Go figure.
I'm not sure, but if true, I find that prospect somewhat revolting. It's a basic admission that companies care more about money than about quality. Usually smaller companies are okay, but the big conglomerates make me skeptical of the good of capitalism in the big picture.
make world, not war
But you said, a few posts up, that your customers want a "tried and tested platform backed by a company that truly cares about their satisfaction." But now you imply that your company doesn't truly care about their satisfaction, but only about their money. Which is it?
I assume you care only enough about their satisfaction as it will bring in the dollars. Ie, you want to barely keep them satisfied enough, such that they'll buy more products. Such is capitalism at its extreme. You choose money over product quality.
make world, not war
Hahaha, a sleazy capitalist fearing his/her eventual demise. Anyway, doesn't this 'company' you speak of truly care more about its shareholders than about its customers' satisfaction?
make world, not war
> I just called up 3com and said, "Please send me
> two of those pieces of hardware with
Well that's nice. Look, I have no use for these
things myself. I don't know what the product is
called, I never bought one. I was simply trying
to offer an idea and point in the right direction.
I never claimed to be able to do more.
I probably could find out the name of the product,
but not in the time frame where it would matter
wrt slashdot comments.
> I think you want two servers with the same RAID
> array....[snip]
Yup...a very good way to do it...I agree (of
course it doesn't handle the raid array itself
having a catastrophic failure...but given the
redundancy in a good array, that should be more
rare than a system blowing)
>> Of course, why thats even needed is beyond me.
> apparantly..
Thank you for changing the order of what I said
so that it looks like I said something different
than I did.
If you were to look at my original comment, I said
this about the case of SIMPLE ethernet line
failover NOT the redundant servers case.
-Steve
"I opened my eyes, and everything went dark again"
Yup, I know all that.
However, you should note that I offered 2 solutions. One of them is almost exactly what he asked for, but implemented in hardware (and, as someone else pointed out, possibly firmware too), which requires no driver software to work (beyond that of whatever existing ethernet card one has).
The other solution, yes, it's a lot more costly. Yes, it MAY not be right for the given situation. However, I felt it should be offered up anyway, to let the person in that situation decide.
"I opened my eyes, and everything went dark again"
Sorry to be a bit off-topic, but there is a reasonably priced 4-port ethernet solution out there. Compex, Inc. makes a quad port ethernet card (P/N FL400TX/PCI) that sells for $189.95 on buy.com. Looks like it's out of stock right now though.
I purchased one of these for our server (Linux based of course) here at work and have been quite happy with it. I'm using it for subnetting our network (vs. fail-over network links.)
B2B == We sell to businesses. Why can't they just say it?
Will I retire or break 10K?
Your comment is very informative; however, what he's really talking about doing is implementing Cisco Catalyst switches that use HSRP (Hot Standby Router Protocol) and load-balancing in order to give you twice the throughput, without using two separate subnets. This is the preferred and desirable way to implement high availability and layer 2 redundancy. It can be done on both Solaris and NT (don't know about Linux). The point most people are missing here is that it is preferable to do this in hardware, as opposed to software, because the hardware tends to be more reliable. I would trust Cisco IOS to handle my redundancy much more than even Unix (although Unix is very stable). I think most people are answering the wrong question. Now that he has all of this hardware, how does he use it? I would be interested in hearing if there are any devices or device drivers that allow you to do this in Linux.
"When the president does it, that means it's not illegal." - Richard M. Nixon
I worked on an HP-mini, and it used a similar setup. Basically, the two *identical* minis shared a SCSI bus with redundant media. The backup mini would ping the other one over the SCSI bus, and if it didn't get a response it would take the IP of the first one. Worked damn well.
The only drawback is that the backup isn't doing anything but issuing a ping, mirroring the system RAM in machine 1, and waiting. The upside is that short of a missile strike, you had very high reliability. Most failures didn't even cause a pause.
I don't see any problem with using the same method with more systems, though a cluster starts to look attractive after a while.
A firewall can not protect you from yourself. Turn off what you do not need. Do not use the firewall to do your work.
I recently attended the SGI University, and besides receiving a little Tux doll, I sat through some fairly inane discussions about SGI. BUT, this may help you out... they had a session on clustering for high availability. They mainly spoke about being able to hide planned downtime by using 'High Availability, Mission Critical' clusters where if one node failed (or was shut down for maintenance) the other would automatically pick up the slack... they also spoke of this with regard to Beowulf clusters... you may want to check out their open source area also, and see if any of this software is publicly available yet. It does sound like they are heading in the right direction with getting into Linux...
Hope this helps
regards,
Benjamin Carlson
"If voting could really change things, it would be illegal. " - Revolution Books, NY
This feature isn't needed for all fail-over schemes, but it does exist for those schemes which use it.
Sounds like a Cisco LocalDirector or one of its competitors could do the trick.
The main reason for AP is for the DR or Dynamic Reconfiguration feature. If you've got three system boards, then you can have some redundant hardware so that you can take down and remove a system board *while the OS is still running*, and keep your network connection going without missing a beat. (Same for disk.) Neat stuff.
I wrote software for Solaris (which, as others have pointed out, does not do this without 3rd-party software) because we found that no solution would fit our needs well. When looking at the price tag we concluded that we could do better. (Come to think of it, there is High Availability Solaris, but it isn't cheaper or better than the 3rd-party stuff.)
Basically we ping something on the other side of the router every 5 seconds, and if the ping doesn't come back we switch to the other port. That is the overview, but you need to do some more isolation before you blindly switch ports.
I strongly recommend you put in some other path between you and the box you are pinging. Several times we have been bitten when the box we were pinging went down and not the router, or alternatively the network was so busy the ping didn't get through within our timeout period.
There is no portable way in Solaris to tell if one ethernet port has signal. You can find out from some drivers, but when you change to a different ethernet card you have to do something else to find out.
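The "more isolation" advice can be sketched as a small decision rule: probe more than one target, and only switch ports when every target has been unreachable for several rounds in a row. This is my own illustration, not the poster's software. The class name, the threshold of three rounds, and the target names are all assumptions, and the actual ping is left to the caller.

```python
class LinkMonitor:
    """Decide when to switch ports based on rounds of probe results."""

    def __init__(self, misses_needed=3):
        self.misses_needed = misses_needed  # consecutive all-fail rounds required
        self.misses = 0

    def update(self, probe_results):
        """probe_results maps target name -> True if it answered this round.
        Returns True when the caller should switch to the other port."""
        if any(probe_results.values()):
            # At least one target answered, so the link itself is working;
            # a single dead target or a lost ping is not a link failure.
            self.misses = 0
            return False
        self.misses += 1
        if self.misses >= self.misses_needed:
            self.misses = 0  # start counting afresh on the new port
            return True
        return False

# Hypothetical run: one target down, then the whole path down
monitor = LinkMonitor(misses_needed=3)
rounds = [
    {"router": True, "far-host": False},
    {"router": False, "far-host": False},
    {"router": False, "far-host": False},
    {"router": False, "far-host": False},
]
for result in rounds:
    print(monitor.update(result))
```

A caller would run one round every 5 seconds, as in the description above, and only reconfigure the interfaces when `update` returns True.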
including
Piranha
--
and when the dual port nic dies and takes both interfaces with it you switch to psychic networking? i'd really suggest two nics not a single dual one.
that said, linux can do routing. why not set up a loopback device and then have it route through either nic? ip was designed to deal with multiple routes, why must your consultant reinvent the wheel? (loopback addresses are published, so they'll be seen on the network)
US Citizen living abroad? Register to vote!
I hate saying "I hate to tell you this but" when I really enjoy telling someone something.
So I won't lie.. I'm happy to tell you....
This may not be the right solution.
The problem isn't the consultants. They know their stuff; otherwise you wouldn't be paying them.
They come with years of experience and biases.
Expect a consultant who isn't friendly with Linux to dig up a solution that will not work on Linux.
Yes, Solaris can do a lot of great things Linux cannot. In the end Linux has one great advantage, and that's price. Source code and quick security patch releases are a bonus.
This rule holds for an NT shop....
And don't put it past a consultant to find a feature Solaris doesn't have. You're talking with some of the greater freelance talent in some cases, and if a defect is to be found they can find it. Once found, that defect becomes a case for switching to something the consultant likes.
So once you pick a platform for your shop, pick a consultant who is buddies with your choice. If he then recommends something else, it will be after bleeding dry all possibilities.
This may not be the only way to do it...
and... it may be the worst way to get it done...
Linux has its limits, don't get me wrong, but you can always find more than one solution. Linux may support 4 out of 8 solutions.. if you're only presented 1... there may be a reason...
I don't actually exist.
One of the things that I have found is that OS level failover doesn't always work or will have odd problems. If you are looking for Enterprise level uptime then cobbling together a solution such as this is not for you. The company I work for uses a Cisco LocalDirector to do the work for it. What's great about this sort of solution is that a LocalDirector will round robin, do failover, and such a dizzying array of things that it's wondrous. I would suggest you look into this solution or one similar.
Your question: "Here's what the consultant told us is the solution to our problem. Where can we get the hardware?"
What you want to know: "Here's our problem. Here's the solution the consultant came up with. What improvements can /. suggest?"
For instance, why do you need dual-port NICs? If it's just for the throughput, why not just use 2 single-ports? This also provides redundancy in the hardware department.
--
Have Exchange users? Want to run Linux? Can't afford OpenMail?
Linux MAPI Server!
http://www.openone.com/software/MailOne/
(Exchange Migration HOWTO coming soon)
I believe Juniper and 3Com also support the use of VRRP.
You may not be able to implement HSRP without paying Cisco a license fee. I'm not sure if anyone has approached Cisco from an open-source viewpoint though.
As for a public implementation - I should have a Linux VRRP implementation out this week.
Say you have a machine with two dual port NICs or even two NICs. Have a script that checks the main network interface every ten seconds. If the main interface becomes unavailable, unload it and load the second interface with the same IP and reset all of your routing information.
If you have a server with that kind of "need" then maybe you should consider having a better routing setup altogether. Consider how www.netscape.com will actually resolve to several IP addresses. The options are numerous. The main issue, is that linux works just fine. (Although freebsd has sexier networking)
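The swap step of the script described above might look roughly like this. It is only a sketch: it builds the classic net-tools `ifconfig` and `route` command lines and returns them instead of running them, and the interface names and addresses are placeholders I made up.

```python
def failover_commands(primary, backup, ip, netmask, gateway):
    """Return the commands that move `ip` from `primary` to `backup`.
    Nothing is executed here; the caller decides when and how to run them."""
    return [
        # Take the failed interface down so it stops claiming the address
        ["ifconfig", primary, "down"],
        # Bring the backup interface up with the same IP
        ["ifconfig", backup, ip, "netmask", netmask, "up"],
        # Routes through the downed interface are gone, so re-add the default
        ["route", "add", "default", "gw", gateway, "dev", backup],
    ]

# Placeholder values for illustration
for cmd in failover_commands("eth0", "eth1", "192.0.2.10",
                             "255.255.255.0", "192.0.2.1"):
    print(" ".join(cmd))
```

A wrapper would check the main interface every ten seconds and hand each list to something like `subprocess.run` (as root) once the check fails. Other hosts may keep the old MAC in their ARP caches for a while after the move.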
> We are a growing B2B company;
Good to know that you are buzzword compliant...
I understand that's very important to some people,
and if I ever figure out who those people are, I
will probably avoid them like the plague.
As for failover...check out 3com....long ago a
man (who would later go on to teach Unix courses
at WPI and be one of the best teachers I ever
had for anything) designed a piece of hardware
with 1 ethernet port on one side, and 2 on the
other...it was designed to do JUST THAT.
Completely in hardware. He did it for a company
that was later bought out by 3com...he claimed
(a couple of years ago, when I was in his course)
that they still sell the product that he designed.
Of course, why that's even needed is beyond me.
For better redundancy, you really want separate
redundant servers, each with RAID arrays and
probably a couple of localdirectors (or round
robin DNS for a cheaper solution) directing
connections between them (giving both failover and
increased availability) but...that's just IMNSHO.
After all, if a CPU fries, or a power supply starts
letting its magic smoke out...all the dual port
NICs in the world won't help.
"I opened my eyes, and everything went dark again"
Making the assumption that you want a web-farm, and for maximum availability, you are using two cisco switches, and two cisco PIXes, and two cisco local directors... you can still get away with a single NIC (not that you have to) put half of your web-servers on one switch, and half on the other.
Otherwise, you can put two NICs in (one on each switch) and assign them each their own IP address... no need to fail over... although I would look at the F5 BIG/IP - as it can make sure that your servers are serving up content... the Local director isn't as good at this.
BlackNova Traders
I think Piranha does just this. Byte just ran a look at Linux HA clusters: http://www.byte.com/column/BYT20000510S0001
http://support.intel.com/support/network/adapter/pro100/30504.htm That is the dual port adapter that comes in Dell Servers....Has anybody used this before? D
Building Linux Clusters is just what you should read.... Unfortunately it won't be out until August.
Everything you need is at High Availability Linux.
I too am building/have built a B2B exchange on the Linux platform and found JServ to be *INCREDIBLE* at HA/failover-safe features.
As for the 2 network cards for each machine, that too is a *VERY GOOD* thing. It allows you to partition out your network traffic to achieve much better response time. For example, our network has 2 NICs in each machine. There is a "Web Server to Database" network, there is a firewall to webserver network, and we have a separate network for office web surfing and misc stuff like that. Access to the "WebServer to Firewall" network is handled across the router.
One thing to keep in mind when dealing with DB aware web applications is that unless your code is *VERY POORLY* written, the biggest bottleneck will be network latency.
-Grimsaado
When Linux was cranking up last year, the folks at TurboLinux sales called up promoting their fault tolerant cluster solution. Although it's not free, you could get a timed demo - so Linux solutions exist.
Just a general observation - Linux is pretty well fleshed out with about anything you can think of in one form or another; it just isn't chasing you down with in-your-face ads and high pressure sales promos like other commercial products. So it may appear to be deficient, but more often than not just a few days (for us slow pokes) of searching and trials will turn up an inexpensive quality solution in some stage of development hidden somewhere.
try { do() || do_not(); } catch (JediException err) { yoda(err); }
Has anyone considered VRRP (Virtual Router Redundancy Protocol)? It's an actual open standard, and it works. It not only works, it works amazingly well.
One of the major users of VRRP technology is Nokia. They've done extensive work on the protocol, and use it in their line of firewalls (which btw run a heavily modified FreeBSD codebase).
VRRP uses multicast packets that are similar to OSPF "Hello" packets to let the partner(s) know it is alive. If the primary machine dies, the backup instantly takes over. When the takeover happens, it not only assumes the IP address of the dead machine, but it also answers for the MAC address of the dead machine.
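To make the takeover timing concrete, here is a rough Python sketch of the backup-side arithmetic. The formulas and the virtual MAC prefix follow the VRRP spec (RFC 2338) as I understand it; the function names are made up for illustration and this is not any vendor's implementation.

```python
# Sketch of VRRP backup timing; function names are hypothetical.

ADVERTISEMENT_INTERVAL = 1.0  # seconds, the default in RFC 2338


def skew_time(priority):
    """Higher-priority backups wait slightly less, so the best one wins."""
    return (256 - priority) / 256.0


def master_down_interval(priority, advert_interval=ADVERTISEMENT_INTERVAL):
    """How long a backup waits without advertisements before taking over."""
    return 3 * advert_interval + skew_time(priority)


def virtual_mac(vrid):
    """The shared MAC address the current master answers for."""
    if not 1 <= vrid <= 255:
        raise ValueError("VRID must be in 1..255")
    return "00:00:5e:00:01:%02x" % vrid


def should_take_over(now, last_advert_seen, priority):
    """True once the master has been silent for the whole interval."""
    return now - last_advert_seen >= master_down_interval(priority)
```

With the default one-second advertisements, a backup at priority 100 waits a little over three seconds before it claims the address, so "instantly" here really means "within a few seconds" unless the interval is tuned.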
--
The unsig!
"...dual-port NICs...switch the ports when the active port fails...
Oh, I see. When one port (or its path) fails, you want to switch the IP to a different port? I don't think "the driver" needs to do that, just change the IP assignments with ifconfig.
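A minimal sketch of that idea, in Python so the logic is easy to read. It only builds the ifconfig command lines; the interface names (eth0, eth1) and the address are placeholders I made up, and detecting the failure is left out entirely.

```python
# Sketch: build the ifconfig calls that move one IP between two ports.
# Interface names and addresses below are illustrative assumptions.

def ip_move_commands(ip, netmask, failed_if, standby_if):
    """Return the ifconfig invocations that move `ip` to the standby port."""
    return [
        # Take the failed port down so it stops answering for the address.
        ["ifconfig", failed_if, "down"],
        # Bring the same address up on the standby port.
        ["ifconfig", standby_if, ip, "netmask", netmask, "up"],
    ]


if __name__ == "__main__":
    for cmd in ip_move_commands("192.0.2.10", "255.255.255.0", "eth0", "eth1"):
        print(" ".join(cmd))
```

Note that this alone does not deal with the ARP issue raised earlier in the thread: neighbours still have the old port's MAC cached until their entries are refreshed, unless the standby port is given the same MAC address.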
My experience with consultants is that a good many of them are clueless. The reason they're consultants is they can easily BS the customer into believing they know what they're talking about long enough to bleed you dry. They may even provide you an actual solution that may even kind of work but which is patently the worst way to do what you were wanting to do. Then when you DO get someone in who knows what he's doing, that guy will have to spend twice as long beating your company into shape because he has to go back and undo everything the previous one did.
I'm trying to teach myself to set people on fire with my mind... Is it hot in here?
What kind of application are you trying to fail-over? A database? A web-server?
If you wanted a web-farm, that's dead stupid easy. Fail-over database/ftp/nfs isn't too hard, but (presently) requires commercial software. Understudy Polyserve, Wizard Watchdog, or even RSF-1 are just some of the HA clustering products available.
BlackNova Traders
A non-kernel-invasive version of this would be a script that configures one port with the desired IP/mask and creates the default route. It then puts the other port in promiscuous mode and monitors it for traffic using a libpcap-based program or even a possibly modified tcpdump. As soon as it sees any traffic, it switches the configuration and starts monitoring the other port. This could probably be written in 2-4 hours given a network to test it in.
For a possibly simpler solution (i.e. no code to write), use a pair of additional Linux systems. Configure each of them to load balance with LinuxVirtualServer (aka LinuxDirector), or the Piranha version of it, to as many backend servers as you have, but to different internal IP addresses. Good choices would be 10.* addresses, say 10.0.1.* and 10.0.2.*. Using either a dual-port NIC or two NICs in each server, create two networks, one for each of the load distribution servers. Configure Apache et al. to respond to the IP addresses of each network the same.
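The switching logic of such a script might look roughly like the Python sketch below. The capture side (libpcap or tcpdump) is deliberately left out; something else is assumed to call `on_frame` for every frame it sees, and `reconfigure` stands in for whatever actually moves the IP and default route. All names are hypothetical, and it assumes the standby port sees no frames while the primary path is healthy.

```python
# Sketch of the "watch the idle port" failover logic; names are hypothetical.

class StandbyWatcher:
    def __init__(self, active, standby, reconfigure):
        self.active = active            # port currently holding the IP
        self.standby = standby          # port being monitored for traffic
        self.reconfigure = reconfigure  # callable(new_active, old_active)

    def on_frame(self, port):
        """Call once per captured frame, naming the port it arrived on.

        Returns True if this frame triggered a switch.
        """
        if port != self.standby:
            return False
        # Traffic showed up on the idle port: move the config over to it.
        self.reconfigure(self.standby, self.active)
        # Swap roles so the old active port is the one monitored next.
        self.active, self.standby = self.standby, self.active
        return True
```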
BTW, there are 4 port 100-base-T cards out there, from Adaptec I think.
Good Luck!
Stephen D. Williams
You may want to check out http://linux-ha.org/.
The "heartbeat" application implements node-to-node monitoring over a serial line and UDP and can initiate IP address takeover based on a notion of resources provided by nodes and resource groups. It worked well for me. However, this was only a very basic two-node setup.
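For anyone who hasn't seen the idea, the core of it can be sketched in a few lines of Python. This is not heartbeat's actual code or configuration; the class name, the `deadtime` parameter, and the `take_over_ip` callback are all assumptions for illustration, and the serial/UDP transport is left out.

```python
# Sketch of dead-peer detection followed by IP takeover; names are hypothetical.
import time


class PeerMonitor:
    def __init__(self, deadtime=10.0, clock=time.monotonic):
        self.deadtime = deadtime   # seconds of silence before the peer is declared dead
        self.clock = clock         # injectable so the logic can be tested
        self.last_seen = clock()
        self.holding_ip = False

    def heartbeat_received(self):
        """Call whenever a heartbeat arrives from the peer."""
        self.last_seen = self.clock()

    def peer_is_dead(self):
        return self.clock() - self.last_seen > self.deadtime

    def check(self, take_over_ip):
        """Run periodically; takes over the peer's IP at most once."""
        if self.peer_is_dead() and not self.holding_ip:
            take_over_ip()
            self.holding_ip = True
        return self.holding_ip
```

A real setup also has to decide when to hand the address back once the peer returns, which this sketch does not attempt.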
1. Your consultant should learn routing protocols
2. Your consultant should learn the concept of a loopback alias.
3. Your consultant should have an IQ of above 25
4. There is absolutely no need for link layer 2 failover where layer 3 will do. Unix is not WinHoze. It knows about routing.
So your task list is:
- Configure loopback aliases on the linux boxes.
- Configure Apache to listen only on the loopback alias interface.
- Build gated from the Red Hat sources; they already have the patches for linux-2.2 in. You may use zebra CVS instead, but it is still a bit off in terms of stability. You may need a script that HUPs gated a few times, as gated does not always start clean and update the routing table on 2.2.x.
- Configure OSPF on gated and on your Cisco gear. Distribute default into OSPF, as gated from the 3.5.x tree has no IRDP.
- Shoot your consultant
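The first two items of the task list above can be sketched as follows. This Python snippet only renders the command and the config line; the alias name `lo:0` and the address are placeholders, and the gated/OSPF side is not covered because its configuration depends on the version in use.

```python
# Sketch: render the loopback alias command and the matching Apache directive.
# The alias name and service address are illustrative assumptions.

def loopback_alias_command(service_ip, alias="lo:0"):
    """ifconfig call that puts the service IP on a loopback alias as a host address."""
    return ["ifconfig", alias, service_ip, "netmask", "255.255.255.255", "up"]


def apache_listen_directive(service_ip, port=80):
    """httpd.conf line that binds Apache to the service IP only."""
    return "Listen %s:%d" % (service_ip, port)


if __name__ == "__main__":
    print(" ".join(loopback_alias_command("192.0.2.10")))
    print(apache_listen_directive("192.0.2.10"))
```

The point of the design is that the service address lives on no physical port, so either NIC can carry traffic for it and the routing protocol decides which path is used.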
BTW: your bill is $500.
Baker's Law: Misery no longer loves company. Nowadays it insists on it.
http://www.sigsegv.cx/
You could go with an expensive commercial solution like BigIP from F5, but those will run you at least $30k or so. You could also use Polyserve Understudy, which does pretty much the same thing only under Linux, and it's only about $400 or so. If you have all this expensive Cisco equipment and a Cat6000, you can run Local Director on that without buying additional hardware.
However, I suggest:
http://www.linuxvirtualserver.org or
http://linux-ha.org or
http://www.eddieware.org
It all depends on your application that you're running. If it's just http, any of these will work, but if it's something else, you're stuck with linux-HA or Linux Virtual Server. Eddie will only do http as far as I know. Plus Eddie uses Erlang, which may affect performance.
Need Free Juniper/NetScreen Support? JuniperForum
You're attacking the wrong problem. What you need first is not Linux failover but consultant failover: Your consultant has failed; you need to switch to a new one instantly.