Slashdot Mirror


Linux Failover?

Anton asks: "This is a question about Linux failover in business situations. We are a growing B2B company; our product runs on Apache/Linux. We contracted professional services to properly set up our network. After all the hellishly expensive Cisco hardware had been set up, it turns out that for our servers to be configured for failover, each one needs two dual-port NICs configured for one IP and connected to two different switches; furthermore, the driver needs to intelligently switch ports when the active port fails... We've never heard of such beasts for Linux, and a net search revealed nothing. Our consultant, however, claims that 'Linux is biting itself in the foot' for not supporting that, and that other industrial-strength OSes like Solaris do in fact support this. Has anyone run into this before, or have other ideas?" [nik suggests]: Take a look at Polyserve Understudy, which might be an alternative. FreeBSD and Linux versions are available (and it is bundled with FreeBSD 4.0).

61 of 203 comments

  1. Re:Changing MAC addresses by bluGill · · Score: 2

    As the other poster said, it depends on the driver. That's the bad news.

    The good news is that DECnet requires the driver (and hardware) to be able to change the MAC address. Thus most cards, even cheap ones, can do it, just in case the vendor ever gets the chance to sell to the last shop out there still running DECnet.
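    To make that concrete, here is a minimal sketch (Python, with a hypothetical function name) of the DECnet Phase IV convention: a node's MAC address is derived from its DECnet address, namely the fixed prefix AA-00-04-00 followed by the 16-bit area/node number in little-endian byte order. Because that address has nothing to do with the one burned into the card, the driver must be able to override it. The sketch only computes the address; it does not touch any hardware.

```python
def decnet_phase4_mac(area: int, node: int) -> str:
    """Return the MAC address a DECnet Phase IV node uses for (area, node)."""
    # Phase IV addresses pack a 6-bit area number and a 10-bit node number.
    if not (1 <= area <= 63 and 1 <= node <= 1023):
        raise ValueError("area must be 1-63 and node 1-1023")
    address = (area << 10) | node
    # Fixed DECnet prefix AA-00-04-00, then the address, low byte first.
    octets = [0xAA, 0x00, 0x04, 0x00, address & 0xFF, address >> 8]
    return "-".join(f"{b:02X}" for b in octets)

print(decnet_phase4_mac(1, 5))  # AA-00-04-00-05-04
```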

  2. re: MAC address by sjames · · Score: 2

    Many cards can have their MAC address set. Linux ethernet drivers support that where available.

  3. Re:This is the wrong question by Defiler · · Score: 2

    http://www.us.buy.com/comp/product.asp?sku=10160870

    This is D-Link's 4-port 10/100 NIC. It has Linux drivers, and it's only $165.

  4. Re:This is the wrong question by cfulmer · · Score: 2

    So, in the telephony industry, this is a big requirement... The problem is that you don't want the outage of a single network link to take your machine out of service, or to interrupt any existing transactions.

    So, for example, if you have a TCP stream going to a specific NIC and the link between the NIC and the switch gets cut, or the NIC fails or something, then you need to be able to continue the same TCP stream on a second interface.

    You end up with several issues. First, on a lot of NICs it's not easy to tell when the card is having problems. Second, the second NIC typically has a different hardware address, so you need to update the ARP cache of any machine sending to you. And third, you have to figure out how to tell when the first NIC is working again.
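    The ARP cache problem is usually handled with a "gratuitous" ARP: the host broadcasts an ARP request in which the sender and target IP are both its own address, so every neighbour that already has that IP cached updates the entry to the new hardware address. Below is a minimal sketch, assuming Python and a hypothetical helper name, that only builds the 42-byte frame; actually sending it would need a raw packet socket (on Linux, AF_PACKET as root), which is left out here.

```python
import socket
import struct

def gratuitous_arp(mac: str, ip: str) -> bytes:
    """Build a broadcast gratuitous ARP request (Ethernet + ARP, 42 bytes)."""
    hw = bytes.fromhex(mac.replace(":", ""))
    if len(hw) != 6:
        raise ValueError("MAC address must be 6 octets")
    addr = socket.inet_aton(ip)
    # Ethernet header: broadcast destination, our source MAC, EtherType ARP.
    eth = b"\xff" * 6 + hw + struct.pack("!H", 0x0806)
    # ARP header: Ethernet hardware (1), IPv4 (0x0800), 6- and 4-byte
    # addresses, opcode 1 (request).
    arp = struct.pack("!HHBBH", 1, 0x0800, 6, 4, 1)
    # Sender and target IP are both ours; the target MAC is left as zeros.
    arp += hw + addr + b"\x00" * 6 + addr
    return eth + arp

frame = gratuitous_arp("00:11:22:33:44:55", "192.168.1.10")
print(len(frame), frame.hex())
```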

  5. Re:This is the wrong question by stripes · · Score: 2
    I am wondering why NICs with more than one port are so danged expensive though? I can see a bit of an increase in price, but there is no way these things should be $400 and up (last time I looked..)

    The SBus QFE part may have been space constrained (SBus cards are small), which will bump the price a little. Multi-port PCI NICs normally need a PCI bridge part (actually it's been a while since I bought one, maybe they do it all in one multifunction PCI chip now), which pushes the cost up a little too.

    But the big reason is economy of scale. It costs a lot of money to design a product, document it, write drivers, set up distribution channels, and so on. These costs are mostly fixed, regardless of how few units you sell.

    Contemplate the following example:

    Assume for the sake of argument that it costs $1,000,000 to design a PCI board. Now assume I make a 4-port Ethernet card (with a parts cost of $40), and you make a 1-port Ethernet card (with a parts cost of $10). Also assume there are (only) 1,000,000 people on the earth (and all want to be in on the big LAN party). Some of them are uber-geeks and will buy the 4-porter just to have a 4-porter. Some want a "reliable gaming experience" and will buy the 4-porter because they have 3 more ports if the first fails. Some want to run the LAN server and need more bandwidth. In all, 100,000 people are interested in my product and 900,000 in yours. To exactly cover our costs, you need to charge $10 for the parts and a bit over $1 for the "overhead" -- an $11 price; I have to charge $40 for the parts and $10 in overhead -- a $50 price.

    A lot of the people who wanted a "reliable gaming experience" are now swayed by your argument that they can buy two cards and get "enough" reliability. Or even 4 of yours ($44), and an extra $6 to buy another Ethernet cable in case theirs breaks! A few more are swayed by the argument that $50 is a lot to pay for a network card when there's a $10 card right over there. Maybe they should keep the rest of the money, or buy a new game, or save up for a monkey. Soon only 10,000 people want my card. Your overhead drops a little (it is still about $1), but mine rockets to $100!

    With a $140 price tag even the uber-geeks start rethinking, and decide maybe they would rather show their geekiness with a $130 EFF contribution and a nifty EFF bumper sticker on the side of their case.

    That's when things really start to suck: only the 5 guys holding the LAN party, who need my card, are still interested in it. The price rockets to $200,040. At that price the 5 guys will spend a long time trying to figure out a way to do the whole gig without my card. In the end maybe they just charge everyone on the planet $5 to get into the LAN party and end up with "free" cards.

    There are lots of little things wrong with this example: the guys running the party could probably use 4 of your cards at once, there are more than 1,000,000 people, the overhead costs vary from product to product, and some people will buy even seriously overpriced goods. But I think it goes a long way towards showing why a Sun QFE costs $1,500 and an Intel EtherExpress 100+ is $25.
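    The arithmetic behind the example is just the parts cost plus the fixed design cost divided by the number of units sold. A short Python sketch reproducing the figures (the $1,000,000 design cost and the buyer counts are the example's own made-up assumptions; the function name is hypothetical):

```python
def break_even_price(parts_cost: float, design_cost: float, units: int) -> float:
    """Price per card that exactly recovers parts plus a share of the fixed cost."""
    return parts_cost + design_cost / units

DESIGN = 1_000_000  # the example's assumed fixed cost to design a PCI board

for label, parts, units in [
    ("1-port, 900,000 buyers", 10, 900_000),
    ("4-port, 100,000 buyers", 40, 100_000),
    ("4-port, 10,000 buyers", 40, 10_000),
    ("4-port, 5 buyers", 40, 5),
]:
    print(f"{label}: ${break_even_price(parts, DESIGN, units):,.2f}")
# prints $11.11, $50.00, $140.00 and $200,040.00 for the four cases
```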

  6. Re:Choice of enterprise solutions by stevew · · Score: 2

    For a "top flight consultant" you have a few misconceptions. BSD isn't a new version of the Solaris code. Solaris is NOW a SysV derivative, i.e. AT&T code. BSD is BSD. The older versions of SunOS that you are thinking about separated from the main BSD tree a LONG time ago.

    As for offering your customers a product from a company that stands behind its guarantee -- you're giving them MS? Why? That is pure FUD. Did you hear about the court decision handed down a couple of weeks ago, where the software supplier was held immune due to the "we don't guarantee this software for any use" clause in the shrink-wrap agreement? That pretty much leaves the idea that a "big company" is needed out in the cold.

    --
    Have you compiled your kernel today??
  7. A few things to consider by Doctor_D · · Score: 2

    Well, from what I've gathered, the current discussion shows somewhat of a lack of direction. So here are a few things to consider and answer before going forward:

    1) Do you need high availability of one machine (i.e. 99+% uptime of a single machine)? If the answer is yes, then clustering is the way to go. But doing that right is very expensive (hardware and software).

    2) Does it make sense to have a farm of identically configured machines? If you're using Linux / FreeBSD as your web servers, and if you only run web servers on them, then you can get away without clustering proper and just throw a ton of machines at the problem, i.e. a farm of web servers.

    3) It sounds like the consultant has the right idea with the "expensive Cisco hardware" in making sure Layer 2 is fully redundant. Good step forward. Now ya just need to make sure the hardware connected to it will utilize it. Do you?

    4) If you're running Solaris, then Alternate Pathing becomes your friend (especially with Quad Fast Ethernet cards), as does Dynamic Reconfiguration. Are you, or is this a moot thread?

    5) Overall, what are you trying to accomplish? Uptime of the hardware, uptime of the application, or raw uptime of the web servers? If you have a setup like /., then clustering proper is not done. As you can see, they just load balance 3 web servers, and then dedicate a box for ads, a box for the database, and a dedicated image box. And we know how little /. is down...

    Basically, that's pretty much it. Personally I wouldn't bother with clustering or complicating the web servers that much; I'd cluster the back-end supporting stuff for the web farm, i.e. the back-end database, with fully redundant hardware, alternate paths and so on. And then let Cisco's LocalDirector take care of load balancing and of checking whether each web server is up. (My network guy at work tells me it can do that. I won't personally believe it until I see it.)
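    To put rough numbers on points 1 and 2, here is a small Python sketch, assuming each box is independently up 99% of the time (real failures are often correlated: shared power, a shared switch, the same bad patch). It converts availability into downtime per year and shows why a farm of ordinary boxes can beat one heavily engineered box, provided the surviving boxes can carry the load and the load balancer itself is redundant. Function names are hypothetical.

```python
HOURS_PER_YEAR = 365 * 24

def downtime_hours(availability: float) -> float:
    """Expected hours of downtime per year at a given availability (0..1)."""
    return HOURS_PER_YEAR * (1 - availability)

def farm_availability(single: float, machines: int) -> float:
    """Chance that at least one of `machines` identical servers is up,
    assuming their failures are independent (a big assumption in practice)."""
    return 1 - (1 - single) ** machines

one = 0.99
three = farm_availability(one, 3)
print(f"one box: {downtime_hours(one):.1f} h/year down")
print(f"three-box farm: {downtime_hours(three) * 60:.1f} min/year with no box up")
```

    At 99%, a single box is down about 87.6 hours a year; three independent boxes are all down at once only about half a minute a year.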

    --
    "If you insist on using Windoze you're on your own."
  8. Re:B2B buzzword by scrytch · · Score: 2

    The word used to be "supplier".

    Oh well, at least I'm not seeing "architect" used as a verb anymore. I was just itching to shoot someone then.

    --
    I've finally had it: until slashdot gets article moderation, I am not coming back.
  9. Intelligent NIC with Failover by Detritus · · Score: 2

    I worked on a distributed systems project that needed high reliability LAN connections. The solution we used was a custom NIC that had two Ethernet interfaces and a 68000 with 512K of RAM on a single PCB. Each system broadcast heartbeat messages on all attached LANs. If a primary LAN failed or became partitioned, all systems automatically switched to the backup LAN. This was transparent to the processes sending and receiving data on the network since the NIC routed packets to the Ethernet interface designated as active by the system's LAN monitoring and failure detection software.
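    The failure-detection logic described above can be sketched as follows. This is a hypothetical Python model, not the original firmware (which ran on the card's 68000): a LAN counts as healthy only if heartbeats from every expected peer have arrived on it recently, so both a dead LAN and a partitioned one fail the test, and the first healthy LAN in priority order becomes the active one. The timeout value and class name are assumptions; the clock is injectable so the logic can be tested without waiting.

```python
import time

class LanMonitor:
    """Pick the active LAN from heartbeats heard on each attached LAN."""

    def __init__(self, lans, peers, timeout=3.0, clock=time.monotonic):
        self.lans = list(lans)      # priority order: primary first
        self.peers = set(peers)     # every node we expect to hear from
        self.timeout = timeout      # seconds of silence before a peer is "gone"
        self.clock = clock
        self.last_seen = {}         # (lan, peer) -> time of last heartbeat

    def heard(self, lan, peer):
        """Record a heartbeat from `peer` received on `lan`."""
        self.last_seen[(lan, peer)] = self.clock()

    def healthy(self, lan):
        """True if every peer has been heard on `lan` within the timeout."""
        now = self.clock()
        return all(
            now - self.last_seen.get((lan, peer), float("-inf")) <= self.timeout
            for peer in self.peers
        )

    def active_lan(self):
        """Return the highest-priority healthy LAN, or None if all are down."""
        for lan in self.lans:
            if self.healthy(lan):
                return lan
        return None
```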

    --
    Mea navis aericumbens anguillis abundat (My hovercraft is full of eels)
  10. Re:Shoot your consultant by arivanov · · Score: 2

    I could give you a detailed rant style answer but I think it is not worth it.

    Most root DNS servers, primary mail relays, etc., use exactly what I said. And there is no such thing as what you said. Been there, done that.

    Please get a clue.

    Solutions using routing protocols cause serious trouble if, and only if, they are designed and implemented by Minesweeper Consultants and Solitaire Experts.

    --
    Baker's Law: Misery no longer loves company. Nowadays it insists on it
    http://www.sigsegv.cx/
  11. Re:Shoot your consultant by arivanov · · Score: 2
    if you only require service availability (but not session availability).

    I have to remind you: you do not use the addresses of the physical interfaces. Apache listens on the loopback address only. So the client retransmits, the packet goes via the other interface, and you have no problem. The session is alive.

    talk to (resp. across) a small set of routers (or routing protocol using hosts).

    Correct. You talk to two routers, or just to different interfaces on one router, that connect you to the backbone (via different Layer 2 devices: switches or hubs). And from there on, to the entire Internet.

    In a similar internal corporate scenario you talk to the routers or the RSM on the switch that separate the servers from the lusers.
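    The loopback approach works because the service address is not tied to any physical interface: the host announces that address through a routing protocol out of every interface, and the routers (and the host itself) simply use whichever path is still alive. A toy Python model of just the path-selection step follows; the addresses, interface names and costs are made up, and in practice a routing daemon such as zebra's ospfd would handle both this choice and the advertisement of the loopback address.

```python
from dataclasses import dataclass

@dataclass
class Route:
    prefix: str  # destination, e.g. "0.0.0.0/0"
    via: str     # next-hop router
    dev: str     # outgoing physical interface
    cost: int    # metric; lower wins, as in OSPF

def best_route(routes, link_up):
    """Return the cheapest route whose interface is still up, or None.

    All routes here share one prefix, so no longest-prefix match is needed.
    """
    usable = [r for r in routes if link_up.get(r.dev, False)]
    return min(usable, key=lambda r: r.cost, default=None)

# The service address (say 10.9.9.9/32) sits on loopback, so it never goes
# down with a NIC; only the choice of outgoing path changes.
routes = [
    Route("0.0.0.0/0", "10.0.1.1", "eth0", 10),
    Route("0.0.0.0/0", "10.0.2.1", "eth1", 20),
]
links = {"eth0": True, "eth1": True}
print(best_route(routes, links).dev)  # eth0
links["eth0"] = False                 # cable pulled or switch died
print(best_route(routes, links).dev)  # eth1
```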

    I can give you a number of examples where it won't work at all.

    Yeah, sure. I have seen a gazillion b0rken network designs written by experts, most of them with a minesweeper and/or Solaris certificate. I am not being biased, but core networking is not a subject in either of these certifications. Officially, core network support in Slowarez has a "to be or not to be" status in Sol 8. Check the zebra archive for details. With minesweepers it is not even considered.

    You don't happen to post in certain de newsgroups...? This somehow sounds... familiar.

    No. Never used news. But I am not the only BOFH around.

    --
    Baker's Law: Misery no longer loves company. Nowadays it insists on it
    http://www.sigsegv.cx/
  12. Re:What is needed for real NIC failover. by arivanov · · Score: 2

    Very good, except for the fact that Layer 2 failover has always been less reliable than Layer 3. If Layer 2 were better, the Internet core would not use OSPF and BGP.

    So, overall: OSPF instead.

    --
    Baker's Law: Misery no longer loves company. Nowadays it insists on it
    http://www.sigsegv.cx/
  13. Re:This is the wrong question by arivanov · · Score: 2
    Very good approach.

    The problem is that a bunch of karma w** who are out of their depth immediately flooded the article with comments about Piranha, clusters and other irrelevant things. The question is about failover in case of link failure. The consultant thought of winhoze and chose Layer 2. You have a Unix system. Unix knows about routing and IP. Hence what you need is a Layer 3 solution. For example:

    http://slashdot.org/comments.pl?sid=00/05/21/1853216&cid=90

    --
    Baker's Law: Misery no longer loves company. Nowadays it insists on it
    http://www.sigsegv.cx/
  14. Re:Failure to implement open standards. by jcostom · · Score: 2
    Oh, that was a good joke. Yes, Nokia uses it. And nobody else.

    Juniper, 3Com, and Alcatel were at least working on it for a time in 1999. Yeah, that sounds like "just Nokia". :)

    There's also HSRP. Everybody and his dog uses it in their Cisco routers. Why not take that?

    HSRP is a hacked version of VRRP v1. Where do you think they got the ideas from???

    And no, I don't work for the IPRG group. I've got some friends who used to, and one that still does, but no, I don't work for them.
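    For readers who haven't met VRRP: a group of routers shares a virtual IP address and a virtual MAC address, the master sends periodic advertisements, and a backup takes over if it stops hearing them. Below is a Python sketch of the spec's core rules as I recall them from RFC 2338 (function names are hypothetical, and this models neither Cisco's HSRP nor any particular implementation): the virtual MAC is 00-00-5E-00-01-{VRID}, the highest priority wins the election with ties going to the higher primary IP address, and a backup declares the master dead after three advertisement intervals plus a skew time that favours higher-priority backups.

```python
import ipaddress

def vrrp_virtual_mac(vrid: int) -> str:
    """Virtual router MAC address: 00-00-5E-00-01-{VRID}."""
    if not 1 <= vrid <= 255:
        raise ValueError("VRID must be 1-255")
    return f"00-00-5E-00-01-{vrid:02X}"

def master_down_interval(priority: int, advert_interval: float = 1.0) -> float:
    """Seconds a backup waits without advertisements before taking over.

    The skew time (256 - priority) / 256 makes higher-priority backups
    react first, so they win the race to become master.
    """
    skew = (256 - priority) / 256
    return 3 * advert_interval + skew

def elect_master(routers):
    """routers: list of (priority, primary_ip). Highest priority wins;
    ties go to the router with the higher primary IP address."""
    return max(routers, key=lambda r: (r[0], ipaddress.ip_address(r[1])))

print(vrrp_virtual_mac(1), master_down_interval(100))
```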
    --

    The unsig!
  15. Fire the consultant and hire a hacker by Medievalist · · Score: 2

    When the consultant installs a network that is clearly not designed for the needs of the company (i.e. one that supposedly requires special hardware and drivers that the consultant doesn't know how to integrate with your core product), you are being taken for a ride by people with little knowledge and less moral backbone.
    If you need multiple Ethernet interfaces on a machine, they should be on separate cards for robust, redundant failover. I run 12 Linux boxes with 4 Ethernet cards in each; my /etc/lilo.conf files look sort of like this:

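    # Global section: install LILO on the first SCSI disk.
    boot=/dev/sda
    map=/boot/map
    install=/boot/boot.b
    prompt
    # timeout is in tenths of a second, so this waits 5 s at the prompt.
    timeout=50
    image=/boot/vmlinuz-2.2.5-15smp
        label=linux-smp
        # ether=IRQ,IO-base,name; 0,0 means autoprobe. Without these, a
        # kernel with built-in drivers normally probes only the first card.
        append="ether=0,0,eth1 ether=0,0,eth2 ether=0,0,eth3"
        root=/dev/sda8
        initrd=/boot/initrd-2.2.5-15.img
        read-only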

    The append line activates my additional Ethernet cards, all of which are 3Com 100BaseT cards using Donald Becker's excellent open-source drivers.
    Combine this with round-robin DNS using the latest ISC BIND code and you can get incredible fault tolerance at a very low cost. You can even do IDE RAID (hardware or software) if you are too cheap for SCSI, and you can use rsync to keep your servers as clones of each other.
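    Round-robin DNS simply hands out the same set of A records in a rotating order, so successive clients start with different servers. A minimal Python model of that rotation (the class name and addresses are hypothetical; this is not BIND's code):

```python
from collections import deque

class RoundRobinRecords:
    """Rotate the A records for one name on every query, the way a
    round-robin DNS server spreads clients across identical servers."""

    def __init__(self, addresses):
        self._ring = deque(addresses)

    def answer(self):
        """Return all addresses in the current order, then rotate by one."""
        reply = list(self._ring)
        self._ring.rotate(-1)  # next query starts with the next server
        return reply

rr = RoundRobinRecords(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
for _ in range(3):
    print(rr.answer())
```

    Note that the rotation by itself knows nothing about which servers are alive; the fault tolerance comes from clients that try the other addresses, or from pulling a dead box out of the zone.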
    Unless your application is extremely unusual and non-wwwebby, you can accomplish what you need without any expensive Cisco stuff or fancy double-headed cards at all. The consultant is taking you to the cleaners due to greed or a total lack of competence.
    --Charlie

  16. Re:Choice of enterprise solutions by Felinoid · · Score: 2

    For a big company name I'd have recommended SCO, not Microsoft.
    HP is also good, but my personal bias prevents me from recommending them for software solutions.

    --
    I don't actually exist.
  17. Linux Virtual Server by A.+Lynch · · Score: 2
    The Linux Virtual Server Project's software might do the trick. I've used it for mail server load balancing under medium load and found it to be quite reliable and easy to configure.

    Without Red Hat's Piranha package. ;-)
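    LVS offers several scheduling algorithms. As an illustration, here is a Python sketch of weighted round-robin in the interleaved form used by LVS's wrr scheduler, which for weights 4, 3 and 2 yields the sequence AABABCABC. It is a model of the algorithm, not the in-kernel code, and the class and method names are hypothetical.

```python
from functools import reduce
from math import gcd

class WeightedRoundRobin:
    """Interleaved weighted round-robin: heavier servers get proportionally
    more connections, spread out rather than handed out in bursts."""

    def __init__(self, servers):
        # servers: non-empty list of (name, weight), integer weights >= 0
        self.servers = list(servers)
        weights = [w for _, w in self.servers]
        self.max_w = max(weights)
        self.step = reduce(gcd, weights)
        self.i = -1  # index of the last server chosen
        self.cw = 0  # current weight threshold

    def pick(self):
        if self.max_w == 0:
            return None  # every server is weighted out
        n = len(self.servers)
        while True:
            self.i = (self.i + 1) % n
            if self.i == 0:
                # Completed a pass: lower the threshold, wrapping to the top.
                self.cw -= self.step
                if self.cw <= 0:
                    self.cw = self.max_w
            name, weight = self.servers[self.i]
            if weight >= self.cw:
                return name

s = WeightedRoundRobin([("A", 4), ("B", 3), ("C", 2)])
print("".join(s.pick() for _ in range(9)))  # AABABCABC
```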

  18. No.. THIS is the wrong question. by mindstrm · · Score: 2

    Well... if you say "Here's what the consultant told us was the solution to our problem",
    where's the solution? Was he just speaking theoretically?

    A dual-port NIC with this behavior sounds strange, even though from a networking point of view the behavior itself makes sense.

    Sure, a dual-port NIC will help you, *if* it's set up to get around transceiver failure by bringing up the other port.

    Two NICs would be better, where the box itself could attempt to configure and use the other NIC if it loses network connectivity.

    An even better (and more obvious?) solution is to have two computers..... complete redundancy.

  19. Re:Solaris "Alternate Pathing" by austad · · Score: 2

    One of the sweet things that most people don't know is that you can hot swap memory and CPUs with the Dynamic Reconfiguration feature. This is only on Sun hardware, though.

    However, with Solaris 8 (and maybe 7) on x86 hardware, you can hot swap PCI cards. This feature under Linux would be a huge win for the OS.

    --
    Need Free Juniper/NetScreen Support? JuniperForum
  20. Re:Failure to implement open standards. by yesod · · Score: 2


    I have an alpha implementation of VRRP for Linux that I'll be GPLing within the next week or so.

    We're using it and it seems to work very well.

    Currently for 2.2.x only.

    Watch for announcements.

  21. Can be done with scripts... by dublin · · Score: 2

    This is not something that's *in* any OS, unless Sun's added it into Solaris in S8. (Could be, I don't get to play with Suns anymore... sniff...)

    Although I'm sure the options have changed some since I was fully up on this stuff about three years ago, there were only a handful of failover options at that point, and only one of them worked really well.

    That one, interestingly, was in reality a bag of (very good) scripts, which implemented a heartbeat function and, when it detected something wrong, would down the interfaces, re-plumb them if necessary, reset the addresses, and bring them up again. Although it's worth the money they charge, if you're in serious DIY mode there's no reason you couldn't write such scripts yourself, and there are almost certainly some already out there, probably as part of the Linux HA project.
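    The "down, re-plumb, reset addresses, up" step is the easy part of such a script. Below is a Python sketch of it for Linux, assuming net-tools ifconfig and hypothetical interface names (Linux has no separate plumb step, unlike Solaris). It only builds the command lines; a real script would run each one as root, for example with subprocess.run(cmd, check=True), after its heartbeat logic had decided the primary was dead.

```python
import ipaddress

def takeover_commands(failed_dev, standby_dev, service_ip, netmask):
    """Return the ifconfig invocations a failover script might run to move
    a service address from a failed interface to a standby one."""
    # Validate first: a typo here would take the service down, not up.
    ipaddress.IPv4Address(service_ip)
    ipaddress.IPv4Network(f"0.0.0.0/{netmask}")
    return [
        ["ifconfig", failed_dev, "down"],
        ["ifconfig", standby_dev, service_ip, "netmask", netmask, "up"],
    ]

for cmd in takeover_commands("eth0", "eth1", "192.168.1.10", "255.255.255.0"):
    print(" ".join(cmd))
```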

    Oh, and as an aside, I would stick with the script-based solutions whether you build or buy: they're more reliable, and they leverage the OS better than the proprietary methods. (Qualix's main competitor back when I worked for Sun consulting for customers on such things was OpenVision HA, which was a huge, slick, impressive monolith of GUI binaries that had a well-earned reputation for leaving a trail of dead bodies behind it. FirstWatch, on the other hand, was simple and unimpressive in a demo, but it just worked, and worked well, in the real world.)

    Qualix was bought by Veritas a few years ago - check with them if you want a decent supported package. (And let's face it - HA is certainly one area where it may not pay to roll your own, since a failure in the HA system in production would be a serious career-limiting move...)

    --
    "The future's good and the present is nothing to sneeze at." - Roblimo's last ./ post
  22. Re:Feh! by ncc74656 · · Score: 2
    My experience with consultants is that a good many of them are clueless. The reason they're consultants is they can easily BS the customer into believing they know what they're talking about long enough to bleed you dry...

    Interesting? This gets moderated as "Interesting?"

    It's flamebait, and if the moderators weren't so blind from anti-corporate propaganda...

    Let's see here...either (1) you've never worked in a corporate environment where you've had to deal with consultants or (2) you're a consultant yourself and "resemble that remark." From the (admittedly limited) experience I have with them, the original poster's remarks were on-target, though. Those who can, do; those who can't, consult.

    It's not an "anti-corporate" bias; it's an "anti-moron" bias. :-)

    --
    20 January 2017: the End of an Error.
  23. BIG/IP from F5 Networks... by ColonelNorth · · Score: 2

    Our company uses these nifty machines for load balancing and failover. They are basically x86-based machines running FreeBSD and some proprietary software. They also have the important things in life, like 2 NICs and a nice rack-mounted chassis. It is a bit pricey, but you get a very useful manual, support, someone to blame when one is on fire, etc. Most importantly, it works...

    One thing... Make sure you're plugging it into 120VAC. The power supply gets very unhappy if you don't... You learn these things when someone labels a 240VAC strip as 120. Go figure.

  24. Re:You have no clue what you are talking about by wass · · Score: 2
    To take that further, aren't they legally obliged to hold their shareholders' interests above those of their customers...

    I'm not sure, but if true, I find that prospect somewhat revolting. It's a basic admission that companies care more about money than about quality. Usually smaller companies are okay, but the big conglomerates make me skeptical of the good of capitalism in the big picture.

    --

    make world, not war

  25. Re:What are you talking about? by wass · · Score: 2
    And if all our customers fuck off because we don't keep them happy, the shareholders are going to love that right? A satisfied customer is a paying customer.

    But you said, a few posts up, that your customers want a "tried and tested platform backed by a company that truly cares about their satisfaction." But now you imply that your company doesn't truly care about their satisfaction, but only truly about their money. Which is it?

    I assume you care about their satisfaction only as much as it will bring in the dollars. I.e., you want to keep them just satisfied enough that they'll buy more products. Such is capitalism at its extreme. You choose money over product quality.

    --

    make world, not war

  26. Re:You have no clue what you are talking about by wass · · Score: 2
    No, the average corporate customer we get is more interested in a tried and tested platform backed by a company that truly cares about their satisfaction rather than being at the whims of the "open source" bearded hippy crew and their communist fuhrer.

    Hahaha, a sleazy capitalist fearing his/her eventual demise. Anyway, doesn't this 'company' you speak of truly care more about its shareholders than about its customers' satisfaction?

    --

    make world, not war

  27. Re:Hardware Support by TheCarp · · Score: 2

    > I just called up 3com and said, "Please send me
    > two of those pieces of hardware with

    Well, that's nice. Look, I have no use for these
    things myself. I don't know what the product is
    called; I never bought one. I was simply trying
    to offer an idea and point in the right direction.
    I never claimed to be able to do more.

    I probably could find out the name of the product,
    but not in the time frame where it would matter
    WRT Slashdot comments.

    > I think you want two servers with the same RAID
    > array....[snip]

    Yup... a very good way to do it. I agree (of
    course it doesn't handle the RAID array itself
    having a catastrophic failure, but given the
    redundancy in a good array, that should be rarer
    than a system blowing up).

    >> Of course, why that's even needed is beyond me.
    > apparently..

    Thank you for changing the order of what I said
    so that it looks like I said something different
    than I did.

    If you were to look at my original comment, I said
    this about the case of SIMPLE ethernet line
    failover NOT the redundant servers case.

    -Steve

    --
    "I opened my eyes, and everything went dark again"
  28. Re:Cost of redundant servers by TheCarp · · Score: 2

    Yup, I know all that.

    However, you should note that I offered 2 solutions. One of them was almost exactly what he asked for, but implemented in hardware (and, as someone else pointed out, possibly firmware too), which requires no driver software to work (beyond that of whatever existing Ethernet card one has).

    The other solution, yes, it's a lot more costly. Yes, it MAY not be right for the given situation. However, I felt it should be offered up anyway, to let the person in that situation decide.

    --
    "I opened my eyes, and everything went dark again"
  29. Re: This is the wrong question / Multiport NICs by IMLizKing · · Score: 2
    I am wondering why NICs with more than one port are so danged expensive though? I can see a bit of an increase in price, but there is no way these things should be $400 and up (last time I looked..)

    Sorry to be a bit off-topic, but there is a reasonably priced 4-port ethernet solution out there. Compex, Inc. makes a quad port ethernet card (P/N FL400TX/PCI) that sells for $189.95 on buy.com. Looks like it's out of stock right now though.

    I purchased one of these for our server (Linux based of course) here at work and have been quite happy with it. I'm using it for subnetting our network (vs. fail-over network links.)

  30. B2B buzzword by yerricde · · Score: 2

    B2B == We sell to businesses. Why can't they just say it?

    --
    Will I retire or break 10K?
  31. Re:Shoot your consulatnt by illumin8 · · Score: 2

    Your comment is very informative; however, what he's really talking about doing is implementing Cisco Catalyst switches that use HSRP (Hot Standby Router Protocol) and load balancing in order to give you twice the throughput without using two separate subnets. This is the preferred and desirable way to implement high availability and Layer 2 redundancy. It can be done on both Solaris and NT (don't know about Linux). The point most people are missing here is that it is preferable to do this in hardware as opposed to software, because the hardware tends to be more reliable. I would trust Cisco IOS to handle my redundancy much more than even Unix (although Unix is very stable). I think most people are answering the wrong question. Now that he has all of this hardware, how does he use it? I would be interested in hearing whether there are any devices or device drivers that allow you to do this in Linux.

    --
    "When the president does it, that means it's not illegal." - Richard M. Nixon
  32. Re:Linux High Availability project by Spoing · · Score: 2
    This tactic works... and a version of it has been used in commercial minicomputer systems for a while. I haven't been to the Linux high-availability site yet, but they probably got the idea from existing commercial products.

    I worked on an HP mini, and it used a similar setup. Basically, the two *identical* minis shared a SCSI bus with redundant media. The backup mini would ping the other one over the SCSI bus, and if it didn't get a response it would take over the IP of the first one. Worked damn well.

    The only drawback is that the backup isn't doing anything but issuing a ping, mirroring the system RAM of machine 1, and waiting. The upside is that short of a missile strike, you had very high reliability. Most failures didn't even cause a pause.

    I don't see any problem with using the same method with more systems, though a cluster starts to look attractive after a while.

    --
    A firewall can not protect you from yourself. Turn off what you do not need. Do not use the firewall to do your work.
  33. SGI and High Availability Linux Clusters by ChiaBen · · Score: 2

    I recently attended SGI University, and besides receiving a little Tux doll, I sat through some fairly inane discussions about SGI. BUT, this may help you out: they had a session on clustering for high availability. They mainly spoke about being able to hide planned downtime by using 'High Availability, Mission Critical' clusters where, if one node failed (or was shut down for maintenance), the other would automatically pick up the slack. They also spoke of this with regard to Beowulf clusters. You may want to check out their open source area too, and see if any of this software is publicly available yet. It does sound like they are heading in the right direction with getting into Linux...
    Hope this helps
    regards,
    Benjamin Carlson

    --
    "If voting could really change things, it would be illegal. " - Revolution Books, NY
  34. Every Ethernet chip you will ever encounter does.. by becker · · Score: 2
    Every Ethernet chip you are likely to encounter allows temporarily overriding the MAC address in software. Almost every Linux Ethernet driver (all the ones written by me) allows changing the MAC address while the interface is down. The driver rewrites the chip's idea of the MAC address when the interface is brought back up.

    This feature isn't needed for all fail-over schemes, but it does exist for those schemes which use it.
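    For the schemes that do use it, the sequence on Linux with net-tools ifconfig is: take the interface down, set the address, bring it back up, matching the driver behaviour described in this comment. Below is a Python sketch that builds those command lines (the function name is hypothetical; running them needs root) and rejects addresses with the multicast bit set, since a station address must be unicast.

```python
import re

MAC_RE = re.compile(r"[0-9A-Fa-f]{2}(:[0-9A-Fa-f]{2}){5}")

def set_mac_commands(dev: str, mac: str):
    """ifconfig steps for overriding an interface's MAC address.

    The interface goes down first because the drivers described here only
    load a new address into the chip when the interface comes back up.
    """
    if not MAC_RE.fullmatch(mac):
        raise ValueError(f"not a colon-separated MAC address: {mac!r}")
    if int(mac[:2], 16) & 0x01:
        raise ValueError("multicast bit set; a station address must be unicast")
    return [
        ["ifconfig", dev, "down"],
        ["ifconfig", dev, "hw", "ether", mac],
        ["ifconfig", dev, "up"],
    ]

for cmd in set_mac_commands("eth0", "00:11:22:33:44:55"):
    print(" ".join(cmd))
```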

  35. Hardware Fault Tolerance by roth67 · · Score: 2

    Sounds like a Cisco LocalDirector or one of its competitors could do the trick.

  36. Solaris "Alternate Pathing" by Anonymous Coward · · Score: 3
    The Solaris feature referred to is AP, or Alternate Pathing. It is used for disk devices and network devices. Basically, you create a pseudo network device. This pseudo network device is attached to two real network devices. Since the OS does all network activity through the pseudo device, it is capable of working out of either real physical network device without the applications (or the rest of the OS) being aware there was a change. It is similar for disk.

    The main reason for AP is for the DR or Dynamic Reconfiguration feature. If you've got three system boards, then you can have some redundant hardware so that you can take down and remove a system board *while the OS is still running*, and keep your network connection going without missing a beat. (Same for disk.) Neat stuff.
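    The idea of a pseudo device layered over two real ones can be modelled in a few lines. This Python toy (all class names hypothetical, with a FakeNic stand-in replacing real drivers) is not Solaris AP's interface; it only shows the key property: callers send through one object, and a path failure just moves traffic to the other device.

```python
class FakeNic:
    """Stand-in for a physical interface driver."""

    def __init__(self, name):
        self.name = name
        self.up = True
        self.sent = []

    def link_ok(self):
        return self.up

    def transmit(self, packet):
        self.sent.append(packet)
        return self.name


class PseudoInterface:
    """Pseudo network device over two physical ones; callers never see
    which physical path carried their traffic."""

    def __init__(self, primary, alternate):
        self.paths = [primary, alternate]
        self.active = 0

    def send(self, packet):
        for _ in range(len(self.paths)):
            path = self.paths[self.active]
            if path.link_ok():
                return path.transmit(packet)
            # Active path is dead: switch to the other one and retry.
            self.active = (self.active + 1) % len(self.paths)
        raise OSError("no working path")


qfe0, qfe1 = FakeNic("qfe0"), FakeNic("qfe1")
ap = PseudoInterface(qfe0, qfe1)
print(ap.send(b"hello"))  # qfe0
qfe0.up = False
print(ap.send(b"world"))  # qfe1
```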

  37. Been there, wrote it myself by bluGill · · Score: 3

    I wrote software for Solaris (which, as others have pointed out, does not do this without third-party software) because we found that no solution would fit our needs well. When looking at the price tag, we concluded that we could do better. (Come to think of it, there is a high-availability Solaris, but it isn't cheaper or better than the third-party stuff.)

    Basically, we ping something on the other side of the router every 5 seconds, and if the ping doesn't come back we switch to the other port. That is the overview, but you need to do some more isolation before you blindly switch ports.

    I strongly recommend you put in some other path between you and the box you are pinging. Several times we have been bitten when the box we were pinging went down and not the router, or alternatively the network was so busy the ping didn't get through within our timeout period.

    There is no portable way in Solaris to tell if an Ethernet port has signal. You can find out from some drivers, but when you change to a different Ethernet card you have to do something else to find out.
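    One way to encode the isolation advice above: ping two independent targets every interval (say every 5 seconds) and switch ports only when both have missed several rounds in a row, so that a single dead box or one ping lost on a busy network does not cause a switch. Below is a Python sketch with hypothetical names; the caller supplies the ping results, since sending ICMP needs raw sockets or the system ping command.

```python
class FailoverDecider:
    """Decide when to switch ports from rounds of ping results."""

    def __init__(self, targets, misses_needed=3):
        self.misses = {t: 0 for t in targets}  # consecutive misses per target
        self.misses_needed = misses_needed

    def record(self, replies):
        """replies: dict target -> bool for one ping round.

        Returns True when every target has missed `misses_needed` rounds
        in a row, i.e. the problem is most likely our own port or link.
        """
        for t in self.misses:
            self.misses[t] = 0 if replies.get(t, False) else self.misses[t] + 1
        if all(m >= self.misses_needed for m in self.misses.values()):
            for t in self.misses:
                self.misses[t] = 0  # start fresh on the new port
            return True
        return False

d = FailoverDecider(["router", "other-box"], misses_needed=3)
print(d.record({"router": False, "other-box": True}))  # False: one box only
```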

  38. linux option by mattdm · · Score: 3
  39. dual port nics? by kevin+lyda · · Score: 3

    and when the dual port nic dies and takes both interfaces with it you switch to psychic networking? i'd really suggest two nics not a single dual one.

    that said, linux can do routing. why not set up a loopback device and then have it route through either nic? ip was designed to deal with multiple routes, why must your consultant reinvent the wheel? (loopback addresses are published, so they'll be seen on the network)

    --
    US Citizen living abroad? Register to vote!
  40. Q: is this a good solution? by Felinoid · · Score: 3

    I hate saying "I hate to tell you this but" when I really enjoy telling someone something.

    So I won't lie.. I'm happy to tell you....

    This may not be the right solution.
    The problem isn't the consultents. They know there stuff otherwise you wouldn't be paying them.
    They come with years of experence and bieses.

    Expect a consultent who isn't friendly with Linux to dig up a solution that will not work on Linux.
    Yes Solarus can do a lot of great things Linux can not. In the end Linux has one great advantage and thats price. Source code and quick security patch relases is a bonus.

    The same rule holds for an NT shop....
    And don't put it past a consultant to find a feature Solaris doesn't have. You're talking with some of the greatest freelance talent in some cases, and if a defect is to be found they can find it. Once found, that defect becomes a case for switching to something the consultant likes.

    So once you pick a platform for your shop, pick a consultant who is friendly with your choice. If he then recommends something else, it will be after bleeding all the possibilities dry.

    This may not be the only way to do it...

    and... it may be the worst way to get it done...
    Linux has its limits, don't get me wrong, but you can always find more than one solution. Linux may support 4 out of 8 solutions... if you're only presented 1... there may be a reason...

    --
    I don't actually exist.
  41. OS Level failover by PenguinX · · Score: 3

    One of the things that I have found is that OS-level failover doesn't always work or will have odd problems. If you are looking for enterprise-level uptime, then cobbling together a solution such as this is not for you. The company I work for uses a Cisco LocalDirector to do the work. What's great about this sort of solution is that a LocalDirector will do round robin, failover, and such a dizzying array of other things that it's wondrous. I would suggest you look into this solution or one similar.

  42. This is the wrong question by FascDot+Killed+My+Pr · · Score: 3

    Your question: "Here's what the consultant told us is the solution to our problem. Where can we get the hardware?"

    What you want to know: "Here's our problem. Here's the solution the consultant came up with. What improvements can /. suggest?"

    For instance, why do you need dual-port NICs? If it's just for the throughput, why not just use 2 single-ports? This also provides redundancy in the hardware department.
    --
    Have Exchange users? Want to run Linux? Can't afford OpenMail?

    --
    Linux MAPI Server!
    http://www.openone.com/software/MailOne/
    (Exchange Migration HOWTO coming soon)
    1. Re:This is the wrong question by x0 · · Score: 4

      why do you need dual-port NICs?

      On Suns at least, the dual (well, quad) port NICs are used as a heartbeat signal between the active server and the failover box (when using FirstWatch).

      True, you could use two separate NICs in each box to provide the same solution, but then you are using up three PCI slots since the heartbeat NICs do not carry any packets.

      I am wondering why NICs with more than one port are so danged expensive though? I can see a bit of an increase in price, but there is no way these things should be $400 and up (last time I looked..)


      --
      In the immortal words of Socrates, who said; 'I drank what?'
  43. Re:Failure to implement open standards. by yesod · · Score: 3


    I believe Juniper and 3Com also support the use of VRRP.

    You may not be able to implement HSRP without paying Cisco a license fee. I'm not sure if anyone has approached Cisco from an open-source viewpoint though.

    As for a public implementation - I should have a Linux VRRP implementation out this week.

  44. network failure should be handled by network HW by jalex · · Score: 3
    If a network node goes down, it's better if network equipment handles the failover. The server should only handle the failover if the network hardware is incapable of doing so. The server is the weakest part of the network, so you want as little as possible depending on the server.

    Say you have a machine with two dual port NICs or even two NICs. Have a script that checks the main network interface every ten seconds. If the main interface becomes unavailable, unload it and load the second interface with the same IP and reset all of your routing information.
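    Here is a minimal Python sketch of the switch step described (unload the primary interface, bring up the second one with the same IP, restore routing). It assumes the classic net-tools `ifconfig`/`route` commands and iputils `arping`; the interface names, addresses, and helper names are placeholders. The gratuitous ARP is an addition not mentioned in the comment: without it, neighbours keep sending to the old card's MAC until their ARP caches expire.

```python
import subprocess


def takeover_commands(failed_if: str, backup_if: str, ip: str,
                      netmask: str, gateway: str) -> list[list[str]]:
    """Commands that move `ip` from the failed interface to the backup one."""
    return [
        ["ifconfig", failed_if, "down"],                        # unload the dead interface
        ["ifconfig", backup_if, ip, "netmask", netmask, "up"],  # same IP on the backup
        ["route", "add", "default", "gw", gateway],             # routes on failed_if went away with it
        # Unsolicited (gratuitous) ARP so neighbours update their ARP caches
        # to the backup card's MAC address instead of waiting for entries to expire.
        ["arping", "-U", "-c", "3", "-I", backup_if, ip],
    ]


def run(commands: list[list[str]], dry_run: bool = True) -> None:
    """Print the commands, or execute them (needs root) when dry_run is False."""
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
```

    The ten-second check loop would call `run(takeover_commands("eth0", "eth1", "192.0.2.10", "255.255.255.0", "192.0.2.1"), dry_run=False)` once the primary stops answering; with the default `dry_run=True` it only prints the plan.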

    If you have a server with that kind of "need", then maybe you should consider a better routing setup altogether. Consider how www.netscape.com actually resolves to several IP addresses. The options are numerous. The main point is that Linux works just fine (although FreeBSD has sexier networking).

  45. Hardware Support by TheCarp · · Score: 3

    > We are a growing B2B company;

    Good to know that you are buzzword compliant...
    I understand that's very important to some people,
    and if I ever figure out who those people are, I
    will probably avoid them like the plague.

    As for failover...check out 3Com....long ago a
    man (who would later go on to teach Unix courses
    at WPI and be one of the best teachers I ever
    had for anything) designed a piece of hardware
    with 1 ethernet port on one side, and 2 on the
    other...it was designed to do JUST THAT.

    Completely in hardware. He did it for a company
    that was later bought out by 3Com...he claimed
    (a couple of years ago, when I was in his course)
    that they still sell the product that he designed.

    Of course, why that's even needed is beyond me.
    For better redundancy, you really want separate
    redundant servers, each with RAID arrays and
    probably a couple of LocalDirectors (or round
    robin DNS for a cheaper solution) directing
    connections between them (giving both failover and
    increased availability) but...that's just IMNSHO.

    After all, if a CPU fries, or a power supply starts
    letting its magic smoke out...all the dual-port
    NICs in the world won't help.

    --
    "I opened my eyes, and everything went dark again"
  46. Other thoughts by Ron+Harwood · · Score: 3

    Making the assumption that you want a web farm, and for maximum availability you are using two Cisco switches, two Cisco PIXes, and two Cisco LocalDirectors... you can still get away with a single NIC (not that you have to): put half of your web servers on one switch, and half on the other.

    Otherwise, you can put two NICs in (one on each switch) and assign each its own IP address... no need to fail over. Although I would look at the F5 BIG/IP, as it can make sure that your servers are actually serving up content... the LocalDirector isn't as good at this.

  47. redhat piranha? by mikeee · · Score: 3

    I think Piranha does just this. Byte just ran an article on Linux HA clusters: http://www.byte.com/column/BYT20000510S0001

  48. Re:huh ? by dwakeman · · Score: 3

    http://support.intel.com/support/network/adapter/pro100/30504.htm That is the dual-port adapter that comes in Dell servers... Has anybody used this before? D

  49. Building Linux Clusters - O'Reilly by howard_wwtg · · Score: 3
    There are many good ways (and even more bad ways) to build a redundant/failover environment. Your consultant is just seeing the one solution he knows. As others have posted, there are many good solutions already out there.

    Building Linux Clusters is just what you should read... Unfortunately it won't be out until August.

  50. What you need by Anonymous Coward · · Score: 4

    Everything you need is at High Availability Linux.

    I too have built a B2B exchange on the Linux platform and found JServ to be *INCREDIBLE* at HA/failover features.

    As for the 2 network cards for each machine, that too is a *VERY GOOD* thing. It allows you to partition your network traffic to achieve much better response time. For example, our network has 2 NICs in each machine. There is a "Web Server to Database" network, a firewall-to-web-server network, and a separate network for office web surfing and miscellaneous stuff like that. Access to the "WebServer to Firewall" network is handled across the router.

    One thing to keep in mind when dealing with DB-aware web applications is that unless your code is *VERY POORLY* written, the biggest bottleneck will be network latency.

    -Grimsaado
  51. Ignorance of options is not a failure by ch-chuck · · Score: 4

    When Linux was cranking up last year, the folks at TurboLinux sales called up promoting their fault-tolerant cluster solution. Although it's not free, you could get a timed demo, so Linux solutions exist.

    Just a general observation: Linux is pretty well fleshed out with just about anything you can think of in one form or another. It just isn't chasing you down with in-your-face ads and high-pressure sales promos like other commercial products, so it may appear deficient. More often than not, a few days (for us slowpokes) of searching and trials will turn up an inexpensive, quality solution, in some stage of development, hidden somewhere.

    --
    try { do() || do_not(); } catch (JediException err) { yoda(err); }
  52. Failure to implement open standards. by jcostom · · Score: 4
    It never ceases to amaze me. Companies want to sell you obnoxious amounts of software and hardware to do something as simple as creating a highly available system. Not to mention projects like the Linux-HA people, who, while doing a good thing, are (IMHO) heading in the wrong direction (using RS-232 heartbeats is silly).

    Has anyone considered VRRP (Virtual Router Redundancy Protocol)? It's an actual open standard, and it works. It not only works, it works amazingly well.

    One of the major users of VRRP technology is Nokia. They've done extensive work on the protocol, and use it in their line of firewalls (which btw run a heavily modified FreeBSD codebase).

    VRRP uses multicast packets that are similar to OSPF "Hello" packets to let the partner(s) know it is alive. If the primary machine dies, the backup instantly takes over. When the takeover happens, it not only assumes the IP address of the dead machine, but it also answers for the MAC address of the dead machine.
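    For the curious, here is roughly what a VRRP advertisement looks like on the wire, as a Python sketch following the VRRPv2 packet layout (RFC 2338, later revised as RFC 3768) with no authentication. One detail worth knowing: the address the master answers for is a virtual MAC derived from the virtual router ID, not the burned-in MAC of either box, which is why the takeover is invisible to hosts. The function names are illustrative, and actually sending the packet needs a raw IP socket (protocol 112, TTL 255, destination 224.0.0.18), which is not shown.

```python
import socket
import struct


def inet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum (the Internet checksum, RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:                      # fold carries back into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def vrrp_advertisement(vrid: int, priority: int, addresses: list[str],
                       advert_int: int = 1) -> bytes:
    """Build a VRRPv2 ADVERTISEMENT payload with auth type 0 (none)."""
    header = struct.pack("!BBBBBBH",
                         (2 << 4) | 1,     # version 2, type 1 = ADVERTISEMENT
                         vrid,             # virtual router ID (1-255)
                         priority,         # 255 = address owner, 100 = default backup
                         len(addresses),   # count of virtual IP addresses
                         0,                # auth type 0 = no authentication
                         advert_int,       # seconds between advertisements
                         0)                # checksum placeholder
    body = b"".join(socket.inet_aton(a) for a in addresses) + b"\x00" * 8  # auth data
    packet = header + body
    csum = inet_checksum(packet)
    return packet[:6] + struct.pack("!H", csum) + packet[8:]


def vrrp_virtual_mac(vrid: int) -> str:
    """Virtual MAC used by whichever router is master: 00-00-5E-00-01-{VRID}."""
    return f"00:00:5e:00:01:{vrid:02x}"
```

    When the backup stops hearing these from a higher-priority master, it starts sending its own and answering ARP for the virtual address with `vrrp_virtual_mac(vrid)`.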
    --

    The unsig!
  53. Re:Linux High Availability project by SEWilco · · Score: 4
    Notice there are many links to related HA items there at the Linux High Availability Project. It sounds as if you're looking for something like FAKE, which lets a machine acquire the IP of another machine in a failure (note that FAKE points out that it has been moved into the "Heartbeat" code at Linux-HA) -- although some link chasing is necessary to learn where it went.

    "...dual-port NICs...switch the ports when the active port fails..."

    Oh, I see. When one port (or its path) fails, you want to switch the IP to a different port? I don't think "the driver" needs to do that, just change the IP assignments with ifconfig.

    • Monitor each link with some sort of heartbeat.
    • When there is no response, assign the IP of that link to the backup interface. Just use ifconfig to alter the interface configuration.
    • Have the backup interface be on the other NIC, not "switch ports" as you mentioned.
    • Dual-port NICs are not needed, if you can fit 3-4 NICs in your machine.
    • Have heartbeats running on backup and downed interfaces also, to report problems and repairs.
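    The bullets above boil down to a small selection rule. This Python sketch uses a hypothetical `Port` record holding each port's heartbeat status and the card it sits on, and prefers a healthy port on a different NIC, as suggested. Falling back to another port on the same card is an added assumption, on the theory that a degraded path beats none.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Port:
    name: str      # interface name, e.g. "eth0" (placeholder)
    card: str      # physical NIC the port lives on (hypothetical labelling)
    link_ok: bool  # result of this port's own heartbeat


def pick_backup(failed: Port, ports: list[Port]) -> Optional[Port]:
    """Choose where to move the failed port's IP, or None if nothing is up."""
    healthy = [p for p in ports if p.link_ok and p.name != failed.name]
    # First choice: a port on another card, so one dead card can't take both.
    other_card = [p for p in healthy if p.card != failed.card]
    if other_card:
        return other_card[0]
    # Assumed fallback: a surviving port on the same card.
    return healthy[0] if healthy else None
```

    The chosen port's name is what you would hand to `ifconfig` along with the failed link's IP; the heartbeats keep running on the downed port so its repair can be reported.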
  54. Feh! by Greyfox · · Score: 4

    My experience with consultants is that a good many of them are clueless. The reason they're consultants is they can easily BS the customer into believing they know what they're talking about long enough to bleed you dry. They may even provide you an actual solution that may even kind of work but which is patently the worst way to do what you were wanting to do. Then when you DO get someone in who knows what he's doing, that guy will have to spend twice as long beating your company into shape because he has to go back and undo everything the previous one did.

    --

    I'm trying to teach myself to set people on fire with my mind... Is it hot in here?

  55. I think we need more information here. by Ron+Harwood · · Score: 4

    What kind of application are you trying to fail-over? A database? A web-server?

    If you want a web farm, that's dead stupid easy. Failover for a database/FTP/NFS isn't too hard, but (presently) requires commercial software. Polyserve Understudy, Wizard Watchdog, or even RSF-1 are just some of the HA clustering products available.

  56. Easy solutions by sdw · · Score: 5
    First, it wouldn't be that difficult to modify the kernel to support this. I've released patches to work around bugs in the ARP reply with two NICs. Routing to the active port shouldn't be too difficult.

    A non-kernel invasive version of this would be a script that configures one port with the desired IP/mask and creates the default route. It then puts the other port in promiscuous mode and monitors it for traffic using a libpcap based program or even a possibly modified tcpdump. As soon as it sees any traffic, it switches the configuration and starts monitoring the other port. This could probably be written in 2-4 hours given a network to test it in.
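    The role-swapping part of that script could be sketched in Python as below. Packet capture itself (libpcap or tcpdump, as suggested) is left to the caller, which is assumed to report the interface and destination MAC of each captured frame; the class and function names are hypothetical. Ignoring broadcast and multicast frames is an added assumption: depending on how the switches are configured, ARP broadcasts and switch protocol traffic can reach an idle port, and reacting to those would flip the configuration for no reason.

```python
from typing import Optional


def is_group_address(mac: str) -> bool:
    """True for broadcast/multicast MACs (low bit of the first octet set)."""
    return (int(mac.split(":")[0], 16) & 1) == 1


class StandbyWatcher:
    """Track which port is active and swap roles when the standby sees traffic."""

    def __init__(self, active: str, standby: str) -> None:
        self.active = active
        self.standby = standby

    def on_frame(self, iface: str, dst_mac: str) -> Optional[tuple[str, str]]:
        """Return (new_active, new_standby) when a switch is needed, else None."""
        if iface != self.standby or is_group_address(dst_mac):
            return None
        # Unicast traffic on the standby port: the network now delivers to this
        # port, so make it active and start watching the old one instead.
        self.active, self.standby = self.standby, self.active
        return self.active, self.standby
```

    On a non-None result, the script would move the IP and default route to the new active port and put the old one into promiscuous mode for monitoring.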

    For a possibly simpler solution (i.e. no code to write), use a pair of additional Linux systems. Configure each of them to load balance with LinuxVirtualServer (aka LinuxDirector) or the Piranha version of it to as many backend servers as you have, but to different internal IP addresses. Good choices would be 10.* addresses, say 10.0.1.* and 10.0.2.*. Using either a dual-port NIC or two NICs in each server, create two networks, one for each of the load distribution servers. Configure Apache et al. to respond identically on the addresses of both networks.
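    Each director's LVS table can be generated from the same list of servers. This Python sketch emits `ipvsadm` commands in NAT (masquerading, `-m`) mode with round-robin scheduling; the public addresses, the .11 to .13 host numbers, giving each director its own public address, and the choice of NAT mode are all assumptions. In NAT mode each backend must send replies back through the director that forwarded the request, so replies sourced from 10.0.1.x and 10.0.2.x need to leave via the matching network (for example with source-based policy routing).

```python
def backend_addresses(prefix: str, hosts: range) -> list[str]:
    """Addresses of the backends on one private network, e.g. 10.0.1.11-13."""
    return [f"{prefix}.{h}" for h in hosts]


def director_commands(vip: str, port: int, backends: list[str],
                      scheduler: str = "rr") -> list[list[str]]:
    """ipvsadm commands for one LVS director in NAT (masquerading) mode."""
    service = f"{vip}:{port}"
    cmds = [["ipvsadm", "-A", "-t", service, "-s", scheduler]]  # add the virtual service
    for ip in backends:
        # -a adds a real server, -r names it, -m selects masquerading (NAT)
        cmds.append(["ipvsadm", "-a", "-t", service, "-r", f"{ip}:{port}", "-m"])
    return cmds


# Same servers, reached over a different private network by each director.
director_a = director_commands("192.0.2.80", 80, backend_addresses("10.0.1", range(11, 14)))
director_b = director_commands("192.0.2.81", 80, backend_addresses("10.0.2", range(11, 14)))
```

    If one director or its switch dies, the servers are still reachable through the other network, which is the redundancy the dual-NIC setup was meant to buy.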

    BTW, there are 4 port 100-base-T cards out there, from Adaptec I think.

    Good Luck!

    --
    Stephen D. Williams
  57. Linux High Availability project by dashuhn · · Score: 5

    You may want to check out http://linux-ha.org/.
    The "heartbeat" application implements node-to-node monitoring over a serial line and UDP and can initiate IP address takeover based on a notion of resources provided by nodes and resource groups. It worked well for me. However, this was only a very basic two-node setup.
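    The core of heartbeat-style IP takeover is a dead-time rule: if nothing is heard from the peer for long enough, claim its resources. This Python sketch shows only that rule; it is not Linux-HA's code or wire format, the class name is hypothetical, and the clock is injectable so the logic can be checked without waiting. (heartbeat itself has a similar knob, `deadtime` in `ha.cf`.)

```python
import time
from typing import Callable


class PeerMonitor:
    """Declare the peer dead if no heartbeat arrives within `deadtime` seconds."""

    def __init__(self, deadtime: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.deadtime = deadtime
        self.clock = clock
        self.last_seen = clock()      # treat startup as a fresh heartbeat
        self.owns_resources = False   # True once we have taken over the peer's IP

    def heartbeat_received(self) -> None:
        """Call this for every heartbeat packet (serial or UDP) from the peer."""
        self.last_seen = self.clock()

    def check(self) -> bool:
        """Return True exactly once, when this node should take over."""
        if not self.owns_resources and self.clock() - self.last_seen > self.deadtime:
            self.owns_resources = True
            return True
        return False
```

    When `check()` returns True, the node would bring up the peer's IP address on its own interface, much like the takeover commands discussed elsewhere in this thread.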

  58. Shoot your consultant by arivanov · · Score: 5
    Shoot your consultant. With a big gun. Only an idiot will suggest a layer 2 failover for a unix system.

    1. Your consultant should learn routing protocols

    2. Your consultant should learn the concept of a loopback alias.

    3. Your consultant should have an IQ above 25

    4. There is absolutely no need for link-layer (layer 2) failover where layer 3 will do. Unix is not WinHoze. It knows about routing.

    So your task list is:

    • Configure loopback aliases on the linux boxes.
    • Configure Apache to listen only on the loopback alias interface.
    • Build gated from the Red Hat sources; they already include the patches for Linux 2.2. You may use Zebra CVS instead, but it is still a bit off in terms of stability. You may need a script that HUPs gated a few times, as gated does not always start cleanly and update the routing table on 2.2.x.
    • Configure OSPF on gated and on your Cisco gear. Distribute a default route into OSPF, as gated from the 3.5.x tree has no IRDP.
    • Shoot your consultant
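    The first two steps of that list (a loopback alias with Apache bound to it) might be generated as in this Python sketch; the address, the `lo:0` label, and the helper name are placeholders. The returned /32 is the prefix the routing protocol would announce out of both NICs, so traffic follows whichever link OSPF still considers up. The gated/OSPF configuration itself is not shown.

```python
import ipaddress


def loopback_service_config(alias_ip: str, label: str = "lo:0",
                            port: int = 80) -> dict[str, str]:
    """Config lines for serving on a loopback alias that routing advertises."""
    addr = ipaddress.IPv4Address(alias_ip)          # raises ValueError on a bad address
    host_route = ipaddress.IPv4Network(f"{addr}/32")
    return {
        # A /32 on the loopback stays reachable as long as *any* NIC is routed.
        "interface": f"ifconfig {label} {addr} netmask 255.255.255.255 up",
        "apache": f"Listen {addr}:{port}",          # httpd.conf: bind only to the alias
        "advertise": str(host_route),               # prefix to announce into OSPF
    }
```

    Because the service address lives on neither NIC, losing one card or switch only removes one OSPF path to it rather than the address itself.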
    BTW: your bill is $500.
    --
    Baker's Law: Misery no longer loves company. Nowadays it insists on it
    http://www.sigsegv.cx/
  59. Linux does support this... by austad · · Score: 5

    You could go with an expensive commercial solution like BigIP from F5, but those will run you at least $30k or so. You could also use Polyserve Understudy, which does pretty much the same thing only under Linux, and it's only about $400 or so. If you have all this expensive Cisco equipment and a Cat6000, you can run LocalDirector on that without buying additional hardware.

    However, I suggest:
    http://www.linuxvirtualserver.org or
    http://linux-ha.org or
    http://www.eddieware.org

    It all depends on the application you're running. If it's just HTTP, any of these will work, but if it's something else, you're stuck with Linux-HA or Linux Virtual Server. Eddie will only do HTTP as far as I know. Plus Eddie uses Erlang, which may affect performance.

    --
    Need Free Juniper/NetScreen Support? JuniperForum
  60. Wrong failure being addressed by igaborf · · Score: 5

    You're attacking the wrong problem. What you need first is not Linux failover but consultant failover: Your consultant has failed; you need to switch to a new one instantly.