Domain: linux-ha.org
Stories and comments across the archive that link to linux-ha.org.
Comments · 72
-
A solution to a down interface
High Availability (HA) is not an option for many Linux users (or for me on my home systems), so I use iproute2 (which is built into all common Linux distributions). With a few simple rules, one can make outbound traffic go out on the interface that it is associated with. For example: I could have multiple DNS A records for a host (using either single or multiple network interfaces) and have that host respond to client requests via the same interface on which they arrive.
Iproute2 has worked out very well for me for quite a long time and I have no need to run any additional routing daemons.
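A minimal sketch of the kind of rules described above, assuming two interfaces. The device names, addresses, gateways, and table numbers are all hypothetical (the addresses come from the documentation ranges), and the script only prints the `ip` commands rather than running them:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Uplink:
    """One network interface with its own address and gateway (hypothetical values)."""
    device: str
    address: str
    gateway: str
    table: int  # routing table number reserved for this interface


def policy_routing_commands(uplinks):
    """Build iproute2 commands so replies leave via the interface they arrived on."""
    commands = []
    for link in uplinks:
        # Give each interface its own routing table with its own default gateway.
        commands.append(
            f"ip route add default via {link.gateway} dev {link.device} table {link.table}"
        )
        # Packets sourced from this interface's address look up that table.
        commands.append(f"ip rule add from {link.address} table {link.table}")
    return commands


if __name__ == "__main__":
    uplinks = [
        Uplink("eth0", "192.0.2.10", "192.0.2.1", 101),
        Uplink("eth1", "198.51.100.10", "198.51.100.1", 102),
    ]
    for command in policy_routing_commands(uplinks):
        print(command)
```

Printing instead of executing keeps the sketch safe to run anywhere; the output can be reviewed and then applied by hand as root.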
-
Check out "The Assimilation Project"
Take a look at The Assimilation Project. What we do: continually discover and monitor systems, services, switches and dependencies with very low human and network overhead.
-
Re:There are other options for DynDNS only routers
Not that I know of. The three links below can give you dynamic failover, and if you install the MOSIX kernel (free for 4 nodes) you get automatic load balancing, but these all work at the node level on a single WAN. I can find nothing similar that would give you extranet failover/load balancing.
http://www.linux-ha.org/wiki/Releases
http://clusterlabs.org/wiki/Main_Page
http://www.corosync.org/doku.php -
Re:I Don't Quite Understand
I would say OMPL (one mainframe per lab) would save a company money. With Gigabit Ethernet, that's the way to go. Make it redundant with High Availability.
-
Re:Outsource web hosting. But, which provider?
A2's hardware and pricing appear a bit out-of-date. Consider BlackMesh; it's the company I work for, and we run a tight ship in a tier-4 datacenter with name-brand equipment and massive onsite technical expertise.
In terms of load-balancing, I'd agree with earlier posters that you really don't need any for 1,000 visitors a day. As you scale up, I can't recommend strongly enough the use of a hardware load balancer; we've had excellent luck with those from Kemp Technologies, which incidentally run a version of the well-known open-source load-balancing daemon ldirectord. The load balancer then automatically covers your HA needs in terms of a Web front; depending on what you have going on in the backend, a combination of NFS, DB replication, or other service-specific replication configuration, often alongside a deployment of Heartbeat, will cover your data-persistence needs.
To the OP, or anyone else, feel free to poke me via email if you'd like to discuss options any further. I'm a network engineer, not a sales guy, so prices aren't my strong suit, but we can work together to design a platform that you can then get bid on by a number of hosting providers, us included.
--J
-
Some solutions
Others have already covered the "1000 users isn't much" aspect. Benchmark, and verify what each server can handle of your anticipated load, but they're probably right.
Option 1: Don't do it yourself. Look into renting servers from a hosting company. They will often provide HA and load balancing for free if you get a couple servers. Also, having rented servers makes it much easier to scale. If you find that you have 100,000 uniques per day, you can order up a bunch more servers and meet the load within minutes to hours. If you overbought, you can scale back down just as fast.
Option 2: http://www.linuxvirtualserver.org/ plus http://www.linux-ha.org/ . You use LVS to load balance out to a cluster (including removing failed servers from the pool). You use HA so that two LVS machines can fail over to each other. Note that you can run LVS on the same machines as your load, for a small environment. This is much more DIY than the Windows setup, of course... But honestly, if the setup requirements of this scare you away, then you're not ready to run a fault-tolerant network, regardless of OS.
Option 3: http://www.redhat.com/cluster_suite/ . Less DIY, more money. Perhaps that's better for you.
Option 4: Buy a commercial solution. Every major network vendor sells a HA/LB product. I've used them from most of the big players... I'm not going to write a review here, but it'll suffice to say that while they each have their good and bad points, any of them will get the job you've outlined done.
As for the network: The general rule is to reduce your single points of failure to the minimum you can afford. Common ones are: The ISP (BGP is a pain); the routers (each ISP goes to its own router); the switches between (you need to full-mesh links from the two routers to two switches, down through the line as many layers as it goes; your switches need to run STP or be layer 3 switches running OSPF or another routing protocol; don't forget to plug the load balancers into different switches); the power (servers, switches, and routers on separate UPSes such that losing one will leave a fully functioning path); and depending on how far you want to take this, the data center itself (in case of fire/meteor/EPO mishaps).
Note that all of this is required even for your Windows solution. Are you sure you don't want option 1?
:) -
Re:1000+ a day isn't very much
You'll need something that detects the primary server is offline and switches to the backup automatically. You might also want to have a separate database server that mirrors the primary DB if you're storing a lot of user content, plus a backup for it (though the backup DB server could always be the same physical machine as one of the backup webservers).
On this note, if you're comfortable (and your application is compatible) with Linux+Apache, then heartbeat and DRBD will do this and are relatively simple to get up and running. Just avoid trying to use the heartbeat v2-style config (for simplicity), make sure both the database and apache are controlled by heartbeat, and don't forget to put your DB on the DRBD-replicated disk (vastly simpler than trying to deal with DB-level replication, and more than adequate for such a low load).
Oh, and don't forget to keep regular backups of your DB somewhere else other than those two machines.
-
Some information about HA
I want to give you some more information. Based on your visitor estimates, I suspect you do not have much experience with this, because for this number of visitors you do not really need a cluster.
But now to the other stuff. Yes, Windows clustering is (up to Win Server 2003 [1]) a lot easier. But this is because it is not really a cluster. The only thing you can do is have the software running on one server, then stop it and start it on the other server. This is what Windows Cluster does for you. But you cannot have the software running on both servers at the same time.
If you really want a cluster, then you probably need some sort of shared storage (FibreChannel, iSCSI, etc.), or you are going to use something like DRBD [2]. You will need something like this too if you want a real cluster on Windows.
I recommend reading some more on the Linux HA website [3]. Then you will get a better idea of what components (shared storage, load balancer, etc.) you will need within your cluster.
If you only want high availability and not load balancing, then I recommend not using Windows Cluster. Better to set up two VMware servers with one virtual machine and copy a snapshot of your virtual machine over to the second machine every few hours.
[1] I don't know about Win Server 2008
[2] http://www.drbd.org/
[3] http://www.linux-ha.org/ -
What is High Availability to you?
"What is high availability to you?" That is the question HP posed in a Service Guard class I was in once. It's a valid question though. I work with mission critical hospital systems in health care and deal with high availability on a medical hosting service. This means in my particular environment, we need 24/7 operation with minimum or no downtime (
Linux HA Project
IBM HACMP (High Availability Clustered Multi-processing)
HP Service Guard for Linux (also available on HPUX)
Oracle RAC (Real Applications Cluster)
Those are some areas to start with. If you are doing Oracle, you can create a GRID compute environment which will allow for true clustering and load balancing with Oracle in a shared environment with a SAN. One thing to keep in mind is that a SAN is required for most clustering. RedHat also offers the GFS filesystem, which is a true, proven clustered filesystem. There is another called GPFS which has been used cross-platform as well, but requires licensing.
When it comes to redundant hardware for HA, make sure you support the minimum requirements for heartbeat paths depending on what clustering solution you want. If you use HACMP or Service Guard, you will likely use a SAN HB and at least 1 redundant network path. Also, when using a SAN, use multiple HBAs to provide redundancy with multipath software such as dm-multipath (Linux), Securepath (UNIX), HDLM (Unix), MPIO (IBM UNIX), SDD (IBM UNIX). There are plenty of documents on how to do HA under various environments. I recommend looking at some of the IBM redbooks on HACMP and on Clustering. They also have redbooks for Oracle tuning on Linux with POWER, which will give you an idea about how to do Linux Oracle clusters. If you can create an Oracle Metalink account, you can find out some of the tuning and detailed info about Oracle clusters.
I am sure there are others I am missing, but that covers the base for most clusters. The only other thing is finding a persistent messaging platform (like IBM Websphere MQ - MQ Series) to handle message passing in applications. IPC is good under UNIX for programming, but not as good with clusters, security, or transaction guarantees.
The only other thing to remember is cost. HA environments do incur higher costs than small unreliable environments. Things like mirrored drives, redundant HBAs, redundant power supplies and power feeds, redundant NICs, etc. People worry about petty things like how likely drives are to fail, etc. If you architect your environment properly and build your clusters, you build around that. RAID 5 on your SAN, redundant cards, fault tolerant hardware, better reporting mechanisms (HP and IBM integrate daemons on all their OS's to report potential hardware failures with mid-range to high end servers). Look at what your SLA is and what you have to provide, and then look for the best, most reliable hardware and software to fit in your budget to provide that. Not everyone can buy millions in hardware and software to run a true mission critical environment. -
Re:Support, Support, Support
I tend to agree with you but...
With the cost of commodity PCs these days you could probably have an entire second router on hot standby for the cost of a single year's support contract.
If it is a T-1 then just move the cable over. If it is an Ethernet connection the failover could be entirely automatic: http://linux-ha.org/
You will also have a trade-off of in-house time to test and configure vs. just buying Cisco.
Of course there are times when generic hardware will not cut it. However, this does offer some interesting alternatives to an off-the-shelf router.
Dedicated hardware will always be faster but software offers a great deal of flexibility.
With cheap dual-core 64-bit hardware, just how fast can a software router be today? -
Re:Find the problem before trying to solve it
He's apparently not asking for a compile farm...
The author did fail to say what the purpose was, but here are some good starts.
Apache cluster
MySQL cluster (should also refer to mysql.com resources)
Ultra Monkey, heartbeat and the like can make a cluster as well. -
Hah, no kidding
I was trying to setup a Linux-HA cluster.
After struggling to read their documentation at http://www.linux-ha.org/ (bad formatting, the use of a wiki made most sentences hard to read BecauseOfWikiWords, and the hyperlinks were nearly the same color as the text), I finally decided that it would be easier to ask on their IRC channel, #linux-ha @ freenode.
Oh, how wrong I was. Upon joining the channel, I noticed there were about 30 users idling. I politely asked my question and waited for a reply. No reply arrived, so I idled in the channel for a few days, occasionally repeating the question (not annoyingly and never automatically), hoping someone would notice. Finally, around noon in the European timezone, some people were awake and chatting in the channel about fixing a particular bug in the new v2 release of Linux-HA. I joined in and asked my question again. I was immediately sent to the "documentation" (the wiki trash I had trouble reading earlier). After that I politely let them know that the documentation is not very readable and does not solve my problem either. At this point I was accused of "bitching about service provided for free" and told "its a wiki, feel free to contribute and edit it".
I said that this was a standard v1 configuration which worked, automatically converted to a v2 configuration using the included conversion script. Everything was supposed to just work. Take a working v1 config, run the script, add 'crm=yes' to ha.cf, and v2 will work. Except it didn't. Within a few minutes, the debug logfile grew to several gigabytes in size, filled with repeated failures of the glib/heartbeat daemon, thousands of lines of meaningless XML snippets and other junk. I said that my disks were getting filled with errors, and at this point the #linux-ha folks suggested I post the entire error log on the mailing list "because more people read the mailing list". I wasn't interested in waiting another week for a "RTFM" response from a mailing list, so I told them "why not help me now, or at least say you aren't qualified to help, etc".
At this point, the 3 or 4 active users in the channel had decided I was a "nuisance" and thought that their best course of action would be to place me on ignore. Why? Because I wanted to run THEIR software? Because they made a RELEASE version of their product, which was supposed to work out of the box, and it didn't, and I was complaining about it? After placing me on ignore, another opensores user in the channel thought it would be a good idea to complain about my "behaviour" (asking questions about Linux-HA in a SUPPORT channel FOR Linux-HA) to lilo, the freenode nazi.
A few minutes later I was klined with the message "please do not harrass channels/users on freenode".
What the fuck. All I did was ask for help configuring a piece of software in a support channel designed for this exact purpose.
So, fuck this, I trashed Linux-HA install and got my boss to get Veritas Cluster Server. -
Re:MySQL Cluster != master/slave
there's also no other way to do it without either spending ridiculous amounts of money or being dependent on a single piece of hardware.
That's not true, along with some other things you've mentioned from various marketing materials.
For starters, there's no such thing as "shared nothing" clustering. You have to have a shared resource to have a cluster.
Solutions that claim to be "shared nothing" actually share a network. Once you've gone that far, there's no reason not to do replicated network block devices, which some larger vendors do (One sibling comment mentions DB2). It can even be fast if you're clever about where you issue your acknowledgements.
There are third party cluster middleware products that make this process transparent to the database. Most of them focus on Oracle. You can also fool some shared storage cluster products into thinking you have some big, expensive, high-performance, SAN array by pointing them at a linux DRBD. Anyway, the point is that there are plenty of ways to do so-called shared nothing database clustering without any expensive hardware. -
For Linux ( as well as Solaris and *BSD)
There is Linux HA. This is High Availability clustering software (via a heartbeat). With this along with DRBD (Distributed Replicated Block Device) you have a very robust cluster.
This uses an Active/Standby setup with a heartbeat between the systems. If the Active is no longer responding, within X seconds (10 by default) the Standby takes over all the processes that were running on the other system. And (if needed) it STONITHs (Shoot The Other Node In The Head) the other server to ensure that it really IS dead.
We've been running webservers and Oracle database servers here with 0 downtime using heartbeat and drbd.
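To make the takeover rule concrete, here is a small sketch of the standby's side of it. This is not heartbeat's actual code; the class, the `fence` callback, and the method names are made up for illustration, and the 10 second dead time is simply the figure quoted above. Timestamps are passed in explicitly so the logic can be followed without real timers:

```python
class StandbyNode:
    """Illustrative standby that takes over when the active node goes quiet."""

    def __init__(self, deadtime=10.0, fence=None):
        self.deadtime = deadtime      # seconds of silence before the peer is declared dead
        self.fence = fence            # optional STONITH-style callback (hypothetical)
        self.last_heartbeat = None    # time the last heartbeat was seen
        self.role = "standby"

    def heartbeat_received(self, now):
        """Record that the active node was heard from at time `now`."""
        self.last_heartbeat = now

    def tick(self, now):
        """Check the peer at time `now` and take over if it has been silent too long."""
        if self.role != "standby" or self.last_heartbeat is None:
            return self.role
        if now - self.last_heartbeat > self.deadtime:
            if self.fence is not None:
                # Make sure the old active node is really down before taking over.
                self.fence()
            self.role = "active"
        return self.role


if __name__ == "__main__":
    node = StandbyNode(deadtime=10.0, fence=lambda: print("fencing peer"))
    node.heartbeat_received(now=0.0)
    print(node.tick(now=5.0))    # peer heard from recently
    print(node.tick(now=10.5))   # silent for longer than the dead time
```

Fencing before the role change is the point of STONITH: the standby should only start the services once it knows the other node cannot still be running them.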
-
Re:Uptime ++
It's interesting that people such as yourself that use lesser equipment are the ones that have experienced most multidrive failures.
I never said that I experienced a multi-drive failure. I am aware of how they come about, and I know of those who have experienced them. You are making assumptions that are not supported by fact.
HotMail and Google were cheaper why?
I also never said this. I said they used less expensive components. I made no statement as to their total cost. Again, you are making assumptions that are not supported by fact.
Two machines, by themselves, do not make a cluster. The machines within a cluster are aware of each other and interact with each other.
Again, you show your lack of research into the subject:
http://en.wikipedia.org/wiki/Computer_cluster#Load_balancing_clusters
http://www.linux-ha.org/FAQ#head-7f4d8eec3b4075a464bab8ccedfc1c970cb2cd29
Your statement implies a HPC cluster:
http://en.wikipedia.org/wiki/Computer_cluster#High-performance_.28HPC.29_clusters
But, your situation is probably not the same as everyone else and what works best for you may not suit everyone.
Yet, you spout that the only "real" way to achieve high reliability is using *your* recommendations, using *your* recommended hardware. I agree that the solution to clustering is situationally dependent upon what the needs of availability are, the budget constraints, etc. I present a possible scenario, and corrections to your statements, not an answer for the submitter. You present your statement as "all this will cost big piles of cash", to which I provided counterexamples showing that it will not require a large budget. -
Some thoughts on this
"Where could one find material on recommended strategies for increasing server availability? Anything related to equipment, configurations, software, or techniques would be appreciated."
Well, as you can see, you can get a bit of information here, obviously.
Since there were no specific details, but a request for information sources, I would say this:
Many vendors of products will offer an assortment of solutions to high availability needs. Legato used to have cluster software for Windows servers (for availability, not load balancing). But alas (as I just discovered) that's now gone with their EMC merger.
Microsoft actually makes an okay clustering solution.
Oracle has clustering ability in their database product and is considered by some to be one of the better solutions for a truly high availability database.
The Linux High Availability project is a good place to look around if you have time on your hands to implement it. I've done it, and it helps if you already understand a lot of the concepts involved in HA solutions.
As you will find out, though, you really have to determine the value a solution can provide versus the potential loss of revenue a failure of any type can cause. When you realize how much money you can lose, you can evaluate how much money you can spend. That's the real key to any high availability solution.
Keep in mind there are also two types of clustering to think about (you'll discover it on your own in your research anyway):
- One is Load Balancing clusters like web farms. All the servers in the cluster share the work load. If one server drops, the others have to take up the slack.
- The other type is a High Availability cluster with active and passive nodes, where one computer does all the work while another sits idle waiting for the first to fail. When that happens it takes over the first's work. A variation on this is an active/active cluster, where both machines do work but have to be ready to take on the other's work load as well if the other fails.
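Both cluster types come down to the same capacity question: if any one machine drops, can the rest carry everything? A rough sketch of that check follows. The function name and the units (think requests per second) are assumptions made for illustration, not anything taken from a particular product:

```python
def survives_single_failure(loads, capacities):
    """Return True if the surviving nodes can carry the whole load after any one node fails.

    loads[i] is the work node i does now and capacities[i] is the most it can do,
    both in the same hypothetical unit.
    """
    total_load = sum(loads)
    total_capacity = sum(capacities)
    # Remove each node in turn and see whether the rest can absorb everything.
    return all(total_load <= total_capacity - capacity for capacity in capacities)


if __name__ == "__main__":
    # Active/passive: one node does all the work, the other sits idle.
    print(survives_single_failure([80, 0], [100, 100]))   # True
    # Active/active with enough headroom on each node.
    print(survives_single_failure([40, 40], [100, 100]))  # True
    # Active/active where both nodes are too busy to cover for each other.
    print(survives_single_failure([70, 70], [100, 100]))  # False
```

For two equal machines in an active/active pair, this means keeping each one at or under half of its capacity.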
If you think you are falling behind the rest of the world, you are not. Right now I am going through this whole process at work, figuring out what it will take to get the management team to buy into high availability, and we have a customer base that really needs us to implement it. It all comes down to the money game.
-
You need to start looking at server redundancy
Drives die. Fans die. Power supplies die. Motherboards die. Having a RAID array is not enough. There are plenty of other things in a single system that can go wrong that can take the system down for a period of time. The biggest issue I have with the limited description here is the fact you are talking about one system. If you want availability, you need to be looking at scaling by the machine.
Now you can just have other machines waiting to take the load with a quick reconfig, or you can start doing things automatically (people have mentioned using things like Nagios for monitoring, but monitoring doesn't give you uptime . .it gives you response to minimize downtime . . .)
If you want to look at solutions that don't cost, check out LVS. It'll allow you to balance your ports across multiple systems (you can even balance win32 and linux systems if you for some reason wanted to) with a couple of different methods (I prefer the DR method myself). The setup isn't that bad, with several of the recent kernels in the major distros having all the ipvs options turned on by default, so you may not even have to recompile a kernel on your balancer systems.
Now, of course you can't depend on a single balancer any more than you can depend on a single web server; there is support using the HA linux stuff to allow you to have backup LVS systems to take over as a balancer if your primary balancer bites it (heartbeat and ldirectord are your friends here).
With a pair of fairly low end systems and some monkey work at the keys, you can have a system that will balance your tcp traffic (or setup an automagic failover from a system) that can be as good as some of the commercial balancing products out there.
I currently use LVS with heartbeat/ldirectord to balance the following:
Win32 Apache Servers
Linux Apache Servers
Win2k3 IIS Servers (the LVS system balanced better than the built-in WLBS from MS . . . and there were a lot fewer broadcasts)
Postfix
Amavisd-new
And as others have mentioned about setting up some good monitoring (a la Nagios if you want), we monitor the virtual services on the LVS systems in addition to the real servers' services so that we can know if we are still delivering service externally even though real server B is down...
When you get bigger, then you should even start looking at having datacenter redundancy . . . deploying the meteor net never seems to be the right answer to the 'Force Majeure' question . . . -
Re:OpenBSD, of course!
"pf supports redundant parallel firewalls with automatic failover via CARP. This is a rare feature unless you're willing to go buy a Pix."
Linux-HA fails over firewalls just fine.
"pf supports routing of traffic based upon OS fingerprinting."
It's a module in iptables called "osf", but I don't recommend it. Anything that relies on information (even passively gathered information) provided by the remote host is fundamentally unreliable. Worse, by filtering based on OS you open yourself up to all sorts of confusing problems when proxies (transparent or otherwise) are involved.
"When compared to setting up an IPtables firewall, pf is surprising simple and it's howto at openbsd.org cannot be beat."
Howto?! Ew. I know how to configure a firewall, but if I'm going to point newbies at a firewall solution, it's going to be one that's configured out of the box. I'd recommend Fedora's default install for on-server firewalling, and any of the CD-based firewall-specific distributions for centralized firewalling.
Still, I've set up many an iptables firewall, and unless you're doing something REALLY hairy, there's nothing all that complex about it. One config. One command to load the config. Next problem.
I've been a bit hard on you here, and honestly I have no interest in "my OS is bigger than your OS" debates. My point was simply to demonstrate that you're showing off the features of a system you know, and ignoring the fact that a system you don't know might have those features too. What's more, that other system might have other features that you would find just as useful or more so once you got used to them. -
Re:A cheap linux firewall
You could easily use heartbeat for this:
http://www.linux-ha.org/
This would work with any number of machines, with the virtual ip taking over if any loss occurs.
I've used heartbeat numerous times with redundant servers, works like a charm. -
Re:PC's are not for networking
Toss in a 2nd PC, use Mon, and use Linux HA and you've got yourself a high availability cluster that can route almost any TCP/IP or UDP traffic for far less cost than a "real" load balancer, not to mention it can do far MORE than a load balancer. We are using LVS and HA at my 9-to-5 job for load balancing our new webservers. It's super reliable. We have it set up such that within 10 seconds of a web server not being available, it's removed from the LB. Once it's back up and running, the server is added back in automatically.
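The remove-and-re-add behaviour can be sketched in a few lines. This is only an illustration of the idea, not how Mon or LVS implement it; `BackendPool` and `tcp_check` are invented names, and the backend addresses in the example are placeholders:

```python
import socket


def tcp_check(backend, timeout=2.0):
    """Return True if a TCP connection to the (host, port) pair succeeds in time."""
    try:
        with socket.create_connection(backend, timeout=timeout):
            return True
    except OSError:
        return False


class BackendPool:
    """Keeps the set of backends that passed their most recent health check."""

    def __init__(self, backends, check=tcp_check):
        self.backends = list(backends)
        self.check = check
        self.active = set(self.backends)

    def run_checks(self):
        """Probe every backend once, dropping failed ones and re-adding recovered ones."""
        for backend in self.backends:
            if self.check(backend):
                self.active.add(backend)
            else:
                self.active.discard(backend)
        return sorted(self.active)


if __name__ == "__main__":
    # Placeholder addresses from a documentation range; replace with real web servers.
    pool = BackendPool([("192.0.2.11", 80), ("192.0.2.12", 80)])
    print(pool.run_checks())
```

Calling `run_checks()` on a timer bounds how long a dead server stays in rotation: the check interval plus the connection timeout has to fit inside the 10 second window mentioned above.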
-
Shared disks etc?
No? You don't even need them physically connected these days, SCSI over IP can do it.
LVS isn't really an ideal system, the load balancer is bound to be the box that dies.
For a clustering project :
http://www.linux-ha.org/ -
full redundancy (almost) always works
From TFA: One approach is simply to make an individual system the unit of recovery; if anything fails, either restart the whole thing or fail-over to another system providing redundancy. Unfortunately, with the increasing physical resources available to each system, this approach is inherently wasteful: Why restart a whole system if you can disable a particular processor core, restart an individual application, or refrain from using a bit of your spacious memory or a particular I/O path until a repair is truly needed?
Because using stuff like stonith or heartbeat works for many more types of failures. Bad network cable? Yup. Power supply? yup. Server Catch on Fire? Yup.
I'm not saying it wouldn't be nice to have the OS route around bad memory blocks or bad processor cores due to some fancy-pants algorithm (without having to rewrite my app). But you're still going to need a redundant server for when somebody trips over the power cord. -
Very nice
Installing and administering the various open source tools can be tedious work, especially without documentation of how to put things together.
A quick Google search though reveals a lot of free papers and manuals on this very topic. -
Re:Why?
Oh no... You asked a FAQ, one that is addressed in dozens of places on every OpenBSD site, and you got slightly rude answers.
Well.. I DID RTFM at the time. I saw nothing either saying "We don't do ISO images" or "We do ISO images". That was added after my initial inquiries. Had it been there when I asked, I wouldn't have said anything.
I'm sure they won't miss you... You'd just have flooded the lists and news groups with even more dumb questions, flames, etc.
Flooded the lists with crap? I don't think so. I avoid posting until I have googled for a problem/solution and I don't find one. And I have been a software engineer (yes, I have a BS in Computer Science) for 17 years, and I am still developing software and doing system administration along with tech support, so I really hate asking questions that are in FAQs.
Also, I have been an active developer on the Linux HA (High Availability) project for a fairly long time. I'm their non-Linux release engineer. So from that list, I'm quite familiar with people who don't RTFM or read the FAQs. It's annoying but you deal with it. You should avoid copping attitudes with people who may have honestly made a mistake or misread something.
-
Re:yes, that's actually the basic idea
What you've said is true only for the case of web sites with static or almost static content, where you could have the content on the local drives of each webserver, and use rsync to distribute new content (web site changes) to all the servers.
But it's a very different situation when your webservers handle very dynamic content, especially when the content is uploaded by the users. In this case, you have three alternatives:
1) Content in the database. It is up to you to use a clustered database to provide High Availability and Load Balancing
2) Content in a NAS (NFS, etc.). You have the same content for all the webservers, and with drbd you achieve High Availability... but you don't have Load Balancing.
3) You use GFS or other distributed File System (don't know the issues on this option).
btw, for load balancing at the IP level I would recommend Linux Virtual Server, and Heartbeat to achieve High Availability in the balancers. -
Re:Sun will Shine at the Big Blue
Have you any clue as to how many years more advanced than Linux Solaris is at the high end?
I'd like to hear the specifics. Linux is making great strides on support of truly large machines; will Solaris have any edge on Linux in, say, two years?
Compare Linux 2.6 with the original Linux 0.01, and consider how far it came. Now consider how much farther 2.8 would need to come to match or surpass Solaris; not nearly as far. And Linux has huge momentum now.
So, let's see: Linux is already the best choice for low-end servers (workgroup print server, etc.), for small servers (single or dual processor PC hardware), for parallel clusters (imagine a Beowulf...), and for mainframes (big iron from IBM).
You could argue that Solaris is better in some way for small servers, but Linux is already stable and reliable enough that any benefits of Solaris here would be wiped out by the vendor lockin and much greater costs.
Parallel clusters don't just include a Beowulf, they also include having several computers act like one very reliable computer. See the Linux-HA project.
If you were starting with a blank sheet of paper (not already a Sun shop) and you wanted servers to run a business, I think Linux is already to the point where it would be a better choice than Solaris.
Sun has three things going for it: it makes very good hardware, it offers very good support, and Solaris currently rules on computers with many CPUs. But other companies make very good hardware that runs Linux (and you can even cluster cheap hardware with Linux-HA), you can get good support for Linux (e.g. from IBM), and Linux will catch up on many-CPU hardware.
steveha -
If you're using Linux
you may benefit from a combination of heartbeat and DRBD, which respectively provide IP address/service failover and a network (no special hardware required) data replication solution.
If you have appropriate hardware you might also appreciate Stonith, which provides forced-shutdown of a failed node (in the case that the failed node won't release the IP address, and hence you would otherwise have problems switching service).
If you're in the UK then give me a shout and I'll set it up for you (for a reasonable fee)! My contact details are available on my web site. -
Re:i'm starting to agree
Well, if it makes you feel any better, we just made a purchasing decision against Cisco in favor of two simple Linux boxes running a combination of shorewall and heartbeat. The cost savings versus the cheapest Cisco firewall that does failover was worth the effort of installing the open source software. I also highly recommend m0n0wall for a SOHO Cisco replacement. I'd choose m0n0wall over a cheaper WatchGuard or SonicWall box any day.
-
Re:1000+ Users????
Do the math. If your homebrew system goes down, you will be burning the time of 1000+ people ($60,000) per hour. With those kind of numbers it doesn't pay to do it on the cheap. Get a redundant Cisco system with plenty of power backup.
Or you could use something like this to provide redundant Linux routers on cheap commodity hardware and spend the money saved on getting more backup power. -
Don't forget redundancy!
A few years ago, I needed to put together a fairly heavy duty samba/NFS server setup. The important issues were reliability, and cheap (in that order.) I went with a couple of P-4s (after watching the TomsHardware video of the athlon going poof), three 40 GB WD drives (per system) and a total of 3 NICs per system (we have redundant LANs). I created a 3 GB / on each disk, a swap partition on each disk and a RAID 5 (software) 75 GB MD. I also used heartbeat so that the two systems would be "Highly Available"; see linux-ha.org for more HA stuff.
Then for the backup stuff, I just rsync the primary box to the secondary twice a day. Worked great . . . until I lost TWO disks on one of the boxes. I was able to run on the secondary until I replaced my disks AND backing up 55 GB of files only took 2.5 hours. NOTE: I had 455 days of uptime on the primary when it crashed because of the bad disk. It DOES make sense to occasionally verify your box CAN reboot. -
As someone who's made the transition...
I migrated from a Cobalt RaQ setup after many many frustrating moments with the whole net appliance idea in general.
I also needed the migration to be as smooth as possible, including all user auth, mail boxes/folders, lists and aliases.
I decided to go with MySQL based authentication on Postfix, Courier-IMAP, Apache, and ProFTPd, all running on Debian. I Wrote a little web front end using PHP for user administration, and voila, we now have a much more flexible system. All MySQL auth patches and plugins are available in Debian's apt archive.
Check out how-tos on the subject here, here, and here.
It took a little effort to get all of this working, but a little effort went a long way. I was basically able to duplicate the RaQ's functionality on a Debian system that I had full control over as far as software updates, kernel and hardware.
To top it all off, I replicated the config and used Heartbeat to make this into a high availability pair. -
Re:three types of clusters
2) shared disk between two computers: in this case, there are multiple machines and multiple disks. each disk is connected to at least two computers. if one of the computers fails, the other takes over. no mainstream database uses this mode
Well, what about Windows Cluster Service and Oracle Failsafe? The company I work for uses it at a couple of airports and it works pretty well. The downside is that the failover is not transparent (all connected clients get disconnected and have to reconnect).
An open-source solution that works like this should be quite simple to set up with the help of the Linux High-availability project. -
Problems porting
Not always. Sometimes porting is tough. Right now, I'm the "Non-Linux" release engineer for Linux HA (High Availability Clustering) and I've tested it on FreeBSD 4.7 (going to upgrade one box to 4.8 and another to 5.1). The only problem is that the tool chain requires versions that are NOT the standard ported versions (Automake and autoconf, if my memory serves me right).
I want to get things working right so that I can release a Port version of Heartbeat but currently I cannot. Luckily it, by design, builds on FreeBSD and puts things into /usr/local/.../ and not /usr/... like on Linux. This may be a factor in why things aren't quite right (different versions of Automake/Conf/lib).
-
Advice from the HA-Linux list
Alan Robertson, who maintains the heartbeat package and works for IBM, recently posted to the ha-linux list on this subject.
Alan does not accept patches to the heartbeat code that were developed on company time unless he receives a disclaimer from somebody at the company.
This is obviously spoofable, but it's probably a good way to legally protect the code -- Alan can honestly say he received it in good faith, which keeps IBM's lawyers from breathing down his neck. It's kind of weird for me, though: I have to send a disclaimer giving myself permission to send in a patch....
So, to answer your question: explain to your CEO why helping the OSS community helps you to help your company, and get her/him to sign off on a policy that allows you to do so. Ask for legal authority to be delegated to yourself (or your boss) to license or assign corporate intellectual property to open-source projects. Then have HR propagate the policy to your co-workers. -
Re:This will be nice
...and it's VERY difficult (if not impossible) to supply the same level of redundancy and failover with commodity PC gear.
I'm not so sure about that. Lots of solutions for transparent fault-tolerance have come out for Linux in the last few years; see the Linux-HA project for more info.
And presuming one *does* have a good failover solution in place, the commodity-PC route becomes that much more desirable, as there's no longer a hard (and low) upper bound on the reliability of the system as a whole.
FWIW, I'm {system,network} admin for a tiny little startup with precious little cash to speak of. We absolutely can't afford to spend more on hardware than we must, particularly if we can throw more man-hours (of people who're mostly working for stock) at the problem. Perhaps my viewpoint would be different if I were working somewhere with more cash than man-hours to spend. -
why this is interesting? think high availability
I'm going to take up the challenge here of explaining why this is interesting. Since November of 2002, OpenBSD's pf has had support for load balancing. RedHat's $2499 Premium Edition of their Enterprise distro features Piranha load balancing which was derived from the Linux High Availability project.
So what the OpenBSD pf project is giving you is enterprise-class high availability and load-balance clustering for a tiny fraction of the price. With a handful of cheap dotcom-throw-away x86 servers, a small company or mildly well-capitalized individual can personally build a multi-datacenter-fault-tolerant clustering setup that will rival Fortune 500 uptime ratings.
In other words, the pf project's list of accomplishments is starting to read like a ToDo list for RedHat's Enterprise Linux development team.
-
Yup, it's a GNUish Unixism....
GNU software installation procedures are the least user-friendly of all those I've used. They generally go like this:
Download software.
Search for documentation - find incomplete and poorly written docs that assume too much.
./configure
research and correct 15 badly documented error conditions.
./configure
identify 3 totally undocumented errors.
join project mailing list and post question.
be roundly flamed and referred to FAQ
post references showing errors are not documented in FAQ
be roundly flamed and referred to list archives
search list archives for several days
post again asking for specific references to archives
Acerbic but kindly guru finds comment written in swahili that is only in the CVS version you can't access, and translates it for you in a private Email.
remove a single character from the configure script
./configure
edit the makefile to correct unwarranted assumptions about file locations, system capabilities, network architecture, etc.
make
correct typos introduced by prior editing (D'OH!)
make
research and correct 7 errors caused by missing libraries (these libraries are normally required only by Welsh Morris dancers, but for some reason your GNU software won't compile without them).
make
research and attempt to correct 3 errors caused by having a different version of gcc than the software authors.
make
give up on correcting the errors and go download the precise version of gcc used by the developers.
make
cheer like nobody's watching, which they aren't because it is five O'clock in the morning.
make install
Congratulations! You have successfully built your GNU software. This amazingly powerful software will now run incredibly smoothly and accurately for unbelievable lengths of time. (Unless it's a 2.4 linux kernel, in which case it'll be obsolete by Monday when the latest remote root exploit comes out, or whenever Linus decides to replace a major subsystem wholesale in the middle of a "stable" kernel series.)
After a few years of living comfortably with your smoothly running, reliable, low maintenance GNU software, you'll break even on the pain and suffering quotient.
I recently configured heartbeat and I've done most of the uber-GNU utilities that don't deign to have man pages (info is so much better, the only way it could be more user friendly is if it required all input in Common Lisp) so it's just barely possible I might have some idea what I'm talking about. On the other claw, I may be stark raving bonkers from too many ./configure;make;make install recursions. -
Re:Hot swappable CPU's and memory
-
Linux Virtual Server is great
LVS was able to handle a medium-sized HTTP/HTTPS load at my last job quite well. It had 6 months of uptime serving 5-10 hits/second, and I literally never had to worry about it going down. In combination with mon, bringing machines up and down was never a problem, and failure situations were handled without the end user noticing.
Installation was a bit frustrating because I hadn't dealt with the networking issues before (the ARP problem). However, in the end it was only my networking knowledge that was lacking, and the ARP problem turned out to be simple to overcome.
Support from the mailing list was great, I got thorough replies to my questions in a few hours. The documentation is good, although some parts of the HOWTO could be trimmed back a bit (more information than is needed to understand the problem, takes a bit of time to filter).
The hardware was two slower UP boxes (one live, one for failover), and the load was essentially 0, even with mon and MRTG running.
LVS is of course just the load balancer, and the setup also included mon for monitoring, heartbeat for failover, and MRTG for trending. They all play well together, and create a very reliable, informative, load balancer setup.
Depending on your setup, one of the meta-packages such as Ultra Monkey or Redhat's HA suite might be best, but installing the components individually isn't much of a hassle either. -
Re:Patent issues?
I too am interested in any recent knowledge about this. IIRC Cisco went after Alcatel over their use of VRRP.
It's a shame, because the protocol isn't that bad. (though, certain implementations and their tendency to conduct VRRP wars may be ;)
The whole thing made me look for an alternative. I ended up investigating the Linux-HA project. They didn't really have support for failing over when the box became unreachable from the network (this is a desired behavior with certain shared storage apps and such) so I concocted a plugin called ipfail. All of that has since been included in the recent releases of heartbeat. It's sort of a second-best solution as I think VRRP is really the answer here, but hopefully others will be able to benefit from it. -
Linux-HA
www.linux-ha.org
Lots of information on using shared storage with a bias toward setting up highly available clusters. -
DRBD network raid system anyone?
I'm building a heartbeat cluster to serve WebGUI pages and files via samba.
This is going to be presented at a congress for the Netherlands Network User Group on November 13th (a mostly Novell and Microsoft NT association).
I have been looking for a solution to mirror files between the two cluster nodes. SCSI is just too expensive for this, since low cost is one of the requirements. I've been trying to compile DRBD on my gentoo 1.3 systems but the 2.4 kernel isn't supported by the default DRBD distribution yet.
Does anyone know about any other projects like these that actually work?
-
Re: Business-oriented employees (was: Let's not forget)
Pardon the tangential subject as we wander from over-management to bad business models. Really, no amount of good management can fix a broken business model. Good management might rewrite a broken business plan and fix an ailing company, but so might bad management rewrite a perfectly functional business plan. But back to the point.
We are going to spend thousands of man hours (=gigantic cost) and then give our products away for free. Dot-coms and open source development companies are examples of this. FREE DO NOT PAY THE BILLS!
Open source development frequently comes down to an issue of profit through service rather than the product itself. In the case of one kinda big company, they're spending some large money developing and integrating open source solutions to phase out some of their products. Sometimes it works better providing services rather than constantly maintaining one's own proprietary software, or at least it may become easier to maintain when your customers sometimes volunteer improvements.
The same give-away-the-product, sell-the-support system works for some smaller companies who sell to home users. Good tech support is certainly worth plenty, especially when even mature software can sometimes be confusing.
-
Re:i have....
But 'sticky' isn't 'zero affinity', is it? So what you really want is what the original poster suggested, an SSL-speaking proxy (eg Squid in SSL accelerator mode) that terminates the SSL session and forwards the request inside it to a cluster of non-SSL webservers (using RRDNS perhaps, or LVS if you want a 'smarter' solution). The downside there is your squid proxy is doing a lot of work, so you probably want to have a backup one and use something like heartbeat to fail-over to it if there's a problem with the first one.
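To make the 'zero affinity' idea concrete, here is a small Python sketch of what the proxy would do once it has terminated SSL: pick the next webserver in rotation, regardless of which client is asking, and skip any that are down. It is only an illustration, not how Squid or LVS implement it; `RoundRobinPool` and the `web1`..`web3` names are made up.

```python
from itertools import cycle


class RoundRobinPool:
    """Hand out backends in rotation, skipping any marked down."""

    def __init__(self, backends):
        self._backends = list(backends)
        self._down = set()
        self._rotation = cycle(self._backends)

    def mark_down(self, backend):
        self._down.add(backend)

    def mark_up(self, backend):
        self._down.discard(backend)

    def next_backend(self):
        # Try each backend at most once per call so a fully failed
        # pool raises instead of looping forever.
        for _ in range(len(self._backends)):
            candidate = next(self._rotation)
            if candidate not in self._down:
                return candidate
        raise RuntimeError("no healthy backends")


pool = RoundRobinPool(["web1", "web2", "web3"])
first_four = [pool.next_backend() for _ in range(4)]
```

Because no request is tied to a particular backend, any webserver can drop out without breaking sessions; the proxy itself is then the single point of failure, hence the backup proxy.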
-
A great site
Check out the High-Availability Linux Project. There's a lot of info in the site and links to much more.
-
same question, with links
Yikes! Links make it a lot easier for people to figure out what's going on!
"A year ago, there seemed to be two promising Linux HA [high availability] frameworks--along with lots and lots of experimental things: SGI's FailSafe, and Kimberlite from Mission Critical Linux. The FailSafe software website now seems very out of date, although the mailing list remains active, and there seems to be forward momentum. On the other hand, Redhat seems to have forked the development of Kimberlite, calling the fork Redhat Cluster Manager. They don't seem to be making development source available, at least to the public. Are these two projects still relevant? What's the current status of Open Source HA?"
Try also linux-ha.org and open cluster -
Heartbeat
I'm working with heartbeat from the Linux-HA project and it is very much alive and well, as is the linux-ha mailing list.
-
You aren't looking hard enough