Slashdot Mirror


Best Solution For HA and Network Load Balancing?

supaneko writes "I am working with a non-profit that will eventually host a massive online self-help archive and community (using FTP and HTTP services). We are expecting 1,000+ unique visitors / day. I know that having only one server to serve this number of people is not a great idea, so I began to look into clusters. After a bit of reading I determined that I am looking for high availability, in case of hardware fault, and network load balancing, which will allow the load to be shared among the two to six servers that we hope to purchase. What I have not been able to determine is the 'perfect' solution that would offer efficiency, ease of use, simple maintenance, enjoyable performance, and a notably better experience when compared to other setups. Reading about Windows 2003 Clustering makes the whole process sound easy, while Linux and FreeBSD just seem overly complicated. But is this truly the case? What have you all done for clustering solutions that worked out well? What key features should I be aware of for a successful cluster setup (hubs, wiring, hardware, software, same servers across the board, etc.)?"

22 of 298 comments

  1. 1000+ a day isn't very much by onion2k · · Score: 5, Insightful

    1000+ unique visitors is nothing. Even if they all hit the site at lunchtime (1 hour window), and look at 30 pages each (very high estimate for a normal site) that's only 8 requests a second. That isn't a lot. A single server could cope easily, especially if it's mostly static content. As an example, a forum I run gets a sustained 1000+ users an hour and runs fine on one server.
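    The lunchtime estimate above is simple arithmetic; here it is as a small Python sketch. The function name is hypothetical, and it counts one request per page, ignoring the images and other assets each page may pull in.

```python
def requests_per_second(visitors: int, pages_per_visit: int, window_seconds: int) -> float:
    """Average request rate if every visit lands inside the same time window."""
    return visitors * pages_per_visit / window_seconds


# The worst case above: 1,000 visitors, 30 pages each, all within one lunchtime hour.
rate = requests_per_second(1000, 30, 3600)
print(f"{rate:.1f} requests/second")  # 8.3 requests/second
```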

    As for "high availability", that depends on your definition of "high". If the site being down for a morning is a big problem then you'll need a redundant failover server. If it being down for 15 minutes is a problem then you'll need a couple of them. You won't need a load balancer for that because the redundant servers will be sitting there doing nothing most of the time (hopefully). You'll need something that detects when the primary server is offline and switches to the backup automatically. You might also want to have a separate database server that mirrors the primary DB if you're storing a lot of user content, plus a backup for it (though the backup DB server could always be the same physical machine as one of the backup webservers).
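    A minimal sketch of the "detect and switch" piece, assuming a caller-supplied probe function (in practice something like an HTTP request against the primary). The class and host names are hypothetical. Requiring several consecutive failures before switching avoids failing over because of a single dropped packet.

```python
from typing import Callable


class FailoverMonitor:
    """Send traffic to the primary until it fails `threshold` health checks in a row."""

    def __init__(self, probe: Callable[[str], bool], primary: str, backup: str, threshold: int = 3):
        self.probe = probe          # returns True if the host answered its health check
        self.primary = primary
        self.backup = backup
        self.threshold = threshold
        self.active = primary
        self.failures = 0

    def check(self) -> str:
        """Run one health check and return the host that should receive traffic."""
        if self.active == self.backup:
            return self.active      # already failed over; failing back is a manual step here
        if self.probe(self.primary):
            self.failures = 0       # any success resets the count
        else:
            self.failures += 1
            if self.failures >= self.threshold:
                self.active = self.backup
        return self.active


# Simulated probe results: the primary answers twice, then stops responding.
results = iter([True, True, False, False, False])
monitor = FailoverMonitor(lambda host: next(results), "web1", "web2")
seen = [monitor.check() for _ in range(5)]
print(seen)  # ['web1', 'web1', 'web1', 'web1', 'web2']
```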

    Whoever told you that you'll need as many as 6 servers is just plain wrong. That would be a waste of money. Either that or you're just seeing this as an opportunity to buy lots of servers to play with, in which case buy whatever your budget will allow! :)

    1. Re:1000+ a day isn't very much by drsmithy · · Score: 4, Informative

      You'll need something that detects the primary server is offline and switches to the backup automatically. You might also want to have a separate database server that mirrors the primary DB if you're storing a lot of user content, plus a backup for it (though the backup DB server could always be the same physical machine as one of the backup webservers).

      On this note, if you're comfortable (and your application is compatible) with Linux+Apache, then heartbeat and DRBD will do this and are relatively simple to get up and running. Just avoid trying to use the heartbeat v2-style config (for simplicity), make sure both the database and apache are controlled by heartbeat, and don't forget to put your DB on the DRBD-replicated disk (vastly simpler than trying to deal with DB-level replication, and more than adequate for such a low load).
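      Heartbeat v1 (the haresources style recommended above) brings the resources listed for a node up from left to right and takes them down in reverse, which is why the DRBD disk has to come before the database and Apache. The following is only a minimal Python model of that ordering; the resource names are hypothetical, and a real setup uses heartbeat's own resource scripts rather than this code.

```python
from typing import List

# Hypothetical resource group for the active node, listed in the order it must come up:
# become DRBD primary, mount the replicated disk, then start the services that use it.
RESOURCES = ["drbd-primary", "mount /data", "mysql", "apache"]


def takeover_steps(resources: List[str]) -> List[str]:
    """Bring resources up left to right: the DB cannot start before its disk is mounted."""
    return [f"start {name}" for name in resources]


def release_steps(resources: List[str]) -> List[str]:
    """Tear them down right to left: stop Apache and MySQL before unmounting the disk."""
    return [f"stop {name}" for name in reversed(resources)]


print(takeover_steps(RESOURCES))
print(release_steps(RESOURCES))
```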

      Oh, and don't forget to keep regular backups of your DB somewhere other than those two machines.

    2. Re:1000+ a day isn't very much by Mad+Merlin · · Score: 5, Informative

      I agree that 1000 unique visitors is peanuts, but as for how to do HA, it really depends a lot on your situation. For example, the primary server for Game! started acting up about 2 weeks ago, but it mattered little as I was able to flip over to the backup server and came out with barely any downtime and no data loss. In the meantime, I was able to diagnose and fix the primary server, then point the traffic back at it. In my case, all the dynamic data is in MySQL, which is replicated to the backup server, so when I switched over I simply swapped the slave and the master and redirected traffic at the backup server. You also have to consider the code, which you presumably make semi-frequent updates to. In my case, the code is stored in SVN and updated automagically on both the master and the slave simultaneously.

      Having said all that, there's more to consider than just your own hardware when it comes to HA. What happens if your network connection goes down? In most cases, there's nothing you can do about it except twiddle your thumbs while you wait on hold with customer service. Redundant Internet connections are expensive because you basically need to be in a big (and expensive) colocation facility to get them.

      Also, how easy it is to have HA depends largely on how important writes are to your database (or filesystem). Does it matter if this comment doesn't make it to the live page for a couple seconds after I hit submit? No, not really. Does it matter if I change my equipment in Game! but don't see the changes immediately? Yes, definitely. Indeed, if your content is 100% static, you can just keep a dozen complete copies and put a load balancer in front that pulls dead machines out of the loop automagically and be done with it.
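      For the 100% static case, the "load balancer that pulls dead machines out of the loop" can be sketched as a round-robin pool that skips hosts failing a health check. This is a minimal illustration in Python, not any particular product; the mirror names and the health-check callback are hypothetical.

```python
from typing import Callable, List


class RoundRobinPool:
    """Round-robin over identical static mirrors, skipping ones that fail a health check."""

    def __init__(self, servers: List[str], is_healthy: Callable[[str], bool]):
        self.servers = servers
        self.is_healthy = is_healthy
        self.next_index = 0

    def pick(self) -> str:
        # Try each server at most once per request, starting where the last pick left off.
        for _ in range(len(self.servers)):
            server = self.servers[self.next_index]
            self.next_index = (self.next_index + 1) % len(self.servers)
            if self.is_healthy(server):
                return server
        raise RuntimeError("no healthy servers left in the pool")


down = {"mirror2"}
pool = RoundRobinPool(["mirror1", "mirror2", "mirror3"], lambda s: s not in down)
picks = [pool.pick() for _ in range(4)]
print(picks)  # ['mirror1', 'mirror3', 'mirror1', 'mirror3']
```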

    3. Re:1000+ a day isn't very much by Xest · · Score: 5, Informative

      I was thinking along the same lines.

      But to the person asking the question, if you want a full answer then you need to get your site built and make use of stress testing tools such as JMeter for Apache or Microsoft's WAS tool for IIS.

      It's not something anyone here can give you a definite answer for without knowing how well your site is implemented and what it actually does.

      Look into Transaction Cost Analysis; that's ultimately what you need here. A good start is this article:

      http://technet.microsoft.com/en-us/commerceserver/bb608757.aspx

      or this one:

      http://msdn.microsoft.com/en-us/library/cc261632.aspx

      Don't worry that these are MS articles on MS technologies; they both still cover ideas that are applicable elsewhere.
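      The general idea behind TCA is to measure how many CPU cycles one transaction (say, one page view) costs under a stress test, then divide the cycles you can afford by that cost. A rough Python version of that arithmetic follows; the function names, the 70% utilization ceiling and the sample measurements are assumptions for illustration, not figures from the articles.

```python
def cost_per_transaction_mcycles(cpu_utilization: float, cpu_count: int, cpu_mhz: float,
                                 transactions_per_sec: float) -> float:
    """CPU megacycles consumed per transaction, measured during a stress test."""
    return cpu_utilization * cpu_count * cpu_mhz / transactions_per_sec


def max_transactions_per_sec(cost_mcycles: float, cpu_count: int, cpu_mhz: float,
                             target_utilization: float = 0.7) -> float:
    """Throughput the box can sustain while staying under an assumed CPU ceiling."""
    return target_utilization * cpu_count * cpu_mhz / cost_mcycles


# Hypothetical measurement: 2 x 2000 MHz CPUs at 40% busy while serving 50 page views/s.
cost = cost_per_transaction_mcycles(0.40, 2, 2000, 50)   # about 32 Mcycles per page view
capacity = max_transactions_per_sec(cost, 2, 2000)        # about 87.5 page views/s
print(f"{cost:.1f} Mcycles/page view, {capacity:.1f} page views/s at 70% CPU")
```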

      Even though no one here can give you a full answer for the above-mentioned reasons, we can at least give you our best guesses, and this is where I think the parent poster is spot on: 6 servers is absolute overkill for this kind of load, and indeed, unless your application does some pretty intensive processing, I see little reason why a single server couldn't do the trick, or at most a web/application server plus a database server.

      To ensure high availability you may of course need more servers, and since you mention a requirement for FTP, is bandwidth likely to be an issue?

      The fact you're only expecting 1000 a day suggests you're not running the biggest of operations, and although it's nice to do these things in-house, it may be worth using a hosting provider with an acceptable SLA. At the end of the day they have more experience, more hardware, more bandwidth, and can probably even do things a fair bit cheaper than you can. Do you have a generator to allow continued provision of the service should your power fail for an extended period, for example? If you receive an unexpected spike in traffic or a DDoS, do you have the facility to cope with and resolve that like a big hosting company could?

      There are many things I wouldn't ever use an external hosting provider for, but this doesn't sound like one of them.

    4. Re:1000+ a day isn't very much by Bandman · · Score: 5, Interesting

      HA isn't there just for load issues. It's there to guarantee availability. 1,000 users might be peanuts, but I've got a site that only gets a couple hundred visitors a day. That site has clustered load balancers which talk to redundant app servers, which talk to redundant web servers (connected via redundant switches). It's really important that the site be there for those couple of hundred visitors.

      The number of visitors isn't as important as the importance of the visitors.
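      Why stack redundant tiers like that? If one box is up a fraction a of the time, a pair fails only when both are down at once, while a request that passes through several tiers needs every tier up. A minimal Python sketch of that arithmetic follows; the 99% per-box figure and the tier list are hypothetical, and it assumes failures are independent, which shared power or a shared switch would violate.

```python
from math import prod
from typing import List


def parallel(availabilities: List[float]) -> float:
    """A redundant tier is down only if every member is down at the same time."""
    return 1 - prod(1 - a for a in availabilities)


def serial(availabilities: List[float]) -> float:
    """A request needs every tier in the chain to be up."""
    return prod(availabilities)


single = 0.99  # hypothetical availability of any one box or switch
tiers = ["load balancer", "switch", "web", "app"]
one_each = serial([single] * len(tiers))
two_each = serial([parallel([single, single])] * len(tiers))
print(f"one of each: {one_each:.4f}, two of each: {two_each:.4f}")
```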

  2. Is It Mission Critical? by s7uar7 · · Score: 4, Insightful

    If the site goes down, do you lose truckloads of money or does anyone die? Load balancing and HA sound a little overboard for a site with a thousand visitors a day. With a hundred thousand you can probably justify the expense. I would probably just be looking at a hosted dedicated server somewhere for now.

    1. Re:Is It Mission Critical? by Errtu76 · · Score: 5, Interesting

      It's not overboard. And even with a hosting provider you're still dependent on hardware problems. What you can do to realise what you want is:

      - buy 2 cheap servers with lots of RAM
      - set them up as Xen platforms
      - create 2 virtuals for the load balancers
      - set up LVS (heartbeat + ldirectord) on each virtual
      - create 4 webserver virtuals, 2 on each xen host
      - configure your loadbalancers to distribute load over all webserver virtuals

      And you're done. Oh, make sure to disable tcp_checksum_offloading on your webservers, else LVS won't work that well (read: not at all).
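      The point of putting one load balancer and two webservers on each Xen host is that losing either physical box still leaves a working site. A small Python check of that property is sketched below; the host and VM names are hypothetical, and it only inspects the layout, it does not talk to Xen or LVS.

```python
from typing import Dict, List

# Hypothetical layout following the steps above: two Xen hosts, each running one
# load-balancer virtual and two webserver virtuals.
PLACEMENT: Dict[str, List[str]] = {
    "xen1": ["lb1", "web1", "web2"],
    "xen2": ["lb2", "web3", "web4"],
}


def survives_host_loss(placement: Dict[str, List[str]]) -> bool:
    """True if losing any single Xen host still leaves a load balancer and a webserver."""
    for failed in placement:
        remaining = [vm for host, vms in placement.items() if host != failed for vm in vms]
        has_lb = any(vm.startswith("lb") for vm in remaining)
        has_web = any(vm.startswith("web") for vm in remaining)
        if not (has_lb and has_web):
            return False
    return True


print(survives_host_loss(PLACEMENT))  # True
```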

    2. Re:Is It Mission Critical? by drsmithy · · Score: 5, Informative

      And you're done. Oh, make sure to disable tcp_checksum_offloading on your webservers, else LVS won't work that well (read: not at all).

      Just a heads-up for those who (like me) read this and thought: "WTF ? LVS works fine with TOE", it is a problem specific to running LVS in Xen VMs where the directors and realservers share the same Xen host. Link.

    3. Re:Is It Mission Critical? by alta · · Score: 4, Informative

      If I had mod points, I'd give you some. This is the same thing we did, just with different software.
      -Get 2 ISPs; I suggest different transports. We have one as fiber, the other is a T1. There's no point in getting 2 T1s from different companies if a bulldozer cuts them both at once.
      -Two Dell 1950s
      -Set each up with VMware Server
      -Created 2 databases, replicating to each other
      -Created 2 web servers, each pointing at the database on the same machine
      -Installed two copies of the Hercules load balancer (VRRP + pen)
      -Set up failover DNS with a 5-minute expiration.

      Now, you may say, why the load balancers if you're load balancing with DNS? Because if I have a hardware/power failure, that's one instance where the 5 minutes for DNS to expire will not incur downtime for my customers. It also gives me the ability to take servers offline one at a time for maintenance/upgrades, again with no downtime.
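      The difference is easy to put into numbers: with DNS-only failover, a client that cached the old record just before the change keeps using it for a full TTL, whereas a load balancer only has to notice the failure. The Python sketch below uses hypothetical timings and assumes resolvers honour the TTL (some cache longer, which makes DNS failover worse).

```python
def dns_failover_worst_case(detect_s: float, update_s: float, ttl_s: float) -> float:
    """Detection + pushing the DNS change + one full TTL for clients holding the old record."""
    return detect_s + update_s + ttl_s


def balancer_failover_worst_case(check_interval_s: float, failed_checks: int) -> float:
    """The balancer pulls the server after N failed checks; clients keep the same address."""
    return check_interval_s * failed_checks


# Hypothetical numbers: 60 s to detect, 30 s to push the DNS change, 300 s TTL,
# versus a balancer checking every 2 s and pulling a server after 3 failed checks.
print(dns_failover_worst_case(60, 30, 300) / 60, "minutes")   # 6.5 minutes
print(balancer_failover_worst_case(2, 3), "seconds")          # 6 seconds
```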

      I have a pretty redundant setup here and the only thing I've paid for is the software.

      Future plans are to move everything to XenServer.

      --
      Do not meddle in the affairs of sysadmins, for they are subtle, and quick to anger.
  3. budget? by timmarhy · · Score: 5, Insightful
    You can go as crazy as you like with this kind of stuff, but given you're a non-profit, I'm guessing money is the greatest factor here. My recommendation would be to purchase managed hosting and NOT try running it yourself. Folks with a well-established data centre that do this stuff all day long will do it much better, quicker, and cheaper than you will be able to.

    There are also more of them than you can poke a stick at, and prices are very reasonable. Places like Rackspace offer this kind of thing for $100/mo.

    The other advantage is you don't need to pony up for the hardware.

    --
    If you mod me down, I will become more powerful than you can imagine....
  4. Plan or Implementation? by Manip · · Score: 5, Insightful

    Why are you purchasing six or so servers before you even have one online?

    You say that you expect "1,000+ a day" visitors which frankly is nothing. A single home PC with Apache would handle that.

    This entire post strikes me as either bad planning or no planning. You're flirting with vague "out of thin air" projections that are likely impossible to make at this stage.

    Have a plan in place for how you will scale your service *if* it becomes popular, or as it becomes popular, but don't go wasting the charity's money just in case your load jumps from 0 to 30,000+ in 24 hours.

  5. 1000+ a day is trivial have you thought of amazon? by MosesJones · · Score: 5, Insightful

    Let's get more blunt. Depending on what you are doing and whether you want to worry about failover, 1000 a day is bugger all. A simple setup of Apache and Tomcat (if using Java) running round-robin load balancing will give you pretty much what you need.

    If, however, you really are worried about scaling up and down, then have a look at Amazon Web Services, as that will probably be more cost-effective for coping with a peak IF it occurs than buying 6 servers to do bugger all most of the time.

    2 boxes for hardware failover will do you fine. If you are worried about HA, then it's the COST of downtime that you are worried about (i.e. being down for an hour costs more than $1000 in lost revenue), and that is what justifies the solution. Don't just drive availability to five nines because you feel it's cool; do it because the business requires it.
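    To weigh "five nines" against the cost of downtime, it helps to see how much downtime each level actually allows. A short Python sketch follows; the $1000/hour figure is the example from the comment above, and the yearly cost assumes the whole downtime budget gets used.

```python
MINUTES_PER_YEAR = 365.25 * 24 * 60


def downtime_minutes_per_year(nines: int) -> float:
    """Allowed downtime per year for 99% (2 nines), 99.9% (3 nines), and so on."""
    return 10.0 ** -nines * MINUTES_PER_YEAR


def yearly_downtime_cost(nines: int, cost_per_hour: float) -> float:
    """Lost revenue per year if the full downtime budget is used up."""
    return downtime_minutes_per_year(nines) / 60 * cost_per_hour


for n in range(2, 6):
    print(f"{n} nines: {downtime_minutes_per_year(n):7.1f} min/year, "
          f"${yearly_downtime_cost(n, 1000):,.0f}/year at $1000/hour")
```

    Going from three nines (roughly 8.8 hours a year) to five nines (roughly 5 minutes) only pays off if those hours cost more than the extra hardware, bandwidth and staff time.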

    --
    An Eye for an Eye will make the whole world blind - Gandhi
  6. Re:You will be OK by Anonymous Coward · · Score: 4, Insightful

    16GB? Are you mad? Anything beyond 1GB should be enough to handle 1000 unique visitors per day. If you want to virtualize the system and have separate web and database servers, 2GB should already be enough; if you want to go further and have a separate virtual mail server in there, 2GB is still sufficient and 3GB is plenty.

  7. HaProxy by Nicolas+MONNET · · Score: 4, Informative

    HAProxy is better than Pound, IMO. It's lightweight, but handles immense load just as well as layer 4 load balancing (LVS), with the advantages of layer 7 proxying. It uses the latest Linux APIs (epoll, vmsplice) to reduce context switching and copying to a minimum. It has a nice, concise stats module. Its logs are terse yet complete. It redirects traffic to a working server if one is down or overloaded.
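    One way a proxy steers around overloaded servers is to send each new connection to the live server with the fewest open connections; HAProxy offers this as its "leastconn" balancing algorithm. The Python below is only a simplified model of that idea, not HAProxy's code, and the backend names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Backend:
    name: str
    up: bool = True
    active_connections: int = 0


def pick_least_connections(backends: List[Backend]) -> Backend:
    """Send the next connection to the live server with the fewest open connections."""
    live = [b for b in backends if b.up]
    if not live:
        raise RuntimeError("all backends are down")
    chosen = min(live, key=lambda b: b.active_connections)
    chosen.active_connections += 1
    return chosen


pool = [Backend("web1", active_connections=5),
        Backend("web2", up=False),
        Backend("web3", active_connections=2)]
print(pick_least_connections(pool).name)  # web3
```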

  8. we run a nonprofit with 100m+ visitors a day by midom · · Score: 5, Interesting

    Hi! We run a non-profit website that gets 100 million visitors a day on ~350 servers. We don't even use any "clustering" technology, just replication for databases and a software (LVS) load balancer in front of both the app servers (PHP) and the Squid caches at the edge. But oh well, you can always waste money on expensive hardware and clustering technology. And you can always check how we build things.

  9. STOP. You have no idea what you're doing. by Enleth · · Score: 4, Interesting

    I'm sorry, but I have to say that. Don't be offended, please. Sooner or later you will look at your submission and laugh really hard, but for now you need to realise that you said something very, very silly. A few people already politely pointed out that 1000 visitors a day is nothing, but seriously, it's such a great magnitude of nothingness that, if you make such a gross misinterpretation of your expected traffic, you need to reconsider if you really are the right person for the job *right now* and maybe gain some more experience before trying to spend other people's money on a ton of hardware that will just sit there, idle, consuming huge amounts of electricity (also paid for with other people's money).

    I'm serving a 6k/day website (scripting, database, some custom daemons etc.) from a Celeron 1.5GHz with 1GB RAM, and it's still doing almost nothing. If you really have to have some load balancing, get two of those for $100 each.

    --
    This is Slashdot. Common sense is futile. You will be modded down.
  10. Pointless by ledow · · Score: 5, Informative

    1000 users a day? So what? That's less than one user a minute. Even if you assume they stay on the website for 20 or so minutes each, you're never looking at more than about 20 users at a time browsing content (there will be peaks and troughs, obviously). Now picture a computer that can only send out, say, 20 x 20 pages a minute (assuming your visitors can visit a full page every 3 seconds): we're talking "out of the Ark". Unless they are downloading about half a gig of video each, this is hardly a problem for a modern machine.
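    The "about 20 users at a time" figure follows from Little's law: average users on the site equals the arrival rate times how long each one stays. A small Python sketch with the numbers above follows; the result is a day-long average, and real traffic bunches up, which is why rounding it up to 20 is sensible.

```python
def average_concurrent_users(visitors_per_day: float, minutes_per_visit: float) -> float:
    """Little's law: users on the site = arrival rate x how long each one stays."""
    arrivals_per_minute = visitors_per_day / (24 * 60)
    return arrivals_per_minute * minutes_per_visit


avg = average_concurrent_users(1000, 20)
print(f"{avg:.1f} users on the site at once, averaged over a day")  # 13.9

# The peak figure above: 20 users each loading a full page every 3 seconds.
pages_per_second = 20 / 3
print(f"{pages_per_second:.1f} pages/second")  # 6.7
```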

    I do the technical side for a large website which sees nearly ten times that (as far as you can trust web stats), and it runs off an ordinary shared host in an ordinary mom-and-pop webhosting facility and doesn't cost the Earth to run. We often ask for more disk space, but we've never had to ask for more bandwidth or more CPU, or been told off for killing their systems. Admittedly, we don't do a lot of dynamic or flashy content, but this is an ordinary shared server which we pay for out of our own pockets (it costs less than our ISP subscriptions for the year, and the Google ads make more than enough to cover that even at 0.3% clickthrough). We don't have any other servers helping us keep that site online (we have cold spares at other hosting facilities should something go wrong, but that's because we're highly pedantic, not because we need them or because our users would miss us). One shared server does the PHP and MySQL, serves dozens of gigabytes per month of content for the entire site, generates the statistics, etc., and doesn't even take a hit. I could probably serve that website off my old Linux router over ADSL and I doubt many people would notice except at peak times because of the bandwidth.

    Define "massive" too... The site I'm talking about does multiple dozens of gigabytes of data transfer every month and contains about 10GB of data on disk (our backup is now *three* DVD-Rs... :-) ). That's *tiny* in terms of a lot of websites, but equally puts 99% of the websites out there to shame.

    Clustering is for when you have more than two or three servers already and primitive load balancing (i.e. databases on one machine, video/images on another, or even just encoding half the URLs with "server2.domain.com" etc.) can't cope. In your case, I'd just have a hot spare at a host somewhere, if I thought I needed it, with the data rsync'd every half-hour or so. For such a tiny thing, I probably wouldn't worry about the "switchover" between systems (because it would be rare and the users probably don't give a damn) and would just use DNS updates if it came to it. If I was being *really* pedantic, I might colo a server or two in a rack somewhere with the capability for one to steal the other's IP address if necessary, or have DNS with two A records, but I'd have to have a damn good reason for spending that amount of money regularly. If I was hosting in-house and the bandwidth was "free", I'd do the same.
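    The "encode half the URLs with server2" trick needs the split to be stable, so that a given file is always linked from the same host and stays cached there. A minimal Python sketch follows, assuming static paths are assigned by hashing; the hostnames are placeholders. It uses `hashlib` rather than Python's built-in `hash()`, because the latter is randomized per process for strings and would shuffle the assignment on every restart.

```python
import hashlib

SERVERS = ["www.domain.com", "server2.domain.com"]  # placeholder hostnames


def host_for(path: str) -> str:
    """Always map the same static path to the same server."""
    digest = hashlib.md5(path.encode("utf-8")).digest()
    return SERVERS[digest[0] % len(SERVERS)]


def absolute_url(path: str) -> str:
    """Build the link a page template would emit for a static file."""
    return f"http://{host_for(path)}{path}"


for p in ["/img/logo.png", "/video/intro.flv", "/docs/guide.pdf"]:
    print(absolute_url(p))
```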

    Seriously - this isn't cluster territory, unless you see those servers struggling heavily on their load. And if I saw that, I'd be more inclined to think the computers were just crap, the website was unnecessarily dynamic, or I had dozens-of-Gigabytes databases and tens or hundreds of thousands of daily visitors.

    You're in "basic hosting" territory. I doubt you'd hit 1GB/month traffic unless the data you're serving is large.
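    Monthly traffic depends almost entirely on how much each visit downloads, so it is worth estimating before buying bandwidth, especially with FTP downloads in the mix. A rough Python sketch follows; the 500 KB-per-visit page weight is a hypothetical input, decimal units are used, and an even monthly average hides the peaks that actually fill a pipe.

```python
def monthly_transfer_gb(visitors_per_day: float, kb_per_visit: float, days: int = 30) -> float:
    """Outbound transfer per month in GB, using decimal units (1 GB = 1,000,000 KB)."""
    return visitors_per_day * kb_per_visit * days / 1_000_000


def average_mbit_per_second(gb_per_month: float, days: int = 30) -> float:
    """The same volume spread evenly over the month, in megabits per second."""
    bits = gb_per_month * 1e9 * 8
    return bits / (days * 24 * 3600) / 1e6


# Hypothetical page weight: 500 KB per visit (HTML, images, the odd small download).
gb = monthly_transfer_gb(1000, 500)
print(f"{gb:.1f} GB/month, {average_mbit_per_second(gb):.3f} Mbit/s on average")
```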

  11. As already stated : HAProxy by amaura · · Score: 5, Informative

    If you're looking for a lightweight open source load balancer with a lot of features, go for HAProxy. In my company we work with F5 BIG-IPs, Alteon and Cisco CSS, which are the leading load balancers in the industry; they are really expensive, and depending on the licence you buy, you won't have all the features (HTTP-level load balancing, cookie insertion/rewriting). We first used HAProxy for a proof of concept (POC) and now we're installing it in production environments; it works like a charm on a Linux box (Debian and RHEL5) with around 600 users.
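    Cookie insertion is what keeps a user's session on one application server: on the first response the balancer sets a cookie naming the server it chose, and later requests carrying that cookie go back to the same server while it is up. The Python below is a simplified model of the idea, not HAProxy's implementation; the `SERVERID` cookie name and server names are assumptions for illustration.

```python
import itertools
from typing import Dict, Iterator, List, Set, Tuple

COOKIE = "SERVERID"  # assumed cookie name chosen by the balancer


class StickyBalancer:
    """Pin each client to one server by inserting a cookie on its first response."""

    def __init__(self, servers: List[str]):
        self.servers = servers
        self.up: Set[str] = set(servers)
        self._rotation: Iterator[str] = itertools.cycle(servers)

    def _next_live(self) -> str:
        for _ in range(len(self.servers)):
            server = next(self._rotation)
            if server in self.up:
                return server
        raise RuntimeError("no servers up")

    def route(self, cookies: Dict[str, str]) -> Tuple[str, Dict[str, str]]:
        """Return (server, cookies to set on the response)."""
        pinned = cookies.get(COOKIE)
        if pinned in self.up:
            return pinned, {}                # returning client, server still alive
        server = self._next_live()           # new client, or its server died: pick another
        return server, {COOKIE: server}


lb = StickyBalancer(["app1", "app2"])
first, set_cookie = lb.route({})             # ('app1', {'SERVERID': 'app1'})
again, _ = lb.route(set_cookie)              # sticks to app1
lb.up.discard("app1")                        # app1 fails
moved, new_cookie = lb.route(set_cookie)     # re-pinned to app2
print(first, again, moved, new_cookie)
```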

  12. One more thing. by OneSmartFellow · · Score: 4, Insightful

    There is no way to be fully redundant unless you have independent power sources, which usually requires your backup systems to be geographically separated. In my experience, loss of power is the single most common reason for a system failure in a well-designed system (after human error, that is).

  13. CentOS/HA by digitalhermit · · Score: 5, Informative

    It's fairly trivial to install RedHat/CentOS based clusters, especially for web serving purposes.

    There are a few components involved:
    1) A heartbeat to let each node know if the other goes out.

    2) Some form of shared storage if you need to write to the filesystem.

    3) Some method of bringing up services when it fails over.

    A web server with a backend database is one of the canonical examples. You'd install the heartbeat service on both nodes. Next, install DRBD (distributed replicated block device). Finally, configure the services to be brought up during a failure. The whole process takes about an hour following instructions on sites like HowtoForge.
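    The heartbeat component boils down to timeouts: each node listens for its peer's periodic heartbeats, warns when they arrive late, and takes over once the silence exceeds a dead time (heartbeat's `ha.cf` has `keepalive`, `warntime` and `deadtime` settings for this). A minimal Python model of that decision follows; the class name and timings are hypothetical.

```python
class PeerWatch:
    """Track the last heartbeat from the other node and classify the silence since then."""

    def __init__(self, now: float, warntime: float, deadtime: float):
        self.warntime = warntime
        self.deadtime = deadtime
        self.last_seen = now

    def beat(self, t: float) -> None:
        """Record a heartbeat received at time t (seconds)."""
        self.last_seen = t

    def status(self, now: float) -> str:
        silence = now - self.last_seen
        if silence > self.deadtime:
            return "dead"    # time to take over the peer's IP, DRBD disk and services
        if silence > self.warntime:
            return "late"    # log a warning but do nothing drastic yet
        return "alive"


watch = PeerWatch(now=0.0, warntime=10, deadtime=30)
watch.beat(2.0)
print([watch.status(t) for t in (5.0, 15.0, 40.0)])  # ['alive', 'late', 'dead']
```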

    But 1000 visitors a day is not much. It's small enough that you could consider virtualizing the nodes and just using virtualization failover.

  14. Re:1000+ a day is trivial have you thought of amaz by rufus+t+firefly · · Score: 5, Informative
    There are a number of nice open source load balancers out there. I'm partial to HAProxy, but you could try:

    HAproxy (which is the one I use) has the ability to define "backup" servers which can be used in the event of a complete failure of all servers in the pool, even if there is only one server in the main pool. If you're trying to do this on the cheap, that may help. It also has embedded builds for things like the NSLU2, so it may be easy to run on an embedded device you already have.
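    The "backup" behaviour can be modelled in a few lines: traffic goes to the live primary servers, and only when none are left does it move to a backup. Per the HAProxy documentation, by default only the first live backup receives traffic unless `option allbackups` is set; the Python sketch below follows that default, and the server names are hypothetical.

```python
from typing import Callable, List


def servers_for_traffic(primary: List[str], backup: List[str],
                        is_up: Callable[[str], bool]) -> List[str]:
    """Use every live primary; only when none are left, hand traffic to the first live backup."""
    live_primary = [s for s in primary if is_up(s)]
    if live_primary:
        return live_primary
    live_backup = [s for s in backup if is_up(s)]
    return live_backup[:1]   # an empty list means nothing is left to serve the site


down = {"web1"}
print(servers_for_traffic(["web1"], ["spare-box"], lambda s: s not in down))  # ['spare-box']
```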

    --
    "He may look like an idiot, and talk like an idiot, but don't let that fool you. He really is an idiot." - Duck Soup
  15. Re:1000+ a day is trivial have you thought of amaz by eharvill · · Score: 4, Informative

    My favorite (the name seals the deal for me) is http://www.ultramonkey.org/

    It's probably more complicated and overkill for what the poster needs, but it worked great for us. We used this years ago for transaction processing (~100,000 transactions an hour, not too busy) on a couple old HP NetServers with 1GB RAM each.

    --
    At night I drink myself to sleep and pretend I don't care that you're not here with me