Best Solution For HA and Network Load Balancing?
supaneko writes "I am working with a non-profit that will eventually host a massive online self-help archive and community (using FTP and HTTP services). We are expecting 1,000+ unique visitors / day. I know that having only one server to serve this number of people is not a great idea, so I began to look into clusters. After a bit of reading I determined that I am looking for high availability, in case of hardware fault, and network load balancing, which will allow the load to be shared among the two to six servers that we hope to purchase. What I have not been able to determine is the 'perfect' solution that would offer efficiency, ease-of-use, simple maintenance, enjoyable performance, and a notably better experience when compared to other setups. Reading about Windows 2003 Clustering makes the whole process sound easy, while Linux and FreeBSD just seem overly complicated. But is this truly the case? What have you all done for clustering solutions that worked out well? What key features should I be aware of for a successful cluster setup (hubs, wiring, hardware, software, same servers across the board, etc.)?"
1000+ unique visitors is nothing. Even if they all hit the site at lunchtime (1 hour window), and look at 30 pages each (very high estimate for a normal site) that's only 8 requests a second. That isn't a lot. A single server could cope easily, especially if it's mostly static content. As an example, a forum I run gets a sustained 1000+ users an hour and runs fine on one server.
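For anyone who wants to redo that back-of-envelope sum with their own numbers, here is a minimal sketch. The function name is made up for illustration, and it counts page views only, not the images or stylesheets each page might pull in.

```python
def peak_requests_per_second(visitors: int, pages_per_visitor: int, window_seconds: int) -> float:
    """Average request rate if every visit is squeezed into one time window."""
    return visitors * pages_per_visitor / window_seconds


# Figures from the comment above: 1000 visitors, 30 pages each, a one-hour window.
rate = peak_requests_per_second(1000, 30, 3600)
print(f"{rate:.1f} requests/second")
```

Plug in a pessimistic multiplier for per-page assets if the content is not mostly static; the order of magnitude is what matters here.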
As for "high availability", that depends on your definition of "high". If the site being down for a morning is a big problem then you'll need a redundant failover server. If it being down for 15 minutes is a problem then you'll need a couple of them. You won't need a load balancer for that because the redundant servers will be sitting there doing nothing most of the time (hopefully). You'll need something that detects the primary server is offline and switches to the backup automatically. You might also want to have a separate database server that mirrors the primary DB if you're storing a lot of user content, plus a backup for it (though the backup DB server could always be the same physical machine as one of the backup webservers).
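The "detect the primary is offline and switch to the backup" part boils down to something like the sketch below. It is only an illustration of the decision logic: the host names are hypothetical, and the health check is passed in as a function so that a real probe (HTTP request, ping, whatever suits) can be swapped in.

```python
from typing import Callable, Optional, Sequence


def choose_active(servers: Sequence[str], is_up: Callable[[str], bool]) -> Optional[str]:
    """Return the first server, in priority order, that passes its health check."""
    for server in servers:
        if is_up(server):
            return server
    return None  # nothing is answering; time to page a human


# Hypothetical hosts; the health check is stubbed with a set of known-down machines.
down = {"primary.example.org"}
active = choose_active(
    ["primary.example.org", "backup.example.org"],
    lambda host: host not in down,
)
print(active)
```

A real setup would run this check on a timer and then repoint a virtual IP or DNS record at whichever server it returns.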
Whoever told you that you'll need as many as 6 servers is just plain wrong. That would be a waste of money. Either that or you're just seeing this as an opportunity to buy lots of servers to play with, in which case buy whatever your budget will allow! :)
http://twitter.com/onion2k
If the site goes down do you lose truck loads of money or does anyone die? Load balancing and HA sounds a little overboard for a site with a thousand visitors a day. A hundred thousand and you can probably justify the expense. I would probably just be looking at a hosted dedicated server somewhere for now.
There are also more of them than you can poke a stick at, and prices are very reasonable. Places like Rackspace do this kind of thing for $100/mo.
The other advantage is you don't need to pony up for the hardware.
If you mod me down, I will become more powerful than you can imagine....
At work we have a pretty good experience with Pound - it's easy to set up & it load balances and will detect when one of your servers is down and stop sending traffic there. You can get hardware load balancing from people like F5 too.
If you're just starting out you'll probably want to start with software and then, if the load demands it, move to hardware.
Machine-wise, we use cheap & not overly powerful 250 GBP, 1U servers with a RAID1; they'll die after a few years (but servers will need to be refreshed anyway) and they provide us with lots of options. They're all plugged into 2 gigabit switches.
Global symbol "$deity" requires explicit package name at line 2. - If only $scripture started "use strict;"
I would think that this would also largely depend upon what you are using to serve the pages people are going to be accessing. If you are using IIS as a web server (I'm assuming this is not the case) then the NLB component of Windows is already there ready to be turned on. This will provide fault-tolerance and load balancing for the front-end but if you have databases then these will also need redundancy for your service to be HA (MS have failover clusters for this purpose). I've found MS implementations of load-balancing / HA to be simple and effective if they are implemented properly.
Why are you purchasing six or so servers before you even have one online?
You say that you expect "1,000+ a day" visitors which frankly is nothing. A single home PC with Apache would handle that.
This entire post strikes me as either bad planning or no planning. You're flirting with vague "out of thin air" projections that are likely impossible to make at this stage.
Have a plan in place for how you will scale your service *if* it becomes popular or as it becomes popular, but don't go wasting the charity's money just in case your load jumps from 0 to 30,000+ in 24 hours.
Your application is very simple, and your budget is probably not too high. But for your own edification, this is F5 Networks' (formerly F5 Labs) bread and butter: application delivery. What you want is a pair of BIG-IPs running Local Traffic Manager. You should look into that, at least so you can show your boss how cheap the solution you propose is by comparison.
Let's get more blunt. Depending on what you are doing and whether you want to worry about failover, 1000 a day is bugger all. A simple setup of Apache and Tomcat (if using Java) running round-robin load balancing will give you pretty much what you need.
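For anyone unfamiliar with the term, round-robin just means handing each request to the next backend in rotation. A minimal sketch of the idea follows; the backend names are hypothetical, and this is not how Apache implements it internally, only the concept with a "skip dead servers" rule bolted on.

```python
from itertools import cycle
from typing import Iterable, Iterator, Set


def round_robin(backends: Iterable[str], down: Set[str]) -> Iterator[str]:
    """Yield backends in rotation, skipping any that are currently marked down."""
    pool = list(backends)
    for backend in cycle(pool):
        if all(b in down for b in pool):
            raise RuntimeError("no healthy backends left")
        if backend not in down:
            yield backend


down: Set[str] = set()
rr = round_robin(["app1", "app2", "app3"], down)
first = [next(rr) for _ in range(4)]   # a full rotation plus one
down.add("app2")                       # simulate a failure
after = [next(rr) for _ in range(3)]   # app2 is now skipped
print(first, after)
```

The `down` set is shared with the caller on purpose, so a separate health checker can add and remove servers while the rotation keeps running.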
If however you really are worried about scale up and scale down, then have a look at Amazon Web Services, as that will probably be more cost-effective for coping with a peak IF it occurs than buying 6 servers to do bugger all most of the time.
2 boxes for hardware failover will do you fine. If you are worried about HA then it's the COST of downtime that you are worried about (i.e. being down for an hour exceeds $1000 in lost revenue), which will justify the solution. Don't just drive availability to five nines because you feel it's cool, do it because the business requires it.
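To put numbers on "the cost of downtime", here is a rough sketch that converts an availability target into minutes of downtime per year and a yearly cost. The $1000/hour figure is the example from the comment above; the function names are made up, and the model assumes downtime cost is flat per hour, which is a simplification.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # ignoring leap years


def downtime_minutes_per_year(availability: float) -> float:
    """Minutes per year a service may be down and still meet the target."""
    return (1.0 - availability) * MINUTES_PER_YEAR


def yearly_downtime_cost(availability: float, cost_per_hour: float) -> float:
    """Expected yearly loss if the allowed downtime is fully used up."""
    return downtime_minutes_per_year(availability) / 60 * cost_per_hour


for label, target in [("two nines", 0.99), ("three nines", 0.999), ("five nines", 0.99999)]:
    minutes = downtime_minutes_per_year(target)
    cost = yearly_downtime_cost(target, 1000)
    print(f"{label}: {minutes:.1f} min/year, about ${cost:.0f}/year")
```

At 99% that works out to roughly 87.6 hours a year, and at five nines to a little over 5 minutes a year. If the gap between those two costs is smaller than the price of the extra kit, the extra nines are not paying for themselves.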
An Eye for an Eye will make the whole world blind - Gandhi
I have a small budget. 800 bucks.
I have heard of this guy building a microwulf cluster, http://www.calvin.edu/~adams/research/microwulf that generated some good flops, at least at that time. Today I can build that very same cluster for about 800 dollars.
My question: Is it better to go with a newer computer setup that falls within that budget, or go with the cluster? I will be doing image analysis work on functional MRI data. Thanks.
A Good Troll is better than a Bad Human.
Consider OpenBSD, CARP gives you the best clustering. Alternatively OpenBSD with relayd makes for the best load-balancer.
Sit down for a bit and think about the most likely use cases for your software. To give the example of slashdot, that might be viewing the main page or viewing an entire article. Structure your code so that these things can be done by directly sending a single file to the client. With the kernel doing most of the work you should be okay.
Sites which get slashdotted typically use a badly structured and resourced database to directly feed external queries. If you must use a database put some kind of simple proxy between it and the outside world. You could use squid for that or a simple directory of static html files.
http://michaelsmith.id.au
I want to give you some more information. Based on your visitor estimates I think you do not have a lot of knowledge about it. Because for this number of visitors you do not really need a cluster.
But now to the other stuff. Yes, Windows clustering is (up to Win Server 2003 [1]) a lot easier. But this is because it is not really a cluster. The only thing you can do is having the software running on one server, then you stop it and start it on the new server. This is what Windows Cluster is doing for you. But you can not have the software running on both servers at the same time.
If you really want to have a cluster then you probably need some sort of shared storage (FibreChannel, iSCSI, etc.). Or you are going to use something like DRBD [2]. You will need something like this too if you want to have a real cluster on Windows.
I recommend you to read some more on the Linux HA website [3]. Then you get a better idea what components (shared storage, load balancer, etc.) you will need within your cluster.
If you only want high availability and not load balancing then I recommend you to not use Windows Cluster. Better set-up two VMware servers with one virtual machine and then copy a snapshot of your virtual machine every few hours over to the second machine.
[1] I don't know about Win Server 2008
[2] http://www.drbd.org/
[3] http://www.linux-ha.org/
For load balancing and static file HTTP serving use Nginx; it is the fastest around. Use two or more Linux servers for your high availability cluster, set a virtual IP for the load balancer, and use Heartbeat to switch the virtual IP in case of failure. Software cost including OS = zero.
1000 visitors per *day*? Oh my! That's almost one visitor every minute! Truly, this is traffic previously unheard of.
Amazon's servers allow you to scale vertically and horizontally. They have images that are preconfigured to do load balancing and they have LAMP setups. Plus the fact that it's a completely virtualized system means you never have to worry about hardware failures. With only 1k uniques per day, they have more than enough to accommodate what you need.
as for ease of use, i've never done windows load balancing, but the linux load balancing isn't terribly difficult to get working. optimizing it is quite a bit more difficult though. as with anything linux, it's all terminal, so it's almost never as convenient as point and click. however, it's almost always more flexible than point and click.
one other thing that you need to think about that goes hand in hand with HA systems is monitoring. with or without amazon, you always need to account for software failures too. apache might hang, the database might be overloaded, etc. you'll need something like nagios, cacti, etc., so don't forget to account for that in your hardware costs.
16GB? Are you mad? Anything beyond 1GB should be enough to handle 1000 unique visitors per day. If you want to virtualize the system and have a separate web and database server, 2GB should be enough already; if you want to go further and have a separate virtual mail server in there, 2GB is still sufficient and 3GB is plenty.
HAProxy is better than Pound, IMO. It's lightweight, but handles immense load just as well as layer 4 load balancing (LVS), with the advantages of layer 7 proxying. It uses the latest Linux APIs (epoll, vmsplice) to reduce context switching and copying to a minimum. It has a nice, concise stats module. Its logs are terse yet complete. It redirects traffic to a working server if one is down / overloaded.
Depending on static/PHP/Python/WhatEverYouUse engines, I think 16GB is a bit overkill for 1000+ users per day, but it all depends on the application of course.
Hey dude, it's just got to be a Beowulf cluster.
Preferably a russian one.
And don't forget to use low-profile car tires for extra performance.
Have you considered any of the 'cloud' offerings? Amazon EC2 / Microsoft Azure could be an option; this would give you scalability, as I'm sure that your 1000+ visitors a day is a guess. You can then bring up some of your services and grow with demand. Your 6 servers, clustered with a load balancer, will quickly get expensive. Give it a go :-)
SJJ
One dual quad Xeon, properly configured, can saturate 200Mbps and easily serve 500 requests per second per GB of RAM installed. Most bad data centers configure their systems with only 1GB of RAM, fully aware that they can lease more systems to one client and make much more profit that way than by simply fine-tuning the server.
Once you take into account the hardware bottlenecks (disk arrays)
Cluster systems are high-latency, better suited for "applications running on the server" over "static content"
Yes, because you know exactly the memory footprint of the application running on...um, well whichever OS, and you know it'll scale in a predictable way. Today's server hardware[tm] (which just comes in one model) doesn't really care about the CPU intensiveness of the application. As we all know, 1000+ means exactly 1001-1005, not for example 172513, and it really makes no difference whether it's evenly spread out during the day or all of them connect at 4:20:00 EST. Just buy lots of RAM, case closed.
Because the cluster setup is highly complex and fragile, you should hang a sign directly above the hardware.
"ATTENTION
This room is filled with special electronic equipment. Fingering and pressing the buttons from the computers are allowed for experts only! So all the "lefthanders" stay away and do not disturb the brainstorming happening here. Otherwise you will be out thrown and kicked elsewhere!
Also: please keep still and only watch the blinking lights in awe and astonishment."
Hi! we run a non-profit website that gets 100 million visitors a day on ~350 servers. we don't even use any "clustering" technology, just replication for databases, and software (LVS) load balancer in front of both app (PHP) and squids at the edge. but oh well, you can always waste money on expensive hardware and clustering technology. and you can always check how we build things
First, figure out what it means for your website to be available (do people need to be able to fetch a page, or do they also need to be able to log in, etc.). Select monitoring software and set it up correctly.
As for the serving architecture, at this level of load, you're better off without clustering. You don't need it for the load and it's probably a net loss for reliability; most outages I've seen in two-node clusters are either infrastructural failures that take both nodes out (power distribution failures, for example) or problems with the HA system itself (switches going into jabber-protection mode and provoking a failover, failure detection script bugs, etc.). If you really feel that a single machine does not offer enough protection, go for an active-active configuration and simplify the problem to directing incoming requests to the working web servers, as opposed to "failing over".
This changes a bit if your reliability needs are high enough to justify separate serving facilities in separate data centres in different cities. For that sort of stuff you need to look at working with DNS to solve part of the problem too, but the right approach there depends on to what extent the website is static content.
and was handling like hundred thousands to a everyday, with off the shelf hardware spec 10 years ago. (Like 512M RAM and 1st era Pentium 4)
There was no problem at all.
We also used www.linuxvirtualserver.org to handle load balancing the web requests, and using yet another bigger Linux NFS for backend storage.
The biggest problem for the HA is
1. How you sync the data over, or do you rely on another central storage which then there is single point of failure again.
2. If it involves a database, then it's a much bigger issue...
I assume you don't need sub-second failover. 5 minutes downtime might even be OK. You might want to shoot for a Hot Standby solution, instead of Load Balancing solution, which should be a little bit easier on everything.
I'm sorry, but I have to say that. Don't be offended, please - sooner or later you will look at your submission and laugh really hard, but for now you need to realise that you said something very, very silly. A few people already politely pointed out that 1000 visitors a day is nothing - but seriously, it's such a great magnitude of nothingness that, if you make such a gross misinterpretation of your expected traffic, you need to reconsider if you really are the right person for the job *right now* and maybe gain some more experience before trying to spend other people's money on a ton of hardware that will just sit there, idle, and consume huge amounts of electricity (also paid for with other people's money).
I'm serving a 6k/day website (scripting, database, some custom daemons etc.) from a Celeron 1.5GHz with 1GB RAM, and it's still doing almost nothing. If you really have to have some load balancing, get two of those for $100 each.
This is Slashdot. Common sense is futile. You will be modded down.
1000 users a day? So what? That's less than one user a minute. Even if you assume they stay on the website for 20 or so minutes each, you're never looking at more than about 20 users at a time browsing content (there will be peaks and troughs, obviously). Now picture a computer that can only send out, say, 20 x 20 pages a minute (assuming your visitors can visit a full page every 3 seconds) - we're talking "out of the Ark". Unless they are downloading about half a gig of video each, this is hardly a problem for a modern machine.
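The "about 20 users at a time" estimate can be sanity-checked with the usual arrival-rate-times-visit-length rule (Little's law). This is a sketch assuming visits are spread evenly over 24 hours, which real traffic never is, so treat the result as an average rather than a peak; the function name is made up for illustration.

```python
def average_concurrent_users(visitors_per_day: float, minutes_per_visit: float) -> float:
    """Average number of simultaneous visitors: arrival rate times time on site."""
    arrivals_per_minute = visitors_per_day / (24 * 60)
    return arrivals_per_minute * minutes_per_visit


# Figures from the comment above: 1000 visitors a day, about 20 minutes each.
concurrent = average_concurrent_users(1000, 20)
print(f"about {concurrent:.1f} visitors on the site at any moment")
```

That gives an average of roughly 14, so "about 20" with some allowance for peaks is in the right ballpark.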
I do the technical side for a large website which sees nearly ten times that (as far as you can trust web stats) and it runs off an ordinary shared host in an ordinary mom-n-pop webhosting facility and doesn't cost anywhere near the Earth to run. We often ask for more disk space, we've never had to ask for more bandwidth, or more CPU, or got told off for killing their systems. Admittedly, we don't do a lot of dynamic or flashy content but this is an ordinary shared server which we pay for out of our own pockets (and it costs less than our ISP subscriptions for the year, and the Google ads make more than enough to cover that even at 0.3% clickthrough). We don't have any other servers helping us keep that site online (we have cold-spares at other hosting facilities should something go wrong, but that's because we're highly pedantic, not because we need them or that our users would miss us) - one shared server does the PHP, MySQL, serves dozens of Gigabytes per month of content for the entire site, generates the statistics etc. and doesn't even take a hit. I could probably serve that website off my old Linux router over ADSL and I doubt many people would notice except at peak times because of the bandwidth.
Define "massive" too... this site I'm talking about does multiple dozens of Gigabytes of data transfer every month, and contains about 10GB of data on the disk (our backup is now *three* DVD-Rs... :-) ). That's *tiny* in terms of a lot of websites, but equally puts 99% of the websites out there to shame.
Clustering is for when you have more than two or three servers already and primitive load-balancing (i.e. databases on one machine, video/images on another, or even just encoding half the URLs with "server2.domain.com" etc.) can't cope. In your case, I'd just have a hot-spare at a host somewhere, if I thought I needed it, with the data rsync'd every half-hour or so. For such a tiny thing, I probably wouldn't worry about the "switchover" between systems (because it would be rare and the users probably don't give a damn) and would just use DNS updates if it came to it. If I was being *really* pedantic, I might colo a server or two in a rack somewhere with the capability for one to steal the other's IP address if necessary, or have DNS with two A records, but I'd have to have a damn good reason for spending that amount of money regularly. If I was hosting in-house and the bandwidth was "free", I'd do the same.
Seriously - this isn't cluster territory, unless you see those servers struggling heavily on their load. And if I saw that, I'd be more inclined to think the computers were just crap, the website was unnecessarily dynamic, or I had dozens-of-Gigabytes databases and tens or hundreds of thousands of daily visitors.
You're in "basic hosting" territory. I doubt you'd hit 1Gb/month traffic unless the data you're serving is large.
If you're planning an HA solution, the first step is to decide what you are trying to protect against, what the cost/consequence of these events occurring is, and how you will test failure events.
I've seen projects where the HA configuration has contributed to more downtime than any specific failure. I've seen projects that were too "important" to schedule test failures so when it did fail it didn't fail over.
In a lot of cases if a specialist site is down then people would come back later. If your consequences are not that high for an outage then save your money for good backups and good support contracts and maybe a cold/warm spare. If slashdot crashed now I'd just check again next time I had a chance.
An HA solution has to be designed from end to end. This isn't easy and some of your components may not work in a compatible way (black box software). Static content can be pretty easy to load balance/failover but once you start getting into dynamic content things become more complicated and uncertain.
If you have to worry about session persistence an unexpected event might redistribute connections causing existing connections to break for something that was very transient. i.e. it amplifies a minor fault.
I've seen applications that didn't pass their status through to the web server. There was a significant back end failure and the web server was still returning "200 OK" responses to the requests. The other servers were still working correctly and due to session persistence the people diagnosing the issue initially didn't realise that 25% of sessions were empty pages. The developer should have provided checks in their code, the load balancer could have done a different check, the initial level 1 support didn't really understand the system. All these have costs and consequences. i.e. development time and skills, risk that a content change might cause a service check to fail, training costs.
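The "200 OK but empty page" failure above is why a health check that only looks at the status code is not enough. Here is a minimal sketch of a stricter check. The marker string is a hypothetical convention (something the application would only emit at the end of a successfully rendered page); it is not a feature of any particular load balancer.

```python
def is_healthy(status: int, body: str, marker: str) -> bool:
    """Treat a backend as healthy only if it returns 200 AND the page really rendered."""
    return status == 200 and marker in body


# Hypothetical marker printed by the application after the back end has answered.
MARKER = "<!-- page-ok -->"

good = is_healthy(200, "<html>...content...<!-- page-ok --></html>", MARKER)
empty = is_healthy(200, "", MARKER)                      # the failure described above
error = is_healthy(500, "<!-- page-ok -->", MARKER)
print(good, empty, error)
```

The trade-off mentioned in the comment still applies: if someone edits the template and drops the marker, the check starts failing on a perfectly good server.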
Buy two good quality machines and keep one as a hot spare and just backup every night.
The current "uptime" of a couple of my systems are 255 days, and that's only because of a power failure and subsequent end of generator fuel at my colo which no amount of on-site redundancy would have helped.
Good quality machines and software *will* run for a year or more with no issues.
I've been setting up sites at data centers for about 10 years now. Seriously, do the cost/benefit analysis: the base price is a couple of machines, colo, and a backup strategy. Use the stand-by as a backup server, and download from that nightly. You can figure access to internet + 5 minutes to shut down or repair the non-working box, and if necessary activate a new IP address on the stand-by system. The probability of a good system running a solid OS -- FreeBSD or CentOS -- failing is pretty low. Good software components don't often fail or, if they do, restart.
Seriously, a few of the sites I run have NO redundancy and my biggest risk is NStar and Sprint.
For a fully redundant system, two load balancers, at least 4 servers (two for each load balancer -- redundancy), two high speed switches, etc. etc.
Hardware failure happens, but not that frequently after the first week of service. I have two machines at a colo that are, no joke, 10 years old this year. A few years ago, I replaced the hard disks. This year they will be upgraded -- maybe :-)
Buy 2 very cheap computers with double HDs. You can get them for less than 200$ each. Then install BSD/linux with mirrored raid. Then you can use rsync/unison/name your favorite synchronization tool to mirror data between computers.
Then use http://en.wikipedia.org/wiki/Lighttpd or http://en.wikipedia.org/wiki/Nginx. You will get relative easy setup, excellent performance, unbeatable stability and good load balancing that scales to 10k+ users in a hour.
Of course it all depends on whether you use bloatware or not. It is very easy to make dynamic content generation and database access limit scalability to only a few connections.
So all basic tools are easily available from any free server distribution.
IF YOU WANT 100% AVAILABILITY: Don't forget your networking stuff. You have to have 2 routers and 2 Internet connections. This is why server hosting companies are 10x better and cheaper than doing your own server.
From hosting company you get 24h administration and regular backups. And as a bonus you get pre-installed and pre-configured environment.
I see there are already a ton of good advice here, so when you have your kit set-up, post a link so that we can load test your config :-)
It's called the slashdot effect and, if anything, you will at least know when things break and how your configuration handles these failover conditions.
PS: This is cheaper than buying load testing kit and software :-)
Need an ISP in South Africa?
I mean no offense, but so far everybody has been quick to point out that load balancing and stuff isn't what the user needs -- and yet nobody has come forward with an actual answer.
Obviously has shares in Kingston.
(16Gb RAM for 1k visitors? What kind of pages are you serving?)
"It doesn't cost enough, and it makes too much sense."
I remember initially setting up our little site with 3 servers and a "managed" loadbalancer/failover solution from our hosting provider. Our domain name pointed to the IP address of the loadbalancer.
I learned that "managed" is actually a hosting company euphemism for "shared" and performance was seriously degraded during "prime time" everyday.
We eventually overcame our network latency issues by ditching the provider's loadbalancer and using round-robin DNS to point our domain name at all three servers.
I was using Apache + JBoss + MySQL, and on each server I configured Apache's mod_jk loadbalancer to failover using AJP over stunnel to the JBoss instances on the other 2 servers. I also chose to configure each JBoss instance to talk to a MySQL instance on each box, these being configured in a replication cycle with the other MySQL instances for hot data backup.
For our load, we've never had any problems with this. The biggest component with downtime was JBoss (usually administrative updates), but Apache would seamlessly switch over to use a different JBoss instance.
One of the servers was hosted with a different provider in a different site.
"I thought they were the dominant species..."
Has any /.er implemented the following ultra-simple solution to provide HA for websites serving static content: having the website DNS name resolve to 2 IP addresses pointing to 2 different servers, and simply duplicating the static content on the 2 servers ? How do browsers behave when 1 of the server goes down ? Will they automatically try to re-resolve the DNS name and attempt to contact the 2nd IP ? Or is the well-known DNS pinning security feature preventing them from falling back on the 2nd IP ?
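Whether a given browser falls back to the second address is not something a code sample can settle, but the client-side logic being asked about looks roughly like the sketch below: try each resolved address in turn and move on when a connection fails. The addresses are documentation-range placeholders and the connect function is a stand-in so the example runs without a network; Python's own `socket.create_connection` does a similar loop over the addresses returned by `getaddrinfo`.

```python
from typing import Callable, Iterable, Optional, TypeVar

T = TypeVar("T")


def connect_first_available(addresses: Iterable[str], connect: Callable[[str], T]) -> T:
    """Try each address in order and return the first successful connection."""
    last_error: Optional[OSError] = None
    for address in addresses:
        try:
            return connect(address)
        except OSError as exc:  # refused, timed out, unreachable, ...
            last_error = exc
    raise OSError("all addresses failed") from last_error


def fake_connect(address: str) -> str:
    """Stand-in for a real socket connect: the first server is 'down'."""
    if address == "192.0.2.10":
        raise ConnectionRefusedError("server down")
    return f"connected to {address}"


result = connect_first_available(["192.0.2.10", "192.0.2.20"], fake_connect)
print(result)
```

Note that a dead server that silently drops packets costs the client a full connect timeout before the second address is tried, which is the main practical weakness of relying on two A records alone.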
Why don't you get a small VPS system? and upgrade if/when you need more power.
You get redundant Power/Disk/Networking all for a much lower cost than a dedicated box. If a physical system dies (quite unlikely anyway) they can move your VPS to another machine and it should be up again pretty soon - which should be good enough for that many users.
My question: Is it better to go with a newer computer setup that falls within that budget, or go with the cluster. I will be doing image analysis work of function MRI data. Thanks.
While I'm not an expert on the topic by any means, I would expect that for that sort of budget you'll get far better performance out of a single machine than any cluster you could build for the same cost.
Even if your interest is in testing how "cluster friendly" your code is (eg: for scaling considerations), you'll almost certainly still get the best performance/$ with a single quad-core machine running $CORE_COUNT VMs to "simulate" a cluster (with each VM bound to a specific CPU core).
I just can't see why you would want to venture into the cost inefficiencies of multiple machines until you _had_ to because a single machine wasn't fast enough - and you can fit a *lot* of power into a single computer these days.
First, I suggest you read and think deeply about Moens Nogood's essay "So Few Really Need Uptime".
Key quote:
And that corresponds pretty well to my experience: the more effort people make to duplicate hardware and build redundant failover environments the more failures and downtime they experience. Consider as well the concept of ETOPS and why the 777 has only two engines.
sPh
Others have already covered the "1000 users isn't much" aspect. Benchmark, and verify what each server can handle of your anticipated load, but they're probably right.
Option 1: Don't do it yourself. Look into renting servers from a hosting company. They will often provide HA and load balancing for free if you get a couple servers. Also, having rented servers makes it much easier to scale. If you find that you have 100,000 uniques per day, you can order up a bunch more servers and meet the load within minutes to hours. If you overbought, you can scale back down just as fast.
Option 2: http://www.linuxvirtualserver.org/ plus http://www.linux-ha.org/ . You use LVS to load balance out to a cluster (including removing failed servers from the pool). You use HA so that two LVS machines can fail over to each other. Note that you can run LVS on the same machines as your load, for a small environment. This is much more DIY than the Windows setup, of course... But honestly, if the setup requirements of this scare you away, then you're not ready to run a fault-tolerant network, regardless of OS.
Option 3: http://www.redhat.com/cluster_suite/ . Less DIY, more money. Perhaps that's better for you.
Option 4: Buy a commercial solution. Every major network vendor sells a HA/LB product. I've used them from most of the big players... I'm not going to write a review here, but it'll suffice to say that while they each have their good and bad points, any of them will get the job you've outlined done.
As for the network: The general rule is to reduce your single points of failures to the minimum you can afford. Common ones are: The ISP (BGP is a pain); the routers (Each ISP goes to its own router); the switches between (you need to full-mesh links from the two routers to two switches, down through the line as many layers as it goes; your switches need to run STP or be layer 3 switches running OSPF or another routing protocol; don't forget to plug the load balancers into different switches); the power (Servers, switches, and routers on separate UPSes such that losing one will leave a fully functioning path); and depending on how far you want to take this, the data center itself (in case of fire/meteor/EPO mishaps).
Note that all of this is required even for your Windows solution. Are you sure you don't want option 1? :)
If you're looking for a lightweight open source load balancer with a lot of features, go for HAProxy. In my company we work with F5 BIG-IPs, Alteon, and Cisco CSS, which are the leading load balancers in the industry; they are really expensive and, depending on the licence you buy, you won't have all the features (HTTP level load balancing, cookie insertion/rewriting). We first used HAProxy for a POC and now we're installing it in production environments; it works like a charm on a Linux box (Debian and RHEL5) with around 600 users.
There is no way to be fully redundant unless you have independent power sources, which usually requires your backup systems to be geographically separated. In my experience, loss of power is the single most common reason for a system failure in a well designed system (after human error that is).
Well, yes, that is how Microsoft makes its money: by releasing versions of complex technology that seem easy compared to the archaic legacy technology. Key word there is "seem", of course; when the chips are really down you will find out if (a) the Microsoft system was as good as, or even the equivalent of, the "archaic" version (b) your deep understanding of the problem you are facing, and ability to fix it, has been improved or disimproved by having the complexity hidden from you by a friendly interface.
YMMV. Obviously Microsoft shifts a lot of kit.
sPh
By the way, I would look at Contegix, Connectria, or a similar hosted services provider serving small and medium sized businesses. If you are unfamiliar with the technology, hand it over to someone who knows it and whose price is reasonable.
At that price point the real question is a basic one, do you want to build a cluster? If yes, I wouldn't build that exact setup but probably go with Athlon X2 5050e CPUs. You can also get used 1U dual cpu servers on ebay and sites like geeks.com almost all day long for $100-150 each. They did have a bunch on this page: http://www.geeks.com/products_sc.asp?Cat=821 but are currently sold out of the dirt cheap stuff. The downside of the pre-built older stuff is they'll cost more in electricity to run. Now, if you answered "No, I don't really just want to build a cluster for fun." then your best bet will be to just build an i7 based machine. With the cluster you'd be able to afford max 6 nodes with 2 cores each that will be individually slower than the i7's cores. With the i7 you'd only have 8 (logical) cores but they'd be faster and overall draw less power (cheaper to operate) than the 12 core cluster. If the application you're working with can truly be threaded easily enough to take advantage of an 8-12 cpu cluster you should look into porting it to run on a GPGPU. And that's if there's not already code to do it. A lot of scientific functions are already available written in CUDA. You can get a ton of performance out of a $200 video card if the application can be parallelized.
Hi, for up to 10000 users per day one Windows server can easily handle the load. If you need higher availability then you can use the Windows Network Load Balancing service, which is available in the Standard edition of Windows. You still have to replicate all your data manually, but since each server has a local copy of pages and data, even when you patch your Windows server (once a month on Patch Tuesday) or just reboot, the second node will take over the shared IP address and your visitors will see minimal disruption of service.
The only problems you will have to deal with will be user uploads and database sync if you want each of your servers to have a local copy. Otherwise you can also use a third server if you need database service, but that server would not be redundant. The only way to make an MS SQL server redundant would be with the clustering service that comes with Windows Enterprise and SQL 2005 Standard, but watch out for the licensing costs. Ah, and you also need a SAN for your database storage.
So in essence:
* 2 web servers with Windows Network Load Balancing = cheap
* 2 MS SQL servers with cluster service = very expensive
My recommendation:
* Buy decent hardware with good support (any of the big three: IBM, Dell, HP), because when hardware fails you need that motherboard, power supply, HDD or memory ASAP.
* Use RAID 1 or RAID 5 for ALL storage; you want high availability after all. I prefer hot-plug drives: you don't want downtime because you swap a HDD, and HDDs are like consumables these days.
* Use Windows Network Load Balancing if you can afford it to maintain web site availability. Learn Linux if you want cheaper licensing.
* Consider all the costs associated with database clustering; it can easily run you into a $100,000 solution for MS SQL.
One of my clients recently had 100,000 unique visitors in an hour, on a single web server and a single database server.
You should be fine with decent shared hosting.
Seriously, if this is a non-profit then fiduciary responsibility is probably very important to them. I'm sure they are excited to have someone like you help them, but don't use them to "play" enterprise admin. The numbers you have presented are minuscule and I doubt your data is so critical that it requires absolute 24x7 uptime. The amount you would cost them for 1 server would pay for web hosting for several years at a provider, as well as greatly reduce the amount of administration.
If you want to be a sysadmin then remember the most important tenet. Always do right by the customer.
If you have to pay for power and/or have to deal with the environmental aspects of living/working near the multiple machines (heat, noise, etc.), then I would also suggest a single box.
OpenBSD of course. It was just discussed on undeadly.org how they're in the process of changing some of the relevant code to even better improve things.
As it is now, you can easily do exactly what you need with OpenBSD and CARP (and some other related tools in the base system) - for Free and Securely!
If high availability is your concern, then you need redundancy from end-to-end, not just in the servers. A cost-effective way to do that is use Stonesoft's firewall/VPN solution. It can load balance DSL, cable modem and other Internet connections, clusters the devices themselves, and perform back end server load balancing of your Web servers. The centralized management is very powerful as well. 30 day evaluations available off their Web site.
[full disclosure: I own no monkeys, but I do work for Stonesoft]
I seek the passion that creates no suffering.
Measure the memory cost of your web application. Suppose that it's PHP and a session takes 35MB, then you need 35MB for the duration of servicing the request. With 1000 visitors a day, if they all visit during lunch hour, and they are each looking at 10 pages, you'll have about 2.7 requests per second on average.
This means that on average you'll need another (35MB + database overhead + Apache overhead) x 2.7 memory per second. If page generation lasts an astoundingly long 2 seconds, you'd have about 6 sessions stacked up before you recovered the memory used by the first session in the queue. Assuming that you need 10MB for Apache + database, you'd need all of 270MB + OS footprint to run your server.
I think we can safely say that 16GB is overkill under these circumstances.
Of course if it's lunch hour, your peak (which is the important thing) would be higher: maybe 50% people would hit in the first 15 minutes of the hour. You need to do capacity planning which is appropriate for the load and the technology you are using.
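The arithmetic above can be sketched in a few lines of Python. This is only a rough estimate under the poster's numbers (35MB per session, 10MB of Apache + database overhead per request, 2 seconds per page); the function names are made up for illustration, and the peak case assumes half the visitors arrive in the first 15 minutes.

```python
import math


def average_request_rate(visitors, pages_per_visitor, window_seconds):
    """Requests per second if the load is spread evenly over the window."""
    return visitors * pages_per_visitor / window_seconds


def requests_in_flight(rate_per_second, seconds_per_page):
    """Requests being served at once (rate x service time, rounded up)."""
    return math.ceil(rate_per_second * seconds_per_page)


def memory_needed_mb(in_flight, session_mb, overhead_mb):
    """Memory held by in-flight requests, excluding the OS footprint."""
    return in_flight * (session_mb + overhead_mb)


# Whole lunch hour: 1000 visitors x 10 pages over 3600 seconds.
avg_rate = average_request_rate(1000, 10, 3600)
avg_in_flight = requests_in_flight(avg_rate, 2.0)
avg_memory = memory_needed_mb(avg_in_flight, 35, 10)

# Assumed peak: 50% of the visitors in the first 15 minutes.
peak_rate = average_request_rate(500, 10, 15 * 60)
peak_in_flight = requests_in_flight(peak_rate, 2.0)
peak_memory = memory_needed_mb(peak_in_flight, 35, 10)

print(f"average: {avg_rate:.2f} req/s, {avg_in_flight} in flight, {avg_memory} MB")
print(f"peak:    {peak_rate:.2f} req/s, {peak_in_flight} in flight, {peak_memory} MB")
```

Even the assumed peak only doubles the average figure, which is still a small fraction of 16GB.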
By contrast: one of my sites had 15 minutes of fame, and had 20,000 page views across about three hours. It was running as static content, from a Xen instance, with 1GB of memory, and about 25% of processor time on a dual processor 1GHz system. There wasn't even a hiccup in dealing with the load.
I do realise that clustering has its uses, but the truth is that most clustering and HA solutions were merely marketing tricks to sell consulting and expensive hardware to gullible IT managers with an overblown sense of self importance. The more money you spend the more important you are. Right?
How else are you going to justify your huge salary.
It's fairly trivial to install RedHat/CentOS based clusters, especially for web serving purposes.
There are a few components involved:
1) A heartbeat to let each node know if the other goes out.
2) Some form of shared storage if you need to write to the filesystem.
3) Some method of bringing up services when it fails over.
A web server with a backend database is one of the canonical examples. You'd install the heartbeat service on both nodes. Next, install DRBD (distributed replicated block device). Finally, configure the services to be brought up on failover. The whole process takes about an hour following instructions on places like HOWTOFORGE.
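To make the three components concrete, here is a toy Python sketch of the decision the standby node makes. It is not the Heartbeat or DRBD API: the class, the 10-second threshold, and the action strings are all hypothetical, and a real setup would use those tools' own configuration.

```python
class HeartbeatMonitor:
    """Tracks when the peer node was last heard from (hypothetical helper)."""

    def __init__(self, dead_after_seconds):
        self.dead_after_seconds = dead_after_seconds
        self.last_beat = None

    def record_beat(self, now):
        """Call whenever a heartbeat message arrives from the peer."""
        self.last_beat = now

    def peer_is_dead(self, now):
        """The peer counts as dead once it has been silent for too long."""
        if self.last_beat is None:
            return False  # never heard from it yet, so do not fail over blindly
        return now - self.last_beat > self.dead_after_seconds


def failover_plan(monitor, now, services):
    """List the steps the standby would take, or nothing if the peer is alive."""
    if not monitor.peer_is_dead(now):
        return []
    return ["promote replicated storage"] + ["start " + name for name in services]


monitor = HeartbeatMonitor(dead_after_seconds=10)
monitor.record_beat(now=100)
plan_while_alive = failover_plan(monitor, now=105, services=["mysql", "httpd"])
plan_after_silence = failover_plan(monitor, now=120, services=["mysql", "httpd"])
print(plan_while_alive)
print(plan_after_silence)
```

The timestamps are passed in rather than read from a clock so the logic can be checked without waiting for real timeouts.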
But 1000 visitors a day is not much. It's small enough that you could consider virtualizing the nodes and just using virtualization failover.
There are way too many questions that need to be answered before a competent technical architect can help design the "just right" solution for you.
Most of the people here are experts on some small part of the solution and will spout "all you need is X" - and that's fine for free. I've worked on telecom (can never go down) systems for over 10 years as a technical architect, leading project teams of 1 to over 300 software developers and 20 others on the hardware side.
On the surface, FTP and web pages don't sound like the best solution to the problem as stated. Did you just learn HTML and want to use it?
Now, here's my $0.02 on your problem: /. may still use pound for load balancing, so you know it scales.
* 1,000 visitors a day can be run from my cell phone. That's "nothing" traffic for a network or an old desktop.
* Avoid clustering at the OS or application level unless you really, really need it. You probably don't. Almost nobody needs clustering.
* Use network load balancing. There are many, many solutions for this. The easiest is from F5 (buy through Dell), but free versions work fine too - I've been using `pound` for years myself.
* Backups are key. RAID is not backups. Verify that you can actually **recover** from bare metal using your backups. Don't pull a Ma.gnolia http://blog.wired.com/business/2009/01/magnolia-suffer.html
* Disaster Recovery is important. Often, you can solve both backup and recovery and DR at the same time.
If you are a non-profit doing something I believe in, I'll do network, systems, B&R, and DR designs and consult with you for free: an enterprise-class solution. My company looks at FOSS solutions first, before recommending commercial, costly solutions. All our internal systems are FOSS, though we do have a lab with Microsoft servers since that's what many customers demand/need.
Think of a good TA just like a CPA or lawyer. You pay us to prevent all the problems that could happen later and cost you huge amounts of money. After my CPA does my taxes, I sleep better at night.
Your best solution:
An ordinary PC with Centos (or equiv.) loaded.
You will have at the end of the day:
1) A perfectly good solution for your application.
2) Learned that Linux is not hard to learn and that the Linux community supports you better than M$.
3) Your pride will be intact. More money for your non-profit, less for Steve Ballmer.
With 1000 users if you want SQL Server you need to purchase a processor license: 5k$/CPU for Standard Edition, 25k$/CPU for Enterprise. (You only license physical CPU, not cores or hyperthreading). Add the Windows license (6k$). And you have no hardware yet.
The "good news" is that with failover clustering (which is all you need cause 1000 users does not require load-balancing), Microsoft requires licenses only for the active node. And the failover node can be cheaper hardware, as it will run only under abnormal situations and can offer a lesser performance (management is usually ok with that).
If you go with Linux + Postgres or MySQL, you pay no licenses. Those products are a bit less user-friendly, but they give you more control over your setup. Use database clustering and/or replication, and use either one of the many free load-balancing software or pay for a very good one (like Zeus).
Based on my experience, I would say: for a small intranet, use Microsoft (Windows, SQL Server, Sharepoint) because you can leverage on MS-Office and powerful groupware tools (project management, BI, reporting) and actually provide value to your end-users. But for a large intranet or for public-facing sites, where you don't control the end-users platform, use Linux, it's worth the learning curve.
lucm, indeed.
A few years ago we were facing the problem of the need to host/maintain a Java webservice. We started to look into common Java containers like Tomcat, JBoss and naturally Glassfish. The only problem we saw was that the application server had to function as a backend and thus we would need the webserver to relay requests.
Eventually we stumbled upon the Java System webserver 7 and that turned out to be much more than merely a webserver with a nice administrative interface. If you're used to administrating Apache servers then it can be a bit tricky to get used to this since the server fully uses XML for its configuration files (that is, if you chose not to use the admin. interface). At first we focussed fully on the Java container, but eventually started to discover that you could do a whole lot more with this critter.
Personally I think it really excels at clustering. If you made changes on one node then one command (or 2 clicks of the mouse) is enough to distribute those changes all over the cluster. Next to that it has excellent (online) documentation and is free for use just like Apache is. Oh, and before I forget... While it is aimed at Java usage it's also perfectly capable of supporting other languages like PHP, either by using a PHP add-on or by simply setting up PHP as some sort of "back end" (using FastCGI, for example).
Considering the price and the ease of use (setup a cluster in approx. six steps) I think this might be just what you want. And its onboard extensive statistics engine will allow you to clearly see for yourself if the load on your park is getting too high or not.
And yes, I agree with most other reactions that your load really doesn't need clustering. I'll add a little more to that: the service I mentioned above is currently still running on a single Webserver 7 instance and easily deals with more than that amount. We did tune the Java container to suit our needs, but apart from that even an app server should be capable of handling this load. Having said that, I think you might find this webserver very useful nonetheless. The administrative interface especially might save you guys a lot of tweaking.
Although Citrix XenServer is based on Linux, it has a Windows interface for management, which makes most tasks easy.
Linux over complicated...ha ha
I will sell him a system fully capable of handling ten times that traffic with hot standby failover for 50 bucks a month with ds3 bandwidth available to it.
Got Code?
HAproxy (which is the one I use) has the ability to define "backup" servers which can be used in the event of a complete failure of all servers in the pool, even if there is only one server in the main pool. If you're trying to do this on the cheap, that may help. It also has embedded builds for things like the NSLU2, so it may be easy to run on an embedded device you already have.
"He may look like an idiot, and talk like an idiot, but don't let that fool you. He really is an idiot." - Duck Soup
CARP is a protocol that does automatic load balancing and IP failover.
Install your application on 2 (or more) servers, give them the same virtual IP address using CARP, et voila. Nothing more to buy, and no need to install any load balancer.
CARP's reference implementation is on OpenBSD, and it's shipped by default. DragonflyBSD, NetBSD and FreeBSD ship with an older version.
Windows clustering allows for Active/Active clusters, so you CAN run the same service on two cluster nodes at the same time (with the exception of Exchange).
Setting up two servers to host VMWare guests and copying is not a good idea either - the HA tools for VMWare are expensive, and totally unnecessary for the proposed deployment. Without these HA tools, he would have to down his primary guest every time he wanted to make a snapshot.
We're talking about a very simple deployment here - HTTP and FTP. You don't even need clustering or a dedicated load balancer - instead, try using round-robin DNS records to do some simple load balancing, and then use a shared storage area as your FTP root (could be a DFS share for Windows or an NFS mount in Linux). This would give you a solid two-server solution that works well for what you're trying to accomplish, and adding servers would be trivial (just deploy more nodes, and add DNS records to the list).
If it grows much larger than 2 nodes, you might consider an inexpensive load-balancer; Barracuda sells one that works well and will detect a downed node.
Clustering for this job is totally unnecessary though. You're wasting your time by looking into it.
http://www.caoslinux.org/features.html
"The NSA-1.0 release identifies the stabilization and validation of the core operating system, fully tested on some of the world's fastest public and private systems and architectures. And now with NSA 1.0.8 you get bleeding-edge security updates, the new 2.6.28 kernel, updated packages such as OFED 1.4 and gcc-4.3.3, a streamlined Sidekick system configuration toolkit (making the installation of Caos Linux and Perceus even faster and easier), the latest Perceus 1.5 cluster management software, and Abstractual, Infiscale's cloud virtualization solution. All of these updates are already integrated in the NSA-1.0.8 ISO release of Caos Linux"
My favorite (the name seals the deal for me) is http://www.ultramonkey.org/
It's probably more complicated and overkill for what the poster needs, but it worked great for us. We used this years ago for transaction processing (~100,000 transactions an hour, not too busy) on a couple old HP NetServers with 1GB RAM each.
At night I drink myself to sleep and pretend I don't care that you're not here with me
Use Google. Why spend all that money buying up equipment for a non-profit that could be spent on your REAL mission.
Do it in Google sites and dump the data center. I even think google offers google apps for free to non-profits.
Everybody keeps saying that 1000 unique visitors is peanuts and starts talking about Apache, etc. The OP mentions FTP as well, and didn't say if those 1000 users will all be regularly FTP'ing megabyte files or if they will be almost exclusively using HTTP with the occasional FTP download. If the former is the case, without analyzing it too much, it seems like this would be too much traffic for a single server to handle, no?
Guns don't kill people; Physics kills people! - John Lithgow as Dick Solomon on Third Rock From The Sun
It's a good thing you didn't link to your existing site if you're worried about 1000 visitors a day...
Check out my sysadmin blog!
save your money. if you really need time on a cluster you should apply for a grant from the ohio supercomputer center, ncsa, or investigate the offerings from your own institution. applying for a grant may sound daunting, but it really involves little more than filling out a form and asking for some cluster time. good luck!
Don't forget Squid, which can also act as a proper reverse proxy/cache, and is the precursor to putting your content on a CDN.
(Ie, once you solve the issues surrounding putting Squid in front of your website, you've mostly solved the CDN related issues.)
(Or, you could try my Squid-2 fork called Lusca, but that's for a different story, and post. :)
I agree that 1000 is not a whole lot. One option for scalability would be slicehost.com. They allow you to quickly ramp up the amount of server power that you have with literally a click of the button.
I have been using them for a while and never really had a problem. You can get multiple 'slices' together to do more load balancing for what ever you need.
Another option for you could be Amazon's EC2 with S3. Infinite scalability.
I've set up Apache and mod_proxy_balancer for just this purpose. The sites don't have enough traffic for me to justify buying an F5 or Cisco CSS load balancer, so I use proxy balancer with a bunch of vhosts, it works great.
Add Keepalived and you can have redundant (though not stateful failover) load balancers on the super cheap.
For SSL it still works well. Give it a look, took all of an afternoon to set up a failover pair of servers. I don't know yet how much traffic it will take, but a single CPU 800Mhz server is doing a couple Mb/sec with no sweat for me.
I like music
you may have heard "good fast cheap... pick 2" this is similar.
if your content is dynamic, you have more to worry about. DB servers, storage other application specific issues...
if your content is static or close to it, round robin DNS is plenty. rsync between 2 boxes, and set up the round robin. How far away the boxes are determines how long they take to sync and how much of a safety net it really is. next to each other in the same rack protects from HW fault. different datacenters protects from power and networking. Different states protects from natural disasters.
if you don't know how much to spend then....
you need to figure out how much money (unexpected) downtime costs you. Then you can figure out how much to spend to reduce your chance of that downtime. If it's going to cost you $1000 per hour, and you expect that with 2 boxes in the same datacenter there might be a 1% chance of failure of both systems that would require 24h to come back up, then your total loss would be $24K, and 1% of that is $240 to spend on mitigation. in that case, put your systems in different datacenters. If you're going to lose $100,000/hour, then spend 100x on mitigation.
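That back-of-the-envelope rule is just an expected-loss calculation. A minimal Python sketch, using only the figures from the post (the function name is invented for illustration):

```python
def mitigation_budget(cost_per_hour, outage_hours, failure_probability):
    """Expected loss from an outage: what it would cost times how likely it is."""
    total_loss = cost_per_hour * outage_hours
    return total_loss * failure_probability


# $1000/hour, 24 hours to recover, 1% chance of losing both boxes.
small_site = mitigation_budget(1000, 24, 0.01)

# Same odds and recovery time, but $100,000/hour.
large_site = mitigation_budget(100_000, 24, 0.01)

print(f"small site: about ${small_site:,.0f} worth spending on mitigation")
print(f"large site: about ${large_site:,.0f} worth spending on mitigation")
```

The hard part in practice is the inputs: the failure probability and the hourly cost are guesses, so treat the result as an order of magnitude rather than a budget line.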
"We are not tolerant people. We prefer drastically effective solutions"
Heartbeat + HAProxy + nginx. We're using a combination of these to replace our aging BIG-IP setup. HAProxy does the actual HTTP load-balancing, whereas nginx is serving up all the static media (pics, etc).
If you haven't looked at the F5 product line you should. The ability to use the TCL language to write "iRules" and the sheer performance of even the smallest device are amazing. The devcentral.f5.com site is also great and allows you to gain from others' experience. With an F5 in front, the rest of the systems behind can be simple and cookie cutter with no complex setup. The F5 will handle persistence and load-balancing, and once you have your setup you can forget them for the most part.
For the FTP server part, you just need some Linux boxes running your favorite daemon and a shared storage for the files.
--russ
We use OpenBSD with CARP and pfsync and relayd(8). It works a treat load balancing our web and jabber servers. I highly recommend it and the documentation that comes with OpenBSD is second to none. It's also an extremely secure OS for firewalls and routers.
There is a common understanding that a single server can serve static data at many orders of magnitude beyond the scenario described.
But for dynamic content that triggers database queries, memcached is a must.
Unless your application is very resource intensive (or badly written) a single server can cope easily with 1000 visitors. So add another server or two for redundancy.
Use RAID1 (RAID10 if you need better disk performance), and get backups. If you're on a tight budget you could use hotplug SATA drives for backups (if you don't have a habit of dropping your backup media on the floor, HDDs can be better than tapes). If you're on a really tight budget use those USB to PATA/SATA adapters
I suspect you will find that decently specced server hardware will typically be more _available_ than your ISP.
Even if you do the BGP multi-home stuff (e.g. links to two ISPs), if one of your ISP links goes down it can sometimes take several minutes for half of the world to figure out the new route to your servers.
I guess a relevant question is what is the impact of downtime? For example, if slashdot is down, productivity could actually go up
If you need very low downtime and want to cater for extreme circumstances (available in event of hurricanes, earthquakes), it can get really expensive - because it means you need at least two physically separate sites far enough from each other.
I work for a large e-commerce site as a network engineer, and we get thousands of connections a second. To accommodate that load we use F5 Networks BIG-IP load balancers. GTMs (Global Traffic Managers) allow you to do geographic load balancing across multiple datacenters; LTMs (Local Traffic Managers) are the devices you actually need to purchase. All you do is add the IP addresses of each server to the BIG-IP as nodes, add the nodes to a pool, and add the pool to a virtual server. The VIP (virtual server IP) is the one used in DNS, or as part of a wide IP system if you are using GTMs.
F5 has the biggest market share in load balancers and their products are top notch. If you like, they can come in and demo them for you. I see a lot of replies to your question, but nothing I have seen so far uses a product that a company would actually accept. Open source is great, but when you need 24x7x365 uptime, you need high availability in the form of stateful clustering.
Grab a crappy old athlon tbird box with a gig of ram and set it up as a router/firewall running *LVS (Linux Virtual Server) to forward web requests to your back end web server. You can start out with one web server and gauge the load. If you want to scale the system, add more backend web servers and configure LVS with the new backend ip addresses.
For redundancy on the athlon router, trunk a couple nics for network, and boot from cdrom (knoppix) if you are worried about system disk failure. You could also buy a 3ware 2 lane raid card for a couple bills and sata raid a couple hard disks if cdrom boot doesn't work for you. It's cheaper to keep a couple cdrom drives on hand, and spare knoppix cds, than setup a bootable hard drive raid system.
Figure out if you want a shared filesystem for the web servers, or just rsync the important stuff between them for starters. Software raid on another crappy athlon box will work well for backend storage in the beginning. If you have high disk load, you may need to upgrade the fileserver if transfer rates exceed bus bandwidth. The point is, you are non-profit and running on a shoestring budget. start out cheap and dirty. Spend money on hardware later when you find out where your bottlenecks develop.
If you lose a backend webserver, LVS can be configured to handle it in different ways.
[*] - http://www.linuxvirtualserver.org/
boycott slashdot February 10th - 17th check out: altSlashdot.org
"Proto Balance" is value for money (although not free) and runs on Windows as well as Unix. www.protonet.co.za
OpenBSD PF or hoststated seem like the simplest solutions to me.
Stop Computers/Cars Analogies on S
Given that the original poster wrote about "reading" about the technologies involved, it seems that he has little or no experience with this kind of setup, so finding an ISP to host the site may be a viable solution.
As others have pointed out, it is not high volume so almost any ISP should be able to handle the work.
This will allow the poster to work on the systems/programs to present the data and not worry about the infrastructure.
as a noncomputer specialist /. reader, this whole conversation sounds really weird.
Why can't i just call up a bunch of guys in the yellow pages, or whatever passes for yellow pages, and say, I got a 1000 users a day, yadayada, gimme a quote.
all this arcane stuff - you have to know this program, that program, why should some small nonprofit even have to think about it
to put this in perspective, it is as if the original poster was the maintenance guy, and he was asking for what type of capacitor to install in the new electric motor controllers in the hvac system.
no small or even large nonprofit would even think about it - it would just be part of the hvac vendor's bill
I think the answer is that server type stuff is deliberately kept opaque and complicated so sysadmins have jobs - after all, if i could just get a quote on it, most of the people who have posted might not have paychecks, right ?
You really have to ask yourself if this is core to the business. If not, move it to a hosting provider that handles these issues for you. Do a quick cost breakdown based on your time, materials, etc. and then take a look at a hosting provider that may give you a discount or three because of your non-profit status.
As a lot of previous posters have mentioned, 1000+ unique visitors, even with each generating heavy traffic, is still pretty low. 100,000+ would warrant load balancing... or, if you are serving up media of some sort, I'd recommend a media server to offload that portion of the traffic.
What you can do is performance tune your webserver and database server. BTW if you don't have your DB on a different server, that could be a first step.
So performance tuning... here are a few good articles on the MediaTemple site which deal with Apache and MySQL:
http://kb.mediatemple.net/questions/246/(dv)+HOWTO:+Basic+Apache+performance+tuning+(httpd)
http://kb.mediatemple.net/questions/258/(dv)+HOWTO%3A+Basic+MySQL+performance+tuning+(MySQLd)
Yes these are written for their customers but they apply to any server running Apache and/or MySQL.
This next article I'm posting as an example only:
http://kb.mediatemple.net/questions/770/(dv)+HOWTO%3A+Misc.+performance+tuning
The idea with this is to turn off services you're not using. It mentions specific services known to be running on MTs DV servers... DNS, SpamAssassin, etc. YMMV but the idea is sound.
If you need a variety of services to work, consider running multiple servers dedicated to individual tasks. This will also help when it's time to troubleshoot, upgrade, etc.
I suggest using Virtual Machines for everything, especially since there are available VM Images for just about any base configuration you can think of (regardless of vendor) and it makes backups, swapping out upgrades, etc. very efficient. The process is such: copy existing VM image to a new machine (or new container) upgrade everything, test, test, test, then swap the new image out for the old when you're ready. Voila. If you've used a different machine for your DB for instance, you can upgrade your webserver machine w/ all scripts, etc. with no downtime - and have a backup sitting there seconds away from re-deployment if anything goes wrong.
A fool throws a stone into a well and a thousand sages can not remove it.
It seems to me that, unless you have very special needs, you should hire someone else to do the work of providing web hosting. It's much cheaper to have a dedicated team do the work for thousands of servers than have your own team.
I've looked at A2 Hosting. I've never used them, and don't know anyone connected with them, but they seem like they know what they are doing.
I wouldn't recommend my present web host.
Does anyone else have recommendations about web hosting?
Your needs for 1000+ uniques are minimal. If I were to do it, I'd get a shared hosting account someplace and move on. Shared hosting can handle *way* more than that.
But if high availability (limited downtime) is part of the requirements, I'd say go out and buy an F5 BigIP. You plug your internet in the front, your machines in the built-in switch, configure your domain names in it using the web interface, and you're done. Set it to do service-checks, and it'll automatically pull out of the pool any machine that fails or that you take down for maintenance. So you get full up-time so long as your power and network don't fail.
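What the service-checks do can be sketched in a few lines of Python. This is a toy model, not how the BigIP (or HAProxy) is implemented or configured: the `Pool` class, the server names, and the check function are all made up for illustration.

```python
class Pool:
    """Round-robin pool that only hands out servers that passed the last check."""

    def __init__(self, servers, check):
        self.servers = list(servers)
        self.check = check            # callable: server name -> True if healthy
        self.healthy = list(self.servers)
        self._next = 0

    def run_health_checks(self):
        """Re-test every server; failed ones drop out, recovered ones return."""
        self.healthy = [s for s in self.servers if self.check(s)]

    def pick(self):
        """Return the next healthy server in round-robin order."""
        if not self.healthy:
            raise RuntimeError("no healthy servers in the pool")
        server = self.healthy[self._next % len(self.healthy)]
        self._next += 1
        return server


# Pretend web2 has been taken down for maintenance.
down = {"web2"}
pool = Pool(["web1", "web2", "web3"], check=lambda name: name not in down)
pool.run_health_checks()
picks = [pool.pick() for _ in range(4)]
print(picks)

# Once web2 is back, the next check returns it to rotation.
down.clear()
pool.run_health_checks()
print(pool.healthy)
```

A real balancer would run the checks on a timer and probe an actual port or URL; here the check is a plain function so the behaviour is easy to follow.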
Yes, you can get the same functionality using Linux HAProxy. But you sort of need to understand what you're doing. Reading the way your question is asked, I suspect you're learning this, and do you really want to make the mistakes on a real live project? Just go with the appliance until you have a solid understanding of what you're doing. Shoot -- I have a good solid understanding from years of experience, and I still use the BigIP when I have a budget (and HAProxy when I don't). It's just easier, and I can move on to more interesting problems with my time.
Once you've got this setup, set up a cron job to rdist the site to all the machines so that all your data is always on each machine. If you've got a database, you have some choices. For completely static data, I like to have it replicated to each machine, and have each web server just query localhost. If it's dynamic, have a replicated pair. At your levels, that can exist on the web servers.
I really dislike the cross-mounted disk architecture of traditional cluster solutions, because there are too many shared components. Each of those multiplies your possible points of failure for your whole setup. Better to keep everything completely separated, so if one component fails, that whole machine just drops out and the site keeps working because of the load balancer and because each machine can operate by itself.
Ten thousand visitors *per hour* may be the level to start thinking about a second machine.
Excuse me, but please get off my Pennisetum Clandestinum, eh!
Everyone is jumping on this guy for not knowing what he is doing. We have no idea what the work of each query is. Heck, I was involved with a web server where each query would result in a 60-second fluid dynamics computation. If it had 1000 unique visitors a day, Poisson arrivals would result in the server being overloaded at some point.
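A back-of-the-envelope check of that claim, as a small Python sketch. The assumptions are mine: one query per visitor, arrivals spread evenly over 24 hours, and a single worker that runs one 60-second job at a time.

```python
import math

visitors_per_day = 1000        # figure from the original question
service_seconds = 60           # one computation per query (assumed)

arrival_rate = visitors_per_day / 86400          # requests per second
utilization = arrival_rate * service_seconds     # fraction of time busy

# With Poisson arrivals, the chance that at least one more request
# shows up while a given 60-second job is still running:
p_overlap = 1 - math.exp(-arrival_rate * service_seconds)

print(f"utilization: {utilization:.2f}")
print(f"P(another arrival during a job): {p_overlap:.2f}")
```

Under those assumptions the box is busy about 69% of the time and roughly every second job has another request queue up behind it, so the traffic number alone says little without knowing the cost per request.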
Out of curiosity I just tried out my method with Konqueror 3.5 and Firefox 3, and it works! I defined a DNS name resolving to 2 A records and simulated HTTP timeouts on my server with iptables. Konqueror will re-request the IP until it finds one that doesn't time out; in fact it always re-requests the IP because, contrary to what I thought, it doesn't implement DNS pinning. Firefox is even more efficient: if the 1st IP times out, it will try to connect to the 2nd (and 3rd, etc.) IPs returned in the DNS response without even having to send another DNS request (because it does implement DNS pinning). Obviously both browsers take some time to detect the timeout, so it is preferable to keep only up-and-running server IPs in the zone file, but I am happy to discover that such a basic HA technique works :)
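The Firefox-style behaviour (resolve once, then walk the returned addresses until one answers) can be sketched in a few lines of Python. This is only an illustration of the idea, not what either browser actually runs; the function names and the injectable `connect` parameter are my own.

```python
import socket


def resolve_all(hostname, port):
    """Return every distinct IP address the resolver hands back for hostname."""
    infos = socket.getaddrinfo(hostname, port, type=socket.SOCK_STREAM)
    seen = []
    for info in infos:
        ip = info[4][0]
        if ip not in seen:
            seen.append(ip)
    return seen


def connect_first_working(addresses, port, timeout=3.0,
                          connect=socket.create_connection):
    """Try each address in order; return (ip, connection) for the first success."""
    last_error = None
    for ip in addresses:
        try:
            return ip, connect((ip, port), timeout)
        except OSError as exc:   # covers timeouts and refused connections
            last_error = exc
    raise ConnectionError(f"no address answered: {last_error}")
```

Typical use would be `connect_first_working(resolve_all("www.example.org", 80), 80)`, where the hostname is a placeholder. The timeout is what the user ends up waiting for on each dead address.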
The Windows Cluster Service is for application clustering. Applications have to be "cluster-aware", which means they have to be specifically written to take advantage of the clustering. For doing what you are describing with out-of-the-box Windows features, you should use Windows Load Balancing, often referred to as Windows NLB. NLB is included with both Windows Standard and Enterprise, but the Cluster Service requires Enterprise.
You are describing a very basic scenario. Windows NLB will work fine. Just read up on the plentiful guidance for NLB. You can easily load balance tcp 21, 80, and 443 (FTP, HTTP, HTTPS) with NLB. I recommend using unicast mode in NLB and configuring your network switch ports to "portfast" mode to ensure the quickest convergence times.
I use Kemp Technologies load balancers (http://www.kemptechnologies.com). I paid about $5k for an active/passive clustered pair of LM-1500s. Understand, they aren't F5s, but if you need basic load balancing or failover these guys are great, and for a fraction of the cost. They support round robin, weighted round robin, least connection, weighted least connection, adaptive, and fixed weighted load balancing, and L4/L7 sticky sessions.
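If the scheduling names are unfamiliar, two of them are easy to sketch in Python. This is a naive illustration and says nothing about how Kemp implements them; the function names and the dictionaries of weights and connection counts are hypothetical.

```python
import itertools


def weighted_round_robin(weights):
    """Cycle through server names, each appearing in proportion to its weight."""
    expanded = [name for name, weight in weights.items() for _ in range(weight)]
    return itertools.cycle(expanded)


def least_connection(active):
    """Pick the server that currently has the fewest open connections."""
    return min(active, key=active.get)
```

With weights `{"a": 2, "b": 1}` server "a" receives two requests for every one that "b" gets. Real balancers usually interleave the picks more smoothly than this simple expansion does.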
The Kemps support SSL acceleration, though I don't use it for web SSL. I mainly use the SSL acceleration for other protocols like FIX, though I'm sure it handles HTTPS just fine. As I noted, we use the LM-1500s, which are the smallest ones they have. We have an AJAX platform that streams data and we handle about 350+ requests per second during busy periods, and these guys hardly register any load. They are Linux based and very simple to set up and use.
I was looking at pound and other load balancing options, and I can tell you: those work, but for simplicity and ease of setup, the Kemps are golden. Another place to look is loadbalancer.org. They are Linux based too. They are a little more pricey, but I know people that have used them and like them also.
As for the back end, I'm thinking a cluster isn't what you need. If your website is completely dynamic, you probably just need to replicate your database to a second server and have two web servers handling requests. The only reason you need two web servers is in case one fails. From what I'm seeing, you don't even need all the virtual servers (Xen, etc.). That would just complicate everything. Use a single FTP server and keep your FTP files rsync'ed to the second server in case you need to fail over. The Kemps can do the failover stuff for you. If set up properly, you can use two servers for everything, and if one fails, the Kemps can completely fail over everything without you doing anything.
The requirements to handle 1000 unique visitors/day will depend on what exactly you are serving. I ran a website that got well over 1000 uniques per day on a Pentium MMX 200 Mhz with 64 megs of RAM and a 1.2 gigabyte hard drive. This was significantly overkill for the site. However, that was entirely static content. Oh, except it handled email, spam filtering, and a database for a POS system for a retail establishment with two stores.
If you are serving mostly dynamic content, you'll want more processing power and more RAM. Almost certainly, you'll be fine with a bottom end computer, but you probably want something manufactured in the last five years or so. This will obviously depend on what your dynamic content actually is, though; more complexity will require more processing power.
If you cannot afford any outages, you may want to look at redundant hardware, failover systems, etc. etc., but you first need to determine how much an outage will cost you. What if you have a 5 minute outage? An outage lasting an hour? Eight hours? A day? In any case, before you look at redundant hardware, you'll need a service level agreement from your ISP.
And of course, if you are looking at something to stream 1 gigabyte of traffic to each of these thousand uniques, that's a whole different matter. Now you may want to look at content delivery networks, and possibly multiple servers just to handle the outbound network traffic.
No matter what your requirements, though, you need to look at a good backup solution.
Oceania has always been at war with Eastasia.
I am on the board of an internet-based non-profit (I won't name it for fear of slash-dotting it 8^). We had about 3000 hits per day last year, so we're roughly in your ballpark.
We use a commercial host (1and1, if it matters). Costs us a bit more than $100 per year. Works just fine.
I strongly suggest you start with something like that. It will let you focus on the design and implementation of your web site, and will give you actual stats to use when planning your future.
HA means more than multiple servers. It means multiple internet connections, multiple geographic locations for your servers, and multiple administrators (among other things). Very complex, very expensive. A good commercial host will have very good availability and you allow them to deal with the complexity.
Good luck.
hoststated is called relayd nowadays
http://marc.info/?m=119713600504150
I agree with the cloud solutions, go Amazon unless buying servers makes sense for financial, legal, or control reasons.
If you are buying servers, buy Apple's Xserves. Easiest to use server configuration tools ever. Does everything you are asking for.
If you find the documentation for windows easier then use windows.
Reading about Windows 2003 Clustering makes the whole process sounds easy, while Linux and FreeBSD just seem overly complicated. But is this truly the case?
Having to work for a living is the root of all evil.
For ease of implementation and use I'm a big fan of Barracuda's appliance.
I would suggest getting two full virtual servers from a hosting provider, one for your DB and one for your web/applications server. This can be based on Xen or VMware, but that doesn't matter to you. Just let the hosting provider's high-availability clustering handle things for you. Add load-balancing in only when traffic levels require it - and with the sort of traffic you are discussing, you do not need load-balancing unless your code is really bad or user requests generate some really massive computations (such as some sort of online business intelligence or analytics).
Some bigger providers in this space are here, here, here, and here. Amazon EC2, which others have mentioned, may not be as good a fit for your proposed applications, since their storage model is stateless.
If this were my project - I would purchase a new Dell or HP server. Servers today have all the HA and redundant features built in, and when you have all your eggs in one basket, you need to have a very strong basket.
On this server, I would run VMware 3i Hypervisor (AKA VMware ESX) - this gives you the ability to access all the cores and RAM on the modern server, otherwise you will have CPU cores sitting idle (irrespective of the OS you install).
Run ~4 instances of Microsoft Server 2003 or 2008. Have two of the servers clustered or load balanced for the web services. The other two for AD/Email, whatever else you need.
Joel
Good security is based upon reality and common sense. Common sense is a function of having common knowledge.
Can you explain?
I'm looking for something to handle the virtual IP in case the server goes down. I'm wondering if heartbeat is overkill, and if CARP can handle it in a cleaner way.
The "BEST" way to do this would be an application that is coded for session stickiness (that is, if your site is database-driven, where users log in, or if you have something like a shopping cart). You would build it out with a back-end DB active-passive cluster, then 2 or more active nodes in your HTTP server cluster with an F5 BIG-IP in front. Now this is quite $$$, but you asked what the best way is, so there you have it.
Big drawback is ease of use... The interface was designed by the engineers and really sucks, but it can do a lot and comes in a redundant pair config, so you always have a standby unit to take over if the first one dies.
Keep passing the open windows...
I agree with OpenBSD using PF... then get yourself The Book of PF ( http://oreilly.com/catalog/9781593271657/ ). This book is the most comprehensive PF documentation I've had the chance to read. But like many others have said, 1000 hits per day on a server is nothing to be worried about.
There are a few different solutions to your question, and many of them depend on the web server used.
For Windows, you do not need to use clustering for the webserver: you can use Network Load Balancing. If you have a SQL back-end, then you will need clustering...
To explain:
Clustering is used for programs that can only be run in single instance (Exchange, SQL, etc...)
IIS (Microsoft's Web Server) can be run in multiple instances. Therefore you can use Network Load Balancing (NLB). With NLB, the request will be directed to any available server, if one goes down - the other is available.
NLB can be implemented (badly) with round-robin DNS - which will send the request to each server in sequence.
These same techniques can be used in Linux (see thread), and other Unices. For the Web-Front-end, use some form of NLB. For the back-end, use clustering.
At this point, I should point out that eventually you will have a single point of failure unless you are very careful. (Do you have a redundant SAN? Are all Ethernet paths separate and redundant? How about power to your server space?)
I should also point out that everything above is overly simplified.
First off, I wanted to mention that the 1000 is off by two zeroes. It should read 100,000. Secondly, for those of you who have not provided anything constructive, I give to you the one-fingered salute. It's amazing how much crap and how many assholes exist on this forum. All I wanted was some CONSTRUCTIVE advice and it seems like the majority of what I got was a bunch of bickering and a load of insults. Bleh... But to those who have actually posted some useful comments...I THANK YOU. :) Your insight has been helpful and for the most part, confirmed what I had planned already.
Less-geeky computer repair alternative for Lansing, MI
"I am working with a non-profit that will eventually host a massive online self-help archive and community (using FTP and HTTP services). We are expecting 1,000+ unique visitors / day. [...]"
Others have pointed this out to you, but 1,000 visitors is not much load at all. I work at a large university, and during registration first day of classes, we have 500 unique users (what you call "visitors") in each hour. On the first day of classes, we may get 1,000 unique users per hour as students look up their class schedules, and sign in to the registration system to drop that stupid class they were just in. We run a load balancer at the network level, so that traffic is balanced immediately at the switch, rather than at a host level before being sent to a back-end web host.
But doing the same in your case will be very expensive. If you work at a non-profit, you probably don't have this in your budget.
If you're just doing simple HTTP and FTP (that is, not running a web application with a database back-end, or an application that keeps "state" on the server, requiring users to always go back to the same server they first visited) then you might consider the simplest solution of all: DNS round-robin. Simply put, you enter the IP addresses for two web servers (or FTP servers) for a single www entry in DNS. At the expense of hitting your DNS more frequently, you could set the TTL to 1 hour for the round-robin so that if server #1 went down, you could push an update to DNS so "www" just points to server #2, and users are only inconvenienced for about an hour.
But your best solution is probably just to outsource this, especially if you're only doing simple http and ftp. A good web hosting company already has this infrastructure available to you. No need to re-invent the wheel for just 1,000 users.
link?
Gravity Sucks
It is interesting that most of the posts so far have focused on ensuring that our original poster has sufficient business acumen to make the decision to build a clustered hosting environment. There are reasons other than straight margins why downtime for a website is an absolute no-go. For instance, I work for a medium sized data center. Although we do few direct conversions through the website, the embarrassment of that site going down more than justifies a clustered solution. I will assume that OP has done the math.
That being said, I have had excellent experience with ultramonkey / ldirectord. Ldirectord has a single primary conf file that provides for pseudo custom service requests to check availability. I have found this to be much more intuitive than Windows clustering services; however, if you are planning to have IIS boxes using SSL you may run into trouble load balancing HTTPS traffic.
The problem with 2 boxes and heartbeat only is that oftentimes a box will stop serving websites but will not drop ping. You need a service that is smart enough to realize that a 404 page is not what you are looking for. That being said, custom validation queries can include SQL queries, SMTP, IMAP and POP sessions, HTTP requests that look for specific responses, etc.
This would need to sit on a dedicated firewall in front of at least two identical hosts. Note this introduces a single point of failure - a philosophically sound cluster will have two identical firewalls running heartbeat. Another point of failure will be the switch providing link to these hosts. I would recommend redundant uplinks configured using VRRP to avoid lost availability due to a dead switchport. I can go on, and the scale of a cluster topology is limited only by one's imagination, but I think this is a good start.
Josh Wieder Atlantic.Net
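To make the "a 404 page is not what you are looking for" point concrete, here is a small Python sketch of a content-aware check. It is not ldirectord's code or config; the function names, the URL, and the expected string are placeholders. The idea is that a backend only counts as up if it returns a 200 whose body contains text you know should be there.

```python
import urllib.error
import urllib.request


def looks_healthy(status, body, expected_text):
    """Healthy means a 200 reply that also contains the expected text."""
    return status == 200 and expected_text in body


def check_http(url, expected_text, timeout=5.0):
    """Fetch url and judge it; any network or HTTP error counts as unhealthy."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            body = response.read().decode("utf-8", errors="replace")
            return looks_healthy(response.status, body, expected_text)
    except (urllib.error.URLError, OSError):
        return False
```

A box that still answers ping but serves an error page fails this check, which is exactly the case plain heartbeat misses.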
We've been using pfSense to load balance 40 million HTTP connections to a cluster of servers. It's rock solid, has a nice fluffy GUI to configure with and has FreeBSD underlying with pf as the firewall. It's a pretty kickass toy.
Get some real numbers. If it's like the non-profit I worked with, it's around a thousand visitors a month. They were confusing hits with visitors. They also requested forums several times, but every time I got it up and running for them, they wanted it turned off because they were scared of them being empty, and not inclined to post/admin in the forums themselves.
Best approach is to start small and grow it as you need. Get the basic site up first, then add the forums, then the archive - see what they really do and don't need.
I'm out of my mind right now, but feel free to leave a message.....
If you need a more powerful cluster. (Well, 1000 users isn't that much still, but anyway).
- RHCS (Redhat Cluster Suite)
- SAN (iSCSI, cheapo MD3000i + GFS) - needed for RHCS.
- LVS (IP Load Balancer)
- KVM (Virtualization)
Of course you will need a bare minimum of 2 servers but 3 for best results, and this setup can be done using active/active. (Instead of looking at your "simple" solution by Microsoft Windows 2003 which is basically active/passive).
This is rather more complex than the other options above, but i've done it and it seems more robust, (plus using nginx for caching static content if necessary).
Unless your application(s) are horrifically written, you won't need a lot of hardware to pump that out. If you are really worried about high availability (and for those sorts of traffic numbers, I don't know if you really should), then make arrangements for a hot spare and plan on manually flipping the switch if the primary machine fails.
Evolution: love it or leave it
ok... maybe I'm missing something, but what's wrong with using httpd2 for proxying/load balancing ? It seems to provide both ?
lukasz
Had a client that had some outsourced rack space. They had spent some time ensuring relatively high availability for their cluster. They chose a hosting provider that provided redundant network connections, UPS, etc. This is what happened:
- There was a fire in the power conduits under the street, taking down a big piece of the electrical grid
- UPS kicked in
- Servers stayed up
- Building was on a generator, so when power to the building went out, the generators kicked in
- (lots of fuel for the generators)
- Fire department showed up, and started to put out the fire under the street
- Hydrant use dropped water pressure
- Reduced water pressure dropped cooling ability of the generator
- Generator shut off to prevent damage
- UPSs ran down quickly
- Servers crashed hard
Nobody included the city WATER supply in the redundancy plan.
www.pfsense.org forums.pfsense.org
You might want to check out CUDA. It lets you run parallel algorithms on a GPU, and you should be able to get hardware that can run significantly more than ~26 Gflops for less than $800.
There is a lot of good advice in the other posts, but so many are laden with other people's baggage filling in your missing data. Let me condense it for you to a real solution
I have set up high availability systems that are currently handling 18TB traffic a month, with many millions of page views, with systems that you can literally unplug the server handling the load and have a hiccup of less than a second. And I have done this with 2 servers.
Your 1000 visitors a day is something one server could handle the traffic for, as long as we aren't talking something boutique like streaming live HD video. But that is only half your problem - you want to be able to survive a catastrophe on that machine (someone accidentally kicking the power cord, etc).
First, I would suggest you do not want to handle this hardware yourself. I have worked with ServerBeach and RimuHosting, and would gladly recommend either for this setup. You can handle everything else though.
Second, you want two machines, pretty much anything in ServerBeach's category 3 will handle what you need.
Third, you need them in a particular configuration:
1) You want them each to have a publicly available IP (that references the box), then you want a floating IP between them (that will be the IP your web address uses). More about that IP later.
2) You want the two machines to have a second network card, and have a private network between them. (used for heartbeat and disk replication - see below)
3) you want to set up HALinux and DRBD.
HALinux is a software solution that will run on both boxes. One box is your 'primary' and the other is 'secondary'. The secondary box watches the primary one, and if the primary one fails for any reason, the secondary one takes over for it. It does this by pinging it as often as you specify (perhaps multiple times a second), and if it doesn't answer, it takes over its IP address. You see, that floating IP address I mentioned earlier resolves to the first machine, but the second machine can take it over (for this to work, they have to be on the same router). The downtime here is less than a second.
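The secondary's decision logic boils down to counting missed pings. A minimal Python sketch of that idea follows; it is not the HA software's actual code, and the class name and the `max_missed` threshold are assumptions of mine. Actually moving the floating IP is left out.

```python
class StandbyMonitor:
    """Secondary-side view of a primary that is expected to answer pings."""

    def __init__(self, max_missed=3):
        self.max_missed = max_missed
        self.missed = 0
        self.holds_floating_ip = False

    def observe(self, primary_answered):
        """Record one ping result; return True once the standby should own the IP."""
        if primary_answered:
            self.missed = 0
        else:
            self.missed += 1
            if self.missed >= self.max_missed:
                self.holds_floating_ip = True
        return self.holds_floating_ip
```

In this sketch the standby keeps the address once it has taken over, even if the primary starts answering again; handing it back is left as a deliberate manual step.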
So that is all well and good, but the second machine needs to be able to run just like the first one. This is where drbd comes in.
DRBD is like RAID mirroring, but for two hard drives in separate machines. Everything written to one hard drive must also be written to the second for the write to be successful. Over a private Gig-E network, in my testing, the drives suffer about a 22-25% performance hit. All data - the database, the deployed applications, even the config files for all my services - sits on this shared drive. If the first machine fails, the second machine has all the data it needs to take over the job.
I have set up exactly this setup more than once. And despite everyone here laughing at your "1000 users" figure, high availability isn't about scalability - your 1000 users might be worrying about something so important this setup is peanuts to them compared to the lost time if you have to spend 15 minutes jerking around with a server problem. I enjoy working on these systems because I can fix problems outside of a crisis mode, since there is always a machine ready to go.
If you'd like help with this, or if you'd even like someone to set it up and host it for you, I'd be happy to help. (dbock at codesherpas dot com)
Don't spend your money on purchasing 2-6 servers... seriously - look into what 2 decent machines in this setup will cost at ServerBeach, and also think how much easier this will be if they handle all the physical stuff for you. The configuration details are something you can handle yourself, and it is not that hard if you are comfortable at a command line prompt.
No mention of Linux Virtual Server?
It's not exactly easy to set up, but it provides all possible types of load balancing and even the load balancer itself is HA'd by heartbeat (in a 2-node LB cluster).
Downtimes can be reduced to single seconds.
(My LoadBalancer cluster switched if the master LB didn't check in for 5 whole seconds)
Webserver reply times can be just as tight.
Client sessions can be bound to the answering webserver and the bindings can disappear when the designated webserver dies.
"I was in love with a beautiful blonde once, dear. She drove me to drink. It's the one thing I am indebted to her for."
16GB? Are you mad? Anything beyond 1GB should be enough to handle 1000 unique visitors per day. If you want to virtualize the system and have a separate web- and database server, 2GB should be enough already; if you want to go further and have a separate virtual mail server in there, 2GB is still sufficient and 3GB is plenty.
My site gets 50,000 visits a day (>100 req/s peak) and I do just fine on a single server with 16 GB of RAM. And that's probably more than I really need -- I could make do with 8 GB, and less if my application (vBulletin) were more scalable.
MediaWiki developer, Total War Center sysadmin
We use a pair of Coyotepoint Equalizer E250 appliances for our web load balancing. About $5,000 for the HA pair, but its about the easiest load balancer to install and run that I could imagine...so if you are more worried about the ability to support and maintain the system than you are the cost then this could be a better choice than building your own from open source tools.
on a 250 Mhz machine with 128M RAM on a 128k leased line, back in 98. Hand coded Perl on Apache, msql at first then mysql, on Linux.
An OLPC's gotta be better than that.
Does it work well enough for prod use?
Get this book: Scalable Internet Architectures. Theo will tell you how to approach the problem.
For the volumes that you are talking about, you don't need a huge architecture, unless something is seriously funky with your application. You are 3 or 4 orders of magnitude away from having a hard problem to solve.
I was taught to respect my elders. The trouble is, it's getting harder and harder to find some.
Citrix Netscaler. All high end web hosting companies use it. If you are thinking about Foundry or a custom linux box, don't. You'll see the limitations after it's implemented- and by then you'll be screwed.
RAM is cheap. Upgrading a running server isn't.
A) 1000 a day is fairly small. I serve 12,000 unique logins per day with 1 web server (multiple back-ends, see point B)
B) Rather than cluster the entire application/site, it is usually better to separate the applications and processes and give them either their own virtualized server space or their own physical server.
Database on one server
Middleware/application on another
Static content on another, etc..
Not only can you figure out bottlenecks easier, but when/if you need to upgrade, you are putting resources directly where they are needed.
In terms of high availability, (in addition to the usual hardware duplications and backups/failovers, etc..) I would recommend virtualizing all your services into something like ZFS containers or vmware.
If a server dies, being able to quickly transfer a virtual zone (from backup) to a new server is very nice.
You get a lot more bang for your buck using LVS and custom scripts (if you need them) than you do with F5 boxes.
I know that quite a lot of teenagers have "self-help archives" but they have more to do with causing Repetitive Stain Injury than what I assume your aim is. Having said that, being worried about "up"times takes on a whole new meaning.
Note to grammar Nazi's: I know the "r" is missing. Think about it. :-)
www.pfsense.org?
I'm really tired of reading how 1,000 users is peanuts. You have no idea what those 1,000 users are doing. Maybe he's running Tomcat servers and he's got 1,000 customers using some financial or clinical java application that requires five 9s? I realize you can serve pictures of you in a cat suit humping some other furry at a convention on your tandy II to 12 people per week and extrapolated some figures, but you really have no concept of his requirements.
Just bored and had a random thought for super cheap super easy "manual HA" (oxymoron?). Round robin DNS is usually a hassle because of TTL values. So let's setup something using round robin DNS, no need for front end load balancer where each node essentially acts as a peer. Basically if you have a few nodes today using round robin DNS you can implement this for free, today. The advantage is you won't have to wait for TTL values to be flushed from DNS caches, so when you have a problem you can remove it immediately.
Assuming linux/bsd/etc:
1. create ifcfg-bond0:1-n on every host, each of which represents a node in the cluster. Now each node can start a sub-interface and answer requests for any other node. Also create one specific iface for management on each node.
2. Setup round-robin DNS: add IP address of each node to the A record for www.
3. If one node is down, or needs to go down, just start the appropriate ifcfg-bond0:X iface on another node and if necessary shut that iface down on the node that needs work. Use the management address you created above to perform any maintenance and get it back online.
Of course this could all be automated pretty easily using a heartbeat. Each node tests the services on other nodes, nodes all agree when a specific node has died and determine the appropriate host to take over the load.
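The "nodes all agree" part is the tricky bit. One simple way to get agreement without any voting is to make the takeover plan a pure function of who is alive, so every node computes the same answer. Here is a Python sketch of that idea; the function name, node names, and addresses are hypothetical, and bringing the sub-interfaces up or down is not shown.

```python
def plan_takeover(node_ips, alive):
    """Map each service IP to the node that should answer for it.

    node_ips: dict of node name -> the service IP it normally owns.
    alive: set of node names that passed their service checks.
    """
    survivors = sorted(n for n in node_ips if n in alive)
    if not survivors:
        return {}
    plan = {}
    orphan_index = 0
    for node in sorted(node_ips):
        ip = node_ips[node]
        if node in alive:
            plan[ip] = node
        else:
            # Spread orphaned IPs across survivors in a fixed order so every
            # node that runs this function reaches the same answer.
            plan[ip] = survivors[orphan_index % len(survivors)]
            orphan_index += 1
    return plan
```

This still assumes every node sees the same set of live peers. If they disagree about that, two nodes could claim the same address, which is the classic split-brain problem.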
Lets get real and put this into perspective.
Any Pentium I which is 200 MHz and up can do this.
Whether FTP or HTTP, this is still just file sharing. A Pentium I can fill a 10 Base-T Lan with no issues at all. In fact they can probably get close to filling a 100 Mbit/second Lan. One needs to test this of course in the application at hand. This is easy to do.
T3 runs about 45 Mbits/second and this corresponds to DS3 (Digital Signal level 3). In North America this is equivalent to 672 DS0 channels, each of which is 64 Kbits/second (8,000 bytes/second), not counting stolen bits.
So a T3 is "about" 1/2 of a 100 Mbit/second Ethernet LAN connection.
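The arithmetic is quick to check in Python, taking a DS0 as 64,000 bits per second and counting only the 672 payload channels (framing overhead is ignored, which is why the total lands a little under the 45 Mbit/s line rate).

```python
DS0_BITS_PER_SECOND = 64_000                    # one DS0 channel
DS0_CHANNELS_IN_DS3 = 672                       # channel count quoted above
FAST_ETHERNET_BITS_PER_SECOND = 100_000_000     # 100 Mbit/s LAN

ds0_bytes_per_second = DS0_BITS_PER_SECOND // 8
ds3_payload = DS0_BITS_PER_SECOND * DS0_CHANNELS_IN_DS3
fraction = ds3_payload / FAST_ETHERNET_BITS_PER_SECOND

print(ds0_bytes_per_second)      # 8000
print(ds3_payload / 1e6)         # 43.008
print(round(fraction, 2))        # 0.43
```

So the payload is a bit under half of what a 100 Mbit/s LAN port can carry.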
http://en.wikipedia.org/wiki/T-carrier
Carrier pricing for a DS3 will be about $5,000 per month.
The rest of this picture is dependent on how good or how badly the server side is set up. My point is that even a 10 year old Pentium I can handle the load.
As most of the posters have indicated 1,000 unique visitors are easily accommodated, it is still nothing to sneeze at when it comes to supporting a business critical resource. When things go well then you can easily multiply the number of unique users.
Option 1. Using a predefined load balance solution such as HAProxy is nice but still leaves you with a single point of failure (the proxy). Distribution tools and resources are abundant; common ones use virtual addressing (Cisco IOS functions), but these still leave you with a single point of failure.
Option 2. DNS based HA, remote servers etc. With such a small load your capabilities could easily be supported by two servers and using a DNS based failover. Many DNS Sources have just such an option, look at your DNS provider for the specific options. Since most DNS sources have some hardening, the options are fairly solid and the single source of failure is mitigated.
Option 3. Virtual environments. Using a virtual environment you have great flexibility with both service and failover. Resources include GoGrid (http://www.gogrid.com/) and the Amazon Elastic Compute Cloud service (http://aws.amazon.com/ec2/). The benefit of cloud computing is uptime. Expecting failures is a good practice: the cloud can be programmed to spawn a new instance automatically when one fails, or, using metrics, to spawn a new instance at a predefined load and start forwarding requests to it.
It is easy for us Linux loving types to be very fond of the home grown solutions. There are nice boxed solutions out there that solve your issues. In the end it all comes down to time and money. Since in many cases time is money, then money and money are your issues. You either pay for it up front or on the back side. While you can do just about anything with an open-source solution, your biggest factor is going to be expertise. There is a lot of expertise available to help you with that need. In the case of closed source solutions, it may seem like an out of the box success, but always be leery of the difference between how much work you have to do and how much the package costs.
800 bucks can buy you a great many hours of CPU on Amazon EC2, can't it?
You forgot to mention LVS
I'm not insane! My mother had me tested.
2 boxes for hardware failover will do you fine. If you are worried about HA, then it's the COST of downtime that you are worried about (i.e. being down for an hour exceeds $1000 in lost revenue), and that is what will justify the solution. Don't just drive availability to five nines because you feel it's cool; do it because the business requires it.
This is something that is rampant: techies tend to overestimate the value of uptime.
Sure, it's sexy to have high availability this and redundant that, but unless your company is pulling down at least $1,000,000 per year or more in gross revenues, it's hard to beat the 3 to 4 nines or so uptime delivered by a good quality, whitebox server running Linux. Something like this unit would deliver excellent performance and excellent reliability at a very low cost.
How much does an hour of downtime actually cost your company? Be honest. If you had to tell your customers: "we were down for 2 hours because a software update caused us to have to ..." what would it actually cost your company? Especially if it only happened every year or so? In my experience, even in fairly stiff production environments, there has been no cost at all. We've maintained about 99.95% uptime for the past 3 years, with 1 "incident" every year or so, with no cost at all. In fact, our company has a good reputation for availability and support!
So don't spend money on sexy hardware with lots of blinkie lights and cross-connects, which often decrease your reliability by introducing unnecessary complexity.
Instead, spend money on your hosting. Don't *ever* host it in-house. Ever. Get a first-tier hosting facility, with redundant network feeds, power, and staff who give a damn. Don't be afraid to pay for it, because it will probably save you money, anyway. You'd be amazed at how price-competitive top-notch hosting farms can be!
Make sure to get to know the on-site techies on a first-name basis, give 'em a six-pack of their favorite beverage, and thank them profusely when they do anything for you. The goodwill these types of things can bring will work wonders for you down the road.
And remember:
2 nines is 3.65 days of downtime per year.
3 nines is .365 days of downtime per year (~ 8.8 hours)
4 nines is .0365 days of downtime per year (~ 53 minutes)
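For anyone who wants to check the arithmetic behind these figures, here is a minimal Python sketch. The function name `downtime_minutes_per_year` is just an illustrative choice, and it assumes a 365-day year.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # assumes a 365-day year, ignores leap years


def downtime_minutes_per_year(nines: int) -> float:
    """Allowed downtime per year for N nines of availability (2 -> 99%, 3 -> 99.9%)."""
    unavailability = 10 ** (-nines)  # e.g. 3 nines -> 0.001
    return MINUTES_PER_YEAR * unavailability


if __name__ == "__main__":
    for n in range(2, 6):
        minutes = downtime_minutes_per_year(n)
        print(f"{n} nines: {minutes / 60:8.2f} hours ({minutes:9.1f} minutes) per year")
```

Multiply the result by whatever an hour of downtime honestly costs you and you have a rough ceiling on what extra redundancy is worth paying for.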
It's a very, very rare case indeed where 3-4 nines of uptime isn't completely sufficient.
And 1,000 unique visits per day? Pssht. Unless you are doing some pretty ferocious database stuff, (EG: joins across 12 tables with combined inner/outer/composite joins) the aforementioned server should do the job just wonderfully.
DON'T FORGET BACKUPS! And back up your backups, because backups fail, too.
I have no problem with your religion until you decide it's reason to deprive others of the truth.
Buy lots of RAM. I run two commercial Java web apps, and I need it.
Cpu0 : 1.7%us, 0.3%sy, 0.0%ni, 89.4%id, 8.3%wa, 0.0%hi, 0.3%si, 0.0%st
Cpu1 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu2 : 0.0%us, 0.0%sy, 0.0%ni, 99.7%id, 0.3%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu3 : 0.0%us, 0.0%sy, 0.0%ni, 99.3%id, 0.7%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 16440536k total, 16321188k used, 119348k free, 393412k buffers
Swap: 12144632k total, 44k used, 12144588k free, 3390756k cached
You have two choices:
1. Provide "real numbers" of website visits and expected volume growth.
2. Stop wasting time on irrelevant things and set up the host for access to content.
I'm guessing your number one concern will soon be providing a usable interface to the content. Don't read any more comment/responses to your question and get to work on how to present it.
I've been using Pound for a few years now without any major issues. It also gets updated frequently enough, and you can use it as an SSL accelerator if you have the hardware for it.
theplanet.com has a shared load balancer service they offer.
If you are looking to do this in house, I would suggest either getting a ServerIron or trying some kind of software load balancer.
Unless you get competent, experienced help. This is going to sound snarky, but seriously, if you think you need multiple servers for 1,000 users per day hitting a help archive, you do not know enough to set up a server properly.
If I were you, I'd consider looking into the new hosting platforms built on cloud environments.
You should be able to find some online web hosts who charge per-use in a clustered environment so you don't have to bother about setting up your own EC2 servers (or whatever) yourself. Leave it to the experts as it were.
Some places to check out:
RightScale
ScaleMySite
GoGrid
My company uses a cloud hosting provider and it's been great not even thinking about architecture as our website hosting needs have grown.
http://www.barracudanetworks.com/ns/products/balancer_overview.php
There's an incredible number of ways to architect for application performance and redundancy. If asked, customers will say they want as much uptime as possible. I have counterparts in my line of work that spend 1 million dollars a year for this. I on the other hand spend about zero dollars a year, for just a little bit less.
Lives and significant dollars will not be lost if the applications I manage are down for 15 minutes a month. People are definitely inconvenienced, but not dead.
But on the other hand, I do have a requirement at my organization to provide geographical redundancy in case of catastrophic failure at a single site. Simple DNS changes with shell scripting have sufficed for this.
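To give a rough idea of what such a script decides, here is a sketch in Python rather than shell. It is not the actual script described above: the function names and the addresses in the usage note are hypothetical, and the DNS update itself is left out because it depends entirely on your DNS provider's tooling.

```python
import socket
from typing import Callable, Optional, Sequence


def tcp_alive(host: str, port: int = 80, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts, unreachable hosts
        return False


def pick_dns_target(
    sites: Sequence[str],
    is_alive: Callable[[str], bool] = tcp_alive,
) -> Optional[str]:
    """Return the first reachable site in priority order, or None if all are down."""
    for address in sites:
        if is_alive(address):
            return address
    return None
```

Run from cron, something like `pick_dns_target(["192.0.2.10", "198.51.100.10"])` (placeholder addresses, primary first) tells you where the record should point. Keep the record's TTL low, or clients will hang on to the dead address long after you switch.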
So, you have to evaluate what it is that you're targeting and be able to provide an assessment of cost vs. benefit. My guess is that with your limited requirements described, software load balancers will be the way to go.
But please remember that when Microsoft says it's easy, they mean they only provide you an interface to the easy stuff. When you have to do the hard stuff, there just simply won't be a button for it.
j/k.
I see you're still too modest to plug your own awesome code, so I'm doing it for you.
-UCARP for a virtual shared IP.
-DRBD for a shared filesystem between two hosts. Typically you use this for the /www folder.
-CSync2 for syncing configuration files.
That way UCARP triggers Apache to start on the passive host when the master fails.
Do you possibly work for Proto co??
Do you possibly work for Proto Networking?
Perlbal is still going strong too.
I will use Solaris to solve my HA dilemma.
It's gratis, and so is the Sun cluster software.
If you want to go the open source route, then you could also use Solaris express community edition, with the open HA cluster (which is open source version of the Sun cluster software).
Also look at project "Colorado" at
http://opensolaris.org/os/project/colorado/
Solaris seems like the most obvious choice for high availability clustering to me, because it's enterprise grade and the software is gratis.
Implement your stuff on an nVidia card using CUDA.
Applications have gotten to be so complex, it can be difficult to make all of the dependencies high-availability. And as we know, the chain is only as strong as its weakest link.
My current client just deployed a state of the art HA application. Oracle RAC enterprise with hot standbys, huge Weblogic clusters, F5 load balancers, datacenters in two different geographies, each with redundant connections. This application is rock solid--except for one little wrinkle.
The application depends on an ancient, crotchety legacy system. Naturally, I informed my client that they needed to upgrade the legacy system. "We don't throw out perfectly-functioning systems." "But it isn't highly available--it's running on unsupported hardware using an unsupported version of Solaris, and the database resides on a single Oracle instance. That is a single point of failure."
After much back and forth, the client elected not to replace the legacy system. You can guess where this story leads.
Their shiny, new whiz-bang application goes down once or twice per month due to legacy system outages. In the end, as you might have guessed, the client just decided to live with the downtime. I still don't understand why they could find so much budget for the new application, but couldn't be bothered to do something about this duct-taped, old legacy app.
They don't grade fathers, but if your daughter's a stripper, you fucked up. --Chris Rock
AMD X2 4450e
2GB DDR2 800
GF9800 GTX+
MSI K9 Motherboard
all for about $400. double this. play nice.
Hey there Julian, what's up?
Probably HA not load
1. What will each user do? Unless they are modifying a data set with many interrelationships, where small changes can trigger large recalculations and updates, I doubt 1000 or even 10000 daily visitors (even if over a short time window) will require clustering to provide satisfactory performance. If it is not the performance under load that you are concerned about, but rather that you can ensure near 100% uptime (whether load is low or high), then clustering is the wrong solution.
Redundant dedicated servers
2. With a low load (1000 users for a typical web application) but HA requirements, I think your best bet is to place a server in each of two different data centers. The data centers should be in different cities, belong to different companies, and utilize different backbone providers. Once you have selected such data centers, either rent dedicated servers (a good way to start), or go with basic colocation. At any one time, the server in one data center will be "active", the other on "standby". The "standby" server permits reads but no writes, the "active" server permits both. For data in a database, use transaction log shipping from active to standby to keep the standby up to date. For data on disk, use rsync from active to standby.
3. Various techniques are possible for failing over to the standby server in an emergency. You may want to use DNS round-robin stuff for this.
No FTP
4. Do not use FTP for anything. The only exception is anonymous read-only access over FTP; that is okay.
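Whatever failover technique you pick for item 3, the decision to promote the standby should not hang on a single failed health check, or one dropped packet will flip your servers. A minimal sketch of that rule in Python follows; the class name `FailoverMonitor` and the default threshold of 3 are assumptions for illustration, not part of any particular tool.

```python
from dataclasses import dataclass


@dataclass
class FailoverMonitor:
    """Promote the standby only after `threshold` consecutive failed checks."""

    threshold: int = 3      # assumed value; tune to your check interval
    failures: int = 0       # consecutive failures seen so far
    promoted: bool = False  # True once the standby has taken over

    def record(self, healthy: bool) -> bool:
        """Record one check of the active server.

        Returns True exactly once, on the check that crosses the threshold.
        """
        if self.promoted:
            return False  # already failed over; failing back is a manual step
        if healthy:
            self.failures = 0  # any success resets the streak
            return False
        self.failures += 1
        if self.failures >= self.threshold:
            self.promoted = True
            return True
        return False
```

Feed it the result of each periodic check; when `record()` returns True, stop log shipping, make the standby writable, and repoint traffic. Failing back is deliberately left manual here, since the old active server will have stale data when it returns.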
http://www.slicehost.com
get 2 512MB vps with nginx or lighttpd.
that should be more than enough for what you want and they even have a tutorial for a basic HA setup.