Implementing a Load-Balanced Webserver?
Amoeba Protozoa asks: "How do I implement a load-balanced, layer 4 switching web server? Would it be possible to mix OSes? Besides your incoming bandwidth, where do the bottlenecks occur? I would prefer to use Apache, Linux or BSD, and be able to utilize mod_perl or PHP to access a shared MySQL database. I would like to make this setup as scalable as possible."
There's all sorts of load balancing.
You can have 2 or more machines (with different IP addrs, naturally) using round robin DNS to answer to the same name.
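To make the round robin DNS idea concrete, here is a toy Python model (not a real DNS server) of what a name server doing round robin does: every query gets the same set of A records, rotated one position each time, so clients that simply use the first address end up spread across the machines. The host name and the 192.0.2.x documentation addresses are placeholders.

```python
from collections import deque


class RoundRobinDNS:
    """Toy model of a name server that rotates the order of its A records."""

    def __init__(self, name, addresses):
        self.name = name
        self._records = deque(addresses)

    def resolve(self, name):
        if name != self.name:
            raise KeyError(name)
        answer = list(self._records)  # every client gets all the addresses...
        self._records.rotate(-1)      # ...but the next query starts with the next one
        return answer


dns = RoundRobinDNS("www.example.com", ["192.0.2.10", "192.0.2.11", "192.0.2.12"])
for _ in range(4):
    print(dns.resolve("www.example.com")[0])  # .10, .11, .12, then .10 again
```

Note that this spreads lookups, not load: a client or resolver that caches the answer keeps hitting the same box.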
You can have separate boxen for HTML, graphics, and the database. Depending on what you're trying to accomplish, you may be able to split the db among 2 or more boxes.
If you've got (for example) one machine for HTML and a different one for the db, it really doesn't matter what OS they're running.
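A rough Python sketch of that split, using hypothetical host names: requests for images go to a graphics box and everything else to the HTML box, and the database is partitioned by key so each user's rows live on exactly one of two DB servers. Key-based partitioning is just one way to split a database; whether it fits depends on your queries.

```python
import zlib

# Hypothetical host names, for illustration only.
HTML_HOST = "html1.example.com"
IMAGE_HOST = "img1.example.com"
DB_SHARDS = ["db1.example.com", "db2.example.com"]

IMAGE_SUFFIXES = (".gif", ".png", ".jpg", ".jpeg")


def web_host_for(path):
    """Send image requests to the graphics box, everything else to the HTML box."""
    if path.lower().endswith(IMAGE_SUFFIXES):
        return IMAGE_HOST
    return HTML_HOST


def db_host_for(user_id):
    """Pick the DB box holding this user's rows.

    crc32 is used instead of hash() because Python randomizes str hashes
    per process, and every web server must agree on the same mapping.
    """
    return DB_SHARDS[zlib.crc32(str(user_id).encode()) % len(DB_SHARDS)]


print(web_host_for("/index.html"))       # html1.example.com
print(web_host_for("/images/logo.PNG"))  # img1.example.com
print(db_host_for(42))
```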
What do you mean by layer 4 switching? The last time I saw that buzzword was in brochures for networking h/w, i.e. glorified ethernet switches. Save yourself a few bucks and just stick everything on 100BaseT. If you can afford a network connection that makes 100BaseT a bottleneck, then you can afford to hire someone to do all this for you.
Oh, and as to bottlenecks... Take a look at some of the tutorials about reducing the storage requirements of your GIFs. It's amazingly easy to shrink your GIFs, and that's probably the cheapest way to optimize a web site.
Yeah, this is vague, but your question is a bit vague too. Don't take my word for it, set it up and test the hell out of it.
Of course, if you want to use GIFs you'll have to pay the lovely Unisys $5000 license fee. Try PNG format for your images. It's just awesome.
Network Address Translation is a good way to build a Web cluster. There are several benefits to this architecture:
- You can add and remove web servers from the cluster at any time.
- The traffic is evenly distributed to each server (that is not true with DNS round robin, because clients cache the addresses locally).
But the bottleneck is on the device(s) that do the address translation... http://www.csn.tu-chemnitz.de/~mha/linux-ip-nat/diplom/node4.html#SECTION00043100000000000000
You could do a reverse proxy with SQUID too...
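A toy Python model of the NAT director described above, assuming round robin as the scheduling rule (real NAT load balancers may offer others). The director picks a real server for each new connection and remembers the choice, so later packets of the same connection reach the same box, and servers can be added or removed at any time. Since every connection is looked up and rewritten here (and in NAT mode the replies come back through it too), this is also where the bottleneck sits. Addresses are placeholders.

```python
class NatDirector:
    """Toy model of a NAT front end for a web cluster."""

    def __init__(self, servers):
        self.servers = list(servers)
        self._next = 0
        self.connections = {}  # (client_ip, client_port) -> real server

    def add_server(self, server):
        self.servers.append(server)

    def remove_server(self, server):
        self.servers.remove(server)
        # Forget connections pinned to the removed server; they get re-routed.
        self.connections = {k: v for k, v in self.connections.items() if v != server}

    def route(self, client_ip, client_port):
        key = (client_ip, client_port)
        if key not in self.connections:
            # New connection: hand it to the next server in turn.
            self.connections[key] = self.servers[self._next % len(self.servers)]
            self._next += 1
        return self.connections[key]


lb = NatDirector(["10.0.0.1", "10.0.0.2"])
print(lb.route("198.51.100.7", 40000))  # 10.0.0.1
print(lb.route("198.51.100.7", 40000))  # 10.0.0.1 (same connection)
print(lb.route("198.51.100.7", 40001))  # 10.0.0.2 (new connection)
lb.add_server("10.0.0.3")
print(lb.route("203.0.113.5", 5000))    # 10.0.0.3
```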
Ya know, it would help if people did simple Google searches before posting to Ask Slashdot. But I guess that's asking a little much.
http://www.linuxvirtualserver.org
Red Hat also has Piranha, but that's (IMO) a cheap hack made to meet a deadline.
If you like Apache, check out mod_backhand. It is a load-balancing module that is under development (but works well now) over at The Center for Networks and Distributed Systems at Johns Hopkins.
It is a module that incurs almost NO overhead. You can mark directories or locations with load-balancing policies and BOOM. That is it. It communicates with other Apache servers via multicast and handles the rest. You can even plug in your own decision-making algorithms. It is super simple to load balance CGI scripts to some machines, mod_perl database scripts to another set, and images based on a completely different policy. Or just use our defaults.
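The per-location policy idea can be sketched in Python as a chain of decision functions, each taking the list of candidate servers and returning a filtered or reordered list. Everything here (function names, the POLICIES table, the server records) is hypothetical and only illustrates the concept; it is not mod_backhand's configuration syntax or API.

```python
import random


def only(*names):
    """Restrict candidates to a named set of machines."""
    def restrict(servers):
        return [s for s in servers if s["name"] in names]
    return restrict


def alive_only(servers):
    return [s for s in servers if s["alive"]]


def by_load(servers):
    return sorted(servers, key=lambda s: s["load"])  # least loaded first


def by_random(servers):
    shuffled = list(servers)
    random.shuffle(shuffled)
    return shuffled


# Hypothetical policies: CGI to one set, mod_perl to another, images anywhere.
POLICIES = {
    "/cgi-bin/": [only("web1", "web2"), alive_only, by_load],
    "/perl/": [only("web3", "web4"), alive_only, by_load],
    "/images/": [alive_only, by_random],
}


def choose_server(path, servers):
    """Apply the longest matching prefix's chain; None means serve locally."""
    matches = [p for p in POLICIES if path.startswith(p)]
    if not matches:
        return None
    candidates = servers
    for decide in POLICIES[max(matches, key=len)]:
        candidates = decide(candidates)
    return candidates[0]["name"] if candidates else None


servers = [
    {"name": "web1", "load": 0.9, "alive": True},
    {"name": "web2", "load": 0.2, "alive": True},
    {"name": "web3", "load": 0.1, "alive": False},
    {"name": "web4", "load": 0.5, "alive": True},
]
print(choose_server("/cgi-bin/search", servers))  # web2
print(choose_server("/perl/report", servers))     # web4
```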
It currently runs under Linux and Solaris, but the next release will support BSDI as well.
It is a software solution that can be combined with any hardware solution you choose (if you need that too). You can't lose with this. Installation and setup time combined are minimal.
Of course, I am a little biased.
-- Theo Schlossnagle
Having server redundancy and load-balancing is nice and there are more than just a few options nowadays (free even) that let you do this.
But to be honest, the Internet is always the real bottleneck. Once you try and provision your first DS3, you will figure this out.
If your primary concern is scalability, you need to find someone who understands network design issues, including but not limited to:
Networking protocols like BGP4 and STP (Spanning Tree is very important); VLANs, trunking, EtherChannel, and ISL; the specifications and bugs of all your switch and router hardware and software; provisioning of telco circuits; public and private peering; etc.
If you don't understand how the Internet works and how to build scalable networks (and this takes years of experience), then knowledge is your biggest bottleneck. I believe someone stated above that beyond 100BaseT, you need to hire someone who knows what they are doing. Well, that sounds about right.
Eddieware is somewhat nifty.
At least worth a look, anyway.
http://www.linux-vs.org. It uses IBM's technique called direct routing. They say it'll scale to over 100 (or 150??) computers in a cluster. It can also load balance with tunnels for remote systems.
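A toy Python model of why direct routing scales, assuming the usual LVS direct-routing setup: the director and real servers share a LAN segment, and each real server also holds the virtual IP (VIP) on an interface that does not answer ARP. The director rewrites only the Ethernet destination; the real server sees a packet addressed to the VIP and replies straight to the client, so the (usually much larger) responses never pass back through the director. MAC and IP values are placeholders.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Frame:
    src_mac: str
    dst_mac: str
    src_ip: str
    dst_ip: str
    payload: str


VIP = "192.0.2.100"  # placeholder virtual IP shared by director and real servers


def director_forward(frame, real_server_mac, director_mac):
    """Direct routing: only the MAC addresses change; the IP header is untouched."""
    return replace(frame, src_mac=director_mac, dst_mac=real_server_mac)


def real_server_reply(frame, server_mac, gateway_mac):
    """The real server answers from the VIP straight to the client via the gateway."""
    return Frame(src_mac=server_mac, dst_mac=gateway_mac,
                 src_ip=VIP, dst_ip=frame.src_ip, payload="HTTP/1.0 200 OK")


req = Frame("gw", "director", "198.51.100.7", VIP, "GET / HTTP/1.0")
fwd = director_forward(req, real_server_mac="rs1", director_mac="director")
resp = real_server_reply(fwd, server_mac="rs1", gateway_mac="gw")
print(fwd.dst_ip == VIP, resp.src_ip, resp.dst_ip)  # True 192.0.2.100 198.51.100.7
```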
Scalable always means different things to different people. If you mean that you want to saturate your T1 with static web pages before your server goes down, all you really need is a standard Linux box.
If you mean "I want to run eBay", then you should check out an application server. My personal favorite is Dynamo [disclaimer: I work for them], but there are cheaper solutions if you don't have the cash.
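For the T1 case, the arithmetic is short. A sketch in Python, assuming an average page of 10 KB (an illustrative figure) and ignoring protocol overhead unless you pass one in:

```python
T1_BITS_PER_SECOND = 1_544_000  # nominal T1 line rate, 1.544 Mbit/s


def pages_per_second_to_saturate(link_bps, page_bytes, overhead=0.0):
    """Requests per second needed to fill a link with pages of a given size.

    overhead is an assumed fraction of extra bytes per page for TCP/IP and
    HTTP headers (0.1 means 10% extra).
    """
    bits_per_page = page_bytes * 8 * (1 + overhead)
    return link_bps / bits_per_page


print(round(pages_per_second_to_saturate(T1_BITS_PER_SECOND, 10 * 1024), 1))  # 18.8
```

That is roughly 19 static pages a second, a modest rate for a single web server.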
Go to this URL: Linux Virtual Server Project, and download the tools necessary to build a Linux-based load balancer. This replicates the functionality of the Cisco LocalDirector, which is your other option (pricey, at $8k). People will tell you that the LocalDirector blows, and compared to the $18,000 load balancers it does, but it does the job it was designed for well. So does the Linux VS project.
Database servers are often the bottleneck. I think people are nuts for load balancing their web servers when it is the database servers that are most often bogged down. A 486 running PHP can saturate multiple T1s if the database back end is fast enough (I've done it).
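The point about the database can be put as a one-line model: a dynamic request needs the link, a web server, and the shared database, so the system runs no faster than its slowest stage. The numbers below are illustrative assumptions, not measurements.

```python
def cluster_throughput(web_servers, per_web_rps, db_rps, link_rps):
    """Requests per second the whole setup can sustain: the slowest stage wins."""
    return min(web_servers * per_web_rps, db_rps, link_rps)


# Illustrative numbers only.
for n in (1, 2, 4, 8):
    print(n, cluster_throughput(n, per_web_rps=50, db_rps=120, link_rps=400))
# prints 1 50, 2 100, 4 120, 8 120 (one pair per line)
```

Past three web servers the database caps throughput, so more web boxes buy nothing.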
I think that layer 4 switching is the way to go. Automatic load balancing between multiple servers. Server failover support (if a server goes down, it no longer gets requests).
We will be evaluating some shortly... we'll see how it goes.
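The failover behavior mentioned above (a dead server stops getting requests) can be sketched in Python as a round-robin pool that consults a health check. The health check here is a hypothetical caller-supplied function; a real layer 4 switch runs its own probes.

```python
import itertools


class FailoverPool:
    """Round-robin pool that skips servers whose last health check failed."""

    def __init__(self, servers, health_check):
        self.servers = list(servers)
        self.health_check = health_check  # hypothetical: callable(server) -> bool
        self.up = set(self.servers)
        self._cycle = itertools.cycle(self.servers)

    def run_health_checks(self):
        self.up = {s for s in self.servers if self.health_check(s)}

    def next_server(self):
        if not self.up:
            raise RuntimeError("no healthy servers")
        # One lap around the ring is enough to find a server that is up.
        for _ in range(len(self.servers)):
            server = next(self._cycle)
            if server in self.up:
                return server


status = {"web1": True, "web2": True, "web3": True}
pool = FailoverPool(["web1", "web2", "web3"], health_check=lambda s: status[s])
status["web2"] = False
pool.run_health_checks()
print([pool.next_server() for _ in range(4)])  # ['web1', 'web3', 'web1', 'web3']
```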