Slashdot Mirror


Load Balancing Heavy Websites on Current Tech?

squared99 asks: "I have just delved into some research on a set up for very high traffic websites. I'm particularly interested in how many webservers would be needed at minimum and the type of technology powering them. Slashdot seemed like a good sample site to check out, so I went to Slashdot's technology FAQ to get a starting point. This setup seems to be from 2000, is most likely a bit out of date, and I'm assuming the same number of webservers would not be needed with current server technology. What would experts in the Slashdot community recommend as a required setup to handle Slashdot-like volumes, if they had to do it today using more current hardware? How many webservers could it be reduced to, while maintaining enough redundancy to keep serving pages, even under the heaviest of loads?"

63 comments

  1. hardware by schnits0r · · Score: 1

    Anything fairly new should be alright; I think the big problem is your pipe size. I mean, if you have 30,000 new connections and only 300 kbs, it's not going to transfer data very well.

    1. Re:hardware by magefile · · Score: 1

      Hey, it's not the size of the pipe it's the ... eh, screw it.

  2. It depends by Anonymous Coward · · Score: 0

    On the website hosted. Many dynamic pages with content coming from a database? Or just loads of static pages?

    1. Re:It depends by WebCrapper · · Score: 1

      I know the topic is mostly about hardware, but this is actually a pretty big thing that can get you in trouble.

      If you're serving out static content, you need to make sure that your graphics are low in size whenever possible. For instance, you don't need a full page graphic like the top of the reply area here on slashdot, just the 1 corner graphic and the rest is a table color.

      Going past the basics and assuming it's dynamic content, you'll need to make sure your code is optimized (i.e., not the type of code someone just slapped together). I can't tell you how big a difference it makes when I take a page I wrote when I started programming in PHP and adjust it now that I know a heck of a lot more. Also, if you find that certain pages are more popular and your data isn't changing hour by hour, just cache the results of the page into static content. Every hour or so your cache can be dumped and set up again if needed. This stops a lot of reads off the server for the same exact data.

      You'll also need to optimize the database as much as possible.

      General things that can help with dynamic pages are simple things like timedate calls for pages - let javascript do that (although i don't like to use it, it takes a very, very, slight load off the server), etc...

      I have seen a lot of people say that they need better hardware when they just need to optimize their software and/or increase the pipesize. I even ran into one person that refused to use a specific database because it was killing his site (he forgot to change the max connections...).
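A minimal sketch of the "cache the page, dump it every hour or so" idea from the comment above. This is Python rather than PHP, and `PageCache` and `render_front_page` are hypothetical names made up for illustration; a real site would more likely write the rendered page to disk or to a shared cache.

```python
import time
from typing import Callable, Dict, Tuple


class PageCache:
    """Keep rendered pages in memory and rebuild each one after ttl_seconds."""

    def __init__(self, ttl_seconds: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so the expiry logic can be tested
        self._pages: Dict[str, Tuple[float, str]] = {}

    def get(self, key: str, render: Callable[[], str]) -> str:
        now = self.clock()
        entry = self._pages.get(key)
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1]  # still fresh: no database work at all
        page = render()  # expired or never built: do the expensive work once
        self._pages[key] = (now, page)
        return page


def render_front_page() -> str:
    # Hypothetical stand-in for the expensive, database-backed page build.
    render_front_page.calls += 1
    return "<html>front page, build #%d</html>" % render_front_page.calls


render_front_page.calls = 0
```

With a one-hour TTL, every request after the first is served from memory until the hour is up, which is the "stops a lot of reads" effect described above.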

  3. One Word by cuntzilla · · Score: 2, Informative
  4. Prime Example: wikipedia by dyftm · · Score: 5, Informative
    1. Re:Prime Example: wikipedia by Anonymous Coward · · Score: 1, Insightful

      Since I can't reach the wikipedia server around 2 out of three times I wouldn't call this a successful example

    2. Re:Prime Example: wikipedia by joebp · · Score: 2, Insightful

      72 servers and it still runs slower than any other website of its popularity.

    3. Re:Prime Example: wikipedia by FooAtWFU · · Score: 3, Insightful

      Well, few sites of that popularity are quite as 'read-write'. When you have people submitting edits to articles every second, things get a little trickier.

      --
      The World Wide Web is dying. Soon, we shall have only the Internet.
    4. Re:Prime Example: wikipedia by Anonymous Coward · · Score: 0

      So they should engineer around the problems instead of throwing hardware at it. Disk access is expensive. Keep writes in memory and push them to disk every X seconds. Need I actually say "cache read requests"?

      Seriously, they have very specific needs, they should make a custom database that handled the problem.
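A rough sketch of the "keep writes in memory and push them to disk every X seconds" suggestion, assuming a single process and a JSON file as the backing store. `WriteBehindStore` is a hypothetical name, and this says nothing about how MediaWiki actually stores edits. The obvious trade-off is that anything written since the last flush is lost if the process dies.

```python
import json
import os
import tempfile
from typing import Dict, Optional


class WriteBehindStore:
    """Serve reads and writes from memory; flush() persists them in one batch."""

    def __init__(self, path: str) -> None:
        self.path = path
        self._data: Dict[str, str] = {}
        self._dirty = False
        if os.path.exists(path):
            with open(path, "r", encoding="utf-8") as fh:
                self._data = json.load(fh)

    def put(self, key: str, value: str) -> None:
        self._data[key] = value  # memory only, no disk access here
        self._dirty = True

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)

    def flush(self) -> bool:
        """Write everything to disk if anything changed; return True if it did."""
        if not self._dirty:
            return False
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp_path = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(self._data, fh)
        os.replace(tmp_path, self.path)  # swap the new file in atomically
        self._dirty = False
        return True
```

A timer or background thread would call `flush()` every X seconds; that part is left out to keep the sketch short.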

    5. Re:Prime Example: wikipedia by dubl-u · · Score: 1

      So they should engineer around the problems instead of throwing hardware at it. [...]
      Seriously, they have very specific needs, they should make a custom database that handled the problem.


      They? It's an open source project, pal, funded and built by people like us. If you think some chunk of work needs doing, you can step up to the plate.

      And if you'd rather just go on leeching, then maybe you could change your attitude from, "Dudes, this gift sucks! What were you thinking?" to "Thanks for the cool free thing!"

    6. Re:Prime Example: wikipedia by FooAtWFU · · Score: 2, Insightful
      Exactly. MediaWiki and the Wikimedia sites are put together with off-the-shelf components: Apache, PHP, MySQL, Squid, and a few caching systems for various data whose names escape me at the moment.

      A complete-and-total system rewrite in something that's not PHP would do wonders for efficiency, but the development manpower is not there- it would take an enormous amount of effort to get it usable, let alone useful.

      --
      The World Wide Web is dying. Soon, we shall have only the Internet.
  5. Test, test, test... by PaulBu · · Score: 3, Funny

    ... and you've just missed your greatest opportunity for this by not providing a link to your website! ;-)

    Paul B.

    1. Re:Test, test, test... by squared99 · · Score: 1

      That's 'cause I'm still scoping the architecture :) But I'll be sure to come back when it's up. As has been said in other Slashdot posts, I'll include references to Natalie Portman, SCO, and the RIAA. That ought to do it.

  6. Google for "Virtualized" or "Utility Hosting" by matheny · · Score: 2, Interesting

    Many sites are moving towards utility based hosting or virtualized setups. The problem with high capacity sites is that you often end up having to purchase enough servers to deal with peak time, but don't need the servers during off hours. Utility based hosting services charge you for what you _use_ and allow you to scale as needed. Savvis (http://www.savvis.net/), I know, offers a utility hosting platform based on Inkra, 3Par and blade servers. IBM has a similar setup.

  7. Take a look at livejournal's setup by Artega+VH · · Score: 2, Informative

    Akamai for static content and take a look at livejournal's setup for dynamic content (master-master replication based on mysql).

    Other people are much more qualified than I to answer the number of servers questions though.

    --
    groklaw, wired and slashdot. The holy trinity of work based time wasting.
    1. Re:Take a look at livejournal's setup by Cecil · · Score: 2, Insightful

      Please god, don't *ever* duplicate Livejournal's setup. It's a horrible, nasty hack and anyone who uses Livejournal will tell you that it doesn't work very well either. Although it's gotten better in the last year or so. But that's way, way, way more computing power than they should need to run that site. It's mostly a sign of a system that expanded without any real future-proof planning at all, which isn't really their fault, but if you have the opportunity to think it over and actually plan things, please do it better. You'll thank yourself later.

    2. Re:Take a look at livejournal's setup by RussGarrett · · Score: 1

      Care to justify this? How would you do it then?

      I'm seriously interested - we're in the situation LiveJournal was 3 years ago, and I'm interested to know whether there's a better way than going the way they are.

    3. Re:Take a look at livejournal's setup by Anonymous Coward · · Score: 0

      Just hire people who know what they are doing rather than relying on opensores perl hacks. In 2005, there's many people who can build scalable sites.

  8. RTFB by Anonymous Coward · · Score: 1, Informative

    RTF Server Load Balancing by Tony Bourke. After reading that book you will at least know what you need to look for. Also, you can outsource your load balancing, if that is optimal for your needs, using something like Akamai's servers (Microsoft.com uses Akamai, Netcraft confirms).

  9. Pound by slashflood · · Score: 3, Interesting

    Take Pound, a few web server machines, a database server and an NFS server (no Coda, AFS or GFS needed in most cases) and you should be set. This is a setup that I installed for a high traffic website and it is very stable.

  10. Use Slashdot's current tech guide by Safety+Cap · · Score: 1, Funny
    I would start off by examining Slashdot's updated technology guide.

    From those, you will get an idea of the type and scope of technology the slash teams use to maintain one of the world's most popular sites.

    Granted, your team is not as skilled as the crack techs at /. central, but the specs on that page will get you pointed in the right direction.

    --
    Yeah, right.
    1. Re:Use Slashdot's current tech guide by Anonymous Coward · · Score: 0

      503 Service Unavailable The service is not available. Please try again later with the link you gave. :(

    2. Re:Use Slashdot's current tech guide by Anonymous Coward · · Score: 0

      You're right, slashdot techs, please add "Quit being a dick, ereet hekzor, you breakink ze site!" to that error document.

  11. Some more considerations by slashflood · · Score: 0

    It depends on what you wanna do with a load balanced webserver. If it is serving heavily dynamic content (like PHP), you need good processing power in your webservers. All the webservers should have two gigabit network ports. One should be connected to the switch that is also connected to the load balancer and the other port should be connected to the backend switch, where you can find the database and NFS server. Put some more RAM into the backend servers. Tweak the NFS server settings (rsize, wsize). Try different MaxClients settings in your Apache configuration, but don't overdo it, because the limitation is not the CPU, but the I/O.
    Most important: Use mmcache, if your site is based on PHP! If I turned off mmcache, our site would be unusable. The performance increase is awesome.

    1. Re:Some more considerations by DrZaius · · Score: 3, Informative

      NFS sucks. Use something like CVS to keep your webroots in sync and have each server host its own copy of the content.

      You get rid of a massive single point of failure (the NAS) and you get a little closer to linear scalability (adding another webserver doesn't put more load on your NFS box).

      --
      -- DrZaius - Minister of Sciences and Protector of the Faith
  12. database is the bottleneck by sevinkey · · Score: 1

    I've worked on both the Windows and Linux development sides of a shop that receives about a million hits a day on each side. On both sides, the bottleneck was always the database.

    Both sides used pretty much the same setup for webservers... 4 load balanced webservers with hyperthreading at around 3ghz (at the moment... always around 75% of the fastest processors out there to save money). These are sitting in datacenters with multiple 10gbps connections, and each has a hot-swappable copy of the entire system running at another data center.

    I've found SQL Server can take quite a bit more abuse than replicated mysql, but mysql is extremely fast. However, we have 4 admins for the mysql servers. I myself admin the SQL Servers in addition to my programming and tech support roles. Biggest downside to SQL Server is price.

    My suggestion: tread very lightly on the database and you'll be able to handle more load than you'd expect. Cron static pages off the database when possible, or have static pages generated automatically when you update your database. Also look into caching mechanisms for frequently used data.

    And as in any programming project, profile and tweak and measure and patch.
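One way to picture "cron static pages off the database": a small script that a cron job (or the code that updates the database) could run. This is only a sketch; it uses SQLite from the Python standard library as a stand-in for MySQL or SQL Server, and the `stories` table and `publish_static_page` function are hypothetical.

```python
import html
import os
import sqlite3
import tempfile


def publish_static_page(conn: sqlite3.Connection, out_path: str) -> int:
    """Render the newest stories to a static HTML file; return the row count."""
    rows = conn.execute(
        "SELECT title FROM stories ORDER BY id DESC LIMIT 20").fetchall()
    items = "\n".join("<li>%s</li>" % html.escape(title) for (title,) in rows)
    page = "<html><body><ul>\n%s\n</ul></body></html>\n" % items

    # Write to a temp file in the same directory, then rename it into place,
    # so the web server never hands out a half-written page.
    directory = os.path.dirname(os.path.abspath(out_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        fh.write(page)
    os.replace(tmp_path, out_path)
    return len(rows)
```

The web server then serves the generated file directly, and the database only sees one query per regeneration instead of one per visitor.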

    1. Re:database is the bottleneck by jbplou · · Score: 1

      This should be qualified: the database can be the bottleneck. Depending on how often the data is updated, he may be able to cache a large amount of the data on the webservers. A .NET example: caching a commonly hit data table or object collection, even for just a minute or 2, can cut out a large amount of database traffic.

    2. Re:database is the bottleneck by Anonymous Coward · · Score: 0

      For caching mechanisms take a look at Quickshift, especially for MS SQL Server performance. I guarantee you'll NEVER have an I/O related performance issue. Great technology, no one has heard of them yet, based in Austin, Tx, but will be a big winner re: instant SQL Server Performance improvements.

      www.quickshift.com

    3. Re:database is the bottleneck by golgotha007 · · Score: 0, Troll

      First of all, any person that talks about 'hits per day' isn't someone who works with high traffic websites. To folks like us, it's all about page views, uniques and sometimes impressions.

      we have 4 admins for the mysql servers.
      What the hell for?

      each has a hotswapable copy of the entire system running at another data center.
      with proper failover, this is pointless and wasteful.

      Cron static pages off the database when possible
      I couldn't agree more.

      I'm doing an average of 250k page views per day. My website is heavily database driven (all reads, almost no writes, but with some complex queries). Here are the specs:
      Connection: Dual Gigabit Ethernet (1000Mbps) fiber-optic connections to AboveNet and XO Communications.
      2.4GHz single processor P4.
      1GB RAM
      mysql-4.1.3
      php-4.3.8
      Apache/2.0.51

      I'm doing more traffic than you, and it's all done on a single processor, single machine. This machine is so bored, the load never breaks 0.10.

      I actually have another machine ready to act as a secondary webserver, but I just don't need it! Right now, I'm just using it for backups and failover.

    4. Re:database is the bottleneck by sevinkey · · Score: 1

      the main reason for the redundant system is in the event Level3 loses connectivity. We provide billing services for 40,000 websites, and we can't ever have the system go down for more than 5 minutes. Our customer support department gets swamped in a hurry.

      The 4 database admins are for the 7 different clusters, and these systems do more than provide for webpages, they do spidering for site changes, password management, etc, etc.

      so I suppose this kind of a system is beyond the scope of the question asked.

    5. Re:database is the bottleneck by onepoint · · Score: 1

      >>First of all, any person that talks about 'hits per day' isn't someone who works with high traffic websites. To folks like us, it's all about page views, unique and sometimes impressions

      Sad part about your statement is, that due to lack of industry standardization people confuse all the terms.

      I myself prefer to use the terms that were published back in 2000 by a company called NetGenesis. Written by Matt Cutler, the work is titled "E-Metrics: Business Metrics for the New Economy". Here is the link: http://www.emetrics.org/articles/whitepaper.html

      I use the above paper as a point of reference so that everyone can understand the terms I speak in.

      Onepoint

      --
      if you see me, smile and say hello.
  13. Foundry ServerIrons by bahamat · · Score: 1

    We use Foundry ServerIrons. We have two of them set up in an active/standby configuration. We've got approximately 35 servers (5 services) load balanced between the SI's, and average traffic over 50Mb/s just to those services. The SI's are very robust, and I'm quite pleased we got them.

  14. Quizilla.com by Xunker · · Score: 1

    I run quizilla.com, a pseudo-entertainment site that does 60-70 million pages a month, at least 2/3rds being dynamic database backed.

    The site faq has the gritty details, but basically everything is running on 8 web servers with a cluster of 4 database servers. Mod_perl is used for the most highly trafficked pages, though some less used pages are still static CGIs.

    For the way I have it set up, this farm has reached its limit with the web servers getting pegged pretty constantly during peak hours, and the database servers aren't far behind (mostly due to lack of ram).

    The site makes heavy use of Memcached as well as a homebrew ghetto load balancing system based on apache mod_rewrite and some ancillary code.

    If I had my druthers, I'd keep the number of machines but have the web heads be 2.8-3ghz Xeons or Opterons with 1.5 GB ram each and the database servers could be dual 1.8ghz xeons with at least 3GB ram each. Ideally memcache would be at least 2GB, but more is always better. From my guess, a setup like that would run my site at 100mil quick pages a month, instead of like now where pages often take 5 seconds or more.

    One big thing that you don't really notice until you try to make things on this scale is that optimization is king -- optimize the hell out of your code. A stray regex might not look expensive, but when it's happening twenty times a second on every machine it quickly adds up.

    Code is almost always the weakest link in a big cluster in that seldom are things sufficiently planned -- I've had huge growing pains since I never planned on scaling past one machine, so when I had to move to 2, 3, 4 and up to 8 it has been a real hassle making things work "right" in a massive cluster. Plan for clustering from the get-go if you even have the slightest inkling it will do high traffic volumes.

    --
    Hilary Rosen's speech was about her love of money and her desire to roll around naked in a pile of money.
    1. Re:Quizilla.com by Peter+Cooper · · Score: 1

      And not just logic code either, but stuff like SQL makes a massive difference if your app is DB heavy.

      In some applications there may be a valid argument for denormalization to reduce load. In others, the SQL was cobbled together by someone without the adequate experience and it's pounding away at the database (one such occurrence is when using subselects with MySQL 4.1.x.. it can prove significantly faster to split out subselects and pass them into another DB call using regular code.. since MySQL does not properly optimize them).
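A sketch of what "split out subselects and pass them into another DB call using regular code" can look like. It runs against SQLite from the standard library purely so it is self-contained; the `users` and `posts` tables and the function name are hypothetical, and whether the split is actually faster depends on the database and version, so measure before and after.

```python
import sqlite3
from typing import List, Tuple


def posts_by_active_users(conn: sqlite3.Connection) -> List[Tuple[int, str]]:
    """Equivalent of:
    SELECT id, title FROM posts
    WHERE user_id IN (SELECT id FROM users WHERE active = 1)
    but issued as two separate queries.
    """
    # Step 1: run what would have been the subselect on its own.
    user_ids = [row[0] for row in conn.execute(
        "SELECT id FROM users WHERE active = 1")]
    if not user_ids:
        return []  # an empty IN () list is not valid SQL, so stop early

    # Step 2: feed the ids to the outer query as bound parameters.
    placeholders = ",".join("?" for _ in user_ids)
    sql = ("SELECT id, title FROM posts WHERE user_id IN (%s) ORDER BY id"
           % placeholders)
    return conn.execute(sql, user_ids).fetchall()
```

For very long id lists the second query would need to be chunked, since databases cap the number of bound parameters.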

  15. Your question cannot be answered by Guspaz · · Score: 3, Insightful

    It is impossible to answer your question unless you define "heavy" traffic.

    Some people might consider a hundred thousand pageviews per day to be heavy. Others might consider a million pageviews per day to be heavy.

    From experience a hundred thousand for a reasonable application can be handled on one server. A million would probably require 2 to 4.

    1. Re:Your question cannot be answered by Anonymous Coward · · Score: 0

      .. And some of us would just smile at those numbers, thinking about the approximately 8000 hits/sec our servers receive at the moment.

      One hundred thousand pageviews on one server? That's a tad low.. only about 1.1 pageview per second. If the code is reasonably well coded - you should get that up to at least 10 per second.
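The arithmetic behind that figure, as a small sketch. It is just the daily total divided by the 86,400 seconds in a day, so it gives an average rate and nothing more.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_per_second(pageviews_per_day: float) -> float:
    """Average request rate, assuming traffic were spread evenly over the day."""
    return pageviews_per_day / SECONDS_PER_DAY


def per_day(pageviews_per_second: float) -> float:
    """The reverse conversion: a sustained rate expressed as a daily total."""
    return pageviews_per_second * SECONDS_PER_DAY


for daily in (100_000, 1_000_000):
    print("%9d/day is %.2f/sec on average" % (daily, average_per_second(daily)))
```

Going the other way, a sustained 10 pageviews per second works out to 864,000 per day.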

    2. Re:Your question cannot be answered by dubl-u · · Score: 2, Insightful

      It is impossible to answer your question unless you define "heavy" traffic.

      Amen to that.

      Step one is to figure out what you mean by heavy traffic. Slashdot is probably at a couple million pageviews per day, and Alexa tells us that there are nearly 1500 sites bigger. A top-10 site will get circa 1000x what Slashdot gets.

      In step two, figure out what kind of traffic you're dealing with. Most of Slashdot's page views are probably just hits on the front page or current article by guests, so they can be heavily cached. I'd guess maybe 15% of Slashdot's page views are ones that need to be seriously dynamic. That's a bonus, as even a commodity server these days can give you quite a lot of static traffic. And it's important to think about what kind of static content you're serving. Slashdot's is mostly HTML, and you'll do things very differently for a media-heavy site like Flickr or Atom Films, and very differently again for something like Orbitz or Base Camp.

      Step three is to start asking yourself some serious questions about what kind of data you have, where it will live, how much it gets changed, what kind of transactional integrity you need to have, what kind of reliability you're willing to pay for, and how it will get to the places that need to serve it up.

      Step four is to think broadly about the possible architectures. Yes, your average web site is basically an engine for turning HTTP requests into SQL queries, and turning SQL result sets into HTML. But there are many more ways of storing, managing, and rendering your data than that, many of which have radical performance implications. A great example is Google's architecture; if they'd tried to build it with a standard web approach, they'd be six or eight orders of magnitude poorer.

      Then in step five, build a cartoon version of your architecture and test it until it bleeds. Even better, build models of your top three architectures and see how they work. The only way you'll know if you can take massive load is to take massive load. Yes, this can be a pain to set up, but it's much, much less pain than you'll feel when a few hundred thousand people watch your site fail.

      And then for the last step, build your site incrementally, regularly testing performance as you go. Suppose it takes you six months to build it. If you save all your testing until the end, you've got six months of code to dig through to find the culprits, and six months during which you might have baked in an assumption that leaves you screwed. If you start out small and add to your test suite over time, you're much more likely to find problems when they're small and cheap to fix.

      And since this is Slashdot, I'll add step 7: Profit!

    3. Re:Your question cannot be answered by Anonymous+Cow+herd · · Score: 1

      Just want to add my support for this post. Until you've answered the first three questions, the solution can vary from a single box with an appropriately-sized pipe to a full N-tier load-balanced architecture. I was going to add something as well, but now I've forgotten it. :-P

      --
      Ita erat quando hic adveni.
    4. Re:Your question cannot be answered by Guspaz · · Score: 1

      That's exactly what I mean though. Heavy traffic means different things to different people, and you can't just say "What do I need to handle heavy traffic" and get answers that are relevant.

      Not ONLY does it depend on how MUCH heavy traffic is, it depends on WHAT you're doing. A simple page that makes a few database queries is going to be a lot faster than a complex page with a bunch of very complex queries that does a lot of mangling with the data returned.

      It's dangerous thinking, though, to take the number of pageviews one gets in a day and divide by number of seconds to get a pageview-per-second value. Bursts of visitors and hourly traffic patterns make such calculations useless.

      I was dealing with something like 150k pageviews per day. I considered that heavy at the time, because in my circumstances it was; it was eating up about 2000GB/mth of traffic, and during peak times was using up most of my serving resources. But look at somebody in your position, who receives 8000 hits per second, which to you is heavy traffic.

      You're really supporting my point, which is that heavy means different things to different people doing different things ;)
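To put a number on why the daily average misleads, here is a sketch that sizes for the busiest hour instead. The hourly traffic shape is an assumption you would take from your own logs; the 10% figure in the note below is made up for illustration.

```python
from typing import Sequence


def peak_hour_per_second(pageviews_per_day: float,
                         hourly_shares: Sequence[float]) -> float:
    """Average rate during the busiest hour.

    hourly_shares holds one weight per hour of the day (any scale); the
    busiest hour's fraction of the total decides the result.
    """
    total = sum(hourly_shares)
    busiest_fraction = max(hourly_shares) / total
    return pageviews_per_day * busiest_fraction / 3600.0
```

For 150k pageviews per day, a flat profile gives about 1.7 per second, but if the busiest hour carries 10% of the day's traffic the rate in that hour is about 4.2 per second. Bursts inside the hour push the real peak higher still.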

  16. Ultramonkey + LVS-Kiss + Mon by Plake · · Score: 2, Informative

    At my work we use Ultramonkey with LVS-kiss and Mon.

    Our hardware infrastructure includes 2 load-balancers running in a failover system with 3 web servers in the backend (1.8ghz, 512ram, 40gig hdd, 100mbps network) systems. That hosts over 60 million page views a month, it also supports real-time failover. For monitoring there are tools out there that use MRTG/RRD for cluster statistics.
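The basic idea of balancing across a few backends with failover can be sketched as round-robin selection that skips servers a monitor has marked down. This is not how LVS or Ultramonkey is implemented; `FailoverPool` and the backend names are hypothetical and only illustrate the behaviour.

```python
from typing import Dict, List


class FailoverPool:
    """Round-robin over backends, skipping any that are marked unhealthy."""

    def __init__(self, backends: List[str]) -> None:
        if not backends:
            raise ValueError("need at least one backend")
        self._backends = list(backends)
        self._healthy: Dict[str, bool] = {b: True for b in self._backends}
        self._next = 0

    def mark(self, backend: str, healthy: bool) -> None:
        # A monitor would call this after each health check.
        self._healthy[backend] = healthy

    def pick(self) -> str:
        # Try each backend at most once per call, starting where we left off.
        for _ in range(len(self._backends)):
            candidate = self._backends[self._next]
            self._next = (self._next + 1) % len(self._backends)
            if self._healthy[candidate]:
                return candidate
        raise RuntimeError("no healthy backends")
```

When a web server fails its check, requests simply flow to the remaining ones; marking it healthy again puts it back in rotation.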

  17. Obvious answer... by ebrandsberg · · Score: 2, Informative

    Check out http://www.netscaler.com/>. The companies behind the top 10 websites on the internet have, maybe you should too.

    Disclaimer, I work for Netscaler, but the customers we have gained should help in your decision.

    1. Re:Obvious answer... by ebrandsberg · · Score: 1

      Maybe I should have used preview. http://www.netscaler.com/

  18. What the tests prove. by NTT · · Score: 1

    It's somewhat dated but the FUD busting response to the Mindcraft fiasco has all the formulas on how to figure out what hardware you need for your pipe. You only need to plug in current processor specs to see what you need. I could only find it in the archives: http://web.archive.org/web/20040409223206/http://cs.alfred.edu/~lansdoct/mstest.html

    1. Re:What the tests prove. by dubl-u · · Score: 1

      You only need to plug in current processor specs to see what you need.

      That's probably not true. The processor is only one component in a system, and it's often not the bottleneck. Also, since then there have been substantial changes in web servers, kernels, and all sorts of hardware that goes around the processor. Further, that page only talks about static content; it doesn't tell you anything about the dynamic content.

      Ordering hardware based on theoretical calculations from formulas in six-year-old articles is asking for trouble. A better approach is to set up something akin to your current app on whatever hardware you have handy, throw a ton of load at it, and see where your bottlenecks are.
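A bare-bones sketch of "throw a ton of load at it and see": call something many times from several threads and look at throughput and latency. `load_test` and `timed_call` are hypothetical helpers; in practice `fn` would fetch a URL on a test box, and a dedicated load-testing tool would do this far more thoroughly.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict


def timed_call(fn: Callable[[], object]) -> float:
    """Run fn once and return how long it took, in seconds."""
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start


def load_test(fn: Callable[[], object], total_requests: int,
              concurrency: int) -> Dict[str, float]:
    """Call fn total_requests times from `concurrency` threads and summarize."""
    if total_requests < 1:
        raise ValueError("total_requests must be at least 1")
    started = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = sorted(pool.map(timed_call, [fn] * total_requests))
    elapsed = time.perf_counter() - started
    p95_index = min(len(latencies) - 1, int(0.95 * len(latencies)))
    return {
        "requests_per_second": total_requests / elapsed,
        "median_seconds": latencies[len(latencies) // 2],
        "p95_seconds": latencies[p95_index],
    }
```

Raising `concurrency` until the 95th percentile latency starts climbing is a crude way to find where the system under test saturates.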

  19. Pound by psyconaut · · Score: 1

    As previously mentioned, Pound is a wicked, lean load balancer/HA arbitrator that runs well on Linux.

    -psy

  20. Outsource to geocities by digitalgimpus · · Score: 1

    Outsource to geocities /ducks

  21. 1M pages/day on a single server by jbellis · · Score: 1

    Hmm... would your experience be MySQL-based? :P

    Carnage Blender serves about 1M pages a day off a single (dual 2.4GHz xeon; 4 GB ram) machine. Those are database-backed pages, with a lot more updates than most read-only-ish sites.

    Powered by postgresql. And a lot of tuning.

    1. Re:1M pages/day on a single server by Guspaz · · Score: 1

      My experience was indeed MySQL based. But it was a very low-end server.

      When I was worrying about such things, it was cheaper to get 4 low-end servers than one high-end as you described. Of course it's a bitch to manage multiple servers instead of just one.

      And of course 100k pageviews per day (Which eventually grew to 300k on a different server) simply didn't justify anything more than a low-end server.

  22. Nothing like a Hardware-based load-balancer by forged · · Score: 1
    So Slashdot uses an Arrowpoint Content Switch from circa 2000, but apart from the name change to Cisco and some technology updates, the same basic lineup still makes up Cisco's portfolio.

    Some examples here. The examples are heavy on Corporate speak, but you were asking about a large Web/Content architecture, right?

  23. Popular sites configs by quenting · · Score: 1

    This page shows the server specifications for some of the busiest message boards on the internet, along with what software they use. You'll see the configurations are quite disparate, from the 90+ servers serving the anime fans at Gaia Online on 100% open source software, to the sometimes single server hosting some of the other top 50 forums.

    1. Re:Popular sites configs by WebHostingGuy · · Score: 1

      I like this entry:

      Number 958
      Spammers Heaven
      Served by Shared Hosting
      Apache/1.3.27, Red-Hat/Linux, PHP/4.3.3
      phpBB

      Can't they afford a dedicated server yet?

      --
      Quality Hosting e3 Servers
  24. Round Robin DNS by llzackll · · Score: 1

    Hey, it's one method.

  25. LVS/ipvs - during peak 4 servers 8 GB/day each by Anonymous Coward · · Score: 0

    I was managing a site that received multiple millions of hits per day: 4 apache mod_perl servers, a mysql backend (it was getting hammered at 1500 qps, I think, because we tracked lots of user stats for the spreadsheets for the higher-ups), and a 2 TB NFS NAS RAID thing for customer image files. Each server was pumping out 8+ GB per day.

    The LVS "director" server was constantly at a load level of 0.0X.

    You could run a large site with relatively little hardware. LVS works better than great and the hardware loadbalancer people are praying that more people don't find out about it.

    About the only thing "normal" people wouldn't have access to is the bandwidth - everything else you could get used off ebay or craigslist.

  26. wildly off topic by Clover_Kicker · · Score: 1

    Hey man, I love Carnage Blender!

    I had to stop playing because it was too addictive, but you've got a cool thing there.

  27. Hows about.... by Anonymous Coward · · Score: 0

    ....a Beowulf cluster? :p

  28. Take a look at F5 by arnie_apesacrappin · · Score: 1
    I've worked with the Cisco CSS, Foundry ServerIron and F5 solutions and I fell in love when I got the F5s. I haven't given the other two another chance since I switched.

    But before I switched, I got demos from all three players and put them in a head-to-head contest. I would suggest doing the same. In a lab setting, we couldn't hit the devices hard enough to pick a clear winner based on performance. When I looked at administration and features, the F5 pulled to the front.

    The GUI is clear and concise for those who like GUI's. The OS in the old version (4.X) was BSD, while the new version (9.X) is Linux based. You get full root access to the box, so you can write scripts in the shell language of your choice. The last time I worked with the CSS it had a proprietary scripting language; it may be different now.

    The power for me as a network guy is the scripting. I can rebuild our entire site in about 30 seconds thanks to the scripts. I've also been able to move some of the administration to admins that normally wouldn't be qualified to work with a load balancer because I can give them scripts and restrict their access so that they can only run the scripts I specify.

    SSL proxies are a cool feature as well. It's not documented, but there is an easy process to convert IIS certificates to apache-style certificates for use on the F5. It helps a lot when you are adding/removing servers on a regular basis.

    Our site takes enough hits to keep 4 servers loaded all day long. The load average on the F5 never hits 1.0. This is with about 40 Mbps of traffic running through the F5.

    --

    Still, with a plan, you only get the best you can imagine. I'd always hoped for something better than that. -CP

    1. Re:Take a look at F5 by Anonymous Coward · · Score: 0

      Another vote for F5 Load balancers/ssl accelerators. We have about a dozen running a couple hundred sites.

  29. Get a hardware load-balancer by gtrubetskoy · · Score: 1


    One mistake that I see lots of people make is to use a PC-based load-balancer. A hardware device (Foundry ServerIron, Nortel Alteon, Cisco CSM, etc) is well worth the money (especially if you get it on ebay).

  30. A few articles from AnandTech.com by shakah · · Score: 1
    AnandTech.com has had a few articles on their site setup, e.g.: There are a few on that site about database server performance, too...
  31. Zeus ZXTM by chrysalis · · Score: 1

    Our web site serves about 3 million pages a day on two gigabit links. The balancing rules are quite complex since the content is split across multiple servers and SANs. We use software load balancers: Zeus ZXTM on Gentoo Linux. The nice thing about software load balancers is that you can easily replace the hardware if it fails. Having a spare PC is way cheaper than a spare load balancer. We are very pleased with ZXTM so far. Very reliable, fast, and very flexible. It uses a PHP-like scripting language to process requests and you can really handle any specific backend architecture with that.

  32. There isn't one answer. by jafo · · Score: 1

    The answer to this question depends entirely on how heavy each request you serve is. If you are just throwing together some PHP code that pulls information from a few databases, possibly updating some others, on every hit, it could require quite a large number of machines to handle the load. If you are clever, effectively making the results static pages, it may take very few systems.

    A good starting place is to just measure it: test how long it takes to serve a page like the ones you expect to be publishing, and come up with an average time per request. Multiply this by the number of visitors you expect per second and you get a rough idea of the number of machines you need to handle the load. Very rough, but it's a starting place.
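That back-of-the-envelope estimate, sketched as code. The time per request and the request rate come from your own measurements; `workers_per_machine` and `target_utilization` are assumptions added here so the result leaves some headroom, and the answer is only as good as those guesses.

```python
import math


def machines_needed(requests_per_second: float,
                    seconds_per_request: float,
                    workers_per_machine: int = 1,
                    target_utilization: float = 0.5) -> int:
    """Very rough server count from measured service time and expected rate."""
    # Rate times service time is the average number of requests in progress,
    # i.e. how many workers are busy at any moment.
    busy_workers = requests_per_second * seconds_per_request
    # Only plan to use part of each machine, to leave room for bursts.
    capacity_per_machine = workers_per_machine * target_utilization
    return max(1, math.ceil(busy_workers / capacity_per_machine))
```

For example, 40 requests per second at 0.25 seconds each keeps 10 workers busy; with 4 workers per machine run at 50% that suggests 5 machines.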

    For example, the last time my site got slashdotted, users were hitting a page that is generated from a database. Through clever design, these pages get cached quite heavily, so only the first view requires 200-ish ms to generate. After that, unless the database is updated, the pages require more like 10ms to serve. Serving a static page through Apache requires around 8ms.

    During the heaviest hit period, there was only around 4% CPU load on the server. Without the caching, this probably would have been more like 80%.
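The 4% versus 80% figures line up with the timings quoted above if you assume CPU load scales with the time spent per request and that nearly every request during the spike was a cache hit. A sketch of that arithmetic (both function names are hypothetical):

```python
def average_service_ms(hit_ratio: float, hit_ms: float = 10.0,
                       miss_ms: float = 200.0) -> float:
    """Average time per request for a given fraction of cache hits."""
    return hit_ratio * hit_ms + (1.0 - hit_ratio) * miss_ms


def uncached_cpu_estimate(cached_cpu_percent: float, hit_ms: float = 10.0,
                          miss_ms: float = 200.0) -> float:
    """Scale observed CPU load by the cost ratio of a miss to a hit."""
    return cached_cpu_percent * (miss_ms / hit_ms)
```

A 200 ms build is 20 times the cost of a 10 ms cached response, and 20 times 4% is 80%. Even a 50% hit ratio would bring the average down from 200 ms to 105 ms per request.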

    Nobody can tell you what the answer to this question will be for your situation. I can tell you that everyone plans for their new site to be as popular as slashdot, but would remind you that trying to come out of the chute able to handle the load of slashdot is probably a waste of time and money. Sure, if you have a few hundred thousand dollars to spend on hardware you can happily build up an infrastructure that will handle huge loads. However, if it takes a year or two for those loads to come, at that time you can buy the same computing horsepower for probably half what it costs today. In the meantime, why don't you take the time you would have spent architecting this massive database cluster and set of apache workers, and spend it on content and marketing for your site instead?

    It's easy to spend time on the geeky network and computing parts of a design, as a geek, but the marketing and content side is the one that's most likely to make it a slashdot.

    Sean