Slashdot Mirror


High-Performance Web Server How-To

ssassen writes "Aspiring to build a high-performance web server? Hardware Analysis has an article posted that details how to build a high-performance web server from the ground up. They tackle the tough design choices and what hardware to pick and end up with a web server designed to serve daily changing content with lots of images, movies, active forums and millions of page views every month."

42 of 281 comments

  1. High-performance web server by quigonn · · Score: 5, Informative

    I'd suggest that everybody who needs a high-performance web server try out
    fnord. It's extremely small and pretty fast (without any special performance hacks!), see here.

    --
    A monkey is doing the real work for me.
    1. Re:High-performance web server by Electrum · · Score: 4, Informative

      Yep. fnord is probably the fastest small web server available. There are basically two ways to engineer a fast web server: make it as small as possible to incur the least overhead or make it complicated and use every possible trick to make it fast.

      If you need features that a small web server like fnord can't provide and speed is a must, then Zeus is probably the best choice. Zeus beats the pants off every other UNIX web server. Its "tricks" include non-blocking I/O, linear scalability with regard to number of CPUs, platform-specific system calls and mechanisms (acceptx(), poll(), sendpath, /dev/poll, etc.), sendfile() and sendfile() cache, memory and mmap() file cache, DNS cache, stat() cache, multiple accept() per I/O event notification, tuning the socket buffers, disabling Nagle, tuning the listen queue, SSL disk cache, log file cache, etc.

      Which design is better? Depends on your needs. It is quite interesting that the only way to beat a really small web server is to make one really big that includes everything but the kitchen sink.
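
A few of the socket-level tricks listed above (non-blocking I/O, disabling Nagle, tuning the socket buffers and the listen queue) can be sketched with Python's standard socket module. This is only an illustration of the options and says nothing about how Zeus implements them; the function name, buffer size and backlog are made-up values.

```python
import socket

def make_tuned_listener(host="127.0.0.1", port=0, backlog=1024, sndbuf=256 * 1024):
    """Create a non-blocking listening socket with a few common tweaks.

    All default values are illustrative assumptions, not recommendations.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    # Disable Nagle's algorithm. Whether accepted connections inherit this
    # from the listener is platform-dependent, so a real server may need to
    # set it again on each accepted socket.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    # Ask the kernel for a larger send buffer (it may round or cap the value).
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, sndbuf)
    sock.bind((host, port))
    # A deeper listen queue lets more pending connections wait for accept().
    sock.listen(backlog)
    # Non-blocking mode: accept()/recv()/send() return immediately, so one
    # process can multiplex many connections with select/poll/epoll.
    sock.setblocking(False)
    return sock
```

Port 0 asks the operating system for any free port, which keeps the sketch runnable without special privileges.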

  2. But any web server is high-performance by Ed+Avis · · Score: 5, Insightful

    Computer hardware is so fast relative to the amount of traffic coming to almost any site that any web server is a high-performance web server, if you are just serving static pages. A website made of static pages would surely fit into a gigabyte or so of disk cache, so disk speed is largely irrelevant, and so is processor speed. All the machine needs to do is stuff data down the network pipe as fast as possible, and any box you buy can do that adequately. Maybe if you have really heavy traffic you'd need to use Tux or some other accelerated server optimized for static files.

    With dynamically generated web content it's different of course. But there you will normally be fetching from a database to generate the web pages. In which case you should consult articles on speeding up database access.

    In other words: an article on 'building a fast database server' or 'building a machine to run disk-intensive search scripts' I can understand. But there is really nothing special about web servers.

    --
    -- Ed Avis ed@membled.com
    1. Re:But any web server is high-performance by khuber · · Score: 5, Insightful
      With dynamically generated web content it's different of course. But there you will normally be fetching from a database to generate the web pages. In which case you should consult articles on speeding up database access.

      I'm just a programmer, but don't big sites put caching in front of the database? I always try to cache database results if I can. Honestly, I think relational databases are overused, they become bottlenecks too often.

      -Kevin
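
Caching database results in front of the database could look roughly like the sketch below. The class name, the 60-second lifetime and the loader callback are assumptions for illustration; a real site would also need invalidation on writes and a size limit.

```python
import time

class TTLCache:
    """Tiny time-based cache for query results (illustrative sketch)."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock          # injectable so the expiry can be tested
        self._store = {}            # key -> (time stored, value)

    def get_or_load(self, key, loader):
        now = self.clock()
        entry = self._store.get(key)
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1]         # fresh enough: skip the database
        value = loader(key)         # stale or missing: run the real query
        self._store[key] = (now, value)
        return value

def fake_query(key):
    """Stand-in for an expensive database query (hypothetical)."""
    return f"rows for {key}"

cache = TTLCache(ttl_seconds=60)
front_page = cache.get_or_load("front_page", fake_query)   # hits the "database"
front_page = cache.get_or_load("front_page", fake_query)   # served from memory
```

The trade-off is staleness: readers may see data up to `ttl_seconds` old, which is usually acceptable for pages that change a few times a day.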

    2. Re:But any web server is high-performance by NineNine · · Score: 5, Insightful

      Good databases are designed for performance. If databases are your bottleneck, then you don't know what you're doing with the database. Too many people throw up a database, and use it like it's some kind of flat file. There's a lot that can be done with databases that the average hack has no idea about.

    3. Re:But any web server is high-performance by NineNine · · Score: 4, Insightful

      You're absolutely right. Wish I had some mod points left...

      Hardware only comes into play in a web app when you're doing very heavy database work. Serving flat pages takes virtually no computing effort. It's all bandwidth. Hell, even scripting languages like ASP, CF, and PHP are light enough that just about any machine will work great. The database though... that's another story.

    4. Re:But any web server is high-performance by jimfrost · · Score: 5, Interesting
      As you say, databases are usually the bottleneck in a high-volume site. Contrary to what Oracle et al want you to believe, they still don't scale and in many cases it's not feasible to use a database cluster.

      Big sites, really big sites, put caching in the application. The biggest thing to cache is session data, easy if you're running a single box but harder if you need to cluster (and you certainly do need to cluster if you're talking about a high-volume site; nobody makes single machines powerful enough for that). Clustering means session affinity and that means more complicated software. (Aside: Is there any open source software that manages session affinity yet? )
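
One simple way to get session affinity is to hash the session ID onto the list of application servers, so the same session always lands on the box that holds its cached data. This is a minimal sketch under that assumption; the function name and server names are hypothetical, and it is not a description of any particular load balancer.

```python
import hashlib

def pick_backend(session_id, backends):
    """Map a session ID to one backend, always the same one for that ID."""
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as a stable integer.
    index = int.from_bytes(digest[:8], "big") % len(backends)
    return backends[index]

app_servers = ["app1", "app2", "app3"]           # hypothetical host names
first = pick_backend("session-42", app_servers)
second = pick_backend("session-42", app_servers)  # same server as `first`
```

Plain modulo hashing reshuffles most sessions whenever a server is added or removed; consistent hashing is the usual refinement when that matters.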

      Frankly speaking, Intel-based hardware would not be my first choice for building a high-volume site (although "millions of page views per month" is really only a moderate volume site; sites I have worked on do millions per /day/). It would probably be my third or fourth choice. The hardware reliability isn't really the problem, it can be good enough, the issue is single box scalability.

      To run a really large site you end up needing hundreds or even thousands of Intel boxes where a handful of midrange Suns would do the trick, or even just a couple of high-end Suns or IBM mainframes. Going the many-small-boxes route your largest cost ends up being maintenance. Your people spend all their time just fixing and upgrading boxes. Upgrading or patching in particular is a pain in the neck because you have to do it over such a broad base. It's what makes Windows very impractical as host for such a system; less so for something like Linux because of tools like rdist, but even so you have to do big, painful upgrades with some regularity.

      What you need to do is find a point where the box count is low enough that it can be managed by a few people and yet the individual boxes are cheap enough that you don't go broke.

      These days the best machines for that kind of application are midrange Suns. It will probably be a couple of years before Intel-based boxes are big and fast enough to realistically take that away ... not because there isn't the hardware to do it (though such hardware is, as yet, unusual) but because the available operating systems don't scale well enough yet.

      --
      jim frost
      jimf@frostbytes.com
    5. Re:But any web server is high-performance by jimfrost · · Score: 5, Informative
      I've seen both kinds and take it from me, many small servers are more of a headache than the hardware cost savings are worth. Your network architecture gets complicated, you end up having to hire lots of people just to keep the machines running and with up-to-date software, and database connection pooling becomes a lot less efficient.

      You save money in the long run by buying fewer, more powerful machines.

      --
      jim frost
      jimf@frostbytes.com
    6. Re:But any web server is high-performance by jimfrost · · Score: 5, Insightful
      Yea, big Suns are too expensive and you do need to keep the server count high enough that a failure or system taken down for maintenance isn't a really big impact on the site. I mentioned in a different posting that my cut on this is that the midrange Suns, 4xxx and 5xxx class, provide good bang-for-the-buck for high-volume sites.

      Beware of false economy when looking at hardware. While it's true that smaller boxes are cheaper, they still require about the same manpower per box to keep them running. You rapidly get to the point where manpower costs dwarf equipment cost. People are expensive!

      Capacity is an issue. We try to plan for enough excess at peak that the loss of a single server won't kill you, and hope you never suffer a multiple loss. Unfortunately most often customers underequip even for ordinary peak loads, to say nothing of what you see when your URL sees a real high load.[1] They just don't like to spend the money. I can see their point, the machines we're talking about are not cheap; it's a matter of deciding what's more important to you, uptime and performance or cost savings. Frankly most customers go with cost savings initially and over time (especially as they learn what their peak loads are and gain experience with the reliability characteristics of their servers) build up their clusters.

      [1] People here talk about the slashdot effect, but trust me when I tell you that that's nothing like the effect you get when your URL appears on TV during "Friends".
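
The "survive the loss of a single server at peak" rule reduces to simple arithmetic. The sketch below is one way to write it down; the request rates are invented numbers, not figures from any site mentioned here.

```python
import math

def servers_needed(peak_requests_per_sec, per_server_capacity, spare=1):
    """Servers required to carry peak load with `spare` machines down."""
    base = math.ceil(peak_requests_per_sec / per_server_capacity)
    return base + spare

# Assumed example: 900 requests/s at peak, 200 requests/s per server.
cluster_size = servers_needed(900, 200)        # 5 to carry the load + 1 spare
no_headroom = servers_needed(900, 200, spare=0)
```

With `spare=0` the cluster is sized exactly for peak, which is the underequipped case described above: any single failure at peak overloads the remaining machines.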

      --
      jim frost
      jimf@frostbytes.com
    7. Re:But any web server is high-performance by jimfrost · · Score: 5, Interesting
      If you're just serving static pages you're right. If you're doing dynamic content then you're wrong.

      But 2.5 million hits a day is still just a moderate volume site to me. One of the sites I worked on sees in excess of a hundred million hits per day these days; it was up over ten million hits per day back in 1998.

      I don't happen to know what Slashdot does for volume, but Slashdot is a very simplistic site when it comes to content production. Each page render doesn't take much horsepower and sheer replication can be used effectively. Things get more complicated when you're doing something like trying to figure out what stuff a user is likely to buy given their past buying history and/or what they're looking at right now.

      If you really think a 4-way Intel box is equivalent to a 12-way Sun, well, it's clear you don't know what you're talking about. You're wrong even if all you're talking about is CPU, and of course I/O bandwidth is what makes or breaks you -- and there's no comparison in that respect.

      --
      jim frost
      jimf@frostbytes.com
    8. Re:But any web server is high-performance by Matey-O · · Score: 5, Insightful

      I think the big problem here is the tendency to DBify EVERYTHING POSSIBLE.

      Like the State field in an online form.

      Every single hit requires a tag to the databases. Why?

      Because, heck if we ever get another state, it'll be easy to update! Ummm, that's a LOT of cycles used for something that hasn't happened in, what, 50 years or so. (Hawaii, 1959)
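
If the list of states does have to live in the database, it can still be read once and then served from memory. A minimal sketch, assuming a hypothetical `fetch_states_from_db()` that stands in for the real query (the three-entry table is a truncated sample):

```python
from functools import lru_cache

def fetch_states_from_db():
    """Stand-in for the real database query (hypothetical)."""
    return {"AK": "Alaska", "HI": "Hawaii", "NY": "New York"}  # truncated sample

@lru_cache(maxsize=None)
def state_table():
    # The body runs on the first call only; later calls reuse the result.
    return fetch_states_from_db()

def state_name(code):
    """Look up a state name without touching the database per hit."""
    return state_table().get(code.upper())

name = state_name("hi")   # "Hawaii", and no query on subsequent lookups
```

Picking up a new state then means restarting the process or calling `state_table.cache_clear()`, which is cheap for something that changes this rarely.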

      --
      "Draco dormiens nunquam titillandus."
    9. Re:But any web server is high-performance by PhotoGuy · · Score: 4, Interesting
      A key question someone needs to ask themselves when storing data in a relational database, is "is this data really relational"?

      In a surprising number of cases, it really isn't. For example, storing user preferences for visiting a given web page; there is never a case where you need to relate the different users to each other. The powerful aggregation abilities of relational databases are irrelevant, so why incur the overhead (performance-wise, cost-wise, etc.)?

      Even when aggregating such information is useful, I've often found off-line duplication of the information to databases (which you can then query the hell out of, without affecting the production system) a better way to go.

      If a flat file will do the job, use that instead of a database.
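
For the user-preferences case, a flat-file store can be as small as one JSON file per user. This is a sketch under stated assumptions: the function names and directory layout are invented, and `user_id` is assumed to be a trusted, filename-safe identifier.

```python
import json
from pathlib import Path

def save_prefs(directory, user_id, prefs):
    """Write one user's preferences to <directory>/<user_id>.json."""
    path = Path(directory) / f"{user_id}.json"
    tmp = path.parent / (path.name + ".tmp")
    tmp.write_text(json.dumps(prefs), encoding="utf-8")
    # Rename over the old file so readers never see a half-written file
    # (atomic when both paths are on the same filesystem).
    tmp.replace(path)

def load_prefs(directory, user_id):
    """Read one user's preferences; an unknown user gets an empty dict."""
    path = Path(directory) / f"{user_id}.json"
    try:
        return json.loads(path.read_text(encoding="utf-8"))
    except FileNotFoundError:
        return {}
```

Each lookup touches exactly one small file, and the files can be copied off-line into a database later if aggregate queries are ever needed.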

      --
      Love many, trust a few, do harm to none.
  3. gee, i wonder.. by Anonymous Coward · · Score: 5, Funny

    .. if their webservers are as reliable as the ones in the article..
    i guess there's only one way to find out..

    slashdotters! advance! :P

  4. Re:10'000 RPM by autocracy · · Score: 4, Funny
    In comparison to what? Yes, they're faster than the 7,200 you probably have - but they only run at 2/3 the speed of most really high end drives (15,000 RPM). Really it's not too bad a trade-off.

    Also, please note that the laws of physics say that it can read more data if the head is able to keep up - and I'm sure it is.
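
The spindle speeds being compared translate into average rotational latency, which is the time for half a revolution. A small sketch of that arithmetic (it ignores seek time and transfer rate, which also differ between drives):

```python
def avg_rotational_latency_ms(rpm):
    """Average wait for the sector to come around: half a revolution, in ms."""
    ms_per_revolution = 60_000.0 / rpm
    return 0.5 * ms_per_revolution

for rpm in (7_200, 10_000, 15_000):
    print(f"{rpm:>6} RPM: {avg_rotational_latency_ms(rpm):.2f} ms")
```

For the three speeds in the comment this gives roughly 4.17 ms, 3.00 ms and 2.00 ms.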

    --
    SIG: HUP
  5. That "howto" sucks by Nicolas+MONNET · · Score: 5, Interesting

    There is no useful information in that infomercial. They seem to have judged "reliability" through vendor brochures and in a couple days; reliability is when your uptime is > 1 year.

    This article should be called "M. Joe-Average-Overclocker Builds A Web Server".

    This quote is funny:

    That brings us to the next important component in a web server, the CPU(s). For our new server we were determined to go with an SMP solution, simply because a single CPU would quickly be overloaded when the database is queried by multiple clients simultaneously.

    It's well known that single CPU computers can't handle simultaneous queries, eh!

    1. Re:That "howto" sucks by khuber · · Score: 5, Insightful
      Well, not to mention that high traffic sites usually have a bunch of webservers and then a load balancer in front of them. This article obviously isn't for big league web serving.

      -Kevin

    2. Re:That "howto" sucks by jimfrost · · Score: 5, Informative
      High traffic sites, the ones that are really dynamic anyway, do more than that.

      They start with a load balancer at the front end, or possibly several layers of load balancer. If they run a distributed operation they'll use smart DNS systems or routers to direct requests to the most local server cluster. The server cluster will be fronted by a request scattering system.

      Behind the request scattering system you'll find a cluster of machines whose job it is to serve static content (often the bulk of data served by a site) and route dynamic requests to another cluster of servers, enforcing session affinity for the dynamic requests.

      Behind the static content servers are the application servers. They do the heavy lifting, building dynamic pages as appropriate for individual users and caching everything they can to offload the database.

      Behind the application servers is the database or database cluster. The latter is really not that useful if you have a highly dynamic site as there are problems with data synchronization in database clusters (no matter what the database vendors tell you). But that's ok, single databases can handle a lot of volume if built correctly and caching is done appropriately at the application level.

      And there you have it, the structure of a really large site.
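
The tier that serves static content itself and passes dynamic requests on with session affinity could be sketched as below. The class name, the suffix list and the server names are assumptions for illustration, and CRC32 is used only as a cheap stable hash.

```python
import itertools
import zlib

STATIC_SUFFIXES = (".html", ".css", ".js", ".png", ".jpg", ".gif")

class FrontEnd:
    """Decide which tier and which machine should handle a request."""

    def __init__(self, static_servers, app_servers):
        self._static = itertools.cycle(static_servers)  # plain round robin
        self._apps = list(app_servers)

    def route(self, path, session_id=None):
        if path.lower().endswith(STATIC_SUFFIXES):
            # Static files are identical everywhere, so any server will do.
            return "static", next(self._static)
        # Dynamic requests stick to one application server per session so
        # that server's in-memory cache stays useful.
        key = (session_id or path).encode("utf-8")
        return "app", self._apps[zlib.crc32(key) % len(self._apps)]

front = FrontEnd(["static1", "static2"], ["app1", "app2", "app3"])
tier, server = front.route("/images/logo.png")
tier2, server2 = front.route("/cart/view", session_id="session-42")
```

The database sits behind the application servers and never appears in this routing step, which matches the layering described above.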

      --
      jim frost
      jimf@frostbytes.com
  6. "Three times the power?" by mumblestheclown · · Score: 5, Insightful
    From the article:

    If we were to use, for example, Microsoft Windows 2000 Pro, our server would need to be at least three times more powerful to be able to offer the same level of performance.

    "three times?" Can somebody point me to some evidence for this sort of rather bald assertion?

    1. Re:"Three times the power?" by khuber · · Score: 5, Interesting
      That was total FUD. The two operating systems have comparable performance on the same hardware.

      -Kevin

    2. Re:"Three times the power?" by NineNine · · Score: 5, Informative

      "Microsoft Windows 2000 Pro"

      I got a good laugh out of this... W2K Pro is the desktop version, not the server version. Wow. Great article. Really well informed author.

    3. Re:"Three times the power?" by (H)elix1 · · Score: 5, Informative

      That was total FUD. The two operating systems have comparable performance on the same hardware.

      Win2k pro limits you to 10 concurrent TCP/IP connections, Win2K Server has no (artificial) limit but won't cluster, Advanced Server can cluster but I don't know a thing about it..

      Linux has no (artificial) limit... not sure about clustering options there either.

      Found out about the TCP/IP limit when I added SP2 and trashed my evening counter-strike server - this makes a HUGE difference.

    4. Re:"Three times the power?" by Magila · · Score: 4, Informative

      Win2k pro limits you to 10 concurrent TCP/IP connections.

      Whoa! bullshit meter rising! While Win2K does have a limit on TCP/IP connections, it is in the thousands. A limit of 10 would be totally ridiculous, it would cripple the OS for MANY people. Also, most of the traffic for a CS server is UDP so the TCP/IP connection limit isn't going to affect that much at all.

    5. Re:"Three times the power?" by elemental23 · · Score: 5, Informative
      The maximum number of other computers that are permitted to simultaneously connect over the network to Windows NT Workstation 3.5, 3.51, 4.0, and Windows 2000 Professional is ten. This limit includes all transports and resource sharing protocols combined. This limit is the number of simultaneous sessions from other computers the system is permitted to host.

      From Microsoft Knowledge Base Article Q122920.
      (Warning: The page layout is broken in Mozilla)

      It's an artificial limitation. The idea is that if you need more simultaneous connections you should buy Win2k Server. In other words, MS wants you to spend more money.

      --
      I like my women like my coffee... pale and bitter.
  7. A little disappointing really by grahamsz · · Score: 5, Insightful

    The article seemed way too focused on hardware.

    Anyone who's ever worked on a big server in this cash-strapped world will know that squeezing every last ounce of capacity out of apache and your web applications needs to be done.

  8. my $0.02 by spoonist · · Score: 5, Informative

    * I prefer SCSI over IDE

    * RedHat is a pain to strip down to a bare minimum web server, I prefer OpenBSD. Sleek and elegant like the early days of Linux distros.

    * I've used Dell PowerEdge 2650 rackmount servers and they're VERY well made and easy to use. Redundant power supplies, SCSI removable drives, good physical security (lots of locks).

  9. Strange choice of processors by Ed+Avis · · Score: 5, Insightful

    I know that in the server market you often go for tried-and-tested rather than latest-and-greatest, and that the Pentium III still sees some use in new servers. But 1.26GHz with PC133 SDRAM? Surely they'd have got better performance from a single 2.8GHz Northwood with Rambus or DDR memory, and it would have required less cooling and fewer moving parts. Even a single Athlon 2200+ might compare favourably in many applications.

    SMP isn't a good thing in itself, as the article seemed to imply: it's what you use when there isn't a single processor available that's fast enough. One processor at full speed is almost always better than two at half the speed.
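
The "one fast CPU versus two half-speed CPUs" claim can be checked with Amdahl's law. That model is brought in here for illustration and is not something the comment itself cites; the parallel fractions are assumed values.

```python
def relative_throughput(n_cpus, speed_per_cpu, parallel_fraction):
    """Throughput relative to a single CPU running at speed 1.0.

    Amdahl's law: only `parallel_fraction` of the work can be spread
    across CPUs; the rest runs on one CPU at `speed_per_cpu`.
    """
    serial_fraction = 1.0 - parallel_fraction
    speedup = 1.0 / (serial_fraction + parallel_fraction / n_cpus)
    return speed_per_cpu * speedup

perfectly_parallel = relative_throughput(2, 0.5, 1.0)   # 1.0: a tie at best
mostly_parallel = relative_throughput(2, 0.5, 0.8)      # about 0.83: slower
```

Two half-speed processors only match one full-speed processor when the workload parallelizes perfectly; any serial portion tips the comparison toward the single fast CPU.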

    --
    -- Ed Avis ed@membled.com
  10. How to make a fool of yourself by noxavior · · Score: 5, Funny

    Step one: Submit story on high performance web servers.
    Step two: ???
    Step three: Die of massive slashdotting, loss of reputation and business


    Still, if someone has a link to a cache...

    --
    Karma:This parrot is dead! (and so is the joke.)
  11. And as the last step... by MavEtJu · · Score: 5, Funny

    ... Don't forget to post an article on /. so you can actually measure high-volume bulk traffic.

    [~] edwin@topaz>time telnet www.hardwareanalysis.com 80
    Trying 217.115.198.3...
    Connected to powered.by.nxs.nl.
    Escape character is '^]'.
    GET /content/article/1549/ HTTP/1.0
    Host: www.hardwareanalysis.com

    [...]
    Connection closed by foreign host.

    real 1m21.354s
    user 0m0.000s
    sys 0m0.050s

    Do as we say, don't do as we do.

    --
    bash$ :(){ :|:&};:
  12. High powered webserver? by Moonshadow · · Score: 5, Funny
    In an hour or so, I'm predicting it will be a high-powered heap of smoking rubble. It's almost like this is a challenge to us.

    Maybe it's their idea of a stress test. It's kinda like testing a car's crash durability by parking it in front of an advancing tank.

  13. Re:10'000 RPM by Krapangor · · Score: 5, Funny
    10k drives are LESS reliable, since they move faster

    This implies that you shouldn't store servers in high altitudes, because they move faster up there due to earth rotation.
    Hmmm, I think we know now why these Mars missions tend to fail so often.

    --
    Owner of a Mensa membership card.
  14. server load by MegaFur · · Score: 5, Funny

    Many other people will likely post a comment like mine, if they haven't already. But hey, karma was made to burn!

    According to my computer clock and the timestamp on the article posting, it's only been about 33 minutes (since the article was posted). Even so, it took me over a minute to finally receive the "Hardware Analysis" main page. The top of that page has:

    Please register or login. There are 2 registered and 995 anonymous users currently online. Current bandwidth usage: 214.98 kbit/s

    Draw your own conclusions.

    --
    Furry cows moo and decompress.
    1. Re:server load by Anonymous Coward · · Score: 5, Funny

      Please flush my dns entry, or better yet unplug me. There are 0 registered and millions of the slashdot horde currently refreshing their browser and laughing at my stats. Current bandwidth usage: 100 Mbit/s.

    2. Re:server load by fusiongyro · · Score: 5, Interesting

      Well, they're about slashdotted now. They lost my last request, and it says they have almost 2000 anonymous users. I sometimes think the reason I like reading Slashdot isn't because of the great links and articles, but instead because I like being a part of the goddamned Slashdot effect. :)

      Which brings me to the point. Ya know, about the only site that can handle the Slashdot effect is Slashdot. So maybe Taco should write an article like this (or maybe he has?). The Slashdot guys know what they're doing, we should pay attention. Although I find it interesting that when slashdot does "go down," the only way I know is because for some reason it's telling me I have to log in (which is a lot nicer than Squid telling me the server's gone).

      --
      Daniel

  15. Alternative HowTo by h0tblack · · Score: 4, Informative

    1. goto here
    2. click buy
    3. upon delivery open box and plugin
    4. turn on Apache with the click of a button
    5. happily serve up lots of content :)

    6. (optional) wait for attacks from ppl for suggesting using Apple hardware...

  16. Why Apache? by chrysalis · · Score: 5, Informative

    I don't understand.

    Their article is about building a high performance web server, and they tell people to use Apache.

    Apache is featureful, but it has never been designed to be fast.

    Zeus is designed for high performance.

    The article supposes that money is not a problem. So go for Zeus. The Apache recommendation is totally out of context.

  17. Re:Not-so high performance by chrysalis · · Score: 4, Insightful

    The article is about *WEB* high performance.

    I don't see your point. "ping" has never been designed to benchmark web servers AFAIK.

    My servers don't answer to "ping". Does it mean that the web server is down? Nope... it's up and running...

    "ping" is not an all-in-one magic tool. By using "ping" you can test a "ping" server. Nothing else.

  18. Server running at near 100% load by ssassen · · Score: 5, Informative
    I'm writing this from the SecureCRT console, connected through SSH1, as the backend is giving me timeouts. I can tell you that we're near 100% server load and are still serving out those pages to at least 1500 clients. I'm sure some of you get timeouts or can't even reach the server at all, for that I apologize, but we just have one of these, not a whole rack full of them.

    Have a good weekend,

    Sander Sassen

    Email: ssassen@hardwareanalysis.com
    Visit us at: http://www.hardwareanalysis.com

  19. how to build a high performance/reliable webserver by jacquesm · · Score: 4, Informative

    1) use multiple machines / round robin DNS
    2) use decent speed hardware but stay away from 'top of the line' stuff (fastest processor, fastest drives) because they usually are not more reliable
    3) replicate your databases to all machines so db access is always LOCAL
    4) use a front end cache to make sure you use as little database interaction as you can get away with (say flush the cache once per minute)
    5) use decent switching hardware and routers, no point in having a beast of a server hooked up to a hub now is there...

    that's it ! reasonable price and lots of performance
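
Round robin DNS, the first item in the list above, works by handing out the same set of addresses in a rotated order on each query, so successive clients tend to connect to different machines. A toy model of that rotation (the class name is invented and the addresses come from the 192.0.2.0/24 documentation range):

```python
from collections import deque

class RoundRobinRecords:
    """Toy model of a name server rotating its address records per query."""

    def __init__(self, addresses):
        self._addresses = deque(addresses)

    def answer(self):
        result = list(self._addresses)
        self._addresses.rotate(-1)   # next query starts with the next address
        return result

records = RoundRobinRecords(["192.0.2.1", "192.0.2.2", "192.0.2.3"])
first_client = records.answer()[0]    # most clients use the first address
second_client = records.answer()[0]
```

The rotation spreads load but knows nothing about server health, and resolver caching makes the distribution uneven, which is why larger sites put a load balancer in front instead.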

  20. More Advice from the site by HappyPhunBall · · Score: 4, Funny
    Once you have the hardware setup and the software configured, it is time to design your site to perform. The following tips will help you create a site that is just as scalable as ours. Enjoy.
    1. Use lots, and I mean lots of graphics. Cute ones, animated ones, you name it and people expect to see them. Skimping here will hurt your image.
    2. CSS style sheets may be the way of the future, but just for now make sure you include dozens or even hundreds of font tags, color tags, and tables in your site. Trust us. This has the added benefit of increasing your page file size by at least 30%. You do want a robust site right?
    3. Make sure you are serving plenty of third party ads! Their bandwidth matters also, and you know the way to make money on the web is by serving lots of "fun" animated ads. This will not slow down the user experience of your site one bit! Those ad people are slick, they know that you are building a high bandwidth / high performance site and will be expecting the traffic.
    4. A site is not a high performance site until it has withstood the infamous Slashdot effect. You will want to post a link to your site on /. post haste to begin testing.
    That should be enough to get you started. Now you too can build a rocking 200K per page site, and having read our hardware guidelines, you can expect it to perform just as well as ours did. One more free tip: Placing a cool dynamic hit counter or traffic meter on your site in a prominent position will encourage casual visitors to hit the reload button again and again, driving the performance of your site through the roof.
  21. This is wrong on soooo many levels. by (H)elix1 · · Score: 5, Interesting
    (include standard joke about high performance web serving getting /.)

    I'd post sooner, but it took forever to get to the article.. here are my thoughts...

    First off SCSI.

    IDE drives are fast in a single user/workstation environment. As a file server for thousands of people sharing an array of drives? I'm sure the output was solid for a single user when they benched it... looks like /. is letting them know what multiple users do to IDE. 'Overhead of SCSI controller'... Methinks they do not know how SCSI works. The folks who share this box will suffer.

    Heat issues with SCSI. This is why you put the hardware in a nice climate controlled room that is sound proof. Yes, this stuff runs a bit hot. I swear some vendors are dumping 8K RPM fans with ducting engineered to get heat out of the box and into the air conditioned 8'x19" chassis that holds the other 5-30 machines as well.

    I liked the note about reliability too... it ran, it ran cool, it ran stable for 2 weeks. I've got 7x9G Cheetahs that were placed into a production video editing system and ran HARD for the last 5+ years. Mind you, they ran about $1,200 each new... but the down time cost are measured in minutes... Mission critical, failure is not an option.

    OS

    Let's assume the Windows 2000 Pro was service packed to at least SP2... If that is the case, the TCP/IP stack is neutered. Microsoft wanted to push people to Server and Advanced Server... I noticed the problem when I patched my counter strike server and performance dogged on w2kpro w/sp2 - you can find more info in Microsoft's KB... (The box was used for other things too, so be gentle) Nuking the TCP/IP stack was the straw that broke my back, so I just ported another box to Linux and ran it there.

    Red Hat does make it easy to get a Linux box up and running, but if this thing is going outside the firewall, 7.3 was a lot of work to strip out all the stuff that is bundled with a "server" install. I don't like running any program I did not actually install myself. For personal boxes living at my ISP, I use Slackware (might be moving to Gentoo however). Not to say I'm digging through the code or checking MD5 hashes as often as I could, but the box won't even need an xserver, mozilla, tux racer, or anything other than what it needs to deliver content and get new stuff up to the server.

    CPU's (really a chassis problem):

    I've owned AMD's MP and Intel's Xeon dually boards. These things do crank out some heat. Since web serving is usually not processor bound, it does not really matter. Pointing back to the over heating issues with the hard drives, these guys must have a $75 rack mount 19" chassis. Who needs a floppy or CD-ROM in a web server? Where are the fans? Look at the cable mess! For god's sake, at least spend $20 and get rounded cables so you have better airflow.

  22. Re:Apache 1.3x? by GoRK · · Score: 4, Insightful

    Their IDE-RAID is actually software RAID. The SCSI myth can go off the shelf, sure, but don't take the RAID myth down.

    The Promise FastTrak and HighPoint and a few others are not actually hardware RAID controllers. They are regular controllers with enough firmware to allow BIOS calls to do drive access via software RAID (located in the firmware of the controller), and OS drivers that implement the company's own software RAID implementation at the driver level, thereby doing things like making only one device appear to the OS. Some of the chips have some performance improvements over a purely software RAID solution, such as the ability to do data comparisons between two drives in a mirror during reads, but that's about it. If you ever boot them into a new install of Windows without preloading their "drivers", guess what? Your "RAID" of 4 drives is just 4 drives. The hardware recovery options they have are also pretty damned worthless when it comes to a comparison with real RAID controllers - be they IDE or SCSI.

    A good solution to the IDE RAID debacle is the controllers by 3Ware (very fine) or the Adaptec AAA series controllers (also pretty fine). These are real hardware controllers with onboard cache, hardware XOR acceleration for RAID 5 and the whole bit.
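
The XOR work that those controllers accelerate for RAID 5 is easy to show in miniature: the parity block is the XOR of the data blocks, and any one lost block is the XOR of everything that survives. A minimal sketch with made-up four-byte blocks (real arrays also rotate the parity position across drives, which is omitted here):

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

data = [b"AAAA", b"BBBB", b"CCCC"]      # one stripe across three data drives
parity = xor_blocks(data)               # stored on a fourth drive

# Simulate losing the second drive and rebuilding it from the rest.
rebuilt = xor_blocks([data[0], data[2], parity])
assert rebuilt == data[1]
```

Doing this for every write is the CPU cost that software RAID pays and that a controller with hardware XOR takes off the host.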

    Anyway, I'm not really all that taken aback that this webserver is floundering a bit, but seems really responsive when the page request "gets through," so to speak. If it's not running low on physical RAM, it's probably got a lot of processes stuck in D state due to the shit Promise controller. A nice RAID controller would probably have everything the disks are thrashing on in a RAM cache at this point.

    ~GoRK

  23. Re:10'000 RPM by Syre · · Score: 5, Insightful

    It's pretty clear that whoever wrote that article has never run a really high-volume web site.

    I've designed and implemented sites that actually handle millions of dynamic pageviews per day, and they look rather different from what these guys are proposing.

    A typical configuration includes some or all of:

    - Firewalls (at least two redundant)
    - Load balancers (again, at least two redundant)
    - Front-end caches (usually several) -- these cache entire pages or parts of pages (such as images) which are re-used within some period of time (the cache timeout period, which can vary by object)
    - Webservers (again, several) - these generate the dynamic pages using whatever page generation you're using -- JSP, PHP, etc.
    - Back-end caches (two or more)-- these are used to cache the results of database queries so you don't have to hit the database for every request.
    - Read-only database servers (two or more) -- this depends on the application, and would be used in lieu of the back end caches in certain applications. If you're serving lots of dynamic pages which mainly re-use the same content, having multiple, cheap read-only database servers which are updated periodically from a master can give much higher efficiency at lower cost.
    - One clustered back-end database server with RAID storage. Typically this would be a big Sun box running clustering/failover software -- all the database updates (as opposed to reads) go through this box.

    And then:

    - The entire setup duplicated in several geographic locations.

    If you build -one- server and expect it to do everything, it's not going to be high-performance.
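
The split between read-only database servers and the single master that takes all updates can be sketched as a small router. This is an illustration only: the class name and server names are hypothetical, and treating every statement that starts with SELECT as a safe read is a simplifying assumption.

```python
import itertools

class DatabaseRouter:
    """Send reads to the read-only replicas and everything else to the master."""

    def __init__(self, master, replicas):
        self.master = master
        self._replicas = itertools.cycle(replicas) if replicas else None

    def pick(self, sql):
        words = sql.lstrip().split(None, 1)
        verb = words[0].upper() if words else ""
        if verb == "SELECT" and self._replicas is not None:
            return next(self._replicas)   # spread reads round robin
        return self.master                # INSERT/UPDATE/DELETE and the rest

router = DatabaseRouter("master-db", ["replica1", "replica2"])
read_target = router.pick("SELECT title FROM stories WHERE id = 1")
write_target = router.pick("UPDATE stories SET hits = hits + 1 WHERE id = 1")
```

Because the replicas are only updated periodically from the master, a read can return slightly stale data; anything that must see its own write should go to the master.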