Slashdot Mirror


High-Performance Web Server How-To

ssassen writes "Aspiring to build a high-performance web server? Hardware Analysis has an article posted that details how to build a high-performance web server from the ground up. They tackle the tough design choices and what hardware to pick and end up with a web server designed to serve daily changing content with lots of images, movies, active forums and millions of page views every month."

128 of 281 comments

  1. High-performance web server by quigonn · · Score: 5, Informative

    I'd suggest that anyone who needs a high-performance web server try out
    fnord. It's extremely small and pretty fast (without any special performance hacks!); see here.

    --
    A monkey is doing the real work for me.
    1. Re:High-performance web server by Electrum · · Score: 4, Informative

      Yep. fnord is probably the fastest small web server available. There are basically two ways to engineer a fast web server: make it as small as possible to incur the least overhead or make it complicated and use every possible trick to make it fast.

      If you need features that a small web server like fnord can't provide and speed is a must, then Zeus is probably the best choice. Zeus beats the pants off every other UNIX web server. Its "tricks" include non-blocking I/O, linear scalability with regard to number of CPUs, platform-specific system calls and mechanisms (acceptx(), poll(), sendpath, /dev/poll, etc.), sendfile() and sendfile() cache, memory and mmap() file cache, DNS cache, stat() cache, multiple accept() per I/O event notification, tuning the socket buffers, disabling Nagle, tuning the listen queue, SSL disk cache, log file cache, etc.
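
      A minimal sketch of three of the tricks listed above (non-blocking sockets, a tuned listen queue and socket buffer, and disabling Nagle), using Python's standard socket module. This is not how Zeus does it; the function names, the backlog of 1024, and the 256 KB buffer size are assumptions chosen for illustration.

```python
import socket

def make_listener(host="127.0.0.1", port=0, backlog=1024, bufsize=256 * 1024):
    """Create a non-blocking listening TCP socket with a few common tunings."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    # Request a larger send buffer; the kernel may clamp or adjust the value.
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, bufsize)
    srv.bind((host, port))
    # Length of the queue of pending connections; the kernel may cap it.
    srv.listen(backlog)
    # accept() will now raise BlockingIOError instead of waiting.
    srv.setblocking(False)
    return srv

def tune_client(conn):
    """Apply per-connection tunings to an accepted (or new) TCP socket."""
    conn.setblocking(False)
    # TCP_NODELAY disables Nagle's algorithm, so small writes go out at once.
    conn.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return conn
```

      A real server would register the listener with an event notification mechanism (poll(), epoll, kqueue) and call accept() in a loop on each readiness event, which is the "multiple accept() per I/O event notification" item in the list.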

      Which design is better? Depends on your needs. It is quite interesting that the only way to beat a really small web server is to make one really big that includes everything but the kitchen sink.

    2. Re:High-performance web server by Fefe · · Score: 2, Informative

      fnord supports CGI and PHP can be run in CGI mode.
      Actually, at least two people are using fnord to host a PHP site.

      Don't expect stellar performance, though. PHP is by no means a small interpreter. I guess it would be possible to be fast and PHP compatible with some sort of byte code cache. If there is enough demand, someone will implement it.

    3. Re:High-performance web server by Electrum · · Score: 2

      will Zeus run on linux?

      Yes. It is a UNIX web server. It does not run on Windows.

      How do you get non-blocking I/O out of a blocking file system? Or are you talking about non blocking socket I/O?

      You don't. Not having non blocking I/O available for the filesystem is one of the most annoying things about UNIX. Though, there are ways around it. Either use a separate thread or process to do file I/O, or use mmap() with mincore().
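
      The first workaround (doing file I/O in a separate thread) can be sketched as follows. This assumes a Python asyncio event loop standing in for the server's socket loop; read_file_nonblocking and demo are hypothetical names, and the mmap()/mincore() approach is not shown.

```python
import asyncio
import os
import tempfile

def read_file(path):
    # Ordinary blocking read; this is the call that can stall on the disk.
    with open(path, "rb") as f:
        return f.read()

async def read_file_nonblocking(path):
    # Hand the blocking read to the default thread pool so the event loop
    # (which would be servicing sockets) stays free while the disk is busy.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, read_file, path)

def demo():
    # Write a small temporary file, read it back through the thread pool.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"hello from disk")
    try:
        return asyncio.run(read_file_nonblocking(tmp.name))
    finally:
        os.unlink(tmp.name)
```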

  2. 10'000 RPM by Nicolas+MONNET · · Score: 3, Insightful

    The guys use 10'000 RPM drives for "reliability" and "performance" ... 10k drives are LESS reliable, since they move faster. Moreover, they're not even necessarily that much faster.

    1. Re:10'000 RPM by autocracy · · Score: 4, Funny
      In comparison to what? Yes, they're faster than the 7,200 you probably have - but they only run at 2/3 the speed of most really high end drives (15,000 RPM). Really it's not too bad a trade-off.

      Also, please note that the laws of physics say that it can read more data if the head is able to keep up - and I'm sure it is.

      --
      SIG: HUP
    2. Re:10'000 RPM by khuber · · Score: 3, Informative
      10k drives are LESS reliable, since they move faster.

      Okay, well, you can use ancient MFM drives since they move much slower and would be more reliable by your logic.

      Personally, I'd take 10k SCSI drives over 7.2k IDE drives for a server, no question.

      -Kevin

    3. Re:10'000 RPM by Krapangor · · Score: 5, Funny
      10k drives are LESS reliable, since they move faster

      This implies that you shouldn't store servers in high altitudes, because they move faster up there due to earth rotation.
      Hmmm, I think we know now why these Mars missions tend to fail so often.

      --
      Owner of a Mensa membership card.
    4. Re:10'000 RPM by Syre · · Score: 5, Insightful

      It's pretty clear that whoever wrote that article has never run a really high-volume web site.

      I've designed and implemented sites that actually handle millions of dynamic pageviews per day, and they look rather different from what these guys are proposing.

      A typical configuration includes some or all of:

      - Firewalls (at least two redundant)
      - Load balancers (again, at least two redundant)
      - Front-end caches (usually several) -- these cache entire pages or parts of pages (such as images) which are re-used within some period of time (the cache timeout period, which can vary by object)
      - Webservers (again, several) - these generate the dynamic pages using whatever page generation you're using -- JSP, PHP, etc.
      - Back-end caches (two or more)-- these are used to cache the results of database queries so you don't have to hit the database for every request.
      - Read-only database servers (two or more) -- this depends on the application, and would be used in lieu of the back end caches in certain applications. If you're serving lots of dynamic pages which mainly re-use the same content, having multiple, cheap read-only database servers which are updated periodically from a master can give much higher efficiency at lower cost.
      - One clustered back-end database server with RAID storage. Typically this would be a big Sun box running clustering/failover software -- all the database updates (as opposed to reads) go through this box.

      And then:

      - The entire setup duplicated in several geographic locations.

      If you build -one- server and expect it to do everything, it's not going to be high-performance.
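
      The back-end cache tier in the list above (caching database query results for a timeout period) might look like this in miniature. It is an in-process sketch, not a separate cache server; TTLCache and get_or_compute are hypothetical names, and the injectable clock exists only to make the expiry testable.

```python
import time

class TTLCache:
    """Cache values for a fixed number of seconds, then recompute them."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store = {}  # key -> (expires_at, value)

    def get_or_compute(self, key, compute):
        now = self.clock()
        hit = self._store.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]  # still fresh: no database round trip
        value = compute()  # expired or missing: run the real query
        self._store[key] = (now + self.ttl, value)
        return value
```

      The timeout can vary by object, as the post notes for front-end caches; here that would mean one TTLCache instance per class of data.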

    5. Re:10'000 RPM by GC · · Score: 2

      How about running the web server from a RAM Disk? That's an age old trick to make speed improvements!

  3. But any web server is high-performance by Ed+Avis · · Score: 5, Insightful

    Computer hardware is so fast relative to the amount of traffic coming to almost any site that any web server is a high-performance web server, if you are just serving static pages. A website made of static pages would surely fit into a gigabyte or so of disk cache, so disk speed is largely irrelevant, and so is processor speed. All the machine needs to do is stuff data down the network pipe as fast as possible, and any box you buy can do that adequately. Maybe if you have really heavy traffic you'd need to use Tux or some other accelerated server optimized for static files.

    With dynamically generated web content it's different of course. But there you will normally be fetching from a database to generate the web pages. In which case you should consult articles on speeding up database access.

    In other words: an article on 'building a fast database server' or 'building a machine to run disk-intensive search scripts' I can understand. But there is really nothing special about web servers.

    --
    -- Ed Avis ed@membled.com
    1. Re:But any web server is high-performance by khuber · · Score: 5, Insightful
      With dynamically generated web content it's different of course. But there you will normally be fetching from a database to generate the web pages. In which case you should consult articles on speeding up database access.

      I'm just a programmer, but don't big sites put caching in front of the database? I always try to cache database results if I can. Honestly, I think relational databases are overused, they become bottlenecks too often.

      -Kevin

    2. Re:But any web server is high-performance by NineNine · · Score: 5, Insightful

      Good databases are designed for performance. If databases are your bottleneck, then you don't know what you're doing with the database. Too many people throw up a database, and use it like it's some kind of flat file. There's a lot that can be done with databases that the average hack has no idea about.

    3. Re:But any web server is high-performance by NineNine · · Score: 4, Insightful

      You're absolutely right. Wish I had some mod points left...

      Hardware only comes into play in a web app when you're doing very heavy database work. Serving flat pages takes virtually no computing effort. It's all bandwidth. Hell, even scripting languages like ASP, CF, and PHP are light enough that just about any machine will work great. The database though... that's another story.

    4. Re:But any web server is high-performance by khuber · · Score: 2, Interesting
      Our databases are tuned. Some apps would just need to transfer too much data per request for a SQL call to be feasible.

      -Kevin

    5. Re:But any web server is high-performance by jimfrost · · Score: 5, Interesting
      As you say, databases are usually the bottleneck in a high-volume site. Contrary to what Oracle et al want you to believe, they still don't scale and in many cases it's not feasible to use a database cluster.

      Big sites, really big sites, put caching in the application. The biggest thing to cache is session data, easy if you're running a single box but harder if you need to cluster (and you certainly do need to cluster if you're talking about a high-volume site; nobody makes single machines powerful enough for that). Clustering means session affinity and that means more complicated software. (Aside: Is there any open source software that manages session affinity yet? )

      Frankly speaking, Intel-based hardware would not be my first choice for building a high-volume site (although "millions of page views per month" is really only a moderate volume site; sites I have worked on do millions per /day/). It would probably be my third or fourth choice. The hardware reliability isn't really the problem, it can be good enough, the issue is single box scalability.

      To run a really large site you end up needing hundreds or even thousands of Intel boxes where a handful of midrange Suns would do the trick, or even just a couple of high-end Suns or IBM mainframes. Going the many-small-boxes route your largest cost ends up being maintenance. Your people spend all their time just fixing and upgrading boxes. Upgrading or patching in particular is a pain in the neck because you have to do it over such a broad base. It's what makes Windows very impractical as host for such a system; less so for something like Linux because of tools like rdist, but even so you have to do big, painful upgrades with some regularity.

      What you need to do is find a point where the box count is low enough that it can be managed by a few people and yet the individual boxes are cheap enough that you don't go broke.

      These days the best machines for that kind of application are midrange Suns. It will probably be a couple of years before Intel-based boxes are big and fast enough to realistically take that away ... not because there isn't the hardware to do it (though such hardware is, as yet, unusual) but because the available operating systems don't scale well enough yet.

      --
      jim frost
      jimf@frostbytes.com
    6. Re:But any web server is high-performance by NineNine · · Score: 3, Informative

      Our databases are tuned. Some apps would just need to transfer too much data per request for a SQL call to be feasible.

      I had this problem for a while... Sloppy coding on my part was querying 65K+ records per page. Server would start to crawl with a few hundred simultaneous users. Since I fixed it, 1000+ simultaneous users is no problem at all.
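
      The post doesn't say how the query was fixed. One common approach, sketched here with an in-memory SQLite database, is to fetch only the rows the page will display. The posts table, its columns, and the page size of 50 are assumptions for illustration.

```python
import sqlite3

def make_demo_db(n_rows=65000):
    # Build a throwaway table roughly the size mentioned in the post.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT)")
    conn.executemany(
        "INSERT INTO posts (id, title) VALUES (?, ?)",
        ((i, f"post {i}") for i in range(1, n_rows + 1)),
    )
    return conn

def fetch_page(conn, page, page_size=50):
    # Ask the database for one page of rows instead of the whole table.
    offset = page * page_size
    cur = conn.execute(
        "SELECT id, title FROM posts ORDER BY id LIMIT ? OFFSET ?",
        (page_size, offset),
    )
    return cur.fetchall()
```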

    7. Re:But any web server is high-performance by khuber · · Score: 2, Informative
      Very good info Jim.

      Yeah, my experience is at a relatively large site. We use mostly large and midrange Suns, EMC arrays and so on. There's a lot of interest in the many small server architecture though that is still being investigated.

      -Kevin

    8. Re:But any web server is high-performance by jimfrost · · Score: 5, Informative
      I've seen both kinds and take it from me, running many small servers is more of a headache than the hardware cost savings are worth. Your network architecture gets complicated, you end up having to hire lots of people just to keep the machines running and with up-to-date software, and database connection pooling becomes a lot less efficient.

      You save money in the long run by buying fewer, more powerful machines.

      --
      jim frost
      jimf@frostbytes.com
    9. Re:But any web server is high-performance by khuber · · Score: 2, Interesting
      The interest is primarily hardware cost (the big Suns cost over $1m, and EMC arrays are likewise). Another issue is that when you have a few big machines and you do a deployment or maintenance, it's a struggle for the other boxes to pick up the slack. If you had more small servers, you could upgrade one at a time without impacting capacity as much.

      What do you think about handling capacity? Do you see sites with a lot of spare capacity? We'd have trouble meeting demand if we lost a server during prime hours (and it happens).

      -Kevin

    10. Re:But any web server is high-performance by jimfrost · · Score: 5, Insightful
      Yea, big Suns are too expensive and you do need to keep the server count high enough that a failure or system taken down for maintenance isn't a really big impact on the site. I mentioned in a different posting that my cut on this is that the midrange Suns, 4xxx and 5xxx class, provide good bang-for-the-buck for high-volume sites.

      Beware of false economy when looking at hardware. While it's true that smaller boxes are cheaper, they still require about the same manpower per box to keep them running. You rapidly get to the point where manpower costs dwarf equipment cost. People are expensive!

      Capacity is an issue. We try to plan for enough excess at peak that the loss of a single server won't kill you, and hope you never suffer a multiple loss. Unfortunately most often customers underequip even for ordinary peak loads, to say nothing of what you see when your URL sees a real high load.[1] They just don't like to spend the money. I can see their point, the machines we're talking about are not cheap; it's a matter of deciding what's more important to you, uptime and performance or cost savings. Frankly most customers go with cost savings initially and over time (especially as they learn what their peak loads are and gain experience with the reliability characteristics of their servers) build up their clusters.

      [1] People here talk about the slashdot effect, but trust me when I tell you that that's nothing like the effect you get when your URL appears on TV during "Friends".
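
      The planning rule described above (enough excess at peak that losing a single server won't kill you) comes down to simple arithmetic. In this sketch the request rates are made-up numbers and servers_needed is a hypothetical helper.

```python
import math

def servers_needed(peak_rps, per_server_rps, spares=1):
    """Servers required to carry peak load, plus spares that may be down."""
    base = math.ceil(peak_rps / per_server_rps)
    return base + spares

# Example: a 900 req/s peak on boxes that handle 200 req/s each needs
# 5 servers to carry the load, so 6 to survive the loss of one.
example = servers_needed(900, 200)
```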

      --
      jim frost
      jimf@frostbytes.com
    11. Re:But any web server is high-performance by jimfrost · · Score: 5, Interesting
      If you're just serving static pages you're right. If you're doing dynamic content then you're wrong.

      But 2.5 million hits a day is still just a moderate volume site to me. One of the sites I worked on sees in excess of a hundred million hits per day these days; it was up over ten million hits per day back in 1998.

      I don't happen to know what Slashdot does for volume, but Slashdot is a very simplistic site when it comes to content production. Each page render doesn't take much horsepower and sheer replication can be used effectively. Things get more complicated when you're doing something like trying to figure out what stuff a user is likely to buy given their past buying history and/or what they're looking at right now.

      If you really think a 4-way Intel box is equivalent to a 12-way Sun, well, it's clear you don't know what you're talking about. You're wrong even if all you're talking about is CPU, and of course I/O bandwidth is what makes or breaks you -- and there's no comparison in that respect.

      --
      jim frost
      jimf@frostbytes.com
    12. Re:But any web server is high-performance by Matey-O · · Score: 5, Insightful

      I think the big problem here is the tendency to DBify EVERYTHING POSSIBLE.

      Like the State field in an online form.

      Every single hit requires a tag to the databases. Why?

      Because, heck if we ever get another state, it'll be easy to update! Ummm, that's a LOT of cycles used for something that hasn't happened in, what, 50 years or so. (Hawaii, 1959)
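
      One middle ground, sketched below, keeps the list in the database but reads it once per process instead of once per hit. query_states_from_db is a hypothetical stand-in for the real query, and the four-state list is a placeholder.

```python
import functools

db_calls = {"count": 0}

def query_states_from_db():
    # Hypothetical stand-in for a real database query.
    db_calls["count"] += 1
    return ("AK", "AL", "AR", "AZ")  # placeholder subset

@functools.lru_cache(maxsize=None)
def get_states():
    # The first call hits the database; later calls reuse the cached result.
    return query_states_from_db()
```

      If a new state ever does arrive, restarting the process (or calling get_states.cache_clear()) picks it up.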

      --
      "Draco dormiens nunquam titillandus."
    13. Re:But any web server is high-performance by Matey-O · · Score: 2

      "The hardware reliability isn't really the problem, it can be good enough, the issue is single box scalability."

      I dunno, our current major project is running on an ES7000 (8 processors, fully redundant, running Windows Datacenter) It seems pretty beastly to me.

      At the point here where X Unix implementation is x% faster than Y Microsoft implementation, the issue is decided by other factors. As long as either is fast enough to handle the load, n-th degree performance doesn't matter.

      In our case, the company that won the contract specified the hardware, it was part of a total cost contract (you get one amount of money to make this work, work within those boundaries.)

      _Presumably_ that company is happy enough with Windows performance on a 'big iron' box.

      --
      "Draco dormiens nunquam titillandus."
    14. Re:But any web server is high-performance by jimfrost · · Score: 2
      I think we're going to see more and more of this kind of server. The Zseries mainframes running Linux are really interesting because you're not so dependent on scalable SMP capabilities and yet you get the same kind of manageability as if you were working with a big SMP box. Nice.

      I haven't personally done any deployments on such a system, but I like the idea.

      --
      jim frost
      jimf@frostbytes.com
    15. Re:But any web server is high-performance by Hast · · Score: 3, Informative

      How about reading the FAQ before you start giving out "facts"? Slashdot is running on:
      * 5 load balanced Web servers dedicated to pages
      * 3 load balanced Web servers dedicated to images
      * 1 SQL server
      * 1 NFS Server
      Either the "little 4 way intel" you mention has a serious case of schizophrenia or you're just full of it. (Guess which theory I'm going for.)

      Besides the poster mentioned that those sites /are/ bigger than Slashdot. E.g. the mention that "Getting your URL posted during Friends" is nothing like getting it posted on Slashdot.

      I know I shouldn't feed the trolls, but someone might actually believe this tripe.

    16. Re:But any web server is high-performance by Aldurn · · Score: 3, Informative

      Aside: Is there any open source software that manages session affinity yet?

      Yes. Linux Virtual Server is an incredible project. You put your web servers behind it and (in the case of simple NAT balancing) you set the gateway of those computers to be the address of your LVS server. You then tell LVS to direct all IPs of a certain netmask to one server (i.e. if you set for 255.255.255.0, 192.168.1.5 and 192.168.1.133 will connect to the same server).
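
      The netmask behaviour described above can be illustrated in a few lines. This is not LVS code, only a sketch of the idea: clients are grouped by masked address, and each group sticks to the first server it was given. MaskedAffinityBalancer and the server names are hypothetical.

```python
import ipaddress

def persistence_key(client_ip, netmask="255.255.255.0"):
    """Return the network address a client IP falls into under the netmask."""
    network = ipaddress.ip_network(f"{client_ip}/{netmask}", strict=False)
    return str(network.network_address)

class MaskedAffinityBalancer:
    def __init__(self, servers, netmask="255.255.255.0"):
        self.servers = list(servers)
        self.netmask = netmask
        self.assigned = {}   # masked network -> server
        self.next_index = 0

    def pick(self, client_ip):
        key = persistence_key(client_ip, self.netmask)
        if key not in self.assigned:
            # New network: hand out servers round-robin, then remember it.
            self.assigned[key] = self.servers[self.next_index % len(self.servers)]
            self.next_index += 1
        return self.assigned[key]
```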

      The only problem I had with it was that it does not detect downtime. However, I wrote a quick script that used the checkhttp program from Nagios to pull a site out of the loop when it went down (these were Windows 2000 servers: it happened quite frequently, and our MCSE didn't know why :)
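
      The original script isn't shown, and it used the Nagios check program rather than Python. A rough equivalent of the idea, assuming each backend exposes a plain HTTP URL, is to probe every backend and keep only the ones that answer. http_ok and healthy_backends are hypothetical names.

```python
import urllib.request

def http_ok(url, timeout=2.0):
    """Return True if the URL answers with a 2xx or 3xx status in time."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (OSError, ValueError):
        # Connection failures, timeouts, HTTP error statuses, malformed URLs.
        return False

def healthy_backends(backends, probe=http_ok):
    # Keep only the backends whose probe succeeds; the balancer would then
    # be reconfigured to send traffic to this list alone.
    return [b for b in backends if probe(b)]
```

      Run periodically, the result would be used to add or remove real servers from the balancer's table.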

      There are higher performance ways to set up clustering using LVS, but since I was lazy, that's what I did.
      --
      char sig[120] = "\0"
    17. Re:But any web server is high-performance by Ed+Avis · · Score: 2

      One database query per page is not too bad. You can make that scalable and it's certainly a lot less effort than trying to track large amounts of data _outside_ the DB.

      You have a problem when a single page view takes hundreds of database queries (as happened with a certain web toolkit I used to develop on).
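
      The difference between the two cases can be shown with a toy schema. The users and comments tables are assumptions; the first function issues one query per row (the pattern that grows into hundreds of queries per page), while the second gets the same data with a single JOIN.

```python
import sqlite3

def make_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE comments (id INTEGER PRIMARY KEY, user_id INTEGER, body TEXT);
        INSERT INTO users VALUES (1, 'ann'), (2, 'bob');
        INSERT INTO comments VALUES (1, 1, 'first'), (2, 2, 'second'), (3, 1, 'third');
    """)
    return conn

def page_many_queries(conn):
    # One query for the comments, then one more per comment for its author.
    rows = conn.execute(
        "SELECT id, user_id, body FROM comments ORDER BY id"
    ).fetchall()
    queries = 1
    out = []
    for _id, user_id, body in rows:
        name = conn.execute(
            "SELECT name FROM users WHERE id = ?", (user_id,)
        ).fetchone()[0]
        queries += 1
        out.append((name, body))
    return out, queries

def page_single_query(conn):
    # The same page built from a single joined query.
    rows = conn.execute(
        "SELECT u.name, c.body FROM comments c "
        "JOIN users u ON u.id = c.user_id ORDER BY c.id"
    ).fetchall()
    return rows, 1
```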

      --
      -- Ed Avis ed@membled.com
    18. Re:But any web server is high-performance by jimfrost · · Score: 3, Interesting
      I have more than a few problems with that idea, but amongst them is:

      • Diskless systems start to collapse the central servers even by forty or fifty clients. By the time you're talking the thousand or more Intel systems necessary for a big site you're looking at having to have a tiered system just to do software deployments, forget about data serving.

      • Diskless systems don't work well if you have more data than you can realistically afford to store in memory. You start to see practical limits (like hardware limitations) in the low gigabyte range, when most larger websites have static content to deliver in the hundreds of gigabyte range.

      • Applications are notoriously hungry because they have to do a lot of caching to offload the database since databases generally don't scale well. It's pretty common to see our application servers running with 2+ gig heaps, and we'll run one application server per CPU on a system, and you're probably running three or more 6 or 8 CPU systems just for the application server part. Try to make that diskless and you're now talking about machine configurations with something like 30G of RAM ... very expensive and impractical.

      We're talking about a totally different scale, really.

      --
      jim frost
      jimf@frostbytes.com
    19. Re:But any web server is high-performance by PhotoGuy · · Score: 4, Interesting
      A key question someone needs to ask themselves when storing data in a relational database, is "is this data really relational"?

      In a surprising number of cases, it really isn't. For example, storing user preferences for visiting a given web page; there is never a case where you need to relate the different users to each other. The powerful aggregation abilities of relational databases are irrelevant, so why incur the overhead (performance-wise, cost-wise, etc.)?

      Even when aggregating such information is useful, I've often found off-line duplication of the information to databases (which you can then query the hell out of, without affecting the production system) a better way to go.

      If a flat file will do the job, use that instead of a database.
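
      For the user-preferences example, a flat-file store can be as small as one JSON file per user. This is a sketch under assumptions: integer user IDs, a single directory, and hypothetical function names.

```python
import json
import os
import tempfile

def prefs_path(root, user_id):
    # user_id is assumed to be an integer, so it is safe in a file name.
    return os.path.join(root, f"{int(user_id)}.json")

def save_prefs(root, user_id, prefs):
    path = prefs_path(root, user_id)
    tmp = path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as f:
        json.dump(prefs, f)
    # Rename into place so readers never see a half-written file.
    os.replace(tmp, path)

def load_prefs(root, user_id):
    try:
        with open(prefs_path(root, user_id), encoding="utf-8") as f:
            return json.load(f)
    except FileNotFoundError:
        return {}  # no preferences saved yet

def demo():
    with tempfile.TemporaryDirectory() as root:
        save_prefs(root, 42, {"theme": "dark"})
        return load_prefs(root, 42), load_prefs(root, 7)
```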

      --
      Love many, trust a few, do harm to none.
    20. Re:But any web server is high-performance by jimfrost · · Score: 2
      It's smarter to manage affinity by session, not by IP, since a variety of sources have rotating IPs (most notably AOL, but some business firewalls do it too).
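
      A sketch of affinity keyed on the session rather than the client address: hash the session ID (for example, the value of a session cookie) and use it to pick a backend. The backend names are made up, and a real balancer would also need to handle backends joining or leaving, which plain modulo hashing does not.

```python
import hashlib

def backend_for_session(session_id, backends):
    """Map a session ID to one backend, independent of the client's IP."""
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(backends)
    return backends[index]
```

      Because the choice depends only on the session ID, a user whose requests arrive from rotating proxy addresses still lands on the same server.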

      Anyway, thanks for the tip. I haven't seen the LVS stuff at all yet.

      --
      jim frost
      jimf@frostbytes.com
    21. Re:But any web server is high-performance by jimfrost · · Score: 2
      Yes, Google is one such site, although their runtime is simplistic enough that it's not a really good example of a typical large-volume site. Amazon would be better, or eBay; I know eBay uses larger machines, don't know about Amazon. The only really high volume site I know off the top of my head that uses Intel-based hardware and individual personalization is hotmail and again they're dealing with thousands of servers.

      If you're building a site like that then you've got to make your decision as to whether you'd rather use thousands of Intel servers or a few tens of larger servers. If it were my decision I'd go for the smaller number of larger servers simply because they require a lot fewer IT people to keep running, and every IT person you don't have to hire is another new machine or two you could buy every year. It adds up.

      --
      jim frost
      jimf@frostbytes.com
    22. Re:But any web server is high-performance by johnlcallaway · · Score: 2

      I agree with the assessment about using non-Intel hardware, but disagree with the big vs. little argument, specifically the manpower requirement. Our website uses several automated tools to distribute updates to our webservers and app servers, which are Netras. The Netras all share the exact same Sun image, which is very, very small. All unneeded packages (X, language packs, etc) were removed. Unison is used to keep the web pages and JSP pages synchronized.

      We have had 1 failure (SCSI drive) since we implemented this 1 year ago. It took us 20 minutes to have the box back up and running (Jumpstart). Granted, we only have 20 now. But based on the amount of time we actually spend working on the machines, one of us could handle 5 to 10 times this amount.

      Now, 100 Netras cost about 600,000. You can't touch any other Sun equipment at that price and get 100 CPUs. A Sunfire15K w/72 CPUs is over $3M, without maintenance. I could afford a couple more admins at those prices.....
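
      The comparison works out as follows, taking the figures above at face value and assuming that the 600,000 is in dollars and that each Netra has one CPU (the post says 100 Netras give 100 CPUs).

```python
def cost_per_cpu(total_cost, cpus):
    return total_cost / cpus

# Figures quoted in the post; the dollar unit for the Netras is assumed.
netra_per_cpu = cost_per_cpu(600_000, 100)
sunfire_per_cpu = cost_per_cpu(3_000_000, 72)

# How many times more the big box costs per CPU, before maintenance.
ratio = sunfire_per_cpu / netra_per_cpu
```

      That is about $6,000 per CPU for the Netras against roughly $41,700 for the Sunfire, close to a factor of seven. It does not account for the per-box administration cost that the other side of the argument emphasizes.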

      --
      I rarely read replies, it's my opinion and if you thought about your opinion a little more, I'm OK with that.
    23. Re:But any web server is high-performance by strobert · · Score: 2

      I have a question: how much manpower (say, in terms of number of sysadmins) do you generally use for a group of, say, 10 midrange Sun servers such as E4500s?

      The reason I am asking is some experience we had here, where an admin dealing with the Intel/Linux side of things was able to handle about 40 boxes with plenty of room to spare, whereas on the SPARC/Solaris side an admin was dealing with two boxes and wasn't really even able to keep up.

    24. Re:But any web server is high-performance by jimfrost · · Score: 2
      One person can do the maintenance of the main servers with only part-time effort, although generally such operations are well staffed for other reasons. The online servers are only the tip of the iceberg in such an operation -- you also have the database(s) with its associated guru, staging system(s), some number of developers, artists, etc. each with one or more systems, and of course the network infrastructure for such a system is very substantial.

      Keep in mind that with that kind of horsepower you're talking about a pretty darn large site -- like way into the tens of millions of dynamic page views per day. One customer I worked with was handling more than ten million dynamic page views per day on just three systems running at less than half utilization each. (There were three or four smaller boxes doing static content up-front, and a larger database box behind however.)

      The ancillary systems tend to far outnumber the main systems. Generally, at least in the places I've seen, IT handles the lot of them.

      --
      jim frost
      jimf@frostbytes.com
    25. Re:But any web server is high-performance by strobert · · Score: 2

      oh, I understand that. we actually have a half dozen effective mirrors of the production environment for development/testing/etc.

      I was just kind of curious what manpower ratios you generally use for all of these servers (both main and pre-production/dev/test). I.e., for say 10 servers (say 2 main, the other 8 in use to get the product to the 2), how many sysadmins would you generally see in use?

    26. Re:But any web server is high-performance by jimfrost · · Score: 2
      If it's that few then you could easily get by with only one admin, keeping in mind that he'll have to sleep and go on vacation on occasion. That's a pretty small site though.

      I think the hundred-million dynamic pageviews site had three admins, but they switched hats with other jobs. One did double duty as the group manager, and the other two were part-time programmers. Multiple admins also meant that there was the possibility of time off :-).

      They had a lot of outside help, though, since Exodus was hosting their machines for them and there was another IT department that did desktop management for the rest of the organization.

      --
      jim frost
      jimf@frostbytes.com
  4. gee, i wonder.. by Anonymous Coward · · Score: 5, Funny

    .. if their webservers are as reliable as the ones in the article..
    i guess there's only one way to find out..

    slashdotters! advance! :P

  5. That "howto" sucks by Nicolas+MONNET · · Score: 5, Interesting

    There is no useful information in that infomercial. They seem to have judged "reliability" through vendor brochures and in a couple days; reliability is when your uptime is > 1 year.

    This article should be called "M. Joe-Average-Overclocker Builds A Web Server".

    This quote is funny:

    That brings us to the next important component in a web server, the CPU(s). For our new server we were determined to go with an SMP solution, simply because a single CPU would quickly be overloaded when the database is queried by multiple clients simultaneously.

    It's well known that single CPU computers can't handle simultaneous queries, eh!

    1. Re:That "howto" sucks by khuber · · Score: 5, Insightful
      Well, not to mention that high traffic sites usually have a bunch of webservers and then a load balancer in front of them. This article obviously isn't for big league web serving.

      -Kevin

    2. Re:That "howto" sucks by jimfrost · · Score: 5, Informative
      High traffic sites, the ones that are really dynamic anyway, do more than that.

      They start with a load balancer at the front end, or possibly several layers of load balancer. If they run a distributed operation they'll use smart DNS systems or routers to direct requests to the most local server cluster. The server cluster will be fronted by a request scattering system.

      Behind the request scattering system you'll find a cluster of machines whose job it is to serve static content (often the bulk of data served by a site) and route dynamic requests to another cluster of servers, enforcing session affinity for the dynamic requests.

      Behind the static content servers are the application servers. They do the heavy lifting, building dynamic pages as appropriate for individual users and caching everything they can to offload the database.

      Behind the application servers is the database or database cluster. The latter is really not that useful if you have a highly dynamic site as there are problems with data synchronization in database clusters (no matter what the database vendors tell you). But that's ok, single databases can handle a lot of volume if built correctly and caching is done appropriately at the application level.

      And there you have it, the structure of a really large site.

      --
      jim frost
      jimf@frostbytes.com
  6. "Three times the power?" by mumblestheclown · · Score: 5, Insightful
    From the article:

    If we were to use, for example, Microsoft Windows 2000 Pro, our server would need to be at least three times more powerful to be able to offer the same level of performance.

    "three times?" Can somebody point me to some evidence for this sort of rather bald assertion?

    1. Re:"Three times the power?" by khuber · · Score: 5, Interesting
      That was total FUD. The two operating systems have comparable performance on the same hardware.

      -Kevin

    2. Re:"Three times the power?" by NineNine · · Score: 5, Informative

      "Microsoft Windows 2000 Pro"

      I got a good laugh out of this... W2K Pro is the desktop version, not the server version. Wow. Great article. Really well informed author.

    3. Re:"Three times the power?" by (H)elix1 · · Score: 5, Informative

      That was total FUD. The two operating systems have comparable performance on the same hardware.

      Win2k pro limits you to 10 concurrent TCP/IP connections, Win2K Server has no (artificial) limit but won't cluster, Advanced Server can cluster but I don't know a thing about it..

      Linux has no (artificial) limit... not sure about clustering options there either.

      Found out about the TCP/IP limit when I added SP2 and trashed my evening counter-strike server - this makes a HUGE difference.

    4. Re:"Three times the power?" by sheldon · · Score: 2

      I hope when you're talking about clustering, you don't mean Beowulf?

      I find most Linux advocates don't understand the first thing about clustering, and keep misusing the term. Generally speaking for a web server you need limited clustering, that is you just want to do load balancing. But you also want to monitor the servers such that if one fails you take it out of the loop.

    5. Re:"Three times the power?" by Aldurn · · Score: 2, Informative

      At a website I used to work at, they decided they needed to use Windows 2000 Advanced Server for web clustering. That is, quite possibly, the worst decision they ever made (aside from going with Windows 2000; trust me on this one.)

      Win2k AS Load Balancing (aka WLBS: Windows Load Balancing Service) works by detecting other computers on the network with the same service, and they decide who will handle what request. They both have a primary IP, which is unique, in addition to a "virtual" address, which is the same on all of them. They also have a fake MAC address which is identical on both (makes for interesting ping responses.)

      An interesting thing we noticed about WLBS is that, unless a computer is off the network, it will still be in the cluster. I.e. if IIS fails on one machine, as long as you can ping it, it will still get traffic.

      When we moved from WLBS to LVS, we noticed a 50% drop in average CPU usage. This is probably due to the fact that now the clustering horsepower was moved off the web servers, but still, a free product versus a rather expensive one. And we've had better uptime now than ever before.
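
      The failure mode above (a node that still answers ping but no longer serves pages keeps getting traffic) is what a service-level health check is meant to avoid. A rough Python sketch, assuming a hypothetical `Pool` class and a caller-supplied `check` function standing in for a real HTTP probe; it is not how WLBS or LVS work internally:

```python
from typing import Callable, Dict, List


class Pool:
    """Round-robin pool that only hands out servers whose service check passes."""

    def __init__(self, servers: List[str], check: Callable[[str], bool]):
        self.servers = list(servers)
        self.check = check  # should probe the web service itself, not just ping
        self.healthy: Dict[str, bool] = {s: True for s in self.servers}
        self._next = 0

    def refresh(self) -> None:
        """Re-run the service check for every server."""
        for server in self.servers:
            self.healthy[server] = self.check(server)

    def pick(self) -> str:
        """Return the next healthy server, skipping any that failed the check."""
        for _ in range(len(self.servers)):
            server = self.servers[self._next % len(self.servers)]
            self._next += 1
            if self.healthy[server]:
                return server
        raise RuntimeError("no healthy servers")
```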

      --
      char sig[120] = "\0"
    6. Re:"Three times the power?" by Magila · · Score: 4, Informative

      Win2k pro limits you to 10 concurrent TCP/IP connections.

      Whoa! bullshit meter rising! While Win2K does have a limit on TCP/IP connections, it is in the thousands. A limit of 10 would be totally ridiculous; it would cripple the OS for MANY people. Also, most of the traffic for a CS server is UDP, so the TCP/IP connection limit isn't going to affect that much at all.

    7. Re:"Three times the power?" by elemental23 · · Score: 5, Informative
      The maximum number of other computers that are permitted to simultaneously connect over the network to Windows NT Workstation 3.5, 3.51, 4.0, and Windows 2000 Professional is ten. This limit includes all transports and resource sharing protocols combined. This limit is the number of simultaneous sessions from other computers the system is permitted to host.

      From Microsoft Knowledge Base Article Q122920.
      (Warning: The page layout is broken in Mozilla)

      It's an artificial limitation. The idea is that if you need more simultaneous connections you should buy Win2k Server. In other words, MS wants you to spend more money.

      --
      I like my women like my coffee... pale and bitter.
  7. A little disapointing really by grahamsz · · Score: 5, Insightful

    The article seemed way too focused on hardware.

    Anyone who's ever worked on a big server in this cash-strapped world will know that squeezing every last ounce of capacity out of apache and your web applications needs to be done.

    1. Re:A little disapointing really by januschr · · Score: 2, Insightful

      The article seemed way too focused on hardware.

      Well the name of the website is "Hardware Analysis"... ,-)

      --
      This is my sig. Read it and weep.
    2. Re:A little disapointing really by Zeinfeld · · Score: 2
      The article seemed way too focused on hardware

      Yeah, maybe if the site had not been slashdotted...

      The site does not appear to consider the most effective way to make a Web server fly: replace the hard drives with RAM. Ditch the obsolete SQL engine and use in-memory storage rebuilt from a transaction log.

      Of course the problem with that config is that an outage tends to be a problem so just duplicate the hardware at a remote disaster recovery site.

      Sound expensive? Well yes, but not half as expensive as some of the systems people put together to run SQL databases...
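
      A minimal Python sketch of the "in memory storage rebuilt from a transaction log" idea, assuming a simple key/value model and a JSON-lines log file. The `LoggedStore` class and the file name are hypothetical; a production system would also need log compaction and the remote replication mentioned above.

```python
import json
import os
import tempfile


class LoggedStore:
    """In-memory key/value store; every write is appended to a log file first."""

    def __init__(self, log_path: str):
        self.log_path = log_path
        self.data = {}
        self._replay()

    def _replay(self) -> None:
        """Rebuild the in-memory state by replaying the log from the start."""
        if not os.path.exists(self.log_path):
            return
        with open(self.log_path, "r", encoding="utf-8") as log:
            for line in log:
                record = json.loads(line)
                self.data[record["key"]] = record["value"]

    def put(self, key: str, value) -> None:
        """Append the write to the log, force it to disk, then apply it in memory."""
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(json.dumps({"key": key, "value": value}) + "\n")
            log.flush()
            os.fsync(log.fileno())
        self.data[key] = value

    def get(self, key: str, default=None):
        """Reads are served from memory and never touch the disk."""
        return self.data.get(key, default)


def demo():
    """Write twice, simulate a restart by reopening the log, and read back."""
    path = os.path.join(tempfile.mkdtemp(), "txn.log")
    store = LoggedStore(path)
    store.put("hits", 1)
    store.put("hits", 2)
    return LoggedStore(path).get("hits")
```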

      --
      Looking for an Information Security student project suggestion?
      Try http://dotcrimeManifesto.com/
  8. my $0.02 by spoonist · · Score: 5, Informative

    * I prefer SCSI over IDE

    * RedHat is a pain to strip down to a bare minimum web server, I prefer OpenBSD. Sleek and elegant like the early days of Linux distros.

    * I've used Dell PowerEdge 2650 rackmount servers and they're VERY well made and easy to use. Redundant power supplies, SCSI removable drives, good physical security (lots of locks).

    1. Re:my $0.02 by khuber · · Score: 3, Funny
      Back alley colocation. It's the only way to afford it these days.

      -Kevin

    2. Re:my $0.02 by Door-opening+Fascist · · Score: 3, Informative
      RedHat is a pain to strip down to a bare minimum web server, I prefer OpenBSD [openbsd.org]. Sleek and elegant like the early days of Linux distros.

      OpenBSD doesn't have support for multiple processors, which are a necessity for database servers and dynamic web servers. I'd say FreeBSD is the way to go.

    3. Re:my $0.02 by yomahz · · Score: 3, Interesting

      RedHat is a pain to strip down to a bare minimum web server, I prefer OpenBSD [openbsd.org]. Sleek and elegant like the early days of Linux distros.

      Huh?

      for i in `rpm -qa|grep ^mod_`;do rpm -e $i;done

      rpm -e apache
      cd ~/src/apache.xxx
      ./configure --prefix=/usr/local/apache \
      --enable-rule=SHARED_CORE \
      --enable-module=so
      make
      make install

      with mod_so (DSO - Dynamic Shared Object) support, module installation is trivial.

      --
      "A mind is a terrible thing to taste."
    4. Re:my $0.02 by spoonist · · Score: 2

      What I meant by "strip down to a bare minimum web server" was more along the lines of:

      * I don't want freakin' xinetd running

      * I don't want freakin' gpm running

      * I don't want freakin' portmap running

      etc, etc.

      I've got more important things to do with my time than turn off every process known to man that comes installed. OpenBSD already comes with mostly everything turned off.

    5. Re:my $0.02 by SuiteSisterMary · · Score: 3, Informative

      If your server isn't designed with 'security' in mind, including the ability to padlock the chassis, and at least send an SNMP trap when the chassis is opened, then you need to learn that as far as 'computer and data security' is concerned, protecting from external network attacks is actually quite low on the totem pole.

      Or, "If Joe Random Idiot can walk in and rip out the hard drive, who cares how 3117 your firewall and other network protections are."

      --
      Vintage computer games and RPG books available. Email me if you're interested.
    6. Re:my $0.02 by yomahz · · Score: 2

      I've got more important things to do with my time than turn off every process known to man that comes installed. OpenBSD already comes with mostly everything turned off.

      Hmmm... the installation makes it pretty easy to remove these services. All it takes is a couple of clicks of a mouse. Even if it's post install, all you have to do is remove the files from the /etc/init.d and /etc/rc?.d dirs.

      You're right tho', the default probably shouldn't come with everything. You'd think they'd learn a lesson from MS and all the services that they turn on for you by default.

      --
      "A mind is a terrible thing to taste."
    7. Re:my $0.02 by SuiteSisterMary · · Score: 2

      I've seen it happen. Put on a nice business suit, claim to be a consultant, and the SEP field magically kicks into play.

      Like those IBM commercials showing the inside of the network as a round table, and the two thieves come in. "Umm...we're vendors."

      Or, to put it your way, why have the challenge if you've the swipe door? Why have the server room locked if the front door is locked? And so on. Just because you've a firewall doesn't mean you don't tell your database server to only accept requests from the webserver and the admin console. Just because the front door's locked, and the server room door's locked, doesn't mean you shouldn't lock the racks, and the machines.

      You might choose to trust Juan Third Party Repairman to repair the right machine, let alone not fuck something up, accidentally or maliciously, but I don't. For example.

      I guess what I'm trying to say in my own rambling way is, there's no percentage in taking chances.

      --
      Vintage computer games and RPG books available. Email me if you're interested.
  9. Re:Troll? Informative is more like it. by OpCode42 · · Score: 2, Interesting

    Every time that you click on a link and get bumped back to the front page here on Slashdot, it's a failure of mysql. So much for high-performance.

    Why hasn't Slashdot changed to postgresql?


    I thought this was a good question, if slightly off-topic.

  10. Strange choice of processors by Ed+Avis · · Score: 5, Insightful

    I know that in the server market you often go for tried-and-tested rather than latest-and-greatest, and that the Pentium III still sees some use in new servers. But 1.26GHz with PC133 SDRAM? Surely they'd have got better performance from a single 2.8GHz Northwood with Rambus or DDR memory, and it would have required less cooling and fewer moving parts. Even a single Athlon 2200+ might compare favourably in many applications.

    SMP isn't a good thing in itself, as the article seemed to imply: it's what you use when there isn't a single processor available that's fast enough. One processor at full speed is almost always better than two at half the speed.
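
    The "one fast processor versus two at half the speed" claim can be put into numbers with Amdahl's law. The Python sketch below is a simplified model, assuming a workload whose parallel fraction is known; it ignores scheduling overhead, caches, and I/O waits, and the function name is made up for illustration.

```python
def relative_throughput(parallel_fraction: float, cpus: int = 2) -> float:
    """Throughput of `cpus` processors, each running at 1/cpus speed,
    relative to a single full-speed processor (Amdahl's law)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    serial = 1.0 - parallel_fraction
    speedup = 1.0 / (serial + parallel_fraction / cpus)
    return speedup / cpus
```

    In this model the pair of half-speed CPUs only matches the single fast one when the work is perfectly parallel (`relative_throughput(1.0)` is 1.0) and falls to half the throughput for fully serial work.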

    --
    -- Ed Avis ed@membled.com
  11. How to make a fool of yourself by noxavior · · Score: 5, Funny

    Step one: Submit story on high performance web servers.
    Step two: ???
    Step three: Die of massive slashdotting, loss of reputation and business


    Still, if someone has a link to a cache...

    --
    Karma:This parrot is dead! (and so is the joke.)
  12. And as the last step... by MavEtJu · · Score: 5, Funny

    ... Don't forget to post an article on /. so you can actually measure high-volume bulk traffic.

    [~] edwin@topaz>time telnet www.hardwareanalysis.com 80
    Trying 217.115.198.3...
    Connected to powered.by.nxs.nl.
    Escape character is '^]'.
    GET /content/article/1549/ HTTP/1.0
    Host: www.hardwareanalysis.com

    [...]
    Connection closed by foreign host.

    real 1m21.354s
    user 0m0.000s
    sys 0m0.050s

    Do as we say, don't do as we do.

    --
    bash$ :(){ :|:&};:
  13. High powered webserver? by Moonshadow · · Score: 5, Funny
    In an hour or so, I'm predicting it will be a high-powered heap of smoking rubble. It's almost like this is a challenge to us.

    Maybe it's their idea of a stress test. It's kinda like testing a car's crash durability by parking it in front of an advancing tank.

  14. Re:So fast and soo goo... by khuber · · Score: 3, Funny
    It's still running. It's just extremely slow. Or maybe it's so fast it's zipping through space-time and it only seems slow from our reference frame.

    -Kevin

  15. Defintion of irony by nervlord1 · · Score: 3, Funny

    An article about creating high-performance webservers being slashdotted

    --
    Microsoft IIS is to webserving as KFC is to healthy eating
    1. Re:Defintion of irony by Electrum · · Score: 2

      Well, by using the same brilliant skills of analysis you do, this article is running on Apache, and the webserver is dead. That must mean that Apache is the Taco Bell of the webserver world, right?

      That would be about right. It's cheap, lots of people use it, but it's certainly not the best.

  16. Re:So fast and soo goo... by irc.goatse.cx+troll · · Score: 3, Informative

    Server has nothing to do with it.
    10,000 slashdotters * 500k pages = 5 gigs in about an hour.
    These figures are both estimates, but you can see that network congestion is obviously more of a bottleneck than their performance server.
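
    Working the estimate through in Python, assuming decimal units (500k taken as 500,000 bytes) and traffic spread evenly over the hour; the helper name is made up for this example:

```python
def average_mbit_per_s(visitors: int, page_kb: float, seconds: float) -> float:
    """Average bandwidth in Mbit/s if every visitor fetches one page in the window."""
    total_bits = visitors * page_kb * 1000 * 8  # decimal kilobytes -> bits
    return total_bits / seconds / 1_000_000


total_gb = 10_000 * 500 * 1000 / 1e9          # 5.0 GB in total
rate = average_mbit_per_s(10_000, 500, 3600)  # roughly 11.1 Mbit/s on average
```

    That is an average; the peak right after the story is posted would be higher.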

    --
    Pain lasts, kid. Its how you know you're alive. Sometimes I think this growing up thing is just pain management-TheMaxx
  17. server load by MegaFur · · Score: 5, Funny

    Many other people will likely post a comment like mine, if they haven't already. But hey, karma was made to burn!

    According to my computer clock and the timestamp on the article posting, it's only been about 33 minutes (since the article was posted). Even so, it took me over a minute to finally receive the "Hardware Analysis" main page. The top of that page has:

    Please register or login. There are 2 registered and 995 anonymous users currently online. Current bandwidth usage: 214.98 kbit/s

    Draw your own conclusions.

    --
    Furry cows moo and decompress.
    1. Re:server load by Queuetue · · Score: 2
      Please register or login. There are 4 registered and 1428 anonymous users currently online. Current bandwidth usage: 1183.73 kbit/s
      Took about 3 minutes, next page would not load.
    2. Re:server load by Anonymous Coward · · Score: 5, Funny

      Please flush my dns entry, or better yet unplug me. There are 0 registered and millions of the slashdot horde currently refreshing their browser and laughing at my stats. Current bandwidth usage: 100 Mbit/s.

    3. Re:server load by fusiongyro · · Score: 5, Interesting

      Well, they're about slashdotted now. They lost my last request, and it says they have almost 2000 anonymous users. I sometimes think the reason I like reading Slashdot isn't because of the great links and articles, but instead because I like being a part of the goddamned Slashdot effect. :)

      Which brings me to the point. Ya know, about the only site that can handle the Slashdot effect is Slashdot. So maybe Taco should write an article like this (or maybe he has?). The Slashdot guys know what they're doing, we should pay attention. Although I find it interesting that when slashdot does "go down," the only way I know is because for some reason it's telling me I have to log in (which is a lot nicer than Squid telling me the server's gone).

      --
      Daniel

    4. Re:server load by stevey · · Score: 3, Interesting

      I seem to remember that there was an article just after the WTC attacks last year, which discussed how Slashdot had handled the massive surge in traffic after other online sites went down.

      From memory it involved switching to static pages, and dropping gifs, etc.

      Unfortunately the search engine on Slashdot really sucks - so I couldn't find the piece in question.

    5. Re:server load by blibbleblobble · · Score: 2

      There are a few registered and quite a few anonymous users currently online. Current bandwidth usage: 6.80 kbit/s Oct 19 12:02 EDT

      Guess they stopped counting. We're supposed to be impressed that their dynamic page with 7 embedded tables and 160 images loads in less than three minutes?

      If only they hadn't copied the review format from Toms Hardware. Take a 1000-word article, add 2000 words of padding, and split between 9 pages including an index.

    6. Re:server load by 1110110001 · · Score: 3, Informative

      Maybe the article Handling the Loads, describing how Slashdot kept their servers up on 9/11, is a bit of the thing you're looking for. b4n

  18. Almost by Anonymous Coward · · Score: 2, Insightful

    > One processor at full speed is almost always better than two at half the speed.

    You can safely drop that 'almost'.

    1. Re:Almost by bolthole · · Score: 2
      You can safely drop that 'almost'.

      wrong. For example, when you have a situation where you have lame hardware/drivers that do a lot of busywaits. With a single-cpu system, your system will be completely idle under that situation, no matter what speed cpu you have. Whereas with a dual cpu system, you will be able to get other work done.

      Assuming you have a decent OS, of course.

  19. Alternative HowTo by h0tblack · · Score: 4, Informative

    1. goto here
    2. click buy
    3. upon delivery open box and plugin
    4. turn on Apache with the click of a button
    5. happily serve up lots of content :)

    6. (optional) wait for attacks from ppl for suggesting using apple hardware...

    1. Re:Alternative HowTo by h0tblack · · Score: 2

      Definitely sounds like an interesting evaluation exercise.
      I'm of the opinion that it was a great move by Apple to move into this lower end server market. There's a lot of organisations that need some sort of server system for their network, but don't have the resources or the expertise to use some of the more traditional *nix based systems. That isn't to say that these are solely aimed at the "Idiots Guide to running a Server" market. There may be some nice user-friendly management and monitoring tools, but there's a lot under the hood to play with too. In the future there's also some interesting possibilities with clustering and the upcoming PPC970's from IBM. After all, this is really the first 'proper' server offering from Apple, future generations of the Xserve are definitely something to keep an eye on IMHO.

    2. Re:Alternative HowTo by GoRK · · Score: 2

      You forgot at least one step. Pick one to add but not both:

      4.5. Just because we're using a mac webserver, doesn't mean we're free from the responsibility of properly tuning our configuration. Anyone can buy a box of any type that's preconfigured to run apache when you first plug it in. Anyway, we tune the heck out of our Apache so that it will stand up to the load we're expecting.

      or

      7. Wonder what is going wrong when we realize we have no grasp of how our computer or applications actually work.

      ~GoRK

    3. Re:Alternative HowTo by mcowger · · Score: 2, Informative

      You missed a few steps:

      3a) Pull off god awful packaging
      3b) Install in rack with mickey mouse install setup that requires removing the cover from the machine, exposing all the internal electronics while you're at it
      3c) Making sure the system sags in the middle while installed in the rack.


      and

      4a) Wipe OS because you have to before you can set up RAID
      4b) Setup RAID, have the disk set utility fail multiple times with cryptic errors, only to find that Apple's own docs say this is 'normal behavior'
      4c) When disks fail and are removed, must reboot server to single user mode to reconstruct failed data. May or may not work...apple says 'normal behavior'


      and

      5a) Hope that your machine doesn't exhaust its TCP connection pool, which it will if you make too many SSH connections to it.


      Sorry, I'm just so pedantic today.

      Really, though, the XServes are a cheap attempt at a server that just doesn't work. It's a mickey mouse hack from the beginning. And yes, I have set them up personally. Only 2, because I won't recommend the purchase of any more after THAT experiment.

  20. Why Apache? by chrysalis · · Score: 5, Informative

    I don't understand.

    Their article is about building a high performance web server, and they tell people to use Apache.

    Apache is featureful, but it has never been designed to be fast.

    Zeus is designed for high performance.

    The article supposes that money is not a problem. So go for Zeus. The Apache recommendation is totally out of context.

    --
    {{.sig}}
    1. Re:Why Apache? by khuber · · Score: 2, Redundant
      Any web server can be good enough as long as you spread the load over enough boxes. Apache is much more flexible than Zeus.

      -Kevin

    2. Re:Why Apache? by jimfrost · · Score: 2
      Apache is more flexible, but in traditional versions (1.x) you have a problem in that a new program instance is used for each request. That makes things like maintaining persistent connections to the application servers really hard.

      Using something like iPlanet each server instance opens a number of connections to each application server in your cluster; you get a nice connection pool that way. With the Apache design (again this is 1.x) you can't use a pool so TCP setup/teardown costs between the web server and the application servers start to be an issue.

      Not that people don't do it, but it's a lot less efficient.

      I can't speak for Zeus, and as I understand it the most recent version of Apache allows threaded deployments that can take advantage of connection pooling, but most high volume sites use IIS or iPlanet as their front end web server.

      --
      jim frost
      jimf@frostbytes.com
    3. Re:Why Apache? by jimfrost · · Score: 2
      You're talking about one particular application I imagine. MQ Series is actually pretty rare in large scale deployments, DB2 is like my third choice in databases, and I'd prefer not to use HTTP servers as the actual application server.

      YMMV.

      --
      jim frost
      jimf@frostbytes.com
    4. Re:Why Apache? by Electrum · · Score: 2

      Any web server can be good enough as long as you spread the load over enough boxes. Apache is much more flexible than Zeus.

      Sure, but if you need 2+ Apache boxes to handle the load of one Zeus box, wouldn't it make more sense to buy Zeus in the first place?

      I would like you to qualify your statement about Apache being more flexible. Zeus is a lot easier to configure than Apache. In what aspects is Apache more flexible?

      When it comes to mass virtual hosting, Zeus beats the pants off Apache. Zeus' configuration is fully scriptable out of the box. Apache's is not. Zeus can do wildcard subservers. Apache cannot. Zeus does not require restarting to make configuration changes or add sites. Apache does. Sites can only be added in Apache if using the very limited mass vhost module.

    5. Re:Why Apache? by crucini · · Score: 2
      Apache is more flexible, but in traditional versions (1.x) you have a problem in that a new program instance is used for each request. That makes things like maintaining persistent connections to the application servers really hard.

      Actually, Apache 1.x forks a number of children upon launch, the quantity specified by the StartServers parameter (default 5). It then forks and kills children as necessary to accommodate the load, keeping the number of spare (idle) processes between MinSpareServers and MaxSpareServers. So it always has a pool of spare servers to handle the next connection - it does not fork upon accepting a connection.
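
      The spare-server bookkeeping can be sketched as a single decision function. This Python version is a simplification written for illustration, not Apache's actual code: the function name is hypothetical, the `max_clients` cap is an added assumption, and the real server ramps its fork rate up gradually rather than forking everything at once.

```python
def adjust_children(idle: int, total: int, min_spare: int, max_spare: int,
                    max_clients: int) -> int:
    """Return how many children to fork (positive) or kill (negative)
    to keep the idle count between min_spare and max_spare."""
    if idle < min_spare:
        # Fork enough to reach min_spare, but never exceed the overall cap.
        return max(0, min(min_spare - idle, max_clients - total))
    if idle > max_spare:
        return -(idle - max_spare)
    return 0
```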

      Therefore database handles can be held by the process and used through multiple request/response cycles. Mod_perl users accomplish this transparently with the Apache::DBI module, which overrides the connect method of DBI, causing it to first draw from a pool of cached handles.

      Of course this technique can easily be applied to TCP connections to application servers, or any other reusable resource that takes time to acquire.
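
      The idea of keeping a handle for the life of a worker process can be sketched in Python roughly as follows. This is not how Apache::DBI is implemented; `cached_connect` and its `connect` factory argument are hypothetical names used only to show the caching pattern.

```python
import os

_handles = {}  # one cache per process; forked children each get their own copy


def cached_connect(target: str, connect):
    """Return this process's existing handle for `target`, or open one.

    `connect` is a caller-supplied factory; it is only called the first
    time a given process asks for a given target.
    """
    key = (os.getpid(), target)  # the pid keeps a child from reusing a parent's handle
    if key not in _handles:
        _handles[key] = connect(target)
    return _handles[key]
```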
    6. Re:Why Apache? by Bedouin+X · · Score: 2

      No it wouldn't. Another webserver would cost less than a copy of Zeus.

      --
      Dissolve... Resolve... Evolve...
    7. Re:Why Apache? by jimfrost · · Score: 2
      This is true if you're using Apache as the application server, but most large web applications use the HTTP server as a front end, serving only static content. They refer requests to a back-end application for dynamic page generation, and often that application is running on a cluster of machines.

      If you're using session affinity to bind a session to a particular application server, which is pretty much a necessity for high-volume applications, then it's to your benefit if each HTTP server can hold a connection open to every application server.

      You can't do that on a 1.x Apache server because you'd end up having one connection for every app server and every Apache instance, and that can easily run into the tens of thousands of connections.

      With Apache, therefore, you usually build a new TCP connection with each request, which is not very efficient.
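
      The connection-count argument is simple multiplication. A Python sketch follows; the function name and the sample figures in the note after it are hypothetical, chosen only to show how the totals reach the tens of thousands.

```python
def persistent_connections(web_boxes: int, workers_per_box: int,
                           app_servers: int, shared_pool_size: int = 0) -> int:
    """Count open connections between the web tier and the app tier.

    With shared_pool_size == 0, every worker process holds its own connection
    to every app server (the process-per-worker case). Otherwise each web box
    shares one pool of `shared_pool_size` connections per app server.
    """
    if shared_pool_size:
        return web_boxes * shared_pool_size * app_servers
    return web_boxes * workers_per_box * app_servers
```

      With, say, 10 web boxes, 150 worker processes each, and 10 application servers, the per-process layout needs 15,000 connections, while a shared pool of 10 connections per application server needs 1,000.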

      --
      jim frost
      jimf@frostbytes.com
  21. Re:Not-so high performance by chrysalis · · Score: 4, Insightful

    The article is about *WEB* high performance.

    I don't see your point. "ping" has never been designed to benchmark web servers AFAIK.

    My servers don't answer to "ping". Does it mean that the web server is down? Nope... it's up and running...

    "ping" is not an all-in-one magic tool. By using "ping" you can test a "ping" server. Nothing else.

    --
    {{.sig}}
  22. Server running at near 100% load by ssassen · · Score: 5, Informative
    From the SecureCRT console, connected through SSH1, as the backend is giving me timeouts. I can tell you that we're near 100% server load and are still serving out those pages to at least 1500 clients. I'm sure some of you get timeouts or can't even reach the server at all, for that I apologize, but we just have one of these, not a whole rack full of them.

    Have a good weekend,

    Sander Sassen

    Email: ssassen@hardwareanalysis.com
    Visit us at: http://www.hardwareanalysis.com

    1. Re:Server running at near 100% load by Anonymous Coward · · Score: 3, Insightful

      I'm sorry, but if your server cannot handle 2000 connections then NineNine is right, you have a crappy backend. How is the fact that you have Flash animation relevant? Isn't a 200k flash animation the same as a 200k jpeg from the server's point of view? If your server cannot handle 2000 connections, what business do you have writing an article about "high performance" webservers? It would be a different story if you entitled it "high performance webserver for less than $1000," but you didn't.

      Personally I think the new trend on Slashdot of "hey, I saw this article about ____, it's really insightful and just great!" being submitted by the author of that article is sort of shitty. If anybody knows about building a high traffic webserver, it would be Slashdot, so you'd think they'd be a little pickier about what they post regarding high performance servers.

    2. Re:Server running at near 100% load by happystink · · Score: 2

      Yeah, but 1500 clients WHAT? this minute, this second?

      --

      sig:
      See the "..for smart people" banners Wired runs here? Look elsewhere guys.

  23. Re:Building a Better Webserver in the 21st Century by khuber · · Score: 3, Informative
    I hate to do this, but actually MS has put out some good stuff that's relevant to larger sites.

    http://www.microsoft.com/backstage/whitepaper.htm

    -Kevin

  24. Re:Not-so high performance by Fluffy+the+Cat · · Score: 2, Informative

    Servers will generally carry on pinging even if they're heavily overloaded. Lag or missing packets is generally either a congested or bad link.

  25. how to build a high performance/reliable webserver by jacquesm · · Score: 4, Informative

    1) use multiple machines / round robin DNS
    2) use decent speed hardware but stay away from
    'top of the line' stuff (fastest processor,
    fastest drives) because they usually are not
    more reliable
    3) replicate your databases to all machines so
    db access is always LOCAL
    4) use a front end cache to make sure you use
    as little database interaction as you can
    get away with (say flush the cache once per
    minute)
    5) use decent switching hardware and routers, no
    point in having a beast of a server hooked up
    to a hub now is there...

    that's it ! reasonable price and lots of performance
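
    Point 4 (a front end cache flushed about once per minute) might look like this in Python. It is a sketch under assumptions: the `TTLCache` class is hypothetical, and `compute` stands in for whatever database query builds the page.

```python
import time


class TTLCache:
    """Cache that recomputes a value once its entry is older than `ttl` seconds."""

    def __init__(self, ttl: float = 60.0, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self._entries = {}  # key -> (expiry_time, value)

    def get(self, key, compute):
        """Return the cached value, calling `compute()` only on a miss or expiry."""
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]
        value = compute()
        self._entries[key] = (now + self.ttl, value)
        return value
```

    With a 60 second TTL the database sees at most one query per key per minute, no matter how many page views arrive.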

  26. OK so where do I start? by SuperCal · · Score: 2

    I was really excited to see this article, because oddly enough I am seriously considering setting up my own webserver. In fact I am thinking of running slashcode. So far everyone has been saying that the article generally sucks. So the question remains: where should I start? I was thinking of buying a few of my company's used PCs and building a cluster... that scares me a bit, as I'm not a computer genius, but I can get a great deal on these computers (between 5 and 10 500mhz wintel computers)

    OK, I know that was rambling, so to recap simply: is it better to go with an expensive single MP solution like the article's, or with a cheaper cluster of slow/cheap computers?

    --
    Business News and Resources: www.usasource.net
    1. Re:OK so where do I start? by ssassen · · Score: 3, Informative
      People are negative because the server has been unreachable for some, but they tend to conveniently forget that we did not design for 2000+ simultaneous clients, just a couple of hundred really. Just thought I'd let you know, as we only have one of these whereas most websites (like Anand and Tom) have a rack full of them. Still we're handling the load pretty well and are serving out the pages to about 1500 clients.

      Have a good weekend,

      Sander Sassen

      Email: ssassen@hardwareanalysis.com
      Visit us at: http://www.hardwareanalysis.com

    2. Re:OK so where do I start? by drouse · · Score: 2, Insightful

      I wouldn't worry too much.

      Probably 90% of all non-profit websites could be run off a single 500 MHz computer and most could be run from a sub 100 MHz CPU -- especially if you didn't go crazy with dynamic content.

      A big bottleneck can be your connection to the Internet. The company I work for once was "slashdotted" (not by slashdot) for *days*. What happened was our Frame Relay connection ran at 100%, while our web server -- a 300 MHz machine (running Mac OS 8.1 at the time) had plenty of capacity left over.

      --
      -- I browse at +5 with stripped sigs ... Ha! Ha!
    3. Re:OK so where do I start? by cymen · · Score: 2

      Well, where do you plan on putting all these boxes? Are you going to serve your pages over a DSL connection? Or colocate? If you are planning on colocating, you'll be investigating smaller sized servers, like 1U or 2U size, unless you have money to blow. To be honest, you should just set up one server and get some page hits. Then think about how you'll survive the hordes of people that may come in the future. Unless you're serving porn. I would imagine the loads are always fairly high on porn servers. Someone here can surely offer suggestions if porn is involved.

    4. Re:OK so where do I start? by SuperCal · · Score: 2

      Actually, I have been investigating forms of higher speed connections. My plan is to actually set up the hardware and get it running on a simple DSL connection so I can work on content. After everything is up and running and when I start getting enough traffic that DSL becomes the bottleneck then I'll upgrade. At the moment I know the system I want is overkill, but I would rather do it right now so I can put off a hardware upgrade in the near future. For the moment my server is going to sit in my dining room, but a friend has offered me space in his office (a big unused closet) when I need to move (the business ultra broadband providers here are too expensive; it's much cheaper in the City).

      --
      Business News and Resources: www.usasource.net
  27. Apache 1.3x? by djupedal · · Score: 2

    What kind of 'high performance' web server uses back-leveled software? Apache 2.x may not be totally API compliant, but it certainly provides more than 1.3x in terms of performance.

    I am glad they used an IDE RAID, however. The SCSI myth can now go on the shelf.

    1. Re:Apache 1.3x? by Pizza · · Score: 2, Informative

      Actually, their disk tests are fundamentally flawed. RAID0 is only good for boosting raw sustained throughput; it has pretty much no effect on access time. If you want a boost in access time, go for RAID1, as you can load-balance reads across two drives.

      Furthermore, RAID0+1 is also not really worth it, as it still only gives you the ability to fail one drive, and instead of two logical spindles you only have one to do all of the work. But I suppose if your software is inflexible enough to only be able to operate on one partition, so be it.

      I'd like to see some numbers for their boxes loaded up with RAM and high numbers of random I/O operations, which is where the high rotational speed of modern SCSI drives really shines. And this is the access pattern of a dynamic database-driven web site.

      And as others have said, it's not the hardware that makes the most difference in these circumstances, it's how the software is set up, and how the site/database is coded.

      Hell, I've completely saturated a 100mbps network serving dynamic content via pure Java Servlets, and this was only a dual P3-650. With a RAID5 array of 50G 7200RPM SCSI drives, hardly cutting edge even at the time. Dropping in a RAID1 array of WD120 IDE drives couldn't come anywhere close. But once the working set of data was loaded into RAM, they both performed about the same.

      Their IDE RAID setup is certainly considerably cheaper though, and that's a tradeoff that most people can easily make.

      --
      -- I ain't broke, but I'm badly bent.
    2. Re:Apache 1.3x? by GoRK · · Score: 4, Insightful

      Their IDE-RAID is actually software RAID. The SCSI myth can go off the shelf, sure, but don't take the RAID myth down.

      The Promise FastTrak and Highpoint and a few others are not actually hardware RAID controllers. They are regular controllers with enough firmware to allow BIOS calls to do drive access via software RAID (located in the firmware of the controller), and OS drivers that implement the company's own software RAID implementation at the driver level, thereby doing things like making only one device appear to the OS. Some of the chips have some performance improvements over a purely software RAID solution, such as the ability to do data comparisons between two drives in a mirror during reads, but that's about it. If you ever boot them into a new install of Windows without preloading their "drivers", guess what? Your "RAID" of 4 drives is just 4 drives. The hardware recovery options they have are also pretty damned worthless when it comes to a comparison with real RAID controllers - be they IDE or SCSI.

      A good solution to the IDE RAID debacle are the controllers by 3Ware (very fine) or the Adaptec AAA series controllers (also pretty fine). These are real hardware controllers with onboard cache, hardware XOR acceleration for RAID 5 and the whole bit.
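
      To make the RAID 5 XOR point concrete, here is a minimal Python sketch of the parity idea. It is an illustration only: the function names and the three-block stripe are made up for the example, and a real controller does this in hardware on whole stripes rather than four-byte strings.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def rebuild_missing(surviving_blocks, parity):
    """Recover the one lost data block from the survivors plus parity."""
    return xor_blocks(surviving_blocks + [parity])

# Hypothetical stripe: three data blocks on three drives, parity on a fourth.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Pretend the drive holding data[1] died; rebuild it from the rest.
recovered = rebuild_missing([data[0], data[2]], parity)
```

      Every write has to update the parity block as well, which is the work a hardware XOR engine takes off the CPU.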

      Anyway, I'm not really all that taken aback that this webserver is floundering a bit, but seems really responsive when the page request "gets through," so to speak. If it's not running low on physical RAM, it's probably got a lot of processes stuck in D state due to the shit promise controller. A nice RAID controller would probably have everything the disks are thrashing on in a RAM cache at this point.

      ~GoRK

  28. More Advice from the site by HappyPhunBall · · Score: 4, Funny
    Once you have the hardware setup and the software configured, it is time to design your site to perform. The following tips will help you create a site that is just as scalable as ours. Enjoy.
    1. Use lots, and I mean lots of graphics. Cute ones, animated ones, you name it and people expect to see them. Skimping here will hurt your image.
    2. CSS style sheets may be the way of the future, but just for now make sure you include dozens or even hundreds of font tags, color tags, and tables in your site. Trust us. This has the added benefit of increasing your page file size by at least 30%. You do want a robust site right?
    3. Make sure you are serving plenty of third party ads! Their bandwidth matters also, and you know the way to make money on the web is by serving lots of "fun" animated ads. This will not slow down the user experience of your site one bit! Those ad people are slick, they know that you are building a high bandwidth / high performance site and will be expecting the traffic.
    4. A site is not a high performance site until it has withstood the infamous Slashdot effect. You will want to post a link to your site on /. post haste to begin testing.
    That should be enough to get you started. Now you too can build a rocking 200K per page site, and having read our hardware guidelines, you can expect it to perform just as well as ours did. One more free tip: Placing a cool dynamic hit counter or traffic meter on your site in a prominent position will encourage casual visitors to hit the reload button again and again, driving the performance of your site through the roof.
  29. Re:how to build a high performance/reliable webser by SuperCal · · Score: 2

    Thanks, I wish I hadn't posted early in this article so I could use my mod points. Now, my only question is how fast is decent speed? I'm about to build my own server (actually I'm going to have some help, but I want to at least sound like I know what I'm doing), nothing fancy. I don't expect a huge hit count or anything, so would using older (500-750 MHz) second-hand computers, with properly upgraded memory and storage, work? Also, would you recommend replacing the power supply? One of the guys who's helping me swears that will save me money in the long run on energy costs, but I don't know if it's worth the cost.

    --
    Business News and Resources: www.usasource.net
  30. How not to get slashdotted? by Bahamuto · · Score: 2, Funny

    Does building this high performance web server prevent you from being slashdotted?

  31. how nice of them by twitter · · Score: 2
    Current bandwidth usage: 214.98 kbit/s

    Draw your own conclusions.

    How nice of them to share that information.

    The obvious conclusion is that my cable modem could take a minor slashdotting if Cox did not crimp the upload and block ports. Information could be free, but thanks to the local Bell's efforts to kill DSL, things will get worse until someone fixes the last mile problem.

    The bit about IDE being faster than SCSI was a shocker. You would think that some lower RPM SCSIs set to stripe would have greater speed and equivalent heating. The good IDE performance is good news.

    --

    Friends don't help friends install M$ junk.

    1. Re:how nice of them by strobert · · Score: 2

      Not sure if you noticed, but they tried using the AMI MegaRAID controllers. They should have tried a Mylex. In spite of what Dell tech support will tell you (the PERC in the Dells is a branded MegaRAID), that i960-based boards just have the performance issue, the Mylex DAC960 is i960-based and hums along just fine. I have seen 2-5x write performance increases going between the PERC and the Mylex -- and yes, just proved this to management recently.

  32. Re:how to build a high performance/reliable webser by Electrum · · Score: 3, Interesting

    3) replicate your databases to all machines so
    db access is always LOCAL


    This is probably a bad idea. Accessing the database over a socket is going to be much less resource intensive than accessing it locally. With the database locally, the database server uses up CPU time and disk I/O time. Disk I/O on a web server is very important. If the entire database isn't cached in memory, then it is going to be hitting the disk. The memory used up caching the database cannot be used by the OS to cache web content. A separate database server with a lot of RAM will almost always work better than a local one with less RAM.

    This Apache nonsense of cramming everything into the webserver is very bad engineering practice. A web server should serve web content. A web application should generate web content. A database server should serve data. These are all separate processes that should not be combined.

  33. Re:how to build a high performance/reliable webser by jcrowe · · Score: 2, Informative

    The company I work for successfully runs our webserver(php & MySQL) on an old pentium 166. We have several thousand visitors every month & use it for an ftp site for suppliers, a router, firewall, gateway & squid server.

    I think that your 700mhz machine would work fine for just web pages. :)

  34. This is wrong on soooo many levels. by (H)elix1 · · Score: 5, Interesting
    (include standard joke about high performance web serving getting /.)

    I'd post sooner, but it took forever to get to the article.. here are my thoughts...

    First off SCSI.

    IDE drives are fast in a single user/workstation environment. As a file server for thousands of people sharing an array of drives? I'm sure the output was solid for a single user when they benched it... looks like /. is letting them know what multiple users do to IDE. 'Overhead of SCSI controller'... Methinks they do not know how SCSI works. The folks who share this box will suffer.

    Heat issues with SCSI. This is why you put the hardware in a nice climate controlled room that is sound proof. Yes, this stuff runs a bit hot. I swear some vendors are dumping 8K RPM fans with ducting engineered to get heat out of the box and into the air conditioned 8'x19" chassis that holds the other 5-30 machines as well.

    I liked the note about reliability too... it ran, it ran cool, it ran stable for 2 weeks. I've got 7x9G Cheetahs that were placed into a production video editing system and ran HARD for the last 5+ years. Mind you, they ran about $1,200 each new... but the downtime costs are measured in minutes... Mission critical, failure is not an option.

    OS

    Let's assume the Windows 2000 Pro was service packed to at least SP2... If that is the case, the TCP/IP stack is neutered. Microsoft wanted to push people to Server and Advanced Server... I noticed the problem when I patched my counter strike server and performance dogged on w2kpro w/sp2 - you can find more info in Microsoft's KB... (The box was used for other things too, so be gentle) Nuking the TCP/IP stack was the straw that cracked my back to just port another box to Linux and run it there.

    Red Hat does make it easy to get a Linux box up and running, but if this thing is going outside the firewall, 7.3 was a lot of work to strip out all the stuff that is bundled with a "server" install. I don't like running any program I did not actually install myself. For personal boxes living at my ISP, I use Slackware (might be moving to Gentoo however). Not to say I'm digging through the code or checking MD5 hashes as often as I could, but the box won't even need an xserver, mozilla, tux racer, or anything other than what it needs to deliver content and get new stuff up to the server.

    CPU's (really a chassis problem):

    I've owned AMD's MP and Intel's Xeon dually boards. These things do crank out some heat. Since web serving is usually not processor bound, it does not really matter. Pointing back to the over heating issues with the hard drives, these guys must have a $75 rack mount 19" chassis. Who needs a floppy or CD-ROM in a web server? Where are the fans? Look at the cable mess! For god's sake, at least spend $20 and get rounded cables so you have better airflow.

    1. Re:This is wrong on soooo many levels. by seanadams.com · · Score: 3, Interesting

      IDE drives are fast in a single user/workstation environment. As a file server for thousands of people sharing an array of drives? I'm sure the output was solid for a single user when they benched it... looks like /. is letting them know what multiple users do to IDE. 'Overhead of SCSI controller'... Methinks they do not know how SCSI works. The folks who share this box will suffer.

      Methinks it's been a LONG time since you've read up on IDE vs SCSI, and me also thinks you don't have the first clue about how a filesystem works. Yes, there was a time when IDE drives were way slower, mainly because the bus could only have one outstanding request at a time. IDE has since advanced to support tagged command queuing and faster data rates, closing the gap with all but the most horrendously expensive flavors of SCSI. Really, the bottleneck is spindle and seek speed - both IDE and SCSI are plenty fast now.

      The only thing SCSI really has going for it is daisy-chainability and support for lots of drives on one port. HOWEVER there are some really killer things you can do with IDE now. In my web server I'm using the promise RM8000 subsystem: a terabyte of RAID5 storage for about $3500 including the drives IIRC. Try doing that with SCSI drives!

      Anyway.... you suggest that this server is slashdotted because it's disk-bound. Serving the exact same page over and over again. Uh huh. Go read up on any modern file system, then figure out how long it takes to send a 100KB web page to 250,000 people over a DSL line, and then tell me where you think the problem lies.
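
      The back-of-the-envelope sum suggested above can be sketched in a few lines of Python. The 100 KB page and 250,000 readers come from the comment; the 1 Mbit/s uplink is an assumed figure (the comment only says "a DSL line"), and protocol overhead, retries and caching are all ignored.

```python
def transfer_seconds(page_bytes, visitors, uplink_bits_per_sec):
    """Time to push one page to every visitor through a single uplink,
    ignoring protocol overhead, retries and caching."""
    total_bits = page_bytes * 8 * visitors
    return total_bits / uplink_bits_per_sec

# 100 KB page and 250,000 readers are from the comment above;
# the 1 Mbit/s uplink is an assumed figure, not a measured one.
seconds = transfer_seconds(100 * 1000, 250_000, 1_000_000)
days = seconds / 86_400   # roughly 2.3 days
```

      With those numbers the link alone needs more than two days of continuous sending, before the disks or the CPU enter into it.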

    2. Re:This is wrong on soooo many levels. by (H)elix1 · · Score: 2

      Many of the 'good ideas' for CPU design, HDD, etc seem to merge together. IDE drives are phenomenally better than they used to be. The last audio workstation used RAID 0/1 IDE drives because it was fast and solid enough. Heck, even the box I built for my wife to do photoshop work was only RAID 0 with a pair of 80G IDE drives.

      IDE has since advanced to support tagged command queuing and faster data rates

      Is this part of the controller, or is the RAID card doing the work? Great news if it is. (Then my old KT7A-RAID can be put to better use than it is). I'm all for right tool, right job... but when I hear heavy beating on a web server, I would not use a low end Sun box either. Personal or hobbyist grade is one thing... but I'm pounding code for one of the major dot coms this weekend (my life sucks) that expects to handle millions of requests. This box is closer to what I would put out there for a game server - counter strike size, not everquest....

      you suggest that this server is slashdotted because it's disk-bound
      Nope - If I was to put money on it, it looks like bad code is the problem here. I suspect someone went nuts with the server side code generation.

      My biggest complaint was they could not deal with the heat. Such an easy problem to fix...

  35. Re:how to build a high performance/reliable webser by Anonymous Coward · · Score: 2, Interesting

    Not so...
    You can cache with technologies like Sleepycat's DBM (db3).

    We have a PHP application that caches lookup tables on each local server. If it can't find the data in the local cache, then it hits our PostgreSQL database. The local DBM cache gets refreshed every hour.

    Typical comparison
    -------------------
    DB access time for query: .02 secs
    Local cache (db3) time: .00003 secs

    Our server load dropped from a typical 0.7 to an acceptable 0.2, and the load on the DB server dropped like a rock! This is with over a million requests (no graphics, just GETs to the PHP script) every day.
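
    The lookup pattern described here (local cache first, database on a miss, hourly refresh) can be sketched roughly as follows. This is not the poster's PHP code: it is a Python illustration in which a plain dict stands in for the db3 file, `fetch_from_db` is a hypothetical stand-in for the PostgreSQL query, and the refresh is done lazily by discarding the cache once it is an hour old.

```python
import time

class LocalLookupCache:
    """Check a local store first; fall back to the database on a miss.

    A plain dict stands in for the on-disk DBM file, and `fetch_from_db`
    stands in for the real database query. Both are assumptions.
    """

    def __init__(self, fetch_from_db, max_age_seconds=3600, clock=time.time):
        self._fetch = fetch_from_db
        self._max_age = max_age_seconds
        self._clock = clock
        self._store = {}
        self._loaded_at = clock()

    def get(self, key):
        # Throw the whole cache away once it is older than max_age (hourly here).
        if self._clock() - self._loaded_at >= self._max_age:
            self._store.clear()
            self._loaded_at = self._clock()
        if key not in self._store:
            self._store[key] = self._fetch(key)  # slow path: hit the database
        return self._store[key]
```

    Only the first request for a key in any hour pays the database round trip; the rest are answered locally, which is where the drop in DB server load would come from.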

    We also tuned the heck out of Apache (Keepalive, # of children, life of children etc).

    Some other things we realized after extensive testing:
    1. Apache 2.0 sucks big time! Until modules like PHP and mod_perl are properly optimized, there's not much point in moving there.
    2. AolServer is great for Tcl, but not for PHP or other plugin technologies

    Because of all these changes, we were able to switch from a backhand cluster of 4 machines back down to a single dual processor machine, with another machine available on hot standby. Beat that!

  36. Re:Not-so high performance by chrysalis · · Score: 2

    ICMP REPLY doesn't exist. Maybe you mean ICMP ECHO REPLY which has nothing to do with MTU discovery.

    --
    {{.sig}}
  37. Not to flame, but the article is bad for newbies by Anonymous Coward · · Score: 2, Insightful

    I'll just mention a couple of items:

    1) For a high performance web server one *needs* SCSI. SCSI can handle multiple requests at one time and performs some disk-related processing itself, compared to IDE, which can only handle requests for data single file and uses the CPU for disk-related processing a lot more than SCSI does.

    SCSI disks also have higher mean times to failure than IDE. The folks writing this article may have gotten benchmark results showing their RAID 0+1 array matched the SCSI setup *they* used for comparison, but most of the reasons for choosing SCSI are what I mention above -- not the comparative benchmark results.

    2) For a high performance webserver, FreeBSD would be a *much* better choice than Redhat Linux. If they wanted to use Linux, Slackware or Debian would have been a better choice than Redhat Linux for a webserver. Ask folks in the trenches, and lots will concur with what I've written on this point due to maintenance, upgrading, and security concerns over time on a production webserver.

    3) Since their audience is US based, It would make sense to co-lo their server in the USA. Both from the standpoint of how many hops packets take from their server to their audience, and from the logistical issues of hardware support -- from replacing drives to calling the data center if there are problems. Choosing a USA data center over one in Amsterdam *should* be a no brainer. Guess that's what happens when anybody can publish to the web. Newbies beware!!

  38. Re:Not-so high performance by chrysalis · · Score: 2

    You are pinging Sourceforge.

    --
    {{.sig}}
  39. Slashdotted by entrylevel · · Score: 3, Funny

    Ooh! Ooh! I really want you guys to teach me how to build a high performance webserver! What's that? You can't, because your webserver is down? Curses!

    (Obligatory disclaimer for humor-impaired: yes I understand that the slashdot effect is generally caused by lack of bandwidth rather than lack of webserver performance.)

    --
    Karma: Incomprehensible (Mostly affected by posting at +5, reading at -1, and metamoderating everything unfair.)
  40. Re:how to build a high performance/reliable webser by jacquesm · · Score: 2, Interesting

    we serve up between 5 and 7 million pageviews daily to up to 100,000 individual IP's

    Decent speed to me is one in which the server is no longer the bottleneck, in other words serving up
    dynamic content you should be able to saturate the pipe that you are connected to.

    I have never replaced the power supply because of energy costs, it simply isn't a factor in the
    overall scheme of things (salaries, bandwidth, amortization of equipment)

    500-700 Mhz machines are fine for most medium volume sites, I would only consider a really fast machine to break a bottleneck, and I'd have a second one on standby in case it burns up
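
    The "saturate the pipe" yardstick can be turned into a number with a small Python sketch. Both inputs below are assumed figures chosen for illustration (a 100 Mbit/s link and a 50 KB average dynamic page); neither is a measurement from our site.

```python
def pages_per_second_to_saturate(link_bits_per_sec, page_bytes):
    """Pages per second at which page payload alone fills the link."""
    return link_bits_per_sec / (page_bytes * 8)

# Assumed figures: a 100 Mbit/s link and a 50 KB average dynamic page.
rate = pages_per_second_to_saturate(100_000_000, 50_000)  # 250.0
```

    If the server can generate pages faster than that rate, the pipe is the bottleneck and a faster machine buys nothing.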

  41. Re:Advice from the wise: by Hast · · Score: 2

    Really? There was an earlier discussion on this topic. (Related to 9/11 or some other day with extremely high traffic.)

    From that discussion I got the impression that what happens when you are bumped to the front page is that you have tried to access a story with non-standard setup. (What you get if you are logged in and change your view preferences.) The system is setup so that some servers only serve static content. (Because that's what most users view.)

    During high load situations a dynamic request is sometimes sent to a static serving server. This is when you are bumped to the front page. (Unfortunately I couldn't find anything about this in the FAQ/About, so I can't verify it.)

  42. "millions of page views every month" not High-Perf by Anonymous Coward · · Score: 3, Insightful

    Too bad "millions of page views every month" is simply not even in the realm that would require "High-Performance Web Server"(s). These guys need to come back and write an article once they've served up 5+ million page views per day. Not hits. Page views.
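
    For a sense of scale, here is a quick Python sketch converting both figures to average page views per second. The one million per month and the 30-day month are illustrative assumptions (the article only says "millions"), and averages hide peaks.

```python
SECONDS_PER_DAY = 86_400

def average_per_second(page_views, days):
    """Average page views per second over a period; peaks will be higher."""
    return page_views / (days * SECONDS_PER_DAY)

monthly = average_per_second(1_000_000, 30)   # about 0.39 per second
daily = average_per_second(5_000_000, 1)      # about 57.9 per second
```

    Even a few million per month works out to only a handful of page views per second on average.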

  43. Re:Not-so high performance by Saint+Aardvark · · Score: 2
    Hehe...for some reason the idea of building a ping server strikes me as funny.

    We tested it in the workshop by hooking it up to a 3Com X500 Terabit Switch, and using over 500 RedHat servers to ping -f. This baby handled it well -- the time we'd spent optimizing the Oracle backend really paid off.

    Yeah. Or maybe I should just have more coffee...

  44. how 2 test a so-called "high-performance" server by eagleyezx · · Score: 2, Funny

    1. load it full of pr()n
    2. post the link on /.
    3. check back in 30seconds

    if it still works, it's high-performance

  45. Western Digital Drives?? by zentec · · Score: 3, Interesting


    The mere fact that they recommended 7200 rpm Western Digital drives for their high performance system gives me the impression they haven't a clue.

    I disagree with the assertion that a 10,000 rpm SCSI drive is more prone to failure than a 7,200 IDE drive because it "moves faster". I've had far more failures with cheap IDE drives than with SCSI drives. Not to mention that IDE drives work great with minor loads, but when you start really cranking on them, the bottlenecks of IDE start to haunt the installation.

  46. Re:Redefinition of irony by elemental23 · · Score: 2

    Guy who didn't read the article makes an uninformed M$ bash and gets modded to four...

    The Microsoft line was the poster's sig. Check your Slashdot preferences, there's an option to include a "--" between post content and sig. I don't know why this isn't on by default, it eliminates mistakes like this.

    (I added the "--" to my sig myself because it seems a lot of people don't have this enabled)

    --
    I like my women like my coffee... pale and bitter.
  47. They forgot the important bits by ToasterTester · · Score: 2

    This setup doesn't account for HA or scalability. With hardware as cheap as it is today, there is no excuse for not using multiple servers to avoid downtime and to allow for maintenance without taking the site down. Also, what about backup? It's not even mentioned. Last, I don't fully agree with the RAID 0+1. For a large database, maybe, but on a small setup like this I wouldn't. The article seems to imply the data is more read than write, and RAID 5 has better read performance.

    So the article was missing a lot for a professional setup.
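
    One part of the RAID 0+1 versus RAID 5 trade-off that is easy to put numbers on is usable space. The Python sketch below covers capacity only and says nothing about performance; the four-drive array and the 120 GB drive size are assumed figures for illustration.

```python
def usable_drives(level, n):
    """Number of drives' worth of usable space for n equal drives."""
    if level == "0":
        return n                 # striping only, no redundancy
    if level == "1":
        return 1                 # every drive holds the same data
    if level == "0+1":
        if n < 4 or n % 2:
            raise ValueError("RAID 0+1 needs an even number of drives, at least 4")
        return n // 2            # half the drives mirror the other half
    if level == "5":
        if n < 3:
            raise ValueError("RAID 5 needs at least 3 drives")
        return n - 1             # one drive's worth goes to parity
    raise ValueError(f"unknown RAID level: {level}")

# Four 120 GB drives (the drive size is an assumed figure).
for level in ("0+1", "5"):
    print(level, usable_drives(level, 4) * 120, "GB usable")
```

    With the same four drives, RAID 5 leaves three drives' worth of space against two for RAID 0+1.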