Slashdot Mirror


Ask Slashdot: Optimizing Apache/MySQL for a Production Environment

treilly asks: "In the coming weeks, the startup company I work for will be rolling out a couple of Linux boxes as production webservers running Apache and MySQL. Management was quick to realize the benefits of Linux, but I was recently asked: 'Now that we're rolling out these servers, how do we optimize out-of-the-box RedHat 6.0 machines as high-performance web and database servers in a hosting environment?'"

22 of 143 comments

  1. Check out the optimization tips page at apache.org by Anonymous Coward · · Score: 4
    At the Apache.org web site there is a guide to optimize Apache's performance.

    Also Dan Kegel wrote an interesting web page in response to the whole Mindcraft NT/IIS vs. Apache/Linux fiasco and on that page are several detailed measures to improve Apache's performance under Linux:

    Dan Kegel's Mindcraft Redux page
    Apache Week 'zine

    ...as for my own personal experience with Apache: when compiling Apache, remove any modules you won't need, which saves plenty of RAM. In httpd.conf, set StartServers, MaxClients, and MaxRequestsPerChild so that Apache does not spawn new children too often. The trick: before you start Apache, look at "top" and count the number of processes; then start Apache under normal traffic conditions and count again to see how many httpd children are running. Add 10 to that number, and that should be your StartServers setting. The MaxRequestsPerChild default is 30, but I like to crank it up to 300 or more so that httpd children are not killed and recreated too often. (The reason for that setting is to keep possible memory leaks from sucking up all your RAM, which hasn't been a problem with the httpds I've worked with.)
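The StartServers rule of thumb above is simple arithmetic on a process count. A minimal sketch, assuming the input is a list of command names such as the lines printed by `ps -e -o comm=` on Linux (that input format and the sample snapshot are assumptions, not from the comment):

```python
from collections import Counter

def suggest_start_servers(process_names, daemon="httpd", headroom=10):
    """Rule of thumb from the comment: httpd processes seen under normal load, plus 10.

    process_names is assumed to be a list of command names (e.g. `ps -e -o comm=`
    output split into lines). The parent httpd is counted along with its children.
    """
    running = Counter(name.strip() for name in process_names)[daemon]
    return running + headroom

def render_directives(start_servers, max_requests_per_child=300):
    """Emit the two httpd.conf lines discussed above (300 is the commenter's preference)."""
    return "\n".join([
        f"StartServers {start_servers}",
        f"MaxRequestsPerChild {max_requests_per_child}",
    ])

# Hypothetical process snapshot taken under normal traffic.
snapshot = ["init", "httpd", "httpd", "httpd", "mysqld", "httpd"]
print(render_directives(suggest_start_servers(snapshot)))
```

The headroom of 10 and the value 300 come straight from the comment; treat both as starting points to adjust while watching "top", not as tuned values.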

  2. Here's how I handle several million hits per day.. by Anonymous Coward · · Score: 5

    Basically, I just winged it. My site started out with maybe ten thousand hits per day, but quickly (over the course of two years) ramped up to about 5 million hits a day. I just hacked together some Perl scripts, and when I need to make changes, I just try 'em out on the production server. Who needs beta testing? If there are performance problems, I just buy faster hardware. If there are stability problems, people are understanding, after all, I *am* using Linux.

    Sincerely,

    Rob Malda

  3. Re:General purpose advice by Tom+Rothamel · · Score: 2
    Eliminate the use of directory overrides (via .htaccess) wherever possible. They're usually not worth it.

    Not only that, turn them off (AllowOverride None). If you simply don't use them but leave them enabled anyway, you pay the price WRT all the stat(2) calls the server makes looking for them.

    This is all IIRC, but I usually have a good memory. Then again, I did just wake up.
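Where those stat(2) calls come from: with overrides enabled, Apache checks for a per-directory access file in each directory on the way down to the requested file. A minimal sketch that lists the files one request would probe, assuming overrides are enabled all the way down from the filesystem root (in a real config, `AllowOverride None` on a `<Directory>` block switches the lookup off for that part of the tree); the example path is hypothetical:

```python
from pathlib import PurePosixPath

def htaccess_candidates(file_path, access_file=".htaccess"):
    """Per-directory files probed for one request, root first, assuming
    overrides are enabled in every directory on the path."""
    directory = PurePosixPath(file_path).parent
    # .parents lists ancestors deepest-first; reverse it to walk from the root down.
    dirs = list(directory.parents)[::-1] + [directory]
    return [str(d / access_file) for d in dirs]

for candidate in htaccess_candidates("/home/www/docs/a/index.html"):
    print(candidate)
```

Five lookups for a single page, repeated on every hit, is the overhead that turning overrides off removes.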

  4. Re:MySQL vs PostgreSQL by Ranger+Rick · · Score: 2

    Basically, it comes down to: Postgres is much more complete (it has more of the SQL spec implemented -- transactions, etc.). MySQL is much faster. It all comes down to how you expect to use it. If you are going to be doing complex joins and transactions and such, MySQL probably won't cut it (yet), otherwise, MySQL (most definitely!) makes up in speed what it lacks in features.

    There's obviously more to it than that, but I'm not aware of any specific comparisons...

    --

    WWJD? JWRTFM!!!

  5. Clarification of LIKE vs. == by rodent · · Score: 2
    By LIKE I was referring to full substring matching using LIKE "%foo%". "foo%" will use the index but "%foo%" won't, and on my server running four "%foo%"-style queries would bring it to a crawl. I finally had to export the table to a flat file after updates and use an awk script to search it. It can handle 40 concurrent searches with awk.
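Why "foo%" can use the index and "%foo%" cannot: an index keeps keys sorted, so every key with a given prefix sits in one contiguous run that a binary search can jump to. A substring can appear anywhere, so the sort order does not help. A minimal sketch using a sorted Python list as a stand-in for a B-tree index (an illustration of the principle, not how MySQL stores indexes):

```python
import bisect

def prefix_search(sorted_keys, prefix):
    """LIKE 'foo%' with an index: binary-search to the first candidate,
    then read forward only while keys still share the prefix."""
    i = bisect.bisect_left(sorted_keys, prefix)
    matches = []
    while i < len(sorted_keys) and sorted_keys[i].startswith(prefix):
        matches.append(sorted_keys[i])
        i += 1
    return matches

def substring_search(keys, needle):
    """LIKE '%foo%': the sort order says nothing about where 'foo' appears
    inside a key, so every key has to be examined (a full scan)."""
    return [key for key in keys if needle in key]

index = sorted(["xfoo", "football", "apple", "seafood", "food", "foobar"])
print(prefix_search(index, "foo"))     # reads only the 'foo...' run of the index
print(substring_search(index, "foo"))  # had to look at all six keys
```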

    --
    rodent...
    Tactical nuclear weapons are a viable alternative!
  6. RAM & RAID 1+0 is your friend. by rodent · · Score: 3
    Personally, I designed and currently admin a site that gets about 1 million hits/day. Over a week's time it averages about 50 queries/second, with peaks at 500 queries/sec. The setup is a dual P2/400 with 512 MB of ECC RAM (soon to be a gig), with the databases on an LVD SCSI drive. The databases total 2 GB. There are typically 50 Apache processes running at a time.
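Those figures can be turned into per-second rates, which is usually how capacity gets planned. A quick worked check using only the numbers in the comment, assuming hits are spread evenly over the day (real traffic is burstier, as the 10x peak shows) and counting every hit, including images, as a "hit":

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

hits_per_day = 1_000_000        # "about 1 million hits/day"
avg_queries_per_sec = 50        # weekly average from the comment
peak_queries_per_sec = 500      # reported peak

avg_hits_per_sec = hits_per_day / SECONDS_PER_DAY         # roughly 11.6
queries_per_hit = avg_queries_per_sec / avg_hits_per_sec  # roughly 4.3
peak_factor = peak_queries_per_sec / avg_queries_per_sec  # 10.0

print(f"{avg_hits_per_sec:.1f} hits/s, {queries_per_hit:.1f} queries/hit, "
      f"peak {peak_factor:.0f}x average")
```

If many of those hits are static images, the number of queries per dynamic page is higher than 4.3.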

    As for optimization, definitely check your queries and always use keyed fields and exact-match (=) queries. LIKE queries will kill your performance, to the point of being unusable, on decently large tables (>100k records). Definitely read the MySQL docs on RAM usage and the various switches for optimizing its RAM usage. That is extremely important.

    As for Apache, avoid .htaccess at all costs and compile in only the modules you need. Also check the tuning FAQ mentioned above.

    --
    rodent...
    Tactical nuclear weapons are a viable alternative!
    1. Re:RAM & RAID 1+0 is your friend. by gampid · · Score: 2

      Not to be heretical or anything, but I was doing some benchmarking on MySQL's LIKE versus == matching on ints, and LIKE was actually faster. I don't know why, but I suspect it's because LIKE uses some sort of binary tree to find the int while == walks through them. This is not the case when you're using LIKE to match a string or substring; in that case == seemed to work better.

      -Evan

      --

      The power of technology is manifest in how it is applied within the social matrix.
  7. Idiot of the year award by Eric+Green · · Score: 2

    "HTML Programmer", eh? Talk about a skill that will be obsolete in 20 years, when we're all using XML and have WYSIWYG XML editors...

    BTW, programmers write programs, not text. So "HTML Programmer" is a misnomer in the first place -- that should be "HTML page creator".

    -E

    --
    Send mail here if you want to reach me.
  8. Tuning webservers by jd · · Score: 3
    Here's my quick list of things I do, when tuning the webservers I've set up in the past. Note: I offer NO guarantees to the usefulness of this information. For all I know, it'll turn your pet hamster into a frog.

    0) If you have LOTS of RAM, compile Apache, MySQL and optionally Squid with EGCS+PGCC at -O6. The extra speed helps.

    1) Guesstimate the number of simultaneous connections I'm likely to have.

    2) Guesstimate how much of the data is going to be dynamic, and how much static.

    3) IF (static > dynamic) THEN install Squid and configure it as an accelerator on the same machine. Give most of the memory over to Squid, and configure a minimal number of httpd servers. You'll only need them for accesses of new data, or data that's expired from the cache.

    4) IF (static < dynamic) THEN …

    5) If you've plenty of spare memory after all of this, compile the kernel with EGCS+PGCC at -O6, but check its reliability. It's not really designed for such heavy optimisation, but if it works OK, the speed will come in handy.

    NOTE: Ramping up the compiler optimiser flag to -O6 does improve performance, but it also costs memory. If you've the RAM to spare, it is sometimes worth it.
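Steps 1 and 3 of the list above can be sketched as two small functions. For the step 1 guesstimate this uses Little's law (average concurrency = arrival rate x average time each request holds a connection), which is my choice of method rather than the commenter's. The request rate and connection time below are hypothetical inputs:

```python
def concurrent_connections(requests_per_sec, avg_seconds_per_request):
    """Step 1 guesstimate via Little's law: concurrency = arrival rate x time in system."""
    return requests_per_sec * avg_seconds_per_request

def front_with_squid(static_requests, dynamic_requests):
    """Step 3: if static requests outnumber dynamic ones, run Squid as an
    accelerator in front of a small pool of httpd servers."""
    return static_requests > dynamic_requests

# Hypothetical inputs: 50 requests/s, each holding a connection for 2 s on average
# (slow modem clients keep connections open much longer than the server needs).
print(concurrent_connections(50, 2.0))
print(front_with_squid(static_requests=800, dynamic_requests=200))
```

When `front_with_squid` returns False, step 3 does not apply; what the original step 4 recommended for that case did not survive in this copy of the comment.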

    --
    It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
  9. To all you bitter schmucks with nothing else to do by dclatfel · · Score: 2

    Yeah, I posted the Rob Malda as Anonymous Coward question - mostly because I wanted clarification on it. It seemed very odd, and yes, I knew it was probably just some anonymous lame-oid. Now ... to the rest of you (mostly A-Cowards as well) who choose to harsh on me because I merely question this ... Screw you.

    Trust me ... I would put my IQ points up against any of yours any day of the week. And yes, I did list HTML on my resume as a programming language, because in the positions I would be interested in ... there's very little reason to treat it otherwise. By profession, I'm a researcher, not a programmer.

    So, as I said, to everyone who has so little better to do than scan Slashdot waiting for opportunities to flame others (under Anonymous Coward status), screw off.

    Cordially yours,
    David

    --
    Share data. Share code. Share ideas. Share the wealth.
    http://stockfilter.org
  10. Re:RAID! by James+Manning · · Score: 2

    First, PLEASE don't point people to that horrible howto; it will be obsolete as soon as Linus accepts the real software RAID versions (and howto) available at:

    http://metalab.unc.edu/pub/Linux/kernel.org/pub/linux/daemons/raid/alpha/

    and

    http://ostenfeld.dk/~jakob/Software-RAID.HOWTO/

    Second, realize that 0+1 (typically 1+0, or RAID 10) gives you only half of the total physical space as usable space. Sometimes you can afford that, sometimes you can't, and you still generate the SCSI bus load of the full drive set :)

    In the very typical case (especially in these situations) of reading the databases, it's worth noting that 1+0 effectively becomes 0+0, since reads can be split across both halves of a RAID 1 mirror (assuming no failed drives).
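The capacity trade-off is easy to put in numbers. A minimal sketch of usable space for the common levels, assuming identical drives and ignoring controller and filesystem overhead. The demo uses four 4.55 GB drives, the same array size another commenter in this thread runs as RAID 5:

```python
def usable_capacity(level, drives, drive_gb):
    """Usable space for common RAID levels built from identical drives."""
    if level == "0":                      # striping: every byte is usable
        return drives * drive_gb
    if level == "1":                      # mirroring: one drive's worth, however many copies
        if drives < 2:
            raise ValueError("RAID 1 needs at least 2 drives")
        return drive_gb
    if level == "5":                      # one drive's worth of space goes to parity
        if drives < 3:
            raise ValueError("RAID 5 needs at least 3 drives")
        return (drives - 1) * drive_gb
    if level in ("10", "1+0", "0+1"):     # striped mirrors: half the raw space
        if drives < 4 or drives % 2:
            raise ValueError("RAID 10 needs an even number of drives, at least 4")
        return drives // 2 * drive_gb
    raise ValueError(f"unsupported RAID level {level!r}")

for level in ("0", "5", "10"):
    print(level, round(usable_capacity(level, 4, 4.55), 2))
```

With four drives, RAID 10 gives up more space than RAID 5. In return it avoids parity writes and, as noted above, can spread reads across all four spindles.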

    Last, as a side note to the mysql part, try to use isamchk (if the db server can have any down time) for pre-sorting your database instead of doing the sorting as part of your SQL

  11. a couple ideas by felix · · Score: 3
    So some of the best things you can do have already been mentioned: split your database out from your front-end webservers, give the backend its own machine, and run RAID 0+1 on the db server. The front ends won't need RAID, since they'll be serving a lot of the static stuff out of cache.

    Another idea is to split image serving onto its own Apache, not necessarily its own box. This Apache can be pared down to the absolute minimum of modules, since all it does is serve static images. It also lets the cache be used efficiently, since it will mostly hold the common images; if images and content are served from the same Apache, common images instead contend with common text files for cache space.

    Also, what are you using in apache to create dynamic pages and connect to the db? Use long running processes where possible, which means pick mod_perl, php, fastCGI, servlets, etc... over plain cgi scripts. This will save you lots of cycles and also let you have persistent db connections. Always a very good thing.

    Taking the splitting of machines to the next level, you could also move all of your dynamic content to its own machine, mod_proxied through your front-end Apaches. This makes the front ends very small, since they barely need any modules installed at all. It also gets some extra performance out of your dynamic-content Apaches. Of course, you're running a lot of boxes now. :)
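The split described above comes down to a routing decision made by the front end for each request. A minimal sketch of that decision, assuming hypothetical back-end addresses, file suffixes, and URL prefixes (none of them come from the comment); in Apache itself this would be done with mod_proxy directives rather than code:

```python
from urllib.parse import urlsplit

# Hypothetical back ends; names, ports, and URL layout are placeholders.
IMAGE_BACKEND = "http://127.0.0.1:8081"    # stripped-down Apache serving only images
DYNAMIC_BACKEND = "http://127.0.0.1:8082"  # Apache with mod_perl/PHP on its own box
IMAGE_SUFFIXES = (".gif", ".jpg", ".jpeg", ".png")
DYNAMIC_PREFIXES = ("/cgi-bin/", "/app/")

def route(url):
    """Return the back-end URL a minimal front end would proxy to, or None
    if the front end should serve the (static) file itself."""
    parts = urlsplit(url)
    target = parts.path + (f"?{parts.query}" if parts.query else "")
    if parts.path.lower().endswith(IMAGE_SUFFIXES):
        return IMAGE_BACKEND + target
    if parts.path.startswith(DYNAMIC_PREFIXES):
        return DYNAMIC_BACKEND + target
    return None

print(route("http://www.example.com/images/logo.GIF"))
print(route("/app/search?q=linux"))
print(route("/index.html"))
```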

    Read this if you're running mod_perl. And read this to optimize your db.

  12. Linux and Databases by doomy · · Score: 2

    I work in a research lab that does a lot of databases on Linux. We started off with msql and then graduated to mysql. We were initially running redhat with msql and slowly moved to Debian, since we felt it was a more stable server distribution. It was also more configurable, and we were able to tweak almost anything in the system to its limit. Recently, we moved to Oracle 8i, but we kept our mysql around.

    Some things you might need to know: if you're going to do some serious databases, I recommend spending more money on faster hard disks (SCSI preferable) and multiple disks (Oracle runs very nicely with the database spanned over 3-4 disks and the program running on another disk; partitions won't do). Have a generous amount of RAM and swap. If you're making this a database box, don't use it for anything else. Even hosting a web server on it is not a good idea (as far as I'm concerned). Use WebDB if you like, and host the database box separately with just the database running as the main application.

    Make sure you have a stable kernel. Make sure you have a secure system. Use ipchains to block everything but local traffic, and remove telnet and all other unneeded daemons. Security is something a lot of people forget when building large databases.

    Make daily, if not hourly, backups (depending on how sensitive your data is). RAID is a good way to keep your system running. Also, if your database is web-based, you might need 2 or 3 identically configured boxes, with database queries distributed across all of them.
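Distributing queries across identical boxes can start as simple round-robin. A minimal sketch, assuming hypothetical host names and read-only queries: writes would still have to reach every copy (or a single master), and that part is not handled here:

```python
import itertools

class RoundRobin:
    """Hand out identically configured database hosts in turn."""

    def __init__(self, hosts):
        if not hosts:
            raise ValueError("need at least one host")
        self._hosts = itertools.cycle(list(hosts))

    def next_host(self):
        return next(self._hosts)

# Hypothetical host names for three identical read-only copies.
pool = RoundRobin(["db1", "db2", "db3"])
print([pool.next_host() for _ in range(5)])
```

Round-robin ignores how busy each box is. It is only a reasonable first cut when the boxes and the queries are roughly uniform.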

    With Oracle, read everything; they list a lot of tweaks in the PDF files and documents that come with the distribution. Read all of them. Some tweaks are to the kernel, so pick a good stable kernel and stick to it. Forget about monthly kernel upgrades; I recommend upgrading the kernel yearly or every 6 months. Software-wise, if you're running Oracle 8i, make sure it's a glibc 2.1 system (e.g. RH6 or Debian potato; we use potato because, even though it's unstable, it lets us tweak the system and gives us the most familiar interface).

    On mysql, it might help to read some of the online tweaks. It might also be a good idea to compile the server yourself instead of using the one that came with your distribution, or compile it and copy it over what came with your distribution. Don't use msql unless there is no other way to do it.


    And good luck.
    --
    ...free your source and the rest would follow...
  13. Re:MySQL, ?? by kuro5hin · · Score: 2
    Depends on what you want your DB to do, really. I can't speak specifically to Sybase (I've also heard good things about it), but I know why we use mySQL. It's fast and has very low overhead for queries.

    The caveat to this, of course, is that you must know how to set up your database right. I recently had an opportunity to play around with a fairly large db (upwards of 400,000 records) on mySQL. The records represent people, and some of the fields are birth month, birth date, last name and first name. I wanted to select last and first names for people who were born today. With no indexes, the query selected about 600 records and took 11.8 seconds. Yes, that's right, 11.8 seconds. I was floored! Here's me thinking "mySQL's fast! It'll work great!" Well.

    So then I went back through and indexed (birth month, birth date), checked that I had done it right with EXPLAIN, and ran the exact same query again. This time it took 0.8 seconds. A total time savings of 11 seconds. I learned an important lesson that day... Always index everything you're going to use as a key! With this in mind, mySQL is indeed damn fast, and low overhead.
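The same check can be reproduced on any SQL engine that shows its query plan. A minimal sketch using Python's built-in SQLite as a stand-in for mySQL's EXPLAIN (the table and column names are hypothetical, and the exact wording of the plan text varies between SQLite versions):

```python
import sqlite3

def plan(conn, sql, params=()):
    """Return SQLite's query plan text for sql (the detail column is the last one)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(str(row[-1]) for row in rows)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (last_name TEXT, first_name TEXT, "
             "birth_month INTEGER, birth_day INTEGER)")
query = ("SELECT last_name, first_name FROM people "
         "WHERE birth_month = ? AND birth_day = ?")

before = plan(conn, query, (7, 4))   # no index yet: a full table scan
conn.execute("CREATE INDEX idx_birth ON people (birth_month, birth_day)")
after = plan(conn, query, (7, 4))    # now a search using the composite index

print(before)
print(after)
```

The composite index matches the query's two equality conditions, which is what lets the engine jump straight to the ~600 matching rows instead of reading all 400,000.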

    Now, the other thing I can't really speak to is reliability. mySQL doesn't really support referential integrity, and I guess it's up to you whether you need it or not. I've seen my share of M$-trained database folks who use CASCADE as a cheap crutch to paper over their bad code: rather than write queries that do what they really want them to do, they spend the extra overhead to have CASCADEs do it for them. I've also seen cases where it was crucial to a db's function. Either way, it's something to consider. I've also never seen mySQL handle a failure, or had to rebuild it after one. Whatever you use, your strategy should account for this possibility.
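To make the CASCADE trade-off concrete, here is the same deletion done both ways. A minimal sketch using Python's built-in SQLite (not mySQL) with hypothetical customers/orders tables: once with the database cascading for you, and once with the application deleting child rows itself, as you would have to on an engine without referential integrity:

```python
import sqlite3

def make_db(cascade):
    """Build a tiny parent/child schema, optionally with ON DELETE CASCADE."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
    action = "ON DELETE CASCADE" if cascade else ""
    conn.executescript(f"""
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE orders (
            id INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL REFERENCES customers(id) {action}
        );
        INSERT INTO customers VALUES (1, 'alice');
        INSERT INTO orders VALUES (10, 1), (11, 1);
    """)
    return conn

def order_count(conn):
    return conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]

def delete_customer_by_hand(conn, customer_id):
    """What application code must do without CASCADE: children first, then
    the parent, inside one transaction."""
    with conn:
        conn.execute("DELETE FROM orders WHERE customer_id = ?", (customer_id,))
        conn.execute("DELETE FROM customers WHERE id = ?", (customer_id,))

cascading = make_db(cascade=True)
with cascading:
    cascading.execute("DELETE FROM customers WHERE id = 1")  # orders go too

manual = make_db(cascade=False)
delete_customer_by_hand(manual, 1)

print(order_count(cascading), order_count(manual))
```

Both end in the same state. The difference is whether the database or your own code is responsible for getting it right.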

    --
    There is no K5 cabal.
    I am not the real rusty.
  14. Re:Flamebait? by orabidoo · · Score: 2

    well, of course PHP will run faster than perl *as CGI*. use mod_perl and be happy. PHP is a pretty close imitation of perl, but perl has a much more complete, mature and flexible programming environment. the main advantage of PHP is that, by being simpler and smaller, it's easier to start working with. for a complex site, where you want a fair amount of real programming on the backend, I'll take perl over PHP any day. mod_perl has many modes of operation: the simplest (Apache::Registry) emulates CGI scripting without much of the fork/compile overhead. Embperl lets you put perl inside the html (this is the way PHP does it too), and you can write complete handlers if you want to.
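The idea behind Apache::Registry is to compile each script once per long-lived server process and reuse the compiled form on later requests. A rough Python analogue of that idea, assuming recompilation when the file's modification time changes (the class, the `env`/`response` convention, and the demo script are all hypothetical, not mod_perl's API):

```python
import os
import tempfile

class ScriptRegistry:
    """Compile each script once per process; recompile only if its mtime changes."""

    def __init__(self):
        self._cache = {}        # path -> (mtime, code object)
        self.compilations = 0

    def run(self, path, env=None):
        mtime = os.stat(path).st_mtime
        cached = self._cache.get(path)
        if cached is None or cached[0] != mtime:
            with open(path) as f:
                code = compile(f.read(), path, "exec")   # the per-request cost CGI pays
            self._cache[path] = (mtime, code)
            self.compilations += 1
        namespace = {"env": env or {}}
        exec(self._cache[path][1], namespace)            # run the cached code
        return namespace.get("response")

with tempfile.TemporaryDirectory() as tmp:
    script = os.path.join(tmp, "hello.py")
    with open(script, "w") as f:
        f.write('response = "hello " + env.get("name", "world")\n')
    registry = ScriptRegistry()
    first = registry.run(script)
    second = registry.run(script, {"name": "slashdot"})

print(first, "/", second, "/ compiled", registry.compilations, "time(s)")
```

Plain CGI starts a fresh interpreter and compiles the script on every request. Here the second request skips compilation entirely.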

  15. Re:Performance tips for Apache... by fdicostanzo · · Score: 2

    with linux caching, this isn't really necessary. with enough memory, the whole thing will be in memory anyway

    --
    Synergies are basically awesome, and they're even better when you leverage them. -PA
  16. Major Performance Boost by wintahmoot · · Score: 4

    I haven't read all of the previous comments, so this may well have been posted before.

    Okay, this is how I generally do it. I'm assuming you're using Perl, so these tips are for a Perl/Apache/MySQL environment.

    1) Use mod_perl so that your script doesn't need a whole Perl compiler for each separate instance in memory. The performance boost is just incredible...

    2) Use Apache::DBI. Instead of your script connecting to and disconnecting from the DB each time it's called, it will use a persistent database connection. Great for performance.
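The persistent-connection trick amounts to a per-process cache of open handles keyed by the connect arguments, with a cheap check before a cached handle is reused. A minimal sketch of that idea in Python with SQLite standing in for MySQL (the function name and the `SELECT 1` liveness check are my assumptions, similar in spirit to Apache::DBI, not its actual API):

```python
import sqlite3

_connections = {}  # per-process cache: connect argument -> open connection

def connect_cached(database):
    """Return a cached connection for `database`, reconnecting only if the
    cached one is unusable."""
    conn = _connections.get(database)
    if conn is not None:
        try:
            conn.execute("SELECT 1")        # cheap liveness check before reuse
            return conn
        except sqlite3.ProgrammingError:    # e.g. the connection was closed
            pass
    conn = sqlite3.connect(database)
    _connections[database] = conn
    return conn

first = connect_cached(":memory:")
second = connect_cached(":memory:")
reused = first is second                    # second call reused the open handle

first.close()                               # simulate a dropped connection
third = connect_cached(":memory:")          # detected and replaced
print(reused, third is not first)
```

The payoff comes from the process living across many requests (as under mod_perl). A CGI process that exits after one request never gets to reuse anything.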

    There are some other tweaks that you can do. If you're interested, just let me know...

    Wintermute

  17. Ready-made solutions by rde · · Score: 3

    There are ready-made solutions out there such as E-smith; you can download a cd image (or even buy the cd), and it'll install the system with extras built in; it's designed to be an 'out-of-the-box' sorta thing.

  18. Optimizations by platinum · · Score: 4
    First of all: IMO, if you have to ask how to optimize your company's equipment in a forum such as this, you need some real help (perhaps of the mental variety). There are a plethora of web sites on optimizing systems. OTOH, I might as well share our experiences.

    Our company uses Apache, MySQL, and PHP extensively (and exclusively). You can't beat the price/performance ($0.00 / excellent == great value). Through our research, we settled on the following combination:
    • Web Server: FreeBSD 3.2-STABLE with Apache 1.3.9 / PHP 3.0.9 on a PII-400 w/128 Meg RAM, IBM 4.55G U2W Drive. Due to FreeBSD's proven track record for Web/Network performance, stability, and security (e.g. Yahoo, wcarchive, and others), it's a natural.
    • SQL Server: Linux 2.2.x with MySQL 3.22.25 on a PII-400 w/256 Meg RAM, IBM 4.55G U2W System Drive and a Mylex AcceleRAID 250 w/4 IBM 4.55G U2W Drives in a RAID-5 configuration. Linux was the obvious choice when considering MySQL performance and driver availability wrt RAID controllers.
    Optimization suggestions:
    • Apache: Ensure you have adequate spare servers to handle the connections (StartServers, MaxSpareServers, MaxClients, and MaxRequestsPerChild in the config); nothing sucks more than clients not being able to connect. Also, if you are using an embedded scripting language of some sort (PHP, Perl, etc.), use modules compiled into Apache (mod_perl, etc.); this should significantly increase speed and decrease the overhead of reloading the interpreter on each access.
    • MySQL: Tweak the applicable settings as appropriate. We increased (in most cases doubled) the following: Join Buffer, Key Buffer, Max Connections, Max Join Size, Max Sort Length, and Sort Buffer. If possible, depending on the amount of data, put as much memory in the system as you can. If the OS can keep frequently used data cached, disk access won't be required, which significantly speeds up queries, etc. In addition, get rid of that pre-compiled MySQL and compile it yourself. If possible, optimize using egcs/pgcc for your platform. Also, compile mysqld statically; this will increase its memory overhead a bit but can increase its speed by 5-10% by not using shared libraries.
    • Storage: For optimum speed, use SCSI (of course). For our data, we require RAID 5 for redundancy. If that is not required, RAID 0 (striping) can be used for increased speed. The optimal way is to use hardware RAID (external RAID or RAID controller). Luckily, Linux has drivers for quite a few different RAID controllers that are available for a reasonable price.
    • Linux: Beware of Redhat's security problems, disable all unnecessary services, etc. Seek out security-oriented and Linux performance-tuning sites for more suggestions.
    • General: Don't skimp on hardware. A cheap component, be it a drive, network card, motherboard, or whatever, if it fails, will cause unrecoverable downtime. We decided on Intel NL440BX boards (serial console/BIOS support is nice), PII-400's, and IBM SCSI drives in both boxes. If one box were to have a catastrophic failure, the other is able to perform both webserver and SQL server functions if necessary. We can also simply replace a failed component with one pulled from a similarly-configured non-production (test) box, or just swap boxes altogether.
    Both Apache and MySQL have good sections on performance tuning. Do not be afraid to RTFM.

    Any questions/comments can be directed to me. Flames directed to /dev/null.
  19. Performance tips for Apache... by mgreenwood · · Score: 2

    Memory is pretty damn cheap -- I've been running my web server off a ramdisk. Archive your web server in a tar ball then just expand it onto the ram disk... just don't put your db there :-)

  20. Re:All the money in the world by coyote-san · · Score: 2

    Postgres is totally free and supports transactions. It might not have the performance of Oracle, but it doesn't have the cost of Oracle either. :-)

    --
    For every complex problem there is an answer that is clear, simple, and wrong. -- H L Mencken
  21. Well, here's what I know (for what it's worth) by jguthrie · · Score: 2
    The database-driven websites that I have, the Houston Northwest Bar Association website (with an attorney finder) and The C Bookstore (plug!plug!plug!), are based on PostgreSQL rather than mySQL, so I don't know how well these lessons apply. But I've learned that PostgreSQL has considerable overhead on each query, so one big query is better than lots of little queries.
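"One big query instead of lots of little ones" in practice: fetch a batch of rows with a single IN (...) query instead of one query per row. A minimal sketch using Python's built-in SQLite (not PostgreSQL) with made-up sample data. SQLite runs in-process, so its per-query overhead is far smaller than a client/server database's; this shows the shape of the change, not the cost saved:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT);
    INSERT INTO books VALUES (1, 'Book A'), (2, 'Book B'), (3, 'Book C');
""")
wanted = [1, 3]

def titles_one_by_one(conn, ids):
    """One query (one round trip) per id: each pays the fixed per-query overhead."""
    return [conn.execute("SELECT title FROM books WHERE id = ?", (i,)).fetchone()[0]
            for i in ids]

def titles_in_one_query(conn, ids):
    """A single query for the whole batch, results put back in requested order."""
    if not ids:
        return []
    placeholders = ", ".join("?" for _ in ids)
    rows = conn.execute(
        f"SELECT id, title FROM books WHERE id IN ({placeholders})", ids
    ).fetchall()
    by_id = dict(rows)
    return [by_id[i] for i in ids]

print(titles_one_by_one(conn, wanted))
print(titles_in_one_query(conn, wanted))
```

Both functions assume every requested id exists. The batched version makes one round trip no matter how many ids it gets.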

    Of course, neither of those sites is particularly busy and I'm more proud of the management utilities than the sites themselves, but that's par for this course.

    The thing I did learn was that using perl and CGI is quite clumsy for this sort of thing. I eventually switched to PHP3 because everything goes together much faster. I don't know what it does to the performance, but both sites are served from the world's slowest Web server hardware (the database server is a 486DX2-80, which also hosts the HNBA website, while the C Bookstore Web server is the 5x86-120 that I use for most of the four dozen or so domains that I host), and performance is not that big an issue, so I'm not all that worried. It'd be nice if it got some hits, though.