Ask Slashdot: Optimizing Apache/MySQL for a Production Environment
treilly asks: "In the coming weeks, the startup company I work for will be rolling out a couple of Linux boxes as production webservers running Apache and MySQL. Management was quick to realize the benefits of Linux, but I was recently asked: 'Now that we're rolling out these servers, how do we optimize out of the box RedHat 6.0 machines as high performance web and database servers in a hosting environment?'"
Also Dan Kegel wrote an interesting web page in response to the whole Mindcraft NT/IIS vs. Apache/Linux fiasco and on that page are several detailed measures to improve Apache's performance under Linux:
Dan Kegel's Mindcraft Redux page
Apache Week 'zine
Basically, I just winged it. My site started out with maybe ten thousand hits per day, but quickly (over the course of two years) ramped up to about 5 million hits a day. I just hacked together some Perl scripts, and when I need to make changes, I just try 'em out on the production server. Who needs beta testing? If there are performance problems, I just buy faster hardware. If there are stability problems, people are understanding, after all, I *am* using Linux.
Sincerely,
Rob Malda
Not only that, turn them off. (AllowOverride None, IIRC) If you simply don't use them but have them enabled anyway, you pay the price WRT all the stat(2) calls the server does looking for them.
This is all IIRC, but I usually have a good memory. Then again, I did just wake up.
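To make the cost concrete, here is a rough Python sketch (not Apache's actual code) of the lookups the Apache docs describe: with overrides allowed, the server checks for a .htaccess file in every directory from the filesystem root down to the one holding the requested file. The function name and the example path are made up for illustration.

```python
from pathlib import PurePosixPath


def htaccess_candidates(file_path):
    """List the .htaccess paths checked for one request when overrides are enabled.

    Illustrative only: mirrors the documented behaviour (one lookup per
    directory level), not Apache's implementation.
    """
    directory = PurePosixPath(file_path).parent
    # parents runs from the nearest ancestor up to "/", so flip it to walk root-first.
    levels = list(directory.parents)[::-1] + [directory]
    return [str(level / ".htaccess") for level in levels]


for candidate in htaccess_candidates("/www/htdocs/example/index.html"):
    print(candidate)
```

That is one extra lookup per directory level for every single request, which is what AllowOverride None avoids.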
Basically, it comes down to: Postgres is much more complete (it has more of the SQL spec implemented -- transactions, etc.). MySQL is much faster. It all comes down to how you expect to use it. If you are going to be doing complex joins and transactions and such, MySQL probably won't cut it (yet), otherwise, MySQL (most definitely!) makes up in speed what it lacks in features.
There's obviously more to it than that, but I'm not aware of any specific comparisons...
WWJD? JWRTFM!!!
rodent...
Tactical nuclear weapons are a viable alternative!
As for optimization, definitely check your queries and always use keyed fields and equality (=) queries. LIKE queries will kill your performance to the point of being unusable on decently large tables (>100k records). Definitely read the MySQL docs concerning RAM usage and the various switches to optimize its RAM usage. That is extremely important.
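A quick way to see the difference between a keyed equality lookup and a LIKE scan. This sketch uses Python's built-in sqlite3 rather than MySQL (an assumption made purely so it runs anywhere), and the table, column, and index names are invented. Planner output differs between engines, but the principle carries over: the equality test can use the index, the leading-wildcard LIKE cannot.

```python
import sqlite3


def query_plan(conn, sql):
    """Return the planner's description of how it would run `sql`."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The human-readable detail is the last column of each plan row.
    return "; ".join(row[-1] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT, city TEXT)")
conn.execute("CREATE INDEX idx_email ON customers (email)")

# Equality on the keyed column: the plan should name idx_email.
keyed_plan = query_plan(conn, "SELECT city FROM customers WHERE email = 'a@example.com'")
# Leading wildcard: the index is no help, so the whole table gets scanned.
like_plan = query_plan(conn, "SELECT city FROM customers WHERE email LIKE '%example.com'")

print(keyed_plan)
print(like_plan)
```

On MySQL the equivalent check is running EXPLAIN on each query and looking at which key, if any, it reports.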
As for Apache, avoid .htaccess at all costs and only compile in required modules. Also check the tuning FAQ mentioned above.
rodent...
Tactical nuclear weapons are a viable alternative!
"HTML Programmer", eh? Talk about a skill that will be obsolete in 20 years, when we're all using XML and have WYSIWYG XML editors...
BTW, programmers write programs, not text. So "HTML Programmer" is a misnomer in the first place -- that should be "HTML page creator".
-E
Send mail here if you want to reach me.
0) If you have LOTS of RAM, compile Apache, MySQL and optionally Squid with EGCS+PGCC at -O6. The extra speed helps.
1) Guesstimate the number of simultaneous connections I'm likely to have.
2) Guesstimate how much of the data is going to be dynamic, and how much static.
3) IF (static > dynamic) THEN install Squid and configure it as an accelerator on the same machine. Give most of the memory over to Squid, and configure a minimal number of httpd servers. You'll only need them for accesses of new data, or data that's expired from the cache.
4) IF (static < dynamic) THEN ...
5) If you've plenty of spare memory, after all of this, compile the kernel with EGCS+PGCC at -O6, but check its reliability. It's not really designed for such heavy optimisation, but if it works ok, the speed will come in handy.
NOTE: Ramping up the compiler optimiser flag to -O6 does improve performance, but it also costs memory. If you've the RAM to spare, it is sometimes worth it.
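Steps 2 and 3 can be roughed out in code. A minimal Python sketch, assuming you classify requests by file extension (the extension list and the sample paths are made up; a real site would feed in paths from its own access log):

```python
from collections import Counter
from pathlib import PurePosixPath

# Assumption: these extensions mean "static"; everything else counts as dynamic.
STATIC_EXTENSIONS = {".html", ".htm", ".gif", ".jpg", ".png", ".css", ".txt"}


def classify(request_path):
    """Label one requested path as 'static' or 'dynamic' by its extension."""
    path_only = request_path.split("?", 1)[0]  # drop any query string
    suffix = PurePosixPath(path_only).suffix.lower()
    return "static" if suffix in STATIC_EXTENSIONS else "dynamic"


def mostly_static(request_paths):
    """True when static requests outnumber dynamic ones (the Squid case in step 3)."""
    counts = Counter(classify(p) for p in request_paths)
    return counts["static"] > counts["dynamic"]


sample = ["/index.html", "/logo.gif", "/cgi-bin/search.pl?q=linux", "/style.css"]
print(mostly_static(sample))
```

It is only a guesstimate, same as the steps it follows: a page with an .html extension can still be generated on the fly.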
It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
Yeah, I posted the Rob Malda as Anonymous Coward question - mostly because I wanted clarification on it. It seemed very odd, and yes, I knew it was probably just some anonymous lame-oid. Now ... to the rest of you (mostly A-Cowards as well) who choose to harsh on me because I merely question this ... Screw you.
... I would put my IQ points up against any of yours any day of the week. And yes, I did list HTML on my resume as a programming language, because in the positions I would be interested in ... there's very little reason to treat it otherwise. By profession, I'm a researcher, not a programmer.
Trust me
So, as I said, to everyone who has so little better to do than scan Slashdot waiting for opportunities to flame others (under Anonymous Coward status), screw off.
Cordially yours,
David
Share data. Share code. Share ideas. Share the wealth.
http://stockfilter.org
First, PLEASE don't point people to that horrible howto... as soon as Linus will accept the real software raid versions (and howto) available over at:
http://metalab.unc.edu/pub/Linux/kernel.org/pub/land
http://ostenfeld.dk/~jakob/Software-RAID.HOWTO/
Second, realize 0+1 (typically 1+0, or RAID 10) only gives you half of the total physical space as effective space... sometimes you can afford that, sometimes you can't... and you still generate the SCSI bus loads of the full drive set :)
In the very typical (especially in these situations) case of reading the databases, it's worth agreeing that 1+0 becomes 0+0 (since you can split reads across a raid1, assuming no failed drives)
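The space trade-off is simple arithmetic. A small Python sketch, assuming equal-sized drives; the function name and the four 9 GB drives are hypothetical, and RAID 5 is included only for comparison:

```python
def usable_capacity(level, drives, size_gb):
    """Usable space in GB for equal-sized drives at a given RAID level."""
    if level == "0":  # striping only, no redundancy
        return drives * size_gb
    if level in ("1+0", "0+1"):  # mirrored stripes: half the raw space
        if drives % 2:
            raise ValueError("mirrored stripes need an even number of drives")
        return drives * size_gb / 2
    if level == "5":  # one drive's worth of space goes to parity
        if drives < 3:
            raise ValueError("RAID 5 needs at least three drives")
        return (drives - 1) * size_gb
    raise ValueError(f"unhandled RAID level: {level}")


for level in ("0", "1+0", "5"):
    print(level, usable_capacity(level, drives=4, size_gb=9))
```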
Last, as a side note to the mysql part, try to use isamchk (if the db server can have any down time) for pre-sorting your database instead of doing the sorting as part of your SQL
Various ramblings
Some other ideas are to split image serving onto its own apache, not necessarily its own box. This apache can be completely pared down to the absolute minimum modules, since all it will be doing is serving up static images. It also lets cache be used efficiently, since mostly the common images will be stored, as opposed to common images contending with common text files for cache space if images and content are served from the same apache.
Also, what are you using in apache to create dynamic pages and connect to the db? Use long running processes where possible, which means pick mod_perl, php, fastCGI, servlets, etc... over plain cgi scripts. This will save you lots of cycles and also let you have persistent db connections. Always a very good thing.
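To get a feel for the per-request cost of plain CGI, here is a toy Python comparison (not Apache, mod_perl, or PHP; all names are invented). One path starts a fresh interpreter for every request, the way a plain CGI script does; the other calls a function inside a process that is already running.

```python
import subprocess
import sys
import time


def handle(name):
    """The 'page' both approaches produce."""
    return f"Hello, {name}"


def cgi_style(name):
    """Start a fresh interpreter for the request, as plain CGI does."""
    code = "import sys; print('Hello, ' + sys.argv[1], end='')"
    result = subprocess.run(
        [sys.executable, "-c", code, name],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def timed(func, name, requests=5):
    """Total seconds to serve `requests` requests with `func`."""
    start = time.perf_counter()
    for _ in range(requests):
        func(name)
    return time.perf_counter() - start


if __name__ == "__main__":
    print("new process per request:", timed(cgi_style, "slashdot"))
    print("long-running process:   ", timed(handle, "slashdot"))
```

The absolute numbers depend entirely on the machine; the gap between the two lines is the part that matters.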
Taking the splitting out of machines to the next level, you could also try splitting all of your dynamic content to its own machine, mod_proxied through your front end apaches. This makes the front ends very small since they barely need any modules installed at all. It also gets some extra performance out of your dynamic content apaches. Of course you're running a lot of boxes now. :)
Read this if you're running mod_perl. And read this to optimize your db.
I work in a research lab that does a lot of databases on Linux. We started off with msql and then graduated to mysql. We were initially running redhat with msql and slowly moved to Debian, since we felt it was a more stable server distribution. Also it was more configurable, and we were able to tweak almost anything in the system to its limit. Recently, we moved to Oracle 8i, but we kept our mysql around.
Some of the things you might need to know: if you're going to do some serious databases, I recommend you spend more money on faster hard disks (SCSI preferably) and on multiple disks (Oracle runs very nicely with the database spanned over 3-4 disks and the program running on another disk -- partitions won't do). Have a generous amount of RAM and swap. If you're making this a database box, don't use it for anything else. Even hosting a web server on it is not a good idea (as far as I'm concerned). Use WebDB if you like, and host the database box separately with just the database running as the main application.
Make sure you have a stable kernel. Make sure you have a secure system. Use ipchains to block out anything but local and remove all telnet and other daemons. Security is something a lot of people forget when making large databases.
Make sure you make daily, if not hourly, backups (based on how sensitive your data is). RAID is a good way to keep your system running. Also, if your database is web based, you might need to have 2 or 3 boxes set up identically, with database queries distributed over all of them.
With Oracle, read everything; they have a lot of tweaks listed in the PDF files and documents that come with the dist. Read all of them. Some tweaks are to the kernel, so pick a good stable kernel and stick to it. Forget about monthly kernel upgrades; I recommend kernel upgrades yearly or every 6 months. Software-wise, if you're doing Oracle 8i, make sure it's a glibc2.1 system (RH6 or Debian potato; we use potato, even though it's unstable, because it lets us tweak the system and gives us the most familiar interface).
On mysql, it might help to read some of the online tweaks. It might also be a good idea to compile the server yourself instead of using the one that came with your dist, or compile it and copy it over what came with your distribution. Don't use msql unless there is no other way to do it.
And good luck.
--
The caveat to this, of course, is that you must know how to set up your database right. I recently had an opportunity to play around with a fairly large db (upwards of 400,000 records) on mySQL. The records represent people, and some of the fields are birth month, birth date, last name and first name. I wanted to select last and first names for people who were born today. So, with no indexes, the query selected about 600 records, and took 11.8 seconds. Yes, that's right, 11.8 seconds. I was floored! Here's me thinking "mySQL's fast! It'll work great!" Well.
So then I went back through and indexed (birth month, birth date), checked that I had done it right with EXPLAIN, and ran the exact same query again. This time it took 0.8 seconds. A total time savings of 11 seconds. I learned an important lesson that day... Always index everything you're going to use as a key! With this in mind, mySQL is indeed damn fast, and low overhead.
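The same experiment is easy to repeat on a small scale. This is a sketch using Python's built-in sqlite3 instead of mySQL (an assumption, so it runs without a server), with a made-up 50,000-row table; the timings will not match the numbers above, but the before/after pattern and the EXPLAIN check are the same idea.

```python
import random
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE people (last_name TEXT, first_name TEXT, birth_month INTEGER, birth_day INTEGER)"
)
rng = random.Random(0)  # fixed seed so runs are repeatable
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?, ?)",
    (("Last%d" % i, "First%d" % i, rng.randint(1, 12), rng.randint(1, 28)) for i in range(50000)),
)

QUERY = "SELECT last_name, first_name FROM people WHERE birth_month = 7 AND birth_day = 4"


def run_timed():
    """Run the birthday query; return (sorted rows, elapsed seconds)."""
    start = time.perf_counter()
    rows = conn.execute(QUERY).fetchall()
    return sorted(rows), time.perf_counter() - start


rows_before, secs_before = run_timed()
conn.execute("CREATE INDEX idx_birthday ON people (birth_month, birth_day)")
rows_after, secs_after = run_timed()

# Same check as above: ask the planner whether the index is really used.
plan = conn.execute("EXPLAIN QUERY PLAN " + QUERY).fetchall()[0][-1]
print(plan)
print("without index: %.4fs, with index: %.4fs" % (secs_before, secs_after))
```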
Now, the other thing I can't really speak to is reliability. mySQL doesn't really support referential integrity, and I guess it's up to you whether you need it or not. I've seen my share of M$-trained database folks who use CASCADE as a cheap crutch to paper over their bad code. Rather than write queries that do what they really want them to do, they just spend the extra overhead to have CASCADEs do it for them. I've also seen times where this was crucial to a db's function. Either way, it's something to consider. I've also never seen mySQL handle failure, or had to rebuild it after one. Whatever you use, your strategy should account for this possibility, in any case.
There is no K5 cabal.
I am not the real rusty.
well of course PHP will run faster than perl *as CGI*. use mod_perl and be happy. PHP is a pretty close imitation of perl, but perl has a much more complete, mature and flexible programming environment. the main advantage of PHP is that, by being simpler and smaller, it's easier to start working with. for a complex site, where you want a fair amount of real programming on the backend, I'll take perl over PHP any day. mod_perl has many modes of operation; the simplest (Apache::Registry) emulates CGI scripting without much of the fork/compile overhead. Embperl lets you put perl inside the html (this is the way PHP does it too), and you can write complete handlers if you want to.
with linux caching, this isn't really necessary. with enough memory, the whole thing will be in memory anyway
Synergies are basically awesome, and they're even better when you leverage them. -PA
I haven't read all of the previous comments, so it may well be that this has been posted before.
Okay, this is how I generally do it. First of all, I suppose that you're using Perl, so these tips are for a Perl/Apache/MySQL environment.
1) Use mod_perl so that your script doesn't need a whole perl compiler for each separate instance in memory. The performance boost is just incredible...
2) Use Apache::DBI. It will prevent your script from connecting and disconnecting your DB each time it's called and rather use a persistent database connection. Great for performance.
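The idea behind tip 2 is just a connection cache. Here is a toy Python sketch of that idea, not Apache::DBI itself: the class and function names are invented, it uses the built-in sqlite3 so it runs standalone, and it leaves out things a real module needs, such as checking that a cached connection is still alive.

```python
import sqlite3


class ConnectionCache:
    """Hand back one shared connection per distinct set of connect arguments."""

    def __init__(self, connect=sqlite3.connect):
        self._connect = connect
        self._cache = {}
        self.opened = 0  # how many real connections were made

    def get(self, *args):
        if args not in self._cache:
            self._cache[args] = self._connect(*args)
            self.opened += 1
        return self._cache[args]


cache = ConnectionCache()


def handle_request():
    """A request handler that reuses the cached connection instead of reconnecting."""
    conn = cache.get(":memory:")
    return conn.execute("SELECT 1").fetchone()[0]


for _ in range(3):
    handle_request()
print("connections opened:", cache.opened)
```

This only pays off in a long-running process, which is why it goes hand in hand with tip 1.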
There are some other tweaks that you can do. If you're interested, just let me know...
Wintermute
Martin May
There are ready-made solutions out there such as E-smith; you can download a cd image (or even buy the cd), and it'll install the system with extras built in; it's designed to be an 'out-of-the-box' sorta thing.
Our company uses Apache, MySQL, and PHP extensively (and exclusively). You can't beat the price/performance ($0.00 / excellent == great value). Through our research, we settled on the following combination:
- Web Server: FreeBSD 3.2-STABLE with Apache 1.3.9 / PHP 3.0.9 on a PII-400 w/128 Meg RAM, IBM 4.55G U2W Drive. Due to FreeBSD's proven track record for Web/Network performance, stability, and security (e.g. Yahoo, wcarchive, and others), it's a natural.
- SQL Server: Linux 2.2.x with MySQL 3.22.25 on a PII-400 w/256 Meg RAM, IBM 4.55G U2W System Drive and a Mylex AcceleRAID 250 w/4 IBM 4.55G U2W Drives in a RAID-5 configuration. Linux was the obvious choice when considering MySQL performance and driver availability wrt RAID controllers.
Optimization suggestions:
- Apache: Ensure you have adequate spare servers to handle the connections (StartServers, MaxSpareServers, MaxClients, and MaxRequestsPerChild in the config); nothing sucks more than clients not being able to connect. Also, if you are using embedded script of some sort (PHP, Perl, etc.), use modules compiled into Apache (mod_perl, etc.); this should significantly increase speed and decrease the overhead of reloading the module for each access.
- MySQL: Tweak the applicable settings as appropriate. We increased (doubled in most cases) the following: Join Buffer, Key Buffer, Max Connections, Max Join Size, Max Sort Length, and Sort Buffer. If possible, depending on the amount of data, get as much memory in the system as possible. If the OS can keep frequently used data cached, disk access won't be required, which significantly increases the speed of queries, etc. In addition, get rid of that pre-compiled MySQL and compile it yourself. If possible, optimize using egcs/pgcc for your platform. Also, compile mysqld statically; this will increase its memory overhead a bit but can increase its speed by 5 - 10% by not using shared libraries.
- Storage: For optimum speed, use SCSI (of course). For our data, we require RAID 5 for redundancy. If that is not required, RAID 0 (striping) can be used for increased speed. The optimal way is to use hardware RAID (external RAID or RAID controller). Luckily, Linux has drivers for quite a few different RAID controllers that are available for a reasonable price.
- Linux: Beware of Redhat's security problems, disable all unnecessary services, etc. Seek out security-oriented and Linux performance-tuning sites for more suggestions.
- General: Don't skimp on hardware. A cheap component, be it a drive, network card, motherboard, or whatever, if it fails, will cause unrecoverable downtime. We decided on Intel NL440BX boards (serial console/BIOS support is nice), PII-400's, and IBM SCSI drives in both boxes. If one box were to have a catastrophic failure, the other is able to perform both webserver and SQL server functions if necessary. We can also simply replace a failed component with one pulled from a similarly-configured non-production (test) box, or just swap boxes altogether.
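For the Apache suggestion above, one common rule of thumb (my assumption, not something from the Apache docs or from this setup) is to cap MaxClients at what fits in RAM, so a traffic spike cannot push the box into swap. A minimal Python sketch; the reserved and per-child figures are guesses you would replace with measurements from your own server:

```python
def estimate_max_clients(total_ram_mb, reserved_mb, per_child_mb):
    """Rough ceiling on simultaneous Apache children that fit in RAM without swapping."""
    if per_child_mb <= 0:
        raise ValueError("per_child_mb must be positive")
    available = total_ram_mb - reserved_mb
    return max(available // per_child_mb, 0)


# 128 MB matches the web server described above; the other two figures are guesses.
print(estimate_max_clients(total_ram_mb=128, reserved_mb=32, per_child_mb=4))
```

Children running PHP or mod_perl are much fatter than plain static-file children, so measure before trusting any number.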
Both Apache and MySQL have good sections on performance tuning. Do not be afraid to RTFM. Any questions/comments can be directed to me. Flames directed to
Memory is pretty damn cheap -- I've been running my web server off a ramdisk. Archive your web server in a tar ball then just expand it onto the ram disk... just don't put your db there :-)
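The tarball step can be scripted so it runs at boot. A small Python sketch, assuming the RAM disk is already mounted somewhere; the function name is invented and the paths in the usage note are hypothetical:

```python
import tarfile
from pathlib import Path


def unpack_site(archive_path, ramdisk_dir):
    """Expand a tarball of the site into a directory on the RAM disk."""
    target = Path(ramdisk_dir)
    target.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path) as archive:
        # Only unpack archives you built yourself; extractall trusts the paths inside.
        archive.extractall(target)
    # Return what landed there, as paths relative to the target directory.
    return sorted(p.relative_to(target).as_posix() for p in target.rglob("*") if p.is_file())
```

Usage would be something like unpack_site("/backup/site.tar.gz", "/mnt/ramdisk/htdocs"), with the web server's document root pointed at the second path.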
Postgres is totally free and supports transactions. It might not have the performance of Oracle, but it doesn't have the cost of Oracle either. :-)
For every complex problem there is an answer that is clear, simple, and wrong. -- H L Mencken
Of course, neither of those sites is particularly busy and I'm more proud of the management utilities than the sites themselves, but that's par for this course.
The thing I did learn was that using perl and CGI is quite clumsy for this sort of thing. I eventually switched to PHP3 because everything goes together much faster. I don't know what it does to the performance, but since both sites are being served from the world's slowest Web server hardware (the database server is a 486dx2-80 and the database server has the HNBA website on it but the C Bookstore Web server is the 5x86-120 that I use for most of the four dozen or so domains that I host) and performance is not that big an issue, I'm not all that worried. It'd be nice if it got some hits, though.