Ask Slashdot: Optimizing Apache/MySQL for a Production Environment
treilly asks: "In the coming weeks, the startup company I work for will be rolling out a couple of Linux boxes as production webservers running Apache and MySQL. Management was quick to realize the benefits of Linux, but I was recently asked: "Now that we're rolling out these servers, how do we optimize out of the box RedHat 6.0 machines as high performance web and database servers in a hosting environment"?
separate partition or separate hd?
separate disks, controllers, partitions wherever possible.
Hmmm... putting files on different controllers and disks but the same partition. Now that's a neat trick.
Uh, no dude. That's just somebody being funny. And like most funnicisms, there's a large grain of truth in it.
Did you check out his home page? He's a Ph.D student who lists "HTML" as a programming language.
Thanks for your information-free post...naturally leading to *this* information-free post... injecting noise into discussions since 1993
This is very easy to say. I could just as easily say "Format your harddisk, install NT Server + 4 NICs". What hard data do you have to back this up? Any? Or are you another wannabe hacker blowing smoke?
While this tip isn't specifically tuning, more likely optimization. We recently had to go through and restructure service to ensure rock solid reliability. Our best effort at this was to split the web serving, from the MySQL engine, from the actual DB. We came up with a structure that stuck two web servers on the front end, backed by two MySQL servers that talked to two replicated RAID arrays way on the back end. The whole point of this was to a) allow us to stick a load balancer on the back end, and b) ensure maximum performance for each component. It's not the cheapest way to do it, but it certainly kicks ass from a performance and reliability stand point...
There are a couple of experimental systems about which do this kind of thing. For example prol does aggressive caching (but is not db-backed). I once heard somebody say that Roxen also does caching, but I have not investigated that.
Yeah, right. There's a reason FleaBSD is not as popular as Linux. It's spelled p-e-r-f-o-r-m-a-n-c-e.
Scaling up Apache is as easy as putting out a web server farm and setting up DNS round-robin entries for each web server. If you then add in use of the Fake package, it become trival to create a high-availablity enviroment.
Has anyone tried this?
The point being, the idiot that mistook an anonymous poster for Rob Malda was a Ph.D candidate...a slap in the face to the notion that "Dr." in front of your name guarantees in intellectual giant. Possibly this hit a little too close to home?
I don't see why this comment got marked down... It just said that PHP based sites go together a lot quicker and easier than Perl CGI ones, which is true.
I forgot to add this:
One of the most expensive operations is the initial database connection. Use persistant connections wherever possible. (PHP is very nice for this)
Jerry
I've been using PostgreSQL and Apache on redhat for a while (works great, good perf and stability) and the most important performance measure was separating the cgi/static server from the database server. I wrote a little library object on the cgi server that made a network connection to the database server (running PostgreSQL and Apache/mod_perl with normal Pg connections) so that the Apache on the db server provided a natural connection pool. Send the SQL statement back, get the results forward, and ta-da--automatic load-balancing across two machines and connection pooling at the same time. I also thought about splitting database tables across two (or more) database servers, and having the cgi server's library object run the two searches in parallel (perl threading, ain't it great) and combine the two incoming results (doing sorting and such). Anyway, there are lots of cool things to be done.
What kind of idiot uses Software RAID on a production box where they want performance. Sheesh
Have each Apache child fork a cgi process written in C/C++ linked with "libmysqlclient.a" (included in MySQL) to query the DB. Once the cgi is working, debugged and memory leak free, and only if you need the performance, convert your cgi into an Apache module. Stephen schan_ca@rocketmail.com
It's already been proven over and over again that Linux with Apache and MySQL makes for a very poor web server platform. Just look at how poorly slashdot.org stays up. Therefore you should install something proven to be fast such as Windows NT with SQL Server 7. HA!
yeah right, that's the first thing on every startup's mind... welcome to the commercialization of linux pal, it's downhill from here.
If the site is read-only (static data, like a news site or something), then RAID is a waste of time and money.
This book pretty much says it all and is much more cetailed and investigative than the arguments here.
Logo kicks ass! What are you talking about. I challenge you to a duel of the logo. I am gonnd PEN DOWN with you buddy!
The best thing you can do is format the drive, and install NT. Much better performance for dynamically built HTML and database connections. Has been proven by many different benchmarks. For a database, try Oracle 8i.
SlashDot can keep it up about as long as a high school virgin. It's pretty damn sad. And when it is up, it's slow as shit. Why do people want to use Apache & MYSql again...?
Okay, it's not a RAID stack, a stack implies that you want your data to come out in LIFO order. A RAID is, like the acronym stands for, an array.
AOLserver has always had very good database integration. With v.3 now open source this is a very smart choice for database backed web sites. There are drivers for a number of databases (including postgres). Scripting is done with TCL or ADP pages (similar to PHP and ASP).
[warning questions from linux newbie] i putting together requirements for new web/db hardware and would like to eventualy move our production env to linux. how well does linux take advantage of mulitprocessor hardware, or support multithreading in applications such as apache and other database packages? how does spawning a process differ from spawning a thread performance wise, and reliablity wise? [less deadlocks vs process startup overhead..etc?]
I'd say REDHAT's no more good for high performance, reliable servers.
Debian for general distributions or look for a distros that are optimized for your purpose!
- from a debian user
generally database interface in PHP3 is one of
the most idiotic designs I ever seen
PHP does not have any general database interface, thus support for it cannot be bad.
PHPLIB's generic db layer is nice. Get it here
It must be that crappy Apache web server that most of the Internet is using.
I'm sure the hardware manufacturers & retailers love hearing comments like that.
hotmail dot com
If it was a really good solution, you wouldn't have to complain about Bandwidth. Did you see windows2000test.com complaining about bandwidth? No!
Linux doesn't work on multiprocessor hardware. You should install NT instead.
But, you're right. They didn't blame it on bandwidth. But then, they do have to leave a marketing opportunity for a subsequent release of Windows.
"Windows 2003 will solve *all* of your stability problems"
Probly why the busiest site on the internet (cdrom.com) runs Linux. doh, wait it runs FreeBSD. On *one* computer. Not 20 or 10. 1.
>using perl and CGI is quite clumsy for this sort
>of thing. I eventually switched to PHP3 because
>everything goes together much faster.
Well, while PHP (or indeed any scripting language) reduces lead time, it makes it easier to make some elementary mistakes, such as not separating code, formatting and content properly, or using good programming style. I'm not saying it's not possible to do it right, just that it doesn't seem very easy to do it right in PHP.
> world's slowest Web server hardware (the
> database server is a 486dx2-80
Now come on! Until quite recently, our web/db server was a 486dx2-50, but then we gave those nice people at DNUK.com a call...
Oh, and this is anonymous so that I get flamed to hell, but don't get to hear about it at work!
> Memory is pretty damn cheap currently SDRAM is getting more and more expensive...
You can write an emulator of a (memory limited) Turing machine in COBOL and LOGO. This qualifies them as "full" programming languages IMHO.
> And before you start some "it's not real programming unless it's
> compiled" rant, tell it to a perl hacker...
Actually perl does a "run-time-compilation" before it executes something...
s/perl hacker/shell skripter/ for a better example.
> Due to FreeBSD's proven track record for Web/Network performance, stability,
> and security (e.g. Yahoo, wcarchive, and others), it's a natural.
The best examples are missing: www.apache.org runs apache (haha!) on FreeBSD and don't forget the now MS-owned company that sends you so much spam...
Cool! Someone with a clue on Slashdot!
It's great to see someone makeing use of the strengths of *BOTH* Linux and *BSD instead of staying in his small world.
i've noticed that a lot of posts to this thread emphasize good database design and query optimization as the biggest factors in speed/server load of something like this. which brings me to my question. I'm currently working on a traffic exchange style script set with php3 and mysql. Basically what this thing will do is track unique visitors, then for each hit that comes from a domain (yahoo.com, etc), it will record it, then send them a hit back. I'm writing it so that you can send a ratio of hits back (70%, 150%, whatever) to any given site. The way i've written this thing is as follows: when the user hits the main page, the referring url is stripped of everything but the domain and extension (foo.com), then the client IP is checked against a table containing IPs from the last 24 hours. If it isn't stored in the table, then the query "update stats set hitsin=hitsin+1 where domain='foo.com'" is executed, giving foo.com a hit in. Bear with me here :) What i'm thinking about having it do is, a seperate perl script would run every few minutes as a cron job, and determine who needed hits out, based on their ratio and number of hits in. Then the script would fill a table with the urls that need hits out, one row per hit out needed. THe out script would then pop values off the top, using a select.. limit 0,1 and then delete the one it just selected, after sending the user there. My questions are: a. this probably isn't the most efficient or best way to do this. What would be? or what could i change to help this thing run more efficiently? b. on the IP tracking, would it be faster to make the ip data in the table the primary key, and just try to insert an IP every time, then determine whether or not it was already in the db based on the result of the INSERT? thanks
A lie. Pure and simple. I wonder how much M$ pays these trolls.
Hey so how exactly did you fix this? I just noticed our mysqlds are running at Nice 5, and it's a dedicated db machine. But I can't find where in safe_mysqld it sets this. Is there a command line option to set the nice-ness?
I just start it with 'nice -n -20 bin/safe_mysqld &', that way it ends up running at -15. If you start it with 'nice -n -5 bin/safe_mysqld &' it should run at 0. Does anyone have any idea why this is set up the way it is?
Why not, just glue them together with RAID-0 and off you go.
Man, you're radical. Come here so I can jizz all over your face.
This is the way professionals do it. I did some Windows development recently and while the environment and tools were buggy and quirky the volumes of information with VC++ 6.0, the Web and on the MS site were astounding. As a Windows newbie I was able locate everything I needed very fast. As a seasoned Linux/Unix person I have had trouble finding well orchestrated info when I need it. This is an area that the Linux community should be focused on. As for you, your comment only rings of immaturity and it's unlikely that you know anything about "commercialization" or startups. Just keep your bitterness to yourself.
What a moron. If you're going to flame someone, at least get your flame technically correct. HTML is *NOT* a text formatting language. HTML is intended to describe the structure of a document, not its layout and presentation.
People trying to use it as a formatting language is why we see so many fucked up web sites.
If you want to speed up perl, take a look at Velocigen. For serving static content, see kHTTPd.
If you have several servers that you do things like load content off of, and aren't using round-robin DNS, then you can speed up what the customer sees by cutting the number of DNS lookups. Use IP addresses in your URLs (URLs on servers that you control the IP of only!).
I have yet to look into support for compressed HTML, but I can imagine for some things (mailing list archives) it would be a real blessing.
According to the benchmarks on mySQL's page (http://www.mysql.com/benchmark.html), it does very well against the big databases like msSQL and Oracle. Can anyone else verify these, or at least explain how it could of done so well. From the comments I have read so far, it seems like mySQL isn't powerful enough to surpass these giants.
Would you consider HTML with Cold Fusion tags (CFML) a "programming language" ?
Haven't done it with mySQL, but with Postgres/PHP/Apache I allocate 1Mb of RAM for each httpd child, and then enough left over for the entire db in RAM. I also run a separate server to feed images only, off a RAID stack. Make sure that the apache htdocs is on a separate partition and controller than the db files.
My mail server was showing some strange performance problems under high loads which were caused by mysqld (used for authentication) running at nice +5. Apparently safe_mysqld did that behind my back. Under high loads mysqld would be put on the backburner.
The Apache/MySQL/RH is a common combination that will continue to grow in popularity for small and medium size installations. Judging from the responses that I have read no-one has performed a serious analysis of the performance issues of this trio. Most of the suggestions are useful first-pass ideas but there are likely other specific tuning approaches that can be used.
I ask that you document your development specifically focusing on any novel solutions that you found that increased performance. A faster CPU doesn't count (sheesh). Also, put this info together in a concise readable format and provide it to the Apache site or the Linux Tuning site (forget the URL at this moment.) It's very important that work of this type be formally documented and accessible.
You'll probably need to tune your configuration on several levels: hardware, OS, application and sql/other.
m ance.html#Performace for a list of parameters to tune Mysql with. You'll either need to talk with TCX about using the available optimizations, consult with someone else who does, or spend a lot of time with trail and error, since these optimizations are very dependant on your hardware configuration and what type of work you'll be doing with the databases (ie. write intensive, read intensive, or both)
Hardware Tuning:
- Use a caching raid controller fully populated with cache configured for raid level 0+1.
-Use IBM or Seagate 10000 rpm SCSI drives with lot's of cache.
- Consider multiple SCSI cards (or channels) to separate the OS + logs, indexes and data files on to separate raid arrays.
- Also strongly consider using separate web and database servers, so each can be fully optimized for its job.
- Obviuosly use as much ram as you can afford. (preferably 100 or 133 mhz)
-Uses multi-processor computers for the database and web servers.
- Connect the web and database servers using a back end network, separate from the internet connection.
OS Tuning:
I'm not terribly familiar with tuning the Linux OS, but I suspect that there are many resources already available.
In general you'll want to:
- Optimize the block size on your raid array's for mazimum performance (trial and error using bonnie or the like)
- Optimize the amount of memory used for cache.
Application Tuning:
- Look at http://www.mysql.org/Manual_chapter/manual_Perfor
- Look at http://www.apache.org/docs/misc/perf-tuning.html for information on tuning apache.
I would first suggest using PHP, but barring that, I would definately use mod_perl. There are probably a lot of other sources for tuning apache available on the Internet.
A suggestion: optimize the number of server children with the number of availble processors.
SQL/other Tuning:
Understanding how to properly build tables and indexes is somewhat of an art, but you can really make or break the whole site with proper use of SQL and indexes. I'd either spend some time learning table/index design, and coding sql for performance, or consult someone who knows.
Hope this helps a little bit.
Jerry
jerry@bellnetworks.net
Also Dan Kegel wrote an interesting web page in response to the whole Mindcraft NT/IIS vs. Apache/Linux fiasco and on that page are several detailed measures to improve Apache's performance under Linux:
Dan Kegel's Mindcraft Redux page
Apache Week 'zine
Basically, I just winged it. My site started out with maybe ten thousand hits per day, but quickly (over the course of two years) ramped up to about 5 million hits a day. I just hacked together some Perl scripts, and when I need to make changes, I just try 'em out on the production server. Who needs beta testing? If there are performance problems, I just buy faster hardware. If there are stability problems, people are understanding, after all, I *am* using Linux.
Sincerely,
Rob Malda
Not only that, turn them off. (AllowOverrides None, IIRC) If you simply don't use them but have them enabled anyway, you pay the price WRT all the stat(2) calls the server does looking for them.
This is all IIRC, but I usually have a good memory. Then again, I did just wake up.
Basically, it comes down to: Postgres is much more complete (it has more of the SQL spec implemented -- transactions, etc.). MySQL is much faster. It all comes down to how you expect to use it. If you are going to be doing complex joins and transactions and such, MySQL probably won't cut it (yet), otherwise, MySQL (most definitely!) makes up in speed what it lacks in features.
There's obviously more to it than that, but I'm not aware of any specific comparisons...
WWJD? JWRTFM!!!
My fhttpd with combination of MySQL and PHP can be considered, too -- it allows some configuration options and optimizations that Apache doesn't provide -- you can limit the number of connections to database, use separate userids for sets of scripts, etc. If you want even more performance, program in C or C++ can be written as its module, and the API is much easier to use than one of Apache.
Contrary to the popular belief, there indeed is no God.
rodent...
Tactical nuclear weapons are a viable alternative!
As for optimization, definately check your queries and always use keyed fields and == queries. Doing like queries will kill your performance to being unusable on decently large tables (>100k records). Definately read the MySQL docs concerning RAM usage and the various switches to optimize it's RAM usage. That is extremely important.
As for Apache, don't use .htaccess at all costs and only compile in required modules. Also check the tuning FAQ mentioned above.
rodent...
Tactical nuclear weapons are a viable alternative!
"HTML Programmer", eh? Talk about a skill that will be obsolete in 20 years, when we're all using XML and have WYSIWYG XML editors...
BTW, programmers write programs, not text. So "HTML Programmer" is a misnomer in the first place -- that should be "HTML page creator".
-E
Send mail here if you want to reach me.
FreeBSD is Apache's main development platform? I don't know where you got that from. It's not like the Apache developers sit huddled around a single box and develop in a dark room. I don't think fbsd is the prevalent platform among Apache developers.
On the database side theres a feature of mysqld called --log-update. Call it using mysqld --log-update=/usr/mysql/update_logs/update. This will create a log of eveything that changes in your DB and can be reinserted back into the mysql monitor. To go along with this everytime you call `mysql --flush-logs` a new update file will be created as update.# - where # increases for each call. At this point there are quite a few scripts written to insert this log file into another DB - most of them use perl DBI.
To increase the performance of your setup there are several options noted in the mysql manual. But none of them will do a whole lot of good if the queries and tables you construct are poorly designed and indexed.
Depending on what scripting language you use theres probobly a way to compile it into apache. Whether it be mod_perl,pyapache or PHP. I would plan on doing this. A good way to speed up your system after this is to run 2 httpd servers.
The first server compile plain apache with mod_rewrite and proxy support, the second server compile in your application support. If you put all your applications in one directory you can easily proxy to them with proxypass.
ex. proxypass /perl server:88/perl
This way things like images and html will be served by a webserver that only takes up 400-500K instead of one that could take up to 10M-20M depending on how many scripts and libraries are in memory(Of course some of that is shared). When your server gets hit hard you'll probobly notice having maybe 5 - 10 times as many regular servers as there are application servers this way.
A more advanced thing with this setup is to utilize your backup server. Its will take a little work but you could have apache proxy to a list of application servers that exist in a config file, and have this config file altered based on system availabilty. At this point though it may just be easier to get localdirector, unless your organization is really strapped for cash.
I've thought of this approach too. It should be very fast once you're done - maybe faster than any other way. But it seems to me the going will be very slow. C/C++ is much more work to write than Perl, PHP and the like. And you'll need to learn alot about the Apache module API. I looked at this API in some detail and realized how complicated it might get.
You could of course start it out as a module - forgetting the CGI version. If you're leaking memory during development, keep 'MaxRequestsPerChild' at a very low value - 10 or maybe 1 even. Then increase it to 100 or whatever when your leaks are under control.
-- Mike Greaves
So? I've known people who consider LOGO, or even COBOL a programming language.
It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
0) If you have LOTS of RAM, compile Apache, MySQL and optionally Squid with EGCS+PGCC at -O6. The extra speed helps.
1) Guesstimate the number of simultaneous connections I'm likely to have.
2) Guesstimate how much of the data is going to be dynamic, and how much static.
3) IF (static > dynamic) THEN install Squid and configure it as an accelerator on the same machine. Give most of the memory over to Squid, and configure a minimal number of httpd servers. You'll only need them for accesses of new data, or data that's expired from the cache.
4) IF (static 5) If you've plenty of spare memory, after all of this, compile the kernel with EGCS+PGCC at -O6, but check it's reliability. It's not really designed for such heavy optimisation, but if it works ok, the speed will come in handy.
NOTE: Ramping up the compiler optimiser flag to -O6 does improve performance, but it also costs memory. If you've the RAM to spare, it is sometimes worth it.
It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
Is this for real? Rob Malda posting as an Anonymous Coward? What's up with that Rob?
Share data. Share code. Share ideas. Share the wealth.
http://stockfilter.org
Yeah, I posted the Rob Malda as Anonymous Coward question - mostly because I wanted clarification on it. It seemed very odd, and yes, I knew it was probably just some anonymous lame-oid. Now ... to the rest of you (mostly A-Cowards as well) who choose to harsh on me because I merely question this ... Screw you.
... I would put my IQ points up against any of yours any day of the week. And yes, I did list HTML on my resume as a programming language, because in the positions I would be interested in ... there's very little reason to treat it otherwise. By profession, I'm a researcher, not a programmer.
Trust me
So, as I said, to everyone who has so little better to do than scan Slashdot waiting for opportunities to flame others (under Anonymous Coward status), screw off.
Cordially yours,
David
Share data. Share code. Share ideas. Share the wealth.
http://stockfilter.org
Mount the /home/httpd filesystem with the noatime mount option, then there will be no writes generated from read-accesses, and your ext2fs will effectively work as a ram-disk, if you have enough memory. Only, it's a helluwalot easier to just change stuff where it resides, and know it will be written to disk, instead of back-forth-copying tar images...
Second benefit: if you really get low on memory, you're fucked with a 1 Gig RAM disk, whereas the disk-cache will quickly be thrown away and used for whatever memory hog you have running.
ram-disks are good for booting over-modularized kernels _only_.
When I set up a production environment (regardless of operating system chosen), the first step is always to have policy and change control.
Change Control
You must have change control or you will suffer downtime. Downtime represents a transaction rate of 0 trans/sec, which is clearly unacceptable.
Development and Acceptance Test
You must have a separate machine for development, and another machine for acceptance test. Of the two machines, only the A.T. machine must be identical to your production server. Otherwise A.T. simply cannot replicate the environment you're going to test, and thus any testing is at best misleading, or at worst, completely invalid.
You must create a set of repeatable build instructions that takes you from a fresh blank machine to a stable, reliable, working production system. And you should have a set of tests that thoroughly gives the resulting systems a complete workout, including sustained load, boundary condiditions (such as empty rows), and attacks against the system whilst trying to continue to process transactions.
Finally, the best advice I can give you is don't skimp on reliability and availability. Buy RAID with hot rebuild. Buy a server with redundant PSU's, and not a handmade machine. Buy an additional NIC per machine, and put that on a different switch - dual path everything.
In terms of SQL and web based stuff, from a security standpoint, it's always advisable to have your SQL server behind a firewall (or at least on a separate private network).
In terms of speed, I've always found that having enough RAM to allow several outer joins to complete in RAM really helps. As someone else mentioned, it's a good idea to index columns you select on a regular basis.
Make sure you can dump the database online - stopping the dbms whilst a dump takes place is unacceptable; if it takes 30 minutes, that's reduced your availability from near 100% to 97.9%. That's bad.
Good luck!
Andrew van der Stock
I can't agree more on the Database.
I have no experience in Cold Fusion so I'll decline comment on that.
I'd caution anyone trying to Netscape's web servers on HP though. In my experience it likes to run away alot. I've been away from the environment with that configuration for a while and maybe they've upgraded OS or servers, but we were getting it hogging 90% of our CPU several times a day.
First, PLEASE don't point people to that horrible howto... as soon as Linus will accept the real software raid versions (and howto) available over at:
http://metalab.unc.edu/pub/Linux/kernel.org/pub/land
http://ostenfeld.dk/~jakob/Software-RAID.HOWTO/Second, realize 0+1 (typically 1+0, or RAID 10) only gives you half of total physical space in effective space.... sometimes you can afford that, sometimes you can't... and you still generate the scsi bus loads of the full drive set :)
In the very typical (especially in these situations) case of reading the databases, it's worth agreeing that 1+0 becomes 0+0 (since you can split reads across a raid1, assuming no failed drives)
Last, as a side note to the mysql part, try to use isamchk (if the db server can have any down time) for pre-sorting your database instead of doing the sorting as part of your SQL
Various ramblings
Such as Netscape/Cold Fusion/Oracle/Sun?
Besides not being able to call on the experience of all of you guys when the going gets tough, what are the other drawbacks besides the obvious (MONEY)?
MySQL is not a solution for me. It lacks many features that Sybase or Oracle provide (can you say TRANSACTIONS?). Netscape and Cold Fusion have better integration of security. Has a benchmark been done on PHP vs Cold Fusion? PHP seems to be able to handle Cold Fusion's role pretty well according to PHP's site.
Is the answer truely a mish mash of the both? Pay for Netscape for the SSL and Oracle for the STUD (I still like Sybase better) of a database that it really is, but go freeware where you can?
Just looking for a couple of good opinions.
This is not the sig you are looking for...
Some other ideas, are to split image serving onto it's own apache, not necessarily it's own box. This apache can be completely pared down to absolute minimum modules, since all it will be doing is serving up static images. It also let's cache be used efficiently, since mostly the common images will be stored. As opposed to common images contending with common text files for cache space if images and content are served from the same apache.
Also, what are you using in apache to create dynamic pages and connect to the db? Use long running processes where possible, which means pick mod_perl, php, fastCGI, servlets, etc... over plain cgi scripts. This will save you lots of cycles and also let you have persistent db connections. Always a very good thing.
Taking the splitting out of machines to the next level, you could also try splitting all of your dynamic content to it's own machine, mod_proxied through your front end apache's. This makes the front ends very small since they barely need any modules installed at all. It also gets some extra performance out of your dynamic content apaches. Of course you're running a lot of boxes now. :)
Read this if you're running mod_perl. And read this to optimize your db.
- if you use the word db in any web application remember that the web server is merely acting like one very voracious user acting on behalf of lots of web site visitors
- use seperate hardware for the web server and for the database - resist the temptation to keep them on the same machine for $$$ reasons
- along these same lines... when available add more db servers that have replicated db's and tables using mysqldump and mysql db put OR statements in your MYSQL_CONNECT on the web page you write to allow it to failover to the db's that you have available when your db servers time out or exhaust the number of connections they can field
- render as much static content as possible using pre processors vs. having it "on the fly".
- recompile everything and look at the pages that cover the flags and considerations at compile time
- never never allow a user to "build" something that remotely looks like SQL in the URI and passed arguments to a CGI or application... if you let them stick malicious JOIN's into the MYSQL_QUERY your db will likely be choked over time
- if you have RAM to use put as much as possible on the web server and if you have fast processors give it to the db before you give it to the web server
- always put LIMIT into your mysql apps if you can get away with it
- write init scripts that will stop and start your db servers if you experience peak loads
- always put a "nice" message in the die statment for your sql connection... something other that "cannot connect to database"... make it something more flowery
:)
I am sure there are more but that's all I could think of right now."You cannot uncook Mushoo pork once is has been cooked" -- wiseman
http://fudge.org
I work in a research lab that does a lot of databases on Linux. We started off with msql and then graduated to mysql. We were initially running redhat with msql and slowly moved to Debian, since we felt it was a more stable server distribution. Also it was more configurable, and we were able to tweak almost anything in the system to it's limit. Recently, we moved to Oracle 8i, but we kept our mysql around.
Some of the thing you might need to know. If your going to do some serious databases, I recommend you spend more money in faster harddisks (SCSI preferble, multiple disks (oracle runs very nicely with the database spanned over 3-4 disks and the program running on another disk -- partitions wont do ). Have a generous amount of RAM and swap. If your making this a database box, dont use it for anything else. Even hosting a web server is not a good idea (As far as I'm concerned). Use WebDB if you like and host the database box seperately with just the database running as the main application.
Make sure you have a stable kernel. Make sure you have a secure system. Use ipchains to block out anything but local and remove all telnet and other daemons. Security is something a lot of people forget when making large databses.
Make sure you make daily, if not hourly backups (based on how sensitive your data is). RAID is a good way to keep your system running. Also if your database is web based, you might need to have 2 or 3 boxes set up identically and databse queries being distributed over all of them.
With Oracle, read everthing, they have a lot of tweaks listed on their pdf files and documents that come with the dist. Read all of them. Some tweaks are to the kernel. So pick a good stable kernel and stick to it. Forget about monthly kernel upgrades. I recommend yearly or every 6 months kernel upgrades. Software wise, if your doing Oracle 8i, make sure it's a glibc2.1 system (RH6 and debian potoato (we use potato, even though it's unstable, it lets us tweak the system and gives us the most familier interace ).
On mysql, it might help to read some of the online tweaks, also it might be a good idea to compile the server yourself, instead of using the one that came with your dist. Or compile it and copy it over what came with your distribution. Dont use msql unless there is no other way to do it.
And good luck.
--
The caveat to this, of course, is that you must know how to set up your database right. I recently had an opportunity to play around with a fairly large db (upwards of 400,000 records) on mySQL. The records represent people, and some of the fields are birth month, birth date, last name and first name. I wanted to select las and first names for people who were born today. So, with no indexes, the query selected about 600 records, and took 11.8 seconds. Yes, that's right, 11.8 seconds. I was floored! Here's me thinking "mySQL's fast! It'll work great!" Well.
So then I went back through and indexed (birth month, birth date), checked that I had done it right with EXPLAIN, and ran the exact same query again. This time it took 0.8 seconds. A total time savings of 11 seconds. I learned an important lesson that day... Always index everything you're going to use as a key! With this in mind, mySQL is indeed damn fast, and low overhead.
Now, the other thing I can't really speak to is reliability. mySQL doesn't really support referential integrity, and I guess it's up to you whether you need it or not. I've seen my share of M$-trained database folks who use CASCADE as a cheap crutch to paper over their bad code. Rather than write queries that do what they really wnat them to do, they just spend the extra overhead to have CASCADE's do it for them. I've also seen times where this was crucial to a db's function. Either way, it's something to consider. I've also never seen mySQL handle failure, or had to rebuild it after one. Whatever you usde, your strategy should account for this possibility, in any case.
There is no K5 cabal.
I am not the real rusty.
HTML is only a Hiper textual markup language - you don't do logic operations with it (til now) - it's not a PROGRAMMING language. Just text formatting language!
((HTML == text_formatting_language)!=programming_language)
You can't put logic on it. Not without JavaScript, etc....
What about other (free) HTTP servers?
-rozzin.
While it is acceptable for some minor stuff there is no point in using it once you hit something larger. I wonder why people don't use Sybase - it is free for production, very nice database that can handle a lot of stuff.Extremely easy to use, programm - there is excelent PHP3 support (well , as good as PHP3 offers - generally database interface in PHP3 is one of the most idiotic designs I ever seen)
with linux caching, this isn't really necessary. with enough memory, the whole thing will be in memory anyway
Synergies are basically awesome, and they're even better when you leverage them. -PA
- There's no way to have a well tuned system here if the db isn't well tuned, especially as the db grows in size. Make sure that all of your queries are as efficient as possible. Check to make sure that queries are against indexed columns whenever possible. Use the "explain" feature of the server to check the complexity of the queries you're passing.
- Compared to static page delivery, just about any parsed HTML is Evil and Bad for your performance. Limit db lookups to pages that truly need it. IIRC, Slashdot handles this problem by having the front page re-generated by a cron job every so often. Once created by the script, it's just a plain ol' static page.
- Eliminate the use of directory overrides (via
.htaccess) wherever possible. They're usually not worth it.
At this stage, you've probably heard all this advice before, but one repitition of the obvious never hurt anyone. Hope it's helpful.Or just change webserver.
Some (I use Roxen Challenger) use a single-process approach.
They "compile" and then embed your scripts into the main process, and so you save time because you don't need to fire up the interpreter.
Also, because of the long-lived, single-process approach, you can share the DB connections among your scripts, and most of all cache
Anybody know anything here? It seems to me that mSQL is a bit faster for simpler queries (esp no joins), but that's jsut an observation.
--Andrew Grossman
grossdog@dartmouth.edu
I haven't read all of the previous comment, so it may well be that this has been posted before.
Okay, this is how I generally do it. First of all, I suppose that you're using Perl, so these tips are for a Perl/Apache/MySql environment.
1) Use mod_perl so that your script doesn't neet a whole perl compiler for each separate instance in memory. The performance boost is just incredible...
2) Use Apache::DBI. It will prevent your script from connecting and disconnecting your DB each time it's called and rather use a persistent database connection. Great for performance.
There are some other tweaks that you can do. If you're interested, just let me know...
Wintermute
Martin May
There are ready-made solutions out there such as E-smith; you can download a cd image (or even buy the cd), and it'll install the system with extras built in; it's designed to be an 'out-of-the-box' sorta thing.
Yeah, sure.. damn stupid retard spammers.
"Only one thing is for sure in the Universe: me"
--Corndog
Corndog
If Apache is using mod_perl to serve dynamic content, how does Apache pass the page to Squid? Does the original HTTP request go to Squid on port 80 and then Squid hands the request off to Apache? Should Squid be on a firewall-like machine in front of the Apache machine?
User computer --> Squid server --> Apache server --> DB server?
cpeterso
If your data is so important that you need RAID 5 on your servers, why are you using MySQL, which does not support database transactions? I admit I have not used MySQL, but the impression I got was that it was a little "fast and loose" with regards to reliability. Did you consider any databases that support transactions like PostgreSQL or Oracle? You seem to have really done your homework and I'm very interested to read what you've found out!
cpeterso
Is it faster to put Apache and MySQL on separate Linux boxes, connected via 100Base-T? What sort of performance hit would we get if we put it all on one box? What about one box with double the RAM? Thanks in advance for your help.
Ryan
Yes, I forgot about www.apache.org. Indeed, their server run FreeBSD. In addition, FreeBSD is (or at least was the last time I checked) their main development platform. As for that M$-owned company, I did not include it due to flamers/ac's/etc.
Yes, you can get them precompiled from TcX. I was referring to the binaries precompiled from Redhat (I have no idea if they are TcX's or not). My sole purpose was to emphasize the performance gains you can achieve using pgcc and having the server statically linked. In addition, you may get even better optimization using a newer version of pgcc than TcX used.
OTOH, I'm sure that the binaries from TcX are probably fully optimized and would be the best source if you didn't want to or are unable to compile them yourself.
I'm pretty sure I read somewhere that FreeBSD was their main development platform, but I sure as hell cannot find it now (though I will continue to look). OTOH, according to Netcraft's survey, the majority of their systems are FreeBSD.
Our company uses Apache, MySQL, and PHP extensively (and exclusively). You can't beat the price/performance ($0.00 / excellent == great value). Thorough our research, we settled with the following combination:
- Web Server: FreeBSD 3.2-STABLE with Apache 1.3.9 / PHP 3.0.9 on a PII-400 w/128 Meg RAM, IBM 4.55G U2W Drive. Due to FreeBSD's proven track record for Web/Network performance, stability, and security (e.g. Yahoo, wcarchive, and others), it's a natural.
- SQL Server: Linux 2.2.x with MySQL 3.22.25 on a PII-400 w/256 Meg RAM, IBM 4.55G U2W System Drive and a Mylex AcceleRAID 250 w/4 IBM 4.55G U2W Drives in a RAID-5 configuration. Linux was the obvious choice when considering MySQL performance and driver availability wrt RAID controllers.
Optimization suggestions:- Apache: Ensure you have adequate spare servers to handle the connections (StartServers, MaxSpareServers, MaxClients, and MaxRequestsPerClient in the config); nothing sucks more than clients not being able to connect. Also, if you are using embedded script of some sort (PHP, Perl, etc.), use modules compiled into Apache (mod_perl, etc.); this should significantly increase speed and decrease the overhead of reloading the module for each access.
- MySQL: Tweak the applicable setting as appropriate. We increased (usually doubled in most cases) the following: Join Buffer, Key Buffer, Max Connections, Max Join Size, Max Sort Length, Sort Buffer, and Sort Buffer). If possible, depending on the amount of data, get as much memory in the system as possible. If the OS can maintain frequently used data cached, disk access won't be required which significantly increases the speed of queries, etc. In addition, get rid of that pre-compiled MySQL and compile it yourself. If possible, optimize using egcs/pgcc for your platform. Also, compile mysqld statically; this will increase it's memory overhead a bit but can increase it's speed by 5 - 10% by not using shared libraries.
- Storage: For optimum speed, use SCSI (of course). For our data, we require RAID 5 for redundancy. If that is not required, RAID 0 (striping) can be used for increased speed. The optimal way is to use hardware RAID (external RAID or RAID controller). Luckily, Linux has drivers for quite a few different RAID controllers that are available for a reasonable price.
- Linux: Beware of Redhat's security problems, disable all unnecessary services, et. al. Seek out security-oriented and Linux performance-tuning sites for more suggestions.
- General: Don't skimp on hardware. A cheap component, be it a drive, network card, motherboard, or whatever, if it fails, will cause unrecoverable downtime. We decided on Intel NL440BX boards (serial console/BIOS support is nice), PII-400's, and IBM SCSI drives in both boxes. If one box were to have a catastrophic failure, the other is able to perform both webserver and SQL server functions if necessary. We can also simply replace a failed component with one pulled from a similarly-configured non-production (test) box, or just swap boxes altogether.
Both Apache and MySQL have good sections on performance tuning. Do not be afraid to RTFM.Any questions/comments can be directed to me. Flames directed to
Nice starting point if you are on a budget:
Software RAID mini-HOWTO
Also take a look at:
Linux High-Availability HOWTO
-- If you can't convince them, confuse them (Truman)
Some good suggestions out there, but one thing hasn't been mentioned. It's good to test the site with some kind of load generation tool (I think there's one at apache.org) when you're trying out different configurations *before* you go live. Every site is likely to be optimally tuned a bit different.
:).
Also overlooked is possibly tuning the filesystem for caching and the like (file descriptors) and networking (maximum connections).
Possibly most of all, when I've seen performance problems, it's been due to how the code was written
Memory is pretty damn cheap -- I've been running my web server off a ramdisk. Archive your web server in a tar ball then just expand it onto the ram disk... just don't put your db there :-)
MySQL and PostgreSQL seem to be the two main backend db engines discussed here. Does anyone know if any comparisons exist between the two that cover their use (by a commercial organisation)?
Of course, neither of those sites is particularly busy and I'm more proud of the management utilities than the sites themselves, but that's par for this course.
The thing I did learn was that using perl and CGI is quite clumsy for this sort of thing. I eventually switched to PHP3 because everything goes together much faster. I don't know what it does to the performance, but since both sites are being served from the world's slowest Web server hardware (the database server is a 486dx2-80 and the database server has the HNBA website on it but the C Bookstore Web server is the 5x86-120 that I use for most of the four dozen or so domains that I host) and performance is not that big an issue, I'm not all that worried. It'd be nice if it got some hits, though.
It figures an eponymously named anonymous loser like yourself would have the balls to criticize someone else. I think it's fair to say that you don't post your online information because you haven't accomplished jackshit.
You cynical bastards are quite amusing. It'll be interesting to see how cynical you are 20 years into your dead end careers. I'm sure the "HTML programmer" will be doing quite fine.
Hates people who have stupid little sigs
I'm replacing some perl code with Java servlets using mySQL, Apache JServ, XML and XSL.
Apache JServ allows load balancing (basically doing a round robin over each of your servlet engines). I've found performance goes up about 30% for each PC you throw into the mix (I've only been able to test this up to 3 PCs).
FYI: I've found I needed servlet engines running on 2 PC's connected to 1 mySQL database to reach the performance of the perl app which stores its data as | delimited files.
While this may seem pretty poor, using a database means that the scalability (for size of data) is going to be a lot better than the file solution and the servlet solution used XSL which gives us a lot more flexibility over the HTML that we generate (basically each one of our users can have a completely different looking site while running the same app as all the others).
I'll be posting some benchmarks at http://objexcel.com in a few days if anyone is interested.
Peter
>welcome to the commercialization of linux pal, it's downhill from here.
That's rather cynical, as we port various Linux applications to QNX it's in our favor to document what we've found to improvde performance. The fact this documentation also helps Linux is just part of the benefit of open source, itself.
If a company has gone through its paces to approve using Linux it's only logical that the people looking for all that free support will also contribute to it.
-From Up North
William Bull Senior UI Engineer QNX Software Systems Ltd.
This talk of HTML programming reminds me
of the bright light who suggested, in response to a "rewrite the browser in Java" thread on the mozilla.general newsgroup, that it would be
better to rewrite the browser in XML...
No, they went back to Solaris, which is very interesting when thinking about disputes between Microsoft and Sun :-). By the way, it's no wonder that NT can't handle the load, considering some 40million users they have.
:-)
I guess it sure was the pain for people at Microsoft to choose Solaris as their "OS of the choice". Hmm, it might be worth asking why they didn't stick to their great OS?
Why not use multiple small servers than load :)
balance them with eddie. check out the
eddieware project. Cool thing is, it runs on
FreeBSD AND Linux and it's open source
http://www.eddieware.org