Improving Database Performance?
An anonymous reader asks: "An acquaintance of mine runs a fair sized web community. Having nearly 280,000 users in a single MySQL database with about 12 mirrors accessing it, performance has become a big issue. I was wondering what practical methods could be used to lessen the load and decrease user query times. Funds are limited, so he'd rather not pay for a commercial database system at this time."
Use memcached, which is used by LiveJournal, Slashdot, Wikipedia, and SourceForge. I also use it. My database has no load. It's not too difficult to implement and there are tons of APIs.
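For anyone who hasn't seen the pattern, here is a minimal sketch of cache-aside lookups. It assumes nothing about the poster's setup: `TinyCache` is an in-process stand-in for a memcached client, and `load_profile_from_db` and the `profile:<id>` key scheme are made up for illustration.

```python
import time

class TinyCache:
    """In-process stand-in for a memcached client (get/set with a TTL)."""
    def __init__(self):
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._store[key]   # expired: treat as a miss
            return None
        return value

    def set(self, key, value, ttl_seconds):
        self._store[key] = (value, time.monotonic() + ttl_seconds)


db_hits = {"count": 0}

def load_profile_from_db(user_id):
    """Hypothetical slow database read; counts calls so the effect is visible."""
    db_hits["count"] += 1
    return {"id": user_id, "name": f"user{user_id}"}


def get_profile(cache, user_id, ttl_seconds=60):
    """Cache-aside: try the cache first, fall back to the database on a miss."""
    key = f"profile:{user_id}"
    profile = cache.get(key)
    if profile is None:
        profile = load_profile_from_db(user_id)
        cache.set(key, profile, ttl_seconds)
    return profile


cache = TinyCache()
for _ in range(1000):
    get_profile(cache, 42)
print(db_hits["count"])  # 1: the database was queried once for 1000 lookups
```

The TTL is the knob to watch: a longer TTL means fewer database reads but staler data, so writes that must be visible immediately should also delete or overwrite the cached key.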
Although I am too lazy to look up any hard evidence (and when is it necessary on Slashdot?), I've heard that PostgreSQL is much better at heavy loads than MySQL.
Without some idea of the access patterns, schema, and actual DBMS configuration, it's hard to say what can be done to improve performance. There are purely mechanical things like caching as much as possible, getting faster disks, more memory, etc., but these may not even help, depending on the more fundamental issues of DB design and deployment. Depending on the use, MySQL may not even be appropriate given some of its limitations. PostgreSQL may be a better fit, for instance, if there's a lot of updating going on, or if client software is performing many queries to assemble views that could be better done closer to the data.
Works like a champ.
Set up multiple read slaves to carry the bulk of your read traffic, especially for your mirrors. Considering MyISAM's native table locking behavior, this should reduce your master db load quite a bit, even just moving your mirror read-load to a slave replicant.
Also, query caching is a beautiful thing.
- billn
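The read-slave suggestion above boils down to a routing decision in the application. A rough sketch, assuming hypothetical host names (`db-master`, `db-read1`, `db-read2`) and a deliberately naive "look at the first SQL keyword" rule:

```python
import itertools

class ReplicaRouter:
    """Send writes to the master and spread reads across read replicas."""
    WRITE_VERBS = {"INSERT", "UPDATE", "DELETE", "REPLACE", "CREATE", "ALTER", "DROP"}

    def __init__(self, master, replicas):
        self.master = master
        # Round-robin over the replicas; fall back to the master if there are none.
        self._replicas = itertools.cycle(replicas) if replicas else None

    def pick(self, sql):
        """Return the host that should run this statement."""
        verb = sql.lstrip().split(None, 1)[0].upper()
        if verb in self.WRITE_VERBS or self._replicas is None:
            return self.master
        return next(self._replicas)


router = ReplicaRouter("db-master", ["db-read1", "db-read2"])
print(router.pick("SELECT * FROM posts WHERE id = 1"))  # db-read1
print(router.pick("select name from users"))            # db-read2
print(router.pick("UPDATE users SET name = 'x'"))       # db-master
```

Replication is asynchronous, so a read sent to a slave right after a write may return stale data; anything that must read its own write should go to the master.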
If you do a lot of reading from the database, add an index on the columns you query on.
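A quick way to see what an index buys you is to ask the database for its query plan before and after. This sketch uses Python's bundled SQLite rather than MySQL purely so it runs anywhere; the `users` table and its columns are invented. On MySQL the equivalent check is `EXPLAIN SELECT ...`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO users (email, name) VALUES (?, ?)",
    [(f"user{i}@example.com", f"user{i}") for i in range(10000)],
)

def plan(sql, params=()):
    """Return the planner's description of how it will run the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[3] for row in rows]  # 4th column holds the human-readable detail

query = "SELECT name FROM users WHERE email = ?"
before = plan(query, ("user500@example.com",))
conn.execute("CREATE INDEX idx_users_email ON users (email)")
after = plan(query, ("user500@example.com",))

print(before)  # a SCAN: every row is examined
print(after)   # a SEARCH using idx_users_email
```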
If you do a lot of writing to the database, you _may_ want to do batch updating. Or buy/build a RAID array (I still recommend SCSI).
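One reading of "batch updating" is to queue writes and flush them in a single transaction instead of committing row by row. A small sketch, again on SQLite for portability, with a made-up `page_views` table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE page_views (user_id INTEGER, page TEXT)")

# Made-up queue of writes collected by the application.
pending = [(i % 100, f"/story/{i}") for i in range(5000)]

def flush(conn, rows):
    """Write all queued rows in one transaction instead of one commit per row."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT INTO page_views (user_id, page) VALUES (?, ?)", rows
        )
    return len(rows)

flush(conn, pending)
print(conn.execute("SELECT COUNT(*) FROM page_views").fetchone()[0])  # 5000
```

The trade-off is durability: anything still sitting in the queue is lost if the application dies before the flush.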
Finally: databases are easy to use, but hard to master. There are reasons why people get paid hundreds of dollars an hour to fix people's database problems.
You can push MySQL way beyond 280,000 customer records. I know because I've done it.
.01% of 10 million per day.
With properly normalized data, on fast, current, commodity hardware (10krpm drives for instance), using InnoDB, you can pretty easily push MySQL into the 5 to 10 million records-per-table range before you start really needing a bigger relational database engine. This assumes no more than 1% of your data is needing to be updated per day. Staying with GPL database software is a really smart thing to do: you don't know how much time and money gets spent on just negotiating with Oracle over their licenses: it is anything but simple. Small business web sites cease to be "small business" when they grow beyond of
A non-trivial part of my business is advising companies on how to get the most out of MySQL. Replication is one part of that, but having the right data structure for scalability is really key. Want more? Ask around at: www.phpconsulting.com
http://tinyurl.com/4ny52
1. 280,000 users
2. No revenue
3. ?
4. Profit!!
get rid of all those pesky users. No load, you can scale back hardware requirements, and no more annoying email.
280,000 records, even for MySQL, isn't that much and indicates that performance is being driven down either by tiny hardware or more likely...
1) Badly optimised queries
2) Poor index selection and maintenance
3) Generally poor schema design
It might also be that the queries ought to be cross-table with sub-queries (not a MySQL strong point).
Nine times out of ten, poor database performance is due to bad database design.
An Eye for an Eye will make the whole world blind - Gandhi
So, for example, if you want to insert a string that is too big for the field, MySQL will gladly suck it up with nary a peep (meanwhile, your data is trashed: truncate hell), whereas PostgreSQL (and other non-toy RDBMSs) will refuse to insert the record.
Wikipedia has a nice comparison.
Yeah, right.
To the O.P.: Provide some info - we're not mind-readers. Today's User Friendly is somehow appropriate.
How well normalized is the schema? Mostly reads? Writes? Both? 280,000 users? So what. Do you mean simultaneous users or are only 2 on at a time? Are they accessing a single 100 record table or lots of large tables? Are they indexed properly? What is the OS, memory, disk, processor...? How much processing is required of the DB vs. the front-end. Have you run into any specific problems that might indicate that a different db might be more appropriate. What have you tried and what was the result?
To the editors: Please reject Ask Slashdot questions from posters who can't be bothered to provide the most basic background info.
This is Slashdot. I would like to believe that the typical reader could be rather more technically erudite.
~~~~~~~
"You are not remembered for doing what is expected of you." - Atul Chitnis
Like a poster above mentioned, it really depends on your access patterns.
The ABSOLUTELY MOST IMPORTANT THING is to set up some benchmarks that reflect your usage patterns. Once you have some solid benchmarks, you can set up a farm of test machines and start benchmarking, adjusting, benchmarking, adjusting, over and over until you've got the performance you need. I can't stress this enough. You need a good, automated benchmark system to get you the first 85-90% of the way. The last 10% has to be done "in the field" unless your benchmarks are REALLY good.
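To make the benchmark/adjust loop concrete, here is a bare-bones harness. It is only a sketch: the `posts` table, the query mix in `typical_page_load`, and the choice of the median as the summary are all assumptions, and a real benchmark should replay your actual traffic against your actual DBMS.

```python
import sqlite3
import statistics
import time

def benchmark(workload, runs=5):
    """Run workload() several times and return the median wall-clock seconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        workload()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, body TEXT)")
conn.executemany(
    "INSERT INTO posts (author_id, body) VALUES (?, ?)",
    [(i % 500, "text") for i in range(50000)],
)

def typical_page_load():
    # Hypothetical query mix standing in for real traffic.
    for author in range(0, 500, 25):
        conn.execute(
            "SELECT COUNT(*) FROM posts WHERE author_id = ?", (author,)
        ).fetchone()

baseline = benchmark(typical_page_load)                             # benchmark
conn.execute("CREATE INDEX idx_posts_author ON posts (author_id)")  # adjust
tuned = benchmark(typical_page_load)                                # benchmark again
print(f"baseline {baseline:.4f}s, after index {tuned:.4f}s")
```

Change one thing per iteration, so that each measured difference can be attributed to a single adjustment.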
Generally, you want to minimize disk usage time on your databases. Everything else is just gravy. Make sure you've got some FAST disks for your MySQL boxes, and make sure they are striped RAID (or whatever your benchmarks show as being the fastest). Choose your filesystem (ext3, reiser, etc) the same way: use the one that benchmarks the fastest.
Next, there are lots of things you can tune in MySQL. For instance, did you know there's a "query cache" that will save the results of queries, indexed by the SQL query text? In *some* situations, it can be very useful. In others, it actually degrades performance. First learn about MySQL's various knobs and get them tuned.
Next, you might need to distribute reads to more mirrored DBs and/or to application-level caching like memcached. Depending on your app, this can give you a 100x speed increase.
Next, you might want to partition your database, if your data is suited for it. For instance, all queries for customers in Idaho go to a separate machine just for the idaho customers. All your application goes through a DB access layer that selects the right machine.
Basically, you need to get the "main loop" down: benchmark, adjust, benchmark, adjust, etc., etc, and then start trying things out!
The same goes for PostgreSQL.
But whatever you do, the LAST thing you want to do is mess with your database integrity. If anybody tells you to "turn off constraints" or "denormalize for performance", they are idiots. Your primary goal should always be data integrity! If you've got a real app, with real paying customers, and real valuable data (i.e., not a blog or something), you can't afford to throw 30 years of database best practices out the window to get a 5% speed increase. Today's SQL databases unfortunately don't even begin to implement even the most basic relational features, but that doesn't mean you shouldn't try. Just a tip... I've made plenty of consulting dollars fixing the mess people left when they started valuing performance over data integrity.
... but Microsoft has a free version of SQL Server: the "Microsoft SQL Server 2000 Desktop Engine" (MSDE). It has some limitations, like you can only have a 4 GB database, and some amount of throttling.
They say it is ideal for websites with up to 25 concurrent users, but in reality, with carefully planned caching, statically generated pages, data replication, and indexed tables, that number can be pushed into the thousands... I worked on a point-of-sale system where that was exactly what we did, because clients find it hard to want to buy a SQL Server license.
The trick is to keep your hits from getting to the DB as much as possible. The techniques for this are varied, but mostly this means caching your pages as static content. Depending on the dynamic nature of your site's content, you might be able to run a cron job daily that renders much of your site's content into static HTML files.
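A cron-driven static render can be as small as the following sketch. The `stories` table, the file naming, and the bare-bones HTML are all made up for the example; a real site would plug in its own templates and write into the web server's document root.

```python
import html
import pathlib
import sqlite3
import tempfile

def render_story(title, body):
    """Build a complete static HTML page for one story."""
    return (
        "<html><head><title>{t}</title></head>"
        "<body><h1>{t}</h1><p>{b}</p></body></html>"
    ).format(t=html.escape(title), b=html.escape(body))

def publish_all(conn, out_dir):
    """Render every story to out_dir/story-<id>.html; returns the files written."""
    out_dir = pathlib.Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for story_id, title, body in conn.execute("SELECT id, title, body FROM stories"):
        path = out_dir / f"story-{story_id}.html"
        path.write_text(render_story(title, body), encoding="utf-8")
        written.append(path)
    return written

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stories (id INTEGER PRIMARY KEY, title TEXT, body TEXT)")
conn.execute("INSERT INTO stories (title, body) VALUES ('Hello', 'Tom & Jerry')")
files = publish_all(conn, tempfile.mkdtemp())
print(files[0].read_text(encoding="utf-8"))
```

Run from cron, this touches the database once per story per run instead of once per page view.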
I discovered something very interesting when I was running a large mySQL installation. We had only about 50 users but they were telemarketers continuously beating on the database all day.
Certain reports would kill the system - make it stop entirely for minutes at a time. What I discovered was that this kind of query
select (fields) from calendar where date like '2005-08-15%'
was horribly slow. Instead, use
select (fields) from calendar where date >= '2005-08-15' and date < date_add('2005-08-15', interval 1 day)
That was many, many TIMES faster when the field date was indexed. The problem was that the date is stored as a numeric value and not a string.
All that time, the date index was not being used and I didn't even realize it.
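Here is the same half-open range trick sketched in Python, with the bounds computed in the application. One caveat: this runs on SQLite, where the timestamps are plain text, so it only illustrates the query shape and does not reproduce MySQL's date-to-string conversion. The `calendar` rows and the `day_bounds` helper are invented for the example.

```python
import datetime
import sqlite3

def day_bounds(day):
    """Return (start, end) strings for the half-open range [day, day + 1)."""
    start = datetime.date.fromisoformat(day)
    end = start + datetime.timedelta(days=1)
    return start.isoformat(), end.isoformat()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE calendar (id INTEGER PRIMARY KEY, date TEXT, note TEXT)")
conn.execute("CREATE INDEX idx_calendar_date ON calendar (date)")
conn.executemany("INSERT INTO calendar (date, note) VALUES (?, ?)", [
    ("2005-08-14 23:59:59", "day before"),
    ("2005-08-15 00:00:00", "midnight"),
    ("2005-08-15 17:30:00", "afternoon"),
    ("2005-08-16 00:00:00", "next day"),
])

range_sql = "SELECT note FROM calendar WHERE date >= ? AND date < ? ORDER BY date"
notes = [row[0] for row in conn.execute(range_sql, day_bounds("2005-08-15"))]
print(notes)  # ['midnight', 'afternoon']

# Ask the planner how it runs the range query; the detail column should
# describe a SEARCH on idx_calendar_date rather than a full scan.
plan = conn.execute("EXPLAIN QUERY PLAN " + range_sql, day_bounds("2005-08-15")).fetchall()
print(plan[0][3])
```

Using `>=` with a strict `<` on the next day avoids having to guess the last representable timestamp of the day.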
Hope this helps someone, if not the original poster someone else.
D
Ruby on Rails makes you so fricking productive, you could use dBase III and it wouldn't matter. Just twiddle some bits with RoR and BAM!... your performance problems go away.
RoR solves all computing problems with less than 20 lines of code and outperforms everything on the market. Java, C, Perl, PHP are all obsolete... better get your resumes updated.
SQLRelay http://sqlrelay.sourceforge.net/ might be a good option here. If you do end up switching the backend from MySQL to PostGres or whatever, it's supported there too.
Check out the High Performance MySQL book for info on how to speed it up. Most of it's probably obvious for the hardcore DBA guy, but I found it useful:
http://www.oreilly.com/catalog/hpmysql/
The discussion grows large because the topic is very broad. :)
In practice there may be 1000 reasons why MySQL is not performing fast for you, and they can be fixed. If Yahoo, CNET, and LiveJournal use MySQL, you can do it as well.
A great answer comes from a great question, so be detailed: say exactly what the problem is, which query is not responding fast enough, which tables it uses, etc.
The best place to get help is of course the MySQL forums, http://forums.mysql.com/, or the MySQL mailing list.
That is, of course, if the goal of the post is to get some help. If you want a flame war, you'd better be as low on facts as possible.
not records :) I don't know what he means by 'user' though, simultaneous users, or users that login once a week causing just a few queries to be run?
I have to agree with some other posters -- without knowing some of the dynamics of the db usage, it's difficult to make suggestions. 280,000 users can be a piece of cake if all you're doing is user auths for each (that's about 3/second if everyone logs in once/day). Worse-case,
:)
A few things to look at:
- If there is excessive or improper locking being done (i.e.: do you really need to lock a table to update a record that could only possibly be updated one at a time?)
- If queries can be made less complex
- Indexing. You should become intimate with how indexing works and the various ways of setting it up
- Caching infrequently changed content on the front-end (i.e. generate static web pages that don't change too often rather than dynamically creating them constantly).
- de-normalize your tables if it improves performance. Don't worry nobody's looking
Also, look into some lighter-weight DB & DB-related technologies: HSqlDB, SQLite, C-JDBC, BerkeleyDB, SQLRelay, to name a few. Granted some aren't distributed, but again, not knowing the architecture, some data may be lazily replicated out from the master.
Also, I can't find it now, but I read a while back that MySQL was adopting an open-sourced in-memory DB from a company (Ericsson?) that may be available separately. You also may want to look into something like LDAP (OpenLDAP) if the app is very read-heavy.
_______
2B1ASK1
Increase RAM in your servers.
http://philip.greenspun.com/wtr/server-sizing.htm
Slashdot = Sarcasm
Someone to give him an elevator summary of what questions he should throw at somebody's kid in college or a starving IT consultant.
THIS THING CAN TURN ON A DIME, MACROSSZERO STYLE ALSO FUCK BETA, ~NYORON
Here's a very simple solution to your problem. Take n servers (where n is the number of DB machines you have) and evenly split your user accounts across them. Then, use a simple hash table in your application to determine the server from which to query. Example:
...
Account Names Server
a-c db01
d-g db02
h-j db03
The key is to choose your boundaries so that each DB server holds a roughly equal number of accounts.
If you have a really, really busy database, you could split this across twenty or more servers, where each server is actually a cluster of machines doing replication.
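In application code, the lookup for a table like the one above could be sketched as follows. The a-c, d-g, and h-j rows come from the example; the final `db04` catch-all range and the handling of names that don't start with a letter are assumptions added so the function always returns something.

```python
import bisect

# Each entry is (last first-letter handled, server). Everything beyond the
# a-c / d-g / h-j rows is made up for the example.
SHARDS = [("c", "db01"), ("g", "db02"), ("j", "db03"), ("z", "db04")]
UPPER_BOUNDS = [upper for upper, _ in SHARDS]

def server_for(account_name):
    """Pick the server whose letter range contains the account's first letter."""
    first = account_name.strip().lower()[:1]
    if not first or not ("a" <= first <= "z"):
        return SHARDS[-1][1]  # catch-all for digits, symbols, empty names
    index = bisect.bisect_left(UPPER_BOUNDS, first)
    return SHARDS[index][1]

print(server_for("alice"))  # db01
print(server_for("dave"))   # db02
print(server_for("Heidi"))  # db03
print(server_for("zed"))    # db04
```

Because the boundaries live in one small table, rebalancing means editing `SHARDS` and moving the affected accounts, not touching every query.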
Chris
Upgrade from the 486, it's about time.
(provide more statistics than just how many records are in the user database, and we can probably help you a little bit. Otherwise, we're pretty crippled. With 280,000 user records, it doesn't sound like there's a whole hell of a lot happening on it to choke off a powerful machine, not if the software that's accessing things is done right. So, need more info.)
"Champagne for my real friends - and real pain for my sham friends!" http://ericblade.postalboard.com/
You might suggest that your friend consider asking MySQL for a quote:
http://www.mysql.com/company/contact/
Their Enterprise contracts are probably a bit much for your friend's needs, but they may offer single-incident support for optimization and tuning assistance.
If he doesn't mind delving into DBA-land, he may want to buy a book. If he values the time it would take him to get up to speed and would rather spend it on other pursuits, it may well be worth the money to get some help.
Either way, he'll have to spend something (time or money) -- it's a question of how much his time is worth to him.
Somebody get that guy an ambulance!
(and if it is, why are you even asking on /. but hey)
But if you have lots of database updates going ahead, locking and huge index searching, you might want to look at your slowest most costly queries (sometimes they can be stupid little footer fillers) and reduce those queries. Check what is slow, and see how you can work around it, add a new index, or cut the fat.
But seriously, if your 'performance hit' is pulling static stories out of a database, then how come you haven't looked at caching?
#hostfile 0.0.0.0 primidi.com 0.0.0.0 www.primidi.com 0.0.0.0 radio.weblogs.com
general advice only takes you so far.
Optimizing any system involves two steps, one analytical and one creative. The analytical step is determining exactly where all that time is going. Sometimes it isn't the 80/20 rule, it's the 99/1 rule. The creative step is figuring out how to avoid spending so much time there, either by avoiding unnecessary trips (caching or just cleaner programming) or by speeding up the process in question.
If you're lucky, the bottleneck may be in a single tier. It could be as simple as adding an index (back in the old days deleting indices sometimes helped, but optimizers are now too smart for that), or segregating out infrequently used data (Oracle is really nice for this, but you can do the same thing using non-proprietary techniques like views).
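For the analytical step, even a crude per-statement timer tells you a lot. This is a sketch of one, wrapped around a Python DB-API connection; the `QueryProfiler` class and the `comments` table are invented here, and MySQL's own slow query log is the more usual tool for the same job.

```python
import collections
import sqlite3
import time

class QueryProfiler:
    """Wrap a DB-API connection and total up the time spent per SQL statement."""
    def __init__(self, conn):
        self.conn = conn
        self.totals = collections.defaultdict(float)
        self.calls = collections.Counter()

    def execute(self, sql, params=()):
        start = time.perf_counter()
        rows = self.conn.execute(sql, params).fetchall()
        self.totals[sql] += time.perf_counter() - start
        self.calls[sql] += 1
        return rows

    def report(self):
        """Statements ordered by total time, most expensive first."""
        return sorted(self.totals.items(), key=lambda item: item[1], reverse=True)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE comments (id INTEGER PRIMARY KEY, story_id INTEGER, body TEXT)")
conn.executemany(
    "INSERT INTO comments (story_id, body) VALUES (?, ?)",
    [(i % 50, "x" * 100) for i in range(20000)],
)

db = QueryProfiler(conn)
for story in range(50):
    db.execute("SELECT COUNT(*) FROM comments WHERE story_id = ?", (story,))
db.execute("SELECT body FROM comments WHERE id = ?", (1,))

for sql, seconds in db.report():
    print(f"{seconds:.4f}s  {db.calls[sql]:>3}x  {sql}")
```

Sorting by total time rather than per-call time is deliberate: a cheap query run thousands of times per page can cost more than one slow report.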
Another place to look is in interfaces between tiers, especially if those tiers are on physically different machines. Was that trip to the database really necessary, or could it be cached? If they are on the same machine, one trick that used to help on Windows was avoiding localhost in favor of one of the machine's NIC addresses, which got you around some performance issues in the Windows loopback interface.
You can also consider resource allocation. Your database machine may have its CPU idling almost all the time while your application server is pegged.
You need to be specific: what is this application doing on these machines, using this design? You can often find a single, innocuous-looking decision that was made at some point that is killing your performance. Example: your pages are loading very, very slowly, but none of your machines is close to breaking a sweat. It turns out the problem is that one of the companies you are serving banner ads from can't handle the load. So instead you load your pages first, then use JavaScript to load the banner.
Post may contain irony: discontinue use if experiencing mood swings, nausea or elevated blood pressure.
It's technically about Oracle, but it's a good introduction to DBMS performance and how to use good science instead of urban legends to tune a database:
Optimizing Oracle Performance by Cary Millsap with Jeff Holt
davecb@spamcop.net
Not much to offer without knowing more about how the system is designed. If the DBMS is replicated 12 times, I suspect some kind of _serious_ design problem, and the best fix would be to rethink the application. The best advice is to tell the person to get onto some DBMS forums and talk about his application and how best to design it. Notice I say "DBMS forums", not MySQL forums. He needs to step way back and look at the bigger picture, not ask about low-level MySQL nuts-and-bolts issues. The best tool for "painting yourself into a corner" really quickly is a DBMS in the hands of someone with no training in database design. Performance is NOT an issue: I've clocked thousands of queries per second using standard PC hardware and MySQL or PostgreSQL.
I am in the process of re-doing my website using PHP and MySQL. My new site will be completely DB driven, to the point that the page content is driven and built from PHP stored in the DB. My goal is to be able to update the website from anywhere in the world with a web connection. I am custom writing this system, rather than use a pre-existing CMS or other blogging system (and there were quite a few that were tempting) - because I wanted to learn PHP and MySQL by doing, rather than by observing.
Anyhow, one of my editors was slowing down on an update - that is, when I clicked "update" to update the site, it was taking a long time to update the database. Various tests indicated that it wasn't PHP with the issue, but running 'top' on my dev box indicated that the apache process was thrashing on these updates. I checked the code, and here is what I discovered:
In my update code, I was issuing a SQL insert for each field in a record, where I was updating multiple fields on the same record, rather than doing an insert with all the fields to be updated in the SQL statement. If I had 10 fields to update, that was 10 INSERTS, instead of the single I should have been doing. As I said, this was a bone-headed move I won't be doing again in the future. Once I corrected the issue, my performance shot up immediately on the update.
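The fix amounts to building one statement that sets every changed field. The poster's code was PHP against MySQL; this is a sketch of the same idea in Python on SQLite, with a made-up `pages` table and a hypothetical `update_page` helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pages (id INTEGER PRIMARY KEY, title TEXT, body TEXT, author TEXT)")
conn.execute("INSERT INTO pages (id, title, body, author) VALUES (1, 'old', 'old', 'old')")

# Column names cannot be bound as parameters, so they are checked against
# a whitelist before being placed into the SQL text.
ALLOWED_COLUMNS = {"title", "body", "author"}

def update_page(conn, page_id, changes):
    """Apply all changed fields in one UPDATE statement instead of one per field."""
    if not changes:
        return 0
    columns = sorted(changes)
    unknown = set(columns) - ALLOWED_COLUMNS
    if unknown:
        raise ValueError(f"unknown columns: {sorted(unknown)}")
    assignments = ", ".join(f"{col} = ?" for col in columns)
    values = [changes[col] for col in columns] + [page_id]
    with conn:  # one transaction, one round trip
        cursor = conn.execute(f"UPDATE pages SET {assignments} WHERE id = ?", values)
    return cursor.rowcount

update_page(conn, 1, {"title": "New title", "body": "New body"})
print(conn.execute("SELECT title, body, author FROM pages WHERE id = 1").fetchone())
# ('New title', 'New body', 'old')
```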
I would imagine that the same could be true of any simple SELECT - select out all the fields (and only those fields) you need at one shot, then loop thru the records building your output (whatever it is). Optimize the queries well, too (a misplaced pair of parentheses can make a WORLD of difference in some cases).
In short, keep the number of queries to the backend as short and sweet as possible, reducing the load (and thrashing) on the backend. This should be common sense design, but sometimes in the thrill and rush to build something, programmers forget this, and it can easily cause issues down the line (I was lucky in that I caught it very early in my design of the system).
Good luck, and I hope this helps...
Reason is the Path to God - Anon
I don't know why people always want to get new hardware or software to improve performance. If you review your application code, I bet you will find many performance problems. Make better use of data caching, limit database connections, write more efficient SQL, etc. These are all ways to improve performance, and they can be done incrementally. I've worked on many applications where I was told the hardware simply wouldn't support fast operations, and then found that just changing some looping structure or poorly written SQL increased application performance by over 500%. To improve performance, the first step is to review application code, then code that resides in the database (might not exist since this is MySQL), and only then look to new software and hardware.
One (obvious) way to improve database performance is to cache some database results so you don't need to query the database all the time; you can easily go 1000x faster. There are tools that take care of the cache for you, so try to find one for your language/framework.
"Use cases are fairy tales..." I. S. 2005