Slashdot Mirror


Improving Database Performance?

An anonymous reader asks: "An acquaintance of mine runs a fair-sized web community. With nearly 280,000 users in a single MySQL database and about 12 mirrors accessing it, performance has become a big issue. I was wondering what practical methods could be used to lessen the load and decrease user query times. Funds are limited, so he'd rather not pay for a commercial database system at this time."

26 of 95 comments

  1. Memcached by Exstatica · · Score: 5, Informative

    Use Memcached, which is used by LiveJournal, Slashdot, Wikipedia, and SourceForge. I also use it. My database has no load. It's not too difficult to implement and there are tons of APIs.

    1. Re:Memcached by SpaceLifeForm · · Score: 3, Insightful
      And add more RAM.

      --
      You are being MICROattacked, from various angles, in a SOFT manner.
    2. Re:Memcached by stevey · · Score: 4, Informative

      Pretty much, yes.

      Here's an introduction to memcached I wrote which might explain it for you.

      In short you modify all the parts of your code that fetch from the database to first check from the memory cache - and when storing invalidate the cache. In general most sites read data more than they write it so most of your accesses come from the cache - thus reducing the load upon your DB.

      If you don't want to modify your code you could look at optimizing the setup of the database server, moving it, setting up replication, etc.

      Still without more details it is hard to know what to suggest.

    3. Re:Memcached by Anonymous Coward · · Score: 2, Interesting

      "Add more RAM" sounds good, but it is not necessarily a real solution.

      For many "real" databases, there is most definitely a big IO performance issue, regardless of the amount of RAM in the system. The main reason is the requirement that data be in stable storage after a commit succeeds in order to comply with ACID semantics (though being MySQL, it wouldn't surprise me if this requirement didn't bother him).

      If the database is read-only (and being MySQL, that wouldn't surprise me), then sure, adding more RAM until disk access goes to 0 would be one way to get more performance.

      Otherwise, the best way to get really good performance out of a database, after there is a *reasonable* amount of RAM and CPU power there, is to add a good disk controller with a nice battery backed writeback cache.

      Without a battery backed disk controller cache, the log disk's rotational latency quickly becomes a bottleneck and you end up with a limit of 100-200 transactions per second (that modify data, plain selects don't have this limit of course).
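      A back-of-the-envelope check of that figure, assuming (this is a simplification, not something stated above) that every commit has to wait for one full rotation of the log disk and that commits are not grouped together:

```python
def max_commits_per_second(rpm):
    """Upper bound if each commit waits for one full platter rotation."""
    return rpm / 60.0


# Common spindle speeds; the one-commit-per-rotation model is an assumption.
for rpm in (7200, 10000, 15000):
    print(rpm, "rpm ->", round(max_commits_per_second(rpm)), "commits/s at most")
```

      Under that assumption a 7,200 rpm disk tops out at about 120 commits per second and a 10,000 rpm disk at about 167, which lands in the quoted 100-200 range.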

  2. Re:Postgre-SQL by Anonymous Coward · · Score: 2, Informative

    If the OP's queries are such that the database is mostly read-only, I don't think that switching over to Postgresql is going to really show enough improvement to justify the pain of switching to a different rdbms.

    Postgresql scales better than Mysql under heavy concurrent read/write conditions. If that is the access pattern for the OP, then I'd say yes - look into switching to Postgresql.

  3. It Depends by AtrN · · Score: 4, Insightful

    Without some idea of the access patterns, schema and actual DBMS configuration, it's hard to say what can be done to improve performance. There are purely mechanical things like caching as much as possible, getting faster disks, more memory, etc., but these may not even help, depending upon the more fundamental issues of DB design and deployment. Depending upon the use, MySQL may not even be appropriate given some of its limitations; PostgreSQL may be a better fit, for instance, if there's a lot of updating going on or client s/w is performing many queries to assemble views which could be better done closer to the data.

  4. MySQL replication. by billn · · Score: 4, Informative


    Works like a champ.

    Set up multiple read slaves to carry the bulk of your read traffic, especially for your mirrors. Considering MyISAM's native table locking behavior, this should reduce your master db load quite a bit, even just moving your mirror read-load to a slave replicant.
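    One way to picture the split is a tiny router that sends SELECTs to the slaves in turn and everything else to the master. This is a sketch only: the server names are hypothetical, and deciding by the first SQL keyword is a simplification that a real application would need to refine.

```python
import itertools


class ReplicaRouter:
    """Pick a server name for a statement: slaves for reads, master for writes."""

    def __init__(self, master, slaves):
        self.master = master
        self._slaves = itertools.cycle(slaves)  # simple round-robin over slaves

    def pick(self, sql):
        # Simplification: classify by the first keyword of the statement.
        verb = sql.lstrip().split(None, 1)[0].upper()
        if verb == "SELECT":
            return next(self._slaves)
        return self.master


# Hypothetical host names, for illustration only.
router = ReplicaRouter("db-master", ["db-slave-1", "db-slave-2"])
for statement in ("SELECT * FROM users",
                  "UPDATE users SET name = 'x' WHERE id = 1",
                  "SELECT * FROM posts"):
    print(router.pick(statement), "<-", statement)
```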

    Also, query caching is a beautiful thing.

    --
    - billn
    1. Re:MySQL replication. by fleck_99_99 · · Score: 2, Funny

      Be careful with those slave replicants. Sometimes, they get cranky and want to know their expiration dates.

      --
      seven two six five
      seven four six one seven
      two six four two e
  5. some MySQL optimization tips by ubiquitin · · Score: 4, Interesting

    You can push MySQL way beyond 280,000 customer records. I know because I've done it.

    With properly normalized data, on fast, current, commodity hardware (10krpm drives for instance), using InnoDB, you can pretty easily push MySQL into the 5 to 10 million records-per-table range before you start really needing a bigger relational database engine. This assumes no more than 1% of your data needs to be updated per day. Staying with GPL database software is a really smart thing to do: you don't know how much time and money gets spent on just negotiating with Oracle over their licenses: it is anything but simple. Small business web sites cease to be "small business" when they grow beyond .01% of 10 million per day.

    A non-trivial part of my business is in advising companies on how to get the most out of MySQL. Replication is one part of that, but having the right data structure for scalability is really key. Want more? Ask around at: www.phpconsulting.com

    --
    http://tinyurl.com/4ny52
    1. Re:some MySQL optimization tips by superpulpsicle · · Score: 2, Insightful

      Half the time the problem is actually not the database software itself. It's the hardware and storage.

      - At the least stripe the volumes across multiple disks.

      - The system should have the virtual memory partition/swap alone on a separate disk, if not a separate controller.

      - SCSI drives are known for utilizing fewer system resources.

      - Run the database on raw device, not on some filesystem.

  6. Re-design the schema... by MosesJones · · Score: 4, Insightful


    280,000 records, even for MySQL, isn't that much and indicates that performance is being driven down either by tiny hardware or more likely...

    1) Badly optimised queries
    2) Poor index selection and maintenance
    3) Generally poor schema design

    It might also be that queries should be cross table with sub-queries (not a MySQL strong point).

    9 times out of 10, poor database performance is due to bad database design.

    --
    An Eye for an Eye will make the whole world blind - Gandhi
  7. Re:Just... by dotgain · · Score: 4, Funny

    Just randomly abort the CGI scripts and throw out an HTTP/503. That's how slashdot does it, and he did "ask slashdot".

  8. More importantly by Safety+Cap · · Score: 4, Informative
    It actually supports ACID, whereas MySQL does not.

    So, for example, if you want to insert a string that is too big for the field, MySQL will gladly suck it up with nary a peep (meanwhile, your data is trashed: truncate hell), whereas Postgre (and other non-toy RDBMSs) will refuse to insert the record.
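    If the database cannot be trusted to reject oversized values, one defensive option is to check lengths in the application before inserting. A minimal sketch, assuming made-up column names and widths (MAX_LENGTHS and check_lengths are hypothetical, not part of any library):

```python
# Hypothetical column widths, for illustration only.
MAX_LENGTHS = {"username": 16, "email": 64}


def check_lengths(record):
    """Raise instead of letting an oversized value be silently truncated."""
    for column, value in record.items():
        limit = MAX_LENGTHS.get(column)
        if limit is not None and len(value) > limit:
            raise ValueError(f"{column} is {len(value)} chars, limit is {limit}")
    return record


try:
    check_lengths({"username": "a_name_that_is_far_too_long"})
except ValueError as err:
    print("rejected:", err)
```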

    Wikipedia has a nice comparison.

    --
    Yeah, right.
  9. A plea. by linuxwrangler · · Score: 4, Insightful

    To the O.P.: Provide some info - we're not mind-readers. Today's User Friendly is somehow appropriate.

    How well normalized is the schema? Mostly reads? Writes? Both? 280,000 users? So what. Do you mean simultaneous users or are only 2 on at a time? Are they accessing a single 100 record table or lots of large tables? Are they indexed properly? What is the OS, memory, disk, processor...? How much processing is required of the DB vs. the front-end? Have you run into any specific problems that might indicate that a different db might be more appropriate? What have you tried and what was the result?

    To the editors: Please reject Ask Slashdot questions from posters who can't be bothered to provide the most basic background info.

    This is Slashdot. I would like to believe that the typical reader could be rather more technically erudite.

    --

    ~~~~~~~
    "You are not remembered for doing what is expected of you." - Atul Chitnis
  10. "it depends" by Anonymous Coward · · Score: 5, Insightful

    Like a poster above mentioned, it really depends on your access patterns.

    The ABSOLUTELY MOST IMPORTANT THING is to set up some benchmarks that reflect your usage patterns. Once you have some solid benchmarks, you can set up a farm of test machines and start benchmarking, adjusting, benchmarking, adjusting, over and over until you've got the performance you need. I can't stress this enough. You need a good, automated benchmark system to get you the first 85-90% of the way. The last 10% has to be done "in the field" unless your benchmarks are REALLY good.

    Generally, you want to minimize disk usage time on your databases. Everything else is just gravy. Make sure you've got some FAST disks for your MySQL boxes, and make sure they are striped RAID (or whatever your benchmarks show as being the fastest). Choose your filesystem (ext3, reiser, etc) the same way: use the one that benchmarks the fastest.

    Next, there are lots of things you can tune in MySQL. For instance, did you know there's a "query cache" that will save the results of queries, indexed by the SQL query text? In *some* situations, it can be very useful. In others, it actually degrades performance. First learn about MySQL's various knobs and get them tuned.
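    To see why such a cache helps in some situations and hurts in others, here is a toy version. It is not how MySQL implements its query cache: results are keyed by the exact SQL text, and any write simply empties the whole cache, which is cruder than invalidating per table. SQLite is used only so the sketch runs on its own.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT)")
db.execute("INSERT INTO posts (title) VALUES ('hello')")

query_cache = {}                   # exact SQL text -> result rows
stats = {"hits": 0, "misses": 0}


def cached_select(sql):
    """Serve a SELECT from the cache when the identical text was seen before."""
    if sql in query_cache:
        stats["hits"] += 1
        return query_cache[sql]
    stats["misses"] += 1
    rows = db.execute(sql).fetchall()
    query_cache[sql] = rows
    return rows


def write(sql, params=()):
    """Run a modifying statement and throw away every cached result."""
    db.execute(sql, params)
    query_cache.clear()


cached_select("SELECT title FROM posts")    # miss
cached_select("SELECT title FROM posts")    # hit
write("INSERT INTO posts (title) VALUES (?)", ("world",))
cached_select("SELECT title FROM posts")    # miss again after the write
print(stats)
```

    With frequent writes nearly every read becomes a miss plus the bookkeeping cost, which is the kind of workload where a cache like this can slow things down.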

    Next, you might need to distribute reads to more mirrored DBs and/or to application-level caching like memcached. Depending on your app, this can give you a 100x speed increase.

    Next, you might want to partition your database, if your data is suited for it. For instance, all queries for customers in Idaho go to a separate machine just for the Idaho customers. All your application goes through a DB access layer that selects the right machine.
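    The machine-selecting part of such a DB access layer can be very small. A sketch under the Idaho example, with hypothetical host names and a default machine for every other state:

```python
# Hypothetical host names; only Idaho has its own machine in this sketch.
SHARDS = {"ID": "db-idaho"}
DEFAULT_SHARD = "db-main"


def shard_for(state_code):
    """Return the database host that holds customers from the given state."""
    return SHARDS.get(state_code.strip().upper(), DEFAULT_SHARD)


for state in ("ID", "id", "CA"):
    print(state, "->", shard_for(state))
```

    Keeping the mapping in one place means moving another state to its own machine later is a one-line change, as long as every query really does go through this layer.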

    Basically, you need to get the "main loop" down: benchmark, adjust, benchmark, adjust, etc., etc, and then start trying things out!
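    The "main loop" can be as simple as a function that times a fixed workload, which you re-run after each adjustment. The sketch below uses an in-memory SQLite table and an index as the single adjustment, purely as stand-ins; a real benchmark would replay your own queries against your own MySQL setup.

```python
import sqlite3
import time


def build_db(with_index):
    """Create a throwaway test database; the index is the knob being adjusted."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
    db.executemany("INSERT INTO users (email) VALUES (?)",
                   ((f"user{i}@example.com",) for i in range(20000)))
    if with_index:
        db.execute("CREATE INDEX idx_users_email ON users (email)")
    return db


def benchmark(db, workload, repeats=3):
    """Return the best wall-clock time over several runs of the workload."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for sql, params in workload:
            db.execute(sql, params).fetchall()
        best = min(best, time.perf_counter() - start)
    return best


# Made-up workload: 200 lookups by email address.
workload = [("SELECT id FROM users WHERE email = ?", (f"user{i}@example.com",))
            for i in range(0, 20000, 100)]

for label, with_index in (("no index", False), ("with index", True)):
    print(label, benchmark(build_db(with_index), workload))
```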

    The same goes for PostgreSQL.

    But whatever you do, the LAST thing you want to do is mess with your database integrity. If anybody tells you to "turn off constraints" or "denormalize for performance", they are idiots. Your primary goal should always be data integrity! If you've got a real app, with real paying customers, and real valuable data (i.e., not a blog or something), you can't afford to throw 30 years of database best practices out the window to get a 5% speed increase. Today's SQL databases unfortunately don't even begin to implement even the most basic relational features, but that doesn't mean you shouldn't try. Just a tip...I've made plenty of consulting dollars fixing the mess people left when they started valuing performance over data integrity.

    1. Re:"it depends" by UnckleSam · · Score: 2, Informative

      Real workload performance testing is easy with MySQL: Dump your database (mysqldump or a real filesystem snapshot if you have the HW) in a clean state, turn on the query log (don't use binary logging - it only contains statements modifying your db's contents such as UPDATEs and INSERTs) and wait some hours or days. Then deploy the dumped database to a testing system, select an appropriate starting point in the query logs (choose the first statement that arrived after you dumped your db) and feed all statements after it to a mysql shell. Measure how long it takes to complete with different MySQL settings. This will give you a rough approximation of real-life performance. Caveat: it doesn't reflect that queries are processed in parallel, because they come from different clients and are processed by different MySQL server threads. (Hm, a perl script parsing the query log - it contains server thread id's - and simulating different client threads should be easy to code for someone who knows perl well enough.)

      However, when MySQL's usual knobs aren't enough, you may use MySQL's master-slave replication. While absolutely simple to set up and run, it can be problematic on the client side. Using master-slave replication means that you direct ALL data-modifying statements (INSERTs, UPDATEs) to a single MySQL master process while executing read-only statements (SELECT) on multiple slave processes. The master as well as the slave machines maintain a full copy of your data; all statements executed on the master are implicitly executed on the slaves as well, because MySQL replication simply means that the master sends all UPDATEs and INSERTs to the slaves (so their copy of the data gets updated too).

      Sounds nice, but it can lead to quite serious problems. In such a setup, the slaves' data will always lag behind the master's data. Sometimes that doesn't matter. Your users won't mind if they find a new blog entry on your site 2 seconds after it was posted. But your users will mind if they post a blog entry and it doesn't show up on the index page (to which they are redirected immediately after posting) until they hit F5 two seconds later. This can happen because the INSERT of the blog entry happens on the master server whilst the SELECT * FROM user_blog WHERE uid=foo from the client happens on one of the slave servers, which hasn't caught up to the master server's point of view (data). (Fancy software developers will see lots of possible race conditions pop up here which can lead to more serious malfunction of otherwise well-working, secure software.)

      However, all of these problems can be cured, but that requires a careful design of your application. Or some serious code auditing and refactoring. That's the downside - MySQL replication is absolutely easy to set up but you have to do more on the client side. Anyways, it's worth a look.
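      One common client-side cure for the "my own post is missing" case is to send a user's reads to the master for a short while after that user writes. This is just one possible approach, not something the comment above prescribes; the 5 second window and the function names are assumptions for the sketch.

```python
import time

STICKY_SECONDS = 5.0   # assumed upper bound on replication lag
_last_write = {}       # user id -> time of that user's most recent write


def record_write(user_id, now=None):
    """Remember when this user last modified data on the master."""
    _last_write[user_id] = time.monotonic() if now is None else now


def server_for_read(user_id, now=None):
    """Read from the master right after a write, from a slave otherwise."""
    now = time.monotonic() if now is None else now
    wrote_at = _last_write.get(user_id)
    if wrote_at is not None and now - wrote_at < STICKY_SECONDS:
        return "master"
    return "slave"


record_write("foo", now=100.0)
print(server_for_read("foo", now=102.0))   # still inside the sticky window
print(server_for_read("foo", now=106.0))   # window has passed
print(server_for_read("bar", now=102.0))   # this user never wrote
```

      Other readers still see the new entry a moment late, which is the case that doesn't matter; only the author is kept on the master.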

  11. Re:Date searches may not work as you think by fearanddread · · Score: 2, Informative

    The EXPLAIN syntax is an interesting and informative tool that can help you optimize your indexes and queries.
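    To give a feel for what such output tells you, here is a self-contained sketch using SQLite's EXPLAIN QUERY PLAN (not MySQL's EXPLAIN, whose output format differs) on a made-up users table, before and after adding an index:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")


def plan(sql):
    """Return the human-readable plan steps SQLite reports for a query."""
    return [row[-1] for row in db.execute("EXPLAIN QUERY PLAN " + sql)]


query = "SELECT name FROM users WHERE email = 'a@example.com'"

before = plan(query)    # no index on email yet: expect a full table scan
db.execute("CREATE INDEX idx_users_email ON users (email)")
after = plan(query)     # now the plan should mention the new index

print(before)
print(after)
```

    The idea carries over: run the slow query through the database's plan tool, look for full scans, add or change an index, and check the plan again.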

  12. sqlrelay by Vlad_Drak · · Score: 3, Interesting

    SQLRelay http://sqlrelay.sourceforge.net/ might be a good option here. If you do end up switching the backend from MySQL to PostGres or whatever, it's supported there too.

  13. High Performance MySQL by ajayrockrock · · Score: 4, Informative


    Check out the High Performance MySQL book for info on how to speed it up. Most of it's probably obvious for the hardcore DBA guy, but I found it useful:

    http://www.oreilly.com/catalog/hpmysql/

  14. A few things, and alternative technologies by eyeball · · Score: 2, Informative

    I have to agree with some other posters -- without knowing some of the dynamics of the db usage, it's difficult to make suggestions. 280,000 users can be a piece of cake if all you're doing is user auths for each (that's about 3/second if everyone logs in once/day). Worse-case,

    A few things to look at:

    - If there is excessive or improper locking being done (i.e.: do you really need to lock a table to update a record that could only possibly be updated one at a time?)
    - If queries can be made less complex
    - Indexing. You should become intimate with how indexing works and the various ways of setting it up
    - Caching infrequently changed content on the front-end (i.e. generate static web pages that don't change too often rather than dynamically creating them constantly).
    - De-normalize your tables if it improves performance. Don't worry, nobody's looking :)

    Also, look into some lighter-weight DB & DB-related technologies: HSqlDB, SQLite, C-JDBC, BerkeleyDB, SQLRelay, to name a few. Granted some aren't distributed, but again, not knowing the architecture, some data may be lazily replicated out from the master.

    Also, I can't find it now, but I read a while back that MySQL was adopting an open-sourced in-memory DB from a company (Ericsson?) that may be available separately. You also may want to look into something like LDAP (OpenLDAP) if the app is very read-heavy.

    --

    _______
    2B1ASK1
  15. Re:Postgre-SQL by dotgain · · Score: 3, Funny

    PHB: I think we should build an SQL database.
    Dilbert (thinking):Does he know what he's talking about, or did he read it in a trade magazine ad?
    Dilbert (speaking):What color would you like that database?
    PHB: I think mauve has the most RAM.

  16. Re:Postgre-SQL by dubl-u · · Score: 2, Informative

    If the OP's queries are such that the database is mostly read-only, I don't think that switching over to Postgresql is going to really show enough improvement to justify the pain of switching to a different rdbms.

    If the queries are indeed read-mostly, it might be worth benchmarking it against an LDAP server. LDAP servers are built specifically for serving user data, which is usually greater than 99% read. It's been a while since I did benchmarks, but at the time they could beat the pants off of general-purpose databases.

  17. ask the source by hankaholic · · Score: 3, Interesting

    You might suggest that your friend consider asking MySQL for a quote:

    http://www.mysql.com/company/contact/

    Their Enterprise contracts are probably a bit much for your friend's needs, but they may offer single-incident support for optimization and tuning assistance.

    If he doesn't mind delving into DBA-land, he may want to buy a book. If he values the time it would take him to get up to speed and would rather spend it on other pursuits, it may well be worth the money to get some help.

    Either way, he'll have to spend something (time or money) -- it's a question of how much his time is worth to him.

    --
    Somebody get that guy an ambulance!
  18. Re:Date searches may not work as you think by Samus · · Score: 2, Interesting

    You might not be getting much performance out of that index anyway. I can't speak to your specific database, but in general if you have a large table and the indexed column has a small number of possible values, you won't be buying yourself a whole lot. Let's say that 30 plus percent of your 100,000 plus record table at any one time has not been processed. IO-wise that probably won't be much better than a table scan. Additionally, if a table has a lot of writes done to it, a lot of indexes will hamper performance. Often online activity and reporting don't mix very well. Reports can easily swamp a database. I've seen companies that actually use a reporting database that is synched up nightly, just so that their onlines aren't affected negatively. Perhaps in your case you could even consider having 2 tables that are identical in structure except that one is for incoming records and the other is for processed records. Databases are a tricky beast. As you well know, what works for a hundred or a thousand records doesn't always scale to a hundred thousand.
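    A quick way to spot a column that is a poor index candidate is to look at how many distinct values it holds relative to the row count. The sketch below uses made-up data shaped like the example above (100,000 rows, 30 percent unprocessed); the function name and the sample columns are illustrative only.

```python
def selectivity(values):
    """Fraction of distinct values; close to 0 means an index filters poorly."""
    values = list(values)
    return len(set(values)) / len(values) if values else 0.0


# Made-up data: a two-valued status column versus a unique email column.
status_column = ["processed"] * 70_000 + ["pending"] * 30_000
email_column = [f"user{i}@example.com" for i in range(100_000)]

print("status:", selectivity(status_column))
print("email: ", selectivity(email_column))
```

    The status column comes out at practically zero, so a lookup on it still has to touch tens of thousands of rows, while the email column is fully selective.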

    --
    In Republican America phones tap you.
  19. It's time to re-think the overall design by ChrisA90278 · · Score: 2, Insightful

    Not much to offer without knowing more about how the system is designed. If the DBMS is replicated 12 times I suspect some kind of _serious_ design problem and the best fix would be to re-think the application. The best advice is to tell the person to get onto some DBMS forums and talk about his application and how best to design it. Notice I say "DBMS forums" not MySQL forums. Need to step way back and look at the bigger picture and not ask about low-level mysql related nuts and bolts issues. The best tool for "painting yourself in a corner" really quick is a DBMS in the hands of someone with no training in database design. Performance is NOT an issue: I've clocked thousands of queries per second using standard PC hardware and MySQL or PostgreSQL.

  20. Something I recently found... by cr0sh · · Score: 2, Interesting
    In a bone-headed move, I might add - but it is something worth checking out to see if this is part of the issue:

    I am in the process of re-doing my website using PHP and MySQL. My new site will be completely DB-driven, to the point that the page content is driven and built from PHP stored in the DB. My goal is to be able to update the website from anywhere in the world with a web connection. I am custom writing this system, rather than use a pre-existing CMS or other blogging system (and there were quite a few that were tempting) - because I wanted to learn PHP and MySQL by doing, rather than by observing.

    Anyhow, one of my editors was slowing down on an update - that is, when I clicked "update" to update the site, it was taking a long time to update the database. Various tests indicated that it wasn't PHP with the issue, but running 'top' on my dev box indicated that the apache process was thrashing on these updates. I checked the code, and here is what I discovered:

    In my update code, I was issuing a SQL insert for each field in a record, where I was updating multiple fields on the same record, rather than doing an insert with all the fields to be updated in the SQL statement. If I had 10 fields to update, that was 10 INSERTS, instead of the single one I should have been doing. As I said, this was a bone-headed move I won't be doing again in the future. Once I corrected the issue, my performance shot up immediately on the update.
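    For illustration, the fixed version boils down to building one statement that carries every changed field. The sketch below is in Python with SQLite rather than PHP with MySQL, and the pages table and update_page helper are made up; column names are checked against a fixed list because they are interpolated into the SQL text.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE pages (id INTEGER PRIMARY KEY, title TEXT, body TEXT, author TEXT)")
db.execute("INSERT INTO pages (id, title, body, author) VALUES (1, 'old', 'old', 'old')")

ALLOWED = ("title", "body", "author")   # columns that may be updated


def update_page(page_id, fields):
    """Apply all changed fields in a single UPDATE; returns statements issued."""
    names = [name for name in fields if name in ALLOWED]
    if not names:
        return 0
    assignments = ", ".join(f"{name} = ?" for name in names)
    params = [fields[name] for name in names] + [page_id]
    db.execute(f"UPDATE pages SET {assignments} WHERE id = ?", params)
    return 1


# One round trip for three fields instead of three separate statements.
update_page(1, {"title": "New title", "body": "New body", "author": "cr0sh"})
print(db.execute("SELECT title, body, author FROM pages WHERE id = 1").fetchone())
```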

    I would imagine that the same could be true of any simple SELECT - select out all the fields (and only those fields) you need at one shot, then loop thru the records building your output (whatever it is). Optimize the queries well, too (a misplaced pair of parentheses can make a WORLD of difference in some cases).

    In short, keep the number of queries to the backend as short and sweet as possible, reducing the load (and thrashing) on the backend. This should be common sense design, but sometimes in the thrill and rush to build something, programmers forget this, and it can easily cause issues down the line (I was lucky in that I caught it very early in my design of the system).

    Good luck, and I hope this helps...

    --
    Reason is the Path to God - Anon