Slashdot Mirror


Improving Database Performance?

An anonymous reader asks: "An acquaintance of mine runs a fair sized web community. Having nearly 280,000 users in a single MySQL database with about 12 mirrors accessing it, performance has become an big issue. I was wondering what practical methods could be used to lessen the load and decrease user query times. Funds are limited, so he'd rather not pay for a commercial database system at this time."

11 of 95 comments (clear)

  1. Memcached by Exstatica · · Score: 5, Informative

    Use Memcached. Which is used by livejournal, slashdot, wikipedia, and sourceforge. I also use it. My database has no load. It's not to difficult to implement and there are tons of API's

    1. Re:Memcached by stevey · · Score: 4, Informative

      Pretty much, yes.

      Here's an introduction to memcached I wrote which might explain it for you.

      In short you modify all the parts of your code that fetch from the database to first check from the memory cache - and when storing invalidate the cache. In general most sites read data more than they write it so most of your accesses come from the cache - thus reducing the load upon your DB.

      If you don't want to modify your code you could look at optimizing the setup of the database server, moving it, setting up replication, etc.

      Still without more details it is hard to know what to suggest.

    2. Re:Memcached by Anonymous Coward · · Score: 1, Informative
      Why can't memcache do that for you? In other words, the app just calls memcache, and memcache then fetches from the database if it needs to.


      Because memcache doesn't know (or care) how or where the data really comes from.

  2. Re:Postgre-SQL by Anonymous Coward · · Score: 2, Informative

    If the OP's queries are such that the database is mostly read-only, I don't think that switching over to Postgresql is going to really show enough improvement to justify the pain of switching to a different rdbms.

    Postgresql scales better than Mysql under heavy concurrent read/write conditions. If that is the access pattern for the OP, then I said yes - look into switching to Postgresql.

  3. MySQL replication. by billn · · Score: 4, Informative


    Works like a champ.

    Set up multiple read slaves to carry the bulk of your read traffic, especially for your mirrors. Considering MyISAM's native table locking behavior, this should reduce your master db load quite a bit, even just moving your mirror read-load to a slave replicant.

    Also, query caching is a beautiful thing.

    --
    - billn
  4. More importantly by Safety+Cap · · Score: 4, Informative
    It actually supports ACID, whereas MySQL does not.

    So, for example, if you want to insert a string that is too big for the field, MySQL will gladly suck it up with nary a peep (meanwhile, your data is trashed: truncate hell), whereas Postgre (and other non-toy RDBMSs) will refuse to insert the record.

    Wikipedia has a nice comparison.

    --
    Yeah, right.
  5. Re:Date searches may not work as you think by fearanddread · · Score: 2, Informative

    Explain syntax is an interesting and informative tool that can help optimize your indexes and queries.

  6. High Performance MySQL by ajayrockrock · · Score: 4, Informative


    Check out the High Performance MySQL book for info on how to speed it up. Most of it's probably obvious for the hardcore DBA guy, but I found it useful:

    http://www.oreilly.com/catalog/hpmysql/

  7. Re:"it depends" by UnckleSam · · Score: 2, Informative

    Real workload performance testing is easy with MySQL: Dump your database (mysqldump or a real filesystem snapshot if you have the HW) in a clean state, turn on the query log (don't use binary logging - it only contains statements modifying your db's contents such as UPDATEs and INSERTs) and wait some hours or days. Then deploy the dumped database to a testing system, select an appropriate starting point in the query logs (choose the first statement that arrived after you dumped your db) and feed all statements after to a mysql shell. Measure the time how long it takes to complete with different MySQL settings. This will give you a rough approximation on real life performance. Counterpart: It doesn't respect that queries are processed in parallel because they come from different clients and are processes by different MySQL server threads. (Hm, a perl script parsing the query log - it contains server thread id's - simulating different client threads should be easy to code for someone who knows perl well enough). However, when MySQL's usual knobs aren't enough, you may use MySQL's master-slave replication. While absolutely simple to setup and run it can be problematic on the client side. Using master-slave replication leads to the scenario that you direct ALL data modifying statements (INSERTs, UPDATEs) to a single MySQL master processes while executing read-only statements (SELECT) on multiplte slave processes. The master as well as the multiplte slave machines maintain a full copy of your data; all statements executed on the master are implicitly executed on the slaves as well, because MySQL replication simply means that the master sends all UPDATEs and INSERTs to the slaves as well (so their copy of the data gets updated as well). Sounds nice, but it can lead to quite serious problems. In such a setup, the slaves' data will always lag behind the master's data. Sometimes that doesn't matter. Your users will regret if they find a new blog entry posted on your site 2 seconds after it was posted. But your users won't regret if they post a blog entry and it won't show up in the index page (to which they are redirected immediately after they have posted the new stuff) unless they hit F5 two seconds later. This can happen because the INSERT of the blog entry happens on the master server whist the SELECT * FROM user_blog WHERE uid=foo from the client happens on one of the slave servers which haven't caught up to the master server's point of view (data). (Fancy software developers will see lots of possible race conditions pop up here which can lead to more seriuos malfunction of a well-working secure software.) However, all of these problems can be cured, but that requires a careful design of your application. Or some serious piece of code-auditing and refactoring. That's the downside - MySQL replication is absolutely easy to set up but you have to do more on the client side. Anyways, it's worth a look.

  8. A few things, and alternative technologies by eyeball · · Score: 2, Informative

    I have to agree with some other posters -- without knowing some of the dynamics of the db usage, it's difficult to make suggestions. 280,000 users can be a piece of cake if all you're doing is user auths for each (that's about 3/second if everyone logs in once/day). Worse-case,

    A few things too look at:

    - If there is excessive or improper locking being done (i.e.: do you really need to lock a table to update a record that could only possibly be updated one at a time?)
    - If queries can be made less complex
    - Indexing. You should become intimate with how indexing works and the various ways of setting it up
    - Caching infrequently changed content on the front-end (i.e. generate static web pages that don't change too often rather than dynamically creating them constantly).
    - de-normalize your tables if it improves performance. Don't worry nobody's looking :)

    Also, look into some lighter-weight DB & DB-related technologies: HSqlDB, SQLite, C-JDBC, BerkeleyDB, SQLRelay, to name a few. Granted some aren't distributed, but again, not knowing the architecture, some data may be lazily replicated out from the master.

    Also, I can't find it now, but I read a while back that MySQL was adopting an open-sourced in-memory DB from a company (Ericcson?) that may be available separately. You also may want to look into something like LDAP (OpenLDAP) if the app is very read-heavy.

    --

    _______
    2B1ASK1
  9. Re:Postgre-SQL by dubl-u · · Score: 2, Informative

    If the OP's queries are such that the database is mostly read-only, I don't think that switching over to Postgresql is going to really show enough improvement to justify the pain of switching to a different rdbms.

    If the queries are indeed read-mostly, it might be worth benchmarking it against an LDAP server. LDAP servers are built specifically for serving user data, which is usually greater than 99% read. It's been a while since I did benchmarks, but at the time they could beat the pants of of general-purpose databases.