Slashdot Mirror


Oracle Claims Dramatic MySQL Performance Improvements

New submitter simula67 writes "Oracle wins back some karma from the open source community by releasing MySQL cluster 7.2 with ambitious claims of 70x performance gains. The new release is GPL and claims to have processed over 1 billion queries per minute. Readers may remember the story about Oracle adding commercial extensions to MySQL."

19 of 168 comments (clear)

  1. Considering sub queries in IN statements. by micheas · · Score: 5, Insightful

    If you create a query in mysql with an IN statement in the where clause and you put a sub query as the in statement current versions will run the query once for each row of the primary table you are querying. Caching result alone would probably get the 70x speed up. I am suspect that there are other performance stupidities in mysql that are worked around by people doing a simple query and then using php/perl/python/java/etc to parse the result and generate the second query.

    1. Re:Considering sub queries in IN statements. by Anonymous Coward · · Score: 5, Informative

      If you create a query in mysql with an IN statement in the where clause and you put a sub query as the in statement current versions will run the query once for each row of the primary table you are querying. Caching result alone would probably get the 70x speed up. I am suspect that there are other performance stupidities in mysql that are worked around by people doing a simple query and then using php/perl/python/java/etc to parse the result and generate the second query.

      yes, but many database engines have problems with inner select statements. Refactor your query to use a join an you'll be fine.

    2. Re:Considering sub queries in IN statements. by K.+S.+Kyosuke · · Score: 4, Insightful

      yes, but many database engines have problems with inner select statements. Refactor your query to use a join an you'll be fine.

      And what's wrong with using an RDBMS with a non-braindead optimizer?

      --
      Ezekiel 23:20
    3. Re:Considering sub queries in IN statements. by Anonymous Coward · · Score: 3, Interesting

      The latest (past year's worth) of PostgreSQL releases automatically rewrite subqueries as joins for you.

      However, MSSQL 2008 performance dropped significantly from prior versions due to the exact opposite --- subqueries are now looped through, where previously they were handled with hash tables internally.

    4. Re:Considering sub queries in IN statements. by micheas · · Score: 3, Informative

      SELECT id, title FROM page WHERE id IN (SELECT pageid FROM hotnews WHERE user = 8)

      The answer to this is to execute the sub query and pass the result back as an array. If you have 10,000 records in table page, you will see about a 2000x speed improvement if you are using php

    5. Re:Considering sub queries in IN statements. by wstrucke · · Score: 5, Informative

      You're probably right, but that's not what this release is referring to. The NDBCLUSTER engine separates "API" nodes from Data nodes. A server running MySQL with NDBCLUSTER enabled is considered an API node, but you can also have a C++ or Java or whatever API node that isn't MySQL. Data nodes are provisioned in one or more "node groups" with one or more data nodes in each group, though it would be dumb to have a single node group or a single node in a node group. Each node group splits the data somehow. You can force tables to exist on just one group, but by default if you have three node groups you would have approximately a third of your data in each group.

      Anyway -- prior to NDBCLUSTER 7.2 if you performed any join whatsoever the API node had to pull the complete tables from all data node groups prior to doing the join on itself and returning the result. This made join performance, simply put -- terrible. I've tested the same query on a standalone out of the box mysql server against a 7.1 cluster and had an 8 second query come back from the cluster in several minutes due to the join performance.

      NDBCLUSTER 7.2 adds what was called "push down joins" in development -- basically the data nodes now do the joins within their own sub-sets of the data for certain joins resulting in a dramatic improvement in performance, since now the API nodes just get the result from the network instead of the entire dataset.

      It really is an amazing improvement and is a result of the dynamic capabilities of MySQL. NDBCLUSTER was never designed for the type of data people are throwing at it, and with the recent improvements it might actually be viable for use on production web sites for more common usage scenarios.

      What I do not see addressed yet with 7.2 is the reload time -- if you have a cluster loaded with several GB of data it can take upwards of 20 minutes to reload the data and indices across the cluster. While the cluster is designed to be up 24x7 (even through upgrades), a single bug or memory error that takes it down can result in a prolonged outage. There are several of these open in the bug tracker.

    6. Re:Considering sub queries in IN statements. by Anonymous Coward · · Score: 4, Informative

      If you create a query in mysql with an IN statement in the where clause and you put a sub query as the in statement current versions will run the query once for each row of the primary table you are querying. Caching result alone would probably get the 70x speed up

      We have exactly that in MariaDB 5.3 (a fork of MySQL) : http://kb.askmonty.org/en/subquery-cache. It won't always give the 70x speedup, sometimes you need to start execution from the subquery to get decent performance: http://kb.askmonty.org/en/semi-join-subquery-optimizations.

      I am sorry for shameless self-promotion, couldn't resist :-)

    7. Re:Considering sub queries in IN statements. by jon3k · · Score: 3, Informative

      Just fyi, It's called a correlated subquery.

    8. Re:Considering sub queries in IN statements. by cheater512 · · Score: 3, Insightful

      The poor RDBMS has to put up with crap like that over and over again.
      It will probably need therapy after you are through with it.

      The programmer is supposed to know how to use the tool effectively to get the best performance.
      In that example there is a stupidly obvious solution - use a join.

    9. Re:Considering sub queries in IN statements. by K.+S.+Kyosuke · · Score: 4, Insightful

      The poor RDBMS has to put up with crap like that over and over again. It will probably need therapy after you are through with it.

      The programmer is supposed to know how to use the tool effectively to get the best performance. In that example there is a stupidly obvious solution - use a join.

      To me that sounds like fixing a bug in the wrong place. RDBMS is *not* supposed to be dumb. If it were supposed to be dumb, we would not be using SQL in the first place and we'd be passing our own execution plans in the DB engine's internal intermediate language instead. (Well, you can sort of do it with Firebird (BLR) and SQLite (VDBE), but it's not how it's supposed to be used.)

      --
      Ezekiel 23:20
  2. MySQL Cluster != MySQL by Anonymous Coward · · Score: 5, Informative

    Not the same thing.

  3. But of course it reads from RAM by photonyx · · Score: 4, Interesting

    The case when the data set is bigger than RAM amount has not been investigated (link here, see the comments). The hard drive I/O speed would slow it dramatically, unless it's an expensive array of SSDs.

  4. Yes, I RTFA (sue me) by billcopc · · Score: 3, Informative

    If I read the sales pitch correctly, they just integrated Memcached as a backend storage module, so that it plays nicely wrt ACID compliance. Yeah, memory is 70x faster than disk I/O... big whoop!

    Anyone running a sizeable MySQL installation already has heaps of RAM allocated to the InnoDB buffers/caches anyway. It sounds like Oracle compared a stock, distro-default MySQL to their memory-hungry tweaks. Yeah, DUH. I can get a 70x speedup too if I increase MySQL's memory usage from the default 64mb to 48 gigabytes.

    --
    -Billco, Fnarg.com
  5. Re:PostgreSQL by Hognoxious · · Score: 4, Funny

    Is PostgreSQL webscale? MongoDB is.

    --
    Confucius say, "Find worm in apple - bad. Find half a worm - worse."
  6. MySQL in-memory speed by Amadodd · · Score: 5, Interesting

    Slightly off-topic, but I recently had the oppurtunity to test the speed of a MySQL in-memory database. I have some frequently queried read-only data that simply would not handle the load in MS SQL and was looking for an in-memory solution. MySQL provided the simplest implementation - simply tell the table to use memory storage and configure the server to allow the amount of data you want to host (~250MB in this case). You also have to remember to reload the data from normal InnoDB tables every time you restart the server. I used the same table structures, keys indexes and stored procedures (almost the same) to query the data and linked it through MS SQL so that my applications never new the difference. On exactly the same hardware the speed increase was at least 50X over MS SQL.

    --
    Freedom of speech doesn't come with bandwidth.
  7. Re:Finally! by Admiral+Llama · · Score: 3, Interesting

    No, rather it's amazing what happens when you architect the DB such that the entire thing is forcibly held in memory.

  8. The 70x is for (distributed) joins by jorela · · Score: 5, Informative

    which previously were quite poorly handled.
    See http://www.clusterdb.com/mysql-cluster/70x-faster-joins-with-aql-in-mysql-cluster-7-2/?utm_source=rss&utm_medium=rss&utm_campaign=70x-faster-joins-with-aql-in-mysql-cluster-7-2

  9. 70x by slasho81 · · Score: 5, Funny

    Developers: We've got some really good ideas for increasing performance of complex queries by...
    Marketing: How much in the best conceivable case?
    Developers: Oh, I dunno, maybe 70x.
    Marketing: 70x? Is that good?
    Developers: Yeah, I suppose, but the cool stuff is...
    Marketing: Wow! 70x! That's a really big number!
    Developers: Actually, please don't quote me on that. They'll make fun of me on Slashdot if you do. Promise me.
    Marketing: We promise.
    Developers: Thanks. Now, let me show you where the good stuff is...
    Marketing (on phone): Larry? It's me. How big can you print me up a poster that says "70x"?