Slashdot Mirror


Oracle Claims Dramatic MySQL Performance Improvements

New submitter simula67 writes "Oracle wins back some karma from the open source community by releasing MySQL cluster 7.2 with ambitious claims of 70x performance gains. The new release is GPL and claims to have processed over 1 billion queries per minute. Readers may remember the story about Oracle adding commercial extensions to MySQL."

37 of 168 comments (clear)

  1. Considering sub queries in IN statements. by micheas · · Score: 5, Insightful

    If you create a query in mysql with an IN statement in the where clause and you put a sub query as the in statement current versions will run the query once for each row of the primary table you are querying. Caching result alone would probably get the 70x speed up. I am suspect that there are other performance stupidities in mysql that are worked around by people doing a simple query and then using php/perl/python/java/etc to parse the result and generate the second query.

    1. Re:Considering sub queries in IN statements. by Anonymous Coward · · Score: 5, Informative

      If you create a query in mysql with an IN statement in the where clause and you put a sub query as the in statement current versions will run the query once for each row of the primary table you are querying. Caching result alone would probably get the 70x speed up. I am suspect that there are other performance stupidities in mysql that are worked around by people doing a simple query and then using php/perl/python/java/etc to parse the result and generate the second query.

      yes, but many database engines have problems with inner select statements. Refactor your query to use a join an you'll be fine.

    2. Re:Considering sub queries in IN statements. by K.+S.+Kyosuke · · Score: 4, Insightful

      yes, but many database engines have problems with inner select statements. Refactor your query to use a join an you'll be fine.

      And what's wrong with using an RDBMS with a non-braindead optimizer?

      --
      Ezekiel 23:20
    3. Re:Considering sub queries in IN statements. by Anonymous Coward · · Score: 3, Interesting

      The latest (past year's worth) of PostgreSQL releases automatically rewrite subqueries as joins for you.

      However, MSSQL 2008 performance dropped significantly from prior versions due to the exact opposite --- subqueries are now looped through, where previously they were handled with hash tables internally.

    4. Re:Considering sub queries in IN statements. by micheas · · Score: 3, Informative

      SELECT id, title FROM page WHERE id IN (SELECT pageid FROM hotnews WHERE user = 8)

      The answer to this is to execute the sub query and pass the result back as an array. If you have 10,000 records in table page, you will see about a 2000x speed improvement if you are using php

    5. Re:Considering sub queries in IN statements. by K.+S.+Kyosuke · · Score: 2

      That's utter crap. Why involve the client? Why the extra roundtrips? If you need to do the thing you've just described, you must be doing something horribly wrong.

      --
      Ezekiel 23:20
    6. Re:Considering sub queries in IN statements. by Anonymous Coward · · Score: 2, Informative

      That's just a crappy query... no wonder you've got bad performance.

      SELECT id, title FROM page JOIN hotnews on page.id = hotnews.pageid AND user = 8

      There. Enjoy your performance boost.

    7. Re:Considering sub queries in IN statements. by wstrucke · · Score: 5, Informative

      You're probably right, but that's not what this release is referring to. The NDBCLUSTER engine separates "API" nodes from Data nodes. A server running MySQL with NDBCLUSTER enabled is considered an API node, but you can also have a C++ or Java or whatever API node that isn't MySQL. Data nodes are provisioned in one or more "node groups" with one or more data nodes in each group, though it would be dumb to have a single node group or a single node in a node group. Each node group splits the data somehow. You can force tables to exist on just one group, but by default if you have three node groups you would have approximately a third of your data in each group.

      Anyway -- prior to NDBCLUSTER 7.2 if you performed any join whatsoever the API node had to pull the complete tables from all data node groups prior to doing the join on itself and returning the result. This made join performance, simply put -- terrible. I've tested the same query on a standalone out of the box mysql server against a 7.1 cluster and had an 8 second query come back from the cluster in several minutes due to the join performance.

      NDBCLUSTER 7.2 adds what was called "push down joins" in development -- basically the data nodes now do the joins within their own sub-sets of the data for certain joins resulting in a dramatic improvement in performance, since now the API nodes just get the result from the network instead of the entire dataset.

      It really is an amazing improvement and is a result of the dynamic capabilities of MySQL. NDBCLUSTER was never designed for the type of data people are throwing at it, and with the recent improvements it might actually be viable for use on production web sites for more common usage scenarios.

      What I do not see addressed yet with 7.2 is the reload time -- if you have a cluster loaded with several GB of data it can take upwards of 20 minutes to reload the data and indices across the cluster. While the cluster is designed to be up 24x7 (even through upgrades), a single bug or memory error that takes it down can result in a prolonged outage. There are several of these open in the bug tracker.

    8. Re:Considering sub queries in IN statements. by viperidaenz · · Score: 2

      I swear some people don't even know there is a JOIN keyword in the SQL language.

    9. Re:Considering sub queries in IN statements. by Anonymous Coward · · Score: 4, Informative

      If you create a query in mysql with an IN statement in the where clause and you put a sub query as the in statement current versions will run the query once for each row of the primary table you are querying. Caching result alone would probably get the 70x speed up

      We have exactly that in MariaDB 5.3 (a fork of MySQL) : http://kb.askmonty.org/en/subquery-cache. It won't always give the 70x speedup, sometimes you need to start execution from the subquery to get decent performance: http://kb.askmonty.org/en/semi-join-subquery-optimizations.

      I am sorry for shameless self-promotion, couldn't resist :-)

    10. Re:Considering sub queries in IN statements. by jon3k · · Score: 3, Informative

      Just fyi, It's called a correlated subquery.

    11. Re:Considering sub queries in IN statements. by cheater512 · · Score: 3, Insightful

      The poor RDBMS has to put up with crap like that over and over again.
      It will probably need therapy after you are through with it.

      The programmer is supposed to know how to use the tool effectively to get the best performance.
      In that example there is a stupidly obvious solution - use a join.

    12. Re:Considering sub queries in IN statements. by Anonymous Coward · · Score: 2, Insightful

      "some" people are perfectly aware of JOIN and have suffered one too many hairball, incomprehensible SQL statements with 10+ tables mashed together under FROM using JOIN. Sub-queries provide opportunities to factor out some of these into discrete parts that can be individually examined and tested.

      Please, stop discouraging the use and application of sub-queries. If you have encountered people that insist on using them despite MySQL's tragically bad optimizer, it is likely they have been exposed to powerful databases that haven't been copping out on basic RDBMS functionality for over a decade. Instead, harangue the MySQL developers; they are the people at fault.

    13. Re:Considering sub queries in IN statements. by StuartHankins · · Score: 2

      If your living depends on you knowing how to do something well, you will learn to use the tools most effectively.

      Professional DBA's know this, and know how to code correctly to avoid the problem in the first place. Having it "work" vs having it work reliably, quickly, and in a manner which scales is what separates any computer user from a professional. Training, ambition, curiosity, and experience are what transforms you into that professional.

      So while it may be nice to have the machine try to outsmart you while you work in a 4th generation language, learning how and why it works can be illuminating in other projects and other methods. You are able to do more because you know more.

    14. Re:Considering sub queries in IN statements. by K.+S.+Kyosuke · · Score: 4, Insightful

      The poor RDBMS has to put up with crap like that over and over again. It will probably need therapy after you are through with it.

      The programmer is supposed to know how to use the tool effectively to get the best performance. In that example there is a stupidly obvious solution - use a join.

      To me that sounds like fixing a bug in the wrong place. RDBMS is *not* supposed to be dumb. If it were supposed to be dumb, we would not be using SQL in the first place and we'd be passing our own execution plans in the DB engine's internal intermediate language instead. (Well, you can sort of do it with Firebird (BLR) and SQLite (VDBE), but it's not how it's supposed to be used.)

      --
      Ezekiel 23:20
    15. Re:Considering sub queries in IN statements. by theshowmecanuck · · Score: 2

      This is going to happen less and less as programmers think they don't need to understand databases because they ORM like hibernate. I know of many programmers who don't want to know about sql beyond what they need to make their hibernate mapping work.

      --
      -- I ignore anonymous replies to my comments and postings.
    16. Re:Considering sub queries in IN statements. by apotheon · · Score: 2

      It's tough to blame those programmers who don't want to use SQL. It's a miserable language. Even one of its principle designers hates it.

      --
      Unfetter your ideas. Copyfree your mind.
    17. Re:Considering sub queries in IN statements. by rev0lt · · Score: 2

      Following this thread, I decided to benchmark (with EXPLAIN ANALYZE) a simple IN vs JOIN statement on both PostgreSQL 8.2 and PostgreSQL 8.4. The statements are something as follows:

      SELECT * FROM STATEMENTS WHERE CLIENT IN (SELECT ID FROM CLIENTS WHERE STATUS=1);
      SELECT * FROM STATEMENTS JOIN CLIENT ON STATEMENTS.CLIENT=CLIENTS.ID WHERE CLIENTS.STATUS=1;

      The statements table has about 2.5 million rows and clients has about 30k rows. The match (status=1) results in roughly 10k rows.

      Now the interesting part - In 8.2, using JOIN is quite faster, but in 8.4, using IN is actually faster than using JOIN (at least for these queries), because it uses a hash table and a hash join, insted of a full join.

    18. Re:Considering sub queries in IN statements. by rev0lt · · Score: 2

      I've just benchmarked 2 simple queries on PostgreSQL 8.2 and 8.4, and in 8.4 the IN clause is being translated to a hash semi-join, so it is faster (at least otn the simple testcase) than using JOIN.(I've already posted the queries, so if interested, scrolll up) While performance is important. using joins on complex queries usually result on an ugly mess, and I usually prefer maintainability over performance. Also, the parent is right - for complex expressions, it is way easier to test the individual subselects than to try to figure out what's going on on a multi-join expression.

  2. Finally! by K.+S.+Kyosuke · · Score: 2

    I knew they would implement the Oracle storage backend into MySQL one day...

    --
    Ezekiel 23:20
    1. Re:Finally! by Admiral+Llama · · Score: 3, Interesting

      No, rather it's amazing what happens when you architect the DB such that the entire thing is forcibly held in memory.

  3. MySQL Cluster != MySQL by Anonymous Coward · · Score: 5, Informative

    Not the same thing.

  4. But of course it reads from RAM by photonyx · · Score: 4, Interesting

    The case when the data set is bigger than RAM amount has not been investigated (link here, see the comments). The hard drive I/O speed would slow it dramatically, unless it's an expensive array of SSDs.

    1. Re:But of course it reads from RAM by fuzzyfuzzyfungus · · Score: 2

      Nothing can really save your performance numbers once you have to hit the platters, so the question would be not 'does it suck more when it can't work purely in RAM?' but 'How good are the mechanisms available for minimizing(ideally automatically, but at least without too much black magic on the application programmers' parts) the frequency with which your setup ends up needing something that isn't in RAM and being bottlenecked by having to fetch it from disk?'...

    2. Re:But of course it reads from RAM by Bill,+Shooter+of+Bul · · Score: 2

      Who the hell uses actual disks when performance is an issue?
      Also, who the hell uses ssd's when performance is an issue?
      The fast kids use PCI Express cards loaded with flash.
      Texas Memory systems or Fusion IO. Expensive, yes, but this is Mysql CLUSTER we are talking about. You don't use that unless you need the speed.

      --
      Well.. maybe. Or Maybe not. But Definitely not sort of.
  5. Yes, I RTFA (sue me) by billcopc · · Score: 3, Informative

    If I read the sales pitch correctly, they just integrated Memcached as a backend storage module, so that it plays nicely wrt ACID compliance. Yeah, memory is 70x faster than disk I/O... big whoop!

    Anyone running a sizeable MySQL installation already has heaps of RAM allocated to the InnoDB buffers/caches anyway. It sounds like Oracle compared a stock, distro-default MySQL to their memory-hungry tweaks. Yeah, DUH. I can get a 70x speedup too if I increase MySQL's memory usage from the default 64mb to 48 gigabytes.

    --
    -Billco, Fnarg.com
    1. Re:Yes, I RTFA (sue me) by medv4380 · · Score: 2
      If you're going to read an article about a benchmark you might actually read the benchmark too.

      Previously each data node exploited about 6 CPUs with 8 threads, in this benchmark we used about 14 CPUs with 24 threads. We've also removed a number of bottlenecks and the configuration of the threads is done in such a manner as to handle many different loads well.

      The little thing you read about memcached was about adding in a Native API which probably helped things, but getting the multithreading upgraded probably had more to do with it. They also were comparing two benchmarks that both were using just memory, and not a case where 1 used disk and the other used ram.

  6. Re:PostgreSQL by Hognoxious · · Score: 4, Funny

    Is PostgreSQL webscale? MongoDB is.

    --
    Confucius say, "Find worm in apple - bad. Find half a worm - worse."
  7. the old release by nimbius · · Score: 2

    was GPL as well.
    dont forget that Oracle committed that MySQL server will continue to use the dual-licensing strategy long used by MySQL AB with commercial and GPL versions available until at least 2015. if other big players are any indication this is oracles attempt to avoid another apache/java landmine and preserve the trust and respect of developers until such time as the product is sufficiently 'in-housed' to divorce from the community without fear of retalliation. by 2015 the forked code, should there be yet another fork, can probably enjoy a tenth of the performance of oracles version and have to compete with ingrained lock-ins and contract conditions developed by oracle to further ostracize open source competetors.

    but im not sure if its really relevant at all. databases like hypercube and couch are really giving oracle a run for their copious amounts of money. by 2015 the "paradigm" may have "shifted" as the PHB says.

    --
    Good people go to bed earlier.
  8. MySQL in-memory speed by Amadodd · · Score: 5, Interesting

    Slightly off-topic, but I recently had the oppurtunity to test the speed of a MySQL in-memory database. I have some frequently queried read-only data that simply would not handle the load in MS SQL and was looking for an in-memory solution. MySQL provided the simplest implementation - simply tell the table to use memory storage and configure the server to allow the amount of data you want to host (~250MB in this case). You also have to remember to reload the data from normal InnoDB tables every time you restart the server. I used the same table structures, keys indexes and stored procedures (almost the same) to query the data and linked it through MS SQL so that my applications never new the difference. On exactly the same hardware the speed increase was at least 50X over MS SQL.

    --
    Freedom of speech doesn't come with bandwidth.
    1. Re:MySQL in-memory speed by bbbaldie · · Score: 2

      I've programmed PHP to use SQL Server and MySQL, the MySQL statements typically run in milliseconds, it usually takes SQL Server two or three seconds to respond.

  9. The 70x is for (distributed) joins by jorela · · Score: 5, Informative

    which previously were quite poorly handled.
    See http://www.clusterdb.com/mysql-cluster/70x-faster-joins-with-aql-in-mysql-cluster-7-2/?utm_source=rss&utm_medium=rss&utm_campaign=70x-faster-joins-with-aql-in-mysql-cluster-7-2

  10. 70x by slasho81 · · Score: 5, Funny

    Developers: We've got some really good ideas for increasing performance of complex queries by...
    Marketing: How much in the best conceivable case?
    Developers: Oh, I dunno, maybe 70x.
    Marketing: 70x? Is that good?
    Developers: Yeah, I suppose, but the cool stuff is...
    Marketing: Wow! 70x! That's a really big number!
    Developers: Actually, please don't quote me on that. They'll make fun of me on Slashdot if you do. Promise me.
    Marketing: We promise.
    Developers: Thanks. Now, let me show you where the good stuff is...
    Marketing (on phone): Larry? It's me. How big can you print me up a poster that says "70x"?

  11. Re:PostgreSQL by Admiral+Llama · · Score: 2

    Versus all the other factors you can throw in there for anything involving heavy lifting for an enterprise app, raw price point of the DB engine is pretty close to the bottom of the list.

  12. Re:PostgreSQL by Sarten-X · · Score: 2

    The first 27 seconds made a good point about comparing various products and evaluating them on their merits. The remaining 5 minutes was a mix of strawmen and fallacies. Is there supposed to be a point to that?

    --
    You do not have a moral or legal right to do absolutely anything you want.