Slashdot Mirror


Oracle Claims Dramatic MySQL Performance Improvements

New submitter simula67 writes "Oracle wins back some karma from the open source community by releasing MySQL cluster 7.2 with ambitious claims of 70x performance gains. The new release is GPL and claims to have processed over 1 billion queries per minute. Readers may remember the story about Oracle adding commercial extensions to MySQL."

168 comments

  1. PostgreSQL by Anonymous Coward · · Score: 0, Insightful

    Shove it Oracle, I'll stick with PostgreSQL.

    1. Re:PostgreSQL by Hognoxious · · Score: 4, Funny

      Is PostgreSQL webscale? MongoDB is.

      --
      Confucius say, "Find worm in apple - bad. Find half a worm - worse."
    2. Re:PostgreSQL by dkleinsc · · Score: 1

      Yes: It's strengths are reliability and price point (free), but it's pretty fast, has clustering capabilities, and has been used for large-scale web applications.

      Seriously, it's hard to go wrong with PostgreSql when you need a relational database. MongoDB, of course, is a very different animal intended for very different tasks.

      --
      I am officially gone from /. Long live http://www.soylentnews.com/
    3. Re:PostgreSQL by Admiral+Llama · · Score: 2

      Versus all the other factors you can throw in there for anything involving heavy lifting for an enterprise app, raw price point of the DB engine is pretty close to the bottom of the list.

    4. Re:PostgreSQL by jedidiah · · Score: 1

      Are you kidding? In an Oracle discussion?

      We're talking very large numbers here, and potentially more money than you will make in your entire lifetime.

      Some one definitely cares. They may bite their tongue and still buy Oracle but they do care. The numbers are not trivial.

      --
      A Pirate and a Puritan look the same on a balance sheet.
    5. Re:PostgreSQL by Anonymous Coward · · Score: 0

      Versus all the other factors you can throw in there for anything involving heavy lifting for an enterprise app, raw price point of the DB engine is pretty close to the bottom of the list.

      Only when you work for extremely large enterprises. With most companies in America and around the world, its far, far from the bottom of the list but as you attempt to point out, not necessarily at or even near the top.

    6. Re:PostgreSQL by Sarten-X · · Score: 2

      The first 27 seconds made a good point about comparing various products and evaluating them on their merits. The remaining 5 minutes was a mix of strawmen and fallacies. Is there supposed to be a point to that?

      --
      You do not have a moral or legal right to do absolutely anything you want.
    7. Re:PostgreSQL by apotheon · · Score: 1

      It's worth noting that there have been some pretty high-profile cases where Oracle was failing to deliver for scaling under heavy load, and companies like EnterpriseDB (a value-added PostgreSQL vendor) provided a PostgreSQL-based solution that saved the day. One such example that comes to mind is the FTD migration a few years back. While the EnterpriseDB deployment of PostgreSQL for FTD wasn't free, it sure as hell was cheaper than maintaining Oracle licenses, and it worked a lot better too -- a pretty big deal when considering the size of the FTD network and the fact this was a Valentine's Day season issue when everybody and his brother is getting some flowers delivered to Mom, Wife, and Mistress at the same time.

      --
      Unfetter your ideas. Copyfree your mind.
    8. Re:PostgreSQL by thereitis · · Score: 1

      "If /dev/null is web-scale and fast then I will use it". I almost lost it on that one. :)

    9. Re:PostgreSQL by Anonymous Coward · · Score: 0

      You do know that MySQL has a BLACKHOLE storage engine that lets you do pretty much that, right?

    10. Re:PostgreSQL by Daniel+Phillips · · Score: 1

      Kubuntu did a nasty on me with a surprise, non-optional upgrade to kmail 4.7.2, which is just an unfinished work, to put it mildly. As a confirm glutton for punishment, I decided to make it work no matter what the coast. Anyway, it came with a MySQL backend that was sort-of working, but one day it just fell down and couldn't get up. So I replaced it with the experimental, unsupported PostgresQL backend, and after a little bit of pain learning how to administrate the server, it came up and so far has worked considerably better than the MySQL backend. I had to tell to use less mmapped memory, which is something no user should ever have to be expected to deal with, and it is beyond me why PostgreSQL couldn't figure that one out itself. Well, I suppose if I care enough I will send a patch, but the point is: if that is the biggest issue I had with it, that is damned sweet. So color me impressed with PostgreSQL, and actually, I will stand up and thank the nutsoid KDE dev who pulled this stunt because I finally have an excuse to get to know the free database I should have used a lot more all these years. And by the way, I am kind of curious whether having a full blown relational database sitting under my email client will actually result in smarter email handling some day. So far, not visibly smarter, just slower, fails to filter my spam and sometimes has races and needs to be told how to resolve them. I'll give it a while and see if some actual power shows up.

      --
      Have you got your LWN subscription yet?
    11. Re:PostgreSQL by eric_herm · · Score: 1

      If you start to manage lots of data ( and I think some people enter into this category for their email ), it make sense to use a DB. It may not bring much in term of feature, but if this prevent people from redoing optimisation and so on that are already done by others group, that will permit to free some time to worok on others features. So in short, you will not see much directly; Performances are forgot after a week and start to become the norm, and unless you look at internal and code, you may not see the improvement in term of maintainance.

    12. Re:PostgreSQL by Daniel+Phillips · · Score: 1

      If you start to manage lots of data ( and I think some people enter into this category for their email ), it make sense to use a DB.

      It makes sense on paper. It has yet to be proved that it makes sense for email, and in fact, so far kmail 7.2+ is pretty good evidence for the contrary. I am afraid you will have a difficult time finding anyone who has stayed with Kmail through its recent convulsions and has in fact forgotten about the performance issue. Just do a quick search on "kmail 7.3 slow" to convince yourself.

      --
      Have you got your LWN subscription yet?
    13. Re:PostgreSQL by gmack · · Score: 1

      At least when PostgreSQL starts working it stays working but I agree the same can't be said for MySQL. A recent upgrade changed the layout of mysql.user causing none of the users to be able to login. The oracle approved fix? mysqdump and restore but that just restored the bad layout of mysql.user. The actual fix involved dumping everything one database at a time and recreating mysql.user from a script. My best guess is that whoever designed the MySQL login system ignored SQL best practices and did the C equivalent of a "select * from mysql.user" rather than explicitly ask for the fields in the needed order.

    14. Re:PostgreSQL by datavirtue · · Score: 1

      I obviously didn't get enough sleep as I read: "Obama Claims Dramatic MySQL Performance Improvements" Nonetheless it is a good thing. Why all the Oracle hate? Too much energy expended on disdain towards Oracle reduces the amount available for Microsoft, and I can't afford that.

      --
      I object to power without constructive purpose. --Spock
    15. Re:PostgreSQL by Anonymous Coward · · Score: 0

      Preciselly in what way is a full blown DBMS better at handling mail than simply arranging it in a maildir format?

  2. MariaDB? by Anonymous Coward · · Score: 0

    How many of these performance improvements were copied back from MariaDB 5.3? For me, it's already too late for Oracle's MySQL. I've switched to Maria and I'm not planning on coming back.

    1. Re:MariaDB? by Anonymous Coward · · Score: 0

      Considering MariaDB wasn't really faster than MySQL in any appreciable way, I'm going to guess not many of improvements came from there. Since both are open source, feel free to compare for yourself though.

    2. Re:MariaDB? by Larryish · · Score: 1

      Is there a Virtualmin module for MariaDB yet?

      If so, I'll switch tonight.

    3. Re:MariaDB? by Lennie · · Score: 1

      Tried Drizzle, yet ?

      --
      New things are always on the horizon
  3. Considering sub queries in IN statements. by micheas · · Score: 5, Insightful

    If you create a query in mysql with an IN statement in the where clause and you put a sub query as the in statement current versions will run the query once for each row of the primary table you are querying. Caching result alone would probably get the 70x speed up. I am suspect that there are other performance stupidities in mysql that are worked around by people doing a simple query and then using php/perl/python/java/etc to parse the result and generate the second query.

    1. Re:Considering sub queries in IN statements. by Anonymous Coward · · Score: 5, Informative

      If you create a query in mysql with an IN statement in the where clause and you put a sub query as the in statement current versions will run the query once for each row of the primary table you are querying. Caching result alone would probably get the 70x speed up. I am suspect that there are other performance stupidities in mysql that are worked around by people doing a simple query and then using php/perl/python/java/etc to parse the result and generate the second query.

      yes, but many database engines have problems with inner select statements. Refactor your query to use a join an you'll be fine.

    2. Re:Considering sub queries in IN statements. by K.+S.+Kyosuke · · Score: 4, Insightful

      yes, but many database engines have problems with inner select statements. Refactor your query to use a join an you'll be fine.

      And what's wrong with using an RDBMS with a non-braindead optimizer?

      --
      Ezekiel 23:20
    3. Re:Considering sub queries in IN statements. by Anonymous Coward · · Score: 0

      PostgreSQL and Oracle have terrible IN sub select performance. If you're having trouble with MySQL, you'd be shocked what the "big boys" are like.

      You provide no data, volume of records, table defs and index, and no SQL code. How about you provide something concrete, you might get some help.

    4. Re:Considering sub queries in IN statements. by Anonymous Coward · · Score: 3, Interesting

      The latest (past year's worth) of PostgreSQL releases automatically rewrite subqueries as joins for you.

      However, MSSQL 2008 performance dropped significantly from prior versions due to the exact opposite --- subqueries are now looped through, where previously they were handled with hash tables internally.

    5. Re:Considering sub queries in IN statements. by micheas · · Score: 3, Informative

      SELECT id, title FROM page WHERE id IN (SELECT pageid FROM hotnews WHERE user = 8)

      The answer to this is to execute the sub query and pass the result back as an array. If you have 10,000 records in table page, you will see about a 2000x speed improvement if you are using php

    6. Re:Considering sub queries in IN statements. by Admiral+Llama · · Score: 1

      MySQL choked an on "...and (condition or condition)" doing a seq scan instead of filtering even though there was an index on the table that those conditions were on. I changed it to an "where in (...)" and I got the same result. I had to move the or conditional into being a derived table joined in.

      For the same type of query Informix does a dynamic hash join with no SQL thuggery necessary.

    7. Re:Considering sub queries in IN statements. by K.+S.+Kyosuke · · Score: 2

      That's utter crap. Why involve the client? Why the extra roundtrips? If you need to do the thing you've just described, you must be doing something horribly wrong.

      --
      Ezekiel 23:20
    8. Re:Considering sub queries in IN statements. by Anonymous Coward · · Score: 2, Informative

      That's just a crappy query... no wonder you've got bad performance.

      SELECT id, title FROM page JOIN hotnews on page.id = hotnews.pageid AND user = 8

      There. Enjoy your performance boost.

    9. Re:Considering sub queries in IN statements. by Anonymous Coward · · Score: 0

      Yes... that's the right answer... *facepalm*

    10. Re:Considering sub queries in IN statements. by wstrucke · · Score: 5, Informative

      You're probably right, but that's not what this release is referring to. The NDBCLUSTER engine separates "API" nodes from Data nodes. A server running MySQL with NDBCLUSTER enabled is considered an API node, but you can also have a C++ or Java or whatever API node that isn't MySQL. Data nodes are provisioned in one or more "node groups" with one or more data nodes in each group, though it would be dumb to have a single node group or a single node in a node group. Each node group splits the data somehow. You can force tables to exist on just one group, but by default if you have three node groups you would have approximately a third of your data in each group.

      Anyway -- prior to NDBCLUSTER 7.2 if you performed any join whatsoever the API node had to pull the complete tables from all data node groups prior to doing the join on itself and returning the result. This made join performance, simply put -- terrible. I've tested the same query on a standalone out of the box mysql server against a 7.1 cluster and had an 8 second query come back from the cluster in several minutes due to the join performance.

      NDBCLUSTER 7.2 adds what was called "push down joins" in development -- basically the data nodes now do the joins within their own sub-sets of the data for certain joins resulting in a dramatic improvement in performance, since now the API nodes just get the result from the network instead of the entire dataset.

      It really is an amazing improvement and is a result of the dynamic capabilities of MySQL. NDBCLUSTER was never designed for the type of data people are throwing at it, and with the recent improvements it might actually be viable for use on production web sites for more common usage scenarios.

      What I do not see addressed yet with 7.2 is the reload time -- if you have a cluster loaded with several GB of data it can take upwards of 20 minutes to reload the data and indices across the cluster. While the cluster is designed to be up 24x7 (even through upgrades), a single bug or memory error that takes it down can result in a prolonged outage. There are several of these open in the bug tracker.

    11. Re:Considering sub queries in IN statements. by K.+S.+Kyosuke · · Score: 1

      The inner query was *perfectly fine*. It wasn't correlated with the outer query in any way. Nothing prevents the RDBMS to optimize the sequence of these to queries by first fetching the results from the inner query, hashing it and using it to filter the outer query. Optionally, the optimizer is free to rewrite it to a join on its own if the expected result set from the inner query is large.

      --
      Ezekiel 23:20
    12. Re:Considering sub queries in IN statements. by viperidaenz · · Score: 2

      I swear some people don't even know there is a JOIN keyword in the SQL language.

    13. Re:Considering sub queries in IN statements. by Anonymous Coward · · Score: 4, Informative

      If you create a query in mysql with an IN statement in the where clause and you put a sub query as the in statement current versions will run the query once for each row of the primary table you are querying. Caching result alone would probably get the 70x speed up

      We have exactly that in MariaDB 5.3 (a fork of MySQL) : http://kb.askmonty.org/en/subquery-cache. It won't always give the 70x speedup, sometimes you need to start execution from the subquery to get decent performance: http://kb.askmonty.org/en/semi-join-subquery-optimizations.

      I am sorry for shameless self-promotion, couldn't resist :-)

    14. Re:Considering sub queries in IN statements. by Anonymous Coward · · Score: 1

      Each SQL engine has its own set of limitation and performance hug. You have to deal with them and optimize your code this way. A lot of people tend to forget the golden rule : do as much data sorting and filtering on the database, leave the logic in the code. You shouldn't pass an array in an SQL request. Those data should be stored in a temporary table.

    15. Re:Considering sub queries in IN statements. by Anonymous Coward · · Score: 0

      That was obviously a simplistic example.. there are times when an IN statement is required, or makes an otherwise unintuitive query make much more sense.

      Either way if you have to write your SQL around poor areas of the optimizer it's not a good optimizer.

    16. Re:Considering sub queries in IN statements. by jon3k · · Score: 3, Informative

      Just fyi, It's called a correlated subquery.

    17. Re:Considering sub queries in IN statements. by Anonymous Coward · · Score: 0

      God I can't believe MySQL is still making people suffer this. The answer is to demand that this database catch up with the 90's and handle simple, basic stuff like sub-queries efficiently. There is absolutely NO REASON for this query not to operate efficiently.

      Is that romper room database still silently truncating numbers and strings by default?

    18. Re:Considering sub queries in IN statements. by cheater512 · · Score: 3, Insightful

      The poor RDBMS has to put up with crap like that over and over again.
      It will probably need therapy after you are through with it.

      The programmer is supposed to know how to use the tool effectively to get the best performance.
      In that example there is a stupidly obvious solution - use a join.

    19. Re:Considering sub queries in IN statements. by Anonymous Coward · · Score: 2, Insightful

      "some" people are perfectly aware of JOIN and have suffered one too many hairball, incomprehensible SQL statements with 10+ tables mashed together under FROM using JOIN. Sub-queries provide opportunities to factor out some of these into discrete parts that can be individually examined and tested.

      Please, stop discouraging the use and application of sub-queries. If you have encountered people that insist on using them despite MySQL's tragically bad optimizer, it is likely they have been exposed to powerful databases that haven't been copping out on basic RDBMS functionality for over a decade. Instead, harangue the MySQL developers; they are the people at fault.

    20. Re:Considering sub queries in IN statements. by StuartHankins · · Score: 2

      If your living depends on you knowing how to do something well, you will learn to use the tools most effectively.

      Professional DBA's know this, and know how to code correctly to avoid the problem in the first place. Having it "work" vs having it work reliably, quickly, and in a manner which scales is what separates any computer user from a professional. Training, ambition, curiosity, and experience are what transforms you into that professional.

      So while it may be nice to have the machine try to outsmart you while you work in a 4th generation language, learning how and why it works can be illuminating in other projects and other methods. You are able to do more because you know more.

    21. Re:Considering sub queries in IN statements. by StuartHankins · · Score: 1

      Before using an "in" clause here I'd probably use this (depends on the number of anticipated rows in both tables):
      select id, title
      from page as a
      where exists (
      select 1
      from hotnews b
      where b.user = 8
      and b.id = a.id
      )
      I would rearrange the fields in the "where" clause to match the index, highest selectivity first. If there were no suitable index I'd consider creating one, possibly a covering index.

    22. Re:Considering sub queries in IN statements. by znrt · · Score: 1

      And what's wrong with using an RDBMS with a non-braindead optimizer?

      from a general database design point of view, everything. considering it is already a quite gratuituous limitation to tie yourself to any particular rdbms, relying on a particular optimizer is just an extreme form of self-mutilation.

      it's true, however, that sometimes specific requirements of a problem will force you to rely on particular rdbms features, including but not limited to optimization. this could be what ... 0,1% of the time?

      and what's wrong with actually learning sql and understanding how rdbs' actually work?

    23. Re:Considering sub queries in IN statements. by micheas · · Score: 1

      The version of mysql in debian no longer silently truncates strings by default.

      I found out as I was abusing the "feature" to not worry about sanitizing the length of data input and my import script suddenly stopped working last year and I had to do things a little less wrong.

      .

    24. Re:Considering sub queries in IN statements. by Anonymous Coward · · Score: 1

      I've never had any problems with Oracle IN sub select performance: execution plans are pretty reasonable. Can you ellaborate?

    25. Re:Considering sub queries in IN statements. by Anonymous Coward · · Score: 0

      I use JOIN if I want to JOIN. I use filters (IN, EXISTS) if I want to filter. Doing a JOIN when you want EXISTS is a hack.

    26. Re:Considering sub queries in IN statements. by K.+S.+Kyosuke · · Score: 4, Insightful

      The poor RDBMS has to put up with crap like that over and over again. It will probably need therapy after you are through with it.

      The programmer is supposed to know how to use the tool effectively to get the best performance. In that example there is a stupidly obvious solution - use a join.

      To me that sounds like fixing a bug in the wrong place. RDBMS is *not* supposed to be dumb. If it were supposed to be dumb, we would not be using SQL in the first place and we'd be passing our own execution plans in the DB engine's internal intermediate language instead. (Well, you can sort of do it with Firebird (BLR) and SQLite (VDBE), but it's not how it's supposed to be used.)

      --
      Ezekiel 23:20
    27. Re:Considering sub queries in IN statements. by K.+S.+Kyosuke · · Score: 1

      Just fyi, It's called a correlated subquery.

      If the thing he's talking about had been correlated subquery, he wouldn't have been able to cache its results. Ergo, he must be talking about a lousy execution of a non-correlated subquery.

      --
      Ezekiel 23:20
    28. Re:Considering sub queries in IN statements. by Anonymous Coward · · Score: 1

      and what's wrong with actually learning sql and understanding how rdbs' actually work?

      This.

      I haven't been a true database developer since Oracle 7 way back in the mists - but really. Just learn SQL and tune your SQL properly. It's not fricking rocket science.

    29. Re:Considering sub queries in IN statements. by theshowmecanuck · · Score: 2

      This is going to happen less and less as programmers think they don't need to understand databases because they ORM like hibernate. I know of many programmers who don't want to know about sql beyond what they need to make their hibernate mapping work.

      --
      -- I ignore anonymous replies to my comments and postings.
    30. Re:Considering sub queries in IN statements. by errandum · · Score: 1

      Where do you see the client and the roundtrips? There would be only one (DB access once) and the the this method would be 100% transparent to the client.

    31. Re:Considering sub queries in IN statements. by Anonymous Coward · · Score: 0

      For those that didn't RTFA, that is exactly how it was achieved. They put a memcached component in so that (obviously) the select queries would be cached, knowing full well that those results may be hit over and over during the JOIN statement.

    32. Re:Considering sub queries in IN statements. by Anonymous Coward · · Score: 0

      For those that didn't RTFA, that is exactly how it was achieved. They put a memcached component in so that (obviously) the select queries would be cached, knowing full well that those results may be hit over and over during the JOIN statement.

      Please, you have *no* idea what you're talking about.

      The Memcache interface is just that--an interface to make it easy to talk to Memcache from the NDB storage engine. It has NOTHING to do with increased execution speed for many joins on NDB tables.

      And yes, the 70x speedup figure is about what I got the first time I compared an NDB 7.1 release with a 7.2 that I built from the development tree just after the SPJ patches went in.

    33. Re:Considering sub queries in IN statements. by sg_oneill · · Score: 1

      Which would be fine if it wasnt for the fact the version with the JOIN isn't just faster, but simpler to understand.

      Seriously there is no excuse for bad SQL. Subqueries have their places, but using them like that is just bad programming.

      --
      Excuse the Unicode crap in my posts. That's an apostrophe, and slashdot is busted.
    34. Re:Considering sub queries in IN statements. by apotheon · · Score: 2

      It's tough to blame those programmers who don't want to use SQL. It's a miserable language. Even one of its principle designers hates it.

      --
      Unfetter your ideas. Copyfree your mind.
    35. Re:Considering sub queries in IN statements. by rev0lt · · Score: 2

      Following this thread, I decided to benchmark (with EXPLAIN ANALYZE) a simple IN vs JOIN statement on both PostgreSQL 8.2 and PostgreSQL 8.4. The statements are something as follows:

      SELECT * FROM STATEMENTS WHERE CLIENT IN (SELECT ID FROM CLIENTS WHERE STATUS=1);
      SELECT * FROM STATEMENTS JOIN CLIENT ON STATEMENTS.CLIENT=CLIENTS.ID WHERE CLIENTS.STATUS=1;

      The statements table has about 2.5 million rows and clients has about 30k rows. The match (status=1) results in roughly 10k rows.

      Now the interesting part - In 8.2, using JOIN is quite faster, but in 8.4, using IN is actually faster than using JOIN (at least for these queries), because it uses a hash table and a hash join, insted of a full join.

    36. Re:Considering sub queries in IN statements. by rev0lt · · Score: 2

      I've just benchmarked 2 simple queries on PostgreSQL 8.2 and 8.4, and in 8.4 the IN clause is being translated to a hash semi-join, so it is faster (at least otn the simple testcase) than using JOIN.(I've already posted the queries, so if interested, scrolll up) While performance is important. using joins on complex queries usually result on an ugly mess, and I usually prefer maintainability over performance. Also, the parent is right - for complex expressions, it is way easier to test the individual subselects than to try to figure out what's going on on a multi-join expression.

    37. Re:Considering sub queries in IN statements. by Anonymous Coward · · Score: 0

      and what's wrong with actually learning sql and understanding how rdbs' actually work?

      1. Time
      2. Money
      3. Time & Money

      Developers barely have enough of either to complete their work, and now you want them to be SQL experts as well?

      Even the DBAs in our enterprise no longer assist in tuning SQL because they haven't the time for it.

    38. Re:Considering sub queries in IN statements. by fatp · · Score: 1

      RDBMS is *not* supposed to be dumb.

      It is supposed to do what it was asked - if someone asks it to do something dumb, it has no choice.

    39. Re:Considering sub queries in IN statements. by Anonymous Coward · · Score: 0

      God I can't believe MySQL is still making people suffer this. The answer is to demand that this database catch up with the 90's and handle simple, basic stuff like sub-queries efficiently.

      How about simple, basic stuff like ACID... Correct me if I'm wrong but big parts of the web still use MyISAM. Which doesn't do either ACID or referential integrity.

      Let's face it, it's not sadism, it's masochism. These people choose brain-dead simple and the seeming efficiency that goes with it, as long as you don't want anything serious. And it works. If anybody's using MySQL with MyISAM for anything serious, it's a problem of their own making.

    40. Re:Considering sub queries in IN statements. by u38cg · · Score: 1

      Point one, there are queries that can only be written using a sub-query (unusual, but they exist); point two, computers are supposed to help, not hinder. What's the point of having a declaritive language if I can't declare a query in the form I think of it in?

      --
      [FUCK BETA]
    41. Re:Considering sub queries in IN statements. by hholzgra · · Score: 1

      > prior to NDBCLUSTER 7.2 if you performed any join whatsoever the API node had to pull the complete tables from all data node groups ...

      no, but it had to do lookups for the joined table for every row in the first table one by one and had to transfer the join matches to the mysqld/API side,
      this has now changed so that the data node can resolve the join condition itself in certain situations and can so determine and deliver join result rows
      for both joined tables after receiving only a single request from the API/mysqld side instead of having network round trips for every single joined row

    42. Re:Considering sub queries in IN statements. by BenJaminus · · Score: 1

      And yet I don't know anyone who likes Hibernate either.
      I saw an anti-hibernate rant the other day that I think I agree with : "if you don't know SQL you shouldn't be programming with databases". I agree because it's too easy to write really slow systems with Hibernate.

    43. Re:Considering sub queries in IN statements. by mosb1000 · · Score: 1

      While I agree it's a professional's job to know this stuff, I would also like to point out it's not a crime to build your RDBMS in such a way that queries run as fast as possible regardless of how the query was written. Indeed, anything less is an inferior product.

    44. Re:Considering sub queries in IN statements. by Anonymous Coward · · Score: 0

      Oh, the tragic irony of the attempt to make SQL human readable.

    45. Re:Considering sub queries in IN statements. by swilly · · Score: 1

      You are doing a SELECT *, so when you use the JOIN you are getting more columns back then when you are using IN. Rewrite to specify which columns you want back, and I suspect they will perform equally.

    46. Re:Considering sub queries in IN statements. by apotheon · · Score: 1

      Good point. It seems like there are no good answers where dealing with SQL databases are concerned -- but, unfortunately, we need them quite a lot of the time, because there's no reasonable alternative for many cases.

      --
      Unfetter your ideas. Copyfree your mind.
    47. Re:Considering sub queries in IN statements. by K.+S.+Kyosuke · · Score: 1

      I don't know, the mention of PHP and "passing something back", perhaps? It's not my fault if someone can't express himself clearly. Anyway, the query as it is written already does what the GGP suggests should be done "manually".

      --
      Ezekiel 23:20
    48. Re:Considering sub queries in IN statements. by rev0lt · · Score: 1

      I've rewritten the SELECT to return 3 columns, and in 8.4 using IN is a bit faster than using JOIN (but not by a relevant difference), so you are probably right.

    49. Re:Considering sub queries in IN statements. by cifey · · Score: 1

      Right, For a clever query optimizer, a given sql may run beautifully in version 1 but terribly in version 2 based on some minor retuning to meet unrelated benchmark XX. If the implementation is consistent then you have some chance of maintaining performance.

      --
      Hello Cruel World
    50. Re:Considering sub queries in IN statements. by K.+S.+Kyosuke · · Score: 1

      There are actually two different issues here. First, the relational model. That one is quite fine, it has strong theoretical foundations (easy to do correctness proofs of programs, introduce execution time optimizations, parallel execution, etc., unlike with previous pointer-chasing models), can model a lot of things and is pretty useful overall. Now, whether SQL is a good front end to an RDBMS...that's about as clever a question as whether COBOL is a good programming language. ORMs take away a lot of pain of SQL, but they also take away a lot of its power. I don't think you have to compromise in this case. There are better relational languages (D, for example), but they never got traction because of IBM's aggressive marketing of SQL.

      --
      Ezekiel 23:20
    51. Re:Considering sub queries in IN statements. by Terrasque · · Score: 1

      Here's an actual example:

      http://code.google.com/p/django-tagging/issues/detail?id=160

      Basically, the subquery was run all the time, leading to an amplification of biblical proportions. The solution there was to put that subquery in another subquery (yo dawg), as shown in http://www.xaprb.com/blog/2006/04/30/how-to-optimize-subqueries-and-joins-in-mysql/

      We also tried with a temp table, but this solution was cleaner :) The performance difference was around 1/100th of the time for a sample query.

      We also confirmed that the problem still existed in MySQL 5.5

      --
      It's The Golden Rule: "He who has the gold makes the rules."
    52. Re:Considering sub queries in IN statements. by apotheon · · Score: 1

      Agreed. I have no problem with the relational model for a wide range of uses, but SQL (as you point out) should be quite uncontroversially regarded as a piss-poor language.

      --
      Unfetter your ideas. Copyfree your mind.
    53. Re:Considering sub queries in IN statements. by errandum · · Score: 1

      Yes, he could have phrased it better, but php is processed server side, the client only gets the final result. So no more round trips except the ones between server and database (and in most cases, they are on the same machine or cluster, so it's negligible).

      His method does speed up things if the query is executed with an IN and not with a JOIN... Even if it's not a best practice (lots of concurrency issues if you make things like that frequently).

    54. Re:Considering sub queries in IN statements. by theshowmecanuck · · Score: 1

      But it doesn't mean that if one of the main technologies in your stack is a database, that you shouldn't have to do it well. Or that most ORM programmers should get their heads out of their asses and learn the language and methods of the underlying technology so they don't fuck up the performance of the system by always allowing endless layers of abstraction to in fact form a layer of cold molasses in the system.

      --
      -- I ignore anonymous replies to my comments and postings.
    55. Re:Considering sub queries in IN statements. by K.+S.+Kyosuke · · Score: 1

      "Yes, he could have phrased it better, but php is processed server side, the client only gets the final result."

      Uhm, no. For the DB server, the PHP process is the client. As it is a good custom to have the two machines separate (I really don't know where you got the idea of running these two things on a single machine, that's all but insane), you do get round trips over wire. These will definitely cost you more processing time than keeping it inside the DB server process memory space the whole time - you have to convert the inner query results to the wire protocol, send it through the TCP/IP stack, have it arrive into the PHP's DB driver, convert it into an array, have it reprocessed with your PHP script into another textual query, send the query into the DB server again, have it compiled and optimized by the DB server (as a new query with different constant data, it can't be even precompiled and cached), have it executed again (with the concomitant query execution overhead) - no, I really don't think that this can actually speed up anything. Having the server do a hash join of the inner query automatically is way faster, and any good database can do that. BTW, it will be faster even if the two processes do run on a single machine (which is something for which any system designer should be hanged by his balls, unless he's the sole (home) user of such a system).

      --
      Ezekiel 23:20
    56. Re:Considering sub queries in IN statements. by apotheon · · Score: 1

      You say this as though you are disagreeing with me, which seems odd.

      --
      Unfetter your ideas. Copyfree your mind.
    57. Re:Considering sub queries in IN statements. by errandum · · Score: 1

      And, as stated, it still runs faster than the IN statement (not the JOIN, as everyone already said). The server is the client... Never heard that one before, I guess the OP is not the only one that needs to express itself better. From where I come from never heard the server called client before.

      Second, for small businesses and the like, yes, the webserver and the DB server share resources. It's not only common, it's actually the default configuration option when you configure apache and php (so I'm guessing, not uncommon).

      And there is no extra roundtrip for anything anyways. You do a query and then you filter them serverside and that's it. The time you described about passing everything through the process is negligible when you consider the time (sometimes minutes) that the server takes to process the IN, so you're still wrong.

      The only thing you got right is that the join method will be faster. But that's what everyone has been telling everywhere from the start.

    58. Re:Considering sub queries in IN statements. by theshowmecanuck · · Score: 1

      Funny, I thought you were disagreeing with me. You are right, SQL is a butt ugly construct. But I'm not sure if there is a more human readable AND understandable way to do it (you can't expect only math majors to query the database with nice neat set based mathematical constructs only understandable by math majors). And all I was trying to point out was that as ugly as the language is, my original points still stand: part of the problem with DB performance is programmers who think the closest they need to get to the DB is their ORM framework.

      --
      -- I ignore anonymous replies to my comments and postings.
    59. Re:Considering sub queries in IN statements. by K.+S.+Kyosuke · · Score: 1

      And, as stated, it still runs faster than the IN statement (not the JOIN, as everyone already said).

      If you have that as a practical experience, then there must be something horribly broken in your DB server software.

      the time (sometimes minutes) that the server takes to process the IN, so you're still wrong.

      Ahhh, that's the "horribly broken thing" I'm talking about. I'm not wrong, since there is no practical reason for that to take minutes, unless, of course, the result of the subquery (SELECT pageid FROM hotnews WHERE user = 8) is really huge. Do you really have authors who have written millions of stories?

      --
      Ezekiel 23:20
    60. Re:Considering sub queries in IN statements. by errandum · · Score: 1

      No, just the 10000 ones the OP mentioned. And the bigger the more there are, the more you have to wait.

      And you're trying to cover your failure by nitpicking. There are no extra roundtrips, a webserver should not be the client. You messed up, just admit it already and move on.

      In my CS degree there is a DB benchmarking course where you can see that there are queries taking minutes in Oracle or Postgresql that had to be canceled after 18 hours straight running in mysql (with no end in sight). So yes, in enterprise systems it isn't that hard to conjure a badly formed query that takes minutes to run, like I mentioned.

    61. Re:Considering sub queries in IN statements. by K.+S.+Kyosuke · · Score: 1

      No, just the 10000 ones the OP mentioned. And the bigger the more there are, the more you have to wait.

      In that case, increase the size of your buffers. Fetching ten thousand records using an indexed column can hardly take a minute. And the 10000 records were in the OTHER table, BTW. ;-)

      And you're trying to cover your failure by nitpicking.

      That's no nitpicking, that's simply an attempt to help you design your system properly.

      a webserver should not be the client.

      But it was YOU who claimed that one should solve it in PHP instead of doing it in the DB server process. So now you're contradicting yourself?

      You messed up, just admit it already and move on.

      Wow, now I'm scared. You really gave it to me.

      --
      Ezekiel 23:20
    62. Re:Considering sub queries in IN statements. by errandum · · Score: 1

      Sigh.

      I never claimed you should. I also said it wasn't a best practice. But if you didn't know how JOIN worked it would be orders of magnitude faster than using the IN. That's all I've said. PHP hasn't been that slow for over five years now (slower than C obviously, but going through an array should still be completed in linear time - you only have to pass your table once), and even though (as I mentioned) this kind of practice could lead to concurrency issues, it WORKS in this case.. That's all I've said.

      On the other hand, you tried to humiliate someone with an incorrect statement. Again, there are no roudtrips. There is no client involved. Just someone processing some data using PHP. You've been spouting lingo trying to define yourself as someone "in the know". But if you want people to respect your point of you... Not having idiotic kneejerk reactions would be a nice start.

    63. Re:Considering sub queries in IN statements. by Anonymous Coward · · Score: 0

      However, MSSQL 2008 performance dropped significantly from prior versions due to the exact opposite --- subqueries are now looped through, where previously they were handled with hash tables internally.

      What a load of BS. Sub-queries transformations, which are applied in all versions, are orthogonal to execution strategies. The choice of which algorithm to use (nested loops, hashing join or merge join) is based on several things like data characteristics (selectivity/density) and presence of indexes. You must have seen a plan and, without having a clue about how the optimizer works, you draw you own wrong conclusions.

    64. Re:Considering sub queries in IN statements. by Anonymous Coward · · Score: 0

      LOL, admit it, you don't have a clue, don't you? You are wrong on so many levels that I don't even know where to start.

      First you say that you shouldn't tie yourself to an optimizer, then you rethorically ask what's wrong with actually learning SQL and understanding how a RBMS work which is a plain contradiction because that's exactly tying yourself to an optimizer!

      First of all: re-writing IN sub-queries with joins has *nothing* to do with learning SQL. If they're logically equivalent, there's no better way to write them in SQL. There are only smart query engines that can figure they are equivalent and dumb query engine that don't have a clue.

      Secondly, if a RDBMS query engine is stupid enough to treat them differently, i.e. it comes up with different execution strategies, and you need to understand that one strategy is more efficient than the other, then you are just doing what you said is stupid in the beginning: you are writing your SQL code relying on how a particular optimizer behaves (stupidly).

      Having said so, I'm not even sure you understand what really happens inside an RDBMS. Basic transformations like converting some IN/EXISTS into joins (semi, anti, anything you like) are so trivial that they're not even done in the optimization phase of query processing, they are systematically applied when pruning the query tree just after parsing the query text so that the optimization algorithm can treat them homogeneously.

      There's no intelligence in doing these transformations that are known since decades. Any decent, non-braindead, query engine implements them as pointed out by parent. I would still argue with parent that these functions are not demanded to the query optimizer because they are merely logical transformations (the optimizer has to find a good execution strategy for carrying out all the logical operations but if two operations are logically equivalent, it shouldn't have two different strategies, it should simply first normalize the operations to a common one)

      Not only that, it seems you also fail to understand the basic philosophy behind the birth of the Relational Model. Through the algebra and calculus it provides a sound framework to build declarative languages that let database developers focus on what they want to do instead of telling the system how to (physically) do it.

      My beard is gray enough to know that: it took a long time before most RDBMSs implemented a lot of basic transformation, they also failed to properly optimize INs for a long time and, finally, SQL is full of compromises and still far from delivering all the power promised by the Relational Model. But thinking that in 2012 we should still understand how a decent RDBMS handles IN vs an equivalent join is plainly stupid. I expect that any query engine (which is not a toy) can do a lot of more complex tranformations and can detect a lot of contradictions early in the pruning phase without even really executing a query (or part of it) when it's not needed

      If you think otherwise, you should probably move over to any of the great, web-scale, noSQL technologies. There's plenty of opportunities to reinve^H^H^H^Hwrite things there...

    65. Re:Considering sub queries in IN statements. by Anonymous Coward · · Score: 0

      RDBMS is *not* supposed to be dumb.

      It is supposed to do what it was asked - if someone asks it to do something dumb, it has no choice.

      WTF are you talking about? In SQL there are typically several ways to express the same query. If two (or more) are logically equivalent, there's nothing dumb asked, the query engine is dumb because it can't figure they're the same thing and optimizes them differently.

    66. Re:Considering sub queries in IN statements. by znrt · · Score: 1

      First of all: re-writing IN sub-queries with joins has *nothing* to do with learning SQL. If they're logically equivalent, there's no better way to write them in SQL.

      if you say so.

      actually they're quite different things. logically, JOIN will always produce a join, whereas IN will never do and only check for existence.

      to some extent my point here was that instead into buying into some arbitrary optimizer you should ask yourself if you need a join or not. easy! once you are aware of this, you are in position of writing "good" sql without having to give a crap for optimizers or implementations. more in general I was hinting at database *design*. dumb design is the grandmother of all shitty queries and poor db performance in general. but then ... you can always optimize it, hooray! :D

      of course, the optimizer may want to choose to skip the actual join if it is not actually needed. that's just great. doesn't interest me (most of the time). and probably both forms will produce almost identical results in most engines. fine! but i really don't need to know, i just have to know sql ... and how an (abstract) rdbms works.

      Not only that, it seems you also fail to understand the basic philosophy behind the birth of the Relational Model. Through the algebra and calculus it provides a sound framework to build declarative languages that let database developers focus on what they want to do instead of telling the system how to (physically) do it.

      that's a bold one, mr. bigshot. sadly you fail to realize there IS a conceptual difference between IN and JOIN.

      and i could wildly guess that your erroneous assumption might well have to do with optimizer abuse. :-)

      focus on what they want to do instead of telling the system how to (physically) do it.

      that's exactly what you're getting wrong because you are making assumptions on optimized output. so you make yourself dependent on optimizers (which may be "braindead", remember?). because the physical results (and most probably the performances) happen to be the same in this particular scenario, you don't even spot the semantic difference. in your own words, you should focus on the question, which is if you need a join or not, and just tell sql. sql is your friend :-)

    67. Re:Considering sub queries in IN statements. by Anonymous Coward · · Score: 0

      First of all: re-writing IN sub-queries with joins has *nothing* to do with learning SQL. If they're logically equivalent, there's no better way to write them in SQL.

      if you say so.

      actually they're quite different things. logically, JOIN will always produce a join, whereas IN will never do and only check for existence.

      I never said JOIN and IN are *always* the same logical operation and if you took your time to read, instead of wasting it writing a lot of rubbish, you would have noticed there's an *if* up there.

      But when they are the same logical operation (e.g. when you simply do an inner join without selecting any column from the joined table), any non-dumb query processor should be able to do such transformation and apply the same optimization technique.

      I see you lack some basic education on the topic, so you may want to check out some facts by yourself. Simply download SQL Server 2008 R2 Express Edition along with the Adventureworks sample database and run the following T-SQL queries:

      SET SHOWPLAN_ALL ON;
      GO

      SELECT S.SalesOrderID, S.OrderDate
      FROM Sales.SalesOrderHeader AS S
      JOIN Sales.SalesTerritory AS T
          ON S.TerritoryID = T.TerritoryID
      WHERE T.CountryRegionCode = 'DE';

      SELECT S.SalesOrderID, S.OrderDate
      FROM Sales.SalesOrderHeader AS S
      WHERE EXISTS(
                      SELECT *
                      FROM Sales.SalesTerritory AS T
                      WHERE S.TerritoryID = T.TerritoryID
                          AND T.CountryRegionCode = 'DE'
                        );

      SELECT S.SalesOrderID, S.OrderDate
      FROM Sales.SalesOrderHeader AS S
      WHERE S.TerritoryID IN (
                      SELECT T.TerritoryID
                      FROM Sales.SalesTerritory AS T
                      WHERE T.CountryRegionCode = 'DE'
                        );

      Result (cut for readability) for all three queries:
          |--Hash Match(Inner Join, HASH)
                    |--Clustered Index Scan()
                    |--Clustered Index Scan()

      You will notice that query plans for all the three queries are exactly the same: with an inner join logical operation and using a hash match physical operation (because there are no useful indexes).

      Since you were enough intellectually dishonest to try and make it look like I was saying something I never said, I don't bother even reading the rest of your post.

      Get an education on the topic first and then come back if you still don't understand how RDBMSs work.

    68. Re:Considering sub queries in IN statements. by apotheon · · Score: 1

      If there's anything in what you said with which I wouldn't agree, it's the fact it looks like you're implying that we can't really do significantly better than SQL. While we haven't yet developed something significantly better (in all important ways), as far as I'm aware, I really believe we could. I think it's mostly a combination of inertia, orthodoxy, and compatibility issues that stops us.

      --
      Unfetter your ideas. Copyfree your mind.
    69. Re:Considering sub queries in IN statements. by znrt · · Score: 1

      First of all: re-writing IN sub-queries with joins has *nothing* to do with learning SQL. If they're logically equivalent, there's no better way to write them in SQL.

      if you say so.

      actually they're quite different things. logically, JOIN will always produce a join, whereas IN will never do and only check for existence.

      I never said JOIN and IN are *always* the same logical operation and if you took your time to read, instead of wasting it writing a lot of rubbish, you would have noticed there's an *if* up there.

      But when they are the same logical operation (e.g. when you simply do an inner join without selecting any column from the joined table), any non-dumb query processor should be able to do such transformation and apply the same optimization technique.

      I see you lack some basic education on the topic, so you may want to check out some facts by yourself. Simply download SQL Server 2008 R2 Express Edition along with the Adventureworks sample database and run the following T-SQL queries:

      SET SHOWPLAN_ALL ON;

      snipped: irrelevant implementation details

      You will notice that query plans for all the three queries are exactly the same: with an inner join logical operation and using a hash match physical operation (because there are no useful indexes).

      yeah, in a particular implementation it may well be. i told you already.

      but IN and JOIN are still different things. you just demonstrated you know how to draw erroneous conclusion from query plans, and that you still don't understand the difference between IN and JOIN. which was exactly my point. thank you.

    70. Re:Considering sub queries in IN statements. by Anonymous Coward · · Score: 0

      yeah, in a particular implementation it may well be. i told you already.

      but IN and JOIN are still different things. you just demonstrated you know how to draw erroneous conclusion from query plans, and that you still don't understand the difference between IN and JOIN. which was exactly my point. thank you.

      I'm getting a bit tired of juvenile rebuttals...

      1. Not all IN sub-queries can be transformed into equivalent JOIN operations, but some (a lot?) can.
      2. For IN sub-queries that can be transformed into a JOIN operation, the operations are logically equivalent, otherwise you wouldn't be writing the same query and you wouldn't get the same results back.
      3. Building a straw-man by bringing into the discussion the general definitions of the IN and JOIN operations won't help as we are talking of specific form of the operations.
      4. Throwing in random phrases like "checking for existence" won't help either. SQL is based on both Relational Algebra and Relational Calculus and operations end up being logically equivalent, so what? This is, practically speaking, what Codd theorem states: the algebra and the calculus are logically equivalent.
      5. Any modern, non-naive, query processor is aware of such logical equivalences and will simplify the query tree before optimization as it doesn't make any sense to apply different optimizations strategies to the same logical operation (whether the resulting logical operation's name is "join" or "thingamjic" is irrelevant).
      6. Dumb query processors (or simple data cruncher as I consider them) simply don't have a clue; by providing different implementations and, possibly, different optimization techniques, they may end up having very diverging performance characteristics.
      7. I provided evidence at least with one enterprise-class RDBMS, it's up to you to provide evidence that it's a "particular" implementation. Oh, and no: MySQL is not an enterprise-class RDBMS, not even close to, it's a data cruncher based on SQL (a language it barely understands). Try with Oracle, DB2 or Sybase.
      8. You are the one proposing to stick to one solution because some (low quality) data crunchers have dumb proprietary implementations.

      As I already said, you fail to understand the basics of Relational technologies. Both the algebra (join) and the calculus (in) were introduced in SQL because they have different expressive power characteristics and are both helpful to write queries in a more practical (readable) way. Forcing people to use one over the other because some products are unaware of some operations' logical equivalence is simply stupid.

      It's 2012 we shouldn't rely on dumb products built by people that stated very clearly in the manual that Foreign Keys were implemented just to avoid syntax errors in SQL scripts, the engine was not really enforcing the constraints because that would slow down operations and constraints should be implemented in the application anyway. Maybe you're just too young to remember. Or just clueless.

    71. Re:Considering sub queries in IN statements. by znrt · · Score: 1

      I'm getting a bit tired

      i can understand that given the considerable effort you are putting into making your *opinion* look like an axiom. to deny a simple evidence you have thrown yourself into long explanations, performed query analysis, even made mad guesses about your conversation partner. you have obviously plenty free time, enjoy!

      of juvenile rebuttals...

      oh, that! well, i find your scornful comments pretty amusing! no visa can pay having the very same guy that postulates that in and join are equivalent because they yield same results in a particular case, telling you that you have no clue about relational algebra. can't get enough of this! :D

      Not all IN sub-queries can be transformed into equivalent JOIN operations, but some (a lot?) can.

      yeah, ok. you are pushing this really far, how about settling a bit.

      first, try to think of the in/join issue as an illustrative example, which is what it was.

      second, try to realize that means and results are not the same. my entire point in that "learn sql" thing was about finding the appropiate means for needed results. this constitutes, as i repeatedly stated, good design. which amounts for me to sound code (be it sql or whatever). it's difficult for me to imagine a reason for a senior developer with extensive experience in sql like you say to be (and i don't question that, absolutely) to disagree with that. haven't you seen loads of "braindead" queries and database schemas in your lifetime? i have. where the fcuk have you been working, pal?

      definitely, in/join are not the same, but they may be equivalent in practice given some specific constraints. so what? didn't i already admit that? but relying on optimization is still a bad practice, in my view. *blaming* bad optimizers for anything is just an extreme form of self-deception. this is my opinion, you don't have to support it. but it's futile and silly to try to demonstrate your opinion is right, another is wrong.

      It's 2012 we shouldn't rely on dumb products built by

      i think this summarizes very well the basic distance between our POVs. this seems to upset you great lengths, but i could not care less! because it's not my problem as a developer/designer. it's the user's problem, if it's a problem at all. my code runs well on any engine because i trust no one. i never did, and have no regrets :-)

  4. Finally! by K.+S.+Kyosuke · · Score: 2

    I knew they would implement the Oracle storage backend into MySQL one day...

    --
    Ezekiel 23:20
    1. Re:Finally! by Admiral+Llama · · Score: 3, Interesting

      No, rather it's amazing what happens when you architect the DB such that the entire thing is forcibly held in memory.

    2. Re:Finally! by K.+S.+Kyosuke · · Score: 1

      That was supposed to be a joke. :)

      --
      Ezekiel 23:20
  5. MySQL Cluster by CastrTroy · · Score: 1

    MySQL Seems like it could be interesting, but I can't get over how it requires the whole thing to be hosted in memory. I'm much more interested in Percona Cluster which is also based off MySQL.

    --

    Anthropic principle: We see the universe the way it is because if it were different we would not be here to see it.
    1. Re:MySQL Cluster by Anonymous Coward · · Score: 0

      Swap isn't really "memory" in the sense that RAM is. Unless you're running WAMP or something.

    2. Re:MySQL Cluster by ducomputergeek · · Score: 1

      Using memory for db cluster nodes isn't new. Teradata has been doing it for a long time now in their data warehousing. It has its advantages on large databases and you have the money to afford the nodes you need for reliability.

      --
      "The problem with socialism is eventually you run out of other people's money" - Thatcher.
    3. Re:MySQL Cluster by tqk · · Score: 1

      ... Unless you're running WAMP or something.

      I hate people like you.

      dict wamp
      From V.E.R.A. -- Virtual Entity of Relevant Acronyms (June 2006) [vera]:

          WAMP
                        Windows, Apache, MySQL, PHP (Windows, Apache, PHP, DB, SQL)

      Damnit, define your !@#$ing terms, and what a stupid acronym that is! The OS should be irrelevant.

      I really, really hate this century. :-P

      [Okay, I feel better now. Apologies for the outburst, but please do feel free to suck a rock and die.]

      --
      "Tongue tied and twisted, just an Earth bound misfit ..." -- Pink Floyd.
  6. MySQL Cluster != MySQL by Anonymous Coward · · Score: 5, Informative

    Not the same thing.

    1. Re:MySQL Cluster != MySQL by Animats · · Score: 1

      MySQL Cluster used to be a pay-only product. Is it free now?

      (The setup with one MySQL master and N read-only slaves has been free for years. That powers Wikipedia, for example. It's not MySQL Cluster. A MySQL Cluster has multiple read-write machines.)

    2. Re:MySQL Cluster != MySQL by Anonymous Coward · · Score: 0

      MySQL Cluster has never been a pay-only product.

    3. Re:MySQL Cluster != MySQL by hholzgra · · Score: 1

      > MySQL Cluster used to be a pay-only product. Is it free now?

      there's always been a GPL version of it, only the optional management tool MySQL Cluster Manager that was added later is not open source

      you may have confused things with the fact that you could only buy
      cluster support contracts for the commercially licensed version only
      while the general MySQL support offerings covered both commercial
      and GPL binaries ...?

    4. Re:MySQL Cluster != MySQL by Damnshock · · Score: 1

      We currently run two masters without problem...

  7. But of course it reads from RAM by photonyx · · Score: 4, Interesting

    The case when the data set is bigger than RAM amount has not been investigated (link here, see the comments). The hard drive I/O speed would slow it dramatically, unless it's an expensive array of SSDs.

    1. Re:But of course it reads from RAM by fuzzyfuzzyfungus · · Score: 2

      Nothing can really save your performance numbers once you have to hit the platters, so the question would be not 'does it suck more when it can't work purely in RAM?' but 'How good are the mechanisms available for minimizing(ideally automatically, but at least without too much black magic on the application programmers' parts) the frequency with which your setup ends up needing something that isn't in RAM and being bottlenecked by having to fetch it from disk?'...

    2. Re:But of course it reads from RAM by Bill,+Shooter+of+Bul · · Score: 2

      Who the hell uses actual disks when performance is an issue?
      Also, who the hell uses ssd's when performance is an issue?
      The fast kids use PCI Express cards loaded with flash.
      Texas Memory systems or Fusion IO. Expensive, yes, but this is Mysql CLUSTER we are talking about. You don't use that unless you need the speed.

      --
      Well.. maybe. Or Maybe not. But Definitely not sort of.
    3. Re:But of course it reads from RAM by sitkill · · Score: 1

      That's weird....I believe that when the dataset is bigger than ram, you just break it into different servers (or just get more ram). This isn't anything new, this is actually old OLD technology, and you can find cases of this everywhere (google, microsoft uses it for hotmail, etc), called the Brick architecture or something like that.

      Actually, a ton of research was done on these types of architectures by Jim Gray (who tragically was lost at sea a few years ago) at Microsoft. Not sure how many of these links are still active but: http://research.microsoft.com/en-us/um/people/gray/

      From my own experience, at a job a few years ago, we were dealing with a little over 1 petabyte of data, and the system was engineered to NEVER hit platter, and to always have data in ram.

    4. Re:But of course it reads from RAM by Trogre · · Score: 1

      That would make it a RAED array.

      --
      "Nine times out of ten, starting a fire is not the best way to solve the problem." - my wife
    5. Re:But of course it reads from RAM by tqk · · Score: 1

      How good are the mechanisms available for minimizing(ideally automatically, but at least without too much black magic on the application programmers' parts) the frequency with which your setup ends up needing something that isn't in RAM and being bottlenecked by having to fetch it from disk?

      Are we talking about a web back end db, or just a generic db? I ask as the page I'm replying to shows (according to AdBlockPlus) 47 separate entities running in the background. With all that !@#$ going on, what does DB fetch performance even mean?

      I think network performance is far and away the most determinant factor in this stuff. Optimize your queries to hell and back, but you'll still have to wait for the network. Zzzzzz ...

      "The DB's too slow!"

      Not necessarily. Why don't people understand Performance Analysis these days? Complex systems are complex. There's lots of variables involved.

      --
      "Tongue tied and twisted, just an Earth bound misfit ..." -- Pink Floyd.
  8. That's great... by Daetrin · · Score: 1

    Now can they please work on some dramatic usability improvements so i don't have to cringe every time an Oracle support question comes up at work?

    --
    This Space Intentionally Left Blank
    1. Re:That's great... by jedidiah · · Score: 1

      If you can't be bothered, you can ignore a lot of the finer details in Oracle just as much as you would be prone to with any other RDBMS.

      --
      A Pirate and a Puritan look the same on a balance sheet.
    2. Re:That's great... by Daetrin · · Score: 1

      Uh, if it was something i could ignore then i wouldn't be cringing, i would be just shrugging and ignoring the problem. As long as some of our (big) customers insist on using Oracle we have to work with Oracle to support them. And they're not going to accept "we didn't implement/debug that feature in Oracle because working with Oracle is a pain" very well.

      --
      This Space Intentionally Left Blank
    3. Re:That's great... by tqk · · Score: 1

      Now can they please work on some dramatic usability improvements so i don't have to cringe every time an Oracle support question comes up at work?

      You may wish to question whether you're in the wrong line of work. RDBs aren't really all that complex. Stuff it in, then drag it out and present it in a pretty display format. What else is there to know?

      People are way too hung up on complexity. Life doesn't have to be that hard. You're making it harder than it has to be. Simplify.

      --
      "Tongue tied and twisted, just an Earth bound misfit ..." -- Pink Floyd.
    4. Re:That's great... by Daetrin · · Score: 1

      I think you're pretty massively missing the point? It's not RDBs that are the problem, it's Oracle specifically. Try developing complex software using a SQL back end that supports both MS SQL and Oracle. Getting everything working on Oracle takes longer. And when there are problems it takes longer to debug on Oracle. Just getting Oracle installed properly was a huge pain in the ass compared to MS SQL.

      _Oracle_ is making it harder than it has to be. If i had the choice i'd simplify just by dropping support for Oracle and only work with MS SQL, but that's not my choice to make.

      --
      This Space Intentionally Left Blank
    5. Re:That's great... by tqk · · Score: 1

      It's not RDBs that are the problem, it's Oracle specifically.

      Ah. On that, we agree. Larry's one sharp snakeoil salesman.

      --
      "Tongue tied and twisted, just an Earth bound misfit ..." -- Pink Floyd.
  9. Yes, I RTFA (sue me) by billcopc · · Score: 3, Informative

    If I read the sales pitch correctly, they just integrated Memcached as a backend storage module, so that it plays nicely wrt ACID compliance. Yeah, memory is 70x faster than disk I/O... big whoop!

    Anyone running a sizeable MySQL installation already has heaps of RAM allocated to the InnoDB buffers/caches anyway. It sounds like Oracle compared a stock, distro-default MySQL to their memory-hungry tweaks. Yeah, DUH. I can get a 70x speedup too if I increase MySQL's memory usage from the default 64mb to 48 gigabytes.

    --
    -Billco, Fnarg.com
    1. Re:Yes, I RTFA (sue me) by medv4380 · · Score: 2
      If you're going to read an article about a benchmark you might actually read the benchmark too.

      Previously each data node exploited about 6 CPUs with 8 threads, in this benchmark we used about 14 CPUs with 24 threads. We've also removed a number of bottlenecks and the configuration of the threads is done in such a manner as to handle many different loads well.

      The little thing you read about memcached was about adding in a Native API which probably helped things, but getting the multithreading upgraded probably had more to do with it. They also were comparing two benchmarks that both were using just memory, and not a case where 1 used disk and the other used ram.

    2. Re:Yes, I RTFA (sue me) by bdgregg · · Score: 1
      I thought he was talking about the 70x result, not the 1 billion result. Quoting from the 70x benchmark:

      For this test, 3 machines were used:

      Intel Core 2 Quad Core @2.83 GHz; 8 Gbytes RAM; single, multi-threaded data node (ndbmtd)
      Intel Core 2 Quad Core @2.83 GHz; 8 Gbytes RAM; single, multi-threaded data node (ndbmtd)
      4 Core Fedora VM running on VirtualBox on Windows 7, single MySQL Server

      It looks like the AQL feature has pushed work from that MySQL-on-VirtualBox-on-Windows-7 system to the pair of data nodes with 8 GB RAM, and found it runs 70x faster. It doesn't say how much RAM the MySQL server had, what the data nodes were running, or why it was running in that virtualized environment in the first place. I can't say I'm surprised that offloading work from such a place has has provided a big win. It may be more interesting to see the result from a more typical setup, such as MySQL running on a baremetal OS.

    3. Re:Yes, I RTFA (sue me) by dreemernj · · Score: 1

      It sounds like the RAM isn't actually the real boost. This is MySQL Cluster. The last time I used it, it kept everything in RAM already. I think since then they let you store non-indexed tables onto disk, but that's it. The speed boost sounds like it is for NDB, not InnoDB. NDB doesn't support foreign keys and removing that constraint probably helps its performance too.

      In this case, the actual boost is from AQL, which allows for a query to be split up by JOINs and sent out to different servers to run against their local tables in parallel.

      --
      1 (short ton / firkin) = 89.1432354 slugs / keg
    4. Re:Yes, I RTFA (sue me) by Anonymous Coward · · Score: 0

      Actually the older versions ONLY could keep the data in RAM on the data nodes. Newer versions (7.x) can hold it on disk too.

    5. Re:Yes, I RTFA (sue me) by dreemernj · · Score: 1

      Yep, that's exactly what I said. But thank you for reiterating.

      --
      1 (short ton / firkin) = 89.1432354 slugs / keg
    6. Re:Yes, I RTFA (sue me) by Anonymous Coward · · Score: 0

      And towards the bottom of the discussion we have people that read and understand the article. Yay!

    7. Re:Yes, I RTFA (sue me) by lewiscr · · Score: 1

      DUH. I can get a 70x speedup too if I increase MySQL's memory usage from the default 64mb to 48 gigabytes.

      Just make sure you actually test it. Some of the performance recommendations in the manual don't scale from 64 MB to 48 GB. MyISAM Key Buffer and Query Cache, I'm looking at you.

    8. Re:Yes, I RTFA (sue me) by billcopc · · Score: 1

      I, er... I think I shorthand a lot of this stuff, because it seems trivial to me as a programmer. Anything that works like a hashtable is going to experience diminishing returns, because you have to scan through that every-growing table. O(n^2) and stuff.

      On the other hand, anything that spares you from hitting a disk, that's a guaranteed win. What I commonly do with my DB servers, since my data sets are small but complex, is to give them enough memory to cache the entire database into RAM. The disk effectively becomes a backup copy, in case I need to reboot the box. With RAM being so cheap these days, it's usually more cost effective than deploying more cluster nodes. Memcached is great if all you want is a key-value bucket, but for speeding up existing applications that (ab)use SQL, RAM is the undisputed king of performance.

      --
      -Billco, Fnarg.com
  10. the old release by nimbius · · Score: 2

    was GPL as well.
    dont forget that Oracle committed that MySQL server will continue to use the dual-licensing strategy long used by MySQL AB with commercial and GPL versions available until at least 2015. if other big players are any indication this is oracles attempt to avoid another apache/java landmine and preserve the trust and respect of developers until such time as the product is sufficiently 'in-housed' to divorce from the community without fear of retalliation. by 2015 the forked code, should there be yet another fork, can probably enjoy a tenth of the performance of oracles version and have to compete with ingrained lock-ins and contract conditions developed by oracle to further ostracize open source competetors.

    but im not sure if its really relevant at all. databases like hypercube and couch are really giving oracle a run for their copious amounts of money. by 2015 the "paradigm" may have "shifted" as the PHB says.

    --
    Good people go to bed earlier.
  11. MySQL in-memory speed by Amadodd · · Score: 5, Interesting

    Slightly off-topic, but I recently had the oppurtunity to test the speed of a MySQL in-memory database. I have some frequently queried read-only data that simply would not handle the load in MS SQL and was looking for an in-memory solution. MySQL provided the simplest implementation - simply tell the table to use memory storage and configure the server to allow the amount of data you want to host (~250MB in this case). You also have to remember to reload the data from normal InnoDB tables every time you restart the server. I used the same table structures, keys indexes and stored procedures (almost the same) to query the data and linked it through MS SQL so that my applications never new the difference. On exactly the same hardware the speed increase was at least 50X over MS SQL.

    --
    Freedom of speech doesn't come with bandwidth.
    1. Re:MySQL in-memory speed by bbbaldie · · Score: 2

      I've programmed PHP to use SQL Server and MySQL, the MySQL statements typically run in milliseconds, it usually takes SQL Server two or three seconds to respond.

    2. Re:MySQL in-memory speed by rgbrenner · · Score: 1

      it usually takes SQL Server two or three seconds to respond

      then you're incompetent. stackoverflow uses SQL Server 2008, and serves 12-14 million pages a day.

    3. Re:MySQL in-memory speed by bbbaldie · · Score: 1, Funny

      Thanks for judging me. And for being so humble about it. Oh, and thanks for helping to dispel the notion that arrogant pricks hang around /. :-)

    4. Re:MySQL in-memory speed by SuiteSisterMary · · Score: 1

      Well, what did you expect?

      --
      Vintage computer games and RPG books available. Email me if you're interested.
    5. Re:MySQL in-memory speed by bbbaldie · · Score: 1

      ;-) This sentence added because the winking emoticon above was rejected as a valid comment...

    6. Re:MySQL in-memory speed by xaxa · · Score: 1

      I've programmed PHP to use SQL Server and MySQL, the MySQL statements typically run in milliseconds, it usually takes SQL Server two or three seconds to respond.

      Can you give an example?

      We're evaluating Postgres vs MS SQL at work, and at the moment haven't found any problem big enough to rule either out.

    7. Re:MySQL in-memory speed by bbbaldie · · Score: 1

      Sadly no, this was at a previous employer. I had a script that looked at the arp tables of a core router and compare them to a database of known mac addresses (in MySQL). Typically, there would be around 1,000 macs in the arp table, compared to roughly 4,000 known macs. The web page would load in about 1-2 seconds. OTOH, getting SQL Server's attention would run a couple of seconds or more, followed by the initial query (running a query joining two or three tables, returning perhaps twenty records) adding from 2-5 more seconds. Granted, it would run much quicker after that. Part of the problem was that the community had a much faster SQL Server module for PHP, then Microsoft took it over. The Windows dll was twice as big as it used to be, and performance took a hit. FYI, WAMP on the server end.

    8. Re:MySQL in-memory speed by gbjbaanb · · Score: 1

      that's 162 *pages* per second, not SQL queries. Thing is, how many of those are cached and simply re-served up again? How big are the servers that run these? How many servers do they use? How much Lucene do they use instead of SqlServer??? (you didn't watch the video did you? - 23 minutes in)

      Anyway, your argument is like saying Facebook takes 2 seconds to return my wall of data whereas MyPersonalWebsite with a PHP script I wrote returns instantly, therefore Facebook are useless losers.

      There might well be a 2 second delay if the server is repeating many different sql queries - I used to see a Oracle DB that spent most of its time waiting on the query cache - or if its hitting the DB with queries that lock a lot of data pages - I've seen an app that happily took a lock on nearly all the data in various tables escalating the lock to a full table lock. So yes, they're incompetent, but sometimes queries that work well on one DB don't work well on others, in particular queries that happily ran on Oracle killed Sql Server, simply because of the locking design these 2 DBs implemented (ok, so much so that Sql server now has row-level locking, but there could be others still in there causing problems).

    9. Re:MySQL in-memory speed by codepunk · · Score: 1

      problems other than cost you mean.

      --


      Got Code?
    10. Re:MySQL in-memory speed by Anonymous Coward · · Score: 0

      therefore Facebook are useless losers.

      Isn't this true anyway?

      And if you want to use some script you wrote up as proof, then I'm with you 100%.

    11. Re:MySQL in-memory speed by rev0lt · · Score: 1

      Did you test the queries on the actual database server, or did you just run your script? Were you using regular drivers, or PDO? Was the SQL server on the same machine, or on a separate one?
      Just because your script didn't run well, doesn't mean that SQL Server was slower. Probably the PHP database modules had a lot to do with it - I guess there is much more attention given to MySQL integration than with SQL Server, or some misconfiguration (always 2-3 seconds? Maybe is a DNS issue, were you using hostnames or just IP's?).

    12. Re:MySQL in-memory speed by xaxa · · Score: 1

      problems other than cost you mean.

      We already have an MS SQL database for the HR, finance (etc) systems, so using that might be as cheap (or cheaper) than a new Postgres database, given existing staff (DBA) knowledge and some odd way they fudge the numbers. (Recurring costs for MS software are apparently negotiated and paid for centrally, rather than charged to the department using the software. Daft.)

      Essentially, arguing on grounds of inital/ongoing cost is difficult. My best cost argument is the no-cost license for Postgres means we could change our architecture, have ten times as many servers, have virtual servers, have lots of development servers -- anything -- without having to pay. I'm also concerned that although Microsoft sees us as "academic" at the moment, that might change -- we aren't a university, and the government is telling us we should be trying to make money with our skills/data where we can.

      So, a good, technical reason is much better. Various limitations have ruled out MySQL (e.g. problems with multiple engines -- you may have to choose between full-text search and transactions, for example). Postgres has the edge over MS SQL at the moment -- partly flexibility, partly the GIS extensions, I'll elaborate if anyone's interested. But SQL Server at least claims to do these things too.

    13. Re:MySQL in-memory speed by invalid-access · · Score: 1

      Well, what did you expect?

      The Spanish Inquisition.

    14. Re:MySQL in-memory speed by SuiteSisterMary · · Score: 1

      Sorry, they're still busy running a query on a guy, involving a table lock and several joins.

      --
      Vintage computer games and RPG books available. Email me if you're interested.
  12. The 70x is for (distributed) joins by jorela · · Score: 5, Informative

    which previously were quite poorly handled.
    See http://www.clusterdb.com/mysql-cluster/70x-faster-joins-with-aql-in-mysql-cluster-7-2/?utm_source=rss&utm_medium=rss&utm_campaign=70x-faster-joins-with-aql-in-mysql-cluster-7-2

  13. 33,500 rows? by PhuFighter · · Score: 1

    k. This is slightly laughable. 33,500 rows? in 87 seconds? that seems glacial. And 1.23 seconds being the new speed? that seems as expected. aside from comparing the speed of a Lada vs a common garden slug, how does this compare against other databases?

    1. Re:33,500 rows? by hawguy · · Score: 1

      k. This is slightly laughable. 33,500 rows? in 87 seconds? that seems glacial. And 1.23 seconds being the new speed? that seems as expected.

      aside from comparing the speed of a Lada vs a common garden slug, how does this compare against other databases?

      It was actually only 2000 rows returned -- the source tables had 35K rows. But in fairness, this was for a join across 11 tables, and it's easy for a database to perform slowly at this task if it doesn't have a good query optimizer. A badly optimized query could end up doing thousands (or even millions) of full table scans of multi-thousand row tables.

    2. Re:33,500 rows? by PhuFighter · · Score: 1

      25 columns, at 4 bytes per column, that's 100 bytes. I'm not sure how much overhead, but hopefully not much. 100 bytes at 2000 rows = 200,000 bytes. Across 32KB pages, that's 7 pages, allowing for 10% overhead. I assume that it's 40 Gb/sec IB, how many microseconds would that take to transfer? As for the join, i guess if there is more details about the schema (i think there's a link to it buried somewhere), simple joins should be pretty quick. I agree that a poorly implemented optimizer producing a bad plan could result in a nested-loop join wasting a lot of cycles, what this article tells me is that this database is pretty much on par with a quality RDBMS from years ago, and nowhere near as fast as NoSQL databases of today.

    3. Re:33,500 rows? by rev0lt · · Score: 1

      I'm with you on this one. 1.23 seconds seems like a dreadful result, and I can't imagine why they thought it was a good example.
      The only heavy MySQL application I use is Magento (and it relies heavily on JOINs), and a small server can blow these results out of the water, on similar-sized tables, and with multiple concurrent queries (but only using 3 to 5 JOINs per query).

    4. Re:33,500 rows? by jorela · · Score: 1

      ok...so here is confusion....the press release mentions 2 distictly different things.
      1) upto 70x faster joins
              this did not use the 100 byte records,
              this did use ordinary ethernet
      2) 1B QPM
              this used 100 bytes records
              this used infiniband
              this use single row transactions (NoSQL type benchmark)

  14. 70x by slasho81 · · Score: 5, Funny

    Developers: We've got some really good ideas for increasing performance of complex queries by...
    Marketing: How much in the best conceivable case?
    Developers: Oh, I dunno, maybe 70x.
    Marketing: 70x? Is that good?
    Developers: Yeah, I suppose, but the cool stuff is...
    Marketing: Wow! 70x! That's a really big number!
    Developers: Actually, please don't quote me on that. They'll make fun of me on Slashdot if you do. Promise me.
    Marketing: We promise.
    Developers: Thanks. Now, let me show you where the good stuff is...
    Marketing (on phone): Larry? It's me. How big can you print me up a poster that says "70x"?

    1. Re:70x by Anonymous Coward · · Score: 0

      So Larry Ellison now works in the poster-printing department at Oracle?

    2. Re:70x by slasho81 · · Score: 1

      Larry is just a random sleazeball from Marketing. I didn't think of Ellison.

    3. Re:70x by Anonymous Coward · · Score: 0

      Larry is just a random sleazeball.

      ftfy

    4. Re:70x by TrueSpeed · · Score: 1

      Larry is a sleazeball.

      FTFY

    5. Re:70x by Anonymous Coward · · Score: 0

      So Larry Ellison now works in the poster-printing department at Oracle?

      Larry does indeed design a lot of the ads you see like the ones trashing HP. The really funny ones never make it outside Oracle HQ so be on the lookout if you ever get to visit.

    6. Re:70x by socceroos · · Score: 1

      [quote]Larry is sleazeball[/quote]
      ftfy

    7. Re:70x by jorela · · Score: 1

      he he...do you have illegal survailance of my desk ? /Jonas MySQL Cluster developer

    8. Re:70x by slasho81 · · Score: 1

      Marketers being marketers. :)

  15. In the fine print... by TrueSpeed · · Score: 1

    *Performance will vary and best results are only possible with an Oracle service contract.

  16. Orcle Linux Unbreakable by Anonymous Coward · · Score: 0

    Where did I hear this before "Oracle Linux Unbreakable"? Hmm... Probably coming from the same mouth.

  17. TPC-H by Skinkie · · Score: 1

    So if Oracle is so happy with MySQL. Give us a TPC-H benchmark. Scalefactor: 100.

    --
    Support Eachother, Copy Dutch Property!
  18. Re:33,500 rows? 70x by jorela · · Score: 1

    Being the one that wrote the code: I'm with you too (in a way...)
    The thing is that prior to this release, applications that were using SQL joins (involving more than 2 tables) where ruled OUT:
    Now a lot of more applications *can* use MySQL Cluster.

    That is the new feature...but 70x is what's in the press release.

    (NOTE: mysql cluster is a different prodcut from mysql server)

  19. Re:33,500 rows? 70x by rev0lt · · Score: 1

    The thing is that prior to this release, applications that were using SQL joins (involving more than 2 tables) where ruled OUT:
    Now a lot of more applications *can* use MySQL Cluster.

    Yeah, that's a less flattering remark than a 70x increase in speed, but probably more useful :D. I'm not familiarized with MySQL Cluster, and I'm surprised it is_that_ different from the server version.

  20. yeahh..i remember oracle.. by echonyne · · Score: 0

    but at the end of the day its.. "..Oracle tooke'th away and gave'th back" :|

  21. Uhm. by Anonymous Coward · · Score: 0

    sub-selects are SQL, dumbass.

    1. Re:Uhm. by znrt · · Score: 1

      sub-selects are SQL, dumbass.

      you didn't even get it, didn't you?

      u just have to be smart, that is, now ur domain and take into account the consecuences of your design decisions. that's the hard part of any design. if u do it right u need not to care that much about your providers being dumbasess, or even full of shit (chances are they'll be). if u don't, then ur sw is crap anyway. quick! bitch optimize pleez!

      i know such forward thinking is against this whole hot agile thing but at the end it is mandatory in the abc of any (serious)* developer, something not that much related to sql or optimization as to being a skilled problem solver, which is essentially the same as being a smart designer.

      however, don't get me wrong: i've no problem with you wanting to buy any new problem whatsoever, no matter the price and no matter the reason. be my guest. even more: i wish you the best luck (and revenue) inventing all kinds of fancy problems to solve old problems. and, after all, you always can blame that "braindead optimizer".

      ok, now get back to nintendo.

  22. Those queries by warrax_666 · · Score: 1

    are exactly equivalent. If your RDBMS is so crappy that it can't see that, then that's not the user's problem. SQL was supposed to be an abstraction, right?

    --
    HAND.