Slashdot Mirror


PostgreSQL Getting Parallel Query

New submitter iamvego writes: A major feature PostgreSQL users have requested for some time now is to have the query planner "parallelize" a query. Now, thanks to Robert Haas and Amit Kapila, this has now materialized in the 9.6 branch. Robert Haas writes in his blog entry that so far it only supports splitting up a sequential scan between multiple workers, but should hopefully be extended to work with multiple partitions before the final release, and much more beside in future releases.

13 of 83 comments (clear)

  1. Re:This is a troll. Respond appropriately. by Tablizer · · Score: 3, Funny

    You mean 500 johns or 500 joins? Sounds painful either way.

  2. This improvement a plus by kboodu · · Score: 2

    This will greatly improve the already impressive PostgreSQL database engine and help it compete against the more well-known Enterprise Relational Database Engines at a much better price point (free).

  3. New features every few months by rbrander · · Score: 5, Informative

    Just a few months back, lwn.net had a longish story on PostgreSQL. They were scoring a victory with the "UPSERT" command addition in 9.5, which with speed updates old records, OR inserts a new one, if none. A big feature on your commercial databases. Apparently, PostgreSQL's biggest worry lately is that it has so many developers adding cool new features that there's some resource lacks maintaining and cleaning the base code. (Possibly unfair oversimplification of lwn.net story.)

    I discovered PostgreSQL to get a free geodatabase for mapping, with the PostGIS plug-in...the open plug-in architecture being one of the greatest things about a FLOSS database. After nearly 25 years with Oracle and thinking everything else was a toy by comparison, PG blew me away. Amazing features, high performance, reliable. It's an amazing project, and this news is both impressive and unsurprising.

    1. Re:New features every few months by guruevi · · Score: 2

      The biggest issue with parallelism is that a lot of stuff can't be parallelized in a way that makes sense. The way it is done (dispatching and gathering nodes) only makes sense if the query takes a really long time, otherwise there is a lot of overhead that destroys any type of speedup and could actually make everything else slower. Typically multi-threading in databases is done to speed up multiple independent queries, not a single query.

      --
      Custom electronics and digital signage for your business: www.evcircuits.com
    2. Re:New features every few months by phantomfive · · Score: 4, Insightful

      PostgreSQL has one of the best-commented and cleanest code-base I've ever reviewed. FWIW

      --
      "First they came for the slanderers and i said nothing."
  4. Re:How does this compare with other databases? by theshowmecanuck · · Score: 3, Informative

    No. "Sharding" is the MYSQL bunch's relabelling of the established RDBMS partitioning paradigm. What this is, is parallel processing of a query. Queries normally run in a single process on a single processor. Now PostgreSQL can have a query split into multiple possesses on two or more processors. Oracle has had this capability for at least a decade.

    --
    -- I ignore anonymous replies to my comments and postings.
  5. Re:I miss pgsql by nullchar · · Score: 2

    The native solution (streaming replication with many options) has been production stable since 9.1 and only increased in features, speed, and reliability.

  6. Re:I miss pgsql by nullchar · · Score: 5, Informative

    MySQL/MariaDB are still toys in comparison to PostgreSQL.

    Postgres has recursive CTEs, regex replacement, native JSON support (as a record type and trivially convert every query return type), and even base64 decoding and xpath parsing.

    MySQL has had some nice features for years, like REPLACE, but since the 9.x branch, CTEs can do that and more. And now PG has UPSERT for simplicity. Replication has always been great with MySQL, but PG's replication is now easy to administer. I've relied on Postgres' streaming replication since 9.1 in production and it's been great for years.

  7. Re:This is a troll. Respond appropriately. by davester666 · · Score: 2

    particularly in parallel.

    --
    Sleep your way to a whiter smile...date a dentist!
  8. Re:How does this compare with other databases? by Anonymous Coward · · Score: 2, Informative

    It's worth pointing out that while this is a useful feature for some applications, it won't make any difference whatsoever for many more. PostgreSQL has been multi-threaded for executing concurrent queries for a very long time, so if you're running a database for many users, you probably won't notice any difference with this functionality. If you have some particular query that takes a heck of a long time like MRP runs, then this will matter to you.

  9. Re:How does this compare with other databases? by Anonymous Coward · · Score: 2, Insightful

    Oracle has had this capability for at least a decade.

    On the other hand it is often worth it to rewrite your application entirely or ditch the customer that needs that particular function to not have to deal with Oracle.

  10. Re:How does this compare with other databases? by theshowmecanuck · · Score: 2
    Blockquote from the MariaDB site (notice the terms 'sharded' and 'partition'). Also, I would hazard a guess that the PostgreSQL code for parallel queries is written in C, not something like PHP.

    Shard-Query is a high performance MPP (massively parallel processing) query engine for MariaDB and MySQL which offers increased parallelism compared to stand-alone servers. This increased parallelism is achieved by taking advantage of MariaDB/MySQL partitioning, sharding, common query clauses like BETWEEN and IN, or some combination of the above.

    Shard-Query is implemented with PHP and Gearman.

    Shard-Query is targeted mainly at big data problems, and OLAP queries in general. The primary goal of Shard-Query is to enable low-latency query access to extremely large volumes of data utilizing commodity hardware and open source database software. Shard-Query is a federated query engine which is designed to perform as much work in parallel as possible over a sharded dataset, that is one that is split over multiple servers (shards) or partitioned tables .

    --
    -- I ignore anonymous replies to my comments and postings.
  11. Re:I hope this doesn't compromise overall performa by rhaas · · Score: 2

    As the fine blog post explains, it is a switchable option. You can set max_parallel_degree=0 to turn it off. Actually, right now, it's off by default, and you have to set max_parallel_degree>0 to turn it on.