Slashdot Mirror


PostgreSQL Getting Parallel Query

New submitter iamvego writes: A major feature PostgreSQL users have requested for some time now is to have the query planner "parallelize" a query. Now, thanks to Robert Haas and Amit Kapila, this has now materialized in the 9.6 branch. Robert Haas writes in his blog entry that so far it only supports splitting up a sequential scan between multiple workers, but should hopefully be extended to work with multiple partitions before the final release, and much more beside in future releases.

83 comments

  1. Re:This is a troll. Respond appropriately. by Anonymous Coward · · Score: 0

    "All queries," LOL cuz they're all the same.

    Idiot.

  2. Re:This is a troll. Respond appropriately. by viperidaenz · · Score: 1

    Not a dual core Pentium D or Athlon X2 from 2005?

  3. Re:This is a troll. Respond appropriately. by Anonymous Coward · · Score: 0

    From multiple points of view, they are. Idiot.

  4. Re:This is a troll. Respond appropriately. by Tablizer · · Score: 1

    Maybe what they meant is that the bottleneck should be disk, RAM, bus, and/or network I/O for the vast majority of sub-queries if "done right". In other words, a single core should be able to computationally sift rows fast enough to keep up with row scans from multiple "sources" for up to say 8 sources, which may be the max practical number of sources.

    However, I imagine that a complex or "mathy" equation in a query could make it be CPU intensive. But maybe that's relatively rare.

  5. Re:This is a troll. Respond appropriately. by Anonymous Coward · · Score: 1

    *Your

  6. Re:This is a troll. Respond appropriately. by Tablizer · · Score: 3, Funny

    You mean 500 johns or 500 joins? Sounds painful either way.

  7. How does this compare with other databases? by Anonymous Coward · · Score: 0

    Is this similar to MySQL / MariaDB's shard query? Do the commercial databases have this?

    Either way, it's good to see. It's been some time since cpu speeds flatlined and multiprocessors became common.

    1. Re:How does this compare with other databases? by theshowmecanuck · · Score: 3, Informative

      No. "Sharding" is the MYSQL bunch's relabelling of the established RDBMS partitioning paradigm. What this is, is parallel processing of a query. Queries normally run in a single process on a single processor. Now PostgreSQL can have a query split into multiple possesses on two or more processors. Oracle has had this capability for at least a decade.

      --
      -- I ignore anonymous replies to my comments and postings.
    2. Re:How does this compare with other databases? by danda · · Score: 1

      The parent did not reference "sharding". He referenced shard-query, which is a mechanism that can parallelize a single query using relational algebra.

      https://www.percona.com/blog/2011/05/14/distributed-set-processing-with-shard-query/

    3. Re:How does this compare with other databases? by Anonymous Coward · · Score: 2, Informative

      It's worth pointing out that while this is a useful feature for some applications, it won't make any difference whatsoever for many more. PostgreSQL has been multi-threaded for executing concurrent queries for a very long time, so if you're running a database for many users, you probably won't notice any difference with this functionality. If you have some particular query that takes a heck of a long time like MRP runs, then this will matter to you.

    4. Re:How does this compare with other databases? by Anonymous Coward · · Score: 2, Insightful

      Oracle has had this capability for at least a decade.

      On the other hand it is often worth it to rewrite your application entirely or ditch the customer that needs that particular function to not have to deal with Oracle.

    5. Re:How does this compare with other databases? by theshowmecanuck · · Score: 2
      Blockquote from the MariaDB site (notice the terms 'sharded' and 'partition'). Also, I would hazard a guess that the PostgreSQL code for parallel queries is written in C, not something like PHP.

      Shard-Query is a high performance MPP (massively parallel processing) query engine for MariaDB and MySQL which offers increased parallelism compared to stand-alone servers. This increased parallelism is achieved by taking advantage of MariaDB/MySQL partitioning, sharding, common query clauses like BETWEEN and IN, or some combination of the above.

      Shard-Query is implemented with PHP and Gearman.

      Shard-Query is targeted mainly at big data problems, and OLAP queries in general. The primary goal of Shard-Query is to enable low-latency query access to extremely large volumes of data utilizing commodity hardware and open source database software. Shard-Query is a federated query engine which is designed to perform as much work in parallel as possible over a sharded dataset, that is one that is split over multiple servers (shards) or partitioned tables .

      --
      -- I ignore anonymous replies to my comments and postings.
    6. Re:How does this compare with other databases? by theshowmecanuck · · Score: 1

      I'll take your word for it. But multithreaded isn't the same thing as multi-process, especially on *nix. And yeah, although I don't program so much now, I have seen people write Oracle queries with hints for running parallel queries when not needed. Since these grab processors it can impact other services on the server.

      --
      -- I ignore anonymous replies to my comments and postings.
    7. Re:How does this compare with other databases? by jedidiah · · Score: 1

      > Is this similar to MySQL / MariaDB's shard query? Do the commercial databases have this?

      Since the beginning of time by comparison...

      --
      A Pirate and a Puritan look the same on a balance sheet.
    8. Re:How does this compare with other databases? by danda · · Score: 1

      I've used shard-query -- an early version of it. See the link I posted. It uses relational algebra to break a single query up into discrete parallel queries and execute them in parallel then combine the result set, on either a single server or multiple servers. It is able to do this outside the DB, and it could probably be ported to pgsql as well.

  8. This improvement a plus by kboodu · · Score: 2

    This will greatly improve the already impressive PostgreSQL database engine and help it compete against the more well-known Enterprise Relational Database Engines at a much better price point (free).

    1. Re:This improvement a plus by kimanaw · · Score: 1

      Hmmm, if so, then there are probably at least 6 other RDBMS's - mostly owned by deep pocket corps, and actually based on Pg - that ORCL could sue and actually hope to collect a check. Given Pg's open source nature, suing them would at best remove the code, but not contribute a single dime for that new dock on Larry's island.

      --
      007: "Who are you?"
      Pussy: "My name is Pussy Galore."
      007: "I must be dreaming..."
    2. Re: This improvement a plus by Anonymous Coward · · Score: 0

      My biggest issue with Postgres is the fact that data partitioned by month is often unoptimized for trivial queries like where you just want to find the first item before a given timestamp. If it doesn't do a table scan it'll at best do a bitmap image scan over every partition. Then it applies the limit 1. So stupid. It's faster to run a separate query for each partition, by a lot.

  9. Patent trolls in 3 2 1 by Billly+Gates · · Score: 0

    I believe Oracle owns using more than 1 cpu in a query if I am correct so their own rdbms looks faster. Maybe it is now being enabled if it expires or maybe not.

    I don't want to be sued using it

    1. Re:Patent trolls in 3 2 1 by sribe · · Score: 1

      I believe Oracle owns using more than 1 cpu in a query if I am correct...

      Well, Sybase had it well over 20 years ago...

    2. Re: Patent trolls in 3 2 1 by MemeRot · · Score: 1

      So it was baked into SQL Server since the beginning

    3. Re: Patent trolls in 3 2 1 by sribe · · Score: 1

      So it was baked into SQL Server since the beginning

      I believe so, but that was a long time ago. Could be my timeline is off and it was added after the MS purchase.

  10. Re:This is a troll. Respond appropriately. by Anonymous Coward · · Score: 0

    Your abuse of apostrophes is disgusting, and the charges probably won't hold up. What is the point of protecting 500 Lithuanians?

  11. New features every few months by rbrander · · Score: 5, Informative

    Just a few months back, lwn.net had a longish story on PostgreSQL. They were scoring a victory with the "UPSERT" command addition in 9.5, which with speed updates old records, OR inserts a new one, if none. A big feature on your commercial databases. Apparently, PostgreSQL's biggest worry lately is that it has so many developers adding cool new features that there's some resource lacks maintaining and cleaning the base code. (Possibly unfair oversimplification of lwn.net story.)

    I discovered PostgreSQL to get a free geodatabase for mapping, with the PostGIS plug-in...the open plug-in architecture being one of the greatest things about a FLOSS database. After nearly 25 years with Oracle and thinking everything else was a toy by comparison, PG blew me away. Amazing features, high performance, reliable. It's an amazing project, and this news is both impressive and unsurprising.

    1. Re:New features every few months by Kjella · · Score: 1

      This one on the other hand seems like baby steps:

      One rather enormous limitation of the current feature is that we only generate Gather nodes immediately on top of Parallel Seq Scan nodes. This means that this feature doesn't currently work for inheritance hierarchies (which are used to implement partitioned tables) because there would be an Append node in between. Nor is it possible to push a join down into the workers at present. (...) With things as they are, about the only case that benefits from this feature is a sequential scan of a table that cannot be index-accelerated but can be made faster by having multiple workers test the filter condition in parallel.

      No partitioning, no joins, right now the only thing you can speed up is a simple table scan where you can't/won't use an index. This is more "proof of concept" parallelism than a useful feature right now. I guess in a release or two this will be a big thing.

      --
      Live today, because you never know what tomorrow brings
    2. Re:New features every few months by guruevi · · Score: 2

      The biggest issue with parallelism is that a lot of stuff can't be parallelized in a way that makes sense. The way it is done (dispatching and gathering nodes) only makes sense if the query takes a really long time, otherwise there is a lot of overhead that destroys any type of speedup and could actually make everything else slower. Typically multi-threading in databases is done to speed up multiple independent queries, not a single query.

      --
      Custom electronics and digital signage for your business: www.evcircuits.com
    3. Re:New features every few months by phantomfive · · Score: 4, Insightful

      PostgreSQL has one of the best-commented and cleanest code-base I've ever reviewed. FWIW

      --
      "First they came for the slanderers and i said nothing."
    4. Re:New features every few months by Anonymous Coward · · Score: 0

      It is a big thing because there are plenty of sequential scans. If you use text searching with some wild cards (LIKE/ILIKE), or COUNT.

      The intention is to broaden the applicability, and PostgreSQL has a good track record of delivering, especially once a feature has its foot in the door.

    5. Re:New features every few months by Kjella · · Score: 1

      The biggest issue with parallelism is that a lot of stuff can't be parallelized in a way that makes sense. The way it is done (dispatching and gathering nodes) only makes sense if the query takes a really long time, otherwise there is a lot of overhead that destroys any type of speedup and could actually make everything else slower. Typically multi-threading in databases is done to speed up multiple independent queries, not a single query.

      Define "really long time", in the example he's running a query that takes less than a second and gets it down to <250ms. Sure, it's not useful for transaction processing but I got many queries running on millions of rows where anything from a minute to an hour is more like it and milliseconds are peanuts. I would think most people have at least some reports that would benefit.

      --
      Live today, because you never know what tomorrow brings
    6. Re:New features every few months by Anonymous Coward · · Score: 0

      Well, it's in 9.6, which is still a long way from release, as 9.5 is still in testing. It's possible they add more to this before it reaches release.

    7. Re:New features every few months by Anonymous Coward · · Score: 1

      It's like the FreeBSD of RDBMS!

    8. Re:New features every few months by short · · Score: 1

      SQLite already has that as "INSERT OR REPLACE", MySQL as "REPLACE INTO" etc.

    9. Re:New features every few months by Anonymous Coward · · Score: 0

      As a long-time PostgreSQL user/admin, I was really blown away when I got my first elbows-deep exposure to Oracle a couple years ago. Everything in its internals seemed so...archaic. Quaint, even. Manually setting memory allocation in the SGA and such, if you want it to perform? What is this, DOS 6.22? (I kid, but only a little...)

      Then I (and apparently a huge portion of the BI world, which has since started bailing on Oracle in favor of Hadoop for larger data sets) committed a HUGE sin, by Oracle's reckoning. I did a performance comparison. Identical ETL jobs populating identital tables (with identical indexes) with identical data, and identical queries (both pre-canned, and ad-hoc report queries from two BI tools) run against it. My Postgres DW had older, slower CPU cores, 1/8 the RAM (32GB vs. 256GB), 1/2 the spindles in the SAN (both connected to the SAN via fiber). Both servers/databases had bee n tuned for EDW performance, Postgres by myself, Oracle by a consultancy we were using, since management didn't want to send me to the fancy training courses. In every test, Postgres blew Oracle clean out of the water.

      I think the reason people still think of Postgres as a "toy" (you're not the only one that thought that - it's a pretty common misconception about opensource databases in general), is that there's no giant price tag, and no company to sue. In every discussion with the executives, and IT management of several financial firms, and the people at several BI conferences I attended, that seemed to be the consensus time and time again.

    10. Re:New features every few months by Anonymous Coward · · Score: 1

      Having worked with some of the core committers, and attended conferences (and dinner) with others, I'm not the least bit surprised. I consider myself a pretty competent developer, and every single one of them made me feel like a total rookie. Also, none of them had the chest thumping ego that you see from the heads of a lot of other opensource projects. It was refreshing.

    11. Re:New features every few months by Anonymous Coward · · Score: 0

      MySQL as "REPLACE INTO" etc.

      REPLACE INTO is terrible, because it deletes the old row first, and in doing so applies the ON DELETE actions of foreign keys pointing to the table you're working with - so if you have any ON DELETE CASCADEs, you'll delete a bunch of other data when all you meant to do was update a single value.

      The other similar MySQL functionality, INSERT INTO ... ON DUPLICATE KEY UPDATE ..., is also terrible, because if you have multiple unique indexes and the row you're trying to insert clashes with one existing row on one index and another row on the other, it's not deterministic which of the existing rows gets updated. The Postgres version specifically avoids this, by requiring that you specify which column(s) should be used to identify whether there's an existing row to update.

      In conclusion, MySQL is terrible, and anyone who advocates it should be shot.

    12. Re:New features every few months by Anonymous Coward · · Score: 0

      REPLACE INTO is terrible, because it deletes the old row first, and in doing so applies the ON DELETE actions of foreign keys pointing to the table you're working with - so if you have any ON DELETE CASCADEs, you'll delete a bunch of other data when all you meant to do was update a single value.

      I forgot to add - this probably doesn't matter to the typical MySQL fanboy, because they don't know what foreign keys are. Anyone who's in any way serious about databases should avoid it like the plague, though.

      The other similar MySQL functionality, INSERT INTO ... ON DUPLICATE KEY UPDATE ..., is also terrible, because if you have multiple unique indexes and the row you're trying to insert clashes with one existing row on one index and another row on the other, it's not deterministic which of the existing rows gets updated.

      Similarly, MySQL users are probably just happy that they made the database do something without getting an error back - who cares if it's randomly corrupting a handful of records here and there?

    13. Re:New features every few months by Anonymous Coward · · Score: 0

      And this right here is a the problem I have with PostgreSQL. Its a great database and I use it daily, but its fan boys/stealth marketers are rabid, stupid and the continuous Mysql is teh suck foaming at the mouth is getting old. Btw. you are complaining about a feature that Mysql has had for years and years and the people who are using it ahappy with it.

      Every database has its own quirks. If you are a professional you understand its limitations and design around it, and that includes PostgreSQL which can be the poster child for quirky at times.

    14. Re:New features every few months by Anonymous Coward · · Score: 0

      the continuous Mysql is teh suck foaming at the mouth is getting old.

      I quite agree - I have to work with the fucking thing, and the continuous suckiness is getting very old indeed.

      Btw. you are complaining about a feature that Mysql has had for years and years and the people who are using it ahappy with it.

      Not sure which feature you're referring to, since I was complaining about two, but either way you just proved my point - MySQL users are too stupid to realise that both those "features" are blatant data corruption hazards.

      Every database has its own quirks. If you are a professional you understand its limitations and design around it

      In MySQL's case, designing around the limitations means not using it.

  12. I miss pgsql by pi_rules · · Score: 1

    I only ever got to develop a single project on pgsql and I regret that. This was back in 2001. MySQL was pretty immature at the time but had the enormous install base. I went with PostgreSQL because it was more mature. It never let me down. The deployment went fine, it ran great, customer used it on and off for about 6 years and then it was just no longer needed.

    Fast forward to 2011, ten years later, and now I'm running the show and developing a point of sale for the family business I'm in and I run with MySQL just because I'm so much more familiar with it. That and the master-master replication suite from Percona. MySQL is way better than it was in 2001 but still lacking here and there. I still question that choice. I will eventually migrate off MySQL but I don't know if it'll be MariaDB or PostgreSQL.

    1. Re:I miss pgsql by mysidia · · Score: 1, Interesting

      That and the master-master replication suite from Percona.

      I think that is a good reason to pick MySQL.

      As much as I like Postgres..... it seems to be a heck of a lot easier to do replication with MySQL and put together a highly-survivable system.

      I'm not even sure how to even start to go about doing it with Postgres.... although in the past; I have had a Cold/Warm standby Postgres with Slony-L based replication; It was quite frankly, a PITA.

    2. Re:I miss pgsql by Anonymous Coward · · Score: 0

      Check out MySQL 5.7. Not exactly the old MySQL... (although no multicore query processing yet)

    3. Re:I miss pgsql by theshowmecanuck · · Score: 1

      I agree. It is the biggest issue for PostgreSQL. A straight forward native high availability solution should be one of the highest priorities if not the highest.

      --
      -- I ignore anonymous replies to my comments and postings.
    4. Re:I miss pgsql by OverlordQ · · Score: 1

      > I will eventually migrate off MySQL but I don't know if it'll be MariaDB

      So from MySQL to still MySQL?

      --
      Your hair look like poop, Bob! - Wanker.
    5. Re:I miss pgsql by OverlordQ · · Score: 1

      Pg replication systems are dime a dozen, pick one, and implement it.

      --
      Your hair look like poop, Bob! - Wanker.
    6. Re:I miss pgsql by mcrbids · · Score: 1, Insightful

      ... and the replication systems are typically not worth much more than a dime, sadly.

      We have a pretty beefy set up; 4x 16 Core Xeon DB servers with 128 GB of RAM each and Enterprise SSDs, serving hundreds of instances of like-schema databases, one per (organizational) customer, serving an aggregate peak of about 1,000 queries/second in a mixed read/write load.

      And we've never been able to get replication to work reliably, ever. In every case we've ever tried, we've seen a net reduction in reliability. Every single time. Not that we've stopped trying, it has just never reached "just works" territory.

      Replication is PG's Achilles's heel.

      --
      I have no problem with your religion until you decide it's reason to deprive others of the truth.
    7. Re:I miss pgsql by nullchar · · Score: 2

      The native solution (streaming replication with many options) has been production stable since 9.1 and only increased in features, speed, and reliability.

    8. Re:I miss pgsql by nullchar · · Score: 5, Informative

      MySQL/MariaDB are still toys in comparison to PostgreSQL.

      Postgres has recursive CTEs, regex replacement, native JSON support (as a record type and trivially convert every query return type), and even base64 decoding and xpath parsing.

      MySQL has had some nice features for years, like REPLACE, but since the 9.x branch, CTEs can do that and more. And now PG has UPSERT for simplicity. Replication has always been great with MySQL, but PG's replication is now easy to administer. I've relied on Postgres' streaming replication since 9.1 in production and it's been great for years.

    9. Re:I miss pgsql by nullchar · · Score: 1

      Even with streaming replication in 9.3 onward? And tuning it? This a single instance with multiple schemas? Or separate instances?

    10. Re:I miss pgsql by roman_mir · · Score: 0

      Send me an email, I have a retail chain management solution, a single store management system as part of it, supply chain management, e-commerce built into brick and mortar management system at this point even auto ordering. Java and PostgreSQL.

    11. Re:I miss pgsql by mysidia · · Score: 1, Informative

      MySQL has had some nice features for years, like REPLACE, but since the 9.x branch

      The features MySQL has are good enough for 99% of real-world web applications.

      Yes, Postgres has more, but the extra features it has don't necessarily add much value for most programs.

      MySQL multi-master replication features are immensely valuable by comparison, and Postgres lacking them has prevented me from using Postgres, more than once.

    12. Re:I miss pgsql by dskoll · · Score: 1

      You're doing something wrong, then. Streaming replication works fine for us on a fairly similarly-sized setup.

    13. Re:I miss pgsql by jedidiah · · Score: 1

      > The features MySQL has are good enough for 99% of real-world web applications.

      It doesn't even have complete SQL support.

      While that might not matter for a trivial web application. For the more interesting ones that include non-trivial development teams, that will cause problems.

      --
      A Pirate and a Puritan look the same on a balance sheet.
    14. Re:I miss pgsql by Anonymous Coward · · Score: 0

      Look up Postgres BDR. This is a very capable (and dirt simple easy to set up) multi-master version of PG. It took me about 15 minutes to set up a cluster using 3 VM's.

    15. Re:I miss pgsql by Anonymous Coward · · Score: 0

      Check out Postgres BDR. This is a very capable multi-master setup for Postgres, it is open source (but commercially supported).

    16. Re:I miss pgsql by Anonymous Coward · · Score: 0

      Postgres HAS multi-master replication, via Bucardo. It's not part of the core codebase, but it's written by Greg Sabino Mullane, one of the developers of Postgres. You should give it a look. If that's all that keeps you from considering Postgres, the performance demands you must have to necessitate multi-master replication would be far better served by the increased performance of Postgres over MySQL.

    17. Re:I miss pgsql by Anonymous Coward · · Score: 0

      From the FAQ:

      Can Bucardo replicate DDL?
      No, Bucardo relies on triggers, and Postgres does not yet provide DDL triggers or triggers on its system tables.

      Kind of a deal breaker. I want the database schemas of my nodes to keep in sync, no matter what, or bad shit happens.
      To be a true cluster, any operations on the database should be fully transparant.
      I love Postgres, but it sorely lacks something like Percona.

    18. Re:I miss pgsql by Anonymous Coward · · Score: 0

      it seems to be a heck of a lot easier to do replication with MySQL and put together a highly-survivable system.

      Have you seen the massive list of caveats for MySQL replication? If that doesn't make you run for the hills, I don't know what will. But then, the typical MySQL user never reads the documentation beyond the bare minimum to make something happen, then calls it a day.

    19. Re:I miss pgsql by Anonymous Coward · · Score: 0

      And those critical missing features are?

      Mysql has usually had more features than postgres which has usually been more focused on stability. Or at least until the alpha quality code in replication which has only gotten stable in the last release or so.

    20. Re:I miss pgsql by rhodium_mir · · Score: 1

      How many grams of that righteous mineral rhodium do you charge?

      --
      You can't spell "oneiromancy" without "roman".
    21. Re:I miss pgsql by Anonymous Coward · · Score: 0

      Mysql has usually had more features than postgres

      Just give up now.

    22. Re:I miss pgsql by nullchar · · Score: 1

      Yes, Postgres has more, but the extra features it has don't necessarily add much value for most programs.
      MySQL multi-master replication features are immensely valuable

      I guess it depends on your programs. It does take effort to maintain two or more connections (one for write, the others for reads).

      If your use case needs balanced multi-master replication, but simple features, you should use a NoSQL solution.

      Postgres is amazing for reporting where you can bring anything you want in a single query, including JSON output (without using plsql procedures), and bulk updates are fantastic, with CTEs for selects and layers of them for updates/deletes, especially great with regex_replace and windowing functions. (Not even MS SQLServer has regex replace! Nor does it have a query planner to guide you through optimization of giant queries.)

      If your never directly touch the database outside of basic application code, then don't use an SQL solution! The benefit of using Structured Query Language is for human interaction. One-off reports can be tuned and converted to simple tools or repeatable cron jobs that use single queries to extract any data you want.

      If you are using MySQL, you're doing it wrong.

  13. Greenplum's contributions? by Anonymous Coward · · Score: 0

    Didn't Greenplum fork PostGres in order to create a 'shared nothing' version with parallel queries? And isn't that now open source? So maybe they could integrate that code?

    1. Re:Greenplum's contributions? by nullchar · · Score: 1

      Interesting, it appears Greenplum has recently been open sourced.

  14. Re:This is a troll. Respond appropriately. by phantomfive · · Score: 1

    However, I imagine that a complex or "mathy" equation in a query could make it be CPU intensive. But maybe that's relatively rare.

    Good point.

    --
    "First they came for the slanderers and i said nothing."
  15. Re:This is a troll. Respond appropriately. by Anonymous Coward · · Score: 0

    This is clearly a grossly contrived, impossible scenario. Anyone who has spent more anything resembling a reasonable amount of time with Lithuanians will testify to their admiration for real talent and equal disdain for ineptitude. The mother in question would never have made it past her tenth customer without demonstrating strong aptitude for the work in question; had her goods proven lacking, she would have been disposed of in short order, most likely in a fairly public manner to serve as a warning to others who might similarly waste the time and energy of valued customers. Clearly, the parent is no Lithuanian, nor knows any, and is therefore nothing more than a nasty little bitch who will eventually tell such tall tales to the wrong audience; bye-bye, bitch-boi.

  16. Fuck Yeah. PostgreSQL FTW by nullchar · · Score: 1

    I've been using Postgres for well over a decade now, and I still love it. Yes, you have to tune it, like any powerful tool.

    Granted this first pass is only for sequential scans, but those are the simplest to parallelize and generally the slowest. Some queries rely on table scans as not every column can be indexed.

    Postgres' growing feature set is amazing. Thanks team!

  17. Re:This is a troll. Respond appropriately. by davester666 · · Score: 2

    particularly in parallel.

    --
    Sleep your way to a whiter smile...date a dentist!
  18. Re:This is a troll. Respond appropriately. by Anonymous Coward · · Score: 0

    Goatse has plenty of bandwidth

  19. mongoDB by Hognoxious · · Score: 0

    It will be interesting to see how it compares with MongoDB. MongoDB does not use joins. MongoDB is webscale.

    --
    Confucius say, "Find worm in apple - bad. Find half a worm - worse."
    1. Re:mongoDB by Anonymous Coward · · Score: 0

      Obvious dirty SEO spammer spamming dirty SEO spam is obviously a dirty SEO spammer spamming dirty SEO spam.

  20. I hope this doesn't compromise overall performance by Chrisq · · Score: 1

    This optimisation caters for a niche (admittedly a relatively large one) where there are relatively few queries but large ones. A more typical usage is where there are many smaller queries. I hope that this does not compromise the total throughput, so that the total parallelisation of multiple concurrent queries is not slowed to allow parallelisation withing individual queries. Either that or it should be a switchable option.

  21. Sounds like Terradata by tomhath · · Score: 1

    A leader in parallel query processing is Terradata, it uses custom hardware for massive parallelism. Will be interesting to see if Postgresql can scale on commodity/cloud hardware.

  22. Re:I hope this doesn't compromise overall performa by rhaas · · Score: 2

    As the fine blog post explains, it is a switchable option. You can set max_parallel_degree=0 to turn it off. Actually, right now, it's off by default, and you have to set max_parallel_degree>0 to turn it on.

  23. Re:I hope this doesn't compromise overall performa by jedidiah · · Score: 1

    I wouldn't call it a "niche" exactly. It's one of the major main use cases for employing an RDBMS.

    We just have a lot of "database people" with very limited experience and a limited mindset.

    --
    A Pirate and a Puritan look the same on a balance sheet.
  24. Schooled by CitusDB by Anonymous Coward · · Score: 0

    While I think PostgreSQL is a fantastic database, I find it perplexing that the CitusDB guys can come out of nowhere and make Postgres perform parallel operations, either on a single node or in a cluster, and get MUCH better results than the core developers can achieve. All without making *any* changes to the core. It's mind-boggling.

    1. Re:Schooled by CitusDB by Anonymous Coward · · Score: 0

      I think you're confusing distributed queries (which requires sharding) with parallel query (which doesn't depend on sharding).

      And EnterpriseDB came out with GridSQL for Postgres over 8 years ago which distributed queries, and Postgres-XC had developed this functionality previously too, and both were fully open sourced.

  25. Re:I hope this doesn't compromise overall performa by kupan787 · · Score: 1

    A more typical usage is where there are many smaller queries.

    Typical usage of what? If I have an OLTP system, for a transactional web based system, sure i'd agree. But if I am operating a data warehouse with fact tables housing hundreds of millions of rows, or trying to run largish reports on top of my OLTP system (say for state/fed reporting, or financial reporting), my "typical usage" is not many smaller queries.

    The last two projects I have been on, turning on auto-parrallism in Oracle has made huge performance gains. Not just for reads, but also when enabled for DML, we saw a lot of our bulk inserts and merges ran significantly faster. And this was on SQL that didn't change at all, just our DBAs making a few parameter changes on the DB.

  26. Working on similar feature by DidgetMaster · · Score: 1

    I am building an object store where some of my data objects can each be a key-value store that is used as a column in a relational table. Some queries against a table require a full scan (e.g. SELECT * from my_table where address like '%Main Street%';). If I had a table with a billion rows, it can take awhile to scan the whole address column looking for matches (I dedup the values in each column, but there can still be 100 million unique address values in such a table.) The solution is to break each column into multiple segments and let separate threads scan each segment looking for matches. The scan can occur in parallel on multi-core machines and complete in a much quicker manner than forcing a single thread to scan the whole thing. It sounds very similar to what they are trying to do with PostgreSQL (except that database is row based where the whole row is stored together, instead of a columnar database like mine). Here are two short demo videos of the system in action. https://www.youtube.com/watch?... https://www.youtube.com/watch?...

  27. binlog servers by Anonymous Coward · · Score: 0

    Abstracting Binlog Servers and MySQL Master Promotion without Reconfiguring all Slaves
    http://blog.booking.com/abstracting_binlog_servers_and_mysql_master_promotion_wo_reconfiguring_slaves.html