PostgreSQL 9.1 Released

Posted by Soulskill on Monday September 12, 2011 @07:10AM from the new-and-shiny dept.

With his first posted submission, sega_sai writes "The new version of open source database PostgreSQL was released. This version provides many important and interesting features such as synchronous replication, serializable snapshot isolation, support of per-column collations, K-nearest neighbor indexing, foreign data wrappers, support of SELinux permission controls and many others. The complete list of changes is available here"

vs Oracle? by pak9rabid · 2011-09-12 07:19 · Score: 4, Interesting

So...how does PostgreSQL compete with Oracle nowadays as far as features go (specifically, spatial, and data-guard-like replication)? Anybody here tried making the switch?
1. Re:vs Oracle? by greg1104 · 2011-09-12 08:28 · Score: 5, Interesting
  
  Lots of companies are converting from Oracle Spatial to PostgreSQL plus PostGIS because it's faster and complies better with GIS standards. The text of the talk isn't available, but the FAA Airports GIS and PostgreSQL presentation, which I was in the audience for, was a typical report. The FAA's first conversion happened very quickly: just export their data in a standard format, import into PostgreSQL, and tweak some queries. The result worked so much better that they've standardized on PostgreSQL for spatial applications at the FAA now. Internal projects needing a spatial database have to justify why they want the budget for Oracle Spatial, and it's default deny unless you have a really good reason.
  The addition of synchronous replication to 9.1 has made it a pretty even match for Oracle's Data Guard now. The main bonus is that you can control the integrity level you want at the transaction level. So you can have a database with a mix of important data (only considered safe when on two nodes) and fast, best-effort, eventually consistent data, all in one place. Nothing else can replace Oracle at the top end while still having a lean enough mode to be competitive with NoSQL databases when integrity isn't the top priority.
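  The per-transaction control described above can be sketched roughly as follows. This is a minimal illustration, not a definitive recipe: it only builds the SQL text, the helper `transaction_script` and the tables `payments` and `page_views` are hypothetical, and it assumes the server already has a synchronous standby configured (via `synchronous_standby_names`) so that `synchronous_commit = on` actually waits for the second node.

```python
# Sketch: choose the durability level per transaction in PostgreSQL 9.1.
# The helper and the table names are hypothetical, for illustration only.

VALID_LEVELS = {"on", "local", "off"}  # synchronous_commit values in 9.1


def transaction_script(statements, synchronous_commit):
    """Return the SQL for one transaction with its own durability level."""
    if synchronous_commit not in VALID_LEVELS:
        raise ValueError(f"unsupported synchronous_commit: {synchronous_commit!r}")
    return [
        "BEGIN;",
        # SET LOCAL lasts only until the end of the current transaction.
        f"SET LOCAL synchronous_commit TO {synchronous_commit};",
        *statements,
        "COMMIT;",
    ]


# Important data: the commit waits for the synchronous standby to confirm.
payment = transaction_script(
    ["INSERT INTO payments (account_id, amount) VALUES (42, 19.99);"], "on"
)

# Best-effort data: the commit returns without waiting for the WAL flush.
click = transaction_script(
    ["INSERT INTO page_views (url) VALUES ('/home');"], "off"
)

print("\n".join(payment + click))
```

  Both transactions run against the same database; only the `SET LOCAL` line differs, which is what keeps the important and the best-effort data in one place.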
  We convert Oracle installs to PostgreSQL all the time at my day job. The main obstacles I keep seeing that don't have simple solutions are 1) heavy use of PL/SQL, 2) differences in query handling, such as OUTER JOIN behavior or reliance on optimizer hints, and 3) the inability to easily limit the resources used by individual users in PostgreSQL yet. I actually have a design outline for how to solve (3)...it would only cost a fraction of a typical Oracle license to sponsor that feature. EnterpriseDB's version of PostgreSQL comes with PL/SQL compatibility, but only in a commercial product that lags behind the open-source releases--and buying from them just switches which vendor you're locked into.
not excited by roman_mir · 2011-09-12 07:47 · Score: 4, Interesting

I am not excited about any of these changes, unfortunately. They are somewhat specialized, though synchronous replication and serializable transaction isolation sound more useful than the other stuff.
But there are real things that are missing. The most obvious is distributing one SQL request across parallel processes or threads to speed up query execution on multi-core systems (and all systems are multi-core today). The other is the planner's attempts to estimate execution cost, which fail in various cases, like the really sad cases of completely mishandled mergejoin estimates. Those force people to set enable_mergejoin to false. Unfortunately that's a sledgehammer approach, but otherwise things that can execute in a few milliseconds can take tens of seconds or even minutes instead.
There are so many ways to improve performance and really kick it up, and instead there are more features added. I think database performance is now more important for PostgreSQL than features (unless this means introducing parallelization of single SQL requests.)
Otherwise it's a good database, and it already provides tons of features. The one weird thing that I find, though, is that for replication or hot standby or just for creating a dynamic backup, the segments that are written to the disk are always of fixed size.
You can modify the size, which is 16MB by default, but only when you configure the source code before compiling it: configure --with-wal-segsize=1 sets the segments to 1MB, which lets the second drive last that much longer if all you are doing is using a second drive to keep a dynamic backup. (That asynchronous backup method is, by the way, the problem they are solving with "synchronous replication": either a segment fills up and is then copied to the backup drive, or, if you set archive_timeout, you wait until the timeout expires and the segment is copied anyway.) I imagine handling fixed-size segments is easier than generating segments exactly as large as the data produced in a time period, but it's a waste of disk.
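To put rough numbers on the disk cost, here is a small back-of-the-envelope sketch. It assumes the timeout in question is PostgreSQL's archive_timeout setting, that the server is lightly loaded so every segment switch is triggered by the timeout rather than by the segment filling up, and that a segment closed early is still archived at its full fixed size. The 60-second timeout and the function name are illustrative assumptions, not values from the post.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def archive_mb_per_day(segment_mb, archive_timeout_s):
    """MB archived per day when every segment switch is forced by the timeout."""
    if segment_mb <= 0 or archive_timeout_s <= 0:
        raise ValueError("segment size and timeout must be positive")
    switches_per_day = SECONDS_PER_DAY // archive_timeout_s
    # A segment closed early is still archived at its full fixed size.
    return switches_per_day * segment_mb


# Compare the default 16MB segments with the 1MB build-time setting.
for segment_mb in (16, 1):
    print(f"{segment_mb:>2} MB segments, 60 s timeout: "
          f"{archive_mb_per_day(segment_mb, 60)} MB/day")
```

Under those assumptions a 60-second timeout archives 23040 MB a day with 16MB segments and 1440 MB a day with 1MB segments, however little data was actually written.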
The other big thing that I would love to have in a database is the ability to scale it across multiple machines: have a logical database span multiple disks on multiple machines, with multiple postgres processes running against those disks, but have it all behave as one scalable database in a way that's transparent to the application. That would be some sort of a breakthrough (SAN or not).

--
You can't handle the truth.
1. Re:not excited by fuzzytv · 2011-09-12 10:13 · Score: 3, Interesting
  
  Cassandra is just one of many NoSQL databases, but yes - NoSQL can be an answer to work around the CAP theorem in some cases.
  But in many cases it's not a solution. If the data are relational, if you need full ACID, etc., then ditching "consistency" is not an option. There are projects to build PostgreSQL clustering solutions that may resemble RAC a bit, although none of them uses a shared disk (so each instance needs a separate disk). Let's mention PGCluster, PGCluster II or Postgres-XC (which aims to build a write-scalable cluster, something Cassandra does in the NoSQL world). Sure, all this has to follow the CAP theorem.
2. Re:not excited by fuzzytv · 2011-09-12 10:24 · Score: 5, Interesting
  
  Reliability has probably improved since ENIAC, but the question still is "when is it going to fail" and not "if". Because it is going to fail - it may be a drive, CPU, PSU, a network switch, an AC unit, the whole AWS data center ... something is going to fail.
  The beauty of the CAP theorem, as I see it, is that it says "You can't get all three at the same time, face it." If you don't need strong consistency (and with most apps you don't), then ditch it and it'll be much easier and cheaper to build and scale the system. I'd say once you realize this inner beauty, it clears your mind - something like a Zen of distributed computing.