PostgreSQL 9.1 Released
With his first posted submission, sega_sai writes "The new version of open source database PostgreSQL was released. This version provides many important and interesting features such as synchronous replication, serializable snapshot isolation, support of per-column collations, K-nearest neighbor indexing, foreign data wrappers, support of SELinux permission controls and many others. The complete list of changes is available here"
So...how does PostgreSQL compete with Oracle nowadays as far as features go (specifically, spatial, and data-guard-like replication)? Anybody here tried making the switch?
Unfortunately, I am not excited about any of these changes; they are somewhat specialized, though synchronous replication and serializable transaction isolation sound more useful than the other stuff.
But there are real things that are missing. The most obvious is distributing one SQL request across parallel processes or threads to speed up query execution on multi-core systems (which is every system today). The other is the planner's attempt to estimate execution cost, which fails in various cases, like the really sad mishandling of the mergejoin estimates. That forces people to set enable_mergejoin to false, which is a sledgehammer approach, but otherwise queries that can execute in a few milliseconds can take tens of seconds or even minutes instead.
There are so many ways to improve performance and really kick it up, and instead there are more features added. I think database performance is now more important for PostgreSQL than features (unless this means introducing parallelization of single SQL requests.)
Otherwise it's a good database; it already provides tons of features. The one weird thing I find, though, is that for replication, hot standby, or just creating a dynamic backup, the segments that are written to disk are always of a fixed size.
You can modify the size, which is 16MB by default, but only when you configure the source code before compiling it: configure --with-wal-segsize=1 sets the segments to 1MB. That lets the second drive last that much longer if all you are doing is using a second drive to keep a dynamic backup. (That asynchronous backup method is, by the way, the problem they are solving with "synchronous replication": either a segment fills up and is then archived, or, if you set archive_timeout, you wait until the timeout expires and the segment is archived anyway.) I imagine handling fixed-size segments is easier than generating segments exactly as large as the data produced in a time period, but it's a waste of disk.
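To put rough numbers on the disk waste, here is a back-of-the-envelope sketch. It assumes a lightly loaded server that writes a little in every interval, so each timeout-forced switch archives one whole fixed-size segment; the 60-second timeout and the function name are made up for illustration, not taken from any real setup.

```python
# Rough arithmetic only: how much archive space fixed-size WAL segments use
# on a lightly loaded server when a segment switch is forced on a timer.
# The 60-second timeout is an assumed example value, not a recommendation.

SECONDS_PER_DAY = 24 * 60 * 60


def archived_mb_per_day(segment_mb, switch_timeout_s):
    """MB archived per day if every forced switch ships one whole segment."""
    switches_per_day = SECONDS_PER_DAY // switch_timeout_s
    return switches_per_day * segment_mb


default_build = archived_mb_per_day(16, 60)  # stock 16MB segments
small_build = archived_mb_per_day(1, 60)     # built with --with-wal-segsize=1

print(default_build, "MB/day with 16MB segments")
print(small_build, "MB/day with 1MB segments")
```

Under those assumptions the stock build archives 23040 MB a day against 1440 MB for the 1MB build, no matter how little data actually changed. A busy server that fills segments before the timeout would see a much smaller gap.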
The other big thing that I would love to have in a database is the ability to scale the database to multiple machines: have a logical database span multiple disks on multiple machines, with multiple postgres processes running against those disks, but have it all behave as one scalable database in a way that's transparent to the application. That would be some sort of a breakthrough (SAN or not).
You can't handle the truth.
Just one small nitpick... SQLite is really meant as a database embedded in an application; it's not a full-fledged database server like the others mentioned (it doesn't have networking, for example). I suppose you could be scaling up from an embedded SQLite db, but that suggests your application has gotten so big that an external database is necessary.
It's also one of the backing store options for Apple's Core Data framework.
It's not trivial to figure out, but we've been deploying PostgreSQL 9.0 without the problem you describe (must do a fresh dump from the master) for a while now. The repmgr software we've released takes care of all the promotion trivia. In the worst case, unusual situations can require you to use a tool like rsync to turn an out-of-date standby node into a copy of the new master. That's not the expected case though.
What's your point?
Are you going to tell me word processors are boring, spreadsheets put you to sleep, and calculators suck the life out of the party?
For large sets, this will be our guide even unto death, for the LORD will work for each type of data it is applied to...
The other big thing that I would love to have in a database is ability to scale the database to multiple machines, so have a logical database span multiple disks on multiple machines, have multiple postgres processes running against those multiple disk
This exists for Postgres in the form of Yale's HadoopDB project: http://db.cs.yale.edu/hadoopdb/hadoopdb.html http://radar.oreilly.com/2009/07/hadoopdb-an-open-source-parallel-database.html
as well as for commercial forks of Postgres such as EMC's GreenPlum.
It's not their fault they are completely normalized.
You failed to trigger the proper result.
Every man's island needs an ocean; choose your ocean carefully.