PostgreSQL 9.1 Released
With his first posted submission, sega_sai writes "The new version of open source database PostgreSQL was released. This version provides many important and interesting features such as synchronous replication, serializable snapshot isolation, support of per-column collations, K-nearest neighbor indexing, foreign data wrappers, support of SELinux permission controls and many others. The complete list of changes is available here"
I'm kinda shocked. This used to be a nice neighborhood.
So...how does PostgreSQL compete with Oracle nowadays as far as features go (specifically, spatial, and data-guard-like replication)? Anybody here tried making the switch?
If you wish to use their replication implementation to increase performance, you will probably have to look elsewhere, I'm afraid. In the event that your primary server fails, you are required to promote one of the existing slaves to be the new primary server and all other slaves will require a fresh data dump from the new master. Maybe in another year (when 9.2 gets released) it will be ready for the masses.
I've tried a few SQL databases, and my favorite of them all is Postgres. With Postgres I don't get nearly as many syntax/logic surprises as with SQL Server.. and I don't get nearly as many performance surprises as with MySQL.
I mean for some reason Microsoft couldn't give us a LIMIT/OFFSET clauses for the trivial operation of implementing paging leaving us instead to do sub-queries. What the hell? That's like making a programming language with arrays but leaving out the indexing operator.
I am not excited about any of these changes unfortunately, they are somewhat specialized, though having synchronous replication and serializable transaction isolation sounds more useful than other stuff.
But there are real things that are missing. Most obvious is distributing of one SQL request into parallel processes or threads to speed up query execution on multi-core systems (which are all multi-core today). The other is the entire issue of attempting to calculate execution time and failing in various cases in the planner, like the really sad cases of completely mis-handling of the mergejoin estimates, which then forces people to set enable_mergejoin to false unfortunately, it's a sledgehammer approach, but otherwise things that can execute in a few milliseconds can take tens of seconds and even minutes instead.
There are so many ways to improve performance and really kick it up, and instead there are more features added. I think database performance is now more important for PostgreSQL than features (unless this means introducing parallelization of single SQL requests.)
Otherwise it's a good database, it already provides tons of features. The one weird thing that I find though, is that for replication or hot stand by or just for creating a dynamic backup, the segments that are written to the disk are always of fixed size.
You can modify the size, which is 16MB by default, but you can only modify the size when you configure the source code before compiling it: configure --with-wal-segsize=1 - this configures the segments to 1MB, which allows the second drive to last that much longer if all you are doing is using a second drive to keep dynamic backup (and that asynchronous backup method, by the way, the problem that they are solving with "synchronous replication", it's that you either have these segments fill up, and then the segment is written to disk, or you wait until time expires for segment to be written to disk if you set checkpoint_timeout). I imagine treating fixed sized segments is easier than generating segments that are of exact size equal to amount of data that was produced in a time period, but it's a waste of disk though.
The other big thing that I would love to have in a database is ability to scale the database to multiple machines, so have a logical database span multiple disks on multiple machines, have multiple postgres processes running against those multiple disks, but have it all as one scalable database in a way that's transparent to the application. That would be some sort of a breakthrough (SAN or not).
You can't handle the truth.
Just one small nitpick...sqlite is really meant as an embedded database into an application, it's not a full-fledged database like any of the others mentioned (it doesn't have networking, for example). I suppose you could be scaling up from an embedded sqlite db, but that suggests your application has gotten so big that an external database is necessary.
It's also one of the backing store options for Apple's Core Data framework.
The odds of this happening are NULL...
"I say we take off, nuke the site from orbit. It's the only way to be sure."
Already we can tell you are doomed to failure. You are writing a script while ignoring the fact that this functionality already exists. Just cat /dev/random into your db.
How about an installation that doesn't suck?
spectateswamp, is that you? :)
A successful API design takes a mixture of software design and pedagogy.
But I run Debian Stable on all my servers.
Insensitive clod!
I work with Postgres. I'm "for" Postgres. I think it's great. But you know what? I barely understand what most of these features are. And of the ones I understand, I have no plans of using.
Hate to be a troll, but databases are boring. And Postgres is boring. The database people I know are boring. The guy I know with the most Postgres knowledge I know, is someone I respect as a worker, but is also the most passionless person I have ever met. No one wants an exciting data storage product. They want it to be "mature" and "reliable" and "predictable." I bet that the better Postgres gets, the more boring it'll get. And these exciting "noSQL" databases will all die.
Democracy Now! - your daily, uncensored, corporate-free
I've thought about this problem, and a NoSQL type structure would actually work better. There would simply be too many columns for a relational model to work well.
I are already ahead of you on that one, friend haha. I am try writting for SimpleDB in Erlang now. I hope to execute very soon yes. Very soon we have massively columnar database stretching toward infinity haha!
Thank you all for being so wonderful, slarshdot!
The other big thing that I would love to have in a database is ability to scale the database to multiple machines, so have a logical database span multiple disks on multiple machines, have multiple postgres processes running against those multiple disk
This exists for Postgres in the form of Yale's HadoopDB project: http://db.cs.yale.edu/hadoopdb/hadoopdb.html http://radar.oreilly.com/2009/07/hadoopdb-an-open-source-parallel-database.html
as well as for commercial forks of Postgres such as EMC's GreenPlum.
The only feature that doesn't already exist in ms sql in the same context, probably oracle and mysql too.
I can almost... deploy psql, do what I need to in it and spit back out sql or something. This seems like a really cool feature if it goes the way of OLEDB and becomes a standard, since with oledb I can use any programming language that talks to databases (php, .NET, C) and still talk to my sql instance.
What I can't think of is... why would I ever use psql over taking the straighter ms sql approach?
I'd love to wrap mysql data in ms sql and vice versa in a reliable manner, and better yet wrap sql 2000 in 2008 to prevent the coalescence goodness, but at least the feature exists somewhere.
MGRID is also a PostgreSQL fork that scales to multiple machines: http://www.mgrid.net/
I would love to see a column storage engine for PostgreSQL. Vertica was forked from PostgreSQL 8.1.1 and has phenomenal storage capabilities that directly translate to performance.
Is something like this anywhere on the horizon for the mainline PostgreSQL? Everyone would use it...
There is no "useless" crap being introduced here, these are higher end features that many need and have been waitnig. Just because you as say a small business or hobbyist or startup venture need a small set of database features, remember there are others who are needing bigger features, and as your needs grow over time so you may need these things too.
You can stick with a particular postgresl version of a distribution that is still getting fixes and patches, for example the postgresql 8.1.9 in Redhat / Centos / Scientific Linux 5.x
Meanwhile, there are applications in big business that need and want the synchronous replication. There are geolocative applications needing nearest neighbor indexing. There are applications that handle multiple character sets that need multiple collations of table data.
I myself have clients that need synchronous replication with a couple stand-by servers, we've been making do with an inferior "bandage application" approach to do the same thing.