PostgreSQL Wins LJ Editor's Choice Award
Quickfoot writes "PostgreSQL has won the LJ Editor's choice award for database servers the second year in a row and three times total (2000, 2003 and 2004). With the upcoming features in version 8.0 PostgreSQL is posed to do even better in 2005."
My angst drives me to new lows.
My happiness is simply more to suffer thru.
Current Mood: depressed.
Current database server: PostgreSQL
Listening to: Linkin Park
Oh, Linux Journal.
-- "I'm not a religious man, but if you're up there, save me Superman..."
PostgreSQL has for a long time been an excellent choice of database with full transaction support, sequences, trigers, unicode and much much more. I personally prefer working with it to expensive alternatives such as Oracle and DB2.
The PGAdmin3 tool also allows for those scared/lazy for the command line to use interogate it's schemas.
What I'm not too sure about is where any company is providing support on a level one would get from Oracle or IBM. Oracle's support is actually extremely good (in my experience). Anyone point to any companies ready to provide PostgreSQL support?
----
It looks like they are ( have been? ) quite big fans of MySQL in the past. Netcraft shows that a couple of years ago ( 4th Aug 2002 ) that their webserver was running with Mod Auth MySQL.
Apache/1.3.26 (Unix) PHP/4.1.2 mod_ssl/2.8.10 OpenSSL/0.9.6e AuthMySQL/2.20
Get your own free personal location tracker
Comment removed based on user account deletion
"I love PostgreSQL!"
Sadly, since their affections are misplaced, this is the last generation of PostgreSQL lovers.
"Derp de derp."
The thing I like about it is that they're not scared of going beyond the basic SQL [tablizer-mode:pseudo]relational model, with their business rules, advanced user-defined-types, and table inheritance. And MVCC is amazing!
I do wish there were two things:
(1) A Feature: multimaster asynchronous merge replication. This is Very Hard(tm) (and bogged down by patents in the corporate reich of america), but would make postgres a contender for distributed database applications.
(2) A model extension: distributed foreign keys. I think table inheritance isn't general enough. postgres has demonstrated that they're not afraid of going beyond the relational model. Well, I want a foreign key that can point to columns in two or more tables, and have a value from one OR another, yet still have its constraints enforced - why can't I specify e.g. A(X) REFERENCES C(Y) OR D(Z), so that A(X) may be a value from either C(Y) or D(Z), but nothing else?
Well, just start a few chapters into the excellent postgres manual, skipping the bits that say "introduction to sql" and paying close attention to bits like "advanced features" and "extending sql"...
What a difference a few years makes! Today, PostgreSQL is fast, extremely robust, and incredibly capable. It scales better than MySQL, preserves data integrity better, and makes it possible to do things that you can't do even in Oracle (for instance, just about all DDL in PostgreSQL is transactional: table creation/deletion, index creation/deletion, user creation/deletion, etc. This means, for instance, that you don't have to have an operator to alter a column's datatype: you just create the new column, copy the data into it, and then drop the old column, all within a single transaction, and if you screw up you can roll the whole thing back). It supports a number of different languages in which one can write stored procedures. The planner is quite good and yet is constantly improving.
About the only thing that PostgreSQL is not is auto-adaptive. That is, one still has to configure it to get optimal performance, same as with any database I've ever seen. The default settings provided in the raw distribution are, well, quite conservative: they're set up so that you can successfully start PostgreSQL even on a small, old system, which means you almost certainly have to tweak the configuration file in order to get truly good performance out of it.
In short, PostgreSQL has gotten very, very good in a relatively short period of time. It's so good compared with the other freely-available databases out there that I can't really think of a compelling reason to use anything else -- it's so good that if you need something more capable then you're going to have to pay big, big money.
And 8.0 will get you native Win32 support (with a point-n-click installer and everything, if I'm not mistaken). With its feature set (especially if you include what's going to be delivered in 8.0), that makes PostgreSQL-win32 a real SQL Server killer, as long as it performs well on that platform.
In short, PostgreSQL deserves very much to win this award, and the PostgreSQL development crew deserves a ton of kudos for producing such a kickass database system. I'm very much hoping that third-party software support for PostgreSQL gets as good as it is for MySQL, because the database engine is certainly deserving of it.
Use 'slashdot stuff' in the subject line in any email you send me if you want to get past the spam filter.
I am very very impressed by PostgreSQL. I am waiting for the geeks to slap a programmable GUI onto it. I know there are several that I can mention here, but none comes even half close to what M$'s Jet engine has with Access. I once tried to put together one with PHP but the project became very complex for my managing. I abandoned the effort. Slashdotters, can we start another project to put a decent GUI onto this DB engine? When that is done, business logic can be programmed at form level, reports can be done in XML and published to PDF, even better, error handling at form level with input masks etc can be done. And before you go on, I'd like to remind you that there is Kexi, pgAdmin and so many others. What do you think?
PostgreSQL is an astonishingly great piece of software, one of the few best I've ever worked with.
Thus, if someone tells me they're using mySQL, which is not nearly as powerful as PostgreSQL , I can immediately surmise many things about them, their organization, and their code.
However, if PostgreSQL becomes well-known through all of this publicity, entities might inadvertantly start using it, making it more difficult for me to evaluate cluelessness.
Google for Postgresql support and you'll find lots of support, including but not limited to:
Postgres has caused me alot of stife in installation, and their dump/restore scripts to this date still seem unreliable, which sure, may be due to that Debian Stable still being on 7.2 and I've never had the necessity to try out more recent releases.
However, there's many very cool things I hear you can do with it - I simply don't use them :P
I also think that the fact that postgres and mysql both compete is great - it motivates both teams to continually improve.
In summary, I have found MySQL to work perfectly in creating a leading edge content management system (http://www.silverstripe.com/); one where the development, selling and implementation of it has created jobs and forms a whole business. But I'm sure if Postgres had walked in the door four years ago and I had a more positive experience, I'd be touting that over Mysql in the same subjective vein.
PostgreSQL has buildin recovery. Upon every boot the system verifies if it was correctly shut down and if not it replays the Write Ahead Log (WAL) from the last checkpoint (which occur every X seconds and force all data to be written to disk) until the last committed transaction. That is all recovery you need in a database, and it has been in PostgreSQL for years.
And by all standards I know, PostgreSQL is a true multiuser database. It supports concurrent access and acls (at many levels, the cluster, the database, the schema, the table etc. you can tell exactly who can see and touch what). As a matter of fact, we happily run dozens of PostgreSQL databases for our webhosted customers.
Try out phpPgAdmin which does exactly what you were probably trying to set out to do in PHP.
I am NaN
good starting point, it's also on Safari so you could read it online.
Here it is on amazon
I am NaN
Actually, from PostgreSQL 8 you can use ALTER TABLE.
If all you are doing is presenting web content without much in the way of heavy-duty transactional requirements (ie., no money is changing hands, mostly doing output as opposed to input), then MySQL is fine. Of course, PG would be fine as well. I don't get all of the zealotry on both sides. They are two fine databases, most stuff is not "enterprise" level requirements, so no suprise that you see MySQL all over the place. That doesn't make PG any "better" or "worse"!
A house divided against itself cannot stand.
And 8.0 will get you native Win32 support (with a point-n-click installer and everything, if I'm not mistaken).
:) Pretty cursory look so far but I imported a db I'm working on from Debian and my more complex procedures ran fine.
You aren't mistaken. The sort of install a Windows user will love -- although he will have to find PgAdmin III and create his own desktop shortcut.
With features like point-in-time recovery, I think we are going to hear a lot more about PostgreSQL this year.
PostgreSQL has, for some time, been more scalable and more reliable than MySQL, which is, IMNSHO, a mere toy by comparison. I've pushed PostgreSQL to several major corporate clients, who have been more than happy with it (indeed, some have added stuff on and pushed the changes back to the main source base, something I would not have considered likely from the sort of corporates I'm talking about)
Hell, it's even for simple, single-user database apps, especially when linked to a good ORM layer (EnterpriseObjects/GNUstep DB et al). Why more people doing web development don't push it, I don't know. Everything's bloody MySQL. Blech.
Whats the problem with running VACUUM? You just start VACUM as a cronjob. It's not as if it stops the database from working.
So what's this point-in-time recovery and what's it do better?
As I understand it, prior Point-in-Time recovery the WAL files would replay ALL of the transactions they contained, you could not pick were to end them.
With PIT you can tell the system to replay just to a certain point or all the way.
With the new utilities included with PostgreSQL 8.0(now beta) you can also use this as a backup system (it was not easy to do prior to this). Create a backup dump and load it into your backup server. Copy (rsync would work here) the WAL files over to the backup server and replay them as they compleate. When you need the backup, you can (using an included util) replay the last partial WAL file and bring the system up. If I were do this though I would most likely shrink the size of the WAL files from the stock 16MB to something a little smaller (unless your DB was VERY busy...).
BWP
Prior to 8.0 you just about had to do that, but with the ability to use gzip to compress your archives, it was not too bad unless you had lots of bytea or blobs in the DB.
But with the advent of Point-in-Time recovery in 8.0 thats changed. With the new utils you can make a dump of the system and just copy it and the WAL files around. Database crash (that is not handled automaticly)? Just load the backup and replay the WAL files to whatever point in time you want them. You can even use partial WAL files.
BWP
Continuous VACUUM actually is the solution. It just should be automatic instead of manual.
Databases inevitably have some point in a transaction where they require 2 versions of the same row to be present in persistent storage (on disk). That obviously means that the old version (or the new version in case of a rollback) has to be removed at some point in time. Some databases choose to do this on transaction commit, adding a little bit of overhead to each transaction. Some databases choose to do this in a separate process at scheduled intervals, reducing the commit overhead but adding the overhead of having more versions on disk. PostgreSQL has choosen the second path, and VACUUM is the cleanup process.
Which solution is the best depends on the requirements. As you have discovered, tables with a high turnover get easily bloated when the cleanup is not done frequently enough. The solution for that is to cleanup more often, with cleanup at commit of each transaction as the higher limit. But quite likely it is sufficient if cleanup occurs every X transactions or every Y seconds.
The intention was to have the pg_autovacuum utility integrated in the backend to manage the vacuum process for all databases in a cluster. If enabled, it would allow for automatic vacuum and analyze on tables, with some logic to learn if tables are high-turnover or fairly static. Unfortunately, the patch for that didn't make it into the 8.0 beta.
I am sure that there are people who know other things.
You can always dismiss any heated debate as simply a result of nit-picking or personal preferences.
However, the differences here are quite substantial. And it isn't really 'mysql vs postgresql' - it's more like 'mysql vs inexpensive standards-compliant database solutions'. What really irks most experience database developers about mysql is that mysql abandoned decades of standards and standard features - while insisting that 90% of the users didn't need transactions, triggers, views, etc. That's disingenuous misinformation.
mysql still has a role out there - since it has such a wide host base and so much 'mind share'. But this is all marketing. In almost any technical comparison, Postgresql now comes out on top.
Furthermore, since postgresql is very similar to other relational databases - migration between it and oracle, db2, sql server, etc is relatively painless (unless you went overboard on stored procs, etc). This means that your investment in postgresql is fairly 'future-proof'. If on the other hand, you've gone with mysql, you will always have a more difficult migration - and may fail to get anticipated performance benefits since you are probably using the target database is a way not recommended by its vendor (joining inside application rather than inside sql, etc). This is especially true of the many mysql applications out there that believed MySQL AB when they told people that 90% of the applications didn't need transactions (!)
Of course, you can wait for mysql to catch up to everyone else in the database features area, and perhaps they'll try to become more standards compliant along the way. But that's going to be a tough slog for them, probably involving a complete rewrite. Could take quite a while, and there may be no easy transition for today's mysql apps.
Sibling rivalry? hardly
You are confusing checkpoints with savepoints.
The most expensive task of a database is I/O. If you have a transaction that alters data in 3 places in two tables and that transaction commits, the database has to wait until the data has hit the disk. (Data hitting the disk is the Durability part from ACID.) With an average disk latency of 7 miliseconds doing 6 I/O operations translates to 42 miliseconds waiting before the database can confirm the commit to the client.
Since this is way to long, these 6 writes are pushed out to the buffers in RAM, but writing of these changes to disk is not required. Instead, all six changes are sequentially written to a special file, the Write Ahead Log, and that log file hitting the disk is sufficient to guarantee durability. That is only 1 I/O operation, a significant win in performance.
Obviously at some point the buffers have to be written to disk anyway. This happens when they are so inactive that they drop out of cache the normal way. But to know that all changes to a file have been pushed out to disk, which we need to know for certain before we can recycle the WAL, files are occasionally forced to disk. This forcing to disk is called a checkpoint, and it happens every X seconds, when there has been a certain amount of activity or when a superuser issues the CHECKPOINT command.
The code examples you provided are a way of creating a SAVEPOINT, a point inside a transaction that you can roll back to without having to roll back to the beginning and restarting everything. The prefered syntax is:
BEGIN TRANSACTION;
SELECT foo FROM bar;
SAVEPOINT selectOK;
INSERT INTO bar (foo) VALUES (1); -- duplicate error, roll back to savepoint
ROLLBACK TO selectOK;
UPDATE bar SET foo = foo + 1 WHERE xyz;
COMMIT;
I don't have a recommendation but I do have a strong anti-recommendation.
I normally recommend the associated Oreilly book for most computer topics but in the case of PostgreSQL I have to say Practical PostgreSQL is the worst Oreilly book that I have read. Most of the book is a regurgitation of the freely available postgres manual with an additional chapter that is a blatant plug for the author's proprietary product.
I thought Oreilly could do no wrong until I bought this piece of crap.
First and foremost it has merge replication and real application clustering. Of course you need to spend buttloads of money to get the clustering. It's also extraordinarily self aware. For example you can have oracle email you the list of 10 slowest queries that ran yesterday. It's full of amazing stats about itself that really helps the DBA out when they are trying to troubleshoot weird problems. Oracle also has tons of other features such as dealing with XML data using SQL or xpath queries.
To be fair postgres has some features oracle does not. Things like user defined operators and aggregate fuctions for example. With postgres you can also write code in C and call it from an SP, pretty powerful even though it's quite dangerous.
Postgres is great and it will probably meet the needs of 99% of the people on this planet but if you are amongst the 1% that need oracle nothing else will do.
evil is as evil does
is learning how to pronounce it.
Hell, I just learned how to pronounce MySQL.
--
Solitary, self-taught geek.
The "old" row cannot be deleted at transaction commit! Other transactions might still be reading it.
That's why PostgreSQL has VACUUM.
Moreover, VACUUM by itself just marks a row in the Free Space Manager (FSM) as free and writes over it at the next opportunity. VACUUM FULL will actually shrink the size of the table if that's what you need.
VACUUM should be automatic though, and eventually will be.
Social scientists are inspired by theories; scientists are humbled by facts.
uses direct / raw i/o
That has been hashed out on the mailing lists time and time again.
PostgreSQL is not a filesystem.
Where is the real use-case? There is a huge amount of extra code and extra bugs, and a minor performance optimization at BEST.
Additionally, PostgreSQL would then not be able to make use of OS disk buffers.
So, it would ONLY be useful as a MINOR performance optimization on PostgreSQL only machines, running NO other applications. All that at the cost of so much code and so many bugs?
You could just as well argue for any application that it should implement it's own OS and filesystem.
That being said, I'm sure that some people benefit from the Orcale feature. I doubt you'll see it in PostgreSQL any time soon though.
Social scientists are inspired by theories; scientists are humbled by facts.
Any speed advantage MySQL has is only realized by using MyISAM tables rather than InnoDB tables. However, you sacrifice many of MySQLs newer features (like transactions) by using MyISAM.
Other than that, they are reasonably close in speed. However, I trust PostgreSQL to have better worst-case performance, and better query planning.
It depends on many factors and you should examine your application needs. You should also examine factors other than performance. Sometimes the performance might be similar, but PostgreSQL provides many more options that take burden away from the application, meaning that PostgreSQL might be a big performance win.
In a recent project I did that was DB-intensive, it turned out that PostgreSQL was much faster because of complex queries and the reduced amount of processing my application needed to do (with MySQL I needed to transform the data in my application after I got it from MySQL). I was also able to optimize queries by implementing user-defined aggregate functions (which answered some specific needs from my application).
Social scientists are inspired by theories; scientists are humbled by facts.