PostgreSQL 8.3 Released
jadavis writes "The release of the long-awaited PostgreSQL version 8.3 has been announced. The new feature list includes HOT, which dramatically improves performance for databases with high update activity; asynchronous commit; built-in full text search; large database features such as synchronized scans and reduced storage overhead; built-in SQL/XML support; spread checkpoints; and many more (too many major new features to list here). See the release notes for full details."
I'm a postgresql fan, I've considered it a superior database for years.
However, it seems every client I come into contact with (I am a freelance software designer) seems to believe mysql is the only open source database available and certainly the best one for all jobs.
Mysql is great (or at least, was great) for fast connection times and speed but for a stable, feature-rich excellent database, postgresql has always been ideal.
It's just a shame no one seems to be aware of it.
Are you thinking like you'd do in SQL server (IIRC) or MySQL, where you have a db reference in the table list (i.e., SELECT * from db1.table1, db2.table2 WHERE [join clause])?
you'd probably just want to use a schema for that; the concept maps more or less the same way.
The 8.3 release notes list the Bucardo project http://bucardo.org/ for multi-master replication. I haven't used it... is there something that it is lacking that you think would be addressed by bringing it into the core code base?
Asynchronous commit is very useful in applications where the thing that's important about the data is its statistical distribution, and not the individual data points per se.
Sure. Like so many times in software development "only you can decide whether data loss is acceptable to you. But there are many classes of applications, including high-speed data ingest programs whose only goal is to get the stream of data into the database, where commit-but-don't-wait is not only acceptable but very desirable performancewise." (Tom Kyte)
I just don't trust anything that bleeds for five days and doesn't die.
Are you sure this is such a disaster? As far as I can tell this only means that executing "COMMIT" doesn't block (wait) until the commit has actually happened but returns immediately, and the actual operation is performed "later". The data still goes through the journal (WAL), is still fsynced when needed, etc.
-- Sig down
A couple such comparisons already exist. They may be a year or two out of date however.
Higher Logics: where programming meets science.
Here's a feature matrix of new features vs. old versions. It's easy to see that 8.3 is a huge upgrade.
http://www.postgresql.org/about/featurematrix
I don't know, seems to me a cross database join is neither simple nor proper structure. If you are stuck with it schemas would probably work, painless in Active Record, perl DBI requires 'schema.table' syntax.
chris@xanadu:~$ whatis /.
/.: nothing appropriate.
The new async commit feature bypasses the requirement that records physically hit disk in order to complete a commit. If you must wait for a disk commit (typically enforced by fsync), the maximum number of true commits any one client can do is limited by the rotation speed of the hard drive; typically an average of around 100/second for a standard 7200RPM disk with PostgreSQL. There is no way whatsoever to "tune things and speed things up" here; that's how fast the disk spins, that's how fast you get a physical commit, period.
In order to accelerate this right now one needs to purchase a disk controller with a good battery-backed disk controller and pray it always works. If it doesn't, your database might be corrupted. With async commit, you can adjust the commit rate to something your disks can keep up with (say 50/second) just with this software feature while still allowing a write rate much higher than that, and at no point is database corruption possible (from this cause anyway). This makes people who want to use PostgreSQL in things like shared hosting environments have an option that allows heavy writes even for a single client while having a reasonable data integrity policy--only server crashes should ever lose you that brief period since your last true commit. That's a fair trade for some applications (think web message boards for example) and lets PostgreSQL be more competitive against MySQL based solutions in those areas.
8.2 was released over a year ago - this is not a minor revision.
sic transit gloria mundi
PostgreSQL vs. MySQL WIki
Seems to be features only, no performance.
"I use a Mac because I'm just better than you are."
the simple solution is often easier and faster than the complex solution
When you want something that walks like a duck and talks like a duck, the simplest, fastest, and easiest solution is to get a duck. What you want is done by schemas in postgresql. If you're really doing the separate databases for "performance" reasons, then one presumes that at some point you're going to be putting the databases on separate servers, in which case you'll be wishing you had started with dblink in the first place.
What you're asking for may be "simple" for you, but what about the server side? Rewriting the entire database engine to have a monolithic superdatabase to store system catalogs and authorization information for the databases underneath is none of "simpler, faster or easier", while it does the exact same thing for you as having a "super" database with schemas underneath it using the search_path setting for applications that you don't want to rewrite with schema.table syntax for all of their queries.
If I have been able to see further than others, it is because I bought a pair of binoculars.
I have used PostgreSQL as my primary db since 2000 (version 6.5!) and I have watched it for a while.
PostgreSQL had a number of problems in the past which made it hard to work with including:
1) No ALTER TABLE DROP COLUMN support and other things needed for prototyping (fixed in 7.2 iirc)
2) Issues with dependency tracking in text SQL dumps (fixed in 8.0) meaning that some times one had to hack dumps to get them to restore properly.
3) maintenance tasks required exclusive locks on tables (corrected sometime in 7.x)
4) other things which generally made it somewhat user-unfriendly.
Note that all of the above issues have been corrected as of three years ago. However by that point a lot of the small less serious appliations were more or less wedded to MySQL. For large, shared databanks, PostgreSQL and Ingress II (and maybe Firebird) are the only serious choices. MySQL fails to be useful as soon as you end up requiring the sort of thing described in the title of Codd's original paper....
LedgerSMB: Open source Accounting/ERP
Disks configured for write back with a big cache can handle db writes very quickly. An fsync won't save you, it just asks the controller to verify the write took place. If the controller lies, well, too bad. From the fsync man page: "If the underlying hard disk has write caching enabled, then the data may not really be on permanent storage when fsync() / fdatasync() return."
All that said, I do think there is a place for async commits / write-back caching. Not all data going into a database is mission critical; it depends on the app. This will be a real boon for packet logging, for example. If you work in a bank, well, that's obviously a different story.
This is one of those things that's painful only until you've crossed a certain threshold of information, then it never bothers you again. It's better now but still harder to put the pieces together than it should be.
Start with the documentation on creating a cluster: http://www.postgresql.org/docs/current/static/creating-cluster.html In 8.3 the default of using auth you mentioned has been removed, for the reasons you described. So it now runs as unsecured for local users by default and you have to worry about this yourself, which since it reduces the frustration at getting started was deemed an improvement.
That page suggests some options you can pass to initdb to adjust the default security level. Now you want to look at the initdb docs: http://www.postgresql.org/docs/current/static/app-initdb.html See that if you use -W/--pwprompt (same thing) you can assign a password to the database superuser at cluster creation time. If you do that, you can now change the default authentication scheme to password-based (-A md5 passed to initdb will do that), you'll be secure, and you'll have one user you can login as (postgres) to create more.
To see what other authentication methods are available and to learn what your options are look at http://www.postgresql.org/docs/current/static/client-authentication.html The one you really need to dive into is pg_hba.conf which is the magic text file to edit here. A new one of those will be put in the base directory of your new database cluster. Open that file up, look at the documentation, and you'll need to add a line to add network support like one those in the examples. Probably something like
host postgres all 192.168.12.0/24 md5
(allow access to anybody on the 192.168.12 subnet access the database with a password)
That should get you past the ugly initial hurdles. The next document you may need is how to add more users: http://www.postgresql.org/docs/current/static/sql-createrole.html
again look at the examples first and then backtrack to the parameters, will make more sense that way. After that you'll want to create more databases with createdb: http://www.postgresql.org/docs/current/static/app-createdb.html
And then you should be able to find your away around from there using the psql command line tool.
Note that once you get past accepting connections over the network, you could use a tool like pgAdmin III to handle the rest of this work using a slicker interface. There's even a copy of it bundled with the Windows installer you can use on such a client to administer a remote server running a real OS. It's of course possible to install pgAdmin manually on other platforms as well, see http://www.pgadmin.org/ for other verions (note that binary packages for platforms like RPM don't show up in every release, you have to go back to v1.8.0 to get the last full set of packages).
It would seem not to. Yeah, I wish it had that, too. The other posters who keep telling you to get faster IO miss the idea of having extra CPUs handling locking, cache management, etc. so that even single running queries are faster.
Dewey, what part of this looks like authorities should be involved?
A PostgreSQL schema is just a namespace qualifier; it functions just like MySQL's cross-database joins and is conceptually similar. It isn't a full copy of your database DDL.
One of the issues with this discussion is that there is a disjunct in the terminology used by MySQL users and that used by the rest of us.
:-)
MySQL uses "database" and "schema" synomymously (note that their "information schema" is a separate "database"). In that sense, PostgreSQL has long had cross-schema joins in the same way MySQL does. It is just slightly harder to set things up so that you create tables in the right schemas. (hint: SET search_path='schema_name');
In this way, I do a *lot* of work using cross-schema queries. They work like a charm on PostgreSQL.
What PostgreSQL calls a "database" is analogous to the entire cluster of "databases" on MySQL. There is no analogy to "PostgreSQL cluster" on MySQL (i.e. a set of databases controlled by the same process, each of which contains multiple schemata). For that matter, last time I checked, there was no equivalent on Oracle either.
So "PostgreSQL doesn't have an equivalent to MySQL's cross-db queries" is an issue of terminology rather than substance.
At the same time, as nconway points out, PostgreSQL's MED solutions do not optimize well across distributed queries (i.e. where parts of the queries have to go through connections to other databases/servers). For MySQL users, these are like Federated tables, and I would expect similar (or worse) optimization problems in this area on MySQL. Hence when you are trying to use PostgreSQL as a distributed database, you are in for headaches. Fortunately this is not what the OP asked for
LedgerSMB: Open source Accounting/ERP
I guess nobody appreciates humor here. Well, not if it's against the fanboys.
I'll stand by it though-- switching to MySQL from Postgres made my life significantly simpler:
1) you can install MySQL easily
2) MySQL has great vendor support
3) my experience is that MySQL performs significantly better in the general case (i.e. I'm not spending my entire life tweaking performance)
Many an honest thing were said in jest, I suppose.