vvizard writes "After almost a full year of development since PostgreSQL v7.1 was released, the PostgreSQL Global Development Group is proud to announce the availability of their latest development milestone ... PostgreSQL v7.2, another step forward for the project."
Congrats to the PostgreSQL Development team!
by thing12 · Score: 3, Insightful
This is a huge step forward in making PostgreSQL ready for deployment in the enterprise. Eliminating the locking vacuum in favor of a separate statistics gathering process is clearly the best part of this release.
The only major hurdle left is replication built into the server.
Re:Congrats to the PostgreSQL Development team!
by thing12 · Score: 5, Interesting
Eliminating the locking vacuum...
I should probably clarify that: the full locking vacuum was separated into two parts, one which analyzes statistics and doesn't lock the tables, and another which does what the old vacuum did, reordering data blocks to shrink the size on disk. The bonus is that before, you might run vacuum once a week or so because of the impact it had on a production system by taking a full lock on each table it vacuumed; now you can run it much more frequently, as all it consumes is CPU time. Shrinking size on disk is nice, but it's the statistics that help the query planner turn SQL into faster queries.
Re:Congrats to the PostgreSQL Development team!
by GooberToo · Score: 4, Insightful
Always remember that table statistics are used as approximate best-guess inputs to the query optimizer. It is not uncommon to see some types of queries actually run slower after table statistics have been updated. I've seen this on Oracle, Sybase and SQL Server. I doubt that this is an issue unique to those RDBMSs, since the conceptual implementations and the basis for the algorithms tend to be more or less the same.
Re:Congrats to the PostgreSQL Development team!
by nconway · Score: 4, Interesting
I should probably clarify that: the full locking vacuum was separated into two parts, one which analyzes statistics and doesn't lock the tables, and another which does what the old vacuum did, reordering data blocks to shrink the size on disk....
This is incorrect (I believe it describes the situation after 7.1: the VACUUM ANALYZE command only needed to lock the table exclusively when VACUUMing, only a read lock was needed for ANALYZE).
In 7.2, the ANALYZE command can now be used separately, as you say. However, there are other (more important) improvements: ANALYZE only looks at a statistical sample of the rows in the table, which means that collecting statistics on even enormous tables is very fast. Furthermore, VACUUM has been made "lazy" by default: it doesn't attempt to reclaim space as aggressively as before, but it no longer requires an exclusive lock on the table (instead, it cooperates with other DB clients). The old behavior is available as "VACUUM FULL", and it is suggested whenever you need to reclaim a lot of disk space (e.g. you delete hundreds of thousands of rows of data and need the space).
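Why does sampling make ANALYZE cheap? As a generic illustration only (not PostgreSQL's own code, whose sampler is more elaborate and varies by version), here is Algorithm R, the classic reservoir-sampling method for drawing a uniform sample of k rows in a single pass. The function name, the sample size of 300 and the "table" of a million integers are all just assumptions for the example.

```python
import random


def reservoir_sample(rows, k, rng=random):
    """Algorithm R: return a uniform random sample of k items from an
    iterable of unknown length, in one pass and O(k) memory."""
    sample = []
    for i, row in enumerate(rows):
        if i < k:
            sample.append(row)          # fill the reservoir first
        else:
            j = rng.randrange(i + 1)    # uniform in 0..i
            if j < k:
                sample[j] = row         # keep row i with probability k/(i+1)
    return sample


rng = random.Random(42)                 # seeded so runs are repeatable
rows = range(1_000_000)                 # stand-in for a big table
sample = reservoir_sample(rows, 300, rng)
print(len(sample))                      # 300
```

Note that Algorithm R still reads every row; what it bounds is memory, and the statistics are then computed from 300 values instead of a million. A database sampler can go further and avoid reading every disk block.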
you might run vacuum once a week or so
It was (and is) suggested that you run VACUUM once per day.
it's the statistics that help the query planner turn SQL into faster queries.
As far as I know, you only really need to update your planner stats when you change the statistical distribution of your data. Of course, running ANALYZE reasonably often won't do any harm, and it is a relatively cheap operation (performance-wise).
Re:Congrats to the PostgreSQL Development team!
by thing12 · Score: 4, Informative
I'm not disagreeing with anything you said, in fact you all but reiterated everything I said.
The 7.1 vacuum analyze required table locks. It doesn't matter which phase of it required locks; it required exclusive locks because it vacuumed. By breaking that into separate commands, the need for downtime is reduced drastically (down to the example you point out: deleting thousands of rows at a time).
I know that you're recommended to run vacuum once per day, but I found that on a large database running on a fast server a daily vacuum took nearly 30 minutes to complete... that's 30 minutes of sequentially locked tables. Can't afford to do that every day - moving it to once a week may have degraded performance but it reduced the downtime window from 30 minutes per day to 1 hour per week.
I'm just happy that I don't have to bring a production server to its knees once a week (or for that matter once a day) just to do some table maintenance.
Re:Congrats to the PostgreSQL Development team!
by nconway · Score: 4, Informative
I'm not disagreeing with anything you said, in fact you all but reiterated everything I said.
No I didn't, read my post again.
The 7.1 vacuum analyze required table locks.
PostgreSQL has lots of different types of locks of varying granularities. Saying "table locks" doesn't mean a whole lot.
Doesn't matter which phase of it required locks
It does though -- in 7.1, splitting vacuum and analyze internally reduced the time that an exclusive lock needs to be held.
By breaking that into separate commands, the need for downtime is reduced drastically
This is where you're wrong. The reduction in downtime has nothing to do with allowing ANALYZE to be executed separately. It is entirely the result of the new vacuum code (which is "lazy", unlike a VACUUM FULL -- which does a 7.1-style VACUUM). In 7.2, running VACUUM (with or without ANALYZE) is fast, and doesn't require an exclusive lock -- so your database can continue serving clients while a VACUUM is executing. Whether you choose to run ANALYZE at the same time or separately is really irrelevant.
I'm just happy that I don't have to bring a production server to its knees once a week (or for that matter once a day) just to do some table maintenance.
On that, we agree ;-)
Re:Congrats to the PostgreSQL Development team!
by bwt · Score: 3, Informative
That is definitely a risk. Upgrading to an RDBMS release that includes optimizer improvements can often harm overall system performance. Any change to an optimizer will change execution plans. Hopefully most of them get better, but a few get worse, often dramatically worse. Finding the ones that get worse and tuning them is an important activity. Fixing bad SQL plans is often the highest-impact tuning activity, so it is very important to understand what will happen to your specific application before you make changes that affect how your SQL statements are executed.
This is one area in which Oracle shows its power over the open source databases (it's also a big opportunity, because Oracle can be improved on). Oracle can tap into continuous statistics gathering at the per-SQL-statement level through its v$sqlarea dictionary view. If you need high-powered scrutiny of a particular activity, you can trace the session to logs and see the row statistics at every step of the execution plan. Oracle has more optimizer hints, and has a facility to "pin" an execution plan so that it won't be reevaluated if optimizer behavior changes. Oracle is working toward server-side SQL tuning, where you can identify bad SQL statements and "intercept" them at runtime by adding hints on the server side. That will be an absolutely huge feature, since SQL that you can't directly control, but can predict, often hits your system.
Ingres code did make it into Postgres, but I don't know if any of it is in PostgreSQL. I'm sure there was a lot initially, but I seriously doubt there's any there anymore.
Keep in mind that when you abbreviate PostgreSQL to "Postgres", you're really talking about a separate, older product.
I guess it doesn't really matter, since Postgres is long gone, but it still annoys me every time I see it.
highlights...
by bob@dB.org · Score: 5, Informative
VACUUM: Vacuuming no longer locks tables, thus allowing normal user access during the vacuum. A new "VACUUM FULL" command does old-style vacuum by locking the table and shrinking the on-disk copy of the table.
Transactions: There is no longer a problem with installations that exceed four billion transactions.
OIDs: OIDs are now optional. Users can now create tables without OIDs for cases where OID usage is excessive.
Optimizer: The system now computes histogram column statistics during "ANALYZE", allowing much better optimizer choices.
Security: A new MD5 encryption option allows more secure storage and transfer of passwords. A new Unix-domain socket authentication option is available on Linux and BSD systems.
Statistics: Administrators can use the new table access statistics module to get fine-grained information about table and index usage.
Internationalization: Program and library messages can now be displayed in several languages.
... with many, many more bug fixes, enhancements and performance-related changes...
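The "histogram column statistics" item can be illustrated with an equi-depth histogram: sort a sample of a column, keep evenly spaced values as bucket boundaries, and estimate what fraction of rows satisfies `col < x` by locating x among those boundaries. This is a simplified sketch under that assumption, with hypothetical function names; PostgreSQL's real estimator also keeps other per-column statistics and is more careful at the edges.

```python
def histogram_bounds(values, nbuckets):
    """Equi-depth histogram: nbuckets+1 boundary values taken at evenly
    spaced positions of the sorted data, so each bucket holds about the
    same number of rows."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[i * (n - 1) // nbuckets] for i in range(nbuckets + 1)]


def fraction_below(bounds, x):
    """Estimate the fraction of rows with value < x, assuming values are
    spread evenly inside each bucket (linear interpolation)."""
    nbuckets = len(bounds) - 1
    if x <= bounds[0]:
        return 0.0
    if x >= bounds[-1]:
        return 1.0
    for i in range(nbuckets):
        lo, hi = bounds[i], bounds[i + 1]
        if x < hi:
            within = (x - lo) / (hi - lo) if hi > lo else 0.0
            return (i + within) / nbuckets
    return 1.0


bounds = histogram_bounds(range(1, 101), 4)
print(bounds)                      # [1, 25, 50, 75, 100]
print(fraction_below(bounds, 50))  # 0.5
```

The true fraction of values below 50 in 1..100 is 0.49, so five stored numbers give the planner a close estimate without touching the table.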
Re:highlights...
by Zeut · Score: 5, Informative
One issue that is not mentioned in the release highlights is the marked improvement that is now available for SMP boxes. In some cases throughput has been increased by more than a factor of 2.
I have only one feature request for PostgreSQL...
by Trepalium · Score: 3, Interesting
I really wish there was an embeddable version of PostgreSQL... It's a very good database, but it's sometimes a real pain to write a program that ties together a SQL database with anything else, unless it's a local-use only program. I know MySQL added this feature in 4.x (but their transaction support is too new, IMO).
-- I used up all my sick days, so I'm calling in dead.
Re:some might disagree
by dietz · Score: 5, Interesting
That's a cheesy way to dispute his claims.
They're not using MySQL to store all their critical data. They're dumping all their data, presumably from some other more reliable database (Oracle, it sounds like), into MySQL for quick web searches.
IOW, they're using mysql for what it does best: As a fast datastore for when data integrity isn't important (because they have all the data backed up in Oracle and could redump it to mysql at any time).
Admittedly, some of this post is conjecture, but you'd be crazy to suggest that the Census Bureau would trust all their critical data to mysql.
Re:some might disagree
by GooberToo · Score: 3, Informative
Don't take this the wrong way, but that seems to be exactly the kind of project MySQL was made for. Data access appears to be primarily read-only, with none of it being critical. Worst case, if the database becomes corrupt, you simply restore the whole thing from backup. Since the data doesn't appear (based on the description) to be changing rapidly (if at all, or perhaps only through backend updates), issues such as online backup or high availability are unlikely to matter in the least. If they did, they'd probably be using that Oracle license.
Why use PostgreSQL instead of MySQL?: ACID
by Sivar · Score: 5, Informative
PostgreSQL is an ACID compliant database. MySQL is not (unless that has changed recently--if so please let me know).
ACID (an acronym for Atomicity, Consistency, Isolation, Durability) is a 'keyword' that business professionals generally look for when evaluating databases. Frankly, non-ACID databases aren't taken very seriously, even if they are used by the likes of Yahoo and Slashdot (as MySQL is).
Here is a quick description of what it means to be ACID compliant:
1. Atomicity is an all-or-none proposition. Suppose you define a transaction that contains an UPDATE, an INSERT, and a DELETE statement. With atomicity, these statements are treated as a single unit, and thanks to consistency (the C in ACID) there are only two possible outcomes: either they all change the database or none of them do. This is important in situations like bank transactions where transferring money between accounts could result in disaster if the server were to go down after a DELETE statement but before the corresponding INSERT statement.
2. Consistency guarantees that a transaction never leaves your database in a half-finished state. If one part of the transaction fails, all of the pending changes are rolled back, leaving the database as it was before you initiated the transaction. For instance, when you delete a customer record, you should also delete all of that customer's records from associated tables (such as invoices and line items). A properly configured database wouldn't let you delete the customer record if that meant leaving its invoices and other associated records stranded.
3. Isolation keeps transactions separated from each other until they're finished. Transaction isolation is generally configurable in a variety of modes. For example, in one mode, a transaction blocks until the other transaction finishes. In a different mode, a transaction sees obsolete data (from the state the database was in before the previous transaction started). Suppose a user deletes a customer, and before the customer's invoices are deleted, a second user updates one of those invoices. In a blocking transaction scenario, the second user would have to wait for the first user's deletions to complete before issuing the update. The second user would then find out that the customer had been deleted, which is much better than losing changes without knowing about it.
4. Durability guarantees that the database will keep track of pending changes in such a way that the server can recover from an abnormal termination. Hence, even if the database server is unplugged in the middle of a transaction, it will return to a consistent state when it's restarted. The database handles this by recording changes in a transaction log. By virtue of consistency (explained above), a partially completed transaction won't be written to the database in the event of an abnormal termination. However, when the database is restarted after such a termination, it examines the transaction log for committed transactions whose changes had not yet been written to the database, and applies them.
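The atomicity and consistency points above can be tried out with Python's built-in sqlite3 module, used here purely as a convenient stand-in for any transactional database (it is not PostgreSQL). The accounts, customers and invoices tables are hypothetical, the transfer uses two UPDATE statements rather than a DELETE and an INSERT, and the "server going down" is simulated by raising an exception between the two steps.

```python
import sqlite3

# isolation_level=None: no implicit transactions; BEGIN/COMMIT are issued by hand.
db = sqlite3.connect(":memory:", isolation_level=None)
db.execute("PRAGMA foreign_keys = ON")   # SQLite checks foreign keys only when enabled
db.executescript("""
CREATE TABLE accounts  (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL);
CREATE TABLE customers (id INTEGER PRIMARY KEY);
CREATE TABLE invoices  (id INTEGER PRIMARY KEY,
                        customer_id INTEGER NOT NULL REFERENCES customers(id));
INSERT INTO accounts  VALUES (1, 100), (2, 0);
INSERT INTO customers VALUES (7);
INSERT INTO invoices  VALUES (1, 7);
""")


def transfer(src, dst, amount, crash_midway=False):
    """Move money between accounts as one all-or-nothing unit."""
    db.execute("BEGIN")
    try:
        db.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))
        if crash_midway:
            raise RuntimeError("simulated failure between the two steps")
        db.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))
        db.execute("COMMIT")
    except Exception:
        db.execute("ROLLBACK")           # atomicity: the debit is undone too
        raise


def balances():
    return db.execute("SELECT id, balance FROM accounts ORDER BY id").fetchall()


try:
    transfer(1, 2, 50, crash_midway=True)
except RuntimeError:
    pass
print(balances())                        # [(1, 100), (2, 0)]: nothing changed

transfer(1, 2, 50)
print(balances())                        # [(1, 50), (2, 50)]

# Consistency: the customer cannot be deleted while an invoice still refers to it.
try:
    db.execute("DELETE FROM customers WHERE id = 7")
except sqlite3.IntegrityError as err:
    print("refused:", err)               # prints SQLite's foreign key error message
```

A real crash is handled by the durability machinery (the transaction log) rather than by an exception handler, but the visible outcome is the same: the half-done transfer never becomes part of the data.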
It is difficult to trust mission-critical data to a database that does not guarantee that it will not screw up (short of a bug, of course), so such compliance, even when it is more political than technical, is very important.
-- Computer Science is no more about computers than astronomy is about telescopes. --E. W. Dijkstra
Re:Why use PostgreSQL instead of MySQL?: ACID
by Pathwalker · Score: 5, Funny
Here's an example of why an ACID database is useful that hits close to all of our hearts - Slashdot moderation:
You may have noticed that if several people try to whack a troll at the same time, they all expend one moderator point, even if only a fraction of those points were required to push that troll down into the dreaded depths of -1.
If an ACID compliant database were used, and the two steps of whacking the troll and deducting the moderator points were placed in the same transaction (with a check constraint on the score of the posts to prevent them from dropping below -1), then the later moderators who tried to whack the troll would not have their points deducted, as the transaction would roll back when the constraint on the score of the post was violated.
Alas, MySQL is not ACID compliant, and so this senseless waste of moderator points continues to this day...
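That proposal can be sketched with Python's sqlite3 module standing in for an ACID database. The schema is a hypothetical simplification of the idea, not Slashdot's real tables: a CHECK constraint keeps a post's score at -1 or above, and the point deduction and the score change run in one transaction, so a moderator whose vote would push the score past the floor keeps the point.

```python
import sqlite3

# In-memory SQLite as a stand-in; we issue BEGIN/COMMIT ourselves.
db = sqlite3.connect(":memory:", isolation_level=None)
db.executescript("""
CREATE TABLE users (uid INTEGER PRIMARY KEY,
                    mod_points INTEGER NOT NULL CHECK (mod_points BETWEEN 0 AND 5));
CREATE TABLE posts (pid INTEGER PRIMARY KEY,
                    score INTEGER NOT NULL CHECK (score BETWEEN -1 AND 5));
INSERT INTO users VALUES (1, 5), (2, 5);
INSERT INTO posts VALUES (100, 0);
""")


def whack_troll(uid, pid):
    """Spend one moderator point and lower the post's score, or do neither."""
    db.execute("BEGIN")
    try:
        db.execute("UPDATE users SET mod_points = mod_points - 1 WHERE uid = ?", (uid,))
        db.execute("UPDATE posts SET score = score - 1 WHERE pid = ?", (pid,))
        db.execute("COMMIT")
        return True
    except sqlite3.IntegrityError:
        # A CHECK failed (score would drop below -1, or the user has no
        # points left): roll back so the point deduction is undone too.
        db.execute("ROLLBACK")
        return False


print(whack_troll(1, 100))   # True: score 0 -> -1, user 1 spends a point
print(whack_troll(2, 100))   # False: score already at -1, user 2 keeps the point
print(db.execute("SELECT uid, mod_points FROM users ORDER BY uid").fetchall())
# [(1, 4), (2, 5)]
```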
Re:Why use PostgreSQL instead of MySQL?: ACID
by jrimmer · Score: 3, Informative
MySQL is most definitely capable of supporting ACID functionality.
One of the nice features of MySQL is the capability of having pluggable persistence managers. An example of that is the default, MyISAM, which you are correct in saying does not support ACID. But with the release of MySQL-Max, which happened a while ago(tm), and MySQL v4 out of the box, support for 3 additional backends was added: BerkeleyDB, Gemini, and InnoDB, all of which have complete ACID support. InnoDB also supports row-level locking and even an initial implementation of foreign keys.
InnoDB is in use here at Slashdot, as well as at a good deal of other sites demanding high transaction throughput with full ACID support.
With foreign keys and stored procedures both on the slate for the 4.x series, the reasons not to use MySQL are lessening every day.
Side note: Yeah, I know Gemini is the red-haired stepchild of the MySQL world. It's still a decent table manager.
Re:Why use PostgreSQL instead of MySQL?: ACID
by Pathwalker · Score: 5, Informative
You Said: Actually, this is not the fault of the database. If you mod a post that is already at +5 up, or mod a -1 post down, you will lose a mod point and the score will go unchanged (the moderation total values will increase, though).
I Reply:
I have a hard time believing that this behavior started out as a feature. I find it much more likely that it was initially a bug. This bug, being found useful, was then elevated to the status of a "feature". You are correct that it is not the fault of the database, but transaction and constraint support at the database level would have made it easy to prevent this problem from ever cropping up in the first place.
You Said: the use of a transaction to record a moderation is fairly frivolous and probably more of a waste of CPU time than moderation points
I Reply:
For a system which receives as much activity as Slashdot, with a constant stream of friendly trolls looking for any crack in the system that they can use to share the sight of their favorite gaping asshole with the unwilling rest of the population, if I were coding it, I would insist on inserting checks of basic constraints at different levels of the system. The database layer is your last line of defense against abuse of the system.
Secondly, these double checks are useful for finding errors in other levels of the system. Remember the problems that used to crop up from time to time with comments being moderated to -2 or to 6? If the value of the moderations was constrained in the database, not only would users not see this problem, but an error log generated (for the admins only) when a transaction is rolled back in a situation where it is not expected would have helped isolate the fault very quickly.
The database level checks would also help against rogue activity of people in positions of (limited) trust. Worried about an editor editing one of their accounts to give themselves a huge number of modpoints? Cap the level in the database at 5; it would make it impossible for this nefarious subterfuge to take place.
As for the speed issue: if you are willing to sacrifice verification of correct operation for a small increase in speed, you have severely underspecified your hardware requirements.
Finally, I would like to include a small SQL fragment, showing some of the checks that I would feel are absolutely necessary for a web based discussion system that people are trying to subvert:
--First we create a table for a couple of users
CREATE TABLE "users" (
"uid" serial PRIMARY KEY, -- must be unique so posts.uid can REFERENCE it
"mod_points" integer default 0,
"name" text not null,
CONSTRAINT "user_mod_const" CHECK ((mod_points > -1) AND (mod_points < 6))
);
--now a table for some posts
CREATE TABLE "posts" (
"date" timestamp with time zone DEFAULT now(), -- a function call, evaluated for each inserted row
"pid" serial PRIMARY KEY,
"parent" int4,
"uid" int4,
"mod" integer DEFAULT 1,
"body" text not null,
"section" integer,
CONSTRAINT "mod_const" CHECK ((mod > -2) AND (mod < 6)),
CONSTRAINT "user_key" FOREIGN KEY (uid) REFERENCES users(uid)
on delete cascade
on update cascade
); -- the constraint to ensure parent is equal to zero, or another pid in the posts table is left to the reader.
--And now for a function to access them. (Remember - direct SQL is icky; run things through functions to ensure a consistent interface)
-- A SQL function runs inside the caller's transaction, so it cannot contain BEGIN/COMMIT;
-- if either UPDATE violates a CHECK constraint, the error aborts the whole call and neither change is kept.
CREATE FUNCTION "mod_down" (integer,integer) RETURNS integer AS '
update users set mod_points=mod_points-1 where uid=($1);
update posts set mod=mod-1 where pid=($2);
select mod from posts where pid=($2);
' LANGUAGE 'sql';
As you can see, this nicely serves as a check to ensure the restrictions I mentioned above. With it being so trivial to add the checks, I can't see any reason to not take this extra step to eliminate nasty surprises.
Re:Why use PostgreSQL instead of MySQL?: ACID
by Pathwalker · Score: 5, Insightful
Direct SQL is icky? Not if you grok functional languages.
I like SQL; I have no problems understanding SQL. I meant that statement in the sense that it feels cleaner (to me) to call stored procedures in the database from the outside, rather than sending the SQL statements to the database over and over.
It also serves as a handy layer of abstraction from your code, in the event that you want to make drastic changes to the structure of the underlying database.
Maybe I've just seen too much code with SQL commands scattered all over the place, doing the same thing in different fashions at different times, and using strange DB specific constructs at random.
I would prefer to keep the DB specific SQL extensions (if they must be used at all) in stored procedures in the database, and present a consistent interface of stored procedures to the external program. That way, you can support different databases, using the special features they each offer, by providing different database initialization files, and not require any changes to the main program.
Online Backups/High Availability
by murphj · Score: 3, Insightful
I didn't notice anything about online backups, point-in-time recovery, or standby databases. Is any of this possible on PostgreSQL yet? How about clustering/parallel server. Seems like these are important features to become an Oracle/SQL Server replacement.
-- SONY. Because caucasians are just too damn tall.
Re:Online Backups/High Availability
by pthisis · Score: 5, Informative
I didn't notice anything about online backups, point-in-time recovery, or standby database
Online backups: yes, for quite some time
point-in-time recovery: postgres uses WAL undo/redo logging, but I'm not sure what the state of rollback tools is at the moment.
Standby database: Assuming you mean Master/Slave replication, this is one of the major features planned for 7.3; 7.x has added a lot of the infrastructure needed for replication, and by 7.4 they hope to have multimaster replication (i.e. a fully distributed database).
SONY. Because caucasians are just too damn tall.
Crazy People. Hysterical movie.
Sumner
-- rage, rage against the dying of the light
Re:I have only one feature request for PostgreSQL.
by jcoy42 · Score: 3, Insightful
I only have one real gripe about PostgreSQL- I hate the upgrade path.
Having to dump the database to disk and re-import it is a bad thing, IMO. Having to add a switch to keep integrity constraints is a very odd thing for a database (shouldn't the default be to *keep* integrity constraints?)
A separate program to preserve LOBs I can rationalize (it's a lot of generally unneeded overhead since few people use LOBs).
It would really be nice if someone would write some wrapper programs to check for foreign keys and LOBs, then wrap the pg_dumpall & pg_dump commands with the appropriate options into one set of programs.
-- Never trust an atom. They make up everything.
Re:Long time mysql user, postgresql newbie
by Moosbert · Score: 5, Informative
Anyway, let me tell you, pgsql's user permissions still make my head swim; it's a nightmare. I mean, OK, there are how many different ways to authenticate a user: plain text password, crypted password, now md5, ident, local ident, kerberos, etc. etc.
Options are sometimes considered to be a good thing.
Seriously, what's the "preferred" way to add a normal, non super user, only has select, insert, update, and delete access to a given database that can connect from the local machine, and remotely. Is this even possible?
Add something like this to your pg_hba.conf:
local sameuser trust
host sameuser 127.0.0.1 255.255.255.255 trust
I guess another kind of oddity about pgsql is that out of the box, it only does ident-type local socket authentication, no TCP/IP.
We like the default setup to be reasonably secure.
I've looked forever, but I've yet to find a "mysql to postgresql" quick start guide.
Also, it would be darn nice to include a start/stop script that reads only config files and can be linked from /etc/rc2.d/ etc.
It's in contrib/start-scripts. Or you might as well download the RPMs.
Whew, that was close
by flacco · Score: 4, Funny
I'm sure I'm not the only slash-dotter who was on the verge of exceeding the 4 billion transaction limit on their pgsql-based Anime fan fiction submission website.
-- pr0n - keeping monitor glass spotless since 1981.
PostgreSQL Books
by LarryRiedel · Score: 3, Informative
There are a few decent books about PostgreSQL out there now. It is so much nicer than a few years ago.
Practical PostgreSQL.
I think this one just came out as a bound book. I just got it a couple of days ago and it is pretty good.
It is also online.
Beginning Databases with PostgreSQL.
This is one of those Wrox books which is about 10000 pages, including 80% of what I want to know and 2000% of what I don't.
There are others, but I think they are weaker.
I was disappointed with the one just called PostgreSQL.
Yes, I know it should be avoided, but I'll ask anyway since sometimes you have no other choice :(
I'm currently using MySQL on a Win2k server, and it actually runs pretty well and is very easy to install. What about PostgreSQL? Last I looked at it, it was a lot more difficult to get running on Windows. Has that changed, or is PostgreSQL still more or less *nix only?
Re:Replication
by Smoking · Score: 3, Informative
I've recently set up a master-master replication environment on Oracle 9i, and I did some research to check whether it was possible with Postgres.
The most advanced guys on the subject seem to be at the Swiss engineering school in Zürich. Here is a list of their publications.
They seem to have developed a replication scheme (Postgres-R) where they get better than linear performance improvement when they add new masters. Quite impressive.
Quentin
Re:So when should we expect...
by Smoking · Score: 3, Interesting
Postgres Ti:
Done...
I've got 7.1 running on my Titanium Powerbook...
There are really nice MacOS X packages of Postgres at Marc Liyanage's home page.
I also take this occasion to thank him for the nice MacOS X packages he's put together...
Quentin
Re:I have only one feature request for PostgreSQL.
by chriskl · Score: 3, Informative
I'm a postgres developer and I really have no idea what you mean here!
Postgres always keeps its integrity constraints, including when you dump and restore. It's done this, as far as I am aware, since at least 7.0.
LOBs are no longer a problem, since 7.1 supports unlimited row length with binary or ASCII data - just use 'bytea' or 'text' fields...
Chris
Thanks for the notice!
by alexhmit01 · Score: 3, Interesting
We run PostgreSQL on a dual-processor Linux box to feed our OpenBSD web servers. We got a HUGE speed gain from the OpenBSD -> Linux change (even when we ran it on a slower machine while testing it), and any SMP gains will be helpful.
When we did OpenBSD we had to be VERY careful not to do more queries than necessary (including some complicated joins and then having PHP parse the results). With Linux as the database server, I feel that I can throw hardware at it (including moving to Solaris if need be) and optimize the queries a bit less to abstract the programming.
SMP improvement is important, as the next step up for us is a Quad-Xeon processor, then Sun Hardware. (PostgreSQL seemed to run best on Linux and Solaris from the old website)...
It's such a shame that they never figured out the PostgreSQL support model. I would have happily paid for some support, but it always seemed easier to get the OpenBSD port or the Red Hat RPM than to pay for their CDs. They never included much beyond installation support. I knew how to install it; having some support (not the mailing list) for some of my optimization questions would have saved days and been worth a support contract.
1. Point in time recovery
2. Reconstruct SQL from write ahead log
3. Function based indexes with SQL rewrite
4. Materialized views with SQL rewrite
5. Analogue to Oracle's v$sqlarea
6. Wait statistics
7. Tablespaces
8. Inline views (from clause subselects)
9. Parallel query capability
10. Partitioned tables
11. Bitmap indexes
12. IO monitoring (read/write per object)
13. Dynamic sort and hash area allocation
14. Detailed SQL tracing (rows per plan step)
15. Multiplexed WAL writes
16. SQL optimizer hints
Re:Long time mysql user, postgresql newbie
by slamb · Score: 3, Informative
Seriously, what's the "preferred" way to add a normal, non super user, only has select, insert, update, and delete access to a given database that can connect from the local machine, and remotely. Is this even possible?
Add something like this to your pg_hba.conf:
local sameuser trust
host sameuser 127.0.0.1 255.255.255.255 trust
That's not authentication! "trust" just allows logins, period. Try "psql -U postgres" as anyone on that machine. You'll instantly be logged in as the superuser.
Something like this works fairly well on Postgresql 7.1:
host all 127.0.0.1 255.255.255.255 ident sameuser
host all 0.0.0.0 0.0.0.0 password
Then enable TCP/IP connections ("tcpip_socket = true" in postgresql.conf)
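To see why the order and netmasks of those lines matter, here is a small Python sketch of how records like these are matched: the server walks pg_hba.conf top to bottom, and the first record whose database and address match decides the authentication method (there is no fall-through to later lines). The sketch only understands 7.1-style `host` lines, treats the database field as either `all` or an exact name (it ignores the special `sameuser` keyword in that field), and its function names and the `webapp` database are made up for the example.

```python
import ipaddress

# Hypothetical copy of the two records above (7.1-style host lines:
# host DATABASE IP-ADDRESS NETMASK AUTH-TYPE [AUTH-ARGUMENT]).
HBA = """
host all 127.0.0.1 255.255.255.255 ident sameuser
host all 0.0.0.0 0.0.0.0 password
"""


def parse_hba(text):
    """Turn host records into (database, network, auth) tuples, in file order."""
    rules = []
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0] != "host":
            continue                      # this sketch skips blanks, comments, "local"
        _, database, addr, mask, *auth = fields
        network = ipaddress.IPv4Network(f"{addr}/{mask}")  # dotted netmask is accepted
        rules.append((database, network, " ".join(auth)))
    return rules


def auth_method(rules, database, client_ip):
    """Return the auth method of the first matching record, or None."""
    ip = ipaddress.IPv4Address(client_ip)
    for db, network, auth in rules:
        if db in ("all", database) and ip in network:
            return auth                   # first match wins
    return None                           # no record matches: connection refused


rules = parse_hba(HBA)
print(auth_method(rules, "webapp", "127.0.0.1"))   # ident sameuser
print(auth_method(rules, "webapp", "192.0.2.10"))  # password
```

Swap the two lines and every connection, local ones included, would match the 0.0.0.0/0.0.0.0 record first and get password authentication, which is why the more specific record goes on top.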
Very important: make sure your ident server is trustworthy. Many ident servers have an option to allow a user to fake identification. Turn it off.
Also, the config I posted there will let any user connect to any database. That's the simplest, but not the most secure. The "sameuser" in the database field won't be enough to let the superuser connect to databases. You might add a separate line for that with an ident map containing only postgres (the file would have only the words "postgres postgres" in it, on one line), and then use "all" in the database field with that map, i.e. "host all 127.0.0.1 255.255.255.255 ident postgres".
For remote connections, just make sure they have a password in the database:
create user slamb with password '12345';
alter user bob with password 'newpassword';
There's no authentication method specified here for UNIX domain sockets, so they just don't work. You'll need to set the PGHOST="localhost" environment variable for things to authenticate correctly. I did this because pgsql 7.1 did not support ident on UNIX domain sockets. pgsql 7.2 now does, on certain platforms. (Just replace "host <db> <ip> <netmask> ident <map>" with "local <db> ident <map>")
pgsql 7.2 adds PAM support. If your UNIX and PostgreSQL usernames correspond, it should work.
pgsql 7.2 also adds support for encrypted passwords. There's an option for storing passwords encrypted in the database and an option for challenge-based encryption. I think these methods are incompatible: good challenge-based encryption requires the password be stored in plaintext on the server.
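For comparison, the md5 method that 7.2 introduced does combine a stored hash with a per-connection challenge. As PostgreSQL documents the scheme, the server stores "md5" followed by md5(password + username) in hex, and at login sends a random 4-byte salt; the client answers with "md5" followed by md5(that hex string + salt), which the server can check from the stored hash alone. The sketch below reproduces that arithmetic with Python's hashlib; the function names and the sample password are hypothetical, and it is not libpq's code.

```python
import hashlib
import hmac
import os


def stored_hash(password: str, user: str) -> str:
    # What goes into the password column instead of the plaintext.
    return "md5" + hashlib.md5((password + user).encode()).hexdigest()


def client_response(password: str, user: str, salt: bytes) -> str:
    # The client recomputes the stored hash, then hashes it with the
    # per-connection salt, so the value sent over the wire changes every time.
    inner = hashlib.md5((password + user).encode()).hexdigest()
    return "md5" + hashlib.md5(inner.encode() + salt).hexdigest()


def server_accepts(stored: str, salt: bytes, response: str) -> bool:
    # The server repeats only the salted step, starting from the stored hash
    # (without its "md5" prefix); it never needs the plaintext password.
    expected = "md5" + hashlib.md5(stored[3:].encode() + salt).hexdigest()
    return hmac.compare_digest(expected, response)


salt = os.urandom(4)
stored = stored_hash("12345", "slamb")
print(server_accepts(stored, salt, client_response("12345", "slamb", salt)))  # True
print(server_accepts(stored, salt, client_response("guess", "slamb", salt)))  # False
```

The catch, and the reason for the caution above, is that the stored hash is itself enough to compute a valid response: anyone who reads it can log in without ever learning the password.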
There has been Kerberos auth for some time. I'm trying to switch over to this now, as I'm setting up Kerberos on my network. It's a more complicated system to set up correctly, though. Get something else working first.
I really wish there was an embedable version of PostgreSQL... It's a very good database, but it's sometimes a real pain to write a program that ties together a SQL database with anything else, unless it's a local-use only program. I know MySQL added this feature in 4.x (but their transaction support is too new, IMO).
I used up all my sick days, so I'm calling in dead.
That's a cheesy way to dispute his claims.
They're not using MySQL to store all their critical data. They're dumping all their data, presumably from some other, more reliable database (Oracle, it sounds like), into MySQL for quick web searches.
IOW, they're using MySQL for what it does best: as a fast datastore for when data integrity isn't important (because they have all the data backed up in Oracle and could redump it to MySQL at any time).
Admittedly, some of this post is conjecture, but you'd be crazy to suggest that the Census Bureau would trust all their critical data to mysql.
Don't take this the wrong way, but that seems to be exactly the kind of project MySQL was made for. Data access appears to be primarily read-only, with none of it being critical. Worst case, if the database becomes corrupt, you simply restore the whole thing from backup. Since the data doesn't appear (based on the description) to change rapidly, if at all (or perhaps only through backend updates), issues such as online backup or high availability are not likely to matter in the least. If they did, they'd probably be using that Oracle license.
PostgreSQL is an ACID compliant database. MySQL is not (unless that has changed recently--if so please let me know).
ACID (an acronym for Atomicity, Consistency, Isolation, Durability) is a 'keyword' that business professionals generally look for when evaluating databases. Frankly, non-ACID databases aren't taken very seriously, even if they are used by the likes of Yahoo and Slashdot (as MySQL is).
Here is a quick description of what it means to be ACID compliant:
1. Atomicity is an all-or-none proposition. Suppose you define a transaction that contains an UPDATE, an INSERT, and a DELETE statement. With atomicity, these statements are treated as a single unit, so there are only two possible outcomes: either they all change the database or none of them do. This is important in situations like bank transactions, where transferring money between accounts could result in disaster if the server were to go down after a DELETE statement but before the corresponding INSERT statement.
2. Consistency guarantees that a transaction never leaves your database in a half-finished state or in violation of its integrity rules. If one part of the transaction fails, all of the pending changes are rolled back, leaving the database as it was before you initiated the transaction. For instance, when you delete a customer record, you should also delete all of that customer's records from associated tables (such as invoices and line items). A properly configured database wouldn't let you delete the customer record if that meant leaving its invoices and other associated records stranded.
3. Isolation keeps transactions separated from each other until they're finished. Transaction isolation is generally configurable in a variety of modes. For example, in one mode, a transaction blocks until the other transaction finishes. In a different mode, a transaction sees obsolete data (from the state the database was in before the previous transaction started). Suppose a user deletes a customer, and before the customer's invoices are deleted, a second user updates one of those invoices. In a blocking transaction scenario, the second user would have to wait for the first user's deletions to complete before issuing the update. The second user would then find out that the customer had been deleted, which is much better than losing changes without knowing about it.
4. Durability guarantees that the database will keep track of pending changes in such a way that the server can recover from an abnormal termination. Hence, even if the database server is unplugged in the middle of a transaction, it will return to a consistent state when it's restarted. The database handles this by recording changes in a transaction log before they are written to the data files. By virtue of atomicity (explained above), a transaction that had not committed at the time of an abnormal termination won't be applied to the database. When the database is restarted after such a termination, it examines the transaction log for transactions that had committed but whose changes had not yet reached the data files, and applies them.
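The all-or-nothing behavior in points 1 and 2 is easy to see in a few lines of code. Here is a minimal sketch in Python using the standard library's sqlite3 module as a stand-in for any transactional database (it is not PostgreSQL). The accounts table and the `transfer` and `balances` helpers are hypothetical, and the crash between the debit and the credit is simulated with an exception.

```python
import sqlite3

def transfer(conn, src, dst, amount, fail_midway=False):
    """Move `amount` from account src to dst as a single transaction.

    With fail_midway=True, raise after the debit but before the credit,
    simulating the server going down between the two statements.
    """
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))
            if fail_midway:
                raise RuntimeError("simulated failure between debit and credit")
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))
    except RuntimeError:
        pass  # by now the debit above has been rolled back

def balances(conn):
    return dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))

conn = sqlite3.connect(":memory:")
with conn:
    conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])

transfer(conn, "alice", "bob", 30)
print(balances(conn))  # {'alice': 70, 'bob': 30}
transfer(conn, "alice", "bob", 50, fail_midway=True)
print(balances(conn))  # {'alice': 70, 'bob': 30}: the half-done transfer left no trace
```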
It is difficult to trust mission-critical data to a database that does not guarantee it won't screw up (short of a bug, of course), so such compliance, even when it is more political than technical, is very important.
Computer Science is no more about computers than astronomy is about telescopes. --E. W. Dijkstra
I didn't notice anything about online backups, point-in-time recovery, or standby databases. Is any of this possible on PostgreSQL yet? How about clustering/parallel server? Seems like these are important features to become an Oracle/SQL Server replacement.
SONY. Because caucasians are just too damn tall.
I only have one real gripe about PostgreSQL- I hate the upgrade path.
Having to dump the database to disk and re-import it is a bad thing IMO. Having to add a switch to keep integrity constraints is a very odd thing for a database (shouldn't the default be to *keep* integrity constraints?)
A separate program to preserve LOBs I can rationalize (it's a lot of generally unneeded overhead since few people use LOBs).
It would really be nice if someone would write some wrapper programs to check for foreign keys and LOBs, then wrap the pg_dumpall & pg_dump commands with the appropriate options into one set of programs.
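A minimal sketch of that wrapper idea, in Python. It only builds the pg_dump command lines rather than running them, and it assumes the caller already knows which databases hold large objects (the `has_lobs` mapping and the `plan_dumps` helper are hypothetical; detecting LOBs automatically is left out). The `-F c` (custom archive format), `-b` (include large objects) and `-f` (output file) options come from pg_dump's documentation; as I understand it, pg_dump of this vintage only dumps large objects in the custom or tar formats, so check against `pg_dump --help` for your version.

```python
import shlex

def plan_dumps(has_lobs: dict[str, bool]) -> list[list[str]]:
    """Build one pg_dump command per database.

    Databases flagged as holding large objects get a custom-format archive
    with -b; the rest get a plain SQL script.
    """
    commands = []
    for db, lobs in sorted(has_lobs.items()):
        if lobs:
            commands.append(["pg_dump", "-F", "c", "-b", "-f", f"{db}.dump", db])
        else:
            commands.append(["pg_dump", "-f", f"{db}.sql", db])
    return commands

for cmd in plan_dumps({"wiki": False, "shop": True}):
    print(shlex.join(cmd))
# pg_dump -F c -b -f shop.dump shop
# pg_dump -f wiki.sql wiki
```

The custom-format archives would then be loaded with pg_restore and the plain scripts with psql.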
Never trust an atom. They make up everything.
Options are sometimes considered to be a good thing.
Seriously, what's the "preferred" way to add a normal, non-superuser user that only has SELECT, INSERT, UPDATE, and DELETE access to a given database and can connect both from the local machine and remotely? Is this even possible?
Add something like this to your pg_hba.conf:
I guess another oddity of pgsql is that out of the box, it only does ident-type local socket authentication, no TCP/IP.
We like the default setup to be reasonably secure.
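The pg_hba.conf side only decides who may connect; limiting what that user can do is a job for GRANT. Here is a minimal Python sketch that generates the statements for one such user. It assumes 7.x-era CREATE USER syntax (NOCREATEDB and NOCREATEUSER, where NOCREATEUSER means "not a superuser") and per-table grants. The user name, password and table names are hypothetical. Tables with serial columns will likely also need a grant on the sequences behind them.

```python
def quote_ident(name: str) -> str:
    """Double-quote an SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'

def quote_literal(value: str) -> str:
    """Single-quote an SQL string literal, doubling embedded single quotes.

    Assumes no backslashes: older servers treat backslash as an escape
    character inside string literals.
    """
    return "'" + value.replace("'", "''") + "'"

def limited_user_sql(user: str, password: str, tables: list[str]) -> list[str]:
    """SQL for a login that can read and write rows in `tables` but is not a superuser."""
    stmts = [
        f"CREATE USER {quote_ident(user)} WITH PASSWORD {quote_literal(password)} "
        "NOCREATEDB NOCREATEUSER;"
    ]
    for table in tables:
        stmts.append(
            f"GRANT SELECT, INSERT, UPDATE, DELETE ON {quote_ident(table)} TO {quote_ident(user)};"
        )
    return stmts

for stmt in limited_user_sql("webapp", "secret", ["customers", "invoices"]):
    print(stmt)
```

The output can be fed to psql while connected, as a superuser, to the target database.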
I've looked forever, but I've yet to find a "mysql to postgresql" quick start guide.
try here
Also, wouldn't it be darn nice to include a start/stop script that reads only config files and can be linked from
It's in contrib/start-scripts. Or you might as well download the RPMs.
I'm sure I'm not the only slash-dotter who was on the verge of exceeding the 4 billion transaction limit on their pgsql-based Anime fan fiction submission website.
pr0n - keeping monitor glass spotless since 1981.
There are a few decent books about PostgreSQL out there now. It is so much nicer than a few years ago.
Practical PostgreSQL. I think this one just came out as a bound book. I just got it a couple days ago and it is pretty good. It is also online.
PostgreSQL Developer's Handbook. I (as a developer) like this one best of all that are out now.
PostgreSQL Essential Reference. This one is pretty good, but I would not say it is essential. :-)
Beginning Databases with PostgreSQL. This is one of those Wrox books which is about 10000 pages, including 80% of what I want to know and 2000% of what I don't.
There are others, but I think they are weaker. I was disappointed with the one just called PostgreSQL.
Yes, I know it should be avoided, but I'll ask anyway since sometimes you have no other choice :(
I'm currently using MySQL on a Win2k server, and it actually runs pretty well and is very easy to install. What about PostgreSQL? Last I looked at it, it was a lot more difficult to get running on Windows. Has that changed, or is PostgreSQL still more or less *nix-only?
I've recently set up a master-master replication environment on Oracle 9i, and I did some research to check whether it was possible with Postgres.
In fact there are many solutions available (check techdocs.postgresql.org for a list...)
The most advanced people on the subject seem to be at the Swiss engineering school in Zürich. Here is a list of their publications.
They seem to have developed a replication scheme (Postgres-R) where they get better-than-linear performance improvement when they add new masters. Quite impressive.
Quentin
Postgres Ti:
Done...
I've got 7.1 running on my Titanium Powerbook...
there are really nice MacOS X packages of Postgres at Marc Liyanage's home page
I'd also like to take this opportunity to thank him for the nice MacOS X packages he's put together...
Quentin
I'm a postgres developer and I really have no idea what you mean here!
Postgres always keeps its integrity constraints, including when you dump and restore. It's done this, as far as I am aware, since at least 7.0.
LOBs are no longer a problem, since 7.1 supported unlimited row length with binary or ascii data - just use 'bytea' or 'text' fields...
Chris
We run PostgreSQL on a dual-processor Linux box to feed our OpenBSD web servers. We got a HUGE speed gain from the OpenBSD -> Linux change (even when we ran it on a slower machine while testing it), and any SMP gains will be helpful.
When we ran on OpenBSD, we had to be VERY careful not to do more queries than necessary (including some complicated joins and then having PHP parse the results). With Linux as the database server, I feel that I can throw hardware at it (including moving to Solaris if need be) and spend a bit less effort optimizing queries in favor of more abstract programming.
SMP improvement is important, as the next step up for us is a Quad-Xeon processor, then Sun Hardware. (PostgreSQL seemed to run best on Linux and Solaris from the old website)...
It's such a shame that they never figured out the PostgreSQL support model. I would have happily paid for some support, but it always seemed easier to get the OpenBSD port or the Red Hat RPM than to pay for their CDs, which never included much beyond installation support. I knew how to install it; having some support (not the mailing list) for some of my optimization questions would have saved days and been worth a support contract.
Alex
Here's my Postgres wish list:
1. Point in time recovery
2. Reconstruct SQL from write ahead log
3. Function based indexes with SQL rewrite
4. Materialized views with SQL rewrite
5. Analogue to Oracle's v$sqlarea
6. Wait statistics
7. Tablespaces
8. Inline views (from clause subselects)
9. Parallel query capability
10. Partitioned tables
11. Bitmap indexes
12. IO monitoring (read/write per object)
13. Dynamic sort and hash area allocation
14. Detailed SQL tracing (rows per plan step)
15. Multiplexed WAL writes
16. SQL optimizer hints
That's not authentication! "trust" just allows logins, period. Try "psql -U postgres" as anyone on that machine. You'll instantly be logged in as the superuser.
Something like this works fairly well on Postgresql 7.1:
host all 127.0.0.1 255.255.255.255 ident sameuser
host all 0.0.0.0 0.0.0.0 password
Then enable TCP/IP connections ("tcpip_socket = true" in postgresql.conf)
Very important: make sure your ident server is trustworthy. Many ident servers have an option to allow a user to fake identification. Turn it off.
Also, the config I posted there will let any user connect to any database. That's the simplest setup, but not the most secure. Putting "sameuser" in the database field instead is tighter, but it won't be enough to let the superuser connect to other databases. You might add a separate line for that with an ident map containing only postgres (in pg_ident.conf that is a single line, "postgres postgres postgres": map name, ident username, PostgreSQL username), and then use "all" in the database field with that map, i.e., "host all 127.0.0.1 255.255.255.255 ident postgres".
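The order of those lines matters. Here is a minimal Python sketch of the first-match rule as I understand pg_hba.conf to work: the server uses the first line whose type, database and client address match, and if authentication then fails it does not fall through to later lines. The sketch handles only "host" lines in the 7.1-style "host DATABASE IP NETMASK METHOD [ARGUMENT]" format, and only "all" or an exact name in the database field. `pick_auth` is a hypothetical helper, not part of PostgreSQL.

```python
import ipaddress

def pick_auth(hba_lines, database, client_ip):
    """Return (method, argument) from the first matching 'host' line, or None.

    Assumes well-formed lines: host DATABASE IP NETMASK METHOD [ARGUMENT].
    """
    addr = ipaddress.IPv4Address(client_ip)
    for raw in hba_lines:
        fields = raw.split("#", 1)[0].split()  # drop comments, split on whitespace
        if not fields or fields[0] != "host":
            continue  # this sketch ignores 'local' lines
        db, ip, mask, method, *rest = fields[1:]
        if db != "all" and db != database:
            continue
        # IPv4Network accepts "address/netmask"; strict=False tolerates host bits.
        if addr in ipaddress.IPv4Network(f"{ip}/{mask}", strict=False):
            return method, (rest[0] if rest else None)  # first match wins
    return None  # no matching line: the connection is rejected

hba = [
    "host all 127.0.0.1 255.255.255.255 ident sameuser",
    "host all 0.0.0.0 0.0.0.0 password",
]
print(pick_auth(hba, "shop", "127.0.0.1"))  # ('ident', 'sameuser')
print(pick_auth(hba, "shop", "10.0.0.7"))   # ('password', None)
```

Swapping the two lines would send local connections to password authentication as well, and any later line for the same database and address is never consulted for those connections.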
For remote connections, just make sure they have a password in the database:
create user slamb with password '12345';
alter user bob with password 'newpassword';
There's no authentication method specified here for UNIX domain sockets, so they just don't work. You'll need to set the PGHOST=localhost environment variable for clients to authenticate correctly (over TCP/IP). I did this because pgsql 7.1 did not support ident on UNIX domain sockets. pgsql 7.2 now does, on certain platforms. (Just replace "host <db> <ip> <netmask> ident <map>" with "local <db> ident <map>".)
pgsql 7.2 adds PAM support. If your UNIX and PostgreSQL usernames correspond, it should work.
pgsql 7.2 also adds support for encrypted passwords. There's an option for storing passwords encrypted in the database and an option for challenge-response authentication. I think these methods are incompatible: good challenge-response authentication requires the password to be stored in plaintext on the server.
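That host-to-local rewrite is mechanical enough to script. A minimal Python sketch, assuming the whitespace-separated six-field form quoted above; any other line, including comments, passes through unchanged. `host_ident_to_local` is a hypothetical helper name.

```python
def host_ident_to_local(line: str) -> str:
    """Turn 'host <db> <ip> <netmask> ident <map>' into 'local <db> ident <map>'."""
    fields = line.split()
    if len(fields) == 6 and fields[0] == "host" and fields[4] == "ident":
        db, ident_map = fields[1], fields[5]
        return f"local {db} ident {ident_map}"
    return line  # anything else is left alone

config = [
    "# loopback connections, identified by the ident server",
    "host all 127.0.0.1 255.255.255.255 ident sameuser",
    "host all 0.0.0.0 0.0.0.0 password",
]
for line in config:
    print(host_ident_to_local(line))
```

The rewritten line only covers UNIX-domain sockets, so keep the original host line as well if TCP/IP clients on localhost should still be identified through ident.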
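For the md5 method that arrived in 7.2, the scheme is usually described as follows; this is a minimal Python sketch, not PostgreSQL's code, and the function names are hypothetical. The stored form is "md5" followed by the hex MD5 of the password concatenated with the username. For each login the server sends a random 4-byte salt, and the client answers with "md5" plus the hex MD5 of that inner hex digest followed by the salt. If that description is right, the server can check the answer from the stored hash alone, and I believe it is the older crypt method, rather than md5, that needs the plaintext on the server.

```python
import hashlib
import hmac
import os

def stored_hash(password: str, user: str) -> str:
    """Form kept in the catalog when passwords are stored encrypted."""
    return "md5" + hashlib.md5((password + user).encode()).hexdigest()

def client_response(password: str, user: str, salt: bytes) -> str:
    """What the client sends back after receiving the server's 4-byte salt."""
    inner = hashlib.md5((password + user).encode()).hexdigest()
    return "md5" + hashlib.md5(inner.encode() + salt).hexdigest()

def server_accepts(stored: str, salt: bytes, response: str) -> bool:
    """Check a response using only the stored hash, never the plaintext."""
    inner = stored[len("md5"):]  # strip the "md5" prefix to get the inner hex digest
    expected = "md5" + hashlib.md5(inner.encode() + salt).hexdigest()
    return hmac.compare_digest(expected, response)

salt = os.urandom(4)  # fresh per login, so a captured response is tied to this salt
stored = stored_hash("12345", "slamb")
print(server_accepts(stored, salt, client_response("12345", "slamb", salt)))  # True
print(server_accepts(stored, salt, client_response("guess", "slamb", salt)))  # False
```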
There has been Kerberos auth for some time. I'm trying to switch over to this now, as I'm setting up Kerberos on my network. It's a more complicated system to set up correctly, though. Get something else working first.
Official docs are here