PostgreSQL v7.2 Final Release

highlights... by bob@dB.org · 2002-02-06 16:24 · Score: 5, Informative

from http://www.us.postgresql.org/news.html

Highlights of this release are as follows:

VACUUM: Vacuuming no longer locks tables, thus allowing normal user access during the vacuum. A new "VACUUM FULL" command does old-style vacuum by locking the table and shrinking the on-disk copy of the table.
Transactions: There is no longer a problem with installations that exceed four billion transactions.
OID's: OID's are now optional. Users can now create tables without OID's for cases where OID usage is excessive.
Optimizer: The system now computes histogram column statistics during "ANALYZE", allowing much better optimizer choices.
Security: A new MD5 encryption option allows more secure storage and transfer of passwords. A new Unix-domain socket authentication option is available on Linux and BSD systems.
Statistics: Administrators can use the new table access statistics module to get fine-grained information about table and index usage.
Internationalization: Program and library messages can now be displayed in several languages.

.. with many many more bug fixes, enhancements and performance related changes ...

--
Acts@core.mailboks.com Acrux@core.mailboks.com Adam@core.mailboks.com Adar@core.mailboks.com Ada@core.mailboks.com

Re:highlights... by Zeut · 2002-02-06 16:43 · Score: 5, Informative

One issue that is not mentioned in the release highlights is the marked improvement that is now available for SMP boxes. In some cases throughput has been increased by more than a factor of 2.

Why use PostgreSQL instead of MySQL?: ACID by Sivar · 2002-02-06 17:01 · Score: 5, Informative

PostgreSQL is an ACID compliant database. MySQL is not (unless that has changed recently--if so please let me know).
ACID (an acronym for Atomicity, Consistency, Isolation, Durability) is a 'keyword' that business professionals generally look for when evaluating databases. Frankly, non-ACID databases aren't taken very seriously, even if they are used by the likes of Yahoo and Slashdot (as MySQL is).
Here is a quick description of what it means to be ACID compliant:
1. Atomicity is an all-or-none proposition. Suppose you define a transaction that contains an UPDATE, an INSERT, and a DELETE statement. With atomicity, these statements are treated as a single unit, and thanks to consistency (the C in ACID) there are only two possible outcomes: either they all change the database or none of them do. This is important in situations like bank transactions where transferring money between accounts could result in disaster if the server were to go down after a DELETE statement but before the corresponding INSERT statement.

2. Consistency guarantees that a transaction never leaves your database in a half-finished state. If one part of the transaction fails, all of the pending changes are rolled back, leaving the database as it was before you initiated the transaction. For instance, when you delete a customer record, you should also delete all of that customer's records from associated tables (such as invoices and line items). A properly configured database wouldn't let you delete the customer record if that meant leaving its invoices and other associated records stranded.

3. Isolation keeps transactions separated from each other until they're finished. Transaction isolation is generally configurable in a variety of modes. For example, in one mode, a transaction blocks until the other transaction finishes. In a different mode, a transaction sees obsolete data (from the state the database was in before the previous transaction started). Suppose a user deletes a customer, and before the customer's invoices are deleted, a second user updates one of those invoices. In a blocking transaction scenario, the second user would have to wait for the first user's deletions to complete before issuing the update. The second user would then find out that the customer had been deleted, which is much better than losing changes without knowing about it.

4. Durability guarantees that the database will keep track of pending changes in such a way that the server can recover from an abnormal termination. Hence, even if the database server is unplugged in the middle of a transaction, it will return to a consistent state when it's restarted. The database handles this by storing uncommitted transactions in a transaction log. By virtue of consistency (explained above), a partially completed transaction won't be written to the database in the event of an abnormal termination. However, when the database is restarted after such a termination, it examines the transaction log for completed transactions that had not been committed, and applies them.
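To make the all-or-none idea from point 1 concrete, here is a minimal sketch. It is an illustration only: it uses Python's built-in sqlite3 module as a stand-in rather than PostgreSQL, and the accounts table, the names in it, and the transfer() helper are all hypothetical. The credit is applied first, the debit then violates a CHECK constraint, and the whole transaction is rolled back so the credit disappears too.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as a single unit (hypothetical helper)."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            # If this debit breaks the CHECK below, the credit above is undone too.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
conn.commit()

# alice only has 100, so the debit fails and bob's credit is rolled back with it.
ok = transfer(conn, "alice", "bob", 500)
balances = conn.execute("SELECT name, balance FROM accounts ORDER BY name").fetchall()
print(ok, balances)
```

The same shape applies in PostgreSQL with BEGIN / COMMIT / ROLLBACK; only the driver and connection setup would differ.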

It is difficult to trust mission-critical data to a database that does not guarantee that it will not screw up (short of a bug, of course), so such compliance--even when it is more political than technical--is very important.

--
Computer Science is no more about computers than astronomy is about telescopes. --E. W. Dijkstra

Re:Why use PostgreSQL instead of MySQL?: ACID by Pathwalker · 2002-02-06 19:11 · Score: 5, Informative

You Said:
Actually, this is not the fault of the database. If you mod a post that is already +5 up or mod a -1 post down you will lose a mod point and the score will go unchanged (the moderation total values will increase though)

I Reply:
I have a hard time believing that this behavior started out as a feature. I find it much more likely that it was initially a bug. This bug, being found useful, was then elevated to the status of a "feature".
You are correct that it is not the fault of the database, but transaction and constraint support at the database level would have made it easy to prevent this problem from ever cropping up in the first place.

You Said:
the use of a transaction to record a moderation is fairly frivolous and probably more of a waste of CPU time than moderation point

I Reply:
For a system which receives as much activity as Slashdot, and with a constant stream of friendly trolls looking for any crack in the system that they can use to share the sight of their favorite gaping asshole with the unwilling members of the rest of the population, if I were coding it, I would insist on inserting checks of basic constraints at different levels of the system. The database layer is your last line of defense against abuse of the system.

Secondly, these double checks are useful for finding errors in other levels of the system. Remember the problems that used to crop up from time to time with comments being moderated to -2 or to 6? If the value of the moderations was constrained in the database, not only would users not see this problem, but an error log generated (for the admins only) when a transaction is rolled back in a situation where it is not expected would have helped isolate the fault very quickly.

The database level checks would also help against rogue activity of people in positions of (limited) trust. Worried about an editor editing one of their accounts to give themselves a huge number of modpoints? Cap the level in the database at 5; it would make it impossible for this nefarious subterfuge to take place.

As for the speed issue: if you are willing to sacrifice verification of correct operation for a small increase in speed, you have severely underspecified your hardware requirements.

Finally, I would like to include a small SQL fragment, showing some of the checks that I would feel are absolutely necessary for a web based discussion system that people are trying to subvert:

--First we create a table for a couple of users
CREATE TABLE "users" (
    "uid" serial PRIMARY KEY, -- the key that posts.uid references
    "mod_points" integer DEFAULT 0,
    "name" text NOT NULL,
    CONSTRAINT "user_mod_const" CHECK (((mod_points > -1) AND (mod_points < 6)))
);

--now a table for some posts
CREATE TABLE "posts" (
    "date" timestamp with time zone DEFAULT now(),
    "pid" serial,
    "parent" int4,
    "uid" int4,
    "mod" integer DEFAULT 1,
    "body" text NOT NULL,
    "section" integer,
    CONSTRAINT "mod_const" CHECK (((mod > -2) AND (mod < 6))),
    CONSTRAINT "user_key" FOREIGN KEY (uid) REFERENCES users(uid) ON DELETE CASCADE ON UPDATE CASCADE
);
-- the constraint to ensure parent is equal to zero, or another pid in the posts table is left to the reader.

--And now for a function to access them. (Remember - direct SQL is icky; run things through functions to ensure a consistent interface)
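CREATE FUNCTION "mod_down" (integer,integer) RETURNS integer AS '
  -- $1 is the moderator uid, $2 is the post pid.
  -- No BEGIN/COMMIT here: a function already runs inside the calling
  -- transaction, so a failed CHECK undoes both updates together.
  update users set mod_points=mod_points-1 where uid=($1);
  update posts set mod=mod-1 where pid=($2);
  select mod from posts where pid=($2);
' LANGUAGE 'sql';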

As you can see, this nicely serves as a check to ensure the restrictions I mentioned above. With it being so trivial to add the checks, I can't see any reason to not take this extra step to eliminate nasty surprises.
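For anyone who wants to watch the checks fire, here is a small runnable sketch of the same idea. It is only an approximation: it uses Python's built-in sqlite3 module instead of PostgreSQL, trims the tables down to the columns that matter, and the sample rows (uid 1, pid 10) are made up. The mod_down() helper below is a Python stand-in for the SQL function above, not a translation of anything Slashdot actually runs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (
        uid INTEGER PRIMARY KEY,
        mod_points INTEGER DEFAULT 0 CHECK (mod_points > -1 AND mod_points < 6)
    );
    CREATE TABLE posts (
        pid INTEGER PRIMARY KEY,
        mod INTEGER DEFAULT 1 CHECK (mod > -2 AND mod < 6)
    );
    INSERT INTO users (uid, mod_points) VALUES (1, 1);  -- hypothetical moderator
    INSERT INTO posts (pid, mod) VALUES (10, -1);       -- post already at -1
""")

def mod_down(conn, uid, pid):
    """Spend one mod point and lower the post score, or change nothing at all."""
    try:
        with conn:  # one transaction: commit both updates or roll both back
            conn.execute(
                "UPDATE users SET mod_points = mod_points - 1 WHERE uid = ?", (uid,)
            )
            conn.execute("UPDATE posts SET mod = mod - 1 WHERE pid = ?", (pid,))
    except sqlite3.IntegrityError:
        return None  # a CHECK rejected the change; the mod point was not spent
    return conn.execute("SELECT mod FROM posts WHERE pid = ?", (pid,)).fetchone()[0]

# Modding a -1 post down would push it to -2, so the whole thing is refused.
result = mod_down(conn, 1, 10)
points_left = conn.execute("SELECT mod_points FROM users WHERE uid = 1").fetchone()[0]
print(result, points_left)
```

Because the two updates share a transaction, the moderator keeps the mod point when the post is already at the floor, which is exactly the lost-point behavior discussed at the top of this comment.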

Re:Congrats to the PostgreSQL Development team! by thing12 · 2002-02-06 17:17 · Score: 4, Informative

I'm not disagreeing with anything you said, in fact you all but reiterated everything I said.

The 7.1 vacuum analyze required table locks. Doesn't matter which phase of it required locks - it required exclusive locks because it vacuumed. By breaking that into separate commands the need for downtime is reduced drastically (down to the example which you point out - deleting thousands of rows at a time).

I know that you're recommended to run vacuum once per day, but I found that on a large database running on a fast server a daily vacuum took nearly 30 minutes to complete... that's 30 minutes of sequentially locked tables. Can't afford to do that every day - moving it to once a week may have degraded performance but it reduced the downtime window from 30 minutes per day to 1 hour per week.

I'm just happy that I don't have to bring a production server to its knees once a week (or for that matter once a day) just to do some table maintenance.

Re:Congrats to the PostgreSQL Development team! by nconway · 2002-02-06 17:30 · Score: 4, Informative

I'm not disagreeing with anything you said, in fact you all but reiterated everything I said.

No I didn't, read my post again.

The 7.1 vacuum analyze required table locks.

PostgreSQL has lots of different types of locks of varying granularities. Saying "table locks" doesn't mean a whole lot.

Doesn't matter which phase of it required locks

It does though -- in 7.1, splitting vacuum and analyze internally reduced the time that an exclusive lock needs to be held.

By breaking that into separate commands the need for downtime is reduced drastically

This is where you're wrong. The reduction in downtime has nothing to do with allowing ANALYZE to be executed separately. It is entirely the result of the new vacuum code (which is "lazy", unlike a VACUUM FULL -- which does a 7.1-style VACUUM). In 7.2, running VACUUM (with or without ANALYZE) is fast, and doesn't require an exclusive lock -- so your database can continue serving clients while a VACUUM is executing. Whether you choose to run ANALYZE at the same time or separately is really irrelevant.

I'm just happy that I don't have to bring a production server to its knees once a week (or for that matter once a day) just to do some table maintenance.

On that, we agree ;-)

Re:Long time mysql user, postgresql newbie by Moosbert · 2002-02-06 17:37 · Score: 5, Informative

Anyways, let me tell you, pgsql's user permissions still make my head swim, it's a nightmare. I mean, ok there's like how many different ways to authenticate a user, plain text password, crypted password, now md5, ident, local ident, kerberos, etc etc.

Options are sometimes considered to be a good thing.

Seriously, what's the "preferred" way to add a normal, non super user, only has select, insert, update, and delete access to a given database that can connect from the local machine, and remotely. Is this even possible?

Add something like this to your pg_hba.conf:

local sameuser trust

host sameuser 127.0.0.1 255.255.255.255 trust

I guess another kind of oddity about pgsql is that out of the box, it only does ident type local socket authentication, no tcp/ip.

We like the default setup to be reasonably secure.

I've looked forever, but I've yet to find a "mysql to postgresql" quick start guide.

try here

Also, it would be darn nice to include a start/stop script that reads only config files and can be linked from /etc/rc2.d/ etc.

It's in contrib/start-scripts. Or you might as well download the RPMs.

Re:Online Backups/High Availability by pthisis · 2002-02-06 18:18 · Score: 5, Informative

I didn't notice anything about online backups, point-in-time recovery, or standby database

Online backups: yes, for quite some time

point-in-time recovery: postgres uses WAL undo/redo logging, but I'm not sure what the state of rollback tools is at the moment.

Standby database: Assuming you mean Master/Slave replication, this is one of the major features planned for 7.3; 7.x has added a lot of the infrastructure needed for replication, and by 7.4 they hope to have multimaster replication (ie a fully distributed database).

SONY. Because caucasians are just too damn tall.

Crazy People. Hysterical movie.

Sumner

--
rage, rage against the dying of the light

Slashdot Mirror
