PostgreSQL v7.2 Final Release

Re:Congrats to the PostgreSQL Development team! by thing12 · 2002-02-06 16:22 · Score: 5, Interesting

Eliminating the locking vacuum...

I should probably clarify that - the full locking vacuum was separated out into two parts one which analyzes statistics and doesn't lock the tables and another which does what the old vacuum did by reordering data blocks to shrink the size on disk.... the bonus is that before you might run vacuum once a week or so because of the impact it has to a production system by doing a full lock on each table it vacuumed, now you can run it much more frequently as all it consumes is cpu time. Shrinking size on disk is nice, but it's the statistics that help the query planner turn SQL into faster queries.

highlights... by bob@dB.org · 2002-02-06 16:24 · Score: 5, Informative

from http://www.us.postgresql.org/news.html

Highlights of this release are as follows:

VACUUM: Vacuuming no longer locks tables, thus allowing normal user access during the vacuum. A new "VACUUM FULL" command does old-style vacuum by locking the table and shrinking the on-disk copy of the table.
Transactions: There is no longer a problem with installations that exceed four billion transactions.
OID's: OID's are now optional. Users can now create tables without OID's for cases where OID usage is excessive.
Optimizer: The system now computes histogram column statistics during "ANALYZE", allowing much better optimizer choices.
Security: A new MD5 encryption option allows more secure storage and transfer of passwords. A new Unix-domain socket authentication option is available on Linux and BSD systems.
Statistics: Administrators can use the new table access statistics module to get fine-grained information about table and index usage.
Internationalization: Program and library messages can now be displayed in several languages.

.. with many many more bug fixes, enhancements and performance related changes ...

--
Acts@core.mailboks.com Acrux@core.mailboks.com Adam@core.mailboks.com Adar@core.mailboks.com Ada@core.mailboks.com

Re:highlights... by Zeut · 2002-02-06 16:43 · Score: 5, Informative

One issue that is not mentioned in the release highlights is the marked improvement that is now available for SMP boxes. In some cases throughput has been increased by more than a factor of 2.

Re:Congrats to the PostgreSQL Development team! by GooberToo · 2002-02-06 16:42 · Score: 4, Insightful

Always remember that table statistics are used for approximate best guesses for inputs to the query optimizer. It is not uncommon or unheard of to actually see somes types of queries run slower after table statistics have been updated. I've seen this on Oracle, Sybase and SQL Server. I doubt that this is an issue unique to those RDBMS since the conceptual implementations and basis for algorithms tend to all be more or less the same.

Re:some might disagree by dietz · 2002-02-06 16:45 · Score: 5, Interesting

That's a cheesy way to dispute his claims.

They're not use MySQL to store all their critical data. They're dumping all their data, presumably from some other more reliable database (Oracle, it sounds like), into mysql for quick web searches.

IOW, they're using mysql for what it does best: As a fast datastore for when data integrity isn't important (because they have all the data backed up in Oracle and could redump it to mysql at any time).

Admittedly, some of this post is conjecture, but you'd be crazy to suggest that the Census Bureau would trust all their critical data to mysql.

Re:Congrats to the PostgreSQL Development team! by nconway · 2002-02-06 16:55 · Score: 4, Interesting

I should probably clarify that - the full locking vacuum was separated out into two parts one which analyzes statistics and doesn't lock the tables and another which does what the old vacuum did by reordering data blocks to shrink the size on disk....

This is incorrect (I believe it describes the situation after 7.1: the VACUUM ANALYZE command only needed to lock the table exclusively when VACUUMing, only a read lock was needed for ANALYZE).

In 7.2, the ANALYZE command can now be used separately, as you say. However, there are other (more important) improvements: ANALYZE only takes a look at a statistical sampling of the rows in the table. This means that collecting statistics on even enormous tables is very fast. Furthermore, VACUUM has been made "lazy" by default: this means that it doesn't attempt to reclaim space as aggressively as before, but it no longer requires an exclusive lock on the database (instead, it cooperates with other DB clients). The old behavior is available as "VACUUM FULL", and it is suggested whenever you need to reclaim a lot of diskspace (e.g. you delete hundreds of thousands of rows of data and need the space).

you might run vacuum once a week or so

It was (and is) suggested that you run VACUUM once per day.

it's the statistics that help the query planner turn SQL into faster queries.

As far as I know, you only really need to update your planner stats when you change the statistical distribution of your data. Of course, running ANALYZE's reasonably often won't do any harm, and is a relatively cheap operation (performance-wise).

Why use PostgreSQL instead of MySQL?: ACID by Sivar · 2002-02-06 17:01 · Score: 5, Informative

PostgreSQL is an ACID compliant database. MySQL is not (unless that has changed recently--if so please let me know).
ACID (an acronymn for Atomicity Consistency Isolation Durability) is a 'keyword' that business professionals generally look for when evaluating databases. Frankly, non-ACID databases aren't taken very seriously, even if they are used by the likes of Yahoo and Slashdot (like MySQL is).
Here is a quick description of what it means to be ACID compliant:
1. Atomicity is an all-or-none proposition. Suppose you define a transaction that contains an UPDATE, an INSERT, and a DELETE statement. With atomicity, these statements are treated as a single unit, and thanks to consistency (the C in ACID) there are only two possible outcomes: either they all change the database or none of them do. This is important in situations like bank transactions where transferring money between accounts could result in disaster if the server were to go down after a DELETE statement but before the corresponding INSERT statement.

2. Consistency guarantees that a transaction never leaves your database in a half-finished state. If one part of the transaction fails, all of the pending changes are rolled back, leaving the database as it was before you initiated the transaction. For instance, when you delete a customer record, you should also delete all of that customer's records from associated tables (such as invoices and line items). A properly configured database wouldn't let you delete the customer record, if that meant leaving its invoices, and other associated records stranded.

3. Isolation keeps transactions separated from each other until they're finished. Transaction isolation is generally configurable in a variety of modes. For example, in one mode, a transaction blocks until the other transaction finishes. In a different mode, a transaction sees obsolete data (from the state the database was in before the previous transaction started). Suppose a user deletes a customer, and before the customer's invoices are deleted, a second user updates one of those invoices. In a blocking transaction scenario, the second user would have to wait for the first user's deletions to complete before issuing the update. The second user would then find out that the customer had been deleted, which is much better than losing changes without knowing about it.

4. Durability guarantees that the database will keep track of pending changes in such a way that the server can recover from an abnormal termination. Hence, even if the database server is unplugged in the middle of a transaction, it will return to a consistent state when it's restarted. The database handles this by storing uncommitted transactions in a transaction log. By virtue of consistency (explained above), a partially completed transaction won't be written to the database in the event of an abnormal termination. However, when the database is restarted after such a termination, it examines the transaction log for completed transactions that had not been committed, and applies them.

It is difficult to trust mission critical data to a database that does not guarantee that it will complete not screw up (short of a bug, of course), this such compliance--even when it is more political than technical--is very important.

--
Computer Science is no more about computers than astronomy is about telescopes. --E. W. Dijkstra

Re:Why use PostgreSQL instead of MySQL?: ACID by Pathwalker · 2002-02-06 17:20 · Score: 5, Funny

Here's an example of why an ACID database is useful that hits close to all of our hearts - Slashdot moderation:

You may have noticed that if several people try to whack a troll at the same time, they all expend one moderator point, even if only a fraction of those points were required to push that troll down into the dreaded depths of -1.

If an ACID complient database were used, and the two steps of whacking the troll, and deducting the moderator points were placed in the same transaction (with a check constraint on the score of the posts to prevent them from dropping below -1) then the later moderators who tried to whack the troll would not have their points deducted, as the transaction would rollback when the constraint on the score of the post was exceeded.

Alas, mysql is not ACID complient, and so this sensless waste of moderator points continues to this day...
Re:Why use PostgreSQL instead of MySQL?: ACID by Pathwalker · 2002-02-06 19:11 · Score: 5, Informative

You Said:
Actually, this is not the fault of the database. If you mod a post that is already +5 up or mod a -1 post down you will lose a mod point and the score will go unchanged (the moderation total values will increase though)

I Reply:
I have a hard time believing that this behavior started out as a feature. I find it much more likely that it was initially a bug. This bug, being found useful was then elevated to the status of a "feature".
You are correct that it is not the fault of the database, but transaction and constraint support at the database level would have made it easy to prevent this problem from ever cropping up in the first place.

You Said:
the use of a transaction to record a moderation is fairly frivilous and probably more of a waste of CPU time than moderation point

I Reply:
For a system which recieves as much activity as Slashdot, and with a constant stream of friendly trolls looking for any crack in the system that they can use to share the sight of their favorite gaping asshole with the unwilling members of the rest of the population, if I were coding it, I would insist on inserting checks of basic constraints at different levels of the system. The database layer is your last line of defense against abuse of the system.

Secondly, these double checks are useful for finding errors in other levels of the system. Remember the problems that used to crop up from time to time with comments being moderated to -2 or to 6? If the value of the moderations was constrained in the database, not only would users not see this problem, but an error log generated (for the admins only) when a transaction is rolled back in a situaition where it is not expected would have helped isolate the fault very quickly.

The database level checks would also help against rogue activity of people in positions of (limited) trust. Worried about an editor editing one of their accounts to give themselves a huge number of modpoints? Cap the level in the database at 5; it would make it impossible for this nefarious subterfuge to take place.

As for the speed issue; if you are willing to sacrefice verification of correct operation for a small increase in speed, you have severly underspecified your hardware requirements.

Finally, I would like to include a small SQL fragment, showing some of the checks that I would feel are absolutely necessary for a web based discussion system that people are trying to subvert:

--First we create a table for a couple of users
CREATE TABLE "users" ( "uid" integer serial, "mod_points" integer default 0, "name" text not null, CONSTRAINT "user_mod_const" CHECK (((mod_points > -1) AND (mod_points < 6))) );--now a table for some postsCREATE TABLE "posts" ( "date" timestamp with time zone DEFAULT 'now()', "pid" serial, "parent" int4 "uid" int4, "mod" integer DEFAULT 1, "body" text not null, "section" integer, CONSTRAINT "mod_const" CHECK (((mod > -2) AND (mod < 6))), CONSTRAINT "user_key" FOREIGN KEY (uid) REFERENCES users(uid) on delete cascade on update cascade );-- the constraint to ensure parent is equal to zero, or another pid in the posts table is left to the reader.

--And now for a function to access them. (Remember - direct SQL is icky; run things through functions to ensure a consistant interface)
CREATE FUNCTION "mod_down" (integer,integer) RETURNS integer AS '
begin; update users set mod_points=mod_points-1 where uid=($1);
update posts set mod=mod-1 where pid=($2);
commit;
select mod from posts where pid=($2);
' LANGUAGE 'sql';

As you can see, this nicely serves as a check to ensure the restrictions I mentioned above. With it being so trivial to add the checks, I can't see any reason to not take this extra step to eliminate nasty surprises.
Re:Why use PostgreSQL instead of MySQL?: ACID by Pathwalker · 2002-02-06 19:57 · Score: 5, Insightful

Direct SQL is icky? Not if you grok functional languages.

I like SQL - I have no problems understanding SQL. I meant that statement in the sense that it feels more clean (to me) to call a stored procedures in the database from the outside, rather than sending the SQL statements to the database over and over.

It also serves as a handly layer of abstraction from your code, in the event that you want to make drastic changes to the structure of the underlying database.

Maybe I've just seen too much code with SQL commands scattered all over the place, doing the same thing in different fashions at different times, and using strange DB specific constructs at random.

I would prefer to keep the DB specific SQL extensions (if they must be used at all) in stored procedures in the database, and present a consistant interface of stored procedures to the external program. That way, you can support different databases, using the special features they each offer, by providing different database initialization files, and not require any changes to the main program.

Re:Congrats to the PostgreSQL Development team! by thing12 · 2002-02-06 17:17 · Score: 4, Informative

I'm not disagreeing with anything you said, in fact you all but reiterated everything I said.

The 7.1 vacuum analyze required table locks. Doesn't matter which phase of it required locks - it required exclusive locks because it vacuumed. By breaking that into a separate commands the need for downtime is reduced drastically (down to the example which you point out - deleting thousands of rows at a time).

I know that you're recommended to run vacuum once per day, but I found that on a large database running on a fast server a daily vacuum took nearly 30 minutes to complete... that's 30 minutes of sequentially locked tables. Can't afford to do that every day - moving it to once a week may have degraded performance but it reduced the downtime window from 30 minutes per day to 1 hour per week.

I'm just happy that I don't have to bring a production server to its knees once a week (or for that matter once a day) just to do some table maintenance.

Re:Congrats to the PostgreSQL Development team! by nconway · 2002-02-06 17:30 · Score: 4, Informative

I'm not disagreeing with anything you said, in fact you all but reiterated everything I said.

No I didn't, read my post again.

The 7.1 vacuum analyze required table locks.

PostgreSQL has lots of different types of locks of varying granularities. Saying "table locks" doesn't mean a whole lot.

Doesn't matter which phase of it required locks

It does though -- in 7.1, splitting vacuum and analyze internally reduced the time that an exclusive lock needs to be held.

By breaking that into a separate commands the need for downtime is reduced drastically

This is where you're wrong. The reduction in downtime has nothing to do with allowing ANALYZE to be executed separately. It is entirely the result of the new vacuum code (which is "lazy", unlike a VACUUM FULL -- which does a 7.1-style VACUUM). In 7.2, running VACUUM (with or without ANALYZE) is fast, and doesn't require an exclusive lock -- so your database can continue serving clients while a VACUUM is executing. Whether you choose to run ANALYZE at the same time or separately is really irrelevant.

I'm just happy that I don't have to bring a production server to its knees once a week (or for that matter once a day) just to do some table maintenance.

On that, we agree ;-)

Re:Long time mysql user, postgresql newbie by Moosbert · 2002-02-06 17:37 · Score: 5, Informative

Anyway's let me tell you, pgsql's user permissions still make my head swim, it's a nightmare. I mean, ok there's like how many different ways to authenticate a user, plain text password, crypted password, now md5, ident, local ident, kerberos, etc etc.

Options are somtimes considered to be a good thing.

Seriously, what's the "preferred" way to add a normal, non super user, only has select, insert, update, and delete access to a given database that can connect from the local machine, and remotely. Is this even possible?

Add something like this to your pg_hba.conf:

local sameuser trust

host sameuser 127.0.0.1 255.255.255.255 trust

I guess another kind of oddity about the pgsql is that out of the box, it only does ident type local socket authetication, no tcp/ip.

We like the default setup to be reasonably secure.

I've looked forever, but I've yet to find a "mysql to postgresql" quick start guide.

try here

Also, would it be darn nice to include a start/stop script that reads only config files and can be linked from /etc/rc2.d/ etc.

It's in contrib/start-scripts. Or you might as well download the RPMs.

Whew, that was close by flacco · 2002-02-06 17:59 · Score: 4, Funny

I'm sure I'm not the only slash-dotter who was on the verge of exceeding the 4 billion transaction limit on their pgsql-based Anime fan fiction submission website.

--
pr0n - keeping monitor glass spotless since 1981.

Re:Online Backups/High Availability by pthisis · 2002-02-06 18:18 · Score: 5, Informative

I didn't notice anything about online backups, point-in-time recovery, or standby database

Online backups: yes, for quite some time

point-in-time recovery: postgres uses WAL undo/redo logging, but I'm not sure what the state of rollback tools is at the moment.

Standby database: Assuming you mean Master/Slave replication, this is one of the major features planned for 7.3; 7.x has added a lot of the infrastructure needed for replication, and by 7.4 they hope to have multimaster replication (ie a fully distributed database).

SONY. Because caucasians are just too damn tall.

Crazy People. Hysterical movie.

Sumner

--
rage, rage against the dying of the light

Slashdot Mirror

PostgreSQL v7.2 Final Release

15 of 258 comments (clear)