Slashdot Mirror


PostgreSQL v7.2 Final Release

vvizard writes "After almost a full year of development since PostgreSQL v7.1 was released, the PostgreSQL Global Development Group is proud to announce the availability of their latest development milestone ... PostgreSQL v7.2, another step forward for the project."

21 of 258 comments (clear)

  1. Re:Excellent!! by dietz · · Score: 3, Informative

    Ingres code did make it in to Postgres, but I don't know if any of it is in PostgreSQL. I'm sure there was a lot initially, but I seriously doubt there's any there anymore.

    Keep in mind that when you abbreviate PostgreSQL to "Postgres", you're really talking about a seperate, older product.

    I guess it doesn't really matter, since Postgres is long gone, but it still annoys me every time I see it.

  2. highlights... by bob@dB.org · · Score: 5, Informative
    from http://www.us.postgresql.org/news.html

    Highlights of this release are as follows:

    • VACUUM: Vacuuming no longer locks tables, thus allowing normal user access during the vacuum. A new "VACUUM FULL" command does old-style vacuum by locking the table and shrinking the on-disk copy of the table.
    • Transactions: There is no longer a problem with installations that exceed four billion transactions.
    • OID's: OID's are now optional. Users can now create tables without OID's for cases where OID usage is excessive.
    • Optimizer: The system now computes histogram column statistics during "ANALYZE", allowing much better optimizer choices.
    • Security: A new MD5 encryption option allows more secure storage and transfer of passwords. A new Unix-domain socket authentication option is available on Linux and BSD systems.
    • Statistics: Administrators can use the new table access statistics module to get fine-grained information about table and index usage.
    • Internationalization: Program and library messages can now be displayed in several languages.

    .. with many many more bug fixes, enhancements and performance related changes ...

    --
    Acts@core.mailboks.com Acrux@core.mailboks.com Adam@core.mailboks.com Adar@core.mailboks.com Ada@core.mailboks.com
    1. Re:highlights... by Zeut · · Score: 5, Informative

      One issue that is not mentioned in the release highlights is the marked improvement that is now available for SMP boxes. In some cases throughput has been increased by more than a factor of 2.

  3. Re:some might disagree by GooberToo · · Score: 3, Informative

    Don't take this the wrong way but that is seemingly exactly an ideal project for which MySQL was made for. Data access appears to be primarily read-only with none of it being critical. Worse case, if the database becomes corrupt, you simply restore the whole thing from back up. Since the data doesn't appear (based on description) to be changing rapidly (if at all or perhaps backend updates only), issue such as online back or high availability is likely not to be an issue in the least. If they were, they'd probably be using that Oracle license.

  4. Why use PostgreSQL instead of MySQL?: ACID by Sivar · · Score: 5, Informative

    PostgreSQL is an ACID compliant database. MySQL is not (unless that has changed recently--if so please let me know).
    ACID (an acronymn for Atomicity Consistency Isolation Durability) is a 'keyword' that business professionals generally look for when evaluating databases. Frankly, non-ACID databases aren't taken very seriously, even if they are used by the likes of Yahoo and Slashdot (like MySQL is).
    Here is a quick description of what it means to be ACID compliant:
    1. Atomicity is an all-or-none proposition. Suppose you define a transaction that contains an UPDATE, an INSERT, and a DELETE statement. With atomicity, these statements are treated as a single unit, and thanks to consistency (the C in ACID) there are only two possible outcomes: either they all change the database or none of them do. This is important in situations like bank transactions where transferring money between accounts could result in disaster if the server were to go down after a DELETE statement but before the corresponding INSERT statement.

    2. Consistency guarantees that a transaction never leaves your database in a half-finished state. If one part of the transaction fails, all of the pending changes are rolled back, leaving the database as it was before you initiated the transaction. For instance, when you delete a customer record, you should also delete all of that customer's records from associated tables (such as invoices and line items). A properly configured database wouldn't let you delete the customer record, if that meant leaving its invoices, and other associated records stranded.

    3. Isolation keeps transactions separated from each other until they're finished. Transaction isolation is generally configurable in a variety of modes. For example, in one mode, a transaction blocks until the other transaction finishes. In a different mode, a transaction sees obsolete data (from the state the database was in before the previous transaction started). Suppose a user deletes a customer, and before the customer's invoices are deleted, a second user updates one of those invoices. In a blocking transaction scenario, the second user would have to wait for the first user's deletions to complete before issuing the update. The second user would then find out that the customer had been deleted, which is much better than losing changes without knowing about it.

    4. Durability guarantees that the database will keep track of pending changes in such a way that the server can recover from an abnormal termination. Hence, even if the database server is unplugged in the middle of a transaction, it will return to a consistent state when it's restarted. The database handles this by storing uncommitted transactions in a transaction log. By virtue of consistency (explained above), a partially completed transaction won't be written to the database in the event of an abnormal termination. However, when the database is restarted after such a termination, it examines the transaction log for completed transactions that had not been committed, and applies them.


    It is difficult to trust mission critical data to a database that does not guarantee that it will complete not screw up (short of a bug, of course), this such compliance--even when it is more political than technical--is very important.

    --
    Computer Science is no more about computers than astronomy is about telescopes. --E. W. Dijkstra
    1. Re:Why use PostgreSQL instead of MySQL?: ACID by jrimmer · · Score: 3, Informative
      MySQL is most definitely capable of supporting ACID functionality.

      One of the nice features of MySQL is the capability of having pluggable persistence managers. An example of that is the default, MyISAM, which you are correct in saying does not support ACID. But with the release of MySQL-Max, which happened awhileago(tm), and MySQL v4 out of the box, support for 3 additional backends was added, BerkelyDB, Gemini, and InnoDB, all of which have complete ACID support. InnoDB also supports row level locking and even an initial implementation of foreign keys.

      InnoDb is is in use here at Slashdot as well as a good deal of other sites demanding high-transaction throughput with full ACID support.

      With the addition of foreign keys and stored procedures functionality, all of which are on the slate for the 4.x series, the reasons not to use MySQL are lessening every day.

      Side note: Yeah, I know Gemini is the red-haired stepchild of the MySQL world. It's still a decent table manager.

    2. Re:Why use PostgreSQL instead of MySQL?: ACID by rycamor · · Score: 2, Informative

      Yes, you can move to InnoDB tables, but your performance goes South quickly, thus taking away one of the few reasons to use MySQL.

      One of my clients had a Linux Apache/PHP/MySQL box that was being heavily pounded by 250+ web-based instant messaging users. (I know... using a database to handle instant messages :-( ). Anyway, I, in my simple logic, thought

      1. The message tables are constantly subject to INSERT, SELECT, and DELETE queries.

      2. MyISAM tables require a full lock for every INSERT or DELETE

      3. InnoDB tables have row-level locking

      4. Thus, I will install MySQL-Max, and use InnoDB tables for the messaging data, thus relieving PHP of the continuous connect...wait...reconnect scenario

      Well, I read every note about configuring MySQL for performance, and I made the change to InnoDB, and promptly every single one of the 30 or 40 MySQL processes began taking up at least 60 MB RAM! The system slowed to a standstill, and I had to roll back to regular MyISAM tables 'real quick'. (the users on the system take about 10 seconds to start pestering the admin with trouble reports)

      I know the real answer is to use shared memory or some such, but anyway, the performance benchmarks always show PostgreSQL ruling when you have mixed INSERTS and DELETES.

    3. Re:Why use PostgreSQL instead of MySQL?: ACID by Pathwalker · · Score: 5, Informative

      You Said:
      Actually, this is not the fault of the database. If you mod a post that is already +5 up or mod a -1 post down you will lose a mod point and the score will go unchanged (the moderation total values will increase though)

      I Reply:
      I have a hard time believing that this behavior started out as a feature. I find it much more likely that it was initially a bug. This bug, being found useful was then elevated to the status of a "feature".
      You are correct that it is not the fault of the database, but transaction and constraint support at the database level would have made it easy to prevent this problem from ever cropping up in the first place.

      You Said:
      the use of a transaction to record a moderation is fairly frivilous and probably more of a waste of CPU time than moderation point

      I Reply:
      For a system which recieves as much activity as Slashdot, and with a constant stream of friendly trolls looking for any crack in the system that they can use to share the sight of their favorite gaping asshole with the unwilling members of the rest of the population, if I were coding it, I would insist on inserting checks of basic constraints at different levels of the system. The database layer is your last line of defense against abuse of the system.

      Secondly, these double checks are useful for finding errors in other levels of the system. Remember the problems that used to crop up from time to time with comments being moderated to -2 or to 6? If the value of the moderations was constrained in the database, not only would users not see this problem, but an error log generated (for the admins only) when a transaction is rolled back in a situaition where it is not expected would have helped isolate the fault very quickly.

      The database level checks would also help against rogue activity of people in positions of (limited) trust. Worried about an editor editing one of their accounts to give themselves a huge number of modpoints? Cap the level in the database at 5; it would make it impossible for this nefarious subterfuge to take place.

      As for the speed issue; if you are willing to sacrefice verification of correct operation for a small increase in speed, you have severly underspecified your hardware requirements.


      Finally, I would like to include a small SQL fragment, showing some of the checks that I would feel are absolutely necessary for a web based discussion system that people are trying to subvert:

      --First we create a table for a couple of users
      CREATE TABLE "users" (
      "uid" integer serial,
      "mod_points" integer default 0,
      "name" text not null,
      CONSTRAINT "user_mod_const" CHECK (((mod_points > -1) AND (mod_points < 6)))
      );

      --now a table for some posts
      CREATE TABLE "posts" (
      "date" timestamp with time zone DEFAULT 'now()',
      "pid" serial,
      "parent" int4
      "uid" int4,
      "mod" integer DEFAULT 1,
      "body" text not null,
      "section" integer,
      CONSTRAINT "mod_const" CHECK (((mod > -2) AND (mod < 6))),
      CONSTRAINT "user_key" FOREIGN KEY (uid) REFERENCES users(uid)
      on delete cascade
      on update cascade
      );
      -- the constraint to ensure parent is equal to zero, or another pid in the posts table is left to the reader.

      --And now for a function to access them. (Remember - direct SQL is icky; run things through functions to ensure a consistant interface)
      CREATE FUNCTION "mod_down" (integer,integer) RETURNS integer AS '
      begin; update users set mod_points=mod_points-1 where uid=($1);
      update posts set mod=mod-1 where pid=($2);
      commit;
      select mod from posts where pid=($2);
      ' LANGUAGE 'sql';


      As you can see, this nicely serves as a check to ensure the restrictions I mentioned above. With it being so trivial to add the checks, I can't see any reason to not take this extra step to eliminate nasty surprises.

    4. Re:Why use PostgreSQL instead of MySQL?: ACID by jrimmer · · Score: 2, Informative

      Sounds like your tuning was out of whack. You're not very specific about what you tuned (heck, if everyone was specific what fun would Slashdot be?) but I'd bet you had 2 wires crossed somewhere. We're running an system with, what would appear to be, a larger number of inserts and retrievals, for stock quotes and have had completely different results from your experience. We're very happy with the performance and find in a mixed insert/select situation InnoDB is quite a bit faster than MyISAM.

      After your bad experience did you send an email to Mr. InnoDB, Heikki Tuuri? He's quite attentive and I'm sure would like to hear what happened with your application.

      Not to mention when you're playing with databases or other vaguely unknown programmatic quantities, best not to test them in production. Perhaps if you had run the system in a test environment the lessened pressure would've enabled you to be better equipped to resolve whatever situation was occurring.

  5. Re:Congrats to the PostgreSQL Development team! by thing12 · · Score: 4, Informative
    I'm not disagreeing with anything you said, in fact you all but reiterated everything I said.

    The 7.1 vacuum analyze required table locks. Doesn't matter which phase of it required locks - it required exclusive locks because it vacuumed. By breaking that into a separate commands the need for downtime is reduced drastically (down to the example which you point out - deleting thousands of rows at a time).

    I know that you're recommended to run vacuum once per day, but I found that on a large database running on a fast server a daily vacuum took nearly 30 minutes to complete... that's 30 minutes of sequentially locked tables. Can't afford to do that every day - moving it to once a week may have degraded performance but it reduced the downtime window from 30 minutes per day to 1 hour per week.

    I'm just happy that I don't have to bring a production server to its knees once a week (or for that matter once a day) just to do some table maintenance.

  6. Re:Excellent!! by Moosbert · · Score: 2, Informative
    BTW, does anyone know if any of the original Ingres code made it into postgres? My understanding was that it was supposed to be the "next" Ingres...


    INGRES and POSTGRES were two separate projects at Berkeley that shared no code. PostgreSQL is based on the latter.


    See also the POSTGRES FAQ.

  7. Re:Congrats to the PostgreSQL Development team! by nconway · · Score: 4, Informative
    I'm not disagreeing with anything you said, in fact you all but reiterated everything I said.


    No I didn't, read my post again.

    The 7.1 vacuum analyze required table locks.


    PostgreSQL has lots of different types of locks of varying granularities. Saying "table locks" doesn't mean a whole lot.

    Doesn't matter which phase of it required locks


    It does though -- in 7.1, splitting vacuum and analyze internally reduced the time that an exclusive lock needs to be held.

    By breaking that into a separate commands the need for downtime is reduced drastically


    This is where you're wrong. The reduction in downtime has nothing to do with allowing ANALYZE to be executed separately. It is entirely the result of the new vacuum code (which is "lazy", unlike a VACUUM FULL -- which does a 7.1-style VACUUM). In 7.2, running VACUUM (with or without ANALYZE) is fast, and doesn't require an exclusive lock -- so your database can continue serving clients while a VACUUM is executing. Whether you choose to run ANALYZE at the same time or separately is really irrelevant.

    I'm just happy that I don't have to bring a production server to its knees once a week (or for that matter once a day) just to do some table maintenance.


    On that, we agree ;-)
  8. Re:Long time mysql user, postgresql newbie by Moosbert · · Score: 5, Informative
    Anyway's let me tell you, pgsql's user permissions still make my head swim, it's a nightmare. I mean, ok there's like how many different ways to authenticate a user, plain text password, crypted password, now md5, ident, local ident, kerberos, etc etc.


    Options are somtimes considered to be a good thing.


    Seriously, what's the "preferred" way to add a normal, non super user, only has select, insert, update, and delete access to a given database that can connect from the local machine, and remotely. Is this even possible?


    Add something like this to your pg_hba.conf:


    local sameuser trust

    host sameuser 127.0.0.1 255.255.255.255 trust


    I guess another kind of oddity about the pgsql is that out of the box, it only does ident type local socket authetication, no tcp/ip.


    We like the default setup to be reasonably secure.


    I've looked forever, but I've yet to find a "mysql to postgresql" quick start guide.


    try here


    Also, would it be darn nice to include a start/stop script that reads only config files and can be linked from /etc/rc2.d/ etc.


    It's in contrib/start-scripts. Or you might as well download the RPMs.

  9. Re:Online Backups/High Availability by pthisis · · Score: 5, Informative

    I didn't notice anything about online backups, point-in-time recovery, or standby database

    Online backups: yes, for quite some time

    point-in-time recovery: postgres uses WAL undo/redo logging, but I'm not sure what the state of rollback tools is at the moment.

    Standby database: Assuming you mean Master/Slave replication, this is one of the major features planned for 7.3; 7.x has added a lot of the infrastructure needed for replication, and by 7.4 they hope to have multimaster replication (ie a fully distributed database).

    SONY. Because caucasians are just too damn tall.

    Crazy People. Hysterical movie.

    Sumner

    --
    rage, rage against the dying of the light
  10. PostgreSQL Books by LarryRiedel · · Score: 3, Informative

    There are a few decent books about PostgreSQL out there now. It is so much nicer than a few years ago.

    Practical PostgreSQL. I think this one just came out as a bound book. I just got it a couple days ago and it is pretty good. It is also online.

    Postgresql : Developer's Handbook. I (as a developer) like this one best of all that are out now.

    PostgreSQL Essential Reference. This one is pretty good, but I would not say it is essential. :-)

    Beginning Databases with PostgreSQL. This is one of those Wrox books which is about 10000 pages, including 80% of what I want to know and 2000% of what I don't.

    There other others, but I think they are weaker. I was disappointed with the one just called PostgreSQL.

  11. Re: plication by Smoking · · Score: 3, Informative

    I've recently set up a master-master replication environnement on Oracle 9i and I did some research to check if it was possible with postgres.

    In fact there are many solutions available (check techdocs.postgresql.org for a list...)

    The most advanced guys on the subject seem to be the swiss engineering school in Zürich. Here is a list of their publications.
    They seem to have developped a replication scheme (Postgres-R) where they have better than linear performance improvement when they add new masters...Quite impressive

    Quentin

  12. Re:I have only one feature request for PostgreSQL. by chriskl · · Score: 3, Informative

    I'm a postgres developer and I really have no idea what you mean here!

    Postgres always keeps its integrity constraints, including when you dump and restore. It's done this, as far as I am aware, since at least 7.0.

    LOBs are no longer a problem, since 7.1 supported unlimited row length with binary or ascii data - just use 'bytea' or 'text' fields...

    Chris

  13. PostgreSQL Documentation by Confuse+Ed · · Score: 2, Informative

    PostgreSQL also comes with some great documentation (probably the best I've yet seen with any software I've used (the only contender I can think of offhand might be the Devpac assembler on the AtariST many years ago))
    (This is also of course online, e.g. the 7.1 version is up at http://www.ca.postgresql.org/users-lounge/docs/7.1 /postgres/ , or interactivly the interactive version http://www.postgresql.org/idocs/)

    So how do we encourage other projects to develop such documentation, which I assume is very time consuming and difficult, as well as being for very little reward (How often do we see posts appreciating new software features compared to those acknowleding the associated documentation?)

  14. Re:Congrats to the PostgreSQL Development team! by bwt · · Score: 3, Informative

    That is definately a risk. It is often possible to harm overall system performance by upgrading an RDBMS that includes optimizer improvements. Any changes to an optimizer will change execution plans. Hopefully most of them get better, but a few get worse, often dramatically worse. Finding the ones that get worse and tuning them is an important activity. Bad SQL plans are often the biggest impact tuning activity, so it is very important to understand what will happen to your specific application before you make changes that affect how your SQL-statements are implemented.

    This is one area in which Oracle shows its power over the open source databases. (It's also a big oportunity because Oracle can be improved on). Oracle can actually tap into continuous statistics gathering on a per SQL level by using oracle's v$sqlarea dictionary view. If you need high-powered scrutiny on a particular activity, you can trace the session to logs and see the row statistics at every step of the exectution plan. Oracle has more optimizer hints, and has a facility to "pin" an execution plan, so that it won't be reevaluated if optimizer behavior changes. Oracle is working toward server-side SQL tuning, where you can ID bad SQL's and "intercept" them at runtime by adding hints on the server side. That will be an absolutely huge feature, since often SQL hits your system that you can't directly control but can predict.

  15. Re:Long time mysql user, postgresql newbie by slamb · · Score: 3, Informative
    Seriously, what's the "preferred" way to add a normal, non super user, only has select, insert, update, and delete access to a given database that can connect from the local machine, and remotely. Is this even possible?

    Add something like this to your pg_hba.conf:

    local sameuser trust
    host sameuser 127.0.0.1 255.255.255.255 trust

    That's not authentication! "trust" just allows logins, period. Try "psql -U postgres" as anyone on that machine. You'll instantly be logged in as the superuser.

    Something like this works fairly well on Postgresql 7.1:

    host all 127.0.0.1 255.255.255.255 ident sameuser
    host all 0.0.0.0 0.0.0.0 password

    Then enable TCP/IP connections ("tcpip_socket = true" in postgresql.conf)

    Very important: make sure your ident server is trustworthy. Many ident servers have an option to allow a user to fake identification. Turn it off.

    Also, the config I posted there will let any user connect to any database. That's the simplest, but not the most secure. The "sameuser" in the database field won't be enough to let the superuser connect to databases. You might add a seperate line for that with an ident map containing only postgres (the file would have only the words "postgres postgres" in it, on one line). And then "all" in the database field with that map. I.e., "host all 127.0.0.1 255.255.255.255 ident postgres"

    For remote connections, just make sure they have a password in the database:

    create user slamb with password '12345';
    alter user bob with password 'newpassword';

    There's no authentication method here specified for UNIX domain sockets, so they just don't work. You'll need to set the PGHOSTNAME="localhost" environmental variable for stuff to authenticate correctly. I did this because pgsql 7.1 did not support ident on UNIX domain sockets. pgsql 7.2 now does, on certain platforms. (Just replace "host <db> <ip> <netmask> ident <map>" with "local <db> ident <map>")

    pgsql 7.2 adds pam support. If your UNIX and PostgreSQL usernames correspond, it should work.

    pgsql 7.2 also adds support of encrypted passwords. There's an option for storing password encrypted in the database and an option for challenge-based encryption. I think these methods are incompatible - good challenge-based encryption requires the password be stored in plaintext on the server.

    There has been Kerberos auth for some time. I'm trying to switch over to this now, as I'm setting up Kerberos on my network. It's a more complicated system to set up correctly, though. Get something else working first.

    Official docs are here

  16. It runs under Cygwin!!! by bartwol · · Score: 2, Informative

    I haven't tested it under load, but I've installed and successfully tested PostgreSQL 7.1 under Cygwin on Windows 2000.
    <bart