PostgreSQL 7.4 Released
Christopher Kings-Lynne writes "PostgreSQL 7.4 has just been released. The list of new features is impressive and includes greatly improved OLAP performance among many other speed improvements."
← Back to Stories (view on slashdot.org)
I use PostgreSQL extensively, and I have had a hard time convincing my-mySQL (I'm so clever) exclusive friends to give it a try.
One thing that should be noted is that the JDBC drivers (http://jdbc.postgresql.org) are now among the best I've used. For those developing Java apps, the choice is now even more clear.
....still no native replication. MySQL has this one single advantage over Postgres.
Oh, raw disk use would be nice too.
Martin Brooks / Slayer99 #linux / UIN 2178117
IN/NOT IN subqueries are now much more efficient.
Queries using the explicit JOIN syntax are now better optimized.
New multikey hash join capability.
Cursors conform more closely to the SQL standard.
Sounds like they pushed closer to the SQL standards, good job guys.
Does anyone here know more about this "New client-to-server protocol" they speak of?
TruePunk | Games
If you ask me, I'm glad that people opine and wage war when it comes to PostgreSQL vs. MySQL. MySQL has done for PostgreSQL what windows has done for Linux: make it want to thrive, compete and prosper.
This is also why Microsoft WANTS there to be an enemy: they need someone to compete against to continue improving their product (which they do, even if we hate to admit it).
If you don't believe me, ask Dubya.
There are two kinds of people in the world: Those with good memory.
I run postgres on my own database servers (when I'm not making movies, that is). Now, there's a distributed database project associated with Postgres, trying to add replication into the databases' bag of tricks.
Lotus Notes implements e-mail and lots of other things on top of a database engine that performs replication. So, could Postgres be used to develop a Lotus Notes type application with replicated databased for e-mail, calendars, team rooms, etc?
This is America, damnit. Speak Spanish!
This is also why Microsoft WANTS there to be an enemy
Umm.. In the enterprise db market there is a little company called Oracle...
Trolling is a art,
I'm sure I'm gonna get modded down for this, but does anyone know when is there gonna be a version that can run in windows natively (without using Cywin)?
I ask because we are FORCED to use Windows boxes at work, and they gave all of the developers 2. We can't reformat and put linux on (or do a dual-boot) because they check to make sure everything is status-quo. And right now the atmosphere around here is not the greatest, so I'd rather not risk anything with the PHB's by trying to trick them.
I usually have my 2nd machine as a server running mySQL as a testbed for my database apps. I'd LOVE to switch to Postgresql, but I'm limited as to what I can do.
Any idea when a Windows native version will be available?
Because they care about your data, among other things! You could have the fastest database server in the world, but if you find your data is corrupt, or truncated without warning, it doesn't do you much good.
Here is huge list of MySQL Gotcha's that absolutely floored me when I first read it. In my opinion, a "gotcha" in regards to a database is a "Bad Thing(tm)"
MySQL Gotchas"
Open Source Time and Attendance, Job Costing a
Actually, Sequel was the name for the original Standard English Query language invented by IBM. SQL was the name of the second version of the language, also invented by IBM.
r y/ techarticle/0301jones/0301jones.html
http://www-106.ibm.com/developerworks/db2/libra
I'm using Postgres in a company that analyzes statistics, and maintain a ~250GB db. Trust me, when you're talking about doing a seq. scan on 250GB of data, the preformance differences are MUCH more than minor. The reports of speed increases don't cite minor increases either, they cite really major changes.
Taken from the presskit:
PERFORMANCE
Several major performance enhancements have been added in version 7.4, enabling PostgreSQL to match or exceed the speed of other enterprise database systems. These include:
* Hash aggregation in memory to make data warehousing and OLAP queries up to 20 times faster;
* Improvements in subquery handling by the planner resulting in up to 400% speed increases in some complex queries;
* New script to set more reasonable postgresql.conf defaults for shared buffers, yielding better "out of the box" performance;
* New wire protocol (version 3) increases the speed of data transfers;
* Enhanced implementation of functional indexes allows better indexing on custom data types and composite fields;
Full text searching also got another overhaul- I plan on messing around with it when I get some free time. They've included a .sql file you can just import into an existing DB.
The real power here is that the index is quick to update, and as a result, can be done in real-time via triggers and stored procedures- neither of which you can do with MySQL :-) The new release is also even more SQL compliant- something else MySQL can't claim. PostgreSQL is both SQL92+98 compliant if I recall.
It can't be said enough- PostgreSQL is now MUCH faster...and due to features like stored procedures, triggers, and some of the best locking available combined with some of the best transaction support, it's actually far faster at many of the same tasks if you take advantage of these greater abilities.
Even back as early as '99, PostgreSQL absolutely mopped the floor with MySQL when as little as 10% inserts or updates were thrown into a select test. Why? Piss-poor locking and zero transaction support. The stuff you have to do in the application layer to make up for proper(or ANY) transaction support will make most benchmarks completely pointless.
MySQL always has, and always will be, a DB best suited for blogs and 2-guys-in-a-garage; it's slapped together, has a low featureset, and is not standard-compliant. PostgreSQL is not an enterprise fish(replication still needs work if I understand it correctly)- Oracle, DB2 etc have that market pretty well covered- but it's great for everyone else who isn't, say, a multibillion $ company...if those people just bothered to have an open mind instead of pointing their fingers at benchmarks showing MySQL running out of an in-ram-only table can select 50,000 rows faster than PostgreSQL can, and whining about how they need to make a cron job to vacuum/vacuum analyze tables at an appropriate time(with autovacuum, also in this release, there goes that excuse!)
Please help metamoderate.
For some applications with a chance of growth I've had two issues with Postgresql. One is that despite the fact that they have talked about being an "enterprise level" database for ages, we found that in any kind of swift moving transaction enviroment we had to VACUUM pretty regularly. How they expected folks to leave pgsql running over extended periods of time (months -> years) is beyond me. Looks like they may have solved it. It will be interesting to see if the systems can take a pounding and stay up 24/7 for a while without slowing to a crawl.
The other issue has been replication. With mysql this has saved our bacon more then once. Nead to do intensive analysis on live data and don't want to disturb active system? Set up a nice slave and query away.
Want basic fault tolerance? Set up a slave, you have a live mirror of the data.
Have lots of queries coming in (load balance the reads at least).
PostgreSQL now has some type of replication available from PostgreSQL Inc, but it looked to me like somewhat of a hodge podge of perl, triggers and who knows what else.
I think I'll try it out, and if I can get the same replication speed as I do with a mysql array I'd switch over, but first glance it didn't look like I would. Anyone compared the replication performance yet (and ease of setup, I was very impressed with mysql in this regard).
I work with both of them, so I can compare both. Personally, I see many cases, especially when the data model is complicated enough, when PostgreSQL is faster than MySQL. But for that I am spending some extra efforts, because many OSS projects are ported to work with one DBMS, not with both.
I love PostgreSQL and it's functionality but unfortunately there are still many developers of other open source projects who heard about MySQL and did not do any research for existing alternatives and thus made his project based on MySQL.
And, again unfortunately, while PostgreSQL is very close to SQL standard, but MySQL is not that close, so you cannot just substitute the database library - you have to re-write (and thus re-test) all SQL code of the project. So, that's why I still have to use MySQL.
With all my respect to great technical quality of PostgreSQL software, I think PostgreSQL team doesn't do a great job to make PostgreSQL being popular. The athmosphere in PostgreSQL community reminds me the one of BSD (read: very unfriendly).
Less is more !
And it is very, very nice. If you have people resistant to using pgsql (which I like a lot personally) pgAdmin III will give them all the GUI stuff they are used to from programs like SQL Server Enterprise Manager.
The postgreSQL community is extremely helpful, key developers are very active in helping out users and addressing issues rapidly. It is a project that just exemplifies what is good about open source.
I am compiling 7.4 on my development server right now in preparation for moving our production server soon. I guess maybe I sound like a fan boy but as a database administrator I just can't over emphasize my joy at getting to work with such an excellent product.
It's hard to believe that's how Micronians are made. Why don't we see it right now by having you both kiss one another?
New autovacuum tool
The new autovacuum tool in "contrib/autovacuum" monitors the database statistics tables for "INSERT"/"UPDATE"/"DELETE" activity and automatically vacuums tables when needed.
Not true. Starting in 7.3, the default version of VACUUM no longer locks the table. From the 7.3 docs:
We have one client who uses Linux, the rest are all Windows-based... Is there an unbiased (as far as can be) comparison ?
Now, see your problem?
Neither PosgreSQL or mySQL are full, complete, and utterly perfect implemtations of a database. Neither is Oracle, BTW.
mySQL got a HUGE push some time ago. Back then, mySQL couldn't be beat for handling read-only (Actually, highly read-almost exclusively always). mySQL was a champ when you had a web site, mostly static catalog of products (for example), and had really limited demand for SQL (Like one query that read 'select * from catalog;')
That basis of comparison is no longer true.
So, at the time, hords of little corporate minions lined up and specified mySQL. Not a bad bet at the time, but open mindedness only seems to happen once in computer circles. Day 1 you have a need, day 2 you actually research available solutions, and day 3 you declare a "winner" and it is forevermore cast in stone as the "one true solution". The fact masses of people tend to go thorugh the same process at basically the same time doesn't help. Thus the broad noise that mySQL is "the Answer(tm)".
Anyway, postreSQL has always sought to compete in the full function space. Oracle was/is a much better "comparison" to postgreSQL than mySQL.
Now, both mySQL and postreSQL have improved over time - greatly. postgreSQL seems to be focused on getting things "correct", while mySQL doesn't seem so concerned. Bascially postgreSQL will not provide a feature, while mySQL will hacking it together in some bizzare way (re: early "transaction" handling). mySQL has quite a few anti-social behaviors. Over time, their refinement of those various behaviors drive certain development costs and create some degree of lock-in dependency (a continuing basis for self-justification).
Bottom line, if you invested in learning and implementing mySQL, and it is still working for you, then there is absolurely no need to be concerned with postgreSQL yet.
If you are in the database selection mode, you should surely look towards postgreSQL and try to de-hype yourself from any pro-mySQL bias. Hype has inertia and much of the pro-mySQL hype is based on old comparisons and narrow needs. Yes, evaluate both, but don't assume mySQL or postgreSQL is "better" based on what you hear.
Now, my understanding of the vacuum command was that it effectively took the DB offline (not good with the hit-rate I have), and my understanding of 'auto-vacuum' was that it would negate that effective downtime. It appears that that is not the case.
Normal VACUUM commands do not lock tables as of 7.3. Only the full vacuum command does this, which you probably only need to use when permanently retiring a client, rather than just rotating out their data, as the lock is used to actually repack the database on the filesystem to reduce the filesize.
For your needs the normal vacuum provided by auto-vacuum should be sufficient, as you're going to fill up the empty filespace with the next period's data anyway.
If I have been able to see further than others, it is because I bought a pair of binoculars.
Sure, row-level locking is nice -- even MSSQL has that. PostgreSQL has MVCC - so that writers never block readers and likewise. Complete data consistency (i.e. repeated reads give the same results) from the start of a transaction to the end of it. Can MySQL do that? (I am actually asking....)
Ok, there is a lot of talk about vacuum what it does / doesn't do and what effect autovacuum has. Here are the details (FYI, I wrote pg_autovacuum).
Recent versions of postgresql don't take your database offline during vacuum. However, the vacuum process is an I/O intense process and can still, even 7.4, slow the server significantly while it's running. Work is has alredy been done in the 7.5 development tree to address the I/O storm created by vacuum.
Typically, you setup cron to run vacuum your entire database nightly. This is fine, except it has two main problems. 1) It wastes time vacuuming large tables that probably don't need it (think audit train table that only gets inserted into). 2) It probably doesn't vacuum tables that are constantly updated often enough, which results in bloated data files, and slower queries.
The new pg_autovacuum daemon addresses both of these concerns by monitoring database activity (using the stats system). When it sees that a table has has been modified enough to warrant a vacuum then it does so, when it sees that a table might benifit from a analyze only, then it does that. And when a large table doesnt' need to be vacuumed, it doesn't vacuum.
Yes, MySQL is small, light and fast and I use it as my general light duty DBMS, but I'm not religous about it. When the going gets tough I switch to something tougher.
Looks, it's not because you can't do things with MySQL. It's how you have to go about doing them. That lightness and speed comes at a price, it's an engineering tradeoff. There's no such thing as a free lunch and all that.
What it gives up is intergrety constraints. If you don't spend the cycles to insure data integrity you can be smaller and faster.
So let me ask you, how fast do you want your data munged?
If you don't want your data munged at all and you're using MySQL you need to pass off integrity issues to your app. Well, there you are using cycles again. The DBMS is faster, but now your app is slower (yes, you're still saving a bit of disk access time, which can add up. That's a flaw in SQL itself. There are alternatives.). More importantly you're using your time as a developer to reinvent the integrity constraint wheel in every app. Coding time goes up. Bugs go up. Support issues go up. All to accomplish something that is a logical function of the DBMS. That's why we call them a DBMS in the first place. It has been argued that MySQL doesn't even meet the definition of a DBMS.
Once I had data
My DBMS munged it
But damn it was fast!
Again, don't get me wrong, I use MySQL, but I use it in full knowledge of what it does and does not do and what it does not do is guaruntee the consistency and integrity of my data.
And I have better things to do with my time than recoding DBMS functions into my apps. I use MySQL where data integrity isn't a critical issue.
KFG
We are in the midst of moving our databases away from Oracle. There were three contenders: MySQL, Postgres, and Matisse (OODBMS).
Speedwise, PosgreSQL trails the pack by a fair bit. Sometimes it would be comparible to Oracle, and other times it wouldn't be without a fair bit of tuning. Outer-joins, for example; the optimizer can't seem to make heads or tails of it.
I spent two years lurking on the Postgres lists, and when doing performance testing, was asking for help tuning queries and the database in general; this isn't a statement made based on, "I tried it once, and it didn't work."
The guys on the list (especially Tom Lane) were very helpful and polite, but I just couldn't get reasonable performance out of the database without doing some serious SQL-rewriting (our CTO thinks that relational databases require too much tweaking already; putting optimizer hints into the queries is just too much).
Overall, the database is great - great feature set, great developers, and a good support community, but the optimizer is not efficient enough (search for the word optimizer in the PostgreSQL lists, and you'll find hundreds of posts where the optimizer is doing a sequential scan and ignoring indexes when it should be using those indexes).
MySQL (4.0.16, using InnoDB tables) has foreign keys, transactions, etc. I haven't been able to crash it yet (I miswrote a query on purpose, and let it run over 2 days at 99% CPU, and the machine stayed up, and is still up a week later).
MySQL doesn't have triggers or stored procs, but as a DBA and senior developer, I can honestly say that's a good thing.
- if you modify a table that a trigger or stored proc uses, chances are the trigger and stored procedure are invalidated quietly behind the scenese - the database doesn't tell you until you call the stored procedure or execute a statement that causes the trigger to be executed.
- debugging a stored procedure or trigger is not easy.
- people tend to forget about triggers and stored procedures; they're hidden logic that can cause no end of problems.
- triggers and stored procedures are (in most cases) database-dependant; they are a huge hinderance when moving to another database. We have 12,000 lines of Oracle stored procedures. I dislike them.
- the database is for data storage. It's not for application develoment. Keep the business logic in the application, and the data-storage logic in the database. Oracle is trying to sell their RDMS as a development tool to justify the price. Don't believe the hype.
PostgreSQL is trying to position themselves as an Oracle replacement, and thus have a similar feature set. PostgreSQL is also very good at very large databases (probably even more so than MySQL, at least until InnoDB gets multiple tablespaces in the next release).
Databases with simple queries where results are not needed instantly would do well with PostgreSQL.