PostgreSQL 8.2 Released
An anonymous reader writes to let us know that PostgreSQL 8.2 has been released (bits, release notes). 8.2 is positioned as a performance release. PostgreSQL it is still missing the SQL:2003 Window Functions that are critical in business reporting, so Oracle and DB2 will still win out for OLAP/data warehouse applications.
MySQL has been the dominant SQL server within the open source community. Between its non-standard SQL and it's lack of advanced features, many developers and DBAs are getting fed up. Thankfully, they've been able to turn to PostgreSQL.
At my firm, we switched some of our MySQL Enterprise databases over to PostgreSQL 8.1. What we found was pretty amazing: PostgreSQL outperformed MySQL by approximately 23% in terms of the number of queries it could handle per second. And this was with a very basic level of tuning! Our MySQL installations, on the other hand, had been tuned by three different consultants. Keep in mind that both were running on exactly the same system, under the same installation of FreeBSD. Were not sure exactly why there was such a remarkable increase in performance when using PostgreSQL, even without much tuning, but we're happy with it nonetheless. We're also happy to no longer being paying MySQL for support.
We're actually quite happy to get away from MySQL. The other developers I work with were quite sickened by the deal MySQL AB reached with SCO a while back. While we're strictly a BSD shop, we still think SCO's actions are quite distasteful, and we are willing to move away from companies that enter into deals with them.
How fast is it against MyISAM?
I can't remember where I heard it or who said it, but I once heard someone say words about MySQL to the effect of "if you ignore all the things that make a real database a database, you can make it really fast." Now, I get that lots of web hosts use MySQL and that it is the dominant free database out there. However, there is lots of insight in that statement. Now, in 99% of the cases where MySQL is used, it probably works great with few hitches. However, I'd rather trust my data to a something that values data integrity over speed.
Recall that not too long ago, right here on slashdot we all got to see first hand what happens when MySQL craps out. All the threading was gone. I mean seriously, what sort of database accepts invalid and then silently truncates it and moves on? Again, I don't think that the number of people with MySQL tables with 16,000,000+ rows is very large, but it is still disturbing.
If you are going for something small and light and fast and you are not too concerned about standards, then MySQL is great. Note, I am not trying to troll, I am simply pointing out that for all the people who endlessly bash on one or the other DB, that there is a market space for each.
I mean... did this OP rush to push out a lackluster FP on PG, or what?
Practically the only informative part of this post is focusing on the perceived negative (which is a dubious one, IMHO).
Never mind that Postgres has actually turned out some nice feature advances in this release, although they don't make for good marketspeak bullet points. There have been advances in performance, table partitioning, clustering, query logic, user-defined functions, etc... pretty much every area of "enterprise" database development except for the one area the OP chooses to focus.
OLAP itself is the kind of thing we thought we'd never have to do because of the power of the relational model. Well, we were wrong.
How would we know? We have never yet seen a DBMS that really implements the relational model (at least, not in the normal world of business software). Show me the word 'relational' in the SQL standard, anywhere. What we have is all sorts of incredible complication to work around the fact that SQL itself is a damaged and confused (and at times contradictory) approach to the problem.
The serious theorists I have read argue that the reason we need all the performance workarounds is *precisely because* we are not really working with a relational system, and all the vendors conflate logical levels and physical storage levels to various degrees.
> 8.2 is positioned as a performance release.
We've only got a small database (17 million records or so), and PostgreSQL 8.1 has been handling it fine. But I'm still looking forward to seeing how 8.2 improves things.
And we're using it in another production system, too, which is going to get pretty big (I hope). Lively times!
The Army reading list
You mean the magic Darwen/Date/Pascal relational model? The one nobody has managed to implement despite the 25 years it's been around?
Maybe it's because the thing can't be made to work, and its limitations (i.e. being equivalent to first-order logic, a limitation not in SQL DBMSs) make it silly even to keep trying.
What part of "A well regulated militia" do you not understand?
You mean the magic Darwen/Date/Pascal relational model? The one nobody has managed to implement despite the 25 years it's been around?
:(.
Ahh yes, the old canard. Actually, several companies and individuals have implemented the relational model MUCH more faithfully than the typical SQL vendor. The problem is not one of difficulty, but rather of popularity and marketing.
In fact, several solo-developer projects have implemented it on the logical level much better than your typical SQL vendor. The problem is that those guys don't have a) the marketing budget and 20 years of industry buy-in, and b) the developer team to implement all the "enterprisey" features like clustering, failover, etc... And by the way, there is nothing about the "true relational model" that makes those things harder to implement. They are if anything LESS difficult to implement with a true relational DBMS than with an SQL DBMS, which has to handle all kinds of oddities like duplicate rows, position-dependent syntax, pointers, and many other nonsensical rules of SQL.
I know lots of you database pros out there hate to hear from guys like Date,Darwen and co. but the thing is they are right: the DBMS world has opted for mediocrity and over-complexity. Of course, that's the way it is with most things in life
> > You mean the magic Darwen/Date/Pascal relational model? The one nobody has
:(.
> > managed to implement despite the 25 years it's been around?
> Ahh yes, the old canard. Actually, several companies and individuals have
> implemented the relational model MUCH more faithfully than the typical SQL
> vendor.
Name one, and make sure it's one that's disallowed NULLs completely. Date,
Darwen and Pascal's fear of recording states of ignorance is ill-founded in
real-world conditions. Codifying that fear isn't even well-founded in last
century's mathematical theory. Yes, it's true that multi-value logics are
just a teensy tad more complicated theoretically than 2VL. That does
not imply that they're less useful, or that the systems built around
them are more complicated than the truly wackily byzantine things D, D & P
suggest as workarounds for not having NULLs.
> The problem is not one of difficulty, but rather of popularity and
> marketing.
Nope. See below.
> In fact, several solo-developer projects have implemented it on the logical
> level much better than your typical SQL vendor. The problem is that those
> guys don't have a) the marketing budget and 20 years of industry buy-in, and
> b) the developer team to implement all the "enterprisey" features like
> clustering, failover, etc...
> And by the way, there is nothing about the "true relational model" that
> makes those things harder to implement.
That it's been 25 years and nobody has implemented it, despite
resources in industry, government, academia and open source, flatly
contradicts your assertion.
> They are if anything LESS difficult to implement with a true relational DBMS
> than with an SQL DBMS, which has to handle all kinds of oddities like
> duplicate rows, position-dependent syntax, pointers, and many other
> nonsensical rules of SQL.
> I know lots of you database pros out there hate to hear from guys like
> Date,Darwen and co.
Nonsense. It's not that we don't like to hear from theoreticians. It's that
we don't want to hear from doctrinaire ideologues like D, D & P, especially
when they have only "angels dancing on the head of a pin" to show for their
side. One theoretican whose stuff is actually worth reading is Leonid Libkin
. There are plenty of others.
> but the thing is they are right: the DBMS world has opted for mediocrity and
> over-complexity.
You know, this is really dumb prima facie. Something that you need to
have a 130 IQ and a math degree to use even at the most basic level is
something that's pretty fragile. A *really* well-designed tool is one that a
person who's not very bright can pick up and use, while not muzzling the
expression of somebody who is bright and has lots of experience. SQL
qualifies.
> Of course, that's the way it is with most things in life
Oh, puh-lease!
What part of "A well regulated militia" do you not understand?
> > > You mean the magic Darwen/Date/Pascal relational model? The one nobody has
.sig would think that big government, modern academia, and big business are the standard bearers for logic and the limits of human endeavor ;).
> > > managed to implement despite the 25 years it's been around?
> > Ahh yes, the old canard. Actually, several companies and individuals have
> > implemented the relational model MUCH more faithfully than the typical SQL
> > vendor.
> Name one, and make sure it's one that's disallowed NULLs completely. Date,
> Darwen and Pascal's fear of recording states of ignorance is ill-founded in
> real-world conditions. Codifying that fear isn't even well-founded in last
> century's mathematical theory. Yes, it's true that multi-value logics are
> just a teensy tad more complicated theoretically than 2VL. That does
> not imply that they're less useful, or that the systems built around
> them are more complicated than the truly wackily byzantine things D, D & P
> suggest as workarounds for not having NULLs.
You're mischaracterizing the argument. I said "MUCH more faithfully". I know there's no perfect implementation, nor will there likely ever be. Name one perfect implementation of the SQL standard. But, there have definitely been *better* implementations, ones that attempt to fit the concepts of the relational model more closely. You know the ones I'm going to talk about: Duro, Rel, Alphora, etc... The fact is very few people care about these, for the same reason that very few people care to be told to eat their vegetables, or in fact to be told there is a better way to do whatever it is they are doing. That doesn't make my argument wrong.
Meanwhile, I'm not a slavish ideologue about this. I personally don't care about the NULL thing, because I think there are sensible arguments on both sides, and no easy resolution. But, supporting duplicate rows, rowIDs, positional attributes, etc... seem to me such blindingly obvious bad choices. This is without even getting to the more abstract stuff like transitive closure. Of course there are trade-offs in the real world, but why trade off things that are useful to gain things that are not?
> > The problem is not one of difficulty, but rather of popularity and
> > marketing.
> Nope. See below.
> > In fact, several solo-developer projects have implemented it on the logical
> > level much better than your typical SQL vendor. The problem is that those
> > guys don't have a) the marketing budget and 20 years of industry buy-in, and
> > b) the developer team to implement all the "enterprisey" features like
> > clustering, failover, etc...
> > And by the way, there is nothing about the "true relational model" that
> > makes those things harder to implement.
> That it's been 25 years and nobody has implemented it, despite
> resources in industry, government, academia and open source, flatly
> contradicts your assertion.
In other words, it can't be done because it hasn't been done? Fallacy. Tell me a logical *reason* why it can't be done.
BTW, I have a hard time believing that someone with your
> > They are if anything LESS difficult to implement with a true relational DBMS
> > than with an SQL DBMS, which has to handle all kinds of oddities like
> > duplicate rows, position-dependent syntax, pointers, and many other
> > nonsensical rules of SQL.
> > I know lots of you database pros out there hate to hear from guys like
> > Date,Darwen and co.
> Nonsense. It's not that we don't like to hear from theoreticians. It's that
> we don't want to hear from doctrinaire ideologues like D, D & P, especially
> when they have only "angels dancing on the head of a pin" to show for their
> side. One theoretic
It sounds like you just don't know how to deal with FreeBSD. That would explain the poor performance you experienced, and how it is completely contrary to what we've found.
For the heavest application at my last job, the load pattern was very query heavy, although the application stored intermediate results in temporary tables. This application is heavily threaded, creating two threads per user connection, plus the MySQL thread, so we're talking like 150 threads created & destroyed per second.
Our original platform was Solaris, and performance was excellent (well, excellent considering the dog-slow CPUs that Sun makes).
We eventually migrated to Linux, but this was possible only after the new thread libraries (well, new at the time). Performance then was quite good.
We found MySQL under FreeBSD basically unusable under heavy loads.
We never tweaked any of the systems. We did try a few thread libraries under FreeBSD, but they all sucked.
CJ Date actually tried to generalize the concept of NULLs into "special values" in a domain. He argued that NULLs cause confusion in 3VL because NULL can mean different things. Sometimes it means "unknown", other times it means "not applicable". And in an outer join, it's not clear at all what the value NULL is supposed to represent. At least in my reading of his work, there is nothing precluding you from having a special value called "NULL" and being able to magically add it to an existing domain with simple syntax. He wanted other special values to be allowed, so that NULL was not so ambiguous. Perhaps I misunderstood, I don't have the book nearby.
I really enjoyed Date's work because he clearly explained points of confusing terminology, and suggested possibly less-confusing approaches.
SQL is far from perfect. If nothing else just the syntax is bad, and it has a lot of reserved words and key words. But one thing we have to remember is that no pure relational language is appropriate for a database. For instance, relations are unordered, by definition. Yet, it's important for a database system to be able to ORDER BY.
So, really, I think what we want is a language that facilitates all the relational operations, and at a level above that also allow non-relational operations like ORDER BY.
Social scientists are inspired by theories; scientists are humbled by facts.
(From Wikipedia and archived MySQL manuals)
Do you even lift?
These aren't the 'roids you're looking for.