Comparing MySQL and Postgresql
Mr Poet would like to start a discussion on the following: "I am an AVID user of Postgresql, over the past 6 months I have seen more and more about MySQL. I have done a little research and outside of MySQL being extremely fast I have not seen any reason why people are so excited about it. This is not a slam on MySQL. I just don't understand the reasons. Could we have a technical discussion as to why?"
That's a good question. What are the strengths and weaknesses of both programs. Are there areas of deployment where one is more appropriate than the other, or are these programs fairly interchangeable?
PostgreSQL uses multiple back-end processes, one for each connection to the database (like some other brand-name dbmses). Each backend works independantly, except when locking is required, so PostgreSQL can indeed access two or more tables at the same time, you just need two or more backends. -- cary
MySQL requires a commercial license for the server, but not for the client. Perhaps both client and server are non-free, read the licenses first to be sure.
One obvious deference between the two databases is that mysql is a relational database, whereas postgresql claims to be an object-relational database. Last week I tried postgresql, and in terms of features, illustra and to oracle8i are better. (I choose not to run any non-free software on my computer, so I will stay with postgresql.) Postgresql seems to be significantly better than mysql in terms of compliance to SQL standards and features. (A previous post gave an url on "db" comparison, which I thought had too many inaccuracies on both Oracle and postgress.) I know that postgresql supports procedural languages and jdbc. Perhaps mysql does also, or maybe not. In terms of features alone, I am far happier with postgresql than when I tried mysql.
If I remember correctly, PostgreSQL starts a new process for each connection opened and communicates through a shared memory region so multiple connections can each have requests answered.
Just out of curiousity, does anyone know how either of these databases compare to MSSQL?
I agree with the mailing list bit. I am constantly amazed when I a lowly hacker, consistantly get direct responses within a day or two from one of the primary authors of the MySQL software.
Sybase uses threads, not multiple processes like oracle. One of the main advantages of Sybase over Oracle is that it's lighter weight and actually higher performing (if used properly.) Oracle is bloated and very expensive - they have good marketting though. The reason I choose to go with mySQL over postgreSQL for certain applications is postgres's more resource intensive multiprocess archictecture. Imagine a system with 40 or 50 connetions to a database - you'd have 40 or 50 postgreSQL processes. Seems like a resource hog to me...
Unless I had a very small, maintainable, perfect program running on top of perfect libraries, I'd hesitate to use threads where they weren't necessary. Threads make everything so much more fragile, and limit your choices (and chances) for error recovery--or even error tracking, as the thread causing the corruption may not be affected by it. At least with separate processes you have a limited section of shared memory, accesses to which you can limit to a fairly small section of code. With threads, if you have a memory corrupting bug and you can't review the entire program to find it (or don't find it when you look), you're pretty much screwed...
Remember, kids, the thread you seg fault may not be your own...
FWIW, PostgreSQL still doesn't do row locking. It doesn't lock while you read. Period. This is substantially better than row locking, provided you're ready for the results.
--
Ben Kosse
Remember Ed Curry!
Just as an important note, only two RDBMS's support truly parallel executiion og queries -- Oracle and DB2. This feature is called MPP in Oracle and costs mucho buckos. MPP Oracle is one of the fastest(if not the), most scabale databases availible. It would be interesting to see if Oracle will port this feature to Beowulf.
Threads are merely a design issue. mySQL decided to execute mutliple queries over threads rather than processes. Oracle, Informix, DB2, and PostgresSQL use multiple processes in UNIX environments. This largely due to their age. When they were first written UNIX threads either didn't exist or weren't standard enough to work reliabily across platforms. IMHO, threads are better because they can be started and closed fater and require fewer resources, but then, processes are cool as well.
I believe that the HAVING clause is designed for
the use of aggregate comparisons.
-- Thorin sits down and starts singing about gold.
I've been happily using MySQL for three years; it saved our asses when one an outside vendor implemented something using mSQL. Their program worked, but took two or three minutes to run, and our client was freaking out. I discovered MySQL, ported the program over, and the MySQL system took two or three seconds to run.
I kinda miss subselects, but not much; they're scheduled for a point release in the near future. I think the word from Monty is "September", but don't hold me to that.
I've never missed transactions. The one time they would have been nice I was able to catch the exception and delete the incomplete information.
MySQL's license is pretty simple -- as long as you don't sell a product that can only function with MySQL, you don't need a license. If you provide a MySQL adaptor, a PostgreSQL adaptor, and an Oracle adaptor, I don't think you need one.
Actually in 6.5 there now is a serial type that will create the sequences and and the indexes for you and automatically increment the count with each insert.
Thus said, in my setup (web backend database), I choosed PostgreSQL, mainly for the totally random reason that MySQL was'nt compiling on my developpement platform (Debian 2.1/Sparc) and PostgreSQL was packaged for that distribution.
However, I had made some research on the issue. Here are my conclusion :
- MySQL is reputed faster (MySQL developper admit that speed is no. 1 priority), but PostgreSQL advocate point that the speed disadvantage of PostgreSQL is mostly due to fsync() call for every write request to the db, wich is a Good Thing (TM) but can be disabled if speed is an issue. My personnal opinion is that fast hardware is cheap, thus lowering the importance of this issue.
- MySQL has a lot of nice third-partie support and goodies and better doc. However, the doc for PostgreSQL 6.5 had been much improved and is quite helpful.
- As pointed out in previous post, support for transaction is a plus for PostgreSQL. Also, the PostgreSQL developpement team seem genuinely focused on SQL92 compliance, another Good Thing (TM) from my newbie point of view.
- Another nice aspect of PostgreSQL I don't know much about is native geometric data type (could be interesting in GIS application) and support for BLOB (don't know if MySQL support BLOB). I don't need these features, but it's nice to know there available.
Thus said, PostgreSQL has some shortcoming (at least in v.6.3.2, the one I use):
- Can't use aggregate function in WHERE clause. For example, the following statement is legal with MySQL :
SELECT * FROM sometable WHERE id = max(id);
In PostgreSQL, you need some hoopla to do this :
SELECT * FROM sometable WHERE id = (SELECT max(id) FROM sometable);
This might be fixed in Pg 6.5, I don't know.
- There is no AUTO_INCREMENT in PostgreSQL. You have to create a "sequence" (CREATE SEQUENCE seq;) and invoke nextval('seq') to extract a serial number. This is fixed in v6.5 : PostgreSQL now have a SERIAL data type that is auto-incrementing.
That being said, don't take my word for it (I *really* don't have much experience with database).
Just my 0.02 CDN$.
:wq
This happens to be a question I have some experience in. I have been a professional MySQL developer for about 2 years now working at a 400 million dollar corporation. We first started looking at RDBMS's for the specific task of creating a separated, abstracted data reporting model in which we converted our OLTP database into a more friendly OLAP database. The special needs associated with OLAP made us look first for speed, second for flexibility, third for hooks into programming languages.
;)
;).
I can say beyond a shadow of a doubt that for OLAP and dimensional data modeling MySQL is without peer. One of the applications I have written has an active concurrent userbase of nearly 10,000 clients. (Not 10,000 possible clients, 10000 concurrent clients!) And the MySQL engine has no problems cranking out the data that they need.
MySQL has excellent hooks into a plethora of programming languages. The first applications written at my corporation were written in Perl with the DBI interface. The interface between Perl and MySQL is flawless. I have never had a single problem in dealing with recordsets via the DBI interface. Once we shifted away from pure reporting applications and more into dynamic web-based analytical processing a shift was made to PHP3. The PHP hooks into MySQL are amazing, the speed is unmatched and the ease of use alowed for excellent rapid application development in our high pressure enviroment. Once you learn however, that until PHP4, PHP did not have a true garbage collector so if you didn't explicitly free your result sets, you ended up with Apache processes that took up 140Mbs. of ram
The current shift is away from PHP and into a more formalized OO model with Java Servlets. The MM Mysql Drivers (available for download from a link on the MySQL download page) are excellent and fast. I wrote a custom database connection pooling algorithm based on some nice circular skip list heuristics, and we have never had a problem with it. I am still not sure if I like it as much as PHP, but the ties into MySQL are amazing.
As for MySQL's limitations... well, it does NOT have subselects... yet. With MySQL 3.23.* we will soon have subselects. At first this was something that really bothered me, until I realized that 90% of all things I needed to do via subselects were easily and more efficiently implemented as joins. And the rest could easily be implemented with the use of temp-tables (after all, that's what a subselect does anyways
The lack of transaction and rollback capabilites can be a problem, but with the soon advent of atomic operations we will have most of the neccessary tools to emulate some of the functionality of transaction-rollback mechanisms. This, however, is a valid complaint, and if you are developing an OLTP instead of an OLAP application, it might be a good idea to go with an Oracle, Postgress, or Solid.
And finally, the MySQL mailing list has some of the most brilliant minds in the world of database development I have personally ever encountered. There is seldom a question asked that does not either receive a professional concise answer, or a pointer to the proper place too receive that answer. So, all that said, look at the subject of the message for a concise version of this posting, and if anyone has anything they would like to dicuss with me about MySQL development, feel free to get in touch with me.
ed
That's what it comes down to. Consider gdbm/dbm as even lower on the functionality list, and even faster (I reckon, but don't truly know). Not all programs need a full RDBMS. That's all, no hidden agenda.
--
Infuriate left and right
If your comment about "nuts" is my other comment above, you din't read it write, or I didn't write it clearly enough. There is a continuum from a full RDBMS on one end to gdbm/dbm on the other, with mysql being in the middle. There certainly are times when even mysql is overkill, and gdbm is perfectly good enough. And there are times when mysql is not good enough, and you need a real RDBMS with transactions.
--
Infuriate left and right
1) Don't you mean that MySQL can perform multiple (read) operations on *same* tables simultaneously? Doing differenting things to different tables sounds like no big deal, not nearly as hard as doing different things to the same table.
2) Is anyone aware of a current benchmark comparing PostgreSQL to other DBs? The standard line about PostgreSQL performance seems to be "PostgreSQL was pretty slow, but 6.5 is a huge improvement"... with no numbers to compare.
Interbase has had a similar design for a long time, which has not helped it get anywhere in the marketplace.
That's absurd! That is what you use transactions for -- ensuring concurrency and atomicity among updates. MySQL is barely a relational database, and you could do almost as well using serialized hashtable objects in your choice of programming languages. :-)
I have never really seen any speed problems with postgresql -- I built a web-based groupware and e-mail application for a liberal arts college and based it on postgresql; at any given time, the db has tens of thousands of rows (maintaining user preferences, info about postponed messages, addressbooks, and myriad other things in the email app alone), and performance is comparable to if not better than an Oracle namespace of similar magnitude on the development server at my current job doing db development for a major pharmaceuticals company.
The moral of the story: if you're backing a web site, your bottleneck is most likely not going to be the DB, unless you're opening the DBMS with every page you serve (i.e. use a connection pool OR ELSE). You're more likely to be network-I/O-bound most of the time, and a "lightning fast" DBMS won't help that at all. However, if you're writing an application that requires that more than one person access/update the data simultaneously, you'll want transactions, or else you can deal with the mangled db.
Good luck. I'd suggest you read Philip and Alex's Guide to Web Publishing for more info on why this is important. You should also check out Mr. Greenspun's clever ArsDigita Community System -- it does almost everything that a web/db application needs to do, and it caches db connections.
Many people write web application. These application don't need all the extra goodies of a (more or less) complete SQL implementation.
The big differences are:
1) Transaction management
2) Stored procedures
Take for example a "Slashdot" system. This system doesn't require very much of the database.
It doesn't need transactions. Most action is making selects on texts to view. After that the
"transaction" is complete. The second most use option is probably submitting a text (such as this). This can be done with a single insert command. The last option is updating user settings. This may require transaction handling: what happens if the user is "logged in" twice and changes his user settings at the same time? This situation is probably considered "unlikely" or an "acceptable bug". From this point of view, Slashdot is just like many other web-sites: all it requires is speed.
However, I you try to build a real world financial administration,you cannot live with such "acceptable bugs". This requires more, most important the two features mentioned above.
Now the question is whether many organisation would allow such critical data to live on PostgreSQL. System administrators are used to Oracle, Sybase, Informix etc. These names they trust. PostgreSQL only has become reasonably stable in the last 1.5 year and still has to build its name. Only a few months ago some critical features (e.g. row-locking) have been implemented to make it feasible to make such an administration system. The name still has to grow, but has a great potential. After it has done so, you will find more comments of people being enthousiastic about PostgreSQL.
I have done as much research as Google could point me to and as far as i can see MySQL screams but it is not feature rich. Here is the summary of an afternoon (last week) of research (Disclaimer: I still have to do benchmarks myself but this is a start.)
MySQL is fast but it lacks some important stuff that we decided we couldn't live with out.
Postgres on the other hand was developed in a very academic enviroment and is extremly feature rich. The major draw back to postgres I could see was that it doesn't scale very well and i am not sure if 6.5 addressed any of these issues. That being said if the db won't scale then it really doesn't matter how many features you have.
They guys who developed MySQL have a really cool and usfull tool on there web site that runs a script and lets you know exactly (It is VERY detailed) what features each db has.
I strongly suggest anyone insterested in benchmarking features of dbs check this out. db compare
We decided that we would use postgres until we do our own benchmarks. It may scale enough for what we need, the load shouldn't be too high for our use.
more info would be great, also the timeline for MySQL to incorporate inner selects would also be handy cause i know they havbe it at the tops of the features they are going to implement.
Although MySQL lacks some features the Postgres has, MySQL is fully threaded whereas Postgres is not. In short, Postgres cannot perform parallel operations; MySQL can operate on different tables simultaneously.
However, what are the real comparison numbers? Are we talking 3-4% on 1 million rows?
If PostgreSQL is only slightly slower that MySQL, I would consider switching.
A.
--
Adam Sherman
--
Adam Sherman
Freelance Geek
Too bad the PostgreSQL documentation is so misleading about commercial databases not having this feature. Oracle has it. So does Solid.
Excerpt:
--
OS lover
OS lover
this:
PostGresQL:
undocumented bugs, inconsistent behavior
from one release to the next. Not reliable.
MySQL: The opposite.