First "Real" Benchmark for PostgreSQL
anticlimate writes "A new benchmark published on SPEC shows PostgreSQL's performance approaching that of Oracle's and surpassing or on par with MySQL (however the test-hardwares of the other DB systems are somewhat different). The test was put together by PostgreSQL's core developers working at Sun. They certainly are not unbiased, but this is the first 'real' benchmark with PostgreSQL — according to Josh Berkus's blog. The main difference compared to earlier benchmarks (and anecdotes) seems to be the tuning of PostgreSQL."
Because Sun systems will always be different from the x86 based cores that run MySQL and Oracle, I think the best way to compare such software would be by constructing servers of equal price and seeing how PostgreSQL fares. The true question on any business person's mind is "how much to implement?"
Well, back to rejecting software patent applications.
For those of us who don't have dozens of hours to do the necessary research, can some postgresql gurus sum up some of the most significant tuning parameters so us mere mortals can see similar performance gains?
I realize that a large part of the answer is going be "it depends on application, your hardware, and you query types", but surely there must be some general tips that we can follow given various typical setups. MySQL, for example, ships with several different configuration files: One suitable for a small installation, one for a mid-sized installation, one for large installation, etc.
What tuning can someone do to tune postgresql's default (conservative) config file to make it perform better?
Get people from each group, give them the requirements and 5 different dollar amounts.
:)
Let each team setup their systems how ever they want at each price point. Some will go with clustered servers. Some will go a single monster server. They know their products best so they'll be the ones best suited to choosing the configuration.
Then run the benchmarks. And keep hammering on them until AFTER the next patch release.
Yeah, it might run fast, but still be a bitch to patch/upgrade.
At $5,000 you might find that a cluster of MySQL boxes beats everything.
At $10,000 maybe something else is best.
$25,000
$50,000
$100,000
etc.
And finally, break it. Break it bad. What happens when something goes wrong? Oracle might cost a lot, but if they can come through with your data they might just be worth it.
If nothing else, you'll get the "best practices" nicely demonstrated by each group.
I can't speak for complex queries, but here are some simple findings from my testing:
.4 seconds per 1000 inserts), but didn't test that with postgres. Decided I only needed a static table anyway, so mysql was more ideal for it's speed in my situation. Sure with I had used postgres on a few more complicated projects though...
Inserting 20 million rows, all simple inserts, only one primary key (int) with autoincrement for mysql and a sequence for postgres:
Avg Mysql time per 1000 inserts: 3 seconds
Avg Postgres time per 1000 inserts: 15 seconds (and gets worse over time)
That was after increasing the caches and disabling fsync on postgres too.
I also did a delete then insert for both (to flush out already existing rows), with similar results.
Ended up just inserting into a mysql table with no keys/indexes at all for maximum speed (average of
Posting AC to avoid flames.
Performance isn't what causes a lack of acceptance in the marketplace for PostgreSQL.
The problem is twofold:
MySQL, as others have pointed out, has better developer support and they know their target audience. They supply a fast, easy to use database for those who don't need a whole lot.
Oracle supplies an enterprise level database that MySQL doesn't aspire to. PostgreSQL doesn't know where to fit in.
Do a little investigation on setting up PostgreSQL with fault tolerance and replication and you'll quickly see why large corporations cough up money for Oracle. Performance is one aspect of the price tag, but it is certainly not the only factor.
MySQL is modular - pick'n'mix data storage engines sharing a SQL front end. I can't find the bit in TFA that says which one they compared.
I've always suspected that most MySQL vs Postgres flame wars are based on comparing Postgres with the speed of MySQL/MyISAM (No transactions or relational integrity checks - so, big surprise, dead fast for simple queries) and then waving MySQL/InnoDB around when the functionality issue is raised.
MySQL/MyISAM hits the speed/functionality sweet spot for LAMP data-driven websites, is supported by lots of free webapp software and offered by most decent web hosting services. Comparing it speedwise with Postgres has always been pointless, though. If Postgres has caught up, colour me impressed, but if they're pro-Postgres I bet they're comparing with MySQL/InnoDB (which is a bit closer to like-with-like).
Never quite seen the point of MySQL/InnoDB really - all the advantages of MySQL/MyISAM minus the speed, support by popular webapps, availbility on low-cost hosts... and still lacks the features of Postgres.
In a survey of 100 programmers, 111111 thought that duck-typing was a good idea.
Now there's a good point. Is there any good documentation out there on databasing on what is the best solution for what kind of problem? Particularly, I would like to see something between flat-files and DBMS and then between different DBMS.
Though, technically, I'm stuck between whatever our 'host' is giving us (currently PostgreSQL, but usually MySQL, and I dabbled with MSSQL when an intern for the Government), I'd like to know when it might be better to actually just store some stuff in a flat file or even use an XML DB (which one collage professor loved to talked about as a holy grail).
Outside of "Basic Database Theory and Generic SQL" books, a programming class on Databases (how to build tree structures and navigate them, etc), I've never really got any 411 on the benefits of using MSSQL, Oracal, MySQL, PostgreSQL, etc and I've never gotten the change to learn more about advanced database features such as stored procedures or functions or whatever they might be called and what they're good for.
Recommended books, anyone?
Cheers,
Fozzy
p.s. I work mostly with internet applications.
"The past was erased, the erasure was forgotten, the lie became truth." ~1984 George Orwell
Avg Mysql time per 1000 inserts: 3 seconds
Avg Postgres time per 1000 inserts: 15 seconds (and gets worse over time) OK, now do a seven-table join, including a self-join with a correlated subquery (MySQL does those now, right?). I think everybody knows by now that MySQL is pretty much untouchable as long as all you're doing is simple single-table stuff. Kind of like comparing a pickup truck to a moving van: if all you're doing is moving a couple of boxes around, then the pickup kicks. But when you need to move serious loads, then it's the pickup that gets to sit by the curb...
Just junk food for thought...
This is on 'decent' hardware running Solaris 10 (amd64). Obviously you need to tweak stuff like wal size, checkpoints, etc.. But getting this type of performance is not hard to do. I can scan an hours worth of data in a short amount of time. Each one of these 'hourly' tables contains roughly 30-32M rows. this is nothing to sneeze at from what I can tell. I haven't had a reason to re-evaluate mysql to see if there are enough tweaks to make it perform similarly, but if you're getting the crappy insert rate that you're talking about, you clearly need to change something as you're doing it wrong if you truly care about performance. E-Mail me if you're interested in my postgresql config files. I'm happy to share to minimize the FUD out there.
I worked for a company whose product ran on MS-SQL, PostgreSQL, and Oracle. Should I explain why we didn't support MySQL or not? It'll draw fanboys either way. I used the same server, reinstalled the OS (Red Hat Enterprise 3 or Windows 2000) and database between each test, and rebuilt the application server to be extra sure.
Since it was more difficult to write Oracle-compliant SQL and we didn't have a lot of Oracle customers, the developers didn't care to spend time on it, and our stuff ran about 20 percent slower there. That's after a lot of tuning time, it was 50% slower on a default install. Oracle 9 took two days to install and tune, plus another two days of preparation. I was particularly underwhelmed that I had to deal with stupid errors like tarballs that extracted onto themselves and assumptions about the shell being used. At the time, Oracle was a very Solaris-like experience; user-unfriendly to the extreme.
Postgresql 7 ran great; it was neck and neck with MS-SQL in all tests, after proper tuning, and 30 percent slower on a default install. Postgres took half a day to install and tune, but it took me a week and conversation with the postgres mailing lists to find out what needed tuning. Still, we were able to put together a document that took users from bare metal to RHEL+Postgres in four hours if they had all their media handy.
Microsoft SQL Server 2000 ran great with no tuning at all, and took fifteen minutes to install. It also cost as much as paying me to do the entire set of tests. OS installation/patching times and tested workloads were the same for all three tests.
YMMV.
"Nothing was broken, and it's been fixed." -- Jon Carroll
8.1 automatically does table maintenance (vacuum, stats, etc.)- vacuuming.html#AUTOVACUUM
http://www.postgresql.org/docs/8.2/static/routine
You can tune it as to when to do such operations, or what max percent of normal I/O can be used for those tasks.
THIS THING CAN TURN ON A DIME, MACROSSZERO STYLE ALSO FUCK BETA, ~NYORON
When you see benchmarks run and comparisons made between different databases that are conducted by a single person or company, all things are seldom equal. The people involved have more expertiese with this db package and less with that
For these published tests, Sun spent a lot of time optimizing postgress to make it fly
So, all things ARE equal, in relation to the database tuning & setup - both were done by experts with the respective packages, and we don't have to wonder if some yahoo screwed things up, intentionally or otherwise, by missing a basic configuration option.
The guidelines are pretty simple, but highly subjective.
Here are your options:
A. flat file
B. key-indexed (berkley db, dbm, gdbm)
C. feature/config light db (sqlite)
D. feature/config heavy db (postgrsql, oracle)
* Use A for configuration
* Use A if you have a small amount of data and can slurp it all into memory.
* Use B if you only access data via a known unique value and never compare records.
* Never use either A or B if the data is shared between processes unless file locking is okay.
For anything else, use C or D. If you've got a lot of familiarity with D, sometimes it's easier to use D than C, even if D seems like overkill. My recommendation is to get familiar with C and then worry about D once you understand the limitations of C. It'll help you appreciate what D gives you when you're ready for it.
*sigh* back to work...
- A nice schema designed once upon a time properly by a proper DBD that is vandalised in the application abstraction layer because the developers are are sorely pissed off by the endless wingeing of the SQL server and/or its abissmal performance. So they take the matters in their own hands and violate ACID by cashing and bypassing restrictions in the app. Sooner or later someone comes around and says - WTF, why don't we rewrite this all in software and sod off the expensive database. And surprise surprise it ends up being done in MySQL.
- An abissmal schema or no schema at all where all restrictions are done in the app. That is MySQL country all the way.
MySQL is a result of the way current software development is taught and done. Unless Jo the Average Developer starts understanding how to use SQL in his application (and he does not) and unless SQL data representation grows up to modern non-fortran-like OO semantics MySQL will proliferate. And if you think that MySQL is bad think twice. There are the object persistence frameworks and in-memory crap that follow in its wake.Baker's Law: Misery no longer loves company. Nowadays it insists on it
http://www.sigsegv.cx/