First "Real" Benchmark for PostgreSQL

I do not think it means what you think it means. by Control+Group · 2007-07-09 08:17 · Score: 5, Insightful

however the test-hardwares of the other DB systems are somewhat different

Which makes the results pretty much useless. But, being the intrepid slashdotter I am, I went ahead and R'ed the FA anyway, in case I could glean some useful information from it.

Which revealed that the linked article doesn't actually contain any information whatsoever about Oracle* or MySQL, much less benchmarks on named hardware.

So...what am I supposed to get out of this, again? Or is this just supposed to be some kind of PostgreSQL love-in, so I should take my wet blanket elsewhere?

*Well, the second link contains someone claiming that Oracle is only 15% faster...but without providing any actual data.

--

Reality has a conservative bias: it conserves mass, energy, momentum...

Bad firehose! by greg1104 · 2007-07-09 08:27 · Score: 5, Informative

Why this emaciated post made it while mine didn't I'll never know...here's how I submitted this story:

The current version of PostgreSQL now has its first real benchmark, a SPECjAppServer2004 submission from Sun Microsystems. The results required substantial tuning of many performance-related PostgreSQL parameters, some of which are set to extremely low values in the default configuration — a known issue that contributes to why many untuned PostgreSQL installations appear sluggish compared to its rivals. The speed result is close but slightly faster than an earlier Sun submission using MySQL 5 (with enough hardware differences to make a direct comparison of those results unfair), and comes close to keeping up with Oracle on similarly priced hardware — but with a large software savings. Having a published result on the level playing field of an industry-standard benchmark like SPECjAppServer2004, with documentation on all the tuning required to reach that performance level, should make PostgreSQL an easier sell to corporate customers who are wary of adopting open-source applications for their critical databases.

Re:Bad firehose! by MrNaz · 2007-07-09 08:33 · Score: 5, Insightful

I like yours better. The Slashdot editors need to have their balls cut off if they think the post that beat your onto the front page is better. Feel free to mod me down any time for bitching about this, but seriously, this post is SO much better than the one that made it.

--
I hate printers.

Re:on the playground... by CaseCrash · 2007-07-09 08:29 · Score: 5, Funny

boys use mysql men use PostgreSQL, and _____ use MSSQL. and people who like to collect a paycheck use MSSQL.

--
No, that link you posted to a web comic we've all seen a hundred times is not "obligatory."

Re:I do not think it means what you think it means by KillerCow · 2007-07-09 08:34 · Score: 5, Informative

I think that somebody sent the wrong link and (surprise!) the editors didn't even follow it to check.

Here's a more useful one: All SPEC jAppServer2004 Results Published by SPEC

The benchmarks aren't standardized enough for any useful comparison. The hardware and configurations vary in almost every one.

Re:I do not think it means what you think it means by Ungrounded+Lightning · 2007-07-09 08:35 · Score: 4, Insightful

however the test-hardwares of the other DB systems are somewhat different

Which makes the results pretty much useless.

Not necessarily.

It's essentially useless for separating out how much of the performance difference is the result of the software's design, implementation, and tuning versus how much is due to the platform differences.

But such tests CAN be used to examine the performance of competing ENTIRE SYSTEMS, to inform choices between them.

They say: "Oracle on does THIS well, PostgreSQL on can be tuned so it does THAT well on the same benchmark."

This lets administrators (presuming they have access to the hardware info) get a bang-for-the-buck comparison.

For the rest of us, the interesting point is that PostgreSQL, running on its team's idea of realistic hardware, can produce performance in the same ballpark as Oracle running on Oracle's choice of hardware.

(Whether the necessary remaining data (what are hardwares x and y? how was PostgreSQL tunde) is published now, later, or never, is a separate issue. B-) )

--
Bantam Dominique roosters crow a four-note song. Once you've heard it as "Happy BIRTHday" you can't NOT hear it that way

Good, but it can be improved. by khasim · 2007-07-09 08:39 · Score: 5, Interesting

Get people from each group, give them the requirements and 5 different dollar amounts.

Let each team setup their systems how ever they want at each price point. Some will go with clustered servers. Some will go a single monster server. They know their products best so they'll be the ones best suited to choosing the configuration.

Then run the benchmarks. And keep hammering on them until AFTER the next patch release.

Yeah, it might run fast, but still be a bitch to patch/upgrade.

At $5,000 you might find that a cluster of MySQL boxes beats everything.

At $10,000 maybe something else is best.

$25,000

$50,000

$100,000

etc.

And finally, break it. Break it bad. What happens when something goes wrong? Oracle might cost a lot, but if they can come through with your data they might just be worth it.

If nothing else, you'll get the "best practices" nicely demonstrated by each group. :)

Re:Good, but it can be improved. by suv4x4 · 2007-07-09 09:27 · Score: 5, Funny

Get people from each group, give them the requirements and 5 different dollar amounts.
Let each team setup their systems how ever they want at each price point. Some will go with clustered servers. Some will go a single monster server. They know their products best so they'll be the ones best suited to choosing the configuration.

We gave each team $10000 and told them to build the best hardware and db setup they can:

** PostgreSQL got a small IBM Blade quad machine redundant setup:
"We're relying on standard industry solutions and reliability."

** MySQL clustered 4 PlayStation3 machines and wasted the rest on booze and women:
"We're practical, plus we know what is money best spent on!".

** Oracle purchased a 1200 square foot datacenter and installed a megacluster of 8132 quad-Xeon 64GB RAM 4TB disk machines. With $10'000...?!
"We... uhmm... we hit a great bargain, guys! You wouldn't believe it, but it's true!"

Re:on the playground... by morgan_greywolf · 2007-07-09 08:43 · Score: 5, Informative

DB2? It's only useable on large mainframes (big iron, so to speak) from what I understand.

Um, no. DB2 these days runs on most major UNIX variants (HP-UX, Solaris, AIX, IRIX, etc.), Linux and Windows. It's used quite often, in fact. Most Enovia/VPM installations use DB2 backends, for instance. Modern versions use XML along with regular relational database stores and are very, very up-to-date technology-wise. Very scalable.

--
My blog

Re:Mod parent way up! by Vellmont · 2007-07-09 08:53 · Score: 4, Insightful

You cannot compare benchmarks without SOMETHING standard between them.

The thing that's standard is the benchmarking software.

If I were to buy a database server, do I really care which component of the solution is providing me with the great performance, or do I just want the performance? At the end of the day the only thing that really matters is the performance that comes out of the box.

It doesn't really matter if "Postgresql" is faster than "MySQL", because they always run on a certain physical computer. What matters is "I need to accomplish X,Y and Z. I have A dollars to spend. Which solutions accomplishes X, Y and Z the best within my budget? You can't separate the software from the hardware and get an answer that's very meaningful.

This benchmark isn't the last word on anything. Even a benchmark run on the exact same hardware means very little if you have a 2 core machine instead of 8.

--
AccountKiller

DUH! by Slashdot+Parent · 2007-07-09 09:00 · Score: 4, Insightful

Why this emaciated post made it while mine didn't I'll never know. Yours wasn't posted because it didn't contain enough flamebait.

--
They don't grade fathers, but if your daughter's a stripper, you fucked up. --Chris Rock

Re:performance isn't the issue by pavera · 2007-07-09 09:25 · Score: 4, Informative

I've been using both MySQL and PostgreSQL for 7 years now... I've never felt a need for "developer support". Or if that phrase means "documentation + message boards" I've never felt a lack from postgresql. I also really don't get the PostgreSQL is hard to use argument. In every linux distro I've used in the last 7 years (redhat, centos, suse, fedora core, ubuntu) they are exactly the same... IE apt-get, yum, whatever, install postgresql-server/mysql-server. Then $startupscriptdir/postgresql start....

Do people really get that hung up on using postgres as the initial user instead of root?!? That is the ONLY difference that I can see.

Now I can agree that maybe postgresql doesn't really "target" an audience, but that is also because they are a true open source project. They don't have a commercial version. I really don't think Linus sits around saying "Ok, we need to add this feature so that more fortune 500's will adopt linux". Or "we need to add feature xyz so this will appeal to small businesses". MySQL has a "target" audience because they are selling something. If you aren't selling anything, by definition you don't have a target.

Re:performance isn't the issue by LurkerXXX · 2007-07-09 09:32 · Score: 4, Insightful

Actually I think PostgreSQL might have eaten a lot more of the MySQL market if they'd simply been faster to market with better admin tools and Windows support.

Lots of folks went with MySQL early because of those factors. They also then tended to write all their PHP, etc, applications to only talk to MySQL, thus making folks who might have preferred PostgreSQL use MySQL to run the app that they needed to run. Once that happens you are kind of in a Catch-22 place. Folks won't write the apps for PostgreSQL until it's used by a larger chunk of the market, but it won't take that large chunk because all the 'cool' apps were written MySQL only, so they have to run MySQL instead of PostgreSQL

Which MySQL? by itsdapead · 2007-07-09 09:33 · Score: 4, Interesting

MySQL is modular - pick'n'mix data storage engines sharing a SQL front end. I can't find the bit in TFA that says which one they compared.

I've always suspected that most MySQL vs Postgres flame wars are based on comparing Postgres with the speed of MySQL/MyISAM (No transactions or relational integrity checks - so, big surprise, dead fast for simple queries) and then waving MySQL/InnoDB around when the functionality issue is raised.

MySQL/MyISAM hits the speed/functionality sweet spot for LAMP data-driven websites, is supported by lots of free webapp software and offered by most decent web hosting services. Comparing it speedwise with Postgres has always been pointless, though. If Postgres has caught up, colour me impressed, but if they're pro-Postgres I bet they're comparing with MySQL/InnoDB (which is a bit closer to like-with-like).

Never quite seen the point of MySQL/InnoDB really - all the advantages of MySQL/MyISAM minus the speed, support by popular webapps, availbility on low-cost hosts... and still lacks the features of Postgres.

--
In a survey of 100 programmers, 111111 thought that duck-typing was a good idea.

Re:Mod parent way up! by Doctor+Memory · 2007-07-09 09:40 · Score: 5, Interesting

Inserting 20 million rows, all simple inserts, only one primary key (int) with autoincrement for mysql and a sequence for postgres:
Avg Mysql time per 1000 inserts: 3 seconds
Avg Postgres time per 1000 inserts: 15 seconds (and gets worse over time) OK, now do a seven-table join, including a self-join with a correlated subquery (MySQL does those now, right?). I think everybody knows by now that MySQL is pretty much untouchable as long as all you're doing is simple single-table stuff. Kind of like comparing a pickup truck to a moving van: if all you're doing is moving a couple of boxes around, then the pickup kicks. But when you need to move serious loads, then it's the pickup that gets to sit by the curb...

--
Just junk food for thought...

Re:Mod parent way up! by jaredmauch · 2007-07-09 09:54 · Score: 5, Interesting

AC, sorry, but I have a postgres install working where I get 70k inserts/second or more with a single index on the table during the day. The first insert of course is faster as the index doesn't exist yet. I'm not sure what you're doing, but I can tell you that I have tuned postgres by increasing some simple parameters. If you're using some Linux package, you're likely not seeing the benefits that are possible by stuff like changing the block size parameter in the source. Yes, it's kinda lame you have to do this, but at the same time, it's not too unreasonable. I'd like it if I could set this larger.

This is on 'decent' hardware running Solaris 10 (amd64). Obviously you need to tweak stuff like wal size, checkpoints, etc.. But getting this type of performance is not hard to do. I can scan an hours worth of data in a short amount of time. Each one of these 'hourly' tables contains roughly 30-32M rows. this is nothing to sneeze at from what I can tell. I haven't had a reason to re-evaluate mysql to see if there are enough tweaks to make it perform similarly, but if you're getting the crappy insert rate that you're talking about, you clearly need to change something as you're doing it wrong if you truly care about performance. E-Mail me if you're interested in my postgresql config files. I'm happy to share to minimize the FUD out there.

I once did benchmarking by Stinking+Pig · 2007-07-09 10:15 · Score: 5, Interesting

I worked for a company whose product ran on MS-SQL, PostgreSQL, and Oracle. Should I explain why we didn't support MySQL or not? It'll draw fanboys either way. I used the same server, reinstalled the OS (Red Hat Enterprise 3 or Windows 2000) and database between each test, and rebuilt the application server to be extra sure.

Since it was more difficult to write Oracle-compliant SQL and we didn't have a lot of Oracle customers, the developers didn't care to spend time on it, and our stuff ran about 20 percent slower there. That's after a lot of tuning time, it was 50% slower on a default install. Oracle 9 took two days to install and tune, plus another two days of preparation. I was particularly underwhelmed that I had to deal with stupid errors like tarballs that extracted onto themselves and assumptions about the shell being used. At the time, Oracle was a very Solaris-like experience; user-unfriendly to the extreme.

Postgresql 7 ran great; it was neck and neck with MS-SQL in all tests, after proper tuning, and 30 percent slower on a default install. Postgres took half a day to install and tune, but it took me a week and conversation with the postgres mailing lists to find out what needed tuning. Still, we were able to put together a document that took users from bare metal to RHEL+Postgres in four hours if they had all their media handy.

Microsoft SQL Server 2000 ran great with no tuning at all, and took fifteen minutes to install. It also cost as much as paying me to do the entire set of tests. OS installation/patching times and tested workloads were the same for all three tests.

YMMV.

--
"Nothing was broken, and it's been fixed." -- Jon Carroll

Re:Mod parent way up! by ctr2sprt · 2007-07-09 11:40 · Score: 4, Informative

You can do way better than that with PostgreSQL, at least, and I suspect with MySQL as well. I wrote a benchmark similar to yours, but a good bit more complex. I had two tables, one of which was seeded and another which was populated by the benchmark. The benchmark table had six columns (int, timestamp, 4x bigint), a primary key (int + timestamp), four check constraints (on the bigints), a foreign key constraint (int, to the seeded table), and two indexes (one int, one timestamp). I would do a commit every 75k rows, with 24 such commits per iteration and 30 passes per benchmark run, so 54 million rows total. I also used a thread pool, and there are two reasons for that. First, some amount of parallelism improves DB performance. Second, it more accurately simulated our predicted usage patterns of the database. We ran my benchmark against PSQL and IBM DB2.

The results were interesting (at least, I thought they were). First, PSQL can only handle about 10 threads doing work at once. Past about 10 threads, the DB completely falls apart. DB2, however, could handle more busy threads than Linux could, with a very gradual (and linear) degradation in performance past about 25 threads. I stopped testing at 100 threads. Second, PSQL's inserts per second (IPS) rate cut in half by the end of the bechmark. DB2 followed a similar trend until about 5 million rows, at which point IPS went up to where it started and stayed there without moving. Third, DB2 was I/O-bound, whereas PSQL was CPU-bound. I suspect it's why DB2 was able to handle an order of magnitude greater concurrency: more threads just meant the CPUs had something to do while waiting on the disks. However, it does mean that PSQL might do better with faster CPUs, whereas DB2 would not (it'd just be able to handle more threads).

And the numbers: DB2 averaged 1100 IPS, PSQL 600. Note that for the first million rows or so PSQL was faster: it just eventually dropped down to ~400 IPS after ten million rows or so, killing the average. Of course, since this table would never have fewer than 54M rows - actually, it would typically have 160M - the IPS I got at the end was the one that mattered. Also, this was on a pretty weak server, at least for this kind of workload. With more (and faster) cores, more memory, and more spindles, I'm pretty sure you could increase those numbers by 50% or more. With tuning, perhaps that much again.

Re:Mod parent way up! by arivanov · 2007-07-09 18:39 · Score: 4, Interesting

Who cares if MySQL does them or not. Show me the developers that can both develop applications and do SQL. That is a dieing breed. Most developers nowdays go for a really trivial schema and an abstraction layer. At that point the only thing that matters is row speed on simple table operations and there MySQL or in-memory OO database frameworks with a simple backing store wipe the floor. This is the reality of life. And it is not going to get better. If you look at the books on the market the only book that used to teach "proper" SQL (with joins and the lot) strictly from the context of application development was the old DB2 bible. It has not been reprinted since the late 90-es. All the rest that is out there is either heavily slanted toward the app side or towards the DB side (usually the latter). Add to that the fact that many universities try to teach "real life software engineering skills" instead of proper data structure and data manipulation classes and the picture is complete: http://www.joelonsoftware.com/articles/ThePerilsof JavaSchools.html. Add to that the fact that DBDs when you actually corner them to ask something meaningfull answer with SQL technobabble like in your post. To the average developer it sounds like fortran. And if it looks like fortran, walks like fortran and talks like fortran it gotta be fortran. From the point of view of a average software engineer SQL and especially stored procedures look like a blast from the past. He expects to see objects, constructors, destructors, private and public structures. And what does he see? He sees something that looks like written by his grandparents. As a result he turns around and starts doing delete/insert/last_insert_id instead of replace and sequential deletions in software instead of foreign keys. I have tried in the past to work with developers who write commercial apps on top of SQL to optimise their code. And I have wanted to scream all along. In 95% of the cases you deal with either one of the following:

A nice schema designed once upon a time properly by a proper DBD that is vandalised in the application abstraction layer because the developers are are sorely pissed off by the endless wingeing of the SQL server and/or its abissmal performance. So they take the matters in their own hands and violate ACID by cashing and bypassing restrictions in the app. Sooner or later someone comes around and says - WTF, why don't we rewrite this all in software and sod off the expensive database. And surprise surprise it ends up being done in MySQL.
An abissmal schema or no schema at all where all restrictions are done in the app. That is MySQL country all the way.

MySQL is a result of the way current software development is taught and done. Unless Jo the Average Developer starts understanding how to use SQL in his application (and he does not) and unless SQL data representation grows up to modern non-fortran-like OO semantics MySQL will proliferate. And if you think that MySQL is bad think twice. There are the object persistence frameworks and in-memory crap that follow in its wake.

--
Baker's Law: Misery no longer loves company. Nowadays it insists on it
http://www.sigsegv.cx/

Re:Mod parent way up! by arivanov · 2007-07-09 21:04 · Score: 4, Insightful

That is the problem - it does not in real life. Application works in developer hands, goes out in the field and breaks (seen that one time too many). Millions if not billions of dollars have been put to make sure that RDBMS transactions are atomic and preserve data integrity. No application level interface abstraction has ever afforded the expense and could ever afford the expense to do that. In every single instance I have looked at application developers replacing SQL ACID with "bake-their-own" system I have found cases of data integrity violations. In modern multithreaded (or web server based) apps the most common result from this is race conditions which are probably the hardest to debug problem in software.

The other common problem in using application level abstractions is performance. Once again - works in developer hands, goes in the field, gets real data loaded in it and all hell breaks lose. Similar reasons to ACID as the next biggest investment after data integrity in a database is in its ability to fine-grain lock data objects. If a developer tries to replace RDBMS locking in the application layer, he usually ends up with higher granularity lock that is more contended. In addition to that to avoid race conditions, developers usually deliberately create a bottleneck by muxing all RDBMs access to a single thread and a single access point to simplify locking. In fact probably one of the most beneficial uses of MySQL is its ability to support server-based fine-grained locks that are not tied to a specific data object. You can use these in global semaphors and global locking even in cases where POSIX locks do not work (f.e. across clusters).

Overall, yeah, MySQL and your app "already works". For Proof of Concept - maybe (in fact I use it myself). For real stuff - no, not really, unless you put a lot of work in the application layer. I have done that on quite a few occasions and the performance gains can be staggering compared to ACIDising your brain with a proper RDBMS, but the effort is hardly worth it in most real life scenarios. It also makes it considerably less maintainable.

--
Baker's Law: Misery no longer loves company. Nowadays it insists on it
http://www.sigsegv.cx/

Slashdot Mirror

First "Real" Benchmark for PostgreSQL

20 of 275 comments (clear)