First "Real" Benchmark for PostgreSQL
anticlimate writes "A new benchmark published on SPEC shows PostgreSQL's performance approaching that of Oracle's and surpassing or on par with MySQL (however the test-hardwares of the other DB systems are somewhat different). The test was put together by PostgreSQL's core developers working at Sun. They certainly are not unbiased, but this is the first 'real' benchmark with PostgreSQL — according to Josh Berkus's blog. The main difference compared to earlier benchmarks (and anecdotes) seems to be the tuning of PostgreSQL."
however the test-hardwares of the other DB systems are somewhat different
Which makes the results pretty much useless. But, being the intrepid slashdotter I am, I went ahead and R'ed the FA anyway, in case I could glean some useful information from it.
Which revealed that the linked article doesn't actually contain any information whatsoever about Oracle* or MySQL, much less benchmarks on named hardware.
So...what am I supposed to get out of this, again? Or is this just supposed to be some kind of PostgreSQL love-in, so I should take my wet blanket elsewhere?
*Well, the second link contains someone claiming that Oracle is only 15% faster...but without providing any actual data.
Reality has a conservative bias: it conserves mass, energy, momentum...
You cannot compare benchmarks without SOMETHING standard between them.
Okay, if they can't match the hardware (why not?) then focus on price points. I notice that they're looking at "$65,500 for the hardware". That's a LOT of hardware at today's prices.
I'm sure MySQL would (and will) come back with a "benchmark" on hardware costing $10,000.
There is nothing "real" about this "benchmark".
Because Sun systems will always be different from the x86 based cores that run MySQL and Oracle, I think the best way to compare such software would be by constructing servers of equal price and seeing how PostgreSQL fares. The true question on any business person's mind is "how much to implement?"
Well, back to rejecting software patent applications.
To paraphrase an old saying:
There are lies, damned lies and benchmarks.
Bearded Dragon
Why this emaciated post made it while mine didn't I'll never know...here's how I submitted this story:
The current version of PostgreSQL now has its first real benchmark, a SPECjAppServer2004 submission from Sun Microsystems. The results required substantial tuning of many performance-related PostgreSQL parameters, some of which are set to extremely low values in the default configuration — a known issue that contributes to why many untuned PostgreSQL installations appear sluggish compared to its rivals. The speed result is close but slightly faster than an earlier Sun submission using MySQL 5 (with enough hardware differences to make a direct comparison of those results unfair), and comes close to keeping up with Oracle on similarly priced hardware — but with a large software savings. Having a published result on the level playing field of an industry-standard benchmark like SPECjAppServer2004, with documentation on all the tuning required to reach that performance level, should make PostgreSQL an easier sell to corporate customers who are wary of adopting open-source applications for their critical databases.
No, that link you posted to a web comic we've all seen a hundred times is not "obligatory."
For those of us who don't have dozens of hours to do the necessary research, can some postgresql gurus sum up some of the most significant tuning parameters so us mere mortals can see similar performance gains?
I realize that a large part of the answer is going be "it depends on application, your hardware, and you query types", but surely there must be some general tips that we can follow given various typical setups. MySQL, for example, ships with several different configuration files: One suitable for a small installation, one for a mid-sized installation, one for large installation, etc.
What tuning can someone do to tune postgresql's default (conservative) config file to make it perform better?
DB2? It's only useable on large mainframes (big iron, so to speak) from what I understand. Generally speaking only the REALLY large shops would use it, so I wouldn't be surprised you'd never seen it - neither have I. We're a pretty big organisation where I work, and we have a mix of Oracle, MSSQL, and Sybase servers.
For a site about things like basic rights, Slashdot users sure do like to censor "dissent".
I think that somebody sent the wrong link and (surprise!) the editors didn't even follow it to check.
Here's a more useful one: All SPEC jAppServer2004 Results Published by SPEC
The benchmarks aren't standardized enough for any useful comparison. The hardware and configurations vary in almost every one.
however the test-hardwares of the other DB systems are somewhat different
Which makes the results pretty much useless.
Not necessarily.
It's essentially useless for separating out how much of the performance difference is the result of the software's design, implementation, and tuning versus how much is due to the platform differences.
But such tests CAN be used to examine the performance of competing ENTIRE SYSTEMS, to inform choices between them.
They say: "Oracle on does THIS well, PostgreSQL on can be tuned so it does THAT well on the same benchmark."
This lets administrators (presuming they have access to the hardware info) get a bang-for-the-buck comparison.
For the rest of us, the interesting point is that PostgreSQL, running on its team's idea of realistic hardware, can produce performance in the same ballpark as Oracle running on Oracle's choice of hardware.
(Whether the necessary remaining data (what are hardwares x and y? how was PostgreSQL tunde) is published now, later, or never, is a separate issue. B-) )
Bantam Dominique roosters crow a four-note song. Once you've heard it as "Happy BIRTHday" you can't NOT hear it that way
I dunno, I kinda like MSSQL. Hell, I use it alongside MySQL servers for my own projects (that, and having support for multiple platforms in your product is kinda a good idea). Sure, it's got horrific licensing (nowhere near as bad as Oracle's, though) but other than that, it's pretty good and reliable. I get the impression that the core of it wasn't written by MS way back when, though. And it sure wasn't built by the Windows team.
For a site about things like basic rights, Slashdot users sure do like to censor "dissent".
Get people from each group, give them the requirements and 5 different dollar amounts.
:)
Let each team setup their systems how ever they want at each price point. Some will go with clustered servers. Some will go a single monster server. They know their products best so they'll be the ones best suited to choosing the configuration.
Then run the benchmarks. And keep hammering on them until AFTER the next patch release.
Yeah, it might run fast, but still be a bitch to patch/upgrade.
At $5,000 you might find that a cluster of MySQL boxes beats everything.
At $10,000 maybe something else is best.
$25,000
$50,000
$100,000
etc.
And finally, break it. Break it bad. What happens when something goes wrong? Oracle might cost a lot, but if they can come through with your data they might just be worth it.
If nothing else, you'll get the "best practices" nicely demonstrated by each group.
Um, no. DB2 these days runs on most major UNIX variants (HP-UX, Solaris, AIX, IRIX, etc.), Linux and Windows. It's used quite often, in fact. Most Enovia/VPM installations use DB2 backends, for instance. Modern versions use XML along with regular relational database stores and are very, very up-to-date technology-wise. Very scalable.
My blog
MySQL 5 on AMD Opteron 285
The UltraSPARC has 8 cores on 1 chip and 16GB of memory.
The Opteron has 4 cores and 8GB of memory.
The UltraSPARC should smoke it every time.
nope. It's h4rdw4r3z!
My blog
Won't you guys agree, "elephant" doesn't exactly communicate "fast and modern" very well.
"Dolphin" comes a bit closer.
Who's coming up with those logos?
Oh, I agree. A benchmark of whole systems can be just as (or more) useful as a benchmark of individual pieces of software, depending on what your goals are.
But what's been presented here isn't even that. Links #1 takes us to a SPEC benchmark of PostgreSQL. It doesn't provide any information about anything else; there isn't anything to compare the benchmark to. Link #2 provides an unreferenced statement about Oracle's marginally superior performance on much more expensive equipment.
So, perhaps, one can begin to draw conclusions about PostgreSQL vs Oracle in the contexts of full systems. But neither link #1 nor link #2 provide any information about MySQL (except the quote: "[t]his publication shows that a properly tuned PostgreSQL is not only as fast or faster than MySQL").
Really, my criticism isn't of the benchmark (the data are the data, after all) or of the blog (one expects a vested PostgreSQL interest to comment on such a benchmark), but of the blurb here that either a) draws totally unwarranted conclusions, or b) depends on information it doesn't bother sharing.
Reality has a conservative bias: it conserves mass, energy, momentum...
Databases are even being used in many situations that would be better addressed using flat files. That kind of mainframe/JCL talk is not welcome here...
Website Hosting
If you want to setup a dedicated database server, you want to know what software with what hardware will run the fastest. So while the benchmarks may not be useful to people wanting to setup a small multi-purpose server, it can still be useful for some people.
No! gad damn it!
"h4rdw4r3" is however an acceptable alternate spelling.
They don't grade fathers, but if your daughter's a stripper, you fucked up. --Chris Rock
I've been using both MySQL and PostgreSQL for 7 years now... I've never felt a need for "developer support". Or if that phrase means "documentation + message boards" I've never felt a lack from postgresql. I also really don't get the PostgreSQL is hard to use argument. In every linux distro I've used in the last 7 years (redhat, centos, suse, fedora core, ubuntu) they are exactly the same... IE apt-get, yum, whatever, install postgresql-server/mysql-server. Then $startupscriptdir/postgresql start....
Do people really get that hung up on using postgres as the initial user instead of root?!? That is the ONLY difference that I can see.
Now I can agree that maybe postgresql doesn't really "target" an audience, but that is also because they are a true open source project. They don't have a commercial version. I really don't think Linus sits around saying "Ok, we need to add this feature so that more fortune 500's will adopt linux". Or "we need to add feature xyz so this will appeal to small businesses". MySQL has a "target" audience because they are selling something. If you aren't selling anything, by definition you don't have a target.
Actually I think PostgreSQL might have eaten a lot more of the MySQL market if they'd simply been faster to market with better admin tools and Windows support.
Lots of folks went with MySQL early because of those factors. They also then tended to write all their PHP, etc, applications to only talk to MySQL, thus making folks who might have preferred PostgreSQL use MySQL to run the app that they needed to run. Once that happens you are kind of in a Catch-22 place. Folks won't write the apps for PostgreSQL until it's used by a larger chunk of the market, but it won't take that large chunk because all the 'cool' apps were written MySQL only, so they have to run MySQL instead of PostgreSQL
MySQL is modular - pick'n'mix data storage engines sharing a SQL front end. I can't find the bit in TFA that says which one they compared.
I've always suspected that most MySQL vs Postgres flame wars are based on comparing Postgres with the speed of MySQL/MyISAM (No transactions or relational integrity checks - so, big surprise, dead fast for simple queries) and then waving MySQL/InnoDB around when the functionality issue is raised.
MySQL/MyISAM hits the speed/functionality sweet spot for LAMP data-driven websites, is supported by lots of free webapp software and offered by most decent web hosting services. Comparing it speedwise with Postgres has always been pointless, though. If Postgres has caught up, colour me impressed, but if they're pro-Postgres I bet they're comparing with MySQL/InnoDB (which is a bit closer to like-with-like).
Never quite seen the point of MySQL/InnoDB really - all the advantages of MySQL/MyISAM minus the speed, support by popular webapps, availbility on low-cost hosts... and still lacks the features of Postgres.
In a survey of 100 programmers, 111111 thought that duck-typing was a good idea.
> MySQL, as others have pointed out, has better developer support and they know their target audience. They supply a fast,
> easy to use database for those who don't need a whole lot.
> Oracle supplies an enterprise level database that MySQL doesn't aspire to.
> PostgreSQL doesn't know where to fit in.
This is an oversimplification. Each vendor sees itself in all markets:
- oracle/db2/sql server have free versions for tiny apps and very expensive versions for massive apps
- mysql says it doesn't want to do what oracle does, but also says that this is less than 1% of the market - and knows that plenty of smallish databases are on oracle
- postgresql like the others sees itself doing anything from very small databases to very large ones (often via Enterprise DB or other vendor extensions)
And using a single product for multiple sizes isn't illogical: if you have any very large databases (hundreds of gbytes or more) then you probably have a few dozen little ones as well. It's *far* easier to manage them all on oracle/db2/sql server - even with the small additional licensing costs - than to have a frankenstein collection of products to manage.
"Best tool for the job" is a good consideration when evaluating products (along with vendor viability, cost, etc, etc) - but once you've got a single tool in house to keep adding new products - each with their own licensing, support, patch, backup/recovery procedures, etc is a nightmare. Let alone actually federating your data - and having to test out how to virtualize or replicate data from oracle 10.x.x with mysql 5.y.y
> Performance is one aspect of the price tag, but it is certainly not the only factor.
Very true - and for that reason Postgresql has more going for it than many alternatives, like:
- best licensing options - you don't need to pay a lawyer to go over your contract or license like you should if you use oracle or mysql commercially. And there's no fear that the vendor will change its license terms once you're locked in and start charging an arm and a leg.
- very good foundation - postgesql isn't built from duct tape and bailing wire. The functionality within it is well tested and robust.
- great support for standard database features - whether its subselects, stored procedures, triggers, etc - it's very simple to move from oracle to postgresql.
- great ansi sql support - again, very standard sql - no unnecessarily propretary language elements.
So, yeah - just because Postgresql is performing well on some benchmarks that doesn't mean you should immediately throw out oracle in favor of postgresql. On the other hand, you also shouldn't discard it because it is a good general purpose database solution.
Now there's a good point. Is there any good documentation out there on databasing on what is the best solution for what kind of problem? Particularly, I would like to see something between flat-files and DBMS and then between different DBMS.
Though, technically, I'm stuck between whatever our 'host' is giving us (currently PostgreSQL, but usually MySQL, and I dabbled with MSSQL when an intern for the Government), I'd like to know when it might be better to actually just store some stuff in a flat file or even use an XML DB (which one collage professor loved to talked about as a holy grail).
Outside of "Basic Database Theory and Generic SQL" books, a programming class on Databases (how to build tree structures and navigate them, etc), I've never really got any 411 on the benefits of using MSSQL, Oracal, MySQL, PostgreSQL, etc and I've never gotten the change to learn more about advanced database features such as stored procedures or functions or whatever they might be called and what they're good for.
Recommended books, anyone?
Cheers,
Fozzy
p.s. I work mostly with internet applications.
"The past was erased, the erasure was forgotten, the lie became truth." ~1984 George Orwell
PostgreSQL's command-line shell is far better than mysql. In addition to what mysql does, psql supports tab-completion and other nice things. It's been a while since I looked at mysql, but when I did the difference between the two respective command-line interfaces was like the difference between bash and sh.
2) Extensive developer community. We use python and the MySQL/python integration is great. We have a few UDFs that are home-grown but some of them were just downloaded off the net and installed. I'm sure you can find far more for MySQL than for Postgres.
Uh-huh. You are "sure". Several times I asked for support on postgresql mailing lists and the response has always been excellent. Usually I got answers within hours.
I think one of the reasons that mysql became ubiquitous is that it had proper windows support early on. So, just like windows, everyone uses mysql because everyone else does, and they are willing to jump through hoops to work around all the deficiencies the platform has, simply because they don't know any better.
___
If you think big enough, you'll never have to do it.
I worked for a company whose product ran on MS-SQL, PostgreSQL, and Oracle. Should I explain why we didn't support MySQL or not? It'll draw fanboys either way. I used the same server, reinstalled the OS (Red Hat Enterprise 3 or Windows 2000) and database between each test, and rebuilt the application server to be extra sure.
Since it was more difficult to write Oracle-compliant SQL and we didn't have a lot of Oracle customers, the developers didn't care to spend time on it, and our stuff ran about 20 percent slower there. That's after a lot of tuning time, it was 50% slower on a default install. Oracle 9 took two days to install and tune, plus another two days of preparation. I was particularly underwhelmed that I had to deal with stupid errors like tarballs that extracted onto themselves and assumptions about the shell being used. At the time, Oracle was a very Solaris-like experience; user-unfriendly to the extreme.
Postgresql 7 ran great; it was neck and neck with MS-SQL in all tests, after proper tuning, and 30 percent slower on a default install. Postgres took half a day to install and tune, but it took me a week and conversation with the postgres mailing lists to find out what needed tuning. Still, we were able to put together a document that took users from bare metal to RHEL+Postgres in four hours if they had all their media handy.
Microsoft SQL Server 2000 ran great with no tuning at all, and took fifteen minutes to install. It also cost as much as paying me to do the entire set of tests. OS installation/patching times and tested workloads were the same for all three tests.
YMMV.
"Nothing was broken, and it's been fixed." -- Jon Carroll
Back when mysql was faster than postgres for simple selects, mysql proponents were jumping up and down about performance. And, in fact, a very narrow definition of performance, with only simple reads. No writes. Single CPU. One client. You can't step too far away from those parameters because then mysql performance starts to suck.
Now that there is a respectable benchmark showing postgresql to be faster, performance is suddenly not the issue.
Yep. It's right up there with mysql developers claiming that transactions are useless back when mysql didn't have them.
___
If you think big enough, you'll never have to do it.
What is exactly easier in MySql?
"I think this line is mostly filler"
8.1 automatically does table maintenance (vacuum, stats, etc.)- vacuuming.html#AUTOVACUUM
http://www.postgresql.org/docs/8.2/static/routine
You can tune it as to when to do such operations, or what max percent of normal I/O can be used for those tasks.
THIS THING CAN TURN ON A DIME, MACROSSZERO STYLE ALSO FUCK BETA, ~NYORON
My other account has a 3-digit UID.
> Can't say anything about PgSQL, because I couldn't figure out how to set that up.
.MSI installer?
apt-get install postgresql-server too hard for you? Or are you having trouble double-clicking on the
Done with slashdot, done with nerds, getting a life.
Josh Berkus, the person whose blog was referenced here, is one of the PostgreSQL core developers, and he's currently employed by Sun to work on projects like this--which includes contributions back to the PostgreSQL development efforts. He's been doing a lot of work scaling performance upwards to larger capacity servers than the database has traditionally been deployed on.
PostgreSQL setup: 8 cores, 16GB RAM.
Oracle setup: 64 cores, 512GB RAM.
- Sw Usr
The guidelines are pretty simple, but highly subjective.
Here are your options:
A. flat file
B. key-indexed (berkley db, dbm, gdbm)
C. feature/config light db (sqlite)
D. feature/config heavy db (postgrsql, oracle)
* Use A for configuration
* Use A if you have a small amount of data and can slurp it all into memory.
* Use B if you only access data via a known unique value and never compare records.
* Never use either A or B if the data is shared between processes unless file locking is okay.
For anything else, use C or D. If you've got a lot of familiarity with D, sometimes it's easier to use D than C, even if D seems like overkill. My recommendation is to get familiar with C and then worry about D once you understand the limitations of C. It'll help you appreciate what D gives you when you're ready for it.
*sigh* back to work...
The great (sarcasm) thing to me is they have added all these features (views, stored procs, triggers) in the last year or so, unfortunately if you want to use any of those features there are huge caveats when using their main selling point of "easy" clustering. Triggers, Views and Stored Procs all break replication under pretty normal use cases. So, if you want to use them you can't use the built in clustering.
I don't know if the same holds true for PostgreSQL, but I would be extremely surprised since Postgre has had all of those features for years and anyone designing a replication scheme would take them into account. MySQL in contrast had replication first, and now the designers of the additional features have to take the replication scheme into account (which is much harder I'd imagine, as views, triggers and stored procs all have some semblance of a standard, and at the very least have a well defined feature set, and sometimes you can't bend a feature set to fit into an already existing replication scheme)
Also, even if you own the server, converting an existing MyISAM DB to InnoDB can be a huge chore. I've done this for at least 3 customers, and every time it is a much more painful process than you'd think. Mostly because existing data in MyISAM almost always has referential problems after any real world usage (like its been live for more than a month). Someone has deleted a customer, and now you have orders that are orphaned, someone ran a bad update that set the FK wrong in the orders table, and MySQL happily obliged, and now you've got orphaned data... I mean to me the most BASIC functionality of an Rdbms is that it enforces the relationships you give it. I don't even count MySQL w/MyISAM as a database. Might as well just use a bunch of flat comma separated files for tables.
SQLite has before insert triggers. Why doesn't MSSQL?