PostgreSQL Outperforms MongoDB In New Round of Tests
New submitter RaDag writes: PostgreSQL outperformed MongoDB, the leading document database and NoSQL-only solution provider, on larger workloads than initial performance benchmarks. Performance benchmarks conducted by EnterpriseDB, which released the framework for public scrutiny on GitHub, showed PostgreSQL outperformed MongoDB in selecting, loading and inserting complex document data in key workloads involving 50 million records. This gives developers the freedom to combine structured and unstructured data in a single database with ACID compliance and relational capabilities.
Because Postgres isn't web-scale. I want web-scale.
Is it web-scale?
I am confused. If they are testing the performance of ACID and BASE database systems, why did they use a data load that can easily fit on a single computer? The data size for both databases was under 150 GB which can easily sit on a single hard drive let alone a single server. Why would a BASE database have any edge over an ACID one for a data set that does not require distribution between multiple servers?
It is still important to see how much faster a more established DBMS is than a relative newcomer for smaller loads, but I still feel this comparison is a bit lacking.
-- All that is necessary for the triumph of evil is that good men do nothing. -- Edmund Burke
Anybody who's worked on both already knows that NoSQL-based solutions simply don't live up to the hype.
DELETE MY ACCOUNT
I tried MongoDB and I even tried to like it. I do love NoSQL but what I came to realize was that MongoDB was trying to tell me how to solve my problems instead of just storing my damned data.
But the real problem with MongoDB was that nearly everything, while appearing simple, required a Google search to figure out how to do it. A mark of a very well designed API is that you soon start guessing the commands and your guesses are really close or right on. But with MongoDB I found that nothing really made sense. Only after carefully crafted "debate team" arguments could any unusual aspect of MongoDB defend itself. Redis is the opposite: it just works. The same goes for even simpler systems like Memcache. When I read the API for either of those, it just made sense. There is no layer upon layer upon layer of complexity. Data goes in, and data comes out.
In fact redis would be a good example of ease of use mixed with advanced capabilities. The basic commands are things like get, append, save, while more advanced commands are more esoteric such as PEXPIREAT which has to do with timestamp expiries. So you can happily use redis like a simple minded fool and it is wonderful. Or you can dig in deeper and only mildly shake your head at some of the command names. But with MongoDB it is just a pain in the ass from the first moment you truly have even vaguely complicated data.
But back to PostgreSQL. The JSON related features are mildly complex but appear to be solving the most common problems. Also, using PostgreSQL settles the entire debate of relational vs NoSQL. Use PostgreSQL and you can just do both without giving it a second thought. And I for one can certainly say that I have data that demands NoSQL and I have other data that demands relational; all in the same project. But oddly enough the technique that I use is MariaDB for the relational and redis for everything else. This is ideal for me as the relational data is very simple and won't need to scale much whereas the redis stuff needs to run at rocket speeds and will be the first to scale to many machines.
But as for MongoDB, it has been deleted from all machines, development and deployment, and will never be revisited regardless of this week's propaganda.
MEAN (MongoDB, Express, Angular.js, Node.js)
I've done research against these database programs, and this is really really old news for anyone who has done testing. If you have a single machine, then Oracle is the best performing database, followed by Postgres. When you need more than 4 dedicated servers hosting a database, then Mongo can handle about 180% of the volume that Oracle can, about 220% of the volume of Postgres, and about 110% of the volume of Cassandra.
As soon as you need more than one machine to host your database (which usually happens around 1000 active users on your website at any given time, depending on your application), consider switching off of an SQL database.
"MongoDB, the leading document database and NoSQL-only solution provider,"
According to who?
What happened to all the rest of them, like CouchBase or Riak?
I will admit bias, though. I like my db's eventually consistent.
Whoever did serious performance tests against PostgreSQL already knew!
Sent as ripples into the electromagnetic field. No single photon has been harmed in the process.
Anything but read a computer science textbook
And then let it be reviewed.
putting the 'B' in LGBTQ+
... a pronounceable name for the PostgreSQL software, one that does not require a FAQ entry to instruct in the correct pronunciation.
same applies for either disconnected operations (Couch) or multi-data center deployment for HA.
Even when using sendfile() in linux, the disk does a DMA transfer into RAM, and the NIC does a DMA transfer out of RAM.
While the CPU is not involved in copying the data, it still goes into RAM.
It is not FUD. They add a compatibility layer for a certain proprietary database vendor who charges an arm and a leg for functionality that PostgreSQL gives you for free.
Candygram for Mongo!
ZooKeeper? Isn't that the old version of Candy Crush Saga that had Tetris Attack-style skill chains?
Try orientdb.
Seriously.
Use access and be done with it!
Isn't CouchDB a NoSQL database solution?
Can PostgreSQL do replication? Not really.
That's news to me. I guess the data on our read servers just magically appears and, what's more, magically appears to be the same data we need there.
... because of the way MongoDB actually stores records and parses them. It is more or less a simple tree or linked list, and hence doing almost anything involves descending branches to the leaves. This is horrendously inefficient in many contexts, while still being perfectly lovely in others. Just doing a match, though, can involve a non-polynomial time search. Maybe they've improved this from when I was trying to use Mongo to drive modelling, but I doubt it as it would have involved substantially changing the way the data is actually stored and dereferenced. I had to cheat substantially in order to get anything like decent performance, and any of the SQLs outperformed it handily.
Note well that it was strictly a scaling issue. For small trees and DBs, it probably works well enough. For large DBs with millions of records and substantial structure, it is like molasses. Only worse.
rgb
Even when the experts all agree, they may well be mistaken. --- Bertrand Russell.
My understanding is that it's easy to spread READ ONLY postgres load across multiple servers. WRITING is a bottleneck with postgresql though because it enforces consistency, while other DBs like couch kick the consistency can down the road to the application. But I haven't seriously looked into it in years.
-73, de n1ywb
www.n1ywb.com
I see what you did there...
You and the parent are both fucking idiots. Pull your head out of your ass.
The MEAN stack is an excellent tool for specific problem domains. The patterns used in the MEAN stack are virtually indistinguishable from the patterns used in, say, a modern Microsoft web stack. ORM, MV*, templated HTML and a backing data store.
Where Postgres and MySQL were used for web applications and they performed very well.
Launch more instances?
Look at the "MongoDB 2.4/PostgreSQL 9.4 Relative Performance Comparison" and see that MongoDB's bars are much higher than PostgreSQL's, with labels like "276%" and "465%". That looks like MongoDB is much better, right? Oh, oops! Apparently that's how much slower MongoDB is.
Dewey, what part of this looks like authorities should be involved?
I really, really hope nobody is using SQLite for a production web database, but sadly I know somebody probably is.
Can MongoDB do master-master replication? Oh, it can't, and really only CouchDB does in the NoSQL space? Oh, that's too bad. Of course, most of us don't NEED M-M replication, as it introduces serious issues with reliability (oh I wrote the client record to server A and then queried server B on the next page load and it didn't exist yet -> Null Exception #AWESOME!) and is only useful for backups/reporting/import/export scenarios. The rest of us who actually want to GET WORK DONE will probably continue with relational DBs and post JSON documents as needed into our databases (e.g. json doc for lists/complex objects where we don't want/care to index any fields within).
And ACID doesn't fall apart at all in sharding - what are you smoking?? You implement a standard sharding scheme and the same record always goes to the same server. NoSQL doesn't do a thing for sharding... Replication is a problem, but it is for NoSQL too.
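To make the "wrote to server A, read from server B" failure above concrete, here is a toy sketch in Python. It is simplified to one writer and one asynchronous replica rather than true master-master, and the `Primary` and `Replica` classes are made up for illustration; they are not any database's API.

```python
class Replica:
    """A read server that only sees what the primary has shipped to it."""

    def __init__(self):
        self.rows = {}


class Primary:
    """A write server that replicates asynchronously (hypothetical model)."""

    def __init__(self, replica):
        self.rows = {}
        self.replica = replica
        self.backlog = []  # changes accepted locally but not yet shipped

    def write(self, key, value):
        # The write is acknowledged as soon as it lands locally.
        self.rows[key] = value
        self.backlog.append((key, value))

    def ship_backlog(self):
        # Stands in for the replication stream catching up.
        for key, value in self.backlog:
            self.replica.rows[key] = value
        self.backlog.clear()


replica = Replica()
primary = Primary(replica)

primary.write("client:1", "Alice")
stale_read = replica.rows.get("client:1")   # replica has not caught up yet

primary.ship_backlog()
fresh_read = replica.rows.get("client:1")   # now the row is visible
```

Between the write and `ship_backlog()`, the replica simply does not have the row, which is the null the application trips over on the next page load.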
. Define sqrt(x) as something really evil like (x / rand()), and bury it deep. Watch your coworkers go nuts.
Which is ( kind of ) a common use-case for noSQL.
I'm sure a lot of people who hate noSQL actually use and like memcached a lot, which is quite ironic IMO. Memcached is a volatile noSQL database after all ( lets not get too caught up in semantics here ).
You could achieve pretty much the same effect by caching in the database, but memcached does a lot of useful things and performs really well.
My point is that noSQL can be really useful together with regular sql, and a lot of people use them that way. It's not necessarily an either or thing.
The hipsters got a case of HPV - Haskell, Postgres, and Varnish. The only cure is more buttsechs.
Copyright (c) 1990 - 2014 Dice. All rights reserved. Use of this comment is subject to certain Terms and Conditions.
Which proprietary database vendor would that be?
"You implement a standard sharding scheme and the same record always goes to the same server."
Yah, that's totally ACID. I would love to see how to put two updates on different servers on the same transaction.... Must be that I'm smoking.
Of course it outperforms PG & Oracle because of little things like ACID transactions.
Throwing out things like ACID, triggers and referential integrity gives you a huge performance boost.
Ok, then I am confused by why they call themselves "The Postgres Database Company" Do you mean a compatibility layer for Oracle ? Oracle has their own NoSQL database which is not Mongo.
-- I doubt, therefore I might be.
"Can MongoDB do master-master replication?"
uhmmm.. yes? All mongodb nodes are "masters".
"most of us don't need M-M".
Well, I have some uses for it. In fact I've used it a lot with MySQL. And about the asynchronicity.... sometimes you can go perfectly well without it. Sometimes not. But you don't have to try to use a hammer for everything.
Using ACID compliance when it's not needed is a waste, just as it's mad not to use it for anything that relies on transactions.
Our job is precisely to choose wisely, that's it.
Any sufficiently advanced technology is indistinguishable from magic.
- Arthur C. Clarke.
I guess MongoDB's replication just isn't advanced enough.
"If a nation expects to be ignorant and free in a state of civilization, it expects what never was and never will be."
Just in my experience, the introduction of HStore and JSON data types in Postgres has pretty much nullified the advantages I'd get from using a NoSQL DB. Sharding, high availability, etc are all there with a little work (and help from the many 3rd party projects in the Postgres "ecosystem"). Every now and then I find myself tempted to run a project using a NoSQL DB, but the trade offs (lots of memory, lack of ACID compliance, nascent querying languages, etc) bring me back to Postgres.
Of course there are situations where Mongo or other NoSQL DBs make sense. Using something like InfluxDB for time series data looks pretty neat, and having highly optimized lookup data in a NoSQL DB is great. In the end, you use the database system that makes sense in your work - and avoid the cargo-culting of any technology just because it's the new hotness.
I've been really happy with Postgres's performance over the years. Raw speed is not an issue - you can always add more nodes using something like Postgres-XL if you have to. It's the gradual introduction of functionality that makes my life easier that I appreciate.
It's a fine stack except for the MongoDB part. MongoDB is crap. Use Cassandra or something if you want NoSQL. Just please don't use MongoDB, its developers are clueless.
Yes, PostgreSQL does replication. It's your problem if you don't wanna read the docs.
MongoDB is great for replication as long as you don't care about your data. I don't know about you but I care about my data.
If you want really impressive replication, use Cassandra or something.
Looks like somebody already found a performance issue in their code that almost certainly makes this a poor comparison: https://github.com/EnterpriseDB/pg_nosql_benchmark/pull/6
P.S. writing solid benchmarks to compare different technologies is *really* hard.
Like other great recursively named open source projects, they should call it 'TinySQL Isn't Tiny' or something. It's both edgy and non-ironic. Maybe they worry people will stop taking their project seriously though.
Human Rights, Article 12: Freedom from Interference with Privacy, Family, Home and Correspondence
I was just wondering why nobody has mentioned TokuMX in this thread. It's a MongoDB with a new high performance backend - with full ACID.
http://www.tokutek.com/products/tokumx-for-mongodb
Can PostgreSQL do replication?
Yes.
Not really.
Yes, really.
Don't even talk about master-master replication....
That's in development as we speak.
I know you're already modded as troll, and rightly so, but it should be pointed out that you're also incorrect, and that's now done.
It might be that you're just ignorant, but it's hard to tell the difference.
Yeah, mod parent up. There's a serious misunderstanding of the causes behind the difficulty to scale relational databases.
The problem is not the ability to relate records, it's the will to do so. NoSQL takes away the ability and suddenly everybody thinks they can shard. But they can also shard with relational databases if they give up relating records in different shards. There's no need to drop ACID or switch technologies, just apply them differently.
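For what it's worth, the "same record always goes to the same server" part is tiny. A minimal sketch in Python, assuming hash-based routing on a record key; the shard names and the `shard_for` / `group_by_shard` helpers are hypothetical, not from any particular sharding tool.

```python
import hashlib

# Hypothetical shard names; in practice these would be connection strings.
SHARDS = ["pg-shard-0", "pg-shard-1", "pg-shard-2", "pg-shard-3"]


def shard_for(key, shards=SHARDS):
    """Map a record key to exactly one shard, deterministically."""
    # Use a stable hash: Python's built-in hash() is randomized per process
    # for strings, so it would route the same key differently after a restart.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]


def group_by_shard(keys, shards=SHARDS):
    """Group keys by owning shard, e.g. to issue one query per server."""
    groups = {}
    for key in keys:
        groups.setdefault(shard_for(key, shards), []).append(key)
    return groups


groups = group_by_shard(["customer:1", "customer:2", "customer:3"])
```

Anything that lives entirely on one shard keeps ordinary single-server ACID. The catch with plain modulo routing is that changing the number of shards remaps most keys, which is why real setups tend to use consistent hashing or a lookup table instead.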
Cassandra is even worse, mind you.
A database that uses up a whole 10 MB of disk space in postgres used up 600MB in Cassandra last time I tried. There's no amount of cleverness that can make that not a performance hindrance, not to mention a provisioning nightmare.
"You implement a standard sharding scheme and the same record always goes to the same server."
Yah, that's totally ACID. I would love to see how to put two updates on different servers on the same transaction.... Must be that I'm smoking.
Two-phase commit
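For anyone who hasn't seen it, a rough in-memory sketch of the idea in Python. The `Participant` class and `two_phase_commit` function are invented for illustration and skip everything hard about the real protocol (durable logging, timeouts, coordinator crashes).

```python
class Participant:
    """Stands in for one database server holding part of the data."""

    def __init__(self, name, fail_on_prepare=False):
        self.name = name
        self.fail_on_prepare = fail_on_prepare  # simulate a "no" vote
        self.data = {}       # committed, visible state
        self.pending = None  # staged change, not yet visible

    def prepare(self, updates):
        # Phase 1: stage the change and vote. A real server would also
        # persist the staged change so the vote survives a crash.
        if self.fail_on_prepare:
            return False
        self.pending = dict(updates)
        return True

    def commit(self):
        # Phase 2, success path: make the staged change visible.
        if self.pending is not None:
            self.data.update(self.pending)
            self.pending = None

    def rollback(self):
        # Phase 2, failure path: discard the staged change.
        self.pending = None


def two_phase_commit(work):
    """work is a list of (participant, updates). Returns True if committed."""
    prepared = []
    for participant, updates in work:
        if participant.prepare(updates):
            prepared.append(participant)
        else:
            # One "no" vote aborts the transaction everywhere.
            for p in prepared:
                p.rollback()
            return False
    # Every participant voted yes, so the coordinator tells all to commit.
    for participant in prepared:
        participant.commit()
    return True


server_a = Participant("a")
server_b = Participant("b")
ok = two_phase_commit([(server_a, {"balance:1": 90}),
                       (server_b, {"balance:2": 110})])
```

Either both updates become visible or neither does. PostgreSQL exposes the participant side of this through `PREPARE TRANSACTION` and `COMMIT PREPARED`; the coordinator is left to an external transaction manager.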