PostgreSQL Outperforms MongoDB In New Round of Tests

Posted by Soulskill on Friday September 26, 2014 @02:15AM from the there-can-be-only-lots dept.

New submitter RaDag writes: PostgreSQL outperformed MongoDB, the leading document database and NoSQL-only solution provider, on larger workloads than those used in the initial performance benchmarks. The benchmarks were conducted by EnterpriseDB, which released the framework for public scrutiny on GitHub, and showed PostgreSQL outperforming MongoDB in selecting, loading and inserting complex document data in key workloads involving 50 million records. This gives developers the freedom to combine structured and unstructured data in a single database with ACID compliance and relational capabilities.

It doesn't matter by DoofusOfDeath · 2014-09-26 02:19 · Score: 5, Funny

Because Postgres isn't web-scale. I want web-scale.
1. Re:It doesn't matter by SQL+Error · 2014-09-26 02:31 · Score: 4, Informative
  
  Ah, memories. That had us rolling on the floor at my office at the time.
  For those who missed it, or want to relive it: http://www.youtube.com/watch?v...
2. Re:It doesn't matter by DoofusOfDeath · 2014-09-26 02:40 · Score: 5, Funny
  
  I'm afraid you were still semi-wooshed. I was actually making a reference to this.
3. Re:It doesn't matter by Anonymous Coward · 2014-09-26 02:57 · Score: 5, Informative
  
  > the linux kernel doesn't even have to load it into RAM, it goes from disk to network directly.
  Oh really? So your network card is a bus master and can initiate transfers from other peripherals without using DMA?
  I assure you, the Linux kernel still loads the file into RAM. Your RAM is fast compared to SATA or Ethernet, it's an excellent staging ground for such transfers. But obviously you don't need to load the entire file before you start sending it out, there are tricks that let the kernel deal with it by tracking the DMA status of the ethernet card and using memory mapped files.
4. Re:It doesn't matter by Warbothong · 2014-09-26 03:06 · Score: 5, Funny
  
  > I can only think of one database that isn't "webscale", and that's TinySQL, which I still use for personal web projects regardless.
  I hadn't heard of TinySQL, so I just Googled it. From http://sourceforge.net/project...
  > tinySQL is a SQL engine written in Java.
  Is the name meant to be ironic or something?
5. Re:It doesn't matter by plopez · 2014-09-26 03:16 · Score: 5, Funny
  
  "I can only think of one database that isn't "webscale", and that's TinySQL, which I still use for personal web projects regardless."
  You've just made all those MS Access developers cry.....
  
  --
  putting the 'B' in LGBTQ+
6. Re:It doesn't matter by gwolf · 2014-09-26 04:54 · Score: 4, Informative
  
  Just adding to what the others have stated: RAM speed is in the vicinity of a million times HDD speed. You won't notice a file going to RAM before being sent to the network interface. Doing all of the kludgework for this to happen (if possible!) would be for a negligible gain.
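The disk-to-network discussion in the replies above can be made concrete with a small sketch. It assumes Python 3 and uses `socket.sendfile()`, which relies on `os.sendfile()` where the platform provides it and falls back to ordinary `send()` calls otherwise. Even on the fast path the file contents are still read into kernel memory (the page cache) first; what is avoided is the extra copy through a user-space buffer. The temporary file, the payload and the helper names are made up for illustration.

```python
import os
import socket
import tempfile
from typing import Tuple


def send_file_over_socket(path: str, sock: socket.socket) -> int:
    """Send the whole file at `path` over `sock`; return the bytes sent."""
    # The file must be opened in binary mode for socket.sendfile().
    with open(path, "rb") as f:
        return sock.sendfile(f)


def demo() -> Tuple[int, bytes]:
    """Write a small temp file, push it through a socket pair, read it back."""
    payload = b"hello from the page cache\n" * 4  # illustrative data only
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(payload)
        path = tmp.name

    sender, receiver = socket.socketpair()
    try:
        sent = send_file_over_socket(path, sender)
        sender.close()  # signal end-of-stream to the receiving side

        chunks = []
        while True:
            chunk = receiver.recv(4096)
            if not chunk:
                break
            chunks.append(chunk)
        return sent, b"".join(chunks)
    finally:
        sender.close()
        receiver.close()
        os.unlink(path)


if __name__ == "__main__":
    sent, received = demo()
    print(sent, len(received))
```

A connected pair of local sockets stands in for a real network connection here, so the sketch only shows the API shape, not any throughput difference.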
"Small" amount of data by ranton · 2014-09-26 02:29 · Score: 5, Interesting

I am confused. If they are testing the performance of ACID and BASE database systems, why did they use a data load that can easily fit on a single computer? The data size for both databases was under 150 GB which can easily sit on a single hard drive let alone a single server. Why would a BASE database have any edge over an ACID one for a data set that does not require distribution between multiple servers?
It is still important to see how much faster a more established DBMS is than a relative newcomer for smaller loads, but I still feel this comparison is a bit lacking.

--
-- All that is necessary for the triumph of evil is that good men do nothing. -- Edmund Burke
1. Re:"Small" amount of data by Anonymous Coward · 2014-09-26 03:22 · Score: 5, Informative
  
  Have you used a version of PostgreSQL that is not 10 years old? The vacuum process performs some necessary work asynchronously from your transaction, so that you can have higher concurrency and scalability. The modern autovacuum does not have locking problems.
2. Re:"Small" amount of data by Anonymous Coward · 2014-09-26 03:31 · Score: 4, Informative
  
  Except MongoDB can't do any better!
  "Important notes on compacting:
  "
  "This operation blocks all other database activity when running and should be
  "used only when downtime for your database is acceptable. If you are running
  "a replica set, you can perform compaction on secondaries in order to avoid
  "blocking the primary and use failover to make the primary a secondary before
  "compacting it.
  Source: http://blog.mongolab.com/2014/01/managing-disk-space-in-mongodb/
  The solution in both cases is to mirror your data. I don't know if this particular problem has been solved theoretically, but in any event neither supports a compacting vacuum without some write locking. And both support the same workaround.
  NoSQL databases aren't magic. Magic would be asynchronous multi-master replication while maintaining relational integrity, and without any strings attached. No product can do this. Every product has a story, but they either tweak the problem or spin their solution. But that's the state of the art at the moment. We just aren't there, yet.
  If you read the fine print, rather than worshipping with the cargo cultists, you'd know this. Like transactional memory (e.g. lockless data structures), the database field is just filled with misinformation and unrealistic expectations. Developers are clueless. Mostly because no matter what they use, it's going to be fast enough for them. So they're never forced to face their assumptions.
I dipped my toe in MongoDB by EmperorOfCanada · 2014-09-26 02:31 · Score: 4, Interesting

I tried MongoDB and I even tried to like it. I do love NoSQL but what I came to realize was that MongoDB was trying to tell me how to solve my problems instead of just storing my damned data.

But the real problem with MongoDB was that nearly everything, while appearing simple, required a Google search to figure out how to do it. A mark of a very well designed API is that you soon start guessing the commands and your guesses are really close or right on. But with MongoDB I found that nothing really made sense. Only after carefully crafted "debate team" arguments could any unusual aspect of MongoDB defend itself. Redis is the opposite: it just works. The same goes for even simpler systems like Memcache, which couldn't be simpler. When I read the API for either of those, it just made sense. There is no layer upon layer upon layer of complexity. Data goes in, and data comes out.

In fact redis would be a good example of ease of use mixed with advanced capabilities. The basic commands are things like get, append, save, while more advanced commands are more esoteric such as PEXPIREAT which has to do with timestamp expiries. So you can happily use redis like a simple minded fool and it is wonderful. Or you can dig in deeper and only mildly shake your head at some of the command names. But with MongoDB it is just a pain in the ass from the first moment you truly have even vaguely complicated data.
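As a rough illustration of the commands named above, here is a toy in-memory store that mimics the documented behaviour of GET, APPEND and PEXPIREAT. It is not Redis and not a Redis client: the class `ToyKeyValueStore` and its injectable clock are hypothetical, chosen only so the expiry logic can be followed without a server.

```python
import time
from typing import Callable, Dict, Optional


class ToyKeyValueStore:
    """In-memory stand-in for a few Redis-style commands (illustrative only)."""

    def __init__(
        self,
        clock_ms: Callable[[], int] = lambda: int(time.time() * 1000),
    ) -> None:
        self._data: Dict[str, str] = {}
        self._expires_at_ms: Dict[str, int] = {}
        self._clock_ms = clock_ms  # injectable so tests can control time

    def _purge_if_expired(self, key: str) -> None:
        """Drop the key if its absolute deadline has passed."""
        deadline = self._expires_at_ms.get(key)
        if deadline is not None and self._clock_ms() >= deadline:
            self._data.pop(key, None)
            self._expires_at_ms.pop(key, None)

    def set(self, key: str, value: str) -> None:
        """Store a value and clear any previous expiry on the key."""
        self._data[key] = value
        self._expires_at_ms.pop(key, None)

    def get(self, key: str) -> Optional[str]:
        """Return the value, or None if the key is missing or expired."""
        self._purge_if_expired(key)
        return self._data.get(key)

    def append(self, key: str, value: str) -> int:
        """Append to the value (creating it if absent); return the new length."""
        self._purge_if_expired(key)
        self._data[key] = self._data.get(key, "") + value
        return len(self._data[key])

    def pexpireat(self, key: str, timestamp_ms: int) -> bool:
        """Set an absolute expiry in Unix milliseconds; False if key is absent."""
        self._purge_if_expired(key)
        if key not in self._data:
            return False
        self._expires_at_ms[key] = timestamp_ms
        return True
```

The simple commands need no explanation, while the expiry command takes an absolute millisecond timestamp rather than a duration, which is the sort of detail that sends you to the documentation.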

But back to PostgreSQL. The JSON related features are mildly complex but appear to be solving the most common problems. Also, using PostgreSQL settles the entire debate of relational vs NoSQL. Use PostgreSQL and you can just do both without giving it a second thought. And I for one can certainly say that I have data that demands NoSQL and I have other data that demands relational; all in the same project. But oddly enough the technique that I use is MariaDB for the relational and redis for everything else. This is ideal for me as the relational data is very simple and won't need to scale much whereas the redis stuff needs to run at rocket speeds and will be the first to scale to many machines.

But as for MongoDB, it has been deleted from all machines, development and deployment, and will never be revisited regardless of this week's propaganda.
The tipping point by tyggna · 2014-09-26 02:45 · Score: 4, Informative

I've done research against these database programs, and this is really really old news for anyone who has done testing. If you have a single machine, then Oracle is the best performing database, followed by Postgres. When you need more than 4 dedicated servers hosting a database, then Mongo can handle about 180% of the volume that Oracle can, about 220% of the volume of Postgres, and about 110% of the volume of Cassandra.
As soon as you need more than one machine to host your database (which usually happens around 1000 active users on your website at any given time, depending on your application), consider switching off of an SQL database.
Re:Replication anyone? by Anonymous Coward · 2014-09-26 04:22 · Score: 5, Funny

> Can PostgreSQL do replication? Not really.
That's news to me. I guess the data on our read servers just magically appears and, what's more, magically appears to be the same data we need there.
Not surprising... by rgbatduke · 2014-09-26 04:23 · Score: 5, Interesting

... because of the way MongoDB actually stores records and parses them. It is more or less a simple tree or linked list, and hence doing almost anything involves descending branches to the leaves. This is horrendously inefficient in many contexts, while still being perfectly lovely in others. Just doing a match, though, can involve a non-polynomial time search. Maybe they've improved this from when I was trying to use Mongo to drive modelling, but I doubt it as it would have involved substantially changing the way the data is actually stored and dereferenced. I had to cheat substantially in order to get anything like decent performance, and any of the SQLs outperformed it handily.
Note well that it was strictly a scaling issue. For small trees and DBs, it probably works well enough. For large DBs with millions of records and substantial structure, it is like molasses. Only worse.
rgb

--
Even when the experts all agree, they may well be mistaken. --- Bertrand Russell.
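The cost described in the "Not surprising..." comment, descending branches to the leaves for every match, can be sketched in a few lines. This is a toy model using plain Python dicts, not a description of how MongoDB stores or indexes documents; the function names and sample data are hypothetical. It contrasts a full scan that walks each document's branches with a lookup table built once for a single path.

```python
from typing import Any, Dict, Hashable, List, Sequence, Tuple

Document = Dict[str, Any]


def get_path(doc: Document, path: Tuple[str, ...]) -> Any:
    """Descend nested dicts along `path`; return None if a branch is missing."""
    node: Any = doc
    for key in path:
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def scan_match(docs: Sequence[Document], path: Tuple[str, ...], wanted: Any) -> List[int]:
    """Full scan: descend into every document on every query."""
    return [i for i, doc in enumerate(docs) if get_path(doc, path) == wanted]


def build_index(docs: Sequence[Document], path: Tuple[str, ...]) -> Dict[Hashable, List[int]]:
    """Descend once per document up front and map each leaf value to positions.

    Assumes the leaf values are hashable; documents missing the path are
    grouped under the key None.
    """
    index: Dict[Hashable, List[int]] = {}
    for i, doc in enumerate(docs):
        index.setdefault(get_path(doc, path), []).append(i)
    return index


if __name__ == "__main__":
    docs = [
        {"user": {"address": {"city": "Oslo"}}},
        {"user": {"address": {"city": "Lima"}}},
        {"user": {"name": "no address"}},
        {"user": {"address": {"city": "Oslo"}}},
    ]
    city = ("user", "address", "city")
    print(scan_match(docs, city, "Oslo"))
    print(build_index(docs, city).get("Oslo"))
```

The scan repeats the descent for every document on every query, so its cost grows with both collection size and nesting depth, while the prebuilt table pays that cost once and answers later queries for the same path with a dictionary lookup.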
Re:The stress-testing wasn't needed by Sarten-X · 2014-09-26 04:55 · Score: 4, Insightful

I've worked extensively on both kinds of systems over the past decade. Under a particular workload that is exactly what an RDBMS is designed for, an RDBMS has the best performance? Wow, who would have bet on that one?
Then again, I've had workloads (my go-to example is writing several billion records in a matter of hours for statistical analysis, with live intermediate results) where a NoSQL solution had the best performance.
NoSQL isn't some rebellion against traditional databases. Engineering isn't a contest. Rather, NoSQL, column-stores, distributed warehousing, or any other term you'd like to throw out all just point to an additional option for how to manage your data. Pick the right choice for your project, and use it. Don't worry about "web-scale" or "ACID compliance" talking points unless your project needs them. For the past few decades, we've been forced into the assumption that data must be perfectly normalized, arranged in tables, and queried as relations. For some projects, massaging the data into that form will damage your performance far more than your database engine ever will, so a different engine makes a better choice.
Stop listening to hype, deserved or not, and use the right tool for the job.

--
You do not have a moral or legal right to do absolutely anything you want.
Re:The stress-testing wasn't needed by badboy_tw2002 · 2014-09-26 05:42 · Score: 4, Funny

"Engineering isn't a contest."
But...but...I have a hammer. I KNOW how to use a hammer. A hammer is the best! FUCK YOU SCREW! TAKE THAT SCREW! Job complete!