PostgreSQL Outperforms MongoDB In New Round of Tests
New submitter RaDag writes: PostgreSQL outperformed MongoDB, the leading document database and NoSQL-only solution provider, on larger workloads than initial performance benchmarks. Performance benchmarks conducted by EnterpriseDB, which released the framework for public scrutiny on GitHub, showed PostgreSQL outperformed MongoDB in selecting, loading and inserting complex document data in key workloads involving 50 million records. This gives developers the freedom to combine structured and unstructured data in a single database with ACID compliance and relational capabilities.
I was about to get wooshed, and post something about how "web-scale" isn't any kind of meaningful standard, but then I thought better of it, looked it up, and found it's MongoDB's tagline.
I can only think of one database that isn't "webscale", and that's TinySQL, which I still use for personal web projects regardless.
Ah, memories. That had us rolling on the floor at my office at the time.
For those who missed it, or want to relive it: http://www.youtube.com/watch?v...
I've done research against these database programs, and this is really really old news for anyone who has done testing. If you have a single machine, then Oracle is the best performing database, followed by Postgres. When you need more than 4 dedicated servers hosting a database, then mongo can handle about 180% of the volume that oracle can, and about 220% the volume of postgres, and about 110% the volume of Casandra.
As soon as you need more than one machine to host your database (which usually happens around 1000 active users on your website at any given time, depending on your application), consider switching off of an SQL database.
the linux kernel doesn't even have to load it into RAM, it goes from disk to network directly.
Oh really? so your network card is a bus master and can initiate transfers from other peripherals without using DMA?
I assure you, the Linux kernel still loads the file into RAM. Your RAM is fast compared to SATA or Ethernet, it's an excellent staging ground for such transfers. But obviously you don't need to load the entire file before you start sending it out, there are tricks that let the kernel deal with it by tracking the DMA status of the ethernet card and using memory mapped files.
Have you used a version of PostgreSQL that is not 10 years old? The vacuum process performs some necessary work asynchronously from your transaction, so that you can have higher concurrency and scalablity. The modern autovacuum does not have locking problems.
Except MongoDB can't do any better!
"Important notes on compacting:
"
"This operation blocks all other database activity when running and should be
"used only when downtime for your database is acceptable. If you are running
"a replica set, you can perform compaction on secondaries in order to avoid
"blocking the primary and use failover to make the primary a secondary before
"compacting it.
Source: http://blog.mongolab.com/2014/01/managing-disk-space-in-mongodb/
The solution in both cases is to mirror your data. I don't know if this particular problem has been solved theoretically, but in any event neither support a compacting vacuum without some write locking. And both support the same workaround.
NoSQL databases aren't magic. Magic would be asynchronous multi-master replication while maintaining relational integrity, and without any strings attached. No product can do this. Every product has a story, but they either tweak the problem or spin their solution. But that's the state of the art at the moment. We just aren't there, yet.
If you read the fine print, rather than worshipping with the cargo cultists, you'd know this. Like transactional memory (e.g. lockless data structures), the database field is just filled with misinformation and unrealistic expectations. Developers are clueless. Mostly because no matter what they use, it's going to be fast enough for them. So they're never forced to face their assumptions.
Just adding to what the others have stated: RAM speed is in the vicinty of a million times HDD speed. You won't notice a file going to RAM before being sent to the network interface. Doing all of the kludgework for this to happen (if possible!) would be for a negligible gain.
Locking up tables for over 30 minutes when they haven't even been updated
There is no vacuuming on tables that have no updates.