A Tale of Two Databases, Revisited: DynamoDB and MongoDB
Questioning his belief in relational database dogma, new submitter Travis Brown happened to evaluate Amazon's DynamoDB and
MongoDB. His situation was the opposite of Jeff Cogswell's: he started
off wanting to prefer DynamoDB, but came to the conclusion that the
benefits of Amazon managing the database for him didn't outweigh the
features Mongo offers. From the article:
"DynamoDB technically isn't a database, it's a database service.
Amazon is responsible for the availability, durability, performance,
configuration, optimization and all other manner of minutia that I
didn't want occupying my mind. I've never been a big fan of managing
the day-to-day operations of a database, so I liked the idea of taking
that task off my plate. ... DynamoDB only allows you to query against the primary key, or the primary key and range. There are ways to periodically index your
data using a separate service like CloudSearch, but we are quickly
losing the initial simplicity of it being a database
service. ... However, it turns out MongoDB isn't quite as difficult as
the nerds had me believe, at least not at our scale. MongoDB works as
advertised and auto-shards and provides a very simple way to get up
and running with replica sets."
His weblog entry has a few code snippets illustrating how he came to his
conclusions.
Did he compare MongoDB to the correct product then? I'd love to have seen him also include Amazon SimpleDB.
Nice astroturf. See here for a detailed analysis of why MongoDB is broken by design.
MongoDB is web scale
And... /thread.
The comments on the last story show what a joke this was considered before.
"No one cares. Stop click-baiting the buzzword Slashdot sub-sites. If we wanted to go to them we would do so voluntarily."
"Having actually RTFA, it just enforces how poorly most programmers understand relational databases and shouldn't be let near them. It's so consistently wrong it could be just straight trolling (which given it's posted to post-Taco Slashdot, is likely)."
"I think the author and his team failed the customer in this case by providing them with an inflexible system. Either they forced the client into accepting these horrible limitations so they could play with new (and expensive!) toys, or the client just flat-out doesn't need this database for anything (in which case it's a waste of money.) This kind of data absolutely needs to be kept in a relational database to be useful.
Which, along with his horrible Java vs. C# comparison, makes Jeff Cogswell officially the Slashdot contributor with the worst analytical skills."
I wanted DynamoDB to work, but concluded that Mongo is a safer and more accessible place for my data.
"in some strange way my brain had been conditioned to think of modeled data in a relational way"
The relational model is not much more or less than the mathematically sound way of dealing with sets and relations between their items in ways that enforce and maintain consistency. There is no alternative to that. It's not merely the status quo, as the article states. Even when designing a datamodel for storage in a NoSQL database, the rules of the relational model are best taken into account.
The only sound reason for deviating from the relational model and its rules is that your (reasonably priced) relational database server has shortcomings, typically related to dealing with large datasets in clusters, situations in which relational database solutions typically don't scale well and a compromise is needed.
Note that NoSQL has its place and I have encountered and worked on projects in which there was just no alternative, but I wouldn't trust my precious data to any developer that chooses NoSQL over a proper datamodel for arguments other than those mentioned above, because they're bound to be wrong.
I don't get how anybody educated in computer science fails to understand this.
All hail Edgar F. Codd!
0x or or snor perron?!
Mongo not just pawn in game of DB
Twinstiq, game news
"Sure, it looks simple enough, but what about the 50th and 100th time you have to remodel your hierarchical data into different, but still hierarchical data?"
The structure of data should reflect its semantic relationships, and these are necessarily pretty static (information wouldn't be much use if its meaning constantly changed.) Therefore, I find it hard to believe that the author's data semantics are changing rapidly enough to demand 50-100 remodellings. More plausible, I think, is that a great many of those models were poorly thought-out. If this is the case, then the author seems to be selecting a database to conform to a flawed approach to software development.
He's still using "ironically" incorrectly, despite the fact that I defined it for him in the comments on the last story. He saw and replied to my comment, acknowledging his error.
http://www.youtube.com/watch?v=WY_amJ0YZrM
"Ironically, it was a session at AWS Re:invent that initially scared me away from MongoDB."
I'm not sure which word he wants, but "ironically" isn't it.
we are ignoring you. - make them love you, or make them hate you, but don't ever bore them.
I am sufficiently freaked out by Amazon pricing that I just can't use it. I have two simple fears. One is that I screw up with my code and do a bazillion transactions per second, resulting in either a ridiculous bill or an exhausted budget that forces my service to shut down. The other is that some kind of DDoS would blow through my life savings. I much prefer the control of having my own dedicated servers with extremely fixed costs. It might not be the most efficient scheme, but the service I use can throw extra servers on pretty quickly and I can set them up in a flash. So yes, I am potentially hosed if Opera or Slashdot feature my work, but I sleep like a baby knowing that Amazon won't be billing me a house tomorrow. I even tried out their free service and, while loving it, was deathly afraid of getting billed.
If my sites really grew I would even contemplate going a step further and running my own physical servers. The joy of being able to reach out and jam USB sticks into them would be pretty good.
Comparing DynamoDB with MongoDB is like comparing apples and oranges. The only thing the two share in common really is the fact that neither supports SQL (and for that reason are called NoSQL databases). Their intended purpose is completely different which is why I found it strange that the author of the original Slashdot story would pit them against each other the way he did.
If DynamoDB is to be compared against another datastore, the most similar alternative would probably be Google App Engine's Datastore/Bigtable.
Similarities between DynamoDB and GAE Datastore
Differences between DynamoDB and GAE Datastore
One major difference between GAE Datastore and DynamoDB is that GAE supports single and multi property indexes while Dynamo does not support indexes at all aside from a table's primary key. GAE datastore supports efficient queries that use the indexes (if you try to run a query that does not use an index it will fail) along with some basic predicates like equality, inequality, greater than and less than expressions, etc. In DynamoDB, if you want an index, you have to build it yourself in a supplementary table.
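To make the "build it yourself in a supplementary table" point concrete, here is a minimal sketch in Python. It uses plain in-memory dicts as stand-ins for the two tables, so nothing here is the DynamoDB API; the class name `IndexedTable` and its methods are hypothetical names made up for illustration. The idea is just that every write to the main table also has to update a second table keyed by the attribute you want to look things up by.

```python
from collections import defaultdict


class IndexedTable:
    """In-memory stand-in for a main table plus a hand-maintained index table.

    Hypothetical illustration only: real tables would be remote, and the
    two writes in put() would be separate requests.
    """

    def __init__(self, indexed_attr):
        self.indexed_attr = indexed_attr
        self.items = {}                 # main table: primary key -> item
        self.index = defaultdict(set)   # index table: attribute value -> primary keys

    def put(self, key, item):
        # If the key already exists, drop its stale index entry first.
        old = self.items.get(key)
        if old is not None and self.indexed_attr in old:
            self._unindex(key, old[self.indexed_attr])
        self.items[key] = dict(item)
        if self.indexed_attr in item:
            self.index[item[self.indexed_attr]].add(key)

    def delete(self, key):
        old = self.items.pop(key, None)
        if old is not None and self.indexed_attr in old:
            self._unindex(key, old[self.indexed_attr])

    def _unindex(self, key, value):
        keys = self.index.get(value)
        if keys is None:
            return
        keys.discard(key)
        if not keys:
            del self.index[value]       # keep the index table free of empty rows

    def find_by_attr(self, value):
        # Two lookups: index table first, then the main table by primary key.
        return [self.items[k] for k in sorted(self.index.get(value, ()))]


books = IndexedTable("author")
books.put("b1", {"title": "A", "author": "Codd"})
books.put("b2", {"title": "B", "author": "Codd"})
books.put("b3", {"title": "C", "author": "Date"})
codd_titles = [b["title"] for b in books.find_by_attr("Codd")]
```

Against a real service the main-table write and the index-table write would be two separate requests, so the sketch glosses over what happens when one succeeds and the other fails. That bookkeeping is the cost of not having the datastore maintain the index for you.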
GAE Datastore Self-Merge Joins
GAE datastore also supports what they call "self-merge joins" which are super powerful. I don't know if any other schema-less datastore has this.
DynamoDB Purpose
The main reason to use DynamoDB is scalable throughput; in other words, when your needs for write and/or read speeds fluctuate drastically and you know you will occasionally spike to extremely high throughput requirements. For times when you expect to have huge throughput for writing, you can pay to scale for that small period of time and then reduce your costs by throttling down to a more sane limit. You can run MapReduce jobs over DynamoDB tables using Amazon Elastic MapReduce. You can also copy a DynamoDB table into an Amazon Redshift "warehouse"; once the data is copied into Redshift you can run efficient SQL queries over it, and Redshift can do that over petabytes' worth of data.
MongoDB
MongoDB, on the other hand, is a "schema-less," document-oriented database that is good for organizing clumps of information as a single "item" in the datastore. So for example, you can have a single book document which contains nested information about its authors, keywords, reader reviews, and statistics about word usage in the book... all in a single MongoDB "record." This is essentially impossible in DynamoDB (unless you do what the previous article's author did by
Argue all you want. My company *ships products* built on MongoDB. We get them done quickly, the DB performs super well, and these apps make us good money with the least effort.
But go on and have your little nerd fights. I'll just keep doing useful work and raking in the cash.
Cheers! :)
Someone who actually knows what they're talking about joins the conversation.
This billing anxiety has caused me to also completely rule out Amazon. All you need is an errant backup or DDoS, as you stated, and your I/O and bandwidth charges reach epic proportions.
Also, when you factor the cost against a powerful server, with average usage levels, over the course of three years, the price is more or less break even. Use that server for four or five years and physical servers become the cheaper option.