Slashdot Mirror


Why My Team Went With DynamoDB Over MongoDB

Nerval's Lobster writes "Software developer Jeff Cogswell, who matched up Java and C# and peeked under the hood of Facebook's Graph Search, is back with a new tale: why his team decided to go with Amazon's DynamoDB over MongoDB when it came to building a highly customized content system, even though his team specialized in MongoDB. While DynamoDB did offer certain advantages, it also came with some significant headaches, including issues with embedded data structures and Amazon's sometimes-confusing billing structure. He offers a walkthrough of his team's tips and tricks, with some helpful advice on avoiding pitfalls for anyone interested in considering DynamoDB. 'Although I'm not thrilled about the additional work we had to do (at times it felt like going back two decades in technology by writing indexes ourselves),' he writes, 'we did end up with some nice reusable code to help us with the serialization and indexes and such, which will make future projects easier.'"

16 of 106 comments

  1. No one cares by Anonymous Coward · · Score: 5, Insightful

    No one cares. Stop click-baiting the buzzword Slashdot sub-sites. If we wanted to go to them we would do so voluntarily.

  2. Re:I don't understand by OakDragon · · Score: 4, Funny

    MongoDB ... just a pawn in the game of life.

  3. It's so ... wrong by Anonymous Coward · · Score: 5, Insightful

    Having actually RTFA, it just reinforces how poorly most programmers understand relational databases and why they shouldn't be let near them. It's so consistently wrong it could be just straight trolling (which, given it's posted to post-Taco Slashdot, is likely).

    "However, the articles also contained data less suited to a traditional database. For example, each article could have multiple authors, so there were actually more authors than there were articles."

    This is completely wrong; that's a textbook case of something perfectly suited to a traditional (relational) database.
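
    For anyone who wants it spelled out, here is a minimal sketch of the usual relational answer to "each article could have multiple authors": a junction table. It uses Python's built-in sqlite3 module purely for illustration; the table names, column names, and sample rows are made up and do not come from the article.

```python
import sqlite3

# Hypothetical schema: all names and rows below are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE article (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE author  (id INTEGER PRIMARY KEY, name  TEXT NOT NULL);
    -- Junction table: one row per (article, author) pair.
    CREATE TABLE article_author (
        article_id INTEGER NOT NULL REFERENCES article(id),
        author_id  INTEGER NOT NULL REFERENCES author(id),
        PRIMARY KEY (article_id, author_id)
    );
""")
conn.executemany("INSERT INTO article VALUES (?, ?)",
                 [(1, "Paper A"), (2, "Paper B")])
conn.executemany("INSERT INTO author VALUES (?, ?)",
                 [(1, "Ann"), (2, "Bob"), (3, "Cy")])
# Three authors, two articles: more authors than articles is not a problem.
conn.executemany("INSERT INTO article_author VALUES (?, ?)",
                 [(1, 1), (1, 2), (2, 2), (2, 3)])


def titles_by_author(name):
    """Return the titles of all articles credited to the named author."""
    rows = conn.execute(
        """SELECT a.title
           FROM article a
           JOIN article_author aa ON aa.article_id = a.id
           JOIN author au ON au.id = aa.author_id
           WHERE au.name = ?
           ORDER BY a.title""",
        (name,),
    )
    return [row[0] for row in rows]
```

    Calling titles_by_author("Bob") walks the junction table and returns both papers, with no hand-built index involved.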

    1. Re:It's so ... wrong by MightyMartian · · Score: 5, Funny

      "Those who don't understand SQL are condemned to reinvent it, poorly." (with apologies to Henry Spencer).

      --
      The world's burning. Moped Jesus spotted on I50. Details at 11.
    2. Re:It's so ... wrong by vux984 · · Score: 5, Funny

      "However, the articles also contained data less suited to a traditional database. For example, each article could have multiple authors, so there were actually more authors than there were articles."

      Good god, how would he model invoices with multiple line items? Where, you know, there were actually more line items than invoices?! Mind blown.

      Or customers that might belong to zero or more demographics? There could be more customers than defined demographics to tag them with... or fewer... we don't even know and it could change as more of either are added!!

      We need a whole new database paradigm!

      Or the sample Northwind database that's been shipping with Access since the '90s.

    3. Re:It's so ... wrong by Torvac · · Score: 5, Insightful

      "With big data comes big responsibility." I mean, a few very static 100k items require a NoSQL DB solution and cloud storage? And a full team to do this?

    4. Re:It's so ... wrong by Tom · · Score: 4, Insightful

      Mod parent up.

      After a few years in other fields, I'm doing some serious coding again. Postgres and Doctrine. I can do in a few lines of code and SQL what would take a small program or module to do without the power of SQL and an ORM.

      Anyone who reinvents that wheel because he thinks he can do the 2% he recoded better is a moron.

      --
      Assorted stuff I do sometimes: Lemuria.org
  4. So the gist of the article is..... by f-bomb · · Score: 4, Informative

    MongoDB would have been perfect based on the structure of the data, but the client didn't want to pay for setup and hosting costs. DynamoDB was the cheaper alternative, but more of a pain in the ass to implement. Makes me wonder if the hosting cost savings offset the additional development time.

    --
    Everyone should believe in something. I believe I'll have another beer.....
  5. Question from relational-land by mcmonkey · · Score: 4, Informative

    As someone whose work and thinking are firmly planted in traditional RDBMS, a few of those decisions did not make sense.

    I understand what he's saying about normalized tables for author, keywords, and categories. But then when he has to build and maintain index tables for author, keyword, and categories, doesn't that negate any advantage of not having those tables?

    I understand he's designed things to ease retrieval of articles, but it seems the trade-offs on other functions are too great. It's nice an author's bio is right there in the article object, but when it's time to update the bio, doesn't that mean going through and touching every article by that author?
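
    To make the bio question concrete, here is a rough sketch in plain Python, with dicts standing in for whatever store is used. The record shapes and field names are assumptions for illustration only, not the article's actual data model.

```python
# Denormalized: each article document embeds a full copy of the author record.
# (Hypothetical shapes; nothing here is taken from the article's schema.)
embedded_articles = [
    {"title": "Paper A", "author": {"name": "Ann", "bio": "old bio"}},
    {"title": "Paper B", "author": {"name": "Ann", "bio": "old bio"}},
    {"title": "Paper C", "author": {"name": "Bob", "bio": "bob bio"}},
]


def update_bio_embedded(articles, name, new_bio):
    """Rewrite every article by this author; return how many were touched."""
    touched = 0
    for article in articles:
        if article["author"]["name"] == name:
            article["author"]["bio"] = new_bio
            touched += 1
    return touched


# Normalized: articles hold only an author key; the bio lives in one place.
authors = {
    "ann": {"name": "Ann", "bio": "old bio"},
    "bob": {"name": "Bob", "bio": "bob bio"},
}
referenced_articles = [
    {"title": "Paper A", "author_id": "ann"},
    {"title": "Paper B", "author_id": "ann"},
    {"title": "Paper C", "author_id": "bob"},
]


def update_bio_referenced(author_table, author_id, new_bio):
    """Update the single author record; always one write."""
    author_table[author_id]["bio"] = new_bio
    return 1
```

    With the embedded layout the number of writes grows with the author's article count; with the referenced layout it is one write, at the price of a second lookup when rendering an article.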

    I've got a bunch of similar examples, and I would not be at all surprised if they all boiled down to 'I don't understand what this guy is doing,' but basically, isn't NoSQL's strength in dealing with dynamic content, and in this example, serving static articles, isn't the choice between NoSQL and a traditional RDBMS essentially up to personal preference?

    1. Re:Question from relational-land by ranton · · Score: 5, Insightful

      Don't try to actually make sense of the decisions made in the article. I am glad that he summed up all of the reasons why he didn't go with a relational database early in the article, so I didn't have to bother reading the rest. I am an advocate of NoSQL, but this whole article is describing a project that is almost perfect for a relational database.

      But considering this author's previous analysis of Java vs C#, I am not surprised that this article was hardly worth the time to read.

      --
      -- All that is necessary for the triumph of evil is that good men do nothing. -- Edmund Burke
    2. Re:Question from relational-land by ranton · · Score: 4, Insightful

      Oh come on now. Play fair. If you start throwing around advanced database features like materialized views then you will immediately invalidate 90% of the use cases commonly used for choosing NoSQL over relational databases. That is just mean.

      --
      -- All that is necessary for the triumph of evil is that good men do nothing. -- Edmund Burke
  6. Bad planning by Samantha+Wright · · Score: 5, Interesting

    Throughout the article the client says they don't want full-text search. The author says he can "add it later," then compresses the body text field. Metadata like authorship information is also stored in a nasty JSON format—so say goodbye to being able to search that later, too!

    About that compression...

    That compression proved to be important due to yet another shortcoming of DynamoDB, one that nearly made me pull my hair out and encourage the team to switch back to MongoDB. It turns out the maximum record size in DynamoDB is 64K. That’s not much, and it takes me back to the days of 16-bit Windows where the text field GUI element could only hold a maximum of 64K. That was also, um, twenty years ago.

    Which is a limit that, say, InnoDB in MySQL also has. So, let's tally it up:

    • There's no way at all to search article text.
    • Comma-separated lists must be parsed to query by author name.
    • The same applies to keywords...
    • And categories...

    So what the hell is this database for? It's unusable, unsearchable, and completely pointless. You have to know the title of the article you're interested in to query it! It sounds, honestly, like this is a case where the client didn't know what they needed. I really, really am hard-pressed to fathom a repository for scientific articles where they store the full text but only need to look up titles. With that kind of design, they could drop their internal DB and just use PubMed or Google Scholar... and get way better results!
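
    A small sketch of what the tally above means in practice, using Python's standard zlib module and a dict as a stand-in record. The field names and sample text are invented for illustration; the article's real record layout may differ.

```python
import zlib

# Hypothetical record: authors as a comma-separated string, body compressed.
record = {
    "title": "Paper A",
    "authors": "Ann Lee, Bob Ray",
    "body": zlib.compress(
        "Mitochondria are the powerhouse of the cell.".encode("utf-8")
    ),
}


def body_contains(rec, phrase):
    """Search the body text; the record must be fetched and decompressed first."""
    text = zlib.decompress(rec["body"]).decode("utf-8")
    return phrase in text


def has_author(rec, name):
    """Match an author by splitting and trimming the comma-separated list."""
    return name in [part.strip() for part in rec["authors"].split(",")]
```

    Both checks run in application code against one record at a time, so answering "which articles mention X" or "which articles did Y write" means reading every record unless a separate index is maintained by hand.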

    I think the author and his team failed the customer in this case by providing them with an inflexible system. Either they forced the client into accepting these horrible limitations so they could play with new (and expensive!) toys, or the client just flat-out doesn't need this database for anything (in which case it's a waste of money.) This kind of data absolutely needs to be kept in a relational database to be useful.

    Which, along with his horrible Java vs. C# comparison, makes Jeff Cogswell officially the Slashdot contributor with the worst analytical skills.

    --
    Bio questions? Ask me to start a Q&A journal. Computer analogies available for most topics!
    1. Re:Bad planning by hawguy · · Score: 4, Interesting

      That compression proved to be important due to yet another shortcoming of DynamoDB, one that nearly made me pull my hair out and encourage the team to switch back to MongoDB. It turns out the maximum record size in DynamoDB is 64K. That’s not much, and it takes me back to the days of 16-bit Windows where the text field GUI element could only hold a maximum of 64K. That was also, um, twenty years ago.

      I didn't understand why he dismissed S3 to store his documents in the first place:

      Amazon has their S3 storage, but that’s more suited to blob data—not ideal for documents

      Why wouldn't an S3 blob be an ideal place to store a document of unknown size that you don't care about indexing? Later he says "In the DynamoDB record, simply store the identifier for the S3 object. That doesn’t sound like much fun, but it would be doable" -- is storing an S3 pointer worse than deploying a solution that will fail on the first document that exceeds 64KB, at which point he'll need to come up with a scheme to split large docs across multiple records? Especially when DynamoDB storage costs 10 times more than S3 storage ($1/GB/month vs $0.095/GB/month)
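
      Here is roughly what I mean by storing a pointer, sketched in plain Python with two dicts standing in for S3 and the DynamoDB table. No AWS calls are made, and the key format, field names, and function names are all hypothetical. The prices are the ones quoted above.

```python
MAX_ITEM_BYTES = 64 * 1024  # the 64K record limit quoted from the article

blob_store = {}  # stand-in for S3: object key -> bytes
table = {}       # stand-in for the DynamoDB table: article id -> record


def put_article(article_id, title, body):
    """Store the body as a blob and keep only its key in the table record."""
    key = f"articles/{article_id}"  # hypothetical key scheme
    blob_store[key] = body.encode("utf-8")
    table[article_id] = {"title": title, "body_key": key}


def get_body(article_id):
    """Follow the pointer from the table record to the blob."""
    key = table[article_id]["body_key"]
    return blob_store[key].decode("utf-8")


def monthly_cost(gigabytes, price_per_gb):
    """Storage cost per month at a flat per-GB price."""
    return gigabytes * price_per_gb


# Prices as quoted in this comment: $1/GB/month vs $0.095/GB/month.
dynamo_cost = monthly_cost(100, 1.00)
s3_cost = monthly_cost(100, 0.095)
```

      The table record stays tiny no matter how large the document is, so the 64K limit never comes into play, and at the quoted prices 100 GB of bodies comes to about $100 a month in the table versus about $9.50 as blobs.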

  7. My migration path by Lieutenant_Dan · · Score: 5, Funny

    We decided that MongoDB was adequate but didn't leverage the synergies we were trying to harvest from our development methodologies.

    We looked at GumboDB and found it was lacking in visualization tools to create a warehouse for our data that would provide a real-time dashboard of the operational metrics we were seeking.

    Next up was SuperDuperDB which was great from a client-server-man-in-the-middle perspective but required a complex LDAP authentication matrix that reticulated splines within our identity management roadmap.

    After that I quit. I hear they are using Access 95 with VBA.

    --
    Wearing pants should always be optional.
  8. Re:Ironically, I came to the opposite conclusion by travispbrown · · Score: 4, Funny

    I'm sorry that I did not use the word "Ironically" correctly. You win internet.