Why My Team Went With DynamoDB Over MongoDB

← Back to Stories (view on slashdot.org)

Why My Team Went With DynamoDB Over MongoDB

Posted by timothy on Thursday February 21, 2013 @07:04AM from the mostly-for-the-web-scale-of-it dept.

Nerval's Lobster writes "Software developer Jeff Cogswell, who matched up Java and C# and peeked under the hood of Facebook's Graph Search, is back with a new tale: why his team decided to go with Amazon's DynamoDB over MongoDB when it came to building a highly customized content system, even though his team specialized in MongoDB. While DynamoDB did offer certain advantages, it also came with some significant headaches, including issues with embedded data structures and Amazon's sometimes-confusing billing structure. He offers a walkthrough of his team's tips and tricks, with some helpful advice on avoiding pitfalls for anyone interested in considering DynamoDB. 'Although I'm not thrilled about the additional work we had to do (at times it felt like going back two decades in technology by writing indexes ourselves),' he writes, 'we did end up with some nice reusable code to help us with the serialization and indexes and such, which will make future projects easier.'"

3 of 106 comments (clear)

Min score:

Reason:

Sort:

Bad planning by Samantha+Wright · 2013-02-21 07:57 · Score: 5, Interesting
Throughout the article the client says they don't want full-text search. The author says he can "add it later," then compresses the body text field. Metadata like authorship information is also stored in a nasty JSON format—so say goodbye to being able to search that later, too!
About that compression...

That compression proved to be important due to yet another shortcoming of DynamoDB, one that nearly made me pull my hair out and encourage the team to switch back to MongoDB. It turns out the maximum record size in DynamoDB is 64K. That’s not much, and it takes me back to the days of 16-bit Windows where the text field GUI element could only hold a maximum of 64K. That was also, um, twenty years ago.
Which is a limit that, say, InnoDB in MySQL also has. So, let's tally it up:
- There's no way at all to search article text.
- Comma-separated lists must be parsed to query by author name.
- The same applies to keywords...
- And categories...
So what the hell is this database for? It's unusable, unsearchable, and completely pointless. You have to know the title of the article you're interested in to query it! It sounds, honestly, like this is a case where the client didn't know what they needed. I really, really am hard-pressed to fathom a repository for scientific articles where they store the full text but only need to look up titles. With that kind of design, they could drop their internal DB and just use PubMed or Google Scholar... and get way better results!
I think the author and his team failed the customer in this case by providing them with an inflexible system. Either they forced the client into accepting these horrible limitations so they could play with new (and expensive!) toys, or the client just flat-out doesn't need this database for anything (in which case it's a waste of money.) This kind of data absolutely needs to be kept in a relational database to be useful.
Which, along with his horrible Java vs. C# comparison, makes Jeff Cogswell officially the Slashdot contributor with the worst analytical skills.
--
Bio questions? Ask me to start a Q&A journal. Computer analogies available for most topics!
1. Re:Bad planning by mcmonkey · 2013-02-21 08:15 · Score: 3, Interesting
  
  Which, along with his horrible Java vs. C# comparison, makes Jeff Cogswell officially the Slashdot contributor with the worst analytical skills.
  OK, that's what I thought. Well, first, for anyone who hasn't read or doesn't remember that "Java vs. C#" thing, don't go back and read it now. Save your time, it's horrible.
  Now, for the current article, isn't designing a database all about trade-offs? E.g. Indexes make it easier to find stuff, but then make extra work (updating indexes) when adding stuff. It's about balancing reading and writing, speed and maintenance, etc. And it seems like this guy has only thought about pulling out a single article to the exclusion of everything else.
  Do we just not understand DynamoDB? How does this system pull all the articles by a certain author or with a certain keyword? What if they need to update an author's bio? With categories stored within the article object, how does he enforce integrity, so all "general relativity" articles end up with "general relativity" and not a mix of GR, Gen Rel, g relativity, etc?
  What happens when they want to add full text search? Or pictures to articles? That 64k limit would seem like a deal breaker. 64k that includes EVERYTHING about an article--abstract, full text, authors and bios, etc.
  My first thought was, this does not make much sense. Then I thought, well, I work with old skool RDMS, and I just don't get NoSQL. But now I think, naw, this guy really doesn't know enough to merit the level of attention his blatherings get on /.
2. Re:Bad planning by hawguy · 2013-02-21 08:36 · Score: 4, Interesting
  
  That compression proved to be important due to yet another shortcoming of DynamoDB, one that nearly made me pull my hair out and encourage the team to switch back to MongoDB. It turns out the maximum record size in DynamoDB is 64K. That’s not much, and it takes me back to the days of 16-bit Windows where the text field GUI element could only hold a maximum of 64K. That was also, um, twenty years ago.
  I didn't understand why he dismissed S3 to store his documents in the first place:
  
  Amazon has their S3 storage, but that’s more suited to blob data—not ideal for documents
  Why wouldn't an S3 blob be an ideal place to store a document of unknown size that you don't care about indexing? Later he says "In the DynamoDB record, simply store the identifier for the S3 object. That doesn’t sound like much fun, but it would be doable" -- is storing an S3 pointer worse than deploying a solution that will fail on the first document that exceeds 64KB, at which point he'll need to come up with a scheme to split large docs across multiple records? Especially when DynamoDB storage costs 10 times more than S3 storage ($1/GB/month vs $0.095/GB/month)