Slashdot Mirror


Why My Team Went With DynamoDB Over MongoDB

Nerval's Lobster writes "Software developer Jeff Cogswell, who matched up Java and C# and peeked under the hood of Facebook's Graph Search, is back with a new tale: why his team decided to go with Amazon's DynamoDB over MongoDB when it came to building a highly customized content system, even though his team specialized in MongoDB. While DynamoDB did offer certain advantages, it also came with some significant headaches, including issues with embedded data structures and Amazon's sometimes-confusing billing structure. He offers a walkthrough of his team's tips and tricks, with some helpful advice on avoiding pitfalls for anyone interested in considering DynamoDB. 'Although I'm not thrilled about the additional work we had to do (at times it felt like going back two decades in technology by writing indexes ourselves),' he writes, 'we did end up with some nice reusable code to help us with the serialization and indexes and such, which will make future projects easier.'"

106 comments

  1. That's different... by Anonymous Coward · · Score: 2, Funny

    They must run their company pretty different than where I work.

    Where I work, the most senior and backstabby developer saddles the worst tools he can find on the rest of the team, and then blames them (behind their backs of course) for the results of his poor decision making.

  2. I don't understand by Anonymous Coward · · Score: 3, Funny

    But MongoDB is web scale.

    1. Re:I don't understand by OakDragon · · Score: 4, Funny

      MongoDB ... just a pawn in the game of life.

    2. Re:I don't understand by K. S. Kyosuke · · Score: 1, Funny

      Haven't you seen the newest succ (succ (succ (succ (succ (succ (succ Zero)))))) movie, "The Web is not enough"?

      --
      Ezekiel 23:20
    3. Re:I don't understand by Anonymous Coward · · Score: 0

      Spanish balloons? Mongo take chance...!

    4. Re:I don't understand by tralfaz2001 · · Score: 1

      Please for the love of god tell me I'm not the only one that got this Blazing Saddles reference. Well done sir.

  3. No one cares by Anonymous Coward · · Score: 5, Insightful

    No one cares. Stop click-baiting the buzzword Slashdot sub-sites. If we wanted to go to them we would do so voluntarily.

    1. Re:No one cares by Anonymous Coward · · Score: 1, Funny

      But I want Dice to tell me all the ways in which backend specialists are critical to online games!

  4. MongoDB with ObjectRocket FTW by Anonymous Coward · · Score: 0

    ObjectRocket has a pretty awesome solution and the dudes there know their sh!t: http://www.objectrocket.com/

    1. Re:MongoDB with ObjectRocket FTW by Anonymous Coward · · Score: 0

      This post is sponsored by /dev/null Enterprises.

  5. devs and DB indexes by alen · · Score: 1

    there are two kinds

    the first creates a 10,000,000 row table with no indexes, no PK and then complains that the DBA's are dumb because the app is slow or the server is broke

    the second kind i've seen have a 100 row table, with 10 columns and 15 indexes on it. sometimes half my day is spent on deleting unused indexes created by our BI devs

    1. Re:devs and DB indexes by Anonymous Coward · · Score: 1

      Wait, so half of your day is done by a cron job? Really? So you just go hide in a closet or take lunch while the job does these deletes? Are you French?

    2. Re:devs and DB indexes by larry bagina · · Score: 1

      Huh? What does being bisexual have to do with it?

      --
      Do you even lift?

      These aren't the 'roids you're looking for.

    3. Re:devs and DB indexes by Anonymous Coward · · Score: 0

      If that's really the case, encourage them to use an ORM. A decent one will do a better job than this.

    4. Re:devs and DB indexes by CadentOrange · · Score: 1

      An ORM isn't a silver bullet. You still need to understand how your objects map onto the database or you're back to square one with a poorly performing database. In fact it's probably worse as you then need to figure out what the database *and* ORM are doing.

    5. Re:devs and DB indexes by Anonymous Coward · · Score: 0

      If your objects map to the database with any significant regularity then you're doing both layers wrong.

      ORM is a disease. Step 1 to being cured is admitting you have a problem.

  6. Fools by Anonymous Coward · · Score: 0

    Fools! Everyone knows DynamoDB isn't Web Scale.

  7. Worried about hosting data alongside others... by Anonymous Coward · · Score: 3, Insightful

    "Our client is paying less than $100 per month for the data. Yes, there are MongoDB hosting options for less than this; but as I mentioned earlier, those tend to be shared options where your data is hosted alongside other data."

    I think someone failed to explain how "the cloud" actually works.

  8. It's so ... wrong by Anonymous Coward · · Score: 5, Insightful

    Having actually RTFA, it just reinforces how poorly most programmers understand relational databases and why they shouldn't be let near them. It's so consistently wrong it could be just straight trolling (which, given it's posted to post-Taco Slashdot, is likely).

    "However, the articles also contained data less suited to a traditional database. For example, each article could have multiple authors, so there were actually more authors than there were articles."

    This is completely wrong, that's a text book case of something perfectly suited to traditional (relational) database.

    1. Re:It's so ... wrong by Anonymous Coward · · Score: 1

      NoSQL is a buzzword meaning "too dumb to understand a RDB". That's why they poorly reinvent the wheel.

    2. Re:It's so ... wrong by MightyMartian · · Score: 5, Funny

      "Those who don't understand SQL are condemned to reinvent it, poorly." (with apologies to Henry Spencer).

      --
      The world's burning. Moped Jesus spotted on I50. Details at 11.
    3. Re:It's so ... wrong by vux984 · · Score: 5, Funny

      "However, the articles also contained data less suited to a traditional database. For example, each article could have multiple authors, so there were actually more authors than there were articles."

      Good god, how would he model invoices with multiple line items? Where, you know, there were actually more line items than invoices?! Mind blown.

      Or customers that might belong to zero or more demographics? There could be more customers than defined demographics to tag them with... or fewer... we don't even know and it could change as more of either are added!!

      We need a whole new database paradigm!

      Or the sample Northwind database that's been shipping with Access since the 90's.

    4. Re:It's so ... wrong by Torvac · · Score: 5, Insightful

      "with big data comes big responsibility". i mean a few very static 100k items require a NoSQL DB solution and cloud storage ? and a full team to do this ?

    5. Re:It's so ... wrong by hey · · Score: 2

      Make a table of authors, make a linking table that joins authors to the article table.
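
      For anyone who wants to see it spelled out, here is a minimal sketch of that layout using Python's built-in sqlite3 module. The table names, column names, and sample rows are all made up for illustration; nothing here comes from TFA.

```python
import sqlite3

# Hypothetical schema: every table/column name below is illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE article (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE author  (id INTEGER PRIMARY KEY, name  TEXT NOT NULL);
    -- Linking table: one row per (article, author) pair.
    CREATE TABLE article_author (
        article_id INTEGER NOT NULL REFERENCES article(id),
        author_id  INTEGER NOT NULL REFERENCES author(id),
        PRIMARY KEY (article_id, author_id)
    );
""")
conn.executemany("INSERT INTO article VALUES (?, ?)",
                 [(1, "Paper A"), (2, "Paper B")])
conn.executemany("INSERT INTO author VALUES (?, ?)",
                 [(1, "Ada"), (2, "Bob"), (3, "Cy")])
conn.executemany("INSERT INTO article_author VALUES (?, ?)",
                 [(1, 1), (1, 2), (2, 2), (2, 3)])

def authors_of(article_id):
    """Names of all authors linked to one article."""
    rows = conn.execute(
        "SELECT a.name FROM author a "
        "JOIN article_author aa ON aa.author_id = a.id "
        "WHERE aa.article_id = ? ORDER BY a.name", (article_id,))
    return [name for (name,) in rows]

def articles_by(author_name):
    """Titles of all articles linked to one author."""
    rows = conn.execute(
        "SELECT ar.title FROM article ar "
        "JOIN article_author aa ON aa.article_id = ar.id "
        "JOIN author a ON a.id = aa.author_id "
        "WHERE a.name = ? ORDER BY ar.title", (author_name,))
    return [title for (title,) in rows]
```

      Note the sample data has more authors than articles, and nothing breaks.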

    6. Re:It's so ... wrong by Tom · · Score: 4, Insightful

      Mod parent up.

      After a few years in other fields, I'm doing some serious coding again. Postgres and Doctrine. I can do in a few lines of code and SQL what would take a small program or module to do without the power of SQL and an ORM.

      Anyone who reinvents that wheel because he thinks he can do the 2% he recoded better is a moron.

      --
      Assorted stuff I do sometimes: Lemuria.org
    7. Re:It's so ... wrong by serviscope_minor · · Score: 2

      This is completely wrong

      No, it's completely right: the traditional way to use a database is to blob everything together in to one huge table, preferably with many NULLs, then limit your query to SELECT * FROM Table; and finally process the results directly in VB6, with bonus points for a buggy parser for unpicking comma separated fields.

      Note: he said "traditional" not "sane relational".

      Sarcasm aside, his reason for not using a relational database is that he'd need to use more than one table and then he'd have to perform joins on them, which sounds very much like saying the reason not to use SQL is because the problem fits exactly into what SQL is designed to do.

      But hey, his new solution is in the cloud so it must be better.

      --
      SJW n. One who posts facts.
    8. Re:It's so ... wrong by Anonymous Coward · · Score: 0

      His C# vs. Java article for the "real world" isn't any better. His whole argument basically starts at C# vs. Java where it seems like C# is doing better and quickly switches the topic to Tomcat vs. IIS for web development while maintaining that this is purely about C# vs Java, and declares Java the obvious winner.

    9. Re:It's so ... wrong by C10H14N2 · · Score: 1

      No, no, no, you let your tedious "DBAs" think they're right and do all that "normalization" and "tuning" shit they keep yammering on about (whatevs), then get the new shiny so you can blob the whole fucker up and never have to worry about anything but said "SELECT * FROM FOO." It's great because our developers no longer have to talk to our DBAs about "optimizing" all that dynamic SQL our webforms were generating. The DBAs are now screaming about resource utilization, but, HELLO, they're the ones who insisted on building all those freakin tables in the first place when everyone knows you just need one to throw in all the XML. Idiots.

    10. Re:It's so ... wrong by MatthiasF · · Score: 1

      For normalized databases, this is often considered a best practice, although another option would be to store multiple author IDs in the article tables—something that would require extra fields, since most articles had more than one author. That would also require that we anticipate the maximum of author fields needed, which could lead to problems down the road.

      A single field with delimited index keys pointing to an author table. I learned that in 1996. Then compressing the field with a dictionary, increasing the number of keys that can fit and speeding up searches through it. Learned that in 1998.

      Why does that not work in NoSQL? I don't understand.

    11. Re:It's so ... wrong by UnknownSoldier · · Score: 1

      I know you jest but sometimes you DO want to re-write SQL. i.e. row store vs column store.

      NewSQL vs. NoSQL for New OLTP
      http://www.youtube.com/watch?v=uhDM4fcI2aI

      One Size Does Not Fit All in DB Systems
      http://www.youtube.com/watch?v=QQdbTpvjITM

    12. Re:It's so ... wrong by MightyMartian · · Score: 3, Insightful

      I jest slightly. Certainly there are applications where SQL and relational systems in general are overkill, or where they do not solve certain kinds of problems well. But I'll be frank, they're pretty rare. I will use binary search/sort mechanisms for simple hashes and other similar two column key-value problems, mainly because there's absolutely no need to truck along gazillions of bytes worth of RDBMS where quicksort and a binary search is all that is needed. But if you get beyond that, you're almost inevitably going to start wishing you had JOIN. And then you end up having to implement such functionality.
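
      A rough sketch of the "sort once, binary search after" approach for a two-column key-value table, in Python. The function names are hypothetical, and Python's built-in sort is Timsort rather than quicksort, so treat this as the same idea and not the exact algorithm mentioned above.

```python
from bisect import bisect_left

def build_table(pairs):
    """Sort (key, value) pairs by key once so lookups can binary-search."""
    table = sorted(pairs, key=lambda kv: kv[0])
    keys = [k for k, _ in table]
    values = [v for _, v in table]
    return keys, values

def lookup(keys, values, key, default=None):
    """O(log n) lookup in the sorted key column."""
    i = bisect_left(keys, key)
    if i < len(keys) and keys[i] == key:
        return values[i]
    return default
```

      That covers single-key lookups. It does nothing for you once the question involves two of these tables at once, which is where the JOIN itch starts.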

      Every tool for the job, to be sure, but I just happen to think there are far fewer problems that nosql style systems solve than some like to think.

      --
      The world's burning. Moped Jesus spotted on I50. Details at 11.
    13. Re:It's so ... wrong by tgd · · Score: 2

      Having actually RTFA, it just reinforces how poorly most programmers understand relational databases and why they shouldn't be let near them. It's so consistently wrong it could be just straight trolling (which, given it's posted to post-Taco Slashdot, is likely).

      "However, the articles also contained data less suited to a traditional database. For example, each article could have multiple authors, so there were actually more authors than there were articles."

      This is completely wrong, that's a text book case of something perfectly suited to traditional (relational) database.

      Well, based on how many things are wrong in the Java vs C# comparison, too, one can only guess that the "software developer" is just some hack who is comped by Slashdot to drive clicks to their sub-sites.

      Man this place has really gone to shit in the last year -- just a waste of time to read. Sucks it's hard to break 15 years of habit ...

    14. Re:It's so ... wrong by PRMan · · Score: 1, Insightful

      Yeah. Going without ORM you typically get a minimum of 50% better.

      --
      Peter predicted that you would "deliberately forget" creation 2000 years ago...
    15. Re:It's so ... wrong by Anonymous Coward · · Score: 0

      I couldn't help but laugh and sigh at both the article and the submission. Kids these days...

    16. Re:It's so ... wrong by Fnord666 · · Score: 1

      We need a whole new database paradigm!

      Wait, don't you just draw a different arrow on the end of the line joining the two tables and the rest happens automatically?

      --
      'The tyrant will always find pretext for his tyranny.' - Aesop's Fables
    17. Re:It's so ... wrong by MurukeshM · · Score: 1

      Oh, that moron? 10 minutes wasted checking the comments to see if TFA is worth reading..

    18. Re:It's so ... wrong by Nbrevu · · Score: 1

      Every tool for the job, to be sure, but I just happen to think there are far fewer problems that nosql style systems solve than some like to think.

      I strongly agree with this, and because of that I've been severely chastised by quite a few kool-aid drinkers. On my current job we have a NoSQL database (a MongoDB one, actually) and we indeed have had to reinvent some SQL here and there, including a few manual joins. The job would just have been far smoother (and faster to develop), and surely more performant, if we used a well-established SQL database, but someone decided that it wasn't buzzwordy enough.
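
      To give an idea of what a "manual join" looks like in application code, here is a minimal Python sketch over plain lists of dicts. The field names (`_id`, `author_ids`, `name`, `title`) are assumptions for illustration and are not taken from any real schema or driver API.

```python
def manual_join(articles, authors):
    """Application-side hash join: attach author names to each article.

    Builds a lookup dict on the author side, then probes it once per
    reference. Dangling author ids are silently skipped.
    """
    by_id = {a["_id"]: a for a in authors}
    joined = []
    for art in articles:
        names = [by_id[i]["name"] for i in art["author_ids"] if i in by_id]
        joined.append({"title": art["title"], "authors": names})
    return joined
```

      It works, but it is exactly the sort of thing a SQL engine does for you in one line, with a query planner behind it.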

    19. Re:It's so ... wrong by cyber-vandal · · Score: 1

      Entity Framework is good when you use it with a properly designed database. It saves a lot of work which, correct me if I'm wrong, is the whole point of computers. There are so many times that people forget that very simple fact in their rush to wave their e-peen around.

    20. Re:It's so ... wrong by Tom · · Score: 2

      There's "wrong" and there's wrong.

      I'm pretty sure that my coding does not satisfy some theoretical top-of-the-mountain coding structure fanatics. But that is "wrong" in the sense that it does not satisfy opinions. And when it comes to coding styles, in the end they're just opinions and ten years from now we'll laugh about most of today's patterns.

      And then there is programmatically correct, not unnecessarily wasteful with resources and easy to understand. Those are no opinions - your code either gives the right result or it doesn't, it either uses resources well or it doesn't, it can be understood or it can't. There's a bit of grey area regarding just how easy or how wasteful is ok, so it's not binary, but it's measurable along an objective axis.

      --
      Assorted stuff I do sometimes: Lemuria.org
  9. Am I the only one by spatley · · Score: 0, Offtopic

    that is getting sick of this content-free, slashdot echo chamber, clickcrack stuff. Hey Slashdot, why do you need whole nuther site to post original articles? And why do those articles make such a deafening sucking sound?
    Problem is that I would be interested in a reasoned look at MongoDB v Dynamo but my experience with http://slashdot.org/topic/bi/ is not to waste my time by reading TFA.

  10. So the gist of the article is..... by f-bomb · · Score: 4, Informative

    MongoDB would have been perfect based on the structure of the data, but the client didn't want to pay for setup and hosting costs. DynamoDB was the cheaper alternative, but more of a pain in the ass to implement. Makes me wonder if the hosting cost savings offset the additional development time.

    --
    Everyone should believe in something. I believe I'll have another beer.....
  11. Question from relational-land by mcmonkey · · Score: 4, Informative

    As someone whose work and thinking are firmly planted in traditional RDBMS, a few of those decisions did not make sense.

    I understand what he's saying about normalized tables for author, keywords, and categories. But then when he has to build and maintain index tables for author, keyword, and categories, doesn't that negate any advantage of not having those tables?

    I understand he's designed things to ease retrieval of articles, but it seems the trade-offs on other functions are too great. It's nice an author's bio is right there in the article object, but when it's time to update the bio, doesn't that mean going through and touching every article by that author?

    I've got a bunch of similar examples, and I would not be at all surprised if they all boiled down to 'I don't understand what this guy is doing,' but basically, isn't NoSQL's strength in dealing with dynamic content? And in this example, serving static articles, isn't the choice between NoSQL and traditional RDBMS essentially up to personal preference?
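
    To make the index-table question concrete, here is a minimal sketch of what maintaining your own index looks like, with Python dicts standing in for the main table and the index table. All names are hypothetical, and this is not the article author's code.

```python
from collections import defaultdict

articles = {}                  # article_id -> record (stand-in for the main table)
by_author = defaultdict(set)   # author name -> article ids (stand-in for an index table)

def put_article(article_id, record):
    """Write an article and keep the author index in step with it."""
    old = articles.get(article_id)
    if old is not None:
        # Drop stale index entries first, or removed authors linger forever.
        for name in old["authors"]:
            by_author[name].discard(article_id)
    articles[article_id] = record
    for name in record["authors"]:
        by_author[name].add(article_id)

def titles_by(author):
    """Answer 'all articles by X' from the hand-maintained index."""
    return sorted(articles[i]["title"] for i in by_author.get(author, ()))
```

    Every write path has to remember to do this bookkeeping, which is the part an RDBMS index does for free.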

    1. Re:Question from relational-land by Anonymous Coward · · Score: 1

      Maybe you should factor in the usage pattern and instance counts as well.

      Someone's bio might appear in how many articles? A few hundred? And how often will the bio be updated? A couple of times a year? So, updating a bio comes down to touching a few hundred records a few times a year. Compare that with thousands of accesses per day and you've suddenly tipped the scale.
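
      Back-of-the-envelope version of that comparison in Python. The three input figures are assumptions picked from inside the ranges above ("a few hundred", "a couple of times a year", "thousands per day"), not measurements.

```python
# Assumed inputs, chosen only to illustrate the orders of magnitude.
articles_per_author = 300      # "a few hundred"
bio_updates_per_year = 2       # "a couple of times a year"
reads_per_day = 2000           # "thousands of accesses per day"

# Each bio update rewrites every article that embeds the bio.
writes_per_year = articles_per_author * bio_updates_per_year
reads_per_year = reads_per_day * 365
reads_per_write = reads_per_year / writes_per_year

print(writes_per_year, reads_per_year, round(reads_per_write))
```

      With those inputs that is 600 extra writes a year against 730,000 reads, or roughly 1,200 reads for every denormalized write.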

    2. Re:Question from relational-land by ranton · · Score: 5, Insightful

      Don't try to actually make sense of the decisions made in the article. I am glad that he summed up all of the reasons why he didn't go with a relational database early in the article, so I didn't have to bother reading the rest. I am an advocate of NoSQL, but this whole article is describing a project that is almost perfect for a relational database.

      But considering this author's previous analysis of Java vs C#, I am not surprised that this article was hardly worth the time to read.

      --
      -- All that is necessary for the triumph of evil is that good men do nothing. -- Edmund Burke
    3. Re:Question from relational-land by Anonymous Coward · · Score: 1

      So... what you're saying is that the application needs a materialized view after benchmarks show that joining against the authors table is a performance bottleneck?

    4. Re:Question from relational-land by ranton · · Score: 4, Insightful

      Oh come on now. Play fair. If you start throwing around advanced database features like materialized views then you will immediately invalidate 90% of the use cases commonly used for choosing NoSQL over relational databases. That is just mean.

      --
      -- All that is necessary for the triumph of evil is that good men do nothing. -- Edmund Burke
    5. Re:Question from relational-land by mcmonkey · · Score: 3, Informative

      Maybe you should factor in the usage pattern and instance counts as well.

      Someone's bio might appear in how many articles? A few hundred? And how often will the bio be updated? A couple of times a year? So, updating a bio comes down to touching a few hundred records a few times a year. Compare that with thousands of accesses per day and you've suddenly tipped the scale.

      That's exactly the sort of answer I was looking for. Thank you. (Actually, I'd expect most bios get updated only a handful of times over the life of the author. You start with first publications as a grad student, then you leave school, maybe change jobs a couple of times, maybe a few notable achievements, then the author dies.)

      That is the sort of design considerations I'd like to read about. That would give a useful comparison between platforms. As it is, this article boils down to "I went NoSQL over RDBMS, because...well, just because. I went Amazon over something else because it's easier for my idiot client to administer."

    6. Re:Question from relational-land by Anonymous Coward · · Score: 0

      (I'm not a professional developer, just an observer from the sidelines.)

      So far the only sound use cases I've heard for NoSQL are things like Amazon and Google where:

      1. They have very large data sets.
      2. They have management that can make an educated business decision about what kind of guarantees they do and don't need to make to their customers.
      3. They have people inhouse who have a strong understanding of the underlying CS trade offs in the design of database systems who can see how to maximize performance while still maintaining the right guarantees.

      Outside of that, every pitch for NoSQL I've heard sounds like people are getting in way over their heads and won't realize it until way too late.

    7. Re:Question from relational-land by frank_adrian314159 · · Score: 1

      It's nice an author's bio is right there in the article object, but when it's time to update the bio, that does mean going through and touching every article by that author?

      Actually, you don't update the biographical information for an article. The biographical information in the article is supposed to reflect the biographical information for the author at the time at which the article is published. When you update the biographical information, it goes into any articles published after the bio is updated. Unless, of course, you want to have a completely different paradigm of publishing than that established in the days of hard copy (which may be a good thing, but is not what is done now). In fact, previous employers for the author may get quite irate that research funded and published by their institution no longer mentions the same because the author has moved on.

      No, it's not as simple as it looks. Thanks for asking...

      --
      That is all.
    8. Re:Question from relational-land by Anonymous Coward · · Score: 0

      I have no experience in NOSQL, but it is just a glorified Key/Value store. You could just use a reference to that user's bio instead of embedding it. The key could be something like the UserID+"Bio". I am not sure the preferred method to create a object reference that doesn't change based on the value, but I'm sure this would work.
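
      A minimal sketch of that idea in Python, with a plain dict standing in for the key/value store. The key format and helper names are made up for illustration and do not correspond to any particular product's API.

```python
store = {}  # plain dict as a stand-in for any key/value store

def bio_key(user_id):
    """Stable key derived from the user id, never from the bio text."""
    return f"{user_id}:Bio"

def put_bio(user_id, text):
    store[bio_key(user_id)] = text

def put_article(article_id, title, author_ids):
    # The article stores author ids only; bios are looked up by reference.
    store[f"article:{article_id}"] = {"title": title, "author_ids": author_ids}

def render(article_id):
    """Resolve the bio references at read time."""
    article = store[f"article:{article_id}"]
    bios = [store[bio_key(u)] for u in article["author_ids"]]
    return {"title": article["title"], "bios": bios}
```

      The cost is one extra lookup per author on every read, in exchange for a single write when a bio changes.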

    9. Re:Question from relational-land by MurukeshM · · Score: 1

      Ars Technica follows the non-traditional way, and personally, only nostalgia would be a reason to retain the original bio.

    10. Re:Question from relational-land by ScriptedReplay · · Score: 1

      Someone's bio might appear in how many articles? A few hundred? And how often will the bio be updated? A couple of times a year? So, updating a bio comes down to touching a few hundred records a few times a year. Compare that with thousands of accesses per day and you've suddenly tipped the scale.

      That would make sense if you had to pull bios with an article, which should hardly be the case. At most, you'd have to pull in current authors' affiliations. A bio would ideally stay behind an author link, and be pulled in quite rarely. I for one would much rather have a list of authors immediately followed by the abstract than having to move through several pages of biographies for an article with 4-5 authors in order to find the abstract and the actual article. So for me the decision to put every bio in every article looked like a poorly researched one. YMMV and all that.

    11. Re:Question from relational-land by adnonsense · · Score: 1

      Don't try to actually make sense of the decisions made in the article. I am glad that he summed up all of the reasons why he didn't go with a relational database early in the article, so I didn't have to bother reading the rest. I am an advocate of NoSQL, but this whole article is describing a project that is almost perfect for a relational database.

      Heck yeah, it reminds me of a project I did in 2004 or 2005, which stored over a hundred thousand articles (some of them more than 64Kb!) with multiple authors, keywords and other fancy schmancy stuff. I've no idea what "a good amount of traffic from a niche group of scientists and researchers" means in real terms, but the system I put together was getting something like 40,000 unique visitors a day, running off some not particularly spectacular hardware (this was a time when 1GB was a lot of memory). As there was no NoSQL back then, I had to "make do" with a proper relational database (PostgreSQL), which wasn't exactly a speed demon at the time, but very kindly took care of things like indexes and keeping things in sync (aka "referential integrity") leaving me free to concentrate on optimizing the whole stack. Oh yes, it was only me on the "team". And I managed to bodge a Lucene-based search system into the setup (as PostgreSQL's full-text search was a bit sucky).

      I suppose what with it being 2013 and such, it would be possible to push it into the cloud and squeeze in some JSONy bits as well if necessary.

      Kids of today, eh...

    12. Re:Question from relational-land by godefroi · · Score: 1

      Oracle's "snapshots" were renamed to "materialized views" in 1999, MSSQL gained "indexed views" in 2005, MongoDB "began development" in 2007.

      Doomed to reinvent it, indeed.

      --
      Karma: Poor (Mostly affected by lame karma-joke sigs)
    13. Re:Question from relational-land by godefroi · · Score: 1

      You know, I could chop off the pinky toe of my left foot, I mean, I only use it a couple times a year!

      --
      Karma: Poor (Mostly affected by lame karma-joke sigs)
    14. Re:Question from relational-land by godefroi · · Score: 1

      In my opinion, you must have a VERY good reason before even considering giving up ACID transactions. If your RDBMS isn't fast enough, almost certainly it's because you're doing it wrong, not because there's anything fundamentally wrong with the tool.

      Those who do RDBMS wrong usually do NoSQL wrong too. Shocker, I know.

      --
      Karma: Poor (Mostly affected by lame karma-joke sigs)
  12. Bad planning by Samantha Wright · · Score: 5, Interesting

    Throughout the article the client says they don't want full-text search. The author says he can "add it later," then compresses the body text field. Metadata like authorship information is also stored in a nasty JSON format—so say goodbye to being able to search that later, too!

    About that compression...

    That compression proved to be important due to yet another shortcoming of DynamoDB, one that nearly made me pull my hair out and encourage the team to switch back to MongoDB. It turns out the maximum record size in DynamoDB is 64K. That’s not much, and it takes me back to the days of 16-bit Windows where the text field GUI element could only hold a maximum of 64K. That was also, um, twenty years ago.

    Which is a limit that, say, InnoDB in MySQL also has. So, let's tally it up:

    • There's no way at all to search article text.
    • Comma-separated lists must be parsed to query by author name.
    • The same applies to keywords...
    • And categories...

    So what the hell is this database for? It's unusable, unsearchable, and completely pointless. You have to know the title of the article you're interested in to query it! It sounds, honestly, like this is a case where the client didn't know what they needed. I really, really am hard-pressed to fathom a repository for scientific articles where they store the full text but only need to look up titles. With that kind of design, they could drop their internal DB and just use PubMed or Google Scholar... and get way better results!

    I think the author and his team failed the customer in this case by providing them with an inflexible system. Either they forced the client into accepting these horrible limitations so they could play with new (and expensive!) toys, or the client just flat-out doesn't need this database for anything (in which case it's a waste of money.) This kind of data absolutely needs to be kept in a relational database to be useful.

    Which, along with his horrible Java vs. C# comparison, makes Jeff Cogswell officially the Slashdot contributor with the worst analytical skills.

    --
    Bio questions? Ask me to start a Q&A journal. Computer analogies available for most topics!
    1. Re:Bad planning by mcmonkey · · Score: 3, Interesting

      Which, along with his horrible Java vs. C# comparison, makes Jeff Cogswell officially the Slashdot contributor with the worst analytical skills.

      OK, that's what I thought. Well, first, for anyone who hasn't read or doesn't remember that "Java vs. C#" thing, don't go back and read it now. Save your time, it's horrible.

      Now, for the current article, isn't designing a database all about trade-offs? E.g. Indexes make it easier to find stuff, but then make extra work (updating indexes) when adding stuff. It's about balancing reading and writing, speed and maintenance, etc. And it seems like this guy has only thought about pulling out a single article to the exclusion of everything else.

      Do we just not understand DynamoDB? How does this system pull all the articles by a certain author or with a certain keyword? What if they need to update an author's bio? With categories stored within the article object, how does he enforce integrity, so all "general relativity" articles end up with "general relativity" and not a mix of GR, Gen Rel, g relativity, etc?
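
      On the integrity point, this is roughly what a relational setup gives you: a minimal sketch with Python's built-in sqlite3 module and a foreign key on the category name. Table and column names are hypothetical, and SQLite only enforces foreign keys once the pragma is switched on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.executescript("""
    CREATE TABLE category (name TEXT PRIMARY KEY);
    CREATE TABLE article_category (
        article_id INTEGER NOT NULL,
        category   TEXT NOT NULL REFERENCES category(name),
        PRIMARY KEY (article_id, category)
    );
    INSERT INTO category VALUES ('general relativity');
""")

def tag(article_id, category):
    """Return True if the tag was accepted, False if the category is unknown."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO article_category VALUES (?, ?)",
                         (article_id, category))
        return True
    except sqlite3.IntegrityError:
        return False
```

      Here `tag(1, 'general relativity')` goes through, while `tag(1, 'GR')` or `tag(1, 'Gen Rel')` gets rejected by the database itself instead of by whatever validation the application remembered to write.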

      What happens when they want to add full text search? Or pictures to articles? That 64k limit would seem like a deal breaker. 64k that includes EVERYTHING about an article--abstract, full text, authors and bios, etc.
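
      For a sense of how the compress-to-fit approach behaves, here is a minimal Python sketch using zlib. The 64K figure is the limit quoted from the article; the size check is a simplification that only counts attribute values, and the function names are made up.

```python
import zlib

MAX_ITEM_BYTES = 64 * 1024  # the 64K record limit quoted from the article

def pack_body(text):
    """Compress article text into bytes for storage."""
    return zlib.compress(text.encode("utf-8"), 9)

def unpack_body(blob):
    """Reverse of pack_body."""
    return zlib.decompress(blob).decode("utf-8")

def fits(record):
    """Simplified check: total size of all values (str or bytes) vs the limit."""
    total = 0
    for value in record.values():
        total += len(value if isinstance(value, bytes) else value.encode("utf-8"))
    return total <= MAX_ITEM_BYTES
```

      Repetitive text shrinks well, but the result depends entirely on the input, so a long enough or incompressible enough article still blows through the limit.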

      My first thought was, this does not make much sense. Then I thought, well, I work with old skool RDBMS, and I just don't get NoSQL. But now I think, naw, this guy really doesn't know enough to merit the level of attention his blatherings get on /.

    2. Re:Bad planning by hawguy · · Score: 4, Interesting

      That compression proved to be important due to yet another shortcoming of DynamoDB, one that nearly made me pull my hair out and encourage the team to switch back to MongoDB. It turns out the maximum record size in DynamoDB is 64K. That’s not much, and it takes me back to the days of 16-bit Windows where the text field GUI element could only hold a maximum of 64K. That was also, um, twenty years ago.

      I didn't understand why he dismissed S3 to store his documents in the first place:

      Amazon has their S3 storage, but that’s more suited to blob data—not ideal for documents

      Why wouldn't an S3 blob be an ideal place to store a document of unknown size that you don't care about indexing? Later he says "In the DynamoDB record, simply store the identifier for the S3 object. That doesn’t sound like much fun, but it would be doable" -- is storing an S3 pointer worse than deploying a solution that will fail on the first document that exceeds 64KB, at which point he'll need to come up with a scheme to split large docs across multiple records? Especially when DynamoDB storage costs 10 times more than S3 storage ($1/GB/month vs $0.095/GB/month)
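
      A minimal sketch of the pointer approach in Python, with two dicts standing in for the S3 bucket and the DynamoDB table so that no real AWS calls are implied. The helper names and the content-hash key scheme are assumptions for illustration. The last lines just check the price ratio from the figures above.

```python
import hashlib

blob_store = {}   # stand-in for an S3 bucket: object key -> bytes
table = {}        # stand-in for the DynamoDB table: article id -> small record

def save(article_id, title, body):
    """Store the body as a blob and keep only a pointer in the record."""
    key = hashlib.sha256(body).hexdigest()   # hypothetical object-key scheme
    blob_store[key] = body
    table[article_id] = {"title": title, "body_ref": key}

def load_body(article_id):
    """Follow the pointer from the record to the blob."""
    return blob_store[table[article_id]["body_ref"]]

# Price ratio using the per-GB monthly figures quoted above.
dynamo_per_gb = 1.00
s3_per_gb = 0.095
ratio = dynamo_per_gb / s3_per_gb
```

      The record stays tiny no matter how large the document gets, and the ratio works out to about 10.5.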

    3. Re:Bad planning by Anonymous Coward · · Score: 1

      The AWS platform and the ease of scaling it offers. The application can actually scale itself with their API. I know you can scale *sql horizontally, but you can't argue that it's easier.

      From TFA:
      "Our client said they didn't need a full-text search on the text or abstract of the documents; they only cared about people searching keywords and categories. That’s fine—we could always add further search capabilities later on, using third-party indexing and searching tools such as Apache Lucene.
      slashdot (http://s.tt/1A3VL)"

      Consider a typical website that needs text search. Would you implement text search yourself with your nicely normalized database? Or do you just denormalize the data and store it in a specialized database that has been maintained and developed for years, like Apache SOLR, or Lucene like he mentions? My point is it's quite common to duplicate your data across multiple specialized db backends. This is easier with the NoSQL paradigm because you don't need to normalize your data. Concurrency is the price you pay. For an application centered around scientific articles, concurrency understandably isn't a priority.

    4. Re:Bad planning by David+Off · · Score: 1

      Interesting analysis.

      I've been messing around writing my own Java NoSQL CMS called Magneato. It stores articles in XML because I use XForms for the front end (maybe a bad choice but there isn't a good forms solution yet, not even with HTML5) and I use Lucene/Bobo for the navigation and search side of things. It is focussed on faceted navigation, although you can have relations between articles (parent of, sibling, etc.) via Lucene.

      It actually sounds like my efforts are better than what this team has produced.

    5. Re:Bad planning by Anonymous Coward · · Score: 0

      Which, along with his horrible Java vs. C# comparison, makes Jeff Cogswell officially the Slashdot contributor with the worst analytical skills.

      Yes, now that Jon Katz has disappeared.

    6. Re:Bad planning by Samantha+Wright · · Score: 1

      What functionality is DynamoDB providing in this context that Lucene wouldn't? And what the hell is the client going to do with the database before Lucene is put into place?

      --
      Bio questions? Ask me to start a Q&A journal. Computer analogies available for most topics!
    7. Re:Bad planning by gargleblast · · Score: 1

      No no. Jeff Cogswell first man ever whip MongoDB. MongoDB impressed.

    8. Re:Bad planning by Anonymous Coward · · Score: 0

      I'm assuming Dynamo meets his requirements as far as storing and fetching data, and as far as I can tell, the main advantage Dynamo offers is zero maintenance and it scales automatically. After skimming the product details (which I doubt you did) I see that it integrates with Hadoop so you can use MapReduce to query it and get pretty much whatever you need in any given context out of it. That's actually pretty nice! As a developer who is forced to do DBA work at a startup, it's certainly attractive to me. You can theoretically use Lucene as a database, but that would be like using SQL as a text search engine, and isn't optimal.

      SQL is great, and if you decide you only want one db backend it's a safe choice, but in the end the choice to use only one database limits your options and the features you can provide. There is a reason NoSQL is translated to 'not ONLY sql'.

      If they decide to implement search later they have the choice to use Lucene or SOLR and maintain it themselves, or they can use Amazon's search thing which offers the same scalability and is maintenance free.

  13. This article is garbage by JDG1980 · · Score: 0

    TL;DR: Jeff Cogswell doesn't understand how relational databases work. Or "the cloud", for that matter.

    1. Re:This article is garbage by Anonymous Coward · · Score: 0, Offtopic

      So Slashdot BI has value. It helps identify authors who don't know what they're talking about.

  14. Ironically, I just came to the opposite conclusion by Anonymous Coward · · Score: 0

    http://travispbrown.com/post/43167533260/a-tale-of-two-databases-dynamodb-and-mongodb

  15. Bad Choice by Anonymous Coward · · Score: 0

    Mongo has more punch

  16. My migration path by Lieutenant_Dan · · Score: 5, Funny

    We decided that MongoDB was adequate but didn't leverage the synergies we were trying to harvest from our development methodologies.

    We looked at GumboDB and found it was lacking in visualization tools to create a warehouse for our data that would provide a real-time dashboard of the operational metrics we were seeking.

    Next up was SuperDuperDB which was great from a client-server-man-in-the-middle perspective but required a complex LDAP authentication matrix that reticulated splines within our identity management roadmap.

    After that I quit. I hear they are using Access 95 with VBA.

    --
    Wearing pants should always be optional.
    1. Re:My migration path by mcmonkey · · Score: 2

      After that I quit. I hear they are using Access 95 with VBA.

      I think you're trying to be funny (or at least sarcastic) but the last time I worked on a system that stored multiple values in a field as a delimited string--as this guy proposes storing multiple authors and keywords--was for a late 90s dotcom running a web site off of an Access 97 mdb.
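      For anyone who hasn't been bitten by this, here is a small sketch of why a delimited string is a fragile way to hold multiple values. The author names are made up for illustration, and the JSON alternative is just one option, not something TFA describes.

```python
import json

# Made-up example values; the first one contains the delimiter itself.
authors = ["Doe, Jane", "John Smith"]

# Delimited-string storage: the comma inside the first name is
# indistinguishable from the separator once the values are joined.
packed = ",".join(authors)
unpacked = packed.split(",")

# One alternative: serialize the list so the structure survives a round trip.
encoded = json.dumps(authors)
decoded = json.loads(encoded)

print(unpacked)  # three fragments instead of two authors
print(decoded)   # the original two authors
```

      Escaping the delimiter would also work, but then every reader and writer of the field has to agree on the escaping rules.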

    2. Re:My migration path by Anonymous Coward · · Score: 0

      I think you're trying to be funny (or at least sarcastic) but the last time I worked on a system that stored multiple values in a field as a delimited string--as this guy proposes storing multiple authors and keywords--was for a late 90s dotcom running a web site off of an Access 97 mdb.

      We used to do that but we found the locking issues on an Access 97 database to be completely unacceptable and the install dependencies too limiting in the cloud we've moved our tree house to. Since then we've implemented a new raw file based system that doesn't have the nasty performance overhead of locking systems and integrity and is really simple to implement.

      System.IO.File.ReadAllLines(SettingsConfig.Instance.DatabaseFile).Contains("search query")

      We've got it implemented in about every platform including XSLT but those are only for paying users of our "pro" product.

  17. Ironically, I came to the opposite conclusion by travispbrown · · Score: 2
    1. Re:Ironically, I came to the opposite conclusion by Anonymous Coward · · Score: 1

      Where's the irony exactly?

    2. Re:Ironically, I came to the opposite conclusion by travispbrown · · Score: 1

      I guess it's ironic that we both recently posted articles that documented our thought process on choosing a NoSQL database, but came to opposite conclusions. Apologies, maybe it was just coincidental?

    3. Re:Ironically, I came to the opposite conclusion by sexconker · · Score: 1

      Where's the irony exactly?

      Unless Travis Brown ejaculated while reaching his conclusion, there is none.
      http://www.youtube.com/watch?v=WY_amJ0YZrM

    4. Re:Ironically, I came to the opposite conclusion by travispbrown · · Score: 4, Funny

      I'm sorry that I did not use the word "Ironically" correctly. You win internet.

    5. Re:Ironically, I came to the opposite conclusion by Jeng · · Score: 1

      Did you submit it as an article here?

      If not please do.

      --
      Don't know something? Look it up. Still don't know? Then ask.
    6. Re:Ironically, I came to the opposite conclusion by Anonymous Coward · · Score: 1

      Yes, there is nothing ironic about that at all. Even when applying the retarded definition of irony popularized by Alanis Morissette.

    7. Re:Ironically, I came to the opposite conclusion by travispbrown · · Score: 1

      Thanks! I just submitted it. My first submission to slashdot.

    8. Re:Ironically, I came to the opposite conclusion by Anonymous Coward · · Score: 0

      Travis's expectation is that he made the "right" choice, and one of the products is better. This article postulates otherwise, which is against his expectations. That, my friend, is called irony. A clash between expectations and reality. Would you like some more help understanding this?

    9. Re:Ironically, I came to the opposite conclusion by Jeng · · Score: 1

      I gave it a bump in the firehose, but who knows if it will make it to the main page.

      --
      Don't know something? Look it up. Still don't know? Then ask.
    10. Re:Ironically, I came to the opposite conclusion by Anonymous Coward · · Score: 0

      That's not irony.

    11. Re:Ironically, I came to the opposite conclusion by Anonymous Coward · · Score: 0

      Merriam Webster, especially (3a):

      incongruity between the actual result of a sequence of events and the normal or expected result

      If you truly were not aware of the definition of irony, let me suggest the irony of your attempt to correct others' use of the word.

      *You might argue that it was not ironic to you because you had no such expectations. But, you would still be unable to argue that it was not ironic to Travis; his use of the word remains correct.

      **captcha: commando

    12. Re:Ironically, I came to the opposite conclusion by Anonymous Coward · · Score: 0

      Whatever, Travis. Stop defending yourself as AC. It's pretty lame.

  18. Related: Why waffle makers are better than puppies by Anonymous Coward · · Score: 0

    We decided we wanted something comforting, so naturally we chose a waffle maker.

    Is it just me, or is anyone else tired of seeing authors trying to pass off stuff like that as reasoning?

  19. What? by Anonymous Coward · · Score: 0

    This guy is seriously throwing all his data into one comma delimited field? What's the database for again?

    1. Re:What? by Desler · · Score: 1

      His solution becomes web scale.

  20. Cogswell has the analytical skills of a wet noodle by Anonymous Coward · · Score: 0

    Why does slashdot keep giving him exposure?

  21. Comment removed by account_deleted · · Score: 3, Insightful

    Comment removed based on user account deletion

  22. Reinventing the wheel by Anonymous Coward · · Score: 0

    Re: "at times it felt like going back two decades in technology by writing indexes ourselves"

    More like double that, to four decades. Custom written index maintenance code? Really!? This is no kind of positive recommendation for DynamoDB, more like an indictment of it.

  23. Re:The crux of the entire article... by Anonymous Coward · · Score: 0

    MongoDB is free, if you don't care about "vendor" support. There's certainly a big enough community around MongoDB where 99% of your problems can be answered by simply googling it. Fail on the author's part.

  24. so what he is saying is by Anonymous Coward · · Score: 0

    So what he is saying is: the tools aren't mature, you have to re-invent the wheel with them, the wheel they invented is ok, but you will have to invent your own wheel. Somehow, that's all good though, you should try to follow what we did instead of using mature tools that already exist to build web infrastructure. We have religion about some of the products we use, and hope you will pick some of the tools we used for the same irrational reasons we used. It will take more time, cost more and won't get you any further ahead, but you might feel warm and fuzzy inside afterwards. On the other hand, you might not get as far as us, there is no code sharing so you are all on your own, and the time delays and extra costs might just kill your business/idea. This is news for nerds, just not good news for nerds. More like a cautionary tale.

  25. I just sneezed into my punch cards by adnonsense · · Score: 2

    FTFA:

    We weren't thrilled about this, because writing your own indexes can be problematic. Any time we stored a document, we would have to update the index. That's fine, except if anything goes wrong in between those two steps, the index would be wrong. However, the coding wouldn't be too terribly difficult, and so we decided this wouldn't be a showstopper. But just to be sure, we would need to follow best practices, and include code that periodically rebuilds the indexes.

    Hello, I'm a time traveller from 1973 where I've been fondly imagining you folks in the future had written software to solve this kind of problem in a more generic fashion. Back in the past we have some visionary guy by the name of Codd, and in my wilder dreams I sometimes imagine by the year 2000 someone has created some kind of revolutionary database software which is based on his "SEQUEL" ideas and does fancy stuff like maintaining its own indexes.

    Then I wake up and realise it was just a flight of fantasy.
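    For the curious, the failure mode TFA describes (document written, index not) and the "periodically rebuild" workaround can be sketched in a few lines. This is not TFA's code: the dicts stand in for the database, and store_document, rebuild_index and the fail_before_index flag are hypothetical names used only to simulate a crash between the two steps. It only handles inserts; updates that drop a keyword would also need the stale index entry removed.

```python
from collections import defaultdict

documents = {}                    # doc_id -> {"keywords": [...]}
keyword_index = defaultdict(set)  # keyword -> set of doc_ids


def store_document(doc_id, keywords, fail_before_index=False):
    """Two separate writes: the document first, then the hand-rolled index."""
    documents[doc_id] = {"keywords": list(keywords)}  # step 1: write the document
    if fail_before_index:
        # Simulated crash between the two steps; the index is now stale.
        raise RuntimeError("simulated crash between the two writes")
    for kw in keywords:                               # step 2: update the index
        keyword_index[kw].add(doc_id)


def rebuild_index():
    """Recompute the whole index from the documents, repairing any drift."""
    keyword_index.clear()
    for doc_id, doc in documents.items():
        for kw in doc["keywords"]:
            keyword_index[kw].add(doc_id)
```

    Between a failed write and the next rebuild, keyword searches silently miss the document, which is exactly the window a database that maintains its own indexes transactionally does not have.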

  26. Why this approach? by Anonymous Coward · · Score: 0

    tens of thousands of articles

    What a modest amount of data. Why not just import them into a robust open source CMS?

    The client was a small organization with only six employees and a tight budget.

    Why spend their budget researching and developing a custom DB solution, which will probably be impossible for anyone else to support or extend?

    their Website received a good amount of traffic from a niche group of scientists and researchers

    How much traffic? Since articles and metadata hardly change very often, why bother doing the optimizations for structural performance in your DB? Varnish or any other cache engine will handle that for you.

    It's probably a very fast scalable DB solution, but I am under the impression that clients want robust, cheap, easy to maintain and use products and not databases or other similar technologies or tools we developers choose among.

    I wonder if the team also hand coded modules for authentication, wysiwyg editors, metadata editors, transliteration and security features etc. within the tight budget.

    I'm all for trying out all kinds of engines, but unless the client specifically asked for a hand coded database experiment I'm sure they are in for unpleasant surprises in the near future