Slashdot Mirror


NoSQL Document Storage Benefits and Drawbacks

Nerval's Lobster writes "NoSQL databases sometimes feature a concept called document storage, a way of storing data that differs in radical ways from the means available to traditional relational SQL databases. But what does 'document storage' actually mean, and what are its implications for developers and other IT pros? This SlashBI article focuses on MongoDB; the techniques utilized here are similar in other document-based databases."

13 of 96 comments

  1. Same as the old boss by greg1104 · · Score: 5, Interesting

    It's so cute how NoSQL developers have reinvented the XML database.

    1. Re:Same as the old boss by Nadaka · · Score: 3, Informative

      JSON is crap for storing arbitrary structured data and collections for web applications.

      In JavaScript you can easily construct an object that is both an "Array" and has named attributes (an associative array). However, you can't recreate that object with valid JSON.
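      A minimal sketch of the round-trip problem described above, using only the standard JSON.stringify and JSON.parse; the variable names are made up for illustration:

```javascript
// An Array that also carries a named attribute.
const mixed = ["a", "b", "c"];
mixed.foo = "bar";

// JSON.stringify serializes an Array by its indexed elements only,
// so the named attribute is silently dropped.
const serialized = JSON.stringify(mixed);
const restored = JSON.parse(serialized);

console.log(serialized);              // ["a","b","c"]
console.log(restored.foo);            // undefined
console.log(Array.isArray(restored)); // true
```

      The array part survives the trip; the named attribute does not.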

      JSON also introduces a fantastic new method of inserting arbitrarily executing code into a web application, demanding yet another set of defenses against insertion attacks to be developed.

      It is a problem masquerading as a solution to a problem it can't actually solve.
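      On the insertion concern above: the usual risk comes from handing JSON text to something that evaluates it as code, and a strict parser is the common defense. A rough sketch, assuming a hypothetical attacker-controlled string and a made-up helper named parseUntrusted:

```javascript
// Hypothetical attacker-controlled response body: valid-looking JSON
// followed by a statement that would run if the text were evaluated as code.
const untrusted = '{"user":"alice"}; globalThis.compromised = true;';

// JSON.parse accepts only the JSON grammar and throws on anything else,
// so the trailing statement is never executed.
function parseUntrusted(text) {
  try {
    return { ok: true, value: JSON.parse(text) };
  } catch (err) {
    return { ok: false, error: err.name };
  }
}

const rejectedPayload = parseUntrusted(untrusted);
const acceptedPayload = parseUntrusted('{"user":"alice"}');

console.log(rejectedPayload.ok, rejectedPayload.error);      // false SyntaxError
console.log(acceptedPayload.ok, acceptedPayload.value.user); // true alice
console.log(globalThis.compromised);                         // undefined
```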

  2. The article is barely a description of MongoDB... by Nadaka · · Score: 5, Informative

    The article is barely a description of MongoDB records. It does not really detail any real drawbacks or benefits beyond "look ma, random structure in my record!"

  3. Another NoSQL article on /. by Sarten-X · · Score: 5, Insightful

    Oh, look, it's a NoSQL article.

    Cue the hundreds of Slashdotters who proclaim "Oh, they've reinvented obsolete databases" and "Just wait until they need ACID, then they'll be fucked", the NoSQL blind-faith followers who harp about pure scalability and clustering, and at least a dozen references to an animated video of a retarded strawman saying "webscale" repeatedly.

    Somewhere in the depths of poorly-researched comments will be some guy who thinks that NoSQL is a tool that really just might be useful for particular use cases, and should be used where appropriate, and nowhere else. Sadly, his post will be missed because everyone's too busy talking about how everything can be done just as easily on a $500,000 server farm running Oracle's latest and greatest turd.

    --
    You do not have a moral or legal right to do absolutely anything you want.

    1. Re:Another NoSQL article on /. by greg1104 · · Score: 5, Informative

      Sadly, his post will be missed because everyone's too busy talking about how everything can be done just as easily on a $500,000 server farm running Oracle's latest and greatest turd.

      Actually, I was going to talk about how PostgreSQL 9.2 (expected in Q3 of this year) will include JSON support. The database also has non-relational key value storage, and that feature is even available in Heroku deployments now.

      PostgreSQL also lets you relax ACID for performance when that makes sense, at the transaction level, using the synchronous_commit parameter and unlogged tables.
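      A sketch of what that looks like in SQL, assembled here as plain strings so it runs without a database. SET LOCAL synchronous_commit and CREATE UNLOGGED TABLE are real PostgreSQL statements; the helper function, table name and columns are invented for illustration:

```javascript
// Hypothetical helper (not part of any PostgreSQL driver): wraps statements
// in a transaction whose commit does not wait for the WAL flush to disk.
function relaxedDurabilityTransaction(statements) {
  return [
    "BEGIN;",
    // SET LOCAL limits the setting to the current transaction.
    "SET LOCAL synchronous_commit TO OFF;",
    ...statements,
    "COMMIT;",
  ];
}

// Unlogged tables skip write-ahead logging; their contents are truncated
// after a crash. The table and column names here are made up.
const createUnlogged =
  "CREATE UNLOGGED TABLE page_hits (url text PRIMARY KEY, hits integer NOT NULL);";

const script = [
  createUnlogged,
  ...relaxedDurabilityTransaction([
    "INSERT INTO page_hits (url, hits) VALUES ('/index', 1);",
  ]),
];

console.log(script.join("\n"));
```

      The trade-off is durability only: a crash can lose the most recent commits, which is why this is something to enable per transaction rather than everywhere.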

      There are two things PostgreSQL doesn't do as well as MongoDB. It won't do simple key/value lookups quite as fast; I normally eliminate that problem by putting a memcached server in at some level. And you can't split writes among multiple nodes easily yet.
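      The memcached arrangement mentioned above is usually a cache-aside read. A minimal sketch, with a Map standing in for memcached and a plain object standing in for a PostgreSQL table; both stand-ins and all names are assumptions, not a real client API:

```javascript
// Stand-ins, both hypothetical: a Map plays the role of memcached and
// a plain object plays the role of a PostgreSQL table.
const cache = new Map();
const table = { "user:1": "alice", "user:2": "bob" };
let databaseReads = 0;

function readFromDatabase(key) {
  databaseReads += 1;
  return Object.prototype.hasOwnProperty.call(table, key) ? table[key] : null;
}

// Cache-aside read: try the cache, fall back to the database,
// then populate the cache for the next caller.
function lookup(key) {
  if (cache.has(key)) {
    return cache.get(key);
  }
  const value = readFromDatabase(key);
  if (value !== null) {
    cache.set(key, value);
  }
  return value;
}

console.log(lookup("user:1")); // alice (read from the database)
console.log(lookup("user:1")); // alice (served from the cache)
console.log(databaseReads);    // 1
```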

    2. Re:Another NoSQL article on /. by Hognoxious · · Score: 3, Funny

      For the benefit of readers in the US, F1 is like Indycar but the cars can turn both right and left.

      --
      Confucius say, "Find worm in apple - bad. Find half a worm - worse."

  4. Article is not useful by claytongulick · · Score: 5, Insightful

    I'm not sure what the point of this "article" is. It is light on actual information or anything useful; it's basically just a few paragraphs that say "a NoSQL database called Mongo stores data in JSON format. This may or may not work for you".

    If we're going to have "BI" articles, they should be informative, containing useful information that we couldn't have gathered ourselves in 10 secs of googling.

    How about some comparisons between various NoSQL solutions? How about a binary access API vs. a RESTful approach à la Couch? How about clustering, replication and scalability? How about stability concerns (with Couch, for example)? Real-world use cases? Examples of companies using them for specific solutions? Performance comparisons with RDBMSs? Problem domains that a NoSQL/schemaless DB is more suited to than an RDBMS?

    I'm not trying to be pointlessly critical here; I'm trying to provide some constructive feedback on the new Slashdot BI format. This article wasn't useful to me at all. I'll probably not spend time reading these articles in the future if the content is as light as this article.

    --
    Drinking habits can be dangerous. You can choke on the cloth and the nuns will wonder where their clothes are.

  5. Unstructured Data by Bigby · · Score: 4, Interesting

    I don't know when unstructured data turned into NoSQL or Big Data, but it is a pretty simple concept with complex Enterprise level requirements. I work in this field and have for various companies. The biggest obstacle is conforming to the laws of various jurisdictions and levels of government.

    You have unstructured data, but it NEEDS some level of structure. That structure is there to restrict access to certain groups within the organization and also for retention rules, which differ by type of data being stored. Not to mention that you must store certain documents in the country of origin, so structured field-based distributed storage plays a role. Oh yeah, and there are laws/policies around encryption, and the question of whether or not an index violates those laws/policies.

    This doesn't work well with a relational database. Sure, you can jam it into an RDBMS like IBM Content Manager, but it becomes inflexible. However, there are constraints that must be followed and all documents need some kind of structure wrapped around them in an RDBMS-like fashion.

    I haven't dived into these NoSQL systems myself. They seem like a good idea, but I hesitate if they are too loose. In an Enterprise with sensitive information, you need to deny first. Also, how do they index the fields? Like when you have 100,000,000 documents with invoice numbers...

    1. Re:Unstructured Data by Bigby · · Score: 3, Interesting

      Some of those documents with invoice numbers are not invoices. In fact, they could have many invoice numbers. And invoice numbers are just an example. There is a lot of value to a company in finding all documents relating to product #XYZ that was shipped to company ABC. Maybe throw some date constraints in there. And they don't want useless garbage in the results. Also, all invoices should have an invoice number. And an invoice number should have a certain pattern. Otherwise, garbage-in garbage-out.
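      To make the indexing requirement concrete, here is a small sketch of a multi-valued index with pattern validation. The invoice format, the document shapes and the function name are all assumptions for illustration, not how any particular document store does it:

```javascript
// Assumed invoice-number format, purely for illustration: "INV-" plus six digits.
const INVOICE_PATTERN = /^INV-\d{6}$/;

// Hypothetical documents; one is not an invoice but cites two invoice numbers.
const documents = [
  { id: "doc1", type: "invoice", invoiceNumbers: ["INV-000123"] },
  { id: "doc2", type: "shipping-note", invoiceNumbers: ["INV-000123", "INV-000456"] },
  { id: "doc3", type: "invoice", invoiceNumbers: ["123"] }, // malformed number
];

// Build a multi-valued index: invoice number -> ids of every document
// that mentions it. Malformed numbers are collected instead of indexed.
function buildInvoiceIndex(docs) {
  const index = new Map();
  const malformed = [];
  for (const doc of docs) {
    for (const number of doc.invoiceNumbers) {
      if (!INVOICE_PATTERN.test(number)) {
        malformed.push({ id: doc.id, number });
        continue;
      }
      if (!index.has(number)) {
        index.set(number, []);
      }
      index.get(number).push(doc.id);
    }
  }
  return { index, malformed };
}

const built = buildInvoiceIndex(documents);
console.log(JSON.stringify(built.index.get("INV-000123"))); // ["doc1","doc2"]
console.log(JSON.stringify(built.malformed));               // [{"id":"doc3","number":"123"}]
```

      Rejecting the malformed number at index time is the "garbage-in" half of the problem; whether to refuse the whole document or just flag it is a policy decision the sketch leaves open.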

      Also, the part where RDBMS-based document storage falls flat on its face is versioning of the schema itself. Business requirements change; they want to require a field that wasn't required before. They want to make one optional. They want to change the type or the pattern format. But the searches should still go across all those documents. NoSQL-based stuff, assuming it is properly and efficiently indexed, may do better in this department.
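      One way to picture searching across schema versions is to map each stored version onto a single search-time shape. This is only a sketch under assumed field names and a made-up schemaVersion marker, not a description of any particular product:

```javascript
// Hypothetical documents stored under two schema versions: in version 2
// the field was renamed and "customer" became required.
const stored = [
  { schemaVersion: 1, product: "XYZ" },
  { schemaVersion: 2, productCode: "XYZ", customer: "ABC" },
  { schemaVersion: 2, productCode: "QRS", customer: "ABC" },
];

// Map every stored version onto one search-time shape.
function normalize(doc) {
  if (doc.schemaVersion === 1) {
    return { productCode: doc.product, customer: null };
  }
  return { productCode: doc.productCode, customer: doc.customer };
}

// Search across all versions by a field that exists, under different
// names, in each of them.
function findByProduct(docs, productCode) {
  return docs.filter((doc) => normalize(doc).productCode === productCode);
}

const matches = findByProduct(stored, "XYZ");
console.log(matches.length); // 2
```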

  6. Re:The article is barely a description of MongoDB. by Moses48 · · Score: 3, Insightful

    I read this article with the hope of seeing some of the benefits and drawbacks (as the title implied). No talk of scalability, indexing, speed, etc. I actually feel dumber for having read the article.

  7. Worse than the old boss by jd · · Score: 5, Interesting

    The "old old boss" would be the CDF/NetCDF/HDF family of self-describing distributed storage solutions. They predate XML by a long way and are - I believe - the first true self-describing method of storing, indexing and searching data.

    For the most part, they support network interconnections between instances, so you can have your virtual storage distributed over as many physical systems as you like. The users will never see the difference except in terms of speed. This gives you all the benefit of NoSQL's distributed model (which XML lacks) but with several decades more development in the database design.

    But wait! There's more! If you order in the next gazillion years, you get OPeNDAP absolutely free! (Which it is anyway.) OPeNDAP will translate between any two data formats, so if one site wants to view the data as, say, a conventional database, another wants to look at it as a collection of spreadsheets and a third is expecting XML data, you'd have OPeNDAP translate between client form and central repository form.

    I have no objections to Mongo or Memcache, they're very powerful and are very useful, but we're still ultimately talking about technology everyone else has had since 1985, thanks be to NASA, and many NoSQL technologies are really just network-aware versions of the DBM/NDBM/BDB/GDBM/QDBM family which have existed since Unix began.

    NoSQL definitely has a place - I would not want to try serving cached web data from HDF5 - and it's an important place. But that's just as true for Hierarchical Databases, Star Databases (aka "Data Warehouses"), "genuine" (ie: actually complies with Codd's rules) relational databases (SQL isn't truly relational in the Codd model, merely a subset), and so on.

    It's time we got away from one-size-fits-all ideas, which violate the Unix ethos anyway, and got back to using the best solutions for specific problems rather than passable solutions that fail at everything. These are all wonderful, highly specialized solutions to highly specific problem types. Treating them as such will always produce a better answer than force-fitting solutions into not-quite-failing with problems they aren't designed for.

    --
    It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)

  8. Re:Wrong, wrong, and wrong. by gazbo · · Score: 4, Informative

    {"1":"a","2":"b","3":"c","foo":"bar"}

    Sure it won't create an instance of Array, but if you're using an Array to also be an associative array then really I think JSON is the least of your worries.
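    For what it's worth, a quick sketch of how such an object behaves once parsed, using the strict-JSON form with quoted keys; the variable names are made up:

```javascript
// The same data as a JSON object with string keys.
const text = '{"1":"a","2":"b","3":"c","foo":"bar"}';
const parsed = JSON.parse(text);

console.log(Array.isArray(parsed)); // false
console.log(parsed[1]);             // a  (the numeric index is coerced to the key "1")
console.log(parsed.foo);            // bar
console.log(parsed.length);         // undefined: a plain object has no Array length
```

    Indexed and named access both work; what is lost is everything Array-specific, such as length and the iteration methods.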

  9. Re:Wrong, wrong, and wrong. by gazbo · · Score: 3, Funny

    Or if you want to be avant garde, I suppose you could begin the numbering at zero *blames wine*