Slashdot Mirror


Making Sense of the NoSQL Standouts

snydeq writes "InfoWorld's Peter Wayner provides an overview of the more compelling NoSQL data stores on offer today in hopes of helping IT pros get started experimenting with these powerful tools. From Cassandra, to MongoDB, to Neo4J, each appears geared for a particular set of application types, providing DBAs with a wealth of opportunity for experimentation, and a measure of confusion in finding the right tool for their environment. 'There are great advantages to this Babelization if the needs of your project fit the abilities of one of the new databases. If they line up well, the performance boosts can be incredible because the project developers aren't striving to build one Dreadnought to solve every problem,' Wayner writes. 'The experimentation is also fun because the designers don't feel compelled to make sure their data store is a drop-in replacement that speaks SQL like a native.'"

2 of 152 comments

  1. not worth reading by rla3rd · · Score: 5, Informative

    Don't bother reading this fluff. Wikipedia offers a better overview. http://en.wikipedia.org/wiki/NoSQL. Oh I forgot, this is Slashdot, no one here reads the articles :).

    1. Re:not worth reading by doublebackslash · · Score: 4, Informative

      The abridged version:
      Atomicity: actions or sets of actions complete or they don't. No half states. Ever.
      Consistency: The database has rules. Rules like, "this can only be X when X exists in this other table" or "You cannot put a picture of a jabberwocky in this column." The rules are always obeyed even if one transaction fails. The DB itself will still be clean.
      Isolation: Everything accessing the DB views it as if it were the only thing accessing the DB.
      Durability: If the DB tells you it happened that means that you could yank the network jack, axe the power, or any other Bad Thing(tm) and so long as the disks are still there and intact your data also will be.
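
      To make atomicity and consistency concrete, here is a minimal sketch using Python's built-in sqlite3 module. The accounts table, the names in it, and the transfer helper are made up for illustration; the only rule is a CHECK that a balance can never go negative.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst; both updates land or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            # If this breaks the CHECK rule, the credit above is undone too.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False

def balances(conn):
    """Return the current balances as a {name: balance} dict."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))

# Hypothetical example data: an in-memory database with one rule.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()
```

      Calling transfer(conn, "alice", "bob", 500) credits bob first, then fails the CHECK on alice, and the rollback leaves both rows exactly as they were: no half state, and the rule still holds.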

      That is SQL. NoSQL: Pick three, or two.

      Is it faster? You bet your ass it is. The limitations are, generally, that the DB won't do things like JOINs for you, or perhaps you have to deal with the idea of a half state, etc. Aside from ACID guarantees being, generally, broken the DB might act more as a key->value lookup (think a dictionary or encyclopedia, but with data). It might not have rigid fixed columns (some SQL databases do this too, but it is not a standard feature and generally comes with more cost vs a NoSQL that offers it).
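
      A rough sketch of what "the DB won't do JOINs for you" means in practice, with plain Python dicts standing in for a key->value store. The users/comments data and the comments_with_author helper are invented for the example and not tied to any particular NoSQL product.

```python
# Two "tables" held as plain key -> value maps, the way a simple
# key-value store would expose them. All data here is hypothetical.
users = {
    "u1": {"name": "rla3rd"},
    "u2": {"name": "doublebackslash"},
}
comments = {
    "c1": {"user_id": "u1", "score": 5},
    # No fixed columns: this record carries an extra "parent" field.
    "c2": {"user_id": "u2", "score": 4, "parent": "c1"},
}

def comments_with_author(comments, users):
    """Application-side join: look up each comment's author by key."""
    joined = []
    for cid, comment in sorted(comments.items()):
        # A missing user yields None; nothing enforces the reference.
        author = users.get(comment["user_id"], {}).get("name")
        joined.append({"id": cid, "author": author, "score": comment["score"]})
    return joined
```

      The work a SQL engine would do for a JOIN moves into application code, and nothing stops a comment from pointing at a user that does not exist.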

      NoSQL is useful, though, if you have a tremendous (REALLY REALLY huge, I mean it has to be worth it!) data set or some strange demanding special need. Some things don't need isolation because the actions are intrinsically isolated (Slashdot comments, for example, are just appended and only one column needs to be mutated (the moderation)). Durability might not need to be met at the disk level; you might be comfortable with writing it to two nodes' memory (Cassandra even lets you return after it is in the target node's memory and after it has been flushed to the network send buffer. You know, to kill those pesky nanoseconds of latency). If your nodes are good and isolated this might be fine. Atomicity might not be a big deal.... though I can't think of any that don't provide THIS. Atomicity is really rather important almost everywhere. Getting rid of fixed tables or "relations" (foreign keys) makes consistency a non-issue. Consistency is one of the first things to be tinkered with in most of these NoSQL things, though it is not 100% gone (still can't put that jabberwocky in that int column!)
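
      The durability trade-off can be sketched with a toy model. This is not Cassandra's API; the Node class and the write function are hypothetical and only show the difference between acknowledging a write once it is in the replicas' memory and acknowledging it only after it has reached disk.

```python
class Node:
    """Toy replica: `memory` is lost on a crash, `disk` survives it."""

    def __init__(self):
        self.memory = []
        self.disk = []

    def receive(self, value):
        self.memory.append(value)

    def flush(self):
        # Move everything buffered in memory onto "disk".
        self.disk.extend(self.memory)
        self.memory.clear()

    def crash(self):
        # Power loss: whatever was only in memory is gone.
        self.memory.clear()

    def has(self, value):
        return value in self.memory or value in self.disk


def write(nodes, value, wait_for_disk):
    """Send `value` to every node; acknowledge early or only after a flush."""
    for node in nodes:
        node.receive(value)
        if wait_for_disk:
            node.flush()
    return True  # the acknowledgement the client sees
```

      With wait_for_disk=False the client gets its acknowledgement sooner, and the write survives one node crashing but not both at once. That is the "good and isolated nodes" bet in miniature.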

      So by trading off some guarantees for a simpler DB one can gain speed, and some of the burden of working within the confines of that guarantee system can be lifted from the programmer. However, an ACID SQL system is universal (can store anything and meet any guarantees you require, but not necessarily quickly). NoSQL systems only work for some workloads and requirements. Almost (but not quite) anything can be shoehorned into them, but whether it is a good idea remains a question to ask before you dive right in. If you can see gain from NoSQL then it might be a good idea, but don't paint yourself into a corner where you trade a working system of moderate speed for a blazingly fast system that has subtle (or blatant!) flaws which affect your company or customers.

      Hope that helps!

      --
      md5sum /boot/vmlinuz
      d41d8cd98f00b204e9800998ecf8427e /boot/vmlinuz