Slashdot Mirror


Making Sense of the NoSQL Standouts

snydeq writes "InfoWorld's Peter Wayner provides an overview of the more compelling NoSQL data stores on offer today in hopes of helping IT pros get started experimenting with these powerful tools. From Cassandra, to MongoDB, to Neo4J, each appears geared for a particular set of application types, providing DBAs with a wealth of opportunity for experimentation, and a measure of confusion in finding the right tool for their environment. 'There are great advantages to this Babelization if the needs of your project fit the abilities of one of the new databases. If they line up well, the performance boosts can be incredible because the project developers aren't striving to build one Dreadnought to solve every problem,' Wayner writes. 'The experimentation is also fun because the designers don't feel compelled to make sure their data store is a drop-in replacement that speaks SQL like a native.'"

21 of 152 comments

  1. One page by just_another_sean · · Score: 2

    less ads.

    Print version

    --
    Creationist Textbook Stickers Declared Unconstitutional by CowboyNeal
  2. not worth reading by rla3rd · · Score: 5, Informative

    Don't bother reading this fluff. Wikipedia offers a better overview: http://en.wikipedia.org/wiki/NoSQL. Oh, I forgot, this is Slashdot, no one here reads the articles :).

    1. Re:not worth reading by houstonbofh · · Score: 2

      I just read it for the centerfolds.

    2. Re:not worth reading by doublebackslash · · Score: 4, Informative

      The abridged version:
      Atomicity: actions or sets of actions complete or they don't. No half states. Ever.
      Consistency: The database has rules. Rules like, "this can only be X when X exists in this other table" or "You cannot put a picture of a jabberwocky in this column." The rules are always obeyed even if one transaction fails. The DB itself will still be clean.
      Isolation: Everything accessing the DB views it as if it were the only thing accessing the DB.
      Durability: If the DB tells you it happened that means that you could yank the network jack, axe the power, or any other Bad Thing(tm) and so long as the disks are still there and intact your data also will be.

      That is SQL. NoSQL: Pick three, or two.
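
      A minimal sketch of the "A" and the "C" above, using Python's built-in sqlite3 module as a stand-in for an ACID SQL database. The accounts table, the transfer() helper and the overdraft rule are all made up for illustration; the point is only that a failed transaction leaves no half state behind.

```python
import sqlite3

# In-memory SQLite database standing in for "an ACID SQL system".
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"  # the "rule" (consistency)
)
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 0)")
conn.commit()


def transfer(conn, src, dst, amount):
    """Hypothetical helper: both updates happen, or neither does (atomicity)."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False  # the CHECK constraint refused the overdraft


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts"))


ok = transfer(conn, "alice", "bob", 30)         # fits within the rules
rejected = transfer(conn, "alice", "bob", 500)  # would overdraw alice
final = balances(conn)
```

      In the rejected transfer the credit to bob is applied first and then undone when the debit breaks the rule, so the table ends up exactly as it was after the first transfer.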

      Is it faster? You bet your ass it is. The limitations are, generally, that the DB won't do things like JOINs for you, or perhaps you have to deal with the idea of a half state, etc. Aside from ACID guarantees being, generally, broken, the DB might act more as a key->value lookup (think a dictionary or encyclopedia, but with data). It might not have rigid fixed columns (some SQL databases do this too, but it is not a standard feature and generally comes with more cost vs. a NoSQL that offers it).
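
      To make "the DB won't do JOINs for you" concrete, here is a rough sketch with two plain Python dicts standing in for key->value stores. The users and comments data and the comments_with_authors() helper are invented for the example; no particular NoSQL product works exactly like this.

```python
# Two plain dicts standing in for two key->value "buckets".
users = {
    "u1": {"name": "bob", "city": "Springfield"},
    "u2": {"name": "alice"},  # no fixed columns: this record has no "city"
}
comments = {
    "c1": {"user_id": "u1", "text": "First post"},
    "c2": {"user_id": "u2", "text": "Not worth reading"},
    "c3": {"user_id": "u1", "text": "Hope that helps!"},
    "c4": {"user_id": "u9", "text": "orphan"},  # nothing stops a dangling reference
}


def comments_with_authors(comments, users):
    """Application-side 'JOIN': one extra key lookup per comment."""
    joined = []
    for comment_id, comment in sorted(comments.items()):
        author = users.get(comment["user_id"])  # None if the user is missing
        name = author["name"] if author else None
        joined.append((comment_id, name, comment["text"]))
    return joined


rows = comments_with_authors(comments, users)
```

      The lookup by key is trivial and fast; the join logic, and the handling of the missing user, now live in your code instead of in the database.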

      NoSQL is useful, though, if you have a tremendous (REALLY REALLY huge, I mean it has to be worth it!) data set or some strange, demanding special need. Some things don't need isolation because the actions are intrinsically isolated (Slashdot comments, for example, are just appended and only one column needs to be mutated: the moderation). Durability might not need to be met at the disk level; you might be comfortable with writing it to two nodes' memory (Cassandra even lets you return after it is in the target node's memory and after it has been flushed to the network send buffer. You know, to kill those pesky nanoseconds of latency). If your nodes are good and isolated this might be fine. Atomicity might not be a big deal... though I can't think of any that don't provide THIS. Atomicity is really rather important almost everywhere. Getting rid of fixed tables or "relations" (foreign keys) makes consistency a non-issue. Consistency is one of the first things to be tinkered with in most of these NoSQL things, though it is not 100% gone (still can't put that jabberwocky in that int column!).
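
      A toy simulation of the durability trade-off described above. The Node class and replicated_write() are hypothetical and do not model Cassandra or any real database; they only show what "acknowledged once it is in two nodes' memory" buys you and what it doesn't.

```python
class Node:
    """Toy storage node: 'memory' is lost on a crash, 'disk' survives it."""

    def __init__(self):
        self.memory = {}
        self.disk = {}

    def write(self, key, value, flush):
        self.memory[key] = value
        if flush:  # the slow path: also persist before acknowledging
            self.disk[key] = value

    def crash(self):
        # After a restart only flushed data comes back.
        self.memory = dict(self.disk)


def replicated_write(nodes, key, value, flush):
    """Write to every replica, then acknowledge to the client."""
    for node in nodes:
        node.write(key, value, flush)
    return True


nodes = [Node(), Node()]
replicated_write(nodes, "comment:1", "memory only", flush=False)
replicated_write(nodes, "comment:2", "flushed to disk", flush=True)

nodes[0].crash()  # one node dies: the other still holds both writes in memory
survives_one_crash = any("comment:1" in n.memory for n in nodes)

nodes[1].crash()  # now every replica has crashed at least once
unflushed_survives = any("comment:1" in n.memory for n in nodes)
flushed_survives = any("comment:2" in n.memory for n in nodes)
```

      The memory-only write survives one node dying but not all of them, which is the "if your nodes are good and isolated this might be fine" bet.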

      So by trading off some guarantees for a more simplistic DB one can gain speed, and some degree of burden can be lifted from the programmer to work within the confines of that guarantee system. However, an ACID SQL system is universal (it can store anything and meet any guarantees you require, but not necessarily quickly). NoSQL systems only work for some workloads and requirements. Almost (but not quite) anything can be shoehorned into them, but whether it is a good idea remains a question to ask before you dive right in. If you can see gain from NoSQL then it might be a good idea, but don't paint yourself into a corner where you trade a working system of moderate speed for a blazingly fast system that has subtle (or blatant!) flaws which affect your company or customers.

      Hope that helps!

      --
      md5sum /boot/vmlinuz
      d41d8cd98f00b204e9800998ecf8427e /boot/vmlinuz
  3. Re:Bend Over ... by telekon · · Score: 2

    More typically, it goes:

    Dev: We should use MongoDB.
    DBA: THE END IS UPON US!!! The Beast and his armies shall rise from the Pit and make war against God!!! ZALGO!!! HE COMES!!!

    --

    To understand recursion, you must first understand recursion.

  4. Re:NoSQL is garbage, plain and simple. by Anonymous Coward · · Score: 5, Interesting

    If you view it as a SQL replacement, then yes, utter garbage. But if you take it for what it is, then no.

    The problem is there is a fad surrounding NoSQL, and young, ignorant, inexperienced developers think RDBMSs are for old farts who refuse to get with the times rather than viewing it as a different tool for solving a different problem. If you want/need ACID properties, you go with SQL. If you don't, NoSQL may be appropriate.

  5. In b4... by Anonymous Coward · · Score: 3, Informative

    This discussion is likely to lean towards "OMG NoSQL IS SO RETARDED!". So let me just say that if you don't care about NoSQL, then fine. If MySQL/Postgres/Oracle/MS-SQL fit your needs, then fine.

    That doesn't mean "NoSQL" databases are useless.

    I've had exposure to both MongoDB and CouchDB so far. CouchDB is the newest experience, as part of a Chef installation. Yes, it is a very immature product, and yes, it has a long way to go, but it's very simple to configure and it does its job with very few resources. I don't personally have a need for CouchDB myself, but I can see why people use it for certain specific needs (i.e. I can understand why Chef uses it).

    MongoDB is a little marvel for certain applications. In my current and previous jobs we've used MongoDB for Syslog collection and SMTP mail logging. MongoDB is excellent for this sort of thing: each log entry is a single entry in the collection, the data is NOT relational in any interesting way and the insertion rate is far beyond anything a traditional relational database engine could manage on the same hardware at the same resource utilisation. Even better you can write some quite clever Map/Reduce functions on top that allow you to do some amazingly deep inspections of the log data, so you can produce on-demand data as well as graph out long term trends.
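
    For anyone who hasn't seen the Map/Reduce idea applied to logs, here is a rough sketch in plain Python. It does not use MongoDB or its API; the log_entries documents, their field names and the map_reduce() driver are all assumptions made for illustration.

```python
from collections import defaultdict

# Each log line is one schema-less "document"; fields can vary per entry.
log_entries = [
    {"host": "mx1", "severity": "err", "msg": "connection refused"},
    {"host": "mx1", "severity": "info", "msg": "delivered", "size": 2048},
    {"host": "mx2", "severity": "err", "msg": "timeout"},
    {"host": "mx1", "severity": "err", "msg": "relay denied"},
]


def map_entry(entry):
    """Map step: emit one (key, value) pair per log entry."""
    yield (entry["host"], entry["severity"]), 1


def reduce_counts(key, values):
    """Reduce step: collapse all values emitted for one key."""
    return sum(values)


def map_reduce(entries, mapper, reducer):
    """Tiny single-process driver: group mapped values by key, then reduce."""
    grouped = defaultdict(list)
    for entry in entries:
        for key, value in mapper(entry):
            grouped[key].append(value)
    return {key: reducer(key, values) for key, values in grouped.items()}


counts = map_reduce(log_entries, map_entry, reduce_counts)
```

    Swapping in a different mapper (say, keyed by hour) gives you the long-term trend data without touching the stored entries.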

    NoSQL is NOT a replacement for traditional SQL databases, but it sure is useful for stuff where SQL databases struggle.

  6. Re:Bend Over ... by C0vardeAn0nim0 · · Score: 2

    actually:

    Dev: We should use MongoDB.
    DBA: BWAHAHAHAHAHAHA!!! NO !!! Oracle. get used to it or GTFO!

    --
    What ? Me, worry ?
  7. Really, wikipedia? by Anonymous Coward · · Score: 3, Funny

    Key-value store

    Key-value stores allow the application to store its data in a schema-less way. The data could be stored in a datatype of a programming language or an object. Because of this, there is no need for a fixed data model. This is generally of interest to friendless sperglords only.[16] The following types exist:

    Crowdsourcing at its finest.. Although, I suppose the comment is accurate?

  8. Mysql ITSELF is a "NoSQL" solution by mcrbids · · Score: 4, Interesting

    Sure, some solutions are faster than MySQL out of the box by skipping much of the language parsing and stuff that any SQL solution has to do. But that's not to say that they are actually more efficient at key retrieval.

    For example, one developer found that the best no-sql solution was.... MySQL, which excels at simple key retrieval. He was able to best MemCached by a factor of almost 2.

    Use the right tool for the job.

    --
    I have no problem with your religion until you decide it's reason to deprive others of the truth.
    1. Re:Mysql ITSELF is a "NoSQL" solution by Billly+Gates · · Score: 2

      The issue with SQL is with joins particularly. MySQL is not a NoSQL solution to this problem. If you do not use them and just need a single database you will be fine with traditional SQL. NoSQL won't be a benefit. If you host a simple website you won't run into that scalability problem.

      Now imagine you're a systems analyst who needs joins to do things, such as comparing a pricing database with a sales order database to see if a discount worked and by how much. This is where you need a join. Now imagine the size of each database is 1 terabyte. Also imagine you have to pull this data over a regular Ethernet connection shared by 100 other users and only offering 10 Mbps. Also imagine the database is distributed among a cluster of computers and the RDBMS needs to wait on the other servers to pull the whole table. See the performance problem?
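
      For reference, the kind of join meant here looks roughly like the sketch below, using Python's sqlite3 with tiny made-up tables. The prices and orders schemas and the "before"/"discount" periods are assumptions for the example; at terabyte scale across a cluster the query is the same, only the cost changes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE prices (
    sku TEXT, period TEXT, unit_price REAL,
    PRIMARY KEY (sku, period)
);
CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY, sku TEXT, period TEXT, quantity INTEGER
);
INSERT INTO prices VALUES ('widget', 'before', 10.0), ('widget', 'discount', 8.0);
INSERT INTO orders (sku, period, quantity) VALUES
    ('widget', 'before', 5), ('widget', 'before', 5),
    ('widget', 'discount', 10), ('widget', 'discount', 10);
""")

# Join every order to the price in force for its period, then total per period.
query = """
SELECT o.period,
       SUM(o.quantity)                AS units,
       SUM(o.quantity * p.unit_price) AS revenue
FROM orders AS o
JOIN prices AS p ON p.sku = o.sku AND p.period = o.period
GROUP BY o.period
ORDER BY o.period
"""
report = {period: (units, revenue) for period, units, revenue in conn.execute(query)}
```

      With this toy data the discount doubles the units sold and lifts revenue, which is exactly the question the analyst wanted answered.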

      Can noSQL offer a solution where I could do this?

      My above example is why companies love Oracle, and why people doing analysis or statistics, or even accountants, need joins. The problem with many databases and big iron and probably fiber optic connections and switches is that it gets very, very expensive and is too much for a startup. The licensing costs then come up as well. ZDNet (don't have the link) did an article showing it would cost $650,000,000 for Google to use Oracle to host YouTube. I can see why they went with their own solution.

  9. Re:NoSQL is garbage, plain and simple. by fusiongyro · · Score: 3, Interesting

    Yeah, the problem is that you want and need ACID, even if you don't know what it means. Very, very rarely, you may find yourself in a situation where availability demands are too great for systems with the ACID property, and then you should consider using one of these non-relational systems. The problem from where I'm sitting, is that too many young, ignorant, inexperienced developers think that their shitty little website needs to be prepared for handling millions of hits per second, and jump to two conclusions: one, that the problem is their database (and not the way they're using it), and two, that ACID should be thrown out the window to fix it.

    All other things being equal, you are much more likely to be implicitly depending on ACIDity than in a situation where demand is great enough that choosing NoSQL is worth the trouble you're going to get into.

  10. Chicken/Egg Problem (with NoSQL) by Manip · · Score: 2

    We want to jump on the NoSQL ship. I won't bore you with all of the details but, briefly put, SQL databases and tables are too restrictive for our work. Unfortunately, because there are SO many NoSQL solutions, and none of them are backed by big names, nobody here has the balls to sign off on one. Unfortunately, and ironically, NoSQL's biggest downside is the lack of cross-compatibility. Once you make that call you're stuck with it, good or bad.

    The other issue is that, because all of these solutions are relatively young, the toolsets simply don't exist for many of them. No libraries, backup solutions, third-party support, etc. I wish we'd see someone like Microsoft, Oracle, IBM, or any big name roll out some kind of complete solution (in particular XML compatible). I know a few big Cloud solutions exist, but again we come back to being locked into a solution.

    1. Re:Chicken/Egg Problem (with NoSQL) by bhcompy · · Score: 3, Informative

      Not every solution is young. PICK is a NoSQL db that predates SQL. Its descendants are supported and cross-compatible to a degree. NoSQL is a generic term. You need a specific database. For a PICK-based solution, I'd look at Reality. Reality has been around for decades and is highly supported and has many features for compatibility with modern databases and modern operating systems. OpenQM is GPL licensed and of the same class. jBASE might be a more recognizable descendant.

    2. Re:Chicken/Egg Problem (with NoSQL) by PCM2 · · Score: 2

      NoSQL is write once, never update but read often. SQL is read, write update all the time.

      And yet most MySQL installations (Web apps, anyway) are: read all the time; write some; update seldom. That's why MySQL became a popular database for Web apps -- it was faster for that model than Oracle (on the same hardware). SQL or the relational model wasn't the problem. The implementation was the problem.

      I'm sure there are some cases where NoSQL is absolutely game-changing -- but those cases seem rare, and where they have occurred, the companies that really need NoSQL seem to be the ones who invented it (as you might expect). But "Google uses it so I should" is a poor argument; you are not Google, no matter what your VP of sales likes to think.

      E.g. when you have to write gigabytes per second to the DB you are out of luck with any of today's RDBMSs.

      I suppose that's true, but can you really process gigabytes of data per second? Maybe this is a case for data warehousing, and you don't even use a traditional database to capture the data. Er, wait -- maybe I just gave a case for using NoSQL. But in this case, NoSQL isn't a replacement for a RDBMS, it's an adjunct to one, so I guess all I'm really saying is that it gets tiresome to read discussions of NoSQL this, NoSQL that, when most folks seem to have a poor understanding of the dimensions of their own problem spaces and they've chosen a tool before they've figured out how they'll use it.

      --
      Breakfast served all day!
  11. learn something useful first by roman_mir · · Score: 3, Interesting

    First you need to learn something useful, like understanding a normal database, such as PostgreSQL, SQLite, DB2 or whatever your heart desires (not MySQL, that's just not right). Only once you really understand the normal databases and your requirements can you make a statement by going 'nosql'; otherwise it is most likely counterproductive in most scenarios. You are not all FBs out there.

  12. Breaking the backs of DBAs by EmperorOfCanada · · Score: 4, Interesting

    One of the many reasons that programmers I know are adopting these technologies is that it breaks the back of the in-house DBA. Often there are a few in-house DBAs with certifications up the wazoo who squeeze themselves into every project that has to store data (all projects). But somehow their word becomes the final word. Getting a table added to a schema can take days or even weeks and might not be approved at all. Suddenly, with MongoDB or whatever, the DBA has no possible input. One can make all kinds of arguments for and against relational systems and how valuable a DBA is to the long-term health of a datastore, but from many developers' / project managers' perspective a modern DBA often acts as a brick wall to on-time, on-budget delivery.

    1. Re:Breaking the backs of DBAs by Tenareth · · Score: 2

      One of the main reasons for this is that the DBAs are the ones that keep the production environment functioning. Devs get to put in whatever random thought crosses their mind, and when it breaks in production and data is lost, or clients are impacted, they just shrug and say "Odd, didn't expect that".

      A 'modern' DBA should be trained in whatever development cycle that dev is using, which may include Scrum/Agile, in which case the process would be integrated and the delay of implementation would be greatly reduced, but not eliminated. It really isn't a bad thing to stop and think about the big picture from time to time.

      The issue is when the management sets up a reward system for DBAs to be roadblocks (this is usually done by crucifying a DBA for a database failure, even if it is proven to be a poor design from Development) that creates the type of environment you are talking about. It is a perfectly valid response to management to be protective of their job. The issue isn't the DBA, it is the structure around the technology groups.

      --
      This sig is the express property of someone.
  13. Re:Bend Over ... by epiphani · · Score: 3, Insightful

    No, it should go...

    DEV: We should use MongoDB

    DBA: Really? Here, have a nice big frosty glass of shut the fuck up. Now go back to your toy scripting languages and leave the data to those of us who actually understand data storage.

    That should be the end of the discussion right then and there. The problem with these script kiddies is that 99.5% of them don't fucking have a clue about data. They are the ones who still embed SQL statements, log in credentials and the like in their php/python/rails/whatever.scripting.language.is.popular.this.week code.

    Congrats. You're the reason we get devs storing images in databases.

    Either you have to educate your developers on what is appropriate to go into a relational database, or you need to get out of the way. Your attitude is exactly the reason NoSQL is picking up steam. I'm not a dev, but I've done dev work; nor am I a DBA, but I've done DBA work. And I can tell you, DBAs are often folks running around with a hammer: everything looks like a nail.

    Devs, on the other hand, are looking for a solution, and thinking like devs: I'll build the solution to my problem! Of course, they usually end up reimplementing stuff other people have done.

    If devs understood how full RDBMSs worked, database use would drop like a stone. If DBAs took the time to understand requirements, database use would drop like a stone. NoSQL makes a _huge_ amount of sense. While you maintain your "script kiddies" attitude, the rest of the world will happily glide past you.

    RDBMSs are 90% misused, and a massive waste of money. NoSQL is an overreaction to that fact. Sometime in the future people will swing back to the middle and realize that files in directories are a surprisingly good way of storing data -- and each will have its place.

    --
    .
  14. Re:Bend Over ... by Anonymous Coward · · Score: 2, Insightful

    insert into users values('bob','123 Main Street','Springfield','NY');

    I want to punch you in the head for not specifying the columns you're inserting into!

  15. Re:Bend Over ... by dido · · Score: 2

    The MongoDB record looks no more complex to me than the insert statement. In fact, the MongoDB record looks more readable, but what do I know, I'm probably one of the "script kiddies" you so like to disparage. I like to have my column names next to the data that actually goes into them, rather than some mess like insert into users (username, address, city, state) values('bob','123 Main Street','Springfield','NY'); that the true equivalent SQL would have been. By the way, I wonder why SQL INSERT uses such a syntax, when the SQL UPDATE statement is much more readable; an update statement would look not much different from the MongoDB record, with equals signs instead of colons, and a few keywords added.
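
    A small sketch of the comparison, using Python's sqlite3. The users table layout and the record dict are assumptions for illustration (the dict plays the part of the document-style record; no MongoDB API is involved). Named placeholders get you most of the "names next to the data" readability while still spelling out the columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users ("
    " username TEXT PRIMARY KEY, address TEXT, city TEXT, state TEXT)"
)

# Document-style record: every value sits next to its field name.
record = {
    "username": "bob",
    "address": "123 Main Street",
    "city": "Springfield",
    "state": "NY",
}

# INSERT with explicit columns and named placeholders, so the statement does
# not silently depend on the column order of the table.
conn.execute(
    "INSERT INTO users (username, address, city, state) "
    "VALUES (:username, :address, :city, :state)",
    record,
)

# UPDATE already pairs each column with its value, much like the record above.
conn.execute(
    "UPDATE users SET city = :city WHERE username = :username",
    {"city": "Albany", "username": "bob"},
)

row = conn.execute(
    "SELECT username, address, city, state FROM users"
).fetchone()
```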

    As is always in the world of software, there are some jobs for which NoSQL is in fact a very good idea, and others for which relational databases are better. If the fine folks at Google thought as you did and believed a traditional RDBMS was the only tool they could use then I doubt that Google would have grown to the size it has. They knew and understood that their problem did not map well into the concept of a standard relational database and acted accordingly. Of course, you also need to recognize when such an approach is warranted, as more often than not you'd be better off using a real RDBMS, and it would not be wise to shift to NoSQL databases just because you're driven by buzzword compliance.

    --
    Qu'on me donne six lignes écrites de la main du plus honnête homme, j'y trouverai de quoi le faire pendre.