Slashdot Mirror


The NoSQL Ecosystem

abartels writes 'Unprecedented data volumes are driving businesses to look at alternatives to the traditional relational database technology that has served us well for over thirty years. Collectively, these alternatives have become known as NoSQL databases. The fundamental problem is that relational databases cannot handle many modern workloads. There are three specific problem areas: scaling out to data sets like Digg's (3 TB for green badges) or Facebook's (50 TB for inbox search) or eBay's (2 PB overall); per-server performance; and rigid schema design.'

11 of 381 comments (clear)

  1. Dynamic Relational: change it, DON'T toss it by Tablizer · · Score: 5, Interesting

    The performance claims will probably be disputed by Oracle whizzes. However, the "rigid schema" claim bothers me. RDBMS can be built that have a very dynamic flavor to them. For example, treat each row as a map (associative array). Non-existent columns in any given row are treated as Null/empty instead of an error. Perhaps tables can also be created just by inserting a row into the (new) target table. No need for explicit schema management. Constraints, such as "required" or "number" can incrementally be added as the schema becomes solidified. We have dynamic app languages, so why not dynamic RDBMS also? Let's fiddle with and stretch RDBMS before outright tossing them. Maybe also overhaul or enhance SQL. It's a bit long in the tooth.

    More at:
    http://geocities.com/tablizer/dynrelat.htm
    (And you thought geocities was de

    1. Re:Dynamic Relational: change it, DON'T toss it by sco08y · · Score: 4, Interesting

      However, the "rigid schema" claim bothers me. RDBMS can be built that have a very dynamic flavor to them. For example, treat each row as a map (associative array).

      You described an entity attribute value model, which winds up reinventing half the DBMS, poorly. Don't worry, *everyone* does one once until they realize it's a bad idea.

      Constraints, such as "required" or "number" can incrementally be added as the schema becomes solidified.

      A "rigid" schema is preventing a ton of totally redundant code being written on the app side. All those constraints wind up in the schema because your UI designer doesn't want to consider that Mary might have 5 addresses or 6 mothers or work 7 jobs simultaneously. And your UI tester doesn't want to test an exploding combinatorial number of possibilities.

      I'd like to see, however, a decent type system, proper logical / physical separation, etc.

      Maybe also overhaul or enhance SQL. It's a bit long in the tooth.

      I'm starting from scratch. (Currently I'm slowly retyping about 40 pages into Latex...)

  2. Starting to love the idea by Just+Some+Guy · · Score: 4, Interesting

    I'm a huge PostgreSQL fan and took classes in formal database theory in college. I'm saying this as someone who understands and thoroughly appreciates relational databases: I'm starting to love schema-less systems. I've only been playing with CouchDB for a few weeks but can certainly see what such stores bring to the table. Specifically, a lot of the data I've stored over the years doesn't neatly map to a predefined tuple, and while one-to-one tables can go a long way toward addressing that, they're certainly not the most elegant or efficient or convenient representation of arbitrary data.

    I'm certainly not going to stop using an RDBMS for most purposes, but neither am I going to waste a lot of time trying to shoehorn an everchanging blob into one. Each tool has its place and I'm excited to see what niche this ecosystem evolves to fill.

    --
    Dewey, what part of this looks like authorities should be involved?
  3. Re:bad design by JavaPunk · · Score: 4, Interesting

    Yes it does (look through 50TB of data), and how would you design it? It has to access all of your friends and find their postings. Robert Johnson gave an excellent talk on facebook's design two weeks ago at OOPSLA (it should be in the ACM digital library soon). He stated that there is no clear segregation of data, the (friend) network is too connected and extracting groups of friends isn't possible. Basically they have a huge mysql farm with memcached on top. Loading an inbox will hit multiple servers (maybe even a different server for each of your friends) across the farm.

  4. Everything old is new again by QuoteMstr · · Score: 5, Interesting

    We didn't start with relationship databases. RDBMSes were responses to the seductive but unmanageable navigational databases that preceded them. There were good reasons for moving to relational databases, and those reasons are still valid today.

    Computer Science doesn't change because we're writing in Javascript now instead of PL/1.

    1. Re:Everything old is new again by QuoteMstr · · Score: 3, Interesting

      Your question reminds me of the people who say, "if flight records are so strong, why don't we just build the whole plane out of the stuff they use to make them?" You might as well ask, "if DNS is so great, why don't we implement filesystems in terms of it?" Your post demonstrates that you you haven't considered context and purpose.

      Relational databases are models. You can certainly describe DNS in terms of a relational schema. In principle, you could construct a wrapper and query it with SQL. But there's no reason to do that, because with someone as simple as DNS, the full power of a relational query engine doesn't buy you much.

      Most datasets aren't that simple.

      Furthermore, DNS is an open standard that needs to be accessible in as simple a way as possible. Complicating it with relational semantics wouldn't have been worthwhile (because of DNS's relative simplicity), and would have significantly hampered DNS's interoperability.

      That is, if relational databases had existed when DNS was implemented, which they didn't.

      Furthermore, DNS is a distributed, decentralized database. You couldn't use a RDBMS (the software that realized the abstract model of a relational database) to manage it even if you wanted to. That doesn't apply to most datasets, which however large, are still managed by a single organization, and which are accessed by software under the control of that organization.

      Your comparison really makes no sense whatsoever. The vast majority of databases aren't put under the same constraints DNS, and so can take advantage of the much greater flexibility an RDBMS affords.

      You're basically arguing that we can't have efficient engines in automobiles because of a few of them might need to tow 18 ton trailers and withstand mortar rounds. It's ridiculous.

  5. Vendor Hype Orange Alert (Re:hmm) by Tablizer · · Score: 3, Interesting

    Notice that this blog is from a company pushing a cloud based solution.

    That is indeed suspicious. But if they want to sell clouds, then make a RDBMS that *does* scale across cloud nodes instead of bashing SQL. (SQL as a language doesn't define implementation; that's one of it's selling points.) It may be that since there's not one out yet, they instead hype the existing non-RDBMS that can span clouds.

    (I agree that SQL could use some improvements, such as named sub-queries instead of massive deep nesting to make one big run-on statement. Some dialects already have this to some extent.)
             

  6. Re:hmm by buchner.johannes · · Score: 4, Interesting

    The first sign for me that someone is selling bullshit is when they try to act like this is some never before seen problem, when in fact there is a good four decades of research of database optimization.

    Your point is valid, but I think there is more to it. And the problems these solutions try to solve are quite old too. For example:

    Ever tried to design a database, but got the requirement that you should be able to reconstruct the modification history? It boils down to not deleting (ever), and 'deleted' flag fields and other uglyness. A multi-version relational database would be nice, you actually don't need modification/delete operations in this scenario, just 'updates' that add to the previous status. CouchDB does append operations.

    In some cases you may not need a complete SQL database, just key->value relations, but have them scaling very well. http://project-voldemort.com/ states: "It is basically just a big, distributed, persistent, fault-tolerant hash table." Then they state that they provide horizontal scalability, which MySQL doesn't (OTOH, we should really look at Oracle for these things).

    And you can't really say MapReduce/Hadoop is pointless.

    --
    NB: The message above might reflect my opinion right now, but not necessarily tomorrow or next year.
  7. Re:bad design by Ragzouken · · Score: 3, Interesting

    "Also, when was the last time you tried to visit Facebook and it was down? They're doing quite well for people who need to stop and actually think about their "implimentation"."

    When was the last time you tried to use Facebook or Facebook chat and didn't get failed transport requests, unsent chat messages, unavailable photos, or random blank pages?

  8. Re:Why worry? by Anonymous Coward · · Score: 5, Interesting

    You laugh, but the things I see done in Excel on a daily basis in production environments getting a LOT of work done are a testament to it's power. It is one of the best rapid application development platforms in existance. People with no CS background programming away in a functional style and getting shit done and not even realising they are programming. It could be so much better but it's still the best of breed. Any yes I have tried, and seen others try, O.O. et al. Forget it. Lets not go down that worn old road.

  9. Re:I know the type well by QuoteMstr · · Score: 3, Interesting

    Right. Don't forget PostgreSQL too. Really, the problem here is MySQL. Hell, look at the "tips and tricks" comments for this story: they all deal with ways to work around deficiencies in MySQL (and old versions of MySQL at that.)

    The guy who recommends using the first two characters of the MD5 hash to select a table is particularly hilarious. Doesn't he realize that's what a database index already does, and that databases (even MySQL) will do that for him?