Slashdot Mirror


The NoSQL Ecosystem

abartels writes 'Unprecedented data volumes are driving businesses to look at alternatives to the traditional relational database technology that has served us well for over thirty years. Collectively, these alternatives have become known as NoSQL databases. The fundamental problem is that relational databases cannot handle many modern workloads. There are three specific problem areas: scaling out to data sets like Digg's (3 TB for green badges) or Facebook's (50 TB for inbox search) or eBay's (2 PB overall); per-server performance; and rigid schema design.'

8 of 381 comments

  1. hmm by buddyglass · · Score: 4, Insightful

    With regard to scalability, it strikes me that the problem isn't so much SQL but the fact that current SQL-based RDBMS implementations are optimized for smaller data sets.

    1. Re:hmm by phantomfive · · Score: 5, Insightful

      The biggest problem is the cloud. A lot of cloud APIs don't allow full relational database access, so now it seems we are coming up with all these justifications for why we don't really need it. Notice that this blog is from a company pushing a cloud-based solution.

      --
      Qxe4
    2. Re:hmm by MightyMartian · · Score: 4, Insightful

      That's my take as well. We have these crippled semi-databases that lack a lot of useful features that anyone who has used RDBMSs over the last few decades has gotten used to, so suddenly it becomes a justification game; "Well, SQL doesn't deliver the output we need, so here's some half-way-to-SQL tools which are really better, kinda... oh yes, and Netcraft confirms it, SQL is dying!!!!"

      I have a feeling that this is part hype, part inept programmers who don't actually understand SQL or database optimization. The first sign for me that someone is selling bullshit is when they try to act like this is some never-before-seen problem, when in fact there is a good four decades of research into database optimization.

      --
      The world's burning. Moped Jesus spotted on I50. Details at 11.
    3. Re:hmm by KalvinB · · Score: 4, Insightful

      For the vast majority of use cases, large data sets can be made logically small with indexes or physically small with hashes.

      If you're dealing with massive data you're probably not dealing with complex relationships. E-mail servers associate data with only one index: the e-mail address. Google only associates content with keywords. E-mail servers logically and physically separate e-mail folders. Google logically and physically separates the datasets for various keywords. So by the time you hit it, it knows instantly where to look for what you want. You don't have a whole complex system of relationships between the data. It looks at the keywords, finds the predetermined results for each and combines the results.
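
      A minimal Python sketch of that idea, under loud assumptions: the shard count, the CRC32 hash, and the names `shard_for`, `build_index` and `search` are all made up for illustration and are not how Google or any mail server actually does it. Each keyword is hashed to one partition (physically small), each partition maps keywords to document ids (logically small), and a query just combines the per-keyword results.

```python
import zlib
from collections import defaultdict

NUM_SHARDS = 4  # assumption: a small, fixed number of partitions


def shard_for(keyword: str) -> int:
    """Pick a partition from a stable hash of the keyword."""
    return zlib.crc32(keyword.encode("utf-8")) % NUM_SHARDS


def build_index(docs):
    """Build one keyword -> set-of-document-ids map per partition."""
    shards = [defaultdict(set) for _ in range(NUM_SHARDS)]
    for doc_id, text in docs.items():
        for word in text.lower().split():
            shards[shard_for(word)][word].add(doc_id)
    return shards


def search(shards, query: str):
    """Look up each keyword in its own partition and intersect the hits."""
    results = None
    for word in query.lower().split():
        hits = shards[shard_for(word)].get(word, set())
        results = hits if results is None else results & hits
    # Copy so callers cannot mutate the index's internal sets.
    return set(results) if results else set()


# Hypothetical sample documents.
DOCS = {
    1: "nosql scales out",
    2: "sql scales up",
    3: "nosql drops sql joins",
}
```

      With the sample data, `search(build_index(DOCS), "nosql sql")` only touches the two partitions that own those keywords, never the whole data set.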

    4. Re:hmm by mzito · · Score: 4, Insightful

      Uh, no, that is not correct. Relational DBMSes such as Oracle, Teradata, DB2, even SQL Server are all designed to scale into the multi-terabyte to petabyte range. The issue is one of a couple of things:

      - Cost - "real" relational databases are expensive. I once had a conversation with someone who worked at Google, who talked about how much infrastructure they have written, built, and maintain to deal with MySQL. Many of those problems were solved in an "enterprise" DBMS 3-10 years ago. However, the cost of implementing one of those enterprise DBMS is so high that it is cheaper to build application layer intelligence on top of a stupid RDBMS than purchase something that works out of the box.
      - Workload style - most of the literature around tuning DBMS is for OLTP or DSS workloads. Either small question, small response time (show me the five last things I bought from amazon.com) or big question, long response time (look through the last two years' worth of shipping data and figure out where the best places to put our distribution centers would be). Many of these workloads are combos - there could be very large data sets and complex data interdependencies, with low latency requirements. It may be possible to write good SQL that does these things (in fact, I know a couple of luminaries in the SQL space that will claim just that), but the community knowledge isn't there.
      - Application development - when you're building your app from scratch, you can afford to work around "quirks" (bugs) and "gaps" (fatal flaws) to get what you need. This dovetails with the other issues, but when your core business is building infrastructure, it's worth your while to deal with this. When your core business is selling insurance or widgets, or whatever, it is not.

      None of this is to say that the "nosql" movement is a bad thing, or that there's no reason for its existence, or that no one should bother looking at it. However, there is a definite trend of "this is so much better than SQL" for no good reason. SQL has scaled for years, and I know loads of companies who work with terabytes and terabytes of data on a single database without any issue.

      A far more interesting discussion is the data warehouse appliance space - partitioning SQL down to a large number of small CPUs and pushing those as close to the disk as possible.

      --
      me@mzi.to
    5. Re:hmm by QuoteMstr · · Score: 4, Insightful

      I think I'd rather see the opposite: That non-relational DBs become the mainstream, and they have SQL added for the odd occasion it is useful. Relational has some nice properties for ad-hoc querying, but for everything else they are a nuisance.

      Berkeley DB is a very good non-relational database with multiple language bindings, several storage engines, and transaction support. It's been around for 24 years, and has seen some appreciable use.

      But that use was nothing compared to the database explosion that SQLite brought about when it was released. SQLite is almost exactly like Berkeley DB, except that it has a SQL engine on top. Almost everyone is using SQLite, and many Berkeley DB users are moving over to it.

      Why? Because SQLite is relational! That constitutes some serious evidence that relational databases are more than "a nuisance".
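
      To make the contrast concrete, here is a rough Python sketch. A plain dict stands in for a key-value store (it is not the Berkeley DB API), and the table layout, sample records and function names are invented for the example. Looking a record up by key is trivial either way; the difference shows up as soon as you ask a question the key doesn't answer.

```python
import sqlite3

# Key-value view: lookup by key is easy, anything else is application code.
# Hypothetical sample records.
USERS_KV = {
    "u1": {"name": "ann", "city": "Oslo"},
    "u2": {"name": "bob", "city": "Lima"},
    "u3": {"name": "cyd", "city": "Oslo"},
}


def names_in_city_kv(store, city):
    """Scan every value by hand, since the store only knows about keys."""
    return sorted(v["name"] for v in store.values() if v["city"] == city)


def names_in_city_sql(store, city):
    """Load the same records into SQLite and ask the question declaratively."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id TEXT PRIMARY KEY, name TEXT, city TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?, ?)",
        [(key, v["name"], v["city"]) for key, v in store.items()],
    )
    rows = conn.execute(
        "SELECT name FROM users WHERE city = ? ORDER BY name", (city,)
    ).fetchall()
    conn.close()
    return [name for (name,) in rows]
```

      Both functions return the same names for "Oslo", but only the second one lets you change the question without writing a new loop.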

  2. Re:bad design by munctional · · Score: 5, Insightful

    Ever heard of Bloom filters? Sharding? Indexes? They are clearly not doing a table scan on 50 TB of data every time you open your Facebook inbox.
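
    For anyone who hasn't met one, a Bloom filter fits in a few lines of Python. This is a toy sketch, not anything Facebook has published: the bit-array size, the number of hashes, and the trick of salting SHA-256 with an index are all assumptions chosen to keep it short. It can tell you "definitely not here" without touching the real data, at the cost of occasional false positives.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, item: str):
        # Derive several bit positions by salting one hash with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # False means the item was never added; True means "go check the disk".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )
```

    Put one of these in front of each shard and most lookups for messages that aren't there never reach storage at all.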

    You know, there's a certain point where people need to stop and actually think about the implimentation.

    Um, they do. They regularly blog about their solutions to their problems and open source their solutions and contributions to existing projects. They come up with amazing solutions to their large scale problems. They're running over five million Erlang processes for their chat system!

    http://developers.facebook.com/news.php?blog=1

    http://github.com/facebook

    Also, when was the last time you tried to visit Facebook and it was down? They're doing quite well for people who need to stop and actually think about their "implimentation".

    --
    Functional programming... for real men!
  3. Re:bad design by Zombywuf · · Score: 4, Insightful

    The problem is when people don't think about the solution and apply the cargo cult mentality. Facebook uses Eeeerlaaaang therefore we should. Facebook wrote its own database, therefore we should. People end up writing their own database engines that do exactly the same thing as modern relational engines, with all the bugs that were fixed in the relational engines 10 years ago (5 for Microsoft). Even MS SQL will split a large GROUP BY aggregate operation (which takes 3 lines to specify) across multiple CPUs by turning it into a map-reduce problem, and it will do this all without you having to be aware of it. Oracle (and many others, Oracle's is supposed to be the best) will maintain multiple concurrent versions of your data in order to allow multiple users to work with a snapshot that doesn't change under them while others are changing the data, and this happens transparently. You can go ahead and implement all this stuff yourself if you want, in C and sockets, call me when you're done, in 10-20 years.

    The real issue I have with the NoSQL people is they're a bunch of whiny babies, who haven't even taken the time to understand the problem before lashing out at the first thing they see. Just the name tells you this, they call themselves "No SQL" and then lash out at relational databases. SQL is a terrible language, which really needs replacing, but it is only one possible language for querying relational databases. Relational databases represent several decades of research into how to query data in a fault-tolerant, scalable way as a standing implementation; re-implementing them is a waste of time.
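
    A small Python sketch of the GROUP BY point, with heavy caveats: SQLite is used here only because it ships with Python (it is not MS SQL and does not parallelize this query), the thread pool merely stands in for whatever workers a real engine would use, and the table, sample rows and function names are invented. The point is just that the three-line query and the hand-rolled partition/aggregate/merge code compute the same thing.

```python
import sqlite3
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sample rows: (region, amount).
SALES = [("east", 10), ("west", 5), ("east", 7),
         ("north", 3), ("west", 1), ("north", 4)]


def sql_group_by(rows):
    """Declarative version: the engine decides how to execute it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    query = """
        SELECT region, SUM(amount)
        FROM sales
        GROUP BY region
    """
    result = dict(conn.execute(query).fetchall())
    conn.close()
    return result


def partial_sums(chunk):
    """Map step: aggregate one partition independently of the others."""
    totals = Counter()
    for region, amount in chunk:
        totals[region] += amount
    return totals


def map_reduce_group_by(rows, workers=2):
    """Hand-written equivalent: partition, aggregate in parallel, merge."""
    chunks = [rows[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_sums, chunks))
    # Reduce step: merge the per-partition totals into one result.
    merged = Counter()
    for partial in partials:
        merged.update(partial)
    return dict(merged)
```

    Both functions give the same totals for `SALES`; the second is the part you get to write, debug and tune yourself if you throw the query planner away.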

    --
    If you can read this you've gone too far.