The NoSQL Ecosystem
abartels writes 'Unprecedented data volumes are driving businesses to look at alternatives to the traditional relational database technology that has served us well for over thirty years. Collectively, these alternatives have become known as NoSQL databases. The fundamental problem is that relational databases cannot handle many modern workloads. There are three specific problem areas: scaling out to data sets like Digg's (3 TB for green badges) or Facebook's (50 TB for inbox search) or eBay's (2 PB overall); per-server performance; and rigid schema design.'
Microsoft Access is here!
With regard to scalability, it strikes me that the problem isn't so much SQL but the fact that current SQL-based RDBMS implementations are optimized for smaller data sets.
I'm a terabyte sized binary blob, you insensitive clod!
The CB App. What's your 20?
The performance claims will probably be disputed by Oracle whizzes. However, the "rigid schema" claim bothers me. RDBMS can be built that have a very dynamic flavor to them. For example, treat each row as a map (associative array). Non-existent columns in any given row are treated as Null/empty instead of an error. Perhaps tables can also be created just by inserting a row into the (new) target table. No need for explicit schema management. Constraints, such as "required" or "number" can incrementally be added as the schema becomes solidified. We have dynamic app languages, so why not dynamic RDBMS also? Let's fiddle with and stretch RDBMS before outright tossing them. Maybe also overhaul or enhance SQL. It's a bit long in the tooth.
More at:
http://geocities.com/tablizer/dynrelat.htm
(And you thought geocities was de
Table-ized A.I.
I think I've heard of non-relational databases before. There's a particularly famous one, in fact. What could it be? Let's see: first started shipping in 1969, now in its eleventh major version, JDBC and ODBC access, full XML support in and out, available with an optional paired transaction manager, extremely high performance, and holds a very large chunk of the world's financial information (among other things). It also ranks up there with Microsoft Windows as among the world's all-time highest grossing software products.
....You bet non-relational is still highly relevant and useful in many different roles. Different tools for different jobs and all.
Ever heard of bloom filters? Sharding? Indexes? They are clearly not doing a table scan on 50gb of data every time you open your Facebook inbox.
You know, there's a certain point where people need to stop and actually think about the implimentation.
Um, they do. They regularly blog about their solutions to their problems and open source their solutions and contributions to existing projects. They come up with amazing solutions to their large scale problems. They're running over five million Erlang processes for their chat system!
http://developers.facebook.com/news.php?blog=1
http://github.com/facebook
Also, when was the last time you tried to visit Facebook and it was down? They're doing quite well for people who need to stop and actually think about their "implimentation".
Functional programming... for real men!
I'm a huge PostgreSQL fan and took classes in formal database theory in college. I'm saying this as someone who understands and thoroughly appreciates relational databases: I'm starting to love schema-less systems. I've only been playing with CouchDB for a few weeks but can certainly see what such stores bring to the table. Specifically, a lot of the data I've stored over the years doesn't neatly map to a predefined tuple, and while one-to-one tables can go a long way toward addressing that, they're certainly not the most elegant or efficient or convenient representation of arbitrary data.
I'm certainly not going to stop using an RDBMS for most purposes, but neither am I going to waste a lot of time trying to shoehorn an everchanging blob into one. Each tool has its place and I'm excited to see what niche this ecosystem evolves to fill.
Dewey, what part of this looks like authorities should be involved?
Yes it does (look through 50TB of data), and how would you design it? It has to access all of your friends and find their postings. Robert Johnson gave an excellent talk on facebook's design two weeks ago at OOPSLA (it should be in the ACM digital library soon). He stated that there is no clear segregation of data, the (friend) network is too connected and extracting groups of friends isn't possible. Basically they have a huge mysql farm with memcached on top. Loading an inbox will hit multiple servers (maybe even a different server for each of your friends) across the farm.
Don't forget flux capacitors, FTL drives and crossfading splicers.
We didn't start with relationship databases. RDBMSes were responses to the seductive but unmanageable navigational databases that preceded them. There were good reasons for moving to relational databases, and those reasons are still valid today.
Computer Science doesn't change because we're writing in Javascript now instead of PL/1.
That is indeed suspicious. But if they want to sell clouds, then make a RDBMS that *does* scale across cloud nodes instead of bashing SQL. (SQL as a language doesn't define implementation; that's one of it's selling points.) It may be that since there's not one out yet, they instead hype the existing non-RDBMS that can span clouds.
(I agree that SQL could use some improvements, such as named sub-queries instead of massive deep nesting to make one big run-on statement. Some dialects already have this to some extent.)
Table-ized A.I.
So... every time I open my inbox in Facebook, it has to search through 50TB of data? That sounds like a design problem. What has always floored me is why people think everything needs to be stuffed into a database. Terabyte sized binary blobs? You know, there's a certain point where people need to stop and actually think about the implimentation.
Could be worse. They could try to find something on my desk.
Help save the critically endangered Blue Iguana
"Also, when was the last time you tried to visit Facebook and it was down? They're doing quite well for people who need to stop and actually think about their "implimentation"."
When was the last time you tried to use Facebook or Facebook chat and didn't get failed transport requests, unsent chat messages, unavailable photos, or random blank pages?
The problem is when people don't think about the solution and apply the cargo cult mentality. Facebook uses Eeeerlaaaang therefore we should. Facebook wrote it's own database, therefore we should. People end up writing their own database engines that do exactly the same thing as modern relational engines, with all the bugs that were fixed in the relational engines 10 years ago (5 for Microsoft). Even MS SQL will split a large group by aggregate operation (which takes 3 lines to specify) across multiple CPUS by turning it into a map reduce problem, and it will do this all without you having to be aware of it. Oracle (and many others, Oracles is supposed to be the best) will maintain multiple concurrent versions of your data in order to allow multiple users to work with a snapshot that doesn't change under them while others are changing the data, and this happens transparently. You can go ahead and implement all this stuff yourself if you want, in C and sockets, call me when your done, in 10-20 years.
The real issue I have with the NoSQL people is they're a bunch of whiny babies, who haven't even taken the time to understand the problem before lashing out at the first thing they see. Just the name tells you this, they call themselves "No SQL" and then lash out at relational databases. SQL is is a terrible language, which really needs replacing, but it is only one possible language for querying relational databases. Relational databases represent several decades of research into how to query data in a fault tolerant scalable way as a standing implementation, re-implementing them is a waste of time.
If you can read this you've gone too far.
I did profile my code. It is not my gut feeling, but my experience.
Nae king! Nae laird! Nae yurrupiean pressedent! We willna be fooled again!
SQL databases if designed properly DO handle enourmous datasets. the problem starts when you have wits designing the database and then managers attempting to use the DB for purposes it wasn't meant for.
If you mod me down, I will become more powerful than you can imagine....
Right. Don't forget PostgreSQL too. Really, the problem here is MySQL. Hell, look at the "tips and tricks" comments for this story: they all deal with ways to work around deficiencies in MySQL (and old versions of MySQL at that.)
The guy who recommends using the first two characters of the MD5 hash to select a table is particularly hilarious. Doesn't he realize that's what a database index already does, and that databases (even MySQL) will do that for him?