The NoSQL Ecosystem
abartels writes 'Unprecedented data volumes are driving businesses to look at alternatives to the traditional relational database technology that has served us well for over thirty years. Collectively, these alternatives have become known as NoSQL databases. The fundamental problem is that relational databases cannot handle many modern workloads. There are three specific problem areas: scaling out to data sets like Digg's (3 TB for green badges) or Facebook's (50 TB for inbox search) or eBay's (2 PB overall); per-server performance; and rigid schema design.'
With regard to scalability, it strikes me that the problem isn't so much SQL but the fact that current SQL-based RDBMS implementations are optimized for smaller data sets.
Ever heard of bloom filters? Sharding? Indexes? They are clearly not doing a table scan on 50gb of data every time you open your Facebook inbox.
You know, there's a certain point where people need to stop and actually think about the implimentation.
Um, they do. They regularly blog about their solutions to their problems and open source their solutions and contributions to existing projects. They come up with amazing solutions to their large scale problems. They're running over five million Erlang processes for their chat system!
http://developers.facebook.com/news.php?blog=1
http://github.com/facebook
Also, when was the last time you tried to visit Facebook and it was down? They're doing quite well for people who need to stop and actually think about their "implimentation".
Functional programming... for real men!
The problem is when people don't think about the solution and apply the cargo cult mentality. Facebook uses Eeeerlaaaang therefore we should. Facebook wrote it's own database, therefore we should. People end up writing their own database engines that do exactly the same thing as modern relational engines, with all the bugs that were fixed in the relational engines 10 years ago (5 for Microsoft). Even MS SQL will split a large group by aggregate operation (which takes 3 lines to specify) across multiple CPUS by turning it into a map reduce problem, and it will do this all without you having to be aware of it. Oracle (and many others, Oracles is supposed to be the best) will maintain multiple concurrent versions of your data in order to allow multiple users to work with a snapshot that doesn't change under them while others are changing the data, and this happens transparently. You can go ahead and implement all this stuff yourself if you want, in C and sockets, call me when your done, in 10-20 years.
The real issue I have with the NoSQL people is they're a bunch of whiny babies, who haven't even taken the time to understand the problem before lashing out at the first thing they see. Just the name tells you this, they call themselves "No SQL" and then lash out at relational databases. SQL is is a terrible language, which really needs replacing, but it is only one possible language for querying relational databases. Relational databases represent several decades of research into how to query data in a fault tolerant scalable way as a standing implementation, re-implementing them is a waste of time.
If you can read this you've gone too far.