Horizontal Scaling of SQL Databases?
still_sick writes "I'm currently responsible for operations at a software-as-a-service startup, and we're increasingly hitting limitations in what we can do with relational databases. We've been looking at various NoSQL stores and I've been following Adrian Cockcroft's blog at Netflix which compares the various options. I was intrigued by the most recent entry, about Translattice, which purports to provide many of the same scaling advantages for SQL databases. Is this even possible given the CAP theorem? Is anyone using a system like this in production?"
Postgres seems to not charge extra for that.
I didn't expect we'd be on Slashdot just yet. I'm Michael Lyle, CTO and cofounder of Translattice.
With regards to the original submitter's question, we'd love to talk to him. How much we can help, of course, depends on the specific scenario he's hitting.
What we've built is an application platform constituted from identical nodes, each containing a geographically decentralized relational database, a distributed (J2EE compatible) application container, and distributed load balancing and management capabilities. Massive relational data is transparently sharded behind the scenes and assigned redundantly to the computing resources in the cluster, and a distributed consensus protocol keeps all of the transactions in flight coherent and provides ACID guarantees. In essence, we allow existing enterprise applications to scale out horizontally while keeping the benefits of the existing programming model for transactional applications, by letting computing resources from throughout an organization combine to run enterprise workloads.
Current stacks are really complicated, multi-vendor, and require extensive integration/custom engineering for each application install. We're striving to create a world where massively performing infrastructure can be built from identical pieces.
A lot of people don't understand how a database really works, so they do it horribly wrong. As a result, it's dreadfully slow. So they go and use some key/value lookup system because "they're fast". There you often get one of two things:
They still don't understand the problem, so they recreate it yet again. If you don't understand what's wrong with reading an entire table with a million records, and discarding all but 5 of them client-side, then replacing the SQL DB with a key/value system just isn't going to make things better.
Or, they improve performance, but since they don't understand what ACID is for, they eventually end up with weird inconsistencies. In some cases this might be acceptable, but you really don't want to see it happening in an order tracking system.
The sickening feeling people get is not because it's a competitor. In a large part it isn't a competitor, but a different class of system with different tradeoffs. The sickening feeling comes from seeing people not understand what they're doing, and then run towards the latest technology because it's what $BIG_COMPANY uses without understanding it any better, and generally making an even bigger mess.
The performance of specialized solutions like key/value systems doesn't come from magic. They're not really new, and don't use anything very groundbreaking. They simply use different tradeoffs at the cost of sacrificing quite a lot of what is present in a RDBMS. It's important to understand first whether you can really afford to discard those things, because if you can't, it's either not going to work right, or you'll have to graft all that you removed on top of it anyway.
I recently came across Rick Cattell's site which addresses just the questions you're asking.
Rick Cattell has written an excellent comparison guide of horizontally scalable datastores of different types (RDBMS as well as a variety of NoSQL systems).
Cattell has also written an academic paper with database expert Mike Stonebraker, which weighs the system design factors required to make a datastore scalable.
Executive summary of Cattell's work: although NoSQL may be a huge fad, the things that make a datastore scalable can be implemented in SQL RDBMS systems as well. Also, implementing do-it-yourself ACID in NoSQL systems is extremely difficult and error-prone, and is a significant advantage of most RDBMS systems. Stonebraker is the author of VoltDB, which is an open-source RDBMS designed for horizontal scalability, but they give a very fair and thorough look at competing datastores as well.
My bicyles