Slashdot Mirror


Horizontal Scaling of SQL Databases?

still_sick writes "I'm currently responsible for operations at a software-as-a-service startup, and we're increasingly hitting limitations in what we can do with relational databases. We've been looking at various NoSQL stores and I've been following Adrian Cockcroft's blog at Netflix which compares the various options. I was intrigued by the most recent entry, about Translattice, which purports to provide many of the same scaling advantages for SQL databases. Is this even possible given the CAP theorem? Is anyone using a system like this in production?"

22 of 222 comments

  1. Re:XML by Anonymous Coward · · Score: 5, Funny

    XXXML

  2. What limitations are you running into? by Anonymous Coward · · Score: 5, Insightful

    It would be a lot easier to talk about solutions if you said which limitations you run into.

    Is your dataset too large (large tables), do you have too many joins, too many transactions per second? In short, what is the problem we're trying to solve here?

    1. Re:What limitations are you running into? by Anonymous Coward · · Score: 5, Interesting

      It would be a lot easier to talk about solutions if you said which limitations you run into.

      Is your dataset too large (large tables), do you have too many joins, too many transactions per second? In short, what is the problem we're trying to solve here?

      My money is on "No one here likes SQL" and "There aren't any experts on RDBMSs to help us get things set up properly".

    2. Re:What limitations are you running into? by DarkOx · · Score: 4, Insightful

      I would have to agree; it's really hard to imagine a "start up" that can't make anything work on a traditional SQL RDBMS. If you put the right hardware underneath it, even SQL Server 2000 (64-bit, anyway) will scale just fine to terabyte-size databases at thousands of transactions per second. That is not impossible hardware for a successful start-up to buy either: we are talking about a dedicated storage controller with a gigabyte or so of cache and a few dozen SAS drives. I know; I have worked on such projects.

      You need the schema right, and if it's more reads than writes you might even de-normalize a little, and you will need to partition the data appropriately, but it can be done. This is why real DBAs still make the big bucks. There is a lot to know in that domain. You probably should hire someone who is an expert on whatever stuff you are using now to consult before you go down the path of NoSQL. All you told us is that you are a growing start-up, which is not much to go on, but without knowing what you are doing it's hard for me to believe you are doing anything on a scale that can't be done well with a relational database. Maybe I am wrong and maybe you are doing something huge. Remember, as soon as you go down the NoSQL path you are going to have to do a great deal of heavy lifting, because the quantity of libraries and off-the-shelf stuff out there is not great.

      --
      Repeal the 17th Amendment TODAY! Also Please Read http://www.gnu.org/philosophy/right-to-read.html
  3. Relational stuff scales by Anonymous Coward · · Score: 5, Insightful

    Learn partitioning principles, get a database product that does partitioning properly, learn normalization, never worry again about not being able to scale with relational databases. It just requires some real skills but relational databases really do scale all the way up.
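
    As a rough illustration of the partitioning principle mentioned above (not any particular product's implementation), here is a minimal Python sketch of hash partitioning. The names `shard_for` and `partition_rows` and the `customer_id` key are hypothetical; a real RDBMS would do this routing internally from the table's partition definition.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a partition key to a shard index using a stable hash.

    A cryptographic hash is used only because it is stable across
    processes; Python's built-in hash() is randomized per run for strings.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def partition_rows(rows, key_field, num_shards):
    """Group rows (dicts) into num_shards buckets by their partition key."""
    shards = [[] for _ in range(num_shards)]
    for row in rows:
        shards[shard_for(str(row[key_field]), num_shards)].append(row)
    return shards


# Hypothetical data: every row for one customer lands on the same shard,
# so a per-customer query only has to touch one partition.
rows = [{"customer_id": i % 10, "amount": i} for i in range(100)]
shards = partition_rows(rows, "customer_id", 4)
```

    The design point is picking a key that most queries filter on, so that a typical query is answered by a single partition instead of all of them.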

    1. Re:Relational stuff scales by h4rr4r · · Score: 4, Informative

      Postgres seems to not charge extra for that.

  4. Call me skeptical by Kjella · · Score: 5, Insightful

    Call me skeptical, but there are companies out there with massive amounts of data in relational databases; if you as a startup are "constantly hitting limitations" you're either a very odd startup or using it very wrong. As long as the volume is small you can make almost anything happen on SQL. Hell, most small businesses I've known run mostly on Excel. Somehow I don't see a startup needing NoSQL unless they specialize in processing huge amounts of data, in which case trying to get Slashdot to do the work on your core business seems stupid. But maybe I missed something...

    --
    Live today, because you never know what tomorrow brings
    1. Re:Call me skeptical by Squeebee · · Score: 5, Funny

      Agreed, we have massive sites serving millions of requests a day using Open Source relational databases and yet it seems everyone wants to use NoSQL because it's the hip new thing.

      Naturally I start thinking of this: http://xtranormal.com/watch/6995033

    2. Re:Call me skeptical by RobertM1968 · · Score: 4, Insightful

      Agreed... the biggest limitation I see with SQL (My, DB2, Postgres anyway... found plenty in MS) is people who don't know how to lay out a database, people who don't know how to install and configure the server daemon(s), people who have no idea how to properly select appropriate hardware, and people who don't know how the heck to do a query. For instance, I worked on some code done by someone else where, on massive records, they were always selecting "*" instead of the needed or anticipated values. It's a big waste when one needs the last and first name (by ID#) and selects a whole row instead, then wonders why it's not scaling upwards.
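
      To make the "*" point concrete, here is a small sketch using Python's built-in sqlite3 module. The `people` table, its columns, and the helper names are assumptions made up for the example; the original code in question is not shown in this thread.

```python
import sqlite3


def make_people_db():
    """Build a throwaway in-memory table with one wide column per row."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE people ("
        "id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT, bio TEXT)"
    )
    conn.executemany(
        "INSERT INTO people (id, first_name, last_name, bio) VALUES (?, ?, ?, ?)",
        [
            (1, "Ada", "Lovelace", "x" * 10000),  # bio stands in for a massive record
            (2, "Alan", "Turing", "y" * 10000),
        ],
    )
    conn.commit()
    return conn


def name_by_id(conn, person_id):
    """Fetch only the two columns the caller needs."""
    return conn.execute(
        "SELECT last_name, first_name FROM people WHERE id = ?", (person_id,)
    ).fetchone()


def whole_row_by_id(conn, person_id):
    """The wasteful variant: drags the wide bio column along every time."""
    return conn.execute(
        "SELECT * FROM people WHERE id = ?", (person_id,)
    ).fetchone()


conn = make_people_db()
narrow = name_by_id(conn, 1)
wide = whole_row_by_id(conn, 1)
```

      Both queries find the same row by primary key; the difference is how much data each one reads and ships back per lookup.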

    3. Re:Call me skeptical by Cylix · · Score: 5, Funny

      I just select * from * and then sort it out with grep and cut.

      --
      "You should always go to other people's funerals; otherwise, they won't come to yours." -- Yogi Berra
    4. Re:Call me skeptical by vadim_t · · Score: 4, Informative

      A lot of people don't understand how a database really works, so they do it horribly wrong. As a result, it's dreadfully slow. So they go and use some key/value lookup system because "they're fast". There you often get one of two things:

      They still don't understand the problem, so they recreate it yet again. If you don't understand what's wrong with reading an entire table with a million records, and discarding all but 5 of them client-side, then replacing the SQL DB with a key/value system just isn't going to make things better.

      Or, they improve performance, but since they don't understand what ACID is for, they eventually end up with weird inconsistencies. In some cases this might be acceptable, but you really don't want to see it happening in an order tracking system.
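
      A minimal sketch of what the atomicity part of ACID buys you in something like order tracking, using Python's built-in sqlite3. The `stock` and `orders` tables and the `place_order` helper are hypothetical, invented for this example.

```python
import sqlite3


def make_shop_db():
    """In-memory shop with one item in stock; qty may never go negative."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE stock ("
        "item TEXT PRIMARY KEY, qty INTEGER NOT NULL CHECK (qty >= 0))"
    )
    conn.execute(
        "CREATE TABLE orders ("
        "id INTEGER PRIMARY KEY, item TEXT NOT NULL, qty INTEGER NOT NULL)"
    )
    conn.execute("INSERT INTO stock (item, qty) VALUES ('widget', 5)")
    conn.commit()
    return conn


def place_order(conn, item, qty):
    """Record the order and decrement stock as one all-or-nothing unit."""
    try:
        with conn:  # commits if the block succeeds, rolls back on exception
            conn.execute(
                "INSERT INTO orders (item, qty) VALUES (?, ?)", (item, qty)
            )
            cur = conn.execute(
                "UPDATE stock SET qty = qty - ? WHERE item = ?", (qty, item)
            )
            if cur.rowcount != 1:
                raise sqlite3.IntegrityError("unknown item")
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint (or the unknown-item guard) fired, so the
        # order row inserted above has been rolled back along with it.
        return False


conn = make_shop_db()
first = place_order(conn, "widget", 3)   # 5 in stock, succeeds
second = place_order(conn, "widget", 3)  # only 2 left, rolled back
order_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
remaining = conn.execute(
    "SELECT qty FROM stock WHERE item = 'widget'"
).fetchone()[0]
```

      Without the transaction, the failed second attempt would leave an order row with no matching stock change, which is exactly the kind of weird inconsistency described above.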

      The sickening feeling people get is not because it's a competitor. In a large part it isn't a competitor, but a different class of system with different tradeoffs. The sickening feeling comes from seeing people not understand what they're doing, and then run towards the latest technology because it's what $BIG_COMPANY uses without understanding it any better, and generally making an even bigger mess.

      The performance of specialized solutions like key/value systems doesn't come from magic. They're not really new, and don't use anything very groundbreaking. They simply use different tradeoffs at the cost of sacrificing quite a lot of what is present in a RDBMS. It's important to understand first whether you can really afford to discard those things, because if you can't, it's either not going to work right, or you'll have to graft all that you removed on top of it anyway.

    5. Re:Call me skeptical by Natural+Join · · Score: 5, Interesting

      The small startups are using NoSQL because there is, more and more, a push in the web app market to store data which does not fit into any schema.

      There is no such thing as "data which does not fit into any schema", just like there is no such thing as data which cannot be encoded into binary. All data necessarily has a schema. However much or little of the schema you may choose to model in your (SQL or other type of) schema is, like the rest of software engineering, a design tradeoff.

      The various NoSQL approaches do not solve the full generality of data management problems the way SQL databases do. They are narrower in scope, and as is generally the case, they can achieve better performance by virtue of doing less. They can be much faster with certain data access paths, but at the cost of making other data access paths prohibitive.

      The frustrating thing for many of us is that the NoSQL spin on data management is about where mainstream data management was in the 1960s. As the field matured, it learned many important lessons, all of which are now being tossed out the window by people saying "oh we don't need that" but of course, they just haven't needed it yet. As these problems become apparent to them, they will spend the next decades of their lives reinventing what the data management field figured out in the 80s and 90s. Until then, they'll be making beginner mistakes, like thinking that their data somehow doesn't fit into any schema.

  5. Is it a technical or a budget problem? by ducomputergeek · · Score: 4, Insightful

    Given my past 12 years between working at consultancies and start-ups, I've seen this a few times. It's usually not a technical hurdle, it's a "We can't solve this problem within our budget" problem. The fix is either hiring someone who is an expert at performance tuning with their DB of choice, or moving from certain DBs to real databases that could handle the work, like MSSQL, DB2, Oracle, or in some cases Teradata if dealing with data warehousing.

    Because I've worked around some very large database installs in my day. Every time the scaling question/problem came up, it was solvable with RDBMS's, but the solution wasn't cheap.

    --
    "The problem with socialism is eventually you run out of other people's money" - Thatcher.
  6. you're doing something wrong by Surt · · Score: 4, Insightful

    "I'm currently responsible for operations at a software-as-a-service startup, and we're increasingly hitting limitations in what we can do with relational databases. "

    Relational databases scale to pretty amazing heights. The notion that you are hitting some limit of relational databases at a startup stretches the imagination. I mean, really, you've already hit exabyte data sizes? That's typically where relational starts to struggle.

    You really need to define your problem with much greater specificity to get a valuable answer.

    --
    "Who is the Journal of Quantum Physics going to believe?" --Stephen Hawking
  7. Wow by mlyle · · Score: 5, Informative

    I didn't expect we'd be on Slashdot just yet. I'm Michael Lyle, CTO and cofounder of Translattice.

    With regards to the original submitter's question, we'd love to talk to him. How much we can help, of course, depends on the specific scenario he's hitting.

    What we've built is an application platform constituted from identical nodes, each containing a geographically decentralized relational database, a distributed (J2EE compatible) application container, and distributed load balancing and management capabilities. Massive relational data is transparently sharded behind the scenes and assigned redundantly to the computing resources in the cluster, and a distributed consensus protocol keeps all of the transactions in flight coherent and provides ACID guarantees. In essence, we allow existing enterprise applications to scale out horizontally while keeping the benefits of the existing programming model for transactional applications, by letting computing resources from throughout an organization combine to run enterprise workloads.

    Current stacks are really complicated, multi-vendor, and require extensive integration/custom engineering for each application install. We're striving to create a world where massively performing infrastructure can be built from identical pieces.

    1. Re:Wow by Cylix · · Score: 4, Insightful

      He posted to slashdot.... do you really think he can afford you?

      --
      "You should always go to other people's funerals; otherwise, they won't come to yours." -- Yogi Berra
    2. Re:Wow by Squeebee · · Score: 5, Funny

      Congratulations, you just won Slashdot's buzzword bingo, please collect your prize at the cashier window in the back of the hall.

    3. Re:Wow by mlyle · · Score: 4, Interesting

      The short answer is, CA/CP/AP on a transaction-by-transaction basis depending on application requirements. Also of note: network delay is effectively a special "partition", requiring an engine that can have massive workloads in flight and reconcile/order non-commutative changesets in a distributed fashion.

  8. Justification for new toys? by StuartHankins · · Score: 5, Insightful

    The post is so vaguely worded, I imagine the author is merely trying to find some justification to purchase some new toys. "See, Slashdot people think this is a good idea!"

    I agree with most of the posts so far -- if you're truly hitting a limit, you are most likely doing something wrong. Hire an outside DBA to make recommendations if you don't have the resources in-house. I strongly suspect this is the real issue.

  9. MySQL scales just fine. by poptix_work · · Score: 4, Interesting

    I work with some very high traffic sites, storing large data sets (100GB+).

    Depending on the application (if it allows for different write-only/read-only database configurations) we'll have a master-master replication setup, then a number of slaves hanging off each MySQL master. In front of all of this is haproxy* which performs TCP load balancing between all slaves, and all masters. Slaves that fall behind the master are automatically removed from the pool to ensure that clients receive current data.

    This provides:
    * Redundancy
    * Scaling
    * Automatic failover

    The whole NoSQL movement is as bad as the XML movement. I'm sure it's a great idea in some cases, but otherwise it's a solution looking for a problem.

    (*) http://haproxy.1wt.eu/
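
    The "remove lagging slaves from the pool" idea can be sketched in a few lines of Python. This is not haproxy's mechanism or configuration, and the comment above doesn't say how lag is measured; `Replica`, `ReplicaPool`, and the `lag_seconds` field are hypothetical names, and the lag value is assumed to come from some external health check.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    name: str
    lag_seconds: float  # assumed to be updated by an external health check


class ReplicaPool:
    """Round-robin over read replicas, skipping any that lag too far."""

    def __init__(self, replicas, max_lag_seconds):
        self.replicas = list(replicas)
        self.max_lag_seconds = max_lag_seconds
        self._next = 0

    def healthy(self):
        """Replicas currently within the allowed replication lag."""
        return [r for r in self.replicas if r.lag_seconds <= self.max_lag_seconds]

    def pick(self):
        """Return the next healthy replica, or fail if none qualify."""
        candidates = self.healthy()
        if not candidates:
            raise RuntimeError("no replica is within the lag limit")
        chosen = candidates[self._next % len(candidates)]
        self._next += 1
        return chosen


pool = ReplicaPool(
    [Replica("db-a", 0.1), Replica("db-b", 30.0), Replica("db-c", 0.5)],
    max_lag_seconds=5.0,
)
picks = [pool.pick().name for _ in range(3)]  # db-b is skipped while it lags
```

    Once a lagging replica catches up and its lag drops under the limit, it rejoins the rotation on the next `pick()` without any extra bookkeeping.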

    --
    Just because you disagree doesn't make it offtopic or flamebait.
  10. Re:What you should really be doing... by ErikZ · · Score: 4, Funny

    I could do that, but your tears are delicious.

    --
    Democrats or Republicans. They are both taking us to the same place and they are not afraid of us anymore.
  11. Rick Cattell's work on scalable datastores by MoxFulder · · Score: 5, Informative

    I recently came across Rick Cattell's site which addresses just the questions you're asking.

    Rick Cattell has written an excellent comparison guide of horizontally scalable datastores of different types (RDBMS as well as a variety of NoSQL systems).

    Cattell has also written an academic paper with database expert Mike Stonebraker, which weighs the system design factors required to make a datastore scalable.

    Executive summary of Cattell's work: although NoSQL may be a huge fad, the things that make a datastore scalable can be implemented in SQL RDBMS systems as well. Also, implementing do-it-yourself ACID in NoSQL systems is extremely difficult and error-prone, and is a significant advantage of most RDBMS systems. Stonebraker is the author of VoltDB, which is an open-source RDBMS designed for horizontal scalability, but they give a very fair and thorough look at competing datastores as well.