Slashdot Mirror


Yale Researchers Prove That ACID Is Scalable

An anonymous reader writes "The has been a lot of buzz in the industry lately about NoSQL databases helping Twitter, Amazon, and Digg scale their transactional workloads. But there has been some recent pushback from database luminaries such as Michael Stonebraker. Now, a couple of researchers at Yale University claim that NoSQL is no longer necessary now that they have scaled traditional ACID compliant database systems."

23 of 272 comments (clear)

  1. Pfah. by stonecypher · · Score: 5, Interesting

    NoSQL never was necessary. Traditional SQL database - not just terascale, but even simple ones like MySQL - regularly deal with data volumes at Google and Walmart that make the sites that built these databases in desperation look positively tiny.

    Digg's engineers wear clown shoes to work.

    --
    StoneCypher is Full of BS
    1. Re:Pfah. by mini+me · · Score: 5, Insightful

      NoSQL is not really about scalability, it is about modelling your data the same way your application does.

      There is a strong disconnect between the way SQL represents data and the way traditional programming languages do. While we've come up with some clever solutions like ORM to alleviate the problem, why not just store the data directly without any mapping?

      I am not suggesting that SQL is never the right tool for the job, but it most certainly is not the right tool for every job. It is good to have many different kinds of hammers, and perhaps even a screwdriver or two.

    2. Re:Pfah. by bluefoxlucid · · Score: 5, Interesting

      NoSQL never was necessary. Traditional SQL database - not just terascale, but even simple ones like MySQL - regularly deal with data volumes at Google

      Google uses BigTable, a NoSQL database.

    3. Re:Pfah. by bluefoxlucid · · Score: 5, Insightful

      There is a strong disconnect between the way SQL represents data and the way traditional programming languages do.

      Yes but there is a strong disconnect between computer RAM and information. Computer RAM contains DATA; information comes in associated tables. Relational databases represent data in tables with indexes, keys, etc. A Person is unique (has a unique ID), but they may share First Name, Last Name, and even Address (junior/senior in same household). There are many Races, and a Person will be of a given Race (or mix, but this is horribly difficult to index anyway). A Person will own a specific Car; that Car, in turn, will be a particular Make-Model-Year-Trim, which itself is a hierarchy of tables (Trim and Year are pretty separate, Model however will be of a particular Make, while a particular car available is going to be Model-Year-Trim).

      Indexing and relating data in this way turns it into information, which is what we want and need. Separating the data eliminates redundancies and lets us use fewer buffers along the way, crunching down smaller tables and making fast comparisons to small-size keys before we even reference big, complex tables. Meanwhile, we're still essentially asking questions like "Find me all people who own a 1996-2010 Year Toyota Prius." Someone might own 15 cars, so we're looking in the table of all individual Cars with MYT where table MYT.Model = (Toyota Prius) and .Year is between 1996 and 2010, and pulling all entries in table Persons for each unique Cars.Owner = Persons.ID (an inner join).

      Information theory versus programming. We're studying information here. We might have something more interesting to do than look in a giant array of Cars[VIN] = &Owners[Index]. For the actual data, the model we use makes sense; programmers get an API that says "Yeah, ask me a specific structured question and I'll give you a two-dimensional array to work with as an answer." That two-dimensional array is suitable for programming logic to manipulate specific structured data; extracting that data from the huge store of structured information is complex, but handled by a front-end that has its own language. You tell that front-end to find this data based on these parameters and string it together; it does tons of programming shit to search, sort, select, copy, and structure the data for you.

    4. Re:Pfah. by DragonWriter · · Score: 4, Insightful

      NoSQL never was necessary. Traditional SQL database - not just terascale, but even simple ones like MySQL - regularly deal with data volumes at Google and Walmart that make the sites that built these databases in desperation look positively tiny.

      Database size was never the main driving force beyond the new move toward NoSQL databases. Support for distributed architectures is. In part, this is about handling lots of queries rather than handling lots of data; it also -- particularly if you are Google -- deals with latency when the consumers of data are widely distributed geographically.

      And note that one of the companies that is heavily involved in building, using, and supplying non-SQL distributed databases is Google, who, as you so well point out, is very much aware of both the capabilities and limits of scaling with current relational DBs.

      This new research may offer new prospects for better databases in the future -- but TFA indicates that the new design has a limitation which seems common in distributed, strongly-consistent system "It turns out that the deterministic scheme performs horribly in disk-based environments".

      In fact, given that it proposes strong consistency, distribution, and relies on in-memory operation for performance, it sounds a lot like existing distributed, strongly-consistent systems based around the Paxos algorithm, like Scalaris. And it seems likely to face the same criticism from those who think that durability requires disk-based persistence, and that replacing storage on disks (which, one should keep in mind, can also fail) with storage in-memory simultaneously on a sufficient number of servers (which, yes, could all simultaneously fail, but durability is never absolute, its at best a matter of the degree to which data is protected against probable simultaneous combinations of failures.)

      So -- reading only the blog post that is TFA announcing the paper and not the paper itself yet -- I don't get the impression that this is necessary are giant leap forward, though more work on distributed, strongly-consistent databases is certainly a good thing.

    5. Re:Pfah. by Splab · · Score: 5, Funny

      "Your Googling May Vary."

      Yes, that is exactly the problem with NoSQL.

    6. Re:Pfah. by Trieuvan · · Score: 5, Informative

      It is if you use innodb .

    7. Re:Pfah. by GooberToo · · Score: 4, Funny

      Funny. Insightful. Informative. So many options with your post. I'm sure at least one moderator will get it figured out.

    8. Re:Pfah. by h4nk · · Score: 5, Insightful

      Well said. This "problem" has more to do with architects and developers understanding the concepts of layering and information hiding. When programmers are allowed to dictate architecture under the pretense that certain interfaces to a Service should determine the structure of the Information itself, there is a huge problem at the business level. How does this happen? Uninvolved, or under-skilled DBAs and data architects. This is their job. My experience is that business managers and programmers have always seen the database as some sort of necessary evil without understanding its full purpose. Too many programmers with very little database experience are given direct access to databases themselves. The motivation of "Get it to work" takes precedence over well-researched and proven approaches, approaches that will only benefit in the long run. Companies that implement poor strategies for the sake of short-term gains usually have the idea that the best approach is somehow the one that takes the most time to implement. Short-sighted solutions are put into play and almost as soon as they are implemented, the scalability and data requirement issues begin to crop. These poor strategies are often the result of inexperience and poor education on all levels. This is why it is so important to hire people that really know what they are doing from C-level management down to the programmers. I have seen bad thinking gut companies. A service built on sound architecture will have issues maturing, not doubt. How well it matures depends on the wisdom and skill of the company.

    9. Re:Pfah. by smooth+wombat · · Score: 4, Funny

      they tried to switch to Oracle, but it was too slow.

      What? Oracle too slow? How dare you besmirch the all-powerful Larry Ellison. We switched from a mainframe environment which handled all our sales data to an Oralce-based ERP system. I'll show you how fast this puppy now runs. Let me show you our sales data for the last month...

      Hang on, I'll get the answer in a minute...

      Bear with me, it will be here soon...

      Here's a bottle of Mountain Dew while you wait...

      Can I get you anything to snack on? M&M's? Doritos? A Snickers bar perhaps?

      --
      We will bankrupt ourselves in the vain search for absolute security. -- Dwight D. Eisenhower
    10. Re:Pfah. by DragonWriter · · Score: 5, Informative

      Doesn't work so well if you've got a graph structure or a tree. If in a family tree, you want to find all 5'th descendants or all descendants of some guy, SQL won't make you happy.

      A decade plus ago, and that would be true.

      Standard SQL from SQL-99 on will, in fact, do this quite easily with via recursive Common Table Expressions. Now, some SQL-based DBMSs don't support enough of the standard to use this, but, current versions of, I believe, DB2, Firebird, PostgreSQL, and SQL Server all implement standard CTEs well enough to do those examples in SQL fairly directly, and Oracle has its own proprietary syntax (CONNECT BY) that works for the examples that you pose, though its less general than SQL-99 recursive CTEs.

    11. Re:Pfah. by QuoteMstr · · Score: 5, Informative

      An ACID compliant RDBMS can't even get read access to the user, car, friend, picture and pet_survey_answer table set as long as any of the million users of the system is making a change to his data, even if the application only locks one table at a time for write access, let alone the problem of a million users trying to gain write access to the same table at the same time.

      You have no idea what you're talking about, probably because your brain has been irreversibly warped by MySQL. Concurrent writing is widely-supported.

      Hint: MVCC.

    12. Re:Pfah. by hey! · · Score: 5, Insightful

      NoSQL is not really about scalability, it is about modelling your data the same way your application does.

      I've actually been in the business long enough to remember when relational databases were the new thing. What people seem to forget is that modeling your data in a different way than your application does *was the whole point*. The idea was to make data a reusable resource *across applications*. Of course, that turned out to be a lot harder than we thought it would be. Philosophically, one might well ask whether it is possible to understand data at all apart from its intended applications. Of course, by the time we'd figured that out, a whole new generation was coming up trying to create a Semantic Web.

      I basically agree that SQL isn't always the right tool for the job. I happen to think certain aspects of the relational model are somewhat broken (e.g. composite keys), and SQL is a pretty crappy query language in any case. But I think because RDBMSs are a mature technology, recently trained programmers don't bother to understand them, and cover that lack of understanding by pooh-pooh-ing the stuff that's over their head. I went through a patch a few years ago where I was interviewing programming candidates who had XML coming out of their ears but hadn't the foggiest idea of what "NULL" means in the relational model. Naturally they had all kinds of problems on the relational end of things, and tended to view the RDBMS as a kind of pitfall in which bad things inexplicably happen. Consequently, they tended to think of the database as simply a backing store for the application *they* were working on. In some cases this is acceptable, but one often sees abominable schema that are the product of ignorance, pure and simple.

      Naturally, non-relational systems are most attractive where performance is at a higher premium than flexibility. This characterizes many web applications that do a small number of relatively simple things, but to do it on a scale that takes special expertise to achieve using a relational model. That was very much the case at the beginning of the relational era, when applications tended to be narrower in scope and query optimization primitive. You thought of order line items as "part-of" an order, whereas in relational thinking they could just as easily be considered attributes of products. This made the programmer's job a lot easier, so long as the RDBMS could process invoices fast enough to make the users happy.

      --
      Post may contain irony: discontinue use if experiencing mood swings, nausea or elevated blood pressure.
    13. Re:Pfah. by LurkerXXX · · Score: 4, Informative

      An ACID compliant RDBMS can't even get read access to the user, car, friend, picture and pet_survey_answer table set as long as any of the million users of the system is making a change to his data, even if the application only locks one table at a time for write access, let alone the problem of a million users trying to gain write access to the same table at the same time.

      Wow. Just wow. Any serious ACID complient RDBMS can do that with no problem.

    14. Re:Pfah. by Johnno74 · · Score: 5, Interesting

      Totally agree. Only problem is writing recursive CTE queries is beyond most programmers. Hell, a lot of programmers struggle with anything but simple inner joins.

      IMHO CTE's are one of the most underused and powerful features of SQL. Not just for recursive queries, but for bridging the gap between functional and procedural programming.

      I write all my complex queries as a series of simple CTE's now - each CTE gets me one step closer to the actual query I need, and the magic of the query optimizer combines them all into a single query plan. Makes testing, debugging and maintaining a complex query about a million times easier.

  2. digg does not need to worry anymore by Dan667 · · Score: 5, Funny

    digg has chased all their users away with the new version of their site so they could probably change over to MS Access and be ok.

    1. Re:digg does not need to worry anymore by BabyDuckHat · · Score: 5, Funny

      Yeah, now instead of being a glorified RSS feed for reddit, they're an actual RSS feed for reddit. Great change!

    2. Re:digg does not need to worry anymore by Dan667 · · Score: 4, Insightful

      actually most of the change was to allow auto submitting of stories from big publishers/companies. They basically changed digg into a paid for RSS ad service. If you hated the gaming of the old site digg I am sure you just stopped using the new site digg all together. No one goes to a website to read ads.

  3. Interesting thesis by Peeteriz · · Score: 5, Interesting

    In essence, TFA claims that if the traditional ACID guarantee "if three transactions (let's call them A, B and C) are active ... the resulting database state will be the same as if it had run them one-by-one. No promises are made, however, about which particular order execution it will be equivalent to: A-B-C, B-A-C, A-C-B" is not abandoned (as in NoSQL systems), but is even strengthened to a guarantee that the result will always be as if they arrived in A-B-C order, then it solves all kinds of possible replication problems, requires less networking between the many servers involved, and allows for high scaling while also keeping all the integrity constraints.

  4. Re:I hate SQL and Databases in General... by jeff4747 · · Score: 5, Insightful

    Why is it that we continue to use a technology based on a 1960's view of a problem when clearly there ARE other solutions and ways to approach said problem?

    Because it works.

    "It's old" is a terrible reason to replace something. Go back to your previous arguments an you have a case. After all, a Core i7 is based on a 1960's view of a problem with an enormous number of band-aids applied in the intervening years, but you don't seem too concerned with replacing that.

  5. Re:I hate SQL and Databases in General... by poet · · Score: 5, Informative

    Spoken with proud ignorance.

    Anyone who has properly scaled an application knows the database isn't the problem. If it was, it wouldn't take 12 applications servers to bring the thing to its knees. That said, most of your gripes equate to:

    I am not a DBA and therefore I do not understand DBA and therefore I must complain.

    Further SQL has nothing to do with ACID. AT ALL!

    --
    Get your PostgreSQL here: http://www.commandprompt.com/
  6. You hate what you don't understand by frist · · Score: 5, Insightful

    Sounds like you don't really understand what you're talking about. The reason we continue to use ACID compliant RDBMS is because they work and they work well. If you don't think that RDBMS have changed over the years, you're simply lacking experience. I feel this is most likely the case as you comlain about the interface language (SQL), and don't understand how to CM stored procedures, or how to test a DB (OMG I have to make a copy of the DB to test - so hard!) Comlaining about the overhead of using an RDBMS in an application that doesn't require an RDBMS is tantamount to complaining about how hot you get while wearing a spacsuit when you jog in the park.

  7. Summary by azmodean+1 · · Score: 4, Informative

    Short Summary:
    We make some claims about scaling ACID databases, but then don't support them.

    Longer summary:
    We don't like NoSQL and enjoy making baseless cracks about it such as it being a "lazy" approach.
    In our paper we demonstrate that our unconventional version of an ACID database scales better than a traditional ACID database in a specific environment, while merely throwing away some robustness guarantees and changing how transaction ordering works.
    No direct comparison to any NoSQL implementation is made.

    So yea, I'm not holding my breath for companies to start migrating away from NoSQL.