Yale Researchers Prove That ACID Is Scalable
An anonymous reader writes "The has been a lot of buzz in the industry lately about NoSQL databases helping Twitter, Amazon, and Digg scale their transactional workloads. But there has been some recent pushback from database luminaries such as Michael Stonebraker. Now, a couple of researchers at Yale University claim that NoSQL is no longer necessary now that they have scaled traditional ACID compliant database systems."
NoSQL is not really about scalability, it is about modelling your data the same way your application does.
There is a strong disconnect between the way SQL represents data and the way traditional programming languages do. While we've come up with some clever solutions like ORM to alleviate the problem, why not just store the data directly without any mapping?
I am not suggesting that SQL is never the right tool for the job, but it most certainly is not the right tool for every job. It is good to have many different kinds of hammers, and perhaps even a screwdriver or two.
There is a strong disconnect between the way SQL represents data and the way traditional programming languages do.
Yes but there is a strong disconnect between computer RAM and information. Computer RAM contains DATA; information comes in associated tables. Relational databases represent data in tables with indexes, keys, etc. A Person is unique (has a unique ID), but they may share First Name, Last Name, and even Address (junior/senior in same household). There are many Races, and a Person will be of a given Race (or mix, but this is horribly difficult to index anyway). A Person will own a specific Car; that Car, in turn, will be a particular Make-Model-Year-Trim, which itself is a hierarchy of tables (Trim and Year are pretty separate, Model however will be of a particular Make, while a particular car available is going to be Model-Year-Trim).
Indexing and relating data in this way turns it into information, which is what we want and need. Separating the data eliminates redundancies and lets us use fewer buffers along the way, crunching down smaller tables and making fast comparisons to small-size keys before we even reference big, complex tables. Meanwhile, we're still essentially asking questions like "Find me all people who own a 1996-2010 Year Toyota Prius." Someone might own 15 cars, so we're looking in the table of all individual Cars with MYT where table MYT.Model = (Toyota Prius) and .Year is between 1996 and 2010, and pulling all entries in table Persons for each unique Cars.Owner = Persons.ID (an inner join).
Information theory versus programming. We're studying information here. We might have something more interesting to do than look in a giant array of Cars[VIN] = &Owners[Index]. For the actual data, the model we use makes sense; programmers get an API that says "Yeah, ask me a specific structured question and I'll give you a two-dimensional array to work with as an answer." That two-dimensional array is suitable for programming logic to manipulate specific structured data; extracting that data from the huge store of structured information is complex, but handled by a front-end that has its own language. You tell that front-end to find this data based on these parameters and string it together; it does tons of programming shit to search, sort, select, copy, and structure the data for you.
Support my political activism on Patreon.
Because it works.
"It's old" is a terrible reason to replace something. Go back to your previous arguments an you have a case. After all, a Core i7 is based on a 1960's view of a problem with an enormous number of band-aids applied in the intervening years, but you don't seem too concerned with replacing that.
Sounds like you don't really understand what you're talking about. The reason we continue to use ACID compliant RDBMS is because they work and they work well. If you don't think that RDBMS have changed over the years, you're simply lacking experience. I feel this is most likely the case as you comlain about the interface language (SQL), and don't understand how to CM stored procedures, or how to test a DB (OMG I have to make a copy of the DB to test - so hard!) Comlaining about the overhead of using an RDBMS in an application that doesn't require an RDBMS is tantamount to complaining about how hot you get while wearing a spacsuit when you jog in the park.
Well said. This "problem" has more to do with architects and developers understanding the concepts of layering and information hiding. When programmers are allowed to dictate architecture under the pretense that certain interfaces to a Service should determine the structure of the Information itself, there is a huge problem at the business level. How does this happen? Uninvolved, or under-skilled DBAs and data architects. This is their job. My experience is that business managers and programmers have always seen the database as some sort of necessary evil without understanding its full purpose. Too many programmers with very little database experience are given direct access to databases themselves. The motivation of "Get it to work" takes precedence over well-researched and proven approaches, approaches that will only benefit in the long run. Companies that implement poor strategies for the sake of short-term gains usually have the idea that the best approach is somehow the one that takes the most time to implement. Short-sighted solutions are put into play and almost as soon as they are implemented, the scalability and data requirement issues begin to crop. These poor strategies are often the result of inexperience and poor education on all levels. This is why it is so important to hire people that really know what they are doing from C-level management down to the programmers. I have seen bad thinking gut companies. A service built on sound architecture will have issues maturing, not doubt. How well it matures depends on the wisdom and skill of the company.
NoSQL is not really about scalability, it is about modelling your data the same way your application does.
I've actually been in the business long enough to remember when relational databases were the new thing. What people seem to forget is that modeling your data in a different way than your application does *was the whole point*. The idea was to make data a reusable resource *across applications*. Of course, that turned out to be a lot harder than we thought it would be. Philosophically, one might well ask whether it is possible to understand data at all apart from its intended applications. Of course, by the time we'd figured that out, a whole new generation was coming up trying to create a Semantic Web.
I basically agree that SQL isn't always the right tool for the job. I happen to think certain aspects of the relational model are somewhat broken (e.g. composite keys), and SQL is a pretty crappy query language in any case. But I think because RDBMSs are a mature technology, recently trained programmers don't bother to understand them, and cover that lack of understanding by pooh-pooh-ing the stuff that's over their head. I went through a patch a few years ago where I was interviewing programming candidates who had XML coming out of their ears but hadn't the foggiest idea of what "NULL" means in the relational model. Naturally they had all kinds of problems on the relational end of things, and tended to view the RDBMS as a kind of pitfall in which bad things inexplicably happen. Consequently, they tended to think of the database as simply a backing store for the application *they* were working on. In some cases this is acceptable, but one often sees abominable schema that are the product of ignorance, pure and simple.
Naturally, non-relational systems are most attractive where performance is at a higher premium than flexibility. This characterizes many web applications that do a small number of relatively simple things, but to do it on a scale that takes special expertise to achieve using a relational model. That was very much the case at the beginning of the relational era, when applications tended to be narrower in scope and query optimization primitive. You thought of order line items as "part-of" an order, whereas in relational thinking they could just as easily be considered attributes of products. This made the programmer's job a lot easier, so long as the RDBMS could process invoices fast enough to make the users happy.
Post may contain irony: discontinue use if experiencing mood swings, nausea or elevated blood pressure.