Yale Researchers Prove That ACID Is Scalable

Posted by CmdrTaco on Wednesday September 1, 2010 @04:49AM from the i-could-prove-lunch dept.

An anonymous reader writes "There has been a lot of buzz in the industry lately about NoSQL databases helping Twitter, Amazon, and Digg scale their transactional workloads. But there has been some recent pushback from database luminaries such as Michael Stonebraker. Now, a couple of researchers at Yale University claim that NoSQL is no longer necessary now that they have scaled traditional ACID-compliant database systems."

12 of 272 comments

Re:Pfah. by TheSunborn · 2010-09-01 05:05 · Score: 3, Insightful

It was never database size that was the problem, but the number of queries per second (aka performance) that could be executed.
You can run a Google-size database on MySQL, but you can't use MySQL* to implement a search solution with performance like Google's without requiring much, much, much more hardware.
*Or any other SQL database.
Re:Pfah. by mini me · 2010-09-01 05:06 · Score: 5, Insightful

NoSQL is not really about scalability, it is about modelling your data the same way your application does.
There is a strong disconnect between the way SQL represents data and the way traditional programming languages do. While we've come up with some clever solutions like ORM to alleviate the problem, why not just store the data directly without any mapping?
I am not suggesting that SQL is never the right tool for the job, but it most certainly is not the right tool for every job. It is good to have many different kinds of hammers, and perhaps even a screwdriver or two.
Possible != Practical by Tablizer · 2010-09-01 05:17 · Score: 3, Insightful

A bigger issue may be the cost of ACID, even if it can in theory scale. Supporting ACID is not free. A free web service may be able to afford losing, say, 1 out of 10,000 web transactions. Banks cannot do it, but Google Experiments can. The extra expense of big-iron ACID may not make up for the relatively minor cost of losing an occasional transaction or customer. It's a business decision.

--
Table-ized A.I.
Re:Pfah. by bluefoxlucid · 2010-09-01 05:28 · Score: 5, Insightful

There is a strong disconnect between the way SQL represents data and the way traditional programming languages do.
Yes but there is a strong disconnect between computer RAM and information. Computer RAM contains DATA; information comes in associated tables. Relational databases represent data in tables with indexes, keys, etc. A Person is unique (has a unique ID), but they may share First Name, Last Name, and even Address (junior/senior in same household). There are many Races, and a Person will be of a given Race (or mix, but this is horribly difficult to index anyway). A Person will own a specific Car; that Car, in turn, will be a particular Make-Model-Year-Trim, which itself is a hierarchy of tables (Trim and Year are pretty separate, Model however will be of a particular Make, while a particular car available is going to be Model-Year-Trim).
Indexing and relating data in this way turns it into information, which is what we want and need. Separating the data eliminates redundancies and lets us use fewer buffers along the way, crunching down smaller tables and making fast comparisons to small-size keys before we even reference big, complex tables. Meanwhile, we're still essentially asking questions like "Find me all people who own a 1996-2010 Year Toyota Prius." Someone might own 15 cars, so we're looking in the table of all individual Cars with MYT where table MYT.Model = (Toyota Prius) and .Year is between 1996 and 2010, and pulling all entries in table Persons for each unique Cars.Owner = Persons.ID (an inner join).
Information theory versus programming. We're studying information here. We might have something more interesting to do than look in a giant array of Cars[VIN] = &Owners[Index]. For the actual data, the model we use makes sense; programmers get an API that says "Yeah, ask me a specific structured question and I'll give you a two-dimensional array to work with as an answer." That two-dimensional array is suitable for programming logic to manipulate specific structured data; extracting that data from the huge store of structured information is complex, but handled by a front-end that has its own language. You tell that front-end to find this data based on these parameters and string it together; it does tons of programming shit to search, sort, select, copy, and structure the data for you.
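As a rough illustration of the "Find me all people who own a 1996-2010 Toyota Prius" question above, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (persons, cars, myt, owner_id, myt_id, model_year, trim_level) and the sample rows are hypothetical, invented only to mirror the Persons / Cars / Make-Model-Year-Trim layout described in the comment; they are not from any real schema.

```python
import sqlite3


def build_demo_db():
    """Create a tiny in-memory database with hypothetical sample data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE persons (
            id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
        CREATE TABLE myt (
            id INTEGER PRIMARY KEY, make TEXT, model TEXT,
            model_year INTEGER, trim_level TEXT);
        CREATE TABLE cars (
            vin TEXT PRIMARY KEY,
            owner_id INTEGER REFERENCES persons(id),
            myt_id INTEGER REFERENCES myt(id));

        -- Two different people may share a name; the ID keeps them distinct.
        INSERT INTO persons VALUES
            (1, 'John', 'Smith'), (2, 'John', 'Smith'), (3, 'Ann', 'Lee');
        INSERT INTO myt VALUES
            (10, 'Toyota', 'Prius', 2004, 'Base'),
            (11, 'Toyota', 'Prius', 2012, 'Two'),
            (12, 'Honda', 'Civic', 2004, 'LX');
        -- Person 3 owns two matching cars; person 2 owns an out-of-range Prius.
        INSERT INTO cars VALUES
            ('VIN1', 1, 10), ('VIN2', 1, 12), ('VIN3', 2, 11),
            ('VIN4', 3, 10), ('VIN5', 3, 10);
    """)
    return conn


def prius_owners(conn):
    """Return each person who owns at least one 1996-2010 Toyota Prius."""
    query = """
        SELECT DISTINCT p.id, p.first_name, p.last_name
        FROM persons AS p
        INNER JOIN cars AS c ON c.owner_id = p.id
        INNER JOIN myt AS m ON m.id = c.myt_id
        WHERE m.make = 'Toyota'
          AND m.model = 'Prius'
          AND m.model_year BETWEEN 1996 AND 2010
        ORDER BY p.id
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    print(prius_owners(build_demo_db()))
```

The DISTINCT is what keeps someone who owns several matching cars from appearing more than once, and the answer comes back as the plain two-dimensional result (a list of rows) that the comment describes.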

--
Support my political activism on Patreon.
Re:Pfah. by DragonWriter · 2010-09-01 05:29 · Score: 4, Insightful

NoSQL never was necessary. Traditional SQL databases - not just terascale, but even simple ones like MySQL - regularly deal with data volumes at Google and Walmart that make the sites that built these databases in desperation look positively tiny.
Database size was never the main driving force behind the new move toward NoSQL databases. Support for distributed architectures is. In part, this is about handling lots of queries rather than handling lots of data; it also -- particularly if you are Google -- deals with latency when the consumers of data are widely distributed geographically.
And note that one of the companies that is heavily involved in building, using, and supplying non-SQL distributed databases is Google, who, as you so well point out, is very much aware of both the capabilities and limits of scaling with current relational DBs.
This new research may offer new prospects for better databases in the future -- but TFA indicates that the new design has a limitation which seems common in distributed, strongly-consistent systems: "It turns out that the deterministic scheme performs horribly in disk-based environments".
In fact, given that it proposes strong consistency and distribution, and relies on in-memory operation for performance, it sounds a lot like existing distributed, strongly-consistent systems based around the Paxos algorithm, like Scalaris. And it seems likely to face the same criticism from those who think that durability requires disk-based persistence and who object to replacing storage on disks (which, one should keep in mind, can also fail) with storage in memory simultaneously on a sufficient number of servers (which, yes, could all simultaneously fail, but durability is never absolute; it's at best a matter of the degree to which data is protected against probable simultaneous combinations of failures).
So -- reading only the blog post that is TFA announcing the paper and not the paper itself yet -- I don't get the impression that this is necessarily a giant leap forward, though more work on distributed, strongly-consistent databases is certainly a good thing.
Re:I hate SQL and Databases in General... by jeff4747 · 2010-09-01 05:34 · Score: 5, Insightful

Why is it that we continue to use a technology based on a 1960's view of a problem when clearly there ARE other solutions and ways to approach said problem?
Because it works.
"It's old" is a terrible reason to replace something. Go back to your previous arguments and you have a case. After all, a Core i7 is based on a 1960's view of a problem with an enormous number of band-aids applied in the intervening years, but you don't seem too concerned with replacing that.
You hate what you don't understand by frist · 2010-09-01 05:38 · Score: 5, Insightful

Sounds like you don't really understand what you're talking about. The reason we continue to use ACID-compliant RDBMSs is that they work and they work well. If you don't think that RDBMSs have changed over the years, you're simply lacking experience. I feel this is most likely the case, as you complain about the interface language (SQL), and don't understand how to CM stored procedures, or how to test a DB (OMG I have to make a copy of the DB to test - so hard!) Complaining about the overhead of using an RDBMS in an application that doesn't require an RDBMS is tantamount to complaining about how hot you get while wearing a spacesuit when you jog in the park.
Re:digg does not need to worry anymore by Dan667 · 2010-09-01 05:53 · Score: 4, Insightful

Actually, most of the change was to allow auto-submitting of stories from big publishers/companies. They basically changed digg into a paid-for RSS ad service. If you hated the gaming of the old digg site, I am sure you just stopped using the new digg site altogether. No one goes to a website to read ads.
Re:Pfah. by Anpheus · 2010-09-01 06:17 · Score: 3, Insightful

Well, and if you don't need it [the guarantees of ACID], why pay for it? I mean, if you have to spend any amount of time thinking about "How do I make that work?" that's a cost.
Whereas if all you care about is updating individual records without global consistency, well, don't enforce global consistency.
Re:Pfah. by h4nk · 2010-09-01 06:55 · Score: 5, Insightful

Well said. This "problem" has more to do with architects and developers understanding the concepts of layering and information hiding. When programmers are allowed to dictate architecture under the pretense that certain interfaces to a Service should determine the structure of the Information itself, there is a huge problem at the business level. How does this happen? Uninvolved or under-skilled DBAs and data architects. This is their job. My experience is that business managers and programmers have always seen the database as some sort of necessary evil without understanding its full purpose. Too many programmers with very little database experience are given direct access to databases themselves. The motivation of "Get it to work" takes precedence over well-researched and proven approaches, approaches that will only benefit in the long run. Companies that implement poor strategies for the sake of short-term gains usually have the idea that the best approach is somehow the one that takes the most time to implement. Short-sighted solutions are put into play and almost as soon as they are implemented, the scalability and data requirement issues begin to crop up. These poor strategies are often the result of inexperience and poor education on all levels. This is why it is so important to hire people that really know what they are doing, from C-level management down to the programmers. I have seen bad thinking gut companies. A service built on sound architecture will have issues maturing, no doubt. How well it matures depends on the wisdom and skill of the company.
Whose data is it? by sbjornda · 2010-09-01 07:17 · Score: 3, Insightful

but it stores its data in a way that doesn't require me to deconstruct all of my data structures into tables.
I take it this is not business-type data? Otherwise you're doing it backwards. Start with your Entity-Relationship diagrams, devolve into logical then physical data models, and THEN start programming.
I forget who said it but it's true: The data belongs to the business, not to the application. The data should be structured and stored in a way that it will still be readable years after your program has become obsolete. (Unless it's data that has a short "best before" date.)
--
.nosig
Re:Pfah. by hey! · 2010-09-01 10:51 · Score: 5, Insightful

NoSQL is not really about scalability, it is about modelling your data the same way your application does.
I've actually been in the business long enough to remember when relational databases were the new thing. What people seem to forget is that modeling your data in a different way than your application does *was the whole point*. The idea was to make data a reusable resource *across applications*. Of course, that turned out to be a lot harder than we thought it would be. Philosophically, one might well ask whether it is possible to understand data at all apart from its intended applications. Of course, by the time we'd figured that out, a whole new generation was coming up trying to create a Semantic Web.
I basically agree that SQL isn't always the right tool for the job. I happen to think certain aspects of the relational model are somewhat broken (e.g. composite keys), and SQL is a pretty crappy query language in any case. But I think because RDBMSs are a mature technology, recently trained programmers don't bother to understand them, and cover that lack of understanding by pooh-pooh-ing the stuff that's over their head. I went through a patch a few years ago where I was interviewing programming candidates who had XML coming out of their ears but hadn't the foggiest idea of what "NULL" means in the relational model. Naturally they had all kinds of problems on the relational end of things, and tended to view the RDBMS as a kind of pitfall in which bad things inexplicably happen. Consequently, they tended to think of the database as simply a backing store for the application *they* were working on. In some cases this is acceptable, but one often sees abominable schema that are the product of ignorance, pure and simple.
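For readers unsure what the "NULL" remark refers to, here is a small sketch using Python's built-in sqlite3 module. The people table, its columns, and the sample rows are hypothetical. It shows the three-valued logic that tends to surprise newcomers: comparing anything to NULL with = or <> yields NULL (unknown) rather than true or false, so such rows silently drop out of a WHERE clause.

```python
import sqlite3


def null_demo():
    """Return row counts illustrating how SQL comparisons treat NULL."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE people (name TEXT, middle_name TEXT)")
    conn.executemany(
        "INSERT INTO people VALUES (?, ?)",
        [("Ann", "Marie"), ("Bob", None)],  # Bob's middle name is unknown
    )

    def count(where):
        return conn.execute(
            "SELECT COUNT(*) FROM people WHERE " + where
        ).fetchone()[0]

    return {
        # '= NULL' is never true, so this matches nothing.
        "equals_null": count("middle_name = NULL"),
        # 'IS NULL' is the test that actually finds the missing value.
        "is_null": count("middle_name IS NULL"),
        # Bob is not counted here either: NULL <> 'Marie' is unknown.
        "not_marie": count("middle_name <> 'Marie'"),
        # Even NULL = NULL evaluates to NULL (None in Python).
        "null_eq_null": conn.execute("SELECT NULL = NULL").fetchone()[0],
    }


if __name__ == "__main__":
    print(null_demo())
```

The point of the sketch is only that NULL means "no known value" rather than an empty string or zero, which is the kind of detail the comment says candidates were missing.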
Naturally, non-relational systems are most attractive where performance is at a higher premium than flexibility. This characterizes many web applications that do a small number of relatively simple things, but do them on a scale that takes special expertise to achieve using a relational model. That was very much the case at the beginning of the relational era, when applications tended to be narrower in scope and query optimization primitive. You thought of order line items as "part-of" an order, whereas in relational thinking they could just as easily be considered attributes of products. This made the programmer's job a lot easier, so long as the RDBMS could process invoices fast enough to make the users happy.
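To make the "part-of an order" versus relational view concrete, here is a minimal Python sketch. The order data, field names, and helper functions are all hypothetical. The nested shape is convenient for the application that prints one order; the flat rows make it just as easy to ask a product-centric question that the nested shape was not designed around.

```python
from collections import defaultdict

# Application-shaped data: line items live inside ("part of") their order.
ORDERS = [
    {"order_id": 1, "lines": [{"product": "widget", "qty": 2},
                              {"product": "gear", "qty": 1}]},
    {"order_id": 2, "lines": [{"product": "widget", "qty": 5}]},
]


def flatten(orders):
    """Turn nested orders into flat (order_id, product, qty) rows."""
    return [
        (order["order_id"], line["product"], line["qty"])
        for order in orders
        for line in order["lines"]
    ]


def qty_by_product(rows):
    """Total quantity per product, ignoring which order a row came from."""
    totals = defaultdict(int)
    for _order_id, product, qty in rows:
        totals[product] += qty
    return dict(totals)


if __name__ == "__main__":
    rows = flatten(ORDERS)
    print(rows)
    print(qty_by_product(rows))
```

Once the line items are plain rows, grouping them by order or by product is the same kind of operation, which is the flexibility the relational view trades some programmer convenience for.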

--
Post may contain irony: discontinue use if experiencing mood swings, nausea or elevated blood pressure.