Yale Researchers Prove That ACID Is Scalable


Posted by CmdrTaco on Wednesday September 1, 2010 @04:49AM from the i-could-prove-lunch dept.

An anonymous reader writes "There has been a lot of buzz in the industry lately about NoSQL databases helping Twitter, Amazon, and Digg scale their transactional workloads. But there has been some recent pushback from database luminaries such as Michael Stonebraker. Now, a couple of researchers at Yale University claim that NoSQL is no longer necessary now that they have scaled traditional ACID-compliant database systems."

12 of 272 comments

Pfah. by stonecypher · 2010-09-01 04:53 · Score: 5, Interesting

NoSQL never was necessary. Traditional SQL databases - not just terascale, but even simple ones like MySQL - regularly deal with data volumes at Google and Walmart that make the sites that built these databases in desperation look positively tiny.
Digg's engineers wear clown shoes to work.

--
StoneCypher is Full of BS
1. Re:Pfah. by Anonymous Coward · 2010-09-01 05:00 · Score: 1, Interesting
  
  NoSQL never was necessary. Traditional SQL databases - not just terascale, but even simple ones like MySQL [...]
  Is MySQL ACID?
2. Re:Pfah. by bluefoxlucid · 2010-09-01 05:14 · Score: 5, Interesting
  
  NoSQL never was necessary. Traditional SQL databases - not just terascale, but even simple ones like MySQL - regularly deal with data volumes at Google
  Google uses BigTable, a NoSQL database.
  
  --
  Support my political activism on Patreon.
3. Re:Pfah. by RAMMS+EIN · 2010-09-01 08:15 · Score: 3, Interesting
  
  ``There is a strong disconnect between the way SQL represents data and the way traditional programming languages do.''
  I agree, but ...
  ``While we've come up with some clever solutions like ORM to alleviate the problem,''
  I don't think ORM alleviates the problem so much as entrenches it. The classes-and-instances object model and the relational model are different, but can be expressed in one another. Object-relational mapping makes this easy by pretending the models are the same, and doing the mapping behind the scenes. This works for some cases, but if you want to get the best performance, you have to express things in a way that takes into account the efficiency considerations of the actual implementation. With ORM, you run into the situation where what is most succinct to express in code is not necessarily what is most efficient in terms of disk access and network resource usage. So, for efficiency reasons, you end up breaking the abstractions that your ORM provided ...
  ``why not just store the data directly without any mapping?''
  There isn't really such a thing as "without any mapping". However, you can ensure that the constructs your API provides are equivalent to what you can efficiently fetch or store in your data store. Since typical RDBMSs are usually optimized to execute typical SQL queries efficiently, SQL is actually a fairly good starting point. You can optimize this by creating indices to speed up common operations, and by tuning your RDBMS to speed up common operations. And, no doubt, you can do even better by creating custom shortcuts for specific needs of your application.
  This is sort of what so-called NoSQL databases do: they are optimized for specific scenarios, and thus may outperform stock RDBMSs that are optimized for "we don't know what you want to do, so we try to make everything reasonably fast". It's also worth noting that NoSQL systems often return stale data or even allow inconsistencies in order to improve performance. By contrast, the strength of a good relational database is preserving the integrity of your data no matter what happens. Different tools for different jobs - or at least, different optimizations for different scenarios.
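To make the point about indices concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The `users` table, the `idx_users_email` index and the `query_plan` helper are made up for illustration; the exact plan wording differs between SQLite versions, so treat the printed text as indicative only.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the 'detail' text of SQLite's EXPLAIN QUERY PLAN as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The human-readable detail is the last column of each plan row.
    return "; ".join(row[-1] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO users (email, name) VALUES (?, ?)",
    [(f"user{i}@example.com", f"User {i}") for i in range(1000)],
)

lookup = "SELECT * FROM users WHERE email = ?"
args = ("user42@example.com",)

# Without an index the engine has to scan the whole table.
plan_before = query_plan(conn, lookup, args)

# Index the column used by the common lookup, then ask for the plan again.
conn.execute("CREATE INDEX idx_users_email ON users (email)")
plan_after = query_plan(conn, lookup, args)

print("before:", plan_before)
print("after: ", plan_after)
```

The second plan should mention `idx_users_email`, meaning the lookup no longer walks every row. A purpose-built store makes this kind of choice for you in advance, which is roughly the trade-off described above.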
  
  --
  Please correct me if I got my facts wrong.
4. Re:Pfah. by NNKK · 2010-09-01 08:47 · Score: 3, Interesting
  
  That is an excellent question for a DBA evaluation exercise.
  So...
  Efficient SQL Usage == Programmer + DBA
  Efficient NoSQL Usage == Programmer
  Thank you for making the case for NoSQL so clearly.
5. Re:Pfah. by Johnno74 · 2010-09-01 14:00 · Score: 5, Interesting
  
  Totally agree. The only problem is that writing recursive CTE queries is beyond most programmers. Hell, a lot of programmers struggle with anything but simple inner joins.
  IMHO CTEs are one of the most underused and powerful features of SQL. Not just for recursive queries, but for bridging the gap between functional and procedural programming.
  I write all my complex queries as a series of simple CTEs now - each CTE gets me one step closer to the actual query I need, and the magic of the query optimizer combines them all into a single query plan. Makes testing, debugging and maintaining a complex query about a million times easier.
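A rough sketch of the "series of simple CTEs" style, run through Python's built-in `sqlite3` so it can be tried as-is. The `employees` table, its rows and the `payroll_by_depth` helper are invented for the example: the first CTE is recursive and walks the reporting chain, the second aggregates what the first produced.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (
    id INTEGER PRIMARY KEY, name TEXT, manager_id INTEGER, salary INTEGER
);
INSERT INTO employees VALUES
    (1, 'Ada', NULL, 200),
    (2, 'Bob', 1,    150),
    (3, 'Cy',  1,    120),
    (4, 'Dee', 2,    100),
    (5, 'Eve', 4,     90);
""")

QUERY = """
WITH RECURSIVE
  -- Step 1: everyone at or below the chosen root, with their depth.
  reports(id, name, salary, depth) AS (
    SELECT id, name, salary, 0 FROM employees WHERE id = :root
    UNION ALL
    SELECT e.id, e.name, e.salary, r.depth + 1
    FROM employees e JOIN reports r ON e.manager_id = r.id
  ),
  -- Step 2: summarise step 1 per level of the hierarchy.
  per_level AS (
    SELECT depth, COUNT(*) AS headcount, SUM(salary) AS payroll
    FROM reports
    GROUP BY depth
  )
SELECT depth, headcount, payroll FROM per_level ORDER BY depth
"""


def payroll_by_depth(conn, root_id):
    """Return (depth, headcount, payroll) rows for the subtree under root_id."""
    return conn.execute(QUERY, {"root": root_id}).fetchall()


for row in payroll_by_depth(conn, 1):
    print(row)
```

Each step can be tested on its own by temporarily changing the final `SELECT` to read from `reports` instead of `per_level`, which is most of the debugging benefit.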
Interesting thesis by Peeteriz · 2010-09-01 05:09 · Score: 5, Interesting

In essence, TFA claims that the traditional ACID guarantee should be strengthened rather than abandoned (as it is in NoSQL systems). The traditional guarantee is: "if three transactions (let's call them A, B and C) are active ... the resulting database state will be the same as if it had run them one-by-one. No promises are made, however, about which particular order execution it will be equivalent to: A-B-C, B-A-C, A-C-B". If that is strengthened to a guarantee that the result will always be as if the transactions had run in A-B-C order, then, TFA argues, it solves all kinds of possible replication problems, requires less networking between the many servers involved, and allows for high scaling while also keeping all the integrity constraints.
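A toy illustration of why a fixed order helps replication (this is not the system from TFA, just a sketch with made-up account names and a hypothetical `transfer` transaction). If every replica applies the same agreed log in the same order, they all end in the same state without talking to each other during execution; a different serial order is still "serializable" but can produce a different state.

```python
from typing import Callable, Dict, List

State = Dict[str, int]
Txn = Callable[[State], None]


def transfer(src: str, dst: str, amount: int) -> Txn:
    """Build a transaction that moves money only if the source can afford it."""
    def run(state: State) -> None:
        if state[src] >= amount:  # integrity constraint: no negative balances
            state[src] -= amount
            state[dst] += amount
        # otherwise the transaction aborts and changes nothing
    return run


def apply_in_order(initial: State, log: List[Txn]) -> State:
    """Run transactions one by one, in log order, on a private copy of the state."""
    state = dict(initial)
    for txn in log:
        txn(state)
    return state


initial = {"alice": 100, "bob": 0, "carol": 0}
A = transfer("alice", "bob", 80)
B = transfer("alice", "carol", 80)
C = transfer("bob", "alice", 50)

agreed_log = [A, B, C]  # the order every replica is told to respect

replica_1 = apply_in_order(initial, agreed_log)
replica_2 = apply_in_order(initial, agreed_log)
other_order = apply_in_order(initial, [B, A, C])  # allowed by plain serializability

print(replica_1, replica_2, other_order)
```

Here both replicas agree, while the B-A-C run leaves the money with carol instead of bob. With only the weaker guarantee, replicas would have to coordinate to find out which serial order the others picked.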
Re:digg does not need to worry anymore by Kaboom13 · 2010-09-01 05:13 · Score: 2, Interesting

Because the entire site had been completely overwhelmed by spammers? Digg went from a great site to go see what's new to a glorified RSS feed for cracked.com, CollegeHumor and reddit. They had to change something.
NoSQL is about a lot of things. by Ouija · 2010-09-01 06:02 · Score: 2, Interesting

SQL syntax is dated and very obtuse. Just look at the different syntax between insert and an update. ...wouldn't you rather just have "save"?
Object-relational mapping is cumbersome and mismatched in SQL. 1:many either yields n+1 queries or a monster Cartesian product set. And, what about inheritance? It just doesn't jibe.
It isn't about losing ACID- although not every purpose needs ACID. Your average shared drive filesystem isn't ACID, for example.
When you have anemic domains that aren't nailed down and need to be readily flexible without big re-designs, JSON-based No-SQL works very well.
When you want to avoid n+1 and have well-defined data needs with 4MB of data across your object graph, No-SQL works... very very well.
When you want to segregate the business services and its backing data store from the separate concern of BI, No-SQL keeps the riff-raff out of your data store.
It's different. It solves different problems. Keep your mind open.
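For anyone who hasn't run into the n+1 versus Cartesian product trade-off mentioned above, here is a small sketch using Python's built-in `sqlite3`. The `authors`, `posts` and `badges` tables and both loader functions are invented for the example; a real ORM would generate something along these lines behind the scenes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE posts   (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
CREATE TABLE badges  (id INTEGER PRIMARY KEY, author_id INTEGER, label TEXT);
""")
for a in range(1, 4):  # 3 authors, each with 4 posts and 5 badges
    conn.execute("INSERT INTO authors VALUES (?, ?)", (a, f"author{a}"))
    for p in range(4):
        conn.execute("INSERT INTO posts (author_id, title) VALUES (?, ?)", (a, f"post{p}"))
    for b in range(5):
        conn.execute("INSERT INTO badges (author_id, label) VALUES (?, ?)", (a, f"badge{b}"))


def load_n_plus_one(conn):
    """One query for the parents, then one more query per parent."""
    queries = 1
    authors = conn.execute("SELECT id, name FROM authors").fetchall()
    result = {}
    for author_id, name in authors:
        rows = conn.execute(
            "SELECT title FROM posts WHERE author_id = ?", (author_id,)
        ).fetchall()
        queries += 1
        result[name] = [title for (title,) in rows]
    return result, queries


def load_single_join(conn):
    """One query, but two 1:many joins multiply the rows per author."""
    return conn.execute("""
        SELECT a.name, p.title, b.label
        FROM authors a
        JOIN posts  p ON p.author_id = a.id
        JOIN badges b ON b.author_id = a.id
    """).fetchall()


posts_by_author, query_count = load_n_plus_one(conn)
joined_rows = load_single_join(conn)
print(query_count, len(joined_rows))
```

With 3 authors this is 4 round trips for the first approach, versus 60 joined rows (3 x 4 x 5) to carry only 27 child records for the second. A document store sidesteps both by keeping the whole object graph in one record, at the cost of the things a relational schema gives you.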

--

-Ouija- poke 53280,11:poke 53281,12
Not NoACID, NoSchema by bokmann · 2010-09-01 06:03 · Score: 2, Interesting

Interesting article (and yes, I read the article), but the point of the NoSQL movement isn't so much about SQL, or ACID, as much as it is about Schema.
Most applications today are written in object-oriented languages like Java, C#, Ruby, etc... and most common frameworks in these languages use object-relational models to essentially 'unpack' the object into a relational model, and then reconstitute the objects on demand. This post explains the kinds of problems better than most.
NoSchema is about storing data closer to the format we process it in today. Key-Value pairs. XML. Sets and Lists. Object-Oriented data structures. This is about abstractions that make developers more productive. It is a tool in a toolbox, and useful in some circumstance and not in others.
SQL databases do not have to be the 'one persistence data mechanism to rule them all'. We don't need one; we need many that solve differing classes of problems well.
Re:I hate SQL and Databases in General... by quanticle · 2010-09-01 07:11 · Score: 2, Interesting

All of this begs the question. The real question is why we use a technology that is so sensitive to bad schema design? Why use a technology that has such a high baseline overhead? Why use a technology that is so tedious? Why use a technology that is so hard to test?
Those statements could be applied to any technology that's being used inappropriately. Why are our programs so sensitive to bad algorithm design?

--
We all know what to do, but we don't know how to get re-elected once we have done it
ACID: Scale bigger, get slower by smcdow · 2010-09-01 07:15 · Score: 2, Interesting

TFA hints at this but doesn't come out and say it: the larger you scale, the more you swamp yourself with atomicity protocol overhead. If your database is geographically distributed, then you have to decide if atomicity is more important than forgoing the very large bills for the associated network usage. I suspect that this may explain a lot about why Google, Amazon, etc., went with NoSQL solutions.

--
In the course of every project, it will become necessary to shoot the scientists and begin production.