Yale Researchers Prove That ACID Is Scalable
An anonymous reader writes "There has been a lot of buzz in the industry lately about NoSQL databases helping Twitter, Amazon, and Digg scale their transactional workloads. But there has been some recent pushback from database luminaries such as Michael Stonebraker. Now, a couple of researchers at Yale University claim that NoSQL is no longer necessary now that they have scaled traditional ACID-compliant database systems."
I have a different image of ACID on Windows than they do.
NoSQL never was necessary. Traditional SQL databases - not just terascale ones, but even simple ones like MySQL - regularly deal with data volumes at Google and Walmart that make the sites that built these databases in desperation look positively tiny.
Digg's engineers wear clown shoes to work.
StoneCypher is Full of BS
Digg has chased all their users away with the new version of their site, so they could probably change over to MS Access and be OK.
Didn't Berkeley prove back in the 60s and 70s that acid was scalable?
Want to improve your Karma? Instead of "Post Anonymously", try the "Post Humously" option.
But ACID still lacks NoSQL's "Cowboy Chic".
Yale Researchers Prove That ACID Is Scalable...
It's all in their minds!
If you want news from today, you have to come back tomorrow.
Tell the Yalies to stick to drama and sophistry. What do they know about computer stuff?!
In essence, TFA starts from the traditional ACID guarantee: "if three transactions (let's call them A, B and C) are active ... the resulting database state will be the same as if it had run them one-by-one. No promises are made, however, about which particular order execution it will be equivalent to: A-B-C, B-A-C, A-C-B". Its claim is that if this guarantee is not abandoned (as in NoSQL systems) but instead strengthened, so that the result is always as if the transactions had run in arrival order A-B-C, then it solves all kinds of possible replication problems, requires less networking between the many servers involved, and allows for high scaling while also keeping all the integrity constraints.
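To make the ordering point concrete, here is a minimal sketch in Python. It is not the paper's system: the `Txn` type, the `transfer` helper, and the account names are all made up for illustration. It only shows that replicas which apply the same agreed log of deterministic transactions end up in the same state without talking to each other, while a different serial order can legitimately give a different state.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

State = Dict[str, int]


@dataclass(frozen=True)
class Txn:
    """A named, deterministic state change (hypothetical type for this sketch)."""
    name: str
    apply: Callable[[State], None]


def transfer(src: str, dst: str, amount: int) -> Callable[[State], None]:
    """Build a transfer that happens entirely or not at all."""
    def run(state: State) -> None:
        # The outcome depends only on the current state, never on timing.
        if state[src] >= amount:
            state[src] -= amount
            state[dst] += amount
    return run


def run_replica(initial: State, log: List[Txn]) -> State:
    """Apply the log strictly in order on a private copy of the state."""
    state = dict(initial)
    for txn in log:
        txn.apply(state)
    return state


initial = {"alice": 100, "bob": 0, "carol": 0}
A = Txn("A", transfer("alice", "bob", 80))
B = Txn("B", transfer("alice", "carol", 80))
C = Txn("C", transfer("bob", "carol", 50))

# Every replica receives the same agreed order A-B-C.
agreed_log = [A, B, C]
replica_1 = run_replica(initial, agreed_log)
replica_2 = run_replica(initial, agreed_log)

# Another serial order is also "serializable" but gives a different state.
other_order = run_replica(initial, [B, A, C])

print(replica_1 == replica_2)    # the replicas agree
print(replica_1 == other_order)  # a different order need not agree
```

The sketch says nothing about how the servers agree on the log in the first place, which is where the real cost is.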
IT'S SCALABLE, WHOOO!
A bigger issue may be the cost of ACID, even if it can in theory scale. Supporting ACID is not free. A free web service may be able to afford losing, say, 1 out of 10,000 web transactions. Banks cannot do it, but Google Experiments can. The extra expense of big-iron ACID may not make up for the relatively minor cost of losing an occasional transaction or customer. It's a business decision.
Table-ized A.I.
... because on every application I have ever worked on, the database has always been the performance bottleneck. Testing of DB applications is always a problem, because running the tests generally changes the database, rendering tests unrepeatable without resetting the database. Configuring applications to use this database or that database also ends up being a problem for most applications.
Furthermore, while programming in general has continued to progress through many languages, exploring many different ways to describe problems, SQL is still SQL. SQL is fixed in a syntax and written with naming conventions and styles that can best be described as neo-COBOL.
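One common way around the repeatability problem is to give every test its own throwaway database. The following is a minimal sketch using Python's built-in `sqlite3` module with an in-memory database; the `account` table and the `fresh_db`, `withdraw`, and `balance` helpers are hypothetical names for illustration, and a real application on another engine would need its own equivalent (a per-test schema, a transaction rolled back at teardown, or similar).

```python
import sqlite3

SCHEMA = "CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)"


def fresh_db() -> sqlite3.Connection:
    """Create a brand-new in-memory database with known starting data."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.execute("INSERT INTO account VALUES ('alice', 100)")
    conn.commit()
    return conn


def withdraw(conn: sqlite3.Connection, name: str, amount: int) -> None:
    """Code under test: the 'with' block commits on success, rolls back on error."""
    with conn:
        conn.execute(
            "UPDATE account SET balance = balance - ? WHERE name = ?",
            (amount, name),
        )


def balance(conn: sqlite3.Connection, name: str) -> int:
    row = conn.execute(
        "SELECT balance FROM account WHERE name = ?", (name,)
    ).fetchone()
    return row[0]


def test_withdraw() -> None:
    conn = fresh_db()  # nothing carries over from earlier runs
    withdraw(conn, "alice", 30)
    assert balance(conn, "alice") == 70
    conn.close()


# The test changes data, yet it can be run any number of times.
test_withdraw()
test_withdraw()
```

This only covers logic that fits in a small fixture; it does not help with tests that depend on production-sized data.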
Bottom line: SQL is tedious, ugly, slow, and difficult to test. And don't get me started on stored procedures and the difficulty of using source code management with stored procedures.
Last gripe: A traditional Relational database imposes ACID overhead on every application, even if you don't really need it or use it. This is like a programming language that imposes a SORT overhead on all your data structures even if you rarely or never need to sort them.
Why is it that we continue to use a technology based on a 1970s view of a problem when clearly there ARE other solutions and ways to approach said problem?
For instance, Neo4J is a scalable graph-based "nosql" DB with ACID.
Old people fall. Young people spring. Rich people summer and winter.
NoSQL's two big features are scalability and arbitrary schemas. While the paper covers the first (though I still think map/reduce has its place), NoSQL does do taxonomy-based (hierarchical) schemas better. The only way to do that in SQL is to have a property table, where the parent object is an object RID, and a huge table of properties and values attached to that. You might be able to get your indexes to perform reasonably well, but only by duplicating some data. And on top of that, just try writing a query for hierarchical data! You'll have sub-selects for each level of hierarchy. This means that in order to do something relatively simple, like KPCOFGS species classifications, you'll need a select and 6 sub-selects. At least that one is well defined. If it's not, you just don't know how many, and you have to write a recursive function to generate your select query, or process the results from it. Either way, you repeatedly consider 99% useless records at every level. True, you can cheat at this because there are always 7 levels. But that is not true for most other trees.
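For comparison, some SQL engines also offer recursive common table expressions, which walk a tree without knowing its depth in advance. Below is a minimal sketch using SQLite's `WITH RECURSIVE` through Python's `sqlite3` module. The `taxon` table, its columns, and the `lineage` helper are assumptions made up for this example, and whether this performs acceptably on a large tree is a separate question that the sketch does not answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE taxon (
        id        INTEGER PRIMARY KEY,
        parent_id INTEGER REFERENCES taxon(id),
        rank_name TEXT NOT NULL,
        name      TEXT NOT NULL
    )
    """
)
conn.executemany(
    "INSERT INTO taxon VALUES (?, ?, ?, ?)",
    [
        (1, None, "kingdom", "Animalia"),
        (2, 1, "phylum", "Chordata"),
        (3, 2, "class", "Mammalia"),
        (4, 3, "order", "Primates"),
        (5, 4, "family", "Hominidae"),
        (6, 5, "genus", "Homo"),
        (7, 6, "species", "Homo sapiens"),
    ],
)


def lineage(conn: sqlite3.Connection, taxon_id: int) -> list:
    """Return (rank_name, name) pairs from the root down to the given node."""
    query = """
        WITH RECURSIVE chain(id, parent_id, rank_name, name, depth) AS (
            -- start at the requested node
            SELECT id, parent_id, rank_name, name, 0
            FROM taxon WHERE id = ?
            UNION ALL
            -- then repeatedly step to the parent until there is none
            SELECT t.id, t.parent_id, t.rank_name, t.name, chain.depth + 1
            FROM taxon AS t JOIN chain ON t.id = chain.parent_id
        )
        SELECT rank_name, name FROM chain ORDER BY depth DESC
    """
    return conn.execute(query, (taxon_id,)).fetchall()


path = lineage(conn, 7)
for rank_name, name in path:
    print(rank_name, name)
```

The same query works unchanged for a node at any depth, so the number of levels does not have to be fixed at seven.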
Slashdot's rate-of-post filter: Preventing you from posting too many great ideas at once.
Sounds like you don't really understand what you're talking about. The reason we continue to use ACID-compliant RDBMSs is that they work, and they work well. If you don't think that RDBMSs have changed over the years, you're simply lacking experience. I feel this is most likely the case, as you complain about the interface language (SQL), and don't understand how to CM stored procedures, or how to test a DB (OMG I have to make a copy of the DB to test - so hard!). Complaining about the overhead of using an RDBMS in an application that doesn't require an RDBMS is tantamount to complaining about how hot you get while wearing a spacesuit when you jog in the park.
We knew ACID can scale already.
With enough money poured into it, and new implementations, ACID can scale.
They solved some problems with scaling out, not necessarily the problems with it scaling up. Scaling does not necessarily just mean replicas and quick failover -- it means good performance without millions spent on hardware too, in terms of overhead, storage requirements, storage performance, server performance.
NoSQL scales in certain cases less expensively, with less work, and doesn't require complicated DBM algorithms. The representation of data is also simpler, and requires less work to maintain than tables.
It's just a result of major existing SQL implementations being so expensive with large datasets, that sometimes it costs more in terms of performance and required hardware, than simply using NoSQL.
I also love this gem from the article:
If the system is also stripped of the right to arbitrarily abort transactions (system aborts typically occur for reasons such as node failure and deadlock), then problem (b) is also eliminated. ... given an initial database state and a sequence of transaction requests, there exists only one valid final state. In other words, determinism.
I suppose the authors are from a land where hard drive space is infinite, database server resources are always guaranteed ahead of time... I/Os never have unrecoverable errors, syscalls never return error codes, RAM is infinite, programs never crash.
The conclusion that ACID alone is the bottleneck is not necessarily true. The SQL language itself requires a complex implementation just to parse and execute queries, which can add latency.
Liquid, paper tabs, gel tabs...it's all scalable, man.
Hey mods, put down the tard-goggles.
I don't think they've proven it yet, they simply offer some solutions to what they admit is a very difficult problem. In other words, we'll see how their ideas pan out.
I went to eat some animal crackers and the box said, "Do not eat if seal is broken." I opened the box and sure enough..
From the Wikipedia Article (http://en.wikipedia.org/wiki/ACID)
"In computer science, ACID (atomicity, consistency, isolation, durability) is a set of properties that guarantee database transactions are processed reliably. In the context of databases, a single logical operation on the data is called a transaction."
SQL syntax is dated and very obtuse. Just look at the different syntax between insert and an update. ...wouldn't you rather just have "save"?
Object-relational mapping is cumbersome and mismatched in SQL. 1:many either yields n+1 queries or a monster cartesian product set. And, what about inheritance? It just doesn't jibe.
It isn't about losing ACID, although not every purpose needs ACID. Your average shared drive filesystem isn't ACID, for example.
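The 1:many trade-off can be sketched in a few lines of Python with the built-in `sqlite3` module. The `author` and `post` tables and both loader functions are hypothetical, chosen only to show the shape of the problem: the first loader issues one query per parent row, the second issues a single join and gets the parent columns repeated on every child row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE post (
        id INTEGER PRIMARY KEY,
        author_id INTEGER NOT NULL REFERENCES author(id),
        title TEXT NOT NULL
    );
    INSERT INTO author VALUES (1, 'ann'), (2, 'bo');
    INSERT INTO post VALUES (1, 1, 'p1'), (2, 1, 'p2'), (3, 2, 'p3');
    """
)


def load_n_plus_one(conn):
    """One query for the parents, then one more query per parent."""
    queries = 1
    authors = conn.execute("SELECT id, name FROM author ORDER BY id").fetchall()
    result = {}
    for author_id, name in authors:
        rows = conn.execute(
            "SELECT title FROM post WHERE author_id = ? ORDER BY id",
            (author_id,),
        ).fetchall()
        queries += 1
        result[name] = [title for (title,) in rows]
    return result, queries


def load_joined(conn):
    """A single query; the author name is repeated on every post row."""
    rows = conn.execute(
        """
        SELECT a.name, p.title
        FROM author AS a LEFT JOIN post AS p ON p.author_id = a.id
        ORDER BY a.id, p.id
        """
    ).fetchall()
    result = {}
    for name, title in rows:
        result.setdefault(name, [])
        if title is not None:  # authors without posts still appear
            result[name].append(title)
    return result, 1, len(rows)


nested, n_queries = load_n_plus_one(conn)
joined, j_queries, j_rows = load_joined(conn)
print(n_queries, j_queries, j_rows)
```

With two 1:many relations joined in one query the repeated rows multiply, which is the "monster" case; this sketch has only one relation.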
When you have anemic domains that aren't nailed down and need to be readily flexible without big re-designs, JSON-based No-SQL works very well.
When you want to avoid n+1 and have well-defined data needs with 4MB of data across your object graph, No-SQL works... very very well.
When you want to segregate the business services and its backing data store from the separate concern of BI, No-SQL keeps the riff-raff out of your data store.
It's different. It solves different problems. Keep your mind open.
-Ouija- poke 53280,11:poke 53281,12
Interesting article (and yes, I read the article), but the point of the NoSQL movement isn't so much about SQL, or ACID, as much as it is about Schema.
Most applications today are written in object-oriented languages like Java, C#, Ruby, etc... and most common frameworks in these languages use object-relational models to essentially 'unpack' the object into a relational model, and then reconstitute the objects on demand. This post explains the kinds of problems better than most.
NoSchema is about storing data closer to the format we process it in today. Key-Value pairs. XML. Sets and Lists. Object-Oriented data structures. This is about abstractions that make developers more productive. It is a tool in a toolbox, and useful in some circumstance and not in others.
SQL databases do not have to be the 'one persistence data mechanism to rule them all'. We don't need one; we need many that solve differing classes of problems well.
Acid is definitely scalable if you use blotter paper.
The editors have a loose definition of the word "prove". I read the article and they provide some compelling arguments. However, I saw no proof in a mathematical or scientific sense.
Of course ACID is scalable, but you have to be very careful with the dosage. Even Albert Hofmann himself never doubted that.
To achieve 'nonconcurrency' one needs to introduce a global ordering of transactions. Which WILL require a shared resource among ALL of the transactions. No way around it, sorry.
And what's funny, this resource reintroduces some of the problems of ACID systems. However, there should be advantages (no need for rollbacks, etc.).
Besides, all of this doesn't tackle another advantage of NoSQL systems: working with HUGE amounts of data. There'll still be problems in ACID systems if data access requires communication between several storage nodes.
And don't forget the CAP theorem. You can't get Consistency, Availability and Partition Tolerance at the same time. RDBMS typically 'solve' it by dropping the requirement for the partition tolerance. Usually by using quorum sensing schemas, etc.
Yes, I'd like to be able to work with RDBMS data in REAL languages, not in ugly SQL or even uglier DB internal languages.
DB tables can be represented with lists, on which composable pure (side-effect free) functions could operate. So JOINs can be expressed as list comprehensions. 'where' naturally is expressed as filters, etc. Care should be taken to maintain purity of functions used in queries, so they can be optimized efficiently.
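As a rough illustration of the idea, here is a sketch in Python rather than Haskell. The `Employee` and `Dept` row types, the sample data, and the `well_paid` predicate are all invented for the example. Tables are plain lists of immutable rows, the join is a list comprehension, and the 'where' clause is a pure function used as a filter.

```python
from typing import List, NamedTuple, Tuple


class Employee(NamedTuple):
    name: str
    dept_id: int
    salary: int


class Dept(NamedTuple):
    dept_id: int
    title: str


# "Tables" are just lists of immutable rows.
employees: List[Employee] = [
    Employee("ann", 1, 120),
    Employee("bo", 2, 90),
    Employee("cy", 1, 70),
]
depts: List[Dept] = [Dept(1, "eng"), Dept(2, "ops")]


def well_paid(e: Employee) -> bool:
    """A pure predicate: no side effects, so it could be reordered or cached."""
    return e.salary >= 80


# Roughly: SELECT e.name, d.title FROM employees e JOIN depts d
#          ON e.dept_id = d.dept_id WHERE e.salary >= 80
joined: List[Tuple[str, str]] = [
    (e.name, d.title)
    for e in employees
    for d in depts
    if e.dept_id == d.dept_id and well_paid(e)
]

print(joined)
```

Written this way the join is a naive nested loop over both lists; choosing a better plan is the part a query optimizer would have to do, and it can only do so safely if the functions involved are pure.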
LINQ in C# has beginnings of something similar.
PS: Am I describing Haskell, by any chance? :)
PPS: If your query requires complex and non-trivial optimizations by the RDBMS engine, then it's a bad query.
This has been around for years; yeah, I think we used to call it a mainframe.
I've never been sure why it is, but SQL (and the relational model it can be used to implement if you know what you're doing) attracts more wild-eyed fanatics than the Amiga and Ruby. Nowhere else will you find so many people so confidently and aggressively certain that they have the One True Way to do things, at least not without getting into actual religion. That anyone, anywhere does things differently (or even thinks about it) seems to deeply threaten them and provoke the sort of contempt that normal people reserve for child pornography. It frankly baffles me. DBA compensation isn't that good, and certainly not all DBAs can be such one-trick ponies.
The simple fact of the matter is that the relational model is probably the best general purpose data storage model we have, and it has the advantage of logical rigor and, as a result, the advantage of being extremely well-understood. But this in no way changes the fact that any general purpose approach, at least in some (but probably many if not most) cases, will be outperformed by a well-designed application-specific method. This remains the case no matter what your methodological hobby horse is, except in the tiny minority of cases where a truly optimal method can be rigorously proven.
Worse -- and this is true of all kinds of fanaticism, computer science-related and otherwise -- it tends to discourage research into unexplored areas that might yield new and better methods. E.F. Codd developed the relational model through precisely such an expedition into the mathematical unknown, and someday, the model that surpasses it (at least for certain cases) will be produced in the same way. It might be a descendant of one of the current so-called NoSQL approaches. It could be a reaction to their shortcomings. It might come from a completely unexpected corner. But wherever it comes from, you can be certain that we will enjoy its benefits later than we had to because it will have to push through the reactionary resistance of people who've stared at the relational model for so long that they can conceive of nothing else.
Proud member of the Weirdo-American community.
This seems to be a reinvention of field calls, with a slightly different purpose.
Just in case anyone's interested: http://voltdb.com/
Stonebraker started an open source database to implement the concepts he talks about.
Except for ending slavery, the Nazis, communism, & securing American independence, war has never solved anything.
TFA hints at this but doesn't come out and say it: the larger you scale, the more you swamp yourself with atomicity protocol overhead. If your database is geographically distributed, then you have to decide if atomicity is more important than forgoing the very large bills for the associated network usage. I suspect that this may explain a lot about why Google, Amazon, etc., went with NoSQL solutions.
In the course of every project, it will become necessary to shoot the scientists and begin production.
"Yeah, ask me a specific structured question and I'll give you a two-dimensional array to work with as an answer."
I thought it was more like an array of structs, where each array entry is a row and each struct member is a column. In non-C you might say each row is an object, each field-of-a-class is a column (where class : table) and each field-of-an-object is a single cell.
Then the cartesian product operation on tables of types T1 and T2 (respectively) has a type which is the product of T1 and T2, and everything matches up neatly.
but it stores its data in a way that doesn't require me to deconstruct all of my data structures into tables.
I take it this is not business-type data? Otherwise you're doing it backwards. Start with your Entity-Relationship diagrams, devolve into logical then physical data models, and THEN start programming.
I forget who said it but it's true: The data belongs to the business, not to the application. The data should be structured and stored in a way that it will still be readable years after your program has become obsolete. (Unless it's data that has a short "best before" date.)
--
.nosig
Finally. I've been telling Bob that for years, but nooo, he insists that we keep using blotter paper and sour patch kids.
Wanna fight ? Bend over, stick your head up your ass, and fight for air.
Short Summary:
We make some claims about scaling ACID databases, but then don't support them.
Longer summary:
We don't like NoSQL and enjoy making baseless cracks about it such as it being a "lazy" approach.
In our paper we demonstrate that our unconventional version of an ACID database scales better than a traditional ACID database in a specific environment, while merely throwing away some robustness guarantees and changing how transaction ordering works.
No direct comparison to any NoSQL implementation is made.
So yeah, I'm not holding my breath for companies to start migrating away from NoSQL.
You don't need SQL in order to get ACID properties. And some common SQL-like languages don't provide ACID.
Furthermore, SQL wasn't designed for what it is being used for today; SQL was meant to be a database interaction language for non-experts.
And "information theory" doesn't mean what you seem to think it means...
Academic determines that if only you're willing to insert a single point of failure, all of your replication problems can be hand waved away. Also if you have this new single point of failure, somehow magically transactions will never need to abort ever again.
The reason that NoSQL is necessary is that ACID is not the only thing that developers need to think about. RDBMS was an innovative solution to the limitations of mainframe hierarchical databases circa 1970. Since then it has been the only game in town (At least for most enterprise software. Some of us do other things occasionally.)
It turns out that there are reasons to do things other ways, and having other options allows you to consider trade-offs. For many applications eventually consistent data scales just fine. For some applications, both big and small, an enterprise RDBMS is overkill. Why not just persist objects to a document store? Or even the file system?
The research is interesting, although I agree that we already knew we could scale the ACID paradigm. The conclusion is ridiculous. NoSQL has nothing to do with ACID, and it brings a richness to the conversation that has been missing for far too long. Like the Perl folks say, TMTOWTDI.
Have you ever tried XML... on WEED?
http://tinyurl.com/TMTOWTDI
SQL Antipatterns may interest you. As one of the reviews says, "An excellent guide to database design tradeoffs".
Most developers simply drop their application scalability problems down to the DB layer and/or OS layer. Then bitch that those DBAs are dumbasses, the DB server doesn't scale.
Why should they? NULL doesn't mean anything in the relational model; NULL is an SQL construct that violates the fundamental underpinnings of the relational model.