Yale Researchers Prove That ACID Is Scalable
An anonymous reader writes "The has been a lot of buzz in the industry lately about NoSQL databases helping Twitter, Amazon, and Digg scale their transactional workloads. But there has been some recent pushback from database luminaries such as Michael Stonebraker. Now, a couple of researchers at Yale University claim that NoSQL is no longer necessary now that they have scaled traditional ACID compliant database systems."
NoSQL never was necessary. Traditional SQL database - not just terascale, but even simple ones like MySQL - regularly deal with data volumes at Google and Walmart that make the sites that built these databases in desperation look positively tiny.
Digg's engineers wear clown shoes to work.
StoneCypher is Full of BS
digg has chased all their users away with the new version of their site so they could probably change over to MS Access and be ok.
Didn't Berkeley prove back in the 60s and 70s that acid was scalable?
Want to improve your Karma? Instead of "Post Anonymously", try the "Post Humously" option.
In essence, TFA claims that if the traditional ACID guarantee "if three transactions (let's call them A, B and C) are active ... the resulting database state will be the same as if it had run them one-by-one. No promises are made, however, about which particular order execution it will be equivalent to: A-B-C, B-A-C, A-C-B" is not abandoned (as in NoSQL systems), but is even strengthened to a guarantee that the result will always be as if they arrived in A-B-C order, then it solves all kinds of possible replication problems, requires less networking between the many servers involved, and allows for high scaling while also keeping all the integrity constraints.
A bigger issue may be the cost of ACID even if it can in theory scale. Supporting ACID is not free. A free web service may be able to afford losing say 1 out of 10,000 web transactions. Banks cannot do it, but Google Experiments can. The extra expense of big-iron ACID may not make up for the relatively minor cost of losing an occasional transaction or customer. It's a business decision.
Table-ized A.I.
For instance, Neo4J is a scalable graph-based "nosql" DB with ACID.
Old people fall. Young people spring. Rich people summer and winter.
Because it works.
"It's old" is a terrible reason to replace something. Go back to your previous arguments an you have a case. After all, a Core i7 is based on a 1960's view of a problem with an enormous number of band-aids applied in the intervening years, but you don't seem too concerned with replacing that.
Spoken with proud ignorance.
Anyone who has properly scaled an application knows the database isn't the problem. If it was, it wouldn't take 12 applications servers to bring the thing to its knees. That said, most of your gripes equate to:
I am not a DBA and therefore I do not understand DBA and therefore I must complain.
Further SQL has nothing to do with ACID. AT ALL!
Get your PostgreSQL here: http://www.commandprompt.com/
Sounds like you don't really understand what you're talking about. The reason we continue to use ACID compliant RDBMS is because they work and they work well. If you don't think that RDBMS have changed over the years, you're simply lacking experience. I feel this is most likely the case as you comlain about the interface language (SQL), and don't understand how to CM stored procedures, or how to test a DB (OMG I have to make a copy of the DB to test - so hard!) Comlaining about the overhead of using an RDBMS in an application that doesn't require an RDBMS is tantamount to complaining about how hot you get while wearing a spacsuit when you jog in the park.
We knew ACID can scale already.
With enough money poured into it, and new implementations, ACID can scale.
They solved some problems with scaling out, not necessarily the problems with it scaling up. Scaling does not necessarily just mean replicas and quick failover -- it means good performance without millions spent on hardware too, in terms of overhead, storage requirements, storage performance, server performance.
NoSQL scales in certain cases less expensively, with less work, and doesn't require complicated DBM algorithms. The representation of data is also simpler, and requires less work to maintain than tables.
It's just a result of major existing SQL implementations being so expensive with large datasets, that sometimes it costs more in terms of performance and required hardware, than simply using NoSQL.
I also love this gem from the article:
If the system is also stripped of the right to arbitrarily abort transactions (system aborts typically occur for reasons such as node failure and deadlock), then problem (b) is also eliminated. ... given an initial database state and a sequence of transaction requests, there exists only one valid final state. In other words, determinism.
I suppose the authors are from a land where hard drive space is infinite, database server resources are always guaranteed ahead of time... I/Os never have unrecoverable errors, syscalls never return error codes, RAM is infinite, programs never crash.
The conclusion that ACID alone is the bottleneck is not necessarily true. The SQL language itself requires a complex implementation just to parse and implement queries, that can add latency.
From the Wikipedia Article (http://en.wikipedia.org/wiki/ACID)
"In computer science, ACID (atomicity, consistency, isolation, durability) is a set of properties that guarantee database transactions are processed reliably. In the context of databases, a single logical operation on the data is called a transaction."
SQL syntax is dated and very obtuse. Just look at the different syntax between insert and an update. ...wouldn't you rather just have "save"?
Object-relational mapping is cumbersome and mis-matched in SQL. 1:many either yields n+1 queries or a monster cartesian product set. And, what about inheritance? It just doesn't jive.
It isn't about losing ACID- although not every purpose needs ACID. Your average shared drive filesystem isn't ACID, for example.
When you have anemic domains that aren't nailed down and need to be readily flexible without big re-designs, JSON-based No-SQL works very well.
When you want to avoid n+1 and have well-defined data needs with 4MB of data across your object graph, No-SQL works... very very well.
When you want to segregate the business services and its backing data store from the separate concern of BI, No-SQL keeps the riff-raff out of your data store.
It's different. It solves different problems. Keep your mind open.
-Ouija- poke 53280,11:poke 53281,12
Interesting article )and yes, I read the article), but the point of the NoSQL movement isn't so much about SQL, or ACID, as much as it is about Schema.
Most applications today are written in object-oriented languges like Java, C#, Ruby, etc... and most common frameworks in these languages use object-relational models to essentially 'unpack' the object into a relational model, and then reconstitute the objects on demand. this post explains the kinds of problems better than most.
NoSchema is about storing data closer to the format we process it in today. Key-Value pairs. XML. Sets and Lists. Object-Oriented data structures. This is about abstractions that make developers more productive. It is a tool in a toolbox, and useful in some circumstance and not in others.
SQL databases do not have to be the 'one persistence data mechanism to rules them all'. We don't need one; we need many that solve differing classes of problems well.
What alternative have you seen that handles the same workload more efficiently? Flat files? I've seen plenty of database-related performance issues, but it's almost never inherent in the database - it's the idiot that wrote the lousy table-scanning code that's reading a couple rows out of a table with millions that's the problem.
If only you could start something like a "transaction", which you could then "roll back" after finishing the test, leaving the database in its original state. And if you could somehow "back up" the database and "restore" it on a test server, or under a different name. That would be awesome.
Checking your create/change scripts into source control is no more difficult than checking your C source in prior to compiling it.
While I don't totally disagree on this point, calling SQL "fixed" is a bit like saying C# and Java are the same. I promise you any meaty SQL Server code will not run on Oracle without very significant changes that will have to be done by someone that will cost you a lot of money (and likewise with Oracle to SQL Server). The capabilities vary wildly by platform, and the syntax is only identical for the simplest of CRUD statements.
I have to give this one a LOLWUT. If you're using a big RDBMS, it's likely a multi-user system. If you've got multiple users and connections, you want ACID. This isn't like imposing sorting overhead on data structures, it's like imposing the basic memory protection, process isolation, and filesystem durability you find in any competent operating system. If you want to see what it's like without those protections, go use Mac OS 9 for a week or so, or an Access database used by a few dozen people over a network.
All of this begs the question. The real question is why we use a technology that is so sensitive to bad schema design? Why use a technology that has such a high baseline overhead? Why use a technology that is so tedious? Why use a technology that is so hard to test?
Because fairly consistently, for the past forty years, every time someone says they've created something better than SQL and released to the market, the market proves them woefully and completely wrong. As such, as much as people piss and moan about SQL, SQL has consistently proven to be an excellent, general purpose solution and amazingly poorly understood by the masses. And solutions such as MySQL has only made things worse. That's not to say there are not superior niche solutions, only that SQL is one of the few database technologies which has continued to survive for decades as a general purpose solution, and rightfully so.
Its like the world suddenly doing their own plumbing, framing, and mechanical work and then proudly exclaiming the state of architecture and the car industry stinks because the world is falling apart around them. In reality, that means we need far more qualified DBAs and far fewer people who can barely spell, "SQL", designing and condemning the world around us.
Its literally been years since I've run into a qualified DBA, despite the fact "DBA" was part of their title. Turns out, being able to spell, "DBA" is all too often enough to qualify one for such a position. And don't get me started on the all the more common case of people who don't even know what a DBA does and yet they are responsible for actually creating the schema/data model.
All of this begs the question. The real question is why we use a technology that is so sensitive to bad schema design? Why use a technology that has such a high baseline overhead? Why use a technology that is so tedious? Why use a technology that is so hard to test?
Those statements could be applied to any technology that's being used inappropriately. Why are our programs so sensitive to bad algorithm design?
We all know what to do, but we don't know how to get re-elected once we have done it
I have a different image of ACID on Windows than they do.
Is it the image of Bill Gates in an Easter bunny outfit trying to force Steve Ballmer into a large cast iron kettle filled with Skittles and baby mice? 'Cause that's the image I have of ACID on Windows...
- None can love freedom heartily, but good men; the rest love not freedom, but license. -- John Milton
TFA hints at this but doesn't come out and say it: the larger you scale, the more you swamp yourself with atomicity protocol overhead. If your database is geographically distributed, then you have to decide if atomicity is more important than forgoing the very large bills for the associated network usage. I suspect that this may explain a lot about why Google, Amazon, etc., went with NoSQL solutions.
In the course of every project, it will become necessary to shoot the scientists and begin production.
but it stores its data in a way that doesn't require me to deconstruct all of my data structures into tables.
I take it this is not business-type data? Otherwise you're doing it backwards. Start with your Entity-Relationship diagrams, devolve into logical than physical data models, and THEN start programming.
I forget who said it but it's true: The data belongs to the business, not to the application. The data should be structured and stored in a way that it will still be readable years after your program has become obsolete. (Unless it's data that has a short "best before" date.)
--
.nosig
And don't get me started on stored procedures and the difficulty of using source code management with stored procedures.
That's easily solvable:
Stored procedures don't have to be any more difficult to manage than any other code.
Dewey, what part of this looks like authorities should be involved?
Short Summary:
We make some claims about scaling ACID databases, but then don't support them.
Longer summary:
We don't like NoSQL and enjoy making baseless cracks about it such as it being a "lazy" approach.
In our paper we demonstrate that our unconventional version of an ACID database scales better than a traditional ACID database in a specific environment, while merely throwing away some robustness guarantees and changing how transaction ordering works.
No direct comparison to any NoSQL implementation is made.
So yea, I'm not holding my breath for companies to start migrating away from NoSQL.
Please don't take this wrong. I really do mean my comments respectfully and politely. It's been a long day, and I'm not sure I managed to write as sincerely as I intended.
... because on every application I have ever worked on, the Database has always been the performance bottleneck
Wow. We've had very different experiences, then. Sure, there have been plenty of times when the database was the bottle neck. But it seems like I've have more issues with network speeds. And I can think of a few cases where the file system was the issue. At my current day job, the system bus seems to be the most common bottle-neck. Not that we touch databases all that often.
Testing of DB applications is always a problem, because the running of tests generally changes the database, rendering tests unrepeatable without reseting the database.
Isn't that generally considered a "best practice" anyway? I mean, I've pretty much always just taken that as a given. What do you consider a feasible alternative?
Configuring applications to use this database or that database also ends up being a problem for most applications.
OK, now I really have to ask what kind of development environment you're using. That's always seemed like a fairly moderate "no-brainer." Sure, it's mildly inconvenient to make sure connection strings got changed when migrating from dev to test to staging to production, but it's not that big a deal.
Furthermore, while programming in general has continued to progress through many languages, exploring many different ways to describe problems, SQL is still SQL. SQL is fixed in a syntax and written with naming conventions and styles that can best be described as neo-Cobal.
That's one way of looking at it, sure. Maybe you're missing the point, though? I mean, so many other languages and approaches have changed so drastically over the years...maybe SQL hasn't because it's good enough for what it does?
Bottom line: SQL is tedious, ugly, slow, and difficult to test.
Compared to what? Keep in mind its original purpose: letting business users look up algebraic sets while programmers got on with the serious data analysis. It just happened that having a standardized API that made it relatively easy to swap out back-ends turned out to be the easiest way for programmers to do our jobs.
If you really do have access to some magic technology that lets you look up persisted data (in a way that's anywhere near as flexible as SQL) significantly faster than any of the major RDBMSs...why haven't you founded a business on that and made your fortune?
And don't get me started on stored procedures and the difficulty of using source code management with stored procedures.
You definitely need to look into some better tools. File | Save As... to stash your SP's in some directory, add to source control (if it's new), check in.
Last gripe: A traditional Relational database imposes ACID overhead on every application, even if you don't really need it or use it. This is like a programming language that imposes a SORT overhead on all your data structures even if you rarely or never need to sort them.
It's been a while since I had to mess with SQL, but I seem to recall specifying hints about how much transactional consistency I actually needed. I think you may be exaggerating the overhead a smidge. And I'm pretty sure there are ways to work around it. But that's getting way off track.
Why is it that we continue to use a technology based on a 1960's view of a problem when clearly there ARE other solutions and ways to approach said problem?
Two suggestions. 1) It works. And DBA's hate learning new technology. 2) No one's come up with an alternative that's compelling enough to convince more than a tiny fraction of companies to
The reason that NoSQL is necessary is that ACID is not the only thing that developers need to think about. RDBMS was an innovative solution to the limitations of mainframe hierarchical databases circa 1970. Since then it has been the only game in town (At least for most enterprise software. Some of us do other things occasionally.)
It turns out that there are reasons to do things other ways, and having other options allows you to consider trade-offs. For many applications eventually consistent data scales just fine. For some applications, both big and small, an enterprise RDBMS is overkill. Why not just persist objects to a document store? Or even the file system?
The research is interesting, although I agree that we already knew we could scale the ACID paradigm. The conclusion is ridiculous. NoSQL has nothing to do with ACID, and it brings a richness to the conversation that has been missing for far too long. Like the Perl folks say, TMTOWTDI.
SQL Antipatterns may interest you. As one of the reviews says, "An excellent guide to database design tradeoffs".