Why Some Devs Can't Wait For NoSQL To Die
theodp writes "Ted Dziuba can't wait for NoSQL to die. Developing your app for Google-sized scale, says Dziuba, is a waste of your time. Not to mention there is no way you will get it right. The sooner your company admits this, the sooner you can get down to some real work. If real businesses like Walmart can track all of their data in SQL databases that scale just fine, Dziuba argues, surely your company can, too."
People who don't like SQL should get their heads out of their asses and use MySQL, a robust and enterprise-ready database.
Interesting thesis...
This is like saying "I can't wait for memcached to die" just because your site doesn't need it. Fact is, some do. It's your own fault if you choose to apply unnecessary techniques.
Don't change to newer fancy techniques if you don't understand what they are for and why you would need them.
Hierarchical DBMSs have been around longer than SQL-style RDBMSs. OODBMSs have existed for a long time as well. Many of these "NoSQL" DBs don't provide the same restrictions or assurances that an RDBMS provides, but they often have other features.
BDB isn't going away and neither is SQL. Get over it.
"MySQL or PostgreSQL," for what it's worth. PostgreSQL is a pretty powerful database, and you should have to make a pretty good argument for why a well understood technology that powers a lot (and some of the largest parts) of the WWWeb needs to be trashed for something newer and less tested.
Put identity in the browser.
XML text files all the way! /duck
Why should anything "die"? People choose solutions based on their individual merits. If something doesn't work, exchange it for something that does. I'm sure certain people find NoSQL-type databases perfect for their needs.
In short, people should just shut up about other people's choices and get on with their own.
I liked the pictures. Is there a name for the muppet guy in the first one?
There's a place for SQL, but there are some cases where something BigTable-like (e.g. HyperTable) works better. Our company manages data using SQL, but when we present data to the users it's through a HyperTable implementation. SQL is easier for data management, but HyperTable uses our server resources better.
It's really that simple. A standard dual-socket server with the latest CPUs from Intel or AMD can handle hundreds of requests per second. If one isn't enough, just add more hardware: one month of salary can buy you another node, and a year can buy you a whole cluster of rackable systems or a chassis full of blades. If it takes a few months extra for a team to solve the problem the NoSQL way, that's a few months of extra salary costs and missed sales.
Slashdot runs on SQL. I run a site of 1M pages daily (1/3-slashdot according to Alexa) with just a single system with 2x Xeon E5420, Django/PostgreSQL at 10% load. Unless you attract enough attention to require scaling past 10M pages a day, you're wasting your time reinventing the wheel with NoSQL, just stick with a standard ORM, launch your site and start convincing customers and generate sales. You can survive a slashdotting just fine without spending so much time on those exotic tools.
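For a rough sense of what those page counts mean per second, here is a back-of-the-envelope sketch. The `peak_factor` is purely an assumption for illustration (traffic is never evenly spread over a day); the comment above only gives daily totals.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_rate(pages_per_day):
    """Average requests per second if traffic were spread evenly over a day."""
    return pages_per_day / SECONDS_PER_DAY


def peak_rate(pages_per_day, peak_factor=3.0):
    """Crude peak estimate; peak_factor is a hypothetical multiplier."""
    return average_rate(pages_per_day) * peak_factor


if __name__ == "__main__":
    for pages in (1_000_000, 10_000_000):
        print(f"{pages:>10} pages/day: "
              f"{average_rate(pages):6.1f} req/s average, "
              f"{peak_rate(pages):6.1f} req/s at an assumed 3x peak")
```

Even the 10M pages/day threshold works out to a little over a hundred page requests per second on average, which is in the range the parent comment says a single well-specced server can handle.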
... that I can't tell others what to do!
So you're in surgery for 3 hours doing a kidney transplant, having used your trusty medium vascular clamp that has served you for the past 20 years. You're finally done and the patient is in recovery, so you sit down to relax with the latest copy of JAMA. They've got a great article about the latest development in cardiac clamps, and you think to yourself "Why not use a heart clamp for kidney transplants!" Brilliant. So you order up some new clamps from MedicalClamps.com, and use them on your next patient. The surgery goes fine, but 3 months later the patient is back in your office with a failed kidney. You open 'em up, and it's obvious the clamp exerted too much pressure on the artery, damaging it in the process. Stupid cardiac clamps! You're not a heart surgeon!
AccountKiller
I think this fellow's blog entry sums this up pretty nicely - especially the last paragraph: http://blog.cleverelephant.ca/2010/03/nonosql.html
FTA:
"In the meantime, DBAs should not be worried, because any company that has the resources to hire a DBA is likely has decision makers who understand business reality."
Bad English aside, I just don't agree. Money != Reality. I have worked both sides of this coin - startups with plenty of money that don't see the value in proper maintenance of the data store (one was almost put out of business by a disk failure), and very smart startups that are running lean but do understand the risks.
That said, on the deeper level, why does business reality == SQL? Sure I can scale Oracle to support massive DBs (and have), but I could probably get more value from using Amazon's SimpleDB for things that don't require massive scaling. Use the right tool for the job - hammers are for nails, etc. Do the design work up front, decide how it's gonna work, and the right tool should present itself.
}#q NO CARRIER
I think the author of TFA is missing something: not all databases / datastores are developed for businesses to keep track of their inventories. These days, many scientific disciplines, such as bioinformatics, rely heavily on databases as well.
The latest experimental techniques produce so much data that "old-fashioned" RDBMSs just don't cut it anymore. So, for certain application domains, NoSQL seems to be the way forward at the moment. I'm afraid the author can wish all he wants, but NoSQL is gonna be around for a while. Until something better comes up, that is.
The article was stripped of all nuance and then injected with confusing bits. e.g.
>NoSQL will never die, but it will eventually get marginalized, like how Rails was marginalized by NoSQL
What? How was Rails marginalized by NoSQL?
Also, it's nice to see the whole BerkeleyDB-ish/key-value sector of the data storage world suddenly exploding with innovation. There's a lot of dogma on both sides of the NoSQL argument (and the name "NoSQL" doesn't help), but some of the many NoSQL tools look as though they'll be pretty useful. Cassandra and MongoDB especially. And big companies getting behind the growth of new tools is never a bad thing.
Everyone's needs are different, and there are going to be different solutions for those needs. If NoSQL isn't for you then just don't use it (don't spend any time learning it, trying it out, running a site with it, etc, etc). I don't have a need for it yet, but we do all sorts of sites and programming so who knows if it will be the right solution for one of our future projects? I won't know unless I learn about it, test it and get my hands dirty with it.
And as far as it being 'a product of the braindead and buzzword-infested effluents of the American "education" system, where nobody understands math or logic', I don't care if it came from the bottom of a well in the middle of a jungle where they are masters of logic and math, if it could possibly meet my client's needs then I'm going to give it the time and attention it takes to make the decision for myself.
Real businesses track their data with SQL databases, true. However, real businesses have small numbers of transactions relative to their value. If Walmart had the same revenue but the average sale was a tenth of a cent, their fancy SQL database would be smouldering rubble.
That's what Facebook and Twitter and other large social media sites are facing. Just try running Twitter's volume and Twitter's page hits and API hits off MySQL. It doesn't matter how many replicas you run, it's not going to work. Maybe you could run it on a cluster of IBM Z-series mainframes running DB2 - but where is the money going to come from?
Cassandra and HBase and the other distributed NoSQL databases solve specific problems in specific ways. They won't work for Walmart, but they'll do the job just fine for Facebook and Twitter. If you have those specific scaling problems and can live with the restrictions (you lose ACID, indexes, and joins to varying degrees) then they'll work for you.
If all you know is that your site is running slow, then implementing NoSQL is unlikely to improve things.
Why should I give two shits about what database system someone else uses?
Don't take life so seriously. No one makes it out alive.
I think some developers keep looking for the holy grail. Some magical solution that will turn development from punching in code, to Star Trek: "Computer do my job for me please".
Template languages, 4GL, NoSQL, Ruby on Rails... it is all part of an attempt to take the nasty out of development and they all... well... they all just don't really happen.
Because deep down, with all the frameworks and generators, if you want your code to do what you want it to do, you are still writing out if statements a lot.
And yes, OO and such also belong to this. Not the concepts themselves, but the way most people talk about them. OO means code re-use right?
If you said yes, then you are a manager, go put on your tie, you will never be any good at coding.
You can re-use all code. And it has been done for a long time.
What, did you think that people who wrote basic for the C64 went "Oh I wrote this bit of code for printing, now I need the same functionality, I am going to write it all over again!"
OO does make code re-use a bit easier BUT that is NOT the claim that people often make. Trust me, I ask this in interviews and it is always the same answer. Apparently you can't re-use functions. No way, no how. NEXT!
I see two kinds of developers. Those who hate their job and those who don't. The former want to be managers and get away from writing code as fast as possible. And they will leap on anything that seems to make their jobs easier. Meanwhile the rest of us go on with actually producing stuff.
Just check how many times you get one of those manager wannabes introducing something they read in a magazine because it promises that you don't need to write another line of code ever!
MMO Quests are like orgasms:
You may solo them, I prefer them in a group.
I think the frustration is actually in some people not using the right tools for the job. I like NoSQL databases (specifically MongoDB), but I have not used them with anything I've written. Why? Because it wasn't the right tool for the job. I tend to use MySQL, Postgres or sqlite because they are so widely available and well understood in terms of administration. There are times that NoSQL will make sense, it's just not the area I work in.
I do think we are going to continue seeing an uptick in NoSQL related things since many companies are fixated on "the cloud" while not really knowing what "the cloud" is (heck, no one still really, truly has a common definition of what it means...). Since NoSQL seems to be a popular tool, and "the cloud" is a popular buzz phrase, CIOs/CTOs will likely be pushing their shops to utilize "NoSQL in the cloud". While large scale applications which don't require relational information and need fast syncing across many servers are good grounds for NoSQL, these "NoSQL in the cloud" instances will probably not actually fit that status.
I do agree that it will be a good thing when "NoSQL for everything" dies. Just like it was a good thing when "PERL for everything", "Java for everything" and "Ruby for everything" died, but let's not throw out the whole idea because a lot of people use it wrong.
The company I work at is currently having a huge headache moving from files into databases. We currently store everything in XML, which gives us a great amount of freedom and adaptability. However, most database solutions fix you to a single (or a handful of) data definitions. While you can kind of re-create XML by defining all kinds of crazy relationships, it gets hugely convoluted (to say the least).
I would LOVE to see a document/XML-like database. It just needs to do the things that standard databases support (e.g. Security Model, Easy Mirroring, Search/Queries) to make it worth our while moving at all. Last I checked we're up to 260,000 XML files and approx 40 distinct file "formats" (XML layouts).
So they should all use the same data management tools as Walmart. Is that the reasoning? Better to use the right tool for each job. Some things work better in a NoSQL, non-schema store.
Better to light a candle than complain about the darkness.
At first, I thought NoSQL like Cassandra should simply be used as a store for precomputed relationships. Then I thought NoSQL was just a structureless store that can scale in any given direction with no effort.
Both sound interesting, but then the debate against NoSQL is just "well, SQL can already do all that, but you get data integrity with it. If it doesn't scale, then just build a manly man's server and it will".
So, I dunno. The whole debate has gotten very religious very quickly and as a result, no one is really doing a proper comparison because no one seems to take the approach of "right tool for the right job, so here are the jobs NoSQL is right for, and here are the jobs your RDBMS is good for".
I'm god, but it's a bit of a drag really...
You can survive a slashdotting just fine without spending so much time on those exotic tools.
Care to provide a link to your site so we can test this?
Use the right tool for the job, except databases, eh?
The simple fact of the matter is that not every app is aiming for Google's scale. (Not every app is web-based or even going to be web-based, though people seem to forget that.) And even some large-scale apps don't fit the relational model very well, medical records being one of the more outstanding examples.
And yes, I have read Codd and Date and understand the relational model and its benefits very well, and it annoys me to no end when people break the relational model without realizing or understanding what it costs them. That said, sometimes those costs are acceptable, and sometimes an application requires features that the relational model does not (and in fact cannot) bring to the table.
It may be, as with every other silver bullet fad, that what's at work here is the basic human tendency to become familiar with something, begin to see everything in terms of it, and then try to persuade anyone who'll listen that they are in possession of the all-singing, all-dancing solution to all problems. Today, it's Ruby, multi-touch interfaces, and functional programming. But not very long ago it was COBOL and CICS. And while one must acknowledge that progress has been made, it is equally obvious that progress will continue to be made and that "one size fits all" is always BS, even in clothing.
Proud member of the Weirdo-American community.
The whole of geek debating is based on the Highlander principle.
No sig today...
Sure, I've messed around with some NoSQL databases; they just aren't my thing. Give me MySQL, your spec and a cup of tea, and I don't have to look through silly experiments to see the best way of doing things in new radical 'paradigms.' That being said, I am glad the experiments are being done by people who are in an environment where they can experiment. I mean, like the article says, it's the social networks like Twitter and Facebook developing things like Cassandra, and it's good that there is someone pushing the bar, but they are the only people who CAN do this: they aren't necessary, and nobody's gonna die from a 5 minute outage of poking each other (that sounds bad). I haven't really understood the whole NoSQL thing, and I haven't really ever had a problem with SQL-based databases. Maybe that's just the nature of my work, but it all seems as though this has nothing to do with technology, just people who want to be heard...
Many of the NoSQL stores scale better than a normal database and are available cheap. Oracle costs a fortune, and if you want to run Oracle on a cluster, good luck. They also don't let you publish benchmarks without their permission. But most people I know who use Oracle claim it totally beats everything else (without further clarification). DB2 includes a cluster edition that is also quite good. It uses a shared-nothing architecture. But none of these solutions are free. Teradata is also cited as a good parallel database. If you are a start-up and your choice is a NoSQL solution that is almost free or 100,000+ for some commercial parallel database, which do you go to?
But no matter what, with a relational database you will consume resources on ensuring consistency (which many times is what you want, but not 100% of the time). Amazon's Dynamo works by not caring so much about consistency and trading consistency for availability of the overall service. For a shopping cart it is fine, but you wouldn't want to do your credit card processing using it. Google's GFS is optimized to do the file operations that Google does the most. However, there was an article in the ACM not that long ago comparing MapReduce (Hadoop's implementation) against two parallel databases, and it lost. Of course, the parallel databases were all not free... and Hadoop is...
So overall I'd say the decision comes down to price mostly (as it does with most startups). If you can make do with one server then sure, use PostgreSQL (or MySQL... although they always tried to force licensing for commercial products even though it is GPL...). If you need a cluster, both have clustering solutions, but as far as I can tell they are not as good as the commercial parallel databases. If you have lots of money then sure, go with Oracle; it seems through word of mouth Oracle is the best for both parallel and stand-alone in terms of performance. DB2 was good enough for a former job. They had terabytes in the mid 1990s using about 20 servers. Now that the hardware is much better I'm sure it scales even better... But if money is a consideration, then go with an open source NoSQL solution. A lot of people now swear by Cassandra; I haven't had a chance to check it out yet.
If you get to the size of Walmart doing anything, you have access to the capital to get a system from IBM or Oracle for OLTP and Teradata for data warehousing.
"The problem with socialism is eventually you run out of other people's money" - Thatcher.
I'm still fuzzy on what NoSQL is supposed to be and what it is supposed to bring to the table.
From what I've understood, it's basically a common banner for various different databases that all share the common property of not being relational databases and not providing ACID guarantees.
If so, it seems to me that the whole NoSQL vs. RDBMS debate is about a false dichotomy. There are some applications where a relational database is the right tool for the job, and there are some where a relational database is not the right tool for the job. In some of those latter cases, one of the NoSQL databases may be the right thing.
This is nothing new. Non-relational databases have been used on Unix for a long time, and are even a standard part of POSIX (see for example the manpage for dbm_open). It's also long been known that, for example, Berkeley DB can be a lot faster than an RDBMS - as long as your application doesn't make use of all the features an RDBMS provides. Lots of programs even don't use one of these database systems, but invent their own, custom format. Git is a very successful example of this.
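As a small illustration of that dbm-style key/value model, here is a sketch using Python's standard `dbm.dumb` module, which is a pure-Python stand-in rather than the POSIX `dbm_open` interface itself. The keys, values, and file name are made up for the example.

```python
import dbm.dumb
import os
import tempfile


def roundtrip(path):
    """Store two records by key, then reopen the file and look them up."""
    with dbm.dumb.open(path, "c") as db:      # "c": create if missing
        db[b"user:42"] = b"alice"
        db[b"user:43"] = b"bob"
    with dbm.dumb.open(path, "r") as db:      # reopen read-only
        # Lookup is by exact key only: no joins, no ad hoc queries.
        return db[b"user:42"], b"user:99" in db


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(roundtrip(os.path.join(tmp, "demo")))
```

The whole interface is "put bytes under a key, get them back by key", which is exactly why it can be fast and also why it only suits applications that don't need the rest of what an RDBMS offers.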
To me, it seems that what we are seeing here is loads of people who had learned to use relational databases for all their storage needs discovering that there are other ways to store data, and that one of those methods may work better than an RDBMS for a particular application. Well, yes. Does that surprise anyone? It sure doesn't surprise me. Does it mean that RDBMSes are now useless? Not at all. Does it mean you should use a non-relational storage system where this makes more sense? Of course! Now, can we please get back to work? I don't see the point of having a holy war over whether RDBMS or NoSQL is better, when common sense says that they both have their uses.
Please correct me if I got my facts wrong.
I call: bullshit
We have been using object-oriented databases like the ZODB for ten years when the data model is not relational
We use relational databases when our data is relational
We use relational databases and object-oriented databases together in the same app when we need both
We do not use MySQL when we need a *real* database.
Use the right tool for each problem - only idiots use an RDBMS for everything.
People complaining about SQL performance are most likely either using incorrectly scaled machines for the job, or believe they can throw a four-line SQL statement at the database and expect it to work out the optimization on its own ... query optimizers may be able to do a decent job on average, but once you get to large databases (tables with multi-million-row datasets), planning the query structure will go a long way toward preserving performance. ...
Yes one can write complicated queries to return exactly what you want in one query, but in many cases doing some logic around it and using smart grouping/loops will outperform the complex query
I've got news for you ... all the major stock exchanges, banks, and telecoms in the world use SQL RDBMSs to track transactions that match or exceed anything Facebook and Twitter are doing. I guarantee you, without a single doubt in my mind, that Facebook and Twitter could be run on a SQL RDBMS ... by that I mean Oracle, not MySQL.
Our development organization is heavily invested in PostgreSQL, finding it to be perfectly matched to almost all of our needs. It is exceptionally reliable, and is very (but not perfectly) manageable. (We've had issues in the past with mis-timed auto-VACUUM for instance which are now resolved.) We even found a small but significant corner-case bug which upon being reported, received immediate attention from the developers, resulting in a resolution in under 72 hours. I believe our use of this particular tool has saved us significant resources (dollars, developer time) that has allowed the development organization to direct our time and money to our own application development.
But we're finding that even PostgreSQL has limits, mostly with respect to the large and growing datasets our application uses for large scale real time control. We could transition to a really expensive SQL solution, but we are at least considering the choices that may be a better fit for these particular subsystems than PostgreSQL or any other SQL solution. Just a few weeks ago, we started seeing a good comment in teh interWebs... "NoSQL" should mean "not only SQL".
Not a rejection of a powerful toolkit that holds a central role in our organization, but rather a recognition that we would be remiss in our responsibilities if we didn't pay attention to the choices that could simplify our lives as developers.
In Soviet Russia, NoSQL kills off Devs!
Bullshit.
ActiveRecord? Definitely. Rails as a whole? You might consider replacing it with another Ruby framework, but the same ideas are going to apply. Remember how Rails and Merb are merging? Merb tends to be ORM-agnostic, but the recommended Merb stack suggested DataMapper, which does support a few NoSQL databases.
Even if you needed a different ORM per NoSQL database, it wouldn't marginalize Rails as a whole, but that simply isn't the case. Just use DataMapper, then plug in the flavor of the day.
As an example, Rails (and DataMapper) run on Google App Engine.
Don't thank God, thank a doctor!
It should probably be called NoMysql instead of NoSQL...
Here are some good posts. Seems NoSQL is just the new xml. Sure, great for some things, but not really worth the hype...
http://www.yafla.com/dforbes/Responding_to_Joe_Stump_on_the_NoSQL_Debate/
http://www.yafla.com/dforbes/Getting_Real_about_NoSQL_and_the_SQL_Isnt_Scalable_Lie/
http://www.yafla.com/dforbes/Getting_Real_about_NoSQL_and_the_SQL_Performance_Lie/
At approx 200 million transactions per day, does it matter whether the source is a website or a retail system?
I thought it was reasonably well understood that one of Walmart's primary characteristics is their *amazing* control over logistics. In fact, I thought one of their big process inventions was to bring logistical activity online.
I welcome clarification, since I haven't worked for Walmart.
Perhaps some are afraid that the No-Sql movement will leak into other niches out of hype. After all, OOP leaked out of physical modeling and into other niches without being fully tested for those niches, and people started clamoring for OODBMS. (I'm of the opinion that "everything OOP" is a no-no. Use it where it helps, but not where it doesn't.)
Table-ized A.I.
The article focuses on NoSQL's claim to scalability, but isn't that just one of the features of (some of the) NoSQL options?
Google, Amazon, and Microsoft all provide NoSQL storage as a service that is easy to use and cheap, particularly for getting started. Those are two pretty important features and I would imagine that it is those features, rather than dreams of needing vast scalability, that attract the many web startups.
Relational databases are great if you have a relational problem. For everything else there is NoSQL. It is surprising how much of the world's data looks like "a stack of documents" rather than "a collection of mathematically related sets of data". Lotus Notes was the only NoSQL player for 20 years; now there are lots. Notes sucked because it had no competitors; the concept was and is sound. Now there is competition, and lots of NoSQL database systems and application environments on top that suck less and less by the day.
(Since the PDP11 was designed to be a hardware Fortran machine, and C was its assembler, and the i86 a poor clone of the PDP11!)
Or maybe I iGress!
Sent from my ASR33 using ASCII
For about 30 seconds, until the VC money dries up....
The point isn't (generally, there might be some pathological corner case) that the various web2.0 kiddies couldn't implement their stuff in SQL, but that they couldn't afford to do so. If you want to be able to serve large numbers of users in order to generate enough adsense pennies to keep the lights on until somebody buys you, your options are pretty much A) software with a more or less zero per-node cost, running on commodity x86s with no exotic interconnect, or B) bankruptcy.
WalMart has one of the largest Teradata installs, it doesn't use SQL.
...and those OLAP systems are most likely ROLAP or perhaps ROLAP with MOLAP support also. In any case, the underlying non-aggregated data is probably in an SQL database that supports materialized views and auto-aggregate tables. The OLAP is simply a multi-dimensional aggregate cache that sits in front of it.
Over-the-top Response Guy! Giving "Over-the-Top Responses" since 1970.
So you're saying that an RDBMS is the right tool for the job if your transactions have enough value, and, if the value per transaction is too low, you won't be able to afford an RDBMS, but you can still go with a NoSQL database? That's an interesting point of view.
So how do you make your system work with NoSQL? As you say in your post, "you lose ACID, indexes, and joins to varying degrees". To me, with my relational view of the world, it seems that you would want to use an RDBMS exactly because of these things. Specifically, the fact that your RDBMS does the hard work of keeping your data consistent for you. Wouldn't you have to implement that all by yourself if you went with a NoSQL system? If so, what realistic expectation can you have to come up with something that is both correct and as performant as an RDBMS which lots of smart people have worked on over the years?
Or is it just that people are throwing consistency out of the window and saying "We can afford to lose a couple of records or have a couple of dangling references here and there, as long as it SCALES". Because I can build something that scales if it doesn't have to maintain ACID, too. The difficulty is in having _both_ ACID and scalability.
Please correct me if I got my facts wrong.
For well over 30 years, airline reservation, hotel reservation, and other high volume transaction processing (HVTP) systems that are mainframe-based have not used SQL in the core transaction processing system. They use either the built-in key/value subsystem of TPF/ZTPF, or a slightly more sophisticated subsystem known as TPFDB. Using facilities similar to zOS, failover and recovery happen in record time should they be necessary. This successful real-world system and approach deserves the attention of those who would like to learn how this stuff really works.
I didn't (want to) say that all RDBMS won't cut it. The only point I wanted to make was that while I can see the point of the author that solutions like Cassandra are a bit overrated for most business applications, for other applications domains they are becoming a viable solution.
Each of those 200,000,000 transactions for WalMart is a fairly significant source of revenue, averaging on the order of $50.00 to $100.00 per transaction. That same 200,000,000 transactions for a web application would average like $0.03 (yes, 3 cents). Now, if the cost per transaction using a traditional RDBMS is something like $0.25 (25 cents), how is that going to work for the web case? What if the cost is $0.01? Still epic fail for the web case.
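The arithmetic behind that argument can be sketched as below. All dollar figures are the hypothetical ones from the comment, not measured costs, and `cost_share` is just a name chosen for this example.

```python
def cost_share(revenue_per_txn, db_cost_per_txn):
    """Fraction of each transaction's revenue eaten by the database cost."""
    return db_cost_per_txn / revenue_per_txn


if __name__ == "__main__":
    cases = [
        ("retail, $50 sale, $0.25 DB cost", 50.00, 0.25),
        ("web, $0.03 revenue, $0.25 DB cost", 0.03, 0.25),
        ("web, $0.03 revenue, $0.01 DB cost", 0.03, 0.01),
    ]
    for label, revenue, cost in cases:
        print(f"{label}: {cost_share(revenue, cost):.1%} of revenue")
```

With these assumed numbers the retail case spends half a percent of revenue on the database, the first web case spends more than eight times its revenue, and even the one-cent case gives up a third of revenue per transaction.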
I'm fairly certain that at least as of 8.4 PostgreSQL supports XML fairly robustly.
...in what way do they not map to the relational model? If you say "unstructured data", that is not an answer.
If real businesses like Walmart can track all of their data in SQL databases that scale just fine, Dziuba argues, surely your company can, too.
Oddly enough I'm trying to get to walmart.ca right now, and it's down....
Oh shoei. Must have been a slow day. Yes everybody that uses an SQL database for no good reason is insane. Yes everybody that uses a NoSQL database because it is the latest-and-greatest has the same affliction. Use what fits your purpose. SQL or NoSQL? Does not matter.
ruurd
They have no clue how to scale their systems[1]. Therefore they pass the problem on to the underlying layer and say you do it for me.
[1] They don't understand the mathematics of what they are doing.
Deleted
It's a poor artist / programmer / cook / et al. that blames his tools. If you know the problem, you use the best tool to solve it. SQL or document DBs or graph DBs, whatever is the best fit to solve the problem is what you use. You don't go around saying something is crap because you have no need for it.
Today many people store data on their private machines using classic file systems, and they use databases to store files and to tag them. In the future, tags or other kinds of attributes will become more important in information storage and retrieval. Therefore we need databases capable of managing such information. RDBMSs are very good at storing such information and at working with sets and subsets. And tags and attributes of objects/files/entities are nothing more than markers that show which sets objects belong to. So I doubt that SQL databases will go away.
Furthermore, objects in OOP languages are very restrictive. If you look for example at objects (called individuals) in OWL, you can see that data objects can have properties and relationships to other objects which cannot be expressed that easily in OOP language style. Therefore using DBs which are limited by the object model of OOP languages will not suffice.
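To make the "tags are just set membership" point concrete, here is a minimal sketch using Python's built-in `sqlite3`. The `file_tags` table, its columns, and the sample rows are invented for illustration.

```python
import sqlite3


def demo_connection():
    """Build an in-memory database with a hypothetical file/tag table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE file_tags (file TEXT, tag TEXT, PRIMARY KEY (file, tag))"
    )
    conn.executemany(
        "INSERT INTO file_tags VALUES (?, ?)",
        [
            ("beach.jpg", "photo"),
            ("beach.jpg", "2010"),
            ("cat.jpg", "photo"),
            ("notes.txt", "2010"),
        ],
    )
    return conn


def files_with_all_tags(conn, tags):
    """Return files that carry every given tag (a set intersection)."""
    unique = sorted(set(tags))
    placeholders = ",".join("?" for _ in unique)
    query = f"""
        SELECT file FROM file_tags
        WHERE tag IN ({placeholders})
        GROUP BY file
        HAVING COUNT(DISTINCT tag) = ?
        ORDER BY file
    """
    return [row[0] for row in conn.execute(query, (*unique, len(unique)))]


if __name__ == "__main__":
    conn = demo_connection()
    print(files_with_all_tags(conn, ["photo", "2010"]))
```

Each tag defines a set of files, and asking for files with several tags is an intersection of those sets, which a plain `GROUP BY ... HAVING` expresses directly.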
Because they're gonna tell your non-technical boss to make you use it, and he's gonna listen when they start telling him that Google, Twitter and Facebook do.
Are you adequate?
Joe Stump wrote a post that is a perfect response to this insanity.
http://stu.mp/category/nosql
Why is it that all the people working at scale seem to be going with NoSQL solutions? Are all the devs at Google, Facebook, Twitter, Digg, Reddit, etc. total idiots, or is there in fact a real problem that they face?
Anybody that cites Amazon, Walmart or any large retailer as an example of why SQL scales is missing the point. Retailers have very few write operations compared to the read load. The vast majority of the load hits databases that serve reads and have a high tolerance for write latency. This is a field SQL is good at solving.
On the other hand, social sites that have massive cross-user data ties and constant write updates, where latency is very important, don't fit this model that well. Sure, you can remove SQL replication from the mix and use independent instances of MySQL serving fractions of the overall site, with redundancy between them, but if you do that you have functionally built a NoSQL data store. The concept isn't to get rid of SQL; it's to get rid of the relational aspect of data storage. You can no longer rely on all your data being available to a single SQL statement.
Being an operations guy, though, I should point out the number one failing of SQL in my world. If you assume that, on average, a machine will either crash or have some sort of hardware failure once a year, and you consider a site with 1,000 machines, then you see that nearly 3 machines will die every day. Even if you count on 2 years of continuous uptime, that is over 1 a day. With 10,000 machines your failure rate is 27 per day; with 100,000 machines it is about 274. This means that any database layer that requires a large number of machines has to build in a recovery layer. Clients need to know that a node is down; when it comes back, it needs to have data uploaded to it, etc. The NoSQL solutions like Cassandra manage this automatically. Trying to do this with MySQL becomes really complicated, and you end up implementing all the same logic and constraints found in NoSQL solutions anyway. I have seen this happen twice now.
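For anyone who wants to check the failure arithmetic above, here is a minimal sketch. It assumes failures are independent and spread evenly over a 365-day year; the function name and the `mtbf_years` parameter are made up for illustration.

```python
# Expected machine failures per day, assuming each machine fails on average
# once every `mtbf_years` years and failures are spread evenly over the year.
DAYS_PER_YEAR = 365


def failures_per_day(machines: int, mtbf_years: float = 1.0) -> float:
    """Return the average number of machines expected to fail each day."""
    return machines / (mtbf_years * DAYS_PER_YEAR)


if __name__ == "__main__":
    for fleet in (1_000, 10_000, 100_000):
        print(f"{fleet:>7} machines, 1 failure/year each: "
              f"{failures_per_day(fleet):.1f} failures/day")
    # Same 1,000-machine fleet, but assuming two years between failures.
    print(f"  1000 machines, 1 failure/2 years each: "
          f"{failures_per_day(1_000, 2.0):.2f} failures/day")
```

The point is only the order of magnitude: once the fleet is in the thousands, a dead node is a daily event rather than an exception.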
This might be true if they sold items for 1/1000th of a cent, but it's simply untrue for any sale anywhere.
Twitter's load isn't that impressive; it's a poorly written big mess of a service. It's pretty common knowledge that it could be made far better if they would just use some untrained monkeys.
Again, Facebook ... bad example.
You've taken two overnight one-hit wonders that will be gone in a few years and used them as if they are valid examples of how to do it. They aren't, they aren't even close. They are what happens when you grow so fast you don't have a chance in hell of keeping up, so you cobble things together as best you can to survive, knowing that it's just a matter of time before the fad passes.
Do I think FB and Twitter could survive on MySQL? Probably not, but on a real DB with real DBAs, more than likely yes.
Persistent Volume manager for Kubernetes - https://github.com/dwimsey/openshift-pvmanager
As soon as people have a hammer in their hands, everything they see around them looks like a nail. The good IT worker is the one with different tools in his hands and the ability to choose the right one at the right time. And before you forget it, remember that premature optimization is the root of all evil.
The person you're replying to is clueless as far as to what 'medical data' is I think.
You should have picked up on this when he starts naming books that he's read. The more name/buzzword dropping you see the more you know the person doesn't really have a clue.
He even had to do a quick google to find some old buzzwords to throw in, I almost want to give him points for throwing in CICS, almost.
It's easy to hit intrinsic performance limits with SQL databases even on small apps. And for people who aren't database experts, it's even easier, since they don't know the hoops to jump through to make their SQL databases perform well. For the average programmer, it's easier to get good performance out of NoSQL databases.
Using SQL databases programmatically is a fairly silly notion to begin with: SQL was originally intended as an easy-to-use query language for non-experts because people were having trouble with navigating data structures. But programmers are excellent at navigating data structures and designing efficient data structures. SQL is solving a problem that most programmers don't have, and you're paying a big performance penalty for that.
Sometimes an SQL database is the right thing to use, sometimes it isn't. People really need to use their head instead of blindly picking one or the other solution.
OK, so which OODBMS do you recommend? I know of Ozone and db4o, both for languages that I rarely use. What about an OODBMS that I can access from C++, Perl, Python, and C? How widely used are they? How good is the support?
"Or is it just that people are throwing consistency out of the window and saying "We can afford to lose a couple of records or have a couple of dangling references here and there, as long as it SCALES""
Hmm, I could actually see that being the case for some applications, probably not one common in the business world but for research it's probably fairly likely you might generate huge datasets where losing individual records wouldn't matter much.
As another poster pointed out, this is a false dichotomy. We're emerging from a technology monoculture of "every DBMS must be SQL" to "it's possible, even viable to design, implement and use a DBMS that does not implement SQL". Anyone advocating a mass exodus from SQL-land is a fool. Props to the NoSQL guys for opening our eyes to fresh ideas.
There's no need to stick with any single DBMS platform for 100% of your organization, unless you're so small that you have but one server total, and then I suspect this debate is largely irrelevant to you anyhow.
We're not Google, but we have some of the same problems Google faced with scaling up our applications for the Internet. We do use MySQL, having some 35+ instances of it. Our application processes in excess of 1000 transactions each second, and we know it'll be difficult to scale to tens of thousands of transactions per second without some fundamental changes. Today we survive by imposing limitations on developers writing OLTP applications--things like "all row operations must search by primary key" and "no table scans, ad-hoc queries or file sorts". The access language is still SQL, but increasingly we don't *need* SQL for OLTP transactions. We could plug in something today that is equivalent and far simpler.
What we're lacking in SQL-land is a good way to host DBMS applications on distributed infrastructure, e.g. in the cloud. There are clustered databases available, but these often require fast/short interconnects, may have difficulty scaling above bandwidth limits on a local network or SAN, and can be frustratingly fragile to use in the "real world". Not to mention expensive. This is due to the consistency model imposed on such systems, i.e. the "AC" in ACID.
Data sharding is a popular way to exceed the limits of SQL, but once you introduce sharding you're treading on "NoSQL" waters already. You can't retrofit sharding onto an application that isn't aware of it, not very successfully, in my experience. So developers need to become acutely aware of the storage tier and design for it, meaning they've already lost the perfect abstraction of SQL.
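To make the sharding point concrete, here is a toy sketch (not any real product's API; `ShardedStore` and its methods are hypothetical names, and the "shards" are just in-memory dicts). It assumes simple hash-based placement by key.

```python
import hashlib


class ShardedStore:
    """Toy key-value store split across several in-memory 'shards'."""

    def __init__(self, shard_count: int) -> None:
        self.shards = [dict() for _ in range(shard_count)]

    def _shard_for(self, key: str) -> dict:
        # Use a stable hash so the same key always maps to the same shard.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.shards)
        return self.shards[index]

    def put(self, key: str, row: dict) -> None:
        self._shard_for(key)[key] = row

    def get(self, key: str):
        # Lookups by shard key touch exactly one shard.
        return self._shard_for(key).get(key)

    def scan(self, predicate) -> list:
        # Anything NOT addressed by the shard key has to visit every shard.
        return [row for shard in self.shards
                for row in shard.values() if predicate(row)]


store = ShardedStore(4)
store.put("user:1", {"name": "alice", "city": "Oslo"})
store.put("user:2", {"name": "bob", "city": "Lima"})
print(store.get("user:1"))
print(store.scan(lambda row: row["city"] == "Lima"))
```

Even in this toy, the caller has to know which field is the shard key and that any other access path fans out to all shards, which is the loss of abstraction described above.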
I'm keenly interested in emerging products like the Cassandra database, and while I have no intent of ever abandoning SQL (and probably even MySQL) in our organization, we're absolutely going to take "NoSQL" for a spin to see what it delivers in terms of cost, complexity, reliability and performance.
My team is currently considering a "NoSQL" solution, moving away from PostgreSQL, and the reason is: we desperately need multimaster over the WAN that handles split-brain situations gracefully. It's a tough problem and frankly no RDBMS handles it well. I suspect any group who has had to support multiple dispersed locations has the same thought.
What was wrong with COBOL? Didn't it solve most of business's problems? What makes C/C#/C++/Java/Ruby/Perl/PHP so much better?
In other news, some random hammer enthusiast posted on his blog that he just can't wait for screwdrivers to die.
Exactly. That's the point I was making with the value per transaction. The value of a bank transaction or a stock market transaction is considerable - and so are the fees. If Twitter charged you 25 cents per tweet - let alone $25 - they'd have no trouble buying a suitable SQL platform to store their data. Mostly because they wouldn't exist.
All our languages are now object-oriented, but we're still using non-object databases and mapping between rows and columns and objects. WHY?? Yes, tools can help you map, but it's a band-aid, and it screws up performance on all but simple cases. And it means you can't do queries using the same model as your language.
Yes, relational algebra is a useful query tool, but there is no need to be beholden to relational table structures to get relational algebra. Nor are the so-called object-relational features of PostgreSQL going to cut it. You can't even do a query and get back a list of objects of different types, for goodness' sake.
One thing that many people don't seem to get right: Using these "NoSQL" databases doesn't mean that you don't get ACID. Many key-value databases support ACID just fine:
You've got to remember that (simplifying drastically) SQL is a query language layered on top of a "NoSQL" style database (whether built into the SQL DBMS implementation, or a 3rd party one). Such "NoSQL" databases have to be ACID capable in their native API and implementation first.
Banu
The way languages like F# or Haskell treat "nulls" would be a straightforward definite improvement over the way SQL does (or for that matter, the common C/C++/Java/C# paradigm). A type whose value is either a Foo or nothing is just a tagged union type. So allowing columns to take tagged unions as their type would solve that right away--and also allow to impose further logical distinctions as needed by the application.
The whole three-valued-logic "null is not a value" paradigm of SQL is a disaster, that one's for sure. There are all sorts of query optimizations that are impossible to do on the face of it.
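A small demonstration of the three-valued-logic behaviour being complained about, using Python's built-in sqlite3 module as a stand-in for "an SQL database" (other engines follow the same standard rule here). The Python `None` comparison at the end is only a rough analogue of an option/tagged-union type, not the F# or Haskell type itself; the table and column names are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.executemany("INSERT INTO t (x) VALUES (?)", [(1,), (2,), (None,)])

# In SQL, NULL = NULL is not true; it evaluates to NULL (unknown).
null_eq_null = conn.execute("SELECT NULL = NULL").fetchone()[0]

total = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]

# "x = 1 OR x <> 1" looks like a tautology, but for the NULL row both
# comparisons are unknown, so the row is filtered out.
matched = conn.execute(
    "SELECT COUNT(*) FROM t WHERE x = 1 OR x <> 1"
).fetchone()[0]

# With an ordinary two-valued "value or nothing" type, the same predicate
# really is a tautology: None is just another value that is not equal to 1.
values = [1, 2, None]
matched_py = sum(1 for v in values if v == 1 or v != 1)

print(null_eq_null, total, matched, matched_py)
conn.close()
```

The "tautology" that silently drops a row is exactly the kind of thing that blocks a query optimizer from simplifying predicates the way it could under two-valued logic.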
Our company has saved immense time and made our applications faster and easier to understand, as well as [theoretically] more secure, by switching to Tokyo Cabinet. F*ck SQL and F*ck MySQL especially. I personally wrote some of the interfaces to the Tokyo Cabinet databases we are using, and at this point I have decided I never want to do anything with SQL ever again. Seriously, SQL sucks - it's clunky, easy to introduce security flaws with, slow, breaks easily, difficult to access from multiple languages simultaneously, you often have to do things like create special users to do certain things, which then introduces more security risk... and on and on. SQL is crappy and should be considered deprecated.
You are right. There is no comparison. Having worked with POS frameworks and the like I can tell you performance is a MUCH bigger issue there.
Interestingly the two largest databases I help with regarding LedgerSMB are a financial services business with over a hundred employees and the other is a convenience store with two tills. And with the POS environment, you have to have top performance. A 10 second delay is something that needs to be fixed quickly and a 30 second delay is almost unworkable. So yes, no comparison. Performance is MUCH more important on the brick and mortar retail end.....
LedgerSMB: Open source Accounting/ERP
He even had to do a quick google to find some old buzzwords to throw in, I almost want to give him points for throwing in CICS, almost.
What are you, twelve? I've developed COBOL and CICS applications, though thankfully, I work mostly in C++ and Java now.
You should have picked up on this when he starts naming books that he's read. The more name/buzzword dropping you see the more you know the person doesn't really have a clue.
No, but thirty-five years of software engineering has taught me that treading on the sacred turf of DBAs gets you one of two possible responses. If you don't make it clear at the outset that you do know what you're talking about, you're immediately dismissed as a clueless outsider. If you do make it clear that you know what you're talking about, you get responses like yours, which just descend into nonsensical nastiness. You can't have a meaningful discussion with people who aren't interested in dealing in good faith.
All of which serves to underscore my original point, which is that there is a deeply entrenched RDBMS faction that can only see problems in terms of the one tool that they have, and react to problems that don't fit the tool well (or at all) by simply denying their existence. The irony is that there is hardly anyone who denies the broad utility of the relational model. The hysterical reaction to the suggestion that not everything fits the model equally well and a few things don't fit it at all only highlights the blind dogma involved.
Proud member of the Weirdo-American community.
Or is it just that people are throwing consistency out of the window and saying "We can afford to lose a couple of records or have a couple of dangling references here and there, as long as it SCALES". Because I can build something that scales if it doesn't have to maintain ACID, too. The difficulty is in having _both_ ACID and scalability.
That's exactly right. To any experienced server engineer/architect, it's obvious that much greater scaling -- both horizontal (automatic sharding of data across nodes) and vertical (more writes/second/node) -- can be achieved if you give up the absolute guarantee of zero data loss (while still usually keeping data *consistency* in the non-lost portion of the data). Many of the social networking type applications (Twitter, Facebook and the like) can probably afford that risk. Given that, you can do many, many more txns/s with a key-value type database instead of the transactionally oriented OLTP type databases. Now, for a smaller organization (shoestring startups and such), the RDBMS model still has many benefits that can't be ignored -- a vast "googleable" knowledge base behind the traditional software products, a larger candidate pool with expertise in said systems, etc. Unfortunately there really is no standard answer here other than to evaluate your own situation carefully and make up your mind based on all the data (most of which is available only to you).
Look, guys. Let's be honest here. NoSQL has been around forever; it's the default approach for data storage unless a relational database is selected as a requirement of the software being written (am I the only one who still writes his own file formats and uses record-based random access for small-time data storage? If you don't need the complexity of XML or SQL, then don't use it...)
That being said, NoSQL is just giving that obvious practice a name as if it is a new phenomenon in the development world. Agreed now that it has a name it tends to mislead developers into discarding SQL DBMS irresponsibly, but it does serve an extremely important purpose in the business world: It superficially inflates an otherwise vacuous business process, which under the guise of "innovation", drives business demand.
The IT world does this all the time. They re-package existing solutions, or disrupt them in favor of "new" solutions which to be honest are often unnecessary and more complex than the original solutions. But it drives business. It creates new hardware, new software, new job positions, new education criteria that academia can sell and creditors and government can tax, new system maintenance and migration hurdles; it turns businesses into consumers, it creates new consumers for those businesses, and justifies continuing relationships with consumers when the last product was already good enough.
The mere hype of IT "solutions", however irrelevant or pointless or unnecessary, perpetuates the industry. A lot of it is utter BS. It's all they can do during times when few real advancements are made... and sadly it works too well... and that is the REAL problem with the NoSQL trend. Not bad programming practice. Just artificial business fuel.
Sure, if your transactions are worth something.
In the world of social networking, consistency is much less important than speed. If two different users see different data because the nodes are a few seconds out of sync, no-one cares. But slow answers are wrong answers.
You can't do that with a bank or a stock exchange. It would be a disaster. For a social networking site, no-one will care - no-one will even notice.
We throw strict correctness out the window. That's where most of the performance gain comes from. You still have to build an architecture that can take advantage of this opportunity, though, and that's not trivial.
Consistency, scalability, affordability. Pick two... At most.
Strongly agreed, though I do worry that many NoSQL projects' websites are overly blase about runtime issues, including crash safety and online schema changes, as well as upgrade-safety. Now this is really all about using alpha software rather than anything conceptual/design related, but it is a real issue at the present time.
U.S. War Crimes blog. Email for free Mandriva support.
I wrote this guy off the second he mentioned Walmart's database. How Walmart uses databases vs Facebook or Twitter is completely different. I'm pretty sure Walmart doesn't have 6 million people writing to that database at any given time.
Eric Evans, who coined the term NoSQL and is a committer on the Cassandra project, responded in a blog post:
http://blog.sym-link.com/2010/03/28/haters_gonna_hate.html
Ugh.
The saddest part to me about the "new hotness" of NoSQL zealots is that a scalable, fast, flexible key-value store isn't new at all. It's called LDAP. Sadly, it continues to be a horribly misunderstood beast. Yes, it's more than a shared address book.
In the end, you use the right tool for the job. SQL is relational. LDAP is hierarchical. Neither is new hotness, so stop pretending to invent. Both perform their jobs exceptionally well.... if you use them for the job they are intended, and learn a little something about the concepts invented before your birth. Chances are, they've been thought through before, and you're being lazy. Go read up.
I've been pretty happy with MongoDB. Why? The document architecture makes ORM a lot easier.
No, I will not work for your startup
Can you give any specifics? Btw, you know that there is more than one isolation level available in Postgres?
sig intentionally left blank
Traditional OODBMSs have two major problems... well, maybe three, going against them:
1. Hard to restructure data ad hoc to do set-based modelling (is this really a downside?)
2. Schema evolution (changing a model from one version to another).
3. Lack of tools sophistication. For ObjectStore (which I have supported in the past and work for today) - we have always had a lack of easy-to-use tools like Crystal Reports and some visualization tools that the SQL market has had almost since day 1. Requiring a programmer to do your data mining is a serious downside to using a pure OODBMS.
Although since we added XQuery support to the product, it's getting easier to do ad hoc queries without requiring access to a C++ or Java compiler.
This article seems to totally miss the point about why startups are using NoSQL databases, namely, those that are schemaless. It's because most startups are in the process of building their main product on the fly, pushing out new versions as often as every few days or hours, depending on their deployment model. Schemas stand in the way of rapid development since you have to CONSTANTLY redefine them as you redefine your product. So updating your db goes something like this: "Oh, I have to change this relationship from one-to-one to one-to-many." "Well, now that we redesigned part of the database, let's migrate it." "Ok, well, enough time has passed. Is it done yet? No? Okay." "Ah, it's done. Ok, take down the servers for maintenance and restart." Or you can use something like CouchDB, and just insert whatever new data you want on the fly, without defining a schema, without migrations, and without downtime. A win for startups. It's not just about scalability, it's also about being able to do a simple task.
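A rough sketch of the schemaless workflow described above. This is a toy in-memory document store, not CouchDB's actual API; `DocumentStore`, `emails_of` and the field names are all made up for illustration.

```python
import json


class DocumentStore:
    """Toy schemaless store: each document is saved as self-describing JSON."""

    def __init__(self) -> None:
        self._docs = {}

    def put(self, doc_id: str, doc: dict) -> None:
        # No schema check: any JSON-serializable shape is accepted.
        self._docs[doc_id] = json.dumps(doc)

    def get(self, doc_id: str) -> dict:
        return json.loads(self._docs[doc_id])


store = DocumentStore()

# Version 1 of the product: one email per user.
store.put("u1", {"name": "alice", "email": "alice@example.com"})

# Version 2: one-to-many emails. New documents use the new shape;
# the old document is left alone, with no migration and no downtime.
store.put("u2", {"name": "bob", "emails": ["bob@example.com", "b@example.org"]})


def emails_of(doc: dict) -> list:
    # The cost moves into the application: readers must handle both shapes.
    if "emails" in doc:
        return list(doc["emails"])
    return [doc["email"]] if "email" in doc else []


print(emails_of(store.get("u1")))
print(emails_of(store.get("u2")))
```

Note the trade-off the sketch makes visible: the migration step disappears, but every reader now has to cope with old and new document shapes side by side.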