Why Some Devs Can't Wait For NoSQL To Die
theodp writes "Ted Dziuba can't wait for NoSQL to die. Developing your app for Google-sized scale, says Dziuba, is a waste of your time. Not to mention there is no way you will get it right. The sooner your company admits this, the sooner you can get down to some real work. If real businesses like Walmart can track all of their data in SQL databases that scale just fine, Dziuba argues, surely your company can, too."
People who don't like SQL should get their heads out of their asses and use MySQL, a robust and enterprise-ready database.
Interesting thesis...
This is like saying "I can't wait for memcached to die" just because your site doesn't need it. Fact is, some do. It's your own fault if you choose to apply unnecessary techniques.
Don't change to newer fancy techniques if you don't understand what they are for and why you would need them.
"MySQL or PostgreSQL," for what it's worth. PostgreSQL is a pretty powerful database, and you should have to make a pretty good argument why leaving a well understood technology that powers a lot (an some of the largest parts) of the WWWeb needs to be trashed for something newer and less tested.
Put identity in the browser.
XML text files all the way! /duck
Why should anything "die"? People choose solutions based on their individual merits. If something doesn't work, exchange it for something that does. I'm sure certain people find NoSQL-type databases perfect for their needs.
In short, people should just shut up about other people's choices and get on with their own.
There's a place for SQL, but there are some cases where a BigTable-like store (e.g. HyperTable) works better. Our company manages data using SQL, but when we present data to the users it's through a HyperTable implementation. SQL is easier for data management, but HyperTable uses our server resources better.
It's really that simple. A standard dual-socket server with the latest CPUs from Intel or AMD can handle hundreds of requests per second. If one isn't enough, just add more hardware: one month of salary can buy you another node, and a year can buy you a whole cluster of rackable systems or a chassis full of blades. If it takes a few months extra for a team to solve the problem the NoSQL way, that's a few months of extra salary costs and missed sales.
Slashdot runs on SQL. I run a site serving 1M pages daily (1/3 of Slashdot according to Alexa) on just a single system with 2x Xeon E5420 and Django/PostgreSQL at 10% load. Unless you attract enough attention to require scaling past 10M pages a day, you're wasting your time reinventing the wheel with NoSQL. Just stick with a standard ORM, launch your site, and start convincing customers and generating sales. You can survive a slashdotting just fine without spending so much time on those exotic tools.
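For a rough sense of scale, the daily page counts above can be turned into average requests per second. This is only a back-of-the-envelope sketch in Python: the function names and the 5x peak factor are assumptions for illustration, not figures from the post, and it treats one page as one request.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_rps(pages_per_day: float) -> float:
    """Average requests per second, assuming one request per page view."""
    return pages_per_day / SECONDS_PER_DAY


def peak_rps(pages_per_day: float, peak_factor: float = 5.0) -> float:
    """Crude peak estimate; peak_factor is a made-up multiplier, tune it for your traffic."""
    return average_rps(pages_per_day) * peak_factor


for pages in (1_000_000, 10_000_000):
    print(
        f"{pages:>10,} pages/day: "
        f"{average_rps(pages):6.1f} req/s average, "
        f"{peak_rps(pages):6.1f} req/s at the assumed peak factor"
    )
```

Under these assumptions 1M pages a day averages out to roughly 12 requests per second and 10M to roughly 116, which is the range the "hundreds of requests per second" claim is about.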
... that I can't tell others what to do!
So you're in surgery for 3 hours doing a kidney transplant, having used your trusty medium vascular clamp that has served you for the past 20 years. You're finally done and the patient is in recovery, so you sit down to relax with the latest copy of JAMA. They've got a great article about the latest development in cardiac clamps, and you think to yourself "Why not use a heart clamp for kidney transplants!" Brilliant. So you order up some new clamps from MedicalClamps.com, and use them on your next patient. The surgery goes fine, but 3 months later the patient is back in your office with a failed kidney. You open 'em up, and it's obvious the clamp exerted too much pressure on the artery, damaging it in the process. Stupid cardiac clamps! You're not a heart surgeon!
AccountKiller
FTA:
"In the meantime, DBAs should not be worried, because any company that has the resources to hire a DBA is likely has decision makers who understand business reality."
Bad English aside, I just don't agree. Money != Reality. I have worked both sides of this coin - startups with plenty of money that don't see the value in proper maintenance of the data store (one was almost put out of business by a disk failure), and very smart startups that are running lean but do understand the risks.
That said, on the deeper level, why does business reality == SQL? Sure I can scale Oracle to support massive DBs (and have), but I could probably get more value from using Amazon's SimpleDB for things that don't require massive scaling. Use the right tool for the job - hammers are for nails, etc. Do the design work up front, decide how it's gonna work, and the right tool should present itself.
}#q NO CARRIER
Everyone's needs are different, and there are going to be different solutions for those needs. If NoSQL isn't for you then just don't use it (don't spend any time learning it, trying it out, running a site with it, etc.). I don't have a need for it yet, but we do all sorts of sites and programming, so who knows if it will be the right solution for one of our future projects? I won't know unless I learn about it, test it and get my hands dirty with it.
And as far as it being 'a product of the braindead and buzzword-infested effluents of the American "education" system, where nobody understands math or logic', I don't care if it came from the bottom of a well in the middle of a jungle where they are masters of logic and math, if it could possibly meet my client's needs then I'm going to give it the time and attention it takes to make the decision for myself.
Real businesses track their data with SQL databases, true. However, real businesses have small numbers of transactions relative to their value. If Walmart had the same revenue but the average sale was a tenth of a cent, their fancy SQL database would be smouldering rubble.
That's what Facebook and Twitter and other large social media sites are facing. Just try running Twitter's volume and Twitter's page hits and API hits off MySQL. It doesn't matter how many replicas you run, it's not going to work. Maybe you could run it on a cluster of IBM Z-series mainframes running DB2 - but where is the money going to come from?
Cassandra and HBase and the other distributed NoSQL databases solve specific problems in specific ways. They won't work for Walmart, but they'll do the job just fine for Facebook and Twitter. If you have those specific scaling problems and can live with the restrictions (you lose ACID, indexes, and joins to varying degrees) then they'll work for you.
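To make the "you lose joins" restriction concrete, here is a minimal sketch of what a join looks like when the application has to do it. Plain Python dicts stand in for a key-value store; this is not the Cassandra or HBase API, and every name and record in it is hypothetical.

```python
# Two "tables" held as key-value maps: the store only knows get-by-key.
users = {
    "u1": {"name": "alice"},
    "u2": {"name": "bob"},
}
posts = {
    "p1": {"author": "u1", "text": "hello"},
    "p2": {"author": "u2", "text": "world"},
    "p3": {"author": "u9", "text": "orphan"},  # nothing enforces that u9 exists
}


def posts_with_author_names(posts, users):
    """Hand-written equivalent of posts LEFT JOIN users ON posts.author = users key."""
    joined = []
    for post_id, post in sorted(posts.items()):
        author = users.get(post["author"])  # one extra lookup per post
        joined.append(
            {
                "post": post_id,
                "author": author["name"] if author else None,
                "text": post["text"],
            }
        )
    return joined


result = posts_with_author_names(posts, users)
for row in result:
    print(row)
```

The dangling `u9` reference is the other half of the trade: without foreign keys or transactions, keeping related records consistent is the application's job.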
If all you know is that your site is running slow, then implementing NoSQL is unlikely to improve things.
I think some developers keep looking for the holy grail. Some magical solution that will turn development from punching in code, to Star Trek: "Computer do my job for me please".
Template languages, 4GL, NoSQL, Ruby on Rails... it is all part of an attempt to take the nasty out of development and they all... well... they all just don't really happen.
Because deep down, with all the frameworks and generators, if you want your code to do what you want it to do, you are still writing out if statements a lot.
And yes, OO and such also belong to this. Not the concepts themselves, but the way most people talk about them. OO means code re-use, right?
If you said yes, then you are a manager, go put on your tie, you will never be any good at coding.
You can re-use all code. And it has been done for a long time.
What, did you think that people who wrote BASIC for the C64 went "Oh I wrote this bit of code for printing, now I need the same functionality, I am going to write it all over again!"?
OO does make code re-use a bit easier BUT that is NOT the claim that people often make. Trust me, I ask this in interviews and it is always the same answer. Apparently you can't re-use functions. No way, no how. NEXT!
I see two kinds of developers. Those who hate their job and those who don't. The former want to be managers and get away from writing code as fast as possible. And they will leap on anything that seems to make their jobs easier. Meanwhile the rest of us go on with actually producing stuff.
Just check: how many times do you get one of those manager wannabes introducing something they read in a magazine because it promises that you don't need to write another line of code ever?
MMO Quests are like orgasms:
You may solo them, I prefer them in a group.
Use the right tool for the job, except databases, eh?
The simple fact of the matter is that not every app is aiming for Google's scale. (Not every app is web-based or even going to be web-based, though people seem to forget that.) And even some large-scale apps don't fit the relational model very well, medical records being one of the more outstanding examples.
And yes, I have read Codd and Date and understand the relational model and its benefits very well, and it annoys me to no end when people break the relational model without realizing or understanding what it costs them. That said, sometimes those costs are acceptable, and sometimes an application requires features that the relational model does not (and in fact cannot) bring to the table.
It may be, as with every other silver bullet fad, that what's at work here is the basic human tendency to become familiar with something, begin to see everything in terms of it, and then try to persuade anyone who'll listen that they are in possession of the all-singing, all-dancing solution to all problems. Today, it's Ruby, multi-touch interfaces, and functional programming. But not very long ago it was COBOL and CICS. And while one must acknowledge that progress has been made, it is equally obvious that progress will continue to be made and that "one size fits all" is always BS, even in clothing.
Proud member of the Weirdo-American community.
The whole of geek debating is based on the Highlander principle.
No sig today...
Many of the NoSQL sources scale better than a normal database and are available cheap. Oracle costs a fortune, and if you want to run Oracle on a cluster, good luck. They also don't let you publish benchmarks without their permission. But most people I know who use Oracle claim it totally beats everything else (without further clarification). DB2 includes a cluster edition that is also quite good. It uses a shared-nothing architecture. But none of these solutions are free. Teradata is also cited as a good parallel database. If you are a start-up and your choice is a NoSQL solution that is almost free or $100,000+ for some commercial parallel database, which do you go with?
But no matter what, you will consume resources with a relational database on ensuring consistency (which many times is what you want, but not 100% of the time). Amazon's Dynamo works by not caring so much about consistency and trading consistency for availability of the overall service. For a shopping cart it is fine, but you wouldn't want to do your credit card processing using it. Google's GFS is optimized for the file operations that Google does the most. However, there was an article in the ACM not that long ago comparing MapReduce (Hadoop's implementation) against two parallel databases, and it lost. Of course, the parallel databases were not free... and Hadoop is...
So overall I'd say the decision comes down to price mostly (as it does with most startups). If you can make do with one server then sure, do PostgreSQL (or MySQL... although they always tried to force licensing for commercial products even though it is GPL...). If you need a cluster, both have clustering solutions, but as far as I can tell they are not as good as the commercial parallel databases. If you have lots of money then sure, go with Oracle; word of mouth has it that Oracle is the best for both parallel and standalone in terms of performance. DB2 was good enough at a former job. They had terabytes in the mid-1990s using about 20 servers. Now that the hardware is much better I'm sure it scales even better... But if money is a consideration, then go with an open source NoSQL solution. A lot of people now swear by Cassandra; I haven't had a chance to check it out yet.
I'm still fuzzy on what NoSQL is supposed to be and what it is supposed to bring to the table.
From what I've understood, it's basically a common banner for various different databases that all share the common property of not being relational databases and not providing ACID guarantees.
If so, it seems to me that the whole NoSQL vs. RDBMS debate is about a false dichotomy. There are some applications where a relational database is the right tool for the job, and there are some where a relational database is not the right tool for the job. In some of those latter cases, one of the NoSQL databases may be the right thing.
This is nothing new. Non-relational databases have been used on Unix for a long time, and are even a standard part of POSIX (see for example the manpage for dbm_open). It's also long been known that, for example, Berkeley DB can be a lot faster than an RDBMS - as long as your application doesn't make use of all the features an RDBMS provides. Lots of programs don't even use one of these database systems, but invent their own custom format. Git is a very successful example of this.
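As a small illustration of how little there is to a dbm-style store, here is a sketch using Python's standard `dbm` package, which is modelled on the Unix dbm family. It is not the POSIX C `dbm_open` API itself; `dbm.dumb` is the pure-Python fallback backend, picked here only because it is always available. The function name, file name, and keys are made up.

```python
import dbm.dumb
import os
import tempfile


def demo_key_value_store():
    """Store and fetch byte strings by key: no schema, no SQL, no joins."""
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "users")  # hypothetical database name

        # "c" opens for read/write and creates the database if needed.
        with dbm.dumb.open(path, "c") as db:
            db[b"user:42"] = b"alice"
            db[b"user:43"] = b"bob"

        # Reopen read-only and look things up by key, which is the only
        # access path this kind of store offers.
        with dbm.dumb.open(path, "r") as db:
            found = db[b"user:42"]
            missing = db.get(b"user:99")
    return found, missing


found, missing = demo_key_value_store()
print(found, missing)
```

Anything beyond get and put by key (secondary indexes, range scans, consistency across keys) is left to the application, which is both why these stores can be fast and why they are not a drop-in RDBMS replacement.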
To me, it seems that what we are seeing here is loads of people who had learned to use relational databases for all their storage needs discovering that there are other ways to store data, and that one of those methods may work better than an RDBMS for a particular application. Well, yes. Does that surprise anyone? It sure doesn't surprise me. Does it mean that RDBMSes are now useless? Not at all. Does it mean you should use a non-relational storage system where this makes more sense? Of course! Now, can we please get back to work? I don't see the point of having a holy war over whether RDBMS or NoSQL is better, when common sense says that they both have their uses.
Please correct me if I got my facts wrong.
People complaining about SQL performance are most likely either using incorrectly scaled machines for the job, or believing they can throw a four-line SQL statement at the database and expect it to work out the optimization on its own ... query optimizers may be able to do a decent job on average, but once you get to large databases (tables with millions of rows), planning the query structure will go a long way toward preserving performance. ...
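One cheap way to check what the optimizer is actually doing is to ask for the query plan. A minimal sketch using SQLite through Python's `sqlite3` module, as a stand-in for whatever database you run; the table, index, and helper names are hypothetical, and the exact plan wording differs between SQLite versions and between database engines.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return SQLite's plan for a statement; the last column holds the description."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 1000, i * 0.5) for i in range(10_000)],
)

sql = "SELECT total FROM orders WHERE customer_id = ?"

# Without an index the only option is to read the whole table.
plan_before = query_plan(conn, sql, (7,))

# With an index on the filtered column the planner can seek directly.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = query_plan(conn, sql, (7,))

print("before:", plan_before)
print("after: ", plan_after)
```

The first plan should report a full table scan and the second a search using `idx_orders_customer`; on a table with millions of rows that difference is usually the whole story.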
Yes, one can write complicated queries to return exactly what you want in one query, but in many cases doing some logic around it and using smart grouping/loops will outperform the complex query.
Our development organization is heavily invested in PostgreSQL, finding it to be perfectly matched to almost all of our needs. It is exceptionally reliable, and is very (but not perfectly) manageable. (We've had issues in the past with mis-timed auto-VACUUM, for instance, which are now resolved.) We even found a small but significant corner-case bug which, upon being reported, received immediate attention from the developers, resulting in a resolution in under 72 hours. I believe our use of this particular tool has saved us significant resources (dollars, developer time), which has allowed the development organization to direct our time and money to our own application development.
But we're finding that even PostgreSQL has limits, mostly with respect to the large and growing datasets our application uses for large scale real time control. We could transition to a really expensive SQL solution, but we are at least considering the choices that may be a better fit for these particular subsystems than PostgreSQL or any other SQL solution. Just a few weeks ago, we started seeing a good comment in teh interWebs... "NoSQL" should mean "not only SQL".
Not a rejection of a powerful toolkit that holds a central role in our organization, but rather a recognition that we would be remiss in our responsibilities if we didn't pay attention to the choices that could simplify our lives as developers.
Bullshit.
ActiveRecord? Definitely. Rails as a whole? You might consider replacing it with another Ruby framework, but the same ideas are going to apply. Remember how Rails and Merb are merging? Merb tends to be ORM-agnostic, but the recommended Merb stack suggested DataMapper, which does support a few NoSQL databases.
Even if you needed a different ORM per NoSQL database, it wouldn't marginalize Rails as a whole, but that simply isn't the case. Just use DataMapper, then plug in the flavor of the day.
As an example, Rails (and DataMapper) run on Google App Engine.
Don't thank God, thank a doctor!
The article focuses on NoSQL's claim to scalability, but isn't that just one of the features of (some of the) NoSQL options?
Google, Amazon, and Microsoft all provide NoSQL storage as a service that is easy to use and cheap, particularly for getting started. Those are two pretty important features and I would imagine that it is those features, rather than dreams of needing vast scalability, that attract the many web startups.
For about 30 seconds, until the VC money dries up....
The point isn't (generally, there might be some pathological corner case) that the various web2.0 kiddies couldn't implement their stuff in SQL, but that they couldn't afford to do so. If you want to be able to serve large numbers of users in order to generate enough adsense pennies to keep the lights on until somebody buys you, your options are pretty much A) software with a more or less zero per-node cost, running on commodity x86s with no exotic interconnect, or B) bankruptcy.
Each of those 200,000,000 transactions for WalMart is a fairly significant source of revenue, averaging on the order of $50.00 to $100.00 per transaction. That same 200,000,000 transactions for a web application would average something like $0.03 (yes, 3 cents). Now, if the cost per transaction using a traditional RDBMS is something like $0.25 (25 cents), how is that going to work for the web case? What if the cost is $0.01? Still epic fail for the web case.
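Spelled out as arithmetic, using only the hypothetical figures from the comment above (the per-transaction database costs are the commenter's what-ifs, not measurements, and the function name is made up):

```python
TRANSACTIONS = 200_000_000


def cost_share(revenue_per_txn: float, db_cost_per_txn: float) -> float:
    """Fraction of each transaction's revenue eaten by the database cost."""
    return db_cost_per_txn / revenue_per_txn


scenarios = [
    ("retail, $50.00 sale, $0.25 DB cost", 50.00, 0.25),
    ("web, $0.03 revenue, $0.25 DB cost", 0.03, 0.25),
    ("web, $0.03 revenue, $0.01 DB cost", 0.03, 0.01),
]

for label, revenue, cost in scenarios:
    share = cost_share(revenue, cost)
    net = (revenue - cost) * TRANSACTIONS
    print(f"{label}: DB cost is {share:.1%} of revenue, net ${net:,.0f} overall")
```

At $0.25 per transaction the retailer gives up half a percent of a $50 sale, while the web app spends more than eight times what it earns. Even at $0.01 the database alone takes a third of the web app's revenue, before any other cost is counted.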
Over-the-top Response Guy! Giving "Over-the-Top Responses" since 1970.
It's a poor artist / programmer / cook / et al. that blames his tools. If you know the problem, you use the best tool to solve it. SQL or Document-DBs or Graph-DBs, whatever is the best fit to solve the problem is what you use. You don't go around saying something is crap because you have no need for it.
Because they're gonna tell your non-technical boss to make you use it, and he's gonna listen when they start telling him that Google, Twitter and Facebook do.
Are you adequate?
As soon as people have a hammer in their hands, everything they see around them is a nail. The good IT worker is the one with different tools in his hands and the ability to choose the right one at the right time. And before you forget it, remember that premature optimization is the root of all evil.
It's easy to hit intrinsic performance limits with SQL databases even on small apps. And for people who aren't database experts, it's even easier since they don't know the hoops to jump through to make their SQL databases perform well. For the average programmer, it's easier to get good performance out of no-SQL databases.
Using SQL databases programmatically is a fairly silly notion to begin with: SQL was originally intended as an easy-to-use query language for non-experts because people were having trouble with navigating data structures. But programmers are excellent at navigating data structures and designing efficient data structures. SQL is solving a problem that most programmers don't have, and you're paying a big performance penalty for that.
Sometimes an SQL database is the right thing to use, sometimes it isn't. People really need to use their heads instead of blindly picking one solution or the other.