SQL Vs. NoSQL: Which Is Better?
Nerval's Lobster writes "For the past 40-some years, relational databases have ruled the data world. Relational models first appeared in the early 1970s thanks to the research of computer science pioneers such as E.F. Codd. Early versions of SQL-like languages were also developed in the early 70s, with modern SQL appearing in the late 1970s, and becoming popular by the mid-1980s. For the past couple of years, the Internets have been filled with heated arguments regarding SQL vs NoSQL. But is the fight even legitimate? NoSQL databases have grown up a bit (and some, such as Google's BigTable, are now mature) and prove themselves worthy. And yet the fight continues. Tech writer (and programmer) Jeff Cogswell examines both sides from a programming perspective."
SQL and NoSQL are different, with different use cases.
"No".
Might as well just ask: Which is better a BMW M3 or Ford F350 4x4 with 6.7 diesel?
Both are great, have their place and will get you from point A to B, but neither are a practical replacement for the other.
My current programming project is a mixture of Cassandra and Oracle (although, to be honest, I'd rather be using PostgreSQL or even MySQL).
I'm sure many of you reading this have seen this, but it's still funny anyway... http://www.youtube.com/watch?v=b2F-DItXtZs
Replace "SQL" with "hammer" and replace "NoSQL" with "Screw Driver" and then ask the question again. You will see how silly it actually is.
The right tool is best for the particular job at hand, always. If you refuse to define the job, it is not possible to guess ahead of time which tool will be better for it.
Here's the answer to pretty much every "Which is better" question:
- Option 1 is better in cases where option 1 provides more advantages and less disadvantages than Option 2.
- Option 2 is better in cases where option 2 provides more advantages and less disadvantages than Option 1.
- In cases where neither Option 1 nor Option 2 provide a clear advantage/disadvantage distinction, neither is better and either may be used depending on preference.
Rarely is the answer ever "X is better than Y in all possible cases."
My sci-fi novel, Ghost Thief, is now available from Amazon.com.
Kang: SQL for all.
*crowd boos*
Kang: Very well, NoSQL for anyone.
*crowd boos*
Kang: Hmm... SQL for some projects, miniature NoSQL implementations for others.
*crowd cheers*
My work here is dung.
This is very close to providing the car anology I was looking for, but it just misses the bus on that.
Betteridge's Law of Headlines is the rule you're looking for.
For generic applications that do not have a vast amount of user volume or data set size, NoSQL or any SQL Generator is fine. It is also fine for most of the standard and generic go down a primary key query or do a simple join. However, the more complex queries on larger systems need reviewing. The biggest problem with NoSQL is developers just don't want to be bothered and expect their procedural logic to automatically run in a 20 terabyte database that gets over a million hits a minute. This is the higher end for systems I work on, but it also happens in smaller ones.
We get by far the worst SQL submitted to us by developers who generate SQL and in general don't know anything. Large Databases rarely stay extremely well normalized. There are rarely data architects around to enforce this. Developers who are in a hurry to meet deadlines denormalize and just add columns. When you do this, over time your sql gets more complex and query generators are not very good.
Query generators can generate alot of basic sql, but as time goes on requirements get more complex. Developers are building on what other developers did before them. A lot of this data is not normalized and have ridiculously complex logic. We generally get emails from developer going 'this query is slow' and that is it. Or we get I did a query just like this before, but this one is slow. The query generator may be making queries on the fly. So they think its the same thing, but its actually different.
One other thing that often happens that people overlook is that these tools generate too much SQL. Instead of getting data in 1 sql statement and have a normalized set of tables, I have clicked a button and run 15 sqls serially. When you get alot of traffic, the round trips and the CPU increase adds up. Developers don't know this is happening because it is all done behind the scenes.
I have had developers with over 10 years experience (some up to 30 years) who can't even figure out the following:
1. why a query that returns millions of records is slow or can understand the question of 'what the hell is the user going to do with all this?'
2. why taking fields out of the 'where' clause can affect the query. Dude, cause your no longer using an index.
3. why running the same query millions of times in a loop would be slow (this is serial). databases are optimized to do stuff in straight sql. Ok this one is not really easy to get at first and it won't be obvious, but if you have done this for 10 years you should have seen it and if I tell you this 5 times and you keep doing it... seriously. This is not that hard.
4. how different parameters can affect a query. If you run a query that brings back one record, then change the parameters and it has to go through 500 gbs of data your response time will slow down a bit right?
5. 'it worked in dev'. your Dev DB had tables with 10 records. Prod has 20 terabytes of data. We have told you that prod is much bigger. So you need to atleast check to see if its using an index. This is NOT difficult and I show them how, but they don't care.
6. your queries are 'slow' because your query generators run 26 queries(serially) when I click this one button. You can combine these to 4 or less and if you pay more attention to the data model and let us making a few simple changes we can get it to 1. However, the 4 i am giving you is fine for now. i can even show them how to audit their sessions activity and how to run a simple query to see what is going on. They click a button and they can see exactly what is happening in the DBA. Most don't care.
My biggest beef is I tell them what is wrong, I try to explain to them why this is a problem and the vast majority just don't care. They ignore and then do the same thing again. Apparently they don't care that I am subject to 24x7 paging on this stuff and I can't go home if users are complaining, while the developers can go home to their families.
My other beef is that SQL is not that hard. Its easier than coding. I have been a developer before
Typically Sql developpers tend to throw everything into the data base, then create marvelously large queries, and finally pout when you complain about performance, that "if they had had the time they would have some stored procedures, and the server is too slow anyway...."
This should be marked as flamebait. This is only true of some developers i.e. those who are ignorant about RDBMS. They should make it their business to understand RDBMS especially in large scale applications where performance is critical. This particular aspect alone forces any developer or DBA to have think "hard" about the structure of their data such as transactional vs. analytical needs. If used appropriately, an RDBMS can be quite intuitive and performance/space efficient. It all comes down to understanding the tool you're using. If you don't know how to use a screw driver or a hammer, you probably shouldn't be using it!
We'll make great pets
NoSQL does make developers' lives easier, though, because they don't necessarily have to think about their data in advance,
I've heard claims like these a lot, and I must say that I've never understood them. Programs maniuplate data. I just don't see how you can write a program without knowing about your data. It makes no sense.
SJW n. One who posts facts.
"Tech writer (and programmer) Jeff Cogswell examines both sides from a programming perspective."
Irrelevant. The data exists to serve the needs of the business and programmers/developers work to serve those needs. The company should chose the best tool for the job which is a usually a relational database as it serve the needs of the "business" the best in most cases. If you are looking to see which is easier for you then you are a shitty programmer and you need to upgrade your skills to understand how to work with relational databases. You should not be dictating what storage methodology is used for the data.
To be a competent developer, you need to have some understanding of how databases work because you cannot rely on the DBA to babysit all of the projects. You should understand what indexes are, the difference between and inner and outer join and when you can use each time and you should test your code against a large data set to find any bottlenecks on the database side.
Jesus was a compassionate social conservative who called individuals to sin no more.
For the past 40-some years, relational databases have ruled the data world.
Bullshit.
In 1972 hierarchical databases ruled the world (with a few network-model attempts here and there) and continued to do so well into the 1980s. In fact, the theory behind relational databases had only been articulated and published in June 1970. In further fact Oracle wasn't founded until 1977, and didn't ship anything until 1979, and they were the first to successfully promote that new-fangled "relational" stuff in a commercial product--prior to that IBM kept it locked up in the lab, except for some very obscure "mostly demo-ware" things, so it wouldn't threaten their then-current cash cow: IMS. IBM's entry into the relational database world, in the early 1980s or so, was a direct response to the growing sales of Oracle.
Also in the 1980s we got: Sybase, Informix, Ingres, MS SQL Server. Then in the 1990s we started getting open-source RDBMSs, along with actually robust versions for Windows-based servers. Then in the 2000s holy crap we even got good database servers on Macs!
Anyways, relational databases have really only "ruled the world" for the past 20 some years ;-)
As other have said, there isn't one that's "better" than the other in a general sense. However, there are situations in which one is better suited to a task at hand.
This is of course something that applies to many different aspects of application design and architecture.
As an example, I'm developing a high volume, high transaction website application and use both PostgreSQL and MongoDB.
We use SQL where strict relations, type checking, and data integrity are required. The SQL database has the extremely important function of making sure the data given to it by the application is coherent. I realize that MongoDB has functions for checking data integrity, but it is tricker to get right in my opinion and experience (it does allow greater flexibility however). Also, the application has the need for atomic operations and transactions, which MongoDB does not provide.
MongoDB on the other hand, is used where it delivers better performance than PostgreSQL. For example all our logging is sent there, giving near-disk performance while allowing quick and easy searching and archival. Our session is also handled by MongoDB. Finally we make great use of gridFS for all our uploaded content and document storage. We're also looking into MongoDB for data analysis and reporting, fed data from SQL.
So there's no reason to pick one over the other, a mix and match approach will yield better results. Where tasks require greater speed and have loose integrity requirements, go for NoSQL. When the data absolutely needs to be coherent and is by its nature relational, go for SQL.
Also, PostgreSQL will soon support embedding JSON objects directly, so some sort of hybridization is foreseeable in the future. As of now we simply put the Mongo ID in SQL when we need to reference.
I think it has more to do with inexperience than culture. The inexperienced tend to gravitate to the shiny hammer and look at all the problems as nails. The experienced have the skill set to pick the right tool for the job.
New tools tend to attract more inexperienced developers since they haven't developed much of a preference and seem to be the most affected by hype. This ultimately leads to the new tool being used in all cases (regardless of suitability) instead of the correct or more appropriate tool.
Notice that I'm not saying that the new tool is inferior, just being misused.
These comments are my own and do not necessarily reflect the views or opinions of my employer or colleagues...
Speaking as a professional SQL Developer with OVER TWENTY YEARS of experience, an RDBMS is not the answer to every problem. Sometimes NoSQL makes sense.
For example, if I'm dumping some random user data that will never be formatted/standardized/normalized—that can be different domains for every user—NoSQL might be the right choice.
Maybe I want to store a user's favorite object (puppy, car, toy, steak 'doneness') and I don't want to have a child-table-from-hell lookup table. NoSQL is a great option.
On the other hand, if I want to do some sort of row lookback, then it is far easier in a relational DB. For example, if I want to find the salary average and of all of people in the same department as the most recent new hire or the average working 'lifespan' (how long before the person quits, gets fired or dies) of every department vs. their salary range*, then it is pretty easy.
Now get off my lawn.
* Yes, real-world examples.
Yeah, right.
I've been coding professionally for 11 years, have been hobby coding for about 20.
Recently, I've been exposed to Agile, Scrum, XP, TTD, User Stories, Sprints, Pair Programming, and now NoSQL. All these things, I have to say, are contributing massively to my strong considerations to hang up my mouse and keyboard.
My first experience of Agile was working for an investment bank where they decided that, no matter if the code was buggy or was only partly complete, we would push it out to the clients. No problem, our next sprint would fix the bugs. Another project I worked on saw me having to attend hours and hours of meetings, filling out small cards to stick to white boards, listening to people who have no relevance to my project talking about what they were doing, and constantly giving estimates to project managers so that they could make further adjustments to later sprints. When I finally sat down to code that day, it was about 3 lines. I wasn't allowed to work on anything else, because that hadn't been assigned to this sprint. Fun.
I recently had a telephone interview with a man who spoke to me for 40 minutes straight about agile and did not ask me one single technical question. Nothing on Java, Spring, Hibernate, XML, SQL, or anything else listed on my CV. He even wanted to know whether I used physical note paper or software for details the tasks and user stories. When he asked me two days later if I would like to come in for an interview, I declined. I want to code, not work for a bureaucracy.
I remember when coding was fun and we didn't have to adopt all these so-called methods. I have no idea what NoSQL is, all I remember is someone at my last contract deciding we were going to use it and then teaching everyone how to query using JSON or Javascript or whatever. It took us a few weeks to get our heads around the idea, and I have to wonder what the benefit of writing Java-JSON-Mongo DB interfaces were over SQL. He did not do this because the project needed it, but because he had heard about the system and wanted to shoe horn it into the project. Seriously, that was the only reason.
I'm not even sure what Agile is, to be honest. I think it's just some fancy term used by managers to make it sound as if they're being efficient and know what's happening.
One has to wonder exactly what was wrong with the previous approaches. We all still had working software 6 years ago, from what I remember ...
THE HONOUR OF THE KNIGHTS - CC Licensed Sci-Fi Novel
I disagree, especially for inserts and updates. There should be no raw SQL embedded in programs. Most programmers are most decidedly not competent in the relational model (the very existence of numerous NoSQL fanbois is a testament to this fact.) Indeed, even many DBA's aren't either as the job is rife with far too many sys admin types rather than people that are concerned that they do a properly normalized logical model as a mandatory first step before physical implementation.
That's kind of different than programmers not writing SQL. Since SQL is strings, and there shouldn't be hardcoded strings embedded in programs, it stands to reason that there shouldn't be raw SQL embedded in programs.
That doesn't mean that programmers shouldn't write SQL, or that non-programmers should be writing stored procedures so that programmers can avoid writing SQL.
While I agree that most programmers aren't competent in the relational model, the solution isn't to accept that and ban programmers from writing SQL, its to demand that programmers -- at least those working on the model side (less reason for concern from programmers doing, say, pure UI work) -- be competent with the relational model, which is fundamental to understanding data. Further, while I agree with your conclusion, it doesn't follow from the evidence you cite: proficiency with the relational model doesn't mean that you have an obsessive need to have either a RDBMS or a DBMS that speaks SQL (which aren't the same thing, as you can be either without the other) used for all data storage needs. Understanding the relational model is fundamental to effectively modelling data, whether its going to be stored in a black box where someone else has implemented the relational model for you or not (probably moreso when "not".)
That's why they are called data base administrators. Data modelling and database development are more related to programming than to database administration.
I was tweaking the AC's reliance on Argumentum ad Verecundiam, which is (often) a fallacy.
I looked up alternatives to Alzheimer's, and Pick's Disease looks like an introvert-troll/basement-dweller's dream. Check out the list of behavioural and emotional changes.
Yeah, right.