Why Some Devs Can't Wait For NoSQL To Die
theodp writes "Ted Dziuba can't wait for NoSQL to die. Developing your app for Google-sized scale, says Dziuba, is a waste of your time. Not to mention there is no way you will get it right. The sooner your company admits this, the sooner you can get down to some real work. If real businesses like Walmart can track all of their data in SQL databases that scale just fine, Dziuba argues, surely your company can, too."
People who don't like SQL should get their heads out of their asses and use MySQL, a robust and enterprise-ready database.
Interesting thesis...
This is like saying "I can't wait for memcached to die" just because your site doesn't need it. Fact is, some do. It's your own fault if you choose to apply unnecessary techniques.
Don't change to newer fancy techniques if you don't understand what they are for and why you would need them.
Hierarchical DBMSes have been around longer than SQL-style RDBMSes. OODBMSes have existed for a long time as well. Many of these "NoSQL" DBs don't provide the same restrictions or assurances that an RDBMS provides, but they often have other features.
BDB isn't going away and neither is SQL. Get over it.
"MySQL or PostgreSQL," for what it's worth. PostgreSQL is a pretty powerful database, and you should have to make a pretty good argument for why a well-understood technology that powers a lot (and some of the largest parts) of the Web needs to be trashed for something newer and less tested.
XML text files all the way! /duck
Why should anything "die"? People choose solutions based on their individual merits. If something doesn't work, exchange it for something that does. I'm sure certain people find NoSQL-type databases perfect for their needs.
In short, people should just shut up about other people's choices and get on with their own.
I liked the pictures. Is there a name for the muppet guy in the first one?
There's a place for SQL, but there are some cases where a BigTable-like store (e.g. HyperTable) works better. Our company manages data using SQL, but when we present data to the users it's through a HyperTable implementation. SQL makes data management easier, but HyperTable uses our server resources better.
It's really that simple. A standard dual-socket server with the latest CPUs from Intel or AMD can handle hundreds of requests per second. If one isn't enough, just add more hardware: one month of salary can buy you another node, and a year can buy you a whole cluster of rackable systems or a chassis full of blades. If it takes a team a few extra months to solve the problem the NoSQL way, that's a few months of extra salary costs and missed sales.
Slashdot runs on SQL. I run a site with 1M page views daily (a third of Slashdot, according to Alexa) on a single system with 2x Xeon E5420, running Django/PostgreSQL at 10% load. Unless you attract enough attention to require scaling past 10M pages a day, you're wasting your time reinventing the wheel with NoSQL. Just stick with a standard ORM, launch your site, and start convincing customers and generating sales. You can survive a slashdotting just fine without spending so much time on those exotic tools.
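To put those figures in perspective, here's a back-of-the-envelope sketch in Python. It assumes page views are spread evenly over the day, and the 3x peak-to-average factor is a made-up assumption; real traffic (a slashdotting especially) is much burstier.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_rps(pages_per_day: int) -> float:
    """Average requests per second, assuming load is spread evenly over the day."""
    return pages_per_day / SECONDS_PER_DAY


def peak_rps(pages_per_day: int, peak_factor: float = 3.0) -> float:
    """Rough peak estimate: the average scaled by an ASSUMED peak-to-average ratio."""
    return average_rps(pages_per_day) * peak_factor


for pages in (1_000_000, 10_000_000):
    print(f"{pages:>10,} pages/day: ~{average_rps(pages):.1f} req/s average, "
          f"~{peak_rps(pages):.0f} req/s at an assumed 3x peak")
```

Even 10M pages a day averages out to roughly 116 requests per second, within the "hundreds of requests per second" a single server is said to handle in the comment before.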
... that I can't tell others what to do!
So you're in surgery for 3 hours doing a kidney transplant, using your trusty medium vascular clamp that has served you for the past 20 years. You're finally done and the patient is in recovery, so you sit down to relax with the latest copy of JAMA. They've got a great article about the latest development in cardiac clamps, and you think to yourself, "Why not use a heart clamp for kidney transplants?" Brilliant. So you order some new clamps from MedicalClamps.com and use them on your next patient. The surgery goes fine, but 3 months later the patient is back in your office with a failed kidney. You open 'em up, and it's obvious the clamp exerted too much pressure on the artery, damaging it in the process. Stupid cardiac clamps! You're not a heart surgeon!
I think this fellow's blog entry sums this up pretty nicely - especially the last paragraph: http://blog.cleverelephant.ca/2010/03/nonosql.html
Shut your whining.
Say what you want about Ted but you must respect his technical opinions. After all, the man is well known for having a job at Google at some point.
FTA:
"In the meantime, DBAs should not be worried, because any company that has the resources to hire a DBA is likely has decision makers who understand business reality."
Bad English aside, I just don't agree. Money != Reality. I have worked both sides of this coin: startups with plenty of money that don't see the value in proper maintenance of the data store (one was almost put out of business by a disk failure), and very smart startups that are running lean but do understand the risks.
That said, on a deeper level, why does business reality == SQL? Sure, I can scale Oracle to support massive DBs (and have), but I could probably get more value from using Amazon's SimpleDB for things that don't require massive scaling. Use the right tool for the job: hammers are for nails, etc. Do the design work up front, decide how it's gonna work, and the right tool should present itself.
Strangely enough, Walmart does use more than just regular SQL to manage the data systems for all those items. OLAP systems provide a much better summarized view of the data when SQL starts to bog down. I know this from doing tech support for them in the past.
I think the author of TFA is missing something: not all databases/datastores are developed for businesses to keep track of their inventories. These days, many scientific disciplines, such as bioinformatics, rely heavily on databases as well.
The latest experimental techniques produce so much data that "old-fashioned" RDBMSs just don't cut it anymore. So, for certain application domains, NoSQL seems to be the way forward at the moment. I'm afraid the author can wish all he wants, but NoSQL is gonna be around for a while. Until something better comes up, that is.
The article was stripped of all nuance and then injected with confusing bits. e.g.
>NoSQL will never die, but it will eventually get marginalized, like how Rails was marginalized by NoSQL
What? How was Rails marginalized by NoSQL?
Also, it's nice to see the whole BerkeleyDB-ish/key-value sector of the data storage world suddenly exploding with innovation. There's a lot of dogma on both sides of the NoSQL argument (and the name "NoSQL" doesn't help), but some of the many NoSQL tools look as though they'll be pretty useful. Cassandra and MongoDB especially. And big companies getting behind the growth of new tools is never a bad thing.
Everyone's needs are different, and there are going to be different solutions for those needs. If NoSQL isn't for you, then just don't use it (don't spend any time learning it, trying it out, running a site with it, etc.). I don't have a need for it yet, but we do all sorts of sites and programming, so who knows if it will be the right solution for one of our future projects? I won't know unless I learn about it, test it and get my hands dirty with it.
And as far as it being 'a product of the braindead and buzzword-infested effluents of the American "education" system, where nobody understands math or logic', I don't care if it came from the bottom of a well in the middle of a jungle where they are masters of logic and math, if it could possibly meet my client's needs then I'm going to give it the time and attention it takes to make the decision for myself.
Look at his other blog posts and you'll realize he's just another all-knowing all-trolling developer. Screw him.
Real businesses track their data with SQL databases, true. However, real businesses have small numbers of transactions relative to their value. If Walmart had the same revenue but the average sale was a tenth of a cent, their fancy SQL database would be smouldering rubble.
That's what Facebook and Twitter and other large social media sites are facing. Just try running Twitter's volume and Twitter's page hits and API hits off MySQL. It doesn't matter how many replicas you run, it's not going to work. Maybe you could run it on a cluster of IBM Z-series mainframes running DB2 - but where is the money going to come from?
Cassandra and HBase and the other distributed NoSQL databases solve specific problems in specific ways. They won't work for Walmart, but they'll do the job just fine for Facebook and Twitter. If you have those specific scaling problems and can live with the restrictions (you lose ACID, indexes, and joins to varying degrees), then they'll work for you.
If all you know is that your site is running slow, then implementing NoSQL is unlikely to improve things.
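One common technique behind "just add nodes" scaling in Dynamo-style stores such as Cassandra is consistent hashing: keys and nodes are hashed onto the same ring, each key belongs to the next node point clockwise, and adding a node moves only the keys that now land on the new node's points. A simplified Python sketch; the class name, virtual-node count and hash function are arbitrary choices for illustration, not Cassandra's actual partitioner.

```python
import bisect
import hashlib


class HashRing:
    """Simplified consistent-hash ring (illustrative, not any real store's partitioner)."""

    def __init__(self, nodes, vnodes=16):
        # Each physical node gets several points ("virtual nodes") on the ring
        # so that keys spread more evenly across nodes.
        points = sorted(
            (self._hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._hashes = [h for h, _ in points]
        self._owners = [node for _, node in points]

    @staticmethod
    def _hash(value: str) -> int:
        # First 8 bytes of SHA-256, read as an unsigned position on the ring.
        return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")

    def node_for(self, key: str) -> str:
        # Walk clockwise to the first point after the key's hash, wrapping at the end.
        idx = bisect.bisect(self._hashes, self._hash(key)) % len(self._hashes)
        return self._owners[idx]


ring = HashRing(["a", "b", "c"])
bigger = HashRing(["a", "b", "c", "d"])
keys = [f"user:{i}" for i in range(1000)]
moved = [k for k in keys if ring.node_for(k) != bigger.node_for(k)]
print(len(moved), "of", len(keys), "keys changed owner")
print(all(bigger.node_for(k) == "d" for k in moved))
```

Every key that changes owner moves to the new node `d`; with plain `hash(key) % number_of_nodes`, most keys would move instead.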
I'm a blogger who doesn't really have anything worthwhile to blog about, so I make up bullshit things to post on my blog. Now, if you consider the millions and millions of blogs out there, it's impossible to get noticed if you have a thought-provoking, insightful, honest opinion. Therefore, I have chosen to write "Trollish" and "Flamebait" "articles". So far, as you can see, it has worked quite well.
Granted, I offer nothing to knowledge and ongoing debates and conversations, but I have this fantasy that I can make money from my blog and leave my boring and monotonous IT job.
Yours truly,
Some dipshit blogger.
Adding more garbage to the stinking pile of the internet.
Why should I give two shits about what database system someone else uses?
I think some developers keep looking for the holy grail. Some magical solution that will turn development from punching in code, to Star Trek: "Computer do my job for me please".
Template languages, 4GL, NoSQL, Ruby on Rails... it is all part of an attempt to take the nasty out of development and they all... well... they all just don't really happen.
Because deep down, with all the frameworks and generators, if you want your code to do what you want it to do, you are still writing out if statements a lot.
And yes, OO and such also belong here. Not the concepts themselves, but the way most people talk about them. OO means code re-use, right?
If you said yes, then you are a manager; go put on your tie, you will never be any good at coding.
You can re-use all code. And it has been done for a long time.
What, did you think that people who wrote BASIC for the C64 went "Oh, I wrote this bit of code for printing, now I need the same functionality, I am going to write it all over again!"?
OO does make code re-use a bit easier BUT that is NOT the claim that people often make. Trust me, I ask this in interviews and it is always the same answer. Apparently you can't re-use functions. No way, no how. NEXT!
I see two kinds of developers: those who hate their job and those who don't. The former want to become managers and get away from writing code as fast as possible, and they will leap on anything that seems to make their jobs easier. Meanwhile, the rest of us go on actually producing stuff.
Just check how many times one of those manager wannabes introduces something they read in a magazine because it promises that you'll never need to write another line of code!
Walmart's primary business isn't online. No comparison.
I think the frustration is actually about some people not using the right tools for the job. I like NoSQL databases (specifically MongoDB), but I haven't used them in anything I've written. Why? Because they weren't the right tool for the job. I tend to use MySQL, Postgres or SQLite because they're so widely available and so well understood to administer. There are times when NoSQL will make sense; it's just not in the area I work in.
I do think we are going to continue seeing an uptick in NoSQL-related things, since many companies are fixated on "the cloud" while not really knowing what "the cloud" is (heck, still no one really, truly has a common definition of what it means...). Since NoSQL seems to be a popular tool, and "the cloud" is a popular buzz phrase, CIOs/CTOs will likely be pushing their shops to use "NoSQL in the cloud". While large-scale applications that don't require relational information and need fast syncing across many servers are good grounds for NoSQL, these "NoSQL in the cloud" instances will probably not actually fit that description.
I do agree that it will be a good thing when "NoSQL for everything" dies. Just like it was a good thing when "Perl for everything", "Java for everything" and "Ruby for everything" died, but let's not throw out the whole idea because a lot of people use it wrong.
The company I work at is currently having a huge headache moving from files into databases. We currently store everything in XML, which gives us a great amount of freedom and adaptability. However, most database solutions fix you to a single (or a handful of) data definitions. While you can kind of re-create XML by defining all kinds of crazy relationships, it gets hugely convoluted (to say the least).
I would LOVE to see a document/XML-like database. It just needs to do the things standard databases support (e.g. a security model, easy mirroring, search/queries) to make it worth our while to move at all. Last I checked, we're up to 260,000 XML files and approximately 40 distinct file "formats" (XML layouts).
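What's being described here is roughly a document store: records are kept whole, in whatever layout they arrived in, and queried without a fixed schema. A naive in-memory Python sketch using only the standard library's ElementTree, with two hypothetical documents in different XML layouts; the helper names are made up, and a real document database would add persistent indexes, security and replication on top.

```python
import xml.etree.ElementTree as ET

# Two hypothetical documents with different XML layouts, keyed by ID.
docs = {
    "order-1": "<order><customer>acme</customer><total>12.50</total></order>",
    "memo-7": "<memo><author>acme</author><body>Call back Monday</body></memo>",
}

# Parse once up front; a real document store would maintain indexes instead.
parsed = {doc_id: ET.fromstring(text) for doc_id, text in docs.items()}


def find_mentions(value: str) -> list[str]:
    """IDs of documents where any element's text equals `value`, whatever the layout."""
    return sorted(
        doc_id
        for doc_id, root in parsed.items()
        if any(el.text == value for el in root.iter())
    )


def field(doc_id: str, tag: str):
    """Text of the first element with this tag, or None if this layout lacks it."""
    el = parsed[doc_id].find(f".//{tag}")
    return None if el is None else el.text


print(find_mentions("acme"))
print(field("order-1", "total"), field("memo-7", "total"))
```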
So they should all use the same data management tools as Walmart? Is that the reasoning? Better to use the right tool for each job. Some things work better in a schemaless NoSQL store.
Agree with other posters. SQL is a tool. The point about NoSQL is that it is a different tool. ACID in a database is normally fine. However, if you can live without it, which many can, do so!
I don't like SQL, but I do like relational databases. If only someone would come up with a relational query language with nice, non-COBOL-esque syntax (maybe lispy...) that did much the same thing as SQL, and add it to a powerful RDBMS engine like PostgreSQL's or something. (Yes, I'm aware of the history, and that SQL was added to Postgres, which followed on from Ingres, which used a non-SQL relational language that was somewhat nicer than SQL... oh, the ironing...)
At first, I thought NoSQL like Cassandra should simply be used as a store for precomputed relationships. Then I thought NoSQL was just a structureless store that can scale in any given direction with no effort.
Both sound interesting, but then the debate against NoSQL is just "well, SQL can already do all that, but you get data integrity with it. If it doesn't scale, then just build a manly man's server and it will".
So, I dunno. The whole debate has gotten very religious very quickly, and as a result no one is really doing a proper comparison, because no one seems to take the approach of "right tool for the right job, so here are the jobs NoSQL is right for, and here are the jobs your RDBMS is good for".
> You can survive a slashdotting just fine without spending so much time on those exotic tools.
Care to provide a link to your site so we can test this?
Use the right tool for the job, except databases, eh?
The simple fact of the matter is that not every app is aiming for Google's scale. (Not every app is web-based or even going to be web-based, though people seem to forget that.) And even some large-scale apps don't fit the relational model very well, medical records being one of the more outstanding examples.
And yes, I have read Codd and Date and understand the relational model and its benefits very well, and it annoys me to no end when people break the relational model without realizing or understanding what it costs them. That said, sometimes those costs are acceptable, and sometimes an application requires features that the relational model does not (and in fact cannot) bring to the table.
It may be, as with every other silver bullet fad, that what's at work here is the basic human tendency to become familiar with something, begin to see everything in terms of it, and then try to persuade anyone who'll listen that they are in possession of the all-singing, all-dancing solution to all problems. Today, it's Ruby, multi-touch interfaces, and functional programming. But not very long ago it was COBOL and CICS. And while one must acknowledge that progress has been made, it is equally obvious that progress will continue to be made and that "one size fits all" is always BS, even in clothing.
The main issue is really that most *large* SQL servers are row-based rather than column-based. http://www.ingres.com/vectorwise/ - *open source*
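To illustrate the row-versus-column distinction, here's a toy Python sketch with a hypothetical sales table. Python lists only mimic the two layouts; the real gains claimed for column stores such as Vectorwise come from reading less data from disk and compressing each column, which this doesn't show.

```python
from typing import NamedTuple


class Sale(NamedTuple):
    store_id: int
    sku: str
    amount: float


# Row-oriented: each record's fields are kept together. In an on-disk row store,
# summing one field still means reading every whole record.
rows = [
    Sale(1, "A-100", 19.99),
    Sale(1, "B-200", 5.00),
    Sale(2, "A-100", 19.99),
]
row_total = sum(r.amount for r in rows)

# Column-oriented: each field lives in its own contiguous array, so an
# aggregate over one field only has to read (and decompress) that array.
columns = {
    "store_id": [r.store_id for r in rows],
    "sku": [r.sku for r in rows],
    "amount": [r.amount for r in rows],
}
column_total = sum(columns["amount"])

# Same answer either way; in a real engine the difference is how much data is read.
assert row_total == column_total
print(round(column_total, 2))
```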
The whole of geek debating is based on the Highlander principle.
Sure, I've messed around with some NoSQL databases; they just aren't my thing. Give me MySQL, your spec and a cup of tea, and I don't have to look around silly experiments to see the best way of doing things in new radical 'paradigms.' That being said, I am glad the experiments are being done by people who are in an environment where they can experiment. Like the article says, it's the social networks like Twitter and Facebook developing things like Cassandra, and it's good that someone is pushing the bar, but they are the only people who CAN do this; they aren't necessary, and nobody's gonna die from a 5-minute outage of poking each other (that sounds bad). I haven't really understood the whole NoSQL thing. I've never really had a problem with SQL-based databases; maybe that's just the nature of my work, but it all seems as though this has nothing to do with technology, just people who want to be heard...
Many of the NoSQL solutions scale better than a normal database and are available cheaply. Oracle costs a fortune, and if you want to run Oracle on a cluster, good luck. They also don't let you publish benchmarks without their permission. But most people I know who use Oracle claim it totally beats everything else (without further clarification). DB2 includes a cluster edition that is also quite good; it uses a shared-nothing architecture. But none of these solutions are free. Teradata is also cited as a good parallel database. If you are a start-up and your choice is between a NoSQL solution that is almost free and $100,000+ for some commercial parallel database, which do you go with?
But no matter what, with a relational database you will consume resources on ensuring consistency (which is often what you want, but not 100% of the time). Amazon's Dynamo works by not caring so much about consistency, trading consistency for availability of the overall service. For a shopping cart that is fine, but you wouldn't want to do your credit card processing with it. Google's GFS is optimized for the file operations Google does most. However, there was an article in the ACM not that long ago comparing MapReduce (Hadoop's implementation) against two parallel databases, and it lost. Of course, the parallel databases were not free... and Hadoop is...
So overall I'd say the decision comes down mostly to price (as it does with most startups). If you can make do with one server, then sure, use PostgreSQL (or MySQL... although they always tried to force licensing for commercial products even though it is GPL...). If you need a cluster, both have clustering solutions, but as far as I can tell they are not as good as the commercial parallel databases. If you have lots of money, then sure, go with Oracle; by word of mouth, Oracle seems to be the best in performance for both parallel and standalone use. DB2 was good enough for a former job: they had terabytes in the mid-1990s using about 20 servers. Now that the hardware is much better, I'm sure it scales even better... But if money is a consideration, go with an open source NoSQL solution. A lot of people now swear by Cassandra; I haven't had a chance to check it out yet.
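The Dynamo paper describes making this consistency trade-off tunable: with N replicas per key, a read waits for R of them and a write for W of them, and choosing R + W > N guarantees that every read set overlaps every write set (Dynamo's "sloppy" quorums relax even that under failures). A small Python sketch that checks the overlap rule by brute force; the function names are made up for illustration.

```python
from itertools import combinations


def quorums_overlap(n: int, r: int, w: int) -> bool:
    """Rule of thumb: every read quorum meets every write quorum iff R + W > N."""
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("R and W must be between 1 and N")
    return r + w > n


def every_read_overlaps_write(n: int, r: int, w: int) -> bool:
    """Brute force: does every R-replica set share a replica with every W-replica set?"""
    replicas = range(n)
    return all(
        set(read_set) & set(write_set)
        for read_set in combinations(replicas, r)
        for write_set in combinations(replicas, w)
    )


# (3, 2, 2): overlapping quorums; (3, 1, 1): fast but may read stale data;
# (3, 1, 3): cheap reads paid for by slow, all-replica writes.
for n, r, w in [(3, 2, 2), (3, 1, 1), (3, 1, 3)]:
    print(n, r, w, quorums_overlap(n, r, w), every_read_overlaps_write(n, r, w))
```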
If you get to the size of Walmart doing anything, you have access to the capital to get a system from IBM or Oracle for OLTP and Teradata for data warehousing.
For me, it's not about scalability at all; I simply don't see a relational database as very good at reflecting the organisation of the data I want to store. For some data sets, it might fit perfectly, but it is usually a far-fetched way to represent the data, in my opinion.
Yep. I used to work at a place that did network analysis. There were huge volumes of data (think: something about almost every packet). We tried SQL-based solutions. There was no way we could shoehorn it onto the hardware. The solution was an in-house DB with only the thinnest veneer of SQL for the web front end. Almost certainly, the in-house DB wasn't proper SQL, but because it was custom, the non-SQL aspects were known. It fixed the problem.
I'm still fuzzy on what NoSQL is supposed to be and what it is supposed to bring to the table.
From what I've understood, it's basically a common banner for various different databases that all share the common property of not being relational databases and not providing ACID guarantees.
If so, it seems to me that the whole NoSQL vs. RDBMS debate is about a false dichotomy. There are some applications where a relational database is the right tool for the job, and there are some where a relational database is not the right tool for the job. In some of those latter cases, one of the NoSQL databases may be the right thing.
This is nothing new. Non-relational databases have been used on Unix for a long time, and are even a standard part of POSIX (see for example the manpage for dbm_open). It's also long been known that, for example, Berkeley DB can be a lot faster than an RDBMS - as long as your application doesn't make use of all the features an RDBMS provides. Lots of programs even don't use one of these database systems, but invent their own, custom format. Git is a very successful example of this.
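For a feel of how bare the dbm-style key-value model is, here's a sketch using Python's standard `dbm` module, which picks whichever dbm-style backend is available (for example GNU dbm or ndbm, with a pure-Python fallback). The file name, key and value are hypothetical. There are no tables, types, joins or queries: just bytes stored under keys.

```python
import dbm
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sessions")  # hypothetical file name
    # "c" opens the database for reading and writing, creating it if needed.
    with dbm.open(path, "c") as db:
        # Keys and values are stored as bytes; str is encoded automatically.
        db["user:42"] = "last_seen=2010-03-01"
        value = db["user:42"]  # comes back as bytes

print(value.decode())
```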
To me, it seems that what we are seeing here is loads of people who had learned to use relational databases for all their storage needs discovering that there are other ways to store data, and that one of those methods may work better than an RDBMS for a particular application. Well, yes. Does that surprise anyone? It sure doesn't surprise me. Does it mean that RDBMSes are now useless? Not at all. Does it mean you should use a non-relational storage system where this makes more sense? Of course! Now, can we please get back to work? I don't see the point of having a holy war over whether RDBMS or NoSQL is better, when common sense says that they both have their uses.
Please correct me if I got my facts wrong.
I call: bullshit
We have been using object-oriented databases like the ZODB for ten years when the data model is not relational.
We use relational databases when our data is relational.
We use relational databases and object-oriented databases together in the same app when we need both.
We do not use MySQL when we need a *real* database.
Use the right tool for each problem: only idiots use an RDBMS for anything and everything.
Really? My paystub says otherwise, you ignorant clod!
People complaining about SQL performance are most likely either using incorrectly scaled machines for the job, or believe they can throw a four-line SQL statement at the database and expect it to work out the optimization on its own... Query optimizers may do a decent job on average, but once you get to large databases (tables with millions of rows), planning the query structure will go a long way toward preserving performance...
Yes, one can write complicated queries that return exactly what you want in one query, but in many cases some logic around it and smart grouping/loops will outperform the complex query.
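A minimal sketch of the two approaches, using SQLite through Python's `sqlite3` module and a hypothetical `orders` table. Both produce the same totals; which one is faster depends on data size, indexes and the engine, so it's worth checking the query plan (`EXPLAIN QUERY PLAN` in SQLite) and measuring rather than assuming.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("alice", 10.0), ("bob", 5.0), ("alice", 2.5)],  # hypothetical rows
)

query = "SELECT customer, SUM(amount) FROM orders GROUP BY customer"

# 1) Let the database do the grouping in a single query.
in_db = dict(conn.execute(query))

# 2) Fetch plain rows and group them in application code.
in_app = defaultdict(float)
for customer, amount in conn.execute("SELECT customer, amount FROM orders"):
    in_app[customer] += amount

assert in_db == dict(in_app)

# Inspect how SQLite plans to run the grouped query.
for step in conn.execute("EXPLAIN QUERY PLAN " + query):
    print(step)
```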
I've got news for you ... all the major stock exchanges, banks, and telecoms in the world use SQL RDBMSs to track transactions that match or exceed anything Facebook and Twitter are doing. I guarantee you, without a single doubt in my mind, that Facebook and Twitter could be run on a SQL RDBMS ... by that I mean Oracle, not MySQL.
Our development organization is heavily invested in PostgreSQL, finding it to be perfectly matched to almost all of our needs. It is exceptionally reliable, and is very (but not perfectly) manageable. (We've had issues in the past with mis-timed auto-VACUUM, for instance, which are now resolved.) We even found a small but significant corner-case bug which, upon being reported, received immediate attention from the developers, resulting in a resolution in under 72 hours. I believe our use of this particular tool has saved us significant resources (dollars, developer time), which has allowed the development organization to direct our time and money to our own application development.
But we're finding that even PostgreSQL has limits, mostly with respect to the large and growing datasets our application uses for large scale real time control. We could transition to a really expensive SQL solution, but we are at least considering the choices that may be a better fit for these particular subsystems than PostgreSQL or any other SQL solution. Just a few weeks ago, we started seeing a good comment in teh interWebs... "NoSQL" should mean "not only SQL".
Not a rejection of a powerful toolkit that holds a central role in our organization, but rather a recognition that we would be remiss in our responsibilities if we didn't pay attention to the choices that could simplify our lives as developers.
In Soviet Russia, NoSQL kills off Devs!
Every damn time there's an article like this, same BS.
There are many reasons not to use an RDBMS. It doesn't mean an RDBMS is a failure, just that it might not be the right tool for the job. I don't necessarily see it as a 100% replacement, but rather a complement. That said, I'd rather not ever use an RDBMS, for various reasons, possibly alluded to here:
-Lack of desire to deal with Object Relational Mapping. If your application is highly object oriented and a lot goes on at the application layer, an RDBMS is often a bad fit.
-You need queries that can't easily be done in an RDBMS. Hierarchies (yes, there are clauses in some newer systems, but perform like piss for large datasets) and graph theory (centrality measures, shortest path problems, etc) come to mind.
-Scaling. Most databases only scale well vertically. You can try replication, but it often isn't realistic for some types of applications. Data warehousing, denormalization, read-only DBs, etc. are hacks for dealing with this problem.
-Your domain is objects, why introduce a second domain? Similar to the first, but more of a case where you would definitely use an object db instead.
-You want object level transactions that do not require additional layers to work at the data level.
-You judge things based on merit, not marketing, hype, and your own ignorance, hence you've concluded that for your situation, a graph/object database or non-relational store is a better fit.
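For the hierarchy case, the "clauses in some newer systems" include recursive common table expressions (SQL's `WITH RECURSIVE`). A minimal sketch using SQLite through Python's `sqlite3` module with a hypothetical org chart stored as an adjacency list; it only shows the query shape and says nothing about how such queries perform on large datasets, which is the actual complaint above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical org chart: each row points at its manager (adjacency list).
conn.executescript("""
    CREATE TABLE staff (id INTEGER PRIMARY KEY, name TEXT, manager_id INTEGER);
    INSERT INTO staff VALUES
        (1, 'ceo', NULL),
        (2, 'cto', 1),
        (3, 'dba', 2),
        (4, 'dev', 2),
        (5, 'cfo', 1);
""")

# Start at one node, then repeatedly join in that node's direct reports.
subtree = conn.execute("""
    WITH RECURSIVE reports(id, name, depth) AS (
        SELECT id, name, 0 FROM staff WHERE name = ?
        UNION ALL
        SELECT s.id, s.name, r.depth + 1
        FROM staff AS s JOIN reports AS r ON s.manager_id = r.id
    )
    SELECT name, depth FROM reports ORDER BY depth, name
""", ("cto",)).fetchall()
print(subtree)
```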
There are plenty of other good reasons. I'm tired of hearing arguments both ways that miss the point. Don't use a non-relational database because it is cool or different and you suck at writing SQL. Don't use SQL just because you know it and you're too ignorant to understand the benefits of non-relational stores. As it has been said, use the right tool for the right job.
I find that it is normally not a relational database that I want. When it is, I'll use Lisp + a relational DB, because they deal with things in a compatible manner. PHP + MySQL, on the other hand, makes me want to puke. Otherwise, I'm using something like Smalltalk or C++ + an object database vendor's product. Cassandra, NoSQL, and the rest of this stuff is just ignorant crap created by hype machines, and I am amazed that every time this discussion comes up, the 200 other, better technologies out there are ignored. It takes a pretty big asshole to create something new when there are already great solutions out there, not realizing you're going to create the same crap poorly. Of course, what can be expected of people like Twitter (Erlang message queues would like a word) and Facebook.
Bullshit.
ActiveRecord? Definitely. Rails as a whole? You might consider replacing it with another Ruby framework, but the same ideas are going to apply. Remember how Rails and Merb are merging? Merb tends to be ORM-agnostic, but the recommended Merb stack suggested DataMapper, which does support a few NoSQL databases.
Even if you needed a different ORM per NoSQL database, it wouldn't marginalize Rails as a whole, but that simply isn't the case. Just use DataMapper, then plug in the flavor of the day.
As an example, Rails (and DataMapper) run on Google App Engine.
It should probably be called NoMysql instead of NoSQL...
Here are some good posts. Seems NoSQL is just the new XML. Sure, great for some things, but not really worth the hype...
http://www.yafla.com/dforbes/Responding_to_Joe_Stump_on_the_NoSQL_Debate/
http://www.yafla.com/dforbes/Getting_Real_about_NoSQL_and_the_SQL_Isnt_Scalable_Lie/
http://www.yafla.com/dforbes/Getting_Real_about_NoSQL_and_the_SQL_Performance_Lie/
I thought it was reasonably well understood that one of Walmart's primary characteristics is their *amazing* control over logistics. In fact, I thought one of their big process inventions was to bring logistical activity online.
I welcome clarification, since I haven't worked for Walmart.
> Cassandra and HBase and the other distributed NoSQL databases solve specific problems in specific ways.
And other NoSQL databases solve entirely unrelated specific problems. Yet people talk about NoSQL as if it were all Cassandra. Your example is certainly a very good reason to use Cassandra, but it is no reason to use db4o or Berkeley DB or memcached.
NoSQL databases don't have better write performance than SQL databases. Cassandra has better write performance than strictly consistent databases. There is no reason for NoSQL databases to have eventual consistency. There is no reason for SQL databases not to have eventual consistency. That is what's driving me mad about the NoSQL debate.
Perhaps some are afraid that the NoSQL movement will leak into other niches out of hype. After all, OOP leaked out of physical modeling and into other niches without being fully tested for those niches, and people started clamoring for OODBMSes. (I'm of the opinion that "everything OOP" is a no-no. Use it where it helps, but not where it doesn't.)
The article focuses on NoSQL's claim to scalability, but isn't that just one of the features of (some of the) NoSQL options?
Google, Amazon, and Microsoft all provide NoSQL storage as a service that is easy to use and cheap, particularly for getting started. Those are two pretty important features, and I would imagine that it is those features, rather than dreams of needing vast scalability, that attract the many web startups.
Relational databases are great if you have a relational problem. For everything else, there is NoSQL. It is surprising how much of the world's data looks like "a stack of documents" rather than "a collection of mathematically related sets of data". Lotus Notes was the only NoSQL player for 20 years; now there are lots. Notes sucked because it had no competitors; the concept was and is sound. Now there is competition, with lots of NoSQL database systems and application environments on top that suck less and less by the day.
For about 30 seconds, until the VC money dries up....
The point isn't (generally; there might be some pathological corner case) that the various web2.0 kiddies couldn't implement their stuff in SQL, but that they couldn't afford to do so. If you want to be able to serve large numbers of users in order to generate enough AdSense pennies to keep the lights on until somebody buys you, your options are pretty much A) software with a more or less zero per-node cost, running on commodity x86s with no exotic interconnect, or B) bankruptcy.
WalMart has one of the largest Teradata installs, it doesn't use SQL.
...and those OLAP systems are most likely ROLAP or perhaps ROLAP with MOLAP support also. In any case, the underlying non-aggregated data is probably in an SQL database that supports materialized views and auto-aggregate tables. The OLAP is simply a multi-dimensional aggregate cache that sits in front of it.
Over-the-top Response Guy! Giving "Over-the-Top Responses" since 1970.
So you're saying that an RDBMS is the right tool for the job if your transactions have enough value, and, if the value per transaction is too low, you won't be able to afford an RDBMS, but you can still go with a NoSQL database? That's an interesting point of view.
So how do you make your system work with NoSQL? As you say in your post, "you lose ACID, indexes, and joins to varying degrees". To me, with my relational view of the world, it seems that you would want to use an RDBMS exactly because of these things. Specifically, the fact that your RDBMS does the hard work of keeping your data consistent for you. Wouldn't you have to implement that all by yourself if you went with a NoSQL system? If so, what realistic expectation can you have to come up with something that is both correct and as performant as an RDBMS which lots of smart people have worked on over the years?
Or is it just that people are throwing consistency out of the window and saying "We can afford to lose a couple of records or have a couple of dangling references here and there, as long as it SCALES". Because I can build something that scales if it doesn't have to maintain ACID, too. The difficulty is in having _both_ ACID and scalability.
Please correct me if I got my facts wrong.
For well over 30 years, airline reservation, hotel reservation, and other high-volume transaction processing (HVTP) systems that are mainframe-based have not used SQL in the core transaction processing system. They use either the built-in key/value subsystem of TPF/z/TPF, or a slightly more sophisticated subsystem known as TPFDF. Using facilities similar to z/OS, failover and recovery happen in record time should it be necessary. This successful real-world system and approach deserves the attention of those who would like to learn how this stuff really works.
Cassandra has eventual consistency. That means you have a distributed database and if you change something it may take a while until every node gives the same result. But nothing is lost and you also have a deadline of when it must have happened.
So when you write a Twitter message it may take some time until everyone sees it. But you can guarantee that everyone will be able to see it within a given time frame.
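To make "eventual" concrete, here is a toy model of asynchronous replication. It is a sketch under simplifying assumptions, not how Cassandra actually replicates, and `ToyCluster`, `write`, `read` and `propagate` are hypothetical names: a write is applied on one node immediately and queued for the others, so a read from another replica is stale until propagation runs.

```python
from collections import deque


class ToyCluster:
    """Hypothetical eventually consistent store: a write lands on one
    replica immediately and reaches the others only when propagated."""

    def __init__(self, n_replicas):
        self.replicas = [dict() for _ in range(n_replicas)]
        self.pending = deque()  # (replica_index, key, value) not yet applied

    def write(self, key, value, coordinator=0):
        # Apply locally right away; queue copies for every other replica.
        self.replicas[coordinator][key] = value
        for i in range(len(self.replicas)):
            if i != coordinator:
                self.pending.append((i, key, value))

    def read(self, key, replica):
        return self.replicas[replica].get(key)

    def propagate(self):
        # Stands in for background replication that runs "eventually".
        while self.pending:
            i, key, value = self.pending.popleft()
            self.replicas[i][key] = value


cluster = ToyCluster(3)
cluster.write("tweet:1", "hello")
print(cluster.read("tweet:1", replica=0))  # hello
print(cluster.read("tweet:1", replica=2))  # None: not replicated yet
cluster.propagate()
print(cluster.read("tweet:1", replica=2))  # hello
```

The toy has no clock, so it says nothing about how quickly a real cluster converges; it only shows the window in which different readers can see different answers.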
This is what Cassandra does. Other NoSQL databases don't do that. So arguing about consistency of NoSQL is moot, they can have all kinds of properties. Actually I'd say most of them have strong consistency.
Speaking about properties of NoSQL databases is like speaking about vehicles with an odd number of wheels. You don't really know if you are talking about a tricycle or a jet.
> Teradata for data wearhousing. Sure you didn't mean whorehousing?
I didn't (want to) say that all RDBMS won't cut it. The only point I wanted to make was that while I can see the point of the author that solutions like Cassandra are a bit overrated for most business applications, for other applications domains they are becoming a viable solution.
Each of those 200,000,000 transactions for WalMart is a fairly significant source of revenue, averaging on the order of $50.00 to $100.00 per transaction. The same 200,000,000 transactions for a web application would average something like $0.03 (yes, 3 cents). Now, if the cost per transaction using a traditional RDBMS is something like $0.25 (25 cents), how is that going to work for the web case? What if the cost is $0.01? Still an epic fail for the web case.
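The argument is plain arithmetic, so it can be checked in a few lines. The revenue and cost figures below are the poster's rough estimates, not real data, and `cost_share` is a made-up helper.

```python
def cost_share(revenue_per_txn, cost_per_txn):
    """Fraction of each transaction's revenue consumed by database cost."""
    return cost_per_txn / revenue_per_txn


# Rough per-transaction revenue figures from the comment above.
scenarios = {"retail ($50.00)": 50.00, "web ($0.03)": 0.03}

for label, revenue in scenarios.items():
    for cost in (0.25, 0.01):
        share = cost_share(revenue, cost)
        print(f"{label}, DB cost ${cost:.2f}: {share:.2%} of revenue")
```

With those inputs the retail transaction spends 0.50% of its revenue on the database at $0.25, while the web transaction spends over 800% at $0.25 and still a third of its revenue at $0.01.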
I'm fairly certain that at least as of 8.4 PostgreSQL supports XML fairly robustly.
...in what way do they not map to the relational model? If you say "unstructured data", that is not an answer.
If real businesses like Walmart can track all of their data in SQL databases that scale just fine, Dziuba argues, surely your company can, too.
Oddly enough I'm trying to get to walmart.ca right now, and it's down....
Oh shoei. Must have been a slow day. Yes everybody that uses an SQL database for no good reason is insane. Yes everybody that uses a NoSQL database because it is the latest-and-greatest has the same affliction. Use what fits your purpose. SQL or NoSQL? Does not matter.
ruurd
They have no clue how to scale their systems[1]. Therefore they pass the problem on to the underlying layer and say you do it for me.
[1] They don't understand the mathematics of what they are doing.
It's a poor artist / programmer / cook / et. al. that blames his tools. If you know the problem, you use the best tool to solve it. SQL or Document-DBs or Graph-DBs whatever is the best fit to solve the problem is what you use. You don't go around saying something is crap because you have no need for it.
My problem here is that I'm not sure that RDBMSs and Cassandra are using the term "consistency" in the same sense. RDBMSs offer transactional consistency: if you run a transaction that comprises updates A and B, other clients will either see the state before A+B or the state after A+B; they will never see a state with only A or only B. I've seen something like "eventual consistency" happen in Oracle RAC clusters, where after an A+B transaction on node 1, sessions in that node see the state after A+B, but read-only sessions in node 2 still see the previous state for a short while.
However, with the lack of transaction support in Cassandra, "eventual consistency" seems to mean that if you perform A and then B, then at some point in time thereafter, four different clients could see {}, A, B and A+B respectively. You may argue that your application doesn't need that, but I would argue then that that's just a matter of current requirements, and that at some point you're going to find yourself with new requirements that do. At that point, you're gonna be in some degree of trouble.
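The transactional guarantee described above can be observed directly. This sketch uses SQLite only because it ships with Python (the table and file names are made up): while a transaction containing updates A and B is open, a second connection sees neither; after the commit it sees both, never just one.

```python
import os
import sqlite3
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.db")

    # isolation_level=None lets us issue BEGIN/COMMIT ourselves.
    writer = sqlite3.connect(path, isolation_level=None)
    writer.execute("PRAGMA journal_mode=WAL")  # readers don't block the writer
    writer.execute("CREATE TABLE updates (name TEXT)")

    reader = sqlite3.connect(path, isolation_level=None)

    def visible():
        """What the second client currently sees."""
        return sorted(r[0] for r in reader.execute("SELECT name FROM updates"))

    writer.execute("BEGIN")
    writer.execute("INSERT INTO updates VALUES ('A')")
    before = visible()  # A is written but not yet committed
    writer.execute("INSERT INTO updates VALUES ('B')")
    writer.execute("COMMIT")
    after = visible()

    print(before, after)  # [] ['A', 'B']
    reader.close()
    writer.close()
```

A store without multi-key transactions offers no equivalent of the `BEGIN`/`COMMIT` pair, which is why the intermediate states {A} or {B} become observable there.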
Today many people store data on their private machines using classic file systems, and they use databases to store files and to tag them. In the future, tags or other kinds of attributes will become more important in information storage and retrieval. Therefore we need databases capable of managing such information. RDBMSs are very good at storing such information and at working with sets and subsets. And tags and attributes of objects/files/entities are nothing more than markers that show which sets objects belong to. So I doubt that SQL databases will go away.
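The "tags are set membership" point maps directly onto SQL. A minimal sketch, assuming a made-up `file_tags` table in SQLite: finding files that carry every requested tag is an intersection of the per-tag sets.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE file_tags (file TEXT, tag TEXT, PRIMARY KEY (file, tag))")
conn.executemany(
    "INSERT INTO file_tags VALUES (?, ?)",
    [
        ("beach.jpg", "photo"), ("beach.jpg", "2009"),
        ("taxes.pdf", "finance"), ("taxes.pdf", "2009"),
        ("party.jpg", "photo"), ("party.jpg", "2010"),
    ],
)


def files_with_all(conn, *tags):
    """Return files carrying every given tag (intersection of tag sets)."""
    if not tags:
        raise ValueError("need at least one tag")
    query = " INTERSECT ".join(["SELECT file FROM file_tags WHERE tag = ?"] * len(tags))
    return sorted(row[0] for row in conn.execute(query, tags))


print(files_with_all(conn, "photo", "2009"))  # ['beach.jpg']
```

Union (`UNION`) and difference (`EXCEPT`) queries follow the same pattern for "any of these tags" and "this tag but not that one".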
Furthermore, objects in OOP languages are very restrictive. If you look, for example, at objects (called individuals) in OWL, you can see that data objects can have properties and relationships to other objects that cannot be expressed that easily in the style of OOP languages. Therefore, using DBs that are limited by the object model of OOP languages will not suffice.
Because they're gonna tell your non-technical boss to make you use it, and he's gonna listen when they start telling him that Google, Twitter and Facebook do.
Are you adequate?
Comparing Walmart to Facebook is silly too. The basic requirements are different. At Walmart EVERY transaction must be perfect, but it doesn't need to be distributed to 800 people who are looking at different views of the pages. Facebook, on the other hand, doesn't have to be perfectly consistent. If users 600-800 don't quite get the same view as the first 600, that's OK. They will get the right view in 10-15 minutes.
Facebook is trading the C in ACID for speed. The dude running the report at Walmart doesn't care if it takes 2 hours to run the report. He wants it to be correct. The dude opening some page on Facebook doesn't care if it doesn't show every last detail, just that his page opens quickly and 99% of the data is mostly right.
The problem people are having with 'SQL' is the 'big table' problem, where all the other tables in the system are fairly static but you have 1 or 2 HUGE tables that everyone uses. But when it comes down to it, that 'big table' doesn't really have to be a table at all. It could be something else, IF you are willing to sacrifice different parts of ACID for it.
I see this issue all the time with other aspects of programs: either pre-optimizing things or waiting until it is totally screwed up and THEN trying to fix it. You need to test early to find out where your constraints are, then, as you approach them, start pulling parts out and optimizing those. Do it too early and you can really mess things up; do it too late and you're screwed. But if you know roughly where the wheels are going to fly off, you can prep for it. Then, if you never get near the flying-off stage, you do not have to devote time and money to fixing something that is never going to happen.
After working for two years in Canada doing contract and consultancy for SQL Server (yes I can hear the boos already), it is my educated opinion that most people who design, build and query databases don't have the first clue what they're doing. These people complaining about the performance of a RDBMS is like an "application programmer" who builds a GUI, does no input validation, no error checking or exception handling, and then bitches about the programming language when the app crashes all the time. It's asinine. If you want to develop a database get a database developer, not an application developer.
Of the approximately 10 companies I worked at while in Canada:
- 6 had no DRI or other constraints on their databases. At all.
- 3 didn't even know what DRI was
- One, a company that makes passport and visa software used by governments in multiple countries, and which used NHibernate, refused to understand why (amongst other things) clustering on a GUID key was insane. They fired me after a week of me telling them that their database needed a massive redesign.
- 3 had nightly database shrinks as part of their maintenance plan.
- At not one did the developers writing SQL for GUIs and reports know what a sargeable query was.
There was also widespread misunderstanding of indexes, no understanding of index internals or fragmentation and no understanding of partitioning at any but one of the companies.
Guess how many of these companies complained about the performance or consistency of their database on a regular or semi-regular basis?
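For readers who haven't met the term: a predicate is sargable when the engine can satisfy it with an index seek. This sketch uses SQLite's `EXPLAIN QUERY PLAN` rather than SQL Server, and the `orders` table is made up. Exact plan wording varies between SQLite versions, but comparing the bare indexed column with constants should let the planner use the index, while wrapping the column in a function forces a full scan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, created TEXT, amount REAL)")
conn.execute("CREATE INDEX idx_orders_created ON orders (created)")


def plan(sql):
    # Each EXPLAIN QUERY PLAN row ends with a human-readable detail string.
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


# Sargable: the bare column is compared against constants.
sargable = plan(
    "SELECT id, amount FROM orders "
    "WHERE created >= '2010-01-01' AND created < '2011-01-01'"
)
# Non-sargable: the function call hides the column from the index.
non_sargable = plan(
    "SELECT id, amount FROM orders WHERE substr(created, 1, 4) = '2010'"
)

print(sargable)
print(non_sargable)
```

Both queries return the same rows for ISO-formatted dates; only the first one lets the index do the work.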
The question is not whether it is possible to use MySQL, or similar databases, for sites with a good audience and a moderate amount of data. The question is how many tricks you will need to keep your site alive!
The other issue is that SQL is an easy target for attacks (especially SQL injection) and a mess to maintain. To avoid attacks you need to filter every variable that goes into your SQL, and to allow maintenance in the future you need to avoid duplicating SQL and keep a well-controlled, centralized place to put and manage your SQL strings; otherwise an ALTER TABLE will mean a search through all your code to find which SQL touches that table.
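On the injection point: most database drivers support bound parameters, which remove the need to filter every variable by hand. A minimal sketch with Python's built-in `sqlite3` module; the table and the hostile input are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, is_admin INTEGER)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [("alice", 1), ("bob", 0)])

# Hostile input that rewrites the WHERE clause if pasted into the SQL text.
user_input = "nobody' OR '1'='1"

# Unsafe: string formatting splices the input into the statement itself.
unsafe = conn.execute(
    f"SELECT name FROM users WHERE name = '{user_input}'"
).fetchall()

# Safe: the placeholder sends the input as data, never as SQL.
safe = conn.execute(
    "SELECT name FROM users WHERE name = ?", (user_input,)
).fetchall()

print(len(unsafe), len(safe))  # 2 0
```

Keeping such parameterized statements in one module also addresses the maintenance half of the complaint, since an ALTER TABLE then means reviewing one file rather than grepping the whole codebase.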
The third point is that modern developers don't want to work with SQL; they want to work with objects and entities. The entity frameworks that exist today, like JPA, are slow, because many layers are needed to make your entities work with a relational database, and it is difficult to use the specialized features that specific databases offer for performance. And of course, relational databases were not created for all of that, but that is what we use today underneath abstractions like JPA.
We run a site (outside of the US) that receives more than 60,000 different users per day, with a peak of 13,000 concurrent users and average sessions of 1h, and we know that we need a lot of tricks and caches inside our system to avoid issues with MySQL. In my opinion, based on real facts and my daily job, MySQL, PHP and Ruby are good only for small projects.
If you have a project of moderate size, you will need a JIT, so your system runs as machine code (not like C, but close) (I recommend Java, especially for the back end), and you need to avoid SQL, or at least use some "framework" to manage your SQL in a professional way. I still don't recommend JPA for big projects (maybe version 2.0 is better, but I haven't worked with it yet), but some entity framework specialized for your kind of database will be good.
Or, if you want, you can do like everyone else: start with PHP or Ruby and MySQL. Then, when performance problems start to say hello, you change to Java, or a compiler for PHP (like Facebook), and start the tricks for MySQL. What we actually want is a way to start a project with a technology that supports the future but is also easy and cheap to use from the very beginning.
Joe Stump wrote a post that is a perfect response to this insanity.
http://stu.mp/category/nosql
Why is it that all the people working at scale seem to be going with NoSQL solutions? Are all the devs at Google, Facebook, Twitter, Digg, Reddit, etc. total idiots, or is there in fact a problem they face that is actually real?
Anybody who cites Amazon, Walmart or any large retailer as an example of why SQL scales is missing the point. Retailers have very few write operations compared to the read load. The vast majority of the load hits databases that serve reads and have a high tolerance for write latency. This is a problem SQL is good at solving.
On the other hand, social sites that have massive cross-user data ties and constant write updates, where latency is very important, don't fit this model that well. Sure, you can remove SQL replication from the mix and use independent instances of MySQL serving fractions of the overall site, with redundancy between them, but if you do that you have functionally built a NoSQL data store. The concept isn't to get rid of SQL; it's to get rid of the relational aspect of data storage. You can no longer rely on all your data being available to a single SQL statement.
Being an operations guy, though, I should point out the number one failing of SQL in my world. If you assume that, on average, a machine will either crash or have some sort of hardware failure once a year, and you consider a site with 1,000 machines, then you see that nearly 3 machines will die every day. Even if you count on 2 years of continuous uptime, that is over 1 a day. With 10,000 machines your failure rate is 27 per day; with 100,000 machines it is about 274. This means that any database layer that requires a large number of machines has to build in a recovery layer. Clients need to know that a node is down; when it comes back, it needs to have data uploaded to it, etc. NoSQL solutions like Cassandra manage this automatically. Trying to do this with MySQL becomes really complicated, and you end up implementing the same logic and constraints found in NoSQL solutions anyway. I have seen this happen twice now.
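The failure figures above follow from dividing the fleet size by the mean time between failures expressed in days. A quick check, assuming (as the comment does) that each machine fails independently about once per one or two years; `failures_per_day` is a made-up helper.

```python
def failures_per_day(machines, mtbf_years=1.0):
    """Expected machine failures per day if each fails once per mtbf_years."""
    return machines / (mtbf_years * 365)


for n in (1_000, 10_000, 100_000):
    print(
        f"{n:>7,} machines: {failures_per_day(n):6.1f}/day at 1 yr, "
        f"{failures_per_day(n, 2):6.1f}/day at 2 yr"
    )
```

That gives roughly 2.7, 27.4 and 274 failures a day at a one-year rate, and half that at two years, which is why recovery has to be automated rather than handled by hand.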
The problem is that Teradata doesn't scale very well. A puny 50PB.
Now of course, some may claim that 50PB is more than enough. Others among us will just shake our heads and say "Nope".
This might be true if they sold items for 1/1000th of a cent, but it's simply untrue for any sale anywhere.
Twitter's load isn't that impressive; it's a poorly written big mess of a service. It's pretty common knowledge that it could be made far better if they would just use some untrained monkeys.
Again, Facebook... bad example.
You've taken two overnight one-hit wonders that will be gone in a few years and used them as if they were valid examples of how to do it. They aren't; they aren't even close. They are what happens when you grow so fast you don't have a chance in hell of keeping up, so you cobble things together as best you can to survive, knowing that it's just a matter of time before the fad passes.
Do I think FB and twitter could survive on MySQL? Probably not, but on a real DB with real DBAs, more than likely yes.
Persistent Volume manager for Kubernetes - https://github.com/dwimsey/openshift-pvmanager
As soon as people have a hammer in their hands, everything they see around them is a nail. The good IT worker is the one with different tools in hand and the ability to choose the right one at the right time. And before you forget it, remember that premature optimization is the root of all evil.
The person you're replying to is clueless as far as to what 'medical data' is I think.
You should have picked up on this when he starts naming books that he's read. The more name/buzzword dropping you see the more you know the person doesn't really have a clue.
He even had to do a quick google to find some old buzzwords to throw in, I almost want to give him points for throwing in CICS, almost.
It's easy to hit intrinsic performance limits with SQL databases even on small apps. And for people who aren't database experts, it's even easier since they don't know the hoops to jump through to make their SQL databases perform well. For the average programmer, it's easier to get good performance out of no-SQL databases.
Using SQL databases programmatically is a fairly silly notion to begin with: SQL was originally intended as an easy-to-use query language for non-experts because people were having trouble with navigating data structures. But programmers are excellent at navigating data structures and designing efficient data structures. SQL is solving a problem that most programmers don't have, and you're paying a big performance penalty for that.
Sometimes an SQL database is the right thing to use, sometimes it isn't. People really need to use their head instead of blindly picking one or the other solution.
OK, so which OODBMS do you recommend? I know of Ozon and db4o, both for languages that I rarely use. What about an OODBMS that I can access from C++, Perl, Python, and C? How widely used are they? How good is the support?
"Or is it just that people are throwing consistency out of the window and saying "We can afford to lose a couple of records or have a couple of dangling references here and there, as long as it SCALES""
Hmm, I could actually see that being the case for some applications, probably not one common in the business world but for research it's probably fairly likely you might generate huge datasets where losing individual records wouldn't matter much.
Umm... you do realize that Twitter contracts with Percona and has had Monty himself look at issues, right? Your assertion that Twitter is just a bunch of untrained monkeys is based on nothing.
As another poster pointed out, this is a false dichotomy. We're emerging from a technology monoculture of "every DBMS must be SQL" to "it's possible, even viable to design, implement and use a DBMS that does not implement SQL". Anyone advocating a mass exodus from SQL-land is a fool. Props to the NoSQL guys for opening our eyes to fresh ideas.
There's no need to stick with any single DBMS platform for 100% of your organization, unless you're so small that you have but one server total, and then I suspect this debate is largely irrelevant to you anyhow.
We're not Google, but we have some of the same problems Google faced with scaling up our applications for the Internet. We do use MySQL, having some 35+ instances of it. Our application processes in excess of 1000 transactions each second, and we know it'll be difficult to scale to tens of thousands of transactions per second without some fundamental changes. Today we survive by imposing limitations on developers writing OLTP applications--things like "all row operations must search by primary key" and "no table scans, ad-hoc queries or file sorts". The access language is still SQL, but increasingly we don't *need* SQL for OLTP transactions. We could plug in something today that is equivalent and far simpler.
What we're lacking in SQL-land is a good way to host DBMS applications on distributed infrastructure, e.g. in the cloud. There are clustered databases available, but these often require fast/short interconnects, may have difficulty scaling above bandwidth limits on a local network or SAN, and can be frustratingly fragile to use in the "real world". Not to mention expensive. This is due to the consistency model imposed on such systems, i.e. the "AC" in ACID.
Data sharding is a popular way to exceed the limits of SQL, but once you introduce sharding you're treading on "NoSQL" waters already. You can't retrofit sharding onto an application that isn't aware of it, not very successfully, in my experience. So developers need to become acutely aware of the storage tier and design for it, meaning they've already lost the perfect abstraction of SQL.
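A minimal sketch of why sharding leaks into the application: something in the application has to map every key to a shard before it can issue a query. The shard names are hypothetical placeholders for connection strings, and simple hash-modulo routing is only one of several schemes.

```python
import hashlib

# Hypothetical shard map; in practice these would be connection strings.
SHARDS = ["db0", "db1", "db2", "db3"]


def shard_for(user_id: str) -> str:
    """Pick a shard from a stable hash of the key (Python's built-in hash()
    is salted per process, so it is unsuitable for routing)."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return SHARDS[int.from_bytes(digest[:8], "big") % len(SHARDS)]


# Every query must now carry the shard key; a JOIN between two users that
# live on different shards has to be done in application code.
print(shard_for("alice"), shard_for("bob"))
```

One design caveat: with modulo routing, changing the number of shards remaps most keys, which is why many systems use consistent hashing or a lookup directory instead.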
I'm keenly interested in emerging products like the Cassandra database, and while I have no intent of ever abandoning SQL (and probably even MySQL) in our organization, we're absolutely going to take "NoSQL" for a spin to see what it delivers in terms of cost, complexity, reliability and performance.
My team is currently considering a "NoSQL" solution, moving away from PostgreSQL, and the reason is: we desperately need multimaster over the WAN that handles split-brain situations gracefully. It's a tough problem and frankly no RDBMS handles it well. I suspect any group that has had to support multiple dispersed locations has had the same thought.
According to this link: http://www.americanwaymag.com/new-york-stock-exchange-gordon-charlop-robert-mccooey-jr-trading-floor-technology
Your scale estimates are bull crap. That is roughly 46 transactions a minute. Twitter did 600 TPS as of two months ago and is growing at 15% per month, so it's likely much higher now. According to the Twitpocalypse guys, they go over 800 tweets per second during a normal weekday. That is over an order of magnitude different from the largest stock exchange in the world.
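The numbers quoted above can be cross-checked with a little compound-growth arithmetic. All inputs are the poster's figures, not measurements.

```python
tps_then = 600          # tweets/second two months earlier (quoted figure)
monthly_growth = 0.15   # 15% per month (quoted figure)

# Two months of compound growth.
tps_now = tps_then * (1 + monthly_growth) ** 2
print(round(tps_now, 1))  # 793.5

# "Roughly 46 transactions a minute", converted to per second.
exchange_per_sec = 46 / 60
print(round(tps_then / exchange_per_sec))  # 783
```

Two months at 15% takes 600 TPS to roughly 794, in line with the ~800 tweets per second cited, and taken at face value the gap to 46 transactions a minute is closer to three orders of magnitude than one.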
What was wrong with COBOL? Didn't it solve most of businesses problems? What makes C/C#/C++/Java/Ruby/Perl/PHP so much better?
In other news, some random hammer enthusiast posted on his blog that he just can't wait for screwdrivers to die.
Exactly. That's the point I was making with the value per transaction. The value of a bank transaction or a stock market transaction is considerable, and so are the fees. If Twitter charged you 25 cents per tweet, let alone $25, they'd have no trouble buying a suitable SQL platform to store their data. Mostly because they wouldn't exist.
All our languages are now object-oriented, but we're still using non-object databases and mapping between rows and columns and objects. WHY?? Yes, tools can help you map, but it's a band-aid, and it screws up performance in all but simple cases. And it means you can't do queries using the same model as your language.
Yes, relational algebra is a useful query tool, but there is no need to be beholden to relational table structures to get relational algebra. Neither are the so-called object-relational features of PostgreSQL going to cut it. You can't even do a query and get back a list of objects of different types, for goodness' sake.
One thing that many people don't seem to get right: Using these "NoSQL" databases doesn't mean that you don't get ACID. Many key-value databases support ACID just fine:
You've got to remember that (simplifying drastically,) SQL is a query language layered on top of a "NoSQL" style database (whether built into the SQL DBMS implementation, or a 3rd party one). Such "NoSQL" databases have to be ACID capable in their native API and implementation first.
Banu
The way languages like F# or Haskell treat "nulls" would be a straightforward definite improvement over the way SQL does (or for that matter, the common C/C++/Java/C# paradigm). A type whose value is either a Foo or nothing is just a tagged union type. So allowing columns to take tagged unions as their type would solve that right away--and also allow to impose further logical distinctions as needed by the application.
The whole three-valued-logic "null is not a value" paradigm of SQL is a disaster, that one's for sure. There are all sorts of query optimizations that are impossible to do on the face of it.
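A short demonstration of the three-valued behavior being criticized, using SQLite because it ships with Python (other engines follow the same standard rules here). The table `t` is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# NULL compared with anything, even NULL, yields NULL ("unknown"), not true/false.
print(conn.execute("SELECT NULL = NULL, NULL <> 1, NULL IS NULL").fetchone())
# (None, None, 1)

conn.execute("CREATE TABLE t (x INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(1,), (2,), (None,)])

# WHERE keeps only rows where the predicate is true, so the NULL row
# drops out of both a condition and its apparent negation.
eq = conn.execute("SELECT count(*) FROM t WHERE x = 1").fetchone()[0]
ne = conn.execute("SELECT count(*) FROM t WHERE x <> 1").fetchone()[0]
total = conn.execute("SELECT count(*) FROM t").fetchone()[0]
print(eq, ne, total)  # 1 1 3
```

Here `eq + ne` is 2, not 3: the NULL row satisfies neither `x = 1` nor `x <> 1`, which is exactly the kind of surprise that blocks naive rewrites such as replacing `NOT (x = 1)` with a complementary query.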
Our company has saved immense time and made our applications faster and easier to understand, as well as [theoretically] more secure by switching to Tokyo Cabinet. F*ck SQL and F*ck MySQL especially. I personally wrote some of the interfaces to the TokyoCabinet databases we are using and at this point I have decided I never want to do anything with SQL ever again. Seriously, SQL sucks - it's clunky, easy to introduce security flaws, slow, breaks easily, difficult to access from multiple languages simultaneously, you often have to do things like create special users to do certain things which then introduced more security risk... and on and on. SQL is crappy and should be considered deprecated.
Replication, replication, replication... Here's the deal. Our software runs just fine on PostgreSQL; however, if our datacenter goes down or a disaster hits, I need to have that data in multiple places at once. I can't be running queries through a master node across a continent and then replicating to slaves. This is the big problem with RDBMSs: how to replicate efficiently and quickly without sacrificing performance. Cassandra seems to be a solution that works and works well. I probably will never be a Facebook fielding 100,000 queries per second against a 500TB database, but I do need to replicate efficiently. Any key/value store allows this. I'm interested to see how others replicate their databases. However, I don't want to hear "Just do a nightly dump and then you have a backup" or anything like that. I want to hear a valid, good solution to replication. If there is a database that uses SQL for its query language, has microsecond query times on a huge database, and replicates pretty much in "real time" (meaning within seconds/minutes), let me know. Because pgpool-II just isn't cutting it.
I did QA for a massive accounting package that was being converted from Btrieve. The benefit of a "real" SQL engine was supposed to be stored statements, but it was outweighed by the massive wrong-ness in the engine. Observed behaviors too often didn't match the documentation. A lot of the self-tuning features in SQL Server 2008 are due to our head of R&D taking up matters directly with SQL Server developers.
That's just one example of why I no longer support Microsoft products on the job. I'll use them if need be, but if something breaks, someone else gets to fix it.
I don't know much about the NoSQL movement, but I do know quite a bit about some of the largest SQL databases in the world. Walmart, in particular, runs on a massively parallel SQL database called Teradata. I work for Teradata designing and implementing databases like these for our largest customers. SQL has very little to do with it, in the way English has very little to do with the current economic crisis. Sure, all the subprime mortgages were contracted in the English language, but does anybody think changing languages would have averted the crisis? Relational databases are a means to an end, a way to conduct business in a reliable and efficient manner. As much as I value efficiency, reliability is what drives the purchase decision. Reliability of cash flow. The problems with scalability only matter when cash flow is on the line. Overall, I agree with the thesis, stick with what's reliable at whatever scale you're operating. A change of database systems when you scale up should be the least of your worries.
He even had to do a quick google to find some old buzzwords to throw in, I almost want to give him points for throwing in CICS, almost.
What are you, twelve? I've developed COBOL and CICS applications, though thankfully, I work mostly in C++ and Java now.
You should have picked up on this when he starts naming books that he's read. The more name/buzzword dropping you see the more you know the person doesn't really have a clue.
No, but thirty-five years of software engineering has taught me that treading on the sacred turf of DBAs gets you one of two possible responses. If you don't make it clear at the outset that you do know what you're talking about, you're immediately dismissed as a clueless outsider. If you do make it clear that you know what you're talking about, you get responses like yours, which just descend into nonsensical nastiness. You can't have a meaningful discussion with people who aren't interested in dealing in good faith.
All of which serves to underscore my original point, which is that there is a deeply entrenched RDBMS faction that can only see problems in terms of the one tool that they have, and react to problems that don't fit the tool well (or at all) by simply denying their existence. The irony is that there is hardly anyone who denies the broad utility of the relational model. The hysterical reaction to the suggestion that not everything fits the model equally well and a few things don't fit it at all only highlights the blind dogma involved.
Proud member of the Weirdo-American community.
Or is it just that people are throwing consistency out of the window and saying "We can afford to lose a couple of records or have a couple of dangling references here and there, as long as it SCALES". Because I can build something that scales if it doesn't have to maintain ACID, too. The difficulty is in having _both_ ACID and scalability.
That's exactly right. To any experienced server engineer/architect, it's obvious that much greater scaling, both horizontal (automatic sharding of data across nodes) and vertical (more writes/second/node), can be achieved if you give up the absolute guarantee of zero data loss (while still usually keeping data *consistency* in the non-lost portion of the data). Many of the social-networking-type applications, such as Twitter, Facebook and the like, can probably afford that risk. Given that, you can do many, many more txns/s with a key-value database than with transactionally oriented OLTP databases. Now, for a smaller organization (shoestring startups and such), the RDBMS model still has many benefits that can't be ignored: a vast "googleable" knowledge base behind the traditional software products, a larger candidate pool with expertise in those systems, etc. Unfortunately there really is no standard answer here other than to evaluate your own situation carefully and make up your mind based on all the data (most of which is available only to you).
Look, guys. Let's be honest here. NoSQL has been around forever; it's the default approach for data storage unless a relational database is selected as a requirement of the software being written (am I the only one who still writes his own file formats and uses record-based random access for small-time data storage? If you don't need the complexity of XML or SQL, then don't use it...)
That being said, NoSQL is just giving that obvious practice a name as if it were a new phenomenon in the development world. Admittedly, now that it has a name, it tends to mislead developers into discarding SQL DBMSs irresponsibly, but it does serve an extremely important purpose in the business world: it superficially inflates an otherwise vacuous business process, which, under the guise of "innovation", drives business demand.
The IT world does this all the time. They re-package existing solutions, or disrupt them in favor of "new" solutions which to be honest are often unnecessary and more complex than the original solutions. But it drives business. It creates new hardware, new software, new job positions, new education criteria that academia can sell and creditors and government can tax, new system maintenance and migration hurdles; it turns businesses into consumers, it creates new consumers for those businesses, and justifies continuing relationships with consumers when the last product was already good enough.
The mere hype of IT "solutions", however irrelevant or pointless or unnecessary, perpetuates the industry. A lot of it is utter BS. It's all they can do during times when few real advancements are made... and sadly it works too well... and that is the REAL problem with the NoSQL trend. Not bad programming practice. Just artificial business fuel.
Sure, if your transactions are worth something.
In the world of social networking, consistency is much less important than speed. If two different users see different data because the nodes are a few seconds out of sync, no-one cares. But slow answers are wrong answers.
You can't do that with a bank or a stock exchange. It would be a disaster. For a social networking site, no-one will care - no-one will even notice.
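To make the "nodes a few seconds out of sync" point concrete, here is a toy model (not any real database's replication protocol) of replicas with deferred propagation and last-write-wins conflict resolution. The names `Replica`, `LazyCluster` and `sync()` are hypothetical, and explicit integer timestamps stand in for the clocks a real system would need. A read from a replica that hasn't synced yet returns stale data, which is exactly the trade a social site can make and a bank cannot.

```python
class Replica:
    def __init__(self, name):
        self.name = name
        self.data = {}  # key -> (timestamp, value)

    def apply(self, key, ts, value):
        # Last-write-wins: keep whichever version carries the newer timestamp.
        current = self.data.get(key)
        if current is None or ts > current[0]:
            self.data[key] = (ts, value)

    def read(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]


class LazyCluster:
    """Writes land on one replica immediately; the others catch up only on sync()."""

    def __init__(self, names):
        self.replicas = {n: Replica(n) for n in names}
        self.pending = []  # (key, ts, value) updates not yet propagated

    def write(self, via, key, value, ts):
        self.replicas[via].apply(key, ts, value)
        self.pending.append((key, ts, value))

    def read(self, via, key):
        return self.replicas[via].read(key)

    def sync(self):
        # Push every pending update to every replica; LWW resolves conflicts.
        for key, ts, value in self.pending:
            for replica in self.replicas.values():
                replica.apply(key, ts, value)
        self.pending.clear()
```

In use, `write("a", "status:42", "at lunch", ts=1)` is immediately visible through replica "a", while a read through "b" returns `None` until `sync()` runs. Nobody on a social site notices; a bank ledger would.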
We throw strict correctness out the window. That's where most of the performance gain comes from. You still have to build an architecture that can take advantage of this opportunity, though, and that's not trivial.
Consistency, scalability, affordability. Pick two... At most.
The isolation model in PostgreSQL is fundamentally flawed. AID, maybe :)
Strongly agreed, though I do worry that many NoSQL projects' websites are overly blasé about runtime issues, including crash safety, online schema changes and upgrade safety. Now, this is really about using alpha software rather than anything conceptual or design-related, but it is a real issue at present.
I wrote this guy off the second he mentioned Walmart's database. How Walmart uses databases vs Facebook or Twitter is completely different. I'm pretty sure Walmart doesn't have 6 million people writing to that database at any given time.
Eric Evans, who coined the term NoSQL and is a committer on the Cassandra project, responded in a blog post:
http://blog.sym-link.com/2010/03/28/haters_gonna_hate.html
NoSQL should be posted on The Daily WTF...
Ugh.
The saddest part to me about the "new hotness" of the NoSQL zealots is that a scalable, fast, flexible key-value store isn't new at all. It's called LDAP. Sadly, it continues to be a horribly misunderstood beast. Yes, it's more than a shared address book.
In the end, you use the right tool for the job. SQL is relational. LDAP is hierarchical. Neither is the new hotness, so stop pretending to invent. Both perform their jobs exceptionally well... if you use them for the jobs they are intended for, and learn a little something about the concepts invented before your birth. Chances are, they've been thought through before, and you're being lazy. Go read up.
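For anyone who has only seen LDAP as an address book, here is a toy in-memory model of what "hierarchical" means there: entries are named by a distinguished name (DN) read from leaf to root, and a search takes a base DN plus a scope ("base", "one" or "sub"). This is a sketch under heavy assumptions, not an LDAP client: the `Directory` class is hypothetical, it does not enforce that parent entries exist, and `parse_dn` ignores the escaping and matching rules of the real DN syntax.

```python
def parse_dn(dn):
    """Split a DN into normalized RDNs, leaf first.

    Naive: ignores escaped commas, multi-valued RDNs and per-attribute matching rules.
    """
    return tuple(part.strip().lower() for part in dn.split(","))


class Directory:
    """Toy hierarchical store keyed by DN (illustrative only)."""

    def __init__(self):
        self._entries = {}  # parsed DN -> (original DN, attributes)

    def add(self, dn, **attrs):
        self._entries[parse_dn(dn)] = (dn, attrs)

    def search(self, base, scope="sub", **filters):
        """Return sorted DNs under `base` within `scope` whose attributes equal all `filters`."""
        base_key = parse_dn(base)
        n = len(base_key)
        depth_ok = {
            "base": lambda depth: depth == n,      # the base entry itself
            "one": lambda depth: depth == n + 1,   # immediate children only
            "sub": lambda depth: depth >= n,       # base and everything beneath it
        }[scope]
        results = []
        for key, (dn, attrs) in self._entries.items():
            # An entry sits under `base` if its trailing RDNs equal the base's RDNs.
            if len(key) >= n and key[-n:] == base_key and depth_ok(len(key)):
                if all(attrs.get(k) == v for k, v in filters.items()):
                    results.append(dn)
        return sorted(results)
```

The tree shape is the point: "everyone under ou=people" is a scope on the name itself, not a join.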
I've been pretty happy with MongoDB. Why? The document architecture makes ORM a lot easier.
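A small illustration of the ORM point, assuming a hypothetical `Order` with embedded `LineItem`s: the object graph maps onto one nested document almost one-to-one, while the relational version has to be split across two tables and joined back together. No database driver is used here; the dict from `to_document()` is simply the shape you would hand to a document store (for example, pymongo's `insert_one`).

```python
from dataclasses import asdict, dataclass, field


@dataclass
class LineItem:
    sku: str
    qty: int


@dataclass
class Order:
    order_id: int
    customer: str
    items: list[LineItem] = field(default_factory=list)


def to_document(order: Order) -> dict:
    # asdict() recurses into nested dataclasses and lists, so the line items
    # become an embedded array inside the same document: no join table.
    return asdict(order)


def from_document(doc: dict) -> Order:
    return Order(doc["order_id"], doc["customer"], [LineItem(**i) for i in doc["items"]])


def to_rows(order: Order):
    # Relational equivalent: one `orders` row plus one `line_items` row per item,
    # linked by order_id and reassembled later with a JOIN.
    order_row = (order.order_id, order.customer)
    item_rows = [(order.order_id, i.sku, i.qty) for i in order.items]
    return order_row, item_rows
```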
Traditional OODBMSs have two major problems (well, maybe three) going against them:
1. Hard to restructure data ad hoc for set-based modelling (is this really a downside?)
2. Schema evolution (changing a model from one version to another).
3. Lack of tool sophistication. For ObjectStore (which I have supported in the past and work for today), we have always lacked easy-to-use tools like Crystal Reports and some of the visualization tools that the SQL market has had almost since day 1. Requiring a programmer to do your data mining is a serious downside to using a pure OODBMS.
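Point 2 bites schemaless stores too. One common workaround, sketched here as a generic technique rather than anything ObjectStore itself does, is to tag each stored record with a version number and upgrade old records lazily when they are read. The record shapes, the `UPGRADES` table and `CURRENT_VERSION` are hypothetical.

```python
# Hypothetical history: v1 had "name"; v2 split it into first/last; v3 added "email".

def _v1_to_v2(rec):
    # Mutates and returns `rec`; upgrade() only ever passes in its own copy.
    first, _, last = rec.pop("name").partition(" ")
    rec.update(first=first, last=last, version=2)
    return rec


def _v2_to_v3(rec):
    rec.setdefault("email", None)  # new optional field
    rec["version"] = 3
    return rec


UPGRADES = {1: _v1_to_v2, 2: _v2_to_v3}
CURRENT_VERSION = 3


def upgrade(rec):
    """Apply each single-step upgrade in turn until the record is current."""
    rec = dict(rec)  # leave the caller's (stored) copy untouched
    while rec.get("version", 1) < CURRENT_VERSION:  # untagged records count as v1
        rec = UPGRADES[rec.get("version", 1)](rec)
    return rec
```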
That said, since we added XQuery support to the product, it's getting easier to do ad hoc queries without requiring access to a C++ or Java compiler.
This article seems to totally miss the point about why startups are using NoSQL databases, namely the schemaless ones. It's because most startups are building their main product on the fly, pushing out new versions as often as every few days or hours, depending on their deployment model. Schemas stand in the way of rapid development, since you have to CONSTANTLY redefine them as you redefine your product. So updating your db goes something like this: "Oh, I have to change this relationship from one-to-one to one-to-many." "Well, now that we've redesigned part of the database, let's migrate it." "OK, enough time has passed. Is it done yet? No? Okay." "Ah, it's done. OK, take down the servers for maintenance and restart." Or you can use something like CouchDB and just insert whatever new data you want on the fly, without defining a schema, without migrations, and without downtime. A win for startups. It's not just about scalability; it's also about being able to do a simple task.
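A minimal sketch of that workflow, with an in-memory list standing in for a schemaless collection (this is not CouchDB's API; `DocStore`, `phone_numbers` and the field names are hypothetical). Old documents keep a single `phone` field, new ones are written with a `phones` list, and no migration ever runs.

```python
class DocStore:
    """In-memory stand-in for a schemaless collection (illustrative only)."""

    def __init__(self):
        self._docs = []

    def insert(self, doc):
        # No schema check: any dict shape is accepted as-is.
        self._docs.append(dict(doc))

    def find_all(self):
        return [dict(d) for d in self._docs]


def phone_numbers(doc):
    """Normalize both shapes: the old one-to-one `phone` and the new one-to-many `phones`."""
    if "phones" in doc:
        return list(doc["phones"])
    if "phone" in doc:
        return [doc["phone"]]
    return []


if __name__ == "__main__":
    store = DocStore()
    store.insert({"name": "alice", "phone": "555-0100"})  # written before the change
    store.insert({"name": "bob", "phones": ["555-0101", "555-0102"]})  # written after
    for doc in store.find_all():
        print(doc["name"], phone_numbers(doc))
```

The catch is that the migration doesn't disappear; it moves into the read path, and every reader has to keep handling every shape the data has ever had.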