Enthusiasts Convene To Say No To SQL, Hash Out New DB Breed
ericatcw writes "The inaugural NoSQL meet-up in San Francisco during last month's Yahoo! Apache Hadoop Summit had a whiff of revolution about it, like a latter-day techie version of the American Patriots planning the Boston Tea Party.
Like the Patriots, who rebelled against Britain's heavy taxes, NoSQLers came to share how they had overthrown the tyranny of burdensome, expensive relational databases in favor of more efficient and cheaper ways of managing data, reports Computerworld."
Just use flat text files --- no need for expensive db's .... think of the freedom!
"i lost my dignity on a slippery wiener"
Seems to be a silly thing to be against. Relational databases and the structured query language may not be perfect, but I bet these people could die in their 90s and people will still be using relational dbs and sql.
If you want to tout open or cheap dbs and more lightweight types of storage/db servers, then they might have some points, but being against sql is just plain dumb.
It's true. I do a lot of INNER JOINing. Often with multiple tables.
If I was to read the article, I bet somewhere someone would be wittering on about Key Value Datastores.
The brainchild of a generation brought up on high level collections, they learn one (in this case Map) and apply it to everything.
Sadly SQL, and RDBMS, works for most people. It maps object data well (oh whaaaa, i have to do foreign keys - GROW SOME FUCKING BALLS YOU LAZY GRADUATE!) and it is well understood. And with abstractions like LINQ to query them, even the lazy dumb Windows .NET programmer doesn't have to strain their brain to learn SQL.
And when you have terabytes of specific unique data, you clearly should go away and work out how best to store it. Even an RDBMS/SQL solution is too generic for all problems.
And yet where are the other corporations: the oil companies, the banks, the large merchant conglomerates? In IT we seem to have this sort of myopic view that if it isn't an IT company of some kind, it doesn't exist. Google, compared to the huge companies that use tools like Oracle, is a bit player. I know that's hard to deal with for all of us who have sucked at the teat of Silicon Valley for so long, but a significant amount of data processing that has nothing to do with social networking and finding pr0n goes on, and it does use tools like SQL.
The world's burning. Moped Jesus spotted on I50. Details at 11.
SQL is not a database, it is a standard interface to a feature set commonly associated with relational models. Before everyone standardized on SQL, there were other relational query languages. The "No" part of "NoSQL" refers to the fact that some basic elements of relational implementations cannot be usefully expressed using a much simpler distributed hash table model.
All "NoSQL" does is eliminate the parts of traditional relational databases that do not scale -- discarding the bottleneck rather than fixing it. These are things like joins and external indexing. Unfortunately, discarding those things means you discard a lot of very important functionality as a practical matter, notably the ability to do fast, complex analytics. Adopting the NoSQL architecture runs contrary to the trend toward more real-time, contextual analytical processing. There are a great many analytical applications that are not amenable to batch-mode pattern-matching, and the NoSQL model is a lot less applicable than I think some people want to acknowledge. In its domain it is a great tool, but it has many, many prohibitive limits. We are essentially trading power for scale.
That said, do not take this as an endorsement of traditional SQL relational databases either, as they have a number of serious limitations themselves. As just mentioned, a number of the core analytical operations those models support are based on algorithms that scale poorly. The SQL language itself has mediocre support for many abstract data types (e.g. spatial) and data models (e.g. graph), which in part reflects the inadequacies of the assumed underlying database algorithms (e.g. B-trees) that are implicit in SQL. The inability to efficiently do event-driven/real-time applications is also more a reflection of the access methods used in databases than any intrinsic weakness in SQL; SQL may be clunky for that purpose, but that is not the real limiter.
A truly revolutionary deviation from SQL would usefully implement a superset of the features SQL supports, not take them away. Of course, we would need access methods more capable than hash tables and B-trees to usefully implement those features, which is a lot more work than discarding features that scale poorly. NoSQL is a stopgap technical measure for that small subset of applications where the serious tradeoffs are acceptable.
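To make the power-for-scale trade a bit more concrete, here is a rough sketch in Python. It is only a toy: the data, the key naming scheme and the function names are all made up for illustration, and a plain dict stands in for a distributed hash table. The point it tries to show is that a key-value store answers "get by key" trivially, but a question like "total spent per user" turns into application code that scans everything, while the relational version is a single join with an index behind it.

```python
import sqlite3

# Hypothetical toy data: users and the orders they placed.
users = {1: "alice", 2: "bob"}
orders = [(101, 1, 30.0), (102, 1, 12.5), (103, 2, 99.0)]  # (order_id, user_id, amount)

# Key-value style: a dict stands in for the hash table. Lookups by key are
# cheap, but there is no join and no secondary index.
kv_store = {f"order:{oid}": {"user_id": uid, "amount": amt} for oid, uid, amt in orders}


def totals_from_kv(store, users):
    """Total per user, computed by scanning every value in the store."""
    totals = {name: 0.0 for name in users.values()}
    for value in store.values():  # full scan, done in application code
        totals[users[value["user_id"]]] += value["amount"]
    return totals


def totals_from_sql(users, orders):
    """Same question asked as one join plus GROUP BY against in-memory SQLite."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, amount REAL)")
    conn.execute("CREATE INDEX orders_by_user ON orders (user_id)")
    conn.executemany("INSERT INTO users VALUES (?, ?)", list(users.items()))
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", orders)
    rows = conn.execute(
        "SELECT u.name, SUM(o.amount) FROM users u "
        "JOIN orders o ON o.user_id = u.id GROUP BY u.name"
    ).fetchall()
    conn.close()
    return dict(rows)
```

Both functions return the same totals on this data; the difference is where the work lives and who has to write it.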
Note that most of these solutions come from the interwebs, social networks, etc. And it isn't so much anti-sql as it is anti-relational database (sql != rdb).
The basic premise is that we need different solutions that: can scale very high for very narrowly scoped reads & writes, don't need to perform ranged queries / reporting /etc, and don't need ACID compliance. And that may be the case. Sites like slashdot, facebook, reddit, digg, etc don't need the data quality that ebay needs.
On the other hand, ebay achieves scalability AND data quality with relational databases. And when I've worked with architectures that scale massively and avoid the relational trap for better solutions, their owners inevitably later regret the lack of data quality and the complete inability to actually get trends and analysis out of their data. It *always* goes like this:
Me: So, is this thing (msg type, etc) increasing?
Developer: No idea.
Me: Ok, so let's find out.
Developer: How?
Me: I don't know - typical approach - let's query the database.
Developer: It'll take four+ hours to write & test that query and then days to run. And when it's done we might find that we wrote the query wrong.
Me: What?!?
Developer: We had to do it this way, you can't report on 10TB databases anyhow
Me: What?!? Are you on crack? there are dozens of *100TB* relational databases out there that people are reporting on
Developer: well, we probably don't need to know what that trend is anyhow
Me: I'm outta here
First: my mantra: Data belongs to the organization, not the application... if the app fails and data is accessible then we all go on - if the data fails or is locked away - what was the point of the app again?
In a SQL database the data is understood by the organisation, the DBAs and the data architects. If it is left to app developers taking an app-centric approach to data... I get nervous quickly.
So long as the data is just as definable and accessible as in current SQL databases, then all good - but give me an app with some oddball storage and it is bye-bye.
>> Trees are a well-known problem of SQL, but the fact is that SQL can't handle most data structures and complex relations, only very simple one dimensional ones.
Sorry, that's not true. Have you tried analytical functions? You would be amazed how complex scenarios can be handled easily with them. And they are part of ANSI SQL standards. And db providers (Oracle etc) have taken the concept and improved a lot on it.
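For anyone who hasn't seen analytical (window) functions, here is a minimal sketch of the idea. It runs against Python's bundled sqlite3 and assumes the underlying SQLite is 3.25 or newer, which is when window functions were added; the `sales` table, its columns and the `running_totals` helper are invented for the example. It computes a running total and a rank per region in one pass, without self-joins or application-side loops.

```python
import sqlite3


def running_totals(sales):
    """Return (region, day, amount, running_total, rank_in_region) rows.

    `sales` is a list of (region, day, amount) tuples. Assumes SQLite >= 3.25
    for window function support.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, day INTEGER, amount INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", sales)
    rows = conn.execute(
        "SELECT region, day, amount, "
        # Running sum within each region, ordered by day.
        "SUM(amount) OVER (PARTITION BY region ORDER BY day) AS running_total, "
        # Rank of each day's amount within its region, largest first.
        "RANK() OVER (PARTITION BY region ORDER BY amount DESC) AS rank_in_region "
        "FROM sales ORDER BY region, day"
    ).fetchall()
    conn.close()
    return rows
```

Called with a handful of rows such as `[("east", 1, 10), ("east", 2, 30), ("east", 3, 20)]`, each output row carries its own running total and rank alongside the original columns.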
I think the anti-sql 'movement' has more to do with new (internet era) languages and their developers than with any so-called 'lack' of features. In my limited experience, I have observed that people coming from a C (and such) background have no problem with sql, while java developers (and this is probably true for most developers working on web-based applications) are the worst kind when it comes to understanding even the basics of sql. All they want is their objects.
I strongly believe that a competent programmer designing/developing a system which includes data and data storage should at least know normalization, indexes, and what 3NF means. The programming language is one thing, the database is another, and knowledge of both is required to build a decent system.
Design an efficient table relating a tree structure.
Huh? Tree structures are best handled by relational databases, as it is far faster than recursion. Give each row a unique ID and a parent ID, and in addition, a left-hand and a right-hand number: the root node has a left-hand value of 1 and a right-hand value of (number of rows * 2), the first child node has a left-hand value of one more than its parent's, and a node's right-hand value is one less than the left-hand value of its next younger sibling.
Then design queries to answer questions such as:
* Find the nodes in the subtree under B.
SELECT * FROM rows WHERE left > [left hand value of B] AND right < [right hand value of B]
* Find all ancestors of G
SELECT * FROM rows WHERE left < [left hand value of G] AND right > [right hand value of G]
* Find the nearest common ancestor of D and H
SELECT * FROM rows WHERE left < [lowest left hand value from D,H] AND right > [highest right hand value from D,H] ORDER BY right LIMIT 1
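If you want to try the left/right numbering for yourself, here is a minimal runnable sketch using Python's sqlite3. The eight-node tree, the `nodes` table and the helper names are my own invention for the example, and the columns are called `lft` and `rgt` to avoid clashing with the SQL keywords LEFT and RIGHT. The three queries are the same shape as the ones above.

```python
import sqlite3

# Hypothetical example tree, numbered as described above:
#   A(1,16)
#   +-- B(2,7):  D(3,4), E(5,6)
#   +-- C(8,15): F(9,10), G(11,14) -> H(12,13)
NODES = [  # (name, parent, lft, rgt)
    ("A", None, 1, 16),
    ("B", "A", 2, 7), ("D", "B", 3, 4), ("E", "B", 5, 6),
    ("C", "A", 8, 15), ("F", "C", 9, 10), ("G", "C", 11, 14), ("H", "G", 12, 13),
]


def make_tree():
    """Create an in-memory table holding the example tree."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE nodes (name TEXT PRIMARY KEY, parent TEXT, lft INTEGER, rgt INTEGER)")
    conn.executemany("INSERT INTO nodes VALUES (?, ?, ?, ?)", NODES)
    return conn


def bounds(conn, name):
    """Look up the (lft, rgt) pair of one node."""
    return conn.execute("SELECT lft, rgt FROM nodes WHERE name = ?", (name,)).fetchone()


def subtree(conn, name):
    """All nodes strictly inside the given node's interval."""
    lft, rgt = bounds(conn, name)
    sql = "SELECT name FROM nodes WHERE lft > ? AND rgt < ? ORDER BY lft"
    return [r[0] for r in conn.execute(sql, (lft, rgt))]


def ancestors(conn, name):
    """All nodes whose interval strictly encloses the given node's."""
    lft, rgt = bounds(conn, name)
    sql = "SELECT name FROM nodes WHERE lft < ? AND rgt > ? ORDER BY lft"
    return [r[0] for r in conn.execute(sql, (lft, rgt))]


def nearest_common_ancestor(conn, a, b):
    """Tightest interval that strictly encloses both nodes (smallest rgt)."""
    (la, ra), (lb, rb) = bounds(conn, a), bounds(conn, b)
    sql = "SELECT name FROM nodes WHERE lft < ? AND rgt > ? ORDER BY rgt LIMIT 1"
    row = conn.execute(sql, (min(la, lb), max(ra, rb))).fetchone()
    return row[0] if row else None
```

None of the three lookups recurses; each is a single range comparison on two integer columns. The cost you pay for that is renumbering on insert, which this sketch does not attempt.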
Trees are a well-known problem of SQL, but the fact is that SQL can't handle most data structures and complex relations, only very simple one dimensional ones.
Are you saying trees are easy or hard? And for more complex systems, that is what JOINs are for. SQL is by far the most powerful way and often the fastest way to manipulate data that I know of. The only time I can recall that I had to use a non-SQL solution that was faster than the SQL solution was a matrix operation.
Wonder what the public key field is for?
See, I don't think there is ever a good time or place for SQL.
SELECT text FROM mild_introductory_statements WHERE id=random();
Anyone who says so has never had to use it.
SELECT text FROM statements_indicating_superior_experience WHERE id=random();
I like to compare it with JavaScript.
SELECT text FROM unrelated_tool WHERE id=random();
It's a language that is difficult to refactor, maintain, and while it's a standard, the standard is so vague that it's useless.
SELECT text FROM seemingly_valid_yet_unsubstantiated_objections WHERE id=random();
Like JavaScript, people are trying to build other languages on top of it to hide its shortcomings -- for javascript you have tools like GWT, and for SQL you have HQL, Linq, etc.
SELECT text FROM wrongheaded_causal_analysis WHERE id=same_one_as_two_queries_ago();
Not to say that there is anything wrong with relational databases, we just lack a good tool to interface with them.
SELECT text FROM reasonable_sounding_parthian_shot_to_obscure_trolling WHERE id=random();
Get thee glass eyes, and, like a scurvy politician, seem to see things thou dost not.--King Lear
The only real showstopper, and a real reason to replace RDBMSs, is #5. All the others can be worked around by deeper study of data modeling techniques. Data modeling is not something most developers can figure out intuitively. There is a lot of theory to learn to do it right, and it can very easily be done badly, leading to severe performance problems and an unmaintainable application.
With regard to #5: I went to a presentation at JavaOne where some eBay engineers explained that they do not use transactions in any of their database operations. They just leave junk rows around in the db if a transaction half completes, and as long as they aren't reachable they don't consider it a big deal. They have to very carefully organize the order in which they manipulate data to avoid data corruption.
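I don't know how they actually implement it, so take the following as nothing more than a guess at the general idea: write the child rows first and write the one row that makes them reachable last. The dict-backed store, the key names and the `fail_after` crash switch are all hypothetical, made up for this sketch.

```python
def place_order(store, order_id, items, fail_after=None):
    """Write item rows first, then the order row that points at them.

    `store` is any dict-like key-value store. `fail_after` is a made-up
    testing hook that simulates a crash after that many item writes.
    """
    item_keys = []
    for n, item in enumerate(items):
        if fail_after is not None and n >= fail_after:
            raise RuntimeError("simulated crash mid-write")
        key = f"item:{order_id}:{n}"
        store[key] = item  # an orphan until the order row exists
        item_keys.append(key)
    # Single final write: only now do the items become reachable.
    store[f"order:{order_id}"] = item_keys


def read_order(store, order_id):
    """Readers always start from the order row, so orphans are never seen."""
    keys = store.get(f"order:{order_id}")
    if keys is None:
        return None
    return [store[k] for k in keys]
```

A crash partway through leaves `item:...` junk behind, but since no order row points at it, readers never see a half-written order. Get the write order backwards and you have a visible order with missing items, which is presumably the corruption they were being so careful about.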