SQL and NoSQL are Two Sides of the Same Coin
An anonymous reader writes "NoSQL databases have become a hot topic with their promise to solve the problem of distilling valuable information and business insight from big data in a scalable and programmer-friendly way. Microsoft researchers Erik Meijer and Gavin Bierman ... present a mathematical model and standardized query language that could be used to unify SQL and NoSQL data models."
Unify is not quite correct; the article shows that relational SQL and key-value NoSQL models are mathematically dual, and provides a monadic query language for what they have coined coSQL.
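As I read the article, the "duality" comes down to which way the references point. In the relational (SQL) model a child row points to its parent through a foreign key. In the key-value (coSQL) model the parent object holds references to its children. The toy Python below uses made-up product/keyword data to build both shapes and convert one into the other. It illustrates only the reversed references, not the article's category-theoretic construction.

```python
# Toy illustration of the "reversed arrows" between the two models.
# The data (one book and its keywords) is made up for the example.

# Relational shape: every child row carries a foreign key to its parent.
products = [{"id": 1, "title": "The Right Stuff"}]
keywords = [
    {"product_id": 1, "keyword": "Book"},
    {"product_id": 1, "keyword": "Hardcover"},
    {"product_id": 1, "keyword": "American"},
]


def to_key_value(products, keywords):
    """Flip child->parent foreign keys into parent->children references."""
    store = {}
    for p in products:
        store[p["id"]] = {"title": p["title"], "keywords": []}
    for k in keywords:
        store[k["product_id"]]["keywords"].append(k["keyword"])
    return store


def to_relational(store):
    """Flip parent->children references back into child->parent keys."""
    prods = [{"id": key, "title": obj["title"]} for key, obj in store.items()]
    kws = [
        {"product_id": key, "keyword": kw}
        for key, obj in store.items()
        for kw in obj["keywords"]
    ]
    return prods, kws


kv = to_key_value(products, keywords)
print(kv)
# {1: {'title': 'The Right Stuff', 'keywords': ['Book', 'Hardcover', 'American']}}
```

The round trip through to_relational gives back the original two lists. No information is lost, and only the direction of the links changes.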
Microsoft Research rocks, but the product division usually sucks!
Jehovah be praised, Oracle was not selected
coSQLInjection (cSQLI)
Has a nice ring to it.
An inverse tachyon pulse would disperse the relational quantum silica into a focused warp field, thus purging all forms of slipstream space based SQL databases from subspace.
Resistance is futile. Your technological distinctiveness will be added to our own. You will become one with the morgue
...is that SQL sucks as a language. It's not terribly expressive, the ordering of arguments is inconsistent, and whoever designed the way JOIN works should be in jail.
Frankly, I'd like to see SQL die and get replaced with something more modern. We don't program in COBOL anymore, so why the hell are we still using SQL?
There's no -1 for "I don't get it."
No, but when you are working with objects, it is cumbersome to have to write queries into unchecked strings.
Nothing new in computer engineering since 1980. Prove me wrong.
Epic M$ hate bro. There's a new web board forming called "Slash-dot" and I hear they're going to open up a comment section. You should check it out, I think you'd do well there. BTW, do you think there's anything to this "Y2K" problem or what? Either way I'm going to "party like it's 1999"! :)
To my mind, SQL's biggest problem over the years has been really shitty implementations (and yeah, I'm looking at you, MySQL).
The world's burning. Moped Jesus spotted on I50. Details at 11.
No, but when you are working with objects, it is cumbersome to have to write queries into unchecked strings.
Then you're doing it wrong.
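For what it's worth, the usual answer to the "unchecked string" complaint, and to the coSQLInjection joke, is to keep values out of the SQL text entirely with bound parameters. Here is a minimal sketch with Python's built-in sqlite3 module and a hypothetical users table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])

hostile = "x' OR '1'='1"

# Unsafe: the value is spliced into the SQL text, so its quotes change the query.
unsafe_sql = "SELECT name FROM users WHERE name = '" + hostile + "'"
unsafe_rows = conn.execute(unsafe_sql).fetchall()

# Safe: the value travels separately as a bound parameter and is never parsed as SQL.
safe_rows = conn.execute("SELECT name FROM users WHERE name = ?", (hostile,)).fetchall()

print(len(unsafe_rows), len(safe_rows))  # 2 0
```

Bound parameters fix injection, but the query text itself is still an unchecked string. Compile-time checking of the query needs something like LINQ or a typed query builder.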
Truth is like the sun. You can shut it out for a time, but it ain't goin' away. - Elvis Presley (source: imdb.com)
I really like NoSQL. It's a great tool to use when deciding whether I should hire a given software developer, or whether I should move on to the next candidate. All I have to do is ask the person what he thinks about NoSQL. If he gives a positive response, I send him on his way. If he points out its many flaws, I'm often tempted to hire him instantly. After all, those who dislike NoSQL the most generally know how to write good SQL queries, and they know how to use relational databases properly. They're the kind of people I want to hire, even if the position doesn't involve databases much. It just goes to show that they care about quality, that they care about knowing how to use their technology well, and that they care about doing the job properly.
"Relational SQL and key-value NoSQL models are mathematically dual, and provides a monadic query language for what they have coined coSQL."
I'm glad somebody clarified that! (Time to RTFA.)
main() {1;}
There are only 2 types of languages:
- those people bitch about, and
- those no one ever used
An SQL statement walks into a bar. He sees 2 tables and asks "May I join you?"
When you have to support legacy data and applications any way you do it, you are doing it wrongly.
If I am running a NoSQL solution, it is because I need every bit of speed I can muster. Putting an additional layer on top of that does nothing to reach that goal.
Got Code?
I thought the whole reason why NoSQL is "better" than SQL is that it's based on column-based storage, while most SQL databases use row-based storage. Couldn't you make a column-based database that uses SQL as a query language? There is nothing wrong with SQL as a language; there are just some workloads where column-based storage is faster (mostly data analytics).
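The row-versus-column distinction is easy to show with plain Python lists and a hypothetical sales table. In a row layout, an aggregate over one column has to go through whole records. In a column layout it reads a single list. That is the intuition behind column stores being faster for analytics. Real engines add compression, vectorized execution and much more, so this sketch shows only the layout.

```python
# The same three-record table, laid out two ways.
rows = [
    {"region": "east", "product": "widget", "amount": 120},
    {"region": "west", "product": "gadget", "amount": 80},
    {"region": "east", "product": "gadget", "amount": 45},
]

columns = {
    "region": ["east", "west", "east"],
    "product": ["widget", "gadget", "gadget"],
    "amount": [120, 80, 45],
}


def total_row_store(rows):
    # A row store keeps (and reads from disk) each record as a unit,
    # even though only "amount" is needed here.
    return sum(record["amount"] for record in rows)


def total_column_store(columns):
    # A column store only has to read the "amount" column.
    return sum(columns["amount"])


print(total_row_store(rows), total_column_store(columns))  # 245 245
```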
+1
Maybe I'm just being a nit-picky coder here, but I don't get why they call it NoSQL when they are really referring to moving away from relational databases.
When I first heard of "NoSQL", I thought, "Great! SQL is a terrible syntax with all its six-letter words and easy, dangerous mistakes. I would love to have a superior syntax for interacting with the relational databases that are central to my work!" But "NoSQL" should be called "NoRelational." It is kind of strange that you are changing the whole paradigm of the database around and describing it as changing a superficial feature. It would be like calling email "no-pen" writing.
Democracy Now! - your daily, uncensored, corporate-free
See my sig. There are some parts of SQL that bug the snot out of me.
Every time I start to have faith in humanity, I ruin it by driving to work between 7 and 8 am.
Microsoft researchers Erik Meijer and Gavin Bierman ... present a mathematical model and standardized query language that could be used to unify SQL and NoSQL data models."
Maybe we should add...
"in an [impossible] exploratory effort to embrace, extend and extinguish..."
at the end of the sentence as they have done in the past.
TFA
CONCLUSION
The nascent noSQL market is extremely fragmented, with many competing vendors and technologies. Programming, deploying, and managing noSQL solutions requires specialized and low-level knowledge that does not easily carry over from one vendor's product to another.
Notes:
1. nothing to embrace in this phase... actually too many to embrace, thus not yet a "standard"
2. I can't help noting that all the examples are LINQ-based. Is this an attempt to grow LINQ into a "standard"?
Questions raise, answers kill. Raise questions to stay alive.
That's a very interesting article, and I'm going to have to look up the research and read it a lot more carefully. But I'm worried that a lot of their analysis just assumes too strongly that relational model = SQL.
For example, their claim that SQL is "not compositional." They define "compositionality" like this:
What we observe here is SQL's lack of compositionality—the ability arbitrarily to combine complex values from simpler values without falling outside the system.
Leaving aside that "compositional" is an odd word to use for this, the first problem here is that the relational model is in fact agnostic about the so-called "compositionality" of column value types. The relational model, strictly speaking, doesn't forbid you from having composite-typed columns.
Some proposed purely relational solutions to the problems tackled by outer joins allow non-base columns to have relations (i.e., sets) as their values. To put it in more SQL-like terms, you could have queries whose result sets had columns whose value was also a multi-row result set. This sort of thing solves the Figure 4 problem from TFA: you would have one row in the result, with Title="The Right Stuff" and Keywords={"Book", "Hardcover", "American"} (a set-valued Keywords column in the result). We can even sketch a SQL-like query for this (not actually valid SQL):
Or this, with a fictional "SET" aggregate function: (again, not actually valid SQL):
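The effect described above can be approximated in a real engine. The sketch below uses Python's sqlite3 module and hypothetical tables. The outer join yields one row per keyword. Grouping with SQLite's real group_concat aggregate folds them into a single row per title. group_concat stands in for the fictional SET aggregate and produces a delimited string rather than a true set, so Python then splits that string back into a set.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE products (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE keywords (product_id INTEGER REFERENCES products(id), keyword TEXT);
    INSERT INTO products VALUES (1, 'The Right Stuff');
    INSERT INTO keywords VALUES (1, 'Book'), (1, 'Hardcover'), (1, 'American');
    """
)

# Flat join: the title is repeated once per keyword (three rows).
flat = conn.execute(
    "SELECT p.title, k.keyword FROM products p "
    "LEFT JOIN keywords k ON k.product_id = p.id"
).fetchall()

# Grouped: one row per title; group_concat stands in for a set-valued column.
grouped = conn.execute(
    "SELECT p.title, group_concat(k.keyword, '|') FROM products p "
    "LEFT JOIN keywords k ON k.product_id = p.id GROUP BY p.title"
).fetchall()

# Turn the delimited string back into a Python set (a product with no
# keywords gets NULL from group_concat, hence the empty-set fallback).
nested = {title: set(kws.split("|")) if kws else set() for title, kws in grouped}
print(len(flat), nested)
```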
Are you adequate?
Could I pick your brain since you have a bit of NoSQL experience?
How does indexing work in NoSQL? Are there EXPLAIN-type tools available? (EXPLAIN in MySQL tells you whether your query is using indexes or table scans, and can help you understand why your query is slow.)
I'm pretty flexible with SQL. Can you do just about any query you could with SQL? ("Find all customers who have bought at least $100 of stuff over the last year, but who haven't bought anything this year.")
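For comparison, here is one way to write that example query in SQL, run through Python's sqlite3 module. It assumes a hypothetical orders table and reads "the last year" as the previous calendar year. The current year is passed in, so the result is reproducible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, order_date TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("ann", "2011-03-01", 70.0),
        ("ann", "2011-09-15", 40.0),   # ann: 110 last year, nothing this year
        ("bob", "2011-05-05", 150.0),
        ("bob", "2012-01-10", 20.0),   # bob: bought this year, so excluded
        ("cy", "2011-07-07", 60.0),    # cy: under the 100 threshold
    ],
)

# ISO-8601 date strings compare correctly as text.
QUERY = """
SELECT customer
FROM orders
WHERE order_date >= :last_start AND order_date < :this_start
GROUP BY customer
HAVING SUM(amount) >= 100
   AND customer NOT IN (SELECT customer FROM orders WHERE order_date >= :this_start)
"""


def lapsed_customers(conn, this_year):
    """Customers with >= 100 spent last calendar year and no orders this year."""
    params = {"last_start": f"{this_year - 1}-01-01", "this_start": f"{this_year}-01-01"}
    return [row[0] for row in conn.execute(QUERY, params)]


print(lapsed_customers(conn, 2012))  # ['ann']
```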
I'm not a lawyer, but I play one on the Internet. Blog
I'm thinking about wading into the noSQL waters. Help me out:
If authors aren't normalized, does that basically mean you don't have a separate datastore (table, whatever) for authors? E.g., a publisher might want to keep track of author name, address, etc.
Here's another classic example: country codes vs. country names: (ca, Canada), (us, United States).
If you want to be able to use both, you would (classically) store "ca" in your User table (for the country the user lives in), and then have a separate Countries table that tells you what "ca" stands for.
How do you approach that in NoSQL (assuming you want to make use of both codes and full country names)?
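There are two common answers, sketched here in Python with plain dicts standing in for a document store (all names hypothetical). You can copy the full name into each user document at write time, and accept that renaming a country means rewriting every copy. Or you can keep a small lookup collection and join in application code at read time.

```python
# A small lookup "collection", keyed by country code.
countries = {"ca": "Canada", "us": "United States"}

# Option 1: denormalize, storing both code and name in each user document.
user_denormalized = {"name": "Pat", "country": {"code": "ca", "name": countries["ca"]}}

# Option 2: store only the code and resolve the name when reading.
user_normalized = {"name": "Pat", "country_code": "ca"}


def country_name(user):
    """Application-side 'join' against the lookup collection."""
    return countries[user["country_code"]]


print(user_denormalized["country"]["name"], country_name(user_normalized))
# Canada Canada
```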
For people who have worked with NoSQL (assuming you've worked extensively with SQL before):
1. For someone wanting to either scratch an itch or come up with the Next Great Thing, would you recommend NoSQL-type solutions for the standard job of saving data coming in over the web, later retrieving it, possibly rejiggering and summarizing it, and feeding it back over the web when a user needs it?
2. Is NoSQL generally considered faster than SQL equivalents? At runtime or development time?
3. Is there a concept of DB design? Or is it just made up as you go? By doing additional .insert()'s?
I.e., you start off by .create(product). Then add fields: .addAttribute('name', 'Magic Rock') .addAttribute('manuf', 'Rock Emporium')
Then you add in detail for the manufacturer table: .addAttribute('name', 'Rock Emporium') .addAttribute('st', '123 Main St') .addAttribute('state', 'NY')
Is that how it works?
4. And can you change field names later (ALTER TABLE)?
5. What about aggregate functions (MAX, GROUP BY, HAVING)?
The whole thing seems awfully gooey.
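The .create()/.addAttribute() calls above are hypothetical, and real document stores differ a lot. Still, the general feel can be sketched with a toy in-memory store in Python, where every name is made up for the example. Documents are dicts with no declared schema. "Renaming a field" is a pass over every document rather than an ALTER TABLE. A MAX ... GROUP BY style aggregate is something you write yourself, or hand to whatever aggregation feature the real store offers.

```python
from collections import defaultdict


class ToyDocumentStore:
    """Hypothetical schemaless store: a collection is just a list of dicts."""

    def __init__(self):
        self.collections = defaultdict(list)

    def insert(self, collection, doc):
        self.collections[collection].append(dict(doc))

    def rename_field(self, collection, old, new):
        # No ALTER TABLE: every document that has the field is rewritten.
        for doc in self.collections[collection]:
            if old in doc:
                doc[new] = doc.pop(old)

    def group_max(self, collection, key, field):
        # Roughly SELECT key, MAX(field) ... GROUP BY key.
        result = {}
        for doc in self.collections[collection]:
            if key in doc and field in doc:
                k = doc[key]
                result[k] = max(result.get(k, doc[field]), doc[field])
        return result


store = ToyDocumentStore()
store.insert("products", {"name": "Magic Rock", "manuf": "Rock Emporium", "price": 5})
store.insert("products", {"name": "Pet Rock", "manuf": "Rock Emporium", "price": 3})
store.insert("manufacturers", {"name": "Rock Emporium", "st": "123 Main St", "state": "NY"})

store.rename_field("manufacturers", "st", "street")
print(store.group_max("products", "manuf", "price"))  # {'Rock Emporium': 5}
```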
WTB Signed tie-dye
From what I have learned about the uses for and abilities of NoSQL, it's a compromise you make when affordable scalability is required to stay in business. It is nowhere near as powerful as the RDBMS/SQL combination; however, it is much cheaper to run. Don't believe anyone who tries to tell you there are things you can do with NoSQL that you can't do with SQL. That is complete bunk. Maybe it makes speed cheaper and scaling easier, but those decisions should be forced by application demand and budget constraints, not application design. I am most interested in NoSQL as a way to store denormalized data in a pre-cache for light-write, heavy-read applications. Any other use would probably be due to desperation to scale to keep up with demand.
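Here is a minimal sketch of that pre-cache idea. It assumes a normalized source of truth (plain dicts here) and a key-value cache that holds one ready-made, denormalized document per author. Writes are rare, so each write rebuilds the affected cache entry, and reads never join. Names and data are illustrative only.

```python
# Normalized source of truth (stand-ins for relational tables).
authors = {1: {"name": "Tom Wolfe"}}
books = {10: {"author_id": 1, "title": "The Right Stuff"}}

# Key-value pre-cache: one denormalized document per author.
cache = {}


def rebuild_author(author_id):
    """Join once at write time and store the result for readers."""
    cache[f"author:{author_id}"] = {
        "name": authors[author_id]["name"],
        "titles": sorted(b["title"] for b in books.values() if b["author_id"] == author_id),
    }


def add_book(book_id, author_id, title):
    # Light write path: update the source of truth, then refresh the cache entry.
    books[book_id] = {"author_id": author_id, "title": title}
    rebuild_author(author_id)


def read_author(author_id):
    # Heavy read path: a single key lookup, no joins.
    return cache[f"author:{author_id}"]


rebuild_author(1)
add_book(11, 1, "The Bonfire of the Vanities")
print(read_author(1)["titles"])
# ['The Bonfire of the Vanities', 'The Right Stuff']
```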
Having a bookmark to Google does not make you an expert on everything.
And I'm still not convinced that it wouldn't work better in SQL. There's absolutely no reason you couldn't make relational tables like you have to use NoSQL.
NoSQL seems just an argument in laziness. It seems like a weird mix of caching and unwillingness to design databases correctly.
But I got sick of that argument, so I just went with the idea that you need NoSQL for massive amounts of data...
If corporations are people, aren't stockholders guilty of slavery?
A "Reporting Database" huh.
Try "Data Warehouse"... Sounds like classic ETL to me.
dnuof eruc rof aixelsid
OLTP vs DSS. Yep. Normalize and De-Normalize based on purpose and performance. NoSQL is just another tool in the toolbox. If there were one single magic tool, they wouldn't keep inventing new ones.
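The "normalize for OLTP, denormalize for reporting" pattern boils down to an ETL step. Here is a compressed sketch with Python's sqlite3 module and hypothetical tables. It extracts from the normalized tables, transforms them by joining and aggregating, and loads the result into a flat reporting table that reports can scan without joins.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Normalized OLTP side.
    CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'east'), (2, 'west');
    INSERT INTO orders VALUES (1, 1, 100.0), (2, 1, 50.0), (3, 2, 75.0);

    -- Denormalized reporting side.
    CREATE TABLE sales_by_region (region TEXT PRIMARY KEY, order_count INTEGER, total REAL);
    """
)


def run_etl(conn):
    """Extract + transform with one join/aggregate, then load into the report table."""
    with conn:  # one transaction: the report is replaced atomically
        conn.execute("DELETE FROM sales_by_region")
        conn.execute(
            """
            INSERT INTO sales_by_region (region, order_count, total)
            SELECT c.region, COUNT(*), SUM(o.amount)
            FROM orders o JOIN customers c ON c.id = o.customer_id
            GROUP BY c.region
            """
        )


run_etl(conn)
print(conn.execute("SELECT * FROM sales_by_region ORDER BY region").fetchall())
# [('east', 2, 150.0), ('west', 1, 75.0)]
```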
Why? MongoDB is web scale, we don't need anything else!
"16MB (fuck off, MiB fascists)" - The Mighty Buzzard
Modern commercial RDBMSes have extremely complex intelligence to manage execution cost, including self-optimization cost, concurrency, partitioning, escalation, versioning, distributions, index selection, data caches, auto-parallelization, etc.
When I see people talking about alternatives, be it object stores, key/value, logs, etc., I ask: where is the intelligence? Where are the billions spent on R&D in these new systems? They all appear to be dumb imitations that always ask you to sacrifice something, be it consistency, concurrency, model restrictions, or a human to exactly define semantics or access patterns to enable a specific solution.
Is there really a way to create a new **general purpose** data system at least as powerful and useful as the RDBMS without spending on the order of a billion dollars on optimizer design?
For some applications, CSV flat files run circles around the RDBMS... In the general case, flat files suck.
I believe the same applies across the board. Yes, given a specific problem you can provide a specific solution that is faster, better, and cheaper; however, the shortcut would not enjoy general applicability.
Reading the introduction is kind of bizarre. Apparently the motivation for the work is to reduce the NoSQL market to a few very profitable suppliers.
*reads up*
Ah. This is the story of how every now and then the kids rediscover DBMs, that's it.
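For anyone who hasn't met them, the DBM family is the old Unix key-value file store, and Python still ships an interface to it in the standard library's dbm module. Here is a short sketch that uses a throwaway temp directory:

```python
import dbm
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "kv")

# 'c' opens the database for reading and writing, creating it if needed.
with dbm.open(path, "c") as db:
    db["user:1"] = "alice"  # str keys and values are stored as bytes
    db["user:2"] = "bob"

with dbm.open(path, "r") as db:
    name = db["user:1"]
    keys = sorted(db.keys())

print(name, keys)  # b'alice' [b'user:1', b'user:2']
```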
Religion is what happens when nature strikes and groupthink goes wrong.
You can use SQL for everything, or you can use the best tool for the job. Usually that's SQL, but the other choices aren't just for big data: graph databases like Neo4j allow you to efficiently query deeply, arbitrarily linked data in a way that just can't be done with SQL.
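What graph databases optimize is following links to an arbitrary depth. The query has roughly the shape of a breadth-first search over adjacency lists, sketched here in plain Python with a hypothetical "knows" graph. A graph store keeps each node's neighbours directly reachable, so every hop is roughly a lookup rather than another join.

```python
from collections import deque

# Hypothetical "knows" relationships as adjacency lists.
knows = {
    "ann": ["bob"],
    "bob": ["cy", "dee"],
    "cy": ["eve"],
    "dee": [],
    "eve": ["ann"],  # cycles are fine; visited nodes are skipped
}


def reachable(graph, start, max_depth=None):
    """Return {node: hops} for everything reachable from start (BFS)."""
    depth = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if max_depth is not None and depth[node] == max_depth:
            continue  # do not expand past the depth limit
        for neighbour in graph.get(node, []):
            if neighbour not in depth:
                depth[neighbour] = depth[node] + 1
                queue.append(neighbour)
    return depth


print(reachable(knows, "ann"))
# {'ann': 0, 'bob': 1, 'cy': 2, 'dee': 2, 'eve': 3}
```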
And then there's the triple/quad-stores like Jena, 4store, and Virtuoso, which allow you to answer questions from your data that you couldn't even begin to imagine doing with SQL.
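Triple stores keep everything as (subject, predicate, object) statements and answer queries by matching patterns that contain variables, much as SPARQL does. The toy matcher below is plain Python with made-up data and "?"-prefixed variables. It is not the Jena or Virtuoso API, only the idea behind it.

```python
triples = [
    ("alice", "worksFor", "acme"),
    ("bob", "worksFor", "acme"),
    ("acme", "locatedIn", "berlin"),
    ("alice", "knows", "bob"),
]


def match(pattern, triple, binding):
    """Try to extend binding so that pattern matches triple; None on failure."""
    binding = dict(binding)
    for p, t in zip(pattern, triple):
        if p.startswith("?"):
            # A variable matches anything, but must agree with earlier bindings.
            if binding.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return binding


def query(patterns, triples):
    """Return every variable binding that satisfies all patterns."""
    bindings = [{}]
    for pattern in patterns:
        new_bindings = []
        for b in bindings:
            for t in triples:
                extended = match(pattern, t, b)
                if extended is not None:
                    new_bindings.append(extended)
        bindings = new_bindings
    return bindings


# Who works for a company located in Berlin?
result = query([("?person", "worksFor", "?org"), ("?org", "locatedIn", "berlin")], triples)
print(sorted(b["?person"] for b in result))  # ['alice', 'bob']
```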
So although I think these graph and "triple" DBs are the way of the future, it's an extremely crazy thing to ditch SQL for your core business data. But it's also crazy to dismiss these non-SQL technologies simply because they aren't SQL - there is a lot of potential in having your data in one or more of these "alternatives", similar to the value you get from wiring up Solr/Lucene for full-text search.
We've been doing a project with MongoDB (no, we're not using it as our authoritative data store, but neither are we using SQL for that), which we could have done with an SQL DB, but honestly we didn't have the resources to do so. We have less than a million records, hardly a huge amount of data - our data is arbitrarily structured, fitting the document-object model perfectly, and makes for absolutely meaningless SQL tables (with many joins, stored procedures so that we could try to service an obscure perlish query language).
Additionally, the ability for us to delegate some parts of the more complex queries to JS on the server(s) is incredibly useful. And the integration cost was very low: the perl driver is officially supported by 10gen, it fit quite naturally into the project... the proof of concept was done in just a couple of weeks.
Our contractor has, however, discovered some irony with MongoDB's "schemaless" claim: unless you can pick an extremely finite set of fields to be indexed, you can't actually do arbitrary queries on arbitrary documents... in other words you need to decide on a schema :-)
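The contractor's point can be shown with a toy Python collection (hypothetical data and helper names). A lookup on a field you chose to index is a dictionary hit. A query on any other field has to scan every document. Choosing which fields to index is already a schema decision.

```python
from collections import defaultdict

docs = [
    {"_id": 1, "type": "article", "author": "kim"},
    {"_id": 2, "type": "photo", "camera": "x100"},
    {"_id": 3, "type": "article", "author": "lee", "tags": ["db"]},
]

INDEXED_FIELDS = ("type",)  # the de facto schema: fields we promise to query on

indexes = {field: defaultdict(list) for field in INDEXED_FIELDS}
for doc in docs:
    for field in INDEXED_FIELDS:
        if field in doc:
            indexes[field][doc[field]].append(doc)


def find(field, value):
    """Return (matching docs, number of docs examined)."""
    if field in indexes:
        hits = indexes[field].get(value, [])
        return hits, len(hits)
    # No index: fall back to a full collection scan.
    return [d for d in docs if d.get(field) == value], len(docs)


articles, examined_indexed = find("type", "article")
by_author, examined_scan = find("author", "lee")
print(len(articles), examined_indexed, len(by_author), examined_scan)  # 2 2 1 3
```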
"here, here" "hear, hear"
The first is an outing of belittling sarcasm, whilst the second is more like twisting your mustache approvingly but loudly.
Seeing as this is /., chances are he was posting sarcasm.
Hivemind harvest in progress..
I'm also loving this thread. My solution: stored methods.
Inserting data should (only!) be done via a method that sits inside the database. This method also writes a crosstable matching Client_Id with LatestTransaction_Id. Voilà. For any existing data it's just a one-time batch conversion. Doesn't get any faster. Also, NOTHING is stored twice, and with proper stored methods the chance of this crosstable getting out of sync is zero.
If you want data, store data. If you want information, extract / combine it from data.
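SQLite has no stored procedures, so this sketch (Python's sqlite3 module, hypothetical table names) uses a trigger to get the same effect. Every insert into transactions also upserts the client's row in the crosstable, so "latest transaction per client" becomes a primary-key lookup. "Latest" here means most recently inserted. In a server RDBMS, the stored-procedure approach described above would do this inside the insert procedure instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE transactions (
        id INTEGER PRIMARY KEY,
        client_id INTEGER NOT NULL,
        amount REAL NOT NULL
    );
    CREATE TABLE latest_transaction (
        client_id INTEGER PRIMARY KEY,
        transaction_id INTEGER NOT NULL REFERENCES transactions(id)
    );
    -- Maintained by the database itself, not by application code.
    CREATE TRIGGER track_latest AFTER INSERT ON transactions
    BEGIN
        INSERT OR REPLACE INTO latest_transaction (client_id, transaction_id)
        VALUES (NEW.client_id, NEW.id);
    END;
    """
)

for client_id, amount in [(7, 10.0), (8, 5.0), (7, 12.5)]:
    conn.execute("INSERT INTO transactions (client_id, amount) VALUES (?, ?)", (client_id, amount))

# One-time batch conversion for rows that existed before the trigger
# (harmless to rerun: it recomputes the same answer).
conn.execute(
    "INSERT OR REPLACE INTO latest_transaction (client_id, transaction_id) "
    "SELECT client_id, MAX(id) FROM transactions GROUP BY client_id"
)

print(conn.execute("SELECT * FROM latest_transaction ORDER BY client_id").fetchall())
# [(7, 3), (8, 2)]
```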
Your sig has nothing to do with SQL. That alone tells me more about your skill and understanding than any other opinion you might hold.
2. I can't help noting that all the examples are LINQ-based. Is this an attempt to grow LINQ into a "standard"?
No, it's an attempt to promote understanding and usage of monads. LINQ is arguably the most widely used implementation of monads, it's just that many people don't realize it.
Brian Beckman's Don't fear the Monads
An excellent article explaining how LINQ is extensible to work with any monad
A video by Erik Meijer explaining the duality of IEnumerable/IObservable and IQueryable/IQbservable, as stated in the original article
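To make the LINQ/monad connection concrete: LINQ's SelectMany plays the role of monadic bind, and a query with two from clauses desugars into it. Here is the same shape for Python lists. The bind helper is a hypothetical stand-in for SelectMany, and the orders data is made up.

```python
def bind(xs, f):
    """List-monad bind (LINQ's SelectMany): apply f to each element and flatten."""
    return [y for x in xs for y in f(x)]


def unit(x):
    """Wrap a single value (the monad's 'return')."""
    return [x]


customers = {"ann": ["o1", "o2"], "bob": ["o3"]}

# Query-comprehension style, like: from c in customers from o in orders(c) select (c, o)
comprehension = [(c, o) for c in customers for o in customers[c]]

# The same query written with explicit binds.
desugared = bind(list(customers), lambda c: bind(customers[c], lambda o: unit((c, o))))

print(comprehension == desugared, desugared)
# True [('ann', 'o1'), ('ann', 'o2'), ('bob', 'o3')]
```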
All my liberal friends think I'm a conservative, all my conservative friends think I'm a liberal.
You have a strange definition of "perfect".
In the late 1970s IBM created a relational query language called BS-12 (Business System 12) that looks to be more flexible than SQL. Perhaps IBM felt it was too "mathy" to sell to executives and went with SQL instead.
There is also the "Tutorial D" family of query languages based around Chris Date's popular textbook. I personally don't like its syntax structure and don't think it fits well with dynamic typing, and thus created my own draft relational query language called SMEQL (LOR fans will like the name). It borrows from BS-12 and functional programming.
Here's an example that returns the top 6 earners in each department
And a brief guide to the primary operators:
* calc(table, columnTable) // similar to SELECT clause in SQL
* filter(table, expression) // similar to WHERE clause in SQL
* group(table, columnTable) // roughly similar to GROUP BY in SQL
* join(table_1, table_2, expression)
* leftJoin(table_1, table_2, expression)
* orderBy(table, columnTable, [sequenceColumn]) // sorts or produces sequence numbers
* union(table_1, table_2)
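The operator style can be imitated in ordinary code. Below, calc, filter, group and orderBy are modelled as plain Python functions over lists of dicts. The signatures are hypothetical and simplified: they take Python callables instead of SMEQL expressions and column tables. The functions are then composed to get the top N earners per department. This is neither SMEQL syntax nor SMEQL semantics, only the pipeline-of-table-functions idea.

```python
from itertools import groupby


def filter_(table, predicate):
    """Like WHERE: keep rows where predicate(row) is true (underscore avoids the builtin)."""
    return [row for row in table if predicate(row)]


def calc(table, columns):
    """Like SELECT: build each output row from {name: function(row)}."""
    return [{name: fn(row) for name, fn in columns.items()} for row in table]


def order_by(table, key, sequence_column=None, reverse=False):
    """Sort rows; optionally number them 1..n in sequence_column."""
    rows = sorted(table, key=key, reverse=reverse)
    if sequence_column:
        rows = [dict(row, **{sequence_column: i}) for i, row in enumerate(rows, 1)]
    return rows


def group(table, key):
    """Roughly GROUP BY: split rows into lists sharing key(row)."""
    ordered = sorted(table, key=key)
    return [list(rows) for _, rows in groupby(ordered, key=key)]


def top_earners(employees, n=6):
    """Top n earners per department, composed from the operators above."""
    ranked = []
    for dept_rows in group(employees, key=lambda r: r["dept"]):
        ranked += order_by(dept_rows, key=lambda r: r["salary"], sequence_column="rank", reverse=True)
    kept = filter_(ranked, lambda r: r["rank"] <= n)
    return calc(kept, {"dept": lambda r: r["dept"], "name": lambda r: r["name"], "salary": lambda r: r["salary"]})


staff = [
    {"dept": "ops", "name": "a", "salary": 50},
    {"dept": "ops", "name": "b", "salary": 70},
    {"dept": "ops", "name": "c", "salary": 60},
    {"dept": "dev", "name": "d", "salary": 90},
]
print(top_earners(staff, n=2))
```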
Table-ized A.I.
And then there's the triple/quad-stores like Jena, 4store, and Virtuoso, which allow you to answer questions from your data that you couldn't even begin to imagine doing with SQL.
Here we're running into the stupidity of the name NoSQL. What promoters normally mean by it is 'non-relational'.
From what I understand, the databases you just listed are relational. Triplestores just use a simpler query language, and a simpler database layout, allowing for much greater speed.
Our contractor has, however, discovered some irony with MongoDB's "schemaless" claim: unless you can pick an extremely finite set of fields to be indexed, you can't actually do arbitrary queries on arbitrary documents... in other words you need to decide on a schema :-)
Document stores are great for storing documents and indexing metadata and the entire content. They are not great at indexing random parts of it.
I have a feeling if you actually got that indexed, you'd be at the point where the solution would be heavier than just using SQL.
We've been doing a project with MongoDB (no, we're not using it as our authoritative data store, but neither are we using SQL for that), which we could have done with an SQL DB, but honestly we didn't have the resources to do so. We have less than a million records, hardly a huge amount of data - our data is arbitrarily structured, fitting the document-object model perfectly, and makes for absolutely meaningless SQL tables (with many joins, stored procedures so that we could try to service an obscure perlish query language).
See, the thing is, I don't think you know what's going on. No one, at least not me, has a problem with people using a document store to store an insane amount of unstructured documents and trying to pull some sense out of them. That's what they're for.
But there are people out there who have decided to use a document store because NoSQL is 'better' than SQL, or because they're just lazy, so they lump their data together and throw it in one. Or they decide that their user database should be kept in HBase. And then people like me have to deal with the results of that shitpile.
No one here is against non-relational databases; we're against the idiotic 'NoSQL movement' that has randomly sprung up, where fools decided to use something besides perfectly functional relational databases.
People switch to non-relational databases when what they're trying to do doesn't work well in a relational one, and that's fine. Sadly, we have a lot of people out there with no database knowledge who have decided to switch because NoSQL is 'newer and better', or because they simply don't understand how to build relational databases and so throw lumps of data in an object store.
Although OO is not inherently hierarchical, OO designers seem to think in terms of hierarchies first, and this is often a mistake in my opinion. The most flexible systems use many-to-many relationships, not hierarchies. And, OO does not natively handle many-to-many very well.
The key-words in the examples are an instance of many-to-many relationships. In this case they are a simple "list" of words, but if we wanted to have a better key-word "manager", then key-words would have their own definition table, and each key-word would have a value column and description column. This allows one to change or improve the description without having to change all the references to it. It also allows the use of referential integrity to make sure the keywords are spelled correctly, etc.
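The keyword "manager" described above maps onto a junction table with foreign keys. Here is a sketch with Python's sqlite3 module and hypothetical table names. SQLite only enforces foreign keys after PRAGMA foreign_keys = ON, and that setting is what makes the misspelled keyword below get rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.executescript(
    """
    CREATE TABLE products (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE keywords (value TEXT PRIMARY KEY, description TEXT);
    -- Junction table: many products to many keywords.
    CREATE TABLE product_keywords (
        product_id INTEGER NOT NULL REFERENCES products(id),
        keyword TEXT NOT NULL REFERENCES keywords(value),
        PRIMARY KEY (product_id, keyword)
    );
    INSERT INTO products VALUES (1, 'The Right Stuff');
    INSERT INTO keywords VALUES ('Book', 'Printed book'), ('Hardcover', 'Hard binding');
    INSERT INTO product_keywords VALUES (1, 'Book'), (1, 'Hardcover');
    """
)

try:
    conn.execute("INSERT INTO product_keywords VALUES (1, 'Hardcovr')")  # typo
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# Improving a description touches one row; every reference picks it up via the join.
conn.execute("UPDATE keywords SET description = 'Hardback binding' WHERE value = 'Hardcover'")
rows = conn.execute(
    "SELECT k.value, k.description FROM product_keywords pk "
    "JOIN keywords k ON k.value = pk.keyword WHERE pk.product_id = 1 ORDER BY k.value"
).fetchall()
print(rejected, rows)
# True [('Book', 'Printed book'), ('Hardcover', 'Hardback binding')]
```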
Composite value types violate 1NF (as defined by some people in the field), since they are isomorphic to relations.
Yeah, but you wouldn't have them in the base relvars of your schema. You'd just allow queries and views to have set-valued attributes.
Citation needed
The Tao of math: The numbers you can count are not the real numbers.