Slashdot Mirror


SQL and NoSQL are Two Sides of the Same Coin

An anonymous reader writes "NoSQL databases have become a hot topic with their promise to solve the problem of distilling valuable information and business insight from big data in a scalable and programmer-friendly way. Microsoft researchers Erik Meijer and Gavin Bierman ... present a mathematical model and standardized query language that could be used to unify SQL and NoSQL data models." Unify is not quite correct; the article shows that relational SQL and key-value NoSQL models are mathematically dual, and provides a monadic query language for what they have coined coSQL.

35 of 259 comments (clear)

  1. microsoft research rocks by JonySuede · · Score: 2

    microsoft research rocks but the product division usually sucks !

    --
    Jehovah be praised, Oracle was not selected
    1. Re:microsoft research rocks by Draek · · Score: 2

      No, just their software. Their mice and keyboards are second to none, and I know more than a few gamers that wouldn't touch an X360 with a ten-foot pole but still keep one of its controllers plugged in their PCs.

      --
      No problem is insoluble in all conceivable circumstances.
  2. You forgot the inverse tachyon pulse by 0p7imu5_P2im3 · · Score: 4, Insightful

    An inverse tachyon pulse would disperse the relational quantum silica into a focused warp field, thus purging all forms of slipstream space based SQL databases from subspace.

    --
    Resistance is futile. Your technological distinctiveness will be added to our own. You will become one with the morgue
  3. The real reason people like noSQL... by MrEricSir · · Score: 3, Insightful

    ...is that SQL sucks as a language. It's not terribly expressive, the ordering of arguments is inconsistent, and whoever designed the way JOIN works should be in jail.

    Frankly, I'd like to see SQL die and get replaced with something more modern. We don't program in Cobol anymore, so why the hell are we still using SQL?

    --
    There's no -1 for "I don't get it."
    1. Re:The real reason people like noSQL... by Anonymous Coward · · Score: 4, Insightful

      so why the hell are we still using SQL?

      Why are we still using C? Why are we still using HTML? Why are we still using FORTRAN (in the scientific community)? Same reason.

      Might add that all these - C, HTML and FORTRAN - are still being updated, with new standards. So is SQL. It's really the same thing, and they all stick around for the same reasons, too.

    2. Re:The real reason people like noSQL... by sexconker · · Score: 5, Insightful

      We DO still program in COBOL.
      And we DO still use SQL.
      And we do so because it works.

      Not only does SQL work, it is the best at what it does.

      The only people who hate on SQL are the people who don't understand databases.
      Generally, these are the same people who like labels, tag clouds and ruby on rails.
      They produce a lot of high level hand waving with regards to the actual code and endless amounts of "herp derp I dunno" when asked why their shit performs slower than the 10 year old system it's supposed to replace. These are bad people.

      What really pisses me off is that everyone fucking agreed with me until Android came out, then suddenly Java was cool, the performance was considered "good", and the quality of code and coders that it tends to bring about is now the acceptable norm.

    3. Re:The real reason people like noSQL... by garyebickford · · Score: 5, Insightful

      Actually COBOL predates SQL by about 10 years. AFAIK nobody has written a language that implements the relational query model to replace SQL. And (though I have never written anything in COBOL he says thankfully) COBOL has its place even today. I would not be surprised if there are as many lines of COBOL still running in enterprises everywhere as there are of PHP or Perl in those same enterprises.

      And COBOL even now is without question a better solution for business and application programming than C ever was or ever will be. (Of course it's arguable that there are other languages better for those tasks than COBOL as well.) C is good for device drivers, kernels and as a target for interpreted and scripting languages with compiled code generators. C is, as Kernighan, Ritchie or Thompson (I forget which) said, "a structured PDP-11 Macro-assembler". Today (putting my Nomex suit on...) IMHO application programmers should not be wasting their time coping with segfaults and compile-link cycles. Their time is worth more than the machine time that any cycle-saving difference. :)

      --
      It's easier to be a result of the past, but more fun to be a cause of the future! http://www.spacefinancegroup.com/
    4. Re:The real reason people like noSQL... by O(+inf) · · Score: 2

      What's wrong with the way (ANSI) JOIN works? It's practically right out of relational algebra. As an aside, the term "NoSQL" has very little to do with SQL-as-language, and really is about relational vs other. Some "NoSQL" solutions provide SQL as a query language for their datasets, and there are some relational databases out there that don't use SQL as a query language, but would not count as "NoSQL".

    5. Re:The real reason people like noSQL... by frank_adrian314159 · · Score: 2

      The people who think that SQL sucks don't usually understand it. The first hurdle is that you need to think about it as a set-oriented language. You are dealing with operations that work on sets. You select a subset of all objects in the table, filter for only the columns that you want, modify them via update, insert new items, join two sets together, etc. If you come looking at it as a procedural replacement, you are starting out screwed.

      Next, like all languages, the pure relational model has grown warts (some significant) over the years. If you're willing to consider Java or Ruby as decent languages (and there are a lot of flaws at the language level in both), you shouldn't hate on SQL for this kind of thing.

      Finally, there are things that should have been in SQL that were never actually standardized - things like schema migration, for example. Yes, the language does a crappy job at them and in ways that are incompatible between implementations. But, when you look at todays object-oriented languages, which have no standardized way to migrate objects from one version to the next (well, except for CLOS-respecting Common Lisp implementations), I don't see these sorts of things as a shortcoming - languages get standardized when they get standardized.

      So, yes, it's a crappy language - just like all the rest out there. But at least the language core is built on a formal model which can be followed to make databases work together better. It's longevity is not just because Microsoft or Oracle said to use it. It's because, when you get right down to it, it's a pretty reasonable language for what it was meant to do. Just like FORTRAN, COBOL, C, and many others. And it still has a better syntax than Perl. What's not to love?

      --
      That is all.
    6. Re:The real reason people like noSQL... by DavidTC · · Score: 4, Insightful

      There are a few things with SQL that could be done better, and there's still some standardization needed. I'd like to see a SQL 2012 standard or something.

      But you're entirely right. There is one place nosql makes sense, and it's gigantic data stores like facebook and google, where the quantity is overwhelming, and the quality isn't that important...it's okay miss a few things, and you're looking for sorta-random stuff. That is why NoSQL was invented.

      No one should ever 'choose' to use NoSQL...if you're on the size of a project that needs NoSQL, I promise you you are nowhere near that decision...it will be decided between the seven project architects as they buy thirty servers to run the damn thing. That's the guys who have a legit need, or at least it's a legit option, of using NoSQL.

      It's not useful for any other system in existence.

      It's especially funny when toy projects try to use NoSQL. It's like idiots trying to run their watches off geothermal power. 'It's free power! FROM THE EARTH!'

      Dude, you're using half an amp, perhaps you should learn how to use a watch battery instead of driving 2 mile polls into the ground as you walk around. It's not like SQL is fucking rocket science. In fact, right now, NoSQL is actually more complicated to use.

      --
      If corporations are people, aren't stockholders guilty of slavery?
    7. Re:The real reason people like noSQL... by DeadDecoy · · Score: 2

      I've been working with mongo noSQL for a little while now, and it's nice because it's fast and you don't have to de-normalize data that should not have been normalized in the first place. E.g. article{ title, authors, keywords } instead of having a separate table for authors and keywords. This probably attributes to the faster speeds in those databases too. It's not so much that SQL sucks or JOINs are evil, rather that they fit a different use case very well. The argument is somewhat similar to: why should I use a flathead screwdriver on cross-head screws when a phillip's screwdriver is more effective?

      I think people who get entangled into the arguments of which flavor of database is better, are often leaning towards what they're familiar with as opposed to does the tool match the problem. NoSQL is relatively new and seems complementary to SQL. If there's a framework (maybe coSQL) that combines the benefits and reduces the weaknesses of both, that seems like a welcome change.

    8. Re:The real reason people like noSQL... by DogDude · · Score: 3, Insightful

      It's not painful. It's just different than what web developers doing "select *" are used to. As a system, it works well for tiny projects, all the way to the largest databases in the world. In the world of "develop it now, deal with problems later", people just can't be bothered to learning the right way to do something.

      --
      I don't respond to AC's.
    9. Re:The real reason people like noSQL... by lakeland · · Score: 2, Informative

      And we DO still use SQL. And we do so because it works

      I disagree.

      There are some things which are fast for a computer to do but are slow and awkward to do in SQL. You can see quite a few of them in SAS supported by data steps (e.g. decent RBAR support). Another is say I want to get the latest transactions for each customer, I have to do something like

      select customer_id,max(txn_datetime) latest_txn_datetime
      from fact_txn
      group by customer_id

      -- I've now got the latest transaction time but because system reversals and the like are often set to happen at the same time as the transaction I have to next...

      select customer_id,max(txn_key) latest_txn_key
      from fact_txn
      join (select customer_id,max(txn_datetime) latest_txn_datetime
      from fact_txn
      group by customer_id) latest_txn
      on (fact_txn.customer_id = latest_txn.customer_id)
      where fact_txn.txn_datetime = latest_txn.latest_txn_datetime
      group by fact_txn.customer_id

      -- Now we've got the right transaction key, but we still have to join again to get the actual transactions

      select fact_txn2.*
      from fact_txn fact_txn2
      where exists
          select 1 from
      (select customer_id,max(txn_key) latest_txn_key
      from fact_txn
      join (select customer_id,max(txn_datetime) latest_txn_datetime
      from fact_txn
      group by customer_id) latest_txn
      on (fact_txn.customer_id = latest_txn.customer_id)
      where fact_txn.txn_datetime = latest_txn.latest_txn_datetime
      group by fact_txn.customer_id) txn_keys_of_interest
      where fact_txn2.txn_key = txn_keys_of_interest.latest_txn_key;

      I can tidy this up a bit if needed - using CTEs or whatnot but the simple fact remains that a lot of the time I want to write a one liner like:

      select * from fact_txn having txn_key = max(txn_key) over (PARTITION by customer_id order by txn_datetime)

    10. Re:The real reason people like noSQL... by Jonner · · Score: 3, Interesting

      SQL definitely sucks as a language. However, the relational model it was intended to expose does not. We need languages that more fully and naturally expose the relational model.

    11. Re:The real reason people like noSQL... by Sarten-X · · Score: 2

      And yet, one of my toy projects in grad school generated over a billion data points that needed analysis. The only database server I had access to choked on the first analysis pass (of about a dozen). It took about three months of processing to get to a point that could even be considered acceptable. Since then, on a whim, I've redone the program using Hadoop and HBase. A MapReduce job completed the analysis, on worse hardware, in less than a day. A major contributing factor was the lack of a rigid structure in HBase, which greatly improved the organization of data.

      NoSQL solutions are certainly not the best tool for every job, but neither are normal relational databases. In my opinion, it's worthwhile to be familiar with both, and to be able to choose the right one for every task.

      --
      You do not have a moral or legal right to do absolutely anything you want.
    12. Re:The real reason people like noSQL... by angel'o'sphere · · Score: 2

      Lol,

      that was water on the wheels of the other posters you agree: most people have no clue about programming and SQL.

      If one of your main use cases is to find the "latest transaction" of your customer, then adapt your database to support that use case.

      Sorry, your example makes no real sense at all.

      angel'o'sphere

      --
      Cost free eBook I read (by iBook/Kobo/Amazon/ObookO/Gutenberg etc.): "The Green Odyssey" by Philip Jose Farmer.
    13. Re:The real reason people like noSQL... by Jonner · · Score: 3, Interesting

      Neither SQL nor its original incarnation SEQUEL was the first language based on the relational data model. There are also more recent relational languages, such as Tutorial D, though none has gained much popularity and few people know they exist, even in the database management world. We badly need a replacement for SQL that is more flexible and more fully implements the relational model.

    14. Re:The real reason people like noSQL... by Jonner · · Score: 2

      Just because the relational model SQL is supposed to implement is great doesn't mean we don't need a better replacement for SQL. There are alternatives, but they're so obscure, immature, or incompatible with other commonly used tools that hardly anyone ever considers them.

      I use PostgreSQL at work and I'm constantly expanding what I can do with SQL, both as a result of learning and new features that are being added. However, I constantly run into its limitations and extreme ugliness. Even though I work with it every day, I still commonly look up basic syntax because it's so irregular. I'd love to be able to use a better language without giving up all the excellent properties of Postgres, but implementing that would clearly be a huge job and I'm not aware of anyone attempting that.

    15. Re:The real reason people like noSQL... by sapgau · · Score: 2

      You are comparing a query from a transactional system vs. a dimensional one.

      To achieve this you have to extract your transactional data and build the cubes needed for the query. Yes it will take a few hours every night to build the cube but your dimensional queries will be fast, efficient and wont bog down the transactional databases.

    16. Re:The real reason people like noSQL... by geekpowa · · Score: 2

      Actually, the example makes perfect sense and it is a problem I've encountered regularly and one of my many bug bears with SQL.

      The generalized problem is 'find last txn' for every customer before date X.

      Now if you can access the btree directly, you can efficiently walk it and get this data. But with SQL you cannot efficently get it. PostgreSQL windowing functions make getting such data simple in terms of constructing the SQL (once you get your head around windowing functions), but these queries walk large sections of the dataset and are not optimized fully.

      The parents actual example is 'find latest txn'. This can be simplified by materializing details of the last transaction to the customer record on update. But this only works for this specific case. The generalised problem is not readily solved in SQL or at least I have not found a nice solution. The best I can come up with is to materialize data for a given date granularity; (i.e. first of every month). It works to a degree, but the solution is ugly and I know better solutions are possible (depending on cardinality of the data) if you are closer to the metal and are able to bypass the SQL layer.

    17. Re:The real reason people like noSQL... by shic · · Score: 2

      I must take issue with the claim that "The only people who hate on SQL are the people who don't understand databases."

      I think you should take some time and read the extensive publications of Chris Date (of Codd and Date International fame) - which, without exception, are extremely critical of SQL (while praising the relational model.)

      SQL has its place - but it is far from perfect... and, the 'shackle of compatibility' prevents most of what is counter-productive about SQL from being fixed. NOSQL, while an infant technology, at least shows promise... only by starting afresh can we establish better DBMS strategies. No, I don't think NO-SQL is an alternative to SQL - it is an entirely different approach... one that should be judged on merit in each individual context.

    18. Re:The real reason people like noSQL... by Compaqt · · Score: 2

      One thing that annoys me is the disjunction between UPDATE and INSERT queries for the same set of fields:

      INSERT INTO blah(uid, name, email) VALUES(12,'Goober','goob@example.com')

      But if you want to change that to an UPDATE query, you have to move each of the fields around:
      UPDATE blah SET uid=12, name='Goober', email='goob@example.com'

      This would have been better:
      UPDATE blah SET (uid, name, email) VALUES(12,'Goober','goob@example.com')

      By the way, if you want to read hard-core fundamentalist, pro-relational stuff by DB guru CJ Date and others, DBDebunk is the place:

      http://www.dbdebunk.com/

      --
      I'm not a lawyer, but I play one on the Internet. Blog
    19. Re:The real reason people like noSQL... by microbox · · Score: 2

      Relational algebra is far simpler than SQL for anything moderately complex. The theory and maths are well developed, but there are no implementations. SQL is loosely based on relational algebra, but the "lets-make-this-real-simple" mentality has had the reverse effect. Relational algebra is actually far easier to learn, and becomes expressive like a regular computer language.

      --

      Like all pain, suffering is a signal that something isn't right
    20. Re:The real reason people like noSQL... by nahdude812 · · Score: 3, Informative

      Like Angel'o'sphere said, if you can adapt your database, the problem becomes trivial. Make sure that at least for a given customer, each subsequent transaction ID is greater than the prior transaction ID (if this is not already the case, then add a new field populated by a sequence so that you have a field where it is the case).

      Here's the solution with a sub-select (because it's easier to read, it can be converted to a join for efficiency):
      SELECT
              transactions.fieldNames
      FROM transactions
      WHERE
              (transactions.customerID, transactions.transactionID) IN (
                      SELECT customerID, MAX(transactionID)
                      FROM transactions
                      GROUP BY customerID
              )

      If, as you suggest, you need it for specific date ranges, then add those to the sub-select. Like I said, for most RDBMS's this would be faster if converted to a join (and basically every sub-select can be converted to a join). For some RDBMS's they would convert it to a join as part of the execution planning anyway (I believe Postgres and Oracle do this).

      Arguments like these actually only serve to strengthen RDBMS's case over NoSQL. Database engineers have been solving these problems easily and efficiently for years, but a new generation likes to think in new patterns. Not that there's anything wrong with that - except there is a certain tendency to try to put a square peg in a round hole, a complaint when it doesn't fit right, and a sigh from the guys who've been carving pegs so they fit snugly all along.

      Key/value storage does have advantages over traditional RDBMS designs (assuming the RDBMS is designed and utilized properly), but those advantages are things like linear scalability, and very few cases where a task on the K/V side is substantially faster to complete than a properly designed solution on the RDBMS side - at least not until you are talking tens or hundreds of billions of records on 100+ CPU clusters (this is the linear scalability advantage).

  4. nothing new in computer engineering since 1980 by Hazel+Bergeron · · Score: 2, Funny

    Nothing new in computer engineering since 1980. Prove me wrong.

  5. Re:not surprising by MightyMartian · · Score: 5, Insightful

    To my mind, SQL's biggest problem over the years has been really shitty implementations (and yeah, I'm looking at you, MySQL).

    --
    The world's burning. Moped Jesus spotted on I50. Details at 11.
  6. NoSQL is great when hiring developers. by Anonymous Coward · · Score: 2, Insightful

    I really like NoSQL. It's a great tool to use when deciding whether I should hire a given software developer, or whether I should move on to the next candidate. All I have to do is ask the person what he thinks about NoSQL. If he gives a positive response, I send him on his way. If he points out its many flaws, I'm often tempted to hire him instantly. After all, those who dislike NoSQL the most generally know how to write good SQL queries, and they know how to use relational databases properly. They're the kind of people I want to hire, even if the position doesn't involve databases much. It just goes to show that they care about quality, that they care about knowing how to use their technology well, and that they care about doing the job properly.

    1. Re:NoSQL is great when hiring developers. by Sarten-X · · Score: 2

      That's okay... Facebook, StumbleUpon, and Twitter will be thrilled to have you pre-screening their applicants to weed out those who refuse to break tradition in the search for improvement.

      --
      You do not have a moral or legal right to do absolutely anything you want.
    2. Re:NoSQL is great when hiring developers. by dave87656 · · Score: 2

      So, you're one of those managers. Your black and white response is too simple. It depends on the application. For some things, NoSQL (key-value stores) are simpler, faster and more scalable. For the typical business application SQL solutions are probably better. What is your application?

  7. Endless debate by alex67500 · · Score: 3, Insightful

    There are only 2 types of languages:
    - those people bitch about, and
    - those no one ever used

  8. Joke time ! by alex67500 · · Score: 5, Funny

    An SQL statement walks into a bar. He sees 2 tables and asks "May I join you?"

  9. Relational Databases are great, not SQL itself by bigsexyjoe · · Score: 2

    My I'm just being a nit-picky coder here, but I don't get why they call it noSQL, when they are really referring moving away from relational databases?

    When I first heard of "NoSQL", I thought, "Great! SQL is a terrible syntax with all it's six letter words and easy dangerous mistakes. I would love to have a superior syntax for interacting with the relational databases that are central to my work!" But "NoSQL" should be called "NoRelational." It is kind of strange that you are changing the whole paradigm of the database around and you are describing it as changing a superficial feature. It would be like calling emails "no pen" writing.

  10. using noSQL by Compaqt · · Score: 2

    Could I pick your brain since you have a bit of NoSQL experience?

    How does indexing work in NoSQL? Are there EXPLAIN-type tools available? (EXPLAIN in MySQL tells you whether your query is using indexes or table scans, and can help you understand why your query is slow.)

    I'm pretty flexible with SQL. Can you do just about any query you could with SQL? ("Find all customers who have bought at least $100 of stuff over the last year, but who haven't bought anything this year.")

    --
    I'm not a lawyer, but I play one on the Internet. Blog
    1. Re:using noSQL by Sarten-X · · Score: 4, Interesting

      Yes, no, and yes, in that order. I'm basing my answers on HBase, with which I have the most experience. My answers are also practically guaranteed to be wrong in somebody's eyes, because HBase is so much more flexible than an RDBMS. If I describe one way of doing something, another layout may work just as well, and somebody's going to favor that way.

      How does indexing work in NoSQL? Are there EXPLAIN-type tools available?

      EXPLAIN tools aren't really necessary in HBase, because almost all nontrivial queries are a scan over a small chunk of the alphanumerically-sorted rows. It will take a while, but please allow me to explain. Each row is a multi-value key-value store, with each value having a column name. If you really want to stick to the RDBMS style, you could have your key be a numeric row ID, and scan everything for every query. It would suck, because you're not using any indexes.

      Indexes are more or less left up to the programmer. Creating an index is effectively just adding more rows to the table. For example, that RDBMS-style layout in the last paragraph could be a table of ID numbers, usernames, passwords, and permissions (for 50 billion people, I guess...). For whatever business reason, the main key will be the ID number. Those rows are easy. They have the expected value columns: username, password, permissions. To index by username, we add new rows, with just a column for the ID number. We could just duplicate the data, but let's not. Now, our table is going to be huge, but sparse. Half of the rows have three of four columns filled, and the other half has only one. Searching by name, it'll take two requests to get to the actual row we want, but that's okay. Doubling the amount of work lets us run faster.

      The reason for that is HBase's split design. HBase's table is split into column families and regions. Column families are a means to group columns, so that even on data with overlapping key space, separate data could remain separate. Column families are stored as separate files in Hadoop. In our example, the username "index" could be a separate column family. That could speed up scanning, because the rows keyed by numeric usernames won't be interspersed with the rows keyed by user id. More importantly, the table is split into regions, each containing a number of rows. Those regions are also stored as separate files, and distributed across the entire Hadoop cluster.

      The cluster is really where Hadoop gets its speed. If we were to run all of our processing from one central location, it would be horribly slow and require a ridiculous number of requests. Instead, we'll distribute everything, including the query, similar to how some RDBMS sharding schemes work. We send a request to all nodes, asking for "the row with the key that matches the value of the 'userid' column of the row with a given key". Each node will report back its results. Unlike RDBMS sharding, the partitioning is handled automatically by HBase into regions that are optimal. It's these regions that are scanned for every request.

      After all of that, it should be quite clear: With HBase, the programmer is expected to know the layout of the data, and write requests based on the key. There is no EXPLAIN tool, because everything is just a key-value lookup.

      Whew. Next question...

      Can you do just about any query you could with SQL?

      Yes, but it's different. Every lookup is handled by scanning a region (in parallel on nodes that have that region's data files), and checking each column of each row to see if:

      1. The row key matches what was requested, or falls within a given range.
      2. The row contains a column that was requested.
      3. A given filter approves each column.

      Note that last item. The filter is simply a program that tells Hadoop whether the row (or some part of it) should be included in the returned results. That program can include other HBase requests, using other filters. If you're really stuck on using RDBMS

      --
      You do not have a moral or legal right to do absolutely anything you want.
  11. Interoperate? by bigstrat2003 · · Score: 2

    Why? MongoDB is web scale, we don't need anything else!

    --
    "16MB (fuck off, MiB fascists)" - The Mighty Buzzard