Slashdot Mirror


SQL and NoSQL are Two Sides of the Same Coin

An anonymous reader writes "NoSQL databases have become a hot topic with their promise to solve the problem of distilling valuable information and business insight from big data in a scalable and programmer-friendly way. Microsoft researchers Erik Meijer and Gavin Bierman ... present a mathematical model and standardized query language that could be used to unify SQL and NoSQL data models." Unify is not quite correct; the article shows that relational SQL and key-value NoSQL models are mathematically dual, and provides a monadic query language for what they have coined coSQL.

14 of 259 comments

  1. You forgot the inverse tachyon pulse by 0p7imu5_P2im3 · · Score: 4, Insightful

    An inverse tachyon pulse would disperse the relational quantum silica into a focused warp field, thus purging all forms of slipstream space based SQL databases from subspace.

    --
    Resistance is futile. Your technological distinctiveness will be added to our own. You will become one with the morgue
  2. The real reason people like noSQL... by MrEricSir · · Score: 3, Insightful

    ...is that SQL sucks as a language. It's not terribly expressive, the ordering of arguments is inconsistent, and whoever designed the way JOIN works should be in jail.

    Frankly, I'd like to see SQL die and get replaced with something more modern. We don't program in Cobol anymore, so why the hell are we still using SQL?

    --
    There's no -1 for "I don't get it."
    1. Re:The real reason people like noSQL... by Anonymous Coward · · Score: 4, Insightful

      so why the hell are we still using SQL?

      Why are we still using C? Why are we still using HTML? Why are we still using FORTRAN (in the scientific community)? Same reason.

      Might add that all these - C, HTML and FORTRAN - are still being updated, with new standards. So is SQL. It's really the same thing, and they all stick around for the same reasons, too.

    2. Re:The real reason people like noSQL... by sexconker · · Score: 5, Insightful

      We DO still program in COBOL.
      And we DO still use SQL.
      And we do so because it works.

      Not only does SQL work, it is the best at what it does.

      The only people who hate on SQL are the people who don't understand databases.
      Generally, these are the same people who like labels, tag clouds and ruby on rails.
      They produce a lot of high level hand waving with regards to the actual code and endless amounts of "herp derp I dunno" when asked why their shit performs slower than the 10 year old system it's supposed to replace. These are bad people.

      What really pisses me off is that everyone fucking agreed with me until Android came out, then suddenly Java was cool, the performance was considered "good", and the quality of code and coders that it tends to bring about is now the acceptable norm.

    3. Re:The real reason people like noSQL... by garyebickford · · Score: 5, Insightful

      Actually COBOL predates SQL by about 10 years. AFAIK nobody has written a language that implements the relational query model to replace SQL. And (though I have never written anything in COBOL he says thankfully) COBOL has its place even today. I would not be surprised if there are as many lines of COBOL still running in enterprises everywhere as there are of PHP or Perl in those same enterprises.

      And COBOL even now is without question a better solution for business and application programming than C ever was or ever will be. (Of course it's arguable that there are other languages better for those tasks than COBOL as well.) C is good for device drivers, kernels and as a target for interpreted and scripting languages with compiled code generators. C is, as Kernighan, Ritchie or Thompson (I forget which) said, "a structured PDP-11 Macro-assembler". Today (putting my Nomex suit on...) IMHO application programmers should not be wasting their time coping with segfaults and compile-link cycles. Their time is worth more than the machine time that any cycle-saving difference buys. :)

      --
      It's easier to be a result of the past, but more fun to be a cause of the future! http://www.spacefinancegroup.com/
    4. Re:The real reason people like noSQL... by DavidTC · · Score: 4, Insightful

      There are a few things with SQL that could be done better, and there's still some standardization needed. I'd like to see a SQL 2012 standard or something.

      But you're entirely right. There is one place NoSQL makes sense, and it's gigantic data stores like Facebook and Google, where the quantity is overwhelming, and the quality isn't that important...it's okay to miss a few things, and you're looking for sorta-random stuff. That is why NoSQL was invented.

      No one should ever 'choose' to use NoSQL...if you're on a project of the size that needs NoSQL, I promise you you are nowhere near that decision...it will be decided between the seven project architects as they buy thirty servers to run the damn thing. Those are the guys who have a legit need for NoSQL, or at least for whom it's a legit option.

      It's not useful for any other system in existence.

      It's especially funny when toy projects try to use NoSQL. It's like idiots trying to run their watches off geothermal power. 'It's free power! FROM THE EARTH!'

      Dude, you're using half an amp, perhaps you should learn how to use a watch battery instead of driving 2 mile poles into the ground as you walk around. It's not like SQL is fucking rocket science. In fact, right now, NoSQL is actually more complicated to use.

      --
      If corporations are people, aren't stockholders guilty of slavery?
    5. Re:The real reason people like noSQL... by DogDude · · Score: 3, Insightful

      It's not painful. It's just different than what web developers doing "select *" are used to. As a system, it works well for tiny projects, all the way to the largest databases in the world. In the world of "develop it now, deal with problems later", people just can't be bothered to learn the right way to do something.

      --
      I don't respond to AC's.
    6. Re:The real reason people like noSQL... by Jonner · · Score: 3, Interesting

      SQL definitely sucks as a language. However, the relational model it was intended to expose does not. We need languages that more fully and naturally expose the relational model.

    7. Re:The real reason people like noSQL... by Jonner · · Score: 3, Interesting

      Neither SQL nor its original incarnation SEQUEL was the first language based on the relational data model. There are also more recent relational languages, such as Tutorial D, though none has gained much popularity and few people know they exist, even in the database management world. We badly need a replacement for SQL that is more flexible and more fully implements the relational model.

    8. Re:The real reason people like noSQL... by nahdude812 · · Score: 3, Informative

      Like Angel'o'sphere said, if you can adapt your database, the problem becomes trivial. Make sure that at least for a given customer, each subsequent transaction ID is greater than the prior transaction ID (if this is not already the case, then add a new field populated by a sequence so that you have a field where it is the case).

      Here's the solution with a sub-select (because it's easier to read, it can be converted to a join for efficiency):
      SELECT
              transactions.fieldNames
      FROM transactions
      WHERE
              (transactions.customerID, transactions.transactionID) IN (
                      SELECT customerID, MAX(transactionID)
                      FROM transactions
                      GROUP BY customerID
              )

      If, as you suggest, you need it for specific date ranges, then add those to the sub-select. Like I said, for most RDBMS's this would be faster if converted to a join (and basically every sub-select can be converted to a join). For some RDBMS's they would convert it to a join as part of the execution planning anyway (I believe Postgres and Oracle do this).
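
      As a rough sketch of the join form mentioned above, here are both versions run against an in-memory SQLite table from Python. Only customerID and transactionID come from the post; the amount column, the sample rows and the names latest and lastID are made up for illustration. The row-value IN form needs SQLite 3.15 or newer, and other engines may plan the two queries differently.

```python
import sqlite3

# Hypothetical sample data; only customerID and transactionID come from the post.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions (customerID INTEGER, transactionID INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?)",
    [(1, 10, 5.0), (1, 11, 7.5), (2, 12, 3.0), (2, 15, 9.0), (3, 14, 1.0)],
)

# The sub-select form from the post: keep rows whose (customer, id) pair
# is that customer's highest transaction ID.
SUBSELECT = """
SELECT t.customerID, t.transactionID, t.amount
FROM transactions AS t
WHERE (t.customerID, t.transactionID) IN (
    SELECT customerID, MAX(transactionID)
    FROM transactions
    GROUP BY customerID
)
ORDER BY t.customerID
"""

# The same question as a join against a derived table of per-customer maxima.
JOIN = """
SELECT t.customerID, t.transactionID, t.amount
FROM transactions AS t
JOIN (
    SELECT customerID, MAX(transactionID) AS lastID
    FROM transactions
    GROUP BY customerID
) AS latest
  ON latest.customerID = t.customerID
 AND latest.lastID = t.transactionID
ORDER BY t.customerID
"""

subselect_rows = conn.execute(SUBSELECT).fetchall()
join_rows = conn.execute(JOIN).fetchall()
```

      Both queries should return one row per customer, the one with the highest transactionID. A date-range restriction would go in the WHERE clause of the inner SELECT in either form.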

      Arguments like these actually only serve to strengthen RDBMS's case over NoSQL. Database engineers have been solving these problems easily and efficiently for years, but a new generation likes to think in new patterns. Not that there's anything wrong with that - except there is a certain tendency to try to put a square peg in a round hole, a complaint when it doesn't fit right, and a sigh from the guys who've been carving pegs so they fit snugly all along.

      Key/value storage does have advantages over traditional RDBMS designs (assuming the RDBMS is designed and utilized properly), but those advantages are things like linear scalability. There are very few cases where a task on the K/V side completes substantially faster than a properly designed solution on the RDBMS side, at least not until you are talking tens or hundreds of billions of records on 100+ CPU clusters (this is the linear scalability advantage).

  3. Re:not surprising by MightyMartian · · Score: 5, Insightful

    To my mind, SQL's biggest problem over the years has been really shitty implementations (and yeah, I'm looking at you, MySQL).

    --
    The world's burning. Moped Jesus spotted on I50. Details at 11.
  4. Endless debate by alex67500 · · Score: 3, Insightful

    There are only 2 types of languages:
    - those people bitch about, and
    - those no one ever used

  5. Joke time ! by alex67500 · · Score: 5, Funny

    An SQL statement walks into a bar. He sees 2 tables and asks "May I join you?"

  6. Re:using noSQL by Sarten-X · · Score: 4, Interesting

    Yes, no, and yes, in that order. I'm basing my answers on HBase, with which I have the most experience. My answers are also practically guaranteed to be wrong in somebody's eyes, because HBase is so much more flexible than an RDBMS. If I describe one way of doing something, another layout may work just as well, and somebody's going to favor that way.

    How does indexing work in NoSQL? Are there EXPLAIN-type tools available?

    EXPLAIN tools aren't really necessary in HBase, because almost all nontrivial queries are a scan over a small chunk of the alphanumerically-sorted rows. It will take a while, but please allow me to explain. Each row is a multi-value key-value store, with each value having a column name. If you really want to stick to the RDBMS style, you could have your key be a numeric row ID, and scan everything for every query. It would suck, because you're not using any indexes.

    Indexes are more or less left up to the programmer. Creating an index is effectively just adding more rows to the table. For example, that RDBMS-style layout in the last paragraph could be a table of ID numbers, usernames, passwords, and permissions (for 50 billion people, I guess...). For whatever business reason, the main key will be the ID number. Those rows are easy. They have the expected value columns: username, password, permissions. To index by username, we add new rows, with just a column for the ID number. We could just duplicate the data, but let's not. Now, our table is going to be huge, but sparse. Half of the rows have three of four columns filled, and the other half has only one. Searching by name, it'll take two requests to get to the actual row we want, but that's okay. Doubling the amount of work lets us run faster.
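
    The "index as extra rows" idea above can be sketched in a few lines of Python. This is not the HBase API: a plain dict stands in for the table, and the "id:" and "name:" key prefixes, the zero-padding and the function names are all assumptions made for the example.

```python
# One dict stands in for the table: row key -> {column name: value}.
table = {}

def put_user(user_id, username, password, permissions):
    """Write the main row plus a username 'index' row that only points back to it."""
    # Main row, keyed by ID number (zero-padded so string keys sort numerically).
    id_key = f"id:{user_id:012d}"
    table[id_key] = {
        "username": username,
        "password": password,
        "permissions": permissions,
    }
    # Index row, keyed by username, with just a column for the ID.
    table[f"name:{username}"] = {"userid": id_key}

def get_by_username(username):
    """Two lookups: username -> index row -> main row."""
    index_row = table.get(f"name:{username}")   # request 1
    if index_row is None:
        return None
    return table.get(index_row["userid"])       # request 2
```

    Calling put_user(42, "alice", "hunter2", "rw") leaves two rows in the table, and get_by_username("alice") reaches the full record in two key lookups instead of a scan over every row.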

    The reason for that is HBase's split design. HBase's table is split into column families and regions. Column families are a means to group columns, so that even on data with overlapping key space, separate data could remain separate. Column families are stored as separate files in Hadoop. In our example, the username "index" could be a separate column family. That could speed up scanning, because the rows keyed by username won't be interspersed with the rows keyed by numeric user ID. More importantly, the table is split into regions, each containing a number of rows. Those regions are also stored as separate files, and distributed across the entire Hadoop cluster.

    The cluster is really where Hadoop gets its speed. If we were to run all of our processing from one central location, it would be horribly slow and require a ridiculous number of requests. Instead, we'll distribute everything, including the query, similar to how some RDBMS sharding schemes work. We send a request to all nodes, asking for "the row with the key that matches the value of the 'userid' column of the row with a given key". Each node will report back its results. Unlike RDBMS sharding, the partitioning is handled automatically by HBase into regions that are optimal. It's these regions that are scanned for every request.
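
    A toy model of the region idea, assuming nothing about HBase's real internals: the key-sorted rows are cut into contiguous chunks, and a lookup is either routed to the one chunk that could hold the key or fanned out to all of them. The fixed region_size and every function name here are hypothetical; the real system chooses its own split points.

```python
from bisect import bisect_right
from concurrent.futures import ThreadPoolExecutor

def split_into_regions(rows, region_size):
    """Cut key-sorted rows into contiguous regions of at most region_size rows."""
    keys = sorted(rows)
    return [
        {k: rows[k] for k in keys[i:i + region_size]}
        for i in range(0, len(keys), region_size)
    ]

def region_for_key(regions, key):
    """Return the index of the one region whose key range could hold `key`."""
    start_keys = [min(region) for region in regions]
    return max(bisect_right(start_keys, key) - 1, 0)

def ask_all_regions(regions, key):
    """Fan the same lookup out to every region and merge the answers."""
    with ThreadPoolExecutor() as pool:
        hits = pool.map(lambda region: region.get(key), regions)
    return [hit for hit in hits if hit is not None]
```

    Because the regions are contiguous slices of sorted keys, a single binary search over the start keys is enough to route a lookup, which is why key design matters so much in this model.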

    After all of that, it should be quite clear: With HBase, the programmer is expected to know the layout of the data, and write requests based on the key. There is no EXPLAIN tool, because everything is just a key-value lookup.

    Whew. Next question...

    Can you do just about any query you could with SQL?

    Yes, but it's different. Every lookup is handled by scanning a region (in parallel on nodes that have that region's data files), and checking each column of each row to see if:

    1. The row key matches what was requested, or falls within a given range.
    2. The row contains a column that was requested.
    3. A given filter approves each column.

    Note that last item. The filter is simply a program that tells Hadoop whether the row (or some part of it) should be included in the returned results. That program can include other HBase requests, using other filters. If you're really stuck on using RDBMS
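
    The three checks listed above can be sketched as a scan over one region. This is an illustration in plain Python, not HBase's scanner or filter API: the function name, its parameters and the half-open [start_key, stop_key) range are assumptions, and the filter here judges a whole row rather than each column.

```python
from bisect import bisect_left

def scan(region, start_key=None, stop_key=None, columns=None, row_filter=None):
    """Yield (key, row) pairs from one region, applying the three checks in order."""
    keys = sorted(region)
    lo = 0 if start_key is None else bisect_left(keys, start_key)
    hi = len(keys) if stop_key is None else bisect_left(keys, stop_key)
    for key in keys[lo:hi]:                  # 1. key falls within [start_key, stop_key)
        row = region[key]
        if columns is not None:
            # Keep only the requested columns.
            row = {c: v for c, v in row.items() if c in columns}
            if not row:                      # 2. row has none of the requested columns
                continue
        if row_filter is not None and not row_filter(key, row):
            continue                         # 3. the filter rejected this row
        yield key, row
```

    Note that in this sketch the filter only sees the columns that were requested, so a filter that tests a column has to ask for it as well.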

    --
    You do not have a moral or legal right to do absolutely anything you want.