Slashdot Mirror


Is the Relational Database Doomed?

DB Guy writes "There's an article over on Read Write Web about what the future of relational databases looks like when faced with new challenges to their dominance from key/value stores, such as SimpleDB, CouchDB, Project Voldemort and BigTable. The conclusion suggests that relational databases and key/value stores aren't really mutually exclusive and instead are different tools for different requirements."

12 of 344 comments

  1. new record by hguorbray · · Score: 5, Interesting

    that's efficient: a summary that refutes the inflammatory headline

    I'm just sayin'

  2. Yes, but not soon. by pwnies · · Score: 3, Interesting

    The flexibility offered in key/value databases is simply too good a feature to pass up. However, do you really think you can get people to give up MSSQL? It'll be nice for smaller projects, but corporations won't even consider it for a number of years.

    1. Re:Yes, but not soon. by SanityInAnarchy · · Score: 3, Interesting

      do you really think you can get people to give up MSSQL?

      In favor of MySQL, PostgreSQL, SQLite, even Oracle, yes, I do.

      corporations won't even consider it for a number of years.

      You must have some specific corporations in mind, because I've known many corporations to use each of the above technologies. In fact, SQLite is one of the most popular databases ever.

      No, the reason it's not soon is because these other ones (CouchDB) aren't mature, and the ones that are (BigTable) aren't available at any price.

      --
      Don't thank God, thank a doctor!
    2. Re:Yes, but not soon. by photon317 · · Score: 5, Interesting

      Yes, these newer simple key/value databases like BigTable and CouchDB are effectively a subset of RDBMS functionality, so of course the same thing can be implemented relationally by just not using features.

      The reason these projects have taken off is that the relational features being skipped comprise most of the complexity of an RDBMS. Without them, it's relatively trivial to write new database engines from scratch instead of re-using MySQL, PostgreSQL, and so on. These new feature-poor rewrites can take on many challenges that are harder for the big relational guys, like stellar performance on huge datasets, and being truly distributed in nature.

      --
      11*43+456^2
  3. Enough with the death of the relational DB by Mr. Underbridge · · Score: 5, Interesting

    This same basic story keeps getting submitted from the same group of people who are generally trying to sell non-relational-DB stuff. This is an ad. Move along.

  4. 99.9% of databases... by Ckwop · · Score: 3, Interesting

    99.9% of databases claim to follow the relational model.

    The rest have scalability problems that 99.9% of developers will never see throughout their entire careers.

    So the answer is a simple, emphatic, no.

  5. Not buying it. by reginaldo · · Score: 5, Interesting

    In theory, I agree the most costly actions in a database are joins. It seems like the key/value model is a great solution to this, on the surface. However, what the key/value model does is push the cost to the application layer. Instead of ensuring relational integrity and conformity in the database, suddenly all app code has to do this on the frontend. Also, instead of managing this process in a single place, suddenly this process is distributed among multiple methods. Sure, the DB is more scalable, but suddenly the app is a mess.
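
    A minimal Python sketch of that cost, assuming a plain dict stands in for a key/value store. The names `store` and `put_order` are hypothetical, not from any real product. The check that a foreign key would perform once, inside the database, has to be written by hand in every code path that writes:

```python
# Hypothetical sketch: a dict stands in for a key/value store.
store = {"user:1": {"name": "alice"}}

def put_order(store, order_id, user_id, total):
    """Write an order, enforcing by hand what a foreign key would enforce."""
    # The store knows nothing about relationships, so the application
    # must check that the referenced user exists before writing.
    if f"user:{user_id}" not in store:
        raise ValueError(f"unknown user {user_id}")
    store[f"order:{order_id}"] = {"user_id": user_id, "total": total}

put_order(store, 100, 1, 9.99)      # accepted: user 1 exists

try:
    put_order(store, 101, 2, 5.00)  # rejected: user 2 does not exist
except ValueError as exc:
    rejected = str(exc)
```

    Any other function that writes orders needs the same check, which is the "distributed among multiple methods" problem.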

  6. Here's a match.. by Slicker · · Score: 3, Interesting

    Relational databases need to die. I loved them and preached the goodness of them 10 years ago, but they are just too rigid for contemporary needs. I've learned better ways of organizing and filtering data, but the old RDBMS school is too canonical (stubborn) and self-indulgent to realize that needs are changing and their model doesn't fit.

    We need efficient attribute/value models. We need to stop referencing data by where it is and start referencing it by what it is. There is too much data that needs to exist in different views, based on policy--not explicit placement.

    Dumb-tags (attributes without values) like those used with Delicious bookmarks are also broken. They are too vague.

    My own approach is that every attribute may have any number of value instances. Each value instance may, in turn, have sub-attributes. So you can look up data based on its characteristics even with disregard for its name. For example: /mycompany/mailserver1/ip of zone = infirewall

    This returns all IP addresses under the "zone" attribute while also under the mailserver1 attribute that is under the mycompany attribute.

    When validating instances of the "ip" attribute, it looks backward in the path because it is extremely quick that way.
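
    One possible reading of this model, sketched in Python. Everything here is an assumption for illustration: the `Value` class, the `add` and `find` helpers, and the interpretation of the example query as "IP values at that path whose zone sub-attribute equals infirewall". It is not the poster's implementation.

```python
from dataclasses import dataclass, field

@dataclass
class Value:
    """One value instance; it may carry its own sub-attributes."""
    data: str
    attrs: dict = field(default_factory=dict)  # attribute name -> list of Value

def add(parent, name, data=""):
    """Attach a new value instance under attribute `name` of `parent`."""
    child = Value(data)
    parent.attrs.setdefault(name, []).append(child)
    return child

def find(root, path, where=None):
    """Return value instances at `path`, optionally filtered by a sub-attribute."""
    nodes = [root]
    for name in path:
        nodes = [v for n in nodes for v in n.attrs.get(name, [])]
    if where is None:
        return nodes
    key, wanted = where
    return [n for n in nodes
            if any(v.data == wanted for v in n.attrs.get(key, []))]

root = Value("")
server = add(add(root, "mycompany"), "mailserver1")
add(add(server, "ip", "10.0.0.5"), "zone", "infirewall")
add(add(server, "ip", "203.0.113.7"), "zone", "dmz")

# Roughly: /mycompany/mailserver1/ip of zone = infirewall
hits = find(root, ["mycompany", "mailserver1", "ip"], where=("zone", "infirewall"))
```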

    The data server's sole responsibility is storing and retrieving information (not just data) in context (aka filtering).

    Sorting is the responsibility of the client. This makes sense because there are an infinite number of algorithms one could have for sorting data (e.g. alphabetic mixed case, ASCII order, etc). To facilitate this, I wrote a method to return the number of values that would be returned if the values were requested. If too big a bite for the client, it can re-request the size of a smaller chunk, segmented according to the client's ordering method. This is useful for scale, in any case. Processing in chunks makes sense whether over a network of limited capacity or directly from disk with limited memory.

    And this is a columnar approach, like Google's BigTable. That means you get 10+ times faster read performance.

    Matthew

  7. Re:?'s meaning - literal and implied by Cajun Hell · · Score: 4, Interesting
    --
    "Believe me!" -- Donald Trump
  8. SQL is the problem, not RDBMSs by Savantissimo · · Score: 3, Interesting

    SQL and all its pointy-headed progeny are the real problem with databases, not the relational vs. newMarketingBuzzwordDuJour arguments.

    Database operations do not need to look like code or algorithms, the only reason they do is to provide jobs for database programmers.

    Over 15 years ago Paradox's query-by-example was light-years ahead of today's soul-killing SQL crap.

    SQL is not going away, though, any more than its idiot older brother Mumps (M, Caché).

    --
    "Is life so dear, or peace so sweet, as to be purchased at the price of chains and slavery?" - Patrick Henry
  9. Re:Supid people who don't understand data by DougWebb · · Score: 3, Interesting

    Without any details this sounds like an urban legend. If you designed your system as you would have with a lesser system like a simple "key/value" pair, how would an RDBMS be any different?

    The difference is optimization vs generalization. Many problems can be handled using simple key/value pair relationships. You can model this in an RDBMS using two-column tables that you never join across, where all of your queries are SELECT val FROM tab WHERE key=? and INSERT INTO tab (key,val) VALUES (?,?). However, if you use the RDBMS this way, you're paying for the overhead of the SQL engine, (usually) a client/server connection, and your language's library for interacting with an RDBMS.
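
    The two-column pattern can be sketched with Python's standard `sqlite3` module. The table and column names follow the queries above; the `store` and `fetch` helper names are made up for the example. SQLite runs in-process, so this shows only the SQL layer, not the client/server round-trip:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tab (key TEXT PRIMARY KEY, val TEXT)")

def store(key, val):
    """Key/value 'put' expressed as a parameterized INSERT."""
    conn.execute("INSERT INTO tab (key, val) VALUES (?, ?)", (key, val))

def fetch(key):
    """Key/value 'get' expressed as a parameterized SELECT."""
    row = conn.execute("SELECT val FROM tab WHERE key = ?", (key,)).fetchone()
    return row[0] if row else None

store("user:1", "alice")
found = fetch("user:1")
absent = fetch("user:2")
```

    Every call still goes through statement parsing, planning and parameter binding, which is the overhead being described.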

    The alternative is a non-relational database like BerkeleyDB, which is optimized for key/value pair operations. All the fetch and store operations do is fetch and store the value for a given key, with a minimum of overhead. BerkeleyDB is also an in-process database, where your application is accessing the database files directly using the BerkeleyDB library code. (The library handles locking so that multiple processes can use the database files at the same time.) Again, the overhead is kept to a minimum.
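
    For comparison, a sketch of the in-process style using Python's standard `dbm` module. This is not BerkeleyDB itself, only a stand-in from the same family of embedded key/value file libraries; which backend `dbm` picks depends on the platform, and the file name is arbitrary:

```python
import dbm
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "example.db")

# "c" opens the database for reading and writing, creating it if needed.
with dbm.open(path, "c") as db:
    db["user:1"] = "alice"      # store: str keys and values are encoded to bytes
    name = db["user:1"]         # fetch: values come back as bytes
    missing = db.get("user:2")  # None when the key is absent
```

    There is no query language and no server: the application process reads and writes the database file through library calls.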

    BerkeleyDB is much less flexible than an RDBMS, but for the problem domains where that flexibility is not needed, BerkeleyDB is much more efficient. I've easily achieved over 6000 read/write transactions per second on modest hardware in a single-threaded process; a multi-threaded and/or multi-process application can achieve much higher rates. Compare that to a typical Oracle database connection, where you're lucky to get as many as a few hundred transactions per second, just because of the network round-trip.

  10. MapReduce is a bunch of hype by Estanislao Martínez · · Score: 4, Interesting

    The name of the MapReduce framework comes from the functional programming operations "map" and "reduce." Map takes as its input a collection of data, and a function that transforms data elements into other elements; it outputs a collection where each element of the input collection has been replaced by the result of applying that function to it. Reduce takes a collection of elements, an initial value of the same type as the elements, and a two-place, commutative, associative and symmetric operation; it produces as its output the value that results from applying the operation to the initial value and each element of the collection in turn, accumulating the partial results.
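
    In Python terms, a small illustration of the two operations as just defined (the sample data is arbitrary):

```python
from functools import reduce
from operator import add

words = ["relational", "key", "value"]

# map: replace each element with the result of applying a function to it.
lengths = list(map(len, words))

# reduce: fold the elements into one value, starting from an initial value.
total = reduce(add, lengths, 0)
```

    Here `lengths` is `[10, 3, 5]` and `total` is `18`.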

    Map and reduce are operations that can be trivially parallelized. To parallelize map, you divide the collection into subcollections (in any arbitrary manner), and map over each of them in parallel. To parallelize reduce, you divide the collection into subcollections, also arbitrarily, reduce each subcollection independently, then apply the reduction operation to the partial results. (That works because the reduction operation is commutative, associative and symmetric.)
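
    A sketch of the split, reduce, combine scheme for reduce, using a thread pool from Python's standard library. The `chunks` and `parallel_reduce` names are hypothetical. CPython threads do not give real CPU parallelism for this kind of work, so treat it as a demonstration of the structure rather than of a speedup:

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
from operator import add

def chunks(items, n):
    """Split items into at most n contiguous, non-empty sublists."""
    size = max(1, -(-len(items) // n))  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]

def parallel_reduce(op, items, initial, workers=4):
    """Reduce each chunk independently, then reduce the partial results."""
    parts = chunks(items, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each chunk is non-empty, so no initial value is needed here.
        partials = list(pool.map(lambda part: reduce(op, part), parts))
    # The initial value is applied exactly once, in the final combine step.
    return reduce(op, partials, initial)

total = parallel_reduce(add, list(range(1, 101)), 0)
```

    The result matches a sequential reduce only because `add` is associative; with a non-associative operation the chunk boundaries would change the answer.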

    Well, guess what: this sort of technique is trivially applicable to relational database queries. A SQL query translates down to a combination of joins (the FROM clause), filters (the WHERE clause) and maps (the SELECT clause). Joins are trivially parallelizable; you give each execution unit a subset of the tuples of the driving relation. Filtering (the WHERE clause) is a kind of reduce operation. SELECT is a kind of map operation. This means that relational queries are not any less amenable to parallel execution than the stuff Google does.
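
    A toy illustration of that decomposition in Python, with made-up tables and a made-up query. The driving relation (`employees`) is partitioned, and the same join, filter, map pipeline runs on each partition independently:

```python
employees = [
    {"name": "ann", "dept_id": 1, "salary": 120},
    {"name": "bob", "dept_id": 2, "salary": 90},
    {"name": "cy", "dept_id": 1, "salary": 80},
]
departments = {1: "eng", 2: "ops"}  # id -> name, acting as a lookup index

# SELECT e.name, d.name
#   FROM employees e JOIN departments d ON e.dept_id = d.id
#  WHERE e.salary > 85
def run_query(driving_rows):
    joined = ((e, departments[e["dept_id"]]) for e in driving_rows)  # FROM / JOIN
    kept = filter(lambda pair: pair[0]["salary"] > 85, joined)       # WHERE
    return [(e["name"], dept) for e, dept in kept]                   # SELECT

# Each partition could be handed to a separate execution unit.
partitions = [employees[:2], employees[2:]]
result = [row for part in partitions for row in run_query(part)]
```

    Concatenating the per-partition results gives the same rows as running the query over the whole table.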

    But the killer thing here is that MapReduce says absolutely nothing about the updates problem. This is one of the big features of RDBMSs: the ability to handle concurrent query and modification. It also says nothing about the data integrity problem, which is also one of the big RDBMS features.

    So, when you get down to it, there is a good argument to be made that many applications could make use of database technologies that support much faster querying at the expense of supporting very little updating. But there's no convincing argument that that technology isn't best implemented in the context of an RDBMS.