Is the Relational Database Doomed?

← Back to Stories (view on slashdot.org)

Is the Relational Database Doomed?

Posted by ScuttleMonkey on Friday February 13, 2009 @09:12AM from the just-more-tools-in-the-bucket dept.

DB Guy writes "There's an article over on Read Write Web about what the future of relational databases looks like when faced with new challenges to its dominance from key/value stores, such as SimpleDB, CouchDB, Project Voldemort and BigTable. The conclusion suggests that relational databases and key value stores aren't really mutually exclusive and instead are different tools for different requirements."

17 of 344 comments (clear)

Min score:

Reason:

Sort:

new record by hguorbray · 2009-02-13 09:14 · Score: 5, Interesting

that's efficient -a summary that refutes the inflammatory headline
I'm just sayin'
Yes, but not soon. by pwnies · 2009-02-13 09:17 · Score: 3, Interesting

The flexibility offered in key/value databases is simply too good of a feature to pass up. However, do you really think you can get people to give up MSSQL? It'll be nice for smaller projects, but corporations wont even consider it for a number of years.
1. Re:Yes, but not soon. by SanityInAnarchy · 2009-02-13 09:24 · Score: 3, Interesting
  
  do you really think you can get people to give up MSSQL?
  In favor of MySQL, PostgreSQL, SQLite, even Oracle, yes, I do.
  
  corporations wont even consider it for a number of years.
  You must have some specific corporations in mind, because I've known many corporations to use each of the above technologies. In fact, SQLite is one of the most popular databases ever.
  No, the reason it's not soon is because these other ones (CouchDB) aren't mature, and the ones that are (BigTable) aren't available at any price.
  
  --
  Don't thank God, thank a doctor!
2. Re:Yes, but not soon. by photon317 · 2009-02-13 11:16 · Score: 5, Interesting
  
  Yes, these newer simple key/value databases like BigTable and CouchDB are effectively a subset of RDBMS functionality, so of course the same thing can be implemented relationally by just not using features.
  The reason these projects have taken off is that the relational features being skipped comprise most of the complexity of an RDBMS. Without them, it's relatively trivial to write new database engines from scratch instead of re-using MySQL, PostgreSQL, and so-on. These new feature-poor rewrites can take on many challenges that are harder for the big relational guys, like stellar performance on huge datasets, and being truly distributed in nature.
  
  --
  11*43+456^2
3. Re:Yes, but not soon. by SanityInAnarchy · 2009-02-13 18:55 · Score: 2, Interesting
  
  let me guess, you don't like mssql because it's microsoft?
  And because it's proprietary, single-platform, and expensive for what is, at the end of the day, just a database.
  And because I have seen new and interesting things built with MySQL, like NDB. What has MS SQL got on that?
  
  what a fucking sheep
  Look who's talking.
  More seriously, while I have pretty much no MS SQL experience, I don't particularly want to. The only good experience I've ever had from a Microsoft product was Halo. Bungie was acquired, and has now been sold, making me wonder if Microsoft had the chance to screw them up yet.
  
  --
  Don't thank God, thank a doctor!
4. Re:Yes, but not soon. by anothy · 2009-02-14 00:58 · Score: 2, Interesting
  
  But you need to hire Percona to get the same performance out of MySQL that you get from SQL Server out of the box.
  this has not been my experience. at least with version 8 (two back from current), performance was miserable compared to either mysql or postgresql of comparable vintage. this was my first serious experience using mssql, but with no tuning on either side, both mysql and postgresql outperformed mssql by a factor of about 2.
  while we never got the database on the production system swapped out (development was underway to replace the application it was supporting anyway), and thus i can't speak to mysql or postgresql's reliability in the same use environment, mssql was very unstable. the database would hang indefinitely if either a query or the resulting data was too large, and, as near as we could tell, once every other month or so for no particular reason. the data set was tens of thousands of records a month going back a few years, which is not a trivial sum of data, but shouldn't be considered a lot for a modern database.
  while it's not a direct comparison, i've used mysql in several production projects and have seen less than a half dozen hangs in production total. i've only used postgresql in production on one project, but have seen no production hangs.
  
  --
  
  i speak for myself and those who like what i say.
Enough with the death of the relational DB by Mr.+Underbridge · 2009-02-13 09:27 · Score: 5, Interesting

This same basic story keeps getting submitted from the same group of people who are generally trying to sell non-relational-DB stuff. This is an ad. Move along.
99.9% of databases... by Ckwop · 2009-02-13 09:29 · Score: 3, Interesting

99.9% of database claim to follow the relational model.
The rest have scalability problems that 99.9% of developers will never see throughout their entire careers.
So the answer is a simple, emphatic, no.
1. Re:99.9% of databases... by arevos · 2009-02-13 12:47 · Score: 2, Interesting
  
  99.9% of database claim to follow the relational model.
  The rest have scalability problems that 99.9% of developers will never see throughout their entire careers.
  Uh, actually, relational databases are pretty damn hard to scale. That's basically the main problem with them. Why do you think relational databases are so often paired with a cache made from a hashtable-based database?
Not buying it. by reginaldo · 2009-02-13 10:13 · Score: 5, Interesting

In theory, I agree the most costly actions in a database are joins. It seems like the key/value model is a great solution to this, on the surface. However, what the key/value model does is push the cost to the application layer. Instead of ensuring relational integrity and conformity in the database, suddenly all app code has to do this on the frontend. Also, instead of managing this process in a single place, suddenly this process is distributed among multiple methods. Sure, the DB is more scaleable, but suddenly the app is a mess.
Here's a match.. by Slicker · 2009-02-13 10:20 · Score: 3, Interesting

Relational databases need to die. I loved them and preached the goodness of them 10 years ago, but they are just too rigid for contemporary needs. I've learned better ways of organizing and filtering data.. but the old RDBMS school is too canonical (stubborn) and self-indulging to realize that needs are changing and their model doesn't fit.
We need efficient attribute/value models. We need to stop referencing data by where it is and start referencing it by what it is. There is too much data that needs to exist in different views, based on policy--not explicit placement.
Dumb-tags (attributes without values) like those used with Delicious bookmarks are also broken. They are too vague.
My own approach is that every attribute may have any number of value instances. Each value instance may, in turn, have sub-attributes. So you can look up data based on its characteristics even with disregard for its name. For example: /mycompany/mailserver1/ip of zone = infirewall
This returns all IP addresses under the "zone" attribute while also under the mailserver1 attribute that is under the mycompany attribute.
When validating instances of the "ip" attribute, it looks backward in the path because it is extremely quick that way.
The data server's sole responsibility is storing and retrieving information (not just data) in context (aka filtering).
Sorting is the responsibility of the client. This makes sense because there are an infinite number of algorithms one could have for sorting data (e.g. alphabetic mixed case, ASCII order, etc). To facilitate this, I wrote a method to return the number of values that would be returned if the values were requested. If too big a bite for the client, it can re-request the size of a smaller chunk, segmented according to the client's ordering method. This is useful for scale, in any case. Processing in chunks makes sense whether over a network of limited capacity or from directly form disk with limited memory.
And--this is a columnar approach like Google's BigTable is.. That means you get 10+ times faster read performance.
Matthew
Re:?'s meaning - literal and implied by Cajun+Hell · 2009-02-13 10:27 · Score: 4, Interesting

Documented here.

--
"Believe me!" -- Donald Trump
SQL is the problem, not RDBMSs by Savantissimo · 2009-02-13 11:55 · Score: 3, Interesting

SQL and all its pointy-headed progeny are the real problem with databases, not the relational vs. newMarketingBuzzwordDuJour arguments.
Database operations do not need to look like code or algorithms, the only reason they do is to provide jobs for database programmers.
Over 15 years ago Paradox's query-by-example was light-years ahead of today's soul-killing SQL crap.
SQL is not going away, though, any more than its idiot older brother Mumps (M, Caché).

--
"Is life so dear, or peace so sweet, as to be purchased at the price of chains and slavery?" - Patrick Henry
STILL "relational" - Dynamic Relational by Tablizer · 2009-02-13 12:03 · Score: 2, Interesting

Some of those systems appear to more or less still be "relational". If each row is treated like a map (associative array) of strings, then the "schema" for a given table is the set union of all attributes used in the table, and non-existing columns for a given row can be treated as nulls.
As long as an asterisk is not used in a query (ex: "select * from tableX"), then it will pretty much act like existing RDBMS, and as long as the type-explicitness issues are resolved based on dynamic language conventions. (Asterisks can be implemented perhaps, but it could be computationally expensive.)
It's kind of like dynamic (AKA "scripting") languages versus static or type-heavy languages. The static kind of languages requires more up-front info that "protects" the integrity of the thing at the expense of flexibility and declaration volume. The same dichotomy can be applied to RDBMS also. We have RDBMS that like a lot of info up-front, and now those which accept incremental or ad-hoc insertions are starting to be common (but still less standardized).
And constraints can be incrementally added, such as later requiring that every new record in a "Cars" table have a value for "brand" or the like.
One possible exception is that there were some examples that violated "map-ness" of records, such as having two colors for a car. If they instead supplied "color_1" and "color_2", then map set rules would not be violated, keeping it closer to true relational.
In short: We don't have to abandon relational to get dynamism.

--
Table-ized A.I.
Re:Supid people who don't understand data by DougWebb · 2009-02-13 15:22 · Score: 3, Interesting

Without any details this sounds like an urban legend. If you designed your system as you would have with a lesser system like a simple "key/value" pair, how would a RDBMS be any different?
The difference is optimization vs generalization. Many problems can be handled using simple key/value pair relationships. You can model this in an RDBMS using two-column tables that you never join across, where all of your queries are SELECT val FROM tab WHERE key=? and INSERT INTO tab (key,val) VALUES (?,?). However, if you use the RDBMS this way, you're paying for the overhead of the SQL engine, (usually) a client/server connection, and your language's library for interacting with an RDBMS.
The alternative is a non-relational database like BerkeleyDB, which is optimized for key/value pair operations. All the fetch and store operations do is fetch and store the value for a given key, with a minimum of overhead. BerkeleyDB is also an in-process database, where your application is accessing the database files directly using the BerkeleyDB library code. (The library handles locking so that multiple processes can use the database files at the same time.) Again, the overhead is kept to a minimum.
BerkeleyDB is much less flexible than an RDBMS, but for the problem domains where that flexibility is not needed, BerkeleyDB is much more efficient. I've easily achieved over 6000 read/write transactions per second on modest hardware in a single-threaded process; a multi-threaded and/or multi-process application can achieve much higher rates. Compare that to a typical Oracle database connection, where you're lucky to get as many as a few hundred transactions per second, just because of the network round-trip.
MapReduce is a bunch of hype by Estanislao+Mart�nez · 2009-02-13 17:03 · Score: 4, Interesting

The name of the MapReduce framework comes from the functional programming operations "map" and "reduce." Map takes as its input a collection of data, and a function that transforms data elements into other elements; it outputs a collection where each element of the input collection has been replaced by the result of applying that function to it. Reduce takes a collection of elements, an initial value of the same type as the elements, and a two-place, commutative, associative and symmetric operation; it produces as its output the value that results from applying the operation to the initial value and each element of the collection in turn, accumulating the partial results.
Map and reduce are operations that can be trivially parallelized. To parallelize map, you divide the collection into subcollections (in any arbitrary manner), and map over each of them in parallel. To parallelize reduce, you divide the collection into subcollections, also arbitrarily, reduce each subcollection independently, then apply the reduction operation to the partial results. (That works because the reduction operation is commutative, associative and symmetric.)
Well, guess what: this sort of technique is trivially applicable to relational database queries. A SQL query translates down to a combination of joins (the FROM clause), filters (the WHERE clause) and maps (the SELECT clause). Joins are trivially parallelizable; you give each execution unit a subset of the tuples of the driving relation. Filtering (the WHERE clause) is a kind of reduce operation. SELECT is a kind of map operation. This means that relational queries are not any less amenable to parallel execution than the stuff Google does.
But the killer thing here is that MapReduce says absolutely nothing about the updates problem. This is one of the big features of RDBMSs: the ability to handle concurrent query and modification. It also says nothing about the data integrity problem, which is also one of the big RDBMS features.
So, when you get down to it, there is a good argument to be made that many applications could make use of database technologies that support much faster querying, at the expense of very little updating. But there's no convincing argument that that technology isn't best implemented in the context of an RDBMS.

--
Are you adequate?
Re:Ridiculous by xelah · 2009-02-13 22:58 · Score: 2, Interesting

Hierarchical queries are a historical weakness of SQL (but not the relational model) - that's why he chose it as an example. You'd actually do something like this (but you'll need a very recent database):
WITH RECURSIVE hierarchy AS (
SELECT * FROM employees WHERE name = 'personsname'
UNION ALL
SELECT sub.* FROM employees AS sub, employees AS super
WHERE super.id = sub.parent_department
)
SELECT * FROM hierarchy;