Is the Relational Database Doomed?
DB Guy writes "There's an article over on Read Write Web about what the future of relational databases looks like when faced with new challenges to its dominance from key/value stores, such as SimpleDB, CouchDB, Project Voldemort and BigTable. The conclusion suggests that relational databases and key value stores aren't really mutually exclusive and instead are different tools for different requirements."
that's efficient -a summary that refutes the inflammatory headline
I'm just sayin'
Someone forgot to put a where clause on that delete.
Reading slashdot one-liner: (irm http://rss.slashdot.org/Slashdot/slashdot).rdf.item | fl title,desc*
The flexibility offered in key/value databases is simply too good of a feature to pass up. However, do you really think you can get people to give up MSSQL? It'll be nice for smaller projects, but corporations wont even consider it for a number of years.
Someone type this up and submit it to Digg.
Hey, read my article! Just to make sure you do, I'll pull a Dvorak and put in some incredibly sensational headline about how RDBMs are dewmed!!!!!! BWAHAHA, feed my advertisers!!!!
(Tune in ext week, when I write about how C programming is going to become extinct in the light of fantastic new development tools like C# and Ruby on Rails!!!)
The world's burning. Moped Jesus spotted on I50. Details at 11.
This isn't digg. Posting that doesn't guarantee you +5
There's a db called Project Voldemort? That's awesome! I'm switching to that just for the name! I think my manager is a Harry Potter fan so getting approval shouldn't be too hard.
This same basic story keeps getting submitted from the same group of people who are generally trying to sell non-relational-DB stuff. This is an ad. Move along.
99.9% of database claim to follow the relational model.
The rest have scalability problems that 99.9% of developers will never see throughout their entire careers.
So the answer is a simple, emphatic, no.
Leave us RDBMS dinosaurs alone. String Name/Value pairs, that is a great innovation. In other news, Sun will be dropping all types from the Java object system and rely on the VOID type. Idiots.
Map db = new HashMap();
beginTransaction(); // Synchronize on the map // Just serialize the fucker to a file. The idiots using this won't know the difference.
db.add("key", "value");
commitTransaction();
The big dumb thing about key store values is that they are actually just a subset of relational algebra in theory and are thus readily implementable in a relational database in fact. If you really wanted to have a database just do key / store values, you could quite easily do that in any rdms.
This is my sig.
I won't believe it until Netcraft confirms it.
I have something in common with Stephen Hawking...
Is white the new black?
No, it isn't, black is the new black, and whiten and black are not really mutually exclusive. And.... I made you look. Thanks for the pageviews, suckers!
Welcome to the Panopticon. Used to be a prison, now it's your home.
It has been suggested before that the life of the relational DB is coming to an end. I must say that while I agree with this statement: -
Relational databases scale well, but usually only when that scaling happens on a single server node. When the capacity of that single node is reached, you need to scale out and distribute that load across multiple server nodes. This is when the complexity of relational databases starts to rub against their potential to scale.
I disagree with the following statement: -
Try scaling to hundreds or thousands of nodes, rather than a few, and the complexities become overwhelming, and the characteristics that make RDBMS so appealing drastically reduce their viability as platforms for large distributed systems.
I submit that the complexity can be managed and that's why we have jobs.
I am an IT consultant at a major bank and we keep all kinds of data. Data that many find useless and is spread across 27 [major] nodes. Total records in our biggest table number about 57 million with 49 rows. I can tell you that data querying and integrity maintaining are a breeze if the schematic design is correct in the first place.
We are always designing and testing different scenarios. In cases where we have had to change the schema, it has been simple if one knows what to do.
I must say that Open Source DBs have worked for us though we rely on products from IBM and Oracle.
Our philosophy is: If it works in PostgreSQL, it will even do wonders on DB2 or Oracle. I do not see how we can do away with the relational DB. Whoever designed it in the beginning did a marvelous job.
In headlines, "?" implies that something is a serious question, whose answer is likely to be yes. One that makes it worth spending the time to read the article.
Imagine the headline said "Does Obama Smoke Crack?" and the article had a bunch of stuff about the president, with a last paragraph saying: "There is absolutely no reason to thing that President Obama has ever smoked crack."
-- Support a free market in the field of government
Really rational is the best way to take a data set and be able to access it in various ways. Many of the other concepts are indeed regressions and reintroduce problems a relational database solves. Relational allows you to able to display and view data in various different ways and apply the dataset in new ways, ways that may not have originally been a part of the original design of the application. Every time we hear someone harp about some new database technology that reintroduces all of the problems of the past, but relational is still the best and most versatile way to store your data in a way that allows for query flexibility.
Turns out, there's something called a "skateboard." You can use it to travel as far as the Quickie Mart, with nothing but your feet to propel it.
In conclusion, skateboards and automobiles aren't the same thing, so probably not.
The relational database is not going anywhere and nothing in that article is based on any firm understanding of managing data.
Is the notion of a "join" obsolete? No, but it is typically impractical in a high volume system. You would probably use denormalization as a strategy.
Scaling many nodes? OK, you still gotta put your data "in" something.
key/value indexing? yawn. select val from keyvalue_tab where key = foo;
The value can be basically anything, and most "relational" databases have good object support as well as XML, JSON, etc.
So we can establish that a SQL relational database can do *everything* a simpler system can do. Now, think about ALL the things you can do with your data in a real database.
What is the point of using a limited and less functional system? A good system, like Oracle, DB2, PostgreSQL, etc (!mysql of course) will do what you need AND allow you do do more should you be successful.
The problem with data is two fold: Managing read/write/deletes and finding what you are looking for. These problems have been solved. A good database will do this for you. Want to store object? XML, JSON, binary objects, or a specialized database extension works perfectly.
form the article: "For example, a relatively simple SELECT statement could have hundreds of potential query execution paths, which the optimizer would evaluate at run time. All of this is hidden to us as users, but under the cover, RDBMS determines the "execution plan" that best answers our requests by using things like cost-based algorithms." So, you have no idea how optimizers work and how you can access tuning information, and you'd like to tell us RDBMSs are bad? Get of my lawn! (yay, I'm getting old)
Does that example of a relational DB have a serious error, or is that just me? Why have make key in two tables?
He lost cred right then.
No comprende? Let me type that a little slower for you...
In theory, I agree the most costly actions in a database are joins. It seems like the key/value model is a great solution to this, on the surface. However, what the key/value model does is push the cost to the application layer. Instead of ensuring relational integrity and conformity in the database, suddenly all app code has to do this on the frontend. Also, instead of managing this process in a single place, suddenly this process is distributed among multiple methods. Sure, the DB is more scaleable, but suddenly the app is a mess.
they think Nissan makes the Civic!
This lack of data integrity could have been prevented if they had used a relational database...
Relational databases need to die. I loved them and preached the goodness of them 10 years ago, but they are just too rigid for contemporary needs. I've learned better ways of organizing and filtering data.. but the old RDBMS school is too canonical (stubborn) and self-indulging to realize that needs are changing and their model doesn't fit.
We need efficient attribute/value models. We need to stop referencing data by where it is and start referencing it by what it is. There is too much data that needs to exist in different views, based on policy--not explicit placement.
Dumb-tags (attributes without values) like those used with Delicious bookmarks are also broken. They are too vague.
My own approach is that every attribute may have any number of value instances. Each value instance may, in turn, have sub-attributes. So you can look up data based on its characteristics even with disregard for its name. For example: /mycompany/mailserver1/ip of zone = infirewall
This returns all IP addresses under the "zone" attribute while also under the mailserver1 attribute that is under the mycompany attribute.
When validating instances of the "ip" attribute, it looks backward in the path because it is extremely quick that way.
The data server's sole responsibility is storing and retrieving information (not just data) in context (aka filtering).
Sorting is the responsibility of the client. This makes sense because there are an infinite number of algorithms one could have for sorting data (e.g. alphabetic mixed case, ASCII order, etc). To facilitate this, I wrote a method to return the number of values that would be returned if the values were requested. If too big a bite for the client, it can re-request the size of a smaller chunk, segmented according to the client's ordering method. This is useful for scale, in any case. Processing in chunks makes sense whether over a network of limited capacity or from directly form disk with limited memory.
And--this is a columnar approach like Google's BigTable is.. That means you get 10+ times faster read performance.
Matthew
If "key/value" databases do become more popular, they certainly might eat in to relational database mindshare. 90% of web applications use RDMSs merely as persistent data storage--the fact that they are "relational" doesn't matter at all; the fact that a separate SQL language is needed to get the data (rather than using language-native data structures as an interface) is even a negative for RDMs.
As a web app developer, I'm excited that something other than SQL is getting attention. RDMSs won't go away because they have properties data miners, for example, need. But they aren't ideal for the simple persistent data stores most apps call for.
A slashdotter who didn't build his own computer is like a Jedi who didn't build his own lightsaber.
There really isn't a true implementation of the relational model as per Codd and Date.
Also, SQL is a nightmare. A badly designed programming language which is not quite functional and not quite procedural and so needs a bunch of hacks to work properly. And then there is the issue of NULLS. And the fact that you can end up with ugly bag operations and path dependencies in SQL.
And just to start yet another flame war (Iknow, I just know some one is going to mod me as a troll today) key/value is just another way of saying "network database".
And another thing which I will probably get hammered for, if you normalize a DB properly you will get you objects almost for free. And vice versa. Where I see people having problems is that they either are :
1) lazy about defining and understanding their data
2) or likewise for their objects
3) or both.
If you do it properly will will get a nice set of multidimensional objects and fact/attribute tables which are orthogonal and lean. Easy to understand, search, join, build, compose, decompose, signal and track.
As opposed to a snarled up hacked together, overloaded, over inherited nightmare with hidden dependencies which I have seen too many times.
OK, you can slam me now.
putting the 'B' in LGBTQ+
SQL and all its pointy-headed progeny are the real problem with databases, not the relational vs. newMarketingBuzzwordDuJour arguments.
Database operations do not need to look like code or algorithms, the only reason they do is to provide jobs for database programmers.
Over 15 years ago Paradox's query-by-example was light-years ahead of today's soul-killing SQL crap.
SQL is not going away, though, any more than its idiot older brother Mumps (M, Caché).
"Is life so dear, or peace so sweet, as to be purchased at the price of chains and slavery?" - Patrick Henry
Some of those systems appear to more or less still be "relational". If each row is treated like a map (associative array) of strings, then the "schema" for a given table is the set union of all attributes used in the table, and non-existing columns for a given row can be treated as nulls.
As long as an asterisk is not used in a query (ex: "select * from tableX"), then it will pretty much act like existing RDBMS, and as long as the type-explicitness issues are resolved based on dynamic language conventions. (Asterisks can be implemented perhaps, but it could be computationally expensive.)
It's kind of like dynamic (AKA "scripting") languages versus static or type-heavy languages. The static kind of languages requires more up-front info that "protects" the integrity of the thing at the expense of flexibility and declaration volume. The same dichotomy can be applied to RDBMS also. We have RDBMS that like a lot of info up-front, and now those which accept incremental or ad-hoc insertions are starting to be common (but still less standardized).
And constraints can be incrementally added, such as later requiring that every new record in a "Cars" table have a value for "brand" or the like.
One possible exception is that there were some examples that violated "map-ness" of records, such as having two colors for a car. If they instead supplied "color_1" and "color_2", then map set rules would not be violated, keeping it closer to true relational.
In short: We don't have to abandon relational to get dynamism.
Table-ized A.I.
Hardware makes just as big of a difference today, which is why distributed key/value stores are gaining currency at the moment. The hardware-related difference that was a big win for relational databases was their efficient use of disk space when normalized; the hardware-related difference that is a big win for distributed key/value stores now is their efficient scalability by distribution across multiple nodes.
The big thing isn't "tuple stores or graph dbs" its distributed tuple stores, and, even better, distributed transactional tuple stores. Not a whole of them from the last century.
I can see the meeting now.
Developer: "Hey boss, I found a better product for the transaction processing data! It might save us a bunch of money on Oracle licenses!"
Boss: "Great, what is it?"
Developer: "Project Voldemort!"
Boss: "..."
Developer: "No really, let me explain..."
Boss: "I have a meeting to get to, but hey, let me know if you have any other great ideas."
A SQL query walks into a bar and sees two tables. He walks up to them and says 'Can I join you?'
From Tom Kyte's blog sql joke
Analytic & algebraic topology of locally Euclidean meterization of infinitely differentiable Riemmanian manifold
Reading this I keep seeing OOP in there, and data as an object class.
This is just the OOP crowd trying to not learn SQL and do things their way. It won't replace a full RDBMS. And an RDBMS can scale quite nicely if you know what the hell you're doing.
The name of the MapReduce framework comes from the functional programming operations "map" and "reduce." Map takes as its input a collection of data, and a function that transforms data elements into other elements; it outputs a collection where each element of the input collection has been replaced by the result of applying that function to it. Reduce takes a collection of elements, an initial value of the same type as the elements, and a two-place, commutative, associative and symmetric operation; it produces as its output the value that results from applying the operation to the initial value and each element of the collection in turn, accumulating the partial results.
Map and reduce are operations that can be trivially parallelized. To parallelize map, you divide the collection into subcollections (in any arbitrary manner), and map over each of them in parallel. To parallelize reduce, you divide the collection into subcollections, also arbitrarily, reduce each subcollection independently, then apply the reduction operation to the partial results. (That works because the reduction operation is commutative, associative and symmetric.)
Well, guess what: this sort of technique is trivially applicable to relational database queries. A SQL query translates down to a combination of joins (the FROM clause), filters (the WHERE clause) and maps (the SELECT clause). Joins are trivially parallelizable; you give each execution unit a subset of the tuples of the driving relation. Filtering (the WHERE clause) is a kind of reduce operation. SELECT is a kind of map operation. This means that relational queries are not any less amenable to parallel execution than the stuff Google does.
But the killer thing here is that MapReduce says absolutely nothing about the updates problem. This is one of the big features of RDBMSs: the ability to handle concurrent query and modification. It also says nothing about the data integrity problem, which is also one of the big RDBMS features.
So, when you get down to it, there is a good argument to be made that many applications could make use of database technologies that support much faster querying, at the expense of very little updating. But there's no convincing argument that that technology isn't best implemented in the context of an RDBMS.
Are you adequate?
A "key/value" database is simply a relational database with a single table and two columns. It doesn't make any sense to build a separate server program for what current database servers can already easily do.
I'm a bit uncertain what you're saying here. Surely the fact that the server can do more than what you need doesn't hinder your program? The same goes for SQL language; surely the fact that commands sent to the database are text strings isn't a negative? In any case, you can (and probably should) separate database access into a module of its own, offering whatever API you desire for the rest of the program.
However, they can handle such data stores in a very simple fashion. A pair of "setvalue(key, value) / getvalue(key)" is trivially easy to implement on top of SQL language. It just doesn't make sense to pour resources into developing a less capable database server.
Forget magic. Any technology distinguishable from divine power is insufficiently advanced.