Slashdot Mirror


Is the Relational Database Doomed?

DB Guy writes "There's an article over on Read Write Web about what the future of relational databases looks like when faced with new challenges to its dominance from key/value stores, such as SimpleDB, CouchDB, Project Voldemort and BigTable. The conclusion suggests that relational databases and key value stores aren't really mutually exclusive and instead are different tools for different requirements."

88 of 344 comments

  1. new record by hguorbray · · Score: 5, Interesting

    that's efficient -a summary that refutes the inflammatory headline

    I'm just sayin'

    1. Re:new record by Jah-Wren Ryel · · Score: 4, Funny

      Yeah. Only if you did not know the meaning of the '?' symbol.

      --
      When information is power, privacy is freedom.
    2. Re:new record by bFusion · · Score: 4, Insightful

      Well the '?' means that there's a question. The summary gave the conclusion to that question.

    3. Re:new record by julesh · · Score: 5, Funny

      that's efficient -a summary that refutes the inflammatory headline

      I'm just sayin'

      Nah. Efficient would be if the summary were "No."

    4. Re:new record by eln · · Score: 4, Funny

      Next Slashdot article: Is Jah-Wren Ryel a child molester?

      There's no evidence Jah-Wren Ryel has ever molested children, and no reason to suspect he would ever do so. Bandying about accusations like that would likely ruin his life forever.

      However, since child molestation is such a big political issue these days, as a responsible news site I believe we need to have equal representation from both sides of the argument and let our viewers decide.

    5. Re:new record by Pseudonym · · Score: 2, Informative

      This is what linguists refer to as the "tabloid headline question mark". Its use is to say something inflammatory and only tangentially related to the story in order to get readers.

      Examples:

      "Is Jennifer pregnant?"
      "Steve Ballmer: Love child of Satan?"

      --
      sub f{($f)=@_;print"$f(q{$f});";}f(q{sub f{($f)=@_;print"$f(q{$f});";}f});
  2. Uh-oh by benjymouse · · Score: 5, Funny

    Someone forgot to put a where clause on that delete.

    --
    Reading slashdot one-liner: (irm http://rss.slashdot.org/Slashdot/slashdot).rdf.item | fl title,desc*
    1. Re:Uh-oh by MarkRose · · Score: 2, Funny

      That's okay! I'll just roll back the transaction.... oh shit, that was a MyISAM table...

      --
      Be relentless!
  3. Yes, but not soon. by pwnies · · Score: 3, Interesting

    The flexibility offered in key/value databases is simply too good of a feature to pass up. However, do you really think you can get people to give up MSSQL? It'll be nice for smaller projects, but corporations won't even consider it for a number of years.

    1. Re:Yes, but not soon. by SanityInAnarchy · · Score: 3, Interesting

      do you really think you can get people to give up MSSQL?

      In favor of MySQL, PostgreSQL, SQLite, even Oracle, yes, I do.

      corporations won't even consider it for a number of years.

      You must have some specific corporations in mind, because I've known many corporations to use each of the above technologies. In fact, SQLite is one of the most popular databases ever.

      No, the reason it's not soon is because these other ones (CouchDB) aren't mature, and the ones that are (BigTable) aren't available at any price.

      --
      Don't thank God, thank a doctor!
    2. Re:Yes, but not soon. by Eravnrekaree · · Score: 5, Informative

      Actually, I read TFA, and I just couldn't make sense of the benefits offered by the key/value thing. You should be able to get basically the same benefits from a relational database system with a query that does a lookup on a single-column index. This would involve searching the b-tree for that column, which would yield a row data address of some sort, pointing to either a linked list of cells or a list of addresses of those cells. Once the single b-tree search is done, it is then very fast to find the other column values in that row. The b-tree or other index lookup also has to be done with a key/value pair; a relational database is just a collection of multiple key/value indexes.

      There is the issue of having a variable number of pieces of data linked to a certain key. But you can do this in relational too. Just create a table with an id column, a value type column and a value column. In a well-designed relational database, if you do a query on the id column, the b-tree will lead to data which has all of the row data addresses in the database that match the id. Each of those rows will contain a different data type/data payload for the id. This is again pretty much as fast as a simple single-index database.
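
A minimal sketch of the two layouts described above, using Python's built-in sqlite3 module. The table names, column names and sample rows are hypothetical and chosen only for illustration; the comment itself names no schema.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Plain key/value: the primary key gives a single-index lookup.
    CREATE TABLE kv (k TEXT PRIMARY KEY, v TEXT);

    -- Variable number of values per key: id, value type, value.
    CREATE TABLE attrs (id TEXT, value_type TEXT, value TEXT);
    CREATE INDEX attrs_by_id ON attrs (id);
""")

conn.execute("INSERT INTO kv VALUES (?, ?)", ("user:42", "alice"))
conn.executemany(
    "INSERT INTO attrs VALUES (?, ?, ?)",
    [
        ("user:42", "email", "alice@example.com"),
        ("user:42", "city", "Lisbon"),
        ("user:7", "email", "bob@example.com"),
    ],
)


def get(key):
    """Key/value read: one lookup on the single-column primary key index."""
    row = conn.execute("SELECT v FROM kv WHERE k = ?", (key,)).fetchone()
    return row[0] if row else None


def get_attrs(key):
    """Return every (value_type, value) pair stored under one id."""
    cursor = conn.execute(
        "SELECT value_type, value FROM attrs WHERE id = ?", (key,)
    )
    return dict(cursor.fetchall())
```

Calling get("user:42") reads one row through the primary key index, and get_attrs("user:42") gathers however many typed values hang off that id. This only shows that the access pattern is expressible relationally; it says nothing about how it performs against a dedicated key/value engine.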

    3. Re:Yes, but not soon. by photon317 · · Score: 5, Interesting

      Yes, these newer simple key/value databases like BigTable and CouchDB are effectively a subset of RDBMS functionality, so of course the same thing can be implemented relationally by just not using features.

      The reason these projects have taken off is that the relational features being skipped comprise most of the complexity of an RDBMS. Without them, it's relatively trivial to write new database engines from scratch instead of re-using MySQL, PostgreSQL, and so-on. These new feature-poor rewrites can take on many challenges that are harder for the big relational guys, like stellar performance on huge datasets, and being truly distributed in nature.

      --
      11*43+456^2
    4. Re:Yes, but not soon. by horza · · Score: 3, Insightful

      let me guess, you don't like mssql because it's microsoft? what a fucking sheep, mssql is a great database.
      oh and i've used all the others and for you to suggest mysql over mssql tells a lot...

      MSSQL? Isn't that the only database that isn't cross platform these days? Why would anybody want to use MSSQL outside of .Net developers? On a side note, why is it that only MSSQL appears to get crippled by worms and none of the others?

      Phillip.

    5. Re:Yes, but not soon. by Estanislao Martínez · · Score: 3, Insightful

      Yes, these newer simple key/value databases like BigTable and CouchDB are effectively a subset of RDBMS functionality, so of course the same thing can be implemented relationally by just not using features.

      What worries me about these arguments, however, is that they're missing a point that's very similar to yours here: these high-performance key-value databases can be implemented as features in an RDBMS. Basically, if you have a technology that allows some limited type of database to be distributed across tons of nodes and to be queried really fast, well, that's a kind of limited-functionality materialized view with a special engine to access it. So put it in as a subsystem to the full RDBMS, and use your plain old full-featured relational engine as the system of record that solves the concurrent transactional update and data integrity problems, and have it also push out the deltas to the specialized store that supports the high-performance distributed querying.

      Nobody is denying that there are many applications where you don't need all that the relational model provides, and that those applications can be made to perform faster by not providing certain features. What people repeatedly fail to understand is that this is not a refutation of the relational data model, because it is a logical and general data model that's capable of modeling the data in such applications, and does not dictate the implementation.
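
A rough sketch of the system-of-record arrangement described above, assuming a single process. An in-memory SQLite database stands in for the full RDBMS and a plain dict stands in for the distributed key/value store; the names profiles, fast_store, save_profile and read_profile are all hypothetical.

```python
import sqlite3

# System of record: owns transactions and integrity constraints.
record = sqlite3.connect(":memory:")
record.execute(
    "CREATE TABLE profiles (user_id TEXT PRIMARY KEY, name TEXT NOT NULL)"
)

# Stand-in for the specialized store that serves fast reads.
# A real deployment would use a separate distributed service here.
fast_store = {}


def save_profile(user_id, name):
    """Write to the relational store first, then push the delta out."""
    with record:  # commits on success, rolls back if an exception is raised
        record.execute(
            "INSERT OR REPLACE INTO profiles VALUES (?, ?)", (user_id, name)
        )
    # Only reached after a successful commit, so rejected writes never
    # show up in the derived store.
    fast_store[user_id] = name


def read_profile(user_id):
    """Serve reads from the derived store, not the system of record."""
    return fast_store.get(user_id)
```

A write that violates a constraint (for example a NULL name) raises before the delta is pushed, so the derived store only ever reflects committed data. Pushing deltas reliably across machines is the hard part and is not addressed by this sketch.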

    6. Re:Yes, but not soon. by encoderer · · Score: 3, Informative

      Suggesting that you could replace a MS-SQL server with SQLite basically forces anybody in the know to ignore every other point you make.

      MySQL is good, unless you need a highly performant query analyzer.

      Postgres is good, unless you need actual replication features.

      SQLite is good, if your datastore is less than 1GB.

      Oracle is no doubt a valid replacement and improvement upon SQL Server. And I use MySQL more than any other DB. But you need to hire Percona to get the same performance out of MySQL that you get from SQL Server out of the box.

    7. Re:Yes, but not soon. by SanityInAnarchy · · Score: 3, Informative

      Suggesting that you could replace a MS-SQL server with SQLite basically forces anybody in the know to ignore every other point you make.

      You're assuming that the person using MS-SQL Server knows what they're doing. How do you know it's more than just a glorified Access database?

      MySQL is good, unless you need a highly performant query analyzer.

      In other words, the query analyzer is slow? Because the queries work well enough.

      Postgres is good, unless you need actual replication features.

      Like these?

      SQLite is good, if your datastore is less than 1GB.

      Another quick Google, and we find these limits -- by default, the maximum database size is just under 32 terabytes.

      Not that I'm suggesting it's a good choice at that point, especially with multiple processes. But it does make it kind of hard to take you seriously with that kind of imagined limit, unless you're suggesting there's a practical, performance wall after 1 gig.

      --
      Don't thank God, thank a doctor!
    8. Re:Yes, but not soon. by SanityInAnarchy · · Score: 2, Interesting

      let me guess, you don't like mssql because it's microsoft?

      And because it's proprietary, single-platform, and expensive for what is, at the end of the day, just a database.

      And because I have seen new and interesting things built with MySQL, like NDB. What has MS SQL got on that?

      what a fucking sheep

      Look who's talking.

      More seriously, while I have pretty much no MS SQL experience, I don't particularly want to. The only good experience I've ever had from a Microsoft product was Halo. Bungie was acquired, and has now been sold, making me wonder if Microsoft had the chance to screw them up yet.

      --
      Don't thank God, thank a doctor!
    9. Re:Yes, but not soon. by anothy · · Score: 2, Interesting

      But you need to hire Percona to get the same performance out of MySQL that you get from SQL Server out of the box.

      this has not been my experience. at least with version 8 (two back from current), performance was miserable compared to either mysql or postgresql of comparable vintage. this was my first serious experience using mssql, but with no tuning on either side, both mysql and postgresql outperformed mssql by a factor of about 2.
      while we never got the database on the production system swapped out (development was underway to replace the application it was supporting anyway), and thus i can't speak to mysql or postgresql's reliability in the same use environment, mssql was very unstable. the database would hang indefinitely if either a query or the resulting data was too large, and, as near as we could tell, once every other month or so for no particular reason. the data set was tens of thousands of records a month going back a few years, which is not a trivial sum of data, but shouldn't be considered a lot for a modern database.
      while it's not a direct comparison, i've used mysql in several production projects and have seen less than a half dozen hangs in production total. i've only used postgresql in production on one project, but have seen no production hangs.

      --

      i speak for myself and those who like what i say.
    10. Re:Yes, but not soon. by EastCoastSurfer · · Score: 2, Insightful

      First, all applications have bugs that open them up to security flaws. Picking on MSSQL in that area is a non-starter.

      What you're missing are all of the tools that come with a MSSQL license. SSIS and MSAS are two big ones that are hard to replace with open source tools (Pentaho is interesting). If all you're looking to replace is a pure data store then yeah, Postgres is what I would move to. When you start replacing all functionality offered by MSSQL it gets a little more complicated.

  4. Top 25 Reasons the Relational Database is Doomed by MillionthMonkey · · Score: 5, Funny

    Someone type this up and submit it to Digg.

  5. Hey! by MightyMartian · · Score: 4, Insightful

    Hey, read my article! Just to make sure you do, I'll pull a Dvorak and put in some incredibly sensational headline about how RDBMs are dewmed!!!!!! BWAHAHA, feed my advertisers!!!!

    (Tune in next week, when I write about how C programming is going to become extinct in the light of fantastic new development tools like C# and Ruby on Rails!!!)

    --
    The world's burning. Moped Jesus spotted on I50. Details at 11.
    1. Re:Hey! by dkleinsc · · Score: 5, Insightful

      Especially when the claim is as ridiculous as this one.

      There's a reason relational databases took over the world of databases: They provide a good combination of flexibility and structure to efficiently represent data. Which is what databases are supposed to do.

      --
      I am officially gone from /. Long live http://www.soylentnews.com/
    2. Re:Hey! by Just Some Guy · · Score: 5, Insightful

      There's a reason relational databases took over the world of databases: They provide a good combination of flexibility and structure to efficiently represent data.

      Especially since so many databases really are inherently relational. The textbook example of 1-customer:n-invoices, 1-invoice:n-items plays out quite a bit in the workplace.
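
The textbook shape mentioned above can be sketched with Python's built-in sqlite3 module. The table names, column names and sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- 1 customer : n invoices, 1 invoice : n items (hypothetical schema).
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE invoices (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id)
    );
    CREATE TABLE items (
        id INTEGER PRIMARY KEY,
        invoice_id INTEGER REFERENCES invoices(id),
        description TEXT,
        cents INTEGER
    );

    INSERT INTO customers VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO invoices VALUES (10, 1), (11, 1), (12, 2);
    INSERT INTO items VALUES
        (100, 10, 'widget', 500),
        (101, 10, 'gadget', 250),
        (102, 11, 'widget', 500),
        (103, 12, 'sprocket', 900);
""")


def totals_by_customer():
    """Walk both one-to-many links: invoice count and item total per customer."""
    return conn.execute("""
        SELECT c.name, COUNT(DISTINCT i.id), SUM(it.cents)
        FROM customers AS c
        JOIN invoices AS i ON i.customer_id = c.id
        JOIN items AS it ON it.invoice_id = i.id
        GROUP BY c.id
        ORDER BY c.name
    """).fetchall()
```

The same three tables also answer questions nobody planned for, such as which items appear on more than one invoice, without restructuring the stored data.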

      --
      Dewey, what part of this looks like authorities should be involved?
  6. Re:Karma Whoring by Anonymous Coward · · Score: 2, Insightful

    This isn't digg. Posting that doesn't guarantee you +5

  7. Voldemort! by GreatRedShark · · Score: 3, Funny

    There's a db called Project Voldemort? That's awesome! I'm switching to that just for the name! I think my manager is a Harry Potter fan so getting approval shouldn't be too hard.

    1. Re:Voldemort! by youthoftoday · · Score: 4, Funny

      A Harry Potter fan? Voldemort? Surely the name is the one thing that'll *prevent* approval?

      --
      -1 not first post
    2. Re:Voldemort! by fuzzyfuzzyfungus · · Score: 5, Funny

      The name might be cool; but the length of some of the commands will really get to you. How many times do you want to type AVADA_KEDAVRA TABLE?

    3. Re:Voldemort! by the_B0fh · · Score: 4, Funny

      **SPOILER ALERT**

      In book 8, it turns out that good ol' Voldy is actually Harry's older brother. They had a tearful reunion, and Voldy now works for Harry.

    4. Re:Voldemort! by jollyreaper · · Score: 5, Funny

      The name might be cool; but the length of some of the commands will really get to you. How many times do you want to type AVADA_KEDAVRA TABLE?

      Better than PokemonDB. Then you have to jump on top of your desk and shout "Customer Table, I select you!" every time you run a damn query.

      --
      Kwisatz Haderach
      Sell the spice to CHOAM
      This Mahdi took Shaddam's Throne
    5. Re:Voldemort! by GreatRedShark · · Score: 5, Funny

      You're right, that is a bit cumbersome. Hopefully, they'll release a friendly GUI wizard to make working with it more efficient.

    6. Re:Voldemort! by WuphonsReach · · Score: 4, Funny

      Better than PokemonDB. Then you have to jump on top of your desk and shout "Customer Table, I select you!" every time you run a damn query.

      *polite golf clap*

      --
      Wolde you bothe eate your cake, and have your cake?
    7. Re:Voldemort! by daveime · · Score: 3, Funny

      It's already been done ...

      HAI
      CAN HAS DBASE?
      I HAZ A VARIABLE1 IS NOTHING
      IM IN YR DATA ;) "test.mdb"
              CAN I PLZ GET column1 column2 column3
              ALL UP IN table1
              OMG column1 IZ BIGGER THAN 5
              ALL UR BASE R BELONG 2 VARIABLE1
      IM OUTTA YR DATA
      VISIBLE VARIABLE1
      KTHXBYE

  8. Enough with the death of the relational DB by Mr. Underbridge · · Score: 5, Interesting

    This same basic story keeps getting submitted from the same group of people who are generally trying to sell non-relational-DB stuff. This is an ad. Move along.

    1. Re:Enough with the death of the relational DB by Penguinshit · · Score: 5, Funny

      Don't online dating sites use relational databases?

    2. Re:Enough with the death of the relational DB by Yetihehe · · Score: 3, Funny

      They do. Object databases are only for insensitive clods.

      --
      Extreme Programming - Redundant Array of Inexpensive Developers
  9. 99.9% of databases... by Ckwop · · Score: 3, Interesting

    99.9% of databases claim to follow the relational model.

    The rest have scalability problems that 99.9% of developers will never see throughout their entire careers.

    So the answer is a simple, emphatic, no.

    1. Re:99.9% of databases... by arevos · · Score: 2, Interesting

      99.9% of databases claim to follow the relational model.

      The rest have scalability problems that 99.9% of developers will never see throughout their entire careers.

      Uh, actually, relational databases are pretty damn hard to scale. That's basically the main problem with them. Why do you think relational databases are so often paired with a cache made from a hashtable-based database?

    2. Re:99.9% of databases... by arevos · · Score: 2, Informative

      What you call a "hash table database" others might call an "indexed cursor".

      Others would be wrong ;)

      An indexed cursor only contains a reference to the original data. Memcached contains a duplicate of the original data, so I'd argue it was a database in its own right.

      However, even if Memcached doesn't meet the criteria of a database, DBM-based databases certainly do. They operate on a similar principle; a unique key points to a specific piece of data. Unlike Memcached, they are persistent, but like Memcached they are very fast and easily scalable.

      I was asking for an example of a data storage technique that scales better than RDB.

      Well, consider a modern DBM-based database like Tokyo Cabinet. Let's say we want to distribute it evenly across 16 machines, labelled 0 to F. When a request for data comes in, we MD5 the key and use the first 4 bits to determine the machine to use. This gives us an even and consistent spread of data between machines.

      Relational databases can't easily use the same trick, because table joins are very costly to perform if the table data is distributed across several machines. In a nutshell, the flexibility of relational databases reduces their speed and scalability compared to databases with a more limited scope.
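
The MD5 trick described above can be sketched in a few lines with Python's hashlib. The function names are hypothetical, and a real deployment would also need a plan for adding or removing machines, which this does not cover.

```python
import hashlib

NUM_SHARDS = 16  # one shard per hex digit


def shard_for(key: str) -> int:
    """Pick a machine from the first 4 bits of the key's MD5 digest."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return digest[0] >> 4  # top 4 bits of the first byte: 0..15


def shard_label(key: str) -> str:
    """Same choice, written as a single hex digit."""
    return format(shard_for(key), "X")
```

Because the hash depends only on the key, every client computes the same machine for the same key without any coordination. That is also why joins are awkward here: two rows that belong together can land on different machines.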

  10. Finally the OODB people will by thammoud · · Score: 5, Insightful

    Leave us RDBMS dinosaurs alone. String Name/Value pairs, that is a great innovation. In other news, Sun will be dropping all types from the Java object system and rely on the VOID type. Idiots.

  11. A great open source implementation by thammoud · · Score: 5, Funny

    Map<String, String> db = new HashMap<>();

    beginTransaction(); // Synchronize on the map
    db.put("key", "value");
    commitTransaction(); // Just serialize the fucker to a file. The idiots using this won't know the difference.

  12. ah, stupid. by tjstork · · Score: 2

    The big dumb thing about key/value stores is that they are actually just a subset of relational algebra in theory and are thus readily implementable in a relational database in fact. If you really wanted to have a database just store key/value pairs, you could quite easily do that in any RDBMS.

    --
    This is my sig.
    1. Re:ah, stupid. by poot_rootbeer · · Score: 3, Insightful

      If you really wanted to have a database just store key/value pairs, you could quite easily do that in any RDBMS.

      Sure, but it's not likely that a key/value store implemented within a general-purpose RDBMS can achieve the same raw performance as a system designed to do nothing but implement a key/value store -- nor the distributability, for that matter.

  13. In relation to what? by Penguinshit · · Score: 5, Funny

    I won't believe it until Netcraft confirms it.

  14. Re:Yeah by idontgno · · Score: 3, Funny

    Is white the new black?

    No, it isn't, black is the new black, and white and black are not really mutually exclusive. And.... I made you look. Thanks for the pageviews, suckers!

    --
    Welcome to the Panopticon. Used to be a prison, now it's your home.
  15. This is an old argument which will not fly by bogaboga · · Score: 5, Informative

    It has been suggested before that the life of the relational DB is coming to an end. I must say that while I agree with this statement: -

    Relational databases scale well, but usually only when that scaling happens on a single server node. When the capacity of that single node is reached, you need to scale out and distribute that load across multiple server nodes. This is when the complexity of relational databases starts to rub against their potential to scale.

    I disagree with the following statement: -

    Try scaling to hundreds or thousands of nodes, rather than a few, and the complexities become overwhelming, and the characteristics that make RDBMS so appealing drastically reduce their viability as platforms for large distributed systems.

    I submit that the complexity can be managed and that's why we have jobs.

    I am an IT consultant at a major bank and we keep all kinds of data. Data that many find useless and is spread across 27 [major] nodes. Total records in our biggest table number about 57 million with 49 rows. I can tell you that data querying and integrity maintaining are a breeze if the schematic design is correct in the first place.

    We are always designing and testing different scenarios. In cases where we have had to change the schema, it has been simple if one knows what to do.

    I must say that Open Source DBs have worked for us though we rely on products from IBM and Oracle.

    Our philosophy is: If it works in PostgreSQL, it will even do wonders on DB2 or Oracle. I do not see how we can do away with the relational DB. Whoever designed it in the beginning did a marvelous job.

    1. Re:This is an old argument which will not fly by cat_jesus · · Score: 2, Informative

      Total records in our biggest table number about 57 million with 49 rows.

      I think you mean columns.

    2. Re:This is an old argument which will not fly by Anonymous Coward · · Score: 2, Informative

      E F Codd, an IBM mathematician. And I won't even look at a technology that claims to replace the RDB until I've seen a fully developed mathematical treatment that at least approaches the sophistication of Codd's work.

  16. ?'s meaning - literal and implied by qbzzt · · Score: 5, Insightful

    In headlines, "?" implies that something is a serious question, whose answer is likely to be yes. One that makes it worth spending the time to read the article.

    Imagine the headline said "Does Obama Smoke Crack?" and the article had a bunch of stuff about the president, with a last paragraph saying: "There is absolutely no reason to think that President Obama has ever smoked crack."

    --
    -- Support a free market in the field of government
    1. Re:?'s meaning - literal and implied by 117 · · Score: 5, Funny

      President Obama smokes crack?!!?!??!!?!

    2. Re:?'s meaning - literal and implied by Cajun Hell · · Score: 4, Interesting
      --
      "Believe me!" -- Donald Trump
    3. Re:?'s meaning - literal and implied by digitig · · Score: 4, Insightful

      In headlines, "?" implies that something is a sensationalized question, whose answer is "almost certainly, no".

      Fixed that for ya.

      --
      Quidnam Latine loqui modo coepi?
    4. Re:?'s meaning - literal and implied by zenlunatics · · Score: 2, Insightful

      so your hate for Obama is strong enough to wish that the entire country has a bad 4 years? gee, thanks.

    5. Re:?'s meaning - literal and implied by value_added · · Score: 4, Funny

      President Obama smokes crack?!!?!??!!?!

      Dunno. Has he stopped beating his wife?

  17. Ridiculous by Eravnrekaree · · Score: 3, Insightful

    Really, relational is the best way to take a data set and be able to access it in various ways. Many of the other concepts are indeed regressions and reintroduce problems a relational database solves. Relational allows you to display and view data in various different ways and apply the dataset in new ways, ways that may not have been part of the original design of the application. Every so often we hear someone harp about some new database technology that reintroduces all of the problems of the past, but relational is still the best and most versatile way to store your data in a way that allows for query flexibility.

    1. Re:Ridiculous by Grapedrink · · Score: 2, Insightful

      I agree with you on a lot of points, particularly people coming up with stupid solutions and creating new problems, but how is the rest of this insightful? Sure, relational is a good general fit for databases, but it sounds like you are saying the fact that you can query and modify it using something like SQL in most implementations makes it great?

      Exactly how is that easier than some other ways, such as building an object database? Can't you just write a few lines of code that are far more expressive than any SQL ever could be in a language like Common Lisp, Smalltalk, Python, Ruby, etc? Isn't that more accommodating than a relational model which limits your options due to performance vs. flexibility vs. integrity vs. extensibility vs. scalability? How does SQL give you more ways to manipulate things than a map, collect, slice, reduce, anonymous function/lambda, etc?

      I use both relational and object databases (preference to object dbs in all honesty). For an object database, my process both in use and development is to write and modify like it sounds, objects. Instances of objects in those classes are automatically stored for me and even in most implementations, class level data as well. I simply write my code and trudge along and do not worry about some ridiculous ORM. If I need transactions, I have them at the object level which I would want anyway even with a relational DB.

      If I need a query, it is done in a well-known language that I used to write the application. I can of course see that if there was no application, it might be annoying to do this and relational can make some of that easier, but that is rarely the case. Further, I don't hit as many bumps where I need to denormalize my data to do reporting or data warehousing. I simply once again write code as normal to get what I want.

      A great example is try storing an organizational hierarchy in a database. Query it for basic info such as a list of a manager and all subordinates and superiors. Now try to ask it for the full path between employees. Keep asking it questions about the hierarchy. In just about every relational db it is a fail. Oracle for instance even realized things like this and added "Connect By." Storing the data itself is a nightmare and you end up needing something like nested sets, self joining queries, cursors (never), handing it off to an application (aka relational failure), or materialized path.

      You run into other similar problems where you see hackish solutions in the relational world like table inheritance. Why have it if a relational database is so good? It is there because relational completely fails here, just like object databases fail elsewhere. There is no ideal solution, and for general cases both work great in my experience, even giving an edge for web applications to object dbs.

      There are so many areas where either the relational model itself, or SQL fails. If you have not hit them, then you have not used relational databases as much more than a glorified spreadsheet. The amount of time I spend tuning my queries in a relational db is ridiculous, even for relatively simple data. Hints, denormalization, columns as rows, cursors, triggers, user defined functions, and other such devices are all crutches for relational dbs. Of course, some of those are also caused by bad devs, but it need not be that hard in the first place.

      Anyway, I am not trying to slam the relational model. Rather, I think you are wrong to say it's the best and most flexible. Like all things, it depends what you are doing, and in my own experience object databases have been far easier to work with and maintain. I must save months of work every time I use one, but general ignorance often forces me to use either object or relational. If people better understood the strengths of each and paid more attention to each specific task rather than marketing, we would all be happier. It's sad that complaints about tools for example are even valid points. If you market the hell out of something and it just becomes the standard for whatever reason, then of course it is going to win in areas like that. You would think with all the anti-Microsoft rhetoric around here, people would get it.

      For now, I'll continue to use both and enjoy them for different reasons.

    2. Re:Ridiculous by xelah · · Score: 2, Interesting
      Hierarchical queries are a historical weakness of SQL (but not the relational model) - that's why he chose it as an example. You'd actually do something like this (but you'll need a very recent database):

      WITH RECURSIVE hierarchy AS (
      SELECT * FROM employees WHERE name = 'personsname'
      UNION ALL
      SELECT sub.* FROM employees AS sub, hierarchy AS super
      WHERE super.id = sub.parent_department
      )
      SELECT * FROM hierarchy;
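
For a version that can be run locally, here is a small sketch using Python's built-in sqlite3 module, whose SQLite engine supports WITH RECURSIVE. It assumes a simplified, hypothetical employees table with a manager_id column in place of the parent_department column used above, and the sample names are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical org chart: manager_id points at the boss's id.
    CREATE TABLE employees (
        id INTEGER PRIMARY KEY,
        name TEXT,
        manager_id INTEGER
    );
    INSERT INTO employees VALUES
        (1, 'Ann', NULL),
        (2, 'Bob', 1),
        (3, 'Cat', 1),
        (4, 'Dan', 2),
        (5, 'Eve', 4);
""")


def subtree(name):
    """Return the named employee plus everyone below them, at any depth."""
    rows = conn.execute("""
        WITH RECURSIVE hierarchy(id, name) AS (
            SELECT id, name FROM employees WHERE name = ?
            UNION ALL
            -- The recursive step joins against the rows found so far.
            SELECT sub.id, sub.name
            FROM employees AS sub
            JOIN hierarchy AS super ON sub.manager_id = super.id
        )
        SELECT name FROM hierarchy
    """, (name,)).fetchall()
    return sorted(row[0] for row in rows)
```

The key point is that the second SELECT refers back to hierarchy itself, so each pass picks up the next level of reports until no new rows appear.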

    3. Re:Ridiculous by xelah · · Score: 2, Insightful

      Sure, relational is a good general fit for databases, but it sounds like you are saying the fact that you can query and modify it using something like SQL in most implementations makes it great?

      If you're a DBA, system administrator or tester - or if you simply have to do something ad-hoc and dodgy as a quick fix on a live system - then this makes it not so much great as absolutely fantastic. You can do things like:

      • Look at the most time consuming queries and analyze them, optimize them, add/remove indexes or move tables or indexes between different sets of disks. And when you do, the query plans will change because they haven't been frozen in to the application code.
      • Make ad-hoc changes, or generate ad-hoc reports (or run a query from cron, say) without having to write a little program every time.
      • Examine the data following your software screwing up, and fix it.
      • Run the queries your software has generated and check the results. Correct the query and try again.
      • Fetch a list of currently held locks, or examine the queries which have resulted in deadlocks being reported to the log.
      • Add columns to support admin or reporting functions (or a second application) without worrying about the effect on the (still running) original application.
      • Write a reporting system which programmatically generates queries and has the DBMS do the difficult bit of working out query plans.

      These aren't specific to relational databases or SQL, of course. However, having a query language is amazingly useful.

      I'm surprised you're complaining about having to tune your queries. A lot of databases and SQL have shortcomings, but it's really not that hard if you know your database well (and haven't chosen, say, MySQL). You must still have a query plan with your object databases - it's just implied by your code. (I'm assuming you're not using some sort of alternative query language, because your comment suggests otherwise and you'd only have to tune that instead). It won't adapt to changing data or indexes, and you're going to have a lot of work to do if you want to duplicate some of the more sophisticated techniques a modern database will use. Worse still, you're going to have to change your application, add some sort of profiling and run it in-place or in a test harness to work out why it's taking as long as it does. And when you want to try a different plan you have to rewrite your code.

      It's the ORM layer that's the real pain in the arse (assuming you're using OOD, and assuming you actually want a direct mapping between your object model and relational model). Things like Hibernate and judicious use of code generation make it a lot easier, but you still need to know what's going on and you still need to (and can!) choose between navigating among objects (letting the ORM do the queries) and generating a hand-written query. To some extent an ORM (and the RDBMS vs OODBMS choice) is just a reflection of the different requirements of on-disk vs in-memory representations of objects. On-disk storage is all about efficient and flexible querying, retrieval, (distributed) concurrency, storage and management of huge data-sets, whereas in-memory storage is all about assigning behaviour and navigating relationships between smaller sets of objects whilst carrying out that behaviour.

      In any case, the original article is just silly. How does taking all the formal structure away make any difference to the fundamental scalability restrictions - your application's need for data consistency (across nodes) and concurrency control? I work in ticketing. It's not the relational model that causes scalability problems, it's the fundamental fact that 100k people are competing for access to 10k seat statuses, that when we check per-person ticket limits or assign seats we need 100% up-to-date data, that we regularly need to fetch the status of all the seats in a block for display, etc. I believe that concurrency and scalability concerns

  18. Is the automobile doomed? by Renegade+Iconoclast · · Score: 3, Insightful

    Turns out, there's something called a "skateboard." You can use it to travel as far as the Quickie Mart, with nothing but your feet to propel it.

    In conclusion, skateboards and automobiles aren't the same thing, so probably not.

  19. Supid people who don't understand data by mlwmohawk · · Score: 4, Informative

    The relational database is not going anywhere and nothing in that article is based on any firm understanding of managing data.

    Is the notion of a "join" obsolete? No, but it is typically impractical in a high volume system. You would probably use denormalization as a strategy.

    Scaling many nodes? OK, you still gotta put your data "in" something.

    key/value indexing? yawn. select val from keyvalue_tab where key = foo;

    The value can be basically anything, and most "relational" databases have good object support as well as XML, JSON, etc.

    So we can establish that a SQL relational database can do *everything* a simpler system can do. Now, think about ALL the things you can do with your data in a real database.

    What is the point of using a limited and less functional system? A good system, like Oracle, DB2, PostgreSQL, etc (!mysql of course) will do what you need AND allow you to do more should you be successful.

    The problem with data is twofold: Managing read/write/deletes and finding what you are looking for. These problems have been solved. A good database will do this for you. Want to store objects? XML, JSON, binary objects, or a specialized database extension works perfectly.

    1. Re:Supid people who don't understand data by sl0ppy · · Score: 5, Insightful

      The relational database is not going anywhere and nothing in that article is based on any firm understanding of managing data.

      no, the relational database is not going anywhere, you are correct. but, that does not mean that there aren't instances where a non-relational database, with the addition of map/reduce, is extremely useful.

      non-relational databases have been around for decades, and are in use for quite a number of applications involving rapid development and storage of very large records. couple this with map/reduce, and you have the ability to scale quickly with very large datasets.

      scaling quickly is a very difficult problem to solve with an RDBMS - you either need to continue to throw more hardware at the problem, to the point of diminishing returns, or re-architect your data at the cost of possible significant downtime, while still attempting to serve up the data in a timely manner. i've been deep in the bowels of oracle RAC, fighting to get just 5% more speed out of a query over a billion rows and realizing that i have to start over with a new schema, just to squeeze more data out. compare that to simply adding another machine and letting the map functionality run across one more cpu before returning it for the reduce.

      Is the notion of a "join" obsolete? No, but it is typically impractical in a high volume system. You would probably use denormalization as a strategy.

      once again, correct, but having to denormalize to a snowflake or a star isn't always the best solution. you're taking the best parts of the relational database model, and throwing them out - normalization, referential integrity, just to squeeze more out of something that may not be the best tool for the job.

      do you hammer with a wrench? i have before, and i managed to hurt my thumb.

    2. Re:Supid people who don't understand data by DragonWriter · · Score: 2, Insightful

      So we can establish that a SQL relational database can do *everything* a simpler system can do.

      In terms of expressive power, sure, but no one is arguing that distributed key/value stores are going to gain against RDBMS's because they have superior expressive power. What is being argued is that they will do so because they have superior scalability and distribution properties, and that in many real-world applications those are more important than having the full expressive power of relational algebra. Particularly as you get ones that can provide ACID guarantees, that becomes a compelling selling point in many applications where RDBMS's would otherwise be used simply because they are the only available tool, but where distributed key/value stores are a better tool.

    3. Re:Supid people who don't understand data by DragonWriter · · Score: 2, Insightful

      If you are willing to get rid of ACID like the other solutions, there are no limitations.

      The other solutions (see below) do not, in all cases, "get rid of ACID".

      Please cite one example, just one, where a simple key/pair data system is the "better" solution for a high volume site, and where a more powerful database like PostgreSQL wouldn't do a better job.

      Scalaris, a distributed transactional key/value store that does not get rid of ACID, is one of the "other solutions" (and one that has been demonstrated, by replicating Wikipedia on a distributed cluster, to scale better, at least, than Wikipedia's existing MySQL platform).

    4. Re:Supid people who don't understand data by DougWebb · · Score: 3, Interesting

      Without any details this sounds like an urban legend. If you designed your system as you would have with a lesser system like a simple "key/value" pair, how would a RDBMS be any different?

      The difference is optimization vs generalization. Many problems can be handled using simple key/value pair relationships. You can model this in an RDBMS using two-column tables that you never join across, where all of your queries are SELECT val FROM tab WHERE key=? and INSERT INTO tab (key,val) VALUES (?,?). However, if you use the RDBMS this way, you're paying for the overhead of the SQL engine, (usually) a client/server connection, and your language's library for interacting with an RDBMS.

      The alternative is a non-relational database like BerkeleyDB, which is optimized for key/value pair operations. All the fetch and store operations do is fetch and store the value for a given key, with a minimum of overhead. BerkeleyDB is also an in-process database, where your application is accessing the database files directly using the BerkeleyDB library code. (The library handles locking so that multiple processes can use the database files at the same time.) Again, the overhead is kept to a minimum.
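
      For a feel of what "in-process fetch and store" looks like from application code, here's a small sketch. It uses Python's standard dbm.dumb module purely as a stand-in: it is not BerkeleyDB, it is slow, and it does not do the multi-process locking described above. The key names and file path are made up for illustration.

```python
import dbm.dumb
import os
import tempfile


def roundtrip(path):
    """Store two values by key, then reopen the files and fetch by key."""
    # No server, no SQL: the library reads and writes the files directly.
    with dbm.dumb.open(path, "c") as db:
        db[b"user:42"] = b"alice"
        db[b"user:43"] = b"bob"
    with dbm.dumb.open(path, "r") as db:
        # A present key returns its value; an absent key returns None.
        return db[b"user:42"], db.get(b"user:99")


with tempfile.TemporaryDirectory() as tmp:
    found, missing = roundtrip(os.path.join(tmp, "kvstore"))

print(found, missing)
```

      The whole API is "put bytes under a key, get bytes back", which is exactly why there's so little overhead to pay for.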

      BerkeleyDB is much less flexible than an RDBMS, but for the problem domains where that flexibility is not needed, BerkeleyDB is much more efficient. I've easily achieved over 6000 read/write transactions per second on modest hardware in a single-threaded process; a multi-threaded and/or multi-process application can achieve much higher rates. Compare that to a typical Oracle database connection, where you're lucky to get as many as a few hundred transactions per second, just because of the network round-trip.

    5. Re:Supid people who don't understand data by DougWebb · · Score: 4, Informative

      Map/Reduce was developed at Google. It's a bit tough to wrap your head around at first, and once you get it you wonder what the big deal is, until you realize how suitable it is for Google's datacenters.

      Basically, you take a dataset (a bunch of key/value pairs) and a mapping function, and you run the mapping function over every item in the dataset. This gives you an intermediate dataset with different keys and values. You then run that through a reducing function, which produces your final dataset. This can be a single result, or a dataset that can then be processed with a different map/reduce pair of functions.

      The big deal for Google is that many of their problems can be expressed in terms of map and reduce functions that can operate in parallel over their datasets, and that their datacenters can handle absolutely enormous quantities of parallel operations. So, for the mapping operation, they take the original dataset and mapping function, subdivide the dataset over thousands of servers, and let them run the mapping function in parallel. When these servers return their results, it's common for many different servers to return the same or related keys in the intermediate set. These are collated, so that when the intermediate dataset is distributed with the reduce function, all of the values with the same keys go to the same servers. This helps the reduce function to be run in parallel; it's often counting the number of original items that were assigned to the same key in the intermediate set.
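
      Here's a toy, single-machine sketch of those three steps (map, collate by key, reduce) as a word count. It's only meant to show the data flow; the function names are mine, nothing here is Google's actual API, and the real thing spreads each phase over many servers.

```python
from collections import defaultdict


def map_phase(records, mapper):
    """Run the mapper over every (key, value) item; collect intermediate pairs."""
    intermediate = []
    for key, value in records:
        intermediate.extend(mapper(key, value))
    return intermediate


def collate(pairs):
    """Group intermediate values by key, so each key is reduced in one place."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped, reducer):
    """Run the reducer once per intermediate key."""
    return {key: reducer(key, values) for key, values in grouped.items()}


def word_mapper(doc_id, text):
    # Emit (word, 1) for every word in the document.
    return [(word.lower(), 1) for word in text.split()]


def count_reducer(word, ones):
    # Count how many original items were assigned to this key.
    return sum(ones)


documents = [("doc1", "the cat sat"), ("doc2", "The dog sat down")]
counts = reduce_phase(collate(map_phase(documents, word_mapper)), count_reducer)
print(counts)
```

      Since each mapper call only sees one record and each reducer call only sees one key, both phases can be farmed out to as many machines as you have.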

    6. Re:Supid people who don't understand data by Matt+Perry · · Score: 2, Insightful

      do you hammer with a wrench? i have before, and i managed to hurt my thumb.

      Not usually, but I have done so before. If it hurts your thumb, you're holding it wrong.

      --
      Slashdot: Failed Car Analogies. Amateur Lawyering. Anecdote Battles.
  20. Some credibility... by jernejk · · Score: 3, Insightful

    from the article: "For example, a relatively simple SELECT statement could have hundreds of potential query execution paths, which the optimizer would evaluate at run time. All of this is hidden to us as users, but under the cover, RDBMS determines the "execution plan" that best answers our requests by using things like cost-based algorithms." So, you have no idea how optimizers work and how you can access tuning information, and you'd like to tell us RDBMSs are bad? Get off my lawn! (yay, I'm getting old)

    1. Re:Some credibility... by jernejk · · Score: 2, Funny

      no, really.. this is utter crap: so called benefit: "The first benefit is that they are simple and thus scale much better than today's relational databases. If you are putting together a system in-house and intend to throw dozens or hundreds of servers behind your data store to cope with what you expect will be a massive demand in scale, then consider a key/value store." but then:"Bugs in a properly designed relational database usually don't lead to data integrity issues; bugs in a key/value database, however, quite easily lead to data integrity issues." and then it just goes on about how RDBMSs are really cool... oh, I got it, it was really written by Oracle reverse marketing department!

  21. He can't even explain relations correctly... by iamhigh · · Score: 3, Informative

    Does that example of a relational DB have a serious error, or is that just me? Why have the "make" key in two tables?

    He lost cred right then.

    --
    No comprende? Let me type that a little slower for you...
  22. Not buying it. by reginaldo · · Score: 5, Interesting

    In theory, I agree the most costly actions in a database are joins. It seems like the key/value model is a great solution to this, on the surface. However, what the key/value model does is push the cost to the application layer. Instead of ensuring relational integrity and conformity in the database, suddenly all app code has to do this on the frontend. Also, instead of managing this process in a single place, suddenly this process is distributed among multiple methods. Sure, the DB is more scalable, but suddenly the app is a mess.
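
    As a rough illustration of "integrity in one place", here's a sketch with Python's built-in sqlite3 module. The makes/models tables are hypothetical names I picked for the example (loosely after the car example in the article), and SQLite needs foreign keys switched on explicitly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default
conn.execute("CREATE TABLE makes (name TEXT PRIMARY KEY)")
conn.execute(
    """CREATE TABLE models (
           name TEXT PRIMARY KEY,
           make TEXT NOT NULL REFERENCES makes(name)
       )"""
)
conn.execute("INSERT INTO makes VALUES ('Honda')")
conn.execute("INSERT INTO models VALUES ('Civic', 'Honda')")


def try_insert(model, make):
    """Attempt an insert; the database, not the app, decides if it is valid."""
    try:
        conn.execute("INSERT INTO models VALUES (?, ?)", (model, make))
        return True
    except sqlite3.IntegrityError:
        return False


accepted = try_insert("Accord", "Honda")   # 'Honda' exists in makes
rejected = try_insert("Sentra", "Nissan")  # no 'Nissan' row in makes

print(accepted, rejected)
```

    With a plain key/value store, every code path that writes a model would have to repeat that check itself.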

  23. Re:I see the problem! by poot_rootbeer · · Score: 3, Insightful

    they think Nissan makes the Civic!

    This lack of data integrity could have been prevented if they had used a relational database...

  24. Here's a match.. by Slicker · · Score: 3, Interesting

    Relational databases need to die. I loved them and preached the goodness of them 10 years ago, but they are just too rigid for contemporary needs. I've learned better ways of organizing and filtering data.. but the old RDBMS school is too canonical (stubborn) and self-indulging to realize that needs are changing and their model doesn't fit.

    We need efficient attribute/value models. We need to stop referencing data by where it is and start referencing it by what it is. There is too much data that needs to exist in different views, based on policy--not explicit placement.

    Dumb-tags (attributes without values) like those used with Delicious bookmarks are also broken. They are too vague.

    My own approach is that every attribute may have any number of value instances. Each value instance may, in turn, have sub-attributes. So you can look up data based on its characteristics even with disregard for its name. For example: /mycompany/mailserver1/ip of zone = infirewall

    This returns all IP addresses under the "zone" attribute while also under the mailserver1 attribute that is under the mycompany attribute.

    When validating instances of the "ip" attribute, it looks backward in the path because it is extremely quick that way.

    The data server's sole responsibility is storing and retrieving information (not just data) in context (aka filtering).

    Sorting is the responsibility of the client. This makes sense because there are an infinite number of algorithms one could have for sorting data (e.g. alphabetic mixed case, ASCII order, etc). To facilitate this, I wrote a method to return the number of values that would be returned if the values were requested. If too big a bite for the client, it can re-request the size of a smaller chunk, segmented according to the client's ordering method. This is useful for scale, in any case. Processing in chunks makes sense whether over a network of limited capacity or directly from disk with limited memory.

    And--this is a columnar approach like Google's BigTable is.. That means you get 10+ times faster read performance.

    Matthew

    1. Re:Here's a match.. by lgw · · Score: 2, Insightful

      What do you mean by "information, not just data"? It seems like you have specific, personal definitions of those words that others might not share.

      If you make sorting the responsibility of the client, what do you do with large result sets? You can't sort chunked data client-side, as you have to sort before chunking. There should be *some* answer for result sets that don't fit in memory (client or server). I'd be happy with only being able to get results in a certain order if I've already built an index according to that ordering criteria, or something equally elaborate, but what's an index in your scheme?

      --
      Socialism: a lie told by totalitarians and believed by fools.
    2. Re:Here's a match.. by DarkOx · · Score: 4, Informative

      Wow, um where to begin really....

      So you realize that the structure you are suggesting can be easily built in a traditional RDB, using a star-schema or cluster design right?

      Next you suggest doing the sorting on the client, and then say that if there is more data than a client can handle the server can be asked to send chunks according to the client's sort order. That means the server has to have all the sort logic the client has and, in all but the most trivial applications, probably do all the sorting anyway... Seems to me a star schema and indexing the fact table on the attributes that are most commonly going to be used for sorting makes much more sense, because as I said the server is going to be sorting anyway.

      Now there are data sets for which non-relational structures do make more sense, but we have hierarchical and navigational designs for those; yours is not one of them.

      --
      Repeal the 17th Amendment TODAY! Also Please Read http://www.gnu.org/philosophy/right-to-read.html
    3. Re:Here's a match.. by jadavis · · Score: 3, Insightful

      We need to stop referencing data by where it is and start referencing it by what it is.

      You say that without any explanation of your apparent position that the relational model requires you to reference data by "where it is".

      You seem to think that the semantics of your system are somehow richer -- providing "information" rather than "data".

      Do you even know what a relation is?

      --
      Social scientists are inspired by theories; scientists are humbled by facts.
  25. Re:WTF? by Lord+Ender · · Score: 2, Insightful

    If "key/value" databases do become more popular, they certainly might eat in to relational database mindshare. 90% of web applications use RDBMSs merely as persistent data storage--the fact that they are "relational" doesn't matter at all; the fact that a separate SQL language is needed to get the data (rather than using language-native data structures as an interface) is even a negative for RDBMSs.

    As a web app developer, I'm excited that something other than SQL is getting attention. RDBMSs won't go away because they have properties data miners, for example, need. But they aren't ideal for the simple persistent data stores most apps call for.

    --
    A slashdotter who didn't build his own computer is like a Jedi who didn't build his own lightsaber.
  26. Just to be pendactic by plopez · · Score: 3, Insightful

    There really isn't a true implementation of the relational model as per Codd and Date.

    Also, SQL is a nightmare. A badly designed programming language which is not quite functional and not quite procedural and so needs a bunch of hacks to work properly. And then there is the issue of NULLS. And the fact that you can end up with ugly bag operations and path dependencies in SQL.

    And just to start yet another flame war (I know, I just know someone is going to mod me as a troll today) key/value is just another way of saying "network database".

    And another thing which I will probably get hammered for, if you normalize a DB properly you will get your objects almost for free. And vice versa. Where I see people having problems is that they are either:

    1) lazy about defining and understanding their data
    2) or likewise for their objects
    3) or both.

    If you do it properly you will get a nice set of multidimensional objects and fact/attribute tables which are orthogonal and lean. Easy to understand, search, join, build, compose, decompose, signal and track.

    As opposed to a snarled up hacked together, overloaded, over inherited nightmare with hidden dependencies which I have seen too many times.

    OK, you can slam me now.

    --
    putting the 'B' in LGBTQ+
  27. SQL is the problem, not RDBMSs by Savantissimo · · Score: 3, Interesting

    SQL and all its pointy-headed progeny are the real problem with databases, not the relational vs. newMarketingBuzzwordDuJour arguments.

    Database operations do not need to look like code or algorithms, the only reason they do is to provide jobs for database programmers.

    Over 15 years ago Paradox's query-by-example was light-years ahead of today's soul-killing SQL crap.

    SQL is not going away, though, any more than its idiot older brother Mumps (M, Caché).

    --
    "Is life so dear, or peace so sweet, as to be purchased at the price of chains and slavery?" - Patrick Henry
    1. Re:SQL is the problem, not RDBMSs by Grapedrink · · Score: 2, Insightful

      Microsoft has CLR code running on top of MS SQL but it sucks performance wise. Oracle has Java. That's about as close as we have gotten, but both are just crutches.

      Unfortunately, you are right that SQL is terrible and not going away. The status quo, industry, and marketing will make sure we suffer for years to come.

    2. Re:SQL is the problem, not RDBMSs by emurphy42 · · Score: 2, Insightful

      Paradox's query-by-example

      *looks up* GUI query builder? Highly appropriate for simple things (e.g. Crystal Reports), but absolutely terrible for more complex things.

    3. Re:SQL is the problem, not RDBMSs by WuphonsReach · · Score: 4, Informative

      Over 15 years ago Paradox's query-by-example was light-years ahead of today's soul-killing SQL crap.

      QBE grids are nothing more than a UI abstraction of the underlying SQL SELECT statement. In fact, in MS-Access (which has a QBE grid), you can flip between looking at the QBE and looking at the raw SQL SELECT statement.

      Sometimes it's faster to do it in raw SQL, sometimes it's faster to set up the query in a QBE grid.

      --
      Wolde you bothe eate your cake, and have your cake?
    4. Re:SQL is the problem, not RDBMSs by Just+Some+Guy · · Score: 4, Insightful

      Database operations do not need to look like code or algorithms, the only reason they do is to provide jobs for database programmers.

      From Wikipedia:

      Relational database theory uses a different set of mathematical-based terms, which are equivalent, or roughly equivalent, to SQL database terminology.

      SQL looks like SQL because it's based on set theory. As an exercise, invent your own language that's as powerful (read: also based on a strong theoretical basis) but simpler. See you in a couple of decades!

      --
      Dewey, what part of this looks like authorities should be involved?
  28. STILL "relational" - Dynamic Relational by Tablizer · · Score: 2, Interesting

    Some of those systems appear to more or less still be "relational". If each row is treated like a map (associative array) of strings, then the "schema" for a given table is the set union of all attributes used in the table, and non-existing columns for a given row can be treated as nulls.
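
    A small sketch of that idea, with each row as a plain Python dict. The table contents and the helper names schema and select are made up for illustration; this only shows the "schema is the union of attributes, missing columns read as null" behaviour, not a real storage engine.

```python
rows = [
    {"brand": "Honda", "model": "Civic"},
    {"brand": "Nissan", "model": "Sentra", "color": "red"},
    {"model": "Unknown", "doors": "4"},
]


def schema(table):
    """The table's schema is the set union of every attribute any row uses."""
    columns = set()
    for row in table:
        columns.update(row.keys())
    return sorted(columns)


def select(table, columns, where=lambda row: True):
    """Project explicitly named columns; an absent attribute reads as None (null)."""
    return [tuple(row.get(col) for col in columns) for row in table if where(row)]


table_schema = schema(rows)
result = select(rows, ["model", "color"])
red_cars = select(rows, ["model"], where=lambda row: row.get("color") == "red")

print(table_schema)
print(result)
print(red_cars)
```

    Note that schema() has to walk every row, whereas select() with named columns never needs the schema at all.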

    As long as an asterisk is not used in a query (ex: "select * from tableX"), then it will pretty much act like existing RDBMS, and as long as the type-explicitness issues are resolved based on dynamic language conventions. (Asterisks can be implemented perhaps, but it could be computationally expensive.)

    It's kind of like dynamic (AKA "scripting") languages versus static or type-heavy languages. The static kind of languages requires more up-front info that "protects" the integrity of the thing at the expense of flexibility and declaration volume. The same dichotomy can be applied to RDBMS also. We have RDBMS that like a lot of info up-front, and now those which accept incremental or ad-hoc insertions are starting to be common (but still less standardized).

    And constraints can be incrementally added, such as later requiring that every new record in a "Cars" table have a value for "brand" or the like.

    One possible exception is that there were some examples that violated "map-ness" of records, such as having two colors for a car. If they instead supplied "color_1" and "color_2", then map set rules would not be violated, keeping it closer to true relational.

    In short: We don't have to abandon relational to get dynamism.

  29. Re:Ummm what regular graph/object databases by DragonWriter · · Score: 2, Insightful

    Long ago, hardware made much more of a difference than it does today and was one reason relational databases "won" out.

    Hardware makes just as big of a difference today, which is why distributed key/value stores are gaining currency at the moment. The hardware-related difference that was a big win for relational databases was their efficient use of disk space when normalized; the hardware-related difference that is a big win for distributed key/value stores now is their efficient scalability by distribution across multiple nodes.

    I am going to tear my eyes out if I see "yet another tuple store or graph db." Welcome to the last century, please try again.

    The big thing isn't "tuple stores or graph dbs", it's distributed tuple stores, and, even better, distributed transactional tuple stores. Not a whole lot of them from the last century.

  30. Yep, this will happen by ghjm · · Score: 2, Funny

    I can see the meeting now.

    Developer: "Hey boss, I found a better product for the transaction processing data! It might save us a bunch of money on Oracle licenses!"
    Boss: "Great, what is it?"
    Developer: "Project Voldemort!"
    Boss: "..."
    Developer: "No really, let me explain..."
    Boss: "I have a meeting to get to, but hey, let me know if you have any other great ideas."

  31. A SQL query walks into a bar... by SystematicPsycho · · Score: 4, Funny

    A SQL query walks into a bar and sees two tables. He walks up to them and says 'Can I join you?'

    From Tom Kyte's blog sql joke

    --
    Analytic & algebraic topology of locally Euclidean meterization of infinitely differentiable Riemmanian manifold
  32. Object class anyone? by kilodelta · · Score: 2, Insightful

    Reading this I keep seeing OOP in there, and data as an object class.

    This is just the OOP crowd trying to not learn SQL and do things their way. It won't replace a full RDBMS. And an RDBMS can scale quite nicely if you know what the hell you're doing.

  33. MapReduce is a bunch of hype by Estanislao+Martínez · · Score: 4, Interesting

    The name of the MapReduce framework comes from the functional programming operations "map" and "reduce." Map takes as its input a collection of data, and a function that transforms data elements into other elements; it outputs a collection where each element of the input collection has been replaced by the result of applying that function to it. Reduce takes a collection of elements, an initial value of the same type as the elements, and a two-place, commutative, associative and symmetric operation; it produces as its output the value that results from applying the operation to the initial value and each element of the collection in turn, accumulating the partial results.

    Map and reduce are operations that can be trivially parallelized. To parallelize map, you divide the collection into subcollections (in any arbitrary manner), and map over each of them in parallel. To parallelize reduce, you divide the collection into subcollections, also arbitrarily, reduce each subcollection independently, then apply the reduction operation to the partial results. (That works because the reduction operation is commutative, associative and symmetric.)
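
    A minimal sketch of that recipe in Python, under a couple of assumptions: the helper names are mine, the initial value must be an identity for the operation (0 for addition) because it is reused for every subcollection, and threads are used only to show the structure, since CPython threads won't actually speed up CPU-bound work.

```python
import operator
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def chunks(items, n):
    """Split items into at most n contiguous subcollections."""
    size = max(1, -(-len(items) // n))  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]


def parallel_map(func, items, workers=4):
    """Map over each subcollection independently, then concatenate in order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(lambda chunk: [func(x) for x in chunk],
                              chunks(items, workers)))
    return [y for part in parts for y in part]


def parallel_reduce(op, items, initial, workers=4):
    """Reduce each subcollection independently, then reduce the partial results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(lambda chunk: reduce(op, chunk, initial),
                                 chunks(items, workers)))
    return reduce(op, partials, initial)


data = list(range(1, 101))
squares = parallel_map(lambda x: x * x, data)
total = parallel_reduce(operator.add, squares, 0)
sequential_total = reduce(operator.add, (x * x for x in data), 0)

print(total, sequential_total)
```

    The two totals agree only because addition is associative; with an operation that isn't, splitting the collection would change the answer.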

    Well, guess what: this sort of technique is trivially applicable to relational database queries. A SQL query translates down to a combination of joins (the FROM clause), filters (the WHERE clause) and maps (the SELECT clause). Joins are trivially parallelizable; you give each execution unit a subset of the tuples of the driving relation. Filtering (the WHERE clause) is a kind of reduce operation. SELECT is a kind of map operation. This means that relational queries are not any less amenable to parallel execution than the stuff Google does.

    But the killer thing here is that MapReduce says absolutely nothing about the updates problem. This is one of the big features of RDBMSs: the ability to handle concurrent query and modification. It also says nothing about the data integrity problem, which is also one of the big RDBMS features.

    So, when you get down to it, there is a good argument to be made that many applications could make use of database technologies that support much faster querying, at the expense of very little updating. But there's no convincing argument that that technology isn't best implemented in the context of an RDBMS.

  34. Re:WTF? by ultranova · · Score: 2, Insightful

    If "key/value" databases do become more popular, they certainly might eat in to relational database mindshare.

    A "key/value" database is simply a relational database with a single table and two columns. It doesn't make any sense to build a separate server program for what current database servers can already easily do.

    90% of web applications use RDBMSs merely as persistent data storage--the fact that they are "relational" doesn't matter at all; the fact that a separate SQL language is needed to get the data (rather than using language-native data structures as an interface) is even a negative for RDBMSs.

    I'm a bit uncertain what you're saying here. Surely the fact that the server can do more than what you need doesn't hinder your program? The same goes for SQL language; surely the fact that commands sent to the database are text strings isn't a negative? In any case, you can (and probably should) separate database access into a module of its own, offering whatever API you desire for the rest of the program.

    As a web app developer, I'm excited that something other than SQL is getting attention. RDBMSs won't go away because they have properties data miners, for example, need. But they aren't ideal for the simple persistent data stores most apps call for.

    However, they can handle such data stores in a very simple fashion. A pair of "setvalue(key, value) / getvalue(key)" is trivially easy to implement on top of SQL language. It just doesn't make sense to pour resources into developing a less capable database server.
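
    Something like the following sketch, using Python's built-in sqlite3 module as a stand-in for the database server. The kv table and the KeyValueStore class are hypothetical names, and INSERT OR REPLACE is SQLite's spelling of an upsert; other databases spell it differently.

```python
import sqlite3


class KeyValueStore:
    """A setvalue/getvalue pair layered on one two-column SQL table."""

    def __init__(self, conn):
        self.conn = conn
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS kv (key TEXT PRIMARY KEY, value TEXT)"
        )

    def setvalue(self, key, value):
        # 'with conn' commits on success and rolls back on error.
        with self.conn:
            self.conn.execute(
                "INSERT OR REPLACE INTO kv (key, value) VALUES (?, ?)",
                (key, value),
            )

    def getvalue(self, key, default=None):
        row = self.conn.execute(
            "SELECT value FROM kv WHERE key = ?", (key,)
        ).fetchone()
        return row[0] if row else default


store = KeyValueStore(sqlite3.connect(":memory:"))
store.setvalue("theme", "dark")
store.setvalue("theme", "light")  # overwrites the earlier value
print(store.getvalue("theme"), store.getvalue("missing"))
```

    The rest of the program only ever sees setvalue and getvalue, so the SQL stays hidden in one module.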

    --

    Forget magic. Any technology distinguishable from divine power is insufficiently advanced.